OpenML
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000 instances - 21 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000 instances - 21 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000 instances - 21 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000 instances - 21 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
270 instances - 14 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
270 instances - 14 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
270 instances - 14 features - classes - 0 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
891 instances - 12 features - classes - 866 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
891 instances - 12 features - classes - 866 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
891 instances - 12 features - classes - 866 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
891 instances - 12 features - classes - 866 missing values
test
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
891 instances - 12 features - classes - 866 missing values
This data is used to test water contamination
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
26 instances - 8 features - classes - 0 missing values
AutoML challenge 2014. Original task: regression. Test and validation sets can be obtained on the ChaLearn website: https://automl.chalearn.org/data
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
400000 instances - 101 features - 0 classes - 0 missing values
AutoML challenge 2014. Original task: regression. Test and validation sets can be obtained on the ChaLearn website: https://automl.chalearn.org/data
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
99 instances - 200001 features - 0 classes - 0 missing values
Title: Flora. Source: https://automl.chalearn.org/data. Dataset from the first ChaLearn AutoML challenge (2014). Only the training data is included, as there were no labels for validation and…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
15000 instances - 200001 features - 0 classes - 0 missing values
Version with URL set as row ID; creator data missing due to bad formatting. **Author**: Kelwin Fernandes (INESC TEC, Universidade do Porto), Pedro Vinagre (ALGORITMI Research Centre, Universidade do…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
39644 instances - 60 features - 0 classes - 0 missing values
Modified version for the AutoML benchmark. Gathers information on about 7800 different US colleges, including geographical information, stats about the population attending and post graduation…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
7063 instances - 45 features - 0 classes - 104249 missing values
This is a preprocessed version of the anneal dataset (version 1). All missing values are treated as a nominal value with label '?'. (Quotes for clarity). Because this is not good…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
898 instances - 39 features - 5 classes - 0 missing values
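The missing-value treatment described for the anneal entry above can be illustrated with a minimal sketch. It assumes missing entries arrive as `None`; the helper name `missing_as_category` and the toy rows are hypothetical and are not taken from the dataset.

```python
# Hypothetical sketch: treat every missing entry as its own nominal category '?'.
MISSING_LABEL = "?"


def missing_as_category(rows):
    """Return a copy of `rows` in which None is replaced by the label '?'."""
    return [
        [MISSING_LABEL if value is None else value for value in row]
        for row in rows
    ]


# Toy rows for illustration only (assumed, not real anneal records).
rows = [["A", None, "S"], [None, "C", "S"]]
cleaned = missing_as_category(rows)
print(cleaned)
```

The input rows are left unchanged, so the original missing-value information remains available if a different treatment is preferred later.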
This data is derived from the 2012 KDD Cup. The data is subsampled to 0.1% of the original number of instances, downsampling the majority class (click=0) so that the target feature is reasonably…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
39948 instances - 10 features - 2 classes - 0 missing values
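Downsampling a majority class, as mentioned for the KDD Cup entry above, can be sketched roughly as follows. This is not the procedure used to build the dataset: the function name, the `keep_fraction` parameter, and the toy data are assumptions chosen for illustration.

```python
import random


def downsample_majority(rows, label_index, majority_label, keep_fraction, seed=0):
    """Keep every minority row and a random fraction of majority rows."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    kept = []
    for row in rows:
        is_majority = row[label_index] == majority_label
        if not is_majority or rng.random() < keep_fraction:
            kept.append(row)
    return kept


# Toy data (assumed): 1000 rows with click=0 and 50 rows with click=1.
rows = [(i, 0) for i in range(1000)] + [(i, 1) for i in range(50)]
sample = downsample_majority(rows, label_index=1, majority_label=0, keep_fraction=0.1)
print(len(sample))
```

Only rows of the majority class are ever dropped, so the minority class keeps all of its examples.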
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 19 features - 4 classes - 0 missing values
The Inpatient Utilization and Payment Public Use File (Inpatient PUF) provides information on inpatient discharges for Medicare fee-for-service beneficiaries. The Inpatient PUF includes information on…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
163065 instances - 12 features - 0 classes - 0 missing values
Airlines Departure Delay Prediction (Regression). Original data can be found at: http://www.transtats.bts.gov. This is a processed version of the original data, designed to predict departure delay (in…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 10 features - 0 classes - 0 missing values
Version with corrected feature types. 'PrivacySuppressed' values are converted to None. Gathers information on about 7800 different US colleges, including geographical information, stats about the…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
7063 instances - 47 features - 0 classes - 104305 missing values
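The conversion of 'PrivacySuppressed' entries to None mentioned for the colleges entry above could look roughly like this. It is a sketch only: the record layout and the column names `name` and `median_earnings` are hypothetical, and the actual preprocessing code is not part of this listing.

```python
# Hypothetical sketch: replace the 'PrivacySuppressed' marker with None.
SUPPRESSED = "PrivacySuppressed"


def suppressed_to_none(record):
    """Return a copy of `record` with suppressed values mapped to None."""
    return {
        key: (None if value == SUPPRESSED else value)
        for key, value in record.items()
    }


# Toy record with assumed column names.
record = {"name": "Example College", "median_earnings": "PrivacySuppressed"}
clean = suppressed_to_none(record)
print(clean)
```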
Bike sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user is able to easily rent…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
17379 instances - 13 features - 0 classes - 0 missing values
iris with ignored features Sepal.Width and Petal.Length
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
150 instances - 5 features - 3 classes - 0 missing values
Coal mining requires working in hazardous conditions. Miners in an underground coal mine can face several threats, such as methane explosions or rock-bursts. To provide protection for people…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
9199930 instances - 34 features - classes - 0 missing values
Testing temperature and pH
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
26 instances - 8 features - classes - 0 missing values
A subset of the 3D dataset from Princeton's COS 429 Computer Vision course. The dataset consists of 40 models organised into 4 classes of 10 objects each.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
16000 instances - 4 features - classes - 0 missing values
Bike sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user is able to easily rent…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
17379 instances - 13 features - 0 classes - 0 missing values
Insurance is a network for evaluating car insurance risks. The insurance data set contains the following 27 variables: GoodStudent (good student): a two-level factor with levels False and True. Age…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
20000 instances - 27 features - classes - 0 missing values
A Gold Standard of ~4,400 questions, answers, and comments from Stack Overflow, manually annotated for polarity. The dataset has been used for developing the EMTk toolkit for polarity detection from…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4423 instances - 2 features - 3 classes - 0 missing values
Real-world data set about the perching behaviour of two species of lizards on South Bimini island, from Schoener (1968). The lizards data set contains the following variables: Species (the species…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
409 instances - 3 features - classes - 0 missing values
whitewine
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
78 instances - 4 features - classes - 0 missing values
2nd Place LightGBM Solution of Kaggle Porto Seguro’s Safe Driver Prediction
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
595212 instances - 224 features - 0 classes - 0 missing values
Rotating hyperplane is a stream generator that generates d-dimensional classification problems in which the prediction is defined by a rotating hyperplane. By changing the orientation and position of…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
500000 instances - 11 features - classes - 0 missing values
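The rotating hyperplane idea in the entry above can be sketched as a small generator. This follows one common formulation (label 1 when the weighted sum of the features reaches half the sum of the weights, with the weights nudged after each example) and is not necessarily how this dataset was produced; the function name, the `drift` step size, and the uniform feature sampling are all assumptions.

```python
import random


def rotating_hyperplane(d, n, drift=0.01, seed=0):
    """Yield n (features, label) pairs from a slowly drifting hyperplane."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(d)]
    for _ in range(n):
        x = [rng.random() for _ in range(d)]  # features uniform in [0, 1)
        threshold = 0.5 * sum(weights)
        score = sum(w * xi for w, xi in zip(weights, x))
        label = 1 if score >= threshold else 0
        yield x, label
        # Nudge every weight up or down so the hyperplane keeps rotating.
        weights = [w + drift * rng.choice((-1, 1)) for w in weights]


# Toy stream (assumed sizes): 10 features per example, 200 examples.
stream = list(rotating_hyperplane(d=10, n=200))
print(len(stream), stream[0][1])
```

A larger `drift` makes the concept change faster, which is the property such streams are typically used to test.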
The dataset was reproduced following instructions from this paper: https://arxiv.org/pdf/2108.04884.pdf. The data originates from ACS PUMS.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1138289 instances - 19 features - classes - 0 missing values
ACSPublicCoverage dataset reproduced from this paper: https://arxiv.org/pdf/2108.04884.pdf.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1138289 instances - 20 features - 0 classes - 0 missing values
The ACSIncome dataset is one of five datasets created by Ding et al. as an improved alternative to the popular UCI Adult dataset. The authors compiled data from the American Community Survey (ACS)…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1664500 instances - 12 features - 0 classes - 0 missing values
A synthetic dataset from Lauritzen and Spiegelhalter (1988) about lung diseases (tuberculosis, lung cancer or bronchitis) and visits to Asia. A data frame with 5000 rows and 8 binary variables: D…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5000 instances - 8 features - classes - 0 missing values
Probable risk factors for coronary thrombosis, comprising data from 1841 men. The coronary data set contains the following 6 variables: Smoking (smoking): a two-level factor with levels no and yes. M.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1841 instances - 6 features - classes - 0 missing values
Hailfinder is a Bayesian network designed to forecast severe summer hail in northeastern Colorado. The hailfinder data set contains the following 56 variables: N07muVerMo (10.7mu vertical motion): a…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
20000 instances - 56 features - classes - 0 missing values
This data set measures the running time of a matrix-matrix product $A \times B = C$, where all matrices have size 2048 x 2048, using a parameterizable *SGEMM GPU* (Single Precision General Matrix…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
241600 instances - 18 features - 0 classes - 0 missing values
25 personality self-report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web-based personality…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2800 instances - 28 features - classes - 731 missing values
A Gold Standard of ~4,400 questions, answers, and comments from Stack Overflow, manually annotated for polarity. The dataset has been used for developing the EMTk toolkit for polarity detection from…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3097 instances - 2 features - 3 classes - 0 missing values
A Gold Standard of ~4,400 questions, answers, and comments from Stack Overflow, manually annotated for polarity. The dataset has been used for developing the EMTk toolkit for polarity detection from…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1326 instances - 2 features - 3 classes - 0 missing values
A Gold Standard of ~4,400 questions, answers, and comments from Stack Overflow, manually annotated for polarity. The dataset has been used for developing the EMTk toolkit for polarity detection from…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4423 instances - 2 features - 3 classes - 0 missing values
The ALARM ("A Logical Alarm Reduction Mechanism") is a Bayesian network designed to provide an alarm message system for patient monitoring. The alarm data set contains the following 37 variables: CVP…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
20000 instances - 37 features - classes - 0 missing values
Car buying information
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1750 instances - 7 features - classes - 0 missing values
The ACSIncome dataset is one of five datasets created by Ding et al. as an improved alternative to the popular UCI Adult dataset. The authors compiled data from the American Community Survey (ACS)…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1664500 instances - 12 features - classes - 0 missing values
The ACSIncome dataset is one of five datasets created by Ding et al. as an improved alternative to the popular UCI Adult dataset. The authors compiled data from the American Community Survey (ACS)…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1664500 instances - 12 features - classes - 0 missing values
redwine dataset
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
571 instances - 4 features - classes - 0 missing values
redwine data
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
571 instances - 4 features - classes - 0 missing values
redwine data
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
571 instances - 4 features - classes - 0 missing values
red
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
571 instances - 4 features - classes - 0 missing values
Diabetes dataset. Ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) were obtained for each of n = 442…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
442 instances - 11 features - 0 classes - 0 missing values
A test description
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
992 instances - 16 features - 0 classes - 0 missing values
Context: Traffic data collected from several Wavetronix radar sensors deployed by the City of Austin. The dataset is augmented with geo coordinates from the sensor location dataset. Source:…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
4603861 instances - 12 features - classes - 0 missing values
Context: Buying a diamond can be frustrating and expensive. It inspired me to create this dataset of 119K natural and lab-created diamonds from brilliantearth.com to demystify the value of the 4 Cs…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
119307 instances - 11 features - classes - 0 missing values
This dataset is a collection of 3112 minerals with their chemical compositions, crystal structures, and physical and optical properties. The properties that are included in this database are the Crystal…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3112 instances - 140 features - classes - 0 missing values
Context: This is the preprocessed training data from the Quora Insincere Questions competition (2018). The original training data is preprocessed to remove stop words, numbers, punctuation, common words and converted…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1306122 instances - 4 features - classes - 1 missing values
Context: With the growing dependency that our society has on the Internet, the amount of data that goes through networks keeps increasing. Network monitoring and analysis of consumption behavior…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1249 instances - 113 features - classes - 0 missing values
Palmer Penguins Dataset. The goal of palmerpenguins is to provide a great dataset for data exploration and visualization, as an alternative to iris. About the data: Data were collected and made available by…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
344 instances - 7 features - classes - 19 missing values
Context: Infrastructure-as-code (IaC) is the DevOps strategy that allows management and provisioning of infrastructure through the definition of machine-readable files and automation around them,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
227272 instances - 113 features - 0 classes - 757758 missing values
Context: New Zealand lies on a fault line that runs through its spine. This fault line, known as the Alpine Fault, is very active and forms a part of the "Ring of Fire". Content: This is a list of all the earth…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
20648 instances - 5 features - classes - 0 missing values
Context: Who doesn't love a good handheld videogame? There are plenty of titles to explore for data analytics, or for a playthrough if you own the console. Content: This dataset contains release and user review…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
Context: Amazon.com is one of the largest electronic commerce and cloud computing companies. Just a few Amazon-related facts: They lost 4.8 million in August 2013, when their website went down for 40…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2023070 instances - 4 features - classes - 0 missing values
Context: The COVID-19 dataset in Indonesia was created to find out various factors that could be taken into consideration in decision making related to the level of stringency in each province in…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
21759 instances - 38 features - classes - 47848 missing values
Context: This Online Retail II data set contains all the transactions occurring for a UK-based and registered, non-store online retail between 01/12/2009 and 09/12/2011. The company mainly sells unique…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1067371 instances - 8 features - classes - 247481 missing values
Context: So it's Halloween again, dear Kagglers! And what better way of celebrating than with some NLP! The dataset brings you the reviews of popular Halloween costumes sold on Amazon as of November…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
7814 instances - 5 features - classes - 16 missing values
If you think the dataset is useful, please vote for it; it's an assignment from my data science class, and I'd appreciate it! :)) Context: The Department of Finance (DOF) is required by NY State law to…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
22073 instances - 12 features - classes - 76 missing values
Content: This dataset contains information about Netflix movies and about how the Netflix website uses it to give me recommendations for movies of the genre that I prefer. Here, I used Netflix's API…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3323 instances - 10 features - classes - 0 missing values
Context: This dataset is a small snapshot (sample) of the ocean-depth entries in the original dataset, which keeps increasing day by day. The purpose of this dataset is to allow fellow Scientists/…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
97606 instances - 5 features - 0 classes - 0 missing values
Context: This dataset includes all games for PlayStation 4 to date. I used the truetrophies website to create this dataset. Content: You can find 1 dataset: games_data.csv: contend up to date…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1584 instances - 10 features - classes - 0 missing values
A csv file with 80,000+ tweets from January 6th, 2021 -- the day of the capitol hill riots. Made using the Twitter Developer API + Tweepy. Nowhere close to the size of the Parler data dumps, but…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
82309 instances - 14 features - classes - 392323 missing values
Context: Since its inception in 2008, Airbnb has disrupted the traditional hospitality industry as more travellers decide to use Airbnb as their primary means of accommodation. Airbnb offers…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
226030 instances - 17 features - classes - 213880 missing values
Dutch News Articles. This dataset contains all the articles published by the NOS as of the 1st of January 2010. The data is obtained by scraping the NOS website. The NOS is one of the biggest (online)…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
237861 instances - 5 features - classes - 0 missing values
Has the curve flattened? Countries around the world are working to flatten the curve of the coronavirus pandemic. Flattening the curve involves reducing the number of new COVID-19 cases from one day…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
16659 instances - 28 features - classes - 24650 missing values
Context: I started this for my final-year project and improved on it during quarantine. Content: A mix of used and new car reviews from 2000 to 2019 of various brands from edmunds.com. I did not…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
299045 instances - 8 features - classes - 171 missing values
Context: Data is collected daily from the Our World in Data GitHub repository for COVID-19, merged and uploaded. Content: The data contains the following information: Country: this is the country for which…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
97606 instances - 5 features - classes - 0 missing values
Context: While brainstorming ideas for a statistics project for a course last semester, the idea of utilizing data about microbreweries came up. Unfortunately after some exploration and thought, we…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2407 instances - 6 features - classes - 8 missing values
Content: The data includes the text, whether the tweet is a retweet, whether the tweet is deleted, and so much more. It is sorted by descending date (so the highest rows are from 2009 and the last rows…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
56571 instances - 9 features - classes - 0 missing values
The similarities and differences in the behaviors of different people have long been of interest, particularly in psychology and other social science fields. Understanding human behavior in particular…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
50000 instances - 9 features - classes - 63798 missing values
Context and Content: The COVID-19 case surveillance system database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
8405079 instances - 11 features - classes - 9543526 missing values
Introduction: Dogecoin is an open source peer-to-peer digital currency, favored by Shiba Inus worldwide. It is qualitatively more fun while being technically nearly identical to its close relative…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1532 instances - 6 features - classes - 0 missing values
Context: This data was collected as a course project for the immersive data science course (by General Assembly and Misk Academy). Content: This dataset is in CSV format; it consists of 5717 rows and…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5717 instances - 15 features - classes - 4146 missing values
Context: Yallamotor is a website in KSA with a collection of used vehicles for sale. I used the Yallamotor website to create a dataset of used vehicles in KSA. Content: The dataset includes (2287) vehicles…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2287 instances - 6 features - classes - 0 missing values
Detailed data description of Credit Risk dataset (feature name: description): person_age: Age; person_income: Annual Income; personhomeownership: Home ownership; personemplength: Employment length (in years)…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
32581 instances - 12 features - classes - 4011 missing values
Feedback: Mukharbek Organokov organokov.mgmail.com. Context: Sloan Digital Sky Survey current DR16 Server Data release with Galaxies, Stars and Quasars. License: Creative Commons Attribution license…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
100000 instances - 18 features - classes - 0 missing values
Context: This IMDb Indonesian Movies Dataset contains information on 1262 Indonesian movies. The data was gathered using IMDb-Scraper and then was converted and cleaned into a .csv file.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1272 instances - 11 features - classes - 1774 missing values
COVID19-Algeria-and-World-Dataset: A coronavirus dataset with 104 countries constructed from different reliable sources, where each row represents a country, and the columns represent geographic,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
38472 instances - 15 features - classes - 11759 missing values
Context: Throughout the world of data science, there are many languages and tools that can be used to complete a given task. While you are often able to use whichever tool you prefer, it is often…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
10153 instances - 4 features - classes - 9824 missing values
Context: There are some great UFC datasets out there, but I could not find one that included gambling odds. So I went and made one myself. This dataset focuses very generally on the fights and hopes to…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5528 instances - 11 features - classes - 14168 missing values
Context: Agricultural crop production in Tamil Nadu, India. Content: This dataset describes agricultural crop production in Tamil Nadu, India. This is from https://data.gov.in/ fully Licensed…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
13547 instances - 7 features - classes - 281 missing values
Content: The dataset contains 4689 real estate objects in Riga. Columns description: op_type - offer type ('For rent', 'For sale', 'Buying', 'Renting', 'Change', 'Other'). district - district, where…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
This dataset is something I found online when I wanted to practice regression models. It is openly available online in multiple places, though I do not know the exact origin and collection…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1338 instances - 7 features - classes - 0 missing values
Context: This dataset was scraped from http://www.asapsports.com/, using the code in this repository. I designed the webscraping code to account for most of the variance in the website's formatting,…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2096 instances - 6 features - classes - 0 missing values
Context: This is a continually updated dataset of professional fighters making fight predictions. Content: The data is gathered mostly from James Lynch's YouTube channel, where fighters are asked to…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3401 instances - 6 features - classes - 0 missing values
Latitude: lat. Longitude: lon. Flood Height (flood_height): 0 - No flood, 1 - Ankle High, 2 - Knee High, 3 - Waist High, 4 - Neck High, 5 - Top of Head High, 6 - 1-storey High, 7 - 1.5-storey High, 8 -…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3510 instances - 5 features - classes - 0 missing values
If you reach this DATASET, please UPVOTE this dataset to show your appreciation. DATASET DETAILS: 1. Previous Close: 264.29, 2. Open: 265.58, 3. Bid: 266.06 x 800, 4. Ask: 266.06 x 900, 5. Day's Range:…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
9823 instances - 7 features - classes - 6 missing values