OpenML
One of the biggest challenges of an auto dealership purchasing a used car at an auto auction is the risk that the vehicle might have serious issues that prevent it from being sold to customers. The…
3 runs - 0 likes - 3 downloads - 3 reach - 14 impact
72983 instances - 33 features - 2 classes - 149271 missing values
__Changes w.r.t. version 1: included one target factor with 7 levels as target variable for the classification. Also deleted the previous 7 binary target variables.__ A dataset of steel plates'…
9051 runs - 1 likes - 3 downloads - 4 reach - 16 impact
1941 instances - 28 features - 7 classes - 0 missing values
This is a "supervised learning" challenge in machine learning. We are making available 30 datasets, all pre-formatted in given feature representations (this means that each example consists of a fixed…
12 runs - 0 likes - 3 downloads - 3 reach - 20 impact
65196 instances - 28 features - 100 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 2 downloads - 2 reach - 13 impact
1000 instances - 26 features - 0 classes - 0 missing values
GAMETES_Epistasis_2-Way_1000atts_0.4H_EDM-1_EDM-1_1-pmlb
0 runs - 0 likes - 2 downloads - 2 reach - 22 impact
1600 instances - 1001 features - 2 classes - 0 missing values
led24-pmlb
31 runs - 0 likes - 2 downloads - 2 reach - 22 impact
3200 instances - 25 features - 10 classes - 0 missing values
__Major changes w.r.t. version 2: ignored variable 3 in this upload as this seems to be a perfect predictor.__ Tamilnadu Electricity Board Hourly Readings dataset. Real-time readings were collected…
0 runs - 0 likes - 2 downloads - 2 reach - 19 impact
45781 instances - 4 features - 20 classes - 0 missing values
The data contains information on 9144 samples from 220 spectral bands. The classes represent land-use types: alfalfa, corn, grass, hay, oats, soybeans, trees, and wheat.
0 runs - 0 likes - 2 downloads - 2 reach - 11 impact
9144 instances - 221 features - 8 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 2 downloads - 2 reach - 13 impact
1000 instances - 51 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 2 downloads - 2 reach - 14 impact
1000 instances - 51 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 2 downloads - 2 reach - 13 impact
1000 instances - 26 features - 0 classes - 0 missing values
PMLB version of the Titanic dataset, which only uses 3 features. See version 1 for the complete version: https://www.openml.org/d/40945
35 runs - 0 likes - 2 downloads - 2 reach - 23 impact
2201 instances - 4 features - 2 classes - 0 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
0 runs - 0 likes - 2 downloads - 2 reach - 18 impact
416188 instances - 61 features - 355 classes - 0 missing values
__Changes w.r.t. version 1: renamed variables such that they match description.__ ### Dataset: Wilt Data Set ### Abstract: High-resolution Remote Sensing data set (Quickbird). Small number of training…
10966 runs - 0 likes - 2 downloads - 2 reach - 22 impact
4839 instances - 6 features - 2 classes - 0 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
8 runs - 0 likes - 2 downloads - 2 reach - 19 impact
10000 instances - 2001 features - 5 classes - 0 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
8 runs - 0 likes - 2 downloads - 2 reach - 19 impact
2984 instances - 145 features - 2 classes - 0 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
8 runs - 0 likes - 2 downloads - 2 reach - 20 impact
5124 instances - 21 features - 2 classes - 0 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
8 runs - 1 likes - 2 downloads - 3 reach - 20 impact
5418 instances - 1637 features - 2 classes - 0 missing values
This is the dataset used for the 2016 IDA Industrial Challenge, courtesy of Scania. For a full description, see http://archive.ics.uci.edu/ml/datasets/IDA2016Challenge . This dataset contains both the…
9 runs - 0 likes - 2 downloads - 2 reach - 19 impact
76000 instances - 171 features - 2 classes - 1078695 missing values
libSVM, AAD group. Dataset from the LIBSVM data repository. Preprocessing: Original data: someone from Germany working with the car industry.
0 runs - 0 likes - 1 downloads - 1 reach - 16 impact
1243 instances - 23 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 1 downloads - 1 reach - 13 impact
1000 instances - 51 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 1 downloads - 1 reach - 13 impact
1000 instances - 51 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 1 downloads - 1 reach - 13 impact
1000 instances - 26 features - 0 classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 1 downloads - 1 reach - 13 impact
1000 instances - 26 features - 0 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.1H_EDM-1_1-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 22 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 22 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_3-Way_20atts_0.2H_EDM-1_1-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 22 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_50_EDM-2_001-pmlb
0 runs - 0 likes - 1 downloads - 1 reach - 22 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_75_EDM-2_001-pmlb
31 runs - 0 likes - 1 downloads - 1 reach - 22 impact
1600 instances - 21 features - 2 classes - 0 missing values
The dataset and this description are made available on http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html. Normalized handwritten digits, automatically scanned from envelopes by the U.S. Postal…
57 runs - 0 likes - 1 downloads - 1 reach - 11 impact
9298 instances - 257 features - 10 classes - 0 missing values
Data used in an analysis of the Brown and Frown corpora for my doctoral dissertation titled ``Variations in Written English: Characterizing Authors' Rhetorical Language Choices Across Corpora of…
2048 runs - 0 likes - 1 downloads - 1 reach - 12 impact
1000 instances - 24 features - 30 classes - 0 missing values
microaggregation2_nominal
1 runs - 0 likes - 1 downloads - 1 reach - 13 impact
20000 instances - 21 features - 5 classes - 0 missing values
The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. We cordially invite researchers…
0 runs - 0 likes - 1 downloads - 1 reach - 11 impact
51839 instances - 1569 features - 43 classes - 0 missing values
The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. We cordially invite researchers…
0 runs - 0 likes - 1 downloads - 1 reach - 11 impact
51839 instances - 1569 features - 43 classes - 0 missing values
The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. We cordially invite researchers…
1 runs - 0 likes - 1 downloads - 1 reach - 11 impact
51839 instances - 257 features - 43 classes - 0 missing values
This version has feature names based on https://www2.1010data.com/documentationcenter/beta/Tutorials/MachineLearningExamples/CensusIncomeDataSet.html Missing data is also properly encoded in this…
0 runs - 0 likes - 1 downloads - 1 reach - 0 impact
199523 instances - 42 features - 2 classes - 415717 missing values
Dataset KDD98 challenge: https://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html The goal is to estimate the return from a direct mailing in order to maximize donation profits. This dataset…
0 runs - 0 likes - 1 downloads - 1 reach - 9 impact
82318 instances - 478 features - 2 classes - 2399311 missing values
This dataset contains traffic violation information from all electronic traffic violations issued in the County. Any information that can be used to uniquely identify the vehicle, the vehicle owner or…
0 runs - 1 likes - 1 downloads - 2 reach - 9 impact
70340 instances - 21 features - 3 classes - 2288 missing values
Much of machine learning research focuses on producing models which perform well on benchmark tasks, in turn improving our understanding of the challenges associated with those tasks. From the…
0 runs - 0 likes - 1 downloads - 1 reach - 11 impact
70000 instances - 785 features - 10 classes - 0 missing values
test
0 runs - 0 likes - 1 downloads - 1 reach - 0 impact
299 instances - 13 features - classes - 0 missing values
## **Meta-Album Plankton Dataset (Micro)** The Plankton dataset is created by researchers at the Woods Hole Oceanographic Institution (https://www.whoi.edu/). Imaging FlowCytobot (IFCB) was used for…
0 runs - 1 likes - 1 downloads - 2 reach - 1 impact
800 instances - 3 features - 20 classes - 800 missing values
This data was extracted from the 1994 Census bureau database by Ronny Kohavi and Barry Becker (Data Mining and Visualization, Silicon Graphics). A set of reasonably clean records was extracted using…
0 runs - 0 likes - 1 downloads - 1 reach - 0 impact
32561 instances - 15 features - classes - 4262 missing values
Context This dataset was created to make the project "AI Learn to invest" for SaturdaysAI - Euskadi 1st edition. The project can be found at https://github.com/ImanolR87/AI-Learn-to-invest Content…
0 runs - 1 likes - 1 downloads - 2 reach - 0 impact
405258 instances - 25 features - classes - 0 missing values
#### Information A small classic dataset from Fisher, 1936. One of the earliest datasets used for the evaluation of classification methodologies. #### References * Fisher, R. A. (1936), The use of…
0 runs - 0 likes - 1 downloads - 1 reach - 0 impact
150 instances - 7 features - classes - 0 missing values
Context Find the best strategies to improve for the next marketing campaign. How can the financial institution have a greater effectiveness for future marketing campaigns? In order to answer this, we…
0 runs - 1 likes - 1 downloads - 2 reach - 0 impact
11162 instances - 17 features - classes - 0 missing values
The Friedman datasets are 80 artificially generated datasets originating from: J.H. Friedman (1999). Stochastic Gradient Boosting. The dataset names are coded as…
0 runs - 0 likes - 1 downloads - 1 reach - 13 impact
1000 instances - 51 features - 0 classes - 0 missing values
Original data from https://github.com/propublica/compas-analysis/ by ProPublica. The data was subsequently preprocessed and reduced to relevant features for classification. The target variable is…
0 runs - 0 likes - 1 downloads - 1 reach - 10 impact
5278 instances - 14 features - 2 classes - 0 missing values
[PMLB](https://github.com/EpistasisLab/penn-ml-benchmarks/tree/master/datasets/classification/tokyo1) This is Performance co-pilot (PCP) data for the Tokyo server at Silicon Graphics International…
37 runs - 0 likes - 1 downloads - 1 reach - 22 impact
959 instances - 45 features - 2 classes - 0 missing values
The origin is not clear, but presumably this is an artificial problem representing M-of-N rules. The target is 1 if a certain M 'bits' are '1'? (Joaquin Vanschoren)
31 runs - 0 likes - 1 downloads - 1 reach - 22 impact
1324 instances - 11 features - 2 classes - 0 missing values
The Sheffield (previously UMIST) Face Database consists of 564 images of 20 individuals (mixed race/gender/appearance). Each individual is shown in a range of poses from profile to frontal views -…
53 runs - 0 likes - 1 downloads - 1 reach - 16 impact
575 instances - 10305 features - 20 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs - 0 likes - 1 downloads - 1 reach - 17 impact
100 instances - 10001 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
1 runs - 0 likes - 1 downloads - 1 reach - 18 impact
4147 instances - 49 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
14 runs - 0 likes - 1 downloads - 1 reach - 21 impact
20000 instances - 4297 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
13 runs - 1 likes - 1 downloads - 2 reach - 21 impact
20000 instances - 4297 features - 2 classes - 0 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
7 runs - 0 likes - 1 downloads - 1 reach - 19 impact
8237 instances - 801 features - 7 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs - 0 likes - 1 downloads - 1 reach - 18 impact
425240 instances - 79 features - 2 classes - 2734000 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
15 runs - 0 likes - 1 downloads - 1 reach - 21 impact
10000 instances - 7201 features - 10 classes - 0 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
15 runs - 0 likes - 1 downloads - 1 reach - 20 impact
58310 instances - 181 features - 10 classes - 0 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
14 runs - 0 likes - 1 downloads - 1 reach - 20 impact
83733 instances - 55 features - 4 classes - 0 missing values
Data reported to the police about the circumstances of personal injury road accidents in Great Britain from 1979, and the maker and model information of vehicles involved in the respective accident.…
0 runs - 0 likes - 1 downloads - 1 reach - 1 impact
363243 instances - 67 features - 3 classes - 2181757 missing values
This is a test dataset
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
Upstream data from the twin Archimedes screw hydro-electric generator on the river Thames at Caversham weir, Reading, UK.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
235 instances - 2 features - 0 classes - 0 missing values
Context This dataset was created by our in-house Web Scraping and Data Mining teams at PromptCloud and DataStock. You can download the full dataset here. This sample contains 30K records. Content…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
30000 instances - 16 features - classes - 49458 missing values
DataSample
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 10402, and it has 65 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
65 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 30047, and it has 97 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
97 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 11054, and it has 344 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
344 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 11499, and it has 12 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
12 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 11286, and it has 297 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
297 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 17024, and it has 373 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
373 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 279, and it has 126 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
126 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 100848, and it has 60 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
60 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 100869, and it has 18 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
18 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 10541, and it has 151 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
151 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 17075, and it has 15 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
15 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 101309, and it has 73 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
73 instances - 1026 features - 0 classes - 0 missing values
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target TID: 12950, and it has 34 rows and 1026 features (including…
1 runs - 0 likes - 0 downloads - 0 reach - 0 impact
34 instances - 1026 features - 0 classes - 0 missing values
No data.
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
336 instances - 15 features - classes - 0 missing values
We use the following representation to collect the dataset: age - age; bp - blood pressure; sg - specific gravity; al - albumin; su - sugar; rbc - red blood cells; pc - pus cell; pcc - pus cell clumps; ba -…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
250 instances - 28 features - classes - 0 missing values
No data.
222 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1504 instances - 2887 features - 13 classes - 0 missing values
No data.
428 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1003 instances - 3183 features - 10 classes - 0 missing values
No data.
268 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3075 instances - 12433 features - 6 classes - 0 missing values
No data.
373 runs - 0 likes - 0 downloads - 0 reach - 0 impact
918 instances - 3013 features - 10 classes - 0 missing values
No data.
159 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1657 instances - 3759 features - 25 classes - 0 missing values
No data.
264 runs - 0 likes - 0 downloads - 0 reach - 0 impact
3204 instances - 13196 features - 6 classes - 0 missing values
No data.
211 runs - 0 likes - 0 downloads - 0 reach - 0 impact
313 instances - 5805 features - 8 classes - 0 missing values
No data.
163 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1560 instances - 8461 features - 20 classes - 0 missing values
No data.
216 runs - 0 likes - 0 downloads - 0 reach - 0 impact
11162 instances - 11466 features - 10 classes - 0 missing values
No data.
203 runs - 0 likes - 0 downloads - 0 reach - 0 impact
878 instances - 7455 features - 10 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
6 instances - 1143 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
34 instances - 1143 features - 0 classes - 0 missing values
This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
15 instances - 10 features - 0 classes - 0 missing values
No data.
414 runs - 0 likes - 0 downloads - 0 reach - 0 impact
690 instances - 8262 features - 10 classes - 0 missing values
No data.
220 runs - 0 likes - 0 downloads - 0 reach - 0 impact
336 instances - 7903 features - 6 classes - 0 missing values
No data.
108 runs - 0 likes - 0 downloads - 0 reach - 0 impact
927 instances - 10129 features - 7 classes - 0 missing values
No data.
377 runs - 0 likes - 0 downloads - 0 reach - 0 impact
913 instances - 3101 features - 10 classes - 0 missing values
No data.
219 runs - 0 likes - 0 downloads - 0 reach - 0 impact
414 instances - 6430 features - 9 classes - 0 missing values
No data.
215 runs - 0 likes - 0 downloads - 0 reach - 0 impact
204 instances - 5833 features - 6 classes - 0 missing values
No data.
426 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2463 instances - 2001 features - 17 classes - 0 missing values
No data.
67 runs - 0 likes - 0 downloads - 0 reach - 0 impact
9558 instances - 26833 features - 44 classes - 0 missing values
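Every entry in the listing above ends with a summary line of the form "N instances - M features - K classes - J missing values", and a few entries leave the class count blank. As a minimal sketch (not part of OpenML itself), such a line could be parsed as shown below. The names `Summary`, `parse_summary`, and `missing_fraction` are hypothetical helpers introduced only for illustration, and treating the missing-value share as missing cells divided by instances times features is an assumption about how the counts relate.

```python
import re
from typing import NamedTuple, Optional


class Summary(NamedTuple):
    """Counts taken from one summary line of the listing (hypothetical type)."""
    instances: int
    features: int
    classes: Optional[int]  # None when the listing leaves the class count blank
    missing_values: int


# The class count is optional because some entries read "... features - classes - ...".
_SUMMARY_RE = re.compile(
    r"^\s*(\d+) instances - (\d+) features -\s*(\d*)\s*classes - (\d+) missing values\s*$"
)


def parse_summary(line: str) -> Summary:
    """Parse a summary line; raise ValueError if it does not follow the layout."""
    match = _SUMMARY_RE.match(line)
    if match is None:
        raise ValueError(f"not a summary line: {line!r}")
    instances, features, classes, missing = match.groups()
    return Summary(
        instances=int(instances),
        features=int(features),
        classes=int(classes) if classes else None,
        missing_values=int(missing),
    )


def missing_fraction(summary: Summary) -> float:
    """Share of missing cells, assuming a dense instances x features table."""
    cells = summary.instances * summary.features
    return summary.missing_values / cells if cells else 0.0


if __name__ == "__main__":
    first = parse_summary(
        "72983 instances - 33 features - 2 classes - 149271 missing values"
    )
    print(first, round(missing_fraction(first), 4))
```

Returning `None` for a blank class count keeps "count not shown" distinct from the explicit "0 classes" that other entries in the listing report.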