Study
AutoML Benchmark Training Datasets

AutoML Benchmark Training Datasets

Created 21-04-2022 by Matthias Feurer Visibility: public
Search these data sets in more detail
One of the biggest challenges of an auto dealership purchasing a used car at an auto auction is the risk of that the vehicle might have serious issues that prevent it from being sold to customers. The…
3 runs0 likes3 downloads3 reach14 impact
72983 instances - 33 features - 2 classes - 149271 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
1 runs1 likes3 downloads4 reach18 impact
5832 instances - 309 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
1 runs0 likes0 downloads0 reach18 impact
3140 instances - 260 features - 2 classes - 0 missing values
__Changes w.r.t. version 1: renamed variables such that they match description.__ ### Dataset: Wilt Data Set ### Abstract: High-resolution Remote Sensing data set (Quickbird). Small number of training…
10966 runs0 likes2 downloads2 reach22 impact
4839 instances - 6 features - 2 classes - 0 missing values
__Changes w.r.t. version 1: included one target factor with 7 levels as target variable for the classification. Also deleted the previous 7 binary target variables.__ A dataset of steel plates'…
9051 runs1 likes3 downloads4 reach16 impact
1941 instances - 28 features - 7 classes - 0 missing values
### Description __Changes to version 1:__ all categorical features transformed as such. This dataset represents a set of possible advertisements on Internet pages. ### Sources (a) Creator and donor:…
1432 runs0 likes5 downloads5 reach24 impact
3279 instances - 1559 features - 2 classes - 0 missing values
In human civilisation, languages evolved first, and then came scripts. The Devanagari script is one of the oldest scripts of India, having evolved from the ancient Brahmi script. It came to be adopted…
43 runs2 likes8 downloads10 reach15 impact
92000 instances - 1025 features - 46 classes - 0 missing values
The satellite dataset comprises of features extracted from satellite observations. In particular, each image was taken under four different light wavelength, two in visible light (green and red) and…
2078 runs3 likes70 downloads73 reach34 impact
5100 instances - 37 features - 2 classes - 0 missing values
A dataset relating characteristics of telephony account features and usage and whether or not the customer churned. Originally used in [Discovering Knowledge in Data: An Introduction to Data…
7512 runs2 likes9 downloads11 reach26 impact
5000 instances - 21 features - 2 classes - 0 missing values
Originally from the StatLog project. The raw data is still available on [UCI](https://archive.ics.uci.edu/ml/datasets/Molecular+Biology+(Splice-junction+Gene+Sequences)). The data consists of 3,186…
7063 runs0 likes9 downloads9 reach26 impact
3186 instances - 181 features - 3 classes - 0 missing values
This data was gathered from participants in experimental speed dating events from 2002-2004. During the events, the attendees would have a four-minute "first date" with every other participant of the…
28211 runs19 likes170 downloads189 reach36 impact
8378 instances - 121 features - 2 classes - 18372 missing values
Citation Request: This dataset is public available for research. The details are described in [Cortez et al., 2009]. Please include this citation if you plan to use this database: P. Cortez, A.…
64 runs2 likes6 downloads8 reach17 impact
4898 instances - 12 features - 7 classes - 0 missing values
This data has been prepared to analyze factors related to readmission as well as other outcomes pertaining to patients with diabetes. The data are submitted on behalf of the Center for Clinical and…
0 runs2 likes18 downloads20 reach17 impact
101766 instances - 50 features - 3 classes - 0 missing values
Creators: Renata Cristina Barros Madeo (Madeo, R. C. B.) Priscilla Koch Wagner (Wagner, P. K.) Sarajane Marques Peres (Peres, S. M.) {renata.si, priscilla.wagner, sarajane} at usp.br…
26636 runs1 likes18 downloads19 reach40 impact
9873 instances - 33 features - 5 classes - 0 missing values
Source: Rami Mustafa A Mohammad ( University of Huddersfield, rami.mohammad '@' hud.ac.uk, rami.mustafa.a '@' gmail.com) Lee McCluskey (University of Huddersfield,t.l.mccluskey '@' hud.ac.uk ) Fadi…
51677 runs1 likes29 downloads30 reach29 impact
11055 instances - 31 features - 2 classes - 0 missing values
Predict a biological response of molecules from their chemical properties. Each row in this data set represents a molecule. The first column contains experimental data describing an actual biological…
48680 runs2 likes40 downloads42 reach36 impact
3751 instances - 1777 features - 2 classes - 0 missing values
### Description MicroMass (pure spectra version) is a dataset to explore machine learning approaches for the identification of microorganisms from mass-spectrometry data. ### Source ``` Pierre Mahé,…
39941 runs1 likes17 downloads18 reach100 impact
571 instances - 1301 features - 20 classes - 0 missing values
QSAR biodegradation Data Set * Abstract: Data set containing values for 41 attributes (molecular descriptors) used to classify 1055 chemicals into 2 classes (ready and not ready biodegradable). *…
267861 runs1 likes25 downloads26 reach30 impact
1055 instances - 42 features - 2 classes - 0 missing values
Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond, Knowledge and Information Systems, Vol. 14, No. 3, 2008. 1 . Abstract: Two ground ozone level data sets are included in…
188264 runs1 likes20 downloads21 reach30 impact
2534 instances - 73 features - 2 classes - 0 missing values
Source: James P Bridge, Sean B Holden and Lawrence C Paulson University of Cambridge Computer Laboratory William Gates Building 15 JJ Thomson Avenue Cambridge CB3 0FD UK +44 (0)1223 763500…
26642 runs1 likes21 downloads22 reach45 impact
6118 instances - 52 features - 6 classes - 0 missing values
Author: Volker Lohweg (University of Applied Sciences, Ostwestfalen-Lippe) Source: [UCI](https://archive.ics.uci.edu/ml/datasets/banknote+authentication) - 2012 Please cite:…
138170 runs6 likes40 downloads46 reach34 impact
1372 instances - 5 features - 2 classes - 0 missing values
Dataset creator and donator: Zhi Liu, e-mail: liuzhi8673 '@' gmail.com, institution: National Engineering Research Center for E-Learning, Hubei Wuhan, China Data Set Information: dataset are derived…
65168 runs2 likes49 downloads51 reach217 impact
1500 instances - 10001 features - 50 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from flight software for earth orbiting satellite. Data comes from McCabe and Halstead features extractors of source code. These features…
115699 runs0 likes17 downloads17 reach28 impact
1458 instances - 38 features - 2 classes - 0 missing values
Jarkko Salojarvi, Kai Puolamaki, Jaana Simola, Lauri Kovanen, Ilpo Kojo, Samuel Kaski. Inferring Relevance from Eye Movements: Feature Extraction. Helsinki University of Technology, Publications in…
440 runs1 likes12 downloads13 reach16 impact
10936 instances - 28 features - 3 classes - 0 missing values
The MNIST database of handwritten digits with 784 features, raw data available at: http://yann.lecun.com/exdb/mnist/. It can be split in a training set of the first 60,000 examples, and a test set of…
13317 runs9 likes82 downloads91 reach38 impact
70000 instances - 785 features - 10 classes - 0 missing values
PRO FOOTBALL SCORES (raw data appears after the description below) How well do the oddsmakers of Las Vegas predict the outcome of professional football games? Is there really a home field advantage -…
16080 runs0 likes0 downloads0 reach0 impact
672 instances - 10 features - 2 classes - 1200 missing values
One of the datasets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff. It contains data on the DMFT Index (Decayed, Missing, and Filled Teeth) before and after different prevention…
27866 runs0 likes0 downloads0 reach0 impact
797 instances - 5 features - 6 classes - 0 missing values
analcatdata A collection of data sets used in the book "Analyzing Categorical Data," by Jeffrey S. Simonoff, Springer-Verlag, New York, 2003. The submission consists of a zip file containing two…
29026 runs0 likes8 downloads8 reach36 impact
841 instances - 71 features - 4 classes - 0 missing values
Data on educational transitions for a sample of 500 Irish schoolchildren aged 11 in 1967. The data were collected by Greaney and Kelleghan (1984), and reanalyzed by Raftery and Hout (1985, 1993). ###…
16174 runs0 likes0 downloads0 reach0 impact
500 instances - 6 features - 2 classes - 32 missing values
The objective was to determine which seedlots in a species are best for soil conservation in seasonally dry hill country. Determination is found by measurement of height, diameter by height, survival,…
27620 runs0 likes12 downloads12 reach11 impact
736 instances - 20 features - 5 classes - 448 missing values
No data.
2198 runs1 likes17 downloads18 reach10 impact
1484 instances - 9 features - 10 classes - 0 missing values
1. Title: Contraceptive Method Choice 2. Sources: (a) Origin: This dataset is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey (b) Creator: Tjen-Sien Lim (limt@stat.wisc.edu)…
24352 runs0 likes21 downloads21 reach12 impact
1473 instances - 10 features - 3 classes - 0 missing values