OpenML
AutoML Benchmark All Classification

Created 19-11-2020 by Pieter Gijsbers. Visibility: public
This is the full version of the KDD Cup 2009 dataset. Customer Relationship Management (CRM) is a key element of modern marketing strategies. The KDD Cup 2009 offers the opportunity to work on large…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
50000 instances - 14892 features - 2 classes - 19658569 missing values
This is a smaller version of the original dataset, containing 1M rows. ### Attribute Information * The first column is the class label (1 for signal, 0 for background) * 21 low-level features…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
1000000 instances - 29 features - 2 classes - 0 missing values
INTRUSION DETECTOR LEARNING Software to detect network intrusions protects a computer network from unauthorized users, including perhaps insiders. The intrusion detector learning task is to build a…
0 runs - 1 likes - 0 downloads - 1 reach - 3 impact
4898431 instances - 42 features - 23 classes - 0 missing values
Training dataset of the 'Porto Seguro's Safe Driver Prediction' Kaggle challenge [https://www.kaggle.com/c/porto-seguro-safe-driver-prediction]. The goal was to predict whether a driver will file an…
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
595212 instances - 58 features - 2 classes - 846458 missing values
User profile data for San Francisco OkCupid users published in [Kim, A. Y., & Escobedo-Land, A. (2015). OKCupid data for introductory statistics and data science courses. Journal of Statistics…
0 runs - 0 likes - 0 downloads - 0 reach - 3 impact
50789 instances - 20 features - 3 classes - 154107 missing values
This is the same data as version 5 (OpenML ID = 1220) with '_id' features coded as nominal factor variables.
0 runs - 0 likes - 0 downloads - 0 reach - 2 impact
39948 instances - 12 features - 2 classes - 0 missing values
Incident reports from the San Francisco Police Department between January 2003 and May 2018, provided by the City and County of San Francisco. The dataset was downloaded on 05.11.2018 from…
0 runs - 1 likes - 0 downloads - 1 reach - 1 impact
2215023 instances - 9 features - 2 classes - 0 missing values
This is a "supervised learning" challenge in machine learning. We are making available 30 datasets, all pre-formatted in given feature representations (this means that each example consists of a fixed…
12 runs - 0 likes - 3 downloads - 3 reach - 20 impact
65196 instances - 28 features - 100 classes - 0 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
14 runs - 0 likes - 1 downloads - 1 reach - 20 impact
83733 instances - 55 features - 4 classes - 0 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
0 runs - 0 likes - 2 downloads - 2 reach - 18 impact
416188 instances - 61 features - 355 classes - 0 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
15 runs - 0 likes - 1 downloads - 1 reach - 21 impact
58310 instances - 181 features - 10 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
15 runs - 0 likes - 1 downloads - 1 reach - 21 impact
10000 instances - 7201 features - 10 classes - 0 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
7 runs - 0 likes - 1 downloads - 1 reach - 19 impact
8237 instances - 801 features - 7 classes - 0 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
8 runs - 0 likes - 2 downloads - 2 reach - 19 impact
10000 instances - 2001 features - 5 classes - 0 missing values
One of the biggest challenges of an auto dealership purchasing a used car at an auto auction is the risk that the vehicle might have serious issues that prevent it from being sold to customers. The…
3 runs - 0 likes - 3 downloads - 3 reach - 14 impact
72983 instances - 33 features - 2 classes - 149271 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
14 runs - 0 likes - 1 downloads - 1 reach - 21 impact
20000 instances - 4297 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
13 runs - 1 likes - 1 downloads - 2 reach - 21 impact
20000 instances - 4297 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs - 0 likes - 0 downloads - 0 reach - 19 impact
3153 instances - 971 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs - 0 likes - 1 downloads - 1 reach - 17 impact
100 instances - 10001 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
1 runs - 0 likes - 1 downloads - 1 reach - 18 impact
4147 instances - 49 features - 2 classes - 0 missing values
Byron Roe (byronroe '@' umich.edu) Department of Physics University of Michigan Ann Arbor, MI 48109 This dataset is taken from the MiniBooNE experiment and is used to distinguish electron neutrinos…
12 runs - 0 likes - 4 downloads - 4 reach - 14 impact
130064 instances - 51 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
0 runs - 0 likes - 1 downloads - 1 reach - 18 impact
425240 instances - 79 features - 2 classes - 2734000 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
8 runs - 0 likes - 2 downloads - 2 reach - 20 impact
5124 instances - 21 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
1 runs - 1 likes - 3 downloads - 4 reach - 18 impact
5832 instances - 309 features - 2 classes - 0 missing values
The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…
1 runs - 0 likes - 0 downloads - 0 reach - 18 impact
3140 instances - 260 features - 2 classes - 0 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
8 runs - 0 likes - 2 downloads - 2 reach - 19 impact
2984 instances - 145 features - 2 classes - 0 missing values
SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…
8 runs - 1 likes - 2 downloads - 3 reach - 20 impact
5418 instances - 1637 features - 2 classes - 0 missing values
This is the dataset used for the 2016 IDA Industrial Challenge, courtesy of Scania. For a full description, see http://archive.ics.uci.edu/ml/datasets/IDA2016Challenge . This dataset contains both the…
9 runs - 0 likes - 2 downloads - 2 reach - 19 impact
76000 instances - 171 features - 2 classes - 1078695 missing values
### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…
6905 runs - 0 likes - 6 downloads - 6 reach - 20 impact
44819 instances - 7 features - 3 classes - 0 missing values
Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a…
455 runs - 0 likes - 12 downloads - 12 reach - 27 impact
70000 instances - 785 features - 10 classes - 0 missing values
The instances were drawn randomly from a database of 7 outdoor images. The images were hand-segmented to create a classification for every pixel. Each instance is a 3x3 region. __Major changes w.r.t.…
9973 runs - 0 likes - 8 downloads - 8 reach - 27 impact
2310 instances - 20 features - 7 classes - 0 missing values
__Changes w.r.t. version 1: renamed variables such that they match description.__ ### Dataset: Wilt Data Set ### Abstract: High-resolution Remote Sensing data set (Quickbird). Small number of training…
10966 runs - 0 likes - 2 downloads - 2 reach - 22 impact
4839 instances - 6 features - 2 classes - 0 missing values
__Changes w.r.t. version 1: included one target factor with 7 levels as target variable for the classification. Also deleted the previous 7 binary target variables.__ A dataset of steel plates'…
9051 runs - 1 likes - 3 downloads - 4 reach - 16 impact
1941 instances - 28 features - 7 classes - 0 missing values
This dataset was retrieved 2014-11-14 from the UCI site and converted to the ARFF format. __Major changes w.r.t. version 3: dataset from UCI that matches description and data types__ ### Feature…
4207 runs - 1 likes - 10 downloads - 11 reach - 16 impact
690 instances - 15 features - 2 classes - 0 missing values
### Description __Changes to version 1:__ all categorical features transformed as such. This dataset represents a set of possible advertisements on Internet pages. ### Sources (a) Creator and donor:…
1432 runs - 0 likes - 5 downloads - 5 reach - 25 impact
3279 instances - 1559 features - 2 classes - 0 missing values
This database was derived from a simple hierarchical decision model originally developed for the demonstration of DEX (M. Bohanec, V. Rajkovic: Expert system for decision making. Sistemica 1(1), pp.…
7180 runs - 0 likes - 11 downloads - 11 reach - 26 impact
1728 instances - 7 features - 4 classes - 0 missing values
The satellite dataset comprises features extracted from satellite observations. In particular, each image was taken under four different light wavelengths, two in visible light (green and red) and…
2078 runs - 3 likes - 70 downloads - 73 reach - 34 impact
5100 instances - 37 features - 2 classes - 0 missing values
A dataset relating characteristics of telephony account features and usage and whether or not the customer churned. Originally used in [Discovering Knowledge in Data: An Introduction to Data…
7513 runs - 2 likes - 9 downloads - 11 reach - 27 impact
5000 instances - 21 features - 2 classes - 0 missing values
Source: [UCI](https://archive.ics.uci.edu/ml/datasets/Statlog+(Shuttle)) Donor: Jason Catlett Basser Department of Computer Science, University of Sydney, N.S.W., Australia Data Set Information:…
12 runs - 0 likes - 4 downloads - 4 reach - 26 impact
58000 instances - 10 features - 7 classes - 0 missing values
Originally from the StatLog project. The raw data is still available on [UCI](https://archive.ics.uci.edu/ml/datasets/Molecular+Biology+(Splice-junction+Gene+Sequences)). The data consists of 3,186…
7063 runs - 0 likes - 9 downloads - 9 reach - 27 impact
3186 instances - 181 features - 3 classes - 0 missing values
This database contains all legal 8-ply positions in the game of connect-4 in which neither player has won yet, and in which the next move is not forced. Attributes represent board positions on a 6x6…
9766 runs - 0 likes - 12 downloads - 12 reach - 29 impact
67557 instances - 43 features - 3 classes - 0 missing values
Citation Request: This dataset is publicly available for research. The details are described in [Cortez et al., 2009]. Please include this citation if you plan to use this database: P. Cortez, A.…
64 runs - 2 likes - 6 downloads - 8 reach - 17 impact
4898 instances - 12 features - 7 classes - 0 missing values
The data is cleaned, regularized and encrypted global equity data. The first 21 columns (feature1 - feature21) are features, and target is the binary class you’re trying to predict.
3039 runs - 2 likes - 5 downloads - 7 reach - 17 impact
96320 instances - 22 features - 2 classes - 0 missing values
This data has been prepared to analyze factors related to readmission as well as other outcomes pertaining to patients with diabetes. The data are submitted on behalf of the Center for Clinical and…
0 runs - 2 likes - 18 downloads - 20 reach - 17 impact
101766 instances - 50 features - 3 classes - 0 missing values
Creators: Renata Cristina Barros Madeo (Madeo, R. C. B.) Priscilla Koch Wagner (Wagner, P. K.) Sarajane Marques Peres (Peres, S. M.) {renata.si, priscilla.wagner, sarajane} at usp.br…
26636 runs - 1 likes - 18 downloads - 19 reach - 40 impact
9873 instances - 33 features - 5 classes - 0 missing values
Source: Rami Mustafa A Mohammad (University of Huddersfield, rami.mohammad '@' hud.ac.uk, rami.mustafa.a '@' gmail.com) Lee McCluskey (University of Huddersfield, t.l.mccluskey '@' hud.ac.uk) Fadi…
51677 runs - 1 likes - 29 downloads - 30 reach - 30 impact
11055 instances - 31 features - 2 classes - 0 missing values
### Description The data consists of real historical data collected from 2010 & 2011. Employees are manually allowed or denied access to resources over time. The data is used to create an algorithm…
35721 runs - 0 likes - 23 downloads - 23 reach - 32 impact
32769 instances - 10 features - 2 classes - 0 missing values
Predict a biological response of molecules from their chemical properties. Each row in this data set represents a molecule. The first column contains experimental data describing an actual biological…
48680 runs - 2 likes - 40 downloads - 42 reach - 37 impact
3751 instances - 1777 features - 2 classes - 0 missing values
This is the original version of the famous covertype dataset in ARFF format. Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a…
9 runs - 1 likes - 14 downloads - 15 reach - 25 impact
581012 instances - 55 features - 7 classes - 0 missing values
Prediction task is to determine whether a person makes over 50K a year. Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the…
14637 runs - 4 likes - 34 downloads - 38 reach - 40 impact
48842 instances - 15 features - 2 classes - 6465 missing values
### Description MicroMass (pure spectra version) is a dataset to explore machine learning approaches for the identification of microorganisms from mass-spectrometry data. ### Source ``` Pierre Mahé,…
39941 runs - 1 likes - 17 downloads - 18 reach - 100 impact
571 instances - 1301 features - 20 classes - 0 missing values
QSAR biodegradation Data Set * Abstract: Data set containing values for 41 attributes (molecular descriptors) used to classify 1055 chemicals into 2 classes (ready and not ready biodegradable). *…
267861 runs - 1 likes - 25 downloads - 26 reach - 30 impact
1055 instances - 42 features - 2 classes - 0 missing values
The aim of this dataset is to distinguish between nasal (class 0) and oral sounds (class 1). Five different attributes were chosen to characterize each vowel: they are the amplitudes of the five first…
218957 runs - 6 likes - 41 downloads - 47 reach - 32 impact
5404 instances - 6 features - 2 classes - 0 missing values
Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond, Knowledge and Information Systems, Vol. 14, No. 3, 2008. 1. Abstract: Two ground ozone level data sets are included in…
188264 runs - 1 likes - 20 downloads - 21 reach - 31 impact
2534 instances - 73 features - 2 classes - 0 missing values
1. Data set title: Nomao Data Set 2. Abstract: Nomao collects data about places (name, phone, localization...) from many sources. Deduplication consists in detecting what data refer to the same place.…
67704 runs - 0 likes - 16 downloads - 16 reach - 30 impact
34465 instances - 119 features - 2 classes - 0 missing values
Source: James P Bridge, Sean B Holden and Lawrence C Paulson University of Cambridge Computer Laboratory William Gates Building 15 JJ Thomson Avenue Cambridge CB3 0FD UK +44 (0)1223 763500…
26642 runs - 1 likes - 21 downloads - 22 reach - 45 impact
6118 instances - 52 features - 6 classes - 0 missing values
### Description This is a data set containing 1080 documents of free text business descriptions of Brazilian companies categorized into a subset of 9 categories. ### Source ``` Patrick Marques…
34579 runs - 0 likes - 17 downloads - 17 reach - 59 impact
1080 instances - 857 features - 9 classes - 0 missing values
Data taken from the Blood Transfusion Service Center in Hsin-Chu City in Taiwan -- this is a classification problem. To demonstrate the RFMTC marketing model (a modified version of RFM), this study…
468690 runs - 6 likes - 101 downloads - 107 reach - 46 impact
748 instances - 5 features - 2 classes - 0 missing values
The data is related to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was…
65709 runs - 3 likes - 41 downloads - 44 reach - 34 impact
45211 instances - 17 features - 2 classes - 0 missing values
Dataset creator and donor: Zhi Liu, e-mail: liuzhi8673 '@' gmail.com, institution: National Engineering Research Center for E-Learning, Hubei Wuhan, China Data Set Information: dataset are derived…
65168 runs - 2 likes - 49 downloads - 51 reach - 218 impact
1500 instances - 10001 features - 50 classes - 0 missing values
Airlines Dataset. Inspired by the regression dataset from Elena Ikonomovska. The task is to predict whether a given flight will be delayed, given the information of the scheduled departure.
291 runs - 0 likes - 31 downloads - 31 reach - 17 impact
539383 instances - 8 features - 2 classes - 0 missing values
Datasets from ACM KDD Cup (http://www.sigkdd.org/kddcup/index.php) KDD Cup 2009 http://www.kddcup-orange.com Converted to ARFF format by TunedIT Customer Relationship Management (CRM) is a key element…
223 runs - 0 likes - 18 downloads - 18 reach - 20 impact
50000 instances - 231 features - 2 classes - 8024152 missing values
One of the NASA Metrics Data Program defect data sets. Data from software for storage management for receiving and processing ground data. Data comes from McCabe and Halstead features extractors of…
161516 runs - 2 likes - 29 downloads - 31 reach - 30 impact
2109 instances - 22 features - 2 classes - 0 missing values
One of the NASA Metrics Data Program defect data sets. Data from flight software for earth orbiting satellite. Data comes from McCabe and Halstead features extractors of source code. These features…
115699 runs - 0 likes - 17 downloads - 17 reach - 28 impact
1458 instances - 38 features - 2 classes - 0 missing values
The objective was to determine which seedlots in a species are best for soil conservation in seasonally dry hill country. Determination is found by measurement of height, diameter by height, survival,…
27620 runs - 0 likes - 12 downloads - 12 reach - 11 impact
736 instances - 20 features - 5 classes - 448 missing values
No data.
2198 runs - 1 likes - 17 downloads - 18 reach - 10 impact
1484 instances - 9 features - 10 classes - 0 missing values
NAME vehicle silhouettes PURPOSE to classify a given silhouette as one of four types of vehicle, using a set of features extracted from the silhouette. The vehicle may be viewed from one of many…
31976 runs - 2 likes - 35 downloads - 37 reach - 14 impact
846 instances - 19 features - 4 classes - 0 missing values
This dataset classifies people described by a set of attributes as good or bad credit risks. This dataset comes with a cost matrix: ``` Good Bad (predicted) Good 0 1 (actual) Bad 5 0 ``` It is worse…
506311 runs - 28 likes - 312 downloads - 340 reach - 35 impact
1000 instances - 21 features - 2 classes - 0 missing values
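The cost matrix quoted in the credit-risk description above can be applied to a set of predictions as in the following minimal sketch. It assumes the rows of the matrix are the actual classes and the columns the predicted classes, as the "(actual)" and "(predicted)" labels suggest. The label strings, the `total_cost` helper, and the sample labels are hypothetical and are not part of the dataset or of any OpenML API.

```python
# Cost matrix from the dataset description, keyed by (actual, predicted).
# Assumption: rows are actual classes, columns are predicted classes.
COST = {
    ("good", "good"): 0,
    ("good", "bad"): 1,
    ("bad", "good"): 5,
    ("bad", "bad"): 0,
}


def total_cost(actual, predicted):
    """Sum the misclassification cost over paired actual/predicted labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(COST[(a, p)] for a, p in zip(actual, predicted))


# Hypothetical labels, for illustration only.
actual = ["good", "bad", "bad", "good"]
predicted = ["bad", "good", "bad", "good"]
example_cost = total_cost(actual, predicted)
```

With this matrix, one actual-bad case predicted as good costs as much as five actual-good cases predicted as bad, so plain accuracy can rank two classifiers differently from total cost.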
1. Title: Contraceptive Method Choice 2. Sources: (a) Origin: This dataset is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey (b) Creator: Tjen-Sien Lim (limt@stat.wisc.edu)…
24352 runs - 0 likes - 21 downloads - 21 reach - 12 impact
1473 instances - 10 features - 3 classes - 0 missing values
One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. Corresponding patterns in different datasets correspond to the same…
37793 runs - 0 likes - 18 downloads - 18 reach - 17 impact
2000 instances - 217 features - 10 classes - 0 missing values
Author: Alen Shapiro Source: [UCI](https://archive.ics.uci.edu/ml/datasets/Chess+(King-Rook+vs.+King-Pawn)) Please cite: [UCI citation policy](https://archive.ics.uci.edu/ml/citation_policy.html) 1.…
274238 runs - 1 likes - 44 downloads - 45 reach - 19 impact
3196 instances - 37 features - 2 classes - 0 missing values