OpenML

dis (1)

dis-pmlb

31 runs0 likes0 downloads0 reach0 impact
3772 instances - 30 features - 2 classes - 0 missing values

parity5-pmlb

32 runs0 likes0 downloads0 reach0 impact
32 instances - 6 features - 2 classes - 0 missing values

Twenty-two observations of the dwarf planet Ceres, made by Giuseppe Piazzi and published in the September 1801 edition of Monatlicher Correspondenz. These were the measurements used by Gauss…

0 runs0 likes0 downloads0 reach0 impact
22 instances - 9 features - classes - 17 missing values

olympic-marathon-men (1)

Gold-medal-winning pace, in minutes per kilometer, for the men's Olympic marathon from the first race in 1896 through 2016.

0 runs0 likes0 downloads0 reach0 impact
28 instances - 2 features - classes - 0 missing values

delays_zurich_transport (1)

Zurich public transport delay data 2016-10-30 03:30:00 CET - 2016-11-27 01:20:00 CET cleaned and prepared at Open Data Day 2017.

0 runs0 likes0 downloads0 reach0 impact
5465575 instances - 15 features - 0 classes - 132617 missing values

Honey_bee_Seasonal_mortality (1)

Data from https://doi.org/10.5281/zenodo.269636

0 runs0 likes0 downloads0 reach0 impact
4758 instances - 39 features - classes - 0 missing values

pathogen_survey_dataset (3)

#study_1

0 runs0 likes0 downloads0 reach0 impact
944 instances - 17 features - classes - 0 missing values

Satellite (1)

The satellite dataset comprises features extracted from satellite observations. In particular, each image was taken under four different light wavelengths, two in visible light (green and red) and…

2078 runs3 likes70 downloads73 reach34 impact
5100 instances - 37 features - 2 classes - 0 missing values

Speech (1)

"The speech dataset was also provided by (see citation request) and contains real world data from recorded English language. The normal class contains data from persons having an American accent…

1599 runs0 likes0 downloads0 reach0 impact
3686 instances - 401 features - 2 classes - 0 missing values

HappinessRank_2015 (1)

The happiness scores and rankings use data from the Gallup World Poll. The scores are based on answers to the main life evaluation question asked in the poll. This question, known as the Cantril…

2 runs0 likes0 downloads0 reach0 impact
158 instances - 12 features - 0 classes - 0 missing values

Climate (1)

This file holds global land temperatures by country

0 runs0 likes0 downloads0 reach0 impact
577462 instances - 4 features - classes - 64563 missing values

Climate (2)

Holds information on average temperature per country.

0 runs0 likes0 downloads0 reach0 impact
577462 instances - 4 features - classes - 64563 missing values

Run_or_walk_information (1)

This dataset was gathered to detect whether a person is running or walking based on deep neural networks and sensor data collected from iOS devices. The dataset represents 88588 sensor data samples…

1 runs0 likes4 downloads4 reach14 impact
88588 instances - 7 features - 2 classes - 0 missing values

Devnagari-Script (1)

In human civilisation, languages evolved first, and then came scripts. The Devanagari script is one of the oldest scripts of India, having evolved from the ancient Brahmi script. It came to be adopted…

43 runs2 likes8 downloads10 reach15 impact
92000 instances - 1025 features - 46 classes - 0 missing values

CIFAR_10_small (1)

This is a 20,000 instance sample of the original CIFAR-10 dataset. Sampled randomly and stratified, with 2000 examples per class. Training and test set are merged. Find the corresponding task for the…

380 runs0 likes0 downloads0 reach0 impact
20000 instances - 3073 features - 10 classes - 0 missing values

CIFAR_10 (1)

0. airplane 1. automobile 2. bird 3. cat 4. deer 5. dog 6. frog 7. horse 8. ship 9. truck CIFAR-10 contains 6000 images per class. The original train-test split randomly divided these into 5000 train…

160 runs0 likes6 downloads6 reach21 impact
60000 instances - 3073 features - 10 classes - 0 missing values

Titanic (1)

The original Titanic dataset, describing the survival status of individual passengers on the Titanic. The Titanic data does not contain information on the crew, but it does contain actual ages of…

0 runs3 likes45 downloads48 reach12 impact
1309 instances - 14 features - 2 classes - 3855 missing values

MiceProtein (4)

Expression levels of 77 proteins measured in the cerebral cortex of 8 classes of control and Down syndrome mice exposed to context fear conditioning, a task used to assess associative learning. The…

9545 runs0 likes0 downloads0 reach21 impact
1080 instances - 82 features - 8 classes - 1396 missing values

feedback (1)

feedback

0 runs0 likes0 downloads0 reach0 impact
38932 instances - 3 features - classes - 0 missing values

feedback_1 (1)

feedback_1

0 runs0 likes0 downloads0 reach0 impact
38932 instances - 3 features - classes - 0 missing values

collins (4)

Data used in an analysis of the Brown and Frown corpora for my doctoral dissertation titled ``Variations in Written English: Characterizing Authors' Rhetorical Language Choices Across Corpora of…

2048 runs0 likes1 downloads1 reach12 impact
1000 instances - 24 features - 30 classes - 0 missing values

car (3)

This database was derived from a simple hierarchical decision model originally developed for the demonstration of DEX (M. Bohanec, V. Rajkovic: Expert system for decision making. Sistemica 1(1), pp.…

7180 runs0 likes11 downloads11 reach25 impact
1728 instances - 7 features - 4 classes - 0 missing values

Bike (1)

Author: Gregory Gay, Tim Menzies, Misty Davies, Karen Gundy-Burlet Source: [Zenodo](https://zenodo.org/record/322475) Please cite: Misty Davies. (2009). bike [Data set]. Zenodo. DOI:…

0 runs0 likes0 downloads0 reach0 impact
4435 instances - 11 features - classes - 0 missing values

Internet-Advertisements (2)

### Description __Changes to version 1:__ all categorical features transformed as such. This dataset represents a set of possible advertisements on Internet pages. ### Sources (a) Creator and donor:…

1432 runs0 likes5 downloads5 reach24 impact
3279 instances - 1559 features - 2 classes - 0 missing values

mfeat-pixel (3)

One of a set of 6 datasets describing features of handwritten numerals (0 - 9) extracted from a collection of Dutch utility maps. The maps were scanned in 8 bit grey value at density of 400dpi,…

11351 runs0 likes0 downloads0 reach0 impact
2000 instances - 241 features - 10 classes - 0 missing values

Australian (4)

This dataset was retrieved 2014-11-14 from the UCI site and converted to the ARFF format. __Major changes w.r.t. version 3: dataset from UCI that matches description and data types__ ### Feature…

4207 runs1 likes10 downloads11 reach15 impact
690 instances - 15 features - 2 classes - 0 missing values

steel-plates-fault (3)

__Changes w.r.t. version 1: included one target factor with 7 levels as target variable for the classification. Also deleted the previous 7 binary target variables.__ A dataset of steel plates'…

9051 runs1 likes3 downloads4 reach16 impact
1941 instances - 28 features - 7 classes - 0 missing values

wilt (2)

__Changes w.r.t. version 1: renamed variables such that they match description.__ ### Dataset: Wilt Data Set ### Abstract: High-resolution Remote Sensing data set (Quickbird). Small number of training…

10966 runs0 likes2 downloads2 reach22 impact
4839 instances - 6 features - 2 classes - 0 missing values

segment (3)

The instances were drawn randomly from a database of 7 outdoor images. The images were hand-segmented to create a classification for every pixel. Each instance is a 3x3 region. __Major changes w.r.t.…

9973 runs0 likes8 downloads8 reach26 impact
2310 instances - 20 features - 7 classes - 0 missing values

tamilnadu-electricity (3)

__Major changes w.r.t. version 2: ignored variable 3 in this upload as this seems to be a perfect predictor.__ Tamilnadu Electricity Board Hourly Readings dataset. Real-time readings were collected…

0 runs0 likes2 downloads2 reach19 impact
45781 instances - 4 features - 20 classes - 0 missing values

sylva_agnostic (2)

__Major changes w.r.t. version 1: changed binary features to data type factor.__ Dataset from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch), which consisted of…

0 runs0 likes0 downloads0 reach0 impact
14395 instances - 217 features - classes - 0 missing values

ada_agnostic (2)

__Major change w.r.t. version 1: updated data type of binary variables to factor type.__ Dataset from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch), which…

0 runs0 likes0 downloads0 reach0 impact
4562 instances - 49 features - classes - 0 missing values

climate-model-simulation-crashes (4)

__Major changes w.r.t. version 1: deactivated first two variables as they describe the batch of the experiments and should not be used for prediction. Also transformed the target from numeric to…

8809 runs0 likes4 downloads4 reach13 impact
540 instances - 21 features - 2 classes - 0 missing values

Fashion-MNIST (1)

Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a…

455 runs0 likes12 downloads12 reach27 impact
70000 instances - 785 features - 10 classes - 0 missing values

jungle_chess_2pcs_endgame_panther_lion (1)

### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…

12 runs0 likes0 downloads0 reach0 impact
4704 instances - 47 features - 3 classes - 0 missing values

jungle_chess_2pcs_endgame_panther_lion (2)

### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…

11 runs0 likes0 downloads0 reach0 impact
4704 instances - 47 features - 3 classes - 0 missing values

jungle_chess_2pcs_endgame_elephant_elephant (1)

### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…

12 runs0 likes0 downloads0 reach0 impact
2351 instances - 47 features - 2 classes - 0 missing values

jungle_chess_2pcs_endgame_panther_elephant (1)

### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…

15 runs0 likes0 downloads0 reach0 impact
4704 instances - 47 features - 3 classes - 0 missing values

jungle_chess_2pcs_endgame_complete (1)

### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…

11 runs0 likes0 downloads0 reach0 impact
44819 instances - 47 features - 3 classes - 10584 missing values

jungle_chess_2pcs_endgame_rat_panther (1)

### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…

11 runs0 likes0 downloads0 reach0 impact
5880 instances - 47 features - 3 classes - 3528 missing values

jungle_chess_2pcs_endgame_rat_elephant (1)

### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…

11 runs0 likes0 downloads0 reach0 impact
5880 instances - 47 features - 3 classes - 3528 missing values

jungle_chess_2pcs_endgame_lion_elephant (1)

### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…

11 runs0 likes0 downloads0 reach0 impact
4704 instances - 47 features - 3 classes - 0 missing values

jungle_chess_2pcs_endgame_rat_rat (1)

### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…

10 runs0 likes0 downloads0 reach0 impact
3660 instances - 47 features - 2 classes - 0 missing values

jungle_chess_2pcs_endgame_rat_lion (1)

### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…

10 runs0 likes0 downloads0 reach0 impact
5880 instances - 47 features - 3 classes - 3528 missing values

jungle_chess_2pcs_endgame_lion_lion (1)

### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…

10 runs0 likes0 downloads0 reach0 impact
2352 instances - 47 features - 2 classes - 0 missing values

Moneyball (2)

In the early 2000s, Billy Beane and Paul DePodesta worked for the Oakland Athletics. While there, they literally changed the game of baseball. They didn't do it using a bat or glove, and they…

3 runs0 likes8 downloads8 reach14 impact
1232 instances - 15 features - 0 classes - 3600 missing values

Short_Track_Speed_Skating (2)

The database covers all the international short track games in the last 5 years. Currently it contains only men's 500m. Detailed lap data including personal time and ranking in each game from seasons…

0 runs0 likes0 downloads0 reach0 impact

gisette (2)

GISETTE is a handwritten digit recognition problem. The problem is to separate the highly confusable digits '4' and '9'. This dataset is one of five datasets of the NIPS 2003 feature selection…

466 runs0 likes0 downloads0 reach0 impact
7000 instances - 5001 features - 2 classes - 0 missing values

jungle_chess_2pcs_raw_endgame_complete (1)

### Description ### This dataset is part of a collection of datasets based on the game "Jungle Chess" (a.k.a. Dou Shou Qi). For a description of the rules, please refer to the paper (link attached). The…

6905 runs0 likes6 downloads6 reach19 impact
44819 instances - 7 features - 3 classes - 0 missing values

EMNIST_Balanced (1)

EMNIST Balanced https://www.nist.gov/itl/iad/image-group/emnist-dataset

73 runs0 likes0 downloads0 reach0 impact
131600 instances - 785 features - 47 classes - 0 missing values

mnist_rotation (1)

Rotated MNIST digits, from http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/MnistVariations

0 runs0 likes0 downloads0 reach0 impact
62000 instances - 785 features - 0 classes - 0 missing values

SVHN (1)

SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirement on data preprocessing and formatting. It can be seen as similar in flavor…

52 runs0 likes0 downloads0 reach0 impact
99289 instances - 3073 features - 10 classes - 0 missing values

USPS (2)

The dataset and this description are made available at http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html. Normalized handwritten digits, automatically scanned from envelopes by the U.S. Postal…

57 runs0 likes1 downloads1 reach11 impact
9298 instances - 257 features - 10 classes - 0 missing values

Olivetti_Faces (1)

This dataset contains a set of face images taken between April 1992 and April 1994 at AT&T Laboratories Cambridge. As described on the original website: There are ten different images of each of 40…

53 runs0 likes0 downloads0 reach0 impact
400 instances - 4097 features - 40 classes - 0 missing values

UMIST_Faces_Cropped (1)

The Sheffield (previously UMIST) Face Database consists of 564 images of 20 individuals (mixed race/gender/appearance). Each individual is shown in a range of poses from profile to frontal views -…

53 runs0 likes1 downloads1 reach16 impact
575 instances - 10305 features - 20 classes - 0 missing values

spellman_yeast (1)

Two-colour spotted cDNA array dataset from a series of experiments to identify which genes in yeast are cell-cycle regulated.

0 runs0 likes0 downloads0 reach0 impact
6178 instances - 82 features - classes - 59017 missing values

STL-10 (1)

CIFAR-10 dataset but with some modifications. In particular, each class has fewer labeled training examples than in CIFAR-10, but a very large set of unlabeled examples is provided to learn image…

40 runs0 likes0 downloads0 reach0 impact
13000 instances - 27649 features - 10 classes - 0 missing values

APSFailure (1)

This is the dataset used for the 2016 IDA Industrial Challenge, courtesy of Scania. For a full description, see http://archive.ics.uci.edu/ml/datasets/IDA2016Challenge . This dataset contains both the…

9 runs0 likes2 downloads2 reach19 impact
76000 instances - 171 features - 2 classes - 1078695 missing values

christine (1)

SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…

8 runs1 likes2 downloads3 reach20 impact
5418 instances - 1637 features - 2 classes - 0 missing values

jasmine (1)

SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…

8 runs0 likes2 downloads2 reach19 impact
2984 instances - 145 features - 2 classes - 0 missing values

madeline (1)

The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…

1 runs0 likes0 downloads0 reach18 impact
3140 instances - 260 features - 2 classes - 0 missing values

philippine (1)

The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…

1 runs1 likes3 downloads4 reach18 impact
5832 instances - 309 features - 2 classes - 0 missing values

sylvine (1)

SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…

8 runs0 likes2 downloads2 reach20 impact
5124 instances - 21 features - 2 classes - 0 missing values

albert (1)

The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…

0 runs0 likes1 downloads1 reach18 impact
425240 instances - 79 features - 2 classes - 2734000 missing values

MiniBooNE (1)

Byron Roe (byronroe '@' umich.edu) Department of Physics University of Michigan Ann Arbor, MI 48109 This dataset is taken from the MiniBooNE experiment and is used to distinguish electron neutrinos…

12 runs0 likes4 downloads4 reach14 impact
130064 instances - 51 features - 2 classes - 0 missing values

ada (1)

The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…

1 runs0 likes1 downloads1 reach18 impact
4147 instances - 49 features - 2 classes - 0 missing values

arcene (2)

The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…

0 runs0 likes1 downloads1 reach17 impact
100 instances - 10001 features - 2 classes - 0 missing values

gina (1)

The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…

0 runs0 likes0 downloads0 reach19 impact
3153 instances - 971 features - 2 classes - 0 missing values

guillermo (1)

The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…

13 runs1 likes1 downloads2 reach21 impact
20000 instances - 4297 features - 2 classes - 0 missing values

rl (1)

The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…

1 runs0 likes0 downloads0 reach15 impact
31406 instances - 23 features - 2 classes - 29756 missing values

riccardo (1)

The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…

14 runs0 likes1 downloads1 reach21 impact
20000 instances - 4297 features - 2 classes - 0 missing values

kick (1)

One of the biggest challenges for an auto dealership purchasing a used car at an auto auction is the risk that the vehicle might have serious issues that prevent it from being sold to customers. The…

3 runs0 likes3 downloads3 reach14 impact
72983 instances - 33 features - 2 classes - 149271 missing values

dilbert (1)

SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…

8 runs0 likes2 downloads2 reach19 impact
10000 instances - 2001 features - 5 classes - 0 missing values

fabert (1)

SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…

7 runs0 likes1 downloads1 reach19 impact
8237 instances - 801 features - 7 classes - 0 missing values

robert (1)

The goal of this challenge is to expose the research community to real world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data might differ. The…

15 runs0 likes1 downloads1 reach21 impact
10000 instances - 7201 features - 10 classes - 0 missing values

volkert (1)

SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…

15 runs0 likes1 downloads1 reach20 impact
58310 instances - 181 features - 10 classes - 0 missing values

dionis (1)

SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…

0 runs0 likes2 downloads2 reach18 impact
416188 instances - 61 features - 355 classes - 0 missing values

jannis (1)

SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data) This is a "supervised learning"…

14 runs0 likes1 downloads1 reach20 impact
83733 instances - 55 features - 4 classes - 0 missing values

helena (1)

This is a "supervised learning" challenge in machine learning. We are making available 30 datasets, all pre-formatted in given feature representations (this means that each example consists of a fixed…

12 runs0 likes3 downloads3 reach20 impact
65196 instances - 28 features - 100 classes - 0 missing values

TUPRASBoilerData (1)

Sensor measurements from one boiler, covering WaterInput/SteamOutput (flow, temperature, pressure) for one month, recorded every minute.

0 runs0 likes0 downloads0 reach0 impact
44643 instances - 8 features - classes - 44643 missing values

mauna-loa-atmospheric-co2 (1)

These weekly averages are ultimately based on measurements of 4 air samples per hour taken atop intake lines on several towers during steady periods of CO2 concentration of not less than 6 hours per…

0 runs0 likes0 downloads0 reach0 impact
2225 instances - 7 features - 0 classes - 0 missing values

cwurData (1)

Of all the universities in the world, which are the best? Ranking universities is a difficult, political, and controversial practice. There are hundreds of different national and international…

0 runs0 likes0 downloads0 reach0 impact
1029 instances - 14 features - classes - 200 missing values

ozone (1)

Los Angeles ozone pollution data, 1976

0 runs0 likes0 downloads0 reach0 impact

Students (1)

Students

0 runs0 likes0 downloads0 reach0 impact
5820 instances - 33 features - classes - 0 missing values

freMTPL2freq (1)

The dataset freMTPL2freq contains risk features for 677,991 motor third-party liability policies (observed mostly over one year). See https://github.com/dutangc/CASdatasets for more details. The dataset…

0 runs1 likes3 downloads4 reach9 impact
678013 instances - 12 features - classes - 0 missing values

freMTPL2sev (1)

The dataset freMTPL2sev contains claim amounts for 26,639 motor third-party liability policies.

0 runs0 likes0 downloads0 reach0 impact
26639 instances - 2 features - classes - 0 missing values

Klaverjas2018 (1)

Klaverjas is an example of the Jack-Nine card games, which are characterized as trick-taking games where the Jack and nine of the trump suit are the highest-ranking trumps, and the tens and aces…

0 runs0 likes0 downloads0 reach0 impact
981541 instances - 33 features - 2 classes - 0 missing values

Titanic (4)

The goal is to predict the Fare. Variable description: pclass: A proxy for socio-economic status (SES) 1st = Upper 2nd = Middle 3rd = Lower age: Age is fractional if less than 1. If the age is…

0 runs0 likes4 downloads4 reach11 impact
1307 instances - 8 features - 0 classes - 0 missing values

wine-reviews (1)

130k wine reviews with variety, location, winery, price, and description. Downloaded from Kaggle [https://www.kaggle.com/zynicide/wine-reviews/home] on 29.10.2018. The original data was scraped from…

0 runs0 likes1 downloads1 reach9 impact
129971 instances - 13 features - 0 classes - 204752 missing values

DiabeticMellitus (1)

This data was collected from combined primary and secondary sources, through questionnaires, verbal interviews, and part of the hospital’s record department’s data, from the selected…

0 runs0 likes0 downloads0 reach0 impact
281 instances - 98 features - 2 classes - 2 missing values

Click_prediction_small (8)

This is the same data as version 5 (OpenML ID = 1220) with '_id' features coded as nominal factor variables.

0 runs0 likes0 downloads0 reach0 impact
39948 instances - 12 features - 2 classes - 0 missing values

okcupid-stem (2)

User profile data for San Francisco OkCupid users published in [Kim, A. Y., & Escobedo-Land, A. (2015). OKCupid data for introductory statistics and data science courses. Journal of Statistics…

0 runs0 likes0 downloads0 reach0 impact
50789 instances - 20 features - 3 classes - 154107 missing values

sarcasm_detection (1)

It has 3 attributes (ID, tweet, label) and 91299 tweets: 39998 non-sarcastic tweets and 51300 sarcastic tweets.

0 runs0 likes0 downloads0 reach0 impact
91298 instances - 2 features - 0 classes - 0 missing values

birds (4)

Multi-label dataset. The birds dataset consists of 327 audio recordings of 12 different vocalizing bird species. Each sound can be assigned to various bird species.

0 runs0 likes0 downloads0 reach0 impact
645 instances - 279 features - classes - 0 missing values

emotions (4)

Multi-label dataset. Audio dataset (emotions) consists of 593 musical files with 6 clustered emotional labels and 72 predictors. Each song can be labeled with one or more of the labels…

0 runs0 likes0 downloads0 reach0 impact
593 instances - 78 features - classes - 0 missing values

enron (3)

Multi-label dataset. The UC Berkeley enron4 dataset represents a subset of the original enron5 dataset and consists of 1684 cases of emails with 21 labels and 1001 predictor variables.

0 runs0 likes0 downloads0 reach0 impact
1702 instances - 1054 features - classes - 0 missing values

genbase (3)

Multi-label dataset. The genbase dataset contains protein sequences that can be assigned to several classes of protein families.

0 runs0 likes0 downloads0 reach0 impact
662 instances - 1212 features - classes - 0 missing values

image (3)

Multi-label dataset. The image benchmark dataset consists of 2000 natural scene images. Zhou and Zhang (2007) extracted 135 features for each image and made it publicly available as processed image…

0 runs0 likes0 downloads0 reach0 impact
2000 instances - 140 features - classes - 0 missing values

langLog (2)

The langLog dataset includes 1004 textual predictors and was originally compiled in the doctoral thesis of Read (2010). It consists of 956 text samples that can be assigned to one or more topics such…

0 runs0 likes0 downloads0 reach0 impact
1460 instances - 1079 features - classes - 0 missing values

reuters (3)

Multi-label dataset. A subset of the reuters dataset includes 2000 observations for text classification.

0 runs0 likes0 downloads0 reach0 impact
2000 instances - 250 features - classes - 0 missing values
