OpenML
Joaquin Vanschoren
Search these datasets in more detail

Joaquin's datasets

test
0 runs0 likes0 downloads0 reach0 impact
578 instances - 4 features - classes - 0 missing values
test
0 runs0 likes0 downloads0 reach0 impact
189 instances - 17 features - classes - 0 missing values
Bike sharing systems are new generation of traditional bike rentals where whole process from membership, rental and return back has become automatic. Through these systems, user is able to easily rent…
0 runs0 likes0 downloads0 reach0 impact
17379 instances - 13 features - 0 classes - 0 missing values
Bike sharing systems are new generation of traditional bike rentals where whole process from membership, rental and return back has become automatic. Through these systems, user is able to easily rent…
0 runs0 likes0 downloads0 reach0 impact
17379 instances - 13 features - 0 classes - 0 missing values
BitcoinHeist Ransomware Dataset Akcora, C.G., Li, Y., Gel, Y.R. and Kantarcioglu, M., 2019. BitcoinHeist. Topological Data Analysis for Ransomware Detection on the Bitcoin Blockchain. IJCAI-PRICAI…
0 runs0 likes0 downloads0 reach0 impact
2916697 instances - 10 features - 29 classes - 0 missing values
50% stratified subsample of the original SVHN data
0 runs0 likes0 downloads0 reach0 impact
49644 instances - 3073 features - 10 classes - 0 missing values
10% stratified subsample of the original SVHN data
0 runs0 likes0 downloads0 reach0 impact
9927 instances - 3073 features - 10 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs0 likes0 downloads0 reach0 impact
14 instances - 5 features - 2 classes - 0 missing values
The weather problem is a tiny dataset that we will use repeatedly to illustrate machine learning methods. Entirely fictitious, it supposedly concerns the conditions that are suitable for playing some…
0 runs0 likes0 downloads0 reach0 impact
14 instances - 5 features - 2 classes - 0 missing values
t
0 runs0 likes0 downloads0 reach0 impact
.. _diabetes_dataset: Diabetes dataset ---------------- Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442…
0 runs0 likes0 downloads0 reach0 impact
442 instances - 11 features - 0 classes - 0 missing values
.. _diabetes_dataset: Diabetes dataset ---------------- Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442…
0 runs0 likes0 downloads0 reach0 impact
442 instances - 11 features - 0 classes - 0 missing values
.. _diabetes_dataset: Diabetes dataset ---------------- Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442…
0 runs0 likes0 downloads0 reach0 impact
442 instances - 11 features - 0 classes - 0 missing values
.. _diabetes_dataset: Diabetes dataset ---------------- Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442…
0 runs0 likes0 downloads0 reach0 impact
442 instances - 11 features - 0 classes - 0 missing values
.. _diabetes_dataset: Diabetes dataset ---------------- Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442…
0 runs0 likes0 downloads0 reach0 impact
442 instances - 11 features - 0 classes - 0 missing values
.. _diabetes_dataset: Diabetes dataset ---------------- Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442…
0 runs0 likes0 downloads0 reach0 impact
442 instances - 11 features - 0 classes - 0 missing values
Test dataset
0 runs0 likes0 downloads0 reach0 impact
Test dataset
0 runs0 likes0 downloads0 reach0 impact
GISETTE is a handwritten digit recognition problem. The problem is to separate the highly confusable digits '4' and '9'. This dataset is one of five datasets of the NIPS 2003 feature selection…
466 runs0 likes0 downloads0 reach0 impact
7000 instances - 5001 features - 2 classes - 0 missing values
The database covers all the international short track games in the last 5 years. Currently it contains only men's 500m. Detailed lap data including personal time and ranking in each game from seasons…
0 runs0 likes0 downloads0 reach0 impact
In the early 2000s, Billy Beane and Paul DePodesta worked for the Oakland Athletics. While there, they literally changed the game of baseball. They didn't do it using a bat or glove, and they…
3 runs0 likes8 downloads8 reach14 impact
1232 instances - 15 features - 0 classes - 3600 missing values
Author: Gregory Gay, Tim Menzies, Misty Davies, Karen Gundy-Burlet Source: [Zenodo](https://zenodo.org/record/322475) Please cite: Misty Davies. (2009). bike [Data set]. Zenodo. DOI:…
0 runs0 likes0 downloads0 reach0 impact
4435 instances - 11 features - classes - 0 missing values
Tamilnadu Electricity Board Hourly Readings dataset. Real-time readings were collected from residential, commercial, industrial and agriculture to find the accuracy consumption in Tamil Nadu, around…
0 runs0 likes0 downloads0 reach0 impact
45781 instances - 4 features - 20 classes - 0 missing values
Data used in an analysis of the Brown and Frown corpora for my doctoral dissertation titled ``Variations in Written English: Characterizing Authors' Rhetorical Language Choices Across Corpora of…
2048 runs0 likes1 downloads1 reach12 impact
1000 instances - 24 features - 30 classes - 0 missing values
Expression levels of 77 proteins measured in the cerebral cortex of 8 classes of control and Down syndrome mice exposed to context fear conditioning, a task used to assess associative learning. The…
9545 runs0 likes0 downloads0 reach21 impact
1080 instances - 82 features - 8 classes - 1396 missing values
Expression levels of 77 proteins measured in the cerebral cortex of 8 classes of control and Down syndrome mice exposed to context fear conditioning, a task used to assess associative learning. The…
0 runs0 likes0 downloads0 reach0 impact
1080 instances - 82 features - 8 classes - 1396 missing values
TBA
0 runs0 likes0 downloads0 reach0 impact
1080 instances - 82 features - 8 classes - 1396 missing values
These are atlantic-mediterranean marine sponges that belong to O.Hadromerida (Demospongiae.Porifera). ### Attribute Information 27 attributes are non-numeric and nominal. 15 attributes are boolean and…
0 runs0 likes0 downloads0 reach0 impact
76 instances - 46 features - classes - 22 missing values
The original Titanic dataset, describing the survival status of individual passengers on the Titanic. The titanic data does not contain information from the crew, but it does contain actual ages of…
0 runs3 likes45 downloads48 reach12 impact
1309 instances - 14 features - 2 classes - 3855 missing values
0. airplane 1. automobile 2. bird 3. cat 4. deer 5. dog 6. frog 7. horse 8. ship 9. truck CIFAR-10 contains 6000 images per class. The original train-test split randomly divided these into 5000 train…
160 runs0 likes6 downloads6 reach21 impact
60000 instances - 3073 features - 10 classes - 0 missing values
This is a 20,000 instance sample of the original CIFAR-10 dataset. Sampled randomly and stratified, with 2000 examples per class. Training and test set are merged. Find the corresponding task for the…
380 runs0 likes0 downloads0 reach0 impact
20000 instances - 3073 features - 10 classes - 0 missing values
test
0 runs0 likes0 downloads0 reach0 impact
150 instances - 5 features - classes - 0 missing values
Small dataset with time series of RAM prices over the years.
0 runs0 likes0 downloads0 reach0 impact
333 instances - 3 features - 0 classes - 0 missing values
This data was gathered from participants in experimental speed dating events from 2002-2004. During the events, the attendees would have a four-minute "first date" with every other participant of the…
28211 runs19 likes170 downloads189 reach36 impact
8378 instances - 121 features - 2 classes - 18372 missing values
This dataset was retrieved 2014-11-14 from the libSVM site. It was normalized to [-1,1] and converted to the ARFF format. ### Feature information There are 6 numerical and 8 categorical attributes,…
208613 runs0 likes0 downloads0 reach0 impact
690 instances - 15 features - 2 classes - 0 missing values
This version has a problem because the target is encoded as numeric while it is categorical. Use version 3 instead.
0 runs0 likes0 downloads0 reach0 impact
690 instances - 15 features - 0 classes - 0 missing values
The data is cleaned, regularized and encrypted global equity data. The first 21 columns (feature1 - feature21) are features, and target is the binary class you’re trying to predict.
3039 runs2 likes5 downloads7 reach17 impact
96320 instances - 22 features - 2 classes - 0 missing values
### Attribute Information * The first column is the class label (1 for signal, 0 for background) * 21 low-level features (kinematic properties): lepton pT, lepton eta, lepton phi, missing energy…
14393 runs0 likes0 downloads0 reach0 impact
98050 instances - 29 features - 2 classes - 9 missing values
Data on tree growth used in the Case Study published in the September, 1995 issue of the Canadian Journal of Statistics. This data set was been provided by Dr. Fernando Camacho, Ontario Hydro…
18757 runs0 likes0 downloads0 reach0 impact
2796 instances - 35 features - 6 classes - 68100 missing values
### Description Cylinder bands UCI dataset - Process delays known as cylinder banding in rotogravure printing were substantially mitigated using control rules discovered by decision tree induction.…
21778 runs0 likes0 downloads0 reach0 impact
540 instances - 40 features - 2 classes - 999 missing values
The experiments were carried out with a group of 30 volunteers within an age bracket of 19-48 years. They performed a protocol of activities composed of six basic activities: three static postures…
83 runs0 likes9 downloads9 reach12 impact
180 instances - 68 features - 6 classes - 0 missing values
### Description The data consists of real historical data collected from 2010 & 2011. Employees are manually allowed or denied access to resources over time. The data is used to create an algorithm…
35721 runs0 likes23 downloads23 reach32 impact
32769 instances - 10 features - 2 classes - 0 missing values
Predict a biological response of molecules from their chemical properties. Each row in this data set represents a molecule. The first column contains experimental data describing an actual biological…
48680 runs2 likes40 downloads42 reach37 impact
3751 instances - 1777 features - 2 classes - 0 missing values
This data contains general demographic information on internet users in 1997. Original Owner [1]Graphics, Visualization, & Usability Center College of Computing Geogia Institute of Technology…
0 runs0 likes0 downloads0 reach0 impact
10108 instances - 72 features - classes - 2699 missing values
This is the original version of the famous covertype dataset in ARFF format. Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a…
9 runs1 likes14 downloads15 reach25 impact
581012 instances - 55 features - 7 classes - 0 missing values
Prediction task is to determine whether a person makes over 50K a year. Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the…
14637 runs4 likes34 downloads38 reach40 impact
48842 instances - 15 features - 2 classes - 6465 missing values
This data is derived from the 2012 KDD Cup. The data is subsampled to 0.1% of the original number of instances, downsampling the majority class (click=0) so that the target feature is reasonably…
63421 runs0 likes0 downloads0 reach0 impact
39948 instances - 10 features - 2 classes - 0 missing values
This data is derived from the 2012 KDD Cup. The data is subsampled to 1% of the original number of instances, downsampling the majority class (click=0) so that the target feature is reasonably…
313 runs0 likes0 downloads0 reach0 impact
399482 instances - 12 features - 2 classes - 0 missing values
Balanced version of click prediction data
36 runs0 likes15 downloads15 reach13 impact
1997410 instances - 12 features - 2 classes - 0 missing values
Even smaller sample of version 1
0 runs0 likes0 downloads0 reach0 impact
149639 instances - 12 features - 2 classes - 0 missing values
Data on predicting clicks on ads in a search engine.
0 runs0 likes0 downloads0 reach0 impact
1496391 instances - 10 features - 2 classes - 0 missing values
### Description This dataset represents a set of possible advertisements on Internet pages ### Sources (a) Creator and donor: Nicholas Kushmerick - nick@ucd.ie ### Dataset Information The features…
98414 runs0 likes0 downloads0 reach0 impact
Airlines Dataset Inspired in the regression dataset from Elena Ikonomovska. The task is to predict whether a given flight will be delayed, given the information of the scheduled departure.
291 runs0 likes31 downloads31 reach17 impact
539383 instances - 8 features - 2 classes - 0 missing values
We consider the following problem: You are running a cloud computing service, where customers contract to run computing services (tasks). Each task has a duration, an earliest start and latest end,…
0 runs0 likes0 downloads0 reach0 impact
No data.
697 runs0 likes0 downloads0 reach0 impact
320 instances - 9 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2862 runs0 likes0 downloads0 reach0 impact
1545 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2855 runs0 likes0 downloads0 reach0 impact
542 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes0 downloads0 reach0 impact
185 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
66 runs0 likes0 downloads0 reach0 impact
386 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
79 runs0 likes0 downloads0 reach0 impact
322 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2834 runs0 likes0 downloads0 reach0 impact
1545 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes0 downloads0 reach0 impact
410 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
66 runs0 likes0 downloads0 reach0 impact
259 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2860 runs0 likes0 downloads0 reach0 impact
604 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes0 downloads0 reach0 impact
321 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs0 likes0 downloads0 reach0 impact
275 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
66 runs0 likes0 downloads0 reach0 impact
195 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
76 runs0 likes0 downloads0 reach0 impact
187 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs0 likes0 downloads0 reach0 impact
484 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes0 downloads0 reach0 impact
267 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes0 downloads0 reach0 impact
138 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes0 downloads0 reach0 impact
470 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
67 runs0 likes0 downloads0 reach0 impact
458 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes0 downloads0 reach0 impact
468 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs0 likes0 downloads0 reach0 impact
337 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
59 runs0 likes0 downloads0 reach0 impact
1545 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2841 runs0 likes0 downloads0 reach0 impact
630 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs0 likes0 downloads0 reach0 impact
329 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
78 runs0 likes0 downloads0 reach0 impact
363 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
72 runs0 likes0 downloads0 reach0 impact
1545 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
78 runs0 likes0 downloads0 reach0 impact
130 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes0 downloads0 reach0 impact
324 instances - 10937 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2849 runs0 likes0 downloads0 reach0 impact
1545 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2862 runs0 likes0 downloads0 reach0 impact
1545 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2866 runs0 likes0 downloads0 reach0 impact
546 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs0 likes0 downloads0 reach0 impact
250 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs0 likes0 downloads0 reach0 impact
355 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2855 runs0 likes0 downloads0 reach0 impact
1545 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes0 downloads0 reach0 impact
347 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs0 likes0 downloads0 reach0 impact
203 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs0 likes0 downloads0 reach0 impact
193 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2853 runs0 likes0 downloads0 reach0 impact
1545 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs0 likes0 downloads0 reach0 impact
384 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
2865 runs0 likes0 downloads0 reach0 impact
1545 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
78 runs0 likes0 downloads0 reach0 impact
421 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes0 downloads0 reach0 impact
412 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
65 runs0 likes0 downloads0 reach0 impact
146 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs0 likes0 downloads0 reach0 impact
201 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
82 runs0 likes0 downloads0 reach0 impact
405 instances - 10936 features - 2 classes - 0 missing values
GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality…
77 runs0 likes0 downloads0 reach0 impact
413 instances - 10936 features - 2 classes - 0 missing values