Pieter Gijsbers

Pieter's datasets

TALLO - a global tree allometry and crown architecture database. This is the Tallo dataset described in Jucker et al. (2022) but recreated with Python scripts from Laurens Bliek. The scripts can be…
0 runs0 likes0 downloads0 reach0 impact
307014 instances - 21 features - 0 classes - 0 missing values
This is the full version of the KDD Cup 2009 dataset. Customer Relationship Management (CRM) is a key element of modern marketing strategies. The KDD Cup 2009 offers the opportunity to work on large…
0 runs0 likes0 downloads0 reach1 impact
50000 instances - 14892 features - 2 classes - 19658569 missing values
Source: http://plato.asu.edu/ftp/solvable.html. Authors: Rolf-David Bergdoll. PAR10 performances of modern solvers on the solvable instances of MIPLIB2010 (http://miplib.zib.de/). The algorithm runtime…
0 runs0 likes0 downloads0 reach0 impact
1090 instances - 145 features - 0 classes - 0 missing values
Incident reports from the San Francisco Police Department between January 2003 and May 2018, provided by the City and County of San Francisco. The dataset was downloaded on 05.11.2018. from…
0 runs1 likes0 downloads1 reach1 impact
2215023 instances - 9 features - 2 classes - 0 missing values
Date converted to year/mo/day numerics. This dataset contains house sale prices for King County, which includes Seattle. It includes homes sold between May 2014 and May 2015. It contains 19 house…
0 runs0 likes0 downloads0 reach0 impact
21613 instances - 22 features - 0 classes - 0 missing values
Ignores community name. **Author**: Title: Communities and Crime. Abstract: Communities within the United States. The data combines socio-economic data from the 1990 US Census, law enforcement data from…
0 runs0 likes0 downloads0 reach0 impact
1994 instances - 127 features - 0 classes - 39202 missing values
String datetime information extracted to numeric columns. Trip Record Data provided by the New York City Taxi and Limousine Commission (TLC)…
0 runs0 likes0 downloads0 reach0 impact
581835 instances - 19 features - 0 classes - 0 missing values
Modified version for the automl benchmark. Gathers information on about 7800 different US colleges, including geographical information, stats about the population attending and post graduation…
0 runs0 likes0 downloads0 reach0 impact
7063 instances - 45 features - 0 classes - 104249 missing values
Make target (age) numeric. **Author**: 1. Title of Database: Abalone data 2. Sources: (a) Original owners of database: Marine Resources Division Marine Research Laboratories - Taroona Department of…
1 runs0 likes0 downloads0 reach0 impact
4177 instances - 9 features - 0 classes - 0 missing values
Version with corrected feature types. 'PrivacySuppressed' values are converted to None. Ignores hard string data. Gathers information on about 7800 different US colleges, including geographical…
0 runs0 likes0 downloads0 reach0 impact
Version with url set as row id; creator data missing due to bad formatting. **Author**: Kelwin Fernandes (INESC TEC, Universidade do Porto), Pedro Vinagre (ALGORITMI Research Centre, Universidade do…
0 runs0 likes0 downloads0 reach0 impact
39644 instances - 60 features - 0 classes - 0 missing values
Version with corrected feature types. 'PrivacySuppressed' values are converted to None. Gathers information on about 7800 different US colleges, including geographical information, stats about the…
0 runs0 likes0 downloads0 reach0 impact
7063 instances - 47 features - 0 classes - 104305 missing values
Title: Flora. Source: https://automl.chalearn.org/data. Dataset from the first ChaLearn AutoML challenge (2014). Only the training data is included, as there were no labels for validation and…
0 runs0 likes0 downloads0 reach0 impact
15000 instances - 200001 features - 0 classes - 0 missing values
Experiment data obtained by running random configurations of xgboost through mlr on 118 different classification tasks from openml. Parameter descriptions:…
0 runs0 likes0 downloads0 reach0 impact
2955210 instances - 21 features - classes - 7051006 missing values
Experiment data obtained by running random configurations of ranger through mlr on 119 different classification tasks from openml.
0 runs0 likes0 downloads0 reach0 impact
278863 instances - 16 features - classes - 138965 missing values
Experiment data obtained by running random configurations of rpart through mlr on 115 different classification tasks from openml.
0 runs0 likes0 downloads0 reach0 impact
92067 instances - 12 features - classes - 0 missing values
Experiment data obtained by running random configurations of an SVM through mlr on 106 different classification tasks from openml.
0 runs0 likes0 downloads0 reach0 impact
540576 instances - 15 features - classes - 658962 missing values
Experiment data obtained by running random configurations of glmnet through mlr on 114 different classification tasks from openml.
0 runs0 likes0 downloads0 reach0 impact
104820 instances - 10 features - classes - 0 missing values
Experiment data obtained by running random configurations of the hnsw kNN through mlr on 116 different classification tasks from openml.
0 runs0 likes0 downloads0 reach0 impact
111753 instances - 13 features - classes - 0 missing values
No data.
0 runs0 likes0 downloads0 reach0 impact
1090 instances - 147 features - 0 classes - 0 missing values
Byron Roe (byronroe '@' umich.edu), Department of Physics, University of Michigan, Ann Arbor, MI 48109. This dataset is taken from the MiniBooNE experiment and is used to distinguish electron neutrinos…
12 runs0 likes4 downloads4 reach14 impact
130064 instances - 51 features - 2 classes - 0 missing values
This is the dataset used for the 2016 IDA Industrial Challenge, courtesy of Scania. For a full description, see http://archive.ics.uci.edu/ml/datasets/IDA2016Challenge . This dataset contains both the…
9 runs0 likes2 downloads2 reach19 impact
76000 instances - 171 features - 2 classes - 1078695 missing values
Duplicate of the diabetes dataset: https://www.openml.org/d/37
31 runs0 likes0 downloads0 reach0 impact
768 instances - 9 features - 2 classes - 0 missing values
parity5-pmlb
32 runs0 likes0 downloads0 reach0 impact
32 instances - 6 features - 2 classes - 0 missing values
dis-pmlb
31 runs0 likes0 downloads0 reach0 impact
3772 instances - 30 features - 2 classes - 0 missing values
Duplicate of credit-approval dataset: https://www.openml.org/d/29
31 runs0 likes0 downloads0 reach0 impact
690 instances - 16 features - 2 classes - 0 missing values
cleveland-nominal-pmlb
31 runs0 likes0 downloads0 reach0 impact
303 instances - 8 features - 5 classes - 0 missing values
cleve-pmlb
32 runs0 likes0 downloads0 reach0 impact
303 instances - 14 features - 2 classes - 0 missing values
analcatdata_happiness-pmlb
31 runs0 likes0 downloads0 reach0 impact
60 instances - 4 features - 3 classes - 0 missing values
allrep-pmlb
31 runs0 likes0 downloads0 reach0 impact
3772 instances - 30 features - 4 classes - 0 missing values
allbp-pmlb
31 runs0 likes0 downloads0 reach0 impact
3772 instances - 30 features - 3 classes - 0 missing values
parity5_plus_5-pmlb
31 runs0 likes0 downloads0 reach22 impact
1124 instances - 11 features - 2 classes - 0 missing values
[PMLB](https://github.com/EpistasisLab/penn-ml-benchmarks/tree/master/datasets/classification/tokyo1) This is Performance Co-Pilot (PCP) data for the Tokyo server at Silicon Graphics International…
37 runs0 likes1 downloads1 reach22 impact
959 instances - 45 features - 2 classes - 0 missing values
PMLB version of the Titanic dataset, which only uses 3 features. See version 1 for the complete version: https://www.openml.org/d/40945
35 runs0 likes2 downloads2 reach23 impact
2201 instances - 4 features - 2 classes - 0 missing values
flare-pmlb
32 runs0 likes0 downloads0 reach0 impact
1066 instances - 11 features - 2 classes - 0 missing values
A dataset relating characteristics of telephony account features and usage and whether or not the customer churned. Originally used in [Discovering Knowledge in Data: An Introduction to Data…
7512 runs2 likes9 downloads11 reach26 impact
5000 instances - 21 features - 2 classes - 0 missing values
cars1-pmlb
31 runs0 likes0 downloads0 reach0 impact
392 instances - 8 features - 3 classes - 0 missing values
Deactivated. Duplicate of https://www.openml.org/d/15
31 runs0 likes0 downloads0 reach0 impact
699 instances - 11 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1-pmlb
31 runs0 likes0 downloads0 reach0 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.1H_EDM-1_1-pmlb
31 runs0 likes0 downloads0 reach0 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_1000atts_0.4H_EDM-1_EDM-1_1-pmlb
0 runs0 likes0 downloads0 reach0 impact
1600 instances - 1001 features - 2 classes - 0 missing values
Dataset used by Buntine and Niblett (1992). Composed of 10 features, one of which is irrelevant. The target is a disjunctive normal form formula over the nine other attributes, with additional…
31 runs0 likes0 downloads0 reach22 impact
973 instances - 10 features - 2 classes - 0 missing values
wine-quality-red-pmlb
31 runs1 likes7 downloads8 reach23 impact
1599 instances - 12 features - 6 classes - 0 missing values
threeOf9-pmlb
31 runs0 likes0 downloads0 reach0 impact
512 instances - 10 features - 2 classes - 0 missing values
Relevant Information: -- The database contains 3 potential classes, one for the number of times a certain type of solar flare occurred in a 24-hour period. -- Each instance represents captured features…
31 runs0 likes0 downloads0 reach0 impact
1066 instances - 13 features - 6 classes - 0 missing values
Relevant Information: -- The database contains 3 potential classes, one for the number of times a certain type of solar flare occurred in a 24-hour period. -- Each instance represents captured features…
31 runs0 likes0 downloads0 reach0 impact
315 instances - 13 features - 5 classes - 0 missing values
Source: [UCI](https://archive.ics.uci.edu/ml/datasets/Statlog+(Shuttle)). Donor: Jason Catlett, Basser Department of Computer Science, University of Sydney, N.S.W., Australia. Data Set Information:…
12 runs0 likes4 downloads4 reach25 impact
58000 instances - 10 features - 7 classes - 0 missing values
postoperative-patient-data-pmlb
26 runs0 likes0 downloads0 reach0 impact
88 instances - 9 features - 2 classes - 0 missing values
new-thyroid-pmlb
31 runs0 likes0 downloads0 reach0 impact
215 instances - 6 features - 3 classes - 0 missing values
mux6-pmlb
31 runs0 likes0 downloads0 reach0 impact
128 instances - 7 features - 2 classes - 0 missing values
The origin is not clear, but presumably this is an artificial problem representing M-of-N rules. The target is 1 if a certain M 'bits' are '1'? (Joaquin Vanschoren)
31 runs0 likes1 downloads1 reach22 impact
1324 instances - 11 features - 2 classes - 0 missing values
magic-pmlb
0 runs0 likes0 downloads0 reach0 impact
19020 instances - 11 features - 2 classes - 0 missing values
led7-pmlb
31 runs0 likes0 downloads0 reach0 impact
3200 instances - 8 features - 10 classes - 0 missing values
led24-pmlb
31 runs0 likes2 downloads2 reach22 impact
3200 instances - 25 features - 10 classes - 0 missing values
hypothyroid-pmlb
31 runs0 likes0 downloads0 reach0 impact
3163 instances - 26 features - 2 classes - 0 missing values
glass2-pmlb
0 runs0 likes0 downloads0 reach0 impact
163 instances - 10 features - 2 classes - 0 missing values
Re-upload of the dataset as it is present in the Penn ML Benchmark (https://github.com/EpistasisLab/penn-ml-benchmarks/tree/master/datasets/classification/fars). It's a dataset on traffic accidents,…
1 runs0 likes4 downloads4 reach23 impact
100968 instances - 30 features - 8 classes - 0 missing values
ecoli-pmlb
31 runs0 likes0 downloads0 reach0 impact
327 instances - 8 features - 5 classes - 0 missing values
Originally from the StatLog project. The raw data is still available on [UCI](https://archive.ics.uci.edu/ml/datasets/Molecular+Biology+(Splice-junction+Gene+Sequences)). The data consists of 3,186…
7063 runs0 likes9 downloads9 reach26 impact
3186 instances - 181 features - 3 classes - 0 missing values
corral-pmlb
31 runs0 likes0 downloads0 reach0 impact
160 instances - 7 features - 2 classes - 0 missing values
This database contains all legal 8-ply positions in the game of connect-4 in which neither player has won yet, and in which the next move is not forced. Attributes represent board positions on a 6x6…
9766 runs0 likes12 downloads12 reach28 impact
67557 instances - 43 features - 3 classes - 0 missing values
Derived from the Musk dataset: https://www.openml.org/d/1116
31 runs0 likes0 downloads0 reach0 impact
6598 instances - 169 features - 2 classes - 0 missing values
Derived from the Musk dataset: https://www.openml.org/d/1116
31 runs0 likes0 downloads0 reach0 impact
476 instances - 169 features - 2 classes - 0 missing values
car-evaluation-pmlb
31 runs0 likes0 downloads0 reach0 impact
1728 instances - 22 features - 4 classes - 0 missing values
calendarDOW-pmlb
31 runs0 likes0 downloads0 reach0 impact
399 instances - 33 features - 5 classes - 0 missing values
analcatdata_fraud-pmlb
34 runs0 likes0 downloads0 reach0 impact
42 instances - 12 features - 2 classes - 0 missing values
agaricus-lepiota-pmlb
0 runs0 likes0 downloads0 reach0 impact
8145 instances - 23 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_75_EDM-2_001-pmlb
31 runs0 likes0 downloads0 reach0 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_50_EDM-2_001-pmlb
31 runs0 likes0 downloads0 reach0 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_3-Way_20atts_0.2H_EDM-1_1-pmlb
31 runs0 likes0 downloads0 reach0 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1-pmlb
31 runs0 likes0 downloads0 reach0 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.1H_EDM-1_1-pmlb
31 runs0 likes0 downloads0 reach0 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_1000atts_0.4H_EDM-1_EDM-1_1-pmlb
0 runs0 likes0 downloads0 reach0 impact
1600 instances - 1001 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_75_EDM-2_001-pmlb
31 runs0 likes1 downloads1 reach22 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Heterogeneity_20atts_1600_Het_0.4_0.2_50_EDM-2_001-pmlb
0 runs0 likes1 downloads1 reach22 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_3-Way_20atts_0.2H_EDM-1_1-pmlb
31 runs0 likes1 downloads1 reach22 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1-pmlb
31 runs0 likes1 downloads1 reach22 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_20atts_0.1H_EDM-1_1-pmlb
31 runs0 likes1 downloads1 reach22 impact
1600 instances - 21 features - 2 classes - 0 missing values
GAMETES_Epistasis_2-Way_1000atts_0.4H_EDM-1_EDM-1_1-pmlb
0 runs0 likes2 downloads2 reach22 impact
1600 instances - 1001 features - 2 classes - 0 missing values
test auto
0 runs0 likes0 downloads0 reach0 impact
202 instances - 26 features - 5 classes - 0 missing values
test allhyper
0 runs0 likes0 downloads0 reach0 impact
3771 instances - 30 features - 4 classes - 0 missing values
test yeast
0 runs0 likes0 downloads0 reach0 impact
1479 instances - 9 features - 9 classes - 0 missing values
test yeast
0 runs0 likes0 downloads0 reach0 impact
1479 instances - 9 features - 9 classes - 0 missing values
Description: Cylinder bands UCI dataset. Process delays known as cylinder banding in rotogravure printing were substantially mitigated using control rules discovered by decision tree induction.…
0 runs0 likes0 downloads0 reach0 impact
540 instances - 38 features - 2 classes - 999 missing values
Description: Cylinder bands UCI dataset. Process delays known as cylinder banding in rotogravure printing were substantially mitigated using control rules discovered by decision tree induction.…
0 runs0 likes0 downloads0 reach0 impact
540 instances - 38 features - 2 classes - 999 missing values