Matthias Feurer

Matthias's datasets

Microsoft Learning to Rank Datasets ## Dataset Descriptions The datasets are machine learning data, in which queries and urls are represented by IDs. The datasets consist of feature vectors extracted…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1200192 instances - 137 features - 5 classes - 0 missing values
Binarized version of the California Housing Dataset This dataset was obtained from Luis Torgo's collection of regression datasets. It was binarized to serve as the original, unprocessed data for the…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
20640 instances - 9 features - 2 classes - 0 missing values
Improve on the state of the art in credit scoring by predicting the probability that somebody will experience financial distress in the next two years. ## Description Banks play a crucial role in…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
150000 instances - 11 features - 2 classes - 33655 missing values
Data from the PASCAL Challenge 2008 as available on the LibSVM repository ## Description Preprocessing: The raw data set (epsilon_train) is scaled instance-wise to unit length and split into two…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
500000 instances - 2001 features - 2 classes - 0 missing values
This is a classification problem to distinguish between a signal process which produces Higgs bosons and a background process which does not. ## Information The data has been produced using Monte…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
11000000 instances - 29 features - 2 classes - 0 missing values
Context "Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets] Content Each row represents a…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
7043 instances - 20 features - 2 classes - 11 missing values
Home Credit Default Risk Main Table > Huang, X., Khetan, A., Cvitkovic, M., & Karnin, Z. (2020). > Tabtransformer: Tabular data modeling using contextual embeddings. > arXiv preprint…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
307511 instances - 121 features - 2 classes - 9152465 missing values
At Santander our mission is to help people and businesses prosper. We are always looking for ways to help our customers understand their financial health and identify which products and services might…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
200000 instances - 201 features - 2 classes - 0 missing values
This data was extracted from the 1994 Census bureau database by Ronny Kohavi and Barry Becker (Data Mining and Visualization, Silicon Graphics). A set of reasonably clean records was extracted using…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
32561 instances - 15 features - classes - 4262 missing values
Dota 2 is a popular computer game with two teams of 5 players. At the start of the game each player chooses a unique hero with different strengths and weaknesses. Source: stephen.tridgell '@'…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
102944 instances - 117 features - 2 classes - 0 missing values
Seismic bumps dataset…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
2584 instances - 19 features - 2 classes - 0 missing values
## Source: 1. C. Okan Sakar Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Bahcesehir University, 34349 Besiktas, Istanbul, Turkey 2. Yomi Kastro Inveon Information…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
12330 instances - 18 features - 2 classes - 0 missing values
This is the training set of the COIL 2000 challenge as used by Huang et al. (2020). > Huang, X., Khetan, A., Cvitkovic, M., & Karnin, Z. (2020). > Tabtransformer: Tabular data modeling using…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
5822 instances - 86 features - 0 classes - 0 missing values
Pulsar candidates collected during the HTRU survey. Pulsars are a type of star, of considerable scientific interest. Candidates must be classified into pulsar and non-pulsar classes to aid discovery.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
17898 instances - 9 features - 2 classes - 0 missing values
Mammography is the most effective method for breast cancer screening available today. However, the low positive predictive value of breast biopsy resulting from mammogram interpretation leads to…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
961 instances - 5 features - 2 classes - 160 missing values
This dataset is a subset of the [KDDCup 2012 track 2](https://www.kaggle.com/competitions/kddcup2012-track2/) data created by Manu Joseph and Harsh Raj for the paper > Joseph, M., & Raj, H. (2022). >…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
1000000 instances - 12 features - 2 classes - 0 missing values
This dataset is from the "Explainable Machine Learning Challenge": > The Explainable Machine Learning Challenge is a collaboration between Google, FICO and academics at Berkeley, Oxford, Imperial, UC…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
9871 instances - 24 features - 2 classes - 12643 missing values
This dataset is from the "Explainable Machine Learning Challenge": > The Explainable Machine Learning Challenge is a collaboration between Google, FICO and academics at Berkeley, Oxford, Imperial, UC…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
9871 instances - 24 features - 2 classes - 0 missing values
This is the dataset from the Kaggle Higgs Boson Machine Learning Challenge 2014. The data was downloaded from the [CERN website](http://opendata.cern.ch/record/328), which also hosts the…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
800000 instances - 31 features - 2 classes - 5053446 missing values
This is the dataset from the Kaggle Higgs Boson Machine Learning Challenge 2014. The data was downloaded from the [CERN website](http://opendata.cern.ch/record/328), which also hosts the…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
818238 instances - 31 features - 2 classes - 5168486 missing values
This is the dataset from the Kaggle Higgs Boson Machine Learning Challenge 2014. The data was downloaded from the [CERN website](http://opendata.cern.ch/record/328), which also hosts the…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
818238 instances - 31 features - 2 classes - 0 missing values
## Overview The Otto Group is one of the world's biggest e-commerce companies, with subsidiaries in more than 20 countries, including Crate & Barrel (USA), Otto.de (Germany) and 3 Suisses (France). We…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
61878 instances - 94 features - 9 classes - 0 missing values
## Data description There are 3 types of input features: * Objective: factual information; * Examination: results of medical examination; * Subjective: information given by the patient. Features: 1.…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
70000 instances - 12 features - 2 classes - 0 missing values
A Tour & Travels Company Wants To Predict Whether A Customer Will Churn Or Not Based On Indicators Given Below. Help Build Predictive Models And Save The Company's Money. Perform Fascinating EDAs. The…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
954 instances - 7 features - 2 classes - 60 missing values
Much of machine learning research focuses on producing models which perform well on benchmark tasks, in turn improving our understanding of the challenges associated with those tasks. From the…
1 run - 0 likes - 0 downloads - 0 reach - 13 impact
270912 instances - 785 features - 49 classes - 0 missing values
The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. We cordially invite researchers…
1 run - 0 likes - 1 download - 1 reach - 11 impact
51839 instances - 257 features - 43 classes - 0 missing values
The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. We cordially invite researchers…
1 run - 0 likes - 0 downloads - 0 reach - 13 impact
51839 instances - 2917 features - 43 classes - 0 missing values
The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. We cordially invite researchers…
0 runs - 0 likes - 1 download - 1 reach - 11 impact
51839 instances - 1569 features - 43 classes - 0 missing values
The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011. We cordially invite researchers…
0 runs - 0 likes - 1 download - 1 reach - 11 impact
51839 instances - 1569 features - 43 classes - 0 missing values
This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are…
0 runs - 0 likes - 0 downloads - 0 reach - 0 impact
60000 instances - 3073 features - 100 classes - 0 missing values
Much of machine learning research focuses on producing models which perform well on benchmark tasks, in turn improving our understanding of the challenges associated with those tasks. From the…
0 runs - 0 likes - 1 download - 1 reach - 11 impact
70000 instances - 785 features - 10 classes - 0 missing values