OpenML


jannis

active | ARFF | Publicly available | Visibility: public | Uploaded 03-01-2023 by Leo Grin


Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.

Original description:

SOURCE: [ChaLearn Automatic Machine Learning Challenge (AutoML)](https://competitions.codalab.org/competitions/2321), [ChaLearn](https://automl.chalearn.org/data)

This is a "supervised learning" challenge in machine learning. We are making available 30 datasets, all pre-formatted in given feature representations (this means that each example consists of a fixed number of numerical coefficients). The challenge is to solve classification and regression problems without any further human intervention.

The difficulty is that there is a broad diversity of data types and distributions (including balanced or unbalanced classes, sparse or dense feature representations, with or without missing values or categorical variables, various metrics of evaluation, various proportions of number of features and number of examples). The problems are drawn from a wide variety of domains and include medical diagnosis from laboratory analyses, speech recognition, credit rating, prediction of drug toxicity or efficacy, classification of text, prediction of customer satisfaction, object recognition, protein structure prediction, action recognition in video data, etc.

While there exist machine learning toolkits including methods that can solve all these problems, it still takes considerable human effort to find, for a given combination of dataset, task, metric of evaluation, and available computational time, the combination of methods and hyper-parameter settings that is best suited. Your challenge is to create the "perfect black box" eliminating the human in the loop.

This is a challenge with code submission: your code will be executed automatically on our servers to train and test your learning machines with unknown datasets. However, there is NO OBLIGATION TO SUBMIT CODE. Half of the prizes can be won by just submitting prediction results.

There are six rounds (Prep, Novice, Intermediate, Advanced, Expert, and Master) in which datasets of progressive difficulty are introduced (5 per round). There is NO PREREQUISITE TO PARTICIPATE IN PREVIOUS ROUNDS to enter a new round. The rounds alternate between AutoML phases, in which submitted code is "blind tested" in limited time on our platform using datasets you have never seen before, and Tweakathon phases, which give you time to improve your methods by tweaking them on those datasets and running them on your own systems (without computational resource limitation).

NOTE: This dataset corresponds to one of the datasets of the challenge.

55 features

class (target)	numeric	2 unique values 0 missing
V1	numeric	38700 unique values 0 missing
V2	numeric	841 unique values 0 missing
V3	numeric	864 unique values 0 missing
V4	numeric	54991 unique values 0 missing
V5	numeric	55857 unique values 0 missing
V6	numeric	53873 unique values 0 missing
V7	numeric	55093 unique values 0 missing
V8	numeric	54814 unique values 0 missing
V9	numeric	54199 unique values 0 missing
V10	numeric	52203 unique values 0 missing
V11	numeric	52571 unique values 0 missing
V12	numeric	54008 unique values 0 missing
V13	numeric	55057 unique values 0 missing
V14	numeric	55103 unique values 0 missing
V15	numeric	55330 unique values 0 missing
V16	numeric	56741 unique values 0 missing
V17	numeric	56757 unique values 0 missing
V18	numeric	56630 unique values 0 missing
V19	numeric	54597 unique values 0 missing
V20	numeric	56857 unique values 0 missing
V21	numeric	56520 unique values 0 missing
V22	numeric	54210 unique values 0 missing
V23	numeric	55698 unique values 0 missing
V24	numeric	55839 unique values 0 missing
V25	numeric	56704 unique values 0 missing
V26	numeric	56742 unique values 0 missing
V27	numeric	56528 unique values 0 missing
V28	numeric	54608 unique values 0 missing
V29	numeric	845 unique values 0 missing
V30	numeric	56873 unique values 0 missing
V31	numeric	55325 unique values 0 missing
V32	numeric	55301 unique values 0 missing
V33	numeric	53763 unique values 0 missing
V34	numeric	55308 unique values 0 missing
V35	numeric	56898 unique values 0 missing
V36	numeric	56481 unique values 0 missing
V37	numeric	51987 unique values 0 missing
V38	numeric	866 unique values 0 missing
V39	numeric	52030 unique values 0 missing
V40	numeric	55377 unique values 0 missing
V41	numeric	56654 unique values 0 missing
V42	numeric	55255 unique values 0 missing
V43	numeric	53813 unique values 0 missing
V44	numeric	52647 unique values 0 missing
V45	numeric	55276 unique values 0 missing
V46	numeric	55043 unique values 0 missing
V47	numeric	54146 unique values 0 missing
V48	numeric	54894 unique values 0 missing
V49	numeric	52028 unique values 0 missing
V50	numeric	56512 unique values 0 missing
V51	numeric	55504 unique values 0 missing
V52	numeric	53777 unique values 0 missing
V53	numeric	38799 unique values 0 missing
V54	numeric	54290 unique values 0 missing
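
The feature listing above can be read programmatically. The following is a minimal sketch, not part of the OpenML page, that parses a few rows copied from the listing (name, type, unique-value count, missing count) and flags features with comparatively few distinct values. The helper names `parse_feature_rows` and `low_cardinality`, and the threshold of 1000, are assumptions chosen for illustration only.

```python
import re

# A handful of rows copied verbatim from the feature listing above.
# Columns are tab-separated: name, type, "<n> unique values <m> missing".
FEATURE_ROWS = (
    "class (target)\tnumeric\t2 unique values 0 missing\n"
    "V1\tnumeric\t38700 unique values 0 missing\n"
    "V2\tnumeric\t841 unique values 0 missing\n"
    "V3\tnumeric\t864 unique values 0 missing\n"
    "V4\tnumeric\t54991 unique values 0 missing\n"
    "V29\tnumeric\t845 unique values 0 missing\n"
    "V38\tnumeric\t866 unique values 0 missing\n"
    "V53\tnumeric\t38799 unique values 0 missing\n"
)

ROW_RE = re.compile(
    r"^(?P<name>.+?)\t(?P<dtype>\w+)\t"
    r"(?P<unique>\d+) unique values (?P<missing>\d+) missing$"
)


def parse_feature_rows(text):
    """Parse listing rows into dicts; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match is None:
            continue
        name = match.group("name")
        is_target = name.endswith("(target)")
        if is_target:
            # Drop the "(target)" marker so only the column name remains.
            name = name[: -len("(target)")].strip()
        rows.append(
            {
                "name": name,
                "dtype": match.group("dtype"),
                "unique": int(match.group("unique")),
                "missing": int(match.group("missing")),
                "is_target": is_target,
            }
        )
    return rows


def low_cardinality(rows, threshold=1000):
    """Names of non-target features with fewer than `threshold` unique values.

    The threshold is an illustrative assumption, not an OpenML convention.
    """
    return [
        row["name"]
        for row in rows
        if not row["is_target"] and row["unique"] < threshold
    ]


rows = parse_feature_rows(FEATURE_ROWS)
print(low_cardinality(rows))
```

On the sampled rows this singles out V2, V3, V29, and V38, which each have fewer than 1000 distinct values while most other features have more than 50000.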