albert

Status: active. Format: ARFF. Publicly available. Visibility: public. Uploaded 15-08-2018 by Janek Thomas.
  • chalearn Demographics Health study_218 study_271 study_240 study_379

The goal of this challenge is to expose the research community to real-world datasets of interest to 4Paradigm. All datasets are formatted in a uniform way, though the type of data may differ. The data are provided as preprocessed matrices so that participants can focus on classification, although participants are welcome to use additional feature extraction procedures (as long as they do not violate any rule of the challenge). All problems are binary classification problems and are assessed with the normalized Area Under the ROC Curve (AUC) metric (i.e. 2*AUC-1). The identity of the datasets and the type of data are concealed, though their structure is revealed. The final score in phase 2 will be the average of the rankings on all testing datasets; a ranking will be generated from these results, and winners will be determined according to that ranking.

The tasks are constrained by a time budget. The Codalab platform provides computational resources shared by all participants. Each code submission will be executed on a compute worker with the following characteristics: 2 cores / 8 GB memory / 40 GB SSD, running Ubuntu. To ensure the fairness of the evaluation, the execution time of each code submission is limited.

http://automl.chalearn.org/data
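The normalized AUC mentioned in the description (2*AUC-1) can be illustrated with a small, dependency-free sketch. This is not the challenge's scoring code: the function names `auc` and `normalized_auc` are hypothetical, and the pairwise computation below is quadratic in the number of examples, so it is meant for illustration on small inputs only.

```python
def auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example is scored higher than a randomly chosen negative
    one. Tied scores count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def normalized_auc(y_true, scores):
    """Rescale AUC from [0, 1] to [-1, 1]: 2*AUC - 1."""
    return 2.0 * auc(y_true, scores) - 1.0


# Toy labels and scores (made up for illustration, not taken from the dataset).
labels = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
print(normalized_auc(labels, predicted))
```

Under this rescaling a random or constant scorer lands near 0 and a perfect ranking scores 1, which is presumably why the challenge reports 2*AUC-1 rather than the raw AUC.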

79 features

class (target): nominal, 2 unique values, 0 missing
V1: numeric, 233 unique values, 176110 missing
V2: numeric, 3949 unique values, 0 missing
V3: numeric, 1218 unique values, 98508 missing
V4: numeric, 122 unique values, 96050 missing
V5: numeric, 47036 unique values, 8454 missing
V6: numeric, 2783 unique values, 79253 missing
V7: numeric, 1149 unique values, 15770 missing
V8: numeric, 174 unique values, 190 missing
V9: numeric, 2492 unique values, 15770 missing
V10: numeric, 10 unique values, 176110 missing
V11: numeric, 117 unique values, 15770 missing
V12: numeric, 132 unique values, 320924 missing
V13: numeric, 236 unique values, 96050 missing
V14: nominal, 957 unique values, 0 missing
V15: nominal, 545 unique values, 0 missing
V16: nominal, 170908 unique values, 0 missing
V17: nominal, 74500 unique values, 0 missing
V18: nominal, 225 unique values, 0 missing
V19: nominal, 14 unique values, 0 missing
V20: nominal, 10235 unique values, 0 missing
V21: nominal, 442 unique values, 0 missing
V22: nominal, 3 unique values, 0 missing
V23: nominal, 22358 unique values, 0 missing
V24: nominal, 4566 unique values, 0 missing
V25: nominal, 153820 unique values, 0 missing
V26: nominal, 3099 unique values, 0 missing
V27: nominal, 26 unique values, 0 missing
V28: nominal, 7774 unique values, 0 missing
V29: nominal, 121479 unique values, 0 missing
V30: nominal, 10 unique values, 0 missing
V31: nominal, 3539 unique values, 0 missing
V32: nominal, 1655 unique values, 0 missing
V33: nominal, 4 unique values, 0 missing
V34: nominal, 139752 unique values, 0 missing
V35: nominal, 11 unique values, 0 missing
V36: nominal, 14 unique values, 0 missing
V37: nominal, 27266 unique values, 0 missing
V38: nominal, 61 unique values, 0 missing
V39: nominal, 26343 unique values, 0 missing
V40: numeric, 1132 unique values, 15814 missing
V41: nominal, 4 unique values, 0 missing
V42: numeric, 169 unique values, 193 missing
V43: numeric, 2766 unique values, 79362 missing
V44: nominal, 121505 unique values, 0 missing
V45: nominal, 10 unique values, 0 missing
V46: nominal, 10230 unique values, 0 missing
V47: nominal, 14 unique values, 0 missing
V48: nominal, 225 unique values, 0 missing
V49: nominal, 545 unique values, 0 missing
V50: numeric, 1151 unique values, 15875 missing
V51: numeric, 118 unique values, 15852 missing
V52: numeric, 10 unique values, 176385 missing
V53: numeric, 133 unique values, 321209 missing
V54: nominal, 544 unique values, 0 missing
V55: nominal, 170898 unique values, 0 missing
V56: nominal, 22371 unique values, 0 missing
V57: nominal, 26381 unique values, 0 missing
V58: nominal, 27182 unique values, 0 missing
V59: numeric, 1161 unique values, 15723 missing
V60: nominal, 546 unique values, 0 missing
V61: nominal, 3098 unique values, 0 missing
V62: nominal, 3103 unique values, 0 missing
V63: nominal, 14 unique values, 0 missing
V64: numeric, 136 unique values, 321145 missing
V65: nominal, 26 unique values, 0 missing
V66: nominal, 10227 unique values, 0 missing
V67: numeric, 136 unique values, 321059 missing
V68: nominal, 26 unique values, 0 missing
V69: numeric, 132 unique values, 320967 missing
V70: nominal, 62 unique values, 0 missing
V71: nominal, 60 unique values, 0 missing
V72: numeric, 118 unique values, 15684 missing
V73: nominal, 227 unique values, 0 missing
V74: nominal, 220 unique values, 0 missing
V75: numeric, 117 unique values, 15773 missing
V76: nominal, 229 unique values, 0 missing
V77: nominal, 7786 unique values, 0 missing
V78: nominal, 3103 unique values, 0 missing
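To put the per-feature missing counts in perspective, the sketch below converts a few of them into fractions of the 425240 instances reported for this dataset. The counts are copied from the feature list; the helper names `missing_fraction` and `mostly_missing` and the 50% threshold are assumptions chosen for illustration.

```python
N_INSTANCES = 425240  # number of instances (rows) reported for this dataset

# Missing-value counts copied from the feature list (a subset, for illustration).
missing_counts = {
    "V1": 176110,
    "V8": 190,
    "V12": 320924,
    "V14": 0,
    "V53": 321209,
}


def missing_fraction(n_missing, n_rows=N_INSTANCES):
    """Fraction of rows in which a feature is missing."""
    return n_missing / n_rows


def mostly_missing(counts, threshold=0.5):
    """Names of features missing in more than `threshold` of the rows (sorted)."""
    return sorted(name for name, n in counts.items()
                  if missing_fraction(n) > threshold)


for name, n in missing_counts.items():
    print(f"{name}: {100 * missing_fraction(n):.2f}% missing")
print("Missing in more than half of the rows:", mostly_missing(missing_counts))
```

Features such as V12 and V53 are missing in roughly three quarters of the rows, so any imputation or encoding strategy for them deserves separate thought from features like V8, which are almost complete.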

19 properties

Number of instances (rows) of the dataset: 425240
Number of attributes (columns) of the dataset: 79
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 2734000
Number of instances with at least one value missing: 425159
Number of numeric attributes: 26
Number of nominal attributes: 53
Percentage of binary attributes: 1.27
Percentage of instances having missing values: 99.98
Average class difference between consecutive instances: 0.5
Percentage of missing values: 8.14
Percentage of numeric attributes: 32.91
Number of attributes divided by the number of instances: 0
Percentage of nominal attributes: 67.09
Percentage of instances belonging to the most frequent class: 50
Number of instances belonging to the most frequent class: 212620
Percentage of instances belonging to the least frequent class: 50
Number of instances belonging to the least frequent class: 212620
Number of binary attributes: 1
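Several of the percentage properties follow directly from the raw counts listed here. As a quick consistency check, the sketch below recomputes them; the variable names and the `pct` helper are hypothetical, and rounding to two decimals is an assumption that happens to match the displayed values.

```python
# Raw counts as listed in the properties of this dataset.
n_instances = 425240
n_attributes = 79
n_numeric = 26
n_nominal = 53
n_binary = 1
n_missing_values = 2734000
n_rows_with_missing = 425159
n_majority_class = 212620


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


derived = {
    "Percentage of numeric attributes": pct(n_numeric, n_attributes),
    "Percentage of nominal attributes": pct(n_nominal, n_attributes),
    "Percentage of binary attributes": pct(n_binary, n_attributes),
    "Percentage of missing values": pct(n_missing_values,
                                        n_instances * n_attributes),
    "Percentage of instances having missing values": pct(n_rows_with_missing,
                                                         n_instances),
    "Percentage of instances belonging to the most frequent class":
        pct(n_majority_class, n_instances),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

Note that the percentage of missing values is taken over all cells (instances times attributes), not over rows, which is why it is far smaller than the percentage of instances with at least one missing value.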

18 tasks

0 runs - estimation_procedure: 20% Holdout (Ordered) - target_feature: class
0 runs - estimation_procedure: 33% Holdout set - target_feature: class
0 runs - estimation_procedure: 4-fold Crossvalidation - target_feature: class
0 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: class
0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class
0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class
0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class
0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class
0 runs - estimation_procedure: Interleaved Test then Train - target_feature: class
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
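To make the cross-validation entries in the task list concrete, here is a minimal sketch of how k-fold splits partition row indices. It is not OpenML's own splitting code: the function `kfold_indices` is hypothetical, and it uses contiguous, unshuffled folds for simplicity, whereas real estimation procedures typically shuffle and often stratify by class.

```python
def kfold_indices(n_rows, k):
    """Yield (train_indices, test_indices) for k contiguous folds whose
    sizes differ by at most one row."""
    if k < 2 or k > n_rows:
        raise ValueError("k must be between 2 and n_rows")
    base, extra = divmod(n_rows, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, test
        start += size


# Fold sizes for the 425240 rows of this dataset under 10-fold cross-validation.
for fold, (train, test) in enumerate(kfold_indices(425240, 10)):
    print(f"fold {fold}: {len(train)} train rows, {len(test)} test rows")
```

Each row appears in exactly one test fold, so every instance is used for evaluation once and for training in the remaining folds.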