Data
covertype


Status: active. Format: ARFF. Visibility: public (publicly available). Uploaded 21-06-2022 by Leo Grin.


Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.

Original description:

Author: Jock A. Blackard, Dr. Denis J. Dean, Dr. Charles W. Anderson
Source: [LibSVM repository](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/) - 2013-11-14
Please cite: For the binarization: R. Collobert, S. Bengio, and Y. Bengio. A parallel mixture of SVMs for very large scale problems. Neural Computation, 14(05):1105-1114, 2002.

This is the famous covertype dataset in its binary version, retrieved 2013-11-13 from the LibSVM site (called covtype.binary there). In addition to the preprocessing done there (see the LibSVM site for details), this dataset was created as follows:

- Load the covertype dataset, unscaled.
- Normalize each file columnwise according to the following rules:
  - If a column contains only one value (constant feature), it is set to zero and thus removed by sparsity.
  - If a column contains two values (binary feature), the value occurring more often is set to zero, the other to one.
  - If a column contains more than two values (multinary/real feature), the column is divided by its standard deviation.
- Finally, remove duplicate lines.

Preprocessing: Transform from multiclass into binary class.
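The columnwise rules in the description can be illustrated with a short sketch. This is not the code used to build the dataset; the function names `normalize_column` and `normalize_rows` are hypothetical, the use of the population standard deviation is an assumption (the description does not say which variant was used), and tie-breaking for binary columns is left to first-seen order because the description does not specify it.

```python
from collections import Counter
from statistics import pstdev


def normalize_column(values):
    """Apply the three column rules from the description to one column."""
    counts = Counter(values)
    if len(counts) == 1:
        # Constant feature: set to zero (a sparse format then drops it).
        return [0.0] * len(values)
    if len(counts) == 2:
        # Binary feature: the more frequent value becomes 0, the other 1.
        majority = counts.most_common(1)[0][0]
        return [0.0 if v == majority else 1.0 for v in values]
    # Feature with more than two values: divide by the standard deviation.
    # Assumption: population standard deviation.
    sd = pstdev(values)
    return [v / sd for v in values]


def normalize_rows(rows):
    """Normalize every column, then drop duplicate rows (first occurrence kept)."""
    columns = [normalize_column(list(col)) for col in zip(*rows)]
    normalized = [tuple(row) for row in zip(*columns)]
    return list(dict.fromkeys(normalized))


# Toy input: one constant column, one binary column, one real-valued column.
rows = [(5, 1, 2.0), (5, 0, 4.0), (5, 1, 6.0), (5, 1, 2.0)]
result = normalize_rows(rows)
```

In the toy input the first and last rows are identical, so the result has three rows; the constant column is all zeros and the minority value of the binary column is mapped to one.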

11 features

Y (target): nominal, 2 unique values, 0 missing
X1: numeric, 1971 unique values, 0 missing
X2: numeric, 361 unique values, 0 missing
X3: numeric, 67 unique values, 0 missing
X4: numeric, 551 unique values, 0 missing
X5: numeric, 700 unique values, 0 missing
X6: numeric, 5785 unique values, 0 missing
X7: numeric, 207 unique values, 0 missing
X8: numeric, 185 unique values, 0 missing
X9: numeric, 255 unique values, 0 missing
X10: numeric, 5827 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 566602
Number of attributes (columns) of the dataset: 11
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 10
Number of nominal attributes: 1
Percentage of binary attributes: 9.09
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: 1
Percentage of numeric attributes: 90.91
Number of attributes divided by the number of instances: 0
Percentage of nominal attributes: 9.09
Percentage of instances belonging to the most frequent class: 50
Number of instances belonging to the most frequent class: 283301
Percentage of instances belonging to the least frequent class: 50
Number of instances belonging to the least frequent class: 283301
Number of binary attributes: 1
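The percentage properties follow from the counts listed above. As a quick check, the arithmetic can be reproduced as below; the variable names are illustrative only, and rounding to two decimals is an assumption about how the page displays its values.

```python
# Counts taken from the properties listed above.
n_instances = 566602
n_attributes = 11
n_numeric = 10
n_nominal = 1
n_majority = 283301

pct_numeric = round(100 * n_numeric / n_attributes, 2)    # 10 of 11 attributes
pct_nominal = round(100 * n_nominal / n_attributes, 2)    # 1 of 11 attributes
pct_majority = round(100 * n_majority / n_instances, 2)   # classes are balanced
attrs_per_instance = n_attributes / n_instances           # tiny, displayed as 0

print(pct_numeric, pct_nominal, pct_majority, round(attrs_per_instance, 2))
```

The ratio of attributes to instances is roughly 1.9e-05, which explains why it appears as 0 in the list.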

1 task

0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: Y
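To make the task settings concrete, here is a minimal sketch of 10-fold cross-validation scored by predictive accuracy. It uses a plain shuffled split on toy data and does not attempt to reproduce the exact folds the task defines; the helper names and the midpoint-threshold classifier are hypothetical and stand in for any model.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_threshold(X, y):
    """Toy model: midpoint between the two class means of the first feature."""
    mean0 = sum(x[0] for x, t in zip(X, y) if t == 0) / y.count(0)
    mean1 = sum(x[0] for x, t in zip(X, y) if t == 1) / y.count(1)
    return (mean0 + mean1) / 2


def predict_threshold(threshold, x):
    return int(x[0] >= threshold)


def cross_validated_accuracy(X, y, k=10, seed=0):
    """Mean predictive accuracy over k held-out folds."""
    scores = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit_threshold([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict_threshold(model, X[i]) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / k


# Toy data: one numeric feature, binary target Y, balanced classes.
X = [[float(i)] for i in range(100)]
y = [int(i >= 50) for i in range(100)]
accuracy = cross_validated_accuracy(X, y)
```

Each instance is held out exactly once, so the reported accuracy is an average over predictions made on data the model did not see during fitting.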