OpenML

covertype

Status: active. Format: ARFF. Publicly available. Visibility: public. Uploaded 05-07-2022 by Leo Grin.

Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on numerical features" benchmark.

Original description:

Author: Jock A. Blackard, Dr. Denis J. Dean, Dr. Charles W. Anderson
Source: [LibSVM repository](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/) - 2013-11-14
Please cite: For the binarization: R. Collobert, S. Bengio, and Y. Bengio. A parallel mixture of SVMs for very large scale problems. Neural Computation, 14(05):1105-1114, 2002.

This is the famous covertype dataset in its binary version, retrieved 2013-11-13 from the LibSVM site (called covtype.binary there). In addition to the preprocessing done there (see the LibSVM site for details), this dataset was created as follows:

- Load the covertype dataset, unscaled.
- Normalize each file column-wise according to the following rules:
  - If a column contains only one value (constant feature), it is set to zero and thus removed by sparsity.
  - If a column contains two values (binary feature), the value occurring more often is set to zero, the other to one.
  - If a column contains more than two values (multinary/real feature), the column is divided by its standard deviation.
- Finally, remove duplicate lines.

Preprocessing: Transform from multiclass into binary class.
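The column-wise normalization rules described above can be sketched in a few lines of NumPy. This is only an illustration of the stated rules, not the script used to build the dataset: the function name `normalize_columns` is hypothetical, the population standard deviation (NumPy's default, `ddof=0`) is an assumption because the description does not say which estimator was used, and ties in a binary column are broken in favor of the smaller value.

```python
import numpy as np

def normalize_columns(X):
    """Apply the three column-wise rules, then drop duplicate rows.

    Hypothetical helper illustrating the described preprocessing.
    """
    X = np.asarray(X, dtype=float)
    out = np.empty_like(X)
    for j in range(X.shape[1]):
        col = X[:, j]
        values, counts = np.unique(col, return_counts=True)
        if values.size == 1:
            # Constant feature: set to zero.
            out[:, j] = 0.0
        elif values.size == 2:
            # Binary feature: the more frequent value becomes 0, the other 1.
            # Assumption: on a tie, argmax picks the smaller value as majority.
            majority = values[np.argmax(counts)]
            out[:, j] = (col != majority).astype(float)
        else:
            # Multi-valued feature: divide by the standard deviation.
            # Assumption: population standard deviation (ddof=0).
            out[:, j] = col / col.std()
    # Remove duplicate rows; note that np.unique also sorts the rows.
    return np.unique(out, axis=0)

# Toy input: one constant, one binary, and one multi-valued column,
# with the first and last rows identical.
X_demo = np.array([
    [5.0, 1.0, 2.0],
    [5.0, 1.0, 4.0],
    [5.0, 0.0, 6.0],
    [5.0, 1.0, 2.0],
])
X_norm = normalize_columns(X_demo)
```

On the toy input, the constant column becomes all zeros, the binary column is recoded so that its majority value maps to 0, and the duplicated row is dropped, leaving three rows.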

11 features

Name	Type	Unique values	Missing values
Y (target)	nominal	2	0
X1	numeric	1971	0
X2	numeric	361	0
X3	numeric	67	0
X4	numeric	551	0
X5	numeric	700	0
X6	numeric	5785	0
X7	numeric	207	0
X8	numeric	185	0
X9	numeric	255	0
X10	numeric	5827	0