OpenML

covertype

active ARFF Publicly available Visibility: public Uploaded 22-06-2015 by Joaquin Vanschoren

Author: Jock A. Blackard, Dr. Denis J. Dean, Dr. Charles W. Anderson
Source: [UCI](https://archive.ics.uci.edu/ml/datasets/Covertype) - 1998

This is the original version of the famous covertype dataset in ARFF format.

Covertype

Predicting forest cover type from cartographic variables only (no remotely sensed data). The actual forest cover type for a given observation (30 x 30 meter cell) was determined from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. Independent variables were derived from data originally obtained from the US Geological Survey (USGS) and the USFS. The data is in raw form (not scaled) and contains binary (0 or 1) columns for the qualitative independent variables (wilderness areas and soil types).

The study area includes four wilderness areas located in the Roosevelt National Forest of northern Colorado. These areas represent forests with minimal human-caused disturbances, so existing forest cover types are more a result of ecological processes than of forest management practices.

Some background information for these four wilderness areas: Neota (area 2) probably has the highest mean elevation of the four wilderness areas. Rawah (area 1) and Comanche Peak (area 3) would have a lower mean elevation, while Cache la Poudre (area 4) would have the lowest mean elevation.

As for primary major tree species in these areas, Neota would have spruce/fir (type 1), while Rawah and Comanche Peak would probably have lodgepole pine (type 2) as their primary species, followed by spruce/fir and aspen (type 5). Cache la Poudre would tend to have Ponderosa pine (type 3), Douglas-fir (type 6), and cottonwood/willow (type 4).

The Rawah and Comanche Peak areas would tend to be more typical of the overall dataset than either Neota or Cache la Poudre, due to their assortment of tree species and range of predictive variable values (elevation, etc.). Cache la Poudre would probably be more distinctive than the others, due to its relatively low elevation range and species composition.

Attribute Information:

Given are the attribute name, attribute type, measurement unit, and a brief description. The forest cover type is the classification problem. The order of this listing corresponds to the order of numerals along the rows of the database.

Name / Data Type / Measurement / Description
Elevation / quantitative / meters / Elevation in meters
Aspect / quantitative / azimuth / Aspect in degrees azimuth
Slope / quantitative / degrees / Slope in degrees
Horizontal_Distance_To_Hydrology / quantitative / meters / Horizontal distance to nearest surface water features
Vertical_Distance_To_Hydrology / quantitative / meters / Vertical distance to nearest surface water features
Horizontal_Distance_To_Roadways / quantitative / meters / Horizontal distance to nearest roadway
Hillshade_9am / quantitative / 0 to 255 index / Hillshade index at 9am, summer solstice
Hillshade_Noon / quantitative / 0 to 255 index / Hillshade index at noon, summer solstice
Hillshade_3pm / quantitative / 0 to 255 index / Hillshade index at 3pm, summer solstice
Horizontal_Distance_To_Fire_Points / quantitative / meters / Horizontal distance to nearest wildfire ignition points
Wilderness_Area (4 binary columns) / qualitative / 0 (absence) or 1 (presence) / Wilderness area designation
Soil_Type (40 binary columns) / qualitative / 0 (absence) or 1 (presence) / Soil type designation
Cover_Type (7 types) / integer / 1 to 7 / Forest cover type designation

Relevant Papers:

- Blackard, Jock A. and Denis J. Dean. 2000. "Comparative Accuracies of Artificial Neural Networks and Discriminant Analysis in Predicting Forest Cover Types from Cartographic Variables." Computers and Electronics in Agriculture 24(3):131-151.
- Blackard, Jock A. and Denis J. Dean. 1998. "Comparative Accuracies of Neural Networks and Discriminant Analysis in Predicting Forest Cover Types from Cartographic Variables." Second Southern Forestry GIS Conference. University of Georgia. Athens, GA. Pages 189-199.
- Blackard, Jock A. 1998. "Comparison of Neural Networks and Discriminant Analysis in Predicting Forest Cover Types." Ph.D. dissertation. Department of Forest Sciences. Colorado State University. Fort Collins, Colorado. 165 pages.
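
The description notes that wilderness areas and soil types are stored as groups of binary (0 or 1) columns. The sketch below shows one way such a group could be collapsed back into a single label. It assumes that exactly one column per group is set for each row, which the description does not state explicitly; the function name `decode_one_hot` and the example row are hypothetical.

```python
# Wilderness area numbers and names as given in the description.
WILDERNESS_AREAS = {1: "Rawah", 2: "Neota", 3: "Comanche Peak", 4: "Cache la Poudre"}


def decode_one_hot(row, prefix, count):
    """Return the 1-based index of the single active `prefix<i>` column.

    `row` is a dict mapping column names to 0/1 values (hypothetical layout).
    Raises ValueError unless exactly one column in the group is set
    (an assumption about the data, not something the page guarantees).
    """
    active = [i for i in range(1, count + 1) if int(row[f"{prefix}{i}"]) == 1]
    if len(active) != 1:
        raise ValueError(f"expected exactly one active {prefix} column, got {active}")
    return active[0]


# A made-up example row; the values are illustrative, not taken from the data.
row = {f"Wilderness_Area{i}": 0 for i in range(1, 5)}
row["Wilderness_Area3"] = 1

area = decode_one_hot(row, "Wilderness_Area", 4)
print(area, WILDERNESS_AREAS[area])  # prints: 3 Comanche Peak
```

The same helper would apply to the 40 `Soil_Type` columns by passing `"Soil_Type"` and `40`.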

55 features

class (target)	nominal	7 unique values 0 missing
Elevation	numeric	1978 unique values 0 missing
Aspect	numeric	361 unique values 0 missing
Slope	numeric	67 unique values 0 missing
Horizontal_Distance_To_Hydrology	numeric	551 unique values 0 missing
Vertical_Distance_To_Hydrology	numeric	700 unique values 0 missing
Horizontal_Distance_To_Roadways	numeric	5785 unique values 0 missing
Hillshade_9am	numeric	207 unique values 0 missing
Hillshade_Noon	numeric	185 unique values 0 missing
Hillshade_3pm	numeric	255 unique values 0 missing
Horizontal_Distance_To_Fire_Points	numeric	5827 unique values 0 missing
Wilderness_Area1	nominal	2 unique values 0 missing
Wilderness_Area2	nominal	2 unique values 0 missing
Wilderness_Area3	nominal	2 unique values 0 missing
Wilderness_Area4	nominal	2 unique values 0 missing
Soil_Type1	nominal	2 unique values 0 missing
Soil_Type2	nominal	2 unique values 0 missing
Soil_Type3	nominal	2 unique values 0 missing
Soil_Type4	nominal	2 unique values 0 missing
Soil_Type5	nominal	2 unique values 0 missing
Soil_Type6	nominal	2 unique values 0 missing
Soil_Type7	nominal	2 unique values 0 missing
Soil_Type8	nominal	2 unique values 0 missing
Soil_Type9	nominal	2 unique values 0 missing
Soil_Type10	nominal	2 unique values 0 missing
Soil_Type11	nominal	2 unique values 0 missing
Soil_Type12	nominal	2 unique values 0 missing
Soil_Type13	nominal	2 unique values 0 missing
Soil_Type14	nominal	2 unique values 0 missing
Soil_Type15	nominal	2 unique values 0 missing
Soil_Type16	nominal	2 unique values 0 missing
Soil_Type17	nominal	2 unique values 0 missing
Soil_Type18	nominal	2 unique values 0 missing
Soil_Type19	nominal	2 unique values 0 missing
Soil_Type20	nominal	2 unique values 0 missing
Soil_Type21	nominal	2 unique values 0 missing
Soil_Type22	nominal	2 unique values 0 missing
Soil_Type23	nominal	2 unique values 0 missing
Soil_Type24	nominal	2 unique values 0 missing
Soil_Type25	nominal	2 unique values 0 missing
Soil_Type26	nominal	2 unique values 0 missing
Soil_Type27	nominal	2 unique values 0 missing
Soil_Type28	nominal	2 unique values 0 missing
Soil_Type29	nominal	2 unique values 0 missing
Soil_Type30	nominal	2 unique values 0 missing
Soil_Type31	nominal	2 unique values 0 missing
Soil_Type32	nominal	2 unique values 0 missing
Soil_Type33	nominal	2 unique values 0 missing
Soil_Type34	nominal	2 unique values 0 missing
Soil_Type35	nominal	2 unique values 0 missing
Soil_Type36	nominal	2 unique values 0 missing
Soil_Type37	nominal	2 unique values 0 missing
Soil_Type38	nominal	2 unique values 0 missing
Soil_Type39	nominal	2 unique values 0 missing
Soil_Type40	nominal	2 unique values 0 missing
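
As a quick sanity check, the column counts in the feature table can be tallied to reproduce three of the dataset properties reported on this page: PercentageOfNumericFeatures (18.18), PercentageOfSymbolicFeatures (81.82), and MeanNominalAttDistinctValues (2.11). This is a minimal sketch in plain Python; the variable names are illustrative.

```python
# Counts read off the feature table: 10 numeric columns, 4 + 40 two-valued
# nominal indicator columns, and the 7-valued nominal target `class`.
numeric = 10
binary_nominal = 4 + 40
target_distinct_values = 7

nominal = binary_nominal + 1          # indicators plus the target
total = numeric + nominal

pct_numeric = round(100 * numeric / total, 2)
pct_nominal = round(100 * nominal / total, 2)

# Average number of distinct values over all nominal columns.
mean_distinct = round((2 * binary_nominal + target_distinct_values) / nominal, 2)

print(total, pct_numeric, pct_nominal, mean_distinct)  # prints: 55 18.18 81.82 2.11
```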


107 properties

NumberOfInstances

581012

Number of instances (rows) of the dataset.

NumberOfFeatures

55

Number of attributes (columns) of the dataset.

NumberOfClasses

7

Number of distinct values of the target attribute (if it is nominal).

NumberOfMissingValues

0

Number of missing values in the dataset.

NumberOfInstancesWithMissingValues

0

Number of instances with at least one value missing.

NumberOfNumericFeatures

10

Number of numeric attributes.

NumberOfSymbolicFeatures

45

Number of nominal attributes.

REPTreeDepth1Kappa

0.85

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1
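
Several landmarker properties on this page report a kappa coefficient. The sketch below assumes this is Cohen's kappa, computed from a confusion matrix as (observed agreement - chance agreement) / (1 - chance agreement). The function name and the toy matrix are illustrative and are not taken from the covertype results.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix.

    confusion[i][j] is the number of instances of actual class i
    that were predicted as class j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected if predictions were independent of the true labels.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy two-class example: 85% accuracy, 50% chance agreement.
toy = [[40, 10],
       [5, 45]]
kappa = cohen_kappa(toy)
print(round(kappa, 2))  # prints: 0.7
```

Kappa discounts the agreement a classifier would reach by chance, which matters on an imbalanced target such as this one. The function is undefined when chance agreement equals 1 (for example, a single-class matrix).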

CfsSubsetEval_kNN1NAUC

0.85

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

StdvNominalAttDistinctValues

0.75

Standard deviation of the number of distinct values among attributes of the nominal type.

J48.001.ErrRate

0.07

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanNominalAttDistinctValues

2.11

Average number of distinct values among the attributes of the nominal type.

Quartile1SkewnessOfNumericAtts

-0.88

First quartile of skewness among attributes of the numeric type.

REPTreeDepth2AUC

0.97

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

CfsSubsetEval_kNN1NErrRate

0.28

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

kNN1NAUC

0.95

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk

J48.001.Kappa

0.89

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanSkewnessOfNumericAtts

0.28

Mean skewness among attributes of the numeric type.

Quartile1StdDevOfNumericAtts

25.02

First quartile of standard deviation of attributes of the numeric type.

REPTreeDepth2ErrRate

0.09

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2

CfsSubsetEval_kNN1NKappa

0.53

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

kNN1NErrRate

0.07

Error rate achieved by the landmarker weka.classifiers.lazy.IBk

MajorityClassPercentage

48.76

Percentage of instances belonging to the most frequent class.

MeanStdDevOfNumericAtts

363.85

Mean standard deviation of attributes of the numeric type.

Quartile2AttributeEntropy

0.09

Second quartile (Median) of entropy among attributes.

REPTreeDepth2Kappa

0.85

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2

ClassEntropy

1.74

Entropy of the target attribute values.
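
For readers unfamiliar with the measure, the following sketch computes the Shannon entropy of a class distribution from class counts. It assumes base-2 logarithms (bits); the page does not state which base is used. The counts in the example are made up for illustration and are not the covertype class counts.

```python
import math


def entropy_bits(counts):
    """Shannon entropy, in bits, of a distribution given as class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Illustrative three-class distribution: 50%, 25%, 25%.
print(entropy_bits([50, 25, 25]))  # prints: 1.5

# Upper bound for a 7-class target: all classes equally frequent.
max_entropy_7 = entropy_bits([1] * 7)
```

Under the base-2 assumption, a perfectly balanced 7-class target would have an entropy of log2(7), roughly 2.81 bits, so a lower reported value is consistent with an imbalanced class distribution.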

kNN1NKappa

0.89

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk

MajorityClassSize

283301

Number of instances belonging to the most frequent class.

MinAttributeEntropy

Minimal entropy among attributes.

Quartile2KurtosisOfNumericAtts

1.06

Second quartile (Median) of kurtosis among attributes of the numeric type.

REPTreeDepth3AUC

0.97

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpAUC

0.6

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxAttributeEntropy

0.99

Maximum entropy among attributes.

MinKurtosisOfNumericAtts

-1.22

Minimum kurtosis among attributes of the numeric type.

Quartile2MeansOfNumericAtts

217.73

Second quartile (Median) of means among attributes of the numeric type.

REPTreeDepth3ErrRate

0.09

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpErrRate

0.51

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxKurtosisOfNumericAtts

5.25

Maximum kurtosis among attributes of the numeric type.

MinMeansOfNumericAtts

14.1

Minimum of means among attributes of the numeric type.

Quartile2MutualInformation

0.01

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

Quartile2SkewnessOfNumericAtts

0.56

Second quartile (Median) of skewness among attributes of the numeric type.

REPTreeDepth3Kappa

0.85

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpKappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxMeansOfNumericAtts

2959.37

Maximum of means among attributes of the numeric type.

MinMutualInformation

Minimal mutual information between the nominal attributes and the target attribute.

Quartile2StdDevOfNumericAtts

85.1

Second quartile (Median) of standard deviation of attributes of the numeric type.

RandomTreeDepth1AUC

0.9

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Dimensionality

Number of attributes divided by the number of instances.

MaxMutualInformation

0.21

Maximum mutual information between the nominal attributes and the target attribute.

MinNominalAttDistinctValues

2

The minimal number of distinct values among attributes of the nominal type.

PercentageOfBinaryFeatures

Percentage of binary attributes.

Quartile3AttributeEntropy

0.29

Third quartile of entropy among attributes.

RandomTreeDepth1ErrRate

0.13

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

EquivalentNumberOfAtts

80.19

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.

MaxNominalAttDistinctValues

7

The maximum number of distinct values among attributes of the nominal type.

MinSkewnessOfNumericAtts

-1.18

Minimum skewness among attributes of the numeric type.

PercentageOfInstancesWithMissingValues

Percentage of instances having missing values.

Quartile3KurtosisOfNumericAtts

1.92

Third quartile of kurtosis among attributes of the numeric type.

AutoCorrelation

0.95

Average class difference between consecutive instances.

RandomTreeDepth1Kappa

0.79

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

J48.00001.AUC

0.96

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MaxSkewnessOfNumericAtts

1.79

Maximum skewness among attributes of the numeric type.

MinStdDevOfNumericAtts

7.49

Minimum standard deviation of attributes of the numeric type.

PercentageOfMissingValues

Percentage of missing values.

Quartile3MeansOfNumericAtts

2072.76

Third quartile of means among attributes of the numeric type.

CfsSubsetEval_DecisionStumpAUC

0.85

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2AUC

0.9

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.00001.ErrRate

0.07

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MaxStdDevOfNumericAtts

1559.25

Maximum standard deviation of attributes of the numeric type.

MinorityClassPercentage

0.47

Percentage of instances belonging to the least frequent class.

PercentageOfNumericFeatures

18.18

Percentage of numeric attributes.

Quartile3MutualInformation

0.03

Third quartile of mutual information between the nominal attributes and the target attribute.

CfsSubsetEval_DecisionStumpErrRate

0.28

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2ErrRate

0.13

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.00001.Kappa

0.89

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MeanAttributeEntropy

0.18

Average entropy of the attributes.

MinorityClassSize

2747

Number of instances belonging to the least frequent class.
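
The class-size and class-percentage properties reported on this page are related by simple arithmetic. The short check below reproduces MajorityClassPercentage (48.76) and MinorityClassPercentage (0.47) from NumberOfInstances, MajorityClassSize, and MinorityClassSize; the variable names are illustrative.

```python
# Values as reported on this page.
n_instances = 581012
majority_class_size = 283301
minority_class_size = 2747

majority_pct = round(100 * majority_class_size / n_instances, 2)
minority_pct = round(100 * minority_class_size / n_instances, 2)
print(majority_pct, minority_pct)  # prints: 48.76 0.47

# Error rate of a predictor that always outputs the most frequent class.
majority_baseline_error = round(1 - majority_class_size / n_instances, 2)
print(majority_baseline_error)  # prints: 0.51
```

The last figure is a useful reference point when reading the error rates listed here: any model with an error rate near 0.51 is doing no better than always predicting the majority class.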

PercentageOfSymbolicFeatures

81.82

Percentage of nominal attributes.

Quartile3SkewnessOfNumericAtts

1.18

Third quartile of skewness among attributes of the numeric type.

CfsSubsetEval_DecisionStumpKappa

0.53

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2Kappa

0.79

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.0001.AUC

0.96

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanKurtosisOfNumericAtts

1.23

Mean kurtosis among attributes of the numeric type.

NaiveBayesAUC

0.83

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1AttributeEntropy

0.02

First quartile of entropy among attributes.

Quartile3StdDevOfNumericAtts

541.04

Third quartile of standard deviation of attributes of the numeric type.

CfsSubsetEval_NaiveBayesAUC

0.85

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3AUC

0.9

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.0001.ErrRate

0.07

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanMeansOfNumericAtts

835.34

Mean of means among attributes of the numeric type.

NaiveBayesErrRate

0.35

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1KurtosisOfNumericAtts

0.2

First quartile of kurtosis among attributes of the numeric type.

REPTreeDepth1AUC

0.97

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_NaiveBayesErrRate

0.28

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3ErrRate

0.13

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.0001.Kappa

0.89

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanMutualInformation

0.02

Average mutual information between the nominal attributes and the target attribute.

NaiveBayesKappa

0.47

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1MeansOfNumericAtts

118.5

First quartile of means among attributes of the numeric type.

REPTreeDepth1ErrRate

0.09

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_NaiveBayesKappa

0.53

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3Kappa

0.79

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.001.AUC

0.96

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanNoiseToSignalRatio

7.52

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
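
The two derived properties on this page, EquivalentNumberOfAtts and MeanNoiseToSignalRatio, are defined as ratios of other properties. The sketch below applies both definitions to the rounded values shown here. Because MeanMutualInformation is displayed as 0.02 (two decimals), the naive results (about 87 and about 8) differ from the reported 80.19 and 7.52; the second half backs out the unrounded value that the reported figures imply. Function and variable names are illustrative.

```python
def equivalent_number_of_atts(class_entropy, mean_mutual_information):
    """ClassEntropy divided by MeanMutualInformation."""
    return class_entropy / mean_mutual_information


def mean_noise_to_signal_ratio(mean_attribute_entropy, mean_mutual_information):
    """(MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation."""
    return (mean_attribute_entropy - mean_mutual_information) / mean_mutual_information


# Rounded values as displayed on this page.
class_entropy = 1.74
mean_attribute_entropy = 0.18
mean_mutual_information = 0.02

naive_equiv = equivalent_number_of_atts(class_entropy, mean_mutual_information)
naive_nsr = mean_noise_to_signal_ratio(mean_attribute_entropy, mean_mutual_information)
print(round(naive_equiv, 2), round(naive_nsr, 2))  # prints: 87.0 8.0

# Back out the unrounded inputs implied by the reported 80.19 and 7.52.
implied_mi = class_entropy / 80.19
implied_attribute_entropy = implied_mi * (7.52 + 1)
print(round(implied_mi, 4), round(implied_attribute_entropy, 3))
```

The implied values round to 0.02 and 0.18, so the reported properties are mutually consistent once display rounding is taken into account.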

NumberOfBinaryFeatures

Number of binary attributes.

Quartile1MutualInformation

First quartile of mutual information between the nominal attributes and the target attribute.


25 tasks

Supervised Classification on covertype

5 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: class
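
To illustrate the 10-fold cross-validation estimation procedure, here is a minimal sketch of how row indices can be partitioned into k folds, each serving once as the test set. It is a generic, unstratified illustration in plain Python and should not be read as the exact split used by this task; the function name `k_fold_indices` and the seed are assumptions.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(n=20, k=10))
for train, test in splits[:2]:
    print(len(train), len(test))  # prints "18 2" for each fold
```

Every row appears in exactly one test fold, so each instance contributes to the estimate exactly once.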

Supervised Classification on covertype

4 runs - estimation_procedure: 33% Holdout set - evaluation_measure: predictive_accuracy - target_feature: class

Supervised Classification on covertype

0 runs - estimation_procedure: 20% Holdout (Ordered) - evaluation_measure: predictive_accuracy - target_feature: class
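
The two holdout procedures listed above differ in whether the rows are shuffled before splitting. The sketch below shows both variants. It assumes that "Ordered" means the original row order is preserved and the final rows form the test set, which is an interpretation rather than something this page specifies; the function name and parameters are illustrative.

```python
import random


def holdout_split(n, test_fraction, ordered=False, seed=0):
    """Return (train, test) index lists for a single holdout split.

    With ordered=True the original row order is kept and the last rows
    form the test set (assumed meaning of "Ordered"); otherwise the rows
    are shuffled first.
    """
    indices = list(range(n))
    if not ordered:
        random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    cut = n - n_test
    return indices[:cut], indices[cut:]


train, test = holdout_split(100, 0.33)
ordered_train, ordered_test = holdout_split(100, 0.20, ordered=True)
print(len(train), len(test))                 # prints: 67 33
print(ordered_test[0], ordered_test[-1])     # prints: 80 99
```

The distinction matters when the rows of a dataset are not in random order, because an ordered split then tests on data that may differ systematically from the training portion.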

Supervised Classification on covertype

0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: precision - target_feature: class

Supervised Classification on covertype

0 runs - estimation_procedure: 4-fold Crossvalidation - target_feature: class

Learning Curve on covertype

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on covertype

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on covertype

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on covertype

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on covertype

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on covertype

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on covertype

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on covertype

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Supervised Data Stream Classification on covertype

0 runs - estimation_procedure: Interleaved Test then Train - target_feature: class
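
In an interleaved test-then-train evaluation, each instance of the stream is first used to test the current model and only then used to update it. The sketch below illustrates the loop with a toy learner that predicts the most frequent label seen so far. The class `MajorityClassLearner`, the function name, and the four-item stream are all hypothetical and are not part of OpenML or of this task.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy incremental learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]

    def learn(self, x, y):
        self.counts[y] += 1


def interleaved_test_then_train(stream, learner):
    """Return accuracy when each instance is tested on before it is learned."""
    correct = 0
    total = 0
    for x, y in stream:
        if learner.predict(x) == y:   # test first
            correct += 1
        total += 1
        learner.learn(x, y)           # then train
    return correct / total if total else 0.0


# Made-up stream of (features, label) pairs; features are unused by the toy learner.
stream = [((), "a"), ((), "a"), ((), "b"), ((), "a")]
accuracy = interleaved_test_then_train(stream, MajorityClassLearner())
print(accuracy)  # prints: 0.5
```

Because every prediction is made before the label is revealed to the model, no separate holdout set is needed, and the whole stream contributes to the estimate.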

Clustering on covertype