sylva_agnostic

Status: deactivated · Format: ARFF · Visibility: public · Uploaded 06-10-2014 by Joaquin Vanschoren
Tags: mythbusting_1, OpenML100, study_1, study_123, study_14, study_15, study_20, study_34, study_41
Author: [Isabelle Guyon](isabelle@clopinet.com)
Source: [Agnostic Learning vs. Prior Knowledge Challenge](http://www.agnostic.inf.ethz.ch)
Please cite: None

Dataset from the Agnostic Learning vs. Prior Knowledge Challenge (http://www.agnostic.inf.ethz.ch), which consisted of 5 different datasets (SYLVA, GINA, NOVA, HIVA, ADA). The purpose of the challenge was to check whether the performance of domain-specific feature engineering (prior knowledge) can be matched by algorithms trained on data without any domain-specific knowledge (agnostic). For the latter, the data was anonymised and preprocessed in a way that makes it uninterpretable.

This dataset contains the agnostic ("smashed") version of a data set from the Remote Sensing and GIS Program of Colorado State University for the time span June 2005 - September 2006. A similar, raw and non-agnostic data set, the __Covertype Dataset__, can be found in the [UCI Database](https://archive.ics.uci.edu/ml/datasets/covertype).

Modified by TunedIT (converted to ARFF format).

### Topic

The task of SYLVA is to classify forest cover types. The forest cover type for 30 x 30 meter cells is obtained from US Forest Service (USFS) Region 2 Resource Information System (RIS) data. We reduced it to a two-class classification problem (classifying Ponderosa pine vs. everything else). The “agnostic data” consists of 216 input variables. Each pattern is composed of 4 records: 2 true records matching the target and 2 records picked at random. Thus ½ of the features are distracters. The “prior knowledge data” is identical to the “agnostic data”, except that the distracters are removed and the identity of the features is revealed.

### Description

Data type: non-sparse
Number of features: 216
Number of examples and check-sums:

| Set | Positive examples | Negative examples | Total examples | Check-sum |
|---|---|---|---|---|
| Train | 805 | 12281 | 13086 | 238271607.00 |
| Valid | 81 | 1228 | 1309 | 23817234.00 |

This dataset contains the samples from both the training and the validation set.
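The pattern construction described under Topic (4 records per pattern, 2 matching the target and 2 picked at random, so half of the features are distracters) can be illustrated with a small sketch. This is not the challenge organisers' actual preprocessing. It assumes, hypothetically, that each record contributes 216 / 4 = 54 values (the UCI Covertype data also has 54 attributes, but the exact mapping is not documented here), that distracters are drawn uniformly from all records, and that one fixed column permutation is shared by all patterns. The function name `make_agnostic_pattern` is hypothetical.

```python
import random

N_RECORDS = 4          # records per pattern (from the description)
N_TRUE = 2             # records that share the pattern's label
FEATS_PER_RECORD = 54  # assumption: 216 agnostic features / 4 records


def make_agnostic_pattern(records, labels, label, rng, perm):
    """Build one hypothetical SYLVA-style pattern with the given label.

    records: list of raw feature lists (FEATS_PER_RECORD values each)
    labels:  raw label of each record
    Two records are drawn from those whose label equals `label`; two more
    (the distracters) are drawn uniformly from all records.
    Returns the permuted feature vector and a parallel list of flags marking
    which permuted columns came from distracter records.
    """
    candidates = [i for i, y in enumerate(labels) if y == label]
    rows = rng.sample(candidates, N_TRUE) + rng.sample(range(len(labels)), N_RECORDS - N_TRUE)
    features, is_distracter = [], []
    for k, r in enumerate(rows):
        features.extend(records[r])
        is_distracter.extend([k >= N_TRUE] * FEATS_PER_RECORD)
    # One fixed column permutation shared by all patterns hides which
    # columns belong to which record (an assumption about the smashing step).
    return [features[j] for j in perm], [is_distracter[j] for j in perm]


rng = random.Random(0)
records = [[rng.randint(0, 1) for _ in range(FEATS_PER_RECORD)] for _ in range(100)]
labels = [rng.randint(0, 1) for _ in range(100)]
perm = list(range(N_RECORDS * FEATS_PER_RECORD))
rng.shuffle(perm)

x, flags = make_agnostic_pattern(records, labels, label=1, rng=rng, perm=perm)
print(len(x), sum(flags) / len(flags))  # 216 0.5
```

In this sketch a learner sees 216 columns and has to discover on its own which half carries signal; the "prior knowledge data" corresponds to keeping only the columns whose flag is false.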
### Source

Original owners: Remote Sensing and GIS Program, Department of Forest Sciences, College of Natural Resources, Colorado State University, Fort Collins, CO 80523 (contact Jock A. Blackard, jblackard/wo_ftcol@fs.fed.us, or Dr. Denis J. Dean, denis@cnr.colostate.edu)

Jock A. Blackard
USDA Forest Service
3825 E. Mulberry
Fort Collins, CO 80524 USA
jblackard/wo_ftcol@fs.fed.us

217 features

| Feature | Type | Distinct values | Missing values |
|---|---|---|---|
| label (target) | nominal | 2 | 0 |
| attr0 | numeric | 173 | 0 |
| attr1 | numeric | 2 | 0 |
| attr2 | numeric | 2 | 0 |
| attr3 | numeric | 2 | 0 |
| attr4 | numeric | 915 | 0 |
| attr5 | numeric | 2 | 0 |
| attr6 | numeric | 2 | 0 |
| attr7 | numeric | 2 | 0 |
| attr8 | numeric | 2 | 0 |
| attr9 | numeric | 923 | 0 |
| attr10 | numeric | 2 | 0 |
| attr11 | numeric | 354 | 0 |
| attr12 | numeric | 441 | 0 |
| attr13 | numeric | 2 | 0 |
| attr14 | numeric | 1 | 0 |
| attr15 | numeric | 2 | 0 |
| attr16 | numeric | 2 | 0 |
| attr17 | numeric | 2 | 0 |
| attr18 | numeric | 2 | 0 |
| attr19 | numeric | 2 | 0 |
| attr20 | numeric | 353 | 0 |
| attr21 | numeric | 2 | 0 |
| attr22 | numeric | 165 | 0 |
| attr23 | numeric | 167 | 0 |
| attr24 | numeric | 2 | 0 |
| attr25 | numeric | 2 | 0 |
| attr26 | numeric | 2 | 0 |
| attr27 | numeric | 2 | 0 |
| attr28 | numeric | 2 | 0 |
| attr29 | numeric | 2 | 0 |
| attr30 | numeric | 2 | 0 |
| attr31 | numeric | 2 | 0 |
| attr32 | numeric | 2 | 0 |
| attr33 | numeric | 2 | 0 |
| attr34 | numeric | 2 | 0 |
| attr35 | numeric | 2 | 0 |
| attr36 | numeric | 2 | 0 |
| attr37 | numeric | 2 | 0 |
| attr38 | numeric | 2 | 0 |
| attr39 | numeric | 2 | 0 |
| attr40 | numeric | 2 | 0 |
| attr41 | numeric | 2 | 0 |
| attr42 | numeric | 2 | 0 |
| attr43 | numeric | 2 | 0 |
| attr44 | numeric | 2 | 0 |
| attr45 | numeric | 2 | 0 |
| attr46 | numeric | 2 | 0 |
| attr47 | numeric | 2 | 0 |
| attr48 | numeric | 129 | 0 |
| attr49 | numeric | 2 | 0 |
| attr50 | numeric | 2 | 0 |
| attr51 | numeric | 361 | 0 |
| attr52 | numeric | 2 | 0 |
| attr53 | numeric | 909 | 0 |
| attr54 | numeric | 803 | 0 |
| attr55 | numeric | 2 | 0 |
| attr56 | numeric | 2 | 0 |
| attr57 | numeric | 2 | 0 |
| attr58 | numeric | 2 | 0 |
| attr59 | numeric | 361 | 0 |
| attr60 | numeric | 2 | 0 |
| attr61 | numeric | 2 | 0 |
| attr62 | numeric | 245 | 0 |
| attr63 | numeric | 2 | 0 |
| attr64 | numeric | 2 | 0 |
| attr65 | numeric | 2 | 0 |
| attr66 | numeric | 2 | 0 |
| attr67 | numeric | 2 | 0 |
| attr68 | numeric | 2 | 0 |
| attr69 | numeric | 239 | 0 |
| attr70 | numeric | 2 | 0 |
| attr71 | numeric | 2 | 0 |
| attr72 | numeric | 2 | 0 |
| attr73 | numeric | 2 | 0 |
| attr74 | numeric | 131 | 0 |
| attr75 | numeric | 2 | 0 |
| attr76 | numeric | 2 | 0 |
| attr77 | numeric | 2 | 0 |
| attr78 | numeric | 2 | 0 |
| attr79 | numeric | 2 | 0 |
| attr80 | numeric | 2 | 0 |
| attr81 | numeric | 2 | 0 |
| attr82 | numeric | 2 | 0 |
| attr83 | numeric | 2 | 0 |
| attr84 | numeric | 2 | 0 |
| attr85 | numeric | 2 | 0 |
| attr86 | numeric | 2 | 0 |
| attr87 | numeric | 2 | 0 |
| attr88 | numeric | 2 | 0 |
| attr89 | numeric | 2 | 0 |
| attr90 | numeric | 2 | 0 |
| attr91 | numeric | 915 | 0 |
| attr92 | numeric | 1 | 0 |
| attr93 | numeric | 2 | 0 |
| attr94 | numeric | 2 | 0 |
| attr95 | numeric | 2 | 0 |
| attr96 | numeric | 799 | 0 |
| attr97 | numeric | 360 | 0 |
| attr98 | numeric | 2 | 0 |
| attr99 | numeric | 2 | 0 |
| attr100 | numeric | 2 | 0 |
| attr101 | numeric | 427 | 0 |
| attr102 | numeric | 2 | 0 |
| attr103 | numeric | 2 | 0 |
| attr104 | numeric | 2 | 0 |
| attr105 | numeric | 2 | 0 |
| attr106 | numeric | 916 | 0 |
| attr107 | numeric | 2 | 0 |
| attr108 | numeric | 2 | 0 |
| attr109 | numeric | 801 | 0 |
| attr110 | numeric | 2 | 0 |
| attr111 | numeric | 353 | 0 |
| attr112 | numeric | 2 | 0 |
| attr113 | numeric | 2 | 0 |
| attr114 | numeric | 2 | 0 |
| attr115 | numeric | 1 | 0 |
| attr116 | numeric | 2 | 0 |
| attr117 | numeric | 2 | 0 |
| attr118 | numeric | 2 | 0 |
| attr119 | numeric | 2 | 0 |
| attr120 | numeric | 2 | 0 |
| attr121 | numeric | 2 | 0 |
| attr122 | numeric | 2 | 0 |
| attr123 | numeric | 2 | 0 |
| attr124 | numeric | 2 | 0 |
| attr125 | numeric | 2 | 0 |
| attr126 | numeric | 2 | 0 |
| attr127 | numeric | 2 | 0 |
| attr128 | numeric | 2 | 0 |
| attr129 | numeric | 2 | 0 |
| attr130 | numeric | 2 | 0 |
| attr131 | numeric | 2 | 0 |
| attr132 | numeric | 2 | 0 |
| attr133 | numeric | 2 | 0 |
| attr134 | numeric | 2 | 0 |
| attr135 | numeric | 2 | 0 |
| attr136 | numeric | 2 | 0 |
| attr137 | numeric | 2 | 0 |
| attr138 | numeric | 2 | 0 |
| attr139 | numeric | 2 | 0 |
| attr140 | numeric | 2 | 0 |
| attr141 | numeric | 2 | 0 |
| attr142 | numeric | 242 | 0 |
| attr143 | numeric | 2 | 0 |
| attr144 | numeric | 2 | 0 |
| attr145 | numeric | 2 | 0 |
| attr146 | numeric | 340 | 0 |
| attr147 | numeric | 2 | 0 |
| attr148 | numeric | 906 | 0 |
| attr149 | numeric | 2 | 0 |
| attr150 | numeric | 2 | 0 |
| attr151 | numeric | 48 | 0 |
| attr152 | numeric | 49 | 0 |
| attr153 | numeric | 429 | 0 |
| attr154 | numeric | 2 | 0 |
| attr155 | numeric | 2 | 0 |
| attr156 | numeric | 2 | 0 |
| attr157 | numeric | 2 | 0 |
| attr158 | numeric | 2 | 0 |
| attr159 | numeric | 2 | 0 |
| attr160 | numeric | 2 | 0 |
| attr161 | numeric | 50 | 0 |
| attr162 | numeric | 2 | 0 |
| attr163 | numeric | 128 | 0 |
| attr164 | numeric | 51 | 0 |
| attr165 | numeric | 918 | 0 |
| attr166 | numeric | 2 | 0 |
| attr167 | numeric | 170 | 0 |
| attr168 | numeric | 2 | 0 |
| attr169 | numeric | 2 | 0 |
| attr170 | numeric | 2 | 0 |
| attr171 | numeric | 2 | 0 |
| attr172 | numeric | 2 | 0 |
| attr173 | numeric | 2 | 0 |
| attr174 | numeric | 2 | 0 |
| attr175 | numeric | 2 | 0 |
| attr176 | numeric | 360 | 0 |
| attr177 | numeric | 430 | 0 |
| attr178 | numeric | 2 | 0 |
| attr179 | numeric | 919 | 0 |
| attr180 | numeric | 2 | 0 |
| attr181 | numeric | 2 | 0 |
| attr182 | numeric | 2 | 0 |
| attr183 | numeric | 2 | 0 |
| attr184 | numeric | 2 | 0 |
| attr185 | numeric | 2 | 0 |
| attr186 | numeric | 2 | 0 |
| attr187 | numeric | 2 | 0 |
| attr188 | numeric | 2 | 0 |
| attr189 | numeric | 2 | 0 |
| attr190 | numeric | 2 | 0 |
| attr191 | numeric | 2 | 0 |
| attr192 | numeric | 2 | 0 |
| attr193 | numeric | 2 | 0 |
| attr194 | numeric | 133 | 0 |
| attr195 | numeric | 2 | 0 |
| attr196 | numeric | 2 | 0 |
| attr197 | numeric | 2 | 0 |
| attr198 | numeric | 2 | 0 |
| attr199 | numeric | 2 | 0 |
| attr200 | numeric | 2 | 0 |
| attr201 | numeric | 801 | 0 |
| attr202 | numeric | 2 | 0 |
| attr203 | numeric | 244 | 0 |
| attr204 | numeric | 2 | 0 |
| attr205 | numeric | 2 | 0 |
| attr206 | numeric | 2 | 0 |
| attr207 | numeric | 2 | 0 |
| attr208 | numeric | 1 | 0 |
| attr209 | numeric | 2 | 0 |
| attr210 | numeric | 2 | 0 |
| attr211 | numeric | 2 | 0 |
| attr212 | numeric | 2 | 0 |
| attr213 | numeric | 2 | 0 |
| attr214 | numeric | 2 | 0 |
| attr215 | numeric | 2 | 0 |
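The feature list shows three kinds of columns: four constant attributes (attr14, attr92, attr115 and attr208, with 1 distinct value each), many numeric attributes that take only 2 distinct values, and a few high-cardinality numeric attributes. A minimal sketch of grouping columns by their number of distinct values, for example to drop the constant ones before training, follows. The function names and the three-way grouping are illustrative choices, not part of OpenML or of the original challenge.

```python
def group_columns_by_cardinality(rows):
    """Split column indices into constant, binary and other columns.

    rows: list of equal-length feature lists (one per instance).
    """
    n_cols = len(rows[0])
    groups = {"constant": [], "binary": [], "other": []}
    for j in range(n_cols):
        distinct = len({row[j] for row in rows})
        if distinct == 1:
            groups["constant"].append(j)
        elif distinct == 2:
            groups["binary"].append(j)
        else:
            groups["other"].append(j)
    return groups


def drop_columns(rows, to_drop):
    """Return copies of the rows without the columns listed in to_drop."""
    drop = set(to_drop)
    return [[v for j, v in enumerate(row) if j not in drop] for row in rows]


# Toy data: column 0 is constant, column 1 binary, column 2 continuous.
rows = [[0, 1, 3.5], [0, 0, 7.25], [0, 1, 1.0]]
groups = group_columns_by_cardinality(rows)
print(groups)  # {'constant': [0], 'binary': [1], 'other': [2]}
print(drop_columns(rows, groups["constant"]))  # [[1, 3.5], [0, 7.25], [1, 1.0]]
```

Constant columns carry no information about the label; tree learners never split on them, but distance-based learners and scalers that divide by the standard deviation may need them removed explicitly.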

107 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset. | 14395 |
| Number of attributes (columns) of the dataset. | 217 |
| Number of distinct values of the target attribute (if it is nominal). | 2 |
| Number of missing values in the dataset. | 0 |
| Number of instances with at least one value missing. | 0 |
| Number of numeric attributes. | 216 |
| Number of nominal attributes. | 1 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3 | 0.92 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump | 0 |
| Maximum of means among attributes of the numeric type. | 879.13 |
| Minimal mutual information between the nominal attributes and the target attribute. | |
| Second quartile (Median) of skewness among attributes of the numeric type. | 6.24 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1 | 0.84 |
| Number of attributes divided by the number of instances. | 0.02 |
| Maximum mutual information between the nominal attributes and the target attribute. | |
| The minimal number of distinct values among attributes of the nominal type. | 2 |
| Percentage of binary attributes. | 0.46 |
| Second quartile (Median) of standard deviation of attributes of the numeric type. | 0.15 |
| Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1 | 0.04 |
| Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation. | |
| The maximum number of distinct values among attributes of the nominal type. | 2 |
| Minimum skewness among attributes of the numeric type. | -1.18 |
| Percentage of instances having missing values. | 0 |
| Third quartile of entropy among attributes. | |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1 | 0.67 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001 | 0.96 |
| Maximum skewness among attributes of the numeric type. | 119.98 |
| Minimum standard deviation of attributes of the numeric type. | 0 |
| Percentage of missing values. | 0 |
| Third quartile of kurtosis among attributes of the numeric type. | 274.69 |
| Average class difference between consecutive instances. | 0.88 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2 | 0.84 |
| Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001 | 0.01 |
| Maximum standard deviation of attributes of the numeric type. | 312.04 |
| Percentage of instances belonging to the least frequent class. | 6.15 |
| Percentage of numeric attributes. | 99.54 |
| Third quartile of means among attributes of the numeric type. | 0.1 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.97 |
| Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2 | 0.04 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001 | 0.92 |
| Average entropy of the attributes. | |
| Number of instances belonging to the least frequent class. | 886 |
| Percentage of nominal attributes. | 0.46 |
| Third quartile of mutual information between the nominal attributes and the target attribute. | |
| Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.01 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2 | 0.67 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001 | 0.96 |
| Mean kurtosis among attributes of the numeric type. | 560.67 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes | 1 |
| First quartile of entropy among attributes. | |
| Third quartile of skewness among attributes of the numeric type. | 16.63 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.92 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3 | 0.84 |
| Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001 | 0.01 |
| Mean of means among attributes of the numeric type. | 84.29 |
| Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes | 0.02 |
| First quartile of kurtosis among attributes of the numeric type. | 5.41 |
| Third quartile of standard deviation of attributes of the numeric type. | 0.3 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.97 |
| Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3 | 0.04 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001 | 0.92 |
| Average mutual information between the nominal attributes and the target attribute. | |
| Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes | 0.82 |
| First quartile of means among attributes of the numeric type. | 0 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1 | 0.98 |
| Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.01 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3 | 0.67 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001 | 0.96 |
| An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation. | |
| Number of binary attributes. | 1 |
| First quartile of mutual information between the nominal attributes and the target attribute. | |
| Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1 | 0.01 |
| Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.92 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.97 |
| Standard deviation of the number of distinct values among attributes of the nominal type. | 0 |
| Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001 | 0.01 |
| Average number of distinct values among the attributes of the nominal type. | 2 |
| First quartile of skewness among attributes of the numeric type. | 2.62 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1 | 0.92 |
| Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.01 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk | 0.89 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001 | 0.92 |
| Mean skewness among attributes of the numeric type. | 13.35 |
| First quartile of standard deviation of attributes of the numeric type. | 0.06 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2 | 0.98 |
| Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | 0.92 |
| Error rate achieved by the landmarker weka.classifiers.lazy.IBk | 0.02 |
| Percentage of instances belonging to the most frequent class. | 93.85 |
| Mean standard deviation of attributes of the numeric type. | 28.4 |
| Second quartile (Median) of entropy among attributes. | |
| Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2 | 0.01 |
| Entropy of the target attribute values. | 0.33 |
| Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk | 0.79 |
| Number of instances belonging to the most frequent class. | 13509 |
| Minimal entropy among attributes. | |
| Second quartile (Median) of kurtosis among attributes of the numeric type. | 36.96 |
| Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2 | 0.92 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3 | 0.98 |
| Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump | 0.92 |
| Maximum entropy among attributes. | |
| Minimum kurtosis among attributes of the numeric type. | -1.97 |
| Second quartile (Median) of means among attributes of the numeric type. | 0.02 |
| Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3 | 0.01 |
| Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump | 0.06 |
| Maximum kurtosis among attributes of the numeric type. | 14395 |
| Minimum of means among attributes of the numeric type. | 0 |
| Second quartile (Median) of mutual information between the nominal attributes and the target attribute. | |
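Several of these properties follow directly from the class counts in the Description table (805 + 81 = 886 positive and 12281 + 1228 = 13509 negative instances) and the 217 attributes. The sketch below recomputes them using the standard definitions of Shannon entropy (base 2) and Cohen's kappa; it is an illustrative check, not OpenML's own implementation. It also shows that a predictor which always outputs the majority class gets a kappa of exactly 0 with an error rate of about 0.06, which is consistent with (though not proof of) the DecisionStump landmarker's reported kappa of 0 and error rate of 0.06.

```python
import math

# Counts taken from the Description table (train + validation).
POS = 805 + 81        # Ponderosa pine
NEG = 12281 + 1228    # everything else
N = POS + NEG
N_ATTRIBUTES = 217    # 216 numeric features + 1 nominal target


def entropy(counts):
    """Shannon entropy in bits of a discrete distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a binary confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return (observed - expected) / (1 - expected)


print(N, POS, NEG)                      # 14395 886 13509
print(round(100 * POS / N, 2))          # 6.15  (least frequent class, %)
print(round(100 * NEG / N, 2))          # 93.85 (most frequent class, %)
print(round(entropy([POS, NEG]), 2))    # 0.33  (entropy of the target)
print(round(N_ATTRIBUTES / N, 2))       # 0.02  (attributes / instances)

# Majority-class predictor: every instance is predicted negative.
print(cohen_kappa(tp=0, fp=0, fn=POS, tn=NEG))  # 0.0
print(round(POS / N, 2))                # 0.06  (its error rate)
```

The gap between the 0.06 baseline error and the 0.01 error of the J48 and REPTree landmarkers is what the kappa values of about 0.92 reflect; with this class imbalance, error rate alone says little.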

25 tasks

60368 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: label
126 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: label
0 runs - estimation_procedure: 33% Holdout set - evaluation_measure: predictive_accuracy - target_feature: label
88 runs - estimation_procedure: 10-fold Learning Curve - target_feature: label
0 runs - estimation_procedure: Interleaved Test then Train - target_feature: label
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - target_feature: label
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
1304 runs - target_feature: label
1303 runs - target_feature: label
0 runs - target_feature: label
0 runs - target_feature: label
0 runs - target_feature: label
0 runs - target_feature: label
0 runs - target_feature: label
0 runs - target_feature: label
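Most runs on this dataset use the 10-fold cross-validation task. Because only about 6% of the instances are positive, it is usually desirable that every fold keeps roughly that proportion. A minimal sketch of a stratified 10-fold assignment in plain Python follows; it assumes stratification by shuffling each class and dealing its instances round-robin to the folds, and it is not OpenML's own split generator. The function name `stratified_kfold` is hypothetical.

```python
import random
from collections import Counter


def stratified_kfold(labels, k=10, seed=0):
    """Assign each instance to one of k folds, stratified by label.

    Instances of each class are shuffled and dealt round-robin to the
    folds, so every fold gets (almost) the same number of each class.
    Returns a list with the fold index of every instance.
    """
    rng = random.Random(seed)
    folds = [0] * len(labels)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    offset = 0
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            # Continue the round-robin where the previous class stopped so
            # fold sizes stay balanced overall.
            folds[i] = (offset + pos) % k
        offset += len(idx)
    return folds


# Class sizes of this dataset: 886 positive, 13509 negative.
labels = [1] * 886 + [0] * 13509
folds = stratified_kfold(labels)
pos_per_fold = Counter(f for f, y in zip(folds, labels) if y == 1)
size_per_fold = Counter(folds)
print(sorted(pos_per_fold.values()))   # [88, 88, 88, 88, 89, 89, 89, 89, 89, 89]
print(sorted(size_per_fold.values()))  # [1439, 1439, 1439, 1439, 1439, 1440, 1440, 1440, 1440, 1440]
```

Each test fold then holds 88 or 89 of the 886 positive instances; an unstratified random split could leave some folds noticeably short of positives, which makes per-fold AUC and kappa estimates noisier.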