OpenML

trains

active ARFF Publicly available Visibility: public Uploaded 06-04-2014 by Jan van Rijn

Author:
Source: Unknown
Please cite:

1. Title: INDUCE Trains Data set

2. Sources:
   - Donor: GMU, Center for AI, Software Librarian, Eric E. Bloedorn (bloedorn@aic.gmu.edu)
   - Original owners: Ryszard S. Michalski (michalski@aic.gmu.edu) and Robert Stepp
   - Date received: 1 June 1994
   - Date updated: 24 June 1994 (thanks to Larry Holder (UT Arlington) for noticing a translation error)

3. Past usage:
   - This set most closely resembles the data sets described in the following two publications:
     1. R.S. Michalski and J.B. Larson, "Inductive Inference of VL Decision Rules". In Proceedings of the Workshop in Pattern-Directed Inference Systems, Hawaii, May 1977. Also published in SIGART Newsletter, ACM No. 63, pp. 38-44, June 1977.
     2. R.E. Stepp and R.S. Michalski, "Conceptual Clustering: Inventing Goal-Oriented Classifications of Structured Objects". In R.S. Michalski, J.G. Carbonell, and T.M. Mitchell (Eds.), "Machine Learning: An Artificial Intelligence Approach, Volume II". Los Altos, CA: Morgan Kaufmann.
   - Both of these papers describe a set of 10 trains, 5 east-bound and 5 west-bound. Both refer to the same 10 trains, as seen in the figures in these publications. The differences are:
     1) This dataset has 10 attributes, with no wheel color or load color attributes.
     2) Reference 2 (Stepp, Michalski) does not completely list the attributes used, but does mention wheel color, an attribute not present in this dataset.
     3) Reference 1 (Michalski, Larson) mentions 12 attributes, but only 6 are explicitly described. These 6 are included in the dataset below and in the Stepp and Michalski set.
   - Results:
     [1] Michalski and Larson found the following decision rules:
       (1) There exists car1, car2, lod1 and lod2 such that [infront(car1,car2)][lcont(car1,lod1)][lcont(car2,lod2)][load-shape(lod1)=triangle][load-shape(lod2)=polygon] => [dir=east]
       (2) There exists a car1 such that [ln(car1)=short][car-shape(car1)=closed-top] => [dir=east]
       (3) [ncar=3] v There exists car1 such that [car-shape(car1)=jagged-top] => [dir=west]
       (4) There exists car1 such that [#cars(ln=long)=2][cshape(car1)=open,trapezoid,u-shaped] v [location(car1)=2][cshape(car1)=closed,rectangle] => [dir=west]
           (The first selector in rule 4 uses a meta descriptor, generated by the program, that counts the number of long cars in a train.)
     [2] The goal of the cluster research is to develop a general method for clustering structured objects that can generate conjunctive descriptions that occur in human classifications or invent new concepts that have similar appeal. CLUSTER/S was able to find the following cognitively appealing clusters:
       1) a) "There are two different car shapes in the train"
          b) "There are three or more different car shapes in the train"
       2) a) "Wheels on all cars have the same color"
          b) "Wheels on all cars do not have the same color"

4. Relevant information:
   - Additional "background" knowledge is supplied that provides a partial ordering on some of the attribute values.
   - We are providing this dataset both in its original form and in a form similar to the more typical propositional datasets in our repository. Since the trains dataset records relations between attributes, this transformation was somewhat challenging. However, it may shed some light on this problem for people who are more familiar with the simple one-instance-per-line dataset format.
   - Hierarchy of values:
       if cshape is one of {openrect, opentrap, ushaped, dblopnrect} then cshape is opentop
       if cshape is one of {hexagon, ellipse, closedrect, jaggedtop, slopetop, engine} then cshape is closedtop
   - Prediction task: Determine concise decision rules distinguishing trains traveling east from those traveling west.

5. Number of instances: 10

6. Number of attributes: 10, not including the class attribute
   1. ccont(train idx1, car idx2): car idx2 is contained in train idx1
   2. ncar(train idx): # of cars in train idx (int)
   3. infront(car idx1, car idx2): relative positions of cars in train
   4. loc(car idx): absolute position of car in train (int)
   5. nwhl(car idx): # of wheels of car idx (int)
   6. ln(car idx): length of car idx (long, short)
   7. cshape(car idx): shape of car (engine, dblopenrect, closedrect, openrect, opentrap, ushaped, hexagon, ellipse, jaggedtop, slopetop, opentop, closedtop)
   8. npl(car idx): number of loads in car idx
   9. lcont(car idx, load idx): description of which cars hold which loads
   10. lhshape(load idx): description of load shape (trianglod, rectanglod, circlelod, hexagonlod)
   Class: direction (east, west)

   The following format was used for the "transformed" dataset representation as found in trains.transformed.data (one instance per line):

   Attributes: 33
   1. Number_of_cars (integer in [3-5])
   2. Number_of_different_loads (integer in [1-4])
   3-22. 5 attributes for each of cars 2 through 5 (20 attributes total):
     - num_wheels (integer in [2-3])
     - length (short or long)
     - shape (closedrect, dblopnrect, ellipse, engine, hexagon, jaggedtop, openrect, opentrap, slopetop, ushaped)
     - num_loads (integer in [0-3])
     - load_shape (circlelod, hexagonlod, rectanglod, trianglod)
   23-32. 10 Boolean attributes describing whether 2 types of loads are on adjacent cars of the train (0 if false, 1 if true):
     - Rectangle_next_to_rectangle
     - Rectangle_next_to_triangle
     - Rectangle_next_to_hexagon
     - Rectangle_next_to_circle
     - Triangle_next_to_triangle
     - Triangle_next_to_hexagon
     - Triangle_next_to_circle
     - Hexagon_next_to_hexagon
     - Hexagon_next_to_circle
     - Circle_next_to_circle
   33. Class attribute (east or west)

   The number of cars varies between 3 and 5. Therefore, attributes referring to properties of cars that do not exist (such as the 5 attributes for the "5th" car when the train has fewer than 5 cars) are assigned a value of "-".

7. Distribution of classes:
   - There are 5 east-bound trains and 5 west-bound trains (i.e., 50% east, 50% west)

Information about the dataset
CLASSTYPE: nominal
CLASSINDEX: last
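
The value hierarchy and decision rule (2) from the description can be expressed in a few lines of Python. This is only a sketch under stated assumptions: the function names and the (length, shape) tuple layout are hypothetical, and the example train is made up for illustration rather than taken from the dataset.

```python
# Hierarchy of values, as listed under "Relevant information".
OPEN_TOP = frozenset({"openrect", "opentrap", "ushaped", "dblopnrect"})
CLOSED_TOP = frozenset(
    {"hexagon", "ellipse", "closedrect", "jaggedtop", "slopetop", "engine"}
)


def generalize_shape(cshape):
    """Map a specific car shape to its parent value in the hierarchy."""
    if cshape in OPEN_TOP:
        return "opentop"
    if cshape in CLOSED_TOP:
        return "closedtop"
    raise ValueError(f"unknown car shape: {cshape!r}")


def rule2_predicts_east(cars):
    """Rule (2): some car is short and has a closed top => east.

    `cars` is a hypothetical list of (length, shape) tuples, one per car.
    """
    return any(
        length == "short" and generalize_shape(shape) == "closedtop"
        for length, shape in cars
    )


# Made-up train for illustration only (not a row of the dataset).
example_train = [("long", "openrect"), ("short", "closedrect")]
print(rule2_predicts_east(example_train))  # True
```

A rule that does not fire says nothing about the direction by itself; the published rules are meant to be read together.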

33 features

class (target)	nominal	2 unique values 0 missing
Number_of_cars	nominal	3 unique values 0 missing
Number_of_different_loads	nominal	4 unique values 0 missing
num_wheels_2	nominal	2 unique values 0 missing
length_2	nominal	2 unique values 0 missing
shape_2	nominal	5 unique values 0 missing
num_loads_2	nominal	2 unique values 0 missing
load_shape_2	nominal	3 unique values 0 missing
num_wheels_3	nominal	2 unique values 0 missing
length_3	nominal	2 unique values 0 missing
shape_3	nominal	8 unique values 0 missing
num_loads_3	nominal	2 unique values 0 missing
load_shape_3	nominal	3 unique values 0 missing
num_wheels_4	nominal	2 unique values 3 missing
length_4	nominal	2 unique values 3 missing
shape_4	nominal	4 unique values 3 missing
num_loads_4	nominal	3 unique values 3 missing
load_shape_4	nominal	4 unique values 4 missing
num_wheels_5	nominal	1 unique values 7 missing
length_5	nominal	1 unique values 7 missing
shape_5	nominal	2 unique values 7 missing
num_loads_5	nominal	1 unique values 7 missing
load_shape_5	nominal	2 unique values 7 missing
Rectangle_next_to_rectangle	nominal	2 unique values 0 missing
Rectangle_next_to_triangle	nominal	2 unique values 0 missing
Rectangle_next_to_hexagon	nominal	1 unique values 0 missing
Rectangle_next_to_circle	nominal	2 unique values 0 missing
Triangle_next_to_triangle	nominal	2 unique values 0 missing
Triangle_next_to_hexagon	nominal	2 unique values 0 missing
Triangle_next_to_circle	nominal	2 unique values 0 missing
Hexagon_next_to_hexagon	nominal	1 unique values 0 missing
Hexagon_next_to_circle	nominal	2 unique values 0 missing
Circle_next_to_circle	nominal	1 unique values 0 missing
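
Several of the summary properties reported on this page can be recomputed from the feature listing alone. The sketch below copies the unique-value and missing-value counts from the listing and assumes that "binary" means exactly two distinct values and that the class attribute is counted as a feature; both assumptions are inferred from the reported numbers, not stated by the page.

```python
# (name, unique values, missing values), copied from the feature listing.
FEATURES = [
    ("class", 2, 0), ("Number_of_cars", 3, 0), ("Number_of_different_loads", 4, 0),
    ("num_wheels_2", 2, 0), ("length_2", 2, 0), ("shape_2", 5, 0),
    ("num_loads_2", 2, 0), ("load_shape_2", 3, 0),
    ("num_wheels_3", 2, 0), ("length_3", 2, 0), ("shape_3", 8, 0),
    ("num_loads_3", 2, 0), ("load_shape_3", 3, 0),
    ("num_wheels_4", 2, 3), ("length_4", 2, 3), ("shape_4", 4, 3),
    ("num_loads_4", 3, 3), ("load_shape_4", 4, 4),
    ("num_wheels_5", 1, 7), ("length_5", 1, 7), ("shape_5", 2, 7),
    ("num_loads_5", 1, 7), ("load_shape_5", 2, 7),
    ("Rectangle_next_to_rectangle", 2, 0), ("Rectangle_next_to_triangle", 2, 0),
    ("Rectangle_next_to_hexagon", 1, 0), ("Rectangle_next_to_circle", 2, 0),
    ("Triangle_next_to_triangle", 2, 0), ("Triangle_next_to_hexagon", 2, 0),
    ("Triangle_next_to_circle", 2, 0), ("Hexagon_next_to_hexagon", 1, 0),
    ("Hexagon_next_to_circle", 2, 0), ("Circle_next_to_circle", 1, 0),
]
N_INSTANCES = 10  # from "Number of instances: 10" in the description

n_features = len(FEATURES)
mean_distinct = sum(unique for _, unique, _ in FEATURES) / n_features
pct_binary = 100 * sum(1 for _, unique, _ in FEATURES if unique == 2) / n_features
pct_missing = 100 * sum(miss for _, _, miss in FEATURES) / (n_features * N_INSTANCES)
dimensionality = n_features / N_INSTANCES

print(round(mean_distinct, 2))  # 2.39  (MeanNominalAttDistinctValues)
print(round(pct_binary, 2))     # 54.55 (PercentageOfBinaryFeatures)
print(round(pct_missing, 2))    # 15.45 (PercentageOfMissingValues)
print(dimensionality)           # 3.3   (Dimensionality)
```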

107 properties

NumberOfInstances

Number of instances (rows) of the dataset.

NumberOfFeatures

Number of attributes (columns) of the dataset.

NumberOfClasses

Number of distinct values of the target attribute (if it is nominal).

NumberOfMissingValues

Number of missing values in the dataset.

NumberOfInstancesWithMissingValues

Number of instances with at least one value missing.

NumberOfNumericFeatures

Number of numeric attributes.

NumberOfSymbolicFeatures

Number of nominal attributes.

MaxStdDevOfNumericAtts

Maximum standard deviation of attributes of the numeric type.

MinorityClassPercentage

Percentage of instances belonging to the least frequent class.

PercentageOfNumericFeatures

Percentage of numeric attributes.

Quartile3MeansOfNumericAtts

Third quartile of means among attributes of the numeric type.

CfsSubsetEval_DecisionStumpAUC

0.6

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2AUC

0.76

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.00001.ErrRate

0.6

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MeanAttributeEntropy

0.87

Average entropy of the attributes.

MinorityClassSize

Number of instances belonging to the least frequent class.

PercentageOfSymbolicFeatures

100

Percentage of nominal attributes.

Quartile3MutualInformation

0.25

Third quartile of mutual information between the nominal attributes and the target attribute.

CfsSubsetEval_DecisionStumpErrRate

0.4

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2ErrRate

0.2

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.00001.Kappa

-0.2

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MeanKurtosisOfNumericAtts

Mean kurtosis among attributes of the numeric type.

NaiveBayesAUC

0.72

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1AttributeEntropy

0.47

First quartile of entropy among attributes.

Quartile3SkewnessOfNumericAtts

Third quartile of skewness among attributes of the numeric type.

CfsSubsetEval_DecisionStumpKappa

0.2

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2Kappa

0.6

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.0001.AUC

0.4

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanMeansOfNumericAtts

Mean of means among attributes of the numeric type.

NaiveBayesErrRate

0.3

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1KurtosisOfNumericAtts

First quartile of kurtosis among attributes of the numeric type.

Quartile3StdDevOfNumericAtts

Third quartile of standard deviation of attributes of the numeric type.

CfsSubsetEval_NaiveBayesAUC

0.6

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3AUC

0.76

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.0001.ErrRate

0.6

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanMutualInformation

0.19

Average mutual information between the nominal attributes and the target attribute.

NaiveBayesKappa

0.4

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1MeansOfNumericAtts

First quartile of means among attributes of the numeric type.

REPTreeDepth1AUC

0.4

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_NaiveBayesErrRate

0.4

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3ErrRate

0.2

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.0001.Kappa

-0.2

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanNoiseToSignalRatio

3.65

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.

NumberOfBinaryFeatures

Number of binary attributes.

Quartile1MutualInformation

0.01

First quartile of mutual information between the nominal attributes and the target attribute.

REPTreeDepth1ErrRate

0.6

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_NaiveBayesKappa

0.2

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3Kappa

0.6

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.001.AUC

0.4

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanNominalAttDistinctValues

2.39

Average number of distinct values among the attributes of the nominal type.

Quartile1SkewnessOfNumericAtts

First quartile of skewness among attributes of the numeric type.

REPTreeDepth1Kappa

-0.2

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_kNN1NAUC

0.6

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

StdvNominalAttDistinctValues

1.39

Standard deviation of the number of distinct values among attributes of the nominal type.

J48.001.ErrRate

0.6

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanSkewnessOfNumericAtts

Mean skewness among attributes of the numeric type.

Quartile1StdDevOfNumericAtts

First quartile of standard deviation of attributes of the numeric type.

REPTreeDepth2AUC

0.4

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

CfsSubsetEval_kNN1NErrRate

0.4

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

kNN1NAUC

0.5

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk

J48.001.Kappa

-0.2

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanStdDevOfNumericAtts

Mean standard deviation of attributes of the numeric type.

Quartile2AttributeEntropy

0.8

Second quartile (Median) of entropy among attributes.

REPTreeDepth2ErrRate

0.6

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2

CfsSubsetEval_kNN1NKappa

0.2

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

kNN1NErrRate

0.5

Error rate achieved by the landmarker weka.classifiers.lazy.IBk

MajorityClassPercentage

Percentage of instances belonging to the most frequent class.

MinAttributeEntropy

Minimal entropy among attributes.

Quartile2KurtosisOfNumericAtts

Second quartile (Median) of kurtosis among attributes of the numeric type.

REPTreeDepth2Kappa

-0.2

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2

ClassEntropy

Entropy of the target attribute values.

kNN1NKappa

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk

MajorityClassSize

Number of instances belonging to the most frequent class.

MinKurtosisOfNumericAtts

Minimum kurtosis among attributes of the numeric type.

Quartile2MeansOfNumericAtts

Second quartile (Median) of means among attributes of the numeric type.

REPTreeDepth3AUC

0.4

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpAUC

0.5

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxAttributeEntropy

2.92

Maximum entropy among attributes.

MinMeansOfNumericAtts

Minimum of means among attributes of the numeric type.

Quartile2MutualInformation

0.11

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

REPTreeDepth3ErrRate

0.6

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpErrRate

0.5

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxKurtosisOfNumericAtts

Maximum kurtosis among attributes of the numeric type.

MinMutualInformation

Minimal mutual information between the nominal attributes and the target attribute.

Quartile2SkewnessOfNumericAtts

Second quartile (Median) of skewness among attributes of the numeric type.

REPTreeDepth3Kappa

-0.2

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpKappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxMeansOfNumericAtts

Maximum of means among attributes of the numeric type.

MinNominalAttDistinctValues

The minimal number of distinct values among attributes of the nominal type.

PercentageOfBinaryFeatures

54.55

Percentage of binary attributes.

Quartile2StdDevOfNumericAtts

Second quartile (Median) of standard deviation of attributes of the numeric type.

RandomTreeDepth1AUC

0.76

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Dimensionality

3.3

Number of attributes divided by the number of instances.

MaxMutualInformation

Maximum mutual information between the nominal attributes and the target attribute.

MinSkewnessOfNumericAtts

Minimum skewness among attributes of the numeric type.

PercentageOfInstancesWithMissingValues

Percentage of instances having missing values.

Quartile3AttributeEntropy

1.36

Third quartile of entropy among attributes.

RandomTreeDepth1ErrRate

0.2

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

EquivalentNumberOfAtts

5.37

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.

MaxNominalAttDistinctValues

The maximum number of distinct values among attributes of the nominal type.

MinStdDevOfNumericAtts

Minimum standard deviation of attributes of the numeric type.

PercentageOfMissingValues

15.45

Percentage of missing values.

Quartile3KurtosisOfNumericAtts

Third quartile of kurtosis among attributes of the numeric type.

AutoCorrelation

0.89

Average class difference between consecutive instances.

RandomTreeDepth1Kappa

0.6

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

J48.00001.AUC

0.4

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MaxSkewnessOfNumericAtts

Maximum skewness among attributes of the numeric type.
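
Some of the properties above are linked by simple formulas, two of which the page states explicitly (MeanNoiseToSignalRatio and EquivalentNumberOfAtts). The following sketch evaluates them from the rounded values shown on the page and also computes Cohen's kappa from a confusion matrix. The function names are hypothetical, entropy is assumed to be measured in bits, and the confusion matrix is made up to match a 0.2 error rate on 10 balanced instances; it is not an actual OpenML result.

```python
import math


def class_entropy(counts):
    """Shannon entropy (bits, an assumption) of a class count vector."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def noise_to_signal(mean_attribute_entropy, mean_mutual_information):
    """(MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation."""
    return (mean_attribute_entropy - mean_mutual_information) / mean_mutual_information


def equivalent_number_of_atts(entropy_of_class, mean_mutual_information):
    """ClassEntropy / MeanMutualInformation."""
    return entropy_of_class / mean_mutual_information


def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: true, columns: predicted)."""
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    expected = sum(
        (sum(confusion[i]) / n) * (sum(row[i] for row in confusion) / n)
        for i in range(len(confusion))
    )
    return (observed - expected) / (1 - expected)


h_class = class_entropy([5, 5])                  # 5 east, 5 west
print(h_class)                                   # 1.0
print(round(noise_to_signal(0.87, 0.19), 2))     # 3.58 from rounded inputs
print(round(equivalent_number_of_atts(h_class, 0.19), 2))  # 5.26 from rounded inputs
print(round(cohen_kappa([[4, 1], [1, 4]]), 2))   # 0.6 for a hypothetical 0.2 error rate
```

The results from rounded inputs (3.58 and 5.26) differ slightly from the reported 3.65 and 5.37, presumably because the page computes them from unrounded entropy and mutual-information values. Because the true classes are balanced 5/5, chance agreement is 0.5 whatever the classifier predicts, so kappa reduces to 1 - 2 * error rate, which agrees with every error-rate and kappa pair listed above.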

31 tasks

Supervised Classification on trains

693 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class

Supervised Classification on trains

368 runs - estimation_procedure: 33% Holdout set - evaluation_measure: predictive_accuracy - target_feature: class

Supervised Classification on trains

345 runs - estimation_procedure: 5 times 2-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class

Supervised Classification on trains

216 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class

Supervised Classification on trains

31 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: precision - target_feature: class

Supervised Classification on trains

0 runs - estimation_procedure: Leave one out - evaluation_measure: predictive_accuracy - target_feature: class

Supervised Classification on trains

0 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: class

Supervised Classification on trains

0 runs - estimation_procedure: Custom Holdout - target_feature: class

Learning Curve on trains

213 runs - estimation_procedure: 10 times 10-fold Learning Curve - evaluation_measure: predictive_accuracy - target_feature: class

Learning Curve on trains

82 runs - estimation_procedure: 10-fold Learning Curve - evaluation_measure: predictive_accuracy - target_feature: class

Learning Curve on trains

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on trains

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on trains

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on trains

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on trains

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on trains

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on trains

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on trains

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Supervised Data Stream Classification on trains

25 runs - estimation_procedure: Interleaved Test then Train - target_feature: class

Clustering on trains