OpenML


auto93

Status: active | Format: ARFF | Publicly available | Visibility: public | Uploaded 03-10-2014 by Joaquin Vanschoren

Author:
Source: Unknown - Date unknown
Please cite:

Attributes 2, 4, and 6 deleted. Midrange price treated as the class attribute.

As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using instance-based learning with encoding length selection. In Progress in Connectionist-Based Information Systems. Singapore: Springer-Verlag.

NAME: 1993 New Car Data
TYPE: Sample
SIZE: 93 observations, 26 variables

DESCRIPTIVE ABSTRACT:

Specifications are given for 93 new car models for the 1993 year. Several measures are given to evaluate price, mpg ratings, engine size, body size, and features.

SOURCES:

_Consumer Reports: The 1993 Cars - Annual Auto Issue_ (April 1993), Yonkers, NY: Consumers Union.

_PACE New Car & Truck 1993 Buying Guide_ (1993), Milwaukee, WI: Pace Publications Inc.

VARIABLE DESCRIPTIONS:

Line 1
Columns
 1 - 14  Manufacturer
15 - 29  Model
30 - 36  Type: Small, Sporty, Compact, Midsize, Large - as defined in the _Consumer Reports_ article
38 - 41  Minimum Price (in $1,000) - Price for basic version of this model
43 - 46  Midrange Price (in $1,000) - Average of Min and Max prices
48 - 51  Maximum Price (in $1,000) - Price for a premium version
53 - 54  City MPG (miles per gallon by EPA rating)
56 - 57  Highway MPG
59 - 59  Air Bags standard: 0 = none, 1 = driver only, 2 = driver & passenger
61 - 61  Drive train type: 0 = rear wheel drive, 1 = front wheel drive, 2 = all wheel drive
63 - 63  Number of cylinders
65 - 67  Engine size (liters)
69 - 71  Horsepower (maximum)
73 - 76  RPM (revs per minute at maximum horsepower)

Line 2
Columns
 1 -  4  Engine revolutions per mile (in highest gear)
 6 -  6  Manual transmission available: 0 = No, 1 = Yes
 8 - 11  Fuel tank capacity (gallons)
13 - 13  Passenger capacity (persons)
15 - 17  Length (inches)
19 - 21  Wheelbase (inches)
23 - 24  Width (inches)
26 - 27  U-turn space (feet)
29 - 32  Rear seat room (inches)
34 - 35  Luggage capacity (cu. ft.)
37 - 40  Weight (pounds)
42 - 42  Domestic? 0 = non-U.S. manufacturer, 1 = U.S. manufacturer

Values are aligned and delimited by blanks. Missing values are denoted with *. There are two data lines for each case.

SPECIAL NOTES:

The only missing values are for CYLINDERS in the rotary engine Mazda RX-7, REAR SEAT room for the two-seaters (Corvette and RX-7), and LUGGAGE capacity for the vans and two-seaters. WEIGHT is taken from the _Consumer Reports_ data and includes a full fuel tank, automatic transmission (if available), and air conditioning.

STORY BEHIND THE DATA:

Cars were selected at random from among 1993 passenger car models that were listed in both the _Consumer Reports_ issue and the _PACE Buying Guide_. Pickup trucks and Sport/Utility vehicles were eliminated due to incomplete information in the _Consumer Reports_ source. Duplicate models (e.g., Dodge Shadow and Plymouth Sundance) were listed at most once. A similar dataset for 1989 model cars appeared as one of the sample datasets shipped with the _Student Edition of Execustat_ (PWS-KENT 1990).

Further description can be found in the "Datasets and Stories" article "1993 New Car Data" in the _Journal of Statistics Education_ (Lock 1993). Send the message
    send jse/v1n1/datasets.lock
to the address archive@jse.stat.ncsu.edu

PEDAGOGICAL NOTES:

This is a multi-purpose dataset that can be used at many points in an introductory course. It includes many good numeric variables and several options for dividing the cars up into groups. Students tend to be familiar with most of the variables (and specific car models). They can anticipate and pose explanations for many of the relationships to be found in the data, although some surprises may be encountered. One can easily find examples of pairs of variables that demonstrate strong or weak, positive or negative associations. PRICE and MPG variables tend to be popular choices as "dependent" variables. Basic graphs will often reveal unusual data values (like the price for a Mercedes-Benz).

REFERENCES:

Lock, R. H. (1993), "1993 New Car Data," _Journal of Statistics Education_, 1, No. 1.

_Student Edition of Execustat_ (1990), Boston, MA: PWS-KENT Publishing Co.

SUBMITTED BY:

Robin H. Lock
Mathematics Department
St. Lawrence University
Canton, NY 13617
(315) 379-5960
rlock@stlawu.bitnet
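
The two-line, fixed-column layout described under VARIABLE DESCRIPTIONS can be read with plain string slicing. The following is a minimal Python sketch, not the loader used by OpenML or by the original authors. It applies to the original blank-delimited file rather than the ARFF version, the field names are labels chosen here for illustration, and treating a blank field as missing is an extra assumption (the description only names * as the missing marker).

```python
# Column positions are 1-based and inclusive, copied from VARIABLE DESCRIPTIONS.
# The short field names are illustrative labels, not official identifiers.
LINE1_FIELDS = [
    ("Manufacturer", 1, 14), ("Model", 15, 29), ("Type", 30, 36),
    ("Min_Price", 38, 41), ("Midrange_Price", 43, 46), ("Max_Price", 48, 51),
    ("City_MPG", 53, 54), ("Highway_MPG", 56, 57), ("Air_Bags", 59, 59),
    ("Drive_Train", 61, 61), ("Cylinders", 63, 63), ("Engine_Size", 65, 67),
    ("Horsepower", 69, 71), ("RPM", 73, 76),
]
LINE2_FIELDS = [
    ("Revs_Per_Mile", 1, 4), ("Manual_Available", 6, 6),
    ("Fuel_Tank", 8, 11), ("Passengers", 13, 13), ("Length", 15, 17),
    ("Wheelbase", 19, 21), ("Width", 23, 24), ("U_Turn_Space", 26, 27),
    ("Rear_Seat_Room", 29, 32), ("Luggage", 34, 35), ("Weight", 37, 40),
    ("Domestic", 42, 42),
]
# Fields kept as text even when they look numeric (some model names are digits).
TEXT_FIELDS = {"Manufacturer", "Model", "Type"}


def _convert(name, raw):
    """Strip padding, map '*' (or a blank field) to None, parse numbers."""
    text = raw.strip()
    if text in ("*", ""):
        return None
    if name in TEXT_FIELDS:
        return text
    return float(text) if "." in text else int(text)


def parse_record(line1, line2):
    """Turn the two data lines of one car into a dict of typed values."""
    record = {}
    for fields, line in ((LINE1_FIELDS, line1), (LINE2_FIELDS, line2)):
        for name, start, end in fields:
            # Convert 1-based inclusive columns to a Python slice.
            record[name] = _convert(name, line[start - 1:end])
    return record


def parse_file(lines):
    """Pair consecutive lines (two per case) and parse each pair."""
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    return [parse_record(rows[i], rows[i + 1]) for i in range(0, len(rows) - 1, 2)]
```

Declaring the text fields explicitly avoids converting an all-digit model name into a number, which a purely type-sniffing parser would do.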

23 features

class (target)	numeric	81 unique values 0 missing
Manufacturer	nominal	31 unique values 0 missing
Type	nominal	6 unique values 0 missing
City_MPG	numeric	21 unique values 0 missing
Highway_MPG	numeric	22 unique values 0 missing
Air_Bags_standard	nominal	3 unique values 0 missing
Drive_train_type	nominal	3 unique values 0 missing
Number_of_cylinders	numeric	5 unique values 1 missing
Engine_size	numeric	26 unique values 0 missing
Horsepower	numeric	57 unique values 0 missing
RPM	numeric	24 unique values 0 missing
Engine_revolutions_per_mile	numeric	78 unique values 0 missing
Manual_transmission_available	nominal	2 unique values 0 missing
Fuel_tank_capacity	numeric	38 unique values 0 missing
Passenger_capacity	numeric	6 unique values 0 missing
Length	numeric	51 unique values 0 missing
Wheelbase	numeric	27 unique values 0 missing
Width	numeric	16 unique values 0 missing
U-turn_space	numeric	14 unique values 0 missing
Rear_seat_room	numeric	24 unique values 2 missing
Luggage_capacity	numeric	16 unique values 11 missing
Weight	numeric	81 unique values 0 missing
Domestic	nominal	2 unique values 0 missing
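
The simple count-based properties reported further down this page can be re-derived from the feature table above together with the 93 observations stated in the description. The sketch below is a plain Python tally, not OpenML's own implementation. It assumes that "binary" means a nominal attribute with exactly two distinct values and that the standard deviation is the sample (n - 1) form, since those choices reproduce the figures shown on the page.

```python
from statistics import mean, stdev

# (name, type, distinct values, missing values), copied from the feature table.
FEATURES = [
    ("class", "numeric", 81, 0), ("Manufacturer", "nominal", 31, 0),
    ("Type", "nominal", 6, 0), ("City_MPG", "numeric", 21, 0),
    ("Highway_MPG", "numeric", 22, 0), ("Air_Bags_standard", "nominal", 3, 0),
    ("Drive_train_type", "nominal", 3, 0), ("Number_of_cylinders", "numeric", 5, 1),
    ("Engine_size", "numeric", 26, 0), ("Horsepower", "numeric", 57, 0),
    ("RPM", "numeric", 24, 0), ("Engine_revolutions_per_mile", "numeric", 78, 0),
    ("Manual_transmission_available", "nominal", 2, 0),
    ("Fuel_tank_capacity", "numeric", 38, 0), ("Passenger_capacity", "numeric", 6, 0),
    ("Length", "numeric", 51, 0), ("Wheelbase", "numeric", 27, 0),
    ("Width", "numeric", 16, 0), ("U-turn_space", "numeric", 14, 0),
    ("Rear_seat_room", "numeric", 24, 2), ("Luggage_capacity", "numeric", 16, 11),
    ("Weight", "numeric", 81, 0), ("Domestic", "nominal", 2, 0),
]
N_INSTANCES = 93  # "SIZE: 93 observations" in the description


def simple_metafeatures(features, n_instances):
    """Tally count-based metafeatures from (name, type, distinct, missing) rows."""
    n = len(features)
    nominal_distinct = [d for _, t, d, _ in features if t == "nominal"]
    numeric = sum(1 for _, t, _, _ in features if t == "numeric")
    binary = sum(1 for d in nominal_distinct if d == 2)
    missing = sum(m for _, _, _, m in features)
    return {
        "NumberOfFeatures": n,
        "NumberOfNumericFeatures": numeric,
        "NumberOfSymbolicFeatures": len(nominal_distinct),
        "NumberOfMissingValues": missing,
        "PercentageOfNumericFeatures": round(100 * numeric / n, 2),
        "PercentageOfSymbolicFeatures": round(100 * len(nominal_distinct) / n, 2),
        "PercentageOfBinaryFeatures": round(100 * binary / n, 2),
        "PercentageOfMissingValues": round(100 * missing / (n * n_instances), 2),
        "Dimensionality": round(n / n_instances, 2),
        "MeanNominalAttDistinctValues": round(mean(nominal_distinct), 2),
        "StdvNominalAttDistinctValues": round(stdev(nominal_distinct), 2),
    }


if __name__ == "__main__":
    for key, value in simple_metafeatures(FEATURES, N_INSTANCES).items():
        print(f"{key}: {value}")
```

With the table as given, this yields 17 numeric and 6 nominal attributes, 14 missing values, and the percentages 73.91, 26.09, 8.7 and 0.65, a Dimensionality of 0.25, and 7.83 and 11.44 for the nominal distinct-value mean and standard deviation, matching the values listed under the properties.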


107 properties

NumberOfInstances

Number of instances (rows) of the dataset.

NumberOfFeatures

Number of attributes (columns) of the dataset.

NumberOfClasses

Number of distinct values of the target attribute (if it is nominal).

NumberOfMissingValues

Number of missing values in the dataset.

NumberOfInstancesWithMissingValues

Number of instances with at least one value missing.

NumberOfNumericFeatures

Number of numeric attributes.

NumberOfSymbolicFeatures

Number of nominal attributes.

MinorityClassPercentage

Percentage of instances belonging to the least frequent class.

PercentageOfNumericFeatures

73.91

Percentage of numeric attributes.

Quartile3MeansOfNumericAtts

163.52

Third quartile of means among attributes of the numeric type.
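
The "quartile of means" properties summarise the per-attribute means of the numeric columns. A minimal Python sketch of that idea follows. The interpolation rule OpenML uses for quartiles is not stated on this page, so the "inclusive" method of `statistics.quantiles` is an assumption here, as is skipping missing values (represented as `None`) when averaging.

```python
from statistics import mean, quantiles


def quartiles_of_means(columns):
    """Return (Q1, Q2, Q3) of the per-column means.

    `columns` maps a numeric attribute name to its list of values;
    None marks a missing value and is ignored when averaging.
    """
    means = [mean(v for v in values if v is not None)
             for values in columns.values()]
    q1, q2, q3 = quantiles(means, n=4, method="inclusive")
    return q1, q2, q3
```

The same pattern applies to the quartiles of standard deviation, skewness and kurtosis by swapping the per-column statistic.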

CfsSubsetEval_DecisionStumpAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.00001.ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MaxStdDevOfNumericAtts

596.73

Maximum standard deviation of attributes of the numeric type.

MinorityClassSize

Number of instances belonging to the least frequent class.

PercentageOfSymbolicFeatures

26.09

Percentage of nominal attributes.

Quartile3MutualInformation

Third quartile of mutual information between the nominal attributes and the target attribute.

CfsSubsetEval_DecisionStumpErrRate

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.00001.Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001
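
The Kappa landmarkers on this page report Cohen's kappa, which corrects observed accuracy for the agreement expected by chance. The sketch below computes it from a confusion matrix in plain Python. It illustrates the statistic only and is not the Weka implementation; returning NaN when chance agreement is total is a choice made here. No values appear for these landmarkers on this page, presumably because the target of this dataset is numeric.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: actual, cols: predicted)."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: sum over classes of (row marginal * column marginal).
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(k)
    ) / (total * total)
    if expected == 1:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1 - expected)
```

For example, a two-class matrix `[[20, 5], [10, 15]]` has observed agreement 0.7 and chance agreement 0.5, giving a kappa of 0.4.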

MeanAttributeEntropy

Average entropy of the attributes.

NaiveBayesAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1AttributeEntropy

First quartile of entropy among attributes.

Quartile3SkewnessOfNumericAtts

0.91

Third quartile of skewness among attributes of the numeric type.

CfsSubsetEval_DecisionStumpKappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.0001.AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanKurtosisOfNumericAtts

0.67

Mean kurtosis among attributes of the numeric type.

NaiveBayesErrRate

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1KurtosisOfNumericAtts

-0.33

First quartile of kurtosis among attributes of the numeric type.

Quartile3StdDevOfNumericAtts

33.49

Third quartile of standard deviation of attributes of the numeric type.

CfsSubsetEval_NaiveBayesAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.0001.ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanMeansOfNumericAtts

668.65

Mean of means among attributes of the numeric type.

MeanMutualInformation

Average mutual information between the nominal attributes and the target attribute.

NaiveBayesKappa

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1MeansOfNumericAtts

15.28

First quartile of means among attributes of the numeric type.

REPTreeDepth1AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_NaiveBayesErrRate

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.0001.Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanNoiseToSignalRatio

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
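
The ratio defined above is a one-line computation once the two averages are known. A small Python sketch, with parameter names chosen here for illustration, follows. Raising an error for zero mutual information is a choice made in this sketch, since the ratio is undefined in that case.

```python
def mean_noise_to_signal_ratio(mean_attribute_entropy, mean_mutual_information):
    """(MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation."""
    if mean_mutual_information == 0:
        raise ValueError("ratio is undefined when MeanMutualInformation is 0")
    return (mean_attribute_entropy - mean_mutual_information) / mean_mutual_information
```

With hypothetical inputs of 2.0 for the mean attribute entropy and 0.5 for the mean mutual information, the ratio is (2.0 - 0.5) / 0.5 = 3.0.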

NumberOfBinaryFeatures

Number of binary attributes.

Quartile1MutualInformation

First quartile of mutual information between the nominal attributes and the target attribute.

REPTreeDepth1ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_NaiveBayesKappa

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.001.AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanNominalAttDistinctValues

7.83

Average number of distinct values among the attributes of the nominal type.

Quartile1SkewnessOfNumericAtts

-0.01

First quartile of skewness among attributes of the numeric type.

REPTreeDepth1Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_kNN1NAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

StdvNominalAttDistinctValues

11.44

Standard deviation of the number of distinct values among attributes of the nominal type.

J48.001.ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanSkewnessOfNumericAtts

0.45

Mean skewness among attributes of the numeric type.

Quartile1StdDevOfNumericAtts

2.99

First quartile of standard deviation of attributes of the numeric type.

REPTreeDepth2AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

CfsSubsetEval_kNN1NErrRate

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

kNN1NAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk

J48.001.Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanStdDevOfNumericAtts

105.72

Mean standard deviation of attributes of the numeric type.

Quartile2AttributeEntropy

Second quartile (Median) of entropy among attributes.

REPTreeDepth2ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2

CfsSubsetEval_kNN1NKappa

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

kNN1NErrRate

Error rate achieved by the landmarker weka.classifiers.lazy.IBk

MajorityClassPercentage

Percentage of instances belonging to the most frequent class.

MinAttributeEntropy

Minimal entropy among attributes.

Quartile2KurtosisOfNumericAtts

0.38

Second quartile (Median) of kurtosis among attributes of the numeric type.

REPTreeDepth2Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2

ClassEntropy

Entropy of the target attribute values.
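
Entropy-based properties such as ClassEntropy can be illustrated with the Shannon entropy of a list of nominal values. The sketch below uses base-2 logarithms, which is an assumption; the page does not state the base OpenML uses.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())
```

A list with two equally frequent values has an entropy of 1 bit, and four equally frequent values give 2 bits.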

kNN1NKappa

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk

MajorityClassSize

Number of instances belonging to the most frequent class.

MinKurtosisOfNumericAtts

-0.86

Minimum kurtosis among attributes of the numeric type.

Quartile2MeansOfNumericAtts

29.09

Second quartile (Median) of means among attributes of the numeric type.

REPTreeDepth3AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxAttributeEntropy

Maximum entropy among attributes.

MinMeansOfNumericAtts

2.67

Minimum of means among attributes of the numeric type.

Quartile2MutualInformation

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

REPTreeDepth3ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpErrRate

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxKurtosisOfNumericAtts

Maximum kurtosis among attributes of the numeric type.

MinMutualInformation

Minimal mutual information between the nominal attributes and the target attribute.

Quartile2SkewnessOfNumericAtts

0.23

Second quartile (Median) of skewness among attributes of the numeric type.

REPTreeDepth3Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpKappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxMeansOfNumericAtts

5280.65

Maximum of means among attributes of the numeric type.

MinNominalAttDistinctValues

The minimal number of distinct values among attributes of the nominal type.

PercentageOfBinaryFeatures

8.7

Percentage of binary attributes.

Quartile2StdDevOfNumericAtts

5.33

Second quartile (Median) of standard deviation of attributes of the numeric type.

RandomTreeDepth1AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Dimensionality

0.25

Number of attributes divided by the number of instances.

MaxMutualInformation

Maximum mutual information between the nominal attributes and the target attribute.

MinSkewnessOfNumericAtts

-0.26

Minimum skewness among attributes of the numeric type.

PercentageOfInstancesWithMissingValues

11.83

Percentage of instances having missing values.

Quartile3AttributeEntropy

Third quartile of entropy among attributes.

RandomTreeDepth1ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

EquivalentNumberOfAtts

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
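
As a worked form of the definition above, the following Python sketch divides the class entropy by the mean mutual information. The parameter names are illustrative, and the zero check is a choice made here because the quotient is undefined when no attribute carries information about the class.

```python
def equivalent_number_of_atts(class_entropy, mean_mutual_information):
    """ClassEntropy / MeanMutualInformation."""
    if mean_mutual_information == 0:
        raise ValueError("undefined when MeanMutualInformation is 0")
    return class_entropy / mean_mutual_information
```

With hypothetical inputs of 1.5 bits of class entropy and 0.5 bits of mean mutual information, about three independent attributes of average informativeness would be needed.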

MaxNominalAttDistinctValues

The maximum number of distinct values among attributes of the nominal type.

MinStdDevOfNumericAtts

1.04

Minimum standard deviation of attributes of the numeric type.

PercentageOfMissingValues

0.65

Percentage of missing values.

Quartile3KurtosisOfNumericAtts

1.02

Third quartile of kurtosis among attributes of the numeric type.

AutoCorrelation