auto93

Status: active. Format: ARFF. Visibility: public. Uploaded 03-10-2014 by Joaquin Vanschoren.


Author:
Source: Unknown - Date unknown
Please cite:

Attributes 2, 4, and 6 deleted. Midrange price treated as the class attribute.

As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction using instance-based learning with encoding length selection. In Progress in Connectionist-Based Information Systems. Singapore: Springer-Verlag.

NAME: 1993 New Car Data
TYPE: Sample
SIZE: 93 observations, 26 variables

DESCRIPTIVE ABSTRACT:
Specifications are given for 93 new car models for the 1993 year. Several measures are given to evaluate price, mpg ratings, engine size, body size, and features.

SOURCES:
_Consumer Reports: The 1993 Cars - Annual Auto Issue_ (April 1993), Yonkers, NY: Consumers Union.
_PACE New Car & Truck 1993 Buying Guide_ (1993), Milwaukee, WI: Pace Publications Inc.

VARIABLE DESCRIPTIONS:

Line 1
Columns    Variable
 1 - 14    Manufacturer
15 - 29    Model
30 - 36    Type: Small, Sporty, Compact, Midsize, Large - as defined in the _Consumer Reports_ article
38 - 41    Minimum Price (in $1,000) - Price for basic version of this model
43 - 46    Midrange Price (in $1,000) - Average of Min and Max prices
48 - 51    Maximum Price (in $1,000) - Price for a premium version
53 - 54    City MPG (miles per gallon by EPA rating)
56 - 57    Highway MPG
59 - 59    Air Bags standard: 0 = none, 1 = driver only, 2 = driver & passenger
61 - 61    Drive train type: 0 = rear wheel drive, 1 = front wheel drive, 2 = all wheel drive
63 - 63    Number of cylinders
65 - 67    Engine size (liters)
69 - 71    Horsepower (maximum)
73 - 76    RPM (revs per minute at maximum horsepower)

Line 2
Columns    Variable
 1 -  4    Engine revolutions per mile (in highest gear)
 6 -  6    Manual transmission available: 0 = No, 1 = Yes
 8 - 11    Fuel tank capacity (gallons)
13 - 13    Passenger capacity (persons)
15 - 17    Length (inches)
19 - 21    Wheelbase (inches)
23 - 24    Width (inches)
26 - 27    U-turn space (feet)
29 - 32    Rear seat room (inches)
34 - 35    Luggage capacity (cu. ft.)
37 - 40    Weight (pounds)
42 - 42    Domestic? 0 = non-U.S. manufacturer, 1 = U.S. manufacturer

Values are aligned and delimited by blanks. Missing values are denoted with *. There are two data lines for each case.

SPECIAL NOTES:
The only missing values are for CYLINDERS in the rotary engine Mazda RX-7, REAR SEAT room for the two-seaters (Corvette and RX-7), and LUGGAGE capacity for the vans and two-seaters. WEIGHT is taken from the _Consumer Reports_ data and includes a full fuel tank, automatic transmission (if available), and air conditioning.

STORY BEHIND THE DATA:
Cars were selected at random from among 1993 passenger car models that were listed in both the _Consumer Reports_ issue and the _PACE Buying Guide_. Pickup trucks and Sport/Utility vehicles were eliminated due to incomplete information in the _Consumer Reports_ source. Duplicate models (e.g., Dodge Shadow and Plymouth Sundance) were listed at most once. A similar dataset for 1989 model cars appeared as one of the sample datasets shipped with the _Student Edition of Execustat_ (PWS-KENT 1990). Further description can be found in the "Datasets and Stories" article "1993 New Car Data" in the _Journal of Statistics Education_ (Lock 1993). Send the message
    send jse/v1n1/datasets.lock
to the address archive@jse.stat.ncsu.edu

PEDAGOGICAL NOTES:
This is a multi-purpose dataset that can be used at many points in an introductory course. It includes many good numeric variables and several options for dividing the cars up into groups. Students tend to be familiar with most of the variables (and specific car models). They can anticipate and pose explanations for many of the relationships to be found in the data, although some surprises may be encountered. One can easily find examples of pairs of variables that demonstrate strong or weak, positive or negative associations. PRICE and MPG variables tend to be popular choices as "dependent" variables. Basic graphs will often reveal unusual data values (like the price for a Mercedes-Benz).

REFERENCES:
Lock, R. H. (1993), "1993 New Car Data," _Journal of Statistics Education_, 1, No. 1.
_Student Edition of Execustat_ (1990), Boston, MA: PWS-KENT Publishing Co.

SUBMITTED BY:
Robin H. Lock
Mathematics Department
St. Lawrence University
Canton, NY 13617
(315) 379-5960
rlock@stlawu.bitnet
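
The column layout in the description applies to the original two-line, fixed-width records, not to the ARFF file hosted here. As a rough sketch of how such a record could be read, the Python below slices each line by the 1-based, inclusive column ranges and maps `*` to a missing value. The field names, the helper functions, and the demo record (an invented "Acme Roadster") are assumptions made for illustration; none of them come from the dataset.

```python
# Column ranges (1-based, inclusive) as given in the variable descriptions.
LINE1_FIELDS = {
    "manufacturer": (1, 14), "model": (15, 29), "type": (30, 36),
    "min_price": (38, 41), "midrange_price": (43, 46), "max_price": (48, 51),
    "city_mpg": (53, 54), "highway_mpg": (56, 57), "air_bags": (59, 59),
    "drive_train": (61, 61), "cylinders": (63, 63), "engine_size": (65, 67),
    "horsepower": (69, 71), "rpm": (73, 76),
}
LINE2_FIELDS = {
    "revs_per_mile": (1, 4), "manual_available": (6, 6),
    "fuel_tank": (8, 11), "passengers": (13, 13), "length": (15, 17),
    "wheelbase": (19, 21), "width": (23, 24), "u_turn_space": (26, 27),
    "rear_seat_room": (29, 32), "luggage": (34, 35), "weight": (37, 40),
    "domestic": (42, 42),
}
TEXT_FIELDS = {"manufacturer", "model", "type"}


def slice_fields(line, fields):
    """Cut one fixed-width line into a dict; '*' becomes None."""
    record = {}
    for name, (start, end) in fields.items():
        raw = line[start - 1:end].strip()
        if raw == "*":
            record[name] = None          # missing value marker
        elif name in TEXT_FIELDS:
            record[name] = raw
        else:
            record[name] = float(raw)
    return record


def parse_case(line1, line2):
    """Combine the two data lines that make up one car."""
    record = slice_fields(line1, LINE1_FIELDS)
    record.update(slice_fields(line2, LINE2_FIELDS))
    return record


def make_line(values, fields):
    """Lay values out at the documented columns (only used to build the demo)."""
    width = max(end for _, end in fields.values())
    chars = [" "] * width
    for name, (start, end) in fields.items():
        chars[start - 1:end] = str(values[name]).ljust(end - start + 1)
    return "".join(chars)


# Invented record for demonstration only; not a row of the dataset.
demo1 = {"manufacturer": "Acme", "model": "Roadster", "type": "Sporty",
         "min_price": 10.0, "midrange_price": 12.0, "max_price": 14.0,
         "city_mpg": 20, "highway_mpg": 28, "air_bags": 1, "drive_train": 0,
         "cylinders": "*", "engine_size": 1.3, "horsepower": 200, "rpm": 6000}
demo2 = {"revs_per_mile": 2500, "manual_available": 1, "fuel_tank": 20.0,
         "passengers": 2, "length": 170, "wheelbase": 96, "width": 69,
         "u_turn_space": 37, "rear_seat_room": "*", "luggage": "*",
         "weight": 2900, "domestic": 0}

line1 = make_line(demo1, LINE1_FIELDS)
line2 = make_line(demo2, LINE2_FIELDS)
car = parse_case(line1, line2)
```

Keeping the ranges in a dictionary mirrors the layout table directly, so a transcription mistake in one range is easy to spot against the description.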

23 features

Feature                          Type      Unique values   Missing
class (target)                   numeric   81              0
Manufacturer                     nominal   31              0
Type                             nominal   6               0
City_MPG                         numeric   21              0
Highway_MPG                      numeric   22              0
Air_Bags_standard                nominal   3               0
Drive_train_type                 nominal   3               0
Number_of_cylinders              numeric   5               1
Engine_size                      numeric   26              0
Horsepower                       numeric   57              0
RPM                              numeric   24              0
Engine_revolutions_per_mile      numeric   78              0
Manual_transmission_available    nominal   2               0
Fuel_tank_capacity               numeric   38              0
Passenger_capacity               numeric   6               0
Length                           numeric   51              0
Wheelbase                        numeric   27              0
Width                            numeric   16              0
U-turn_space                     numeric   14              0
Rear_seat_room                   numeric   24              2
Luggage_capacity                 numeric   16              11
Weight                           numeric   81              0
Domestic                         nominal   2               0

107 properties

Properties with a listed value:

93         Number of instances (rows) of the dataset.
23         Number of attributes (columns) of the dataset.
0          Number of distinct values of the target attribute (if it is nominal).
14         Number of missing values in the dataset.
11         Number of instances with at least one value missing.
17         Number of numeric attributes.
6          Number of nominal attributes.
2          Number of binary attributes.
73.91      Percentage of numeric attributes.
26.09      Percentage of nominal attributes.
8.7        Percentage of binary attributes.
0.25       Number of attributes divided by the number of instances.
11.83      Percentage of instances having missing values.
0.65       Percentage of missing values.
-6.5       Average class difference between consecutive instances.
2          The minimal number of distinct values among attributes of the nominal type.
31         The maximum number of distinct values among attributes of the nominal type.
7.83       Average number of distinct values among the attributes of the nominal type.
11.44      Standard deviation of the number of distinct values among attributes of the nominal type.
2.67       Minimum of means among attributes of the numeric type.
15.28      First quartile of means among attributes of the numeric type.
29.09      Second quartile (Median) of means among attributes of the numeric type.
163.52     Third quartile of means among attributes of the numeric type.
5280.65    Maximum of means among attributes of the numeric type.
668.65     Mean of means among attributes of the numeric type.
1.04       Minimum standard deviation of attributes of the numeric type.
2.99       First quartile of standard deviation of attributes of the numeric type.
5.33       Second quartile (Median) of standard deviation of attributes of the numeric type.
33.49      Third quartile of standard deviation of attributes of the numeric type.
596.73     Maximum standard deviation of attributes of the numeric type.
105.72     Mean standard deviation of attributes of the numeric type.
-0.26      Minimum skewness among attributes of the numeric type.
-0.01      First quartile of skewness among attributes of the numeric type.
0.23       Second quartile (Median) of skewness among attributes of the numeric type.
0.91       Third quartile of skewness among attributes of the numeric type.
1.7        Maximum skewness among attributes of the numeric type.
0.45       Mean skewness among attributes of the numeric type.
-0.86      Minimum kurtosis among attributes of the numeric type.
-0.33      First quartile of kurtosis among attributes of the numeric type.
0.38       Second quartile (Median) of kurtosis among attributes of the numeric type.
1.02       Third quartile of kurtosis among attributes of the numeric type.
4          Maximum kurtosis among attributes of the numeric type.
0.67       Mean kurtosis among attributes of the numeric type.

Properties listed without a value:

Number of instances belonging to the least frequent class.
Percentage of instances belonging to the least frequent class.
Number of instances belonging to the most frequent class.
Percentage of instances belonging to the most frequent class.
Entropy of the target attribute values.
Minimal entropy among attributes.
First quartile of entropy among attributes.
Second quartile (Median) of entropy among attributes.
Third quartile of entropy among attributes.
Maximum entropy among attributes.
Average entropy of the attributes.
Minimal mutual information between the nominal attributes and the target attribute.
First quartile of mutual information between the nominal attributes and the target attribute.
Second quartile (Median) of mutual information between the nominal attributes and the target attribute.
Third quartile of mutual information between the nominal attributes and the target attribute.
Maximum mutual information between the nominal attributes and the target attribute.
Average mutual information between the nominal attributes and the target attribute.
An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.

Area Under the ROC Curve, error rate, and Kappa coefficient are also listed without a value for each of the following landmarkers:

weka.classifiers.trees.DecisionStump
weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
weka.classifiers.trees.RandomTree -depth 1
weka.classifiers.trees.RandomTree -depth 2
weka.classifiers.trees.RandomTree -depth 3
weka.classifiers.trees.J48 -C .00001
weka.classifiers.trees.J48 -C .0001
weka.classifiers.trees.J48 -C .001
weka.classifiers.trees.REPTree -L 1
weka.classifiers.trees.REPTree -L 2
weka.classifiers.trees.REPTree -L 3
weka.classifiers.bayes.NaiveBayes
weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
weka.classifiers.lazy.IBk
weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
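
Several of the simpler properties follow directly from the counts in the feature list. The Python sketch below recomputes them as a consistency check. The dictionary keys are descriptive labels chosen for this example rather than official property names, and using the sample standard deviation is an assumption that happens to reproduce the listed 11.44.

```python
import statistics

# Counts taken from the feature list and the properties on this page.
n_instances = 93
n_features = 23
n_numeric = 17
n_nominal = 6
n_binary = 2
n_instances_with_missing = 11

# Missing values per feature (all other features have 0 missing).
missing_per_feature = {
    "Number_of_cylinders": 1,
    "Rear_seat_room": 2,
    "Luggage_capacity": 11,
}

# Unique-value counts of the six nominal features: Manufacturer, Type,
# Air_Bags_standard, Drive_train_type, Manual_transmission_available, Domestic.
nominal_distinct = [31, 6, 3, 3, 2, 2]


def pct(part, whole):
    """Percentage rounded to two decimals, as displayed on the page."""
    return round(100.0 * part / whole, 2)


n_missing = sum(missing_per_feature.values())

meta = {
    "missing values": n_missing,
    "percentage of numeric attributes": pct(n_numeric, n_features),
    "percentage of nominal attributes": pct(n_nominal, n_features),
    "percentage of binary attributes": pct(n_binary, n_features),
    "attributes per instance": round(n_features / n_instances, 2),
    "percentage of instances with missing values":
        pct(n_instances_with_missing, n_instances),
    # Missing cells divided by all cells (instances x attributes).
    "percentage of missing values":
        pct(n_missing, n_instances * n_features),
    "mean distinct values (nominal)":
        round(statistics.mean(nominal_distinct), 2),
    # Sample standard deviation (n - 1 in the denominator).
    "stdev distinct values (nominal)":
        round(statistics.stdev(nominal_distinct), 2),
}

for name, value in meta.items():
    print(f"{name}: {value}")
```

The recomputed figures agree with the listed ones (for example 14 missing values, 73.91, 26.09, 8.7, 0.25, 11.83, 0.65, 7.83, and 11.44), which suggests the counts and percentages on the page are internally consistent.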

18 tasks

0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: mean_absolute_error - target_feature: class
0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: mean_absolute_error - target_feature: class
0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class
0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class
0 runs - estimation_procedure: Custom 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class
0 runs - estimation_procedure: Test on Training Data - evaluation_measure: predictive_accuracy - target_feature: class
0 runs - estimation_procedure: 5 times 2-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
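
For the regression tasks in the list, the procedure "10-fold Crossvalidation" with measure "mean_absolute_error" can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not the platform's evaluation code: the fold assignment, the mean-of-training-targets baseline, the per-fold averaging, and the made-up target values are all choices made for this example.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_mae(y, k=10, seed=0):
    """Mean absolute error of a predict-the-training-mean baseline,
    averaged over the k held-out folds."""
    folds = k_fold_indices(len(y), k, seed)
    fold_maes = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        prediction = mean(train_y)  # baseline: constant prediction
        fold_maes.append(mean(abs(y[i] - prediction) for i in test_idx))
    return mean(fold_maes)


# Made-up target values standing in for the 93 midrange prices.
y = [10.0 + (i % 7) * 2.5 for i in range(93)]

folds = k_fold_indices(len(y), 10)
fold_sizes = sorted(len(fold) for fold in folds)
baseline_mae = cross_validated_mae(y, k=10)
```

With 93 instances and 10 folds, three folds hold 10 instances and seven hold 9. Any model replacing the constant baseline would be fitted on `train_y` (plus features) inside the loop.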