OpenML

optdigits

Status: active. Format: ARFF. Publicly available. Visibility: public. Uploaded 06-04-2014 by Jan van Rijn.

Author: E. Alpaydin, C. Kaynak
Source: [UCI](http://archive.ics.uci.edu/ml/datasets/optical+recognition+of+handwritten+digits)
Please cite: [UCI citation policy](https://archive.ics.uci.edu/ml/citation_policy.html)

1. Title of Database: Optical Recognition of Handwritten Digits

2. Source:
E. Alpaydin, C. Kaynak
Department of Computer Engineering
Bogazici University, 80815 Istanbul Turkey
alpaydin@boun.edu.tr
July 1998

3. Past Usage:
C. Kaynak (1995) Methods of Combining Multiple Classifiers and Their Applications to Handwritten Digit Recognition, MSc Thesis, Institute of Graduate Studies in Science and Engineering, Bogazici University.
E. Alpaydin, C. Kaynak (1998) Cascading Classifiers, Kybernetika, to appear. ftp://ftp.icsi.berkeley.edu/pub/ai/ethem/kyb.ps.Z

4. Relevant Information:
We used preprocessing programs made available by NIST to extract normalized bitmaps of handwritten digits from a preprinted form. From a total of 43 people, 30 contributed to the training set and a different 13 to the test set. The 32x32 bitmaps are divided into nonoverlapping blocks of 4x4, and the number of on pixels is counted in each block. This generates an input matrix of 8x8 where each element is an integer in the range 0..16. This reduces dimensionality and gives invariance to small distortions.
For info on NIST preprocessing routines, see M. D. Garris, J. L. Blue, G. T. Candela, D. L. Dimmick, J. Geist, P. J. Grother, S. A. Janet, and C. L. Wilson, NIST Form-Based Handprint Recognition System, NISTIR 5469, 1994.

5. Number of Instances

File	Role	Instances
optdigits.tra	Training	3823
optdigits.tes	Testing	1797

We used half of the training set for actual training, one-fourth for validation, and one-fourth for writer-dependent testing. The test set was used for writer-independent testing and is the actual quality measure.

6. Number of Attributes: 64 input + 1 class attribute

7. For Each Attribute: All input attributes are integers in the range 0..16. The last attribute is the class code 0..9.

8. Missing Attribute Values: None

9. Class Distribution

Class	Examples in training set	Examples in testing set
0	376	178
1	389	182
2	380	177
3	389	183
4	387	181
5	376	182
6	377	181
7	387	179
8	380	174
9	382	180

Accuracy on the testing set with k-nn using Euclidean distance as the metric:

k	Accuracy (%)
1	98.00
2	97.38
3	97.83
4	97.61
5	97.89
6	97.77
7	97.66
8	97.66
9	97.72
10	97.55
11	97.89
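The block-counting step described under "Relevant Information" can be sketched in a few lines of Python. This is an illustrative sketch only, not the NIST or UCI preprocessing code: the function names `block_counts` and `flatten` are hypothetical, the bitmap is assumed to be a list of 32 rows of 0/1 integers, and the row-major flattening into 64 inputs is an assumption about attribute order.

```python
from typing import List

BITMAP_SIZE = 32  # side length of the normalized bitmap
BLOCK_SIZE = 4    # side length of each nonoverlapping block


def block_counts(bitmap: List[List[int]]) -> List[List[int]]:
    """Reduce a 32x32 binary bitmap to an 8x8 grid of on-pixel counts (0..16)."""
    if len(bitmap) != BITMAP_SIZE or any(len(row) != BITMAP_SIZE for row in bitmap):
        raise ValueError("expected a 32x32 bitmap")
    grid_size = BITMAP_SIZE // BLOCK_SIZE  # 8 blocks per side
    grid = [[0] * grid_size for _ in range(grid_size)]
    for r, row in enumerate(bitmap):
        for c, pixel in enumerate(row):
            if pixel:
                # Each pixel contributes to exactly one 4x4 block.
                grid[r // BLOCK_SIZE][c // BLOCK_SIZE] += 1
    return grid


def flatten(grid: List[List[int]]) -> List[int]:
    """Flatten the 8x8 grid row by row (assumed order of the 64 inputs)."""
    return [value for row in grid for value in row]


# Usage: a fully "on" bitmap yields the maximum count in every block.
all_on = [[1] * BITMAP_SIZE for _ in range(BITMAP_SIZE)]
features = flatten(block_counts(all_on))
print(len(features), min(features), max(features))
```

Because each 4x4 block holds 16 pixels, every count falls in the range 0..16, which matches the attribute range stated in the description.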

65 features

class (target)	nominal	10 unique values 0 missing
input1	numeric	1 unique values 0 missing
input2	numeric	9 unique values 0 missing
input3	numeric	17 unique values 0 missing
input4	numeric	17 unique values 0 missing
input5	numeric	17 unique values 0 missing
input6	numeric	17 unique values 0 missing
input7	numeric	17 unique values 0 missing
input8	numeric	17 unique values 0 missing
input9	numeric	4 unique values 0 missing
input10	numeric	17 unique values 0 missing
input11	numeric	17 unique values 0 missing
input12	numeric	17 unique values 0 missing
input13	numeric	17 unique values 0 missing
input14	numeric	17 unique values 0 missing
input15	numeric	17 unique values 0 missing
input16	numeric	15 unique values 0 missing
input17	numeric	5 unique values 0 missing
input18	numeric	17 unique values 0 missing
input19	numeric	17 unique values 0 missing
input20	numeric	17 unique values 0 missing
input21	numeric	17 unique values 0 missing
input22	numeric	17 unique values 0 missing
input23	numeric	17 unique values 0 missing
input24	numeric	9 unique values 0 missing
input25	numeric	2 unique values 0 missing
input26	numeric	17 unique values 0 missing
input27	numeric	17 unique values 0 missing
input28	numeric	17 unique values 0 missing
input29	numeric	17 unique values 0 missing
input30	numeric	17 unique values 0 missing
input31	numeric	17 unique values 0 missing
input32	numeric	3 unique values 0 missing
input33	numeric	2 unique values 0 missing
input34	numeric	16 unique values 0 missing
input35	numeric	17 unique values 0 missing
input36	numeric	17 unique values 0 missing
input37	numeric	17 unique values 0 missing
input38	numeric	17 unique values 0 missing
input39	numeric	15 unique values 0 missing
input40	numeric	1 unique values 0 missing
input41	numeric	8 unique values 0 missing
input42	numeric	17 unique values 0 missing
input43	numeric	17 unique values 0 missing
input44	numeric	17 unique values 0 missing
input45	numeric	17 unique values 0 missing
input46	numeric	17 unique values 0 missing
input47	numeric	17 unique values 0 missing
input48	numeric	7 unique values 0 missing
input49	numeric	9 unique values 0 missing
input50	numeric	17 unique values 0 missing
input51	numeric	17 unique values 0 missing
input52	numeric	17 unique values 0 missing
input53	numeric	17 unique values 0 missing
input54	numeric	17 unique values 0 missing
input55	numeric	17 unique values 0 missing
input56	numeric	13 unique values 0 missing
input57	numeric	2 unique values 0 missing
input58	numeric	11 unique values 0 missing
input59	numeric	17 unique values 0 missing
input60	numeric	17 unique values 0 missing
input61	numeric	17 unique values 0 missing
input62	numeric	17 unique values 0 missing
input63	numeric	17 unique values 0 missing
input64	numeric	17 unique values 0 missing
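The per-feature summary above (unique values and missing counts) could be reproduced from raw rows roughly as follows. This is a minimal sketch under stated assumptions: `summarize_columns` and `constant_columns` are hypothetical helper names, missing values are assumed to be represented as `None`, and the toy rows are made up for illustration rather than taken from optdigits.

```python
from typing import Dict, List, Optional, Sequence


def summarize_columns(
    names: Sequence[str],
    rows: Sequence[Sequence[Optional[int]]],
) -> Dict[str, Dict[str, int]]:
    """Count unique non-missing values and missing values per column."""
    summary: Dict[str, Dict[str, int]] = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        present = [v for v in column if v is not None]
        summary[name] = {
            "unique": len(set(present)),
            "missing": len(column) - len(present),
        }
    return summary


def constant_columns(summary: Dict[str, Dict[str, int]]) -> List[str]:
    """Columns with at most one distinct value carry no information."""
    return [name for name, stats in summary.items() if stats["unique"] <= 1]


# Toy rows (made up for illustration; not real optdigits instances).
names = ["input1", "input2", "input3"]
rows = [[0, 3, 16], [0, 0, 7], [0, 3, None]]
summary = summarize_columns(names, rows)
print(summary)
print(constant_columns(summary))
```

In the table above, input1 and input40 each report a single unique value, so a check like `constant_columns` would flag them as candidates to drop before modelling.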


107 properties

NumberOfInstances

5620

Number of instances (rows) of the dataset.

NumberOfFeatures

65

Number of attributes (columns) of the dataset.

NumberOfClasses

10

Number of distinct values of the target attribute (if it is nominal).

NumberOfMissingValues

0

Number of missing values in the dataset.

NumberOfInstancesWithMissingValues

0

Number of instances with at least one value missing.

NumberOfNumericFeatures

64

Number of numeric attributes.

NumberOfSymbolicFeatures

1

Number of nominal attributes.

MaxStdDevOfNumericAtts

6.52

Maximum standard deviation of attributes of the numeric type.

MinorityClassPercentage

9.86

Percentage of instances belonging to the least frequent class.

PercentageOfNumericFeatures

98.46

Percentage of numeric attributes.

Quartile3MeansOfNumericAtts

9.05

Third quartile of means among attributes of the numeric type.

CfsSubsetEval_DecisionStumpAUC

0.94

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2AUC

0.91

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.00001.ErrRate

0.12

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MeanAttributeEntropy

Average entropy of the attributes.

MinorityClassSize

554

Number of instances belonging to the least frequent class.

PercentageOfSymbolicFeatures

1.54

Percentage of nominal attributes.

Quartile3MutualInformation

Third quartile of mutual information between the nominal attributes and the target attribute.

CfsSubsetEval_DecisionStumpErrRate

0.12

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2ErrRate

0.16

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.00001.Kappa

0.87

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MeanKurtosisOfNumericAtts

168.55

Mean kurtosis among attributes of the numeric type.

NaiveBayesAUC

0.98

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1AttributeEntropy

First quartile of entropy among attributes.

Quartile3SkewnessOfNumericAtts

4.07

Third quartile of skewness among attributes of the numeric type.

CfsSubsetEval_DecisionStumpKappa

0.87

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2Kappa

0.82

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.0001.AUC

0.94

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanMeansOfNumericAtts

4.91

Mean of means among attributes of the numeric type.

NaiveBayesErrRate

0.09

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1KurtosisOfNumericAtts

-1.37

First quartile of kurtosis among attributes of the numeric type.

Quartile3StdDevOfNumericAtts

5.87

Third quartile of standard deviation of attributes of the numeric type.

CfsSubsetEval_NaiveBayesAUC

0.94

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3AUC

0.91

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.0001.ErrRate

0.12

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanMutualInformation

Average mutual information between the nominal attributes and the target attribute.

NaiveBayesKappa

0.9

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1MeansOfNumericAtts

0.26

First quartile of means among attributes of the numeric type.

REPTreeDepth1AUC

0.96

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_NaiveBayesErrRate

0.12

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3ErrRate

0.16

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.0001.Kappa

0.87

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanNoiseToSignalRatio

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.

NumberOfBinaryFeatures

Number of binary attributes.

Quartile1MutualInformation

First quartile of mutual information between the nominal attributes and the target attribute.

REPTreeDepth1ErrRate

0.14

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_NaiveBayesKappa

0.87

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3Kappa

0.82

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.001.AUC

0.94

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanNominalAttDistinctValues

Average number of distinct values among the attributes of the nominal type.

Quartile1SkewnessOfNumericAtts

-0.33

First quartile of skewness among attributes of the numeric type.

REPTreeDepth1Kappa

0.84

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_kNN1NAUC

0.94

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

StdvNominalAttDistinctValues

Standard deviation of the number of distinct values among attributes of the nominal type.

J48.001.ErrRate

0.12

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanSkewnessOfNumericAtts

5.45

Mean skewness among attributes of the numeric type.

Quartile1StdDevOfNumericAtts

0.97

First quartile of standard deviation of attributes of the numeric type.

REPTreeDepth2AUC

0.96

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

CfsSubsetEval_kNN1NErrRate

0.12

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

kNN1NAUC

0.99

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk

J48.001.Kappa

0.87

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanStdDevOfNumericAtts

3.69

Mean standard deviation of attributes of the numeric type.

Quartile2AttributeEntropy

Second quartile (Median) of entropy among attributes.

REPTreeDepth2ErrRate

0.14

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2

CfsSubsetEval_kNN1NKappa

0.87

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

kNN1NErrRate

0.02

Error rate achieved by the landmarker weka.classifiers.lazy.IBk

MajorityClassPercentage

10.18

Percentage of instances belonging to the most frequent class.

MinAttributeEntropy

Minimal entropy among attributes.

Quartile2KurtosisOfNumericAtts

0.08

Second quartile (Median) of kurtosis among attributes of the numeric type.

REPTreeDepth2Kappa

0.84

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2

ClassEntropy

3.32

Entropy of the target attribute values.

kNN1NKappa

0.98

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk

MajorityClassSize

572

Number of instances belonging to the most frequent class.

MinKurtosisOfNumericAtts

-1.65

Minimum kurtosis among attributes of the numeric type.

Quartile2MeansOfNumericAtts

4.57

Second quartile (Median) of means among attributes of the numeric type.

REPTreeDepth3AUC

0.96

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpAUC

0.69

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxAttributeEntropy

Maximum entropy among attributes.

MinMeansOfNumericAtts

Minimum of means among attributes of the numeric type.

Quartile2MutualInformation

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

REPTreeDepth3ErrRate

0.14

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpErrRate

0.8

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxKurtosisOfNumericAtts

2807.5

Maximum kurtosis among attributes of the numeric type.

MinMutualInformation

Minimal mutual information between the nominal attributes and the target attribute.

Quartile2SkewnessOfNumericAtts

0.56

Second quartile (Median) of skewness among attributes of the numeric type.

REPTreeDepth3Kappa

0.84

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpKappa

0.11

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxMeansOfNumericAtts

11.99

Maximum of means among attributes of the numeric type.

MinNominalAttDistinctValues

The minimal number of distinct values among attributes of the nominal type.

PercentageOfBinaryFeatures

Percentage of binary attributes.

Quartile2StdDevOfNumericAtts

4.3

Second quartile (Median) of standard deviation of attributes of the numeric type.

RandomTreeDepth1AUC

0.91

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Dimensionality

0.01

Number of attributes divided by the number of instances.

MaxMutualInformation

Maximum mutual information between the nominal attributes and the target attribute.

MinSkewnessOfNumericAtts

-1.3

Minimum skewness among attributes of the numeric type.

PercentageOfInstancesWithMissingValues

Percentage of instances having missing values.

Quartile3AttributeEntropy

Third quartile of entropy among attributes.

RandomTreeDepth1ErrRate

0.16

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

EquivalentNumberOfAtts

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.

MaxNominalAttDistinctValues

The maximum number of distinct values among attributes of the nominal type.

MinStdDevOfNumericAtts

Minimum standard deviation of attributes of the numeric type.

PercentageOfMissingValues

Percentage of missing values.

Quartile3KurtosisOfNumericAtts

20.3

Third quartile of kurtosis among attributes of the numeric type.

AutoCorrelation

0.09

Average class difference between consecutive instances.

RandomTreeDepth1Kappa

0.82

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

J48.00001.AUC

0.94

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MaxSkewnessOfNumericAtts

Maximum skewness among attributes of the numeric type.
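Several of the class-related properties can be checked by hand from the class distribution given in the dataset description. The sketch below is illustrative and is not how OpenML computes its qualities: it assumes the dataset is the concatenation of the training and testing files (3823 + 1797 = 5620), that entropy is measured in bits, and the function names are hypothetical. The last two helpers simply restate the formulas quoted for EquivalentNumberOfAtts and MeanNoiseToSignalRatio.

```python
import math
from typing import Dict, List

# Per-class counts from the description (classes 0..9).
TRAIN = [376, 389, 380, 389, 387, 376, 377, 387, 380, 382]
TEST = [178, 182, 177, 183, 181, 182, 181, 179, 174, 180]
COUNTS = [a + b for a, b in zip(TRAIN, TEST)]  # assumed: train and test merged
N_FEATURES = 65  # 64 inputs + 1 class attribute


def class_entropy(counts: List[int]) -> float:
    """Shannon entropy of the class distribution, in bits (assumed unit)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def class_meta_features(counts: List[int], n_features: int) -> Dict[str, float]:
    """Recompute a few simple properties from class counts."""
    total = sum(counts)
    return {
        "NumberOfInstances": total,
        "MajorityClassSize": max(counts),
        "MinorityClassSize": min(counts),
        "MajorityClassPercentage": 100.0 * max(counts) / total,
        "MinorityClassPercentage": 100.0 * min(counts) / total,
        "ClassEntropy": class_entropy(counts),
        "Dimensionality": n_features / total,
    }


def equivalent_number_of_atts(class_entropy_value: float, mean_mutual_information: float) -> float:
    """ClassEntropy divided by MeanMutualInformation."""
    return class_entropy_value / mean_mutual_information


def noise_to_signal(mean_attribute_entropy: float, mean_mutual_information: float) -> float:
    """(MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation."""
    return (mean_attribute_entropy - mean_mutual_information) / mean_mutual_information


for name, value in class_meta_features(COUNTS, N_FEATURES).items():
    print(name, round(value, 2))
```

Rounded to two decimals, these values agree with the figures listed on this page (5620 instances, majority class 572 at 10.18%, minority class 554 at 9.86%, class entropy 3.32, dimensionality 0.01).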


71 tasks

Supervised Classification on optdigits

21750 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: class

Supervised Classification on optdigits

305 runs - estimation_procedure: 5 times 2-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class

Supervised Classification on optdigits

301 runs - estimation_procedure: 33% Holdout set - evaluation_measure: predictive_accuracy - target_feature: class

Supervised Classification on optdigits

167 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class

Supervised Classification on optdigits

31 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: precision - target_feature: class

Supervised Classification on optdigits

1 run - estimation_procedure: 5 times 2-fold Crossvalidation - target_feature: class

Supervised Classification on optdigits

0 runs - estimation_procedure: 33% Holdout set - target_feature: class

Supervised Classification on optdigits

0 runs - estimation_procedure: 4-fold Crossvalidation - target_feature: class

Learning Curve on optdigits

304 runs - estimation_procedure: 10-fold Learning Curve - evaluation_measure: predictive_accuracy - target_feature: class

Learning Curve on optdigits

174 runs - estimation_procedure: 10 times 10-fold Learning Curve - evaluation_measure: predictive_accuracy - target_feature: class

Learning Curve on optdigits

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on optdigits

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on optdigits

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on optdigits

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on optdigits

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on optdigits

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on optdigits

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on optdigits

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Supervised Data Stream Classification on optdigits

25 runs - estimation_procedure: Interleaved Test then Train - target_feature: class

Clustering on optdigits

0 runs - estimation_procedure: 50 times Clustering

Clustering on optdigits

0 runs - estimation_procedure: 50 times Clustering

Clustering on optdigits

0 runs - estimation_procedure: 50 times Clustering

Clustering on optdigits

0 runs - estimation_procedure: 50 times Clustering

Clustering on optdigits

0 runs - estimation_procedure: 50 times Clustering

Clustering on optdigits

0 runs

Clustering on optdigits

0 runs - estimation_procedure: 50 times Clustering

Clustering on optdigits

0 runs - target_feature: class

Clustering on optdigits

0 runs - estimation_procedure: 50 times Clustering

Clustering on optdigits

0 runs - estimation_procedure: 50 times Clustering

Clustering on optdigits

0 runs - estimation_procedure: 50 times Clustering

Clustering on optdigits

0 runs - estimation_procedure: 50 times Clustering

Subgroup Discovery on optdigits

1310 runs - target_feature: class

Subgroup Discovery on optdigits

1309 runs - target_feature: class

Subgroup Discovery on optdigits

1309 runs - target_feature: class

Subgroup Discovery on optdigits

1308 runs - target_feature: class

Subgroup Discovery on optdigits

1308 runs - target_feature: class

Subgroup Discovery on optdigits

1307 runs - target_feature: class

Subgroup Discovery on optdigits

1304 runs - target_feature: class

Subgroup Discovery on optdigits

1304 runs - target_feature: class

Subgroup Discovery on optdigits

1303 runs - target_feature: class

Subgroup Discovery on optdigits

1299 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class

Subgroup Discovery on optdigits

0 runs - target_feature: class
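Estimation procedures such as "10-fold Crossvalidation" and "5 times 2-fold Crossvalidation" differ only in the number of folds and repeats. The sketch below shows one way such index splits could be generated. It is not OpenML's split generator: the function name `repeated_kfold` and its parameters are hypothetical, the splits are not stratified by class, and for reproducing results on an actual task the task's own predefined splits should be used instead.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(
    n_instances: int,
    n_folds: int,
    n_repeats: int = 1,
    seed: int = 0,
) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_instances))
        rng.shuffle(indices)  # a fresh shuffle for every repeat
        for fold in range(n_folds):
            test = indices[fold::n_folds]  # every n_folds-th shuffled index
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield repeat, fold, train, test


# Usage: 10-fold cross-validation over the 5620 instances.
for repeat, fold, train, test in repeated_kfold(5620, n_folds=10):
    print(repeat, fold, len(train), len(test))
```

Calling `repeated_kfold(5620, n_folds=2, n_repeats=5)` gives the "5 times 2-fold" layout, and `n_folds=10, n_repeats=10` gives "10 times 10-fold".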
