OpenML

tecator

Status: active. Format: ARFF. Visibility: public. Uploaded 29-09-2014 by Joaquin Vanschoren.

Author: not specified
Source: Unknown. Date: unknown
Please cite: not specified

This is the Tecator data set. The task is to predict the fat content of a meat sample on the basis of its near-infrared absorbance spectrum.

1. Statement of permission from Tecator (the original data source)

These data were recorded on a Tecator Infratec Food and Feed Analyzer working in the wavelength range 850-1050 nm by the Near Infrared Transmission (NIT) principle. Each sample contains finely chopped pure meat with different moisture, fat and protein contents.

If results from these data are used in a publication, we want you to mention the instrument and company name (Tecator) in the publication. In addition, please send a preprint of your article to Karin Thente, Tecator AB, Box 70, S-263 21 Hoganas, Sweden.

The data are available in the public domain with no responsibility from the original data source. The data can be redistributed as long as this permission note is attached.

For more information about the instrument, call Perstorp Analytical's representative in your area.

2. Description of the data file

For each meat sample, the data consist of a 100-channel spectrum of absorbances and the contents of moisture (water), fat and protein. The absorbance is -log10 of the transmittance measured by the spectrometer. The three contents, measured in percent, are determined by analytic chemistry.

There are 240 samples, which are divided into 5 data sets for the purpose of model validation and extrapolation studies. The data sets, further described in reference (1), are:

  Data set   Use                       Samples
  C          Training                  129
  M          Monitoring                43
  T          Testing                   43
  E1         Extrapolation, fat        8
  E2         Extrapolation, protein    17

The data for all 240 samples appear at the end of this file, 25 lines per sample. The data sets appear in the order of the table above.

The spectra are preprocessed using a principal component analysis on data set C, and the first 22 principal components (scaled to unit variance) are included for each sample.
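Because the subsets are stored back to back in the fixed order of the table (the dataset note states that all 240 samples keep this order), their row ranges follow directly from the sample counts. The sketch below is illustrative only: the helper name `tecator_partitions` is hypothetical, and it assumes 0-based row indices with no reordering of the rows after loading.

```python
# Sample counts per subset, in file order (from the table above).
SUBSET_SIZES = [("C", 129), ("M", 43), ("T", 43), ("E1", 8), ("E2", 17)]


def tecator_partitions(sizes=SUBSET_SIZES):
    """Return {subset name: range of 0-based row indices}.

    Hypothetical helper: assumes rows are stored in the order
    C, M, T, E1, E2 and have not been shuffled.
    """
    partitions = {}
    start = 0
    for name, count in sizes:
        partitions[name] = range(start, start + count)
        start += count
    return partitions


if __name__ == "__main__":
    parts = tecator_partitions()
    for name, rows in parts.items():
        # Report 1-based sample numbers, as the description does.
        print(f"{name}: samples {rows.start + 1}-{rows.stop}")
```

With these counts, C + M covers samples 1-172 and T covers samples 173-215, which is the standard interpolation split; E1 and E2 occupy the last 25 rows.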
Thus, if you want to use the data for a standard (interpolation) test of your algorithm, use samples 1-172 for training and samples 173-215 for testing (and ignore the last 25 samples), and use the first 13 or so principal components to predict the fat content.

Each sample's record contains the 100 absorbances, followed by the 22 principal components and finally the contents of moisture, fat and protein. Preceding the data lines, the following lines appear:

  real_in=122
  real_out=3
  training_examples=172
  test_examples=43
  extrapolation_examples=25

3. More details on how to use the data

The data are made available as a benchmark for regression models. In order to compare models, it is practical to use the data set as follows:

- C and M combined are used to tune (estimate, train) the model. (Some approaches set aside some training data to control overfitting. These data should be a subset of C+M. In (1), the subset M was used for this purpose.)
- T is used to test the model once it has been tuned.

If each model has an element of randomness (as is the case for neural networks), the most reliable measure of the performance of a single model is obtained by selecting a handful of models on the basis of C+M and quoting the average of their performances on T. In the presence of randomness, it is bad practice to train a lot of models on C+M and then select the best of these on the basis of T.

C, M and T are drawn from the same pool of data, so T is used to test the ability of the models to interpolate. The data sets E1 and E2 contain more fat and protein, respectively, and are intended to test the ability of the models to extrapolate.

4. Performance of neural network models

The performance is measured as the Standard Error of Prediction (SEP), which is the root mean square of the difference between the true and the predicted content.
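The SEP defined above is an ordinary root-mean-square error. A minimal sketch, assuming plain Python lists of true and predicted fat contents (the function names `sep` and `average_sep` are our own), also shows the recommended reporting for models with a random element: average the SEP on T of a handful of models chosen on C+M, rather than picking the best single model on T.

```python
import math
from statistics import mean


def sep(y_true, y_pred):
    """Standard Error of Prediction: root mean square of (true - predicted)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def average_sep(y_true, predictions_per_model):
    """Average SEP on T over several models that were all selected on C+M."""
    return mean(sep(y_true, preds) for preds in predictions_per_model)


if __name__ == "__main__":
    # Toy numbers for illustration only; these are not Tecator values.
    y_test = [10.0, 20.0, 30.0]
    model_predictions = [
        [11.0, 19.0, 30.0],  # model A
        [10.0, 20.0, 30.0],  # model B
    ]
    print(sep(y_test, model_predictions[0]))
    print(average_sep(y_test, model_predictions))
```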
For the prediction of fat on data set T, the following results were obtained:

  Reference   SEP    Method (see the papers for details)
  (1)         0.65   10-6-1 network, early stopping
  (2)         0.52   10-3-1 network, Bayesian
  (3)         0.36   13-X-1 network, Bayesian, Automatic Relevance Determination

A linear model with 10 inputs yields SEP = 2.78.

5. References

(1) C. Borggaard and H. H. Thodberg, "Optimal Minimal Neural Interpretation of Spectra", Analytical Chemistry 64 (1992), pp. 545-551.
(2) H. H. Thodberg, "Ace of Bayes: Application of Neural Networks with Pruning", Manuscript 1132, Danish Meat Research Institute (1993). Available by anonymous FTP in the file pub/neuroprose/thodberg.ace-of-bayes.ps.Z on the Internet node archive.cis.ohio-state.edu (128.146.8.52).
(3) Revised and extended version of (2), in preparation, to be submitted to IEEE Trans. Neural Networks (1995). Available by anonymous FTP in the file pub/neuroprose/thodberg.bayesARD.ps.Z on the Internet node archive.cis.ohio-state.edu (128.146.8.52).

Hans Henrik Thodberg
Danish Meat Research Institute
Maglegaardsvej 2, Postboks 57
DK-4000 Roskilde, Denmark
Email: thodberg@nn.dmri.dk
Phone: (+45) 42 36 12 00
Fax: (+45) 42 36 48 36

Note: all 240 samples are included in the same order as mentioned above.

Information about the dataset:
CLASSTYPE: numeric
CLASSINDEX: none specific

125 features

fat (target)	numeric	157 unique values 0 missing
absorbance_1	numeric	216 unique values 0 missing
absorbance_2	numeric	216 unique values 0 missing
absorbance_3	numeric	216 unique values 0 missing
absorbance_4	numeric	214 unique values 0 missing
absorbance_5	numeric	216 unique values 0 missing
absorbance_6	numeric	216 unique values 0 missing
absorbance_7	numeric	216 unique values 0 missing
absorbance_8	numeric	216 unique values 0 missing
absorbance_9	numeric	216 unique values 0 missing
absorbance_10	numeric	216 unique values 0 missing
absorbance_11	numeric	216 unique values 0 missing
absorbance_12	numeric	215 unique values 0 missing
absorbance_13	numeric	216 unique values 0 missing
absorbance_14	numeric	216 unique values 0 missing
absorbance_15	numeric	216 unique values 0 missing
absorbance_16	numeric	216 unique values 0 missing
absorbance_17	numeric	216 unique values 0 missing
absorbance_18	numeric	216 unique values 0 missing
absorbance_19	numeric	215 unique values 0 missing
absorbance_20	numeric	216 unique values 0 missing
absorbance_21	numeric	215 unique values 0 missing
absorbance_22	numeric	216 unique values 0 missing
absorbance_23	numeric	216 unique values 0 missing
absorbance_24	numeric	216 unique values 0 missing
absorbance_25	numeric	215 unique values 0 missing
absorbance_26	numeric	216 unique values 0 missing
absorbance_27	numeric	216 unique values 0 missing
absorbance_28	numeric	216 unique values 0 missing
absorbance_29	numeric	215 unique values 0 missing
absorbance_30	numeric	216 unique values 0 missing
absorbance_31	numeric	216 unique values 0 missing
absorbance_32	numeric	215 unique values 0 missing
absorbance_33	numeric	216 unique values 0 missing
absorbance_34	numeric	216 unique values 0 missing
absorbance_35	numeric	215 unique values 0 missing
absorbance_36	numeric	216 unique values 0 missing
absorbance_37	numeric	216 unique values 0 missing
absorbance_38	numeric	216 unique values 0 missing
absorbance_39	numeric	216 unique values 0 missing
absorbance_40	numeric	215 unique values 0 missing
absorbance_41	numeric	216 unique values 0 missing
absorbance_42	numeric	216 unique values 0 missing
absorbance_43	numeric	216 unique values 0 missing
absorbance_44	numeric	216 unique values 0 missing
absorbance_45	numeric	216 unique values 0 missing
absorbance_46	numeric	216 unique values 0 missing
absorbance_47	numeric	216 unique values 0 missing
absorbance_48	numeric	216 unique values 0 missing
absorbance_49	numeric	216 unique values 0 missing
absorbance_50	numeric	216 unique values 0 missing
absorbance_51	numeric	216 unique values 0 missing
absorbance_52	numeric	216 unique values 0 missing
absorbance_53	numeric	216 unique values 0 missing
absorbance_54	numeric	216 unique values 0 missing
absorbance_55	numeric	215 unique values 0 missing
absorbance_56	numeric	215 unique values 0 missing
absorbance_57	numeric	215 unique values 0 missing
absorbance_58	numeric	216 unique values 0 missing
absorbance_59	numeric	215 unique values 0 missing
absorbance_60	numeric	216 unique values 0 missing
absorbance_61	numeric	216 unique values 0 missing
absorbance_62	numeric	216 unique values 0 missing
absorbance_63	numeric	216 unique values 0 missing
absorbance_64	numeric	216 unique values 0 missing
absorbance_65	numeric	216 unique values 0 missing
absorbance_66	numeric	216 unique values 0 missing
absorbance_67	numeric	216 unique values 0 missing
absorbance_68	numeric	215 unique values 0 missing
absorbance_69	numeric	216 unique values 0 missing
absorbance_70	numeric	216 unique values 0 missing
absorbance_71	numeric	216 unique values 0 missing
absorbance_72	numeric	216 unique values 0 missing
absorbance_73	numeric	216 unique values 0 missing
absorbance_74	numeric	216 unique values 0 missing
absorbance_75	numeric	216 unique values 0 missing
absorbance_76	numeric	216 unique values 0 missing
absorbance_77	numeric	216 unique values 0 missing
absorbance_78	numeric	216 unique values 0 missing
absorbance_79	numeric	216 unique values 0 missing
absorbance_80	numeric	216 unique values 0 missing
absorbance_81	numeric	215 unique values 0 missing
absorbance_82	numeric	216 unique values 0 missing
absorbance_83	numeric	216 unique values 0 missing
absorbance_84	numeric	216 unique values 0 missing
absorbance_85	numeric	216 unique values 0 missing
absorbance_86	numeric	215 unique values 0 missing
absorbance_87	numeric	216 unique values 0 missing
absorbance_88	numeric	216 unique values 0 missing
absorbance_89	numeric	216 unique values 0 missing
absorbance_90	numeric	216 unique values 0 missing
absorbance_91	numeric	216 unique values 0 missing
absorbance_92	numeric	216 unique values 0 missing
absorbance_93	numeric	215 unique values 0 missing
absorbance_94	numeric	214 unique values 0 missing
absorbance_95	numeric	216 unique values 0 missing
absorbance_96	numeric	216 unique values 0 missing
absorbance_97	numeric	216 unique values 0 missing
absorbance_98	numeric	216 unique values 0 missing
absorbance_99	numeric	216 unique values 0 missing
absorbance_100	numeric	216 unique values 0 missing
principal_component_1	numeric	216 unique values 0 missing
principal_component_2	numeric	216 unique values 0 missing
principal_component_3	numeric	217 unique values 0 missing
principal_component_4	numeric	217 unique values 0 missing
principal_component_5	numeric	216 unique values 0 missing
principal_component_6	numeric	216 unique values 0 missing
principal_component_7	numeric	216 unique values 0 missing
principal_component_8	numeric	216 unique values 0 missing
principal_component_9	numeric	216 unique values 0 missing
principal_component_10	numeric	216 unique values 0 missing
principal_component_11	numeric	216 unique values 0 missing
principal_component_12	numeric	216 unique values 0 missing
principal_component_13	numeric	216 unique values 0 missing
principal_component_14	numeric	217 unique values 0 missing
principal_component_15	numeric	216 unique values 0 missing
principal_component_16	numeric	216 unique values 0 missing
principal_component_17	numeric	216 unique values 0 missing
principal_component_18	numeric	217 unique values 0 missing
principal_component_19	numeric	216 unique values 0 missing
principal_component_20	numeric	216 unique values 0 missing
principal_component_21	numeric	216 unique values 0 missing
principal_component_22	numeric	216 unique values 0 missing
moisture	numeric	141 unique values 0 missing
protein	numeric	97 unique values 0 missing

107 properties

NumberOfInstances

240

Number of instances (rows) of the dataset.

NumberOfFeatures

125

Number of attributes (columns) of the dataset.

NumberOfClasses

Number of distinct values of the target attribute (if it is nominal).

NumberOfMissingValues

Number of missing values in the dataset.

NumberOfInstancesWithMissingValues

Number of instances with at least one value missing.

NumberOfNumericFeatures

125

Number of numeric attributes.

NumberOfSymbolicFeatures

Number of nominal attributes.

Quartile2AttributeEntropy

Second quartile (Median) of entropy among attributes.

REPTreeDepth2ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2

CfsSubsetEval_kNN1NKappa

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

kNN1NErrRate

Error rate achieved by the landmarker weka.classifiers.lazy.IBk

MajorityClassPercentage

Percentage of instances belonging to the most frequent class.

MeanStdDevOfNumericAtts

0.83

Mean standard deviation of attributes of the numeric type.

Quartile2KurtosisOfNumericAtts

0.66

Second quartile (Median) of kurtosis among attributes of the numeric type.

REPTreeDepth2Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2

ClassEntropy

Entropy of the target attribute values.

kNN1NKappa

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk

MajorityClassSize

Number of instances belonging to the most frequent class.

MinAttributeEntropy

Minimal entropy among attributes.

Quartile2MeansOfNumericAtts

3.09

Second quartile (Median) of means among attributes of the numeric type.
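Properties such as Quartile2MeansOfNumericAtts or Quartile1StdDevOfNumericAtts are two-stage summaries: a statistic is computed for every numeric column, and the resulting per-column values are then summarized. A minimal sketch with the Python standard library follows. The helper name is hypothetical, and both the quartile method (`statistics.quantiles` with its default 'exclusive' interpolation) and the use of the population standard deviation (`pstdev`) are assumptions that may not match OpenML's implementation, so recomputed values can differ slightly from those reported here.

```python
from statistics import mean, pstdev, quantiles


def summarize_per_attribute(columns, stat):
    """Apply `stat` to each numeric column, then summarize those values.

    Hypothetical helper. Quartiles come from statistics.quantiles
    (default 'exclusive' method), an assumed choice of estimator.
    """
    values = [stat(column) for column in columns]
    q1, q2, q3 = quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(values),
        "mean": mean(values),
    }


if __name__ == "__main__":
    # Four toy columns (not Tecator data).
    columns = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 0.0, 3.0], [5.0, 5.0, 5.0]]
    print(summarize_per_attribute(columns, mean))    # summary of per-column means
    print(summarize_per_attribute(columns, pstdev))  # summary of per-column std devs
```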

REPTreeDepth3AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxAttributeEntropy

Maximum entropy among attributes.

MinKurtosisOfNumericAtts

-0.47

Minimum kurtosis among attributes of the numeric type.

Quartile2MutualInformation

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

REPTreeDepth3ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpErrRate

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxKurtosisOfNumericAtts

9.36

Maximum kurtosis among attributes of the numeric type.

MinMeansOfNumericAtts

-0.15

Minimum of means among attributes of the numeric type.

Quartile2SkewnessOfNumericAtts

0.82

Second quartile (Median) of skewness among attributes of the numeric type.
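The skewness and kurtosis properties follow the same per-column-then-summarize pattern. The page does not state which estimators OpenML uses; the sketch below assumes the plain moment-based (population) definitions, skewness g1 = m3 / m2^1.5 and excess kurtosis g2 = m4 / m2^2 - 3, where mk is the k-th central moment. The kurtosis values here must be excess kurtosis in some form, since MinKurtosisOfNumericAtts is negative (-0.47) while non-excess kurtosis is never below 1.

```python
def central_moment(xs, k):
    """k-th central moment, dividing by n (population form)."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** k for x in xs) / len(xs)


def skewness(xs):
    """Moment-based skewness g1 = m3 / m2**1.5."""
    m2 = central_moment(xs, 2)
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant column")
    return central_moment(xs, 3) / m2 ** 1.5


def excess_kurtosis(xs):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3."""
    m2 = central_moment(xs, 2)
    if m2 == 0:
        raise ValueError("kurtosis is undefined for a constant column")
    return central_moment(xs, 4) / m2 ** 2 - 3.0


if __name__ == "__main__":
    column = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy column, not Tecator data
    print(skewness(column))         # symmetric, so 0.0
    print(excess_kurtosis(column))  # about -1.3: lighter tails than a normal
```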

REPTreeDepth3Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpKappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxMeansOfNumericAtts

62.85

Maximum of means among attributes of the numeric type.

MinMutualInformation

Minimal mutual information between the nominal attributes and the target attribute.

PercentageOfBinaryFeatures

Percentage of binary attributes.

Quartile2StdDevOfNumericAtts

0.55

Second quartile (Median) of standard deviation of attributes of the numeric type.

RandomTreeDepth1AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Dimensionality

0.52

Number of attributes divided by the number of instances.
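The Dimensionality value can be checked by hand: the feature list counts the fat target, 100 absorbances, 22 principal components, moisture and protein, i.e. 125 columns over 240 rows. A short sketch (the constant and function names are ours):

```python
# Column count from the feature list: fat (target) + 100 absorbances
# + 22 principal components + moisture + protein.
N_FEATURES = 1 + 100 + 22 + 1 + 1
N_INSTANCES = 240  # NumberOfInstances


def dimensionality(n_features: int, n_instances: int) -> float:
    """Number of attributes divided by the number of instances."""
    if n_instances <= 0:
        raise ValueError("n_instances must be positive")
    return n_features / n_instances


print(N_FEATURES)  # 125
print(round(dimensionality(N_FEATURES, N_INSTANCES), 2))  # 0.52
```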

MaxMutualInformation

Maximum mutual information between the nominal attributes and the target attribute.

MinNominalAttDistinctValues

The minimal number of distinct values among attributes of the nominal type.

PercentageOfInstancesWithMissingValues

Percentage of instances having missing values.

Quartile3AttributeEntropy

Third quartile of entropy among attributes.

RandomTreeDepth1ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

EquivalentNumberOfAtts

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
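For tecator, EquivalentNumberOfAtts and the entropy and mutual-information properties are left blank, because the target is numeric and there are no nominal attributes. To illustrate the definition, the sketch below computes it on toy nominal data. It assumes Shannon entropy in bits and mutual information I(A; Y) = H(Y) - H(Y | A); the function names are hypothetical, and OpenML's exact estimator may differ.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(attribute, target):
    """I(A; Y) = H(Y) - H(Y | A) for two equally long discrete sequences."""
    n = len(target)
    conditional = 0.0
    for value, count in Counter(attribute).items():
        # Class labels of the rows where the attribute takes this value.
        subset = [y for a, y in zip(attribute, target) if a == value]
        conditional += (count / n) * entropy(subset)
    return entropy(target) - conditional


def equivalent_number_of_atts(nominal_columns, target):
    """ClassEntropy divided by MeanMutualInformation."""
    mis = [mutual_information(col, target) for col in nominal_columns]
    mean_mi = sum(mis) / len(mis)
    if mean_mi == 0:
        raise ValueError("undefined: no attribute carries information about the class")
    return entropy(target) / mean_mi


if __name__ == "__main__":
    y = ["lean", "lean", "fat", "fat"]
    a1 = ["low", "low", "high", "high"]  # fully informative about y
    a2 = ["x", "y", "x", "y"]            # independent of y
    print(equivalent_number_of_atts([a1, a2], y))  # 2.0
```

Here the class entropy is 1 bit and the mean mutual information is 0.5 bits, so on average two such attributes would be needed to describe the class.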

MaxNominalAttDistinctValues

The maximum number of distinct values among attributes of the nominal type.

MinSkewnessOfNumericAtts

-1.55

Minimum skewness among attributes of the numeric type.

PercentageOfMissingValues

Percentage of missing values.

Quartile3KurtosisOfNumericAtts

0.83

Third quartile of kurtosis among attributes of the numeric type.

AutoCorrelation

-2.64

Average class difference between consecutive instances.

RandomTreeDepth1Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

J48.00001.AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MaxSkewnessOfNumericAtts

1.69

Maximum skewness among attributes of the numeric type.

MinStdDevOfNumericAtts

0.41

Minimum standard deviation of attributes of the numeric type.

PercentageOfNumericFeatures

100

Percentage of numeric attributes.

Quartile3MeansOfNumericAtts

3.42

Third quartile of means among attributes of the numeric type.

CfsSubsetEval_DecisionStumpAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.00001.ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MaxStdDevOfNumericAtts

14.36

Maximum standard deviation of attributes of the numeric type.

MinorityClassPercentage

Percentage of instances belonging to the least frequent class.

PercentageOfSymbolicFeatures

Percentage of nominal attributes.

Quartile3MutualInformation

Third quartile of mutual information between the nominal attributes and the target attribute.

CfsSubsetEval_DecisionStumpErrRate

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.00001.Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MeanAttributeEntropy

Average entropy of the attributes.

MinorityClassSize

Number of instances belonging to the least frequent class.

Quartile1AttributeEntropy

First quartile of entropy among attributes.

Quartile3SkewnessOfNumericAtts

0.91

Third quartile of skewness among attributes of the numeric type.

CfsSubsetEval_DecisionStumpKappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.0001.AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanKurtosisOfNumericAtts

1.01

Mean kurtosis among attributes of the numeric type.

NaiveBayesAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1KurtosisOfNumericAtts

0.58

First quartile of kurtosis among attributes of the numeric type.

Quartile3StdDevOfNumericAtts

0.55

Third quartile of standard deviation of attributes of the numeric type.

CfsSubsetEval_NaiveBayesAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.0001.ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanMeansOfNumericAtts

3.36

Mean of means among attributes of the numeric type.

NaiveBayesErrRate

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1MeansOfNumericAtts

2.85

First quartile of means among attributes of the numeric type.

REPTreeDepth1AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_NaiveBayesErrRate

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.0001.Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanMutualInformation

Average mutual information between the nominal attributes and the target attribute.

NaiveBayesKappa

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1MutualInformation

First quartile of mutual information between the nominal attributes and the target attribute.

REPTreeDepth1ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_NaiveBayesKappa

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.001.AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanNoiseToSignalRatio

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
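MeanNoiseToSignalRatio combines two of the aggregate properties listed on this page. A minimal sketch, taking those two aggregates as plain numbers (the function name is ours, and the input values in the usage line are made up, since both aggregates are blank for tecator):

```python
def mean_noise_to_signal_ratio(mean_attribute_entropy: float,
                               mean_mutual_information: float) -> float:
    """(MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation."""
    if mean_mutual_information <= 0:
        raise ValueError("undefined when the mean mutual information is not positive")
    noise = mean_attribute_entropy - mean_mutual_information
    return noise / mean_mutual_information


# Made-up aggregates: attributes carry 1.5 bits on average, 0.5 of them about the class.
print(mean_noise_to_signal_ratio(1.5, 0.5))  # 2.0
```

A ratio of 2.0 reads as: for every bit of attribute entropy that is informative about the class, two bits are not.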

NumberOfBinaryFeatures

Number of binary attributes.

Quartile1SkewnessOfNumericAtts

0.79

First quartile of skewness among attributes of the numeric type.

REPTreeDepth1Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_kNN1NAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

StdvNominalAttDistinctValues

Standard deviation of the number of distinct values among attributes of the nominal type.

J48.001.ErrRate

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanNominalAttDistinctValues

Average number of distinct values among the attributes of the nominal type.

Quartile1StdDevOfNumericAtts

0.51

First quartile of standard deviation of attributes of the numeric type.

REPTreeDepth2AUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

CfsSubsetEval_kNN1NErrRate

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

kNN1NAUC

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk

J48.001.Kappa

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanSkewnessOfNumericAtts

0.69

Mean skewness among attributes of the numeric type.
