OpenML


biomed

active | ARFF | Publicly available | Visibility: public | Uploaded 28-09-2014 by Joaquin Vanschoren

Author:
Source: Unknown - Date unknown
Please cite:

February 23, 1982

The 1982 annual meetings of the American Statistical Association (ASA) will be held August 16-19, 1982 in Cincinnati. At that meeting, the ASA Committee on Statistical Graphics plans to sponsor an "Exposition of Statistical Graphics Technology." The purpose of this activity is to more fully inform the ASA membership about the capabilities and uses of computer graphics in statistical work. This letter is to invite you to participate in the Exposition.

Attached is a set of biomedical data containing 209 observations (134 for "normals" and 75 for "carriers"). Each vendor or provider of statistical graphics software participating in the Exposition is to analyze these data using their software and to prepare tabular, graphical and text output illustrating the use of graphics in these analyses and summarizing their conclusions. The tabular and graphical materials must be direct computer output from the statistical graphics software; the textual descriptions and summaries need not be.

The total display space available to each participant at the meeting will be a standard poster board (approximately 4' x 2 1/2'). All entries will be displayed in one location at the meetings, together with brief written commentary by the committee summarizing the results of this activity.

Reference

Exposition of Statistical Graphics Technology, L. H. Cox, M. M. Johnson, K. Kafadar, ASA Proc Stat. Comp Section, 1982, pp 55-56.

Enclosures

THE DATA

The following data arose in a study to develop screening methods to identify carriers of a rare genetic disorder. Four measurements m1, m2, m3, m4 were made on blood samples. One of these, m1, has been used before. The disease is Duchenne muscular dystrophy.

Measurements are:
M1 - serum creatine kinase
M2 - hemopexin
M3 - pyruvate kinase
M4 - lactate dehydrogenase

Because the disease is rare, there are only a few carriers of the disease from whom data are available. The data come in two files, one for normals and one for carriers of the disease. A description of the files is provided. The data have been stripped of the names and other identifiers. Otherwise the data are as received by the analyst.

PURPOSE OF THE ANALYSIS

The purpose of the analysis is to develop a screening procedure to detect carriers and to describe its effectiveness. Experts in the field have noted that young people tend to have higher measurements. The laboratory which prepared the measurements is worried that there may be a systematic drift over time in their measurement process. These effects should be considered in the analysis. Can graphical displays show the differences between the distributions of carriers and normals?

FILE DESCRIPTION

Column	Content
1	Observation number (sequence number per patient). Note that there are several samples per patient for some patients.
2-8	Blank
9-12	Hospital identification number for blood sample
13-18	Blank
19-20	Age of patient
21-26	Blank
27-32	Date that blood sample was taken (mmddyy). Note that all day entries are 00.
33-39	Blank
40-43	m1 (measurement 1; appears as "ml" in the feature list) sss.s
44-50	Blank
51-54	m2 (measurement 2) xxx.x. Eight missing data points.
55-61	Blank
62-65	m3 (measurement 3) xxx.x
66-72	Blank
73-75	m4 (measurement 4) xxx. Seven missing data points.

Information about the dataset
CLASSTYPE: nominal
CLASSINDEX: last
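The column layout in the file description can be read with plain string slicing. The following is a minimal sketch, not the original tooling: the field names are illustrative, the sample row is made up rather than taken from the data, and it assumes that a missing measurement appears as a blank field, which the description does not state.

```python
# Column spans (1-indexed, inclusive) copied from the file description.
# The field names are illustrative; they are not part of the original files.
FIELDS = {
    "observation_number": (1, 1),
    "hospital_id": (9, 12),
    "age": (19, 20),
    "sample_date_mmddyy": (27, 32),
    "m1": (40, 43),
    "m2": (51, 54),
    "m3": (62, 65),
    "m4": (73, 75),
}


def parse_record(line):
    """Slice one fixed-width line into stripped strings (None if blank)."""
    record = {}
    for name, (start, end) in FIELDS.items():
        raw = line[start - 1:end].strip()
        record[name] = raw or None
    return record


def build_line(values):
    """Place made-up values at the documented positions, right-justified."""
    chars = [" "] * 75
    for name, text in values.items():
        start, end = FIELDS[name]
        chars[start - 1:end] = text.rjust(end - start + 1)
    return "".join(chars)


# Hypothetical row, not taken from the dataset. m2 is omitted so that its
# field stays blank (assumed encoding of a missing value).
sample = build_line({
    "observation_number": "1",
    "hospital_id": "1027",
    "age": "30",
    "sample_date_mmddyy": "100078",
    "m1": "52.0",
    "m3": "10.9",
    "m4": "176",
})
rec = parse_record(sample)
print(rec["age"], rec["m1"], rec["m2"])  # 30 52.0 None
```

Values are kept as strings so that the caller decides how to convert dates and measurements.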

9 features

class (target)	nominal	2 unique values 0 missing
Observation_number	nominal	7 unique values 0 missing
Hospital_identification_number_for_blood_sample	numeric	191 unique values 0 missing
Age_of_patient	numeric	33 unique values 0 missing
Date_that_blood_sample_was_taken	numeric	26 unique values 0 missing
ml	numeric	99 unique values 0 missing
m2	numeric	119 unique values 8 missing
m3	numeric	140 unique values 0 missing
m4	numeric	125 unique values 7 missing


107 properties

NumberOfInstances

209

Number of instances (rows) of the dataset.

NumberOfFeatures

9

Number of attributes (columns) of the dataset.

NumberOfClasses

2

Number of distinct values of the target attribute (if it is nominal).

NumberOfMissingValues

15

Number of missing values in the dataset.

NumberOfInstancesWithMissingValues

15

Number of instances with at least one value missing.

NumberOfNumericFeatures

7

Number of numeric attributes.

NumberOfSymbolicFeatures

2

Number of nominal attributes.

MeanSkewnessOfNumericAtts

1.59

Mean skewness among attributes of the numeric type.

Quartile1StdDevOfNumericAtts

12.46

First quartile of standard deviation of attributes of the numeric type.

REPTreeDepth2AUC

0.74

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

CfsSubsetEval_kNN1NErrRate

0.18

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

kNN1NAUC

0.8

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk

J48.001.Kappa

0.59

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanStdDevOfNumericAtts

4234.91

Mean standard deviation of attributes of the numeric type.

Quartile2AttributeEntropy

1.57

Second quartile (Median) of entropy among attributes.

REPTreeDepth2ErrRate

0.23

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2

CfsSubsetEval_kNN1NKappa

0.59

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

kNN1NErrRate

0.15

Error rate achieved by the landmarker weka.classifiers.lazy.IBk

MajorityClassPercentage

64.11

Percentage of instances belonging to the most frequent class.
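The class percentages follow directly from the class counts given in the description (134 normals, 75 carriers). A short check, assuming only those two counts:

```python
# Class counts as stated in the dataset description.
n_normals = 134
n_carriers = 75
n_instances = n_normals + n_carriers  # 209

majority_pct = 100 * max(n_normals, n_carriers) / n_instances
minority_pct = 100 * min(n_normals, n_carriers) / n_instances

print(round(majority_pct, 2), round(minority_pct, 2))  # 64.11 35.89
```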

MinAttributeEntropy

1.57

Minimal entropy among attributes.

Quartile2KurtosisOfNumericAtts

2.08

Second quartile (Median) of kurtosis among attributes of the numeric type.

REPTreeDepth2Kappa

0.47

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2

ClassEntropy

0.94

Entropy of the target attribute values.
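The reported class entropy can be reproduced from the class counts, assuming the usual Shannon entropy in bits (base-2 logarithm):

```python
import math

# Class counts as stated in the dataset description.
counts = [134, 75]
total = sum(counts)

# Shannon entropy in bits: -sum(p * log2(p)) over the class proportions.
class_entropy = -sum((c / total) * math.log2(c / total) for c in counts)

print(round(class_entropy, 2))  # 0.94
```

A perfectly balanced two-class target would give 1.0, so 0.94 reflects the moderate imbalance between normals and carriers.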

kNN1NKappa

0.66

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk

MajorityClassSize

134

Number of instances belonging to the most frequent class.

MinKurtosisOfNumericAtts

-1.03

Minimum kurtosis among attributes of the numeric type.

Quartile2MeansOfNumericAtts

92.26

Second quartile (Median) of means among attributes of the numeric type.

REPTreeDepth3AUC

0.74

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpAUC

0.76

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxAttributeEntropy

1.57

Maximum entropy among attributes.

MinMeansOfNumericAtts

16.77

Minimum of means among attributes of the numeric type.

Quartile2MutualInformation

0.02

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

REPTreeDepth3ErrRate

0.23

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpErrRate

0.23

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxKurtosisOfNumericAtts

24.1

Maximum kurtosis among attributes of the numeric type.

MinMutualInformation

0.02

Minimal mutual information between the nominal attributes and the target attribute.

Quartile2SkewnessOfNumericAtts

1.33

Second quartile (Median) of skewness among attributes of the numeric type.

REPTreeDepth3Kappa

0.47

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpKappa

0.48

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxMeansOfNumericAtts

65772.42

Maximum of means among attributes of the numeric type.

MinNominalAttDistinctValues

2

The minimal number of distinct values among attributes of the nominal type.

PercentageOfBinaryFeatures

11.11

Percentage of binary attributes.

Quartile2StdDevOfNumericAtts

73.92

Second quartile (Median) of standard deviation of attributes of the numeric type.

RandomTreeDepth1AUC

0.81

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Dimensionality

0.04

Number of attributes divided by the number of instances.

MaxMutualInformation

0.02

Maximum mutual information between the nominal attributes and the target attribute.

MinSkewnessOfNumericAtts

-0.37

Minimum skewness among attributes of the numeric type.

PercentageOfInstancesWithMissingValues

7.18

Percentage of instances having missing values.

Quartile3AttributeEntropy

1.57

Third quartile of entropy among attributes.

RandomTreeDepth1ErrRate

0.17

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

EquivalentNumberOfAtts

38.17

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.

MaxNominalAttDistinctValues

7

The maximum number of distinct values among attributes of the nominal type.

MinStdDevOfNumericAtts

8.57

Minimum standard deviation of attributes of the numeric type.

PercentageOfMissingValues

0.8

Percentage of missing values.
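Both missing-value percentages can be checked against the feature list, which reports 8 missing values for m2 and 7 for m4. The sketch below assumes the percentage of missing values is taken over all cells (instances times features):

```python
n_instances = 209
n_features = 9
missing_per_feature = {"m2": 8, "m4": 7}  # from the feature list

n_missing = sum(missing_per_feature.values())
pct_missing = 100 * n_missing / (n_instances * n_features)

# The page reports 7.18% of instances with missing values; convert that
# back to a row count.
rows_with_missing = round(7.18 / 100 * n_instances)

print(n_missing, round(pct_missing, 2), rows_with_missing)  # 15 0.8 15
```

Since 15 missing values fall on 15 instances, no instance is missing both m2 and m4.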

Quartile3KurtosisOfNumericAtts

22.29

Third quartile of kurtosis among attributes of the numeric type.

AutoCorrelation

Average class difference between consecutive instances.

RandomTreeDepth1Kappa

0.62

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

J48.00001.AUC

0.78

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MaxSkewnessOfNumericAtts

4.39

Maximum skewness among attributes of the numeric type.

MinorityClassPercentage

35.89

Percentage of instances belonging to the least frequent class.

PercentageOfNumericFeatures

77.78

Percentage of numeric attributes.

Quartile3MeansOfNumericAtts

1054.85

Third quartile of means among attributes of the numeric type.

CfsSubsetEval_DecisionStumpAUC

0.78

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2AUC

0.81

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.00001.ErrRate

0.18

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MaxStdDevOfNumericAtts

29164.75

Maximum standard deviation of attributes of the numeric type.

MinorityClassSize

75

Number of instances belonging to the least frequent class.

PercentageOfSymbolicFeatures

22.22

Percentage of nominal attributes.
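The feature-type percentages can be recomputed from the feature list: 9 features in total, of which 7 are numeric and 2 are nominal (class and Observation_number). Counting class as the only binary feature is an inference from its 2 unique values.

```python
# Feature types and distinct-value counts as shown in the feature list.
features = {
    "class": ("nominal", 2),
    "Observation_number": ("nominal", 7),
    "Hospital_identification_number_for_blood_sample": ("numeric", 191),
    "Age_of_patient": ("numeric", 33),
    "Date_that_blood_sample_was_taken": ("numeric", 26),
    "ml": ("numeric", 99),
    "m2": ("numeric", 119),
    "m3": ("numeric", 140),
    "m4": ("numeric", 125),
}

n = len(features)
n_numeric = sum(1 for kind, _ in features.values() if kind == "numeric")
n_nominal = sum(1 for kind, _ in features.values() if kind == "nominal")
# Assumption: "binary" means a nominal feature with exactly two values.
n_binary = sum(1 for kind, k in features.values() if kind == "nominal" and k == 2)

pct_numeric = round(100 * n_numeric / n, 2)
pct_nominal = round(100 * n_nominal / n, 2)
pct_binary = round(100 * n_binary / n, 2)
print(pct_numeric, pct_nominal, pct_binary)  # 77.78 22.22 11.11
```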

Quartile3MutualInformation

0.02

Third quartile of mutual information between the nominal attributes and the target attribute.

CfsSubsetEval_DecisionStumpErrRate

0.18

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2ErrRate

0.17

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.00001.Kappa

0.59

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MeanAttributeEntropy

1.57

Average entropy of the attributes.

NaiveBayesAUC

0.95

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1AttributeEntropy

1.57

First quartile of entropy among attributes.

Quartile3SkewnessOfNumericAtts

4.15

Third quartile of skewness among attributes of the numeric type.

CfsSubsetEval_DecisionStumpKappa

0.59

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2Kappa

0.62

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.0001.AUC

0.78

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanKurtosisOfNumericAtts

7.39

Mean kurtosis among attributes of the numeric type.

NaiveBayesErrRate

0.11

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1KurtosisOfNumericAtts

-0.82

First quartile of kurtosis among attributes of the numeric type.

Quartile3StdDevOfNumericAtts

218.02

Third quartile of standard deviation of attributes of the numeric type.

CfsSubsetEval_NaiveBayesAUC

0.78

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3AUC

0.81

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.0001.ErrRate

0.18

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanMeansOfNumericAtts

9607.62

Mean of means among attributes of the numeric type.

MeanMutualInformation

0.02

Average mutual information between the nominal attributes and the target attribute.

NaiveBayesKappa

0.76

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes
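The landmarker properties report a kappa coefficient next to each error rate. Assuming this is Cohen's kappa, it compares the observed agreement with the agreement expected by chance from the row and column totals. The confusion matrix below is hypothetical: the page does not publish one, and the counts were chosen only so that the rows sum to 134 normals and 75 carriers.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: actual, cols: predicted)."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement: product of marginal proportions, summed over classes.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (observed - expected) / (1 - expected)


# Hypothetical counts, NOT the result of any run listed on this page.
confusion = [
    [125, 9],   # actual normals: predicted normal, predicted carrier
    [14, 61],   # actual carriers: predicted normal, predicted carrier
]

error_rate = 1 - (confusion[0][0] + confusion[1][1]) / 209
kappa = cohen_kappa(confusion)
print(round(error_rate, 2), round(kappa, 2))
```

Kappa is lower than accuracy here because a classifier that always predicts the majority class would already be right about 64% of the time.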

Quartile1MeansOfNumericAtts

32.16

First quartile of means among attributes of the numeric type.

REPTreeDepth1AUC

0.74

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_NaiveBayesErrRate

0.18

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3ErrRate

0.17

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.0001.Kappa

0.59

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanNoiseToSignalRatio

62.65

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
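The two ratio properties, EquivalentNumberOfAtts and MeanNoiseToSignalRatio, are both defined in terms of MeanMutualInformation. That value is shown rounded to 0.02, so plugging the displayed numbers into the formulas does not reproduce the displayed ratios exactly. The sketch below uses only values shown on this page and backs out the unrounded mutual information implied by the reported EquivalentNumberOfAtts:

```python
# Values as displayed on the page (each rounded to two decimals).
class_entropy = 0.94
mean_attribute_entropy = 1.57
mean_mi_rounded = 0.02
reported_equivalent_atts = 38.17

# Formulas from the property descriptions, applied to the rounded values.
equiv_rounded = class_entropy / mean_mi_rounded
noise_rounded = (mean_attribute_entropy - mean_mi_rounded) / mean_mi_rounded

# Mutual information implied by the reported EquivalentNumberOfAtts.
mi_implied = class_entropy / reported_equivalent_atts
noise_implied = (mean_attribute_entropy - mi_implied) / mi_implied

print(round(equiv_rounded, 2), round(noise_rounded, 2))
print(round(mi_implied, 4), round(noise_implied, 2))
```

With the implied mutual information of roughly 0.025, the noise-to-signal ratio comes out close to the reported 62.65, which suggests the displayed 0.02 is a rounded value rather than an inconsistency.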

NumberOfBinaryFeatures

1

Number of binary attributes.

Quartile1MutualInformation

0.02

First quartile of mutual information between the nominal attributes and the target attribute.

REPTreeDepth1ErrRate

0.23

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_NaiveBayesKappa

0.59

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3Kappa

0.62

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.001.AUC

0.78

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanNominalAttDistinctValues

4.5

Average number of distinct values among the attributes of the nominal type.

Quartile1SkewnessOfNumericAtts

-0.12

First quartile of skewness among attributes of the numeric type.

REPTreeDepth1Kappa

0.47

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_kNN1NAUC

0.78

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

StdvNominalAttDistinctValues

3.54

Standard deviation of the number of distinct values among attributes of the nominal type.
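The mean and standard deviation of distinct values can be checked against the two nominal features in the feature list (class with 2 values, Observation_number with 7). The reported 3.54 matches the sample standard deviation (n - 1 in the denominator); that the page uses the sample form is an inference from the numbers, not something it states.

```python
import statistics

# Distinct-value counts of the two nominal features (from the feature list).
distinct_values = [2, 7]

mean_distinct = statistics.mean(distinct_values)
stdev_sample = statistics.stdev(distinct_values)       # divides by n - 1
stdev_population = statistics.pstdev(distinct_values)  # divides by n

print(mean_distinct, round(stdev_sample, 2), stdev_population)  # 4.5 3.54 2.5
```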

J48.001.ErrRate

0.18

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001


24 tasks

Supervised Classification on biomed

508 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class
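For orientation, 10-fold cross-validation on 209 instances means each instance is used for testing exactly once and for training nine times. The sketch below is a plain, unstratified split with an arbitrary seed. It does not reproduce the fixed splits that the OpenML task defines, and the function name is illustrative.

```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    """Shuffle row indices and deal them into k folds of near-equal size."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


folds = k_fold_indices(209, k=10)

for test_fold in folds:
    test_set = set(test_fold)
    # Training rows are all rows outside the current test fold.
    train_rows = [i for fold in folds for i in fold if i not in test_set]
    assert len(train_rows) + len(test_fold) == 209

fold_sizes = sorted(len(fold) for fold in folds)
print(fold_sizes)  # [20, 21, 21, 21, 21, 21, 21, 21, 21, 21]
```

Because 209 is not divisible by 10, nine folds hold 21 instances and one holds 20.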

Supervised Classification on biomed

220 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: class

Supervised Classification on biomed

31 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: precision - target_feature: class

Supervised Classification on biomed

0 runs - estimation_procedure: 33% Holdout set - target_feature: class

Learning Curve on biomed

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on biomed

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on biomed

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on biomed

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on biomed

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on biomed

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on biomed

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Learning Curve on biomed

0 runs - estimation_procedure: 10-fold Learning Curve - target_feature: class

Supervised Data Stream Classification on biomed

0 runs - estimation_procedure: Interleaved Test then Train - target_feature: class

Clustering on biomed