OpenML

datatrieve

active ARFF Publicly available Visibility: public Uploaded 06-10-2014 by Joaquin Vanschoren

Author:
Source: Unknown - Date unknown
Please cite:

This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage repeatable, verifiable, refutable, and/or improvable predictive models of software engineering. If you publish material based on PROMISE data sets, please follow the acknowledgment guidelines posted on the PROMISE repository web page http://promise.site.uottawa.ca/SERepository .

1. Title/Topic: The transition of the DATATRIEVE product from version 6.0 to version 6.1

2. Sources:
-- Creators: DATATRIEVE™ project carried out at Digital Engineering Italy
-- Donor: Guenther Ruhe
-- Date: January 15, 2005

3. Past usage:
Sandro Morasca, Gunther Ruhe. A hybrid approach to analyze empirical software engineering data and its application to predict module fault-proneness in maintenance. Journal of Systems and Software, Volume 53, Issue 3 (September 2000), pages 225-237. ISSN: 0164-1212.

4. Relevant information:
The DATATRIEVE product was undergoing both adaptive maintenance (DATATRIEVE was being transferred from platform OpenVMS/VAX to platform OpenVMS/Alpha) and corrective maintenance (failures reported by customers were being fixed) at the Gallarate (Italy) site of Digital Engineering.

The DATATRIEVE product was originally developed in the BLISS language. BLISS is an expression language. It is block-structured, with exception handling facilities, coroutines, and a macro system. It was one of the first non-assembly languages for operating system implementation. Some parts were later added or rewritten in the C language. Therefore, the overall structure of DATATRIEVE is composed of C functions and BLISS subroutines. The empirical study of this data set covers only the BLISS part, which is by far the larger one.

In what follows, the term "module" refers to a BLISS module, i.e., a set of declarations and subroutines usually belonging to one file. More than 100 BLISS modules have been studied. The DATATRIEVE team wanted to better understand how the characteristics of the modules and of the transition process were correlated with code quality. The objective of the data analysis was to study whether modules could be classified as non-faulty or faulty, based on a set of measures collected on the project.

5. Number of records: 130

6. Number of attributes: 9 (8 condition attributes, 1 decision attribute)

7. Attribute information:
1. LOC6_0: number of lines of code of module m in version 6.0.
2. LOC6_1: number of lines of code of module m in version 6.1.
3. AddedLOC: number of lines of code that were added to module m in version 6.1, i.e., they were not present in module m in version 6.0.
4. DeletedLOC: number of lines of code that were deleted from module m in version 6.0, i.e., they were no longer present in module m in version 6.1.
5. DifferentBlocks: number of different blocks in module m between versions 6.0 and 6.1.
6. ModificationRate: rate of modification of module m, i.e., (AddedLOC + DeletedLOC) / (LOC6_0 + AddedLOC).
7. ModuleKnowledge: subjective variable that expresses the project team's knowledge of module m (low or high).
8. ReusedLOC: number of lines of code of module m in version 6.0 reused in module m in version 6.1.
9. Faulty6_1: its value is 0 for all modules in which no faults were found; its value is 1 for all other modules.

8. Missing attributes: none

9. Class distribution:
0: 119 = 91.54%
1: 11 = 8.46%
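
The ModificationRate definition in item 7 of the description can be sketched in a few lines of Python. This is only an illustration of the stated formula: the function name, the zero-denominator handling, and the example module are assumptions, and the stored Mod_Rate column may use a different scale or rounding.

```python
def modification_rate(loc_6_0: int, added_loc: int, deleted_loc: int) -> float:
    """Return (AddedLOC + DeletedLOC) / (LOC6_0 + AddedLOC)."""
    denominator = loc_6_0 + added_loc
    if denominator == 0:
        # Assumption: an empty module with nothing added has rate 0.0.
        return 0.0
    return (added_loc + deleted_loc) / denominator


# Hypothetical module: 200 lines in version 6.0, 30 added, 10 deleted.
example_rate = modification_rate(200, 30, 10)
print(f"{example_rate:.4f}")
```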

9 features

Faulty6_1 (target)	nominal	2 unique values 0 missing
LOC6_0	numeric	125 unique values 0 missing
LOC6_1	numeric	123 unique values 0 missing
Added_LoC	numeric	103 unique values 0 missing
Del_LoC	numeric	98 unique values 0 missing
Diff_Block	numeric	58 unique values 0 missing
Mod_Rate	numeric	47 unique values 0 missing
Mod_Know	numeric	2 unique values 0 missing
ReusedLoC	numeric	122 unique values 0 missing

107 properties

NumberOfInstances

130

Number of instances (rows) of the dataset.

NumberOfFeatures

Number of attributes (columns) of the dataset.

NumberOfClasses

Number of distinct values of the target attribute (if it is nominal).

NumberOfMissingValues

Number of missing values in the dataset.

NumberOfInstancesWithMissingValues

Number of instances with at least one value missing.

NumberOfNumericFeatures

Number of numeric attributes.

NumberOfSymbolicFeatures

Number of nominal attributes.

REPTreeDepth3Kappa

-0.01

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpKappa

-0.01

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump
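
A slightly negative kappa next to a low error rate is typical of heavily imbalanced data. The sketch below computes Cohen's kappa from a binary confusion matrix. The counts are hypothetical and chosen only to illustrate the effect; they are not taken from the OpenML landmarker runs.

```python
def cohen_kappa(tn: int, fp: int, fn: int, tp: int) -> float:
    """Cohen's kappa for a 2x2 confusion matrix."""
    n = tn + fp + fn + tp
    observed = (tn + tp) / n
    # Agreement expected by chance from the row and column totals.
    expected = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / (n * n)
    if expected == 1.0:
        return 0.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical classifier on 119 non-faulty and 11 faulty modules:
# it misses every faulty module and raises one false alarm.
kappa = cohen_kappa(tn=118, fp=1, fn=11, tp=0)
error_rate = (1 + 11) / 130
print(round(kappa, 2), round(error_rate, 2))
```

Here 12 of 130 modules are misclassified, yet agreement is no better than chance because nearly all of the accuracy comes from predicting the majority class.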

MaxMeansOfNumericAtts

902.87

Maximum of means among attributes of the numeric type.

MinMutualInformation

Minimal mutual information between the nominal attributes and the target attribute.

Quartile2SkewnessOfNumericAtts

1.71

Second quartile (Median) of skewness among attributes of the numeric type.

RandomTreeDepth1AUC

0.64

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

Dimensionality

0.07

Number of attributes divided by the number of instances.
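
As a quick check, the simple ratio properties can be reproduced from the counts shown on this page (9 features, 130 instances, 8 numeric and 1 nominal feature). The variable names are illustrative only.

```python
n_features = 9
n_instances = 130
n_numeric = 8   # all features except the target
n_nominal = 1   # Faulty6_1

dimensionality = n_features / n_instances
pct_numeric = 100 * n_numeric / n_features
pct_nominal = 100 * n_nominal / n_features

print(round(dimensionality, 2), round(pct_numeric, 2), round(pct_nominal, 2))
```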

MaxMutualInformation

Maximum mutual information between the nominal attributes and the target attribute.

MinNominalAttDistinctValues

The minimal number of distinct values among attributes of the nominal type.

PercentageOfBinaryFeatures

11.11

Percentage of binary attributes.

Quartile2StdDevOfNumericAtts

114.79

Second quartile (Median) of standard deviation of attributes of the numeric type.

RandomTreeDepth1ErrRate

0.13

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

EquivalentNumberOfAtts

Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
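
The ratio described above can be written as a small helper. The page does not show a value for MeanMutualInformation, so the 0.05 used below is a made-up input for illustration; the function name is also an assumption.

```python
def equivalent_number_of_atts(class_entropy: float,
                              mean_mutual_information: float) -> float:
    """ClassEntropy divided by MeanMutualInformation."""
    if mean_mutual_information == 0:
        # No attribute carries information about the class.
        return float("inf")
    return class_entropy / mean_mutual_information


# ClassEntropy 0.42 is from this page; 0.05 is a hypothetical value.
example = equivalent_number_of_atts(0.42, 0.05)
```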

MaxNominalAttDistinctValues

The maximum number of distinct values among attributes of the nominal type.

MinSkewnessOfNumericAtts

0.16

Minimum skewness among attributes of the numeric type.

PercentageOfInstancesWithMissingValues

Percentage of instances having missing values.

Quartile3AttributeEntropy

Third quartile of entropy among attributes.

RandomTreeDepth1Kappa

0.25

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1

J48.00001.AUC

0.59

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MaxSkewnessOfNumericAtts

2.06

Maximum skewness among attributes of the numeric type.

MinStdDevOfNumericAtts

0.5

Minimum standard deviation of attributes of the numeric type.

PercentageOfMissingValues

Percentage of missing values.

Quartile3KurtosisOfNumericAtts

6.18

Third quartile of kurtosis among attributes of the numeric type.

AutoCorrelation

0.95

Average class difference between consecutive instances.

RandomTreeDepth2AUC

0.64

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.00001.ErrRate

0.08

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MaxStdDevOfNumericAtts

838.74

Maximum standard deviation of attributes of the numeric type.

MinorityClassPercentage

8.46

Percentage of instances belonging to the least frequent class.

PercentageOfNumericFeatures

88.89

Percentage of numeric attributes.

Quartile3MeansOfNumericAtts

867.5

Third quartile of means among attributes of the numeric type.

CfsSubsetEval_DecisionStumpAUC

0.59

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2ErrRate

0.13

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.00001.Kappa

0.13

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001

MeanAttributeEntropy

Average entropy of the attributes.

MinorityClassSize

Number of instances belonging to the least frequent class.

PercentageOfSymbolicFeatures

11.11

Percentage of nominal attributes.

Quartile3MutualInformation

Third quartile of mutual information between the nominal attributes and the target attribute.

CfsSubsetEval_DecisionStumpErrRate

0.08

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth2Kappa

0.25

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2

J48.0001.AUC

0.59

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanKurtosisOfNumericAtts

3.49

Mean kurtosis among attributes of the numeric type.

NaiveBayesAUC

0.59

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1AttributeEntropy

First quartile of entropy among attributes.

Quartile3SkewnessOfNumericAtts

1.94

Third quartile of skewness among attributes of the numeric type.

CfsSubsetEval_DecisionStumpKappa

0.13

Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3AUC

0.64

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.0001.ErrRate

0.08

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanMeansOfNumericAtts

357.79

Mean of means among attributes of the numeric type.

NaiveBayesErrRate

0.2

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1KurtosisOfNumericAtts

1.57

First quartile of kurtosis among attributes of the numeric type.

Quartile3StdDevOfNumericAtts

815.84

Third quartile of standard deviation of attributes of the numeric type.

CfsSubsetEval_NaiveBayesAUC

0.59

Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3ErrRate

0.13

Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.0001.Kappa

0.13

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001

MeanMutualInformation

Average mutual information between the nominal attributes and the target attribute.

NaiveBayesKappa

0.14

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes

Quartile1MeansOfNumericAtts

24.7

First quartile of means among attributes of the numeric type.

REPTreeDepth1AUC

0.52

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_NaiveBayesErrRate

0.08

Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

RandomTreeDepth3Kappa

0.25

Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3

J48.001.AUC

0.59

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanNoiseToSignalRatio

An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
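
To make the ratio concrete, the sketch below computes attribute entropy and mutual information from a contingency table and then applies the formula. The table is hypothetical (a two-valued attribute against the 119/11 class split), and with a single attribute the means reduce to that attribute's own values.

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a list of counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def mutual_information(table):
    """table[i][j] = instances with attribute value i and class j."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    joint = [cell for row in table for cell in row]
    return entropy(row_totals) + entropy(col_totals) - entropy(joint)


def noise_to_signal(mean_attribute_entropy, mean_mutual_information):
    """(MeanAttributeEntropy - MeanMutualInformation) / MeanMutualInformation."""
    return (mean_attribute_entropy - mean_mutual_information) / mean_mutual_information


# Hypothetical counts: rows are attribute values, columns are classes 0 and 1.
table = [[70, 3], [49, 8]]
attribute_entropy = entropy([sum(row) for row in table])
mi = mutual_information(table)
ratio = noise_to_signal(attribute_entropy, mi)
```

A large ratio means most of the attribute's entropy is unrelated to the class.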

NumberOfBinaryFeatures

Number of binary attributes.

Quartile1MutualInformation

First quartile of mutual information between the nominal attributes and the target attribute.

REPTreeDepth1ErrRate

0.09

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_NaiveBayesKappa

0.13

Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

CfsSubsetEval_kNN1NAUC

0.59

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

StdvNominalAttDistinctValues

Standard deviation of the number of distinct values among attributes of the nominal type.

J48.001.ErrRate

0.08

Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanNominalAttDistinctValues

Average number of distinct values among the attributes of the nominal type.

Quartile1SkewnessOfNumericAtts

1.36

First quartile of skewness among attributes of the numeric type.

REPTreeDepth1Kappa

-0.01

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1

CfsSubsetEval_kNN1NErrRate

0.08

Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

kNN1NAUC

0.57

Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk

J48.001.Kappa

0.13

Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001

MeanSkewnessOfNumericAtts

1.53

Mean skewness among attributes of the numeric type.

Quartile1StdDevOfNumericAtts

15.33

First quartile of standard deviation of attributes of the numeric type.

REPTreeDepth2AUC

0.52

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2

CfsSubsetEval_kNN1NKappa

0.13

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W

kNN1NErrRate

0.1

Error rate achieved by the landmarker weka.classifiers.lazy.IBk

MajorityClassPercentage

91.54

Percentage of instances belonging to the most frequent class.

MeanStdDevOfNumericAtts

336.91

Mean standard deviation of attributes of the numeric type.

Quartile2AttributeEntropy

Second quartile (Median) of entropy among attributes.

REPTreeDepth2ErrRate

0.09

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2

ClassEntropy

0.42

Entropy of the target attribute values.
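
The class entropy follows directly from the class distribution in the description (119 non-faulty, 11 faulty modules). A minimal sketch, assuming base-2 logarithms:

```python
from math import log2


def class_entropy(class_counts):
    """Shannon entropy in bits of the class distribution."""
    total = sum(class_counts)
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)


datatrieve_entropy = class_entropy([119, 11])
print(round(datatrieve_entropy, 2))
```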

kNN1NKappa

0.19

Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk

MajorityClassSize

119

Number of instances belonging to the most frequent class.

MinAttributeEntropy

Minimal entropy among attributes.

Quartile2KurtosisOfNumericAtts

3.67

Second quartile (Median) of kurtosis among attributes of the numeric type.

REPTreeDepth2Kappa

-0.01

Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2

REPTreeDepth3AUC

0.52

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpAUC

0.65

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxAttributeEntropy

Maximum entropy among attributes.

MinKurtosisOfNumericAtts

-2.01

Minimum kurtosis among attributes of the numeric type.

Quartile2MeansOfNumericAtts

114.05

Second quartile (Median) of means among attributes of the numeric type.

REPTreeDepth3ErrRate

0.09

Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3

DecisionStumpErrRate

0.09

Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump

MaxKurtosisOfNumericAtts

6.88

Maximum kurtosis among attributes of the numeric type.

MinMeansOfNumericAtts

1.46

Minimum of means among attributes of the numeric type.

Quartile2MutualInformation

Second quartile (Median) of mutual information between the nominal attributes and the target attribute.

14 tasks

Supervised Classification on datatrieve

699 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: Faulty6_1
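
With only 11 faulty modules, how the folds are drawn matters. The sketch below deals instances into 10 folds class by class so that every fold receives at least one faulty module. That the OpenML task uses stratified folds is an assumption here, and the function is an illustration rather than the splitting code OpenML runs.

```python
import random


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign instance indices to folds, dealing each class round-robin."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    folds = [[] for _ in range(n_folds)]
    next_fold = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for index in indices:
            folds[next_fold % n_folds].append(index)
            next_fold += 1
    return folds


# Class distribution from the description: 119 non-faulty, 11 faulty.
labels = [0] * 119 + [1] * 11
folds = stratified_folds(labels)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

Each test fold then holds 13 modules, of which one or two are faulty, so per-fold accuracy estimates are coarse and a single misclassified faulty module shifts them noticeably.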

Supervised Classification on datatrieve

209 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: Faulty6_1

Supervised Data Stream Classification on datatrieve

0 runs - estimation_procedure: Interleaved Test then Train - target_feature: Faulty6_1

Clustering on datatrieve