{ "data_id": "1075", "name": "datatrieve", "exact_name": "datatrieve", "version": 1, "version_label": null, "description": "**Author**: \n**Source**: Unknown - Date unknown \n**Please cite**: \n\n%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%\nThis is a PROMISE Software Engineering Repository data set made publicly\navailable in order to encourage repeatable, verifiable, refutable, and\/or\nimprovable predictive models of software engineering.\n\nIf you publish material based on PROMISE data sets then, please\nfollow the acknowledgment guidelines posted on the PROMISE repository\nweb page http:\/\/promise.site.uottawa.ca\/SERepository .\n%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%\n1. Title\/Topic: The transition of the DATATRIEVE product from version 6.0 to\nversion 6.1\n\n2. Sources:\n-- Creators: DATATRIEVE(TM) project carried out at Digital Engineering Italy\n-- Donor: Guenther Ruhe\n-- Date: January 15, 2005\n3. Past usage:\n\nA hybrid approach to analyze empirical software engineering data\nand its application to predict module fault-proneness in maintenance\nSource: Journal of Systems and Software\nVolume 53, Issue 3 (September 2000)\nPages: 225 - 237\nYear of Publication: 2000\nISSN: 0164-1212\nAuthors:\nSandro Morasca\nGuenther Ruhe\n4. Relevant information:\n\nThe DATATRIEVE product was undergoing both adaptive (DATATRIEVE was being transferred\nfrom platform OpenVMS\/VAX to platform OpenVMS\/Alpha) and corrective maintenance\n(failures reported from customers were being fixed) at the Gallarate (Italy)\nsite of Digital Engineering.\n\nThe DATATRIEVE product was originally developed in the BLISS language. BLISS is an\nexpression language. It is block-structured, with exception handling facilities, coroutines,\nand a macro system. It was one of the first non-assembly languages for operating system\nimplementation. Some parts were later added or rewritten in the C language. 
Therefore, the\noverall structure of DATATRIEVE is composed of C functions and BLISS subroutines.\n\nThe empirical study of this data set reports only the BLISS part, by far the larger one.\nIn what follows, we will use the term \"module\" to refer to a BLISS module, i.e., a set of\ndeclarations and subroutines usually belonging to one file. More than 100 BLISS modules\nhave been studied. It was important to the DATATRIEVE team to better understand how the\ncharacteristics of the modules and transition process were correlated with the code quality.\n\nThe objective of the data analysis was to study whether it was possible to classify modules as\nnon-faulty or faulty, based on a set of measures collected on the project.\n\n5. Number of records: 130\n6. Number of attributes: 9\n8 condition attributes\n1 decision attribute\n7. Attribute Information:\n\n1. LOC6_0: number of lines of code of module m in version 6.0.\n2. LOC6_1: number of lines of code of module m in version 6.1.\n3. AddedLOC: number of lines of code that were added to module m in version 6.1, i.e., they\nwere not present in module m in version 6.0.\n4. DeletedLOC: number of lines of code that were deleted from module m in version 6.0, i.e.,\nthey were no longer present in module m in version 6.1.\n5. DifferentBlocks: number of different blocks in module m between versions 6.0 and 6.1.\n6. ModificationRate: rate of modification of module m, i.e.,\n(AddedLOC + DeletedLOC) \/ (LOC6_0 + AddedLOC).\n7. ModuleKnowledge: subjective variable that expresses the project team's knowledge of\nmodule m (low or high).\n8. ReusedLOC: number of lines of code of module m in version 6.0 reused in module m in\nversion 6.1.\n9. Faulty6_1: its value is 0 for all those modules in which no faults were found;\nits value is 1 for all other modules.\n\n8. Missing attributes: none\n\n9. 
Class Distribution:\n0: 119 = 91.54%\n1: 11 = 8.46%\n%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%", "format": "ARFF", "uploader": "Joaquin Vanschoren", "uploader_id": 2, "visibility": "public", "creator": "Guenther Ruhe", "contributor": null, "date": "2014-10-06 23:57:57", "update_comment": null, "last_update": "2014-10-06 23:57:57", "licence": "Public", "status": "active", "error_message": null, "url": "https:\/\/www.openml.org\/data\/download\/53958\/datatrieve.arff", "default_target_attribute": "Faulty6_1", "row_id_attribute": null, "ignore_attribute": null, "runs": 908, "suggest": { "input": [ "datatrieve", "%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage repeatable, verifiable, refutable, and\/or improvable predictive models of software engineering. If you publish material based on PROMISE data sets then, please follow the acknowledgment guidelines posted on the PROMISE repository web page http:\/\/promise.site.uottawa.ca\/SERepository . 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% " ], "weight": 5 }, "qualities": { "NumberOfInstances": 130, "NumberOfFeatures": 9, "NumberOfClasses": 2, "NumberOfMissingValues": 0, "NumberOfInstancesWithMissingValues": 0, "NumberOfNumericFeatures": 8, "NumberOfSymbolicFeatures": 1, "REPTreeDepth3Kappa": -0.014304291287385744, "DecisionStumpKappa": -0.014304291287385744, "MaxMeansOfNumericAtts": 902.8692307692307, "MinMutualInformation": null, "Quartile2SkewnessOfNumericAtts": 1.7118631118183498, "RandomTreeDepth1AUC": 0.6398013750954927, "Dimensionality": 0.06923076923076923, "MaxMutualInformation": null, "MinNominalAttDistinctValues": 2, "PercentageOfBinaryFeatures": 11.11111111111111, "Quartile2StdDevOfNumericAtts": 114.79042647876088, "RandomTreeDepth1ErrRate": 0.13076923076923078, "EquivalentNumberOfAtts": null, "MaxNominalAttDistinctValues": 2, "MinSkewnessOfNumericAtts": 0.15611042873498557, "PercentageOfInstancesWithMissingValues": 0, "Quartile3AttributeEntropy": null, "RandomTreeDepth1Kappa": 0.2488103331067303, "J48.00001.AUC": 0.5863254392666157, "MaxSkewnessOfNumericAtts": 2.0574514255877006, "MinStdDevOfNumericAtts": 0.5004470273579541, "PercentageOfMissingValues": 0, "Quartile3KurtosisOfNumericAtts": 6.181717710007378, "AutoCorrelation": 0.9457364341085271, "RandomTreeDepth2AUC": 0.6398013750954927, "J48.00001.ErrRate": 0.08461538461538462, "MaxStdDevOfNumericAtts": 838.738887110489, "MinorityClassPercentage": 8.461538461538462, "PercentageOfNumericFeatures": 88.88888888888889, "Quartile3MeansOfNumericAtts": 867.501923076923, "CfsSubsetEval_DecisionStumpAUC": 0.5863254392666157, "RandomTreeDepth2ErrRate": 0.13076923076923078, "J48.00001.Kappa": 0.13122721749696156, "MeanAttributeEntropy": null, "MinorityClassSize": 11, "PercentageOfSymbolicFeatures": 11.11111111111111, "Quartile3MutualInformation": null, "CfsSubsetEval_DecisionStumpErrRate": 0.08461538461538462, "RandomTreeDepth2Kappa": 0.2488103331067303, "J48.0001.AUC": 0.5863254392666157, 
"MeanKurtosisOfNumericAtts": 3.4910537240569135, "NaiveBayesAUC": 0.5897631779984722, "Quartile1AttributeEntropy": null, "Quartile3SkewnessOfNumericAtts": 1.9448110929487785, "CfsSubsetEval_DecisionStumpKappa": 0.13122721749696156, "RandomTreeDepth3AUC": 0.6398013750954927, "J48.0001.ErrRate": 0.08461538461538462, "MeanMeansOfNumericAtts": 357.7903846153846, "NaiveBayesErrRate": 0.2, "Quartile1KurtosisOfNumericAtts": 1.5664830682709354, "Quartile3StdDevOfNumericAtts": 815.8438986322116, "CfsSubsetEval_NaiveBayesAUC": 0.5863254392666157, "RandomTreeDepth3ErrRate": 0.13076923076923078, "J48.0001.Kappa": 0.13122721749696156, "MeanMutualInformation": null, "NaiveBayesKappa": 0.13643331630046024, "Quartile1MeansOfNumericAtts": 24.69615384615384, "REPTreeDepth1AUC": 0.5183346065699007, "CfsSubsetEval_NaiveBayesErrRate": 0.08461538461538462, "RandomTreeDepth3Kappa": 0.2488103331067303, "J48.001.AUC": 0.5863254392666157, "MeanNoiseToSignalRatio": null, "NumberOfBinaryFeatures": 1, "Quartile1MutualInformation": null, "REPTreeDepth1ErrRate": 0.09230769230769231, "CfsSubsetEval_NaiveBayesKappa": 0.13122721749696156, "CfsSubsetEval_kNN1NAUC": 0.5863254392666157, "StdvNominalAttDistinctValues": 0, "J48.001.ErrRate": 0.08461538461538462, "MeanNominalAttDistinctValues": 2, "Quartile1SkewnessOfNumericAtts": 1.3596807337002856, "REPTreeDepth1Kappa": -0.014304291287385744, "CfsSubsetEval_kNN1NErrRate": 0.08461538461538462, "kNN1NAUC": 0.5741023682200151, "J48.001.Kappa": 0.13122721749696156, "MeanSkewnessOfNumericAtts": 1.5292414236085814, "Quartile1StdDevOfNumericAtts": 15.328184990880494, "REPTreeDepth2AUC": 0.5183346065699007, "CfsSubsetEval_kNN1NKappa": 0.13122721749696156, "kNN1NErrRate": 0.1, "MajorityClassPercentage": 91.53846153846153, "MeanStdDevOfNumericAtts": 336.91079495271276, "Quartile2AttributeEntropy": null, "REPTreeDepth2ErrRate": 0.09230769230769231, "ClassEntropy": 0.41823656965418116, "kNN1NKappa": 0.18671799807507244, "MajorityClassSize": 119, 
"MinAttributeEntropy": null, "Quartile2KurtosisOfNumericAtts": 3.6713520483798057, "REPTreeDepth2Kappa": -0.014304291287385744, "REPTreeDepth3AUC": 0.5183346065699007, "DecisionStumpAUC": 0.6524064171122994, "MaxAttributeEntropy": null, "MinKurtosisOfNumericAtts": -2.006744762373452, "Quartile2MeansOfNumericAtts": 114.04615384615383, "REPTreeDepth3ErrRate": 0.09230769230769231, "DecisionStumpErrRate": 0.09230769230769231, "MaxKurtosisOfNumericAtts": 6.884163954334905, "MinMeansOfNumericAtts": 1.461538461538462, "Quartile2MutualInformation": null }, "tags": [ { "uploader": "38960", "tag": "Chemistry" }, { "uploader": "38960", "tag": "Life Science" }, { "uploader": "1", "tag": "mythbusting_1" }, { "uploader": "2", "tag": "study_1" }, { "uploader": "3886", "tag": "study_123" }, { "uploader": "939", "tag": "study_15" }, { "uploader": "939", "tag": "study_20" }, { "uploader": "1", "tag": "study_41" }, { "uploader": "64", "tag": "study_52" }, { "uploader": "64", "tag": "study_7" }, { "uploader": "4209", "tag": "study_88" } ], "features": [ { "name": "Faulty6_1", "index": "8", "type": "nominal", "distinct": "2", "missing": "0", "target": "1", "distr": [ [ "0", "1" ], [ [ "119", "0" ], [ "0", "11" ] ] ] }, { "name": "LOC6_0", "index": "0", "type": "numeric", "distinct": "125", "missing": "0", "min": "15", "max": "5408", "mean": "895", "stdev": "836" }, { "name": "LOC6_1", "index": "1", "type": "numeric", "distinct": "123", "missing": "0", "min": "15", "max": "5336", "mean": "903", "stdev": "839" }, { "name": "Added_LoC", "index": "2", "type": "numeric", "distinct": "103", "missing": "0", "min": "2", "max": "524", "mean": "118", "stdev": "119" }, { "name": "Del_LoC", "index": "3", "type": "numeric", "distinct": "98", "missing": "0", "min": "2", "max": "594", "mean": "110", "stdev": "111" }, { "name": "Diff_Block", "index": "4", "type": "numeric", "distinct": "58", "missing": "0", "min": "1", "max": "96", "mean": "26", "stdev": "23" }, { "name": "Mod_Rate", "index": "5", 
"type": "numeric", "distinct": "47", "missing": "0", "min": "0", "max": "78", "mean": "24", "stdev": "13" }, { "name": "Mod_Know", "index": "6", "type": "numeric", "distinct": "2", "missing": "0", "min": "1", "max": "2", "mean": "1", "stdev": "1" }, { "name": "ReusedLoC", "index": "7", "type": "numeric", "distinct": "122", "missing": "0", "min": "12", "max": "4925", "mean": "785", "stdev": "754" } ], "nr_of_issues": 0, "nr_of_downvotes": 0, "nr_of_likes": 0, "nr_of_downloads": 0, "total_downloads": 0, "reach": 0, "reuse": 0, "impact_of_reuse": 0, "reach_of_reuse": 0, "impact": 0 }