{ "data_id": "1076", "name": "nasa_numeric", "exact_name": "nasa_numeric", "version": 1, "version_label": null, "description": "**Author**: \n**Source**: Unknown - Date unknown \n**Please cite**: \n\n%-*- text -*-\n%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%\nThis is a PROMISE Software Engineering Repository data set made publicly\navailable in order to encourage repeatable, verifiable, refutable, and\/or\nimprovable predictive models of software engineering.\n\nIf you publish material based on PROMISE data sets then, please\nfollow the acknowledgment guidelines posted on the PROMISE repository\nweb page http:\/\/promise.site.uottawa.ca\/SERepository .\n%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%\n1. Title\/Topic: COCOMO NASA 2 \/ Software cost estimation\n2. Sources:\n\n-- 93 NASA projects from different centers\nfor projects from the following years:\n\nn year\n--- ----\n1 1971\n1 1974\n2 1975\n2 1976\n10 1977\n4 1978\n19 1979\n11 1980\n13 1982\n7 1983\n7 1984\n6 1985\n8 1986\n2 1987\n\nCollected by\nJairus Hihn, JPL, NASA, Manager SQIP Measurement &\nBenchmarking Element\nPhone (818) 354-1248 (Jairus.M.Hihn@jpl.nasa.gov)\n\n-- Donor: Tim Menzies (tim@menzies.us)\n\n-- Date: Feb 8 2006\n\n3. Past Usage\nNone with this specific data set. But for older work on similar data, see:\n\n1. \"Validation Methods for Calibrating Software Effort\nModels\", T. Menzies and D. Port and Z. Chen and\nJ. Hihn and S. 
Stukes, Proceedings ICSE 2005,\nhttp:\/\/menzies.us\/pdf\/04coconut.pdf\n-- Results\n-- Given background knowledge on 60 prior projects,\na new cost model can be tuned to local data using\nas few as 20 new projects.\n-- A very simple calibration method (COCONUT) can\nachieve PRED(30)=7% or PRED(20)=50% (after 20 projects).\nThese are results seen in 30 repeats of an incremental\ncross-validation study.\n-- Two cost models are compared; one based on just\nlines of code and one using over a dozen \"effort\nmultipliers\". Just using lines of code loses 10 to 20\nPRED(N) points.\n\n3.1 Additional Usage:\n2. \"Feature Subset Selection Can Improve Software Cost Estimation Accuracy\"\nZhihao Chen, Tim Menzies, Dan Port and Barry Boehm\nProceedings PROMISE Workshop 2005,\nhttp:\/\/www.etechstyle.com\/chen\/papers\/05fsscocomo.pdf\nP02, P03, P04 are used in this paper.\n-- Results\n-- To the best of our knowledge, this is the first report\nof applying feature subset selection (FSS)\nto software effort data.\n\n-- FSS can dramatically improve cost estimation.\n\n-- T-tests applied to the results show that,\nin all of our data sets, removing\nattributes improves performance without increasing the\nvariance in model behavior.\n\n4. Relevant Information\n\nThe COCOMO software cost model measures effort in calendar months\nof 152 hours (and includes development and management hours).\nCOCOMO assumes that the effort grows more than linearly with\nsoftware size; i.e. months=a* KSLOC^b*c. Here, \"a\" and \"b\" are\ndomain-specific parameters; \"KSLOC\" is estimated directly or\ncomputed from a function point analysis; and \"c\" is the product\nof over a dozen \"effort multipliers\". 
I.e.\n\nmonths=a*(KSLOC^b)*(EM1* EM2 * EM3 * ...)\n\nThe effort multipliers are as follows:\n\nincrease | acap | analysts capability\nthese to | pcap | programmers capability\ndecrease | aexp | application experience\neffort | modp | modern programming practices\n| tool | use of software tools\n| vexp | virtual machine experience\n| lexp | language experience\n----------+------+---------------------------\n| sced | schedule constraint\n----------+------+---------------------------\ndecrease | stor | main memory constraint\nthese to | data | data base size\ndecrease | time | time constraint for cpu\neffort | turn | turnaround time\n| virt | machine volatility\n| cplx | process complexity\n| rely | required software reliability\n\nIn COCOMO I, the exponent on KSLOC was a single value ranging from\n1.05 to 1.2. In COCOMO II, the exponent \"b\" was divided into a\nconstant, plus the sum of five \"scale factors\" which modeled\nissues such as \"have we built this kind of system before?\". The\nCOCOMO II effort multipliers are similar but COCOMO II dropped one\nof the effort multiplier parameters; renamed some others; and\nadded a few more (for \"required level of reuse\", \"multiple-site\ndevelopment\", and \"schedule pressure\").\n\nThe effort multipliers fall into three groups: those that are\npositively correlated to more effort; those that are\nnegatively correlated to more effort; and a third group\ncontaining just schedule information. In COCOMO I, \"sced\" has a\nU-shaped correlation to effort; i.e. giving programmers either\ntoo much or too little time to develop a system can be\ndetrimental.\n\nThe numeric values of the effort multipliers are:\n\nvery\t\t\t\tvery\textra\tproductivity\nlow\tlow\tnominal\thigh\thigh\thigh\trange\n---------------------------------------------------------------------\nacap\t1.46 \t1.19 \t1.00 \t0.86 \t0.71 \t\t2.06\npcap\t1.42 \t1.17 \t1.00 \t0.86 \t0.70 \t\t1.67\naexp \t1.29 \t1.13 \t1.00 \t0.91 \t0.82 \t\t1.57\nmodp \t1.24 
\t1.10 \t1.00 \t0.91 \t0.82 \t\t1.34\ntool \t1.24 \t1.10 \t1.00 \t0.91 \t0.83 \t\t1.49\nvexp \t1.21 \t1.10 \t1.00 \t0.90 \t \t\t1.34\nlexp \t1.14 \t1.07 \t1.00 \t0.95 \t \t\t1.20\nsced \t1.23 \t1.08 \t1.00 \t1.04 \t1.10 \t \te\nstor \t \t \t1.00 \t1.06 \t1.21 \t1.56\t-1.21\ndata \t \t 0.94 \t1.00 \t1.08 \t1.16\t\t-1.23\ntime \t \t \t1.00 \t1.11 \t1.30 \t1.66\t-1.30\nturn \t \t0.87 \t1.00 \t1.07 \t1.15 \t\t-1.32\nvirt \t \t0.87 \t1.00 \t1.15 \t1.30 \t\t-1.49\nrely \t0.75\t 0.88\t 1.00 \t 1.15 \t 1.40\t\t-1.87\ncplx \t0.70 \t0.85 \t1.00 \t1.15 \t1.30 \t1.65\t-2.36\n\nThese were learnt by Barry Boehm after a regression analysis of the\nprojects in the COCOMO I data set.\n@Book{boehm81,\nAuthor =\t \"B. Boehm\",\nTitle =\t \"Software Engineering Economics\",\nPublisher =\t \"Prentice Hall\",\nYear =\t 1981}\n\nThe last column of the above table shows max(EM)\/min(EM) and shows\nthe overall effect of a single effort multiplier. For example,\nincreasing \"acap\" (analyst capability) from very low to very\nhigh will most decrease effort while increasing \"rely\"\n(required reliability) from very low to very high will most\nincrease effort.\n\nThere is much more to COCOMO than the above description. The\nCOCOMO II text is over 500 pages long and offers\nall the details needed to implement data capture and analysis of\nCOCOMO in an industrial context.\n@Book{boehm00b,\nAuthor = \"Barry Boehm and Ellis Horowitz and Ray Madachy and\nDonald Reifer and Bradford K. Clark and Bert Steece\nand A. Winsor Brown and Sunita Chulani and Chris Abts\",\nTitle = \"Software Cost Estimation with Cocomo II\",\nPublisher = \"Prentice Hall\",\nYear = 2000,\nisbn = \"0130266922\"}\n\nIncluded in that book is not just an effort model but other\nmodels for schedule, risk, use of COTS, etc. However, most\n(possibly all) of the validation work on COCOMO has focused on the effort\nmodel.\n@article{chulani99,\nauthor =\t \"S. Chulani and B. Boehm and B. 
Steece\",\ntitle =\t \"Bayesian Analysis of Empirical Software Engineering\nCost Models\",\njournal =\t \"IEEE Transactions on Software Engineering\",\nvolume =\t 25,\nnumber =\t 4,\nmonth =\t \"July\/August\",\nyear =\t \"1999\"}\n\nThe value of an effort predictor can be reported many ways\nincluding MMRE and PRED(N). MMRE and PRED are computed from the\nrelative error, or RE, which is the relative size of the\ndifference between the actual and estimated value:\n\nRE.i = (estimate.i - actual.i) \/ (actual.i)\n\nGiven a data set of size \"D\", a \"Train\"ing set of size\n\"(X=|Train|) <= D\", and a \"Test\" set of size \"T=D-|Train|\", then\nthe mean magnitude of the relative error, or MMRE, is the\npercentage of the absolute values of the relative errors,\naveraged over the \"T\" items in the \"Test\" set; i.e.\n\nMRE.i = abs(RE.i)\nMMRE = 100\/T*( MRE.1 + MRE.2 + ... + MRE.T)\n\nPRED(N) reports the percentage of estimates that were\nwithin N% of the actual values:\n\ncount=0\nfor(i=1;i<=T;i++) do if (MRE.i <= N\/100) then count++ fi done\nPRED(N) = 100\/T * count\n\nFor example, PRED(30)=50% means that half the estimates are\nwithin 30% of the actual. Shepperd and Schofield comment that\n\"MMRE is fairly conservative with a bias against overestimates\nwhile Pred(25) will identify those prediction systems that are\ngenerally accurate but occasionally wildly inaccurate\".\n@article{shepperd97,\nauthor=\"M. Shepperd and C. Schofield\",\ntitle=\"Estimating Software Project Effort Using Analogies\",\njournal=\"IEEE Transactions on Software Engineering\",\nvolume=23,\nnumber=12,\nmonth=\"November\",\nyear=1997,\nnote=\"Available from\n\\url{http:\/\/www.utdallas.edu\/~rbanker\/SE_XII.pdf}\"}\n\n5. Number of instances: 93\n6. 
Number of attributes: 24\n- 15 standard COCOMO-I discrete attributes in the range Very_Low to\nExtra_High\n- 7 others describing the project;\n- one lines of code measure,\n- one goal field being the actual effort in person months.\n7. Attribute information:\nUnique id\nproject name\ncategory of application\nflight or ground system?\nwhich nasa center?\nyear of development\ndevelopment mode\ncocomo attributes: described above in section 4\nequivalent physical 1000 lines of source code\ndevelopment effort in months (one month =152 hours and includes development and management hours)\n8. Missing attributes: none\n9. Distribution of class values\n\n# development months\n== ==================\n46 0 - 499\n28 500 - 999\n7 1000 - 1499\n3 1500 - 1999\n3 2000 - 2499\n3 2500 - 2999\n0 3000 - 3999\n1 4000 - 4499\n1 4500 - 4999\n0 5000 - 7999\n1 8000", "format": "ARFF", "uploader": "Joaquin Vanschoren", "uploader_id": 2, "visibility": "public", "creator": "Jairus Hihn", "contributor": null, "date": "2014-10-06 23:57:58", "update_comment": null, "last_update": "2014-10-06 23:57:58", "licence": "Public", "status": "active", "error_message": null, "url": "https:\/\/www.openml.org\/data\/download\/53959\/nasa_numeric.arff", "kaggle_url": null, "default_target_attribute": "act_effort", "row_id_attribute": null, "ignore_attribute": null, "runs": 2, "suggest": { "input": [ "nasa_numeric", "%-*- text -*- %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% This is a PROMISE Software Engineering Repository data set made publicly available in order to encourage repeatable, verifiable, refutable, and\/or improvable predictive models of software engineering. If you publish material based on PROMISE data sets then, please follow the acknowledgment guidelines posted on the PROMISE repository web page http:\/\/promise.site.uottawa.ca\/SERepository . 
%%%%%%%%%%%%%%%%%%%%%% " ], "weight": 5 }, "qualities": { "NumberOfInstances": 93, "NumberOfFeatures": 24, "NumberOfClasses": 0, "NumberOfMissingValues": 0, "NumberOfInstancesWithMissingValues": 0, "NumberOfNumericFeatures": 4, "NumberOfSymbolicFeatures": 20, "Quartile3MutualInformation": null, "CfsSubsetEval_DecisionStumpErrRate": null, "RandomTreeDepth2ErrRate": null, "J48.00001.Kappa": null, "MeanAttributeEntropy": null, "MinorityClassSize": null, "PercentageOfSymbolicFeatures": 83.33333333333334, "Quartile3SkewnessOfNumericAtts": 4.122936158734755, "CfsSubsetEval_DecisionStumpKappa": null, "RandomTreeDepth2Kappa": null, "J48.0001.AUC": null, "MeanKurtosisOfNumericAtts": 10.54885486009431, "NaiveBayesAUC": null, "Quartile1AttributeEntropy": null, "Quartile3StdDevOfNumericAtts": 885.3455356987022, "CfsSubsetEval_NaiveBayesAUC": null, "RandomTreeDepth3AUC": null, "J48.0001.ErrRate": null, "MeanMeansOfNumericAtts": 686.7536290322581, "NaiveBayesErrRate": null, "Quartile1KurtosisOfNumericAtts": -0.8813415072001536, "REPTreeDepth1AUC": null, "CfsSubsetEval_NaiveBayesErrRate": null, "RandomTreeDepth3ErrRate": null, "J48.0001.Kappa": null, "MeanMutualInformation": null, "NaiveBayesKappa": null, "Quartile1MeansOfNumericAtts": 59.32002688172042, "REPTreeDepth1ErrRate": null, "CfsSubsetEval_NaiveBayesKappa": null, "RandomTreeDepth3Kappa": null, "J48.001.AUC": null, "MeanNoiseToSignalRatio": null, "NumberOfBinaryFeatures": 1, "Quartile1MutualInformation": null, "REPTreeDepth1Kappa": null, "CfsSubsetEval_kNN1NAUC": null, "StdvNominalAttDistinctValues": 2.543826376067939, "J48.001.ErrRate": null, "MeanNominalAttDistinctValues": 4.55, "Quartile1SkewnessOfNumericAtts": -0.041609748757570134, "REPTreeDepth2AUC": null, "CfsSubsetEval_kNN1NErrRate": null, "kNN1NAUC": null, "J48.001.Kappa": null, "MeanSkewnessOfNumericAtts": 2.0027255874222396, "Quartile1StdDevOfNumericAtts": 9.55872840682517, "REPTreeDepth2ErrRate": null, "CfsSubsetEval_kNN1NKappa": null, "kNN1NErrRate": null, 
"MajorityClassPercentage": null, "MeanStdDevOfNumericAtts": 325.27272937604596, "Quartile2AttributeEntropy": null, "Quartile2KurtosisOfNumericAtts": 10.071694916482553, "REPTreeDepth2Kappa": null, "ClassEntropy": null, "kNN1NKappa": null, "MajorityClassSize": null, "MinAttributeEntropy": null, "Quartile2MeansOfNumericAtts": 359.21693548387117, "REPTreeDepth3AUC": null, "DecisionStumpAUC": null, "MaxAttributeEntropy": null, "MinKurtosisOfNumericAtts": -1.0436312499559004, "Quartile2MutualInformation": null, "REPTreeDepth3ErrRate": null, "DecisionStumpErrRate": null, "MaxKurtosisOfNumericAtts": 23.09566085736804, "MinMeansOfNumericAtts": 47.75268817204301, "Quartile2SkewnessOfNumericAtts": 1.9268503522895337, "REPTreeDepth3Kappa": null, "DecisionStumpKappa": null, "MaxMeansOfNumericAtts": 1980.8279569892472, "MinMutualInformation": null, "Quartile2StdDevOfNumericAtts": 80.91392402261056, "RandomTreeDepth1AUC": null, "Dimensionality": 0.25806451612903225, "MaxMutualInformation": null, "MinNominalAttDistinctValues": 2, "PercentageOfBinaryFeatures": 4.166666666666666, "Quartile3AttributeEntropy": null, "RandomTreeDepth1ErrRate": null, "EquivalentNumberOfAtts": null, "MaxNominalAttDistinctValues": 14, "MinSkewnessOfNumericAtts": -0.0998455124608889, "PercentageOfInstancesWithMissingValues": 0, "Quartile3KurtosisOfNumericAtts": 22.456211171000533, "AutoCorrelation": -691.4565217391304, "RandomTreeDepth1Kappa": null, "J48.00001.AUC": null, "MaxSkewnessOfNumericAtts": 4.2570471575707804, "MinStdDevOfNumericAtts": 3.3350042562842845, "PercentageOfMissingValues": 0, "Quartile3MeansOfNumericAtts": 1641.723924731183, "CfsSubsetEval_DecisionStumpAUC": null, "RandomTreeDepth2AUC": null, "J48.00001.ErrRate": null, "MaxStdDevOfNumericAtts": 1135.9280652026785, "MinorityClassPercentage": null, "PercentageOfNumericFeatures": 16.666666666666664 }, "tags": [], "features": [ { "name": "act_effort", "index": "23", "type": "numeric", "distinct": "74", "missing": "0", "target": "1", "min": 
"8", "max": "8211", "mean": "624", "stdev": "1136" }, { "name": "recordnumber", "index": "0", "type": "numeric", "distinct": "93", "missing": "0", "min": "1", "max": "101", "mean": "48", "stdev": "28" }, { "name": "projectname", "index": "1", "type": "nominal", "distinct": "8", "missing": "0", "distr": [] }, { "name": "cat2", "index": "2", "type": "nominal", "distinct": "14", "missing": "0", "distr": [] }, { "name": "forg", "index": "3", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "center", "index": "4", "type": "nominal", "distinct": "5", "missing": "0", "distr": [] }, { "name": "year", "index": "5", "type": "numeric", "distinct": "14", "missing": "0", "min": "1971", "max": "1987", "mean": "1981", "stdev": "3" }, { "name": "mode", "index": "6", "type": "nominal", "distinct": "3", "missing": "0", "distr": [] }, { "name": "rely", "index": "7", "type": "nominal", "distinct": "4", "missing": "0", "distr": [] }, { "name": "data", "index": "8", "type": "nominal", "distinct": "4", "missing": "0", "distr": [] }, { "name": "cplx", "index": "9", "type": "nominal", "distinct": "5", "missing": "0", "distr": [] }, { "name": "time", "index": "10", "type": "nominal", "distinct": "4", "missing": "0", "distr": [] }, { "name": "stor", "index": "11", "type": "nominal", "distinct": "4", "missing": "0", "distr": [] }, { "name": "virt", "index": "12", "type": "nominal", "distinct": "3", "missing": "0", "distr": [] }, { "name": "turn", "index": "13", "type": "nominal", "distinct": "4", "missing": "0", "distr": [] }, { "name": "acap", "index": "14", "type": "nominal", "distinct": "3", "missing": "0", "distr": [] }, { "name": "aexp", "index": "15", "type": "nominal", "distinct": "4", "missing": "0", "distr": [] }, { "name": "pcap", "index": "16", "type": "nominal", "distinct": "3", "missing": "0", "distr": [] }, { "name": "vexp", "index": "17", "type": "nominal", "distinct": "4", "missing": "0", "distr": [] }, { "name": "lexp", "index": "18", "type": 
"nominal", "distinct": "4", "missing": "0", "distr": [] }, { "name": "modp", "index": "19", "type": "nominal", "distinct": "5", "missing": "0", "distr": [] }, { "name": "tool", "index": "20", "type": "nominal", "distinct": "5", "missing": "0", "distr": [] }, { "name": "sced", "index": "21", "type": "nominal", "distinct": "3", "missing": "0", "distr": [] }, { "name": "equivphyskloc", "index": "22", "type": "numeric", "distinct": "79", "missing": "0", "min": "1", "max": "980", "mean": "94", "stdev": "134" } ], "nr_of_issues": 0, "nr_of_downvotes": 0, "nr_of_likes": 0, "nr_of_downloads": 0, "total_downloads": 0, "reach": 0, "reuse": 0, "impact_of_reuse": 0, "reach_of_reuse": 0, "impact": 0 }