{ "data_id": "481", "name": "biomed", "exact_name": "biomed", "version": 1, "version_label": null, "description": "**Author**: \r\n**Source**: Unknown - Date unknown \r\n**Please cite**: \r\n\r\nFebruary 23, 1982\r\n\r\nThe 1982 annual meetings of the American Statistical Association (ASA)\r\nwill be held August 16-19, 1982 in Cincinnati. At that meeting, the ASA\r\nCommittee on Statistical Graphics plans to sponsor an \"Exposition of\r\nStatistical Graphics Technology.\" The purpose of this activity is to\r\nmore fully inform the ASA membership about the capabilities and uses of\r\ncomputer graphics in statistical work. This letter is to invite you to\r\nparticipate in the Exposition.\r\n\r\nAttached is a set of biomedical data containing 209 observations (134\r\nfor \"normals\" and 75 for \"carriers\"). Each vendor or provider of\r\nstatistical graphics software participating in the Exposition is to\r\nanalyze these data using their software and to prepare tabular, graphical\r\nand text output illustrating the use of graphics in these analyses and\r\nsummarizing their conclusions. The tabular and graphical materials must be\r\ndirect computer output from the statistical graphics software; the\r\ntextual descriptions and summaries need not be. The total display space\r\navailable to each participant at the meeting will be a standard posterboard\r\n(approximately 4' x 2 1\/2'). All entries will be displayed in one\r\nlocation at the meetings, together with brief written commentary by\r\nthe committee summarizing the results of this activity.\r\n\r\nReference\r\n\r\nExposition of Statistical Graphics Technology,\r\nL. H. Cox, M. M. Johnson, K. Kafadar,\r\nASA Proc Stat. Comp Section, 1982, pp 55-56.\r\nEnclosures\r\n\r\n\r\nTHE DATA\r\n\r\nThe following data arose in a study to develop screening methods to\r\nidentify carriers of a rare genetic disorder. Four measurements m1,\r\nm2, m3, m4 were made on blood samples. 
One of these, m1, has been used\r\nbefore.\r\n\r\nThe disease is Duchenne muscular dystrophy. Measurements are: \r\nM1- serum creatine kinase. \r\nM2- hemopexin. \r\nM3- pyruvate kinase. \r\nM4- lactate dehydrogenase. \r\n\r\nBecause the disease is rare, there are only a few carriers of\r\nthe disease from whom data are available. The data come in two files,\r\none for normals and one for carriers of the disease. A description of\r\nthe files is provided. The data have been stripped of the names and\r\nother identifiers. Otherwise the data are as received by the analyst.\r\n\r\n\r\nPURPOSE OF THE ANALYSIS\r\n\r\nThe purpose of the analysis is to develop a screening procedure to\r\ndetect carriers and to describe its effectiveness. Experts in the\r\nfield have noted that young people tend to have higher measurements.\r\nThe laboratory which prepared the measurements is worried that there\r\nmay be a systematic drift over time in their measurement process.\r\nThese effects should be considered in the analysis. 
Can graphical\r\ndisplays show the differences between the distributions of carriers\r\nand normals?\r\n\r\n\r\nFILE DESCRIPTION\r\n\r\n\r\nColumn\tContent\r\n\r\n1\tObservation number (sequence number per patient)\r\nNote that there are several samples per patient\r\nfor some patients.\r\n2-8\tBlank\r\n9-12\tHospital identification number for blood sample\r\n13-18\tBlank\r\n19-20\tAge of patient\r\n21-26\tBlank\r\n27-32\tDate that blood sample was taken (mmddyy)\r\nNote that all day entries are 00.\r\n33-39\tBlank\r\n40-43\tml (measurement 1) sss.s\r\n44-50\tBlank\r\n51-54\tm2 (measurement 2) xxx.x Eight missing data points.\r\n55-61\tBlank\r\n62-65\tm3 (measurement 3) xxx.x\r\n66-72\tBlank\r\n73-75\tm4 (measurement 4) xxx Seven missing data points.\r\n\r\n\r\n\r\nInformation about the dataset\r\nCLASSTYPE: nominal\r\nCLASSINDEX: last", "format": "ARFF", "uploader": "Joaquin Vanschoren", "uploader_id": 2, "visibility": "public", "creator": null, "contributor": null, "date": "2014-09-28 23:51:49", "update_comment": null, "last_update": "2014-09-28 23:51:49", "licence": "Public", "status": "active", "error_message": null, "url": "https:\/\/www.openml.org\/data\/download\/52593\/biomed.arff", "default_target_attribute": "class", "row_id_attribute": null, "ignore_attribute": null, "runs": 759, "suggest": { "input": [ "biomed", "February 23, 1982 The 1982 annual meetings of the American Statistical Association (ASA) will be held August 16-19, 1982 in Cincinnati. At that meeting, the ASA Committee on Statistical Graphics plans to sponsor an \"Exposition of Statistical Graphics Technology.\" The purpose of this activity is to more fully inform the ASA membership about the capabilities and uses of computer graphics in statistical work. This letter is to invite you to participate in the Exposition. 
Attached is a set of biomed " ], "weight": 5 }, "qualities": { "NumberOfInstances": 209, "NumberOfFeatures": 9, "NumberOfClasses": 2, "NumberOfMissingValues": 15, "NumberOfInstancesWithMissingValues": 15, "NumberOfNumericFeatures": 7, "NumberOfSymbolicFeatures": 2, "MeanSkewnessOfNumericAtts": 1.5923506450915546, "Quartile1StdDevOfNumericAtts": 12.46289591655429, "REPTreeDepth2AUC": 0.7377114427860697, "CfsSubsetEval_kNN1NErrRate": 0.17703349282296652, "kNN1NAUC": 0.7963681592039801, "J48.001.Kappa": 0.5925496601506928, "MeanStdDevOfNumericAtts": 4234.91487795027, "Quartile2AttributeEntropy": 1.5704122885866143, "REPTreeDepth2ErrRate": 0.22966507177033493, "CfsSubsetEval_kNN1NKappa": 0.5925496601506928, "kNN1NErrRate": 0.14832535885167464, "MajorityClassPercentage": 64.11483253588517, "MinAttributeEntropy": 1.5704122885866143, "Quartile2KurtosisOfNumericAtts": 2.0845355359470688, "REPTreeDepth2Kappa": 0.4697674418604651, "ClassEntropy": 0.9417258626620666, "kNN1NKappa": 0.6607320521547887, "MajorityClassSize": 134, "MinKurtosisOfNumericAtts": -1.0310599006941925, "Quartile2MeansOfNumericAtts": 92.26459330143533, "REPTreeDepth3AUC": 0.7377114427860697, "DecisionStumpAUC": 0.7639800995024876, "MaxAttributeEntropy": 1.5704122885866143, "MinMeansOfNumericAtts": 16.772248803827754, "Quartile2MutualInformation": 0.02467183368168, "REPTreeDepth3ErrRate": 0.22966507177033493, "DecisionStumpErrRate": 0.22966507177033493, "MaxKurtosisOfNumericAtts": 24.097344649877066, "MinMutualInformation": 0.02467183368168, "Quartile2SkewnessOfNumericAtts": 1.3318770202598833, "REPTreeDepth3Kappa": 0.4697674418604651, "DecisionStumpKappa": 0.48267326732673277, "MaxMeansOfNumericAtts": 65772.42105263157, "MinNominalAttDistinctValues": 2, "PercentageOfBinaryFeatures": 11.11111111111111, "Quartile2StdDevOfNumericAtts": 73.91776040762883, "RandomTreeDepth1AUC": 0.8068223475921826, "Dimensionality": 0.0430622009569378, "MaxMutualInformation": 0.02467183368168, "MinSkewnessOfNumericAtts": 
-0.37156847702232737, "PercentageOfInstancesWithMissingValues": 7.177033492822966, "Quartile3AttributeEntropy": 1.5704122885866143, "RandomTreeDepth1ErrRate": 0.1722488038277512, "EquivalentNumberOfAtts": 38.170079889981686, "MaxNominalAttDistinctValues": 7, "MinStdDevOfNumericAtts": 8.572594267176978, "PercentageOfMissingValues": 0.7974481658692184, "Quartile3KurtosisOfNumericAtts": 22.293551929086576, "AutoCorrelation": 0.9951923076923077, "RandomTreeDepth1Kappa": 0.616670063175056, "J48.00001.AUC": 0.782686567164179, "MaxSkewnessOfNumericAtts": 4.392481025483618, "MinorityClassPercentage": 35.88516746411483, "PercentageOfNumericFeatures": 77.77777777777779, "Quartile3MeansOfNumericAtts": 1054.8516746411485, "CfsSubsetEval_DecisionStumpAUC": 0.782686567164179, "RandomTreeDepth2AUC": 0.8068223475921826, "J48.00001.ErrRate": 0.17703349282296652, "MaxStdDevOfNumericAtts": 29164.753824092873, "MinorityClassSize": 75, "PercentageOfSymbolicFeatures": 22.22222222222222, "Quartile3MutualInformation": 0.02467183368168, "CfsSubsetEval_DecisionStumpErrRate": 0.17703349282296652, "RandomTreeDepth2ErrRate": 0.1722488038277512, "J48.00001.Kappa": 0.5925496601506928, "MeanAttributeEntropy": 1.5704122885866143, "NaiveBayesAUC": 0.9462686567164179, "Quartile1AttributeEntropy": 1.5704122885866143, "Quartile3SkewnessOfNumericAtts": 4.1451341555745005, "CfsSubsetEval_DecisionStumpKappa": 0.5925496601506928, "RandomTreeDepth2Kappa": 0.616670063175056, "J48.0001.AUC": 0.782686567164179, "MeanKurtosisOfNumericAtts": 7.39245522669678, "NaiveBayesErrRate": 0.10526315789473684, "Quartile1KurtosisOfNumericAtts": -0.8191729620187735, "Quartile3StdDevOfNumericAtts": 218.0208661713929, "CfsSubsetEval_NaiveBayesAUC": 0.782686567164179, "RandomTreeDepth3AUC": 0.8068223475921826, "J48.0001.ErrRate": 0.17703349282296652, "MeanMeansOfNumericAtts": 9607.618529622012, "MeanMutualInformation": 0.02467183368168, "NaiveBayesKappa": 0.7614402822455121, "Quartile1MeansOfNumericAtts": 32.15789473684211, 
"REPTreeDepth1AUC": 0.7377114427860697, "CfsSubsetEval_NaiveBayesErrRate": 0.17703349282296652, "RandomTreeDepth3ErrRate": 0.1722488038277512, "J48.0001.Kappa": 0.5925496601506928, "MeanNoiseToSignalRatio": 62.652029632184146, "NumberOfBinaryFeatures": 1, "Quartile1MutualInformation": 0.02467183368168, "REPTreeDepth1ErrRate": 0.22966507177033493, "CfsSubsetEval_NaiveBayesKappa": 0.5925496601506928, "RandomTreeDepth3Kappa": 0.616670063175056, "J48.001.AUC": 0.782686567164179, "MeanNominalAttDistinctValues": 4.5, "Quartile1SkewnessOfNumericAtts": -0.11655906036079298, "REPTreeDepth1Kappa": 0.4697674418604651, "CfsSubsetEval_kNN1NAUC": 0.782686567164179, "StdvNominalAttDistinctValues": 3.5355339059327378, "J48.001.ErrRate": 0.17703349282296652 }, "tags": [ { "tag": "biomed", "uploader": "27409" }, { "tag": "mythbusting_1", "uploader": "1" }, { "tag": "study_1", "uploader": "2" }, { "tag": "study_123", "uploader": "3886" }, { "tag": "study_15", "uploader": "939" }, { "tag": "study_20", "uploader": "939" }, { "tag": "study_41", "uploader": "1" }, { "tag": "study_52", "uploader": "64" } ], "features": [ { "name": "class", "index": "8", "type": "nominal", "distinct": "2", "missing": "0", "target": "1", "distr": [ [ "carrier", "normal" ], [ [ "75", "0" ], [ "0", "134" ] ] ] }, { "name": "Observation_number", "index": "0", "type": "nominal", "distinct": "7", "missing": "0", "distr": [ [ "1", "2", "3", "4", "5", "6", "7" ], [ [ "38", "87" ], [ "21", "29" ], [ "12", "13" ], [ "3", "2" ], [ "1", "1" ], [ "0", "1" ], [ "0", "1" ] ] ] }, { "name": "Hospital_identification_number_for_blood_sample", "index": "1", "type": "numeric", "distinct": "191", "missing": "0", "min": "657", "max": "1538", "mean": "1055", "stdev": "218" }, { "name": "Age_of_patient", "index": "2", "type": "numeric", "distinct": "33", "missing": "0", "min": "20", "max": "61", "mean": "32", "stdev": "9" }, { "name": "Date_that_blood_sample_was_taken", "index": "3", "type": "numeric", "distinct": "26", 
"missing": "0", "min": "10078", "max": "120079", "mean": "65772", "stdev": "29165" }, { "name": "ml", "index": "4", "type": "numeric", "distinct": "99", "missing": "0", "min": "15", "max": "1288", "mean": "92", "stdev": "153" }, { "name": "m2", "index": "5", "type": "numeric", "distinct": "119", "missing": "8", "min": "34", "max": "118", "mean": "86", "stdev": "12" }, { "name": "m3", "index": "6", "type": "numeric", "distinct": "140", "missing": "0", "min": "3", "max": "112", "mean": "17", "stdev": "14" }, { "name": "m4", "index": "7", "type": "numeric", "distinct": "125", "missing": "7", "min": "66", "max": "593", "mean": "199", "stdev": "74" } ], "nr_of_issues": 0, "nr_of_downvotes": 0, "nr_of_likes": 0, "nr_of_downloads": 0, "total_downloads": 0, "reach": 0, "reuse": 0, "impact_of_reuse": 0, "reach_of_reuse": 0, "impact": 0 }