{ "data_id": "516", "name": "pbcseq", "exact_name": "pbcseq", "version": 1, "version_label": null, "description": "**Author**: \n**Source**: Unknown - Date unknown \n**Please cite**: \n\nPrimary Biliary Cirrhosis\n\nThis data set is a follow-up to the original PBC data set, as discussed\nin appendix D of Fleming and Harrington, Counting Processes and Survival\nAnalysis, Wiley, 1991. An analysis based on the enclosed data is found in\nMurtaugh PA. Dickson ER. Van Dam GM. Malinchoc M. Grambsch PM.\nLangworthy AL. Gips CH. \"Primary biliary cirrhosis: prediction of short-term\nsurvival based on repeated patient visits.\" Hepatology. 20(1.1):126-34, 1994.\n\nQuoting from F&H. \"The following pages contain the data from the Mayo Clinic\ntrial in primary biliary cirrhosis (PBC) of the liver conducted between 1974\nand 1984. A description of the clinical background for the trial and the\ncovariates recorded here is in Chapter 0, especially Section 0.2. A more\nextended discussion can be found in Dickson, et al., Hepatology 10:1-7 (1989)\nand in Markus, et al., N Eng J of Med 320:1709-13 (1989).\n\"A total of 424 PBC patients, referred to Mayo Clinic during that ten-year\ninterval, met eligibility criteria for the randomized placebo controlled\ntrial of the drug D-penicillamine. The first 312 cases in the data set\nparticipated in the randomized trial and contain largely complete data. The\nadditional 112 cases did not participate in the clinical trial, but consented\nto have basic measurements recorded and to be followed for survival. Six of\nthose cases were lost to follow-up shortly after diagnosis, so the data here\nare on an additional 106 cases as well as the 312 randomized participants.\nMissing data items are denoted by `.'. \"\n\nThe F&H data set contains only baseline measurements of the laboratory\nparameters. This data set contains multiple laboratory results, but\nonly on the first 312 patients. 
Some baseline data values in this file\ndiffer from the original PBC file, for instance, the data errors in\nprothrombin time and age which were discovered after the original analysis,\nduring research work on dfbeta residuals. (These two data points are\ndiscussed in F&H, figure 4.6.7). Another major difference is that\nthere was significantly more follow-up for many of the patients at the\ntime this data set was assembled.\n\nOne \"feature\" of the data deserves special comment. The last\nobservation before death or liver transplant often has many more\nmissing covariates than other data rows. The original clinical\nprotocol for these patients specified visits at 6 months, 1 year, and\nannually thereafter. At these protocol visits lab values were\nobtained for a large pre-specified battery of tests. \"Extra\" visits,\noften undertaken because of worsening medical condition, did not\nnecessarily have all this lab work. The missing values are thus\npotentially informative, and violate the usual \"missing at random\"\n(MCAR or MAR) assumptions made in analyses. 
Because of\nthe earlier published results on the Mayo PBC risk score, however, the\n5 variables involved in that computation were usually obtained, i.e.,\nage, bilirubin, albumin, prothrombin time, and edema score.\n\nVariables:\ncase number\nnumber of days between registration and the earlier of death,\ntransplantation, or study analysis time\nstatus: 0=alive, 1=transplanted, 2=dead\ndrug: 1= D-penicillamine, 0=placebo\nage in days, at registration\nsex: 0=male, 1=female\nday: number of days between enrollment and this visit date, remaining\nvalues on the line of data refer to this visit.\npresence of ascites: 0=no 1=yes\npresence of hepatomegaly 0=no 1=yes\npresence of spiders 0=no 1=yes\npresence of edema 0=no edema and no diuretic therapy for edema;\n.5 = edema present without diuretics, or edema resolved by diuretics;\n1 = edema despite diuretic therapy\nserum bilirubin in mg\/dl\nserum cholesterol in mg\/dl\nalbumin in gm\/dl\nalkaline phosphatase in U\/liter\nSGOT in U\/ml (serum glutamic-oxaloacetic transaminase, the enzyme name\nhas subsequently changed to \"AST\" in the medical literature)\nplatelets per cubic ml \/ 1000\nprothrombin time in seconds\nhistologic stage of disease\n\n\nInformation about the dataset\nCLASSTYPE: numeric\nCLASSINDEX: 3", "format": "ARFF", "uploader": "Joaquin Vanschoren", "uploader_id": 2, "visibility": "public", "creator": "Murtaugh PA. Dickson ER. Van Dam GM. Malinchoc M. 
Grambsch PM", "contributor": null, "date": "2014-09-29 00:07:39", "update_comment": null, "last_update": "2014-09-29 00:07:39", "licence": "Public", "status": "active", "error_message": null, "url": "https:\/\/www.openml.org\/data\/download\/52628\/pbcseq.arff", "default_target_attribute": "status", "row_id_attribute": null, "ignore_attribute": null, "runs": 0, "suggest": { "input": [ "pbcseq", "Primary Biliary Cirrhosis This data set is a follow-up to the original PBC data set, as discussed in appendix D of Fleming and Harrington, Counting Processes and Survival Analysis, Wiley, 1991. An analysis based on the enclosed data is found in Murtaugh PA. Dickson ER. Van Dam GM. Malinchoc M. Grambsch PM. Langworthy AL. Gips CH. \"Primary biliary cirrhosis: prediction of short-term survival based on repeated patient visits.\" Hepatology. 20(1.1):126-34, 1994. Quoting from F&H. \"The following page " ], "weight": 5 }, "qualities": { "NumberOfInstances": 1945, "NumberOfFeatures": 19, "NumberOfClasses": 0, "NumberOfMissingValues": 1133, "NumberOfInstancesWithMissingValues": 832, "NumberOfNumericFeatures": 13, "NumberOfSymbolicFeatures": 6, "MeanSkewnessOfNumericAtts": 1.7918520168176548, "Quartile1StdDevOfNumericAtts": 0.9088890747625015, "REPTreeDepth2AUC": null, "CfsSubsetEval_kNN1NErrRate": null, "kNN1NAUC": null, "J48.001.Kappa": null, "MeanStdDevOfNumericAtts": 506.1394862697822, "Quartile2AttributeEntropy": null, "REPTreeDepth2ErrRate": null, "CfsSubsetEval_kNN1NKappa": null, "kNN1NErrRate": null, "MajorityClassPercentage": null, "MinAttributeEntropy": null, "Quartile2KurtosisOfNumericAtts": 2.692912158452118, "REPTreeDepth2Kappa": null, "ClassEntropy": null, "kNN1NKappa": null, "MajorityClassSize": null, "MinKurtosisOfNumericAtts": -1.7824313589098277, "Quartile2MeansOfNumericAtts": 122.67038560411311, "REPTreeDepth3AUC": null, "DecisionStumpAUC": null, "MaxAttributeEntropy": null, "MinMeansOfNumericAtts": 0.18226221079691596, "Quartile2MutualInformation": null, 
"REPTreeDepth3ErrRate": null, "DecisionStumpErrRate": null, "MaxKurtosisOfNumericAtts": 77.0176424543949, "MinMutualInformation": null, "Quartile2SkewnessOfNumericAtts": 0.8709379501494258, "REPTreeDepth3Kappa": null, "DecisionStumpKappa": null, "MaxMeansOfNumericAtts": 17992.080205655526, "MinNominalAttDistinctValues": 2, "PercentageOfBinaryFeatures": 26.31578947368421, "Quartile2StdDevOfNumericAtts": 78.43751193730601, "RandomTreeDepth1AUC": null, "Dimensionality": 0.009768637532133676, "MaxMutualInformation": null, "MinSkewnessOfNumericAtts": -0.9826858251797085, "PercentageOfInstancesWithMissingValues": 42.77634961439588, "Quartile3AttributeEntropy": null, "RandomTreeDepth1ErrRate": null, "EquivalentNumberOfAtts": null, "MaxNominalAttDistinctValues": 1024, "MinStdDevOfNumericAtts": 0.31682748212923395, "PercentageOfMissingValues": 3.0658909484508183, "Quartile3KurtosisOfNumericAtts": 23.65142272368201, "AutoCorrelation": 0.8518518518518519, "RandomTreeDepth1Kappa": null, "J48.00001.AUC": null, "MaxSkewnessOfNumericAtts": 6.24240428357689, "MinorityClassPercentage": null, "PercentageOfNumericFeatures": 68.42105263157895, "Quartile3MeansOfNumericAtts": 851.191733294316, "CfsSubsetEval_DecisionStumpAUC": null, "RandomTreeDepth2AUC": null, "J48.00001.ErrRate": null, "MaxStdDevOfNumericAtts": 3675.028720772398, "MinorityClassSize": null, "PercentageOfSymbolicFeatures": 31.57894736842105, "Quartile3MutualInformation": null, "CfsSubsetEval_DecisionStumpErrRate": null, "RandomTreeDepth2ErrRate": null, "J48.00001.Kappa": null, "MeanAttributeEntropy": null, "NaiveBayesAUC": null, "Quartile1AttributeEntropy": null, "Quartile3SkewnessOfNumericAtts": 3.933608482043691, "CfsSubsetEval_DecisionStumpKappa": null, "RandomTreeDepth2Kappa": null, "J48.0001.AUC": null, "MeanKurtosisOfNumericAtts": 14.130620649624468, "NaiveBayesErrRate": null, "Quartile1KurtosisOfNumericAtts": -0.6241721992445812, "Quartile3StdDevOfNumericAtts": 681.1706410459014, "CfsSubsetEval_NaiveBayesAUC": 
null, "RandomTreeDepth3AUC": null, "J48.0001.ErrRate": null, "MeanMeansOfNumericAtts": 1780.7145221303601, "MeanMutualInformation": null, "NaiveBayesKappa": null, "Quartile1MeansOfNumericAtts": 3.3275912596401023, "REPTreeDepth1AUC": null, "CfsSubsetEval_NaiveBayesErrRate": null, "RandomTreeDepth3ErrRate": null, "J48.0001.Kappa": null, "MeanNoiseToSignalRatio": null, "NumberOfBinaryFeatures": 5, "Quartile1MutualInformation": null, "REPTreeDepth1ErrRate": null, "CfsSubsetEval_NaiveBayesKappa": null, "RandomTreeDepth3Kappa": null, "J48.001.AUC": null, "MeanNominalAttDistinctValues": 172.33333333333334, "Quartile1SkewnessOfNumericAtts": 0.04352178186648421, "REPTreeDepth1Kappa": null, "CfsSubsetEval_kNN1NAUC": null, "StdvNominalAttDistinctValues": 417.229752854068, "J48.001.ErrRate": null }, "tags": [], "features": [ { "name": "status", "index": "2", "type": "numeric", "distinct": "3", "missing": "0", "target": "1", "min": "0", "max": "2", "mean": "1", "stdev": "1" }, { "name": "case_number", "index": "0", "type": "numeric", "distinct": "312", "missing": "0", "min": "1", "max": "312", "mean": "135", "stdev": "86" }, { "name": "number_of_days", "index": "1", "type": "numeric", "distinct": "305", "missing": "0", "min": "41", "max": "5225", "mean": "2941", "stdev": "1271" }, { "name": "drug", "index": "3", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "age", "index": "4", "type": "numeric", "distinct": "308", "missing": "0", "min": "9598", "max": "28650", "mean": "17992", "stdev": "3675" }, { "name": "sex", "index": "5", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "day", "index": "6", "type": "nominal", "distinct": "1024", "missing": "0", "distr": [] }, { "name": "presence_of_asictes", "index": "7", "type": "nominal", "distinct": "2", "missing": "60", "distr": [] }, { "name": "presence_of_hepatomegaly", "index": "8", "type": "nominal", "distinct": "2", "missing": "61", "distr": [] }, { "name": 
"presence_of_spiders", "index": "9", "type": "nominal", "distinct": "2", "missing": "58", "distr": [] }, { "name": "presence_of_edema", "index": "10", "type": "numeric", "distinct": "3", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "serum_bilirubin", "index": "11", "type": "numeric", "distinct": "193", "missing": "0", "min": "0", "max": "41", "mean": "4", "stdev": "5" }, { "name": "serum_cholesterol", "index": "12", "type": "numeric", "distinct": "375", "missing": "821", "min": "55", "max": "1775", "mean": "320", "stdev": "167" }, { "name": "albumin", "index": "13", "type": "numeric", "distinct": "254", "missing": "0", "min": "1", "max": "8", "mean": "3", "stdev": "1" }, { "name": "alkaline_phosphatase", "index": "14", "type": "numeric", "distinct": "1263", "missing": "60", "min": "73", "max": "13862", "mean": "1382", "stdev": "1196" }, { "name": "SGOT", "index": "15", "type": "numeric", "distinct": "418", "missing": "0", "min": "6", "max": "1205", "mean": "123", "stdev": "78" }, { "name": "platelets", "index": "16", "type": "numeric", "distinct": "414", "missing": "73", "min": "40", "max": "991", "mean": "234", "stdev": "98" }, { "name": "prothrombin_time", "index": "17", "type": "numeric", "distinct": "78", "missing": "0", "min": "9", "max": "36", "mean": "11", "stdev": "1" }, { "name": "histologic_stage_of_disease", "index": "18", "type": "numeric", "distinct": "4", "missing": "0", "min": "1", "max": "4", "mean": "3", "stdev": "1" } ], "nr_of_issues": 0, "nr_of_downvotes": 0, "nr_of_likes": 0, "nr_of_downloads": 0, "total_downloads": 0, "reach": 0, "reuse": 0, "impact_of_reuse": 0, "reach_of_reuse": 0, "impact": 0 }