{ "data_id": "1506", "name": "thoracic-surgery", "exact_name": "thoracic-surgery", "version": 1, "version_label": null, "description": "**Author**: \n**Source**: UCI \n**Please cite**: Zieba, M., Tomczak, J. M., Lubicz, M., & Swiatek, J. (2013). Boosted SVM for extracting rules from imbalanced data in application to prediction of the post-operative life expectancy in the lung cancer patients. Applied Soft Computing. \n\n \n* Title: \nThoracic Surgery Data Data Set \n\n* Abstract: \nThe data is dedicated to classification problem related to the post-operative life expectancy in the lung cancer patients: class 1 - death within one year after surgery, class 2 - survival.\n\n* Source:\nCreators: Marek Lubicz (1), Konrad Pawelczyk (2), Adam Rzechonek (2), Jerzy Kolodziej (2) \n-- (1) Wroclaw University of Technology, wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland \n-- (2) Wroclaw Medical University, wybrzeze L. Pasteura 1, 50-367 Wroclaw, Poland \n\nDonor: Maciej Zieba (maciej.zieba '@' pwr.wroc.pl), Jakub M. Tomczak (jakub.tomczak '@' pwr.wroc.pl), (+48) 71 320 44 53 \n\n* Data Set Information:\n\nThe data was collected retrospectively at Wroclaw Thoracic Surgery Centre for patients who underwent major lung resections for primary lung cancer in the years 2007\u20132011. The Centre is associated with the Department of Thoracic Surgery of the Medical University of Wroclaw and Lower-Silesian Centre for Pulmonary Diseases, Poland, while the research database constitutes a part of the National Lung Cancer Registry, administered by the Institute of Tuberculosis and Pulmonary Diseases in Warsaw, Poland.\n\n\n* Attribute Information:\n\n1. DGN: Diagnosis - specific combination of ICD-10 codes for primary and secondary as well as multiple tumours if any (DGN3,DGN2,DGN4,DGN6,DGN5,DGN8,DGN1) \n2. PRE4: Forced vital capacity - FVC (numeric) \n3. PRE5: Volume that has been exhaled at the end of the first second of forced expiration - FEV1 (numeric) \n4. 
PRE6: Performance status - Zubrod scale (PRZ2,PRZ1,PRZ0) \n5. PRE7: Pain before surgery (T,F) \n6. PRE8: Haemoptysis before surgery (T,F) \n7. PRE9: Dyspnoea before surgery (T,F) \n8. PRE10: Cough before surgery (T,F) \n9. PRE11: Weakness before surgery (T,F) \n10. PRE14: T in clinical TNM - size of the original tumour, from OC11 (smallest) to OC14 (largest) (OC11,OC14,OC12,OC13) \n11. PRE17: Type 2 DM - diabetes mellitus (T,F) \n12. PRE19: MI up to 6 months (T,F) \n13. PRE25: PAD - peripheral arterial diseases (T,F) \n14. PRE30: Smoking (T,F) \n15. PRE32: Asthma (T,F) \n16. AGE: Age at surgery (numeric) \n17. Risk1Y: 1 year survival period - (T)rue value if died (T,F) \n\nClass Distribution: the class value (Risk1Y) is binary valued. \n\n\n", "format": "ARFF", "uploader": "Rafael Gomes Mantovani", "uploader_id": 64, "visibility": "public", "creator": null, "contributor": null, "date": "2015-05-25 23:02:18", "update_comment": null, "last_update": "2015-11-09 20:17:07", "licence": "Public", "status": "active", "error_message": null, "url": "https:\/\/www.openml.org\/data\/download\/1592298\/phpjX67St", "default_target_attribute": "Class", "row_id_attribute": null, "ignore_attribute": null, "runs": 145, "suggest": { "input": [ "thoracic-surgery", "* Title: Thoracic Surgery Data Data Set * Abstract: The data is dedicated to classification problem related to the post-operative life expectancy in the lung cancer patients: class 1 - death within one year after surgery, class 2 - survival. * Source: Creators: Marek Lubicz (1), Konrad Pawelczyk (2), Adam Rzechonek (2), Jerzy Kolodziej (2) -- (1) Wroclaw University of Technology, wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland -- (2) Wroclaw Medical University, wybrzeze L. 
Pasteura 1, 50-367 " ], "weight": 5 }, "qualities": { "NumberOfInstances": 470, "NumberOfFeatures": 17, "NumberOfClasses": 2, "NumberOfMissingValues": 0, "NumberOfInstancesWithMissingValues": 0, "NumberOfNumericFeatures": 3, "NumberOfSymbolicFeatures": 14, "REPTreeDepth3Kappa": 0, "DecisionStumpKappa": 0.011104617182934259, "MaxMeansOfNumericAtts": 62.53404255319149, "MinMutualInformation": 0.00086869304352, "Quartile2SkewnessOfNumericAtts": 0.5451881946466047, "RandomTreeDepth1AUC": 0.5291964285714286, "Dimensionality": 0.036170212765957444, "MaxMutualInformation": 0.02428417523536, "MinNominalAttDistinctValues": 2, "PercentageOfBinaryFeatures": 64.70588235294117, "Quartile2StdDevOfNumericAtts": 8.706901818284091, "RandomTreeDepth1ErrRate": 0.26595744680851063, "EquivalentNumberOfAtts": 87.17568677892774, "MaxNominalAttDistinctValues": 7, "MinSkewnessOfNumericAtts": -0.19115971860047198, "PercentageOfInstancesWithMissingValues": 0, "Quartile3AttributeEntropy": 1.0182737586936108, "RandomTreeDepth1Kappa": 0.04642103554617765, "J48.00001.AUC": 0.5, "MaxSkewnessOfNumericAtts": 5.6334910794233215, "MinStdDevOfNumericAtts": 0.8713950270684675, "PercentageOfMissingValues": 0, "Quartile3KurtosisOfNumericAtts": 30.481410939611273, "AutoCorrelation": 0.7398720682302772, "RandomTreeDepth2AUC": 0.5291964285714286, "J48.00001.ErrRate": 0.14893617021276595, "MaxStdDevOfNumericAtts": 11.767857120636801, "MinorityClassPercentage": 14.893617021276595, "PercentageOfNumericFeatures": 17.647058823529413, "Quartile3MeansOfNumericAtts": 62.53404255319149, "CfsSubsetEval_DecisionStumpAUC": 0.5, "RandomTreeDepth2ErrRate": 0.26595744680851063, "J48.00001.Kappa": 0, "MeanAttributeEntropy": 0.6065418134156311, "MinorityClassSize": 70, "PercentageOfSymbolicFeatures": 82.35294117647058, "Quartile3MutualInformation": 0.006957407495855, "CfsSubsetEval_DecisionStumpErrRate": 0.14893617021276595, "RandomTreeDepth2Kappa": 0.04642103554617765, "J48.0001.AUC": 0.5, "MeanKurtosisOfNumericAtts": 
10.369463412635357, "NaiveBayesAUC": 0.6193161094224924, "Quartile1AttributeEntropy": 0.23751268845624024, "Quartile3SkewnessOfNumericAtts": 5.6334910794233215, "CfsSubsetEval_DecisionStumpKappa": 0, "RandomTreeDepth3AUC": 0.5291964285714286, "J48.0001.ErrRate": 0.14893617021276595, "MeanMeansOfNumericAtts": 23.4614609929078, "NaiveBayesErrRate": 0.3021276595744681, "Quartile1KurtosisOfNumericAtts": -0.12185403920809046, "Quartile3StdDevOfNumericAtts": 11.767857120636801, "CfsSubsetEval_NaiveBayesAUC": 0.5, "RandomTreeDepth3ErrRate": 0.26595744680851063, "J48.0001.Kappa": 0, "MeanMutualInformation": 0.006964919661729231, "NaiveBayesKappa": 0.11952506596306074, "Quartile1MeansOfNumericAtts": 3.2816382978723406, "REPTreeDepth1AUC": 0.5, "CfsSubsetEval_NaiveBayesErrRate": 0.14893617021276595, "RandomTreeDepth3Kappa": 0.04642103554617765, "J48.001.AUC": 0.5, "MeanNoiseToSignalRatio": 86.0852562375487, "NumberOfBinaryFeatures": 11, "Quartile1MutualInformation": 0.00155931848031, "REPTreeDepth1ErrRate": 0.14893617021276595, "CfsSubsetEval_NaiveBayesKappa": 0, "CfsSubsetEval_kNN1NAUC": 0.5, "StdvNominalAttDistinctValues": 1.398586413506136, "J48.001.ErrRate": 0.14893617021276595, "MeanNominalAttDistinctValues": 2.5714285714285716, "Quartile1SkewnessOfNumericAtts": -0.19115971860047198, "REPTreeDepth1Kappa": 0, "CfsSubsetEval_kNN1NErrRate": 0.14893617021276595, "kNN1NAUC": 0.5416071428571428, "J48.001.Kappa": 0, "MeanSkewnessOfNumericAtts": 1.9958398518231513, "Quartile1StdDevOfNumericAtts": 0.8713950270684675, "REPTreeDepth2AUC": 0.5, "CfsSubsetEval_kNN1NKappa": 0, "kNN1NErrRate": 0.20851063829787234, "MajorityClassPercentage": 85.1063829787234, "MeanStdDevOfNumericAtts": 7.115384655329787, "Quartile2AttributeEntropy": 0.596367471971484, "REPTreeDepth2ErrRate": 0.14893617021276595, "ClassEntropy": 0.6071716548713029, "kNN1NKappa": 0.09187697160883292, "MajorityClassSize": 400, "MinAttributeEntropy": 0.03964314068195012, "Quartile2KurtosisOfNumericAtts": 0.748833337502885, 
"REPTreeDepth2Kappa": 0, "REPTreeDepth3AUC": 0.5, "DecisionStumpAUC": 0.5167321428571429, "MaxAttributeEntropy": 1.3671357070666996, "MinKurtosisOfNumericAtts": -0.12185403920809046, "Quartile2MeansOfNumericAtts": 4.5687021276595745, "REPTreeDepth3ErrRate": 0.14893617021276595, "DecisionStumpErrRate": 0.15319148936170213, "MaxKurtosisOfNumericAtts": 30.481410939611273, "MinMeansOfNumericAtts": 3.2816382978723406, "Quartile2MutualInformation": 0.00600290579821 }, "tags": [ { "uploader": "38960", "tag": "Chemistry" }, { "uploader": "38960", "tag": "Life Science" }, { "uploader": "3886", "tag": "mf_less_than_80" }, { "uploader": "3886", "tag": "study_123" }, { "uploader": "4209", "tag": "study_127" }, { "uploader": "64", "tag": "study_50" }, { "uploader": "64", "tag": "study_52" }, { "uploader": "64", "tag": "study_7" }, { "uploader": "4209", "tag": "study_88" } ], "features": [ { "name": "Class", "index": "16", "type": "nominal", "distinct": "2", "missing": "0", "target": "1", "distr": [ [ "1", "2" ], [ [ "70", "0" ], [ "0", "400" ] ] ] }, { "name": "V1", "index": "0", "type": "nominal", "distinct": "7", "missing": "0", "distr": [ [ "1", "2", "3", "4", "5", "6", "7" ], [ [ "0", "1" ], [ "12", "40" ], [ "43", "306" ], [ "7", "40" ], [ "7", "8" ], [ "0", "4" ], [ "1", "1" ] ] ] }, { "name": "V2", "index": "1", "type": "numeric", "distinct": "134", "missing": "0", "min": "1", "max": "6", "mean": "3", "stdev": "1" }, { "name": "V3", "index": "2", "type": "numeric", "distinct": "136", "missing": "0", "min": "1", "max": "86", "mean": "5", "stdev": "12" }, { "name": "V4", "index": "3", "type": "nominal", "distinct": "3", "missing": "0", "distr": [ [ "1", "2", "3" ], [ [ "14", "116" ], [ "49", "264" ], [ "7", "20" ] ] ] }, { "name": "V5", "index": "4", "type": "nominal", "distinct": "2", "missing": "0", "distr": [ [ "1", "2" ], [ [ "63", "376" ], [ "7", "24" ] ] ] }, { "name": "V6", "index": "5", "type": "nominal", "distinct": "2", "missing": "0", "distr": [ [ "1", "2" ], [ 
[ "56", "346" ], [ "14", "54" ] ] ] }, { "name": "V7", "index": "6", "type": "nominal", "distinct": "2", "missing": "0", "distr": [ [ "1", "2" ], [ [ "61", "378" ], [ "9", "22" ] ] ] }, { "name": "V8", "index": "7", "type": "nominal", "distinct": "2", "missing": "0", "distr": [ [ "1", "2" ], [ [ "15", "132" ], [ "55", "268" ] ] ] }, { "name": "V9", "index": "8", "type": "nominal", "distinct": "2", "missing": "0", "distr": [ [ "1", "2" ], [ [ "53", "339" ], [ "17", "61" ] ] ] }, { "name": "V10", "index": "9", "type": "nominal", "distinct": "4", "missing": "0", "distr": [ [ "1", "2", "3", "4" ], [ [ "18", "159" ], [ "39", "218" ], [ "6", "13" ], [ "7", "10" ] ] ] }, { "name": "V11", "index": "10", "type": "nominal", "distinct": "2", "missing": "0", "distr": [ [ "1", "2" ], [ [ "60", "375" ], [ "10", "25" ] ] ] }, { "name": "V12", "index": "11", "type": "nominal", "distinct": "2", "missing": "0", "distr": [ [ "1", "2" ], [ [ "70", "398" ], [ "0", "2" ] ] ] }, { "name": "V13", "index": "12", "type": "nominal", "distinct": "2", "missing": "0", "distr": [ [ "1", "2" ], [ [ "68", "394" ], [ "2", "6" ] ] ] }, { "name": "V14", "index": "13", "type": "nominal", "distinct": "2", "missing": "0", "distr": [ [ "1", "2" ], [ [ "7", "77" ], [ "63", "323" ] ] ] }, { "name": "V15", "index": "14", "type": "nominal", "distinct": "2", "missing": "0", "distr": [ [ "1", "2" ], [ [ "70", "398" ], [ "0", "2" ] ] ] }, { "name": "V16", "index": "15", "type": "numeric", "distinct": "45", "missing": "0", "min": "21", "max": "87", "mean": "63", "stdev": "9" } ], "nr_of_issues": 0, "nr_of_downvotes": 0, "nr_of_likes": 0, "nr_of_downloads": 0, "total_downloads": 0, "reach": 0, "reuse": 0, "impact_of_reuse": 0, "reach_of_reuse": 0, "impact": 0 }