{ "data_id": "43343", "name": "Mammographic-Mass-Data-Set", "exact_name": "Mammographic-Mass-Data-Set", "version": 1, "version_label": "v1.0", "description": "Mammography is the most effective method for breast cancer screening \navailable today. However, the low positive predictive value of breast \nbiopsy resulting from mammogram interpretation leads to approximately \n70% unnecessary biopsies with benign outcomes. To reduce the high \nnumber of unnecessary breast biopsies, several computer-aided diagnosis \n(CAD) systems have been proposed in recent years. These systems \nhelp physicians in their decision to perform a breast biopsy on a suspicious \nlesion seen in a mammogram or to perform a short-term follow-up \nexamination instead. \nThis data set can be used to predict the severity (benign or malignant) \nof a mammographic mass lesion from BI-RADS attributes and the patient's age. \nIt contains a BI-RADS assessment, the patient's age and three BI-RADS attributes \ntogether with the ground truth (the severity field) for 516 benign and \n445 malignant masses that have been identified on full-field digital mammograms \ncollected at the Institute of Radiology of the \nUniversity Erlangen-Nuremberg between 2003 and 2006. \nEach instance has an associated BI-RADS assessment ranging from 1 (definitely benign) \nto 5 (highly suggestive of malignancy) assigned in a double-review process by \nphysicians. Assuming that all cases with BI-RADS assessments greater than or equal \nto a given value (varying from 1 to 5) are malignant and the other cases benign, \nsensitivities and associated specificities can be calculated. These can be an \nindication of how well a CAD system performs compared to the radiologists. \nClass Distribution: benign: 516; malignant: 445 \nAttribute Information:\n6 Attributes in total (1 goal field, 1 non-predictive, 4 predictive attributes) \n\nBI-RADS assessment: 1 to 5 (ordinal, non-predictive!) 
\nAge: patient's age in years (integer) \nShape: mass shape: round=1 oval=2 lobular=3 irregular=4 (nominal) \nMargin: mass margin: circumscribed=1 microlobulated=2 obscured=3 ill-defined=4 spiculated=5 (nominal) \nDensity: mass density: high=1 iso=2 low=3 fat-containing=4 (ordinal) \nSeverity: benign=0 or malignant=1 (binominal, goal field!) \n\nMissing Attribute Values: \n\nBI-RADS assessment: 2 \nAge: 5 \nShape: 31 \nMargin: 48 \nDensity: 76 \nSeverity: 0 \n\nI acknowledge that this dataset is not mine and I have only reformatted the data and uploaded it to kaggle.\nSource:\nMatthias Elter \nFraunhofer Institute for Integrated Circuits (IIS) \nImage Processing and Medical Engineering Department (BMT) \nAm Wolfsmantel 33 \n91058 Erlangen, Germany \nmatthias.elter '@' iis.fraunhofer.de \n(49) 9131-7767327 \nProf. Dr. Rüdiger Schulz-Wendtland \nInstitute of Radiology, Gynaecological Radiology, University Erlangen-Nuremberg \nUniversitätsstraße 21-23 \n91054 Erlangen, Germany\nRelevant Papers:\nM. Elter, R. Schulz-Wendtland and T. Wittenberg (2007) \nThe prediction of breast cancer biopsy outcomes using two CAD approaches that both emphasize an intelligible decision process. \nMedical Physics 34(11), pp. \nCitation Request:\nM. Elter, R. Schulz-Wendtland and T. Wittenberg (2007) \nThe prediction of breast cancer biopsy outcomes using two CAD approaches that both emphasize an intelligible decision process. \nMedical Physics 34(11), pp. 
4164-4172", "format": "arff", "uploader": "Dustin Carrion", "uploader_id": 30123, "visibility": "public", "creator": null, "contributor": null, "date": "2022-03-23 12:18:58", "update_comment": null, "last_update": "2022-03-23 12:18:58", "licence": "CC BY-NC-SA 4.0", "status": "active", "error_message": null, "url": "https:\/\/www.openml.org\/data\/download\/22102168\/dataset", "default_target_attribute": null, "row_id_attribute": null, "ignore_attribute": null, "runs": 0, "suggest": { "input": [ "Mammographic-Mass-Data-Set", "Mammography is the most effective method for breast cancer screening available today. However, the low positive predictive value of breast biopsy resulting from mammogram interpretation leads to approximately 70% unnecessary biopsies with benign outcomes. To reduce the high number of unnecessary breast biopsies, several computer-aided diagnosis (CAD) systems have been proposed in recent years. These systems help physicians in their decision to perform a breast biopsy on a suspicious lesion seen " ], "weight": 5 }, "qualities": { "NumberOfInstances": 830, "NumberOfFeatures": 6, "NumberOfClasses": null, "NumberOfMissingValues": 0, "NumberOfInstancesWithMissingValues": 0, "NumberOfNumericFeatures": 6, "NumberOfSymbolicFeatures": 0, "Dimensionality": 0.007228915662650603, "PercentageOfNumericFeatures": 100, "MajorityClassPercentage": null, "PercentageOfSymbolicFeatures": 0, "MajorityClassSize": null, "MinorityClassPercentage": null, "MinorityClassSize": null, "NumberOfBinaryFeatures": 0, "PercentageOfBinaryFeatures": 0, "PercentageOfInstancesWithMissingValues": 0, "AutoCorrelation": null, "PercentageOfMissingValues": 0 }, "tags": [ { "uploader": "38960", "tag": "Computer Systems" }, { "uploader": "38960", "tag": "Machine Learning" } ], "features": [ { "name": "BI-RADS", "index": "0", "type": "numeric", "distinct": "7", "missing": "0", "min": "0", "max": "55", "mean": "4", "stdev": "2" }, { "name": "Age", "index": "1", "type": "numeric", 
"distinct": "72", "missing": "0", "min": "18", "max": "96", "mean": "56", "stdev": "15" }, { "name": "Shape", "index": "2", "type": "numeric", "distinct": "4", "missing": "0", "min": "1", "max": "4", "mean": "3", "stdev": "1" }, { "name": "Margin", "index": "3", "type": "numeric", "distinct": "5", "missing": "0", "min": "1", "max": "5", "mean": "3", "stdev": "2" }, { "name": "Density", "index": "4", "type": "numeric", "distinct": "4", "missing": "0", "min": "1", "max": "4", "mean": "3", "stdev": "0" }, { "name": "Severity", "index": "5", "type": "numeric", "distinct": "2", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "1" } ], "nr_of_issues": 0, "nr_of_downvotes": 0, "nr_of_likes": 0, "nr_of_downloads": 0, "total_downloads": 0, "reach": 0, "reuse": 0, "impact_of_reuse": 0, "reach_of_reuse": 0, "impact": 0 }