Data
Mammographic-Mass-Data-Set

Mammographic-Mass-Data-Set

active ARFF CC BY-NC-SA 4.0 Visibility: public Uploaded 23-03-2022 by Dustin Carrion
0 likes downloaded by 0 people , 0 total downloads 0 issues 0 downvotes
  • Computer Systems Machine Learning
Issue #Downvotes for this reason By


Loading wiki
Help us complete this description Edit
Mammography is the most effective method for breast cancer screening available today. However, the low positive predictive value of breast biopsy resulting from mammogram interpretation leads to approximately 70 unnecessary biopsies with benign outcomes. To reduce the high number of unnecessary breast biopsies, several computer-aided diagnosis (CAD) systems have been proposed in the last years.These systems help physicians in their decision to perform a breast biopsy on a suspicious lesion seen in a mammogram or to perform a short term follow-up examination instead. This data set can be used to predict the severity (benign or malignant) of a mammographic mass lesion from BI-RADS attributes and the patient's age. It contains a BI-RADS assessment, the patient's age and three BI-RADS attributes together with the ground truth (the severity field) for 516 benign and 445 malignant masses that have been identified on full field digital mammograms collected at the Institute of Radiology of the University Erlangen-Nuremberg between 2003 and 2006. Each instance has an associated BI-RADS assessment ranging from 1 (definitely benign) to 5 (highly suggestive of malignancy) assigned in a double-review process by physicians. Assuming that all cases with BI-RADS assessments greater or equal a given value (varying from 1 to 5), are malignant and the other cases benign, sensitivities and associated specificities can be calculated. These can be an indication of how well a CAD system performs compared to the radiologists. Class Distribution: benign: 516; malignant: 445 Attribute Information: 6 Attributes in total (1 goal field, 1 non-predictive, 4 predictive attributes) BI-RADS assessment: 1 to 5 (ordinal, non-predictive!) Age: patient's age in years (integer) Shape: mass shape: round=1 oval=2 lobular=3 irregular=4 (nominal) Margin: mass margin: circumscribed=1 microlobulated=2 obscured=3 ill-defined=4 spiculated=5 (nominal) Density: mass density high=1 iso=2 low=3 fat-containing=4 (ordinal) Severity: benign=0 or malignant=1 (binominal, goal field!) Missing Attribute Values: BI-RADS assessment: 2 Age: 5 Shape: 31 Margin: 48 Density: 76 Severity: 0 I acknowledge that this dataset is not mine and I have only reformatted the data and uploaded it to kaggle. Source: Matthias Elter Fraunhofer Institute for Integrated Circuits (IIS) Image Processing and Medical Engineering Department (BMT) Am Wolfsmantel 33 91058 Erlangen, Germany matthias.elter '' iis.fraunhofer.de (49) 9131-7767327 Prof. Dr. Rdiger Schulz-Wendtland Institute of Radiology, Gynaecological Radiology, University Erlangen-Nuremberg Universittsstrae 21-23 91054 Erlangen, Germany Relevant Papers: M. Elter, R. Schulz-Wendtland and T. Wittenberg (2007) The prediction of breast cancer biopsy outcomes using two CAD approaches that both emphasize an intelligible decision process. Medical Physics 34(11), pp. 4164-4172 Citation Request: M. Elter, R. Schulz-Wendtland and T. Wittenberg (2007) The prediction of breast cancer biopsy outcomes using two CAD approaches that both emphasize an intelligible decision process. Medical Physics 34(11), pp. 4164-4172

6 features

BI-RADSnumeric7 unique values
0 missing
Agenumeric72 unique values
0 missing
Shapenumeric4 unique values
0 missing
Marginnumeric5 unique values
0 missing
Densitynumeric4 unique values
0 missing
Severitynumeric2 unique values
0 missing

19 properties

830
Number of instances (rows) of the dataset.
6
Number of attributes (columns) of the dataset.
Number of distinct values of the target attribute (if it is nominal).
0
Number of missing values in the dataset.
0
Number of instances with at least one value missing.
6
Number of numeric attributes.
0
Number of nominal attributes.
0.01
Number of attributes divided by the number of instances.
100
Percentage of numeric attributes.
Percentage of instances belonging to the most frequent class.
0
Percentage of nominal attributes.
Number of instances belonging to the most frequent class.
Percentage of instances belonging to the least frequent class.
Number of instances belonging to the least frequent class.
0
Number of binary attributes.
0
Percentage of binary attributes.
0
Percentage of instances having missing values.
Average class difference between consecutive instances.
0
Percentage of missing values.

0 tasks

Define a new task