echocardiogram-uci

active ARFF Publicly available Visibility: public Uploaded 14-10-2019 by Andreas Mueller


1. Title: Echocardiogram Data

2. Source Information:
   -- Donor: Steven Salzberg (salzberg@cs.jhu.edu)
   -- Collector:
      -- Dr. Evlin Kinney
      -- The Reed Institute
      -- P.O. Box 402603
      -- Miami, FL 33140-0603
   -- Date Received: 28 February 1989

3. Past Usage:
   -- 1. Salzberg, S. (1988). Exemplar-based learning: Theory and implementation (Technical Report TR-10-88). Harvard University, Center for Research in Computing Technology, Aiken Computation Laboratory (33 Oxford Street; Cambridge, MA 02138).
      -- Steve applied his EACH program to predict survival (i.e., life or death), did not use the wall-motion attribute, and recorded 87 correct and 29 incorrect in an incremental application to this database. He also showed that, by tuning EACH to this domain, EACH was able to derive (non-incrementally) a set of 28 hyper-rectangles that could perfectly classify 119 instances.
   -- 2. Kan, G., Visser, C., Kooler, J., & Dunning, A. (1986). Short and long term predictive value of wall motion score in acute myocardial infarction. British Heart Journal, 56, 422-427.
      -- They predicted the same variable (whether patients will live one year after a heart attack) using a different set of 345 instances. Their statistical test recorded a 61% accuracy in predicting that a patient will die (post-hoc fit).
   -- 3. Elvin Kinney (in communication with Steven Salzberg) reported that a Cox regression application recorded a 60% accuracy in predicting that a patient will die.

4. Relevant Information:
   -- All the patients suffered heart attacks at some point in the past. Some are still alive and some are not. The survival and still-alive variables, when taken together, indicate whether a patient survived for at least one year following the heart attack. The problem addressed by past researchers was to predict from the other variables whether or not the patient will survive at least one year. The most difficult part of this problem is correctly predicting that the patient will NOT survive. (Part of the difficulty seems to be the size of the data set.)

5. Number of Instances: 132

6. Number of Attributes: 13 (all numeric-valued)

7. Attribute Information:
   1. survival -- the number of months the patient survived (has survived, if the patient is still alive). Because all the patients had their heart attacks at different times, it is possible that some patients have survived less than one year but are still alive. Check the second variable to confirm this. Such patients cannot be used for the prediction task mentioned above.
   2. still-alive -- a binary variable. 0 = dead at end of survival period, 1 = still alive
   3. age-at-heart-attack -- age in years when the heart attack occurred
   4. pericardial-effusion -- binary. Pericardial effusion is fluid around the heart. 0 = no fluid, 1 = fluid
   5. fractional-shortening -- a measure of contractility around the heart; lower numbers are increasingly abnormal
   6. epss -- E-point septal separation, another measure of contractility. Larger numbers are increasingly abnormal.
   7. lvdd -- left ventricular end-diastolic dimension. This is a measure of the size of the heart at end-diastole. Large hearts tend to be sick hearts.
   8. wall-motion-score -- a measure of how the segments of the left ventricle are moving
   9. wall-motion-index -- equals wall-motion-score divided by the number of segments seen. Usually 12-13 segments are seen in an echocardiogram. Use this variable INSTEAD of the wall motion score.
   10. mult -- a derived variable which can be ignored
   11. name -- the name of the patient (I have replaced them with "name")
   12. group -- meaningless, ignore it
   13. alive-at-1 -- Boolean-valued. Derived from the first two attributes. 0 means the patient was either dead after 1 year or had been followed for less than 1 year. 1 means the patient was alive at 1 year.

8. Missing Attribute Values: (denoted by "?")

   Attribute #:   Number of Missing Values: (total: 132)
   ------------   -------------------------
    1              2
    2              1
    3              5
    4              1
    5              8
    6             15
    7             11
    8              4
    9              1
   10              4
   11              0
   12             22
   13             58

9. Distribution of attribute number 2: still-alive

   Value   Number of instances with this value
   -----   -----------------------------------
   0        88 (dead)
   1        43 (alive)
   ?         1
   Total   132

10. Distribution of attribute number 13: alive-at-1

   Value   Number of instances with this value
   -----   -----------------------------------
   0        50
   1        24
   ?        58
   Total   132
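The description says that survival and still-alive, taken together, indicate whether a patient survived at least one year, and that patients who are still alive but were followed for less than a year cannot be used for the prediction task. A minimal sketch of that rule follows. The function name, the 12-month threshold for "at least one year", and the use of None for unusable cases are assumptions made for illustration. Note that the stored alive-at-1 attribute is described differently: it codes patients followed for less than a year as 0.

```python
from typing import Optional


def alive_at_one_year(survival_months: Optional[float],
                      still_alive: Optional[int]) -> Optional[int]:
    """Hypothetical helper: label one patient for the one-year survival task.

    Returns 1 if the patient survived at least 12 months, 0 if the patient
    died before 12 months, and None if the label cannot be determined.
    """
    # A missing value ("?" in the data) in either input makes the label unknown.
    if survival_months is None or still_alive is None:
        return None
    # Assumption: "at least one year" means 12 or more months of survival.
    if survival_months >= 12:
        return 1
    # Fewer than 12 months and dead at the end of the survival period.
    if still_alive == 0:
        return 0
    # Fewer than 12 months but still alive: not usable for this task.
    return None


if __name__ == "__main__":
    # Illustrative values only, not rows taken from the dataset.
    for months, alive in [(24.0, 0), (5.0, 0), (5.0, 1), (None, 1)]:
        print(months, alive, alive_at_one_year(months, alive))
```

Returning None instead of 0 for short follow-up keeps censored patients out of the negative class, which is what the note under attribute 1 asks for.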

8 features

Feature                  Type     Unique values  Missing
alive-at-1 (target)      string   3              57
survival (ignore)        string   56             2
still-alive (ignore)     string   2              1
age-at-heart-attack      string   38             6
pericardial-effusion     numeric  3              0
fractional-shortening    string   72             8
epss                     string   91             15
lvdd                     string   105            11
wall-motion-score        string   46             4
wall-motion-index        string   65             2
mult (ignore)            string   30             3
name (ignore)            string   1              1
group (ignore)           string   3              22
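Most features are typed as string here even though the description calls every attribute numeric-valued, and missing values are denoted by "?". A minimal sketch of converting one record to floats is shown below, assuming the record is already available as a dictionary of raw strings. The helper names and the sample record are hypothetical; the set of skipped columns follows the "(ignore)" markers in the feature list.

```python
import math
from typing import Dict

# Columns marked "(ignore)" in the feature list above.
IGNORED = {"survival", "still-alive", "mult", "name", "group"}


def to_float(raw: str) -> float:
    """Convert one raw cell to a float, mapping "?" or an empty cell to NaN."""
    text = raw.strip()
    if text in ("", "?"):
        return math.nan
    return float(text)


def parse_record(record: Dict[str, str]) -> Dict[str, float]:
    """Return the non-ignored columns of a record as floats."""
    return {name: to_float(value)
            for name, value in record.items()
            if name not in IGNORED}


if __name__ == "__main__":
    # Made-up record for illustration, not a row from the dataset.
    sample = {
        "age-at-heart-attack": "62",
        "pericardial-effusion": "0",
        "epss": "?",
        "wall-motion-index": "1.25",
        "name": "name",
    }
    print(parse_record(sample))
```

Keeping missing cells as NaN, instead of dropping the row, leaves the choice of imputation or filtering to a later step.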

62 properties

Value   Property
132     Number of instances (rows) of the dataset.
8       Number of attributes (columns) of the dataset.
4       Number of distinct values of the target attribute (if it is nominal).
103     Number of missing values in the dataset.
25      Number of instances with at least one value missing.
1       Number of numeric attributes.
0       Number of nominal attributes.
n/a     Entropy of the target attribute values.
n/a     An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
n/a     Second quartile (Median) of entropy among attributes.
0.06    Number of attributes divided by the number of instances.
n/a     Average number of distinct values among the attributes of the nominal type.
131.1   Second quartile (Median) of kurtosis among attributes of the numeric type.
n/a     Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
11.43   Mean skewness among attributes of the numeric type.
0.77    Second quartile (Median) of means among attributes of the numeric type.
43.18   Percentage of instances belonging to the most frequent class.
6.7     Mean standard deviation of attributes of the numeric type.
n/a     Second quartile (Median) of mutual information between the nominal attributes and the target attribute.
57      Number of instances belonging to the most frequent class.
n/a     Minimal entropy among attributes.
11.43   Second quartile (Median) of skewness among attributes of the numeric type.
n/a     Maximum entropy among attributes.
131.1   Minimum kurtosis among attributes of the numeric type.
0       Percentage of binary attributes.
6.7     Second quartile (Median) of standard deviation of attributes of the numeric type.
131.1   Maximum kurtosis among attributes of the numeric type.
0.77    Minimum of means among attributes of the numeric type.
18.94   Percentage of instances having missing values.
n/a     Third quartile of entropy among attributes.
0.77    Maximum of means among attributes of the numeric type.
n/a     Minimal mutual information between the nominal attributes and the target attribute.
9.75    Percentage of missing values.
131.1   Third quartile of kurtosis among attributes of the numeric type.
n/a     Maximum mutual information between the nominal attributes and the target attribute.
n/a     The minimal number of distinct values among attributes of the nominal type.
12.5    Percentage of numeric attributes.
0.77    Third quartile of means among attributes of the numeric type.
n/a     The maximum number of distinct values among attributes of the nominal type.
11.43   Minimum skewness among attributes of the numeric type.
0       Percentage of nominal attributes.
n/a     Third quartile of mutual information between the nominal attributes and the target attribute.
11.43   Maximum skewness among attributes of the numeric type.
6.7     Minimum standard deviation of attributes of the numeric type.
n/a     First quartile of entropy among attributes.
11.43   Third quartile of skewness among attributes of the numeric type.
6.7     Maximum standard deviation of attributes of the numeric type.
0.76    Percentage of instances belonging to the least frequent class.
131.1   First quartile of kurtosis among attributes of the numeric type.
6.7     Third quartile of standard deviation of attributes of the numeric type.
n/a     Average entropy of the attributes.
1       Number of instances belonging to the least frequent class.
0.77    First quartile of means among attributes of the numeric type.
n/a     Standard deviation of the number of distinct values among attributes of the nominal type.
131.1   Mean kurtosis among attributes of the numeric type.
0       Number of binary attributes.
n/a     First quartile of mutual information between the nominal attributes and the target attribute.
0.77    Mean of means among attributes of the numeric type.
11.43   First quartile of skewness among attributes of the numeric type.
1       Average class difference between consecutive instances.
n/a     Average mutual information between the nominal attributes and the target attribute.
6.7     First quartile of standard deviation of attributes of the numeric type.
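Several of the ratio and percentage properties can be reproduced from the count properties listed alongside them. The sketch below redoes that arithmetic, assuming that percentages are rounded to two decimals and that the percentage of missing values is taken over all cells (instances times attributes). The variable and function names are illustrative.

```python
# Count properties as listed in this section.
n_instances = 132
n_attributes = 8
n_missing_values = 103
n_instances_with_missing = 25
n_numeric_attributes = 1
majority_class_size = 57
minority_class_size = 1


def pct(part: float, whole: float) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


derived = {
    # Assumption: the denominator is the total number of cells.
    "pct_missing_values": pct(n_missing_values, n_instances * n_attributes),
    "pct_instances_with_missing": pct(n_instances_with_missing, n_instances),
    "pct_majority_class": pct(majority_class_size, n_instances),
    "pct_minority_class": pct(minority_class_size, n_instances),
    "pct_numeric_attributes": pct(n_numeric_attributes, n_attributes),
    "attributes_per_instance": round(n_attributes / n_instances, 2),
}

if __name__ == "__main__":
    for name, value in derived.items():
        print(f"{name}: {value}")
```

Under these assumptions the results agree with the listed values 9.75, 18.94, 43.18, 0.76, 12.5, and 0.06.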

8 tasks

0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering