Asia_dataset

Status: active; Format: ARFF; Publicly available; Visibility: public; Uploaded 31-01-2022 by Oleksandr Zadorozhnyi
Tags: Computer Systems, Machine Learning
Dataset description

A synthetic dataset from Lauritzen and Spiegelhalter (1988) about lung diseases (tuberculosis, lung cancer or bronchitis) and visits to Asia.

Format of the dataset

A data frame with 5000 rows and 8 binary variables, each coded 1/0 for "yes"/"no":
  • D (dyspnoea)
  • T (tuberculosis)
  • L (lung cancer)
  • B (bronchitis)
  • A (visit to Asia)
  • S (smoking)
  • X (chest X-ray)
  • E (tuberculosis versus lung cancer/bronchitis)

Source

https://www.bnlearn.com/bnrepository/

References

Lauritzen S, Spiegelhalter D (1988). 'Local Computation with Probabilities on Graphical Structures and their Application to Expert Systems (with discussion)'. Journal of the Royal Statistical Society: Series B 50, 157-224.
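The variable coding in the description above can be sketched as a small codebook. This is only an illustration: the names `VARIABLES`, `LABELS`, and `decode_row` are hypothetical, the example record is invented, and the values in the actual ARFF file may be stored as strings rather than integers.

```python
# Codebook copied from the dataset description; the helper names are hypothetical.
VARIABLES = {
    "D": "dyspnoea",
    "T": "tuberculosis",
    "L": "lung cancer",
    "B": "bronchitis",
    "A": "visit to Asia",
    "S": "smoking",
    "X": "chest X-ray",
    "E": "tuberculosis versus lung cancer/bronchitis",
}
LABELS = {1: "yes", 0: "no"}


def decode_row(row):
    """Map a dict of 1/0 codes to readable {description: 'yes'/'no'} pairs."""
    unknown = set(row) - set(VARIABLES)
    if unknown:
        raise KeyError(f"unknown variables: {sorted(unknown)}")
    # int() also accepts the string forms "0" and "1".
    return {VARIABLES[name]: LABELS[int(value)] for name, value in row.items()}


# An invented record for illustration only; it is not taken from the dataset.
example = {"A": 0, "S": 1, "T": 0, "L": 0, "B": 1, "E": 0, "X": 0, "D": 1}
decoded = decode_row(example)
print(decoded["smoking"], decoded["visit to Asia"])
```

Keeping the codebook in one dictionary makes it easy to reject unexpected column names before any analysis starts.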

8 features

A: nominal, 2 unique values, 0 missing
S: nominal, 2 unique values, 0 missing
T: nominal, 2 unique values, 0 missing
L: nominal, 2 unique values, 0 missing
B: nominal, 2 unique values, 0 missing
E: nominal, 2 unique values, 0 missing
X: nominal, 2 unique values, 0 missing
D: nominal, 2 unique values, 0 missing
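The per-feature summary above (unique values and missing counts) could be reproduced from an ARFF file along the following lines. This is a minimal sketch, assuming a dense ARFF with comma-separated values and `?` for missing entries. The fragment in `SAMPLE_ARFF` is invented for illustration, the real file's nominal labels may differ, and `summarize_arff` is a hypothetical helper rather than part of any library.

```python
import io

# A tiny invented ARFF fragment; the real file's nominal labels may differ.
SAMPLE_ARFF = """@relation Asia_dataset
@attribute A {0,1}
@attribute S {0,1}
@attribute T {0,1}
@data
0,1,0
1,0,0
0,1,1
"""


def summarize_arff(text):
    """Return {attribute: (unique_value_count, missing_count)} for a simple dense ARFF."""
    names, columns, in_data = [], {}, False
    for raw in io.StringIO(text):
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            # Data rows: one comma-separated value per declared attribute.
            for name, value in zip(names, line.split(",")):
                columns[name].append(value.strip())
        elif lower.startswith("@attribute"):
            name = line.split()[1]
            names.append(name)
            columns[name] = []
        elif lower.startswith("@data"):
            in_data = True
    return {
        name: (len({v for v in values if v != "?"}), values.count("?"))
        for name, values in columns.items()
    }


summary = summarize_arff(SAMPLE_ARFF)
print(summary)
```

For real work a dedicated ARFF reader is a safer choice, since this sketch ignores quoting, sparse rows, and other parts of the format.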

19 properties

Number of instances (rows) of the dataset: 5000
Number of attributes (columns) of the dataset: 8
Number of distinct values of the target attribute (if it is nominal): (no value shown)
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 0
Number of nominal attributes: 8
Percentage of binary attributes: 100
Percentage of instances having missing values: 0
Average class difference between consecutive instances: (no value shown)
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 0
Percentage of instances belonging to the most frequent class: (no value shown)
Percentage of nominal attributes: 100
Number of instances belonging to the most frequent class: (no value shown)
Percentage of instances belonging to the least frequent class: (no value shown)
Number of instances belonging to the least frequent class: (no value shown)
Number of binary attributes: 8
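Several of the properties listed here follow directly from the row count and the per-feature unique-value counts. The sketch below recomputes them under that assumption. The function `simple_qualities` and its dictionary keys are hypothetical names chosen for illustration, not the platform's own identifiers. Note that 8 attributes over 5000 instances gives a ratio of 0.0016, which the page appears to display rounded as 0.

```python
def simple_qualities(n_instances, unique_counts, n_missing_values=0):
    """Derive a few dataset-level properties from basic counts.

    unique_counts holds the number of distinct values of each nominal attribute.
    """
    n_attributes = len(unique_counts)
    n_binary = sum(1 for count in unique_counts if count == 2)
    n_cells = n_instances * n_attributes
    return {
        "instances": n_instances,
        "attributes": n_attributes,
        "binary_attributes": n_binary,
        "percent_binary": 100.0 * n_binary / n_attributes,
        "attributes_per_instance": n_attributes / n_instances,
        "percent_missing": 100.0 * n_missing_values / n_cells,
    }


# Counts taken from the page: 5000 rows, 8 nominal features with 2 values each.
qualities = simple_qualities(5000, [2] * 8)
for key, value in qualities.items():
    print(f"{key}: {value}")
```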

0 tasks
