OpenML
Hepatitis-C-Prediction-Dataset

Status: active. Format: ARFF. Database: Open Database, Contents: Original Authors. Visibility: public. Uploaded 24-03-2022 by Elif Ceren Gok.


Context
The data set contains laboratory values of blood donors and Hepatitis C patients and demographic values like age. The data was obtained from UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/HCV+data

Content
All attributes except Category and Sex are numerical.

Attributes 1 to 4 refer to the data of the patient:
1) X (Patient ID/No.)
2) Category (diagnosis) (values: '0=Blood Donor', '0s=suspect Blood Donor', '1=Hepatitis', '2=Fibrosis', '3=Cirrhosis')
3) Age (in years)
4) Sex (f, m)

Attributes 5 to 14 refer to laboratory data:
5) ALB
6) ALP
7) ALT
8) AST
9) BIL
10) CHE
11) CHOL
12) CREA
13) GGT
14) PROT

The target attribute for classification is Category (2): blood donors vs. Hepatitis C patients (including its progress: 'just' Hepatitis C, Fibrosis, Cirrhosis).

Acknowledgements
Creators: Ralf Lichtinghagen, Frank Klawonn, Georg Hoffmann
Donor: Ralf Lichtinghagen; Institute of Clinical Chemistry; Medical University Hannover (MHH); Hannover, Germany; lichtinghagen.ralf '@' mh-hannover.de
Donor: Frank Klawonn; Helmholtz Centre for Infection Research; Braunschweig, Germany; frank.klawonn '@' helmholtz-hzi.de
Donor: Georg Hoffmann; Trillium GmbH; Grafrath, Germany; georg.hoffmann '@' trillium.de

Relevant Papers
Lichtinghagen R et al. J Hepatol 2013; 59: 236-42
Hoffmann G et al. Using machine learning techniques to generate laboratory diagnostic pathways a case study. J Lab Precis Med 2018; 3: 58-67

Other Datasets
Stroke Prediction Dataset: LINK
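The description frames the task as blood donors vs. Hepatitis C patients at any stage. A minimal Python sketch of that grouping follows. The helper name `to_binary_target` is hypothetical, and counting '0s=suspect Blood Donor' as a donor is an assumption made here for illustration; the description does not say how suspect donors should be handled.

```python
# Category labels exactly as listed in the dataset description.
CATEGORIES = [
    "0=Blood Donor",
    "0s=suspect Blood Donor",
    "1=Hepatitis",
    "2=Fibrosis",
    "3=Cirrhosis",
]


def to_binary_target(category: str) -> int:
    """Return 0 for blood donors and 1 for Hepatitis C patients (any stage).

    Assumption: any label whose code starts with "0", including
    "0s=suspect Blood Donor", is treated as a donor.
    """
    code = category.split("=", 1)[0].strip()
    return 0 if code.startswith("0") else 1


# Map every listed label to its binary class.
binary = {label: to_binary_target(label) for label in CATEGORIES}
for label, target in binary.items():
    print(f"{label!r} -> {target}")
```

Depending on the analysis, suspect donors could instead be dropped or kept as a separate class; only the first lines of `to_binary_target` would need to change.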

14 features

Unnamed:_0 (numeric): 615 unique values, 0 missing
Category (string): 5 unique values, 0 missing
Age (numeric): 49 unique values, 0 missing
Sex (string): 2 unique values, 0 missing
ALB (numeric): 189 unique values, 1 missing
ALP (numeric): 414 unique values, 18 missing
ALT (numeric): 341 unique values, 1 missing
AST (numeric): 297 unique values, 0 missing
BIL (numeric): 188 unique values, 0 missing
CHE (numeric): 407 unique values, 0 missing
CHOL (numeric): 313 unique values, 10 missing
CREA (numeric): 117 unique values, 0 missing
GGT (numeric): 358 unique values, 0 missing
PROT (numeric): 198 unique values, 1 missing
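As a quick cross-check, the per-feature missing counts listed above can be tallied in Python. This is only a sketch over the numbers printed on this page (the dictionary is typed in by hand, not loaded from the dataset itself).

```python
# Missing-value counts per feature, copied from the feature list above.
missing_per_feature = {
    "Unnamed:_0": 0, "Category": 0, "Age": 0, "Sex": 0,
    "ALB": 1, "ALP": 18, "ALT": 1, "AST": 0, "BIL": 0,
    "CHE": 0, "CHOL": 10, "CREA": 0, "GGT": 0, "PROT": 1,
}

# Total number of missing cells across all features.
total_missing = sum(missing_per_feature.values())

# Features that would need imputation or row filtering before modelling.
features_with_missing = [
    name for name, count in missing_per_feature.items() if count > 0
]

print(total_missing)
print(features_with_missing)
```

Only five laboratory attributes contain gaps, with ALP and CHOL accounting for most of them.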

19 properties

Number of instances (rows) of the dataset: 615
Number of attributes (columns) of the dataset: 14
Number of distinct values of the target attribute (if it is nominal): (no value shown)
Number of missing values in the dataset: 31
Number of instances with at least one value missing: 26
Number of numeric attributes: 12
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0.02
Percentage of numeric attributes: 85.71
Percentage of instances belonging to the most frequent class: (no value shown)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value shown)
Percentage of instances belonging to the least frequent class: (no value shown)
Number of instances belonging to the least frequent class: (no value shown)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 4.23
Average class difference between consecutive instances: (no value shown)
Percentage of missing values: 0.36
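The ratio and percentage properties follow directly from the raw counts reported on this page. The sketch below recomputes them in Python. It assumes values are rounded to two decimals and that the percentage of missing values is taken over all cells (rows times columns), which is consistent with the figures shown but is not stated on the page.

```python
# Raw counts as reported in the properties list.
n_instances = 615
n_attributes = 14
n_numeric = 12
n_missing_values = 31
n_instances_with_missing = 26

# Derived properties, rounded to two decimals (assumed rounding).
dimensionality = round(n_attributes / n_instances, 2)
pct_numeric = round(100 * n_numeric / n_attributes, 2)
pct_instances_missing = round(100 * n_instances_with_missing / n_instances, 2)
# Assumption: the denominator is the total number of cells.
pct_missing_values = round(
    100 * n_missing_values / (n_instances * n_attributes), 2
)

print(dimensionality, pct_numeric, pct_instances_missing, pct_missing_values)
```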

0 tasks
