Diabetes-Dataset-2019


Status: active. Format: ARFF. License: Attribution 4.0 International (CC BY 4.0). Visibility: public. Uploaded 23-03-2022 by Dustin Carrion.


Context
This dataset was collected by Neha Prerna Tigga and Dr. Shruti Garg of the Department of Computer Science and Engineering, BIT Mesra, Ranchi-835215, for research and non-commercial purposes only. An article using this dataset has also been published. For more information, and to cite this dataset, please refer to: Tigga, N. P., Garg, S. (2020). Prediction of Type 2 Diabetes using Machine Learning Classification Methods. Procedia Computer Science, 167, 706-716. DOI: https://doi.org/10.1016/j.procs.2020.03.336

Content
There are 952 instances in total, with 17 independent predictor variables and one binary target (dependent) variable, Diabetes (the Diabetic column).

Acknowledgements
We would like to thank all the participants who contributed to building this dataset.

Inspiration
To build a machine learning algorithm that predicts whether or not a person has diabetes.

18 features

Age: string, 4 unique values, 0 missing
Gender: string, 2 unique values, 0 missing
Family_Diabetes: string, 2 unique values, 0 missing
highBP: string, 2 unique values, 0 missing
PhysicallyActive: string, 4 unique values, 0 missing
BMI: numeric, 26 unique values, 4 missing
Smoking: string, 2 unique values, 0 missing
Alcohol: string, 2 unique values, 0 missing
Sleep: numeric, 8 unique values, 0 missing
SoundSleep: numeric, 12 unique values, 0 missing
RegularMedicine: string, 3 unique values, 0 missing
JunkFood: string, 4 unique values, 0 missing
Stress: string, 4 unique values, 0 missing
BPLevel: string, 6 unique values, 0 missing
Pregancies: numeric, 5 unique values, 42 missing
Pdiabetes: string, 3 unique values, 1 missing
UriationFreq: string, 2 unique values, 0 missing
Diabetic: string, 3 unique values, 1 missing
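
As a quick consistency check on the feature listing above, the following minimal sketch transcribes the listing into a Python list and tallies the numeric columns and the missing values. The tuple layout and the variable names are assumptions made for illustration; only the column names, types, and counts come from the page.

```python
# Feature listing transcribed from this page:
# (column name, declared type, unique values, missing values).
# Column names are kept exactly as published, including their spelling.
FEATURES = [
    ("Age", "string", 4, 0),
    ("Gender", "string", 2, 0),
    ("Family_Diabetes", "string", 2, 0),
    ("highBP", "string", 2, 0),
    ("PhysicallyActive", "string", 4, 0),
    ("BMI", "numeric", 26, 4),
    ("Smoking", "string", 2, 0),
    ("Alcohol", "string", 2, 0),
    ("Sleep", "numeric", 8, 0),
    ("SoundSleep", "numeric", 12, 0),
    ("RegularMedicine", "string", 3, 0),
    ("JunkFood", "string", 4, 0),
    ("Stress", "string", 4, 0),
    ("BPLevel", "string", 6, 0),
    ("Pregancies", "numeric", 5, 42),
    ("Pdiabetes", "string", 3, 1),
    ("UriationFreq", "string", 2, 0),
    ("Diabetic", "string", 3, 1),
]

# Columns declared as numeric.
numeric_columns = [name for name, kind, _, _ in FEATURES if kind == "numeric"]

# Columns with at least one missing value, and the overall total.
missing_by_column = {name: m for name, _, _, m in FEATURES if m > 0}
total_missing = sum(missing_by_column.values())

print(len(FEATURES), "features")
print("numeric:", numeric_columns)
print("missing:", missing_by_column, "total:", total_missing)
```

Almost all of the missing values fall in the Pregancies column, so any imputation or row-dropping strategy is mostly a decision about that one column.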

19 properties

Number of instances (rows) of the dataset: 952
Number of attributes (columns) of the dataset: 18
Number of distinct values of the target attribute (if it is nominal): not shown
Number of missing values in the dataset: 48
Number of instances with at least one value missing: 47
Number of numeric attributes: 4
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0.02
Percentage of numeric attributes: 22.22
Percentage of instances belonging to the most frequent class: not shown
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: not shown
Percentage of instances belonging to the least frequent class: not shown
Number of instances belonging to the least frequent class: not shown
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 4.94
Average class difference between consecutive instances: not shown
Percentage of missing values: 0.28
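
The ratio and percentage properties above can be recomputed from the raw counts on this page. The sketch below assumes the formulas implied by each property description (for example, that the percentage of missing values is taken over all cells, rows times columns); the variable names are illustrative.

```python
# Raw counts reported on this page.
n_instances = 952
n_attributes = 18
n_numeric = 4
n_missing_values = 48
n_rows_with_missing = 47

# Derived properties, under the formulas assumed from their descriptions.
attrs_per_instance = n_attributes / n_instances
pct_numeric = 100 * n_numeric / n_attributes
pct_rows_with_missing = 100 * n_rows_with_missing / n_instances
# Assumption: the denominator is the total number of cells.
pct_missing_values = 100 * n_missing_values / (n_instances * n_attributes)

print(round(attrs_per_instance, 2))
print(round(pct_numeric, 2))
print(round(pct_rows_with_missing, 2))
print(round(pct_missing_values, 2))

# More missing values than affected rows means at least one row
# has more than one missing value.
rows_with_multiple_missing_exist = n_missing_values > n_rows_with_missing
```

Rounded to two decimals, these agree with the listed values 0.02, 22.22, 4.94, and 0.28.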

0 tasks
