OpenML

WHO-national-life-expectancy

Status: active; Format: ARFF; License: CC0: Public Domain; Visibility: public; Uploaded 23-03-2022 by Elif Ceren Gok

Context

I am developing my data science skills in areas outside my previous work. One problem that interested me was identifying which factors influence life expectancy at the national level. An existing Kaggle data set explored this, but its information was corrupted. Part of the problem-solving process is to step back periodically and ask, "Does this make sense?" Without reasonable data, it is harder to tell mistakes in my analysis code apart from unusual behavior caused by the data itself. I therefore wanted to make a similar data set, but with reliable information.

This is my first time exploring life expectancy, so I had to guess which features might be of interest when making the data set. Some were included for comparison with the other Kaggle data set. Several potentially interesting features (such as air pollution) were left out because of limited year or country coverage. Since the data was collected from more than one server, some features are present more than once, so that the differences between sources can be explored.

Content

A goal of the World Health Organization (WHO) is to ensure that a billion more people are protected from health emergencies and enjoy better health and well-being. The WHO provides public data collected from many sources to identify and monitor factors that are important for reaching this goal. This set was made primarily from GHO (Global Health Observatory) and UNESCO (United Nations Educational, Scientific and Cultural Organization) information. It covers the years 2000-2016 for 183 countries, in a single CSV file. Missing data is left in place for the user to decide how to handle it. Three notebooks are provided: my cursory analysis, a comparison with the other Kaggle set, and a template for creating this data set.

Inspiration

There is a lot to explore, if the user is interested. The GHO server alone has over 2000 "indicators". How are the GHO and UNESCO life expectancies calculated, and what causes the difference between them? The same question could be asked of the Gross National Income (GNI) and mortality features. How does life expectancy at age 60 compare with life expectancy at birth? Is the relationship with the features in this data set different for those two targets? What other indicators on the servers might be interesting to use? Some of the GHO indicators come from different studies with different coverage; can they be combined into a more useful and robust feature? Unraveling the correlations between the features would take significant work.
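Because missing values are left in place, the user has to pick a strategy. For a country-by-year panel like this one, a common option is to carry the last observed value forward within each country, and another is to keep only complete rows. The sketch below assumes the rows have already been loaded as Python dicts with `None` for missing entries; the helper names `ffill_within_country` and `complete_cases` are hypothetical, the toy rows use made-up country codes and values (not taken from the data set), and forward filling is just one possible choice, not the author's method.

```python
from collections import defaultdict
from typing import Optional


def ffill_within_country(rows: list[dict], column: str) -> list[dict]:
    """Return copies of `rows` with `column` forward-filled per country.

    Rows are grouped by "country_code" and ordered by "year". A missing
    value (None) takes the most recent earlier value for the same country;
    gaps before a country's first observation stay None.
    """
    by_country: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        by_country[row["country_code"]].append(dict(row))  # copy; input is not mutated

    filled: list[dict] = []
    for code in sorted(by_country):
        last: Optional[float] = None
        for row in sorted(by_country[code], key=lambda r: r["year"]):
            if row[column] is None:
                row[column] = last
            else:
                last = row[column]
            filled.append(row)
    return filled


def complete_cases(rows: list[dict], columns: list[str]) -> list[dict]:
    """Keep only rows where every listed column is present (not None)."""
    return [r for r in rows if all(r[c] is not None for c in columns)]


if __name__ == "__main__":
    # Made-up toy rows for illustration only.
    toy = [
        {"country_code": "AAA", "year": 2001, "doctors": None},
        {"country_code": "AAA", "year": 2000, "doctors": 1.5},
        {"country_code": "AAA", "year": 2002, "doctors": 1.7},
        {"country_code": "BBB", "year": 2000, "doctors": None},
        {"country_code": "BBB", "year": 2001, "doctors": 0.9},
    ]
    print([r["doctors"] for r in ffill_within_country(toy, "doctors")])
    # prints [1.5, 1.5, 1.7, None, 0.9]
```

Forward filling assumes a value changes slowly from year to year, which may be reasonable for something like `doctors` but not for every column; for sparse columns, dropping them or interpolating may be preferable.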

32 features

feature	type	unique values	missing values
country	string	183	0
country_code	string	183	0
region	string	6	0
year	numeric	17	0
life_expect	numeric	3109	0
life_exp60	numeric	3107	0
adult_mortality	numeric	3110	0
infant_mort	numeric	2758	0
age1-4mort	numeric	1360	0
alcohol	numeric	2980	50
bmi	numeric	122	34
age5-19thinness	numeric	227	34
age5-19obesity	numeric	213	34
hepatitis	numeric	92	569
measles	numeric	78	19
polio	numeric	77	19
diphtheria	numeric	79	19
basic_water	numeric	2699	32
doctors	numeric	1737	1331
hospitals	numeric	128	2981
gni_capita	numeric	1559	682
gghe-d	numeric	3004	100
che_gdp	numeric	2988	117
une_pop	numeric	3073	37
une_infant	numeric	867	0
une_life	numeric	2944	0
une_hiv	numeric	187	741
une_gni	numeric	1870	117
une_poverty	numeric	291	2198
une_edu_spend	numeric	1824	1286
une_literacy	numeric	565	2540
une_school	numeric	805	2306
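The missing counts above are easier to compare as fractions of the table size. The sketch below assumes one row per country-year, i.e. 183 countries × 17 years (2000-2016) = 3,111 rows; that row count is an inference from the description, not a figure stated on this page. The counts are copied from the feature table, and the helper names and the 50% threshold are hypothetical choices for illustration.

```python
# Missing-value counts copied from the feature table (columns with 0 missing omitted).
MISSING = {
    "alcohol": 50, "bmi": 34, "age5-19thinness": 34, "age5-19obesity": 34,
    "hepatitis": 569, "measles": 19, "polio": 19, "diphtheria": 19,
    "basic_water": 32, "doctors": 1331, "hospitals": 2981, "gni_capita": 682,
    "gghe-d": 100, "che_gdp": 117, "une_pop": 37, "une_hiv": 741,
    "une_gni": 117, "une_poverty": 2198, "une_edu_spend": 1286,
    "une_literacy": 2540, "une_school": 2306,
}

N_COUNTRIES = 183
N_YEARS = 17                     # 2000-2016 inclusive
N_ROWS = N_COUNTRIES * N_YEARS   # assumption: one row per country-year


def missing_fraction(missing_count: int, n_rows: int = N_ROWS) -> float:
    """Share of rows in which a feature is missing."""
    return missing_count / n_rows


def sparse_features(missing: dict[str, int], threshold: float = 0.5,
                    n_rows: int = N_ROWS) -> list[str]:
    """Features missing in more than `threshold` of rows, most-missing first."""
    over = [name for name, m in missing.items() if missing_fraction(m, n_rows) > threshold]
    return sorted(over, key=lambda name: missing[name], reverse=True)


if __name__ == "__main__":
    for name in sparse_features(MISSING):
        print(f"{name}: {missing_fraction(MISSING[name]):.1%} missing")
```

Under this assumption, `hospitals` is missing in about 96% of rows, and `une_literacy`, `une_school`, and `une_poverty` are also missing in more than half, so these columns may need special handling (or exclusion) before modeling.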