OpenML


Adult-Census-Income

Status: active. Format: ARFF. License: CC0 (Public Domain). Visibility: public. Uploaded 06-06-2023 by Matthias Feurer.

This data was extracted from the 1994 Census Bureau database by Ronny Kohavi and Barry Becker (Data Mining and Visualization, Silicon Graphics). A set of reasonably clean records was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0)). The prediction task is to determine whether a person makes over 50K a year.

Description of fnlwgt (final weight)

The weights on the Current Population Survey (CPS) files are controlled to independent estimates of the civilian noninstitutional population of the US. These are prepared monthly for us by Population Division here at the Census Bureau. We use 3 sets of controls. These are:

1. A single cell estimate of the population 16+ for each state.
2. Controls for Hispanic Origin by age and sex.
3. Controls by Race, age and sex.

We use all three sets of controls in our weighting program and "rake" through them 6 times so that by the end we come back to all the controls we used. The term estimate refers to population totals derived from CPS by creating "weighted tallies" of any specified socio-economic characteristics of the population. People with similar demographic characteristics should have similar weights. There is one important caveat to remember about this statement: since the CPS sample is actually a collection of 51 state samples, each with its own probability of selection, the statement only applies within a state.

Relevant papers

Ron Kohavi, "Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid", Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 1996.
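The "raking" mentioned in the fnlwgt description is commonly understood as iterative proportional fitting: weights are rescaled to match one set of control totals, then the next, and the cycle is repeated. The following is a minimal sketch of that idea on a two-way table, not the Census Bureau's actual weighting program. The function name `rake`, the two-margin setup, and all numbers are assumptions made for illustration; the description above uses three sets of controls and real population estimates.

```python
def rake(weights, row_targets, col_targets, passes=6):
    """Iterative proportional fitting on a 2-D table of positive weights.

    row_targets and col_targets are hypothetical control totals for the
    two margins; they must add up to the same grand total.
    """
    table = [row[:] for row in weights]  # work on a copy
    for _ in range(passes):
        # Scale each row so its sum matches the row control.
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            table[i] = [w * target / row_sum for w in table[i]]
        # Scale each column so its sum matches the column control.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / col_sum
    return table


# Made-up numbers for illustration only, not CPS data.
sample = [[10.0, 20.0], [30.0, 40.0]]
raked = rake(sample, row_targets=[40.0, 60.0], col_targets=[45.0, 55.0])
row_sums = [sum(row) for row in raked]
col_sums = [sum(row[j] for row in raked) for j in range(2)]
```

After the final pass the column sums match their controls, and the row sums agree with theirs to within a small tolerance, which is the sense in which repeated passes "come back to all the controls".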

15 features

Feature	Type	Unique values	Missing values
age	numeric	73	0
workclass	nominal	8	1836
fnlwgt	numeric	21648	0
education	nominal	16	0
education.num	numeric	16	0
marital.status	nominal	7	0
occupation	nominal	14	1843
relationship	nominal	6	0
race	nominal	5	0
sex	nominal	2	0
capital.gain	numeric	119	0
capital.loss	numeric	92	0
hours.per.week	numeric	94	0
native.country	nominal	41	583
income	nominal	2	0
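The feature list reports missing values only for workclass, occupation, and native.country. A per-column count like that can be reproduced with a short script. The sketch below assumes the data has been exported to CSV and that missing cells carry ARFF's `?` marker; the rows are made up for illustration, and `count_missing` is a hypothetical helper, not part of any OpenML tooling.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration only; column names follow the feature list.
raw = """age,workclass,occupation,native.country,income
39,Private,Sales,United-States,<=50K
52,?,?,United-States,>50K
28,Private,Craft-repair,?,<=50K
"""

MISSING = "?"  # assumed missing-value marker (the ARFF convention)


def count_missing(text):
    """Return a Counter mapping column name to its number of missing cells."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            if value.strip() == MISSING:
                counts[column] += 1
    return counts


missing = count_missing(raw)
```

Columns with no missing cells do not appear in the result; looking them up in the `Counter` returns 0.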