Income_Adult_Predictor

Status: active; Format: ARFF; License: Public Domain (CC0); Visibility: public; Uploaded 31-05-2024 by Iwo Godzwon


Description: The adult.csv dataset contains socio-economic data on adult individuals. Its attributes cover demographics, education, employment, and income. The dataset is intended to give insight into the factors that influence income levels, and to serve as a basis for socio-economic analysis, labor market studies, and research on educational outcomes.

Attribute Description:
- age: An individual's age. Sample values are integers ranging from 23 to 58.
- workclass: The type of employing sector. Examples include 'State-gov', 'Federal-gov', and 'Private'; unspecified values are represented as '?'.
- fnlwgt: Final weight. The number of people the census believes the entry represents. Sample values range from 107302 to 261012.
- education: The highest level of education attained by an individual. Sample categories include 'Bachelors' and 'HS-grad'.
- education.num: A numerical representation of the highest education attained. Sample values range from 9 to 13.
- marital.status: Marital status of the individual, e.g., 'Married-civ-spouse', 'Separated', 'Never-married'.
- occupation: The individual's occupation, e.g., 'Prof-specialty', 'Transport-moving', 'Exec-managerial'.
- relationship: The individual's role in the family, such as 'Wife', 'Husband', 'Not-in-family'.
- race: Race of the individual, with examples including 'Black' and 'White'.
- sex: The sex of the individual, either 'Male' or 'Female'.
- capital.gain: Capital gains recorded. All sample entries are 0.
- capital.loss: Capital losses recorded. All sample entries are 0.
- hours.per.week: Number of hours worked per week. Sample values include 20, 35, and 40.
- native.country: Country of origin. All sample individuals are from 'United-States'.
- income: Income category, either '<=50K' or '>50K'.
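Because unspecified values are encoded as '?', a reader loading the file will usually want to convert that marker to a proper missing value before analysis. The following is a minimal sketch using only the Python standard library. The rows in `SAMPLE_CSV` are invented for illustration from the sample values named above and are not copied from adult.csv; the helper names `parse_rows` and `missing_per_column` and the reduced column set are likewise assumptions of this sketch.

```python
import csv
import io

# Illustrative rows only: invented for this sketch from the sample values
# mentioned in the description, NOT copied from adult.csv.
SAMPLE_CSV = """age,workclass,education,education.num,hours.per.week,native.country,income
23,Private,HS-grad,9,40,United-States,<=50K
58,?,Bachelors,13,20,United-States,>50K
35,State-gov,HS-grad,9,35,United-States,<=50K
"""

# Subset of the numeric attributes used in this sketch.
NUMERIC_COLUMNS = {"age", "education.num", "hours.per.week"}


def parse_rows(text):
    """Parse CSV text, mapping '?' to None and numeric columns to int."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for name, value in raw.items():
            if value == "?":
                row[name] = None          # unspecified value
            elif name in NUMERIC_COLUMNS:
                row[name] = int(value)
            else:
                row[name] = value
        rows.append(row)
    return rows


def missing_per_column(rows):
    """Count None values in each column."""
    counts = {name: 0 for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            if value is None:
                counts[name] += 1
    return counts


rows = parse_rows(SAMPLE_CSV)
missing = missing_per_column(rows)
print(missing)
```

To apply the same idea to the real file, one would replace `io.StringIO(SAMPLE_CSV)` with an open file handle and extend `NUMERIC_COLUMNS` to all six numeric attributes.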
Use Case: The adult.csv dataset is suited to studies of income disparity, employment trends, the impact of education on earnings, and demographic analysis. Researchers and policymakers can use it to understand labor market dynamics, identify educational or skill gaps, and develop targeted social welfare programs. It is also a useful dataset for machine learning projects that predict income level from a range of socio-economic factors.

15 features

age (numeric): 73 unique values, 0 missing
workclass (nominal): 8 unique values, 1836 missing
fnlwgt (numeric): 21648 unique values, 0 missing
education (nominal): 16 unique values, 0 missing
education.num (numeric): 16 unique values, 0 missing
marital.status (nominal): 7 unique values, 0 missing
occupation (nominal): 14 unique values, 1843 missing
relationship (nominal): 6 unique values, 0 missing
race (nominal): 5 unique values, 0 missing
sex (nominal): 2 unique values, 0 missing
capital.gain (numeric): 119 unique values, 0 missing
capital.loss (numeric): 92 unique values, 0 missing
hours.per.week (numeric): 94 unique values, 0 missing
native.country (nominal): 41 unique values, 583 missing
income (nominal): 2 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 32561
Number of attributes (columns) of the dataset: 15
Number of distinct values of the target attribute (if it is nominal): (not reported)
Number of missing values in the dataset: 4262
Number of instances with at least one value missing: 2399
Number of numeric attributes: 6
Number of nominal attributes: 9
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 40
Percentage of instances belonging to the most frequent class: (not reported)
Percentage of nominal attributes: 60
Number of instances belonging to the most frequent class: (not reported)
Percentage of instances belonging to the least frequent class: (not reported)
Number of instances belonging to the least frequent class: (not reported)
Number of binary attributes: 2
Percentage of binary attributes: 13.33
Percentage of instances having missing values: 7.37
Average class difference between consecutive instances: (not reported)
Percentage of missing values: 0.87
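
The percentage properties can be reproduced from the raw counts listed on this page. The sketch below recomputes them in Python; the variable names and the `pct` helper are assumptions of this example, and the rounding to two decimals is inferred from the values shown rather than documented.

```python
# Counts as listed on this page.
n_instances = 32561
n_attributes = 15
n_numeric = 6
n_nominal = 9
n_binary = 2
rows_with_missing = 2399
missing_by_feature = {
    "workclass": 1836,
    "occupation": 1843,
    "native.country": 583,
}


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals (assumed rounding)."""
    return round(100 * part / whole, 2)


# Total missing values is the sum over the features that have any.
total_missing = sum(missing_by_feature.values())

derived = {
    "pct_numeric": pct(n_numeric, n_attributes),
    "pct_nominal": pct(n_nominal, n_attributes),
    "pct_binary": pct(n_binary, n_attributes),
    "pct_rows_with_missing": pct(rows_with_missing, n_instances),
    # Missing cells relative to all cells (rows x columns).
    "pct_missing_values": pct(total_missing, n_instances * n_attributes),
    "attributes_per_instance": round(n_attributes / n_instances, 2),
}

print(total_missing)
for name, value in derived.items():
    print(name, value)
```

The per-feature missing counts sum to the reported total of 4262, and the ratio of attributes to instances rounds to 0 at two decimals, which matches the listed value.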

0 tasks
