california_housing


Status: active. Format: ARFF. Publicly available. Visibility: public. Uploaded 22-12-2022 by Sebastian Fischer.


Data Description

Information on the variables was collected using all the block groups in California from the 1990 Census. In this sample, a block group on average includes 1425.5 individuals living in a geographically compact area. Naturally, the geographical area covered varies inversely with the population density. Distances between the centroids of the block groups were computed from their latitude and longitude. All block groups reporting zero entries for the independent and dependent variables were excluded. The final data contain 20,640 observations on 9 variables.

Each row in the dataset represents one census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people).

The goal of the dataset is to predict the median house value. The original dataset description advised predicting a logarithmic transform of the value rather than the raw value.

Attribute Description

Census block group describing features:
1. *longitude*
2. *latitude*
3. *housingMedianAge*
4. *totalRooms*
5. *totalBedrooms*
6. *population*
7. *households*
8. *medianIncome*
9. *medianHouseValue* - target feature
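The description advises modelling the target on a logarithmic scale. A minimal sketch of that round trip follows, assuming the natural logarithm (the description does not say which base) and hypothetical helper names `to_log_target` and `from_log_target`. Because block groups with zero entries were excluded, the target should be strictly positive, so a plain log is defined; the sketch still guards against non-positive inputs.

```python
import math
from typing import List, Sequence


# Hypothetical helpers: the natural log is an assumption, since the
# dataset description only says "logarithmic transform".
def to_log_target(values: Sequence[float]) -> List[float]:
    """Map median house values to log space before fitting a model."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in values]


def from_log_target(log_values: Sequence[float]) -> List[float]:
    """Map predictions made in log space back to house values."""
    return [math.exp(v) for v in log_values]
```

A model trained on `to_log_target(y)` penalizes relative rather than absolute errors, so a fixed miss counts the same for a cheap and an expensive block group; its predictions are converted back with `from_log_target` before reporting values in the original units.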

9 features

medianHouseValue (target): numeric, 3842 unique values, 0 missing
longitude: numeric, 844 unique values, 0 missing
latitude: numeric, 862 unique values, 0 missing
housingMedianAge: numeric, 52 unique values, 0 missing
totalRooms: numeric, 5926 unique values, 0 missing
totalBedrooms: numeric, 1928 unique values, 0 missing
population: numeric, 3888 unique values, 0 missing
households: numeric, 1815 unique values, 0 missing
medianIncome: numeric, 12928 unique values, 0 missing
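The per-feature "unique values" and "missing" counts above can be reproduced with a small column summary. This is a sketch under two assumptions: missing entries are encoded as `None` or float NaN, and "unique values" means distinct non-missing values. The function names are hypothetical, and this is not necessarily how the hosting platform computes its counts.

```python
import math
from typing import Dict, Optional, Sequence


def is_missing(value: Optional[float]) -> bool:
    # Assumption: missing cells are stored as None or as a float NaN
    return value is None or (isinstance(value, float) and math.isnan(value))


def summarize_column(values: Sequence[Optional[float]]) -> Dict[str, int]:
    """Count distinct non-missing values and missing cells in one column."""
    present = [v for v in values if not is_missing(v)]
    return {"unique": len(set(present)), "missing": len(values) - len(present)}


def summarize_table(
    columns: Dict[str, Sequence[Optional[float]]]
) -> Dict[str, Dict[str, int]]:
    """Apply summarize_column to every named column."""
    return {name: summarize_column(vals) for name, vals in columns.items()}
```

Passing the nine columns of this dataset to `summarize_table` should give a missing count of 0 for each, matching the listing above.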

19 properties

Number of instances (rows) of the dataset: 20640
Number of attributes (columns) of the dataset: 9
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 9
Number of nominal attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: -38375.04
Percentage of numeric attributes: 100
Number of attributes divided by the number of instances: 0 (shown rounded; 9 / 20640 ≈ 0.00044)
Percentage of nominal attributes: 0
Percentage of instances belonging to the most frequent class: (not reported)
Number of instances belonging to the most frequent class: (not reported)
Percentage of instances belonging to the least frequent class: (not reported)
Number of instances belonging to the least frequent class: (not reported)
Number of binary attributes: 0
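Several of these properties are simple ratios of the basic counts. The sketch below derives them from the counts stated on this page (20640 instances, 9 numeric and 0 nominal attributes, 0 missing values). The `DatasetCounts` class, the `derived_properties` function, and the dictionary keys are hypothetical names chosen for illustration, not an API of the hosting platform. The attributes-to-instances ratio is 9 / 20640 ≈ 0.000436, which is presumably why the page lists it as 0 after rounding.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DatasetCounts:
    # Hypothetical container for the basic counts listed on the page
    n_instances: int
    n_numeric: int
    n_nominal: int
    n_missing_values: int


def derived_properties(c: DatasetCounts) -> Dict[str, float]:
    """Compute ratio-style properties from the basic counts."""
    n_attributes = c.n_numeric + c.n_nominal
    n_cells = n_attributes * c.n_instances  # total number of values in the table
    return {
        "n_attributes": n_attributes,
        "pct_numeric": 100.0 * c.n_numeric / n_attributes,
        "pct_nominal": 100.0 * c.n_nominal / n_attributes,
        "dimensionality": n_attributes / c.n_instances,
        "pct_missing_values": 100.0 * c.n_missing_values / n_cells,
    }


props = derived_properties(DatasetCounts(20640, 9, 0, 0))
```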

1 task

Task (0 runs): estimation procedure 10-fold Crossvalidation, target feature medianHouseValue
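To show what a 10-fold cross-validation over the 20,640 rows involves, here is a minimal index-splitting sketch. It assumes a seeded random shuffle (`seed` is a hypothetical parameter) and does not reproduce the predefined splits a hosted task may ship with. Each fold serves once as the test set while the other nine form the training set.

```python
import random
from typing import List, Tuple


def kfold_indices(
    n: int, k: int = 10, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return (train, test) row-index lists for k-fold cross-validation."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    order = list(range(n))
    random.Random(seed).shuffle(order)  # seeded shuffle for reproducibility
    # Spread any remainder so fold sizes differ by at most one
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(order[start:start + size])
        start += size
    splits = []
    for i, test in enumerate(folds):
        # Training rows are every fold except the held-out one
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits
```

With `n=20640` and `k=10`, every test fold holds 2,064 rows and every training set 18,576 rows; the reported score is typically the mean over the ten held-out folds.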