California-Housing-Prices


Status: active; Format: ARFF; License: CC0 (Public Domain); Visibility: public; Uploaded 24-03-2022 by Dustin Carrion


Context

This is the dataset used in the second chapter of Aurélien Géron's recent book 'Hands-On Machine Learning with Scikit-Learn and TensorFlow'. It serves as an excellent introduction to implementing machine learning algorithms because it requires only rudimentary data cleaning, has an easily understandable list of variables, and sits at a convenient size between being too toyish and too cumbersome.

The data contains information from the 1990 California census. So although it may not help you predict current housing prices the way the Zillow Zestimate dataset might, it does provide an accessible introductory dataset for teaching the basics of machine learning.

Content

The data pertains to the houses found in a given California district, along with some summary statistics about them based on the 1990 census data. Be warned: the data are not cleaned, so some preprocessing steps are required. The columns are as follows; their names are fairly self-explanatory:

longitude
latitude
housing_median_age
total_rooms
total_bedrooms
population
households
median_income
median_house_value
ocean_proximity

Acknowledgements

This data was initially featured in the following paper: Pace, R. Kelley, and Ronald Barry. "Sparse spatial autoregressions." Statistics & Probability Letters 33.3 (1997): 291-297. I encountered it in 'Hands-On Machine Learning with Scikit-Learn and TensorFlow' by Aurélien Géron. Aurélien Géron wrote: This dataset is a modified version of the California Housing dataset available from Luís Torgo's page (University of Porto).

Inspiration

See my kernel on machine learning basics in R using this dataset, or venture over to the following link for a Python-based introductory tutorial: https://github.com/ageron/handson-ml/tree/master/datasets/housing

10 features

longitude: numeric, 844 unique values, 0 missing
latitude: numeric, 862 unique values, 0 missing
housing_median_age: numeric, 52 unique values, 0 missing
total_rooms: numeric, 5926 unique values, 0 missing
total_bedrooms: numeric, 1923 unique values, 207 missing
population: numeric, 3888 unique values, 0 missing
households: numeric, 1815 unique values, 0 missing
median_income: numeric, 12928 unique values, 0 missing
median_house_value: numeric, 3842 unique values, 0 missing
ocean_proximity: string, 5 unique values, 0 missing
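
The feature list shows that total_bedrooms is the only column with missing values and that ocean_proximity is the only non-numeric column, so these are the two preprocessing steps most readers will hit first. The sketch below is one possible way to handle them using only the Python standard library: median imputation for the missing values and one-hot encoding for the string column. Both techniques are assumptions chosen for illustration and are not prescribed by this page. The sample rows, the category labels in them, and the helper names impute_median and one_hot are made up for the example and are not real records or an official API.

```python
import statistics

# Made-up sample rows for illustration only (NOT real records from the dataset).
# None marks a missing value, as happens in the total_bedrooms column.
rows = [
    {"total_bedrooms": 129.0, "ocean_proximity": "NEAR BAY"},
    {"total_bedrooms": None, "ocean_proximity": "INLAND"},
    {"total_bedrooms": 190.0, "ocean_proximity": "NEAR BAY"},
    {"total_bedrooms": 235.0, "ocean_proximity": "<1H OCEAN"},
]


def impute_median(rows, column):
    """Return copies of the rows with missing values in `column`
    replaced by the median of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    median = statistics.median(observed)
    return [
        {**r, column: median if r[column] is None else r[column]}
        for r in rows
    ]


def one_hot(rows, column):
    """Return copies of the rows with the string `column` replaced
    by one 0/1 indicator field per distinct category."""
    categories = sorted({r[column] for r in rows})
    encoded_rows = []
    for r in rows:
        encoded = {k: v for k, v in r.items() if k != column}
        for category in categories:
            encoded[f"{column}={category}"] = 1 if r[column] == category else 0
        encoded_rows.append(encoded)
    return encoded_rows


cleaned = one_hot(impute_median(rows, "total_bedrooms"), "ocean_proximity")
for row in cleaned:
    print(row)
```

Median imputation is used here because it is simple and robust to outliers; dropping the affected rows or the whole column are equally valid alternatives depending on the task.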

19 properties

Number of instances (rows) of the dataset: 20640
Number of attributes (columns) of the dataset: 10
Number of distinct values of the target attribute (if it is nominal): (no value)
Number of missing values in the dataset: 207
Number of instances with at least one value missing: 207
Number of numeric attributes: 9
Number of nominal attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 1
Percentage of missing values: 0.1
Average class difference between consecutive instances: (no value)
Percentage of numeric attributes: 90
Number of attributes divided by the number of instances: 0
Percentage of nominal attributes: 0
Percentage of instances belonging to the most frequent class: (no value)
Number of instances belonging to the most frequent class: (no value)
Percentage of instances belonging to the least frequent class: (no value)
Number of instances belonging to the least frequent class: (no value)
Number of binary attributes: 0
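
Several of the derived properties above can be reproduced from the raw counts, which is a useful sanity check on how they are rounded. The sketch below recomputes them in Python. It assumes that the percentage of missing values is taken over all cells (instances times attributes); that definition is not stated on this page, but it is consistent with the 0.1 shown. The variable names are illustrative only.

```python
# Raw counts as listed in the properties above.
n_instances = 20640
n_attributes = 10
n_missing_values = 207
n_rows_with_missing = 207
n_numeric = 9

# Assumption: missing-value percentage is computed over all cells.
pct_missing_values = 100 * n_missing_values / (n_instances * n_attributes)

# Share of rows that contain at least one missing value.
pct_rows_with_missing = 100 * n_rows_with_missing / n_instances

# Share of attributes that are numeric.
pct_numeric = 100 * n_numeric / n_attributes

# Attributes divided by instances; tiny for this dataset, so it displays as 0.
dimensionality = n_attributes / n_instances

print(round(pct_missing_values, 1))     # 0.1
print(round(pct_rows_with_missing, 1))  # 1.0
print(round(pct_numeric, 1))            # 90.0
print(round(dimensionality, 2))         # 0.0
```

Because the number of missing values equals the number of rows with a missing value, no row is missing more than one field, which matches the feature list where only total_bedrooms has gaps.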

0 tasks
