Perth-House-Prices

Status: active. Format: ARFF. Licence: CC BY-NC-SA 4.0. Visibility: public. Uploaded 24-03-2022 by Elif Ceren Gok.


Acknowledgements

This data was scraped from http://house.speakingsame.com/ and includes data from 322 Perth suburbs, an average of about 100 rows per suburb.

Content

I believe the columns chosen for this dataset are the most important for predicting house prices. Some preliminary analysis I conducted showed a significant correlation between each of these columns and the response variable (i.e. price).

Data obtained from sources other than the scrape

Longitude and latitude data was obtained from data.gov.au. School ranking data was obtained from bettereducation.

The nearest school recorded for each address is always one that is 'ATAR-applicable'. In the Australian secondary school education system, ATAR is a scoring system used to assess a student's cumulative academic results and is used for entry into Australian universities. Schools that do not have an ATAR program, such as primary schools, vocational schools and special needs schools, are therefore not considered when determining the nearest school. Note also that the "NEAREST_SCH_RANK" column has missing values, because some schools are unranked by bettereducation under these criteria.
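Because unranked schools leave gaps in "NEAREST_SCH_RANK", any code that reads the column has to decide how to represent a missing rank. The following is a minimal sketch, not part of the dataset's own tooling. It assumes the raw value arrives as text, that a missing value is written as "?" (the ARFF convention) or as an empty string, and it uses made-up sample rows; the helper name `parse_rank` is hypothetical.

```python
from typing import Optional

# "?" is the ARFF marker for a missing value; treating an empty string
# as missing too is an assumption made for this sketch.
MISSING_MARKERS = {"?", ""}


def parse_rank(raw: str) -> Optional[float]:
    """Return the school rank as a float, or None when the school is unranked."""
    value = raw.strip()
    if value in MISSING_MARKERS:
        return None
    return float(value)


# Hypothetical rows for illustration only; they are not taken from the dataset.
sample_rows = [
    {"NEAREST_SCH": "School A", "NEAREST_SCH_RANK": "14"},
    {"NEAREST_SCH": "School B", "NEAREST_SCH_RANK": "?"},
    {"NEAREST_SCH": "School C", "NEAREST_SCH_RANK": ""},
]

ranks = [parse_rank(row["NEAREST_SCH_RANK"]) for row in sample_rows]
ranked = [r for r in ranks if r is not None]
print(f"{len(ranked)} ranked, {len(ranks) - len(ranked)} unranked")
```

Keeping unranked schools as None, rather than substituting a number such as 0, avoids treating "no rank" as if it were a very good or very bad rank in later analysis.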

18 features

Feature            Type     Unique values  Missing
ADDRESS (ignore)   string   33566          0
SUBURB             string   321            0
PRICE              numeric  2297           0
BEDROOMS           numeric  10             0
BATHROOMS          numeric  8              0
GARAGE             numeric  25             2478
LAND_AREA          numeric  4372           0
FLOOR_AREA         numeric  528            0
BUILD_YEAR         numeric  124            3155
CBD_DIST           numeric  595            0
NEAREST_STN        string   68             0
NEAREST_STN_DIST   numeric  1189           0
DATE_SOLD          string   350            0
POSTCODE           numeric  114            0
LATITUDE           numeric  29707          0
LONGITUDE          numeric  28557          0
NEAREST_SCH        string   160            0
NEAREST_SCH_DIST   numeric  33318          0
NEAREST_SCH_RANK   numeric  103            10952
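As a quick consistency check on the feature list, the per-feature missing counts can be totalled and expressed as a share of all rows. This is a small worked example using only the figures shown on this page; the variable names are illustrative.

```python
# Missing counts copied from the feature list; every other feature reports 0.
missing_by_feature = {
    "GARAGE": 2478,
    "BUILD_YEAR": 3155,
    "NEAREST_SCH_RANK": 10952,
}
N_ROWS = 33656  # number of instances reported on this page

# Total number of missing cells across the whole dataset.
total_missing = sum(missing_by_feature.values())

# Percentage of rows in which each feature is missing.
share_missing = {
    name: round(100 * count / N_ROWS, 2)
    for name, count in missing_by_feature.items()
}

print(total_missing)  # 16585, matching the reported number of missing values
for name, pct in share_missing.items():
    print(f"{name}: {pct}% of rows missing")
```

NEAREST_SCH_RANK accounts for most of the gaps, which is consistent with the note in the description about unranked schools.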

19 properties

Number of instances (rows) of the dataset: 33656
Number of attributes (columns) of the dataset: 18
Number of distinct values of the target attribute (if it is nominal): not reported
Number of missing values in the dataset: 16585
Number of instances with at least one value missing: 14448
Number of numeric attributes: 14
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 77.78
Percentage of instances belonging to the most frequent class: not reported
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: not reported
Percentage of instances belonging to the least frequent class: not reported
Number of instances belonging to the least frequent class: not reported
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 42.93
Average class difference between consecutive instances: not reported
Percentage of missing values: 2.74
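The percentage properties can be reproduced from the counts reported on this page. The sketch below is a worked calculation, not the site's own code. It assumes that the percentage of missing values is taken over rows times 18 attributes (19 features are listed, and the count of 18 is consistent with leaving out ADDRESS, which is marked "ignore"); that denominator is an inference that happens to reproduce the reported figure.

```python
N_ROWS = 33656            # number of instances
N_ATTRIBUTES = 18         # number of attributes as reported
N_NUMERIC = 14            # number of numeric attributes
MISSING_VALUES = 16585    # number of missing values in the dataset
ROWS_WITH_MISSING = 14448 # instances with at least one value missing

# Assumption: the denominator is every cell in an N_ROWS x N_ATTRIBUTES table.
pct_missing_values = round(100 * MISSING_VALUES / (N_ROWS * N_ATTRIBUTES), 2)
pct_rows_with_missing = round(100 * ROWS_WITH_MISSING / N_ROWS, 2)
pct_numeric = round(100 * N_NUMERIC / N_ATTRIBUTES, 2)

# About 0.0005, which shows as 0 once rounded to two decimals.
attributes_per_instance = round(N_ATTRIBUTES / N_ROWS, 2)

print(pct_missing_values)       # 2.74
print(pct_rows_with_missing)    # 42.93
print(pct_numeric)              # 77.78
print(attributes_per_instance)  # 0.0
```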

0 tasks
