MiamiHousing2016

Status: active
Format: ARFF
License: Attribution-NonCommercial-ShareAlike (CC BY-NC-SA)
Visibility: public
Uploaded: 05-07-2022 by Leo Grin


Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark.

Original description:

The dataset contains information on 13,932 single-family homes sold in Miami in 2016. Besides publicly available information, the dataset creator Steven C. Bourassa has added distance variables, aviation noise, and latitude and longitude.

The dataset contains the following columns:

- PARCELNO: unique identifier for each property. About 1% appear multiple times.
- SALE_PRC: sale price ($)
- LND_SQFOOT: land area (square feet)
- TOT_LVG_AREA: floor area (square feet)
- SPEC_FEAT_VAL: value of special features (e.g., swimming pools) ($)
- RAIL_DIST: distance to the nearest rail line (an indicator of noise) (feet)
- OCEAN_DIST: distance to the ocean (feet)
- WATER_DIST: distance to the nearest body of water (feet)
- CNTR_DIST: distance to the Miami central business district (feet)
- SUBCNTR_DI: distance to the nearest subcenter (feet)
- HWY_DIST: distance to the nearest highway (an indicator of noise) (feet)
- age: age of the structure
- avno60plus: dummy variable for airplane noise exceeding an acceptable level
- structure_quality: quality of the structure
- month_sold: sale month in 2016 (1 = Jan)
- LATITUDE
- LONGITUDE

A typical model would try to predict log(SALE_PRC) as a function of all variables except PARCELNO.
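
The modeling setup suggested in the description (log of the sale price as target, identifier excluded) could be prepared roughly as follows. This is a minimal sketch using only the Python standard library. The helper name `split_features_target` is hypothetical, and the two sample rows hold made-up values that are not taken from the dataset. The target column name is a parameter because the feature list on this page names the target SALEPRC, while the original description uses SALE_PRC.

```python
import math

def split_features_target(row, target="SALE_PRC", drop=("PARCELNO",)):
    """Split one record into (features, log of the sale price).

    The identifier column is dropped and the sale price is
    log-transformed, as suggested by the dataset description.
    """
    price = float(row[target])
    if price <= 0:
        raise ValueError("sale price must be positive to take its logarithm")
    features = {k: v for k, v in row.items() if k != target and k not in drop}
    return features, math.log(price)

# Hypothetical rows with made-up values, for illustration only.
rows = [
    {"PARCELNO": 1, "SALE_PRC": 300000.0, "TOT_LVG_AREA": 1800.0, "age": 25},
    {"PARCELNO": 2, "SALE_PRC": 450000.0, "TOT_LVG_AREA": 2400.0, "age": 10},
]

pairs = [split_features_target(r) for r in rows]
X = [features for features, _ in pairs]    # model inputs
y = [log_price for _, log_price in pairs]  # log(SALE_PRC) targets
```

A model fitted on `y` predicts on the log scale, so its predictions would be passed through `math.exp` to get back to dollars.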

14 features

SALEPRC (target): numeric, 2111 unique values, 0 missing
LATITUDE: numeric, 13776 unique values, 0 missing
LONGITUDE: numeric, 13776 unique values, 0 missing
LND_SQFOOT: numeric, 4696 unique values, 0 missing
TOT_LVG_AREA: numeric, 2978 unique values, 0 missing
SPEC_FEAT_VAL: numeric, 7583 unique values, 0 missing
RAIL_DIST: numeric, 13235 unique values, 0 missing
OCEAN_DIST: numeric, 13617 unique values, 0 missing
WATER_DIST: numeric, 13218 unique values, 0 missing
CNTR_DIST: numeric, 13682 unique values, 0 missing
SUBCNTR_DI: numeric, 13642 unique values, 0 missing
HWY_DIST: numeric, 13213 unique values, 0 missing
age: numeric, 96 unique values, 0 missing
month_sold: numeric, 12 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 13932
Number of attributes (columns) of the dataset: 14
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 14
Number of nominal attributes: 0
Percentage of missing values: 0
Average class difference between consecutive instances: 0.73
Percentage of numeric attributes: 100
Number of attributes divided by the number of instances: 0
Percentage of instances belonging to the most frequent class: (no value listed)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0

1 task

0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: root_mean_squared_error - target_feature: SALEPRC
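
The evaluation setup named in this task (10-fold cross-validation scored by root mean squared error) can be illustrated with a small standard-library sketch. Several things here are assumptions for illustration: the function names `rmse`, `kfold_indices`, and `cross_validated_rmse` are hypothetical, the target values are synthetic, the fold assignment is a simple seeded shuffle rather than the task's actual splits, and the model is a trivial baseline that predicts the mean of the training folds.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def kfold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_rmse(y, k=10, seed=0):
    """Per-fold RMSE of a baseline that predicts the training-fold mean."""
    scores = []
    for test_idx in kfold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train = [y[i] for i in range(len(y)) if i not in held_out]
        prediction = sum(train) / len(train)  # stand-in for a fitted model
        test = [y[i] for i in test_idx]
        scores.append(rmse(test, [prediction] * len(test)))
    return scores

# Synthetic target values, for illustration only (not dataset values).
rng = random.Random(42)
y = [rng.uniform(100000.0, 900000.0) for _ in range(200)]

scores = cross_validated_rmse(y, k=10)
mean_rmse = sum(scores) / len(scores)  # average RMSE over the 10 folds
```

Replacing the mean baseline with a real regressor only changes the two lines that compute `prediction` and the predicted values; the fold loop and the RMSE computation stay the same.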