OpenML


house_16H

active, ARFF, publicly available, visibility: public, uploaded 16-06-2022 by Leo Grin


Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark.

Original description:

Author:
Source: Unknown - Date unknown
Please cite:

This database was designed on the basis of data provided by the US Census Bureau [http://www.census.gov] (under Lookup Access [http://www.census.gov/cdrom/lookup]: Summary Tape File 1). The data were collected as part of the 1990 US census. These are mostly counts cumulated at different survey levels. For the purpose of this data set, the State-Place level was used. Data from all states were obtained. Most of the counts were changed into appropriate proportions.

There are 4 different data sets obtained from this database:
- House(8H)
- House(8L)
- House(16H)
- House(16L)

These are all concerned with predicting the median price of the house in the region based on the demographic composition and the state of the housing market in the region.

The number in the name signifies the number of attributes of the data set. The letter that follows denotes a very rough approximation of the difficulty of the task. For Low task difficulty, more correlated attributes were chosen, as signified by a univariate smooth fit of that input on the target. Tasks with High difficulty have had their attributes chosen to make the modelling more difficult, due to higher variance or lower correlation of the inputs to the target.

Original source: DELVE repository of data.
Source: collection of regression datasets by Luis Torgo (ltorgo@ncc.up.pt) at http://www.ncc.up.pt/~ltorgo/Regression/DataSets.html
Characteristics: 22784 cases, 17 continuous attributes.
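The naming convention described above (a number for the attribute count, then H or L for the rough task difficulty) can be decoded mechanically. The following is a minimal sketch, not part of the dataset or of any OpenML tooling; the function name `parse_house_name` and the `HouseVariant` type are hypothetical and introduced only for illustration. It assumes names are written either as `House(16H)` or in the OpenML style `house_16H`.

```python
import re
from typing import NamedTuple


class HouseVariant(NamedTuple):
    """Decoded dataset name (hypothetical helper type)."""
    n_attributes: int   # number of attributes signified by the name
    difficulty: str     # "high" or "low", a very rough approximation


# Letter suffix -> task difficulty, as stated in the original description.
_DIFFICULTY = {"H": "high", "L": "low"}


def parse_house_name(name: str) -> HouseVariant:
    """Parse names such as 'House(16H)' or 'house_16H'."""
    match = re.fullmatch(r"[Hh]ouse[_(]?(\d+)([HL])\)?", name.strip())
    if match is None:
        raise ValueError(f"not a recognised House dataset name: {name!r}")
    return HouseVariant(int(match.group(1)), _DIFFICULTY[match.group(2)])


if __name__ == "__main__":
    for name in ("House(8H)", "House(8L)", "House(16H)", "House(16L)", "house_16H"):
        print(name, parse_house_name(name))
```

For `house_16H` this yields 16 attributes and high difficulty, which matches the 16 input columns plus the `price` target listed on this page.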

17 features

Feature	Type	Unique values	Missing values
price (target)	numeric	2045	0
P1	numeric	8832	0
P5p1	numeric	17504	0
P6p2	numeric	13683	0
P11p4	numeric	19220	0
P14p9	numeric	16168	0
P15p1	numeric	18753	0
P15p3	numeric	9655	0
P16p2	numeric	15570	0
P18p2	numeric	8070	0
P27p4	numeric	12052	0
H2p2	numeric	15662	0
H8p2	numeric	10941	0
H10p1	numeric	10855	0
H13p1	numeric	17097	0
H18pA	numeric	9063	0
H40p4	numeric	2421	0
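As a small sanity check on the feature list above, the columns can be split into the regression target and the input attributes. This is only a sketch: the column names are copied from the table on this page, while the variable names (`COLUMNS`, `TARGET`, `inputs`) are illustrative and not part of any API.

```python
# Column names as listed in the feature table on this page.
COLUMNS = [
    "price", "P1", "P5p1", "P6p2", "P11p4", "P14p9", "P15p1", "P15p3",
    "P16p2", "P18p2", "P27p4", "H2p2", "H8p2", "H10p1", "H13p1",
    "H18pA", "H40p4",
]
TARGET = "price"  # marked "(target)" in the table

# Everything except the target is an input attribute.
inputs = [name for name in COLUMNS if name != TARGET]

# The "16" in house_16H refers to the input attributes; with the target
# included, the page reports 17 features in total.
print(f"{len(COLUMNS)} columns = {len(inputs)} inputs + 1 target ({TARGET})")
```

The count of 16 inputs agrees with the "16" in the dataset name, and the total of 17 agrees with the "17 continuous attributes" stated in the original description.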