particulate-matter-ukair-2017

Status: active | Format: ARFF | Licence: Open Government Licence (OGL) | Visibility: public | Uploaded 21-06-2022 by Leo Grin
  • Tags: Computer Systems, Physical Sciences

Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on categorical and numerical features" benchmark.

Original description: Hourly particulate matter air pollution data of Great Britain for the year 2017, provided by Ricardo Energy and Environment on behalf of the UK Department for Environment, Food and Rural Affairs (DEFRA) and the Devolved Administrations on [https://uk-air.defra.gov.uk/]. The data was scraped from the UK AIR homepage via the R package 'rdefra' [Vitolo, C., Russell, A., & Tucker, A. (2016, August). Rdefra: interact with the UK AIR pollution database from DEFRA. The Journal of Open Source Software, 1(4). doi:10.21105/joss.00051] on 09.11.2018. The data was published by DEFRA under the Open Government Licence (OGL) [http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/]. For a description of all variables, see the UK AIR homepage.

The variable 'PM.sub.10..sub..particulate.matter..Hourly.measured.' was chosen as the target. The dataset also contains another measure of particulate matter, 'PM.sub.2.5..sub..particulate.matter..Hourly.measured.' (ignored by default), which could be used as the target instead. The string variable 'datetime' (ignored by default) could be used to construct additional date/time features. In this version of the dataset, the features 'Longitude' and 'Latitude' were removed to increase the importance of the categorical features 'Zone' and 'Site.Name'.
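
The description suggests deriving extra date/time features from the 'datetime' string. A minimal sketch, assuming the strings look like "2017-03-14 08:00:00" (the page does not document the format, so DATETIME_FORMAT is an assumption) and using a hypothetical helper named datetime_features:

```python
from datetime import datetime

# ASSUMPTION: the page does not state the format of the 'datetime' strings;
# adjust this pattern to match the actual values.
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Locale-independent day names, indexed by datetime.weekday() (Monday == 0).
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def datetime_features(value: str) -> dict:
    """Derive simple calendar features from one 'datetime' string (hypothetical helper)."""
    ts = datetime.strptime(value, DATETIME_FORMAT)
    return {
        "hour": ts.hour,                        # 0-23
        "month": ts.month,                      # 1-12
        "day_of_week": DAY_NAMES[ts.weekday()],
        "day_of_year": ts.timetuple().tm_yday,  # 1-365 for 2017
        "is_weekend": ts.weekday() >= 5,        # Saturday or Sunday
    }


if __name__ == "__main__":
    print(datetime_features("2017-03-14 08:00:00"))
```

Hour, Month and DayofWeek already exist as columns, so the new information in this sketch is mainly day_of_year and is_weekend.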

7 features

  • PM.sub.10..sub..particulate.matter..Hourly.measured. (target): numeric, 21599 unique values, 0 missing
  • Hour: numeric, 24 unique values, 0 missing
  • Month: nominal, 12 unique values, 0 missing
  • DayofWeek: nominal, 7 unique values, 0 missing
  • Environment.Type: nominal, 4 unique values, 0 missing
  • Altitude..m.: numeric, 41 unique values, 0 missing
  • PM.sub.2.5..sub..particulate.matter..Hourly.measured.: numeric, 16605 unique values, 0 missing
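
The three nominal features (Month, DayofWeek, Environment.Type) need an encoding before most regression models can use them. One common option is one-hot encoding; the stdlib-only sketch below uses a hypothetical one_hot helper and made-up category values, since the page reports only how many levels each feature has. With all observed levels, this encoding would produce 12 + 7 + 4 = 23 indicator columns.

```python
# Nominal columns as listed on this page.
NOMINAL = ["Month", "DayofWeek", "Environment.Type"]

# HYPOTHETICAL example rows: the category values are placeholders, not taken from the data.
EXAMPLE_ROWS = [
    {"Month": "1", "DayofWeek": "Sunday", "Environment.Type": "Urban Traffic"},
    {"Month": "1", "DayofWeek": "Monday", "Environment.Type": "Rural Background"},
]


def one_hot(rows: list[dict], columns: list[str]) -> tuple[list[str], list[list[int]]]:
    """One-hot encode `columns`, returning (header, 0/1 matrix) with one row per input row."""
    # Sorted levels per column keep the output column order deterministic.
    levels = {c: sorted({str(r[c]) for r in rows}) for c in columns}
    header = [f"{c}={lvl}" for c in columns for lvl in levels[c]]
    matrix = [
        [1 if str(r[c]) == lvl else 0 for c in columns for lvl in levels[c]]
        for r in rows
    ]
    return header, matrix


if __name__ == "__main__":
    header, matrix = one_hot(EXAMPLE_ROWS, NOMINAL)
    print(header)
    print(matrix)
```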

19 properties

  • Number of instances (rows) of the dataset: 394299
  • Number of attributes (columns) of the dataset: 7
  • Number of distinct values of the target attribute (if it is nominal): 0
  • Number of missing values in the dataset: 0
  • Number of instances with at least one value missing: 0
  • Number of numeric attributes: 4
  • Number of nominal attributes: 3
  • Percentage of binary attributes: 0
  • Percentage of instances having missing values: 0
  • Percentage of missing values: 0
  • Average class difference between consecutive instances: 0.77
  • Percentage of numeric attributes: 57.14
  • Number of attributes divided by the number of instances: 0
  • Percentage of nominal attributes: 42.86
  • Percentage of instances belonging to the most frequent class: not reported
  • Number of instances belonging to the most frequent class: not reported
  • Percentage of instances belonging to the least frequent class: not reported
  • Number of instances belonging to the least frequent class: not reported
  • Number of binary attributes: 0
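
Several of the properties above are derived from the raw counts. A short check reproduces them from the numbers on this page (the variable names are illustrative only); it also shows why the attributes-per-instance ratio displays as 0 at two decimals.

```python
# Counts reported on this page.
n_instances = 394299
n_numeric = 4
n_nominal = 3
n_attributes = n_numeric + n_nominal  # 7 columns, including the target

# The page shows percentages rounded to two decimals.
pct_numeric = round(100 * n_numeric / n_attributes, 2)
pct_nominal = round(100 * n_nominal / n_attributes, 2)

# Attributes per instance is tiny, so it rounds to 0 at two decimals.
dimensionality = n_attributes / n_instances

print(pct_numeric, pct_nominal, f"{dimensionality:.2e}")  # prints: 57.14 42.86 1.78e-05
```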

1 task

0 runs
  • estimation_procedure: 10-fold Crossvalidation
  • evaluation_measure: root_mean_squared_error
  • target_feature: PM.sub.10..sub..particulate.matter..Hourly.measured.
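
The task scores models by root mean squared error, RMSE = sqrt(mean((y_true - y_pred)^2)), under 10-fold cross-validation. The stdlib-only sketch below illustrates the mechanics by scoring a predict-the-training-mean baseline. It generates its own shuffled folds, which is an assumption: these are not the fixed splits OpenML assigns to this task, and the helper names are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def k_fold_indices(n, k=10, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k near-equal folds (assumes n >= k)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate_mean_baseline(y, k=10, seed=0):
    """Per-fold RMSE of a baseline that always predicts the mean of the training folds."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        test_set = set(test_idx)
        train = [y[i] for i in range(len(y)) if i not in test_set]
        mean = sum(train) / len(train)
        y_test = [y[i] for i in test_idx]
        scores.append(rmse(y_test, [mean] * len(y_test)))
    return scores


if __name__ == "__main__":
    # HYPOTHETICAL toy target values, not taken from the dataset.
    y = [12.0, 15.5, 9.8, 20.1, 11.3, 14.7, 18.2, 10.4, 13.9, 16.6, 8.7, 19.0]
    scores = cross_validate_mean_baseline(y, k=10)
    print(f"mean RMSE over {len(scores)} folds: {sum(scores) / len(scores):.3f}")
```

A real model would replace the training mean with predictions fitted on the nine training folds; the fold loop and the RMSE scoring stay the same.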