Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark,
transformed in the same way. This dataset belongs to the "regression on categorical and
numerical features" benchmark. Original description:
Hourly particulate matter air pollution data of Great Britain for the year 2017, provided by Ricardo Energy and Environment on behalf of the UK Department for Environment, Food and Rural Affairs (DEFRA) and the Devolved Administrations on [https://uk-air.defra.gov.uk/]. The data was scraped from the UK AIR homepage via the R package 'rdefra' [Vitolo, C., Russell, A., & Tucker, A. (2016, August). Rdefra: interact with the UK AIR pollution database from DEFRA. The Journal of Open Source Software, 1(4). doi:10.21105/joss.00051] on 09.11.2018. The data was published by DEFRA under the Open Government Licence (OGL) [http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/]. For a description of all variables, see the UK AIR homepage. The variable 'PM.sub.10..sub..particulate.matter..Hourly.measured.' was chosen as the target. The dataset also contains a second measure of particulate matter, 'PM.sub.2.5..sub..particulate.matter..Hourly.measured.' (ignored by default), which could be used as the target instead. The string variable 'datetime' (ignored by default) could be used to construct additional date/time features. In this version of the dataset, the features 'Longitude' and 'Latitude' were removed to increase the importance of the categorical features 'Zone' and 'Site.Name'.
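As an illustration of how date/time features might be derived from the string variable 'datetime', here is a minimal Python sketch using only the standard library. The timestamp layout "%Y-%m-%d %H:%M:%S" is an assumption, as the description does not state the actual format; the function name datetime_features and the feature names are hypothetical.

```python
from datetime import datetime

# Assumed timestamp layout; the real format of the 'datetime' column
# is not given in the description and may differ.
ASSUMED_FORMAT = "%Y-%m-%d %H:%M:%S"


def datetime_features(value: str, fmt: str = ASSUMED_FORMAT) -> dict:
    """Derive simple calendar features from one 'datetime' string."""
    ts = datetime.strptime(value, fmt)
    return {
        "hour": ts.hour,                  # 0..23, captures the daily cycle
        "day_of_week": ts.weekday(),      # Monday = 0, Sunday = 6
        "month": ts.month,                # 1..12, captures seasonality
        "is_weekend": ts.weekday() >= 5,  # Saturday or Sunday
    }


# Hypothetical example value, not taken from the dataset.
example = datetime_features("2017-11-09 08:00:00")
print(example)
```

Hour, weekday and month are natural candidates here because hourly particulate matter readings tend to follow daily, weekly and seasonal patterns; whether they help on this benchmark would need to be checked empirically.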