OpenML


Rainfall-Temperature-Aus-hourly

active ARFF Creative Commons Attribution 4.0 International Visibility: public Uploaded 24-06-2024 by Bruno Belucci Teixeira

Hourly temperature and rainfall observations from the Bureau of Meteorology of the Australian Government.

From original source:
-----
Historical rainfall and temperature forecast and observations hourly data (2015-05 to 2016-04), used to compare and verify forecasting. Observations data is from a sample of 518 automatic weather stations (AWS) over land, and is at the surface level. Data has been aggregated from one-minute readings into hourly values, for forecast comparison purposes. This observations data is partly QC'd. Forecasted weather elements include temperature, maximum and minimum temperature, rainfall probabilities and rainfall amounts. Different forecast products have different time resolutions, e.g. temperature forecasts are made for each hour, while maximum and minimum temperature forecasts are made for each day.
-----

We have merged the datasets of historical observation (2015-05 to 2016-04) and verification (2016-05 to 2017-04) and performed some transformations.

There are 8 columns:
id_series: The id of the time series.
covariate_0: The area code of the station (renamed from 'area_code').
date: The date and hour of the observation, in the format "%Y-%m-%d %H:%M:%S".
time_step: The time step on the time series.
value_X (X from 0 to 3): The values of the time series, which will be used for the forecasting task.

Preprocessing:
1 - Used the 'valid_start' column to resample the dataset by hour. For 'AIR_TEMP' we kept the first observation (sorted by 'valid_start' and 'qc_valid_minutes_start'). For 'AIR_TEMP_MAX' we took the maximum value. For 'AIR_TEMP_MIN' we took the minimum value. For 'PRCP' we summed the values.
2 - Dropped all columns except 'station_number', 'area_code', 'valid_start' (transformed into 'year', 'month', 'day', 'hour'), 'parameter' and 'value'.
3 - Merged both datasets and dropped duplicate values. There are some duplicates at the end of the first dataset and the beginning of the second dataset, around the 'valid_start' value '2016-04-30 14:00:00'. For duplicated rows we kept the values from the first dataset.
4 - Created the 'date' column from the 'year', 'month', 'day' and 'hour' of the 'valid_start' column, in the format %Y-%m-%d %H:%M:%S.
5 - Pivoted the table with index 'station_number', 'area_code', 'date', creating the columns from 'parameter' with 'value'.
6 - Created the 'id_series' column from 'station_number' and 'area_code', with index from 0 to 481.
7 - Ensured that the frequency of the date is hourly, adding missing rows where needed with NaN values while keeping 'station_number' and 'area_code'.
8 - Dropped the column 'station_number'. Renamed the columns 'AIR_TEMP', 'AIR_TEMP_MAX', 'AIR_TEMP_MIN', 'PRCP' to 'value_X' with X from 0 to 3. Renamed the column 'area_code' to 'covariate_0'.
9 - Created the column 'time_step' with increasing values of the time step for each time series.
10 - Cast the 'value_X' columns to float. Defined 'id_series' and 'covariate_0' as 'category'.

Note that there are still missing values.
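
Preprocessing steps 1, 5, 7 and 9 can be illustrated with a small dependency-free Python sketch. It is not the code used to build this dataset: the record layout, the sample values and the function name `to_hourly_table` are hypothetical, records are ordered by 'valid_start' only (the description also sorts by 'qc_valid_minutes_start'), and starting 'time_step' at 0 is an assumption.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical raw records: (station_number, valid_start, parameter, value).
# These rows are made up for illustration; they are not taken from the dataset.
RAW = [
    (1001, "2015-05-01 00:10:00", "AIR_TEMP", 12.0),
    (1001, "2015-05-01 00:40:00", "AIR_TEMP", 13.0),
    (1001, "2015-05-01 00:10:00", "PRCP", 0.5),
    (1001, "2015-05-01 00:40:00", "PRCP", 0.25),
    (1001, "2015-05-01 02:05:00", "AIR_TEMP_MAX", 15.5),
    (1001, "2015-05-01 02:35:00", "AIR_TEMP_MAX", 16.0),
]

# Aggregation rule per parameter, as listed in preprocessing step 1.
AGGREGATORS = {
    "AIR_TEMP": lambda values: values[0],  # first observation in the hour
    "AIR_TEMP_MAX": max,
    "AIR_TEMP_MIN": min,
    "PRCP": sum,
}
PARAMETERS = list(AGGREGATORS)  # column order of the four value columns


def to_hourly_table(records):
    """Aggregate by hour, pivot parameters into columns, fill hourly gaps with NaN."""
    # Step 1: bucket values by (station, hour, parameter) in chronological order.
    buckets = defaultdict(list)
    for station, stamp, parameter, value in sorted(records, key=lambda r: r[1]):
        hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(minute=0, second=0)
        buckets[(station, hour, parameter)].append(value)
    hourly = {key: AGGREGATORS[key[2]](values) for key, values in buckets.items()}

    rows = []
    for station in sorted({key[0] for key in hourly}):
        hours = [key[1] for key in hourly if key[0] == station]
        current, last, step = min(hours), max(hours), 0
        # Steps 5, 7 and 9: one row per hour, NaN where nothing was observed,
        # plus an increasing time_step (assumed here to start at 0).
        while current <= last:
            values = [hourly.get((station, current, p), math.nan) for p in PARAMETERS]
            rows.append((station, current.strftime("%Y-%m-%d %H:%M:%S"), step, *values))
            current += timedelta(hours=1)
            step += 1
    return rows
```

Calling `to_hourly_table(RAW)` yields three rows for the hypothetical station: one for 00:00 with the first temperature reading and the summed rainfall, one all-NaN row for 01:00 that fills the gap, and one for 02:00 holding only the maximum temperature.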

8 features

Name	Type	Unique values	Missing values
id_series	nominal	482	0
covariate_0	nominal	482	0
date	string	17546	0
value_0	numeric	616	97441
value_1	numeric	615	97441
value_2	numeric	636	97441
value_3	numeric	1246	199995
time_step	numeric	17545	0
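
Because the table is in long format (one row per series and time step) and the value columns contain missing entries, a typical first step is to split the rows by 'id_series', order them by 'time_step' and count the gaps. The sketch below shows one way to do this in plain Python. The sample rows, the area-code strings and the helper names `series_by_id` and `missing_counts` are hypothetical, and the tuple layout simply follows the column order of the feature list above.

```python
import math
from collections import defaultdict

# Made-up rows following the feature list:
# (id_series, covariate_0, date, value_0, value_1, value_2, value_3, time_step)
ROWS = [
    ("0", "AREA_A", "2015-05-01 01:00:00", 11.5, 12.0, 11.0, math.nan, 1),
    ("0", "AREA_A", "2015-05-01 00:00:00", 12.0, 12.5, 11.8, 0.0, 0),
    ("1", "AREA_B", "2015-05-01 00:00:00", math.nan, math.nan, math.nan, math.nan, 0),
]


def series_by_id(rows):
    """Return {id_series: [(value_0, ..., value_3), ...]} ordered by time_step."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[0]].append(row)
    return {
        series_id: [r[3:7] for r in sorted(items, key=lambda r: r[7])]
        for series_id, items in grouped.items()
    }


def missing_counts(rows):
    """Count NaN entries in each of the four value columns."""
    return {
        f"value_{i}": sum(math.isnan(r[3 + i]) for r in rows)
        for i in range(4)
    }
```

On the real data, `missing_counts` should reproduce the "missing" figures of the feature list, provided missing entries are loaded as NaN.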