Rainfall-Temperature-Aus-hourly

Status: active. Format: ARFF. License: Creative Commons Attribution 4.0 International. Visibility: public. Uploaded 24-06-2024 by Bruno Belucci Teixeira.


Hourly temperature and rainfall observations from the Bureau of Meteorology of the Australian Government.

From the original source:
-----
Historical rainfall and temperature forecast and observations hourly data (2015-05 to 2016-04), used to compare and verify forecasting. Observations data is from a sample of 518 automatic weather stations (AWS) over land, and is at the surface level. Data has been aggregated from one-minute readings into hourly values, for forecast comparison purposes. This observations data is partly QC'd. Forecasted weather elements include temperature, maximum and minimum temperature, rainfall probabilities and rainfall amounts. Different forecast products have different time resolutions, e.g. temperature forecasts are made for each hour, while maximum and minimum temperature forecasts are made for each day.
-----

We merged the historical observation dataset (2015-05 to 2016-04) with the verification dataset (2016-05 to 2017-04) and applied some transformations.

There are 8 columns:
id_series: The id of the time series.
covariate_0: The area code of the station (renamed from 'area_code' in step 8).
date: The date and hour of the observation in the format "%Y-%m-%d %H:%M:%S" (see step 4).
time_step: The time step within the time series.
value_X (X from 0 to 3): The values of the time series, which will be used for the forecasting task.

Preprocessing:
1 - Used the 'valid_start' column to resample the dataset by hour. For 'AIR_TEMP' we kept the first observation (sorted by 'valid_start' and 'qc_valid_minutes_start'). For 'AIR_TEMP_MAX' we took the maximum value. For 'AIR_TEMP_MIN' we took the minimum value. For 'PRCP' we summed the values.
2 - Dropped all columns except 'station_number', 'area_code', 'valid_start' (transformed into 'year', 'month', 'day', 'hour'), 'parameter' and 'value'.
3 - Merged both datasets and dropped duplicate values. There are some duplicates at the end of the first dataset and the beginning of the second dataset, around the 'valid_start' value '2016-04-30 14:00:00'. Where rows were duplicated, we kept the values from the first dataset.
4 - Created the 'date' column from the 'year', 'month', 'day' and 'hour' of the 'valid_start' column, in the format %Y-%m-%d %H:%M:%S.
5 - Pivoted the table with index 'station_number', 'area_code', 'date', creating the columns from 'parameter' with 'value'.
6 - Created the 'id_series' column from 'station_number' and 'area_code', with index from 0 to 481.
7 - Ensured that the frequency of the date is hourly, adding missing rows with NaN values where needed and keeping 'station_number' and 'area_code'.
8 - Dropped column 'station_number'. Renamed columns 'AIR_TEMP', 'AIR_TEMP_MAX', 'AIR_TEMP_MIN', 'PRCP' to 'value_X' with X from 0 to 3. Renamed column 'area_code' to 'covariate_0'.
9 - Created column 'time_step' with increasing values of the time step for each time series.
10 - Cast columns 'value_X' to float. Defined 'id_series' and 'covariate_0' as 'category'.

Note that there are still missing values.
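The aggregation rules in step 1 and the gap filling in step 7 can be illustrated with a minimal standard-library sketch. This is not the uploader's code: the record layout (a tuple of timestamp, parameter and value), the function names `resample_hourly` and `fill_missing_hours`, and the sample readings are all assumptions made for illustration. It also sorts by timestamp only, whereas the description sorts by 'valid_start' and 'qc_valid_minutes_start'.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta

# Aggregation rule per parameter, following step 1 of the description.
AGGREGATORS = {
    "AIR_TEMP": lambda values: values[0],  # first observation of the hour
    "AIR_TEMP_MAX": max,                   # maximum within the hour
    "AIR_TEMP_MIN": min,                   # minimum within the hour
    "PRCP": sum,                           # rainfall is summed
}


def resample_hourly(records):
    """Group (timestamp, parameter, value) records by hour and aggregate.

    Hypothetical helper: returns {hour: {parameter: aggregated value}}.
    """
    buckets = defaultdict(list)
    # Sort by timestamp so that "first observation" is well defined.
    for ts, parameter, value in sorted(records, key=lambda r: r[0]):
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(hour, parameter)].append(value)
    hourly = defaultdict(dict)
    for (hour, parameter), values in buckets.items():
        hourly[hour][parameter] = AGGREGATORS[parameter](values)
    return dict(hourly)


def fill_missing_hours(hourly):
    """Insert NaN rows so that every hour between first and last is present."""
    if not hourly:
        return {}
    hour, end = min(hourly), max(hourly)
    filled = {}
    while hour <= end:
        row = hourly.get(hour, {})
        filled[hour] = {p: row.get(p, math.nan) for p in AGGREGATORS}
        hour += timedelta(hours=1)
    return filled


# Made-up readings for a single station (illustrative values only).
records = [
    (datetime(2016, 1, 1, 0, 0), "AIR_TEMP", 20.0),
    (datetime(2016, 1, 1, 0, 30), "AIR_TEMP", 21.0),
    (datetime(2016, 1, 1, 0, 10), "PRCP", 0.2),
    (datetime(2016, 1, 1, 0, 40), "PRCP", 0.4),
    (datetime(2016, 1, 1, 2, 0), "AIR_TEMP", 19.0),
]
hourly = fill_missing_hours(resample_hourly(records))
```

In this sketch the hour starting at 01:00 has no readings, so it appears in `hourly` with NaN for every parameter, which mirrors why the published dataset still contains missing values.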

8 features

id_series (nominal): 482 unique values, 0 missing
covariate_0 (nominal): 482 unique values, 0 missing
date (string): 17546 unique values, 0 missing
value_0 (numeric): 616 unique values, 97441 missing
value_1 (numeric): 615 unique values, 97441 missing
value_2 (numeric): 636 unique values, 97441 missing
value_3 (numeric): 1246 unique values, 199995 missing
time_step (numeric): 17545 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 8058447
Number of attributes (columns) of the dataset: 8
Number of distinct values of the target attribute (if it is nominal): not reported
Number of missing values in the dataset: 492318
Number of instances with at least one value missing: 258142
Number of numeric attributes: 5
Number of nominal attributes: 2
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 3.2
Average class difference between consecutive instances: not reported
Percentage of missing values: 0.76
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 62.5
Percentage of instances belonging to the most frequent class: not reported
Percentage of nominal attributes: 25
Number of instances belonging to the most frequent class: not reported
Percentage of instances belonging to the least frequent class: not reported
Number of instances belonging to the least frequent class: not reported
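
The missing-value and percentage properties can be cross-checked against the per-feature counts with simple arithmetic. The sketch below uses only numbers listed on this page. The variable names are illustrative, and the choice of denominator for the percentage of missing values (rows times columns) is an assumption that happens to reproduce the reported figure.

```python
# Counts copied from the feature list and properties on this page.
n_rows = 8058447
n_columns = 8
rows_with_missing = 258142
missing_per_column = {
    "value_0": 97441,
    "value_1": 97441,
    "value_2": 97441,
    "value_3": 199995,
}

# Total missing cells is the sum of the per-column counts.
total_missing = sum(missing_per_column.values())

# Assumption: the percentage of missing values is taken over all cells.
pct_missing_values = 100 * total_missing / (n_rows * n_columns)
pct_rows_with_missing = 100 * rows_with_missing / n_rows

# Attribute-type shares: 5 numeric and 2 nominal out of 8 columns.
pct_numeric = 100 * 5 / n_columns
pct_nominal = 100 * 2 / n_columns

print(total_missing)
print(round(pct_missing_values, 2), round(pct_rows_with_missing, 1))
print(pct_numeric, pct_nominal)
```

The two type shares add up to 87.5 rather than 100 because the 'date' column is typed as string and is counted as neither numeric nor nominal.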

0 tasks
