{ "data_id": "46216", "name": "Rainfall-Temperature-Aus-hourly", "exact_name": "Rainfall-Temperature-Aus-hourly", "version": 1, "version_label": null, "description": "Hourly temperature and rainfall observations from the Bureau of Meteorology of the Australian Government.\n\nFrom original source:\n-----\nHistorical rainfall and temperature forecast and observations hourly data (2015-05 to 2016-04), used to compare and verify forecasting. Observations data is from a sample of 518 automatic weather stations (AWS) over land, and is at the surface level. Data has been aggregated from one-minute readings into hourly values, for forecast comparison purposes. This observations data is partly QC'd.\n\nForecasted weather elements include temperature, maximum and minimum temperature, rainfall probabilities and rainfall amounts. Different forecast products have different time resolutions, e.g. temperature forecasts are made for each hour, while maximum and minimum temperature forecasts are made for each day.\n-----\n\nWe have merged the historical observation dataset (2015-05 to 2016-04) with the verification dataset (2016-05 to 2017-04) and applied the transformations listed under Preprocessing.\n\nThere are 8 columns:\n\nid_series: The id of the time series.\n\ncovariate_0: The area code of the station (the original 'area_code' column).\n\ndate: The date of the time series in the format \"%Y-%m-%d %H:%M:%S\".\n\ntime_step: The time step index within the time series.\n\nvalue_X (X from 0 to 3): The values of the time series, which will be used for the forecasting task.\n\nPreprocessing:\n\n1 - We have used the 'valid_start' column to resample the dataset by hour.\n\nFor the 'AIR_TEMP' we have kept the first observation (sorted by 'valid_start' and 'qc_valid_minutes_start').\n\nFor the 'AIR_TEMP_MAX' we have taken the maximum value.\n\nFor the 'AIR_TEMP_MIN' we have taken the minimum value.\n\nFor the 'PRCP' we have summed the values.\n\n2 - Dropped all columns except 'station_number', 'area_code', 'valid_start' (transformed into 'year', 'month', 'day', 'hour'), 'parameter' and 'value'.\n\n3 - Merged both datasets 
and dropped duplicate values.\n\nThere are some duplicates at the end of the first dataset and the beginning of the second dataset, around 'valid_start' 2016-04-30 14:00:00. When duplicated, we have kept the values from the first dataset.\n\n4 - Created 'date' column from 'year', 'month', 'day', 'hour' of the 'valid_start' column in the format %Y-%m-%d %H:%M:%S.\n\n5 - Pivoted the table with index 'station_number', 'area_code', 'date', creating the columns from 'parameter' with 'value'.\n\n6 - Created 'id_series' column from 'station_number' and 'area_code', with index from 0 to 481.\n\n7 - Ensured that the frequency of the date is hourly, adding missing rows with NaN values where needed while keeping 'station_number' and 'area_code'.\n\n8 - Dropped column 'station_number'. Renamed columns from 'AIR_TEMP', 'AIR_TEMP_MAX', 'AIR_TEMP_MIN', 'PRCP' to 'value_X' with X from 0 to 3.\nRenamed column 'area_code' to 'covariate_0'.\n\n9 - Created column 'time_step' with increasing values of the time_step for the time series.\n\n10 - Cast columns 'value_X' to float. Defined 'id_series' and 'covariate_0' as 'category'.\n\nNote that there are still missing values.", "format": "arff", "uploader": "Bruno Belucci Teixeira", "uploader_id": 30703, "visibility": "public", "creator": "\"Johns Hopkins University\"", "contributor": "\"Bruno Belucci\"", "date": "2024-06-24 23:31:19", "update_comment": null, "last_update": "2024-06-24 23:31:19", "licence": "Creative Commons Attribution 4.0 International", "status": "active", "error_message": null, "url": "https:\/\/api.openml.org\/data\/download\/22120680\/dataset", "default_target_attribute": null, "row_id_attribute": null, "ignore_attribute": null, "runs": 0, "suggest": { "input": [ "Rainfall-Temperature-Aus-hourly", "Hourly temperature and rainfall observation from the Bureau of Meteorology of the Australian Government. 
From original source: ----- Historical rainfall and temperature forecast and observations hourly data (2015-05 to 2016-04), used to compare and verify forecasting. Observations data is from a sample of 518 automatic weather stations (AWS) over land, and is at the surface level. Data has been aggregated from one-minute readings into hourly values, for forecast comparison purposes. This observa " ], "weight": 5 }, "qualities": { "NumberOfInstances": 8058447, "NumberOfFeatures": 8, "NumberOfClasses": null, "NumberOfMissingValues": 492318, "NumberOfInstancesWithMissingValues": 258142, "NumberOfNumericFeatures": 5, "NumberOfSymbolicFeatures": 2, "NumberOfBinaryFeatures": 0, "PercentageOfBinaryFeatures": 0, "PercentageOfInstancesWithMissingValues": 3.203371567747483, "AutoCorrelation": null, "PercentageOfMissingValues": 0.7636676148642536, "Dimensionality": 9.92747113680837e-7, "PercentageOfNumericFeatures": 62.5, "MajorityClassPercentage": null, "PercentageOfSymbolicFeatures": 25, "MajorityClassSize": null, "MinorityClassPercentage": null, "MinorityClassSize": null }, "tags": [], "features": [ { "name": "id_series", "index": "0", "type": "nominal", "distinct": "482", "missing": "0", "distr": [] }, { "name": "covariate_0", "index": "1", "type": "nominal", "distinct": "482", "missing": "0", "distr": [] }, { "name": "date", "index": "2", "type": "string", "distinct": "17546", "missing": "0" }, { "name": "value_0", "index": "3", "type": "numeric", "distinct": "616", "missing": "97441", "min": "-30", "max": "48", "mean": "19", "stdev": "8" }, { "name": "value_1", "index": "4", "type": "numeric", "distinct": "615", "missing": "97441", "min": "-12", "max": "55", "mean": "19", "stdev": "8" }, { "name": "value_2", "index": "5", "type": "numeric", "distinct": "636", "missing": "97441", "min": "-69", "max": "46", "mean": "18", "stdev": "8" }, { "name": "value_3", "index": "6", "type": "numeric", "distinct": "1246", "missing": "199995", "min": "0", "max": "880", "mean": "0", 
"stdev": "2" }, { "name": "time_step", "index": "7", "type": "numeric", "distinct": "17545", "missing": "0", "min": "0", "max": "17544", "mean": "8672", "stdev": "5073" } ], "nr_of_issues": 0, "nr_of_downvotes": 0, "nr_of_likes": 0, "nr_of_downloads": 0, "total_downloads": 0, "reach": 0, "reuse": 0, "impact_of_reuse": 0, "reach_of_reuse": 0, "impact": 0 }