{ "data_id": "46227", "name": "M4-competition-yearly", "exact_name": "M4-competition-yearly", "version": 1, "version_label": null, "description": "M4-Competition for time series forecasting, yearly data\n\nFrom original source:\n-----\nThe fourth competition, M4, started on 1 January 2018 and ended in 31 May 2018.\n\nThe M4 extended and replicated the results of the previous three competitions, using an extended and diverse set of time series to identify the most accurate forecasting method(s) for different types of predictions. It aimed to provide answers on how to improve forecasting accuracy and identify the most appropriate methods for each case. To get precise and compelling answers, the M4 Competition utilized 100,000 real-life series, and incorporated all major forecasting methods, including those based on Artificial Intelligence (Machine Learning, ML), as well as traditional statistical ones.\n-----\n\nThe time series were downloaded via the R package M4comp2018 and then loaded with a Python script to obtain the different datasets:\n'Yearly', 'Quarterly', 'Monthly', 'Weekly', 'Daily', 'Hourly'. The data in R already include some kind of date as the index of each time series.\n\nThe required number of forecast values for each time series in the 'Yearly' dataset was always 6. Therefore, to compare a model \nwith other models from the original competition, the last 6 values of each time series should be treated as the test dataset.\n\nNote that the participants did not have access to the dates of the time series during the competition. 
Besides, some dates are ambiguous due to \nthe representation of only 2 digits for the year (XX-XX-17 could represent 1817, 1917, 2017, etc.).\n\nThere are 5 columns:\n\nid_series: The id of the time series.\n\ndate: The date of the time series in the format \"%Y-%m-%d\".\n\ntime_step: The time step within the time series.\n\ncovariate_0: Covariate values of the time series, tied to the 'id_series'. They are not a forecasting target, but they can help with the forecasting task.\n\nvalue_0: The values of the time series, which will be used for the forecasting task.\n\nPreprocessing:\n\n1 - We tried to fix the series that contain a year > 2018 at some point by offsetting the whole series by 100 years until we have \nmax(year) <= 2018.\n\n2 - Renamed 'Category' to 'covariate_0'.\n\n3 - Created column 'time_step' with increasing values of time step for the time series.\n\n4 - Cast 'date' to str, 'time_step' to int, 'value_0' to float, and defined 'id_series' and 'covariate_0' as 'category'.", "format": "arff", "uploader": "Bruno Belucci Teixeira", "uploader_id": 30703, "visibility": "public", "creator": "\"Spyros Makridakis\"", "contributor": "\"Bruno Belucci\"", "date": "2024-06-25 00:16:58", "update_comment": null, "last_update": "2024-06-25 00:16:58", "licence": "Creative Commons Attribution 4.0 International", "status": "active", "error_message": null, "url": "https:\/\/api.openml.org\/data\/download\/22120691\/dataset", "default_target_attribute": null, "row_id_attribute": null, "ignore_attribute": null, "runs": 0, "suggest": { "input": [ "M4-competition-yearly", "M4-Competition for time series forecasting, yearly data From original source: ----- The fourth competition, M4, started on 1 January 2018 and ended in 31 May 2018. The M4 extended and replicated the results of the previous three competitions, using an extended and diverse set of time series to identify the most accurate forecasting method(s) for different types of predictions. 
It aimed to provide answers on how to improve forecasting accuracy and identify the most appropriate methods for each ca " ], "weight": 5 }, "qualities": { "NumberOfInstances": 858458, "NumberOfFeatures": 5, "NumberOfClasses": null, "NumberOfMissingValues": 0, "NumberOfInstancesWithMissingValues": 0, "NumberOfNumericFeatures": 2, "NumberOfSymbolicFeatures": 2, "PercentageOfBinaryFeatures": 0, "PercentageOfInstancesWithMissingValues": 0, "PercentageOfMissingValues": 0, "AutoCorrelation": null, "PercentageOfNumericFeatures": 40, "Dimensionality": 5.824396767226818e-6, "PercentageOfSymbolicFeatures": 40, "MajorityClassPercentage": null, "MajorityClassSize": null, "MinorityClassPercentage": null, "MinorityClassSize": null, "NumberOfBinaryFeatures": 0 }, "tags": [], "features": [ { "name": "value_0", "index": "0", "type": "numeric", "distinct": "391919", "missing": "0", "min": "22", "max": "158430", "mean": "4099", "stdev": "3618" }, { "name": "id_series", "index": "1", "type": "nominal", "distinct": "23000", "missing": "0", "distr": [] }, { "name": "covariate_0", "index": "2", "type": "nominal", "distinct": "6", "missing": "0", "distr": [] }, { "name": "date", "index": "3", "type": "string", "distinct": "872", "missing": "0" }, { "name": "time_step", "index": "4", "type": "numeric", "distinct": "841", "missing": "0", "min": "0", "max": "840", "mean": "26", "stdev": "47" } ], "nr_of_issues": 0, "nr_of_downvotes": 0, "nr_of_likes": 0, "nr_of_downloads": 0, "total_downloads": 0, "reach": 0, "reuse": 0, "impact_of_reuse": 0, "reach_of_reuse": 0, "impact": 0 }