{ "data_id": "46225", "name": "M3-competition-month", "exact_name": "M3-competition-month", "version": 1, "version_label": null, "description": "M3-Competition for time series forecasting, monthly data.\n\nFrom original source:\n-----\nThe 3003 series of the M3-Competition were selected on a quota basis to include various types of time series data (micro, industry, macro, etc.) \nand different time intervals between successive observations (yearly, quarterly, etc.). In order to ensure that enough data were available to \ndevelop an adequate forecasting model it was decided to have a minimum number of observations for each type of data.\nThis minimum was set as 14 observations for yearly series (the median length for the 645 yearly series is 19 observations), \n16 for quarterly (the median length for the 756 quarterly series is 44 observations), 48 for monthly (the median length for the 1428 monthly\nseries is 115 observations) and 60 for 'other' series (the median length for the 174 'other' series is 63 observations). Table 1 shows the\nclassification of the 3003 series according to the two major groupings described above. All the time series data are strictly positive; a test\nhas been done on all the forecasted values: in the case of a negative value, it was substituted by zero. This avoids any problem in the various \nMAPE measures.\n\nAs in the M-Competition, the participating experts were asked to make the following numbers of forecasts beyond the available data they had been \ngiven: six for yearly, eight for quarterly, 18 for monthly and eight for the category 'other'. Their forecasts were, subsequently, compared by \nthe authors (the actual values referred to such forecasts were not available to the participating experts when they were making their forecasts \nand were not, therefore, used in developing their forecasting model). 
A presentation of the accuracy of such forecasts together with a discussion\nof the major findings is provided in the next section.\n-----\n\nThere are 5 columns:\n\nid_series: The id of the time series.\n\ndate: The date of the time series in the format \"%Y-%m-%d\".\n\ntime_step: The time step of the time series.\n\ncovariate_0: Covariate values of the time series, tied to the 'id_series'. Not a forecasting target, but can help with the forecasting task.\n\nvalue_0: The values of the time series, which will be used for the forecasting task.\n\nPreprocessing:\n\n1 - Melted the data, obtaining columns 'time_step' and 'value_0'.\n\n2 - Dropped nan values.\n\nThe nan values correspond to time series that are shorter than the time series with maximum length; there are no nans in the middle of a time series.\n\n3 - Created a 'date' column using the 'Time Step', 'Starting Year', and 'Starting Month'.\n\nWe offset the starting date (created from 'Starting Year' and 'Starting Month') by ('Time Step' - 1) months.\n\n4 - Dropped columns 'N', 'NF', 'Starting Year' and renamed column 'Series' to 'id_series' and 'Category' to 'covariate_0'.\n\nThese values can be recreated in preprocessing steps if needed. N was the total number of observations. NF was the number of values\nto be forecast for each time series; for the monthly dataset it was always 18. 
Therefore, to compare a model with\nother models from the original competition, treat the last 18 values of each time series as the test dataset.\n\n5 - Cast 'date' to str, 'time_step' to int, 'value_0' to float, and defined 'id_series' and 'covariate_0' as 'category'.", "format": "arff", "uploader": "Bruno Belucci Teixeira", "uploader_id": 30703, "visibility": "public", "creator": "\"Spyros Makridakis, Michele Hibon\"", "contributor": "\"Bruno Belucci\"", "date": "2024-06-25 00:12:21", "update_comment": null, "last_update": "2024-06-25 00:12:21", "licence": "Creative Commons Attribution 4.0 International", "status": "active", "error_message": null, "url": "https:\/\/api.openml.org\/data\/download\/22120689\/dataset", "default_target_attribute": null, "row_id_attribute": null, "ignore_attribute": null, "runs": 0, "suggest": { "input": [ "M3-competition-month", "M3-Competition for time series forecasting, monthly data. From original source: ----- The 3003 series of the M3-Competition were selected on a quota basis to include various types of time series data (micro, industry, macro, etc.) and different time intervals between successive observations (yearly, quarterly, etc.). In order to ensure that enough data were available to develop an adequate forecasting model it was decided to have a minimum number of observations for each type of data. 
This minim " ], "weight": 5 }, "qualities": { "NumberOfInstances": 167562, "NumberOfFeatures": 5, "NumberOfClasses": null, "NumberOfMissingValues": 0, "NumberOfInstancesWithMissingValues": 0, "NumberOfNumericFeatures": 2, "NumberOfSymbolicFeatures": 2, "PercentageOfBinaryFeatures": 0, "PercentageOfInstancesWithMissingValues": 0, "AutoCorrelation": null, "PercentageOfMissingValues": 0, "Dimensionality": 2.9839701125553526e-5, "PercentageOfNumericFeatures": 40, "MajorityClassPercentage": null, "PercentageOfSymbolicFeatures": 40, "MajorityClassSize": null, "MinorityClassPercentage": null, "MinorityClassSize": null, "NumberOfBinaryFeatures": 0 }, "tags": [], "features": [ { "name": "id_series", "index": "0", "type": "nominal", "distinct": "1428", "missing": "0", "distr": [] }, { "name": "covariate_0", "index": "1", "type": "nominal", "distinct": "6", "missing": "0", "distr": [] }, { "name": "time_step", "index": "2", "type": "numeric", "distinct": "144", "missing": "0", "min": "1", "max": "144", "mean": "63", "stdev": "38" }, { "name": "value_0", "index": "3", "type": "numeric", "distinct": "39161", "missing": "0", "min": "-1200", "max": "86730", "mean": "5000", "stdev": "2296" }, { "name": "date", "index": "4", "type": "string", "distinct": "1778", "missing": "0" } ], "nr_of_issues": 0, "nr_of_downvotes": 0, "nr_of_likes": 0, "nr_of_downloads": 0, "total_downloads": 0, "reach": 0, "reuse": 0, "impact_of_reuse": 0, "reach_of_reuse": 0, "impact": 0 }