CIF-2016-competition

Status: active. Format: ARFF. License: Creative Commons Attribution 4.0 International. Visibility: public. Uploaded 25-06-2024 by Bruno Belucci Teixeira.
CIF 2016 time series forecasting competition, monthly data.

From the original source:

-----
Competition Data Format

Data file containing time series to be predicted is a text file having the following format:

Each row contains a single time series data record;
items in the row are delimited with semicolon (";");
the first item is an ID of the time series;
the second item determines the forecasting horizon, i.e., the number of values to be forecasted;
the third item determines the frequency of the time series (this year "monthly" only);
the rest of the row contains numeric data of the time series;
the number of values in each row may differ because each time series is of different length.

Example of the competition data format:

ts1;4;yearly;26.5;38.2;5.3
ts2;12;monthly;1;2;4;5;5;6;8;9;10
...
ts72;12;daily;1;2;4;5;5;6;8;9;10
-----

There are 3 columns:

id_series: The id of the time series.
time_step: The time step on the time series.
value_0: The values of the time series, which will be used for the forecasting task.

Preprocessing:

Training set:

1 - Renamed the first three columns to 'id_series', 'horizon' and 'period', and renamed the other columns to reflect the actual time_step of the time series.
2 - Melted the data, obtaining columns 'time_step' and 'value_0'.
3 - Dropped nan values. The nan values correspond to time series that are shorter than the longest time series; there are no nans in the middle of a time series.
4 - Defined column 'id_series' as 'category' and cast 'time_step' to int.

Test set: Same as for the training set.

Finally, we concatenated the training and test sets. To use the same train and test split as the competition, take the forecasting horizon from the original data on the provided website.
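The format and preprocessing steps described above can be illustrated with a small standard-library sketch. It is not the uploader's code: the helper names `Series`, `parse_line` and `to_long_rows` are hypothetical, and numbering `time_step` from 1 is an assumption, since the description does not say where the index starts. Reading each row separately and emitting one record per value gives roughly the same long-format result as melting a padded table and dropping the nan values.

```python
from typing import NamedTuple


class Series(NamedTuple):
    id_series: str
    horizon: int
    period: str
    values: list[float]


def parse_line(line: str) -> Series:
    """Parse one semicolon-delimited record: id;horizon;frequency;v1;v2;..."""
    items = [item.strip() for item in line.strip().split(";")]
    # Skip empty trailing items, which can appear if shorter rows are padded.
    values = [float(item) for item in items[3:] if item]
    return Series(items[0], int(items[1]), items[2], values)


def to_long_rows(series: Series, first_step: int = 1) -> list[tuple[str, int, float]]:
    """Emit (id_series, time_step, value_0) rows, one per observation.

    Starting time_step at 1 is an assumption, not something the source states.
    """
    return [
        (series.id_series, step, value)
        for step, value in enumerate(series.values, start=first_step)
    ]


# The two complete example rows from the format description.
sample = "ts1;4;yearly;26.5;38.2;5.3\nts2;12;monthly;1;2;4;5;5;6;8;9;10"
long_rows = [
    row
    for line in sample.splitlines()
    for row in to_long_rows(parse_line(line))
]
print(long_rows[:3])  # [('ts1', 1, 26.5), ('ts1', 2, 38.2), ('ts1', 3, 5.3)]
```

Because each row is read on its own, series of different lengths need no padding, so there is no nan-dropping step in this variant.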

3 features

id_series: nominal, 72 unique values, 0 missing
time_step: numeric, 120 unique values, 0 missing
value_0: numeric, 6994 unique values, 0 missing
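Given these three columns, one way to recover a per-series train and test split is to regroup the rows by id_series and hold out the last values of each series. The sketch below assumes that the test values occupy the final `horizon` time steps of each series, which the description does not confirm. The function names, the toy rows and the `horizons` mapping are all hypothetical; the real horizons have to be taken from the original competition file.

```python
from collections import defaultdict


def group_series(rows):
    """Rebuild each series as a list of values ordered by time_step."""
    grouped = defaultdict(list)
    for id_series, time_step, value in rows:
        grouped[id_series].append((time_step, value))
    return {
        key: [value for _, value in sorted(pairs)]
        for key, pairs in grouped.items()
    }


def split_by_horizon(values, horizon):
    """Hold out the last `horizon` values as the test segment."""
    if not 0 < horizon < len(values):
        raise ValueError("horizon must be positive and shorter than the series")
    return values[:-horizon], values[-horizon:]


# Toy rows in the (id_series, time_step, value_0) layout; values are made up.
rows = [
    ("ts1", 2, 38.2),
    ("ts1", 1, 26.5),
    ("ts1", 3, 5.3),
    ("ts2", 1, 1.0),
    ("ts2", 2, 2.0),
]
# Hypothetical horizons; the real ones come from the original competition data.
horizons = {"ts1": 1, "ts2": 1}

series = group_series(rows)
splits = {key: split_by_horizon(vals, horizons[key]) for key, vals in series.items()}
print(splits["ts1"])  # ([26.5, 38.2], [5.3])
```

Sorting by time_step before splitting keeps the result independent of the row order in the file.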

19 properties

Number of instances (rows) of the dataset: 7108
Number of attributes (columns) of the dataset: 3
Number of distinct values of the target attribute (if it is nominal): (no value listed)
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 2
Number of nominal attributes: 1
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: (no value listed)
Percentage of numeric attributes: 66.67
Number of attributes divided by the number of instances: 0
Percentage of nominal attributes: 33.33
Percentage of instances belonging to the most frequent class: (no value listed)
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0

0 tasks
