ETTh2


Status: active. Format: ARFF. License: Creative Commons Attribution-NoDerivatives 4.0 International. Visibility: public. Uploaded 24-06-2024 by Bruno Belucci Teixeira.


Electric power distribution, hourly data. From the original source:

-----
The electric power distribution problem is the distribution of electricity to different areas depending on their sequential usage. But predicting the subsequent demand of a specific area is difficult, as it varies with weekdays, holidays, seasons, weather, temperatures, etc. However, no existing method can perform a long-term prediction based on super long-term real-world data with high precision. Any false prediction may damage the electrical transformer. So currently, without an efficient method to predict future electric usage, managers have to make decisions based on an empirical number, which is much higher than the real-world demand. This causes unnecessary waste of electricity and equipment depreciation. On the other hand, the oil temperature can reflect the condition of the electrical transformer. One of the most efficient strategies is to predict whether the electrical transformer's oil temperature is safe, and so avoid unnecessary waste.

As a result, to address this problem, our team and Beijing Guowang Fuda Science & Technology Development Company built a real-world platform and collected 2 years of data. We use it to predict the electrical transformers' oil temperature and investigate the extreme load capacity.

We donated two years of data, in which each data point is recorded every 15 minutes (marked by m), and they were from two regions of a province of China, named ETT-small-m1 and ETT-small-m2, respectively. Each dataset contains 2 years * 365 days * 24 hours * 4 times = 70,080 data points. Besides, we also provide the hourly-level variants for fast development (marked by h), i.e. ETT-small-h1 and ETT-small-h2. Each data point consists of 8 features, including the date of the point, the predictive value "oil temperature", and 6 different types of external power load features.
-----

This data corresponds to the ETTh2 variant.

There are 10 columns:
id_series: The id of the time series.
date: The date of the time series in the format "%Y-%m-%d %H:%M:%S".
time_step: The time step on the time series.
value_X (X from 0 to 6): The values of the time series, which will be used for the forecasting task.

Preprocessing:
1 - Standardized the 'date' column to the format "%Y-%m-%d %H:%M:%S".
2 - Renamed columns [1:] to 'value_X' with X from 0 to 6.
3 - Created 'id_series' with value 0. There is only one multivariate time series.
4 - Ensured that there are no missing dates and that the frequency of the time series is hourly.
5 - Created the 'time_step' column from 'date' and 'id_series', with values increasing from 0 to the size of the time series.
6 - Cast 'date' to str, 'time_step' to int, 'value_X' columns to float, and defined 'id_series' as 'category'.
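The six preprocessing steps can be sketched in pandas as follows. This is a minimal illustration and not the uploader's actual script: the use of pandas is an assumption (suggested only by the 'category' dtype mentioned in step 6), the function name `preprocess` is hypothetical, and the three-row input with columns `load` and `oil_temp` is toy data rather than part of ETTh2.

```python
import pandas as pd

DATE_FMT = "%Y-%m-%d %H:%M:%S"


def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the six listed steps to a frame whose first column holds the date."""
    df = raw.copy()

    # Step 2: keep the first column as 'date', rename the rest to value_0..value_N.
    value_cols = [f"value_{i}" for i in range(df.shape[1] - 1)]
    df.columns = ["date"] + value_cols

    # Step 1: parse the dates (they are written back as strings in step 6).
    df["date"] = pd.to_datetime(df["date"])
    df = df.sort_values("date").reset_index(drop=True)

    # Step 3: a single multivariate series, so the id is always 0.
    df["id_series"] = 0

    # Step 4: every consecutive pair of timestamps must be exactly one hour apart.
    gaps = df["date"].diff().dropna()
    if not gaps.eq(pd.Timedelta(hours=1)).all():
        raise ValueError("dates are not a gap-free hourly sequence")

    # Step 5: running index within each series, starting at 0.
    df["time_step"] = df.groupby("id_series").cumcount()

    # Step 6: final dtypes.
    df["date"] = df["date"].dt.strftime(DATE_FMT)
    df["time_step"] = df["time_step"].astype(int)
    df[value_cols] = df[value_cols].astype(float)
    df["id_series"] = df["id_series"].astype("category")

    return df[["id_series", "date", *value_cols, "time_step"]]


# Toy input (hypothetical column names and values, for illustration only).
raw = pd.DataFrame(
    {
        "date": ["2020-01-01 00:00", "2020-01-01 01:00", "2020-01-01 02:00"],
        "load": [1, 2, 3],
        "oil_temp": [30.5, 30.1, 29.8],
    }
)
out = preprocess(raw)
print(out.dtypes)
```

The hourly check is done on timestamp differences rather than by building a reference date range, which keeps the sketch independent of the frequency alias spelling used by a given pandas version.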

10 features

Name        Type      Unique values   Missing
id_series   nominal   1               0
date        string    17420           0
value_0     numeric   683             0
value_1     numeric   325             0
value_2     numeric   1521            0
value_3     numeric   813             0
value_4     numeric   1680            0
value_5     numeric   363             0
value_6     numeric   1628            0
time_step   numeric   17420           0
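As a quick sanity check, the sampling arithmetic quoted in the description can be compared with the 17420 unique 'date' and 'time_step' values listed above. The sketch below uses only numbers that appear on this page. The hourly figure is derived by dividing the quoted count by 4, and the source does not explain why the listed row count is smaller.

```python
from datetime import timedelta

points_per_hour = 4                                  # "4 times" in the quoted formula
quoted_count = 2 * 365 * 24 * points_per_hour        # count quoted for the m variants
hourly_by_formula = quoted_count // points_per_hour  # same formula at hourly resolution
listed_rows = 17420                                  # unique 'date' values listed above

# Time between the first and last timestamp of a gap-free hourly series.
span = timedelta(hours=listed_rows - 1)

print(quoted_count)                      # 70080
print(hourly_by_formula)                 # 17520
print(hourly_by_formula - listed_rows)   # 100
print(span.days, span.seconds // 3600)   # 725 19
```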

19 properties

Number of instances (rows) of the dataset: 17420
Number of attributes (columns) of the dataset: 10
Number of distinct values of the target attribute (if it is nominal): (no value listed)
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 8
Number of nominal attributes: 1
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 80
Percentage of instances belonging to the most frequent class: (no value listed)
Percentage of nominal attributes: 10
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: (no value listed)
Percentage of missing values: 0
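Several of these properties follow directly from the feature list on this page. The sketch below recomputes them from the listed attribute types and instance count. It is an illustration only, not the code the hosting platform uses to derive its properties.

```python
# Attribute types as listed in the feature list on this page.
feature_types = {
    "id_series": "nominal",
    "date": "string",
    **{f"value_{i}": "numeric" for i in range(7)},
    "time_step": "numeric",
}
n_instances = 17420

n_attributes = len(feature_types)
n_numeric = sum(t == "numeric" for t in feature_types.values())
n_nominal = sum(t == "nominal" for t in feature_types.values())

pct_numeric = 100 * n_numeric / n_attributes
pct_nominal = 100 * n_nominal / n_attributes
dimensionality = n_attributes / n_instances  # attributes divided by instances

print(n_attributes, n_numeric, n_nominal)  # 10 8 1
print(pct_numeric, pct_nominal)            # 80.0 10.0
print(f"{dimensionality:.6f}")             # 0.000574
```

The ratio of attributes to instances is small but nonzero, so the 0 shown for that property appears to be a rounded value. The 'date' column is typed as string, which is why the numeric and nominal percentages sum to 90 rather than 100.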

0 tasks
