OpenML


ETTh2

Status: active. Format: ARFF. License: Creative Commons Attribution-NoDerivatives 4.0 International. Visibility: public. Uploaded 24-06-2024 by Bruno Belucci Teixeira.

Electric power distribution, hourly data.

From the original source:

-----

The electric power distribution problem is that the distribution of electricity to different areas depends on their sequential usage. However, predicting the future demand of a specific area is difficult, as it varies with weekdays, holidays, seasons, weather, temperatures, etc. No existing method can perform a long-term prediction based on super long-term real-world data with high precision, and any false prediction may damage the electrical transformer. So currently, without an efficient method to predict future electric usage, managers have to make decisions based on an empirical number, which is much higher than the real-world demand. This causes unnecessary waste of electricity and equipment depreciation.

On the other hand, the oil temperature can reflect the condition of the electricity transformer. One of the most efficient strategies is to predict whether the electrical transformers' oil temperature is safe and so avoid unnecessary waste. To address this problem, our team and Beijing Guowang Fuda Science & Technology Development Company built a real-world platform and collected 2 years of data. We work on it to predict the electrical transformers' oil temperature and investigate the extreme load capacity.

We donated two years of data, in which each data point is recorded every 15 minutes (marked by m), from two regions of a province of China, named ETT-small-m1 and ETT-small-m2, respectively. Each dataset contains 2 years * 365 days * 24 hours * 4 times = 70,080 data points. Besides, we also provide the hourly-level variants for fast development (marked by h), i.e. ETT-small-h1 and ETT-small-h2. Each data point consists of 8 features, including the date of the point, the predictive value "oil temperature", and 6 different types of external power load features.

-----

This data corresponds to the ETTh2 variant.

There are 10 columns:

- id_series: The id of the time series.
- date: The date of the time series in the format "%Y-%m-%d %H:%M:%S".
- time_step: The time step on the time series.
- value_X (X from 0 to 6): The values of the time series, which will be used for the forecasting task.

Preprocessing:

1 - Standardized the 'date' column to the format "%Y-%m-%d %H:%M:%S".
2 - Renamed columns [1:] to 'value_X' with X from 0 to 6.
3 - Created 'id_series' with value 0. There is only one multivariate time series.
4 - Ensured that there are no missing dates and that the frequency of the time series is hourly.
5 - Created the 'time_step' column from 'date' and 'id_series', with increasing values from 0 to the size of the time series.
6 - Cast 'date' to str, 'time_step' to int, and the 'value_X' columns to float, and defined 'id_series' as 'category'.
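
The six preprocessing steps can be sketched in pandas roughly as follows. This is an illustration under stated assumptions, not the uploader's actual script: the function name `preprocess_hourly` is hypothetical, the raw frame is assumed to have 'date' as its first column, time_step is assumed to run from 0 to n-1, and the output column order is a choice made here.

```python
import pandas as pd

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def preprocess_hourly(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the listed steps to a raw frame whose first column is 'date'.

    Hypothetical helper: the name and the input layout are assumptions.
    """
    df = raw.copy()

    # Step 2: rename every column after the first to value_0, value_1, ...
    value_cols = [f"value_{i}" for i in range(df.shape[1] - 1)]
    df.columns = ["date"] + value_cols

    # Step 1 (first half): parse the dates; they are written back in
    # DATE_FORMAT in step 6.
    df["date"] = pd.to_datetime(df["date"])
    df = df.sort_values("date").reset_index(drop=True)

    # Step 3: a single multivariate series, so one constant id.
    df["id_series"] = 0

    # Step 4: every consecutive pair of timestamps must be exactly 1 hour apart.
    gaps = df["date"].diff().dropna()
    if not gaps.eq(pd.Timedelta(hours=1)).all():
        raise ValueError("dates are not a gap-free hourly sequence")

    # Step 5: running counter per series (assumed here to be 0 .. n-1).
    df["time_step"] = df.groupby("id_series").cumcount()

    # Step 6: final casts.
    df["date"] = df["date"].dt.strftime(DATE_FORMAT)
    df["time_step"] = df["time_step"].astype(int)
    df[value_cols] = df[value_cols].astype(float)
    df["id_series"] = df["id_series"].astype("category")

    return df[["id_series", "date", "time_step"] + value_cols]
```

Raising on a gap rather than silently reindexing is a design choice for this sketch; the description only says that missing dates were ruled out, not how.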

10 features

id_series	nominal	1 unique values 0 missing
date	string	17420 unique values 0 missing
value_0	numeric	683 unique values 0 missing
value_1	numeric	325 unique values 0 missing
value_2	numeric	1521 unique values 0 missing
value_3	numeric	813 unique values 0 missing
value_4	numeric	1680 unique values 0 missing
value_5	numeric	363 unique values 0 missing
value_6	numeric	1628 unique values 0 missing
time_step	numeric	17420 unique values 0 missing
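
A downloaded copy can be checked against the listing above with a short pandas sketch. The helper names `feature_summary` and `listing_problems` are hypothetical, and the frame is assumed to be already loaded into a DataFrame by whatever means; only the facts in the listing are checked (10 columns, no missing values, one id_series value, unique date and time_step).

```python
import pandas as pd

# Column names taken from the feature listing.
EXPECTED_COLUMNS = (
    ["id_series", "date"] + [f"value_{i}" for i in range(7)] + ["time_step"]
)


def feature_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column unique and missing counts, mirroring the listing."""
    return pd.DataFrame({"unique": df.nunique(), "missing": df.isna().sum()})


def listing_problems(df: pd.DataFrame) -> list:
    """Return a list of human-readable mismatches (empty if none)."""
    if set(df.columns) != set(EXPECTED_COLUMNS):
        return ["unexpected column set"]

    problems = []
    summary = feature_summary(df)
    if summary["missing"].sum() != 0:
        problems.append("missing values present")
    if summary.loc["id_series", "unique"] != 1:
        problems.append("id_series has more than one value")
    # date and time_step should each identify a row uniquely.
    for col in ("date", "time_step"):
        if summary.loc[col, "unique"] != len(df):
            problems.append(f"{col} has repeated values")
    return problems
```

For the full dataset, `len(df)` would be expected to equal the 17420 unique values listed for date and time_step.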