Electric power distribution data, sampled every 15 minutes.
From original source:
-----
The electric power distribution problem concerns how electricity is distributed to different areas according to their sequential usage. Predicting the future demand of a specific area is difficult, however, because it varies with weekdays, holidays, seasons, weather, temperature, and so on. No existing method can make high-precision long-term predictions from very long real-world series, and an inaccurate forecast may damage the electrical transformer. Without an efficient method to predict future electricity usage, managers currently have to make decisions based on an empirical figure that is much higher than real-world demand, which wastes electricity and accelerates equipment depreciation. On the other hand, the oil temperature reflects the condition of an electrical transformer. One of the most efficient strategies is therefore to predict whether the transformer's oil temperature is safe, so that unnecessary waste can be avoided. To address this problem, our team and Beijing Guowang Fuda Science & Technology Development Company built a real-world platform and collected two years of data. We use it to predict the electrical transformers' oil temperature and to investigate their extreme load capacity.
We donated two years of data from two regions of a province of China. In the minute-level variants (marked by m), named ETT-small-m1 and ETT-small-m2 respectively, each data point is recorded every 15 minutes. Each dataset contains 2 years * 365 days * 24 hours * 4 times = 70,080 data points. We also provide hourly variants for fast development (marked by h), i.e. ETT-small-h1 and ETT-small-h2. Each data point consists of 8 features: the date of the point, the prediction target "oil temperature", and 6 different types of external power load features.
-----
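The data-point count quoted above can be reproduced with a short calculation. The sketch below is illustrative only: the helper name points_per_dataset is hypothetical, it assumes two 365-day years exactly as in the quoted formula, and it does not check the row count of the distributed file. The hourly figure is derived under the same assumption and is not stated in the source.

```python
from datetime import timedelta


def points_per_dataset(years: int, step: timedelta) -> int:
    """Number of samples in `years` 365-day years at a fixed sampling step."""
    total = timedelta(days=365 * years)
    # Floor division of two timedeltas yields an integer count.
    return total // step


# 15-minute sampling: 4 samples per hour, as in the quoted formula.
n_15min = points_per_dataset(2, timedelta(minutes=15))
# Hourly variant, derived under the same 365-day-year assumption.
n_hourly = points_per_dataset(2, timedelta(hours=1))

print(n_15min, n_hourly)
```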
This data corresponds to the ETTm2 variant.
There are 10 columns:
id_series: The id of the time series.
date: The date of the time series in the format "%Y-%m-%d %H:%M:%S".
time_step: The index of the time step within the time series.
value_X (X from 0 to 6): The values of the time series, which will be used for the forecasting task.
Preprocessing:
1 - Standardized the 'date' column in the format "%Y-%m-%d %H:%M:%S".
2 - Renamed columns [1:] to 'value_X' with X from 0 to 6.
3 - Created 'id_series' with value 0. There is only one multivariate time series.
4 - Ensured that there are no missing dates and that the frequency of the time series is 15 minutes.
5 - Created 'time_step' column from 'date' and 'id_series' with increasing values from 0 to the size of the time series.
6 - Cast 'date' to str, 'time_step' to int, 'value_X' columns to float and defined 'id_series' as 'category'.
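The six steps above can be sketched with pandas roughly as follows. This is not the script that produced the dataset. It is a minimal illustration that assumes the raw table has a timestamp in its first column and numeric features in all remaining columns. The function name preprocess, the raw column names and the demo dates are placeholders.

```python
import pandas as pd

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def preprocess(raw: pd.DataFrame, freq: str = "15min") -> pd.DataFrame:
    # Assumption: column 0 is the timestamp, columns 1.. are numeric features.
    df = raw.rename(columns={raw.columns[0]: "date"}).copy()

    # Step 1: parse the dates (formatted as strings in step 6).
    df["date"] = pd.to_datetime(df["date"])
    df = df.sort_values("date").reset_index(drop=True)

    # Step 2: rename columns [1:] to value_0 ... value_{n-1}.
    value_names = [f"value_{i}" for i in range(df.shape[1] - 1)]
    df = df.rename(columns=dict(zip(df.columns[1:], value_names)))

    # Step 3: a single multivariate series, so id_series is always 0.
    df.insert(0, "id_series", 0)

    # Step 4: every gap between consecutive dates must equal the frequency.
    gaps = df["date"].diff().dropna()
    if not gaps.eq(pd.Timedelta(freq)).all():
        raise ValueError(f"missing or irregular timestamps for freq={freq}")

    # Step 5: time_step counts 0, 1, 2, ... within each series.
    df["time_step"] = df.groupby("id_series").cumcount()

    # Step 6: final dtypes and column order.
    df["date"] = df["date"].dt.strftime(DATE_FORMAT)
    df["time_step"] = df["time_step"].astype(int)
    df[value_names] = df[value_names].astype(float)
    df["id_series"] = df["id_series"].astype("category")
    return df[["id_series", "date", "time_step", *value_names]]


# Synthetic demo frame (placeholder names and dates, not the real data).
raw = pd.DataFrame(
    {"timestamp": pd.date_range("2020-01-01", periods=4, freq="15min").astype(str)}
)
for i in range(7):
    raw[f"feature_{i}"] = [1, 2, 3, 4]

out = preprocess(raw)
print(out.dtypes)
```

Checking the gaps between consecutive timestamps, rather than only the first and last date, catches both missing rows and duplicated or irregular timestamps.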