Weather-Beutenberg

Status: active. Format: ARFF. License: Creative Commons Attribution 4.0 International. Visibility: public. Uploaded 24-06-2024 by Bruno Belucci Teixeira.
Weather measures from Beutenberg provided by the Max-Planck-Institute for Biogeochemistry

Several weather measures provided by the Max-Planck-Institute for Biogeochemistry from the weather station on top of the roof of the institute building. We have assembled all the files available as of 24-05-2024 on https://www.bgc-jena.mpg.de/wetter/weather_data.html

There are 24 columns:

id_series: The id of the time series.
date: The date of the time series in the format "%Y-%m-%d %H:%M:%S".
time_step: The time step of the time series.
value_X (X from 0 to 20): The values of the time series, which will be used for the forecasting task.

Preprocessing:

1 - Renamed column 'Date Time' to 'date'.
2 - Parsed the date with the format '%d.%m.%Y %H:%M:%S' and converted it to a string with the format '%Y-%m-%d %H:%M:%S'.
3 - Replaced values of -9999 with NaN. Values of -9999 seem to indicate a problem with the measurement. In addition, the measurement for 'CO2 (ppm)' appears to have started in 2008; before that, all of its values were already NaN.
4 - Renamed columns containing characters that cannot be encoded as UTF-8.
5 - Renamed columns [1:] to 'value_X' with X from 0 to 20.
6 - Created 'id_series' with value 0. There is only one multivariate time series.
7 - Ensured that there are no missing dates and that the frequency of the time series is 10 minutes. Filled the missing dates with NaNs.
8 - Created the 'time_step' column from 'date' and 'id_series' with increasing values from 0 to the size of the time series.
9 - Cast 'date' to str, 'time_step' to int, 'value_X' to float, and defined 'id_series' as 'category'.
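The preprocessing steps can be sketched in pandas roughly as follows. This is a minimal illustration and not the uploader's actual script: the function name `preprocess`, the demo column names, and the removal of duplicate timestamps before reindexing are assumptions made here. It also assumes that 'time_step' runs from 0 to the number of rows minus one.

```python
import pandas as pd


def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch of the documented steps for one raw station frame."""
    # Steps 1-2 (parse): read the 'Date Time' column with the raw format.
    dates = pd.to_datetime(raw["Date Time"], format="%d.%m.%Y %H:%M:%S")
    # Step 9 (values): cast all measurement columns to float.
    values = raw.drop(columns="Date Time").astype(float)
    # Step 3: treat the -9999 sentinel as missing.
    values = values.where(values != -9999)
    # Steps 4-5: positional names also remove any non-UTF-8 column names.
    values.columns = [f"value_{i}" for i in range(values.shape[1])]
    values.index = pd.DatetimeIndex(dates, name="date")
    # Assumption: keep the first row if a timestamp appears more than once.
    values = values.loc[~values.index.duplicated()].sort_index()
    # Step 7: reindex onto a complete 10-minute grid; gaps become NaN rows.
    grid = pd.date_range(values.index.min(), values.index.max(),
                         freq="10min", name="date")
    df = values.reindex(grid).reset_index()
    # Step 6: a single multivariate series, so the id is always 0.
    df.insert(0, "id_series", 0)
    df["id_series"] = df["id_series"].astype("category")
    # Step 8: consecutive integer time steps starting at 0.
    df["time_step"] = list(range(len(df)))
    # Step 2 (format): store the date as a string.
    df["date"] = df["date"].dt.strftime("%Y-%m-%d %H:%M:%S")
    return df


# Tiny made-up input: one sentinel value and one missing 10-minute slot.
raw_demo = pd.DataFrame({
    "Date Time": ["01.01.2020 00:00:00", "01.01.2020 00:10:00",
                  "01.01.2020 00:30:00"],
    "T (degC)": [1.0, -9999, 3.0],
    "p (mbar)": [990, 991, 992],
})
out = preprocess(raw_demo)
print(out)
```

In the demo, the gap at 00:20:00 becomes a row of NaNs, which corresponds to the missing-date rows counted in the feature statistics of the dataset.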

24 features

id_series: nominal, 1 unique values, 0 missing
date: string, 1078121 unique values, 0 missing
value_0: numeric, 6765 unique values, 1229 missing
value_1: numeric, 5645 unique values, 1229 missing
value_2: numeric, 5756 unique values, 1229 missing
value_3: numeric, 4393 unique values, 1229 missing
value_4: numeric, 6304 unique values, 1229 missing
value_5: numeric, 3754 unique values, 1229 missing
value_6: numeric, 2524 unique values, 1229 missing
value_7: numeric, 4195 unique values, 1229 missing
value_8: numeric, 1636 unique values, 1229 missing
value_9: numeric, 2551 unique values, 1229 missing
value_10: numeric, 24554 unique values, 1229 missing
value_11: numeric, 1329 unique values, 1248 missing
value_12: numeric, 1853 unique values, 1249 missing
value_13: numeric, 10776 unique values, 1229 missing
value_14: numeric, 98 unique values, 1229 missing
value_15: numeric, 61 unique values, 1231 missing
value_16: numeric, 54049 unique values, 1795 missing
value_17: numeric, 72337 unique values, 1231 missing
value_18: numeric, 77009 unique values, 1478 missing
value_19: numeric, 5221 unique values, 1229 missing
value_20: numeric, 9303 unique values, 218481 missing
time_step: numeric, 1078742 unique values, 0 missing
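
A per-column summary like the feature listing above can be approximated with pandas once the data is loaded into a DataFrame. The helper name `summarize_features` and the demo frame are hypothetical, and whether OpenML counts NaN as a unique value is not stated on the page; this sketch excludes NaN from the unique count.

```python
import numpy as np
import pandas as pd


def summarize_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, number of unique non-missing values, and missing count per column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "unique_values": df.nunique(dropna=True),
        "missing": df.isna().sum(),
    })


# Made-up three-row frame standing in for the real dataset.
demo = pd.DataFrame({
    "id_series": pd.Categorical([0, 0, 0]),
    "value_0": [1.0, np.nan, 1.0],
})
summary = summarize_features(demo)
print(summary)
```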

19 properties

Number of instances (rows) of the dataset: 1078742
Number of attributes (columns) of the dataset: 24
Number of distinct values of the target attribute (if it is nominal): no value reported
Number of missing values in the dataset: 243919
Number of instances with at least one value missing: 218727
Number of numeric attributes: 22
Number of nominal attributes: 1
Percentage of binary attributes: 0
Percentage of instances having missing values: 20.28
Percentage of missing values: 0.94
Average class difference between consecutive instances: no value reported
Percentage of numeric attributes: 91.67
Number of attributes divided by the number of instances: 0
Percentage of nominal attributes: 4.17
Percentage of instances belonging to the most frequent class: no value reported
Number of instances belonging to the most frequent class: no value reported
Percentage of instances belonging to the least frequent class: no value reported
Number of instances belonging to the least frequent class: no value reported
Number of binary attributes: 0
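
The percentage properties can be cross-checked against the raw counts with simple arithmetic. The formulas below are an assumption about how these qualities are derived (the page does not state them), but they reproduce the reported figures; the variable names are chosen here for illustration.

```python
# Counts copied from the property and feature listings on this page.
rows = 1_078_742
cols = 24
missing_values = 243_919
rows_with_missing = 218_727
numeric_attrs = 22
nominal_attrs = 1

# Per-column missing counts for value_0 .. value_20, in order.
missing_per_column = [1229] * 11 + [1248, 1249, 1229, 1229, 1231,
                                    1795, 1231, 1478, 1229, 218481]

# Assumed definitions: each percentage is a share of cells, rows, or columns.
pct_missing_values = round(100 * missing_values / (rows * cols), 2)
pct_rows_with_missing = round(100 * rows_with_missing / rows, 2)
pct_numeric = round(100 * numeric_attrs / cols, 2)
pct_nominal = round(100 * nominal_attrs / cols, 2)
dimensionality = round(cols / rows, 2)

print(sum(missing_per_column), pct_missing_values, pct_rows_with_missing,
      pct_numeric, pct_nominal, dimensionality)
```

The per-column missing counts sum to the reported total of 243919, and most of them come from value_20 alone.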
