Sunspot-daily

Status: active
Format: ARFF
License: Creative Commons Attribution 4.0 International
Visibility: public
Uploaded 25-06-2024 by Bruno Belucci Teixeira

Daily total sunspot number from 1818 to 2023.

From original source:
-----
Time range: 1/1/1818 - last elapsed month (provisional values)

Data description: Daily total sunspot number derived by the formula R = Ns + 10 * Ng, with Ns the number of spots and Ng the number of groups counted over the entire solar disk.

No daily data are provided before 1818 because daily observations become too sparse in earlier years. Therefore, R. Wolf only compiled monthly means and yearly means for all years before 1818.

In the TXT and CSV files, the missing values are marked by -1 (valid Sunspot Numbers are always positive).

New scale: The conventional 0.6 Zurich scale factor is not used anymore and A. Wolfer (Wolf's successor) is now defining the scale of the entire series. This puts the Sunspot Number at the scale of raw modern counts, instead of reducing it to the level of early counts by R. Wolf.

Error values: Those values correspond to the standard deviation of raw numbers provided by all stations. Before 1981, the errors are estimated with the help of an auto-regressive model based on the Poissonian distribution of actual Sunspot Numbers. From 1981 onwards, the error value is the actual standard deviation of the sample of raw observations used to compute the daily value. The standard error of the daily Sunspot Number can be computed by sigma/sqrt(N), where sigma is the listed standard deviation and N the number of observations for the day. Before 1981, the number of observations is set to 1, as the Sunspot Number was then essentially the raw Wolf number from the Zurich Observatory.
-----

There are 6 columns:
id_series: The id of the time series.
date: The date of the time series in the format "%Y-%m-%d".
time_step: The time step on the time series.
value_X (X from 0 to 2): The values of the time series, which will be used for the forecasting task.

Preprocessing:
1 - Kept only the data with year (column 0) <= 2023.
2 - Created the 'date' column from columns 0 (year), 1 (month) and 2 (day) in the format %Y-%m-%d.
3 - Dropped the columns (0, 1, 2, 3, 7). Column 3 was the date in fraction of year and column 7 was an indicator of whether the data was under revision (no rows in our data are under revision).
4 - Replaced values of -1 with NaN to make the missing data explicit.
5 - Dropped the rows with 'date' < 1818-01-08, as there are only NaNs for these dates.
6 - Created the column 'id_series' with value 0, as there is only one long time series.
7 - Ensured that there are no missing dates and that the frequency of the time series is daily.
8 - Created the column 'time_step' with increasing values of time step for the time series.
9 - Cast columns 'value_0' and 'value_1' to float ('value_0' is always an integer, but is cast to float to accommodate NaNs) and cast column 'value_2' to int. Defined 'id_series' as 'category'.
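The preprocessing steps above can be sketched in plain Python. This is only an illustration under assumptions, not the uploader's actual script: the function name `preprocess`, the sample rows, the mapping of raw columns 4, 5 and 6 to 'value_0', 'value_1' and 'value_2', and a 'time_step' starting at 0 are all hypothetical choices that the description does not state.

```python
import math
from datetime import date, timedelta

# Hypothetical raw rows in the 8-column source layout described above:
# (year, month, day, fraction_of_year, col4, col5, col6, revision_indicator)
# The numbers are invented for illustration, not taken from the dataset.
raw_rows = [
    (1818, 1, 7, 1818.018, -1, -1.0, 0, 1),
    (1818, 1, 8, 1818.021, 65, 10.2, 1, 1),
    (1818, 1, 9, 1818.023, -1, -1.0, 0, 1),
    (1818, 1, 10, 1818.026, 37, 7.7, 1, 1),
    (2024, 1, 1, 2024.001, 50, 5.0, 30, 0),
]


def preprocess(rows, last_year=2023, first_date=date(1818, 1, 8)):
    """Apply steps 1-9 to rows in the raw layout (a sketch, see assumptions)."""
    out = []
    for year, month, day, _fraction, col4, col5, col6, _indicator in rows:
        if year > last_year:              # step 1: keep year <= 2023
            continue
        d = date(year, month, day)        # step 2: build the date
        if d < first_date:                # step 5: drop the leading all-NaN rows
            continue
        out.append({
            "id_series": 0,               # step 6: a single series
            "date": d.isoformat(),        # same layout as "%Y-%m-%d"
            # steps 4 and 9: -1 becomes NaN, values stored as float
            "value_0": math.nan if col4 == -1 else float(col4),
            "value_1": math.nan if col5 == -1 else float(col5),
            "value_2": int(col6),         # step 9: integer column
        })                                # step 3: other columns are not copied
    # step 7: verify that consecutive rows are exactly one day apart
    for prev, cur in zip(out, out[1:]):
        gap = date.fromisoformat(cur["date"]) - date.fromisoformat(prev["date"])
        if gap != timedelta(days=1):
            raise ValueError(f"missing dates between {prev['date']} and {cur['date']}")
    # step 8: increasing time step (starting at 0 is an assumption)
    for step, row in enumerate(out):
        row["time_step"] = step
    return out


for row in preprocess(raw_rows):
    print(row)
```

The real dataset would more likely be processed with a dataframe library; plain lists of dictionaries are used here only to keep the sketch self-contained.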

6 features

id_series (nominal): 1 unique value, 0 missing
date (string): 75233 unique values, 0 missing
value_0 (numeric): 437 unique values, 3240 missing
value_1 (numeric): 306 unique values, 3240 missing
value_2 (numeric): 66 unique values, 0 missing
time_step (numeric): 75233 unique values, 0 missing
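The two formulas quoted in the description, R = Ns + 10 * Ng and sigma/sqrt(N), can be written as small helpers. This is a minimal sketch: the function names are hypothetical, and reading 'value_1' as the listed standard deviation and 'value_2' as the number of observations is an assumption inferred from the column order, not something the page states.

```python
import math


def wolf_number(n_spots, n_groups):
    """Daily sunspot number R = Ns + 10 * Ng."""
    return n_spots + 10 * n_groups


def standard_error(sigma, n_obs):
    """Standard error sigma / sqrt(N); NaN when it cannot be computed."""
    if n_obs <= 0 or math.isnan(sigma):
        return math.nan
    return sigma / math.sqrt(n_obs)


# Invented example: 7 spots in 3 groups.
print(wolf_number(7, 3))

# Assumption: sigma would come from 'value_1' and N from 'value_2'.
print(standard_error(8.0, 16))
# With N = 1 (as stated for dates before 1981) the standard error equals sigma.
print(standard_error(5.0, 1))
```

Returning NaN for a missing standard deviation or a zero observation count is a design choice of this sketch, so that rows with missing values propagate as missing rather than raising an error.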

19 properties

Number of instances (rows) of the dataset: 75233
Number of attributes (columns) of the dataset: 6
Number of distinct values of the target attribute (if it is nominal): (no value listed)
Number of missing values in the dataset: 6480
Number of instances with at least one value missing: 3240
Number of numeric attributes: 4
Number of nominal attributes: 1
Percentage of instances belonging to the most frequent class: (no value listed)
Percentage of nominal attributes: 16.67
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 4.31
Average class difference between consecutive instances: (no value listed)
Percentage of missing values: 1.44
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 66.67
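
The derived percentages in this list follow from the raw counts. The short check below is a sketch of that arithmetic; the variable names are hypothetical, and rounding to two decimals is assumed from how the values are displayed.

```python
# Counts taken from the property list above.
n_rows = 75233
n_cols = 6
n_missing_values = 6480
n_rows_with_missing = 3240
n_numeric = 4
n_nominal = 1

# Share of rows that contain at least one missing value.
pct_rows_missing = 100 * n_rows_with_missing / n_rows
# Share of all cells (rows * columns) that are missing.
pct_values_missing = 100 * n_missing_values / (n_rows * n_cols)
pct_nominal = 100 * n_nominal / n_cols
pct_numeric = 100 * n_numeric / n_cols
# Attributes per instance; far below 0.01, so it displays as 0.
dimensionality = n_cols / n_rows

for name, value in [
    ("rows with missing values (%)", pct_rows_missing),
    ("missing values (%)", pct_values_missing),
    ("nominal attributes (%)", pct_nominal),
    ("numeric attributes (%)", pct_numeric),
    ("attributes / instances", dimensionality),
]:
    print(f"{name}: {value:.2f}")
```

The 6480 missing values are consistent with the feature list: 3240 in each of 'value_0' and 'value_1'.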

0 tasks
