

active ARFF Creative Commons Attribution 4.0 International Visibility: public Uploaded 25-06-2024 by Bruno Belucci Teixeira

Daily values of confirmed cases, deaths and recoveries for COVID-19 in several countries.

From original source:
-----
This folder contains daily time series summary tables, including confirmed, deaths and recovered. All data is read in from the daily case report. The time series tables are subject to be updated if inaccuracies are identified in our historical data.

Two time series tables are for the US confirmed cases and deaths, reported at the county level. They are named time_series_covid19_confirmed_US.csv, time_series_covid19_deaths_US.csv, respectively.

Three time series tables are for the global confirmed cases, recovered cases and deaths. Australia, Canada and China are reported at the province/state level. Dependencies of the Netherlands, the UK, France and Denmark are listed under the province/state level. The US and other countries are at the country level. The tables are renamed time_series_covid19_confirmed_global.csv and time_series_covid19_deaths_global.csv, and time_series_covid19_recovered_global.csv, respectively.
-----

We have joined the confirmed, deaths and recovered datasets to create multivariate series. Note that we have chosen to use these three columns as the values to forecast. Because the series are aligned in time, we could instead have transformed the dataset into multiple columns, one per 'Province/State' - 'Country/Region' pair.

There are 10 columns:

id_series: The id of the time series.
date: The date of the time series in the format "%Y-%m-%d".
time_step: The time step on the time series.
value_X (X from 0 to 2): The values of the time series, which will be used for the forecasting task.
covariate_X (X from 0 to 3): Covariate values of the time series, tied to the 'id_series'. These are not forecast targets, but can help with the forecasting task.

Preprocessing:

1 - For the 'confirmed' and 'deaths' datasets, grouped the values of all 'Province/State' entries under the 'Country/Region' 'Canada'. The 'recovered' dataset reports 'Canada' only at the country level, without the individual 'Province/State' entries, so this grouping was needed in order to merge all the datasets.
2 - Filled NaN values for 'Province/State' with the value 'Country'.
3 - Filled NaN values for 'Lat' and 'Long' with 0.0.
4 - Melted the datasets with identifiers 'Province/State', 'Country/Region', 'Lat', 'Long', obtaining columns 'date' and 'value_X', where X is 0 for confirmed cases, 1 for deaths and 2 for recoveries.
5 - Standardized the date to the format %Y-%m-%d and ensured that the frequency is daily.
6 - Merged all the datasets.
7 - Created column 'id_series' from 'Province/State', 'Country/Region' with index from 0 to 273.
8 - Renamed columns 'Province/State', 'Country/Region', 'Lat', 'Long' to 'covariate_0', 'covariate_1', 'covariate_2', 'covariate_3'.
9 - Created column 'time_step' with increasing values of the time_step for the time series.
10 - Cast 'value_X' columns to int, defined 'id_series', 'covariate_0' and 'covariate_1' as 'category' and cast 'covariate_2' and 'covariate_3' to float.
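
The preprocessing steps above can be sketched in pandas roughly as follows. This is an illustrative reconstruction, not the uploader's actual script: the helper names (melt_table, build_series, toy), the toy input tables, the 0-based 'time_step', the "month/day/two-digit year" header format and the assumption that the identifier columns match exactly across the three tables are all assumptions. Step 1 (grouping the Canadian provinces) is omitted for brevity.

```python
import pandas as pd

ID_COLS = ["Province/State", "Country/Region", "Lat", "Long"]


def melt_table(wide: pd.DataFrame, value_name: str) -> pd.DataFrame:
    """Steps 2-5: fill NaNs, melt the date columns into rows, normalise dates."""
    wide = wide.copy()
    wide["Province/State"] = wide["Province/State"].fillna("Country")
    wide[["Lat", "Long"]] = wide[["Lat", "Long"]].fillna(0.0)
    long = wide.melt(id_vars=ID_COLS, var_name="date", value_name=value_name)
    # Assumption: wide column headers look like "1/22/20" (month/day/year).
    parsed = pd.to_datetime(long["date"], format="%m/%d/%y")
    long["date"] = parsed.dt.strftime("%Y-%m-%d")
    return long


def build_series(confirmed, deaths, recovered) -> pd.DataFrame:
    """Steps 6-10: merge, add ids and time steps, rename and cast."""
    tables = [
        melt_table(table, f"value_{i}")
        for i, table in enumerate((confirmed, deaths, recovered))
    ]
    keys = ID_COLS + ["date"]
    # Step 6: assumes identifiers are identical in all three tables.
    df = tables[0].merge(tables[1], on=keys).merge(tables[2], on=keys)
    # Step 7: one integer id per (Province/State, Country/Region) pair.
    df["id_series"] = df.groupby(
        ["Province/State", "Country/Region"], sort=False
    ).ngroup()
    # Step 8: rename identifiers to generic covariate names.
    df = df.rename(columns={
        "Province/State": "covariate_0",
        "Country/Region": "covariate_1",
        "Lat": "covariate_2",
        "Long": "covariate_3",
    })
    # Step 9: increasing step within each series (0-based here, an assumption).
    df = df.sort_values(["id_series", "date"]).reset_index(drop=True)
    df["time_step"] = df.groupby("id_series").cumcount()
    # Step 10: final dtypes.
    return df.astype({
        "value_0": int, "value_1": int, "value_2": int,
        "id_series": "category",
        "covariate_0": "category", "covariate_1": "category",
        "covariate_2": float, "covariate_3": float,
    })


def toy(day1, day2) -> pd.DataFrame:
    """Hypothetical two-series, two-day table in the wide source layout."""
    return pd.DataFrame({
        "Province/State": [None, "Hubei"],
        "Country/Region": ["France", "China"],
        "Lat": [46.2, None],
        "Long": [2.2, None],
        "1/22/20": day1,
        "1/23/20": day2,
    })


result = build_series(
    toy([0, 444], [2, 500]),  # confirmed
    toy([0, 17], [0, 20]),    # deaths
    toy([0, 28], [0, 30]),    # recovered
)
print(result)
```

Merging on the float 'Lat' and 'Long' columns only works when the coordinates are exactly equal in every table; on real data it may be safer to merge on the name columns and the date alone.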

10 features

covariate_0 (nominal): 76 unique values, 0 missing
covariate_1 (nominal): 201 unique values, 0 missing
covariate_2 (numeric): 272 unique values, 0 missing
covariate_3 (numeric): 272 unique values, 0 missing
date (string): 1143 unique values, 0 missing
value_0 (numeric): 116121 unique values, 0 missing
value_1 (numeric): 38827 unique values, 0 missing
value_2 (numeric): 44630 unique values, 0 missing
id_series (nominal): 274 unique values, 0 missing
time_step (numeric): 1143 unique values, 0 missing
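
Since every series shares the same dates, the long layout listed above can be turned into one column per series, as the description mentions. A minimal sketch, assuming a small hand-made frame with the same column names (the values and the "confirmed_" prefix are made up for illustration):

```python
import pandas as pd

# Hypothetical excerpt in the long layout: two series, two days each.
long = pd.DataFrame({
    "id_series": [0, 0, 1, 1],
    "date": ["2020-01-22", "2020-01-23", "2020-01-22", "2020-01-23"],
    "time_step": [0, 1, 0, 1],
    "value_0": [0, 2, 444, 500],
})

# One row per date, one column per series, holding the confirmed counts.
wide = long.pivot(index="date", columns="id_series", values="value_0")
wide.columns = [f"confirmed_{i}" for i in wide.columns]
print(wide)
```

The same call with values="value_1" or values="value_2" would give the wide tables for deaths and recoveries.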

19 properties

Number of instances (rows) of the dataset.
Number of attributes (columns) of the dataset.
Number of distinct values of the target attribute (if it is nominal).
Number of missing values in the dataset.
Number of instances with at least one value missing.
Number of numeric attributes.
Number of nominal attributes.
Percentage of binary attributes.
Percentage of instances having missing values.
Average class difference between consecutive instances.
Percentage of missing values.
Number of attributes divided by the number of instances.
Percentage of numeric attributes.
Percentage of instances belonging to the most frequent class.
Percentage of nominal attributes.
Number of instances belonging to the most frequent class.
Percentage of instances belonging to the least frequent class.
Number of instances belonging to the least frequent class.
Number of binary attributes.

0 tasks
