Daily values of confirmed cases, deaths and recoveries for COVID-19 in the US.
From original source:
-----
This folder contains daily time series summary tables, including confirmed, deaths and recovered. All data is read in from the daily case report. The time series tables are subject to be updated if inaccuracies are identified in our historical data.
Two time series tables are for the US confirmed cases and deaths, reported at the county level. They are named time_series_covid19_confirmed_US.csv, time_series_covid19_deaths_US.csv, respectively.
Three time series tables are for the global confirmed cases, recovered cases and deaths. Australia, Canada and China are reported at the province/state level. Dependencies of the Netherlands, the UK, France and Denmark are listed under the province/state level. The US and other countries are at the country level. The tables are renamed time_series_covid19_confirmed_global.csv and time_series_covid19_deaths_global.csv, and time_series_covid19_recovered_global.csv, respectively.
-----
We have joined the confirmed and deaths datasets to create multivariate series.
There are 15 columns:
id_series: The id of the time series.
date: The date of the time series in the format "%Y-%m-%d".
time_step: The position of the observation within its time series.
value_X (X from 0 to 1): The values of the time series, which will be used for the forecasting task.
covariate_X (X from 0 to 9): Covariate values of the time series, tied to the 'id_series'. These are not forecasting targets, but they can help with the forecasting task.
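As a quick sanity check of the layout above, here is a minimal sketch in plain Python. The names EXPECTED_COLUMNS and missing_columns are hypothetical helpers introduced for illustration; they are not part of the dataset or of any library.

```python
from datetime import datetime

# Column layout as described above: 3 bookkeeping columns,
# 2 forecasting targets and 10 covariates.
EXPECTED_COLUMNS = (
    ["id_series", "date", "time_step"]
    + [f"value_{i}" for i in range(2)]
    + [f"covariate_{i}" for i in range(10)]
)


def missing_columns(columns):
    """Return the expected column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]


# The 'date' column uses the "%Y-%m-%d" format, so it parses like this.
parsed = datetime.strptime("2020-01-22", "%Y-%m-%d")
```

Calling missing_columns on the header of a loaded table returns an empty list when all 15 columns are present.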
Preprocessing:
1 - Filled NaN values for 'FIPS' with 0 and for 'Admin2' with the value 'None'.
2 - Melted the datasets with identifiers 'UID', 'iso2', 'iso3', 'code3', 'FIPS', 'Admin2', 'Province_State', 'Lat', 'Long_', 'Combined_Key'
(plus 'Population' for the deaths dataset), obtaining columns 'date' and 'value_X', where X is 0 for confirmed cases and 1 for deaths.
3 - Standardized the date to the format %Y-%m-%d.
4 - Merged all the datasets.
5 - Renamed column 'UID' to 'id_series'.
6 - Renamed columns 'iso2', 'iso3', 'code3', 'FIPS', 'Admin2', 'Province_State', 'Lat', 'Long_', 'Combined_Key', 'Population' to 'covariate_X',
with X from 0 to 9.
7 - Created column 'time_step', whose value increases with each successive date within a time series.
8 - Cast 'value_X' columns to int, defined 'id_series', 'covariate_X' with X in [0, 1, 2, 4, 5, 8] as 'category',
cast 'covariate_X' with X in [3, 6, 7] to float and cast 'covariate_9' to int.
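The steps above could be sketched with pandas roughly as follows. This is an illustration, not the original preprocessing script. It rests on several assumptions: the two tiny input tables hold made-up values, the raw date headers look like "1/22/20", the tables are merged on 'UID' and 'date', and 'time_step' starts at 0. The helper names melt_table and build_series are hypothetical.

```python
import pandas as pd

ID_COLS = ["UID", "iso2", "iso3", "code3", "FIPS", "Admin2",
           "Province_State", "Lat", "Long_", "Combined_Key"]


def melt_table(wide, id_cols, value_name):
    wide = wide.copy()
    # Step 1: fill missing identifiers.
    wide["FIPS"] = wide["FIPS"].fillna(0)
    wide["Admin2"] = wide["Admin2"].fillna("None")
    # Step 2: one row per (series, date) instead of one column per date.
    long = wide.melt(id_vars=id_cols, var_name="date", value_name=value_name)
    # Step 3: standardize dates (the input format is an assumption).
    dates = pd.to_datetime(long["date"], format="%m/%d/%y")
    long["date"] = dates.dt.strftime("%Y-%m-%d")
    return long


def build_series(confirmed, deaths):
    c = melt_table(confirmed, ID_COLS, "value_0")
    d = melt_table(deaths, ID_COLS + ["Population"], "value_1")
    # Step 4: merge (keys 'UID' and 'date' are an assumption).
    d = d[["UID", "date", "Population", "value_1"]]
    out = c.merge(d, on=["UID", "date"], how="inner")
    # Steps 5 and 6: rename the id and the ten covariates.
    rename = {"UID": "id_series"}
    covariates = ID_COLS[1:] + ["Population"]
    rename.update({n: f"covariate_{i}" for i, n in enumerate(covariates)})
    out = out.rename(columns=rename)
    # Step 7: running index per series, ordered by date (starts at 0 here).
    out = out.sort_values(["id_series", "date"]).reset_index(drop=True)
    out["time_step"] = out.groupby("id_series").cumcount()
    # Step 8: cast dtypes.
    dtypes = {"value_0": int, "value_1": int,
              "id_series": "category", "covariate_9": int}
    dtypes.update({f"covariate_{i}": "category" for i in (0, 1, 2, 4, 5, 8)})
    dtypes.update({f"covariate_{i}": float for i in (3, 6, 7)})
    out = out.astype(dtypes)
    order = (["id_series", "date", "time_step", "value_0", "value_1"]
             + [f"covariate_{i}" for i in range(10)])
    return out[order]


# Toy input with invented values, only to exercise the sketch.
confirmed = pd.DataFrame({
    "UID": [1, 2], "iso2": ["US", "US"], "iso3": ["USA", "USA"],
    "code3": [840, 840], "FIPS": [1001.0, None],
    "Admin2": ["County A", None], "Province_State": ["State A", "State A"],
    "Lat": [32.5, 30.7], "Long_": [-86.6, -87.7],
    "Combined_Key": ["County A, State A, US", "State A, US"],
    "1/22/20": [0, 0], "1/23/20": [1, 2],
})
deaths = confirmed.copy()
deaths["Population"] = [1000, 2000]
deaths["1/23/20"] = [0, 1]

result = build_series(confirmed, deaths)
```

On the toy input, result has one row per series and date, with the 15 columns listed earlier.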