Covid19-us

Status: active. Format: ARFF. License: Creative Commons Attribution 4.0 International. Visibility: public. Uploaded 25-06-2024 by Bruno Belucci Teixeira.

Daily values of confirmed cases and deaths for COVID-19 in the US.

From the original source:

-----
This folder contains daily time series summary tables, including confirmed, deaths and recovered. All data is read in from the daily case report. The time series tables are subject to be updated if inaccuracies are identified in our historical data.

Two time series tables are for the US confirmed cases and deaths, reported at the county level. They are named time_series_covid19_confirmed_US.csv and time_series_covid19_deaths_US.csv, respectively.

Three time series tables are for the global confirmed cases, recovered cases and deaths. Australia, Canada and China are reported at the province/state level. Dependencies of the Netherlands, the UK, France and Denmark are listed under the province/state level. The US and other countries are at the country level. The tables are renamed time_series_covid19_confirmed_global.csv, time_series_covid19_deaths_global.csv and time_series_covid19_recovered_global.csv, respectively.
-----

We joined the US confirmed and deaths datasets to create multivariate series.

There are 15 columns:

id_series: The id of the time series.
date: The date of the observation, in the format "%Y-%m-%d".
time_step: The time step within the time series.
value_X (X from 0 to 1): The values of the time series, which are the targets of the forecasting task.
covariate_X (X from 0 to 9): Covariate values of the time series, tied to the 'id_series'. They are not forecasting targets, but they can help with the forecasting task.

Preprocessing:

1 - Filled NaN values for 'FIPS' with 0 and for 'Admin2' with the value 'None'.
2 - Melted the datasets with identifiers 'UID', 'iso2', 'iso3', 'code3', 'FIPS', 'Admin2', 'Province_State', 'Lat', 'Long_', 'Combined_Key' (plus 'Population' for the deaths dataset), obtaining columns 'date' and 'value_X', where X is 0 for confirmed cases and 1 for deaths.
3 - Standardized the date to the format %Y-%m-%d.
4 - Merged all the datasets.
5 - Renamed column 'UID' to 'id_series'.
6 - Renamed columns 'iso2', 'iso3', 'code3', 'FIPS', 'Admin2', 'Province_State', 'Lat', 'Long_', 'Combined_Key', 'Population' to 'covariate_X', with X from 0 to 9.
7 - Created column 'time_step' with increasing values of the time step for each time series.
8 - Cast the 'value_X' columns to int, defined 'id_series' and 'covariate_X' with X in [0, 1, 2, 4, 5, 8] as 'category', cast 'covariate_X' with X in [3, 6, 7] to float, and cast 'covariate_9' to int.
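The preprocessing steps above can be sketched with pandas roughly as follows. This is an illustrative reconstruction, not the uploader's actual script: the function names melt_table and build_dataset are hypothetical, the join keys in step 4 ('UID' and 'date') are an assumption, the source date headers are assumed to be written as month/day/two-digit year, and time_step is assumed to start at 0. The two-row input tables are made-up toy data.

```python
import pandas as pd

# Identifier columns shared by both source tables (names taken from the description).
ID_COLS = ["UID", "iso2", "iso3", "code3", "FIPS", "Admin2",
           "Province_State", "Lat", "Long_", "Combined_Key"]


def melt_table(wide, value_name, extra_ids=()):
    """Steps 1-3: fill NaNs, melt the date columns into rows, standardize dates."""
    wide = wide.copy()
    wide["FIPS"] = wide["FIPS"].fillna(0)
    wide["Admin2"] = wide["Admin2"].fillna("None")
    long = wide.melt(id_vars=ID_COLS + list(extra_ids),
                     var_name="date", value_name=value_name)
    # Assumption: the wide tables use headers such as "1/22/20".
    parsed = pd.to_datetime(long["date"], format="%m/%d/%y")
    long["date"] = parsed.dt.strftime("%Y-%m-%d")
    return long


def build_dataset(confirmed, deaths):
    c = melt_table(confirmed, "value_0")
    d = melt_table(deaths, "value_1", extra_ids=["Population"])
    # Step 4 (assumed join keys): one row per series and date.
    out = c.merge(d[["UID", "date", "value_1", "Population"]], on=["UID", "date"])
    # Steps 5-6: rename the id and the ten covariates.
    covariates = ID_COLS[1:] + ["Population"]
    names = {"UID": "id_series"}
    names.update({col: f"covariate_{i}" for i, col in enumerate(covariates)})
    out = out.rename(columns=names)
    # Step 7: counter that increases with the date inside each series (assumed to start at 0).
    out = out.sort_values(["id_series", "date"]).reset_index(drop=True)
    out["time_step"] = out.groupby("id_series").cumcount()
    # Step 8: cast the columns to the dtypes listed in the description.
    dtypes = {"value_0": "int64", "value_1": "int64", "covariate_9": "int64",
              "id_series": "category"}
    dtypes.update({f"covariate_{i}": "category" for i in (0, 1, 2, 4, 5, 8)})
    dtypes.update({f"covariate_{i}": "float64" for i in (3, 6, 7)})
    return out.astype(dtypes)


# Toy input (made-up values): two series, two dates.
base = pd.DataFrame({
    "UID": [1, 2], "iso2": ["US", "US"], "iso3": ["USA", "USA"],
    "code3": [840, 840], "FIPS": [1001.0, None], "Admin2": ["Aaa", None],
    "Province_State": ["S", "S"], "Lat": [1.0, 2.0], "Long_": [-1.0, -2.0],
    "Combined_Key": ["Aaa, S, US", "S, US"],
})
confirmed = base.assign(**{"1/22/20": [0, 1], "1/23/20": [2, 3]})
deaths = base.assign(Population=[100, 200], **{"1/22/20": [0, 0], "1/23/20": [1, 0]})
out = build_dataset(confirmed, deaths)
```

On the toy input, out has four rows (two series times two dates) and the 15 columns named in the description.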

15 features

id_series (nominal): 3342 unique values, 0 missing
covariate_0 (nominal): 6 unique values, 0 missing
covariate_1 (nominal): 6 unique values, 0 missing
covariate_2 (nominal): 6 unique values, 0 missing
covariate_3 (numeric): 3333 unique values, 0 missing
covariate_4 (nominal): 1981 unique values, 0 missing
covariate_5 (nominal): 58 unique values, 0 missing
covariate_6 (numeric): 3228 unique values, 0 missing
covariate_7 (numeric): 3228 unique values, 0 missing
covariate_8 (nominal): 3342 unique values, 0 missing
covariate_9 (numeric): 3171 unique values, 0 missing
date (string): 1143 unique values, 0 missing
value_1 (numeric): 10112 unique values, 0 missing
value_0 (numeric): 135413 unique values, 0 missing
time_step (numeric): 1143 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 3819906
Number of attributes (columns) of the dataset: 15
Number of distinct values of the target attribute (if it is nominal): (no value)
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 7
Number of nominal attributes: 7
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: (no value)
Percentage of numeric attributes: 46.67
Number of attributes divided by the number of instances: 0
Percentage of nominal attributes: 46.67
Percentage of instances belonging to the most frequent class: (no value)
Number of instances belonging to the most frequent class: (no value)
Percentage of instances belonging to the least frequent class: (no value)
Number of instances belonging to the least frequent class: (no value)
Number of binary attributes: 0
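As a quick sanity check on the counts reported above, the number of rows equals the number of series times the number of dates, which is consistent with a balanced panel in which every series has one row per date. The sketch below verifies that arithmetic and shows one way to test the balance property on (id_series, time_step) pairs; the helper name is_balanced_panel is hypothetical, and the balance of the real data is an inference from the counts, not something the page states.

```python
from collections import Counter

n_series = 3342      # unique id_series values
n_dates = 1143       # unique date / time_step values
n_rows = 3_819_906   # reported number of instances

expected_rows = n_series * n_dates


def is_balanced_panel(pairs):
    """Return True if every series has every time step exactly once.

    pairs: iterable of (id_series, time_step) tuples.
    """
    pairs = list(pairs)
    steps = {step for _, step in pairs}
    per_series = Counter(series for series, _ in pairs)
    no_duplicates = len(set(pairs)) == len(pairs)
    return no_duplicates and all(n == len(steps) for n in per_series.values())


# Toy check (made-up ids): two series with three time steps each.
toy = [(s, t) for s in ("a", "b") for t in range(3)]
toy_is_balanced = is_balanced_panel(toy)
```

Applied to the full dataset, the same helper could be fed the id_series and time_step columns zipped together.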

0 tasks
