COVID-19-Mexico-Clean--Order-by-States

active ARFF CC0: Public Domain Visibility: public Uploaded 23-03-2022 by Onur Yildirim
  • Computer Systems Machine Learning

Context

The data published by Mexico's General Directorate of Epidemiology contain extensive information on the current pandemic situation. However, they include many features that may not be very useful for predictive analysis. For this reason, I cleaned and reformatted the original data to produce a dataset that groups confirmed, dead, recovered and active cases by state, municipality and date. This is useful if you want to build geographically specific models.

Content

The dataset contains the COVID-19 case columns (confirmed, dead, recovered and active), counted by state and municipality. For example:

State             Municipality  Date        Deaths  Confirmed  Recovered  Active
Ciudad de Mexico  Iztapalapa    2020-07-18  1       42         0          41
Ciudad de Mexico  Iztapalapa    2020-07-19  0       14         0          14
Ciudad de Mexico  Iztapalapa    2020-07-20  0       41         0          41

Would you like to see the data cleaning notebook? You can check it on my GitHub.

Classification criteria

Recovered cases: if the patient is not dead and more than 15 days have passed, the patient is considered recovered.
Active cases: if the patient is neither recovered nor dead, the case is active.

Time lapse

The first documented case is dated 2020-01-13. The dataset will be updated every day as new cases are added.

Acknowledgements

The data for this project come from the official website of the government of México; the author is the Dirección General de Epidemiología:

Coronavirus data: https://www.gob.mx/salud/documentos/datos-abiertos-152127
Data dictionary: https://www.gob.mx/salud/documentos/datos-abiertos-152127

Differences in results

Compared with the official results published at https://coronavirus.gob.mx/datos/, the main difference between the official data and this dataset is in the recovered cases. The Mexican government only counts outpatient cases as recovered, whereas this dataset counts both outpatient and inpatient cases.

The second difference is that some rows containing nonsensical information (I think this was a data collection error by the institution) were removed.
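The classification criteria above can be sketched as a small function. This is only an illustration, not the author's cleaning code: the description does not say which date the 15 days are counted from, so the sketch assumes the symptom onset date, and the names `classify_case`, `onset`, `death` and `as_of` are hypothetical.

```python
from datetime import date
from typing import Optional

# Threshold taken from the description: "more than 15 days".
RECOVERY_DAYS = 15


def classify_case(onset: date, death: Optional[date], as_of: date) -> str:
    """Classify one confirmed case as 'dead', 'recovered' or 'active'.

    Assumption: the 15-day window starts at `onset` (symptom onset).
    The source only says "it has been more than 15 days".
    """
    # A recorded death on or before the reference date takes priority.
    if death is not None and death <= as_of:
        return "dead"
    # Not dead and strictly more than 15 days elapsed: recovered.
    if (as_of - onset).days > RECOVERY_DAYS:
        return "recovered"
    # Neither dead nor recovered: still active.
    return "active"


if __name__ == "__main__":
    print(classify_case(date(2020, 7, 1), None, date(2020, 7, 10)))
    print(classify_case(date(2020, 7, 1), None, date(2020, 7, 17)))
    print(classify_case(date(2020, 7, 1), date(2020, 7, 5), date(2020, 7, 20)))
```

Counting cases per state, municipality and date after applying such a function would give the Deaths, Recovered and Active columns; how the original notebook actually does this is not shown in the description.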

7 features

State (string): 32 unique values, 0 missing
Municipality (string): 2057 unique values, 0 missing
Date (string): 213 unique values, 0 missing
Deaths (numeric): 37 unique values, 0 missing
Confirmed (numeric): 228 unique values, 0 missing
Recovered (numeric): 213 unique values, 0 missing
Active (numeric): 94 unique values, 0 missing
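Because Date is stored as a string, a consumer will usually want to parse it and cast the count columns to integers. The sketch below does this with the standard library on the three sample rows from the description, written inline as CSV purely for illustration (the published file is ARFF). It also checks that Active equals Confirmed minus Deaths minus Recovered; that identity holds for the sample rows, but whether it holds for every row in the dataset is an assumption. `SAMPLE` and `load_rows` are hypothetical names.

```python
import csv
import io
from datetime import date

# The three sample rows from the description, as inline CSV (illustrative only).
SAMPLE = """State,Municipality,Date,Deaths,Confirmed,Recovered,Active
Ciudad de Mexico,Iztapalapa,2020-07-18,1,42,0,41
Ciudad de Mexico,Iztapalapa,2020-07-19,0,14,0,14
Ciudad de Mexico,Iztapalapa,2020-07-20,0,41,0,41
"""

COUNT_COLUMNS = ("Deaths", "Confirmed", "Recovered", "Active")


def load_rows(text: str) -> list:
    """Parse CSV text into dicts with a real date and integer counts."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {
            "State": raw["State"],
            "Municipality": raw["Municipality"],
            # Dates in the sample use ISO format (YYYY-MM-DD).
            "Date": date.fromisoformat(raw["Date"]),
        }
        for col in COUNT_COLUMNS:
            row[col] = int(raw[col])
        rows.append(row)
    return rows


rows = load_rows(SAMPLE)
# Assumed identity: Active = Confirmed - Deaths - Recovered.
consistent = [
    r["Active"] == r["Confirmed"] - r["Deaths"] - r["Recovered"] for r in rows
]

if __name__ == "__main__":
    for r, ok in zip(rows, consistent):
        print(r["Date"].isoformat(), r["Active"], ok)
```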

19 properties

Number of instances (rows) of the dataset: 92320
Number of attributes (columns) of the dataset: 7
Number of distinct values of the target attribute (if it is nominal): no value listed
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 4
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 57.14
Percentage of instances belonging to the most frequent class: no value listed
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: no value listed
Percentage of instances belonging to the least frequent class: no value listed
Number of instances belonging to the least frequent class: no value listed
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: no value listed
Percentage of missing values: 0
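Two of the derived properties above follow directly from the counts. The short check below recomputes them from the listed figures (92320 instances, 7 attributes, 4 numeric). It assumes the page rounds values for display, which would explain why the attribute-to-instance ratio appears as 0; that rounding behaviour is an assumption, not something the page states.

```python
# Counts taken from the property list.
N_INSTANCES = 92320
N_ATTRIBUTES = 7
N_NUMERIC = 4

# Percentage of numeric attributes: 4 of 7 columns.
pct_numeric = round(100 * N_NUMERIC / N_ATTRIBUTES, 2)

# Attributes divided by instances: a very small positive number.
attrs_per_instance = N_ATTRIBUTES / N_INSTANCES

if __name__ == "__main__":
    print(pct_numeric)  # 57.14
    print(attrs_per_instance)
    # Rounded to two decimals (assumed display precision) this becomes 0.0.
    print(round(attrs_per_instance, 2))
```

The remaining three attributes (State, Municipality, Date) are typed as strings, so they count as neither numeric nor nominal, which is consistent with the 0 listed for nominal attributes.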

0 tasks
