{ "data_id": "43428", "name": "Mexico-COVID-19-clinical-data", "exact_name": "Mexico-COVID-19-clinical-data", "version": 1, "version_label": "v1.0", "description": "Mexico COVID-19 clinical data \nThis dataset contains the results of real-time PCR testing for COVID-19 in Mexico as reported by the [General Directorate of Epidemiology](https:\/\/www.gob.mx\/salud\/documentos\/datos-abiertos-152127).\nThe official raw dataset is available on the Ministry of Health website: https:\/\/www.gob.mx\/salud\/documentos\/datos-abiertos-152127.\nYou might also want to download the official column descriptors and variable definitions (e.g. SEXO=1: Female; SEXO=2: Male; SEXO=99: Undisclosed) from the following [zip file](http:\/\/datosabiertos.salud.gob.mx\/gobmx\/salud\/datos_abiertos\/diccionario_datos_covid19.zip). I've kept the original levels as described in the official dataset, unless otherwise specified.\nIMPORTANT: This dataset has been maintained since the original data releases, which weren't tabular but consisted of PDF files, often with many different inconsistencies; these had to be resolved carefully and are annotated in the .R script. Later releases should be more reliable, but early on there were many things to figure out, e.g. when the official methodology for assigning the region of a case was changed to be based on residence rather than origin. 
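A minimal Python sketch of decoding the coded levels into readable labels. The SEXO mapping is the one given above; the yes/no catalogue used for comorbidity columns (1 = yes, 2 = no, 97 = not applicable, 98 = unknown, 99 = unspecified) is an assumption based on the official dictionary convention and should be checked against the zip file. The helper name decode_row is hypothetical.

```python
# SEXO levels as listed in this description.
SEXO_LEVELS = {1: 'Female', 2: 'Male', 99: 'Undisclosed'}

# ASSUMPTION: yes/no catalogue shared by comorbidity columns such as DIABETES,
# EPOC or ASMA; verify against the official data dictionary before use.
YES_NO_LEVELS = {1: 'Yes', 2: 'No', 97: 'Not applicable', 98: 'Unknown', 99: 'Unspecified'}


def decode_row(row, yes_no_columns):
    # Hypothetical helper: return a copy of one CSV record (values as strings)
    # with coded levels replaced by labels. Codes missing from a mapping are
    # kept as integers so that nothing is silently dropped.
    out = dict(row)
    code = int(row['SEXO'])
    out['SEXO'] = SEXO_LEVELS.get(code, code)
    for col in yes_no_columns:
        code = int(row[col])
        out[col] = YES_NO_LEVELS.get(code, code)
    return out


record = {'SEXO': '1', 'DIABETES': '2', 'OBESIDAD': '98'}
print(decode_row(record, ['DIABETES', 'OBESIDAD']))
# {'SEXO': 'Female', 'DIABETES': 'No', 'OBESIDAD': 'Unknown'}
```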
I've added more notes on very early data here: https:\/\/github.com\/marianarf\/covid19_mexico_data.\n\n[More official information here](https:\/\/datos.gob.mx\/busca\/dataset\/informacion-referente-a-casos-covid-19-en-mexico\/resource\/e8c7079c-dc2a-4b6e-8035-08042ed37165).\nMotivation\nI hope that this data serves as a basis for understanding the clinical symptoms that distinguish a COVID-19 positive case from other viral respiratory diseases, and that it helps expand knowledge about COVID-19 worldwide.\n\nWith more models tested, added features and fine-tuning, clinical data could be used to predict whether a patient with pending COVID-19 results will get a positive or a negative result in two scenarios:\n\nWhile lab results are being processed, there is a window during which it is uncertain whether a result will come back positive or negative (this is merely didactic, as new reports will verify the prediction as soon as the laboratory data for the missing cases is reported).\nMore importantly, it could help make predictions for similar symptoms, e.g. from a survey or an app that collects similar data (ideally containing most of the parameters that can be assessed without variables only available after hospitalization, such as the person's age, which is readily available).\n\nThe lab result comes from an RT-PCR test and is stored in RESULTADO, where the original data encodes 1 = POSITIVE and 2 = NEGATIVE.\nSource\nThe data was gathered using a \"sentinel model\" that samples 10% of the patients presenting with a viral respiratory diagnosis for COVID-19 testing, and consists of data reported by 475 viral respiratory disease monitoring units (hospitals), named USMER (Unidades Monitoras de Enfermedad Respiratoria Viral), across the entire health sector throughout the country (IMSS, ISSSTE, SEDENA, SEMAR, and others).\nPreprocess\nData is first processed with [this .R script](https:\/\/github.com\/marianarf\/covid19_mexico_analysis\/blob\/master\/notebooks\/preprocess.R). 
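For the two prediction scenarios described above, a minimal standard-library sketch of turning RESULTADO into a binary target (1 for POSITIVE, 0 for NEGATIVE). Treating any other code as a still-pending result, and the HOSPITAL_ONLY set of columns dropped for the survey/app scenario, are illustrative assumptions rather than part of the dataset; the helper name and sample records are made up.

```python
# RESULTADO encoding from this description: 1 = POSITIVE, 2 = NEGATIVE.
# Mapped to a binary target: 1 for positive, 0 for negative.
LABELS = {1: 1, 2: 0}

# HYPOTHETICAL choice of columns that are only known after hospitalization;
# they are dropped for the survey/app scenario.
HOSPITAL_ONLY = {'TIPO_PACIENTE', 'INTUBADO', 'UCI', 'FECHA_DEF'}


def split_labelled(rows):
    # Return (labelled, pending): labelled is a list of (features, label) pairs,
    # pending holds records whose RESULTADO is not 1 or 2 (ASSUMED still pending).
    labelled, pending = [], []
    for row in rows:
        label = LABELS.get(int(row['RESULTADO']))
        if label is None:
            pending.append(row)
            continue
        features = {k: v for k, v in row.items()
                    if k != 'RESULTADO' and k not in HOSPITAL_ONLY}
        labelled.append((features, label))
    return labelled, pending


rows = [
    {'RESULTADO': '1', 'EDAD': '40', 'UCI': '2'},
    {'RESULTADO': '2', 'EDAD': '31', 'UCI': '97'},
    {'RESULTADO': '3', 'EDAD': '58', 'UCI': '99'},
]
labelled, pending = split_labelled(rows)
print(labelled)      # [({'EDAD': '40'}, 1), ({'EDAD': '31'}, 0)]
print(len(pending))  # 1
```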
The file containing the processed data will be updated daily until. Important: Since the data is updated on GitHub, assume the data uploaded here isn't the latest version; instead, load the data directly from the CSV [in this GitHub repository](https:\/\/raw.githubusercontent.com\/marianarf\/covid19_mexico_analysis\/master\/mexico_covid19.csv).\n\nThe data aggregates official daily reports of patients admitted to COVID-19 designated units.\nNew cases are usually appended at the end of the file, but each case also contains a unique (official) identifier, 'ID_REGISTRO', as well as a new unique reference, 'id', used to remove duplicates.\nI fixed a specific change in the reporting methodology: the patient record used to be assigned by ENTIDAD_UM (the region of the medical unit) but is now assigned by ENTIDAD_RES (the patient's region of residence).\n\nNote: I have preserved the original structure (column names and factors) as closely as possible to the official data, so that code is reproducible when cross-referenced with the official sources.\nAdded features\nIn addition to the original reported features, I've included missing regional names and a field 'DELAY', which corresponds to the lag in processing lab results (since new data contains records from the previous day, this makes it possible to keep track of this lag).\nAdditional info\nAccording to the Ministry of Health, preliminary data is subject to validation by the General Directorate of Epidemiology. Also note that the information corresponds only to the data obtained from the epidemiological study of a suspected case of viral respiratory disease at the time it is identified in the medical units of the Health Sector. Depending on the clinical diagnosis at admission, the patient is classified as an outpatient or a hospitalized patient. 
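Because new records are appended at the end of the file and each case carries the ID_REGISTRO identifier, duplicates can be collapsed before counting cases by ENTIDAD_RES. A minimal standard-library sketch follows; keeping the last occurrence of each ID_REGISTRO as the most recent record is an assumption, and the helper names and sample rows are hypothetical. In practice the rows would come from csv.DictReader over the CSV downloaded from the GitHub repository.

```python
import csv
from collections import Counter


def dedupe_by_case(rows, key='ID_REGISTRO'):
    # Keep one record per case. ASSUMPTION: since new records are appended,
    # the last occurrence of a key is treated as the most recent one.
    # Output order follows the first time each key was seen.
    latest = {}
    for row in rows:
        latest[row[key]] = row
    return list(latest.values())


def positives_by_residence(rows):
    # Count positive cases (RESULTADO == '1') per state of residence,
    # following the ENTIDAD_RES assignment described above.
    return Counter(row['ENTIDAD_RES'] for row in rows if row['RESULTADO'] == '1')


# csv.DictReader accepts any iterable of lines, e.g. an open file handle.
sample = [
    'id,ID_REGISTRO,ENTIDAD_RES,RESULTADO',
    '1,a1,9,2',
    '2,b2,15,1',
    '3,a1,9,1',
]
cases = dedupe_by_case(csv.DictReader(sample))
print([(r['ID_REGISTRO'], r['RESULTADO']) for r in cases])  # [('a1', '1'), ('b2', '1')]
print(dict(positives_by_residence(cases)))  # {'9': 1, '15': 1}
```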
The database does not include the patient's progress during their stay in the medical units, except for discharge updates from the hospital epidemiological surveillance units, or from health jurisdictions in the case of deaths.", "format": "arff", "uploader": "Onur Yildirim", "uploader_id": 30126, "visibility": "public", "creator": null, "contributor": null, "date": "2022-03-23 13:19:12", "update_comment": null, "last_update": "2022-03-23 13:19:12", "licence": "CC0: Public Domain", "status": "active", "error_message": null, "url": "https:\/\/www.openml.org\/data\/download\/22102253\/dataset", "default_target_attribute": null, "row_id_attribute": null, "ignore_attribute": null, "runs": 0, "suggest": { "input": [ "Mexico-COVID-19-clinical-data", "Mexico COVID-19 clinical data This dataset contains the results of real-time PCR testing for COVID-19 in Mexico as reported by the [General Directorate of Epidemiology](https:\/\/www.gob.mx\/salud\/documentos\/datos-abiertos-152127). The official, raw dataset is available in the Official Secretary of Epidemiology website: https:\/\/www.gob.mx\/salud\/documentos\/datos-abiertos-152127. You might also want to download the official column descriptors and the variable definitions - e.g. 
SEXO=1 - Female; SEXO= " ], "weight": 5 }, "qualities": { "NumberOfInstances": 263007, "NumberOfFeatures": 41, "NumberOfClasses": null, "NumberOfMissingValues": 6, "NumberOfInstancesWithMissingValues": 6, "NumberOfNumericFeatures": 31, "NumberOfSymbolicFeatures": 0, "Dimensionality": 0.00015588938697449117, "PercentageOfNumericFeatures": 75.60975609756098, "MajorityClassPercentage": null, "PercentageOfSymbolicFeatures": 0, "MajorityClassSize": null, "MinorityClassPercentage": null, "MinorityClassSize": null, "NumberOfBinaryFeatures": 0, "PercentageOfBinaryFeatures": 0, "PercentageOfInstancesWithMissingValues": 0.0022813081020657245, "AutoCorrelation": null, "PercentageOfMissingValues": 5.564166102599328e-5 }, "tags": [ { "uploader": "38960", "tag": "Computer Systems" }, { "uploader": "38960", "tag": "Machine Learning" } ], "features": [ { "name": "id", "index": "0", "type": "numeric", "distinct": "263007", "missing": "0", "min": "1", "max": "7277125", "mean": "2946176", "stdev": "2212368" }, { "name": "FECHA_ARCHIVO", "index": "1", "type": "string", "distinct": "53", "missing": "0" }, { "name": "ID_REGISTRO", "index": "2", "type": "string", "distinct": "263007", "missing": "0" }, { "name": "ENTIDAD_UM", "index": "3", "type": "numeric", "distinct": "32", "missing": "0", "min": "1", "max": "32", "mean": "15", "stdev": "8" }, { "name": "ENTIDAD_RES", "index": "4", "type": "numeric", "distinct": "32", "missing": "0", "min": "1", "max": "32", "mean": "15", "stdev": "8" }, { "name": "RESULTADO", "index": "5", "type": "numeric", "distinct": "2", "missing": "0", "min": "1", "max": "2", "mean": "2", "stdev": "0" }, { "name": "DELAY", "index": "6", "type": "numeric", "distinct": "1", "missing": "0", "min": "0", "max": "0", "mean": "0", "stdev": "0" }, { "name": "ENTIDAD_REGISTRO", "index": "7", "type": "numeric", "distinct": "32", "missing": "0", "min": "1", "max": "32", "mean": "15", "stdev": "8" }, { "name": "ENTIDAD", "index": "8", "type": "string", "distinct": "32", 
"missing": "0" }, { "name": "ABR_ENT", "index": "9", "type": "string", "distinct": "32", "missing": "0" }, { "name": "FECHA_ACTUALIZACION", "index": "10", "type": "string", "distinct": "46", "missing": "0" }, { "name": "ORIGEN", "index": "11", "type": "numeric", "distinct": "2", "missing": "0", "min": "1", "max": "2", "mean": "2", "stdev": "0" }, { "name": "SECTOR", "index": "12", "type": "numeric", "distinct": "14", "missing": "0", "min": "1", "max": "99", "mean": "10", "stdev": "7" }, { "name": "SEXO", "index": "13", "type": "numeric", "distinct": "2", "missing": "0", "min": "1", "max": "2", "mean": "2", "stdev": "0" }, { "name": "ENTIDAD_NAC", "index": "14", "type": "numeric", "distinct": "33", "missing": "0", "min": "1", "max": "99", "mean": "20", "stdev": "19" }, { "name": "MUNICIPIO_RES", "index": "15", "type": "numeric", "distinct": "359", "missing": "6", "min": "1", "max": "999", "mean": "36", "stdev": "48" }, { "name": "TIPO_PACIENTE", "index": "16", "type": "numeric", "distinct": "2", "missing": "0", "min": "1", "max": "2", "mean": "1", "stdev": "0" }, { "name": "FECHA_INGRESO", "index": "17", "type": "string", "distinct": "155", "missing": "0" }, { "name": "FECHA_SINTOMAS", "index": "18", "type": "string", "distinct": "154", "missing": "0" }, { "name": "FECHA_DEF", "index": "19", "type": "string", "distinct": "88", "missing": "0" }, { "name": "INTUBADO", "index": "20", "type": "numeric", "distinct": "4", "missing": "0", "min": "1", "max": "99", "mean": "75", "stdev": "40" }, { "name": "NEUMONIA", "index": "21", "type": "numeric", "distinct": "3", "missing": "0", "min": "1", "max": "99", "mean": "2", "stdev": "1" }, { "name": "EDAD", "index": "22", "type": "numeric", "distinct": "117", "missing": "0", "min": "0", "max": "120", "mean": "43", "stdev": "17" }, { "name": "NACIONALIDAD", "index": "23", "type": "numeric", "distinct": "2", "missing": "0", "min": "1", "max": "2", "mean": "1", "stdev": "0" }, { "name": "EMBARAZO", "index": "24", "type": "numeric", 
"distinct": "4", "missing": "0", "min": "1", "max": "98", "mean": "51", "stdev": "47" }, { "name": "HABLA_LENGUA_INDIG", "index": "25", "type": "numeric", "distinct": "3", "missing": "0", "min": "1", "max": "99", "mean": "5", "stdev": "16" }, { "name": "DIABETES", "index": "26", "type": "numeric", "distinct": "3", "missing": "0", "min": "1", "max": "98", "mean": "2", "stdev": "6" }, { "name": "EPOC", "index": "27", "type": "numeric", "distinct": "3", "missing": "0", "min": "1", "max": "98", "mean": "2", "stdev": "6" }, { "name": "ASMA", "index": "28", "type": "numeric", "distinct": "3", "missing": "0", "min": "1", "max": "98", "mean": "2", "stdev": "6" }, { "name": "INMUSUPR", "index": "29", "type": "numeric", "distinct": "3", "missing": "0", "min": "1", "max": "98", "mean": "2", "stdev": "6" }, { "name": "HIPERTENSION", "index": "30", "type": "numeric", "distinct": "3", "missing": "0", "min": "1", "max": "98", "mean": "2", "stdev": "6" }, { "name": "OTRA_COM", "index": "31", "type": "numeric", "distinct": "3", "missing": "0", "min": "1", "max": "98", "mean": "2", "stdev": "7" }, { "name": "CARDIOVASCULAR", "index": "32", "type": "numeric", "distinct": "3", "missing": "0", "min": "1", "max": "98", "mean": "2", "stdev": "6" }, { "name": "OBESIDAD", "index": "33", "type": "numeric", "distinct": "3", "missing": "0", "min": "1", "max": "98", "mean": "2", "stdev": "6" }, { "name": "RENAL_CRONICA", "index": "34", "type": "numeric", "distinct": "3", "missing": "0", "min": "1", "max": "98", "mean": "2", "stdev": "6" }, { "name": "TABAQUISMO", "index": "35", "type": "numeric", "distinct": "3", "missing": "0", "min": "1", "max": "98", "mean": "2", "stdev": "6" }, { "name": "OTRO_CASO", "index": "36", "type": "numeric", "distinct": "3", "missing": "0", "min": "1", "max": "99", "mean": "32", "stdev": "45" }, { "name": "MIGRANTE", "index": "37", "type": "numeric", "distinct": "3", "missing": "0", "min": "1", "max": "99", "mean": "99", "stdev": "6" }, { "name": 
"PAIS_NACIONALIDAD", "index": "38", "type": "string", "distinct": "78", "missing": "0" }, { "name": "PAIS_ORIGEN", "index": "39", "type": "string", "distinct": "44", "missing": "0" }, { "name": "UCI", "index": "40", "type": "numeric", "distinct": "4", "missing": "0", "min": "1", "max": "99", "mean": "75", "stdev": "40" } ], "nr_of_issues": 0, "nr_of_downvotes": 0, "nr_of_likes": 0, "nr_of_downloads": 0, "total_downloads": 0, "reach": 0, "reuse": 0, "impact_of_reuse": 0, "reach_of_reuse": 0, "impact": 0 }