OpenML

ILINet

active ARFF Public Domain Visibility: public Uploaded 24-06-2024 by Bruno Belucci Teixeira

Outpatient Illness Surveillance weekly data.

From the original source:

-----

Outpatient Illness Surveillance - Information on patient visits to health care providers for influenza-like illness is collected through the U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet). This collaborative effort between CDC, state and local health departments, and health care providers started during the 1997-98 influenza season when approximately 250 providers were enrolled. Enrollment in the system has increased over time and there were >3,000 providers enrolled during the 2010-11 season.

The number and percent of patients presenting with ILI each week will vary by region and season due to many factors, including having different provider type mixes (children present with higher rates of ILI than adults, and therefore regions with a higher percentage of pediatric practices will have higher numbers of cases). Therefore it is not appropriate to compare the magnitude of the percent of visits due to ILI between regions and seasons.

Baseline levels are calculated both nationally and for each region. Percentages at or above the baseline level are considered to be elevated.

For more information on ILI surveillance and baselines please visit: http://www.cdc.gov/flu/weekly/overview.htm#Outpatient

-----

This data is the extraction of the "National" data from seasons 1997-98 to 2023-24.

There are 12 columns:

id_series: The id of the time series.
date: The date of the time series in the format "%Y-%m-%d".
time_step: The time step on the time series.
value_X (X from 0 to 8): The values of the time series, which will be used for the forecasting task.

Preprocessing:

1 - Dropped columns 'REGION' and 'REGION TYPE', as they have only the value 'X'.
2 - Dropped rows with 'YEAR' <= 2002 or 'YEAR' >= 2024. Before the year 2002, there is a seasonal gap every year between the weeks [21, 39]. This does not happen after 2002. Effectively, this drops 274 rows, or ~20% of the original amount. We could imagine that a model would automatically account for this gap, but we preferred to work with a clean dataset, as is already common for this dataset in other works. Besides, the data for 2024 is not yet complete.
3 - Replaced values 'X' with 0, and cast columns 'AGE 25-49', 'AGE 50-64', and 'AGE 25-64' to int.
4 - Summed columns 'AGE 25-49', 'AGE 50-64', and 'AGE 25-64' to replace the column 'AGE 25-64'.
5 - Dropped columns 'AGE 25-49' and 'AGE 50-64'. It seems that the values 'X' in the 'AGE X' columns are due to a change in how patient ages were recorded before and after the year-week 2009-40. With our preprocessing, summing all the 'AGE X' columns correctly reproduces 'ILITOTAL'.
6 - Created date column 'date' from columns 'YEAR' and 'WEEK', taking Saturday as the end of the week, in the format "%Y-%m-%d".
7 - Dropped columns 'YEAR' and 'MONTH'.
8 - Renamed columns [:-1] to 'value_X' with X from 0 to 8.
9 - Created 'id_series' with value 0. There is only one multivariate time series.
10 - Ensured that there are no missing dates and that the frequency of the time series is weekly. There were only 3 missing rows, with dates '2008-01-05', '2013-01-05' and '2019-01-05'; they were filled with the last valid values.
11 - Created 'time_step' column from 'date' and 'id_series' with increasing values from 0 to the size of the time series.
12 - Cast 'date' to str, 'time_step' to int, the 'value_X' columns with X in [0, 1] to float, the other 'value_X' columns to int, and defined 'id_series' as 'category'.
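The gap-filling and 'time_step' steps described above can be sketched with pandas roughly as follows. This is a minimal illustration, not the uploader's actual code: the helper name fill_weekly_gaps and the toy values are hypothetical, and it assumes the frame already has a 'date' column whose dates all fall on Saturdays.

```python
import pandas as pd


def fill_weekly_gaps(df: pd.DataFrame) -> pd.DataFrame:
    """Reindex to a complete Saturday-anchored weekly range and forward-fill.

    Hypothetical helper: assumes 'date' holds Saturday dates as strings.
    """
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    out = out.set_index("date").sort_index()

    # Every Saturday between the first and last observed date.
    full_index = pd.date_range(out.index.min(), out.index.max(), freq="W-SAT")

    # Missing weeks appear as NaN rows, then take the last valid values.
    # Note: integer columns become float here and would need re-casting.
    out = out.reindex(full_index).ffill()
    out.index.name = "date"
    out = out.reset_index()

    # Increasing step counter, starting at 0.
    out["time_step"] = range(len(out))
    out["date"] = out["date"].dt.strftime("%Y-%m-%d")
    return out


# Toy data (made-up values) with the week ending 2008-01-05 missing.
toy = pd.DataFrame(
    {
        "date": ["2007-12-22", "2007-12-29", "2008-01-12"],
        "value_0": [1.5, 2.0, 3.0],
    }
)
filled = fill_weekly_gaps(toy)
print(filled)
```

In the toy frame, the row for '2008-01-05' is created by the reindex and inherits the values of '2007-12-29', which mirrors the "filled with the last valid values" behaviour described in the preprocessing notes.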

12 features

id_series	nominal	1 unique values 0 missing
date	string	1095 unique values 0 missing
value_0	numeric	1094 unique values 0 missing
value_1	numeric	1093 unique values 0 missing
value_2	numeric	1023 unique values 0 missing
value_3	numeric	1034 unique values 0 missing
value_4	numeric	1045 unique values 0 missing
value_5	numeric	855 unique values 0 missing
value_6	numeric	1078 unique values 0 missing
value_7	numeric	866 unique values 0 missing
value_8	numeric	1096 unique values 0 missing
time_step	numeric	1099 unique values 0 missing
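
The column types listed above follow from the final casting step in the description. A minimal pandas sketch of that casting is given below; the helper name cast_schema and the toy values are hypothetical, and the column order is simply taken from the feature list on this page.

```python
import pandas as pd

# Column names as listed in the feature table.
VALUE_COLS = [f"value_{i}" for i in range(9)]
FLOAT_COLS = ["value_0", "value_1"]  # described as float; the rest as int


def cast_schema(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the dtypes from the description (hypothetical helper)."""
    out = df.copy()
    out["date"] = out["date"].astype(str)
    out["time_step"] = out["time_step"].astype(int)
    for col in VALUE_COLS:
        out[col] = out[col].astype(float if col in FLOAT_COLS else int)
    out["id_series"] = out["id_series"].astype("category")
    # Order the 12 columns as in the feature list.
    return out[["id_series", "date", *VALUE_COLS, "time_step"]]


# Toy frame with made-up values, only to exercise the casting.
toy = pd.DataFrame(
    {
        "id_series": [0, 0],
        "date": ["2003-01-04", "2003-01-11"],
        "time_step": [0.0, 1.0],
        **{col: [1.0, 2.0] for col in VALUE_COLS},
    }
)
typed = cast_schema(toy)
print(typed.dtypes)
```

A check like this can be useful after downloading the ARFF file, since numeric ARFF attributes are often loaded as floats regardless of how they were cast before upload.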