ILINet

active ARFF Public Domain Visibility: public Uploaded 24-06-2024 by Bruno Belucci Teixeira


Outpatient Illness Surveillance weekly data.

From original source:
-----
Outpatient Illness Surveillance - Information on patient visits to health care providers for influenza-like illness is collected through the U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet). This collaborative effort between CDC, state and local health departments, and health care providers started during the 1997-98 influenza season when approximately 250 providers were enrolled. Enrollment in the system has increased over time and there were >3,000 providers enrolled during the 2010-11 season.

The number and percent of patients presenting with ILI each week will vary by region and season due to many factors, including having different provider type mixes (children present with higher rates of ILI than adults, and therefore regions with a higher percentage of pediatric practices will have higher numbers of cases). Therefore it is not appropriate to compare the magnitude of the percent of visits due to ILI between regions and seasons.

Baseline levels are calculated both nationally and for each region. Percentages at or above the baseline level are considered to be elevated.

For more information on ILI surveillance and baselines please visit: http://www.cdc.gov/flu/weekly/overview.htm#Outpatient
-----

This dataset is the extraction of the "National" data from seasons 1997-98 to 2023-24.

There are 12 columns:
id_series: The id of the time series.
date: The date of the time series in the format "%Y-%m-%d".
time_step: The time step on the time series.
value_X (X from 0 to 8): The values of the time series, which will be used for the forecasting task.

Preprocessing:
1 - Dropped columns 'REGION' and 'REGION TYPE', as they have only the value 'X'.
2 - Dropped rows with 'YEAR' <= 2002 or 'YEAR' >= 2024. Before the year 2002, there is a seasonal gap every year between the weeks [21, 39]. This does not happen after 2002. Effectively, this drops 274 rows, or ~20% of the original amount. We could imagine that a model would automatically account for this, but we preferred to work with a clean dataset, as is already common for this dataset in other works. Besides, the data is not yet complete for 2024.
3 - Replaced values 'X' by 0, and cast columns 'AGE 25-49', 'AGE 50-64', and 'AGE 25-64' to int.
4 - Summed columns 'AGE 25-49', 'AGE 50-64', and 'AGE 25-64' to replace the column 'AGE 25-64'.
5 - Dropped columns 'AGE 25-49' and 'AGE 50-64'. It seems that the values 'X' in the 'AGE X' columns are due to a change in how patient ages were recorded before and after the year-week 2009-40. With our preprocessing, we correctly recover 'ILITOTAL' if we sum all the 'AGE X' columns.
6 - Created date column 'date' from columns 'YEAR' and 'WEEK', considering the end of week on Saturday, in the format "%Y-%m-%d".
7 - Dropped columns 'YEAR' and 'MONTH'.
8 - Renamed columns [:-1] to 'value_X' with X from 0 to 8.
9 - Created 'id_series' with value 0. There is only one multivariate time series.
10 - Ensured that there are no missing dates and that the frequency of the time series is weekly. There were only 3 missing rows, with dates '2008-01-05', '2013-01-05' and '2019-01-05'; they were filled with the last valid values.
11 - Created 'time_step' column from 'date' and 'id_series' with increasing values from 0 to the size of the time series.
12 - Cast 'date' to str, 'time_step' to int, 'value_X' with X in [0, 1] columns to float, the other 'value_X' columns to int, and defined 'id_series' as 'category'.
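Several of the preprocessing steps above (merging the age columns, building a Saturday week-ending date, filling missing weeks with the last valid values, and numbering the time steps) can be illustrated with a minimal standard-library sketch. This is not the uploader's code. The function names are hypothetical, the toy rows are made up, and the week numbering assumes the CDC MMWR convention (weeks run Sunday to Saturday, and week 1 is the week containing January 4), which the description does not state explicitly.

```python
from datetime import date, timedelta

AGE_PARTS = ("AGE 25-49", "AGE 50-64", "AGE 25-64")


def merge_age_columns(row: dict) -> dict:
    """Replace 'X' by 0, cast to int, and sum the three columns into 'AGE 25-64'."""
    total = sum(0 if row[k] == "X" else int(row[k]) for k in AGE_PARTS)
    merged = {k: v for k, v in row.items() if k not in AGE_PARTS}
    merged["AGE 25-64"] = total
    return merged


def week_ending_saturday(year: int, week: int) -> date:
    """Saturday that closes the given week (ASSUMPTION: MMWR week numbering)."""
    jan4 = date(year, 1, 4)  # January 4 always falls in week 1
    # date.weekday(): Monday=0 ... Sunday=6, so this steps back to Sunday.
    week1_sunday = jan4 - timedelta(days=(jan4.weekday() + 1) % 7)
    return week1_sunday + timedelta(weeks=week - 1, days=6)


def fill_weekly_gaps(rows: list) -> list:
    """rows: (date, values) pairs sorted by date; missing weeks copy the previous values."""
    by_date = dict(rows)
    current, last = rows[0][0], rows[-1][0]
    filled, previous = [], None
    while current <= last:
        values = by_date.get(current, previous)
        filled.append((current, values))
        previous = values
        current += timedelta(weeks=1)
    return filled


def add_time_step(rows: list) -> list:
    """Attach an increasing time_step and format the date as '%Y-%m-%d'."""
    return [(step, d.strftime("%Y-%m-%d"), values)
            for step, (d, values) in enumerate(rows)]


# Toy rows (made-up values) with the first week of 2019 deliberately left out.
toy = [
    (week_ending_saturday(2018, 51), (10, 2.5)),
    (week_ending_saturday(2018, 52), (12, 3.0)),
    (week_ending_saturday(2019, 2), (15, 3.5)),
]
result = add_time_step(fill_weekly_gaps(toy))
```

In this toy run the missing week is restored as '2019-01-05' and carries the values of the preceding week, mirroring how the three missing rows are described as being filled.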

12 features

id_series: nominal, 1 unique value, 0 missing
date: string, 1095 unique values, 0 missing
value_0: numeric, 1094 unique values, 0 missing
value_1: numeric, 1093 unique values, 0 missing
value_2: numeric, 1023 unique values, 0 missing
value_3: numeric, 1034 unique values, 0 missing
value_4: numeric, 1045 unique values, 0 missing
value_5: numeric, 855 unique values, 0 missing
value_6: numeric, 1078 unique values, 0 missing
value_7: numeric, 866 unique values, 0 missing
value_8: numeric, 1096 unique values, 0 missing
time_step: numeric, 1099 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 1099
Number of attributes (columns) of the dataset: 12
Number of distinct values of the target attribute (if it is nominal): not reported
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 10
Number of nominal attributes: 1
Number of attributes divided by the number of instances: 0.01
Percentage of numeric attributes: 83.33
Percentage of instances belonging to the most frequent class: not reported
Percentage of nominal attributes: 8.33
Number of instances belonging to the most frequent class: not reported
Percentage of instances belonging to the least frequent class: not reported
Number of instances belonging to the least frequent class: not reported
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: not reported
Percentage of missing values: 0
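
The ratio and percentage properties listed here follow directly from the counts. A small sketch of the arithmetic, assuming the values are rounded to two decimals (the variable names are illustrative only):

```python
# Counts taken from the property list.
n_instances = 1099
n_attributes = 12
n_numeric = 10
n_nominal = 1

# Attributes divided by instances.
dimensionality = round(n_attributes / n_instances, 2)

# Shares of numeric and nominal attributes, in percent.
pct_numeric = round(100 * n_numeric / n_attributes, 2)
pct_nominal = round(100 * n_nominal / n_attributes, 2)

print(dimensionality, pct_numeric, pct_nominal)
```

The numeric and nominal counts add up to 11 of the 12 attributes; the remaining one is the string-typed 'date' column.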

0 tasks
