Daily pickup data for 329 FHV (for-hire vehicle) companies from January 2015 through August 2015.
From original source:
-----
There is also a file other-FHV-data-jan-aug-2015.csv containing daily pickup data for 329 FHV companies from January 2015 through August 2015.
-----
There are 5 columns:
id_series: The id of the time series.
date: The date of the observation in the format "%Y-%m-%d".
time_step: The position of the observation within its time series.
value_X (X from 0 to 1): The values of the time series, which will be used for the forecasting task. value_0 is the number of trips and value_1 is the number of vehicles.
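A minimal sketch of loading data in this layout with pandas. The sample rows are invented for illustration, and the dtype choices are an assumption based on the column descriptions, not a loader shipped with the dataset.

```python
import io

import pandas as pd

# Invented sample rows in the five-column layout described above.
SAMPLE_CSV = """id_series,date,time_step,value_0,value_1
B00001,2015-01-01,0,26.0,17.0
B00001,2015-01-02,1,,
B00001,2015-01-03,2,29.0,
B00002,2015-01-01,0,45.0,24.0
B00002,2015-01-02,1,40.0,22.0
"""

df = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    # Floats allow NaN for missing values; ids are stored as a category.
    dtype={"id_series": "category", "value_0": float, "value_1": float},
    parse_dates=["date"],
)

# Sanity check: within each series, time_step should grow by exactly 1.
steps_are_consecutive = bool(
    df.groupby("id_series", observed=True)["time_step"]
    .apply(lambda s: s.diff().dropna().eq(1).all())
    .all()
)

print(df.dtypes)
print("consecutive time steps:", steps_are_consecutive)
```

To read an actual file, the io.StringIO(...) argument would be replaced by the path to the processed CSV.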
Preprocessing:
1 - Renamed columns: 'Number of Trips' to 'value_0', 'Number of Vehicles' to 'value_1', 'Base Number' to 'id_series', 'Pick Up Date' to 'date'.
2 - Dropped column 'Base Name', which contains the same information as id_series.
3 - Trimmed whitespace and capitalized the column 'id_series'.
4 - Standardized the date to the format %Y-%m-%d.
5 - Replaced ' - ' in column 'value_1' with NaNs.
6 - Added missing dates to time series to have evenly spaced values with daily frequency.
Some time series had missing dates, either entire months or individual days between two values. The values for these dates were set to NaN.
7 - Created column 'time_step', whose values increase along each time series.
8 - Cast 'value_X' columns to float (to accommodate NaNs, as all the other values are int) and 'id_series' to 'category'.
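The steps above could be sketched in pandas roughly as follows. This is an illustration, not the original preprocessing script. The sample rows, the raw date format ("%m/%d/%Y"), reading "capitalize" as upper-casing, filling dates only between each series' first and last observation, and starting time_step at 0 are all assumptions.

```python
import numpy as np
import pandas as pd

# Invented rows using the raw column names listed in step 1.
raw = pd.DataFrame({
    "Base Number": [" b00001 ", " b00001 ", "b00002"],
    "Base Name": ["EXAMPLE A", "EXAMPLE A", "EXAMPLE B"],
    "Pick Up Date": ["01/01/2015", "01/03/2015", "01/01/2015"],
    "Number of Trips": [26, 29, 45],
    "Number of Vehicles": ["17", " - ", "24"],
})


def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    # Steps 1-2: rename columns and drop the redundant 'Base Name'.
    df = raw.rename(columns={
        "Number of Trips": "value_0",
        "Number of Vehicles": "value_1",
        "Base Number": "id_series",
        "Pick Up Date": "date",
    }).drop(columns=["Base Name"])

    # Step 3: trim whitespace and upper-case the ids (assumed meaning).
    df["id_series"] = df["id_series"].str.strip().str.upper()

    # Step 4: parse dates (raw format is an assumption).
    df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

    # Step 5: treat the ' - ' placeholder as missing.
    df["value_1"] = df["value_1"].replace(" - ", np.nan)

    # Steps 6-7: reindex each series onto a daily grid, then number the rows.
    parts = []
    for series_id, group in df.groupby("id_series", sort=True):
        days = pd.date_range(group["date"].min(), group["date"].max(), freq="D")
        group = (
            group.set_index("date").reindex(days).rename_axis("date").reset_index()
        )
        group["id_series"] = series_id
        group["time_step"] = np.arange(len(group))
        parts.append(group)
    out = pd.concat(parts, ignore_index=True)

    # Step 4 (output format) and step 8: final formatting and dtypes.
    out["date"] = out["date"].dt.strftime("%Y-%m-%d")
    out["value_0"] = pd.to_numeric(out["value_0"]).astype(float)
    out["value_1"] = pd.to_numeric(out["value_1"]).astype(float)
    out["id_series"] = out["id_series"].astype("category")
    return out[["id_series", "date", "time_step", "value_0", "value_1"]]


processed = preprocess(raw)
print(processed)
```

In this sample, series B00001 gains a row for 2015-01-02 with NaN in both value columns, which is the behaviour described in step 6.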