Pedestrian-Counting-System-Melbourne-preprocessed

Status: active; Format: ARFF; License: Creative Commons Attribution 4.0 International; Visibility: public; Uploaded 25-06-2024 by Bruno Belucci Teixeira
Pedestrian Counting System published by the City of Melbourne, preprocessed data.

From the original source:
-----
This dataset contains hourly pedestrian counts since 2009 from pedestrian sensor devices located across the city. The data is updated on a monthly basis and can be used to determine variations in pedestrian activity throughout the day.

The sensor_id column can be used to merge the data with the Pedestrian Counting System - Sensor Locations dataset, which details the location, status and directional readings of sensors. Any changes to sensor locations are important to consider when analysing and interpreting pedestrian counts over time.

Important notes about this dataset:
- Where no pedestrians have passed underneath a sensor during an hour, a count of zero will be shown for the sensor for that hour.
- Directional readings are not included, though we hope to make this available later in the year. Directional readings are provided in the Pedestrian Counting System - Past Hour (counts per minute) dataset.

The Pedestrian Counting System helps to understand how people use different city locations at different times of day to better inform decision-making and plan for the future. A representation of pedestrian volume which compares each location on any given day and time can be found in our Online Visualisation.
-----

We acquired the data by scraping the website 'https://www.pedestrian.melbourne.vic.gov.au/#date=1-6-2010&time=8', as the data available from https://data.melbourne.vic.gov.au/explore/dataset/pedestrian-counting-system-monthly-counts-per-hour/information/ seems to be incomplete.

There are 54 columns:
- id_series: The id of the time series.
- date: The date of the time series in the format "%Y-%m-%d %H:%M:%S".
- time_step: The time step on the time series.
- value_X (X from 0 to 50): The values of the time series, which will be used for the forecasting task.
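As a rough illustration of the column layout described above, the sketch below turns a table with these 54 columns into an hourly multivariate series indexed by timestamp. It assumes the data has already been loaded into a pandas DataFrame; the function name to_wide_series and the toy rows are hypothetical and only stand in for the real file.

```python
import pandas as pd

# Date format stated in the column description.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_wide_series(df: pd.DataFrame) -> pd.DataFrame:
    """Return the value_X columns, ordered by X, indexed by parsed date."""
    # Sort value columns numerically so value_10 does not come before value_2.
    value_cols = sorted(
        (c for c in df.columns if c.startswith("value_")),
        key=lambda c: int(c.split("_")[1]),
    )
    # time_step gives the position of each row in the series.
    ordered = df.sort_values("time_step")
    wide = ordered[value_cols].astype(float)
    wide.index = pd.DatetimeIndex(
        pd.to_datetime(ordered["date"], format=DATE_FORMAT), name="date"
    )
    return wide


# Toy rows (made up) with the same column names as the dataset.
toy = pd.DataFrame({
    "id_series": [0, 0, 0],
    "date": ["2019-01-01 01:00:00", "2019-01-01 00:00:00", "2019-01-01 02:00:00"],
    "time_step": [1, 0, 2],
    "value_1": [10.0, 11.0, None],
    "value_0": [3.0, 4.0, 5.0],
})
wide = to_wide_series(toy)
```

Missing counts stay as NaN in the result, so any forecasting code built on top still has to decide how to handle them.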
Preprocessing:
1 - Standardized the date to the format %Y-%m-%d %H:%M:%S.
2 - Replaced the values 'na' and 'undefined' with NaNs and cast the value columns to float. Even though the values are integers, we use float to accommodate NaN values.
3 - Coalesced columns that we judged to be the same into a single column. The pairs of columns judged to be the same were:
    ('Bourke St-Russel St (West)', 'Bourke St-Russell St (West)'),
    ('Flinders La - Swanston St (West) Temporary', 'Flinders La-Swanston St (West) Temporary'),
    ('Flinders Ln - Degraves St (Crossing)', 'Flinders Ln -Degraves St (Crossing)'),
    ('Flinders Ln - Degraves St (North)', 'Flinders Ln -Degraves St (North)'),
    ('Flinders Ln - Degraves St (South)', 'Flinders Ln -Degraves St (South)'),
    ('Flinders St - ACMI', 'Flinders St- ACMI'),
    ('Flinders St Station Underpass', 'Flinders Street Station Underpass'),
    ('Flinders St-Spark La', 'Flinders St-Spark Lane'),
    ('Lincoln-Swanston (W)', 'Lincoln-Swanston (West)'),
    ('Macaulay Rd - Bellair St', 'Macaulay Rd-Bellair St'),
    ('QV Market-Elizabeth (West)', 'QV Market-Elizabeth St (West)'),
    ('Queen St (West)', 'Queen Street (West)'),
    ('Spring St - Flinders St (West)', 'Spring St- Flinders St (West)'),
    ('State Library - New', 'State Library- New'),
    ('St Kilda Rd-Alexandra Gardens', 'St. Kilda-Alexandra Gardens')
4 - Replaced negative values with NaNs.
5 - Dropped columns whose last non-NaN value falls before 2024.
6 - Dropped columns whose first non-NaN value falls after 2018.
7 - Selected the data between the years 2019 and 2023.
8 - Renamed the value columns to 'value_X', where X is between 0 and 50.
9 - Created the column 'id_series' with value 0, since there is only one (multivariate) series, and the column 'time_step' with increasing values of the time step.
10 - Cast 'date' to str, 'time_step' to int and 'value_X' to float, and defined 'id_series' as 'category'.
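The steps above can be sketched in pandas roughly as follows. This is not the uploader's script, only a minimal reconstruction under assumptions: the raw table is assumed to have a 'date' column plus one column per sensor, pd.to_numeric with errors="coerce" stands in for the explicit 'na'/'undefined' replacement, steps 5 and 6 are read as "keep a sensor only if it has data before 2019 and data in 2024 or later", and the names preprocess, same_sensor_pairs and the toy sensors are hypothetical.

```python
import numpy as np
import pandas as pd


def preprocess(raw: pd.DataFrame, same_sensor_pairs: list) -> pd.DataFrame:
    # Steps 1-2: parse dates; non-numeric entries become NaN, values become float.
    values = raw.drop(columns="date").apply(pd.to_numeric, errors="coerce").astype(float)
    values.index = pd.DatetimeIndex(pd.to_datetime(raw["date"]), name="date")
    values = values.sort_index()

    # Step 3: merge columns judged to be the same sensor, keeping the first name.
    for keep, duplicate in same_sensor_pairs:
        if keep in values and duplicate in values:
            values[keep] = values[keep].combine_first(values[duplicate])
            values = values.drop(columns=duplicate)

    # Step 4: negative counts are treated as missing.
    values = values.mask(values < 0)

    # Steps 5-6 (assumed reading): keep sensors that start before 2019
    # and still report in 2024 or later.
    start, end = pd.Timestamp("2019-01-01"), pd.Timestamp("2024-01-01")
    kept = [
        col for col in values.columns
        if values[col].notna().any()
        and values[col].first_valid_index() < start
        and values[col].last_valid_index() >= end
    ]

    # Step 7: restrict to 2019-2023.
    values = values.loc[(values.index >= start) & (values.index < end), kept]

    # Step 8: anonymous column names value_0, value_1, ...
    values.columns = [f"value_{i}" for i in range(len(kept))]

    # Steps 9-10: bookkeeping columns and final dtypes.
    out = values.reset_index()
    out["date"] = out["date"].dt.strftime("%Y-%m-%d %H:%M:%S")
    out.insert(0, "id_series", pd.Categorical([0] * len(out)))
    out["time_step"] = np.arange(len(out), dtype=int)
    return out


# Toy input (made up): four timestamps around the 2019 and 2024 boundaries.
toy = pd.DataFrame({
    "date": ["2018-12-31 23:00:00", "2019-01-01 00:00:00",
             "2023-12-31 23:00:00", "2024-01-01 00:00:00"],
    "A St": [5, "na", 7, 8],
    "A Street": [np.nan, 6, np.nan, np.nan],   # duplicate spelling of "A St"
    "B": ["undefined", -3, 4, np.nan],         # starts too late, dropped
    "C": [1, 2, 3, np.nan],                    # ends before 2024, dropped
    "D": [1, -1, 2, 3],                        # kept, negative becomes NaN
})
out = preprocess(toy, [("A St", "A Street")])
```

In the toy run only "A St" (merged with "A Street") and "D" survive, and only the two timestamps inside 2019-2023 remain.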

54 features

id_series (nominal): 1 unique value, 0 missing
date (string): 43824 unique values, 0 missing
value_0 (numeric): 2887 unique values, 27501 missing
value_1 (numeric): 1999 unique values, 2955 missing
value_2 (numeric): 1632 unique values, 1563 missing
value_3 (numeric): 3236 unique values, 1818 missing
value_4 (numeric): 4303 unique values, 167 missing
value_5 (numeric): 3928 unique values, 461 missing
value_6 (numeric): 3168 unique values, 397 missing
value_7 (numeric): 3208 unique values, 1091 missing
value_8 (numeric): 923 unique values, 2260 missing
value_9 (numeric): 1272 unique values, 485 missing
value_10 (numeric): 560 unique values, 3786 missing
value_11 (numeric): 841 unique values, 2061 missing
value_12 (numeric): 3844 unique values, 6 missing
value_13 (numeric): 3839 unique values, 204 missing
value_14 (numeric): 1650 unique values, 2177 missing
value_15 (numeric): 1752 unique values, 10334 missing
value_16 (numeric): 2003 unique values, 302 missing
value_17 (numeric): 4046 unique values, 14195 missing
value_18 (numeric): 3797 unique values, 693 missing
value_19 (numeric): 1606 unique values, 801 missing
value_20 (numeric): 1816 unique values, 997 missing
value_21 (numeric): 1471 unique values, 453 missing
value_22 (numeric): 607 unique values, 2101 missing
value_23 (numeric): 1492 unique values, 117 missing
value_24 (numeric): 1082 unique values, 203 missing
value_25 (numeric): 1199 unique values, 4358 missing
value_26 (numeric): 1258 unique values, 1165 missing
value_27 (numeric): 1016 unique values, 8047 missing
value_28 (numeric): 414 unique values, 3036 missing
value_29 (numeric): 2370 unique values, 1143 missing
value_30 (numeric): 2969 unique values, 8306 missing
value_31 (numeric): 3260 unique values, 294 missing
value_32 (numeric): 2192 unique values, 3009 missing
value_33 (numeric): 4260 unique values, 2127 missing
value_34 (numeric): 1169 unique values, 679 missing
value_35 (numeric): 703 unique values, 2365 missing
value_36 (numeric): 1195 unique values, 823 missing
value_37 (numeric): 2473 unique values, 5860 missing
value_38 (numeric): 4231 unique values, 10311 missing
value_39 (numeric): 516 unique values, 3082 missing
value_40 (numeric): 2731 unique values, 5 missing
value_41 (numeric): 1329 unique values, 1122 missing
value_42 (numeric): 975 unique values, 413 missing
value_43 (numeric): 912 unique values, 794 missing
value_44 (numeric): 513 unique values, 645 missing
value_45 (numeric): 1223 unique values, 4629 missing
value_46 (numeric): 784 unique values, 3190 missing
value_47 (numeric): 989 unique values, 657 missing
value_48 (numeric): 3349 unique values, 2121 missing
value_49 (numeric): 2480 unique values, 30 missing
value_50 (numeric): 1738 unique values, 6247 missing
time_step (numeric): 43824 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 43824
Number of attributes (columns) of the dataset: 54
Number of distinct values of the target attribute (if it is nominal): (no value listed)
Number of missing values in the dataset: 151586
Number of instances with at least one value missing: 37715
Number of numeric attributes: 52
Number of nominal attributes: 1
Average class difference between consecutive instances: (no value listed)
Percentage of missing values: 6.41
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 96.3
Percentage of instances belonging to the most frequent class: (no value listed)
Percentage of nominal attributes: 1.85
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 86.06
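The percentage properties follow from the counts listed alongside them, and the row count matches the number of hours in 2019-2023. The short check below is only a sanity sketch of that arithmetic; the variable names are illustrative, and the formulas (for example, missing values divided by rows times columns) are an assumption about how these figures are derived.

```python
from datetime import datetime

rows, cols = 43824, 54
missing_values, rows_with_missing = 151586, 37715
numeric_cols, nominal_cols = 52, 1

# Hours between 2019-01-01 00:00 and 2024-01-01 00:00 (2020 is a leap year).
hours = (datetime(2024, 1, 1) - datetime(2019, 1, 1)).days * 24

pct_missing = round(100 * missing_values / (rows * cols), 2)
pct_rows_with_missing = round(100 * rows_with_missing / rows, 2)
pct_numeric = round(100 * numeric_cols / cols, 2)
pct_nominal = round(100 * nominal_cols / cols, 2)
attrs_per_instance = round(cols / rows, 2)
```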

0 tasks