Context
This data set was created to help Kaggle users in the New York City Taxi Trip Duration competition. New features were generated using the Wolfram Mathematica system.
We hope this data set will help both novice and experienced researchers on their path to mastering data.
All sources can be found here.
Content
The dataset consists of features taken from the initial dataset and features generated with the Wolfram Mathematica computational system. All features can be split into the following groups:
Initial features (taken from the initial data),
Calendar features (season, day name, and day period),
Weather features (temperature, snow, and rain),
Travel features (geographic distance, plus estimated driving distance and driving time).
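When selecting features for a model, it can help to keep the four groups as named column lists. The sketch below is a hypothetical mapping built from the column list that follows; placing year through second and the trip coordinates in the "initial" group is an assumption (they appear to come from the original trip records), and `FEATURE_GROUPS` and `columns_for` are illustrative names, not part of the dataset.

```python
# Hypothetical grouping of the dataset's columns into the four feature groups.
# Assumption: year..second and the coordinates are treated as "initial" features,
# since they appear to be derived from the original trip records.
FEATURE_GROUPS: dict[str, list[str]] = {
    "initial": [
        "id", "vendorId", "passengerCount",
        "year", "month", "day", "hour", "minute", "second",
        "startLatitude", "startLongitude", "endLatitude", "endLongitude",
        "flag", "tripDuration",
    ],
    "calendar": ["season", "dayName", "dayPeriod"],
    "weather": ["temperature", "rain", "snow"],
    "travel": ["geoDistance", "drivingDistance", "drivingTime"],
}


def columns_for(*groups: str) -> list[str]:
    """Return the column names of the requested feature groups, in group order."""
    unknown = set(groups) - set(FEATURE_GROUPS)
    if unknown:
        raise KeyError(f"unknown feature groups: {sorted(unknown)}")
    return [col for group in groups for col in FEATURE_GROUPS[group]]
```

For example, `columns_for("weather", "travel")` selects the six generated weather and travel columns, which makes it easy to compare a model trained on initial features alone against one that also uses the Mathematica-derived features.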
Dataset contains the following columns:
id - a unique identifier for each trip,
vendorId - a code indicating the provider associated with the trip record,
passengerCount - the number of passengers in the vehicle (driver entered value),
year,
month,
day,
hour,
minute,
second,
season,
dayName,
dayPeriod - day period, e.g. late night, morning, etc.,
temperature,
rain,
snow,
startLatitude,
startLongitude,
endLatitude,
endLongitude,
flag - indicates whether the trip record was held in vehicle memory before being sent to the vendor because the vehicle did not have a connection to the server: Y = store and forward trip; N = not a store and forward trip,
drivingDistance - driving distance, estimated via the Wolfram Mathematica system,
drivingTime - driving time, estimated via the Wolfram Mathematica system,
geoDistance - geographic distance between the starting and ending points,
tripDuration - duration of the trip in seconds (value -1 indicates test rows).
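Because test rows are marked with tripDuration = -1, they must be separated from the training rows before fitting a model. The sketch below assumes the data is a CSV file with a header row using the column names above (the file format and any file name are assumptions). It converts the Y/N flag to a boolean and splits the rows. It also includes a haversine great-circle distance for sanity-checking geoDistance against the coordinates; the units and method behind geoDistance are not documented here, so this is a rough comparison, not a reproduction of the dataset's values.

```python
import csv
import math
from collections.abc import Iterable

# Mean Earth radius in kilometres (assumption: a spherical Earth is accurate enough).
EARTH_RADIUS_KM = 6371.0088

Row = dict[str, object]


def parse_row(row: dict[str, str]) -> Row:
    """Convert the fields used for splitting; other columns stay as strings."""
    parsed: Row = dict(row)
    # tripDuration is in seconds; -1 marks a test row with no target value.
    parsed["tripDuration"] = int(float(row["tripDuration"]))
    # flag: "Y" = store and forward trip, "N" = not a store and forward trip.
    parsed["flag"] = row["flag"].strip().upper() == "Y"
    return parsed


def split_train_test(lines: Iterable[str]) -> tuple[list[Row], list[Row]]:
    """Split CSV lines (header first) into (train, test) by tripDuration == -1."""
    train: list[Row] = []
    test: list[Row] = []
    for raw in csv.DictReader(lines):
        row = parse_row(raw)
        (test if row["tripDuration"] == -1 else train).append(row)
    return train, test


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

An open text file is an iterable of lines, so a call such as `split_train_test(open(path, newline=""))` works directly, where `path` points to your local copy of the data. To compare distances, pass `float(row["startLatitude"])`, `float(row["startLongitude"])`, `float(row["endLatitude"])`, and `float(row["endLongitude"])` to `haversine_km`.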