New-York-Taxi-Trip-enriched-by-Mathematica

Status: active, Format: ARFF, License: CC0: Public Domain, Visibility: public, Uploaded 24-03-2022 by Dustin Carrion
Context

This data set was created to help Kaggle users in the New York City Taxi Trip Duration competition. New features were generated using the Wolfram Mathematica system. We hope this data set will help both young and experienced researchers on their path to mastering data. All sources can be found here.

Content

The dataset consists of features from the initial dataset plus features generated with the Wolfram Mathematica computational system. The features fall into the following groups:

Initial features (extracted from the initial data)
Calendar features (season, day name, and day period)
Weather features (information about temperature, snow, and rain)
Travel features (geo distance, with estimated driving distance and time)

The dataset contains the following columns:

id - a unique identifier for each trip
vendorId - a code indicating the provider associated with the trip record
passengerCount - the number of passengers in the vehicle (driver-entered value)
year, month, day, hour, minute, second, season, dayName
dayPeriod - day period, e.g. late night or morning
temperature, rain, snow, startLatitude, startLongitude, endLatitude, endLongitude
flag - indicates whether the trip record was held in vehicle memory before being sent to the vendor because the vehicle did not have a connection to the server (Y = store and forward; N = not a store and forward trip)
drivingDistance - driving distance, estimated via the Wolfram Mathematica system
drivingTime - driving time, estimated via the Wolfram Mathematica system
geoDistance - distance between the starting and ending points
tripDuration - duration of the trip in seconds (a value of -1 indicates a test row)
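Because tripDuration uses -1 to mark test rows, labelled and unlabelled trips sit in the same file. The following is a minimal sketch of separating them, assuming the rows have already been parsed into dictionaries keyed by column name. The sample rows and the helper name split_rows are hypothetical and only illustrate the convention.

```python
# Hypothetical sample rows; real rows would come from parsing the ARFF file.
rows = [
    {"id": "a", "flag": "N", "tripDuration": 455},
    {"id": "b", "flag": "Y", "tripDuration": -1},
    {"id": "c", "flag": "N", "tripDuration": 1225},
]

TEST_MARKER = -1  # per the description, -1 indicates a test row


def split_rows(rows):
    """Separate labelled (train) rows from unlabelled (test) rows."""
    train = [r for r in rows if r["tripDuration"] != TEST_MARKER]
    test = [r for r in rows if r["tripDuration"] == TEST_MARKER]
    return train, test


train_rows, test_rows = split_rows(rows)
print(len(train_rows), len(test_rows))
```

Filtering on the marker before computing any statistics on tripDuration avoids treating -1 as a real duration.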

24 features

Feature          Type     Unique values  Missing
id               string   2083778        0
vendorId         numeric  2              0
passengerCount   numeric  10             0
year             numeric  1              0
month            numeric  6              0
day              numeric  31             0
hour             numeric  24             0
minute           numeric  60             0
second           numeric  60             0
season           string   3              0
dayName          string   7              0
dayPeriod        string   5              0
temperature      numeric  1800606        0
rain             numeric  2              0
snow             numeric  2              0
startLatitude    numeric  48068          0
startLongitude   numeric  24960          0
endLatitude      numeric  67086          0
endLongitude     numeric  36977          0
flag             string   2              0
drivingDistance  numeric  1720327        4847
drivingTime      numeric  230            1963
geoDistance      numeric  2075378        0
tripDuration     numeric  7418           0
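Only drivingDistance and drivingTime report missing values, so any analysis using them needs to handle gaps. The following is a minimal sketch of counting and filtering missing entries, assuming the fields arrive as raw strings and that missing values use the standard ARFF "?" marker. The sample records and the helper name to_float are hypothetical.

```python
# Hypothetical raw records as they might look after splitting ARFF data lines.
# ARFF marks a missing value with a single "?".
raw = [
    {"drivingDistance": "2.01", "drivingTime": "290"},
    {"drivingDistance": "?", "drivingTime": "?"},
    {"drivingDistance": "?", "drivingTime": "512"},
]


def to_float(value):
    """Convert an ARFF field to float, mapping the '?' marker to None."""
    return None if value == "?" else float(value)


parsed = [{k: to_float(v) for k, v in rec.items()} for rec in raw]

# Count missing values per column and rows with at least one missing value.
missing_per_column = {
    col: sum(1 for rec in parsed if rec[col] is None)
    for col in ("drivingDistance", "drivingTime")
}
rows_with_missing = sum(1 for rec in parsed if None in rec.values())

# Keep only complete rows for analyses that need both travel estimates.
complete = [rec for rec in parsed if None not in rec.values()]
print(missing_per_column, rows_with_missing, len(complete))
```

Whether to drop incomplete rows or fall back to geoDistance, which has no missing values, depends on the analysis.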

19 properties

Property                                                                   Value
Number of instances (rows) of the dataset                                  2083778
Number of attributes (columns) of the dataset                              24
Number of distinct values of the target attribute (if it is nominal)       (not shown)
Number of missing values in the dataset                                    6810
Number of instances with at least one value missing                        4847
Number of numeric attributes                                               19
Number of nominal attributes                                               0
Number of attributes divided by the number of instances                    0
Percentage of numeric attributes                                           79.17
Percentage of instances belonging to the most frequent class               (not shown)
Percentage of nominal attributes                                           0
Number of instances belonging to the most frequent class                   (not shown)
Percentage of instances belonging to the least frequent class              (not shown)
Number of instances belonging to the least frequent class                  (not shown)
Number of binary attributes                                                0
Percentage of binary attributes                                            0
Percentage of instances having missing values                              0.23
Average class difference between consecutive instances                     (not shown)
Percentage of missing values                                               0.01
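The percentage properties can be cross-checked against the raw counts listed above. The following sketch assumes the obvious definitions (a share of attributes, of rows, and of all cells); the page does not state its formulas, so these are an assumption that happens to reproduce the listed figures.

```python
# Counts taken from the property listing above.
n_rows = 2083778
n_cols = 24
n_numeric = 19
n_missing_values = 6810
n_rows_with_missing = 4847

# Assumed definitions: share of numeric columns, share of rows with a gap,
# and share of all cells (rows x columns) that are missing.
pct_numeric = 100 * n_numeric / n_cols
pct_rows_missing = 100 * n_rows_with_missing / n_rows
pct_values_missing = 100 * n_missing_values / (n_rows * n_cols)

print(round(pct_numeric, 2), round(pct_rows_missing, 2), round(pct_values_missing, 2))
# prints: 79.17 0.23 0.01
```

The total of 6810 missing values also equals the sum of the two per-column counts (4847 for drivingDistance and 1963 for drivingTime).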

0 tasks
