OpenML

nyc-taxi-green-dec-2016

Status: active. Format: ARFF. Visibility: public. Uploaded 18-06-2022 by Leo Grin.

Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way as the other datasets in that benchmark. It belongs to the "regression on categorical and numerical features" benchmark.

Original description: Trip Record Data provided by the New York City Taxi and Limousine Commission (TLC) [http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml]. The dataset includes TLC trips made by green taxis in December 2016. The data was downloaded on 03.11.2018. For a description of all variables in the dataset, see the TLC data dictionary [http://www.nyc.gov/html/tlc/downloads/pdf/data_dictionary_trip_records_green.pdf]. The variable 'tip_amount' was chosen as the target variable. The variable 'total_amount' is ignored by default; otherwise, the target could be predicted deterministically. The date variables 'lpep_pickup_datetime' and 'lpep_dropoff_datetime' (also ignored by default) could be used to compute additional time features. In this version, we kept only trips with 'payment_type' == 1 (credit card), because tips are not recorded for most other payment types. We also removed the variables 'trip_distance' and 'fare_amount' to increase the importance of the categorical features 'PULocationID' and 'DOLocationID'.
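The selection steps described above can be sketched as follows. This is a minimal, dependency-free illustration, not the benchmark's actual preprocessing code. It assumes each raw trip is a Python dict keyed by the TLC column names, that 'payment_type' is stored as the integer 1 for credit card, and that timestamps are strings such as "2016-12-01 08:15:00". The names `prepare` and `time_features`, and the derived features (pickup hour, weekday, duration), are hypothetical.

```python
from datetime import datetime

TARGET = "tip_amount"
# Removed in this version to raise the importance of the location-ID features.
DROPPED = {"trip_distance", "fare_amount"}
# Ignored by default: total_amount would leak the target, and the raw
# timestamps must be turned into features before they are useful.
IGNORED = {"total_amount", "lpep_pickup_datetime", "lpep_dropoff_datetime"}
# payment_type is constant after filtering, so it is excluded as well.
EXCLUDED = DROPPED | IGNORED | {TARGET, "payment_type"}
# Assumption: timestamps are strings like "2016-12-01 08:15:00".
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def time_features(record):
    """Hypothetical derived features: pickup hour, weekday, trip duration."""
    pickup = datetime.strptime(record["lpep_pickup_datetime"], TIME_FORMAT)
    dropoff = datetime.strptime(record["lpep_dropoff_datetime"], TIME_FORMAT)
    return {
        "pickup_hour": pickup.hour,
        "pickup_weekday": pickup.weekday(),  # Monday == 0
        "duration_min": (dropoff - pickup).total_seconds() / 60.0,
    }


def prepare(records, add_time_features=False):
    """Keep credit-card trips and split each record into features and target."""
    X, y = [], []
    for record in records:
        if record["payment_type"] != 1:  # assumed: 1 == credit card
            continue
        features = {k: v for k, v in record.items() if k not in EXCLUDED}
        if add_time_features:
            features.update(time_features(record))
        X.append(features)
        y.append(record[TARGET])
    return X, y


# Toy records for illustration only (not taken from the dataset).
trips = [
    {"VendorID": 2, "payment_type": 1, "trip_distance": 2.1,
     "fare_amount": 11.0, "tip_amount": 2.5, "total_amount": 14.8,
     "lpep_pickup_datetime": "2016-12-01 08:15:00",
     "lpep_dropoff_datetime": "2016-12-01 08:30:00"},
    {"VendorID": 1, "payment_type": 2, "trip_distance": 0.9,
     "fare_amount": 5.5, "tip_amount": 0.0, "total_amount": 6.8,
     "lpep_pickup_datetime": "2016-12-02 23:05:00",
     "lpep_dropoff_datetime": "2016-12-02 23:12:00"},
]
X, y = prepare(trips, add_time_features=True)
print(X, y)
```

Only the first toy trip survives the credit-card filter, and its features keep 'VendorID' plus the three derived time features. Plain dicts keep the sketch self-contained; with pandas, the same steps would typically be a boolean mask on `payment_type` followed by `DataFrame.drop(columns=...)`.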

11 features

Feature	Type	Unique values	Missing values
tip_amount (target)	numeric	1811	0
VendorID	nominal	2	0
store_and_fwd_flag	nominal	2	0
RatecodeID	nominal	5	0
passenger_count	numeric	10	0
extra	nominal	5	0
mta_tax	nominal	3	0
tolls_amount	numeric	105	0
improvement_surcharge	nominal	3	0
total_amount	numeric	5377	0
trip_type	nominal	2	0
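A minimal sketch, assuming you want to group the input columns by the types listed in the feature table (for example, to one-hot encode the nominal columns and pass the numeric ones through unchanged). The dictionary only transcribes that table; `SCHEMA`, `IGNORED_BY_DEFAULT` and `split_columns` are hypothetical names, not part of any OpenML API.

```python
# Column types transcribed from the feature table on this page.
SCHEMA = {
    "tip_amount": "numeric",
    "VendorID": "nominal",
    "store_and_fwd_flag": "nominal",
    "RatecodeID": "nominal",
    "passenger_count": "numeric",
    "extra": "nominal",
    "mta_tax": "nominal",
    "tolls_amount": "numeric",
    "improvement_surcharge": "nominal",
    "total_amount": "numeric",
    "trip_type": "nominal",
}
TARGET = "tip_amount"
IGNORED_BY_DEFAULT = {"total_amount"}  # would leak the target


def split_columns(schema, target=TARGET, ignored=IGNORED_BY_DEFAULT):
    """Return (nominal, numeric) input columns, excluding target and ignored ones."""
    inputs = [c for c in schema if c != target and c not in ignored]
    nominal = [c for c in inputs if schema[c] == "nominal"]
    numeric = [c for c in inputs if schema[c] == "numeric"]
    return nominal, numeric


nominal, numeric = split_columns(SCHEMA)
print("nominal:", nominal)
print("numeric:", numeric)
```

Dictionary insertion order is preserved in Python 3.7+, so both lists follow the order of the table.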