SigmaCabPrediction

Status: active. Format: ARFF. License: CC BY-NC-SA 4.0. Visibility: public. Uploaded 24-03-2022 by Dustin Carrion.
  • Computer Systems Machine Learning
Problem Statement

Welcome to Sigma Cab Private Limited, a cab aggregator service. Customers can download the app on their smartphones and book a cab from anywhere in the cities where the company operates. Sigma Cabs, in turn, searches for cabs from various service providers and offers the client the best of the available options. The company has been in operation for a little less than a year. During this period, it has captured Surge_Pricing_Type from the service providers.

You have been hired by Sigma Cabs as a Data Scientist and asked to build a predictive model that helps them predict the Surge_Pricing_Type proactively. This would in turn help them match the right cabs with the right customers quickly and efficiently.

Data Variable Definition

Trip_ID - ID for the trip (cannot be used for modelling)
Trip_Distance - The distance of the trip requested by the customer
Type_of_Cab - Category of the cab requested by the customer
Customer_Since_Months - Number of months the customer has been using cab services; 0 means the current month
Life_Style_Index - Proprietary index created by Sigma Cabs showing the lifestyle of the customer based on their behaviour
Confidence_Life_Style_Index - Category showing the confidence in the index mentioned above
Destination_Type - Sigma Cabs assigns every destination to one of 14 categories
Customer_Rating - Average of the customer's lifetime ratings to date
Cancellation_Last_1Month - Number of trips cancelled by the customer in the last month
Var1, Var2 and Var3 - Continuous variables masked by the company; can be used for modelling
Gender - Gender of the customer
Surge_Pricing_Type - Target variable; can be of 3 types
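The variable definitions above exclude the trip ID from modelling and name the surge pricing type as the quantity to predict. A minimal sketch of that split is shown below, assuming the column names as spelled in the feature list (Trip_ID, Surge_Pricing_Type); the helper name and the sample records are made up for illustration and are not taken from the dataset.

```python
# Column names follow the feature list on this page.
ID_COLUMN = "Trip_ID"
TARGET_COLUMN = "Surge_Pricing_Type"


def split_features_target(rows):
    """Split dict records into (features, targets).

    The ID column is dropped because it must not be used for modelling,
    and the target column is moved into its own list.
    """
    features, targets = [], []
    for row in rows:
        targets.append(row[TARGET_COLUMN])
        features.append(
            {key: value for key, value in row.items()
             if key not in (ID_COLUMN, TARGET_COLUMN)}
        )
    return features, targets


# Hypothetical records for illustration only (not real rows of the dataset).
sample_rows = [
    {"Trip_ID": "T0001", "Trip_Distance": 6.8, "Gender": "Female", "Surge_Pricing_Type": 2},
    {"Trip_ID": "T0002", "Trip_Distance": 29.5, "Gender": "Male", "Surge_Pricing_Type": 3},
]

X, y = split_features_target(sample_rows)
print(X[0])
print(y)
```

The same split applies to any tabular loader: whatever produces the records, the ID is removed before fitting and the target is kept apart from the inputs.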

14 features

Trip_ID (string): 131662 unique values, 0 missing
Trip_Distance (numeric): 10326 unique values, 0 missing
Type_of_Cab (string): 5 unique values, 20210 missing
Customer_Since_Months (numeric): 11 unique values, 5920 missing
Life_Style_Index (numeric): 55978 unique values, 20193 missing
Confidence_Life_Style_Index (string): 3 unique values, 20193 missing
Destination_Type (string): 14 unique values, 0 missing
Customer_Rating (numeric): 3931 unique values, 0 missing
Cancellation_Last_1Month (numeric): 9 unique values, 0 missing
Var1 (numeric): 122 unique values, 71030 missing
Var2 (numeric): 58 unique values, 0 missing
Var3 (numeric): 96 unique values, 0 missing
Gender (string): 2 unique values, 0 missing
Surge_Pricing_Type (numeric): 3 unique values, 0 missing
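
Five of the features have missing values, which matters when choosing how to impute or drop columns. As a rough sketch, the share of rows missing each feature can be computed from the counts listed above and the 131662 rows of the dataset. The dictionary below simply copies the non-zero counts from this page; the variable names are illustrative.

```python
N_ROWS = 131662  # number of instances reported on this page

# Non-zero missing counts copied from the feature list above.
missing_counts = {
    "Type_of_Cab": 20210,
    "Customer_Since_Months": 5920,
    "Life_Style_Index": 20193,
    "Confidence_Life_Style_Index": 20193,
    "Var1": 71030,
}

# Percentage of rows in which each feature is missing.
missing_share = {
    name: 100.0 * count / N_ROWS for name, count in missing_counts.items()
}

# Print features from most to least affected.
for name, pct in sorted(missing_share.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {pct:.2f}% missing")
```

Var1 is missing in more than half of the rows, so it may need different handling from the other four columns.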

19 properties

Number of instances (rows) of the dataset: 131662
Number of attributes (columns) of the dataset: 14
Number of distinct values of the target attribute (if it is nominal): (no value listed)
Number of missing values in the dataset: 137546
Number of instances with at least one value missing: 90054
Number of numeric attributes: 9
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 64.29
Percentage of instances belonging to the most frequent class: (no value listed)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 68.4
Average class difference between consecutive instances: (no value listed)
Percentage of missing values: 7.46
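
The percentage properties above can be re-derived from the row count, the column count, and the per-feature missing counts shown on this page. The following is a small consistency check under that assumption, not OpenML's own computation; the variable names are illustrative.

```python
# Figures copied from this page.
n_rows = 131662
n_cols = 14
n_numeric = 9
rows_with_missing = 90054
missing_per_feature = [20210, 5920, 20193, 20193, 71030]

# Total missing cells, then each percentage relative to its own base.
total_missing = sum(missing_per_feature)
pct_missing_values = 100.0 * total_missing / (n_rows * n_cols)
pct_rows_with_missing = 100.0 * rows_with_missing / n_rows
pct_numeric = 100.0 * n_numeric / n_cols

print(total_missing)                    # 137546
print(round(pct_missing_values, 2))     # 7.46
print(round(pct_rows_with_missing, 2))  # 68.4
print(round(pct_numeric, 2))            # 64.29
```

Note that the percentage of missing values is taken over all cells (rows times columns), while the percentage of instances having missing values is taken over rows only.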

0 tasks