porto-seguro

Status: active | Format: ARFF | Visibility: public | Uploaded 03-12-2020 by Marcos de Paula Bueno
Tags: Language, Machine Learning, study_271, study_270

Training dataset of the 'Porto Seguro's Safe Driver Prediction' Kaggle challenge [https://www.kaggle.com/c/porto-seguro-safe-driver-prediction]. The goal was to predict whether a driver will file an insurance claim in the next year.

The official rules of the challenge explicitly state that the data may be used for 'academic research and education, and other non-commercial purposes' [https://www.kaggle.com/c/porto-seguro-safe-driver-prediction/rules].

For a description of all variables, check out the Kaggle dataset repository [https://www.kaggle.com/c/porto-seguro-safe-driver-prediction/data]. It states that numeric features with integer values whose names contain neither 'bin' nor 'cat' are in fact ordinal features, which could be treated as ordinal factors in R.

For further information on effective preprocessing and feature engineering, check out the 'Kernels' section of the Kaggle challenge website [https://www.kaggle.com/c/porto-seguro-safe-driver-prediction/kernels]. Note that many Kagglers removed all 'calc' variables because they do not seem to carry much information.
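The naming rule described above can be turned into a small column classifier. This is a minimal sketch, assuming column names follow the `ps_<group>_<nn>[_bin|_cat]` pattern seen in the feature list and that missing values arrive as `None`; the helpers `feature_group` and `drop_calc` are hypothetical names, not part of any library.

```python
from typing import Iterable, List, Optional


def feature_group(name: str, values: Iterable[Optional[float]]) -> str:
    """Classify a porto-seguro column from its name and observed values.

    Hypothetical helper: applies the rule that integer-valued numeric features
    without 'bin' or 'cat' in their names are ordinal.
    """
    if name == "target":
        return "target"
    tokens = name.split("_")  # e.g. "ps_ind_06_bin" -> ["ps", "ind", "06", "bin"]
    if "bin" in tokens:
        return "binary"
    if "cat" in tokens:
        return "categorical"
    # Ignore missing values; an all-missing column is reported as ordinal by default.
    observed = [float(v) for v in values if v is not None]
    if all(v.is_integer() for v in observed):
        return "ordinal"
    return "continuous"


def drop_calc(columns: Iterable[str]) -> List[str]:
    """Remove the 'calc' features, as many Kagglers did."""
    return [c for c in columns if "calc" not in c.split("_")]


print(feature_group("ps_ind_06_bin", [0, 1]))    # binary
print(feature_group("ps_ind_01", [0, 3, 7]))     # ordinal
print(feature_group("ps_reg_03", [0.72, None]))  # continuous
print(drop_calc(["ps_ind_01", "ps_calc_01", "ps_calc_15_bin"]))  # ['ps_ind_01']
```

Splitting the name on underscores (rather than a plain substring test) keeps the check tied to whole name components, so a future column whose name merely contains the letters "cat" or "bin" would not be misclassified.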

58 features

Feature         Type     Unique values  Missing
target (target) nominal              2        0
ps_ind_01       numeric              8        0
ps_ind_02_cat   nominal              4      216
ps_ind_03       numeric             12        0
ps_ind_04_cat   nominal              2       83
ps_ind_05_cat   nominal              7     5809
ps_ind_06_bin   nominal              2        0
ps_ind_07_bin   nominal              2        0
ps_ind_08_bin   nominal              2        0
ps_ind_09_bin   nominal              2        0
ps_ind_10_bin   nominal              2        0
ps_ind_11_bin   nominal              2        0
ps_ind_12_bin   nominal              2        0
ps_ind_13_bin   nominal              2        0
ps_ind_14       numeric              5        0
ps_ind_15       numeric             14        0
ps_ind_16_bin   nominal              2        0
ps_ind_17_bin   nominal              2        0
ps_ind_18_bin   nominal              2        0
ps_reg_01       numeric             10        0
ps_reg_02       numeric             19        0
ps_reg_03       numeric           5012   107772
ps_car_01_cat   nominal             12      107
ps_car_02_cat   nominal              2        5
ps_car_03_cat   nominal              2   411231
ps_car_04_cat   nominal             10        0
ps_car_05_cat   nominal              2   266551
ps_car_06_cat   nominal             18        0
ps_car_07_cat   nominal              2    11489
ps_car_08_cat   nominal              2        0
ps_car_09_cat   nominal              5      569
ps_car_10_cat   nominal              3        0
ps_car_11_cat   nominal            104        0
ps_car_11       numeric              4        5
ps_car_12       numeric            183        1
ps_car_13       numeric          70482        0
ps_car_14       numeric            849    42620
ps_car_15       numeric             15        0
ps_calc_01      numeric             10        0
ps_calc_02      numeric             10        0
ps_calc_03      numeric             10        0
ps_calc_04      numeric              6        0
ps_calc_05      numeric              7        0
ps_calc_06      numeric             11        0
ps_calc_07      numeric             10        0
ps_calc_08      numeric             11        0
ps_calc_09      numeric              8        0
ps_calc_10      numeric             26        0
ps_calc_11      numeric             20        0
ps_calc_12      numeric             11        0
ps_calc_13      numeric             14        0
ps_calc_14      numeric             24        0
ps_calc_15_bin  nominal              2        0
ps_calc_16_bin  nominal              2        0
ps_calc_17_bin  nominal              2        0
ps_calc_18_bin  nominal              2        0
ps_calc_19_bin  nominal              2        0
ps_calc_20_bin  nominal              2        0
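Only 13 features have missing values, and their per-feature counts add up to the dataset-level "Number of missing values" property. A short check in Python, with the non-zero counts copied by hand from the feature table (the name `MISSING` and the 40% flagging threshold are arbitrary choices for illustration, not a recommendation from the source):

```python
# Non-zero missing-value counts copied from the feature table; every other feature has 0.
MISSING = {
    "ps_ind_02_cat": 216,
    "ps_ind_04_cat": 83,
    "ps_ind_05_cat": 5809,
    "ps_reg_03": 107772,
    "ps_car_01_cat": 107,
    "ps_car_02_cat": 5,
    "ps_car_03_cat": 411231,
    "ps_car_05_cat": 266551,
    "ps_car_07_cat": 11489,
    "ps_car_09_cat": 569,
    "ps_car_11": 5,
    "ps_car_12": 1,
    "ps_car_14": 42620,
}
N_ROWS = 595212  # number of instances

total_missing = sum(MISSING.values())
fraction_missing = {name: count / N_ROWS for name, count in MISSING.items()}

# Flag columns where more than 40% of rows are missing (threshold chosen arbitrarily).
mostly_missing = sorted(name for name, frac in fraction_missing.items() if frac > 0.4)

print(total_missing)   # 846458
print(mostly_missing)  # ['ps_car_03_cat', 'ps_car_05_cat']

# The three most incomplete columns, as a share of all rows.
for name in sorted(fraction_missing, key=fraction_missing.get, reverse=True)[:3]:
    print(f"{name}: {fraction_missing[name]:.1%}")
```

Whether such heavily incomplete columns are dropped, imputed, or given an explicit "missing" level is a modelling decision the source leaves open.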

19 properties

Number of instances (rows): 595212
Number of attributes (columns): 58
Number of distinct values of the target attribute (if nominal): 2
Number of missing values: 846458
Number of instances with at least one missing value: 470281
Number of numeric attributes: 26
Number of nominal attributes: 32
Percentage of binary attributes: 41.38
Percentage of instances having missing values: 79.01
Average class difference between consecutive instances: 0.93
Percentage of missing values (out of all 595212 × 58 cells): 2.45
Percentage of numeric attributes: 44.83
Number of attributes divided by the number of instances: 0 (58 / 595212 is about 0.0001, which rounds to 0 at two decimals)
Percentage of nominal attributes: 55.17
Percentage of instances belonging to the most frequent class: 96.36
Number of instances belonging to the most frequent class: 573518
Percentage of instances belonging to the least frequent class: 3.64
Number of instances belonging to the least frequent class: 21694
Number of binary attributes: 24

2 tasks

0 runs - estimation_procedure: 4-fold Crossvalidation - target_feature: target
0 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: target
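Both tasks evaluate predictions of `target` with k-fold cross-validation. OpenML ships fixed fold assignments with each task, and those should be used when comparing results. Purely to illustrate the procedure, here is a minimal stratified k-fold splitter in plain Python. Stratifying is our assumption, motivated by the 3.64% minority class (the page does not say whether the task splits are stratified), and `stratified_folds` and `cross_validation_splits` are hypothetical names.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Assign row indices to k folds so each fold keeps roughly the class proportions."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's rows round-robin across the folds.
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return [sorted(f) for f in folds]


def cross_validation_splits(
    labels: Sequence[int], k: int, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each fold serves once as the test set."""
    folds = stratified_folds(labels, k, seed)
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, test


# Toy labels (not real data): 8 positives, 92 negatives.
labels = [1] * 8 + [0] * 92
for train, test in cross_validation_splits(labels, k=4):
    positives = sum(labels[i] for i in test)
    print(len(train), len(test), positives)  # 75 25 2 on every fold
```

For the 10-fold task the call would use `k=10`; on this dataset each test fold would then hold about 2169 of the 21694 minority-class rows.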