Traffic_violations

Status: active | Format: ARFF | License: Public Domain (CC0) | Visibility: public | Uploaded 03-04-2020 by Florian Pargent
Tags: Computer Systems, Machine Learning

This dataset contains information on all electronic traffic violations issued in the County. Information that could be used to uniquely identify the vehicle, the vehicle owner, or the officer issuing the violation is not published.

For this version, some features were removed, and all remaining character features were recoded as nominal factor variables. All punctuation characters were removed from factor levels. By default, the variable 'Violation.Type' is used as the target. The smaller target categories 'SERO' and 'ESERO' were collapsed into a single category labeled 'SERO'. The dataset was downsampled to 5% of its original size. Unused factor levels and a few almost-constant features were dropped.
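The recoding steps listed above can be sketched in plain Python. This is a minimal illustration, not the uploader's actual script: the helper names are hypothetical, treating `string.punctuation` as "punctuation characters" is an assumption, and simple random sampling is assumed for the 5% downsampling because the description does not say how rows were chosen.

```python
import random
import string

# Hypothetical helpers illustrating the described preprocessing.
# Assumption: "punctuation" means the ASCII characters in string.punctuation.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(level: str) -> str:
    """Remove ASCII punctuation characters from a factor level."""
    return level.translate(_PUNCT_TABLE)


def collapse_target(label: str) -> str:
    """Merge the small 'ESERO' category into 'SERO'; other labels pass through."""
    return "SERO" if label in ("SERO", "ESERO") else label


def downsample(rows: list, fraction: float = 0.05, seed: int = 0) -> list:
    """Simple random sample (assumed) of round(fraction * len(rows)) rows, without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, round(fraction * len(rows)))


def drop_unused_levels(levels: list, values: list) -> list:
    """Keep only the factor levels that still occur in the downsampled values."""
    present = set(values)
    return [lvl for lvl in levels if lvl in present]


if __name__ == "__main__":
    print(strip_punctuation("HON-DA."))                          # HONDA
    print(collapse_target("ESERO"))                              # SERO
    print(len(downsample(list(range(1000)))))                    # 50
    print(drop_unused_levels(["a", "b", "c"], ["a", "c", "a"]))  # ['a', 'c']
```

Dropping unused levels only makes sense after downsampling, since a level can disappear once most rows are discarded.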

21 features

Violation.Type (target): nominal, 3 unique values, 0 missing
Description: nominal, 2130 unique values, 0 missing
Belts: nominal, 2 unique values, 0 missing
Personal.Injury: nominal, 2 unique values, 0 missing
Property.Damage: nominal, 2 unique values, 0 missing
Commercial.License: nominal, 2 unique values, 0 missing
Commercial.Vehicle: nominal, 2 unique values, 0 missing
State: nominal, 57 unique values, 3 missing
VehicleType: nominal, 22 unique values, 0 missing
Year: numeric, 96 unique values, 434 missing
Make: nominal, 888 unique values, 448 missing
Model: nominal, 3830 unique values, 455 missing
Color: nominal, 26 unique values, 888 missing
Charge: nominal, 605 unique values, 0 missing
Contributed.To.Accident: nominal, 2 unique values, 0 missing
Race: nominal, 6 unique values, 0 missing
Gender: nominal, 3 unique values, 0 missing
Driver.City: nominal, 1889 unique values, 8 missing
Driver.State: nominal, 57 unique values, 0 missing
DL.State: nominal, 63 unique values, 52 missing
Arrest.Type: nominal, 19 unique values, 0 missing
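Several dataset-level properties follow directly from the feature list. The sketch below transcribes the list into a Python structure and aggregates it; the function name is hypothetical, and counting an attribute as "binary" when it has exactly two distinct values is an assumption about how that property is defined.

```python
# Feature summary transcribed from the list above: (name, type, unique values, missing).
FEATURES = [
    ("Violation.Type", "nominal", 3, 0),
    ("Description", "nominal", 2130, 0),
    ("Belts", "nominal", 2, 0),
    ("Personal.Injury", "nominal", 2, 0),
    ("Property.Damage", "nominal", 2, 0),
    ("Commercial.License", "nominal", 2, 0),
    ("Commercial.Vehicle", "nominal", 2, 0),
    ("State", "nominal", 57, 3),
    ("VehicleType", "nominal", 22, 0),
    ("Year", "numeric", 96, 434),
    ("Make", "nominal", 888, 448),
    ("Model", "nominal", 3830, 455),
    ("Color", "nominal", 26, 888),
    ("Charge", "nominal", 605, 0),
    ("Contributed.To.Accident", "nominal", 2, 0),
    ("Race", "nominal", 6, 0),
    ("Gender", "nominal", 3, 0),
    ("Driver.City", "nominal", 1889, 8),
    ("Driver.State", "nominal", 57, 0),
    ("DL.State", "nominal", 63, 52),
    ("Arrest.Type", "nominal", 19, 0),
]


def summarize(features):
    """Aggregate per-feature counts into dataset-level properties."""
    n_attr = len(features)
    n_numeric = sum(1 for _, kind, _, _ in features if kind == "numeric")
    n_nominal = sum(1 for _, kind, _, _ in features if kind == "nominal")
    # Assumption: an attribute is "binary" if it has exactly two distinct values.
    n_binary = sum(1 for _, _, n_unique, _ in features if n_unique == 2)
    total_missing = sum(missing for _, _, _, missing in features)
    return {
        "attributes": n_attr,
        "numeric": n_numeric,
        "nominal": n_nominal,
        "binary": n_binary,
        "missing_values": total_missing,
        "pct_binary": round(100 * n_binary / n_attr, 2),
        "pct_numeric": round(100 * n_numeric / n_attr, 2),
    }


if __name__ == "__main__":
    print(summarize(FEATURES))
```

The totals (21 attributes, 1 numeric, 20 nominal, 6 binary, 2288 missing values) agree with the property list for this dataset.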

19 properties

Number of instances (rows) of the dataset: 70340
Number of attributes (columns) of the dataset: 21
Number of distinct values of the target attribute (if it is nominal): 3
Number of missing values in the dataset: 2288
Number of instances with at least one value missing: 957
Number of numeric attributes: 1
Number of nominal attributes: 20
Number of instances belonging to the most frequent class: 34382
Percentage of instances belonging to the least frequent class: 4.98%
Number of instances belonging to the least frequent class: 3506
Number of binary attributes: 6
Percentage of binary attributes: 28.57%
Percentage of instances having missing values: 1.36%
Average class difference between consecutive instances: 1
Percentage of missing values: 0.15%
Number of attributes divided by the number of instances: 0 (about 0.0003 before rounding)
Percentage of numeric attributes: 4.76%
Percentage of instances belonging to the most frequent class: 48.88%
Percentage of nominal attributes: 95.24%
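The percentage properties are simple ratios of the listed counts. The sketch below recomputes them as a consistency check; rounding to two decimals is an assumption based on how the values are displayed. Because the target has 3 classes, the count of the remaining class also follows by subtraction.

```python
# Counts copied from the property list above.
N_INSTANCES = 70340
N_ATTRIBUTES = 21
MISSING_VALUES = 2288
ROWS_WITH_MISSING = 957
MOST_FREQUENT = 34382
LEAST_FREQUENT = 3506


def pct(part: float, whole: float) -> float:
    """Percentage rounded to two decimals (assumed to match the listed values)."""
    return round(100 * part / whole, 2)


derived = {
    "pct_most_frequent": pct(MOST_FREQUENT, N_INSTANCES),
    "pct_least_frequent": pct(LEAST_FREQUENT, N_INSTANCES),
    "pct_rows_with_missing": pct(ROWS_WITH_MISSING, N_INSTANCES),
    # Missing cells relative to all cells (instances x attributes).
    "pct_missing_values": pct(MISSING_VALUES, N_INSTANCES * N_ATTRIBUTES),
    # Listed as 0 because the unrounded ratio is tiny.
    "attributes_per_instance": N_ATTRIBUTES / N_INSTANCES,
    # With 3 target classes, the third class holds all remaining rows.
    "middle_class_count": N_INSTANCES - MOST_FREQUENT - LEAST_FREQUENT,
}

if __name__ == "__main__":
    for name, value in derived.items():
        print(name, value)
```

Running it gives 48.88, 4.98, 1.36 and 0.15 for the percentages, roughly 0.0003 attributes per instance, and 32452 instances in the remaining class.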

10 tasks

0 runs - estimation_procedure: 10% Holdout set - target_feature: Violation.Type
0 runs - estimation_procedure: 33% Holdout set - target_feature: Violation.Type
0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: predictive_accuracy - target_feature: Violation.Type
0 runs - estimation_procedure: 50 times Clustering (seven separate tasks with this procedure)
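To illustrate what the holdout and 10-fold cross-validation procedures mean, here is a sketch of plain random splits over row indices. It is not OpenML's implementation: the official train/test splits are fixed per task by OpenML (and may be stratified), so those should be used for comparable results. The function names and seeds are hypothetical.

```python
import random


def holdout_split(n: int, test_fraction: float, seed: int = 0):
    """Shuffle row indices and reserve round(test_fraction * n) of them for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(test_fraction * n)
    return idx[n_test:], idx[:n_test]  # (train, test)


def kfold_splits(n: int, k: int = 10, seed: int = 0):
    """Split shuffled indices into k disjoint folds; each fold is the test set once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        # Training rows are all rows from the other k - 1 folds.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    train, test = holdout_split(70340, 0.33, seed=42)
    print(len(train), len(test))  # 47128 23212
    print([len(t) for _, t in kfold_splits(70340, k=10)])
```

With 70340 rows, a 33% holdout reserves 23212 rows, and each of the 10 folds holds 7034 rows.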