Used-Cars-Dataset

Status: active. Format: ARFF. License: CC0: Public Domain. Visibility: public. Uploaded 23-03-2022 by Dustin Carrion.
Tags: Computer Systems, Machine Learning

Context

Craigslist is the world's largest collection of used vehicles for sale, yet it is very difficult to collect all of the listings in one place. I built a scraper for a school project and later expanded it to create this dataset, which includes every used vehicle entry on Craigslist within the United States.

Content

The data is scraped every few months. It contains most of the relevant information that Craigslist provides on car sales, including columns such as price, condition, manufacturer, and latitude/longitude, plus 18 other categories. For ML projects, consider feature engineering on the location columns (lat and long). For previous listings, check older versions of the dataset. See https://github.com/AustinReese/UsedVehicleSearch
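The description suggests feature engineering on the location columns without saying how. One possible approach, sketched below, derives two features from lat and long: the great-circle distance to a reference point and a coarse grid cell. The function names, the reference point, and the 1-degree cell size are illustrative assumptions and do not come from the dataset author. Rows with a missing coordinate (6549 in this dataset) yield None.

```python
import math
from typing import Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a common approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def grid_cell(lat: float, lon: float, cell_deg: float = 1.0) -> Tuple[int, int]:
    """Index of the cell_deg x cell_deg grid cell containing the point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def location_features(
    lat: Optional[float],
    lon: Optional[float],
    ref_lat: float,
    ref_lon: float,
    cell_deg: float = 1.0,
) -> Optional[dict]:
    """Derive location features for one listing; None if a coordinate is missing."""
    if lat is None or lon is None or math.isnan(lat) or math.isnan(lon):
        return None
    return {
        "dist_to_ref_km": haversine_km(lat, lon, ref_lat, ref_lon),
        "grid_cell": grid_cell(lat, lon, cell_deg),
    }
```

A grid cell gives a model a categorical notion of "same area" that is finer than state, while the distance feature is continuous. Both are only starting points; the right reference points and cell size depend on the task.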

25 features

Feature        Type      Unique values   Missing
id (ignore)    numeric   426880          0
url            string    426880          0
region         string    404             0
region_url     string    413             0
price          numeric   15655           0
year           numeric   114             1205
manufacturer   string    42              17646
model          string    29667           5277
condition      string    6               174104
cylinders      string    8               177678
fuel           string    5               3013
odometer       numeric   104870          4400
title_status   string    6               8242
transmission   string    3               2556
VIN            string    118264          161042
drive          string    3               130567
size           string    4               306361
type           string    13              92858
paint_color    string    12              130203
image_url      string    241899          68
description    string    360911          70
county         numeric   0               426880
state          string    51              0
lat            numeric   53181           6549
long           numeric   53772           6549
posting_date   string    381536          68
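The missing counts above vary widely, so a first preprocessing step is often to rank features by their missing fraction. The sketch below copies the counts from the feature list and divides by the 426880 rows. The 0.5 threshold and the helper name columns_above are illustrative assumptions, not a recommendation from the dataset author.

```python
N_ROWS = 426880  # number of instances reported for this dataset

# Missing-value counts copied from the feature list above
# (26 entries, including the id column that is marked "ignore").
MISSING = {
    "id": 0, "url": 0, "region": 0, "region_url": 0, "price": 0,
    "year": 1205, "manufacturer": 17646, "model": 5277,
    "condition": 174104, "cylinders": 177678, "fuel": 3013,
    "odometer": 4400, "title_status": 8242, "transmission": 2556,
    "VIN": 161042, "drive": 130567, "size": 306361, "type": 92858,
    "paint_color": 130203, "image_url": 68, "description": 70,
    "county": 426880, "state": 0, "lat": 6549, "long": 6549,
    "posting_date": 68,
}

# Fraction of rows in which each feature is missing.
missing_fraction = {name: count / N_ROWS for name, count in MISSING.items()}


def columns_above(threshold: float) -> list:
    """Feature names whose missing fraction exceeds threshold, most-missing first."""
    names = [n for n, frac in missing_fraction.items() if frac > threshold]
    return sorted(names, key=lambda n: missing_fraction[n], reverse=True)


# Hypothetical cutoff: flag features that are missing in more than half the rows.
mostly_missing = columns_above(0.5)
```

With this cutoff, only county and size are flagged. The county column is missing in every row, which is also why every instance in the dataset has at least one missing value.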

19 properties

Number of instances (rows) of the dataset: 426880
Number of attributes (columns) of the dataset: 25
Number of distinct values of the target attribute (if it is nominal): not reported
Number of missing values in the dataset: 1655336
Number of instances with at least one value missing: 426880
Number of numeric attributes: 6
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 24
Percentage of instances belonging to the most frequent class: not reported
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: not reported
Percentage of instances belonging to the least frequent class: not reported
Number of instances belonging to the least frequent class: not reported
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 100
Average class difference between consecutive instances: not reported
Percentage of missing values: 15.51
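The percentage properties can be checked against the raw counts with a few lines of arithmetic. This is a minimal sketch using only the values reported on this page. The variable names are illustrative, and the formulas are the natural reading of each property description rather than a confirmed definition.

```python
n_rows = 426880      # number of instances
n_cols = 25          # number of attributes
n_missing = 1655336  # number of missing values in the dataset
n_numeric = 6        # number of numeric attributes

# Missing cells as a share of all cells (rows x columns).
pct_missing = 100 * n_missing / (n_rows * n_cols)

# Numeric attributes as a share of all attributes.
pct_numeric = 100 * n_numeric / n_cols

# Attributes per instance; tiny here, so it displays as 0 after rounding.
attrs_per_instance = n_cols / n_rows

print(f"{pct_missing:.2f}", pct_numeric, f"{attrs_per_instance:.2f}")
```

Rounded to two decimals, the results agree with the reported 15.51, 24, and 0.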
