OpenML
Aircraft-Pricing-Dataset

Status: active. Format: ARFF. License: CC0: Public Domain. Visibility: public. Uploaded 24-03-2022 by Dustin Carrion.


For a more comprehensive dataset with many more features, see the "Yacht/Motorboat Pricing Data (10,000+ listings)" dataset: https://www.kaggle.com/artemkorottchenko/large-boatyacht-pricing-dataset

Context

What are the most important features in determining the price of a new or used aircraft? Is it the aircraft type? Year? Manufacturer? Other characteristics? This is one of many questions about the new and used aircraft markets that I hope to answer with this dataset. The dataset contains over 2000 aircraft that are for sale around the world. The data was scraped in July 2020.

Content

The data was scraped from various websites using the Scrapy framework for Python. Scrapy script: https://github.com/akorott/Aircraft-Scrapy-Script.git

Content scraped: New/Used, Price, Currency (USD, EUR, GBP), Category, Year, Make, Model, Location, Serial number, Registration number, Total hours, Engine 1 hours, Engine 2 hours, Prop 1 hours, Prop 2 hours, Total Seats, Flight Rules, National Origin.

Keep in mind that the data was scraped from 2 different sources. Some fields (New/Used, Engine 1 hours, Engine 2 hours, Prop 1 hours, Prop 2 hours, Total Seats, Flight Rules) were only easily accessible on one source and are therefore missing for part of the dataset.

FAQ

Flight Rules: Visual Flight Rules (VFR) vs. Instrument Flight Rules (IFR). In a nutshell, an IFR-equipped aircraft is one that a pilot can navigate fully using the instruments in the cockpit. In the United States, for example, aircraft flying at or above 18,000 feet must, by law, operate under IFR and therefore need IFR equipment.

BTH - Beyond the Horizon. According to my research, BTH means that an aircraft is equipped with a radar but doesn't fully meet IFR criteria.

VFR - https://en.wikipedia.org/wiki/Visual_flight_rules
IFR - https://en.wikipedia.org/wiki/Instrument_flight_rules

Some of the acronyms used within the total hours, engine 1, engine 2, prop 1, and prop 2 columns:

SMOH - Since major overhaul
SNEW - Since new
SPOH - Since prop overhaul
SFOH - Since factory overhaul (more reliable)
SOH - Since overhaul
STOH - Since top overhaul
SFRM - Since factory re-manufactured

Thank you for checking out this dataset and happy kaggling!
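Because the hours columns are stored as strings that can carry one of the acronyms above, a small parser can help split a cell into a number and a qualifier. The following is only a sketch: the cell format (for example "1,250 SMOH") is an assumption, since the page does not show sample values, and the names `parse_hours` and `HoursReading` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Acronyms taken from the FAQ in the dataset description.
HOUR_QUALIFIERS = {
    "SMOH": "since major overhaul",
    "SNEW": "since new",
    "SPOH": "since prop overhaul",
    "SFOH": "since factory overhaul",
    "SOH": "since overhaul",
    "STOH": "since top overhaul",
    "SFRM": "since factory re-manufactured",
}


class HoursReading(NamedTuple):
    hours: Optional[float]      # numeric part, or None if no number was found
    qualifier: Optional[str]    # upper-cased acronym, or None if absent


# First number in the cell, allowing thousands separators and decimals.
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")
# Any known acronym that is not embedded in a longer run of letters.
_QUALIFIER = re.compile(
    r"(?<![A-Za-z])(" + "|".join(HOUR_QUALIFIERS) + r")(?![A-Za-z])",
    re.IGNORECASE,
)


def parse_hours(raw: Optional[str]) -> HoursReading:
    """Split a raw hours cell into (hours, qualifier); assumed format only."""
    if raw is None or not raw.strip():
        return HoursReading(None, None)
    number = _NUMBER.search(raw)
    qualifier = _QUALIFIER.search(raw)
    hours = float(number.group().replace(",", "")) if number else None
    return HoursReading(hours, qualifier.group(1).upper() if qualifier else None)


if __name__ == "__main__":
    for cell in ["1,250 SMOH", "320 snew", "", "SOH"]:
        print(repr(cell), "->", parse_hours(cell))
```

Cells that do not follow the assumed pattern come back with `None` in one or both fields, so they can be inspected by hand rather than silently misread.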

18 features

Condition (string): 3 unique values, 769 missing
Price (string): 1148 unique values, 0 missing
Currency (string): 5 unique values, 552 missing
Category (string): 13 unique values, 0 missing
Year (string): 94 unique values, 0 missing
Make (string): 187 unique values, 0 missing
Model (string): 1020 unique values, 0 missing
Location (string): 1007 unique values, 12 missing
S/N (string): 1824 unique values, 3 missing
REG (string): 2029 unique values, 2 missing
Total_Hours (string): 1832 unique values, 97 missing
Engine_1_Hours (string): 1252 unique values, 948 missing
Engine_2_Hours (string): 352 unique values, 2149 missing
Prop_1_Hours (string): 913 unique values, 1365 missing
Prop_2_Hours (string): 243 unique values, 2267 missing
Total_Seats (numeric): 15 unique values, 1378 missing
Flight_Rules (string): 3 unique values, 1662 missing
National_Origin (string): 26 unique values, 8 missing
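To see which features are usable without imputation, the missing counts above can be turned into per-column missing shares. This is a minimal sketch that hard-codes the counts listed on this page and assumes the 2530 rows reported in the dataset properties; the names `MISSING`, `N_ROWS`, and `missing_share` are illustrative.

```python
# Missing-value counts per feature, copied from the feature list on this page.
MISSING = {
    "Condition": 769, "Price": 0, "Currency": 552, "Category": 0,
    "Year": 0, "Make": 0, "Model": 0, "Location": 12, "S/N": 3,
    "REG": 2, "Total_Hours": 97, "Engine_1_Hours": 948,
    "Engine_2_Hours": 2149, "Prop_1_Hours": 1365, "Prop_2_Hours": 2267,
    "Total_Seats": 1378, "Flight_Rules": 1662, "National_Origin": 8,
}
N_ROWS = 2530  # number of instances reported in the dataset properties


def missing_share(counts: dict, n_rows: int) -> dict:
    """Return the percentage of rows with a missing value for each column."""
    return {name: round(100 * count / n_rows, 2) for name, count in counts.items()}


shares = missing_share(MISSING, N_ROWS)
# Columns sorted from most to least incomplete.
ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for name, pct in ranked:
        print(f"{name:<16} {pct:6.2f}% missing")
    print("Total missing values:", sum(MISSING.values()))
```

The per-column counts sum to the 11212 missing values reported in the properties, and the second-engine and second-propeller columns are the most sparsely populated.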

19 properties

Number of instances (rows) of the dataset: 2530
Number of attributes (columns) of the dataset: 18
Number of distinct values of the target attribute (if it is nominal): (no value reported)
Number of missing values in the dataset: 11212
Number of instances with at least one value missing: 2397
Number of numeric attributes: 1
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0.01
Percentage of numeric attributes: 5.56
Percentage of instances belonging to the most frequent class: (no value reported)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value reported)
Percentage of instances belonging to the least frequent class: (no value reported)
Number of instances belonging to the least frequent class: (no value reported)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 94.74
Average class difference between consecutive instances: (no value reported)
Percentage of missing values: 24.62
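The percentage properties follow from the raw counts, which makes them easy to sanity-check. The sketch below recomputes them from the reported counts; the variable names are illustrative, and rounding to two decimals is an assumption about how the page displays values.

```python
# Raw counts as reported in the properties on this page.
n_rows = 2530
n_cols = 18
n_missing_values = 11212
n_rows_with_missing = 2397
n_numeric = 1

# Share of all cells (rows x columns) that are empty.
pct_missing_values = round(100 * n_missing_values / (n_rows * n_cols), 2)
# Share of rows that have at least one empty cell.
pct_rows_with_missing = round(100 * n_rows_with_missing / n_rows, 2)
# Share of columns that are numeric.
pct_numeric = round(100 * n_numeric / n_cols, 2)
# Attributes per instance (dimensionality).
dimensionality = round(n_cols / n_rows, 2)

if __name__ == "__main__":
    print("Percentage of missing values:", pct_missing_values)
    print("Percentage of instances having missing values:", pct_rows_with_missing)
    print("Percentage of numeric attributes:", pct_numeric)
    print("Attributes divided by instances:", dimensionality)
```

With these counts the recomputed figures agree with the reported 24.62, 94.74, 5.56, and 0.01.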

0 tasks
