diamonds

Status: active. Format: ARFF. Visibility: public (publicly available). Uploaded 22-12-2022 by Sebastian Fischer.
Description

This classic dataset originally contained the prices and other attributes of almost 54,000 diamonds. However, 14184 of those seem to be the same diamonds, measured from a different angle. This can be found by checking for duplicated values when disregarding the variables x, y, z, depth and table, which depend on the angle.

Attribute Information

1. *price* - price in US dollars ($326 to $18,823), target feature
2. *carat* - weight of the diamond (0.2 to 5.01)
3. *cut* - quality of the cut (Fair, Good, Very Good, Premium, Ideal)
4. *color* - diamond colour, from J (worst) to D (best)
5. *clarity* - a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
6. *x* - length in mm (0 to 10.74)
7. *y* - width in mm (0 to 58.9)
8. *z* - depth in mm (0 to 31.8)
9. *depth* - total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43 to 79)
10. *table* - width of top of diamond relative to widest point (43 to 95)
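The duplicate check mentioned in the description can be sketched as follows. This is a minimal illustration, not the uploader's actual procedure: the function name `count_angle_duplicates` and the sample rows are made up, and it assumes each row is a plain dict keyed by the attribute names listed above. It counts every row that repeats an earlier one once x, y, z, depth and table are ignored; whether the figure of 14184 was counted this way is not stated in the source.

```python
from collections import Counter

# Columns that the description says depend on the measurement angle.
ANGLE_DEPENDENT = ("x", "y", "z", "depth", "table")


def count_angle_duplicates(rows):
    """Count rows that repeat another row once angle-dependent columns are ignored."""
    seen = Counter()
    for row in rows:
        # Build a hashable key from the remaining columns (price, carat, cut, ...).
        key = tuple(sorted((k, v) for k, v in row.items() if k not in ANGLE_DEPENDENT))
        seen[key] += 1
    # Every occurrence beyond the first in a group is an extra copy.
    return sum(n - 1 for n in seen.values())


# Made-up rows for illustration only (not taken from the dataset).
rows = [
    {"price": 500, "carat": 0.3, "cut": "Ideal", "color": "E", "clarity": "SI1",
     "depth": 61.5, "table": 55.0, "x": 4.30, "y": 4.32, "z": 2.65},
    # Same stone attributes, slightly different measurements.
    {"price": 500, "carat": 0.3, "cut": "Ideal", "color": "E", "clarity": "SI1",
     "depth": 61.7, "table": 56.0, "x": 4.32, "y": 4.30, "z": 2.66},
    # A different stone.
    {"price": 900, "carat": 0.5, "cut": "Good", "color": "J", "clarity": "VS2",
     "depth": 63.0, "table": 58.0, "x": 5.00, "y": 5.02, "z": 3.16},
]

print(count_angle_duplicates(rows))  # 1
```

Grouping on a tuple of the remaining columns keeps the check exact; a tolerance-based comparison would be a different technique and is not what the description describes.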

10 features

price (target): numeric, 11602 unique values, 0 missing
carat: numeric, 273 unique values, 0 missing
cut: nominal, 5 unique values, 0 missing
color: nominal, 7 unique values, 0 missing
clarity: nominal, 8 unique values, 0 missing
depth: numeric, 184 unique values, 0 missing
table: numeric, 127 unique values, 0 missing
x: numeric, 554 unique values, 0 missing
y: numeric, 552 unique values, 0 missing
z: numeric, 375 unique values, 0 missing
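The relation between depth and x, y, z given in the description, 2 * z / (x + y), can be written as a small helper. This is a sketch under two assumptions that the source does not state outright: the result is multiplied by 100, since the listed depth range is 43 to 79, and rows with x + y equal to 0 (the listed ranges for x and y start at 0) are treated as undefined. The function name `depth_percentage` and the sample measurements are hypothetical.

```python
def depth_percentage(x, y, z):
    """Total depth percentage: z divided by the mean of x and y, times 100.

    Returns None when x + y is 0, because the ratio is undefined there.
    """
    if x + y == 0:
        return None
    # Factor of 100 is an assumption based on the listed range of 43 to 79.
    return 100 * 2 * z / (x + y)


# Made-up measurements in mm, for illustration only.
print(depth_percentage(4.0, 4.0, 2.5))  # 62.5
print(depth_percentage(0.0, 0.0, 0.0))  # None
```

Values computed this way need not match the recorded depth column exactly, since the recorded measurements are rounded.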

19 properties

Number of instances (rows) of the dataset: 53940
Number of attributes (columns) of the dataset: 10
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 7
Number of nominal attributes: 3
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: -22.56
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 70
Percentage of nominal attributes: 30
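Several of the derived properties follow directly from the counts listed here. The arithmetic below is a sketch using only the figures on this page; the variable names are illustrative, and the reading that the attributes-to-instances ratio appears as 0 because of display rounding is an assumption.

```python
# Counts taken from the properties listed on this page.
n_instances = 53940
n_attributes = 10
n_numeric = 7
n_nominal = 3

pct_numeric = 100 * n_numeric / n_attributes  # share of numeric attributes
pct_nominal = 100 * n_nominal / n_attributes  # share of nominal attributes
dimensionality = n_attributes / n_instances   # attributes per instance

print(pct_numeric)               # 70.0
print(pct_nominal)               # 30.0
print(round(dimensionality, 6))  # 0.000185
print(round(dimensionality, 2))  # 0.0
```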

1 task

0 runs - estimation_procedure: 10-fold cross-validation - target_feature: price
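For readers unfamiliar with the estimation procedure named in the task, 10-fold cross-validation on the 53940 rows can be sketched as below. This is a generic illustration using only the Python standard library; the function name `k_fold_indices`, the shuffling and the seed are assumptions, and the splits the task actually uses are not described on this page.

```python
import random


def k_fold_indices(n_rows, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_rows))
    # Shuffle once with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(53940, k=10))
# Number of folds, training rows and test rows in the first fold.
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 48546 5394
```

Each row lands in exactly one test fold, so a model evaluated this way predicts price for every diamond exactly once.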