Glass-Classification

Status: active | Format: ARFF | License: Database: Open Database, Contents: Database Contents | Visibility: public | Uploaded 24-03-2022 by Dustin Carrion


Context
This is the Glass Identification Data Set from UCI. It contains 10 attributes, including the Id. The response is the glass type (discrete, 7 values).

Content
Attribute Information:
Id number: 1 to 214 (removed from CSV file)
RI: refractive index
Na: Sodium (unit of measurement: weight percent in corresponding oxide, as are attributes 4-10)
Mg: Magnesium
Al: Aluminum
Si: Silicon
K: Potassium
Ca: Calcium
Ba: Barium
Fe: Iron
Type of glass (class attribute):
-- 1 building_windows_float_processed
-- 2 building_windows_non_float_processed
-- 3 vehicle_windows_float_processed
-- 4 vehicle_windows_non_float_processed (none in this database)
-- 5 containers
-- 6 tableware
-- 7 headlamps

Acknowledgements
https://archive.ics.uci.edu/ml/datasets/Glass+Identification

Source:
Creator: B. German, Central Research Establishment, Home Office Forensic Science Service, Aldermaston, Reading, Berkshire RG7 4PN
Donor: Vina Spiehler, Ph.D., DABFT, Diagnostic Products Corporation, (213) 776-0180 (ext 3014)

Inspiration
Data exploration of this dataset reveals two important characteristics:
1) The variables are highly correlated with each other, including the response variable. Which kind of ML algorithm is most suitable for this dataset: Random Forest, KNN, or something else? Also, since the dataset is very small, is there any chance of applying PCA, or should it be avoided completely?
2) Highly skewed data: is scaling sufficient, or should other techniques be applied to normalize the data, such as a Box-Cox power transformation?
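The Box-Cox question can be made concrete with a minimal sketch, written here in pure Python. The `box_cox` and `skewness` helpers and the `sample` values are illustrative assumptions and are not taken from the dataset. Box-Cox is only defined for strictly positive inputs, so any column containing zeros would need a shift or a different transform such as Yeo-Johnson.

```python
import math


def box_cox(values, lam):
    """Box-Cox power transform; requires strictly positive inputs."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox needs strictly positive values")
    if lam == 0:
        # The lam -> 0 limit of (v**lam - 1) / lam is the natural log.
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def skewness(values):
    """Biased sample skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed values, NOT taken from the dataset.
sample = [0.1, 0.2, 0.2, 0.3, 0.5, 0.6, 1.0, 2.5, 6.0]
transformed = box_cox(sample, lam=0.0)

# Compare skewness before and after the transform.
print(skewness(sample), skewness(transformed))
```

In practice the value of lambda would usually be estimated from the data rather than fixed by hand. Whether the transform helps also depends on the model: tree-based methods are largely insensitive to monotonic rescaling, while distance-based methods such as KNN are not.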

10 features

RI: numeric, 178 unique values, 0 missing
Na: numeric, 142 unique values, 0 missing
Mg: numeric, 94 unique values, 0 missing
Al: numeric, 118 unique values, 0 missing
Si: numeric, 133 unique values, 0 missing
K: numeric, 65 unique values, 0 missing
Ca: numeric, 143 unique values, 0 missing
Ba: numeric, 34 unique values, 0 missing
Fe: numeric, 32 unique values, 0 missing
Type: numeric, 6 unique values, 0 missing
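Because Type is stored as a numeric column, it may help to map its codes back to the class names given in the description. The following is a minimal sketch that assumes a CSV export with the ten columns in the order listed above. `COLUMNS`, `GLASS_TYPES`, and `parse_row` are hypothetical names, the underscore spelling of the class names is an assumption, and the sample row is invented for illustration.

```python
import csv
import io

# Column order as listed on this page (the Id column is not included).
COLUMNS = ["RI", "Na", "Mg", "Al", "Si", "K", "Ca", "Ba", "Fe", "Type"]

# Class codes from the description; code 4 has no instances in this database.
GLASS_TYPES = {
    1: "building_windows_float_processed",
    2: "building_windows_non_float_processed",
    3: "vehicle_windows_float_processed",
    4: "vehicle_windows_non_float_processed",
    5: "containers",
    6: "tableware",
    7: "headlamps",
}


def parse_row(fields):
    """Turn one row of strings into a dict with float features and a named class."""
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    record = {name: float(value) for name, value in zip(COLUMNS[:-1], fields[:-1])}
    code = int(float(fields[-1]))  # tolerate "1" as well as "1.0"
    record["Type"] = code
    record["TypeName"] = GLASS_TYPES[code]
    return record


# Hypothetical row for illustration only, not copied from the dataset.
sample_csv = "1.52,13.5,3.6,1.3,72.7,0.5,8.8,0.0,0.0,1\n"
row = next(csv.reader(io.StringIO(sample_csv)))
record = parse_row(row)
print(record["TypeName"])
```

Treating Type as a category rather than a number matters for modeling, since the codes are labels with no meaningful order.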

19 properties

Number of instances (rows) of the dataset: 214
Number of attributes (columns) of the dataset: 10
Number of distinct values of the target attribute (if it is nominal): (no value listed)
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 10
Number of nominal attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: (no value listed)
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.05
Percentage of numeric attributes: 100
Percentage of instances belonging to the most frequent class: (no value listed)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
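The derived properties follow directly from the counts on this page. As a quick check, the sketch below recomputes the attributes-to-instances ratio and the percentage of numeric attributes; the variable names are illustrative.

```python
# Counts as listed in the properties on this page.
n_instances = 214
n_attributes = 10
n_numeric = 10

# Number of attributes divided by the number of instances.
dimensionality = n_attributes / n_instances

# Percentage of numeric attributes.
pct_numeric = 100 * n_numeric / n_attributes

print(round(dimensionality, 2), pct_numeric)
```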
