Comprehensive-database-of-Minerals

Status: active. Format: ARFF. License: CC0: Public Domain. Visibility: public. Uploaded 23-03-2022 by Elif Ceren Gok.

This dataset is a collection of 3112 minerals with their chemical compositions, crystal structures, and physical and optical properties. The properties included in this database are Crystal Structure, Mohs Hardness, Refractive Index, Optical Axes, Optical Dispersion, Molar Volume, Molar Mass, Specific Gravity, and Calculated Density.

Introduction
The term dielectric is applied to a class of materials, usually solids, that are poor conductors of electricity. Dielectrics are of significant technological and industrial importance, being essential functional components of almost all electronic devices. For most of these applications, they must be mechanically tough and thermally robust.

The defining physical attribute of a dielectric is its electric polarizability: the tendency of its charge distribution, for example the electrons in a chemical bond, to be displaced by an applied electric field. Most dielectrics contain dipoles because of their ionic bonds or covalent bonds with strong ionic character. At a macroscopic scale, this means that an external electric field can interact with these charges and produce various optical and electrical phenomena.

Optically, dielectrics can be transparent, translucent, or opaque. They can also be optically isotropic, uniaxial, or biaxial. The luster of gem minerals such as emerald, sapphire, and ruby is due to their high refractive index, while the splitting of white light into its component colors is caused by dispersion, the variation of the refractive index with wavelength. A material with two refractive indices can split an incident beam into two rays with perpendicular polarizations; this common phenomenon is called birefringence. These effects are exploited in many commercially important applications, such as transparent conductive oxides, liquid crystal displays, medical diagnostics, stress sensing, and light modulation. As an example, transparent conducting oxides (TCOs) are derived from dielectrics by doping oxides with impurity atoms.
TCOs absorb little light in the visible spectrum, which makes them transparent, yet they also conduct charge. Their most important application is as the top electrode of solar cells, where they let light reach a semiconducting layer while collecting the released charge carriers (holes and electrons) to generate current. Airplane windshields carry a thin TCO coating through which a current is passed to generate heat; this keeps the glass defrosted so that the pilot has the visibility to navigate. Other applications of TCOs include substrates in electronics, flexible displays, high-definition TVs, and the screens of mobile smart devices. The figure of merit for optical phenomena is the refractive index, defined as the ratio of the speed of light in vacuum to the speed of light in the medium.

Provenance of Data
The list of minerals with individual pages in Wikipedia is given at: https://en.wikipedia.org/wiki/List_of_minerals. The get method of the requests library is used to retrieve this page, and the content is parsed with BeautifulSoup, a Python library for parsing HTML and XML content. The URLs of all the minerals listed on this page are extracted from their href attributes and stored in a dictionary along with the mineral names. Each of these webpages has textual information on the mineral (origin, etymology, variety, history, etc.), images (cleavages and other data), and an infobox on the right that tabulates common mineral properties such as category, formula, Strunz classification, crystal structure, unit cell, Mohs hardness, color, cleavage, fracture, luster, diaphaneity, specific gravity, optical properties, and refractive index. The soup object for each page is retrieved and the table element with class name infobox is extracted. The row headings and row data are then read into a dictionary, which is wrapped in a class object.
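The infobox step can be sketched with Python's standard-library html.parser in place of BeautifulSoup, so that the example runs without third-party packages. This is a minimal sketch, not the author's code: the class names InfoboxParser and MineralRecord, the methods to_csv and to_text, and the sample HTML are hypothetical, and real Wikipedia infoboxes contain nested markup (links, footnotes, line breaks) that this sketch handles only partly.

```python
import csv
from html.parser import HTMLParser


class InfoboxParser(HTMLParser):
    """Collect <th>/<td> pairs from the first <table class="infobox ...">."""

    def __init__(self):
        super().__init__()
        self.rows = {}          # row heading -> row data
        self._depth = 0         # table nesting depth inside the infobox
        self._done = False      # only the first infobox is read
        self._cell = None       # "th" or "td" while inside a cell
        self._text = []         # text fragments of the current cell
        self._heading = None    # last heading waiting for its data cell

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self._depth:
                self._depth += 1            # nested table inside the infobox
            elif "infobox" in classes and not self._done:
                self._depth = 1
        elif self._depth and tag in ("th", "td"):
            self._cell, self._text = tag, []

    def handle_endtag(self, tag):
        if tag == "table" and self._depth:
            self._depth -= 1
            self._done = self._depth == 0
        elif self._depth and tag == self._cell:
            # Collapse whitespace; inline tags such as <sub> are ignored,
            # but their text is kept (SiO<sub>2</sub> -> "SiO2").
            text = " ".join("".join(self._text).split())
            if tag == "th":
                self._heading = text
            elif self._heading is not None:
                self.rows[self._heading] = text
                self._heading = None
            self._cell = None

    def handle_data(self, data):
        if self._cell:
            self._text.append(data)


class MineralRecord:
    """Wraps one mineral's infobox dictionary and its page text."""

    def __init__(self, name, properties, text=""):
        self.name = name
        self.properties = properties
        self.text = text

    def to_csv(self, path):
        # One row per infobox entry: mineral, property, value.
        with open(path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["mineral", "property", "value"])
            for key, value in self.properties.items():
                writer.writerow([self.name, key, value])

    def to_text(self, path):
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(self.text)


# In the live pipeline the HTML would come from requests.get(url).text;
# a tiny inline sample keeps this example offline.
sample = """
<table class="infobox"><tr><th>Formula</th><td>SiO<sub>2</sub></td></tr>
<tr><th>Mohs scale hardness</th><td>7</td></tr></table>
"""
parser = InfoboxParser()
parser.feed(sample)
record = MineralRecord("Quartz", parser.rows)
print(record.properties)  # {'Formula': 'SiO2', 'Mohs scale hardness': '7'}
```

Only the first infobox on a page is read, and a heading row without a following data cell (such as a section title) is simply replaced by the next heading.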
A class method writes this data to a CSV file, while another method writes the text of the webpage to a text file.

The American Mineralogist Crystal Structure Database at http://rruff.geo.arizona.edu/AMS/amcsd.php lists over 4000 minerals with their CIF files. The names and URLs of all these minerals are found at http://rruff.geo.arizona.edu/AMS, and each mineral name and its URL are extracted using the approach outlined above. Each mineral's page gives its crystallographic information: the unit-cell edge lengths a, b, c and angles alpha, beta, gamma at the top, followed by a list of all atoms and their x, y, z positions. The header is stored in a pandas DataFrame, while the atomic species and their positions are saved to a separate CSV file. This is repeated for all 4000 minerals. Before they are used in the machine learning stage of this study, the CIF files are read and each is parsed into a vector in which each cell corresponds to an element of the periodic table and holds the number of atoms of that element in the formula. This is detailed further in the data processing part of the project.

Dispersion values are harder to find than the other properties; values for 60 minerals were found at http://gemologyproject.com/wiki. The chemical formula, molar mass, molar volume, and calculated density are available for all minerals; the availability of the other properties varies.

Chemical Formula
The chemical formula has been parsed so that the number of atoms of each element is separated and tabulated. For example, the mineral quartz has the formula 'SiO2', so the corresponding entry in the column 'Silicon' is 1 and the entry for 'Oxygen' is 2. The entries for all the other elements are 0.
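The conversion of a formula string into an element-count vector can be sketched with a small stack-based parser. This is a minimal sketch under stated assumptions, not the project's parser: it handles element symbols, integer or decimal counts, and parenthesized groups such as (SiO4)3, but not the hydrate dot notation, charges, or mixed-site notation like (Fe,Mg) found in many mineral formulas. It also expands polyatomic groups into their elements, whereas the dataset counts some ionic species (Hydroxyl, Hydrated_Water, etc.) in separate columns. The names formula_to_counts, counts_to_row, and the short SYMBOL_TO_NAME table are hypothetical; the table mimics the dataset's full-name column headers for a few elements.

```python
import re
from collections import Counter

# Hypothetical subset of the symbol -> column-name table; a full version
# would list every element column of the dataset.
SYMBOL_TO_NAME = {"H": "Hydrogen", "O": "Oxygen", "Al": "Aluminium",
                  "Si": "Silicon", "Ca": "Calcium", "Fe": "Iron"}

# An element symbol or a parenthesis, optionally followed by a count.
TOKEN = re.compile(r"([A-Z][a-z]?|\(|\))(\d+(?:\.\d+)?)?")


def formula_to_counts(formula):
    """Return a Counter of element symbol -> atoms per formula unit."""
    stack = [Counter()]          # one Counter per open parenthesis level
    pos = 0
    while pos < len(formula):
        m = TOKEN.match(formula, pos)
        if m is None:
            raise ValueError(f"unsupported text in {formula!r} at index {pos}")
        pos = m.end()
        part, count = m.group(1), m.group(2)
        mult = float(count) if count else 1.0
        if part == "(":
            stack.append(Counter())
        elif part == ")":
            if len(stack) == 1:
                raise ValueError(f"unbalanced ')' in {formula!r}")
            group = stack.pop()
            for symbol, n in group.items():
                stack[-1][symbol] += n * mult   # distribute the group count
        else:
            stack[-1][part] += mult
    if len(stack) != 1:
        raise ValueError(f"unbalanced '(' in {formula!r}")
    return stack[0]


def counts_to_row(counts):
    """Map symbols to column names; elements absent from the formula get 0."""
    row = {name: 0.0 for name in SYMBOL_TO_NAME.values()}
    for symbol, n in counts.items():
        row[SYMBOL_TO_NAME[symbol]] = n   # KeyError for symbols not in the table
    return row


print(counts_to_row(formula_to_counts("SiO2")))
# {'Hydrogen': 0.0, 'Oxygen': 2.0, 'Aluminium': 0.0, 'Silicon': 1.0, 'Calcium': 0.0, 'Iron': 0.0}
print(dict(formula_to_counts("Ca3Al2(SiO4)3")))
# {'Ca': 3.0, 'Al': 2.0, 'Si': 3.0, 'O': 12.0}
```

Counts are stored as floats because some mineral formulas carry non-integer site occupancies; an unsupported character raises ValueError instead of being silently skipped.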
In this way, the chemical formula of each mineral is converted into a vector in which each column corresponds to an element of the periodic table and the value is the number of atoms of that element in one formula unit of the mineral. In addition to the pure elements, ionic species such as carbonate, phosphate, nitrate, cyanide, and hydrated water are counted separately.

Molar Mass
The molar mass of the mineral is calculated by adding together the masses of all the atoms in one mole of the mineral:
$M = \sum_i N_i m_i$
where $N_i$ is the number of atoms of element $i$ per formula unit and $m_i$ is the atomic mass of element $i$.

Molar Volume
The molar volume of the mineral is calculated by adding together the volumes of all the atoms in one mole of the mineral:
$V_m = \sum_i N_i V_i$
where $V_i$ is the volume contributed by one mole of atoms of element $i$.

Refractive Index
The refractive index of the mineral is defined as the ratio of the speed of light in free space, $c$, to the speed of light in the mineral, $v$: $n = c/v$. It depends on the frequency of the light, so the RI for blue light differs from the RI for red light in the same mineral. This variation is measured by the dispersion.

Mohs Hardness
Mohs hardness is a qualitative measure of the hardness of a mineral that is frequently used by geologists. Diamond (the hardest mineral) is given the highest value, 10, and talc (the softest mineral) is given the value 1. A mineral that can scratch a second mineral has a higher Mohs hardness, so all minerals can be ranked on a relative scale of hardness. It is not clear exactly which physical parameter the Mohs hardness represents. Several quantitative mechanical properties, such as toughness and yield strength, are known from the mechanics of materials, but none of them corresponds exactly to Mohs hardness. Nevertheless, it remains a very intuitive way to understand this physical property of a material.
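The molar mass and molar volume summations described above reduce to one weighted sum over the element counts of a formula unit. This is a minimal sketch with illustrative inputs: the atomic masses are standard atomic weights rounded to three decimals, but the per-atom volumes in the hypothetical ATOMIC_VOLUME table are placeholders, because the description does not state which per-element volumes were used to build Molar_Volume. The helper name weighted_sum is also hypothetical.

```python
# Standard atomic weights in g/mol, rounded to three decimals.
ATOMIC_MASS = {"O": 15.999, "Al": 26.982, "Si": 28.085, "Ca": 40.078}

# Hypothetical per-atom volumes in cm^3/mol: placeholders only, NOT the
# values used to build the dataset's Molar_Volume column.
ATOMIC_VOLUME = {"O": 10.0, "Al": 10.0, "Si": 12.0, "Ca": 26.0}


def weighted_sum(counts, per_atom):
    """Sum of (atoms per formula unit * per-atom value) over all elements."""
    return sum(n * per_atom[symbol] for symbol, n in counts.items())


quartz = {"Si": 1, "O": 2}                          # SiO2
molar_mass = weighted_sum(quartz, ATOMIC_MASS)      # 28.085 + 2 * 15.999
molar_volume = weighted_sum(quartz, ATOMIC_VOLUME)  # uses placeholder volumes
print(f"{molar_mass:.3f} g/mol")  # 60.083 g/mol
```

The counts dictionary has the same shape as the per-element vector described under Chemical Formula, so the same helper serves both sums; only the per-atom table changes.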

140 features

Unnamed:_0 (numeric): 3112 unique values, 0 missing
Name (string): 3112 unique values, 0 missing
Crystal_Structure (numeric): 7 unique values, 0 missing
Mohs_Hardness (numeric): 46 unique values, 0 missing
Diaphaneity (numeric): 4 unique values, 0 missing
Specific_Gravity (numeric): 406 unique values, 0 missing
Optical (numeric): 5 unique values, 0 missing
Refractive_Index (numeric): 367 unique values, 0 missing
Dispersion (numeric): 26 unique values, 0 missing
Hydrogen (numeric): 34 unique values, 0 missing
Helium (numeric): 2 unique values, 0 missing
Lithium (numeric): 6 unique values, 0 missing
Beryllium (numeric): 8 unique values, 0 missing
Boron (numeric): 17 unique values, 0 missing
Carbon (numeric): 13 unique values, 0 missing
Nitrogen (numeric): 7 unique values, 0 missing
Oxygen (numeric): 61 unique values, 0 missing
Fluorine (numeric): 11 unique values, 0 missing
Neon (numeric): 2 unique values, 0 missing
Sodium (numeric): 23 unique values, 0 missing
Magnesium (numeric): 16 unique values, 0 missing
Aluminium (numeric): 19 unique values, 0 missing
Silicon (numeric): 22 unique values, 0 missing
Phosphorus (numeric): 12 unique values, 0 missing
Sulfur (numeric): 15 unique values, 0 missing
Chlorine (numeric): 10 unique values, 0 missing
Argon (numeric): 2 unique values, 0 missing
Potassium (numeric): 11 unique values, 0 missing
Calcium (numeric): 15 unique values, 0 missing
Scandium (numeric): 4 unique values, 0 missing
Titanium (numeric): 10 unique values, 0 missing
Vanadium (numeric): 9 unique values, 0 missing
Chromium (numeric): 6 unique values, 0 missing
Manganese (numeric): 20 unique values, 0 missing
Iron (numeric): 19 unique values, 0 missing
Cobalt (numeric): 5 unique values, 0 missing
Nickel (numeric): 6 unique values, 0 missing
Copper (numeric): 12 unique values, 0 missing
Zinc (numeric): 10 unique values, 0 missing
Gallium (numeric): 5 unique values, 0 missing
Germanium (numeric): 4 unique values, 0 missing
Arsenic (numeric): 10 unique values, 0 missing
Selenium (numeric): 5 unique values, 0 missing
Bromine (numeric): 3 unique values, 0 missing
Krypton (numeric): 2 unique values, 0 missing
Rubidium (numeric): 5 unique values, 0 missing
Strontium (numeric): 8 unique values, 0 missing
Yttrium (numeric): 10 unique values, 0 missing
Zirconium (numeric): 8 unique values, 0 missing
Niobium (numeric): 5 unique values, 0 missing
Molybdenum (numeric): 5 unique values, 0 missing
Technetium (numeric): 2 unique values, 0 missing
Ruthenium (numeric): 1 unique value, 0 missing
Rhodium (numeric): 1 unique value, 0 missing
Palladium (numeric): 2 unique values, 0 missing
Silver (numeric): 2 unique values, 0 missing
Cadmium (numeric): 4 unique values, 0 missing
Indium (numeric): 2 unique values, 0 missing
Tin (numeric): 4 unique values, 0 missing
Antimony (numeric): 9 unique values, 0 missing
Tellurium (numeric): 5 unique values, 0 missing
Iodine (numeric): 6 unique values, 0 missing
Xenon (numeric): 2 unique values, 0 missing
Cesium (numeric): 5 unique values, 0 missing
Barium (numeric): 8 unique values, 0 missing
Lanthanum (numeric): 6 unique values, 0 missing
Cerium (numeric): 7 unique values, 0 missing
Praseodymium (numeric): 5 unique values, 0 missing
Neodymium (numeric): 6 unique values, 0 missing
Promethium (numeric): 1 unique value, 0 missing
Samarium (numeric): 4 unique values, 0 missing
Europium (numeric): 2 unique values, 0 missing
Gadolinium (numeric): 4 unique values, 0 missing
Terbium (numeric): 2 unique values, 0 missing
Dysprosium (numeric): 2 unique values, 0 missing
Holmium (numeric): 2 unique values, 0 missing
Erbium (numeric): 3 unique values, 0 missing
Thulium (numeric): 2 unique values, 0 missing
Ytterbium (numeric): 3 unique values, 0 missing
Lutetium (numeric): 3 unique values, 0 missing
Hafnium (numeric): 3 unique values, 0 missing
Tantalum (numeric): 5 unique values, 0 missing
Tungsten (numeric): 18 unique values, 0 missing
Rhenium (numeric): 1 unique value, 0 missing
Osmium (numeric): 1 unique value, 0 missing
Iridium (numeric): 1 unique value, 0 missing
Platinum (numeric): 1 unique value, 0 missing
Gold (numeric): 1 unique value, 0 missing
Mercury (numeric): 5 unique values, 0 missing
Thallium (numeric): 5 unique values, 0 missing
Lead (numeric): 15 unique values, 0 missing
Bismuth (numeric): 8 unique values, 0 missing
Polonium (numeric): 1 unique value, 0 missing
Astatine (numeric): 1 unique value, 0 missing
Radon (numeric): 1 unique value, 0 missing
Francium (numeric): 1 unique value, 0 missing
Radium (numeric): 1 unique value, 0 missing
Actinium (numeric): 1 unique value, 0 missing
Thorium (numeric): 3 unique values, 0 missing
Protactinium (numeric): 1 unique value, 0 missing
Uranium (numeric): 10 unique values, 0 missing
Neptunium (numeric): 1 unique value, 0 missing
Plutonium (numeric): 1 unique value, 0 missing
Americium (numeric): 1 unique value, 0 missing
Curium (numeric): 1 unique value, 0 missing
Berkelium (numeric): 1 unique value, 0 missing
Californium (numeric): 1 unique value, 0 missing
Einsteinium (numeric): 1 unique value, 0 missing
Fermium (numeric): 1 unique value, 0 missing
Mendelevium (numeric): 1 unique value, 0 missing
Nobelium (numeric): 1 unique value, 0 missing
Lawrencium (numeric): 1 unique value, 0 missing
Rutherfordium (numeric): 1 unique value, 0 missing
Dubnium (numeric): 1 unique value, 0 missing
Seaborgium (numeric): 1 unique value, 0 missing
Bohrium (numeric): 1 unique value, 0 missing
Hassium (numeric): 1 unique value, 0 missing
Meitnerium (numeric): 1 unique value, 0 missing
Darmstadtium (numeric): 1 unique value, 0 missing
Roentgenium (numeric): 1 unique value, 0 missing
Copernicium (numeric): 1 unique value, 0 missing
Nihonium (numeric): 1 unique value, 0 missing
Flerovium (numeric): 1 unique value, 0 missing
Moscovium (numeric): 2 unique values, 0 missing
Livermorium (numeric): 1 unique value, 0 missing
Tennessine (numeric): 1 unique value, 0 missing
Oganesson (numeric): 1 unique value, 0 missing
Cyanide (numeric): 1 unique value, 0 missing
Nitrate (numeric): 1 unique value, 0 missing
Hydroxyl (numeric): 19 unique values, 0 missing
Acetate (numeric): 1 unique value, 0 missing
Phosphate (numeric): 1 unique value, 0 missing
Sulphate (numeric): 1 unique value, 0 missing
Carbonate (numeric): 1 unique value, 0 missing
Ammonium (numeric): 3 unique values, 0 missing
Hydrated_Water (numeric): 27 unique values, 0 missing
count (numeric): 116 unique values, 0 missing
Molar_Mass (numeric): 2937 unique values, 0 missing
Molar_Volume (numeric): 2903 unique values, 0 missing
Calculated_Density (numeric): 2509 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 3112
Number of attributes (columns) of the dataset: 140
Number of distinct values of the target attribute (if it is nominal): n/a
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 139
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0.04
Percentage of numeric attributes: 99.29
Percentage of instances belonging to the most frequent class: n/a
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: n/a
Percentage of instances belonging to the least frequent class: n/a
Number of instances belonging to the least frequent class: n/a
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: n/a
Percentage of missing values: 0
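The two derived ratios in this list follow directly from the instance and attribute counts. A quick check, assuming the page rounds these values to two decimal places for display:

```python
n_instances = 3112   # rows
n_attributes = 140   # columns
n_numeric = 139      # numeric columns (all but the string column Name)

ratio = n_attributes / n_instances             # 0.0449...
pct_numeric = 100 * n_numeric / n_attributes   # 99.2857...
print(f"{ratio:.2f} {pct_numeric:.2f}")        # 0.04 99.29
```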

0 tasks
