Cosmetics-datasets


Status: active | Format: ARFF | License: GPL 2 | Visibility: public | Uploaded 23-03-2022 by Onur Yildirim
Tags: Computer Systems, Machine Learning


Context

Whenever I want to try a new cosmetic item, it's difficult to choose. Sometimes it's even scary, because new items that I've never tried can end up giving me skin trouble. The information we need is on the back of each product, but those ingredient lists are hard to interpret unless you're a chemist. You may be able to relate to this situation.

Content

We are going to create a content-based recommendation system in which the 'content' is the chemical components of cosmetics. Specifically, we will process the ingredient lists of 1472 cosmetics on Sephora via word embedding, then visualize ingredient similarity using a machine learning method called t-SNE and an interactive visualization library called Bokeh. Let's inspect our data first.

Acknowledgements

DataCamp
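To make the idea of treating ingredients as 'content' concrete, here is a minimal sketch in plain Python. It is not the dataset author's pipeline: it swaps the word-embedding step for a simpler binary bag-of-ingredients matrix and compares products with cosine similarity, leaving out t-SNE and Bokeh entirely. It assumes the Ingredients field is a comma-separated string; the sample ingredient lists and the helper names (`tokenize`, `build_matrix`, `cosine`) are made up for illustration.

```python
import math


def tokenize(ingredients: str) -> list[str]:
    """Split an ingredient list into lowercase tokens (assumes comma separators)."""
    return [tok.strip().lower() for tok in ingredients.split(",") if tok.strip()]


def build_matrix(ingredient_lists: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and a binary product-by-ingredient matrix."""
    docs = [tokenize(text) for text in ingredient_lists]
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok in doc:
            row[index[tok]] = 1  # 1 means the product contains this ingredient
        matrix.append(row)
    return vocab, matrix


def cosine(a: list[int], b: list[int]) -> float:
    """Cosine similarity between two rows; 0.0 if either row is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical ingredient lists, not taken from the dataset.
samples = [
    "Water, Glycerin, Niacinamide",
    "Water, Glycerin, Fragrance",
    "Squalane, Tocopherol",
]
vocab, matrix = build_matrix(samples)
sim_01 = cosine(matrix[0], matrix[1])  # products sharing two of three ingredients
sim_02 = cosine(matrix[0], matrix[2])  # products sharing no ingredients
```

A recommender built this way would rank other products by their similarity to one the user already tolerates. A dimensionality-reduction step such as t-SNE could then be applied to the matrix rows for plotting.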

11 features

Label (string): 6 unique values, 0 missing
Brand (string): 116 unique values, 0 missing
Name (string): 1472 unique values, 0 missing
Price (numeric): 146 unique values, 0 missing
Rank (numeric): 29 unique values, 0 missing
Ingredients (string): 1334 unique values, 0 missing
Combination (numeric): 2 unique values, 0 missing
Dry (numeric): 2 unique values, 0 missing
Normal (numeric): 2 unique values, 0 missing
Oily (numeric): 2 unique values, 0 missing
Sensitive (numeric): 2 unique values, 0 missing
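As a rough illustration of how one row of this table could be represented in code, the sketch below models the 11 features as a dataclass and filters products by skin type. It assumes the five two-valued numeric columns (Combination, Dry, Normal, Oily, Sensitive) are 0/1 indicators, which the listing does not state explicitly. The `Product` class, the `suitable_for` helper, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass

SKIN_TYPES = ("combination", "dry", "normal", "oily", "sensitive")


@dataclass
class Product:
    label: str
    brand: str
    name: str
    price: float
    rank: float
    ingredients: str
    # Assumed to be 0/1 flags: 1 means the product is marked for that skin type.
    combination: int
    dry: int
    normal: int
    oily: int
    sensitive: int


def suitable_for(products: list[Product], skin_type: str) -> list[Product]:
    """Return the products whose flag for the given skin type is 1."""
    field = skin_type.lower()
    if field not in SKIN_TYPES:
        raise ValueError(f"unknown skin type: {skin_type!r}")
    return [p for p in products if getattr(p, field) == 1]


# Made-up rows for illustration only.
catalog = [
    Product("Moisturizer", "BrandA", "Cream One", 30.0, 4.5,
            "Water, Glycerin", 1, 1, 1, 0, 0),
    Product("Cleanser", "BrandB", "Wash Two", 18.0, 4.1,
            "Water, Fragrance", 1, 0, 1, 1, 0),
]
dry_names = [p.name for p in suitable_for(catalog, "Dry")]
oily_names = [p.name for p in suitable_for(catalog, "oily")]
```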

19 properties

Number of instances (rows) of the dataset: 1472
Number of attributes (columns) of the dataset: 11
Number of distinct values of the target attribute (if it is nominal): (not reported)
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 7
Number of nominal attributes: 0
Percentage of instances belonging to the most frequent class: (not reported)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (not reported)
Percentage of instances belonging to the least frequent class: (not reported)
Number of instances belonging to the least frequent class: (not reported)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: (not reported)
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0.01
Percentage of numeric attributes: 63.64
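The two derived ratios can be checked against the raw counts listed here. The short sketch below recomputes them from 1472 instances, 11 attributes, and 7 numeric attributes (the remaining 4 attributes are strings); rounding to two decimals is an assumption about how the page displays the values.

```python
n_instances = 1472
n_attributes = 11
n_numeric = 7

# Attributes divided by instances: 11 / 1472 is about 0.0075, shown as 0.01.
dimensionality = round(n_attributes / n_instances, 2)

# Share of numeric attributes: 7 / 11 is about 63.64 percent.
pct_numeric = round(100 * n_numeric / n_attributes, 2)
```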

0 tasks
