

active ARFF CC0: Public Domain Visibility: public Uploaded 24-03-2022 by Dustin Carrion

Context

Welcome. This is a Women's Clothing E-Commerce dataset revolving around the reviews written by customers. Its nine supportive features offer a great environment to parse out the text through its multiple dimensions. Because this is real commercial data, it has been anonymized, and references to the company in the review text and body have been replaced with "retailer".

Content

This dataset includes 23486 rows and 10 feature variables. Each row corresponds to a customer review and includes the following variables:

Clothing ID: Integer categorical variable that refers to the specific piece being reviewed.
Age: Positive integer variable of the reviewer's age.
Title: String variable for the title of the review.
Review Text: String variable for the review body.
Rating: Positive ordinal integer variable for the product score granted by the customer, from 1 (worst) to 5 (best).
Recommended IND: Binary variable stating whether the customer recommends the product, where 1 is recommended and 0 is not recommended.
Positive Feedback Count: Positive integer documenting the number of other customers who found this review positive.
Division Name: Categorical name of the product's high-level division.
Department Name: Categorical name of the product department.
Class Name: Categorical name of the product class.

Acknowledgements

Anonymous but real source.

Inspiration

I look forward to some quality NLP! There are also some great opportunities for feature engineering and multivariate analysis.

Publications

Statistical Analysis on E-Commerce Reviews, with Sentiment Classification using Bidirectional Recurrent Neural Network, by Abien Fred Agarap (GitHub)
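The variable definitions above translate fairly directly into per-record checks. The following is a minimal sketch, not part of the dataset or its tooling: the function name `validate_review` is hypothetical, the keys use the spaced names from the description (the feature list uses underscores instead), and allowing a Positive Feedback Count of zero is an assumption, since the description only says "positive integer".

```python
def validate_review(row):
    """Return a list of problems found in one review record (empty if none).

    `row` is assumed to be a dict keyed by the variable names used in the
    dataset description; this layout is an illustration, not a fixed format.
    """
    problems = []

    # Rating is ordinal, from 1 (worst) to 5 (best).
    if row.get("Rating") not in (1, 2, 3, 4, 5):
        problems.append("Rating must be an integer from 1 to 5")

    # Recommended IND is binary: 1 = recommended, 0 = not recommended.
    if row.get("Recommended IND") not in (0, 1):
        problems.append("Recommended IND must be 0 or 1")

    # Age is described as a positive integer.
    age = row.get("Age")
    if not isinstance(age, int) or age <= 0:
        problems.append("Age must be a positive integer")

    # Assumption: a count of zero is accepted here, although the
    # description calls this field a "positive integer".
    count = row.get("Positive Feedback Count")
    if not isinstance(count, int) or count < 0:
        problems.append("Positive Feedback Count must be a non-negative integer")

    return problems
```

A record that passes every check yields an empty list, so the function can be used as a filter before any text analysis.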

11 features

Unnamed:_0 (numeric): 23486 unique values, 0 missing
Clothing_ID (numeric): 1206 unique values, 0 missing
Age (numeric): 77 unique values, 0 missing
Title (string): 13992 unique values, 3810 missing
Review_Text (string): 22634 unique values, 845 missing
Rating (numeric): 5 unique values, 0 missing
Recommended_IND (numeric): 2 unique values, 0 missing
Positive_Feedback_Count (numeric): 82 unique values, 0 missing
Division_Name (string): 3 unique values, 14 missing
Department_Name (string): 6 unique values, 14 missing
Class_Name (string): 20 unique values, 14 missing
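The per-feature counts of unique and missing values listed above can be reproduced with a short script. This is a sketch under stated assumptions: the sample rows are made up for illustration, the data is assumed to be available as CSV text, and an empty cell is treated as missing (an ARFF export may mark missing values differently). The name `summarize_columns` is hypothetical.

```python
import csv
import io

# Made-up sample rows using some of the column names from the feature list.
SAMPLE_CSV = """Clothing_ID,Age,Title,Review_Text,Rating,Recommended_IND
1,34,,Comfortable and true to size,4,1
2,50,Great fit,Lovely dress,5,1
2,41,,,3,0
"""


def summarize_columns(csv_text):
    """Count unique and missing values per column of a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cells_by_column = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, cell in row.items():
            cells_by_column[name].append(cell)

    summary = {}
    for name, cells in cells_by_column.items():
        # Assumption: an empty string means the value is missing.
        present = [cell for cell in cells if cell != ""]
        summary[name] = {
            "unique": len(set(present)),
            "missing": len(cells) - len(present),
        }
    return summary


if __name__ == "__main__":
    for name, stats in summarize_columns(SAMPLE_CSV).items():
        print(f"{name}: {stats['unique']} unique values, {stats['missing']} missing")
```

Missing values are excluded from the unique count here; whether the page counts them the same way is not stated.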

19 properties

Number of instances (rows) of the dataset.
Number of attributes (columns) of the dataset.
Number of distinct values of the target attribute (if it is nominal).
Number of missing values in the dataset.
Number of instances with at least one value missing.
Number of numeric attributes.
Number of nominal attributes.
Number of attributes divided by the number of instances.
Percentage of numeric attributes.
Percentage of instances belonging to the most frequent class.
Percentage of nominal attributes.
Number of instances belonging to the most frequent class.
Percentage of instances belonging to the least frequent class.
Number of instances belonging to the least frequent class.
Number of binary attributes.
Percentage of binary attributes.
Percentage of instances having missing values.
Average class difference between consecutive instances.
Percentage of missing values.
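Several of the properties described above are simple ratios and counts, and a small sketch may make the definitions concrete. Everything here is illustrative: the function name `dataset_properties` and the dictionary keys are hypothetical, missing values are assumed to be represented as `None`, and the choice of a target column is an assumption, since the page lists no task or target for this dataset.

```python
from collections import Counter


def dataset_properties(rows, target):
    """Compute a few dataset properties from a list of dict records.

    Assumptions: every record has the same keys, missing values are None,
    and `target` names a nominal column chosen by the caller.
    """
    columns = list(rows[0].keys())
    n_instances = len(rows)
    n_attributes = len(columns)

    # Missing cells overall, and rows with at least one missing cell.
    n_missing = sum(1 for r in rows for c in columns if r[c] is None)
    n_rows_missing = sum(1 for r in rows if any(r[c] is None for c in columns))

    # Class frequencies of the target attribute.
    class_counts = Counter(r[target] for r in rows if r[target] is not None)
    majority = max(class_counts.values())
    minority = min(class_counts.values())

    return {
        "instances": n_instances,
        "attributes": n_attributes,
        "target_classes": len(class_counts),
        "missing_values": n_missing,
        "instances_with_missing": n_rows_missing,
        # Number of attributes divided by the number of instances.
        "attributes_per_instance": n_attributes / n_instances,
        "pct_missing_values": 100.0 * n_missing / (n_instances * n_attributes),
        "pct_instances_with_missing": 100.0 * n_rows_missing / n_instances,
        "majority_class_size": majority,
        "minority_class_size": minority,
        "pct_majority_class": 100.0 * majority / n_instances,
        "pct_minority_class": 100.0 * minority / n_instances,
    }
```

Calling `dataset_properties(rows, "Recommended_IND")` on a list of review records would treat the recommendation flag as the class, which is only one possible choice.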

0 tasks
