Winedata


Status: active. Format: ARFF. License: CC0 (Public Domain). Visibility: public. Uploaded 24-03-2022 by Dustin Carrion.
  • Computer Systems Machine Learning


Context

This dataset is aimed at beginners in natural language processing. It contains wine comments, or reviews, written by various wine tasters. The idea is to use text classification to identify which taster wrote each review, mainly as a way to demonstrate the techniques involved in language processing. The underlying assumption is that every taster has a specific style of describing their object of interest.

Content

The columns contain the reviews, the country and province of each wine, its variety, the winery it belongs to, and the commenter or taster.

Acknowledgements

The data comes from the Kaggle wine dataset with comments. Courses on NLP were very helpful for understanding and implementing the concept.
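The text classification idea described above can be illustrated with a small sketch. This is not the uploader's method: it is a minimal multinomial Naive Bayes classifier with add-one smoothing, written with the Python standard library only. The reviews and the labels `taster_a` and `taster_b` are invented for illustration and do not come from the dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a review and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTasterClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, reviews, tasters):
        self.word_counts = defaultdict(Counter)  # taster -> word -> count
        self.class_counts = Counter(tasters)     # taster -> number of reviews
        for text, taster in zip(reviews, tasters):
            self.word_counts[taster].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_words = {t: sum(c.values()) for t, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_taster, best_score = None, -math.inf
        for taster, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior
            denom = self.total_words[taster] + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[taster][word] + 1) / denom)
            if score > best_score:
                best_taster, best_score = taster, score
        return best_taster


# Hypothetical toy data standing in for the description and taster_name columns.
train_reviews = [
    "Ripe cherry and plum with firm tannins and a long finish.",
    "Dark cherry, plum and spice; firm tannins frame the finish.",
    "Crisp lemon and green apple, bright acidity, clean and refreshing.",
    "Zesty lemon, bright acidity and a refreshing mineral edge.",
]
train_tasters = ["taster_a", "taster_a", "taster_b", "taster_b"]

model = NaiveBayesTasterClassifier().fit(train_reviews, train_tasters)
prediction = model.predict("Plum and cherry with firm tannins.")
print(prediction)
```

On the real data, the same interface would take the `description` column as input and `taster_name` as the label, after dropping rows where the taster is missing.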

14 features

Feature                 Type      Unique values   Missing
Unnamed:_0              numeric   129971          0
country                 string    43              63
description             string    119954          0
designation             string    37886           37467
points                  numeric   21              0
price                   numeric   390             8996
province                string    425             63
region_1                string    1229            21247
region_2                string    17              79460
taster_name             string    19              26244
taster_twitter_handle   string    15              31213
title                   string    118836          0
variety                 string    707             1
winery                  string    16750           0
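The missing-value counts above determine how much of the data is usable for a given task. As a rough sketch, the snippet below computes the missing percentage per feature and the number of rows that carry a `taster_name` label. The counts are copied from the feature list, and the row count of 129971 is the number of instances reported on this page. The variable names are illustrative only.

```python
# Total number of rows, as reported for this dataset.
N_ROWS = 129971

# Missing-value counts per feature, copied from the feature list.
missing = {
    "Unnamed:_0": 0,
    "country": 63,
    "description": 0,
    "designation": 37467,
    "points": 0,
    "price": 8996,
    "province": 63,
    "region_1": 21247,
    "region_2": 79460,
    "taster_name": 26244,
    "taster_twitter_handle": 31213,
    "title": 0,
    "variety": 1,
    "winery": 0,
}

# Percentage of rows with a missing value, per feature.
missing_pct = {col: round(100 * n / N_ROWS, 2) for col, n in missing.items()}

# Rows that have a taster label and could be used for taster classification.
labeled_rows = N_ROWS - missing["taster_name"]

for col, pct in sorted(missing_pct.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{col:<24}{pct:>6.2f}% missing")
print(f"Rows with a taster_name label: {labeled_rows}")
```

Since `description` has no missing values, every row with a taster label also has review text.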

19 properties

Number of instances (rows) of the dataset: 129971
Number of attributes (columns) of the dataset: 14
Number of distinct values of the target attribute (if it is nominal): (no value shown)
Number of missing values in the dataset: 204754
Number of instances with at least one value missing: 107586
Number of numeric attributes: 3
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 21.43
Percentage of instances belonging to the most frequent class: (no value shown)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value shown)
Percentage of instances belonging to the least frequent class: (no value shown)
Number of instances belonging to the least frequent class: (no value shown)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 82.78
Average class difference between consecutive instances: (no value shown)
Percentage of missing values: 11.25
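The percentage properties follow from the raw counts reported on this page. The sketch below re-derives them as a consistency check. It assumes that the percentage of missing values is taken over all cells (instances times attributes), which matches the reported figure. The variable names are illustrative.

```python
# Raw counts as reported in the property list.
n_instances = 129971
n_attributes = 14
n_numeric = 3
n_missing_values = 204754
n_rows_with_missing = 107586

# Share of attributes that are numeric.
pct_numeric = round(100 * n_numeric / n_attributes, 2)

# Share of rows that have at least one missing value.
pct_rows_with_missing = round(100 * n_rows_with_missing / n_instances, 2)

# Share of all cells (rows x columns) that are missing.
pct_missing_values = round(100 * n_missing_values / (n_instances * n_attributes), 2)

# Attributes per instance; small enough that the page shows it as 0.
dimensionality = n_attributes / n_instances

print(pct_numeric, pct_rows_with_missing, pct_missing_values)
print(f"{dimensionality:.6f}")
```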

0 tasks
