OpenML
covid-19_sentiments-India200320---310520

Status: active. Format: ARFF. License: Database: Open Database, Contents: Database Contents. Visibility: public. Uploaded 24-03-2022 by Dustin Carrion.


About our Dataset

The collection of this Covid-19 India dataset began with a competition in which we had to perform sentiment analysis of tweets. The data was collected from https://ieee-dataport.org/open-access/coronavirus-covid-19-tweets-dataset . This site gave us the tweet IDs of relevant tweets; to extract the tweet text and other information, we used the Hydrator app.

About features of dataset

There are 5 columns in total.

Column 1: 'Text ID'. A unique ID for each tweet.
Column 2: 'Text'. The tweet text for that tweet ID.
Column 3: 'Date'. The date on which the tweet was posted.
Column 4: 'Location'. The place from which the tweet was posted.
Column 5: 'Sentiments'. The sentiment value of the tweet: positive, negative, or neutral.
    If the sentiment score is greater than 0, the sentiment is positive.
    If the sentiment score is equal to 0, the sentiment is neutral.
    If the sentiment score is less than 0, the sentiment is negative.

Acknowledgements

We wouldn't be here without the help of others. We would like to acknowledge https://ieee-dataport.org/open-access/coronavirus-covid-19-tweets-dataset for providing the tweet IDs. We would also like to acknowledge the Hydrator app for fetching the tweets.

Inspiration

We got the inspiration from the competition, where we were given the task of categorizing the sentiment values of COVID-19 India tweets.
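
As an illustration of the rule given for the 'Sentiments' column, the mapping from a numeric score to a label could be sketched as follows. This is a minimal sketch, not the authors' code: the function name `sentiment_label` is hypothetical, and returning `None` for a missing score is an assumption made here because the column is reported to contain missing values.

```python
import math


def sentiment_label(score):
    """Map a numeric sentiment score to a label using the rule in the description.

    Hypothetical helper: the name and the handling of missing scores are
    assumptions, not part of the dataset documentation.
    """
    # Treat None and NaN as missing and return no label.
    if score is None or (isinstance(score, float) and math.isnan(score)):
        return None
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Example scores chosen for illustration only; they are not taken from the data.
labels = [sentiment_label(s) for s in (0.5, 0.0, -0.25, float("nan"))]
```

Applied to a whole column, the same function can be mapped over each value, leaving missing scores unlabeled.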

4 features

Feature            Type     Unique values  Missing
Text_Id (ignore)   numeric  324045         0
Text               string   161648         0
Date               string   172416         0
Location           string   8489           0
Sentiments         numeric  7508           10980

19 properties

Number of instances (rows) of the dataset: 648958
Number of attributes (columns) of the dataset: 4
Number of distinct values of the target attribute (if it is nominal): (no value listed)
Number of missing values in the dataset: 10980
Number of instances with at least one value missing: 10980
Number of numeric attributes: 1
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 25
Percentage of instances belonging to the most frequent class: (no value listed)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 1.69
Average class difference between consecutive instances: (no value listed)
Percentage of missing values: 0.42
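
The percentage properties listed here can be cross-checked against the raw counts. The sketch below is only an arithmetic check: the variable names are hypothetical, and the assumption that the percentage of missing values is taken over instances times attributes is inferred from the listed numbers rather than stated on the page.

```python
# Counts copied from the property list.
n_instances = 648958
n_attributes = 4
n_numeric_attributes = 1
n_missing_values = 10980
n_instances_with_missing = 10980

# Share of rows that have at least one missing value.
pct_instances_missing = 100 * n_instances_with_missing / n_instances

# Share of all cells that are missing (assumed denominator: rows x columns).
pct_values_missing = 100 * n_missing_values / (n_instances * n_attributes)

# Share of attributes that are numeric.
pct_numeric = 100 * n_numeric_attributes / n_attributes

print(round(pct_instances_missing, 2))  # 1.69
print(round(pct_values_missing, 2))     # 0.42
print(pct_numeric)                      # 25.0
```

Under these assumptions the computed values agree with the listed 1.69, 0.42, and 25.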

0 tasks
