Tweets-with-keyword-lockdown-in-April-July-2020

active | ARFF | CC0: Public Domain | Visibility: public | Uploaded 24-03-2022 by Elif Ceren Gok
Context
This data was collected for an academic project of mine on sentiment analysis of tweets during lockdown.

Content
I used the GetOldTweets3 (https://pypi.org/project/GetOldTweets3/) python3 library to pull the tweets off Twitter. The tweets range from 1 April 2020 to 1 August 2020, which was the peak lockdown period in India. I dropped tweets with duplicate text and NaN values, and that was the only cleaning I did on the data. The "Top Tweets" attribute was turned off while scraping.

Total rows of tweets: 95488

Columns:
Index (be sure to use df = pandas.read_csv("tweets_lockdown.csv", index_col=0))
Text - The text of the tweet
Date - Date and time of the tweet in datetime format
Retweets - Number of retweets for the tweet
Favorites - Favorites on the tweet
Mentions - Usernames mentioned in the tweets in format
HashTags - Hashtags present in the tweet in format

Inspiration
Twitter data gives us a lot of scope for data cleaning, text preprocessing, association rule mining, sentiment analysis and so on.
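The loading and cleaning steps described above can be sketched as follows. This is a minimal sketch, not the author's actual script: the helper name `load_lockdown_tweets` and the inline sample rows are hypothetical, and it assumes that the cleaning amounts to dropping rows whose Text is missing or repeated (Mentions and HashTags are left alone, since most tweets have none).

```python
import io

import pandas as pd


def load_lockdown_tweets(source):
    """Load the tweets CSV; `source` is a file path or a file-like object."""
    # index_col=0, as the description recommends, so the saved index is
    # used as the index instead of becoming an extra unnamed column.
    df = pd.read_csv(source, index_col=0)
    # Date is stored as text in the file; parse it into datetime values.
    df["Date"] = pd.to_datetime(df["Date"])
    # Assumed cleaning: drop rows with missing Text, then keep only the
    # first occurrence of each distinct tweet text.
    df = df.dropna(subset=["Text"]).drop_duplicates(subset="Text")
    return df


# Hypothetical sample rows standing in for tweets_lockdown.csv.
sample = io.StringIO(
    ",Text,Date,Retweets,Favorites,Mentions,HashTags\n"
    "0,Day 10 of lockdown,2020-04-03 10:15:00,2,5,,#lockdown\n"
    "1,Day 10 of lockdown,2020-04-03 11:00:00,0,1,,\n"
    "2,,2020-05-01 09:00:00,0,0,@someone,\n"
    "3,Lockdown extended again,2020-07-31 23:59:59,7,12,,\n"
)
tweets = load_lockdown_tweets(sample)
print(tweets[["Text", "Date", "Retweets"]])
```

With the real file, the call would be `load_lockdown_tweets("tweets_lockdown.csv")`.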

7 features

Unnamed:_0 (numeric): 95488 unique values, 0 missing
Text (string): 95344 unique values, 19 missing
Date (string): 58281 unique values, 0 missing
Retweets (numeric): 321 unique values, 0 missing
Favorites (numeric): 607 unique values, 0 missing
Mentions (string): 8729 unique values, 82588 missing
HashTags (string): 12795 unique values, 77637 missing
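A per-feature summary like the one above can be reproduced with pandas. The sketch below is illustrative only: `summarize_features` is a hypothetical helper, the sample rows are made up, and it assumes that unique counts exclude missing values, which may not match how the hosting site counts them.

```python
import io

import pandas as pd


def summarize_features(df):
    """Return one row per column: dtype, unique values, missing count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "unique": df.nunique(),       # nunique() skips NaN by default
        "missing": df.isna().sum(),
    })


# Hypothetical sample rows; empty fields are read as missing values.
sample = io.StringIO(
    "Text,Retweets,Mentions\n"
    "stay home,0,\n"
    "stay home,3,@a\n"
    "lockdown day 5,3,\n"
)
summary = summarize_features(pd.read_csv(sample))
print(summary)
```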

19 properties

Number of instances (rows) of the dataset: 95488
Number of attributes (columns) of the dataset: 7
Number of distinct values of the target attribute (if it is nominal): not given
Number of missing values in the dataset: 160244
Number of instances with at least one value missing: 90899
Number of numeric attributes: 3
Number of nominal attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 95.19
Percentage of missing values: 23.97
Average class difference between consecutive instances: not given
Percentage of numeric attributes: 42.86
Number of attributes divided by the number of instances: 0
Percentage of nominal attributes: 0
Percentage of instances belonging to the most frequent class: not given
Number of instances belonging to the most frequent class: not given
Percentage of instances belonging to the least frequent class: not given
Number of instances belonging to the least frequent class: not given
Number of binary attributes: 0
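The derived percentages follow directly from the counts reported on this page. As a quick consistency check, the sketch below recomputes them; the variable names are my own, and it assumes the percentage of missing values is taken over all cells (rows times columns).

```python
# Counts copied from the feature and property listings on this page.
n_rows = 95488
n_cols = 7
n_numeric = 3
rows_with_missing = 90899
missing_per_feature = {"Text": 19, "Mentions": 82588, "HashTags": 77637}

# Total missing cells is the sum of the per-feature missing counts.
total_missing = sum(missing_per_feature.values())

# Assumed definitions of the three percentages.
pct_missing_values = 100 * total_missing / (n_rows * n_cols)
pct_rows_with_missing = 100 * rows_with_missing / n_rows
pct_numeric = 100 * n_numeric / n_cols

print(total_missing)                    # 160244
print(round(pct_missing_values, 2))     # 23.97
print(round(pct_rows_with_missing, 2))  # 95.19
print(round(pct_numeric, 2))            # 42.86
```

The attributes-to-instances ratio, 7 / 95488, is about 0.00007, which is why it is shown as 0.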
