Context
I collected this data for an academic project of mine on sentiment analysis of tweets during lockdown.
Content
I used the GetOldTweets3 (https://pypi.org/project/GetOldTweets3/) python3 library to pull the tweets off Twitter. The tweets range from 1 April 2020 to 1 August 2020, which was the peak lockdown period in India. I removed tweets with duplicate text and NaN values; that was the only cleaning I did on the data.
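The cleaning step described above can be sketched roughly as follows. This is an illustration, not the exact code used to build the dataset. It assumes pandas and a column named Text (as in the column list); the function name clean_tweets and the sample rows are made up for the example, and dropping NaN only on the text column is an assumption, since the description does not say which columns were checked.

```python
import pandas as pd


def clean_tweets(df: pd.DataFrame, text_col: str = "Text") -> pd.DataFrame:
    """Drop rows with missing tweet text, then rows whose text is a duplicate."""
    # Assumption: NaN is checked on the text column only.
    cleaned = df.dropna(subset=[text_col])
    # Keep the first occurrence of each distinct tweet text.
    cleaned = cleaned.drop_duplicates(subset=text_col, keep="first")
    return cleaned.reset_index(drop=True)


# Tiny made-up sample, only to show the effect of the two steps.
sample = pd.DataFrame(
    {
        "Text": ["stay home", "stay home", None, "lockdown extended"],
        "Retweets": [1, 5, 0, 2],
    }
)
result = clean_tweets(sample)
print(result)
```

Since the published file has already been cleaned this way, running the same steps on it should leave the row count unchanged.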
Total rows of tweets: 95488
Columns:
Index (be sure to use df = pandas.read_csv("tweets_lockdown.csv", index_col=0))
Text - The text of the tweet
Date - Date and time of tweet in datetime format
Retweets - Number of retweets for the tweet
Favorites - Favorites on the tweet
Mentions - Usernames mentioned in the tweets in format
HashTags - Hashtags present in the tweet in format
The "Top Tweets" attribute was turned off while scraping.
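A minimal sketch of loading the file with the index column as recommended in the column list, and parsing Date as datetime. The header names are assumed to match the column names listed above, and the two-row in-memory CSV is a made-up stand-in for tweets_lockdown.csv so that the example runs on its own; the helper name load_tweets is hypothetical.

```python
import io

import pandas as pd


def load_tweets(source) -> pd.DataFrame:
    """Read the CSV with the first column as index and Date parsed as datetime."""
    return pd.read_csv(source, index_col=0, parse_dates=["Date"])


# Made-up two-row stand-in for tweets_lockdown.csv (assumed header names).
sample_csv = io.StringIO(
    ",Text,Date,Retweets,Favorites,Mentions,HashTags\n"
    "0,stay home,2020-04-01 10:15:00,3,10,,#lockdown\n"
    "1,lockdown extended,2020-05-03 18:30:00,7,21,@someone,#india\n"
)
df = load_tweets(sample_csv)

# Column types after loading.
print(df.dtypes)

# Number of tweets per calendar month.
tweets_per_month = df.groupby(df["Date"].dt.month).size()
print(tweets_per_month)
```

With the real file, pass the path instead: load_tweets("tweets_lockdown.csv").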
Inspiration
Twitter data gives us a lot of scope for data cleaning, text preprocessing, association rule mining, sentiment analysis and so on.