OpenML

Tweets-with-keyword-lockdown-in-April-July-2020

Status: active | Format: ARFF | License: CC0: Public Domain | Visibility: public | Uploaded 24-03-2022 by Elif Ceren Gok

Context

This data was collected for an academic project of mine on sentiment analysis of tweets during lockdown.

Content

I used the GetOldTweets3 (https://pypi.org/project/GetOldTweets3/) Python 3 library to pull the tweets off Twitter. The tweets range from 1 April 2020 to 1 August 2020, which was the peak lockdown period in India. I removed tweets with duplicate text and NaN values; that was the only cleaning I did on the data.

Total rows of tweets: 95488

Columns:
- Index (be sure to use df = pandas.read_csv("tweets_lockdown.csv", index_col=0))
- Text - The text of the tweet
- Date - Date and time of the tweet in datetime format
- Retweets - Number of retweets for the tweet
- Favorites - Favorites on the tweet
- Mentions - Usernames mentioned in the tweets in format
- HashTags - Hashtags present in the tweet in format

The "Top Tweets" attribute was turned off while scraping.

Inspiration

Twitter data gives us a lot of scope for data cleaning, text preprocessing, association rule mining, sentiment analysis and so on.
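The loading step recommended in the description (read_csv with index_col=0) can be sketched as below. This is a minimal illustration, not the uploader's own code: the helper name load_tweets is hypothetical, the two sample rows are invented so the block runs without the real file, and the date layout shown is an assumption, since the description only says "datetime format".

```python
import io

import pandas as pd

# Tiny stand-in for tweets_lockdown.csv. The rows are invented for
# illustration and the date layout is an assumption, not taken from the file.
SAMPLE_CSV = """,Text,Date,Retweets,Favorites,Mentions,HashTags
0,Day 10 of lockdown,2020-04-03 10:15:00+00:00,2,5,,#lockdown
1,Stay home stay safe,2020-07-31 18:40:00+00:00,0,1,@someone,
"""


def load_tweets(source):
    """Read the CSV with its first column as the index and parse Date."""
    df = pd.read_csv(source, index_col=0)
    # errors="coerce" turns unparseable dates into NaT instead of raising.
    df["Date"] = pd.to_datetime(df["Date"], errors="coerce", utc=True)
    return df


# For the real data, pass the file path instead of the in-memory sample,
# e.g. load_tweets("tweets_lockdown.csv").
df = load_tweets(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
print(df[["Retweets", "Favorites"]].sum())
```

Passing index_col=0 keeps the saved row index out of the feature columns; without it, pandas reads that column as an extra feature named "Unnamed: 0".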

7 features

Unnamed:_0	numeric	95488 unique values 0 missing
Text	string	95344 unique values 19 missing
Date	string	58281 unique values 0 missing
Retweets	numeric	321 unique values 0 missing
Favorites	numeric	607 unique values 0 missing
Mentions	string	8729 unique values 82588 missing
HashTags	string	12795 unique values 77637 missing
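The missing-value counts in the feature table are easier to compare as shares of the 95488 rows. The sketch below does only that arithmetic, using the counts listed above; the function name missing_summary is hypothetical.

```python
# Counts copied from the feature table above.
TOTAL_ROWS = 95488
MISSING = {
    "Unnamed:_0": 0,
    "Text": 19,
    "Date": 0,
    "Retweets": 0,
    "Favorites": 0,
    "Mentions": 82588,
    "HashTags": 77637,
}


def missing_summary(missing, total):
    """Return present-row counts and missing shares per feature."""
    return {
        name: {"present": total - n, "missing_share": n / total}
        for name, n in missing.items()
    }


summary = missing_summary(MISSING, TOTAL_ROWS)
for name, stats in summary.items():
    print(f"{name:<12} present={stats['present']:>6}  missing={stats['missing_share']:.2%}")
```

By these counts, roughly 86% of rows have no Mentions value and about 81% have no HashTags value, so analyses built on those two columns use only a small subset of the tweets.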