The-Tweets-of-Wisdom


Status: active | Format: ARFF | License: CC0: Public Domain | Visibility: public | Uploaded 23-03-2022 by Onur Yildirim
  • Computer Systems
  • Machine Learning


Context
In the last few years, Twitter has become one of the most popular social media platforms. From celebrity status to government policies, Twitter accommodates a diverse range of people and thoughts. Within this diverse set of thoughts, many Twitter accounts often tweet "self-help" thoughts. These so-called "wise" thoughts usually concern improving one's life and excelling at what you do. So I went down the rabbit hole in search of these sorts of tweets and found many common themes among them. I therefore decided to scrape the tweets so that you can explore the words of these "self-help" tweets and understand them much better.

Content
I scraped the data using the Tweepy API. I scraped all the tweets, retweets, and retweets with a comment of 40 authors. The data contains more than 40 authors because every retweet by any of the 40 authors is stored as a tweet from the original author. Also, every retweet with a comment contains an opening and a closing tag: the author's comment is followed by the opening tag, then comes the content of the retweet, which is followed by the closing tag. The script I used for scraping can be found here.

Acknowledgements
I would like to thank Stack Overflow, which helped me at literally every stage of this project, from scraping to data analysis. Also, kudos to the Tweepy API, which made it far easier to fetch tweets.

Inspiration
I downloaded this dataset for many reasons. The most important one is that I want to know how similar these tweets are. I would also like to know what makes some tweets go viral and what factors affect a viral tweet. I explore these and many more questions in my kernel, which you can find in the kernel section.
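The retweet-with-comment layout described under Content (comment, then an opening tag, then the retweeted text, then a closing tag) could be split apart roughly as follows. This is only a sketch: the actual tag names are not given in this description, so `split_quote_tweet`, `QuoteTweet`, and the `<quoted>` / `</quoted>` placeholders are hypothetical and must be replaced with whatever markers appear in the `tweet_content` column.

```python
from typing import NamedTuple, Optional


class QuoteTweet(NamedTuple):
    comment: str            # text written by the retweeting author
    quoted: Optional[str]   # content of the retweeted tweet, or None


def split_quote_tweet(text: str, open_tag: str, close_tag: str) -> QuoteTweet:
    """Split 'comment <open> quoted <close>' into its two parts.

    Text without the opening tag is treated as a plain tweet.
    """
    start = text.find(open_tag)
    if start == -1:
        return QuoteTweet(comment=text.strip(), quoted=None)
    comment = text[:start].strip()
    rest = text[start + len(open_tag):]
    end = rest.find(close_tag)
    # Tolerate a missing closing tag by keeping everything after the opening tag.
    quoted = rest if end == -1 else rest[:end]
    return QuoteTweet(comment=comment, quoted=quoted.strip())


# Placeholder tag names, chosen only for this example (not taken from the dataset).
OPEN, CLOSE = "<quoted>", "</quoted>"
example = split_quote_tweet("So true. <quoted>Read every day.</quoted>", OPEN, CLOSE)
plain = split_quote_tweet("Just a normal tweet.", OPEN, CLOSE)
print(example)
print(plain)
```

Keeping the tags as parameters avoids hard-coding a guess; once the real markers are known, the same function can be applied to every non-missing `tweet_content` value.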

5 features

author_name: string, 2733 unique values, 92 missing
created_at: string, 30997 unique values, 0 missing
handle (ignore): string, 2804 unique values, 0 missing
likes: numeric, 3554 unique values, 0 missing
retweets: numeric, 2216 unique values, 0 missing
tweet_content: string, 29488 unique values, 364 missing

19 properties

Number of instances (rows) of the dataset: 31115
Number of attributes (columns) of the dataset: 5
Number of distinct values of the target attribute (if it is nominal): not listed
Number of missing values in the dataset: 456
Number of instances with at least one value missing: 456
Number of numeric attributes: 2
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 40
Percentage of instances belonging to the most frequent class: not listed
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: not listed
Percentage of instances belonging to the least frequent class: not listed
Number of instances belonging to the least frequent class: not listed
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 1.47
Average class difference between consecutive instances: not listed
Percentage of missing values: 0.29
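The percentage properties above can be reproduced from the raw counts. The sketch below is a consistency check, not the site's own computation: it assumes the percentage of missing values is taken over all cells (rows times columns), which is the reading that matches the listed 0.29.

```python
# Counts copied from the properties listed above.
n_instances = 31115
n_attributes = 5
n_numeric = 2
n_missing_values = 456
n_rows_with_missing = 456

pct_numeric = 100 * n_numeric / n_attributes
pct_rows_with_missing = 100 * n_rows_with_missing / n_instances
# Assumption: the denominator is the total number of cells.
pct_missing_values = 100 * n_missing_values / (n_instances * n_attributes)
attrs_per_instance = n_attributes / n_instances

print(round(pct_numeric, 2))            # 40.0
print(round(pct_rows_with_missing, 2))  # 1.47
print(round(pct_missing_values, 2))     # 0.29
print(round(attrs_per_instance, 2))     # 0.0
```

The per-feature missing counts (92 for author_name and 364 for tweet_content) sum to 456, and the number of rows with a missing value is also 456, so each affected row is missing exactly one value.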

0 tasks
