Context
A collection of tweets (in Dutch) and features, gathered in April 2022 using the Twitter API.
A small portion of the tweets has been annotated by volunteers.
The main task is to identify which of the tweets are rumours, based on the features and the labelled examples in the dataset.
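A minimal sketch of the first step of this task: separating the annotated tweets from the unannotated ones, using the 'label' convention of this dataset (1 for Rumour, 0 for Non-Rumour, -1 for unannotated). The function name and the toy rows are hypothetical and serve only as illustration. Labels are parsed with int() on the assumption that they may arrive as strings, for example when read with csv.DictReader.

```python
def split_by_annotation(rows):
    """Split rows into annotated (label 0 or 1) and unannotated (label -1)."""
    labelled, unlabelled = [], []
    for row in rows:
        # Labels may be strings when read from a CSV file, so convert first.
        if int(row["label"]) == -1:
            unlabelled.append(row)
        else:
            labelled.append(row)
    return labelled, unlabelled


# Hypothetical toy rows, NOT taken from the dataset.
toy_rows = [
    {"text": "id_a", "label": "1"},
    {"text": "id_b", "label": "0"},
    {"text": "id_c", "label": "-1"},
    {"text": "id_d", "label": "-1"},
]

labelled, unlabelled = split_by_annotation(toy_rows)
```

The annotated rows can then be used to train and evaluate a classifier, while the unannotated rows are the ones left to predict.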
Content
'followers_count' : Number of users following the account.
'tweet_count' : number of tweets by the account.
'question_marks' : presence of question marks. 0 or 1.
'verified' : Whether the account is verified or not.
'accountlife' : How long the account has existed at the time of posting.
'followers_ratio' : ratio of number of users following / number of users followed by the account.
'exclamation_marks' : presence of exclamation marks. 0 or 1.
'capital letters' : ratio of capital to lowercase letters.
'retweet_count' : number of retweets on the tweet.
'hashtags' : presence of the hashtag symbol. 0 or 1.
'following' : number of users the account follows.
'text length' : length of the text.
'listed_count' : number of lists the account is in.
'emoticons' : Presence of emoticons, 0 or 1.
'like_count' : number of likes on the tweet.
'time_after_posting' : How long the account existed before posting the tweet.
'activity' : how active the account is.
'text' : the tweet ID.
'hashtag' : Which Twitter hashtag the tweet was collected from. One of three: #jinek, #vleestaks, or #inflatie.
'upsample_group' : a feature to allow one to sample each combination of hashtag and label in equal amounts.
'label' : 1 for Rumour, 0 for Non-Rumour, -1 for unannotated.
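A sketch of how 'upsample_group' might be used to draw the same number of rows from each combination of hashtag and label. The encoding of the group values is not described here, so the values "g0" and "g1" below are placeholders, and the function name, its parameters, and the toy rows are all hypothetical. Sampling is done with replacement, which is one possible choice and allows small groups to be upsampled.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(rows, per_group, seed=0):
    """Draw `per_group` rows, with replacement, from every upsample group."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[row["upsample_group"]].append(row)

    sample = []
    for key in sorted(groups):
        # choices() samples with replacement, so a group smaller than
        # `per_group` is upsampled instead of raising an error.
        sample.extend(rng.choices(groups[key], k=per_group))
    return sample


# Hypothetical toy rows, NOT taken from the dataset; group ids are placeholders.
toy_groups = [
    {"text": "id_a", "upsample_group": "g0"},
    {"text": "id_b", "upsample_group": "g0"},
    {"text": "id_c", "upsample_group": "g0"},
    {"text": "id_d", "upsample_group": "g1"},
]

sample = balanced_sample(toy_groups, per_group=3)
group_counts = Counter(row["upsample_group"] for row in sample)
```

After this call every group contributes exactly `per_group` rows to `sample`, regardless of how many rows it had originally.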
Acknowledgements
Dr. Peter van der Putten
Dr. Jan N. van Rijn