OpenML


DutchTwitterDataset

Status: active. Format: ARFF. License: Public Domain (CC0). Visibility: public. Uploaded 12-04-2023 by Nicky van der Linden.




Context
A collection of Dutch-language tweets and derived features, gathered in April 2022 using the Twitter API. A small portion of the tweets were annotated by volunteer annotators. The main task is to identify which tweets are rumours, based on the features and the labelled examples in the dataset.

Content
- 'followers_count': number of users following the account.
- 'tweet_count': number of tweets posted by the account.
- 'question_marks': presence of question marks, 0 or 1.
- 'verified': whether the account is verified.
- 'accountlife': how long the account had existed at the time of posting.
- 'followers_ratio': number of users following the account divided by the number of users the account follows.
- 'exclamation_marks': presence of exclamation marks, 0 or 1.
- 'capital letters': ratio of capital to lowercase letters.
- 'retweet_count': number of retweets of the tweet.
- 'hashtags': presence of the hashtag symbol, 0 or 1.
- 'following': number of users the account follows.
- 'text length': length of the tweet text.
- 'listed_count': number of lists the account is in.
- 'emoticons': presence of emoticons, 0 or 1.
- 'like_count': number of likes on the tweet.
- 'time_after_posting': how long the account existed before posting the tweet.
- 'activity': how active the account is.
- 'text': the tweet ID (not the tweet text itself).
- 'hashtag': the Twitter hashtag the tweet was collected from; one of #jinek, #vleestaks, or #inflatie.
- 'upsample_group': a helper feature that allows sampling each combination of hashtag and label in equal amounts.
- 'label': 1 for rumour, 0 for non-rumour, -1 for unannotated.

Acknowledgements
Dr. Peter van der Putten, Dr. Jan N. van Rijn
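The dataset stores only the derived features and the tweet ID, not the raw tweet text, and the exact extraction rules used by the authors are not documented. The sketch below shows one plausible way to compute the text-based features and 'followers_ratio' from raw values, following the definitions above. The function names, the emoticon pattern, and the choice to return 0.0 when a denominator is zero are all assumptions made for illustration.

```python
import re

# Hypothetical emoticon pattern: a few ASCII emoticons such as :) ;-P =D plus
# two broad emoji code-point ranges. The original detection rule is unknown.
EMOTICON_RE = re.compile(r"[:;=][-']?[()DPp]|[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def text_features(text: str) -> dict:
    """Compute features resembling the dataset's text-based columns (sketch)."""
    upper = sum(1 for c in text if c.isupper())
    lower = sum(1 for c in text if c.islower())
    return {
        "question_marks": int("?" in text),        # presence, 0 or 1
        "exclamation_marks": int("!" in text),     # presence, 0 or 1
        # Ratio of capital to lowercase letters; 0.0 if there are no
        # lowercase letters (assumed handling of division by zero).
        "capital letters": upper / lower if lower else 0.0,
        "hashtags": int("#" in text),              # presence of '#', 0 or 1
        "text length": len(text),                  # length in characters (assumed unit)
        "emoticons": int(bool(EMOTICON_RE.search(text))),
    }


def followers_ratio(followers_count: int, following: int) -> float:
    """Followers divided by followed accounts; 0.0 when following == 0 (assumption)."""
    return followers_count / following if following else 0.0


if __name__ == "__main__":
    print(text_features("Is dit waar?! #inflatie :)"))
    print(followers_ratio(100, 50))
```

For the example tweet, all four binary flags are 1, the text length is 26 characters, and the capital-letter ratio is 1/16 (one capital "I" against sixteen lowercase letters).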

20 features

Feature	Type	Unique values	Missing values
label (target)	numeric	3	0
followers_count	numeric	982	0
tweet_count	numeric	1360	0
question_marks	numeric	2	0
verified	numeric	2	0
accountlife	numeric	210903	0
followers_ratio	numeric	1341	0
exclamation_marks	numeric	2	0
capital letters	numeric	315	0
retweet_count	numeric	283	0
hashtags	numeric	2	0
following	numeric	1000	0
text length	numeric	389	0
listed_count	numeric	182	0
emoticons	numeric	2	0
like_count	numeric	622	0
time_after_posting	numeric	23608	0
activity	numeric	212914	0
text	numeric	225372	0
hashtag	string	3	0
upsample_group (ignore)	string	9	0
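Because only a small portion of the rows are annotated, a typical first step is to separate the labelled rows (label 0 or 1) from the unannotated ones (label -1). The 9 values of 'upsample_group' are consistent with 3 hashtags times 3 label values, so drawing the same number of rows per group gives the balanced sampling the description mentions. The sketch below assumes each row is a plain dict keyed by column name; the helper names and the choice to sample with replacement are hypothetical. Since 'upsample_group' is marked as an ignored attribute, it is used here only for sampling, not as a model input.

```python
import random
from collections import defaultdict


def split_by_annotation(rows):
    """Separate annotated rows (label 0 or 1) from unannotated rows (label -1)."""
    # Works for int or float labels, since 0.0 == 0 and -1.0 == -1 in Python.
    labelled = [r for r in rows if r["label"] in (0, 1)]
    unlabelled = [r for r in rows if r["label"] == -1]
    return labelled, unlabelled


def balanced_sample(rows, n_per_group, seed=0):
    """Draw n_per_group rows, with replacement, from each upsample_group value."""
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    groups = defaultdict(list)
    for r in rows:
        groups[r["upsample_group"]].append(r)
    sample = []
    for key in sorted(groups):  # deterministic group order
        sample.extend(rng.choices(groups[key], k=n_per_group))
    return sample


if __name__ == "__main__":
    # Hypothetical rows; the real group values and label encoding may differ.
    rows = [
        {"label": 1, "upsample_group": "#inflatie_1"},
        {"label": 0, "upsample_group": "#inflatie_0"},
        {"label": -1, "upsample_group": "#inflatie_-1"},
    ]
    labelled, unlabelled = split_by_annotation(rows)
    print(len(labelled), len(unlabelled))
    print(len(balanced_sample(labelled, 5)))
```

Sampling with replacement lets small groups (for example, rare rumour labels under one hashtag) be upsampled to the same size as larger ones; sampling without replacement would instead cap every group at the size of the smallest.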