OpenML
DutchTwitterDataset

Status: active | Format: ARFF | License: Public Domain (CC0) | Visibility: public | Uploaded 12-04-2023 by Nicky van der Linden
Context
A collection of Dutch tweets and derived features, gathered in April 2022 using the Twitter API. A small portion of the tweets was annotated by volunteer annotators. The main task is to identify which tweets are rumours, based on the features and the labelled examples in the dataset.

Content
'followers_count': number of users following the account.
'tweet_count': number of tweets by the account.
'question_marks': presence of question marks, 0 or 1.
'verified': whether the account is verified or not.
'accountlife': how long the account had existed at the time of posting.
'followers_ratio': ratio of the number of users following the account to the number of users followed by the account.
'exclamation_marks': presence of exclamation marks, 0 or 1.
'capital letters': ratio of capital to lowercase letters.
'retweet_count': number of retweets of the tweet.
'hashtags': presence of the hashtag symbol, 0 or 1.
'following': number of users the account follows.
'text length': length of the text.
'listed_count': number of lists the account is in.
'emoticons': presence of emoticons, 0 or 1.
'like_count': number of likes on the tweet.
'time_after_posting': how long the account existed before posting the tweet.
'activity': how active the account is.
'text': tweet_id.
'hashtag': which Twitter hashtag the tweet was collected from. One of three: #jinek, #vleestaks, or #inflatie.
'upsample_group': a feature that allows sampling each combination of hashtag and label in equal amounts.
'label': 1 for Rumour, 0 for Non-Rumour, -1 for unannotated.

Acknowledgements: Dr. Peter van der Putten, Dr. Jan N. van Rijn
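The label and 'upsample_group' conventions described above can be sketched in plain Python. This is a minimal illustration, not the uploader's code: the toy rows and the 'upsample_group' strings (such as "jinek_rumour") are invented, since the listing does not show the actual group values, and drawing with replacement is an assumed reading of "upsample".

```python
import random
from collections import defaultdict

# Toy rows for illustration only. The upsample_group strings are hypothetical;
# the real dataset has 9 distinct values that this listing does not spell out.
rows = [
    {"hashtag": "#jinek", "label": 1, "upsample_group": "jinek_rumour"},
    {"hashtag": "#jinek", "label": 0, "upsample_group": "jinek_nonrumour"},
    {"hashtag": "#jinek", "label": 0, "upsample_group": "jinek_nonrumour"},
    {"hashtag": "#inflatie", "label": -1, "upsample_group": "inflatie_unannotated"},
    {"hashtag": "#vleestaks", "label": 1, "upsample_group": "vleestaks_rumour"},
]


def split_by_annotation(rows):
    """Separate annotated rows (label 0 or 1) from unannotated rows (label -1)."""
    annotated = [r for r in rows if r["label"] in (0, 1)]
    unannotated = [r for r in rows if r["label"] == -1]
    return annotated, unannotated


def balanced_sample(rows, per_group, seed=0):
    """Draw per_group rows, with replacement, from every upsample_group value."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for r in rows:
        groups[r["upsample_group"]].append(r)
    sample = []
    for key in sorted(groups):  # sorted for a reproducible order
        sample.extend(rng.choices(groups[key], k=per_group))
    return sample


annotated, unannotated = split_by_annotation(rows)
balanced = balanced_sample(annotated, per_group=2)
```

Here `balanced` holds the same number of rows for every hashtag and label combination present among the annotated rows, which is the use the 'upsample_group' description suggests.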

20 features

Feature                  Type     Unique values  Missing
label (target)           numeric  3              0
followers_count          numeric  982            0
tweet_count              numeric  1360           0
question_marks           numeric  2              0
verified                 numeric  2              0
accountlife              numeric  210903         0
followers_ratio          numeric  1341           0
exclamation_marks        numeric  2              0
capital letters          numeric  315            0
retweet_count            numeric  283            0
hashtags                 numeric  2              0
following                numeric  1000           0
text length              numeric  389            0
listed_count             numeric  182            0
emoticons                numeric  2              0
like_count               numeric  622            0
time_after_posting       numeric  23608          0
activity                 numeric  212914         0
text                     numeric  225372         0
hashtag                  string   3              0
upsample_group (ignore)  string   9              0
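As a rough sketch of how the columns above might be turned into model input, the snippet below builds one feature vector per row. It makes assumptions that the listing does not prescribe: 'text' is left out because it is described as a tweet id, 'upsample_group' is left out because it is marked "ignore", and 'hashtag' is one-hot encoded over its three documented values. The function and constant names are illustrative.

```python
# The 17 numeric columns used as inputs (all numeric columns except 'label' and 'text').
NUMERIC_FEATURES = [
    "followers_count", "tweet_count", "question_marks", "verified",
    "accountlife", "followers_ratio", "exclamation_marks", "capital letters",
    "retweet_count", "hashtags", "following", "text length", "listed_count",
    "emoticons", "like_count", "time_after_posting", "activity",
]

# The three hashtags named in the dataset description.
HASHTAGS = ("#jinek", "#vleestaks", "#inflatie")


def to_vector(row):
    """Return (feature_vector, target) for one row given as a dict.

    Assumption: 'text' (tweet id) and 'upsample_group' are not used as features.
    """
    vec = [float(row[name]) for name in NUMERIC_FEATURES]
    # One-hot encode the string-valued 'hashtag' column.
    vec.extend(1.0 if row["hashtag"] == tag else 0.0 for tag in HASHTAGS)
    return vec, int(row["label"])
```

Each vector then has 20 entries: 17 numeric values followed by 3 hashtag indicators.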

19 properties

Number of instances (rows) of the dataset: 451200
Number of attributes (columns) of the dataset: 20
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 19
Number of nominal attributes: 0
Number of instances belonging to the most frequent class: (not shown)
Percentage of instances belonging to the least frequent class: (not shown)
Number of instances belonging to the least frequent class: (not shown)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 0.98
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 95
Percentage of instances belonging to the most frequent class: (not shown)
Percentage of nominal attributes: 0
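Two of the derived properties can be checked from the counts listed here. The short calculation below is only a sanity check. That the page rounds the attribute-to-instance ratio to two decimals is an assumption, made because the displayed value is 0.

```python
n_instances = 451200   # rows, as listed
n_attributes = 20      # columns, as listed
n_numeric = 19         # numeric attributes, as listed

# Percentage of numeric attributes: 19 of 20 columns.
pct_numeric = 100 * n_numeric / n_attributes

# Attributes divided by instances: a very small ratio (about 4.4e-05),
# which would display as 0 if rounded to two decimals (assumed rounding).
dimensionality = n_attributes / n_instances
dimensionality_rounded = round(dimensionality, 2)
```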

0 tasks
