Source-based-Fake-News-Classification

Status: active
Format: ARFF
License: CC0: Public Domain
Visibility: public
Uploaded: 23-03-2022 by Onur Yildirim
Tags: Computer Systems, Machine Learning

Context

Social media is a vast pool of content, and of everything available to users, news is accessed most frequently. News items can be posted by politicians, news channels, newspaper websites, or ordinary citizens. These posts have to be checked for authenticity, since the spread of misinformation is a real concern today, and many firms are taking steps to make people aware of the consequences of spreading misinformation. The authenticity of news posted online cannot be measured definitively, since manual classification of news is tedious, time-consuming, and subject to bias.

Published paper: http://www.ijirset.com/upload/2020/june/115_4_Source.PDF

Content

Data preprocessing has been done on the dataset Getting Real about Fake News, and skew has been eliminated.

Inspiration

In an era where fake WhatsApp forwards and Tweets are capable of influencing naive minds, tools and knowledge have to be put to practical use, not only to mitigate the spread of misinformation but also to inform people about the type of news they consume. Practical applications that let users gain insight from the articles they consume, such as fact-checking websites, built-in plugins, and article parsers, can be further refined and made easier to access and, more importantly, should create more awareness.

Acknowledgements

Getting Real about Fake News seemed the most promising dataset for preprocessing, feature extraction, and model classification, because all the other datasets lacked the sources from which the article or statement text was produced and published. Citing the sources for article text is crucial for checking the trustworthiness of the news, and further helps in labelling the data as fake or untrustworthy. Thanks go to the dataset for its comprehensiveness in citing the source information of the text along with author names, dates of publication, and labels.

12 features

author (string): 491 unique values, 0 missing
published (string): 2006 unique values, 0 missing
title (string): 1784 unique values, 0 missing
text (string): 1941 unique values, 46 missing
language (string): 5 unique values, 1 missing
site_url (string): 68 unique values, 1 missing
main_img_url (string): 1229 unique values, 1 missing
type (string): 8 unique values, 1 missing
label (string): 2 unique values, 1 missing
title_without_stopwords (string): 1780 unique values, 2 missing
text_without_stopwords (string): 1937 unique values, 50 missing
hasImage (numeric): 2 unique values, 1 missing
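
The per-feature counts above (unique values and missing values) can be reproduced for any tabular data with a few lines of code. The following is a minimal sketch, not the method the hosting site uses: the helper name `summarize_columns` is hypothetical, the sample rows are invented for illustration, and it assumes that missing entries appear as `None` or an empty string and are not counted as a unique value.

```python
def summarize_columns(rows, columns):
    """Return {column: (unique_count, missing_count)} for a list of dict rows."""
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Assumption: None and "" mark a missing entry.
        present = [v for v in values if v not in (None, "")]
        summary[col] = (len(set(present)), len(values) - len(present))
    return summary


# Hypothetical rows, invented for illustration only (not taken from the dataset).
rows = [
    {"author": "A", "language": "english", "hasImage": 1},
    {"author": "B", "language": "english", "hasImage": 0},
    {"author": "A", "language": None, "hasImage": 1},
]

result = summarize_columns(rows, ["author", "language", "hasImage"])
for name, (unique_count, missing_count) in result.items():
    print(f"{name}: {unique_count} unique values, {missing_count} missing")
```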

19 properties

Number of instances (rows) of the dataset: 2096
Number of attributes (columns) of the dataset: 12
Number of distinct values of the target attribute (if it is nominal): not listed
Number of missing values in the dataset: 104
Number of instances with at least one value missing: 51
Number of numeric attributes: 1
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0.01
Percentage of numeric attributes: 8.33
Percentage of instances belonging to the most frequent class: not listed
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: not listed
Percentage of instances belonging to the least frequent class: not listed
Number of instances belonging to the least frequent class: not listed
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 2.43
Average class difference between consecutive instances: not listed
Percentage of missing values: 0.41
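
Several of these properties follow arithmetically from the feature list and the instance count. As a consistency check, the sketch below recomputes them from the numbers stated on this page. The variable names are illustrative, and rounding to two decimals is an assumption about how the listed values were produced.

```python
# Per-feature missing counts copied from the feature list on this page.
missing_per_feature = {
    "author": 0, "published": 0, "title": 0, "text": 46, "language": 1,
    "site_url": 1, "main_img_url": 1, "type": 1, "label": 1,
    "title_without_stopwords": 2, "text_without_stopwords": 50, "hasImage": 1,
}
n_instances = 2096             # rows, as listed
n_instances_with_missing = 51  # as listed; cannot be derived from column counts
n_numeric = 1                  # hasImage is the only numeric feature

n_attributes = len(missing_per_feature)
total_missing = sum(missing_per_feature.values())


def pct(part, whole):
    """Percentage rounded to two decimals (assumed rounding rule)."""
    return round(100 * part / whole, 2)


summary = {
    "attributes": n_attributes,
    "missing_values": total_missing,
    # Missing cells as a share of all cells (rows x columns).
    "pct_missing_values": pct(total_missing, n_instances * n_attributes),
    "pct_instances_with_missing": pct(n_instances_with_missing, n_instances),
    "pct_numeric_attributes": pct(n_numeric, n_attributes),
    "attributes_per_instance": round(n_attributes / n_instances, 2),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The recomputed values agree with the listed ones: 104 missing values, 0.41 percent missing values, 2.43 percent of instances with missing values, 8.33 percent numeric attributes, and an attribute-to-instance ratio of 0.01.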

0 tasks