OpenML


Source-based-Fake-News-Classification

Status: active; Format: ARFF; License: CC0: Public Domain; Visibility: public; Uploaded 23-03-2022 by Onur Yildirim

Context

Social media is a vast pool of content, and news is among the content users access most frequently. News items can be posted by politicians, news channels, newspaper websites, or ordinary citizens. These posts have to be checked for authenticity, since the spread of misinformation is a real concern today, and many firms are taking steps to make people aware of its consequences. The authenticity of news posted online cannot be measured definitively, because manual classification of news is tedious, time-consuming, and subject to bias.

Published paper: http://www.ijirset.com/upload/2020/june/115_4_Source.PDF

Content

The dataset Getting Real about Fake News has been preprocessed, and skew has been eliminated.

Inspiration

In an era where fake WhatsApp forwards and Tweets can influence naive minds, tools and knowledge have to be put to practical use, not only to mitigate the spread of misinformation but also to inform people about the type of news they consume. Practical applications that help users gain insight from the articles they consume, such as fact-checking websites, built-in plugins, and article parsers, can be further refined, made easier to access, and, more importantly, should create more awareness.

Acknowledgements

Getting Real about Fake News seemed the most promising dataset for preprocessing, feature extraction, and model classification, because all the other datasets lacked the sources from which the article or statement text was produced and published. Citing the source of article text is crucial for checking the trustworthiness of the news and further helps in labelling the data as fake or untrustworthy. Thanks go to the dataset for its comprehensiveness in citing the source information of the text along with author names, dates of publication, and labels.

12 features

author	string	491 unique values 0 missing
published	string	2006 unique values 0 missing
title	string	1784 unique values 0 missing
text	string	1941 unique values 46 missing
language	string	5 unique values 1 missing
site_url	string	68 unique values 1 missing
main_img_url	string	1229 unique values 1 missing
type	string	8 unique values 1 missing
label	string	2 unique values 1 missing
title_without_stopwords	string	1780 unique values 2 missing
text_without_stopwords	string	1937 unique values 50 missing
hasImage	numeric	2 unique values 1 missing
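
The per-feature counts in the table above (unique values and missing values) can be recomputed once the rows are loaded. The following is a minimal sketch, not the method OpenML uses to produce these figures: `summarize_columns` is a hypothetical helper, the three rows in `toy_rows` are invented for illustration (including the label values "Fake" and "Real"), and treating `None`, the empty string, and ARFF's `?` marker as missing is an assumption about how the file was parsed.

```python
def summarize_columns(rows, missing_markers=(None, "", "?")):
    """Return {column: (n_unique, n_missing)} for a list of dict rows.

    Values listed in missing_markers count as missing and are
    excluded from the unique-value count.
    """
    stats = {}
    for row in rows:
        for column, value in row.items():
            entry = stats.setdefault(column, {"unique": set(), "missing": 0})
            if value in missing_markers:
                entry["missing"] += 1
            else:
                entry["unique"].add(value)
    return {col: (len(e["unique"]), e["missing"]) for col, e in stats.items()}


# Invented rows that only mimic a few of the dataset's column names.
toy_rows = [
    {"author": "A. Writer", "text": "some body text", "label": "Fake", "hasImage": 1.0},
    {"author": "A. Writer", "text": None, "label": "Real", "hasImage": 0.0},
    {"author": "B. Reporter", "text": "other body text", "label": "Fake", "hasImage": None},
]

summary = summarize_columns(toy_rows)
for name, (n_unique, n_missing) in summary.items():
    print(f"{name}\t{n_unique} unique values\t{n_missing} missing")
```

Run against the real rows, the same helper should make it easy to spot the columns that need attention before modelling, such as `text` and `text_without_stopwords`, which the table lists with 46 and 50 missing values.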