OpenML
Multipurpose-World-News-Dataset


Status: active. Format: ARFF. License: GPL 2. Visibility: public. Uploaded 23-03-2022 by Onur Yildirim.
  • Computer Systems
  • Machine Learning


Content

This is a dataset I started building for my future personal projects, because this kind of data is hard to acquire for free and in a short time. I started collecting data on March 21st, 2020 and intend to keep doing so continuously.

It contains news items extracted from the following sources: Foxbusiness.com, Youtube.com, Cnet.com, The Verge, Nytimes.com, Rawstory.com, Investors.com, Wreg.com, Reuters, Koin.com, Inc.com, CNBC, Nj.com, Wmtw.com, Nbcdfw.com, Bloomberg, Wowt.com, Bbc.com.

Every 20 minutes, a script checks these sources for new headlines and adds them to a database. This CSV file is generated from that database. I intend to update this dataset every day if I can (and if the machine I run this script on is up).
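The collection script itself is not published on this page. As a rough illustration of the "check for new headlines and add them to a database" step, here is a minimal sketch using Python's standard `sqlite3` module. The table layout, the function names `open_store` and `store_new_headlines`, and the choice to treat a (source, title) pair as the identity of a headline are all assumptions made for this example, not details from the dataset author.

```python
import sqlite3
from datetime import datetime, timezone

# The 20-minute polling interval mentioned in the description.
POLL_INTERVAL_SECONDS = 20 * 60


def open_store(path=":memory:"):
    """Open (or create) a hypothetical headline store.

    Columns mirror the dataset's features; the UNIQUE constraint is an
    assumption about how duplicates might be detected.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS news (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               timestamp TEXT NOT NULL,
               source TEXT NOT NULL,
               title TEXT,
               description TEXT,
               UNIQUE (source, title)
           )"""
    )
    return conn


def store_new_headlines(conn, headlines, now=None):
    """Insert headlines that are not stored yet; return how many were added."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    added = 0
    for item in headlines:
        cur = conn.execute(
            "INSERT OR IGNORE INTO news (timestamp, source, title, description) "
            "VALUES (?, ?, ?, ?)",
            (stamp, item["source"], item["title"], item.get("description")),
        )
        # rowcount is 1 for a new row and 0 when the insert was ignored.
        added += cur.rowcount
    conn.commit()
    return added
```

A caller would fetch headlines from each source (that part is not shown and depends on each site), pass them to `store_new_headlines`, and then sleep for `POLL_INTERVAL_SECONDS` before the next pass. Note that SQLite treats NULL titles as distinct, so items without a title are never deduplicated under this scheme.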

4 features

id (ignore): numeric, 193279 unique values, 0 missing
timestamp: string, 164190 unique values, 0 missing
source: string, 20 unique values, 0 missing
title: string, 193245 unique values, 11 missing
description: string, 120986 unique values, 29943 missing
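Per-column figures like the ones above (unique values and missing values) can be recomputed from the CSV with the standard library alone. The sketch below is an illustration, not the way OpenML computes them: the helper name `summarize_columns`, the rule that an empty string counts as missing, and the three sample rows (including the timestamp format) are invented for the example.

```python
import csv
import io


def summarize_columns(rows, columns):
    """Count unique non-missing values and missing values per column.

    A value counts as missing when it is None or an empty string
    (an assumption about how the CSV encodes missing data).
    """
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in (None, "")]
        summary[col] = {
            "unique": len(set(present)),
            "missing": len(values) - len(present),
        }
    return summary


# Made-up sample rows with the same column names as the dataset.
SAMPLE_CSV = """timestamp,source,title,description
2020-03-21 10:00:00,Reuters,Headline one,Some text
2020-03-21 10:20:00,Bbc.com,Headline two,
2020-03-21 10:20:00,Reuters,Headline three,Some text
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = summarize_columns(rows, ["timestamp", "source", "title", "description"])
```

For the real file, replace `io.StringIO(SAMPLE_CSV)` with an open file handle on the downloaded CSV.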

19 properties

Number of instances (rows) of the dataset: 193279
Number of attributes (columns) of the dataset: 4
Number of distinct values of the target attribute (if it is nominal): (no value listed)
Number of missing values in the dataset: 29954
Number of instances with at least one value missing: 29954
Number of numeric attributes: 0
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 0
Percentage of instances belonging to the most frequent class: (no value listed)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 15.5
Average class difference between consecutive instances: (no value listed)
Percentage of missing values: 3.87
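The two percentages listed here can be checked against the counts on this page. The short calculation below assumes that "percentage of missing values" is taken over instances × attributes with 4 attributes (the ignored id column excluded); that reading is an inference from the numbers, not something the page states.

```python
# Counts taken from the feature and property listings on this page.
instances = 193279
attributes = 4  # id is marked "(ignore)" and is assumed not to be counted
missing_by_feature = {"title": 11, "description": 29943}
rows_with_missing = 29954

missing_values = sum(missing_by_feature.values())

# Share of rows with at least one missing value.
pct_rows_with_missing = 100 * rows_with_missing / instances

# Share of all cells that are missing, assuming 4 counted attributes.
pct_missing_values = 100 * missing_values / (instances * attributes)

print(round(pct_rows_with_missing, 1))  # 15.5
print(round(pct_missing_values, 2))     # 3.87
```

Both results match the listed values, and the per-feature missing counts (11 and 29943) add up to the listed total of 29954.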

0 tasks
