Data
Dutch-News-Articles

Dutch-News-Articles

active ARFF CC BY-NC-SA 4.0 Visibility: public Uploaded 23-03-2022 by Elif Ceren Gok
0 likes downloaded by 0 people , 0 total downloads 0 issues 0 downvotes
Issue #Downvotes for this reason By


Loading wiki
Help us complete this description Edit
Dutch News Articles This dataset contains all the articles published by the NOS as of the 1st of January 2010. The data is obtained by scraping the NOS website. The NOS is one of the biggest (online) news organizations in the Netherlands. Features: datetime: date and time of publication of the article. title: the title of the news article. content: the content of the news article. category: the category under which the NOS filed the article. url: link to the original article. About the data The title and content of features somewhat clean. Meaning extra whites spaces and newlines are removed. Furthermore, these features are normalized (NFKD). The NOS also publishes liveblogs. The posts in this live blog are not part of this dataset. Example I used this dataset in a recent blog post.

5 features

datetimestring233814 unique values
0 missing
titlestring235655 unique values
0 missing
contentstring237528 unique values
0 missing
categorystring9 unique values
0 missing
urlstring237784 unique values
0 missing

19 properties

237861
Number of instances (rows) of the dataset.
5
Number of attributes (columns) of the dataset.
Number of distinct values of the target attribute (if it is nominal).
0
Number of missing values in the dataset.
0
Number of instances with at least one value missing.
0
Number of numeric attributes.
0
Number of nominal attributes.
0
Number of attributes divided by the number of instances.
0
Percentage of numeric attributes.
Percentage of instances belonging to the most frequent class.
0
Percentage of nominal attributes.
Number of instances belonging to the most frequent class.
Percentage of instances belonging to the least frequent class.
Number of instances belonging to the least frequent class.
0
Number of binary attributes.
0
Percentage of binary attributes.
0
Percentage of instances having missing values.
Average class difference between consecutive instances.
0
Percentage of missing values.

0 tasks

Define a new task