OpenML
Articles-From-Buzzfeed-2020

Status: active | Format: ARFF | License: CC0: Public Domain | Visibility: public | Uploaded 24-03-2022 by Elif Ceren Gok
  • Machine Learning
  • Transportation
Context
This dataset was created by our in-house teams at PromptCloud (https://www.promptcloud.com/) and DataStock (https://datastock.shop/). We have about 5K samples in this dataset. You can download the full dataset here (https://app.datastock.shop/?site_name=Articles_From_BuzzFeed_2020). We have a 30% discount on all datasets in our data repository. Feel free to head over to DataStock (https://datastock.shop/) and avail the discount.

Content
This dataset contains the following:
  • Total Records Count: 14831
  • Domain Name: buzzfeed.com
  • Date Range: 1st Jan 2020 - 30th Apr 2020
  • File Extension: csv
  • Available Fields: Uniq Id, Crawl Timestamp, Title Headline, Short Description Sub Headline, Content Body, Author, Date And Time Of Posting, Image Urls

Acknowledgements
We wouldn't be here without the help of our web scraping and data mining experts at PromptCloud and DataStock.

Inspiration
The inspiration for this dataset came from Buzzfeed itself. We thought long and hard about the informative articles that we have on Buzzfeed, so we came up with a dataset for it.

8 features

Uniq_Id (string): 741 unique values, 0 missing
Crawl_Timestamp (string): 741 unique values, 0 missing
Title__Headline (string): 687 unique values, 48 missing
Short_Description__Sub_Headline (string): 689 unique values, 45 missing
Content__Body (string): 741 unique values, 0 missing
Author (numeric): 0 unique values, 741 missing
Date_And_Time_Of_Posting (string): 728 unique values, 0 missing
Image_Urls (numeric): 0 unique values, 741 missing
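
The per-feature counts above can be recomputed from a CSV export. The following is a minimal sketch using only the Python standard library. It assumes that empty cells count as missing and that unique values are counted over non-missing cells only; the page does not state either rule. The function name `profile_columns` and the sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io

# Made-up sample rows for illustration only; not taken from the real dataset.
SAMPLE_CSV = """Uniq_Id,Title__Headline,Author
a1,First headline,
a2,,
a3,First headline,
"""


def profile_columns(csv_text):
    """Return {column: (unique_non_missing_values, missing_count)}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {name: set() for name in reader.fieldnames}
    missing = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row[name]
            # Assumption: an absent or whitespace-only cell is a missing value.
            if value is None or value.strip() == "":
                missing[name] += 1
            else:
                seen[name].add(value)
    return {name: (len(seen[name]), missing[name]) for name in reader.fieldnames}


profile = profile_columns(SAMPLE_CSV)
for name, (n_unique, n_missing) in profile.items():
    print(f"{name}: {n_unique} unique values, {n_missing} missing")
```

A column that is empty in every row, like Author and Image_Urls in the listing above, comes out with 0 unique values and a missing count equal to the number of rows.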

19 properties

Number of instances (rows) of the dataset: 741
Number of attributes (columns) of the dataset: 8
Number of distinct values of the target attribute (if it is nominal): (no value reported)
Number of missing values in the dataset: 1575
Number of instances with at least one value missing: 741
Number of numeric attributes: 2
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0.01
Percentage of numeric attributes: 25
Percentage of instances belonging to the most frequent class: (no value reported)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value reported)
Percentage of instances belonging to the least frequent class: (no value reported)
Number of instances belonging to the least frequent class: (no value reported)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 100
Average class difference between consecutive instances: (no value reported)
Percentage of missing values: 26.57
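
The derived properties follow arithmetically from the feature listing. The sketch below is a consistency check, not OpenML's own computation: it starts from the per-feature missing counts and types listed on this page and recomputes the totals and percentages. The variable names are illustrative.

```python
# Per-feature missing counts as listed in the feature section of this page.
missing_per_feature = {
    "Uniq_Id": 0,
    "Crawl_Timestamp": 0,
    "Title__Headline": 48,
    "Short_Description__Sub_Headline": 45,
    "Content__Body": 0,
    "Author": 741,
    "Date_And_Time_Of_Posting": 0,
    "Image_Urls": 741,
}
numeric_features = ["Author", "Image_Urls"]  # typed as numeric on this page

n_instances = 741
n_features = len(missing_per_feature)

# Total missing cells and their share of all cells (rows x columns).
total_missing = sum(missing_per_feature.values())
pct_missing = 100 * total_missing / (n_instances * n_features)

# Share of numeric attributes and the attributes-per-instance ratio.
pct_numeric = 100 * len(numeric_features) / n_features
dimensionality = n_features / n_instances

# If any feature is missing in every row, every row has a missing value.
all_rows_incomplete = any(
    count == n_instances for count in missing_per_feature.values()
)

print(total_missing, round(pct_missing, 2), pct_numeric, round(dimensionality, 2))
print(all_rows_incomplete)
```

The recomputed figures agree with the listed properties: 1575 missing values, 26.57 percent missing, 25 percent numeric attributes, and a ratio of 0.01. Because Author and Image_Urls are empty in all 741 rows, every instance has at least one missing value, which accounts for the 100 percent figure.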

0 tasks
