Marginal-Revolution-Blog-Post-Data


Status: active. Format: ARFF. License: CC0: Public Domain. Visibility: public. Uploaded 23-03-2022 by Dustin Carrion.
  • Earth Science Machine Learning


The following dataset contains data on blog posts from MarginalRevolution.com. For posts from Jan. 1, 2010 to Sept. 17, 2016, the following attributes are gathered:

  • Author name
  • Post title
  • Post date
  • Post content (words)
  • Number of words in post
  • Number of comments in post
  • Dummy variables for several commonly used categories

The data was scraped using Python's Beautiful Soup package and cleaned in R. See my GitHub page (https://github.com/wnowak10/) for the Python and R code.
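To make the attribute list concrete, here is a minimal sketch of how one post could be flattened into a row with a word count and 0/1 category dummies. This is not the author's scraping or cleaning code: the helper `to_row`, the whitespace-based word count, the three-category subset, and the example post are all assumptions made for illustration.

```python
# Hypothetical sketch: turn one post into a flat row.
# Assumption: word count is a simple whitespace split; the original
# cleaning code (in R) may have counted words differently.

# A subset of the dataset's category columns, chosen for illustration.
CATEGORIES = ["Economics", "Books", "Music"]


def to_row(author, title, text, comment_count, categories):
    """Build one row with a word count and one 0/1 dummy per category."""
    row = {
        "author": author,
        "title": title,
        "text": text,
        "wordcount": len(text.split()),
        "comment.count": comment_count,
    }
    for name in CATEGORIES:
        row[name] = 1 if name in categories else 0
    return row


# Invented example post, not taken from the dataset.
example_row = to_row(
    author="Example Author",
    title="An example title",
    text="Markets clear when prices adjust.",
    comment_count=3,
    categories={"Economics"},
)
print(example_row)
```

Each category becomes its own numeric column, which is why the feature list shows two unique values for every category attribute.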

22 features

author (string): 2 unique values, 0 missing
comment.count (numeric): 306 unique values, 24 missing
date (string): 2448 unique values, 0 missing
text (string): 12772 unique values, 22 missing
title (string): 9743 unique values, 2 missing
wordcount (numeric): 802 unique values, 33 missing
time (string): 1338 unique values, 0 missing
CurrentAffairs (numeric): 2 unique values, 0 missing
Education (numeric): 2 unique values, 0 missing
Music (numeric): 2 unique values, 0 missing
Philosophy (numeric): 2 unique values, 0 missing
PoliticalScience (numeric): 2 unique values, 0 missing
Science (numeric): 2 unique values, 0 missing
History (numeric): 2 unique values, 0 missing
Law (numeric): 2 unique values, 0 missing
Games (numeric): 2 unique values, 0 missing
Books (numeric): 2 unique values, 0 missing
FoodandDrink (numeric): 2 unique values, 0 missing
DataSource (numeric): 2 unique values, 0 missing
WebTech (numeric): 2 unique values, 0 missing
Economics (numeric): 2 unique values, 0 missing
Medicine (numeric): 2 unique values, 0 missing
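The per-feature figures above (unique values and missing counts) can be reproduced for any table of rows with a few lines of Python. The sketch below is illustrative only: `summarize` is a hypothetical helper, `None` stands in for a missing value, and the three sample rows are invented rather than drawn from the dataset.

```python
def summarize(rows, columns):
    """Return {column: (n_unique, n_missing)}, treating None as missing."""
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        summary[col] = (len(set(present)), len(values) - len(present))
    return summary


# Invented sample rows; real rows would come from the ARFF file.
sample_rows = [
    {"author": "A", "wordcount": 120, "Economics": 1},
    {"author": "B", "wordcount": None, "Economics": 0},
    {"author": "A", "wordcount": 120, "Economics": 1},
]

column_summary = summarize(sample_rows, ["author", "wordcount", "Economics"])
for name, (n_unique, n_missing) in column_summary.items():
    print(f"{name}: {n_unique} unique values, {n_missing} missing")
```

Missing values are excluded from the unique count here, which is an assumption about how the listing counts them.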

19 properties

Number of instances (rows) of the dataset: 12820
Number of attributes (columns) of the dataset: 22
Number of distinct values of the target attribute (if it is nominal): no value listed
Number of missing values in the dataset: 81
Number of instances with at least one value missing: 57
Number of numeric attributes: 17
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 77.27
Percentage of instances belonging to the most frequent class: no value listed
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: no value listed
Percentage of instances belonging to the least frequent class: no value listed
Number of instances belonging to the least frequent class: no value listed
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0.44
Average class difference between consecutive instances: no value listed
Percentage of missing values: 0.03
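The percentage properties follow from the raw counts listed above. The short check below recomputes them; the formulas (in particular, dividing missing values by rows times columns) are assumptions, though they do reproduce the listed figures after rounding to two decimals.

```python
# Counts taken from the listing above.
n_instances = 12820
n_attributes = 22
n_numeric = 17
n_missing_values = 81
n_rows_with_missing = 57

# Per-feature missing counts (comment.count, text, title, wordcount).
per_feature_missing = [24, 22, 2, 33]

# Assumed formulas, rounded to two decimals as in the listing.
pct_numeric = round(100 * n_numeric / n_attributes, 2)
pct_rows_missing = round(100 * n_rows_with_missing / n_instances, 2)
pct_missing_values = round(
    100 * n_missing_values / (n_instances * n_attributes), 2
)
attrs_per_instance = round(n_attributes / n_instances, 2)

print(pct_numeric, pct_rows_missing, pct_missing_values, attrs_per_instance)
```

The per-feature missing counts also sum to the dataset-wide total of 81, so the feature list and the property list are consistent with each other.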

0 tasks
