Harry-Potter-fanfiction-data

Status: active; Format: ARFF; License: CC0: Public Domain; Visibility: public; Uploaded 23-03-2022 by Onur Yildirim

Context
I'm a huge Harry Potter fan and wanted to collect fan-fiction data to build a dashboard and visualize it. It's in the works.

Content
I scraped this data from https://www.fanfiction.net/book/Harry-Potter/ using requests and Beautiful Soup. The data is completely structured. The scraping code can be found at https://github.com/nt03/HarryPotter_fanfics/tree/master/ffnet. The dataset contains all HP fanfic entries written between 2001 and 2019 in all available languages. It doesn't contain the stories themselves, only each story's blurb.

Acknowledgements
The code is entirely mine. The thumbnail and banner are attributed to Photo by Christian Wagner on Unsplash.

Inspiration
You can answer questions like: which is the most popular pairing, which language has the most fan fictions written in it, and what has the general trend been since the last movie or book came out?
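
Questions such as "most popular pairing" or "language with the most stories" reduce to frequency counts over a single column. Here is a minimal sketch, assuming the rows have already been loaded as dictionaries keyed by feature name and that missing values show up as None or an empty string. The helper name top_values and the sample rows are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def top_values(rows, column, n=3):
    """Return the n most frequent non-missing values of `column`."""
    counts = Counter(
        row[column]
        for row in rows
        # Assumption: a missing value is None, "" or an absent key.
        if row.get(column) not in (None, "")
    )
    return counts.most_common(n)


# Hypothetical sample rows for illustration only.
sample = [
    {"language": "English", "pairing": "Harry P., Ginny W."},
    {"language": "English", "pairing": None},
    {"language": "Spanish", "pairing": "Harry P., Ginny W."},
    {"language": "English", "pairing": "Hermione G., Ron W."},
]

print(top_values(sample, "language", n=1))
print(top_values(sample, "pairing", n=1))
```

Skipping missing values matters here because the pairing column is empty for a large share of rows, so counting them would make "missing" the top answer.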

16 features

Feature           Type     Unique values  Missing
Chapters          numeric  228            0
Favs              string   3483           72163
Follows           string   3139           143204
Published         string   5502           0
Reviews           numeric  2458           52189
Updated (ignore)  string   5535           410228
Words             string   68086          0
author            string   156280         0
characters        string   36517          63040
genre             string   403            59927
language          string   43             0
rating            string   4              0
story_link        string   648090         0
synopsis          string   647085         37
title             string   474188         203
published_mmyy    string   181            0
pairing           string   9991           256823
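
Several count-like features (Favs, Follows, Words) are typed as string rather than numeric. One plausible reason, which is an assumption and not something the page confirms, is that the scraped values keep thousands separators such as "1,234". A minimal sketch of a converter under that assumption follows; parse_count is a hypothetical helper name.

```python
def parse_count(text):
    """Convert a count such as '1,234' to int.

    Returns None when the value is missing or cannot be parsed.
    Assumption: commas are thousands separators, as in '1,234'.
    """
    if text is None:
        return None
    cleaned = str(text).replace(",", "").strip()
    try:
        return int(cleaned)
    except ValueError:
        # Covers empty strings and any non-numeric residue.
        return None


print(parse_count("1,234"))
print(parse_count(""))
```

Returning None instead of 0 keeps missing values distinguishable from stories that really have zero favourites or follows.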

19 properties

Property                                                                       Value
Number of instances (rows) of the dataset                                      648493
Number of attributes (columns) of the dataset                                  16
Number of distinct values of the target attribute (if it is nominal)           not reported
Number of missing values in the dataset                                        647586
Number of instances with at least one value missing                            374277
Number of numeric attributes                                                   2
Number of nominal attributes                                                   0
Number of attributes divided by the number of instances                        0
Percentage of numeric attributes                                               12.5
Percentage of instances belonging to the most frequent class                   not reported
Percentage of nominal attributes                                               0
Number of instances belonging to the most frequent class                       not reported
Percentage of instances belonging to the least frequent class                  not reported
Number of instances belonging to the least frequent class                      not reported
Number of binary attributes                                                    0
Percentage of binary attributes                                                0
Percentage of instances having missing values                                  57.71
Average class difference between consecutive instances                         not reported
Percentage of missing values                                                   6.24
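
The percentage properties can be cross-checked against the per-feature missing counts listed in the feature section. The sketch below reproduces the reported figures under one assumption inferred from the arithmetic: the feature marked "Updated (ignore)" is left out of the 16 attributes and out of the missing-value total. Features with 0 missing are omitted because they do not change the sum.

```python
instances = 648493
attributes = 16  # Assumption: "Updated (ignore)" is not counted.

# Per-feature missing counts from the feature list, excluding "Updated".
missing_per_feature = {
    "Favs": 72163,
    "Follows": 143204,
    "Reviews": 52189,
    "characters": 63040,
    "genre": 59927,
    "synopsis": 37,
    "title": 203,
    "pairing": 256823,
}

total_missing = sum(missing_per_feature.values())
pct_missing = round(100 * total_missing / (instances * attributes), 2)
pct_rows_with_missing = round(100 * 374277 / instances, 2)
pct_numeric = 100 * 2 / attributes  # Chapters and Reviews are numeric.

print(total_missing)          # 647586
print(pct_missing)            # 6.24
print(pct_rows_with_missing)  # 57.71
print(pct_numeric)            # 12.5
```

Including the 410228 missing values of Updated would give a total of 1057814, which does not match the reported 647586, so the exclusion appears consistent with the page.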

0 tasks
