Pandas-QA-on-Stack-Overflow

Status: active; Format: ARFF; License: CC BY-SA 3.0; Visibility: public; Uploaded 24-03-2022 by Dustin Carrion
Context

This work focuses on creating a data set of Pandas Q&A from Stack Overflow (SO). At present, more than 90k questions have been asked on SO under the Pandas section. Many questions on SO are of poor quality or duplicate questions that have already been answered. A new SO user may ask a question that falls into one of these categories (low quality, duplicate, spam, etc.). Similarly, a new SO user may, through lack of experience, fail to flag a question that does not follow SO guidelines. It is therefore the long-standing, experienced users who provide quality answers, mark questions as duplicates, close them, downvote them, and so on. We focus on 40 such users who have earned the Pandas gold tag on their profile, which in simple terms means that they have answered enough questions to judge the quality of an incoming question and decide whether or not to answer it.

Content

To create this data set, I felt no need to perform any web scraping to extract SO data. SO provides an online API where one can simply run an SQL query to get a downloadable CSV file. To learn how I did this, read here.

Acknowledgements

All thanks to the Stack Overflow data API. All copyrights belong to Stack Overflow and its network sites, licensed under CC BY-SA 3.0.

Task!

So what can be done with the given 87241 rows and 16 columns? Since all the questions and answers given by the 40 users have been extracted into the data set, suggest what it takes for an answer to be accepted when one of the associated tags is "pandas".

Reputation calculation: refer here.
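One possible starting point for the task is to compare score statistics between accepted and non-accepted rows. The sketch below is only an illustration under stated assumptions: it takes the column names `Accepted` and `Score` from the feature list, assumes that a non-missing `Accepted` value marks an accepted post (the column has a single unique value and is otherwise missing), and runs on a small made-up frame rather than the real data. The helper name `acceptance_summary` is hypothetical.

```python
import pandas as pd


def acceptance_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise Score for rows with and without an Accepted marker."""
    # Assumption: any non-missing value in "Accepted" means "accepted".
    label = df["Accepted"].notna().map({True: "accepted", False: "not accepted"})
    # On the real data one would first restrict to answer rows via "Type";
    # its four values are not listed on this page, so no filter is applied here.
    return df.groupby(label)["Score"].agg(["count", "mean", "median"])


# Made-up toy rows, for illustration only (not taken from the data set).
toy = pd.DataFrame(
    {
        "Score": [1, 5, 3, 10],
        "Accepted": [None, "Accepted", None, "Accepted"],
    }
)
summary = acceptance_summary(toy)
print(summary)
```

The same function could be applied to the full table once it is loaded into a DataFrame; other columns such as `Created`, `Tags`, or `Markdown` length may be equally relevant to acceptance.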

16 features

Feature      Type     Unique values  Missing
Unnamed:_0   numeric  87241          0
Post_Link    numeric  87241          0
Type         string   4              0
Title        string   76980          42
Markdown     string   87233          3
Tags         string   25746          4
Created      string   87125          0
Last_Edit    string   41349          45106
Edited_By    string   1999           45224
Score        numeric  259            0
Favorites    numeric  31             86810
Views        numeric  8885           46
Answers      numeric  11             86193
Accepted     string   1              35169
CW           string   1              87074
Closed       string   2              87193

19 properties

Number of instances (rows) of the dataset: 87241
Number of attributes (columns) of the dataset: 16
Number of distinct values of the target attribute (if it is nominal): (none listed)
Number of missing values in the dataset: 472864
Number of instances with at least one value missing: 87241
Number of numeric attributes: 6
Number of nominal attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 100
Percentage of missing values: 33.88
Average class difference between consecutive instances: (none listed)
Percentage of numeric attributes: 37.5
Number of attributes divided by the number of instances: 0
Percentage of nominal attributes: 0
Percentage of instances belonging to the most frequent class: (none listed)
Number of instances belonging to the most frequent class: (none listed)
Percentage of instances belonging to the least frequent class: (none listed)
Number of instances belonging to the least frequent class: (none listed)
Number of binary attributes: 0
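The aggregate properties can be cross-checked against the per-feature counts. The short sketch below copies the type and missing-value count of each feature from the feature list and recomputes the total number of missing values, the percentage of missing values, and the share of numeric attributes; the variable names are illustrative only.

```python
# (feature name, type, missing count), copied from the feature list
features = [
    ("Unnamed:_0", "numeric", 0),
    ("Post_Link", "numeric", 0),
    ("Type", "string", 0),
    ("Title", "string", 42),
    ("Markdown", "string", 3),
    ("Tags", "string", 4),
    ("Created", "string", 0),
    ("Last_Edit", "string", 45106),
    ("Edited_By", "string", 45224),
    ("Score", "numeric", 0),
    ("Favorites", "numeric", 86810),
    ("Views", "numeric", 46),
    ("Answers", "numeric", 86193),
    ("Accepted", "string", 35169),
    ("CW", "string", 87074),
    ("Closed", "string", 87193),
]
n_rows = 87241
n_cols = len(features)

# Total missing cells, and their share of all rows x columns cells
total_missing = sum(missing for _, _, missing in features)
pct_missing = round(100 * total_missing / (n_rows * n_cols), 2)

# Count and share of numeric attributes
n_numeric = sum(1 for _, kind, _ in features if kind == "numeric")
pct_numeric = 100 * n_numeric / n_cols

print(total_missing, pct_missing, n_numeric, pct_numeric)
```

The recomputed figures agree with the listed properties: 472864 missing values, 33.88 percent missing, 6 numeric attributes, and 37.5 percent numeric attributes.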

0 tasks
