OpenML

Pandas-QA-on-Stack-Overflow

Status: active; format: ARFF; license: CC BY-SA 3.0; visibility: public. Uploaded 24-03-2022 by Dustin Carrion.

Context

This work focuses on creating a data set of pandas questions and answers on Stack Overflow (SO). There are currently more than 90k questions on SO tagged with pandas. Many questions on SO are of poor quality or duplicate questions that have already been answered. A new SO user may ask a question that falls into one of these categories (low quality, duplicate, spam, etc.). Similarly, because of inexperience, a new SO user may not flag a question that does not follow SO guidelines. It is therefore the users who have invested long effort on SO who provide quality answers, mark questions as duplicates, close them, downvote them, and so on. We focus on 40 such users who have earned the pandas gold tag badge, which, in simple terms, means that they have answered enough questions to evaluate the quality of a new question and decide whether or not to answer it.

Content

To create this data set, I did not need to perform any web scraping to extract SO data. SO provides an online interface where one can run an SQL query and download the result as a CSV file. To learn how I did this, read here.

Acknowledgements

All thanks to the Stack Overflow data API. All copyrights belong to Stack Overflow and its network sites, licensed under CC BY-SA 3.0.

Task

What can be done with the given 87,241 rows and 16 columns? Since all the questions and answers posted by the 40 users have been extracted into the data set, can you suggest what it takes for an answer to be accepted when one of its associated tags is "pandas"? Reputation calculation: refer here.
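As one possible starting point for the accepted-answer question, the sketch below compares how often answers are accepted above and below a score threshold. It is a minimal illustration, not the author's analysis: the toy rows are hypothetical, and it assumes that answer rows carry `Type == "Answer"` and that any non-empty `Accepted` value marks an accepted answer (the feature list shows `Accepted` has a single non-missing value, but its exact text is not given here). The function name `acceptance_rate_by_score` and the threshold of 5 are illustrative choices.

```python
from collections import defaultdict

# Hypothetical toy rows shaped like the data set's Type, Score and Accepted columns.
# ASSUMPTIONS: answer rows have Type == "Answer"; "Yes" is a placeholder for whatever
# single value the real Accepted column holds; missing values read as "".
rows = [
    {"Type": "Answer", "Score": "12", "Accepted": "Yes"},
    {"Type": "Answer", "Score": "3", "Accepted": ""},
    {"Type": "Answer", "Score": "0", "Accepted": ""},
    {"Type": "Answer", "Score": "7", "Accepted": "Yes"},
    {"Type": "Question", "Score": "5", "Accepted": ""},
]


def acceptance_rate_by_score(rows, threshold=5):
    """Share of answers that were accepted, split at a score threshold."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [accepted, total]
    for row in rows:
        if row["Type"] != "Answer":
            continue  # skip questions and any other post types
        bucket = "high" if int(row["Score"]) >= threshold else "low"
        counts[bucket][1] += 1
        if row["Accepted"]:  # an empty (missing) value counts as not accepted
            counts[bucket][0] += 1
    return {bucket: accepted / total for bucket, (accepted, total) in counts.items()}


print(acceptance_rate_by_score(rows))
```

On the real file, the same loop could be fed from `csv.DictReader`; other candidate signals (views, answer length in `Markdown`, edit history) can be bucketed the same way.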

16 features

Name	Type	Unique values	Missing
Unnamed:_0	numeric	87241	0
Post_Link	numeric	87241	0
Type	string	4	0
Title	string	76980	42
Markdown	string	87233	3
Tags	string	25746	4
Created	string	87125	0
Last_Edit	string	41349	45106
Edited_By	string	1999	45224
Score	numeric	259	0
Favorites	numeric	31	86810
Views	numeric	8885	46
Answers	numeric	11	86193
Accepted	string	1	35169
CW	string	1	87074
Closed	string	2	87193
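The "Unique values" and "Missing" columns above can be approximated from the CSV export with a short pass over the rows. This is a sketch under stated assumptions: missing values are taken to be empty CSV fields, "unique" counts only distinct non-missing values, and the three-row excerpt is hypothetical; OpenML's own counting rules may differ in detail.

```python
import csv
import io

# Hypothetical excerpt; the real file has 87,241 rows and 16 columns.
# ASSUMPTION: missing values appear as empty fields in the CSV export.
SAMPLE_CSV = """Post_Link,Type,Score,Accepted
101,Question,5,
102,Answer,12,Yes
103,Answer,3,
"""


def summarize(csv_text):
    """Return {column: (distinct non-missing values, missing count)}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {name: set() for name in reader.fieldnames}
    missing = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value == "":
                missing[name] += 1  # empty field treated as missing
            else:
                seen[name].add(value)
    return {name: (len(seen[name]), missing[name]) for name in reader.fieldnames}


for name, (unique, n_missing) in summarize(SAMPLE_CSV).items():
    print(f"{name}\t{unique}\t{n_missing}")
```

In this excerpt `Accepted` comes out as one distinct value with two missing, the same pattern the table shows for flag-like columns such as `Accepted`, `CW` and `Closed`.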