Context
This work focuses on creating a data set of Pandas questions and answers from Stack Overflow (SO). At present, more than 90k questions have been asked on SO under the Pandas tag. Many questions on SO are of poor quality or duplicate questions that have already been answered. A new SO user may ask a question that falls into one of these categories (low quality, duplicate, spam, etc.). Similarly, a new user may, for lack of experience, fail to flag a question that does not abide by SO guidelines. It is therefore the users who have spent a long time on SO who provide quality answers, mark questions as duplicates, close them, downvote them, and so on.
We focus on 40 such users who have earned the Pandas gold tag on their profile, which in simple terms means that they have answered enough questions to at least judge the quality of an incoming question and decide whether or not to answer it.
Content
To create this data set, I had no need to scrape SO. SO provides an online API where one can simply run an SQL query and get a downloadable CSV file. To learn how I did this, read here.
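Once the CSV has been downloaded, a quick first step is to load it with pandas and check its shape. The snippet below is only a minimal sketch: the helper `load_dataset` and the file name in the usage comment are hypothetical, since the actual file name is not stated here.

```python
import pandas as pd


def load_dataset(path_or_buffer):
    """Load the downloaded CSV and report how many rows and columns it has.

    `path_or_buffer` can be a file path or any file-like object;
    no particular file name is assumed by this helper.
    """
    df = pd.read_csv(path_or_buffer)
    n_rows, n_cols = df.shape
    print(f"Loaded {n_rows} rows and {n_cols} columns")
    return df


# Hypothetical usage (the file name is an assumption, not the real one):
# df = load_dataset("pandas_gold_users.csv")
```

Comparing the printed shape with the row and column counts quoted for the data set is an easy way to confirm that the download is complete.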
Acknowledgements
All thanks to the Stack Overflow data API. All content is copyright Stack Overflow and its network sites, licensed under CC BY-SA 3.0.
Task!
So what can be done with the given 87241 rows and 16 columns?
Since all the questions and answers given by these 40 users have been extracted into the data set, suggest what it takes for an answer to be accepted when one of the associated tags is "pandas".
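As a possible starting point for this task, one could measure how often answers are accepted among rows carrying a given tag. The sketch below runs on a few made-up rows, not on the real data. The column names `Tags`, `Score` and `IsAccepted`, the `<tag>` string format, and the helper `acceptance_rate_for_tag` are all assumptions for illustration and would need to be adapted to the actual 16 columns.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT taken from the data set.
# Column names (Tags, Score, IsAccepted) are hypothetical placeholders.
toy = pd.DataFrame(
    {
        "Tags": ["<python><pandas>", "<pandas><numpy>", "<python>", "<pandas>"],
        "Score": [5, 0, 3, 12],
        "IsAccepted": [True, False, True, True],
    }
)


def acceptance_rate_for_tag(df, tag):
    """Return the share of accepted rows among those whose tags include `tag`."""
    # regex=False so the angle brackets are matched literally.
    has_tag = df["Tags"].str.contains(f"<{tag}>", regex=False)
    subset = df[has_tag]
    if subset.empty:
        return float("nan")
    return subset["IsAccepted"].mean()


rate = acceptance_rate_for_tag(toy, "pandas")
print(f"Acceptance rate for pandas-tagged rows: {rate:.2f}")

# Average score of accepted versus non-accepted rows in the toy sample.
mean_score_by_acceptance = toy.groupby("IsAccepted")["Score"].mean()
print(mean_score_by_acceptance)
```

The same pattern extends to other candidate factors, such as answer length or how soon the answer was posted, provided the corresponding columns exist in the data set.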
Reputation calculation: refer here.