OpenML

Quora_Insincere_Questions_2018

Status: active. Format: ARFF. License: CC0: Public Domain. Visibility: public. Uploaded 23-03-2022 by Dustin Carrion.
Context
This is the preprocessed training data from the 2018 Quora Insincere Questions competition. The original training data was preprocessed to remove stop words, numbers, punctuation, and common words, and was converted to lower case. The resulting data set was lemmatised and stemmed with the scikit-learn/NLTK libraries.

Content
It contains approximately 1.3 million rows of Quora questions, with target=0 for sincere questions and target=1 for insincere questions.

Acknowledgements
Thanks to the Co-learning lounge mentors for helping me work on this problem.

Inspiration
It is very handy for building ML models in NLP.
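The cleaning steps named in the description (lower-casing, removing numbers, punctuation, and stop words) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the uploader's actual pipeline: the `preprocess` function and the tiny `STOP_WORDS` set are hypothetical, and the lemmatisation, stemming, and common-word removal mentioned in the description are left out.

```python
import re
import string

# Hypothetical, deliberately tiny stop-word list for illustration only;
# the original preprocessing presumably used a much fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "do", "does",
              "to", "of", "in", "why", "what", "how"}

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text: str) -> str:
    """Lower-case the text, then drop digits, punctuation, and stop words."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)      # remove numbers
    text = text.translate(_PUNCT_TABLE)   # remove punctuation
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


print(preprocess("Why do 3 of the planets in the Solar System have rings?"))
# prints: planets solar system have rings
```

A stemming or lemmatisation pass, as the description mentions, would be applied to the tokens after the stop-word filter.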

4 features

Feature	Type	Unique values	Missing values
Unnamed:_0	numeric	1306122	0
qid	string	1306122	0
question_text	string	1304660	1
target	numeric	2	0
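Given the four features listed above, a first pass over the data might drop the `Unnamed:_0` index column, skip the row with a missing `question_text`, and count the two `target` classes. The sketch below assumes the data has been exported to CSV with those four column names; the `load_rows` helper and the three sample rows (including the `qid` values) are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Made-up sample rows that mimic the four listed columns.
SAMPLE_CSV = """Unnamed:_0,qid,question_text,target
0,q0001,planets solar system rings,0
1,q0002,,0
2,q0003,example insincere question,1
"""


def load_rows(handle):
    """Read rows, drop the index column, and skip rows with no question text."""
    rows = []
    for row in csv.DictReader(handle):
        row.pop("Unnamed:_0", None)           # redundant row index
        if not row["question_text"].strip():  # missing question_text
            continue
        row["target"] = int(float(row["target"]))  # numeric 0/1 label
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
counts = Counter(row["target"] for row in rows)
print(len(rows), counts[0], counts[1])
```

With the real file, an open file handle would replace the in-memory `io.StringIO` sample.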