Context
I wanted a highly imbalanced dataset to share with others. It has the perfect one for us.
Imbalanced data typically refers to a classification problem where the number of observations per class is not equally distributed; often you'll have a large amount of data/observations for one class (referred to as the majority class), and much fewer observations for one or more other classes (referred to as the minority classes).
For example, In this dataset, There are way more samples of fully paid borrowers versus not fully paid borrowers.
Full LendingClub data available from their site.
Content
For companies like Lending Club correctly predicting whether or not a loan will be default is very important. This dataset contains historical data from 2007 to 2015, you can to build a deep learning model to predict the chance of default for future loans. As you will see this dataset is highly imbalanced and includes a lot of features that make this problem more challenging.