OpenML

Lending-Club-Loan-Data

Status: active | Format: ARFF | License: CC0: Public Domain | Visibility: public | Uploaded 24-03-2022 by Dustin Carrion

Context

I wanted a highly imbalanced dataset to share with others, and this one fits well. Imbalanced data typically refers to a classification problem where the observations are not equally distributed across classes: often there are many observations for one class (the majority class) and far fewer for one or more other classes (the minority classes). In this dataset, for example, there are far more samples of fully paid borrowers than of not fully paid borrowers. The full LendingClub data is available from their site.

Content

For companies like Lending Club, correctly predicting whether or not a loan will default is very important. This dataset contains historical data from 2007 to 2015, which you can use to build a deep learning model that predicts the chance of default for future loans. As you will see, the dataset is highly imbalanced and includes many features, which makes the problem more challenging.
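To make the notion of class imbalance concrete, here is a minimal Python sketch that counts labels, computes the majority-to-minority ratio, and derives "balanced" class weights using the common heuristic n_samples / (n_classes * class_count). The function name class_balance and the toy label list are assumptions made for illustration; the labels are not taken from the actual dataset, whose class counts are not stated on this page.

```python
from collections import Counter


def class_balance(labels):
    """Return per-class counts, the majority/minority ratio, and balanced class weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # Ratio of the largest class to the smallest class.
    ratio = max(counts.values()) / min(counts.values())
    # "Balanced" heuristic: rarer classes receive proportionally larger weights.
    weights = {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}
    return counts, ratio, weights


# Toy labels (hypothetical, NOT the real data): 0 = fully paid, 1 = not fully paid.
toy_labels = [0] * 8 + [1] * 2
counts, ratio, weights = class_balance(toy_labels)
print(dict(counts), ratio, weights)
```

Weights of this kind can be passed to many classifiers so that errors on the minority class cost more during training. Resampling is another option, and which works better depends on the model and the data.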

14 features

credit.policy	numeric	2 unique values 0 missing
purpose	string	7 unique values 0 missing
int.rate	numeric	249 unique values 0 missing
installment	numeric	4788 unique values 0 missing
log.annual.inc	numeric	1987 unique values 0 missing
dti	numeric	2529 unique values 0 missing
fico	numeric	44 unique values 0 missing
days.with.cr.line	numeric	2687 unique values 0 missing
revol.bal	numeric	7869 unique values 0 missing
revol.util	numeric	1035 unique values 0 missing
inq.last.6mths	numeric	28 unique values 0 missing
delinq.2yrs	numeric	11 unique values 0 missing
pub.rec	numeric	6 unique values 0 missing
not.fully.paid	numeric	2 unique values 0 missing
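
Based on the feature list above, one plausible preparation step is to treat not.fully.paid as the prediction target and to one-hot encode purpose, the only string column. The sketch below uses only the Python standard library. The helper names split_features_target and one_hot are hypothetical, and the toy rows use placeholder purpose values and made-up numbers rather than records from the dataset.

```python
def split_features_target(rows, target="not.fully.paid"):
    """Separate each row (a dict) into a feature dict and a target value."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    labels = [row[target] for row in rows]
    return features, labels


def one_hot(values):
    """One-hot encode a list of categorical values using sorted category order."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


# Toy rows (hypothetical values, NOT taken from the dataset).
toy_rows = [
    {"purpose": "purpose_b", "fico": 700, "not.fully.paid": 0},
    {"purpose": "purpose_a", "fico": 650, "not.fully.paid": 1},
    {"purpose": "purpose_b", "fico": 720, "not.fully.paid": 0},
]

features, labels = split_features_target(toy_rows)
categories, encoded = one_hot([f["purpose"] for f in features])
print(categories, encoded, labels)
```

Sorting the categories keeps the column order deterministic across runs, which matters when the same encoding has to be applied to new loans at prediction time.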