OpenML
Lending-Club-Loan-Data

Status: active | Format: ARFF | License: CC0: Public Domain | Visibility: public | Uploaded 24-03-2022 by Dustin Carrion
Context
I wanted a highly imbalanced dataset to share with others, and this one fits the purpose well. Imbalanced data typically refers to a classification problem in which observations are not equally distributed across classes: one class (the majority class) has a large number of observations, while one or more other classes (the minority classes) have far fewer. In this dataset, for example, there are many more samples of fully paid borrowers than of not fully paid borrowers. The full LendingClub data is available from their site.

Content
For companies like Lending Club, correctly predicting whether or not a loan will default is very important. This dataset contains historical data from 2007 to 2015, which you can use to build a deep learning model to predict the chance of default for future loans. As you will see, the dataset is highly imbalanced and includes a lot of features, which makes the problem more challenging.
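The imbalance described above can be quantified before any modelling. The following is a minimal sketch, not part of the original dataset page: it counts class frequencies and derives inverse-frequency class weights using the common heuristic n / (k * count). The function name `class_summary` and the toy labels are hypothetical; the real target column is `not.fully.paid`, and its actual class counts are not reported on this page.

```python
from collections import Counter


def class_summary(labels):
    """Return per-class counts, shares, and inverse-frequency weights.

    Weights follow the common heuristic n / (k * count_c), where n is the
    number of samples and k is the number of distinct classes.
    """
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    shares = {c: counts[c] / n for c in counts}
    weights = {c: n / (k * counts[c]) for c in counts}
    return counts, shares, weights


# Hypothetical toy labels (NOT the real data): 8 fully paid (0), 2 not fully paid (1).
toy_labels = [0] * 8 + [1] * 2
counts, shares, weights = class_summary(toy_labels)
print(counts, shares, weights)
```

Weights of this kind are one option for making a classifier pay more attention to the minority class; resampling is another, and which works better depends on the model.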

14 features

credit.policy (numeric): 2 unique values, 0 missing
purpose (string): 7 unique values, 0 missing
int.rate (numeric): 249 unique values, 0 missing
installment (numeric): 4788 unique values, 0 missing
log.annual.inc (numeric): 1987 unique values, 0 missing
dti (numeric): 2529 unique values, 0 missing
fico (numeric): 44 unique values, 0 missing
days.with.cr.line (numeric): 2687 unique values, 0 missing
revol.bal (numeric): 7869 unique values, 0 missing
revol.util (numeric): 1035 unique values, 0 missing
inq.last.6mths (numeric): 28 unique values, 0 missing
delinq.2yrs (numeric): 11 unique values, 0 missing
pub.rec (numeric): 6 unique values, 0 missing
not.fully.paid (numeric): 2 unique values, 0 missing
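Since `purpose` is the only string feature, most models would need it encoded numerically. The sketch below shows one possible approach, one-hot encoding, using only the standard library. The helper `one_hot` and the category values `purpose_a`, `purpose_b`, and `purpose_c` are hypothetical placeholders; the page reports only that the feature has 7 unique values, not what they are.

```python
def one_hot(values):
    """One-hot encode a list of category strings.

    Returns the sorted category names and one 0/1 row per input value.
    """
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical placeholder values, not the dataset's real categories.
sample = ["purpose_b", "purpose_a", "purpose_c", "purpose_b"]
categories, rows = one_hot(sample)
print(categories)
for row in rows:
    print(row)
```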

19 properties

Number of instances (rows) of the dataset: 9578
Number of attributes (columns) of the dataset: 14
Number of distinct values of the target attribute (if it is nominal): not reported
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 13
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 92.86
Percentage of instances belonging to the most frequent class: not reported
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: not reported
Percentage of instances belonging to the least frequent class: not reported
Number of instances belonging to the least frequent class: not reported
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Average class difference between consecutive instances: not reported
Percentage of missing values: 0
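Two of the derived properties can be checked by hand from the counts listed here. The short sketch below recomputes them; the variable names are illustrative only, and the rounding to two decimals is an assumption about how the page displays values.

```python
# Counts taken from the properties listed on this page.
n_instances = 9578
n_attributes = 14
n_numeric = 13  # the remaining attribute, purpose, is a string

# Percentage of numeric attributes: 13 of 14 columns.
pct_numeric = round(100 * n_numeric / n_attributes, 2)

# Attributes divided by instances; very small, so it displays as 0
# when rounded to two decimals (assumed display precision).
dimensionality = n_attributes / n_instances

print(pct_numeric)
print(round(dimensionality, 2))
```

The ratio of attributes to instances is roughly 0.0015, which would explain why the page shows it as 0.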

0 tasks