Give-Me-Some-Credit

active ARFF Publicly available Visibility: public Uploaded 20-06-2023 by Matthias Feurer
Improve on the state of the art in credit scoring by predicting the probability that somebody will experience financial distress in the next two years.

## Description

Banks play a crucial role in market economies. They decide who can get finance and on what terms, and can make or break investment decisions. For markets and society to function, individuals and companies need access to credit. Credit scoring algorithms, which estimate the probability of default, are the method banks use to determine whether or not a loan should be granted.

This competition requires participants to improve on the state of the art in credit scoring by predicting the probability that somebody will experience financial distress in the next two years. The goal of this competition is to build a model that borrowers can use to help make the best financial decisions. Historical data are provided on 250,000 borrowers and the prize pool is $5,000 ($3,000 for first, $1,500 for second and $500 for third).

## Features

| Variable Name | Description | Type |
|---|---|---|
| SeriousDlqin2yrs | Person experienced 90 days past due delinquency or worse | Y/N |
| RevolvingUtilizationOfUnsecuredLines | Total balance on credit cards and personal lines of credit (excluding real estate and installment debt such as car loans) divided by the sum of credit limits | percentage |
| age | Age of borrower in years | integer |
| NumberOfTime30-59DaysPastDueNotWorse | Number of times borrower has been 30-59 days past due but no worse in the last 2 years. | integer |
| DebtRatio | Monthly debt payments, alimony, and living costs divided by monthly gross income | percentage |
| MonthlyIncome | Monthly income | real |
| NumberOfOpenCreditLinesAndLoans | Number of open loans (installment, like a car loan or mortgage) and lines of credit (e.g. credit cards) | integer |
| NumberOfTimes90DaysLate | Number of times borrower has been 90 days or more past due. | integer |
| NumberRealEstateLoansOrLines | Number of mortgage and real estate loans, including home equity lines of credit | integer |
| NumberOfTime60-89DaysPastDueNotWorse | Number of times borrower has been 60-89 days past due but no worse in the last 2 years. | integer |
| NumberOfDependents | Number of dependents in family excluding themselves (spouse, children etc.) | integer |

Note: This is the training part of the data from the Kaggle competition of the same name, hosted [here](https://www.kaggle.com/competitions/GiveMeSomeCredit/overview).
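The two ratio features in the table above are defined as quotients, which can be illustrated with a small sketch. The function names and the borrower figures below are hypothetical and are not taken from the dataset; whether the stored values are fractions or percentages is also not stated on this page, so the sketch simply returns the raw quotient.

```python
def revolving_utilization(balances, credit_limits):
    """Total revolving balance divided by the sum of credit limits."""
    return sum(balances) / sum(credit_limits)


def debt_ratio(debt_payments, alimony, living_costs, monthly_gross_income):
    """Monthly debt payments, alimony and living costs over monthly gross income.

    Raises ZeroDivisionError for a zero income; real records may also have a
    missing MonthlyIncome, which this sketch does not handle.
    """
    return (debt_payments + alimony + living_costs) / monthly_gross_income


# Hypothetical borrower (illustrative numbers only).
util = revolving_utilization([1200.0, 300.0], [5000.0, 1000.0])  # 1500 / 6000
ratio = debt_ratio(900.0, 0.0, 600.0, 5000.0)                    # 1500 / 5000
print(util, ratio)
```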

11 features

| Feature | Type | Unique values | Missing values |
|---|---|---|---|
| SeriousDlqin2yrs (target) | nominal | 2 | 0 |
| RevolvingUtilizationOfUnsecuredLines | numeric | 125728 | 0 |
| age | numeric | 86 | 0 |
| NumberOfTime30-59DaysPastDueNotWorse | numeric | 16 | 0 |
| DebtRatio | numeric | 114194 | 0 |
| MonthlyIncome | numeric | 13594 | 29731 |
| NumberOfOpenCreditLinesAndLoans | numeric | 58 | 0 |
| NumberOfTimes90DaysLate | numeric | 19 | 0 |
| NumberRealEstateLoansOrLines | numeric | 28 | 0 |
| NumberOfTime60-89DaysPastDueNotWorse | numeric | 13 | 0 |
| NumberOfDependents | numeric | 13 | 3924 |

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset. | 150000 |
| Number of attributes (columns) of the dataset. | 11 |
| Number of distinct values of the target attribute (if it is nominal). | 2 |
| Number of missing values in the dataset. | 33655 |
| Number of instances with at least one value missing. | 29731 |
| Number of numeric attributes. | 10 |
| Number of nominal attributes. | 1 |
| Percentage of binary attributes. | 9.09 |
| Percentage of instances having missing values. | 19.82 |
| Average class difference between consecutive instances. | 0.88 |
| Percentage of missing values. | 2.04 |
| Number of attributes divided by the number of instances. | 0 |
| Percentage of numeric attributes. | 90.91 |
| Percentage of instances belonging to the most frequent class. | 93.32 |
| Percentage of nominal attributes. | 9.09 |
| Number of instances belonging to the most frequent class. | 139974 |
| Percentage of instances belonging to the least frequent class. | 6.68 |
| Number of instances belonging to the least frequent class. | 10026 |
| Number of binary attributes. | 1 |
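The percentage properties above can be reproduced from the raw counts listed on this page. The following is a minimal sketch of that arithmetic; the variable names are illustrative, and rounding to two decimals is an assumption about how the page displays the values.

```python
# Counts copied from the feature list and properties on this page.
n_instances = 150_000
n_attributes = 11
missing_per_feature = {"MonthlyIncome": 29_731, "NumberOfDependents": 3_924}
rows_with_missing = 29_731
majority_count = 139_974
minority_count = 10_026

# Total missing cells is the sum of the per-feature missing counts.
total_missing = sum(missing_per_feature.values())

# Percentages, rounded to two decimals (assumed display precision).
pct_missing_values = round(100 * total_missing / (n_instances * n_attributes), 2)
pct_rows_with_missing = round(100 * rows_with_missing / n_instances, 2)
pct_majority = round(100 * majority_count / n_instances, 2)
pct_minority = round(100 * minority_count / n_instances, 2)
pct_numeric = round(100 * 10 / n_attributes, 2)
pct_nominal = round(100 * 1 / n_attributes, 2)

# Attributes per instance is tiny, so it shows as 0 at two decimals.
attrs_per_instance = round(n_attributes / n_instances, 2)

print(total_missing, pct_missing_values, pct_rows_with_missing)
print(pct_majority, pct_minority, pct_numeric, pct_nominal, attrs_per_instance)
```

The class split (roughly 93% versus 7%) shows that the target is strongly imbalanced, which is worth keeping in mind when choosing an evaluation measure.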

2 tasks

0 runs - estimation_procedure: 4-fold Crossvalidation - target_feature: SeriousDlqin2yrs
0 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: SeriousDlqin2yrs
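To make the two estimation procedures concrete, here is a minimal sketch of how k-fold cross-validation partitions the 150000 rows into test folds. The helper `kfold_indices` is hypothetical, and the sketch uses plain contiguous folds; whether the tasks shuffle or stratify the rows is not stated on this page.

```python
def kfold_indices(n_rows, k):
    """Split range(n_rows) into k contiguous test folds of near-equal size."""
    base, extra = divmod(n_rows, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds take one additional row when k does not divide n_rows.
        size = base + (1 if i < extra else 0)
        folds.append(range(start, start + size))
        start += size
    return folds


n_rows = 150_000  # number of instances listed on this page
for k in (4, 10):
    folds = kfold_indices(n_rows, k)
    # Each fold serves once as the test set; the remaining rows form the training set.
    print(k, len(folds[0]), n_rows - len(folds[0]))
```

With 4 folds each test set holds 37500 rows, and with 10 folds it holds 15000 rows, so the 10-fold task trains on more data per split at the cost of more model fits.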