OpenML

default-of-credit-card-clients

Status: active. Format: ARFF. License: Public Domain (CC0). Visibility: public. Uploaded 07-01-2023 by Leo Grin.

Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "classification on both numerical and categorical features" benchmark.

Original link: https://openml.org/d/42477

Original description:

Author: Yeh, I. C., & Lien, C. H.
Source: [original](https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients#) - 2016-01-26
Please cite: Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473-2480.

Data Set Information: This research aimed at the case of customers' default payments in Taiwan and compares the predictive accuracy of probability of default among six data mining methods. From the perspective of risk management, the result of predictive accuracy of the estimated probability of default will be more valuable than the binary result of classification - credible or not credible clients. Because the real probability of default is unknown, this study presented the novel Sorting Smoothing Method to estimate the real probability of default. With the real probability of default as the response variable (Y), and the predictive probability of default as the independent variable (X), the simple linear regression result (Y = A + BX) shows that the forecasting model produced by artificial neural network has the highest coefficient of determination; its regression intercept (A) is close to zero, and regression coefficient (B) to one. Therefore, among the six data mining techniques, artificial neural network is the only one that can accurately estimate the real probability of default.

## Attribute Information:

This research employed a binary variable, default payment (Yes = 1, No = 0), as the response variable. This study reviewed the literature and used the following 23 variables as explanatory variables:

- X1: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.
- X2: Gender (1 = male; 2 = female).
- X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
- X4: Marital status (1 = married; 2 = single; 3 = others).
- X5: Age (year).
- X6-X11: History of past payment. We tracked the past monthly payment records (from April to September, 2005) as follows: X6 = the repayment status in September, 2005; X7 = the repayment status in August, 2005; . . .; X11 = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above.
- X12-X17: Amount of bill statement (NT dollar). X12 = amount of bill statement in September, 2005; X13 = amount of bill statement in August, 2005; . . .; X17 = amount of bill statement in April, 2005.
- X18-X23: Amount of previous payment (NT dollar). X18 = amount paid in September, 2005; X19 = amount paid in August, 2005; . . .; X23 = amount paid in April, 2005.

## Relevant Papers:

Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473-2480.
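The calibration check described under Data Set Information (regressing the real probability of default Y on the predicted probability X and inspecting A, B, and the coefficient of determination) can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' code: the function name `fit_line` and the sample numbers are hypothetical, and the Sorting Smoothing Method used to estimate the real probabilities is not reproduced here.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x.

    Returns (a, b, r_squared). Hypothetical helper for illustration only.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    b = sxy / sxx                    # regression coefficient B
    a = mean_y - b * mean_x          # regression intercept A
    r_squared = sxy ** 2 / (sxx * syy)  # coefficient of determination
    return a, b, r_squared


if __name__ == "__main__":
    # Made-up numbers, NOT taken from the dataset or the paper.
    predicted = [0.05, 0.10, 0.20, 0.40, 0.60]  # X: model's predicted probability
    estimated = [0.06, 0.09, 0.22, 0.38, 0.61]  # Y: estimated real probability
    a, b, r2 = fit_line(predicted, estimated)
    print(f"A = {a:.3f}, B = {b:.3f}, R^2 = {r2:.3f}")
```

Under the criterion quoted above, a well-calibrated model would give A near zero, B near one, and a high coefficient of determination.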

22 features

y (target)	nominal	2 unique values 0 missing
x1	numeric	74 unique values 0 missing
x2	nominal	2 unique values 0 missing
x5	numeric	54 unique values 0 missing
x6	numeric	11 unique values 0 missing
x7	numeric	10 unique values 0 missing
x8	numeric	11 unique values 0 missing
x9	numeric	11 unique values 0 missing
x10	numeric	10 unique values 0 missing
x11	numeric	10 unique values 0 missing
x12	numeric	10811 unique values 0 missing
x13	numeric	10662 unique values 0 missing
x14	numeric	10491 unique values 0 missing
x15	numeric	10314 unique values 0 missing
x16	numeric	10097 unique values 0 missing
x17	numeric	9888 unique values 0 missing
x18	numeric	4229 unique values 0 missing
x19	numeric	4232 unique values 0 missing
x20	numeric	4025 unique values 0 missing
x21	numeric	3730 unique values 0 missing
x22	numeric	3696 unique values 0 missing
x23	numeric	3751 unique values 0 missing
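The coded columns in the table above can be turned back into labels with the scales given in the attribute information. The sketch below is a hedged illustration in plain Python: the helper names are hypothetical, and it only covers the codes the description documents. Note that x6, x8, and x9 list 11 unique values while the documented repayment scale (-1 and 1 through 9) has only 10, so at least one code in the data is not covered by the description; the helper flags such values rather than guessing their meaning.

```python
# Documented coding for x2 (gender), from the attribute information.
GENDER_LABELS = {1: "male", 2: "female"}


def decode_gender(code):
    """Map an x2 code to its documented label (hypothetical helper)."""
    return GENDER_LABELS.get(_as_int(code), "undocumented code")


def decode_repayment_status(code):
    """Map an x6..x11 code to the documented repayment-status label.

    Codes outside the documented scale are reported as undocumented
    instead of being assigned an assumed meaning.
    """
    value = _as_int(code)
    if value == -1:
        return "pay duly"
    if value == 1:
        return "payment delay for 1 month"
    if value is not None and 2 <= value <= 8:
        return f"payment delay for {value} months"
    if value == 9:
        return "payment delay for nine months and above"
    return "undocumented code"


def _as_int(code):
    """Return code as an int if it is integer-valued, else None.

    The columns are typed numeric, so codes may arrive as floats.
    """
    try:
        as_float = float(code)
    except (TypeError, ValueError):
        return None
    if as_float != as_float or as_float in (float("inf"), float("-inf")):
        return None  # NaN or infinity
    return int(as_float) if as_float == int(as_float) else None


if __name__ == "__main__":
    for raw in (-1, 1.0, 3, 9, 0):
        print(raw, "->", decode_repayment_status(raw))
```

Returning an explicit "undocumented code" label keeps unexplained values visible during exploration, which seems safer than silently mapping them onto one of the documented categories.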