default-of-credit-card-clients

Status: active. Format: ARFF. License: Public Domain (CC0). Visibility: public. Uploaded 11-06-2020 by Felipe Farias.


Author: Yeh, I. C., & Lien, C. H.

Source: [original](https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients#) - 2016-01-26

Please cite: Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473-2480.

## Data Set Information:

This research aimed at the case of customers’ default payments in Taiwan and compares the predictive accuracy of probability of default among six data mining methods. From the perspective of risk management, the predictive accuracy of the estimated probability of default is more valuable than the binary result of classification (credible or not credible clients). Because the real probability of default is unknown, this study presented the novel Sorting Smoothing Method to estimate the real probability of default.

With the real probability of default as the response variable (Y) and the predictive probability of default as the independent variable (X), the simple linear regression result (Y = A + BX) shows that the forecasting model produced by the artificial neural network has the highest coefficient of determination; its regression intercept (A) is close to zero, and its regression coefficient (B) is close to one. Therefore, among the six data mining techniques, the artificial neural network is the only one that can accurately estimate the real probability of default.

## Attribute Information:

This research employed a binary variable, default payment (Yes = 1, No = 0), as the response variable. This study reviewed the literature and used the following 23 variables as explanatory variables:

- X1: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.
- X2: Gender (1 = male; 2 = female).
- X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
- X4: Marital status (1 = married; 2 = single; 3 = others).
- X5: Age (year).
- X6-X11: History of past payment. We tracked the past monthly payment records (from April to September, 2005) as follows: X6 = the repayment status in September, 2005; X7 = the repayment status in August, 2005; . . .; X11 = the repayment status in April, 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above.
- X12-X17: Amount of bill statement (NT dollar). X12 = amount of bill statement in September, 2005; X13 = amount of bill statement in August, 2005; . . .; X17 = amount of bill statement in April, 2005.
- X18-X23: Amount of previous payment (NT dollar). X18 = amount paid in September, 2005; X19 = amount paid in August, 2005; . . .; X23 = amount paid in April, 2005.

## Relevant Papers:

Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473-2480.
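
The evaluation described under Data Set Information regresses the estimated real probability of default (Y) on a model's predicted probability (X) and reads off the intercept A, the slope B, and the coefficient of determination. The following is a minimal sketch of that ordinary least squares fit in plain Python. The function name and the sample numbers are assumptions made for illustration; they are not values from the paper, and the Sorting Smoothing Method itself is not implemented here.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each take at least two distinct values")
    b = sxy / sxx                # regression coefficient B
    a = mean_y - b * mean_x      # regression intercept A
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / syy  # coefficient of determination
    return a, b, r_squared


# Made-up illustrative values (assumption), not results from the paper.
predicted = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80]       # X: model output
estimated_real = [0.06, 0.09, 0.22, 0.38, 0.61, 0.79]  # Y: estimated real probability
a, b, r2 = fit_simple_linear_regression(predicted, estimated_real)
print(f"A = {a:.3f}, B = {b:.3f}, R^2 = {r2:.3f}")
```

Under this reading, a well-calibrated model is one whose fit gives A near zero, B near one, and a coefficient of determination near one.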

24 features

| Feature | Role | Type | Unique values | Missing |
|---|---|---|---|---|
| y | target | nominal | 2 | 0 |
| id | row identifier | numeric | 30000 | 0 |
| x1 | | numeric | 81 | 0 |
| x2 | | numeric | 2 | 0 |
| x3 | | numeric | 7 | 0 |
| x4 | | numeric | 4 | 0 |
| x5 | | numeric | 56 | 0 |
| x6 | | numeric | 11 | 0 |
| x7 | | numeric | 11 | 0 |
| x8 | | numeric | 11 | 0 |
| x9 | | numeric | 11 | 0 |
| x10 | | numeric | 10 | 0 |
| x11 | | numeric | 10 | 0 |
| x12 | | numeric | 22723 | 0 |
| x13 | | numeric | 22346 | 0 |
| x14 | | numeric | 22026 | 0 |
| x15 | | numeric | 21548 | 0 |
| x16 | | numeric | 21010 | 0 |
| x17 | | numeric | 20604 | 0 |
| x18 | | numeric | 7943 | 0 |
| x19 | | numeric | 7899 | 0 |
| x20 | | numeric | 7518 | 0 |
| x21 | | numeric | 6937 | 0 |
| x22 | | numeric | 6897 | 0 |
| x23 | | numeric | 6939 | 0 |
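
The coded columns can be translated back to the labels given under Attribute Information. The sketch below does this for x2, x3, x4 and x6 in plain Python. All function and key names are hypothetical. Because x3 and x4 have more unique values in the data (7 and 4) than the description documents (4 and 3), any code without a documented meaning is labelled as undocumented rather than guessed.

```python
# Code tables copied from the Attribute Information section.
GENDER = {1: "male", 2: "female"}
EDUCATION = {1: "graduate school", 2: "university", 3: "high school", 4: "others"}
MARITAL_STATUS = {1: "married", 2: "single", 3: "others"}


def decode(code, mapping):
    """Return the documented label, or flag the code as undocumented."""
    return mapping.get(code, f"undocumented code {code}")


def describe_repayment_status(code):
    """Translate a repayment status code (x6 to x11) into words."""
    if code == -1:
        return "pay duly"
    if 1 <= code <= 8:
        unit = "month" if code == 1 else "months"
        return f"payment delay for {code} {unit}"
    if code == 9:
        return "payment delay for nine months and above"
    return f"undocumented code {code}"


def decode_record(row):
    """Decode the categorical fields of one row given as a dict (hypothetical layout)."""
    return {
        "gender": decode(row["x2"], GENDER),
        "education": decode(row["x3"], EDUCATION),
        "marital_status": decode(row["x4"], MARITAL_STATUS),
        "repayment_september": describe_repayment_status(row["x6"]),
    }


# Invented example row (assumption), not taken from the dataset.
print(decode_record({"x2": 2, "x3": 5, "x4": 1, "x6": 2}))
```

Flagging unknown codes instead of folding them into "others" keeps the mismatch between description and data visible to whoever consumes the decoded rows.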

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset. | 30000 |
| Number of attributes (columns) of the dataset. | 24 |
| Number of distinct values of the target attribute (if it is nominal). | 2 |
| Number of missing values in the dataset. | 0 |
| Number of instances with at least one value missing. | 0 |
| Number of numeric attributes. | 23 |
| Number of nominal attributes. | 1 |
| Number of attributes divided by the number of instances. | 0 |
| Percentage of numeric attributes. | 95.83 |
| Percentage of instances belonging to the most frequent class. | 77.88 |
| Percentage of nominal attributes. | 4.17 |
| Number of instances belonging to the most frequent class. | 23364 |
| Percentage of instances belonging to the least frequent class. | 22.12 |
| Number of instances belonging to the least frequent class. | 6636 |
| Number of binary attributes. | 1 |
| Percentage of binary attributes. | 4.17 |
| Percentage of instances having missing values. | 0 |
| Average class difference between consecutive instances. | 0.66 |
| Percentage of missing values. | 0 |
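
Several of the percentage properties follow directly from the listed counts. As a quick consistency check, the sketch below recomputes them in plain Python; the variable and function names are illustrative, and rounding to two decimals is an assumption about how the page displays its values.

```python
# Counts as listed in the properties.
n_instances = 30000
n_attributes = 24
n_numeric = 23
n_nominal = 1
majority_class = 23364
minority_class = 6636


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


derived = {
    "Percentage of numeric attributes": pct(n_numeric, n_attributes),
    "Percentage of nominal attributes": pct(n_nominal, n_attributes),
    "Percentage of most frequent class": pct(majority_class, n_instances),
    "Percentage of least frequent class": pct(minority_class, n_instances),
    "Attributes divided by instances": round(n_attributes / n_instances, 2),
}

# The two class counts should account for every row.
assert majority_class + minority_class == n_instances

for name, value in derived.items():
    print(f"{name}: {value}")
```

The attribute-to-instance ratio is 24 / 30000 = 0.0008, which is why it shows as 0 at two decimals.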

8 tasks

0 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: y
0 runs - estimation_procedure: 4-fold Crossvalidation - target_feature: y
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
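
To illustrate what the 10-fold cross-validation task on target `y` involves, here is a minimal sketch of a k-fold splitter in plain Python. It assumes stratified folds, which this page does not state, and it uses a synthetic label vector built only from the class counts listed in the properties (23364 and 6636), not the real row order. The function name is hypothetical and this is not the platform's own split procedure.

```python
import random


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k stratified folds."""
    rng = random.Random(seed)
    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    # Deal each class's shuffled indices round-robin into the k folds,
    # so every fold keeps roughly the overall class ratio.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    # Each fold serves once as the test set; the rest form the training set.
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# Synthetic labels from the listed class counts (assumption: order is arbitrary).
labels = [0] * 23364 + [1] * 6636
splits = list(stratified_k_fold(labels, k=10))

for fold_number, (train, test) in enumerate(splits, start=1):
    positives = sum(labels[i] for i in test)
    print(f"fold {fold_number}: train={len(train)} test={len(test)} defaults in test={positives}")
```

Stratifying matters here because only about 22% of the rows belong to the minority class; purely random folds could drift away from that ratio.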