# Loan Approval Classification Dataset
## Data Source
This dataset is a synthetic version inspired by the original Credit Risk dataset on Kaggle and enriched with additional variables based on historical loan approval data. SMOTENC (Synthetic Minority Over-sampling Technique for Nominal and Continuous features) was used to generate additional synthetic records and enlarge the dataset. The dataset contains both categorical and continuous features.
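To illustrate the idea behind SMOTENC, here is a minimal standard-library sketch of how one synthetic record can be produced from a group of existing records. It is not the procedure used to build this dataset: the source does not state which library, parameters, or classes were involved. The function name `smote_nc_sketch`, the parameters `k` and `seed`, and the sample values are all hypothetical.

```python
import random
import statistics
from collections import Counter


def smote_nc_sketch(cont, cat, n_new, k=3, seed=0):
    """Create n_new synthetic rows from rows of a single class.

    cont: list of rows, each a list of continuous values
    cat:  list of rows, each a list of categorical values
    Returns a list of (continuous_values, categorical_values) tuples.
    """
    if len(cont) < 2:
        raise ValueError("need at least two rows to interpolate")
    rng = random.Random(seed)
    n_cont = len(cont[0])
    # Penalty for a categorical mismatch: the median of the standard
    # deviations of the continuous columns.
    med = statistics.median(
        [statistics.pstdev([row[j] for row in cont]) for j in range(n_cont)]
    )

    def dist2(i, j):
        # Squared Euclidean distance on continuous columns, plus med**2
        # for every categorical column whose values differ.
        d = sum((a - b) ** 2 for a, b in zip(cont[i], cont[j]))
        return d + med ** 2 * sum(a != b for a, b in zip(cat[i], cat[j]))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(cont))
        others = [j for j in range(len(cont)) if j != i]
        neighbours = sorted(others, key=lambda j: dist2(i, j))[:k]
        j = rng.choice(neighbours)
        u = rng.random()
        # Continuous values: a random point on the segment between rows i and j.
        new_cont = [a + u * (b - a) for a, b in zip(cont[i], cont[j])]
        # Categorical values: the most frequent value among the neighbours.
        new_cat = [
            Counter(cat[n][c] for n in neighbours).most_common(1)[0][0]
            for c in range(len(cat[i]))
        ]
        synthetic.append((new_cont, new_cat))
    return synthetic


# Made-up rows: [person_age, person_income] and [person_home_ownership].
cont = [[25, 40000.0], [31, 52000.0], [45, 90000.0], [38, 61000.0]]
cat = [["RENT"], ["RENT"], ["OWN"], ["MORTGAGE"]]
rows = smote_nc_sketch(cont, cat, n_new=5)
```

In this sketch the continuous columns are used on their raw scale, so columns with large magnitudes (such as income) dominate the distance. A production implementation, for example the `SMOTENC` class in imbalanced-learn, is the better choice for real resampling.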
## Metadata
The dataset contains 45,000 records and 14 variables (13 features and the target), each described below:
### Personal Information Features:
1. person_age: Age of the person
2. person_gender: Gender of the person
3. person_education: Highest education level
4. person_income: Annual income
5. person_emp_exp: Years of employment experience
6. person_home_ownership: Home ownership status (e.g., rent, own, mortgage)
### Loan Information Features:
7. loan_amnt: Loan amount requested
8. loan_intent: Purpose of the loan
9. loan_int_rate: Loan interest rate
10. loan_percent_income: Loan amount as a percentage of annual income
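As a sketch of how `loan_percent_income` relates to the other columns, the helper below derives it from `loan_amnt` and `person_income`. The description above does not say whether the stored value is a true percentage (for example 20) or a fraction (for example 0.2), so the `as_percent` flag is an assumption added to cover both cases; check a few rows of the data before relying on either.

```python
def loan_percent_income(loan_amnt, person_income, as_percent=True):
    """Loan amount relative to annual income.

    as_percent=True returns a percentage (20.0 means 20%);
    as_percent=False returns a fraction (0.2 means 20%).
    Which form the dataset stores is an assumption to verify.
    """
    if person_income <= 0:
        raise ValueError("person_income must be positive")
    if as_percent:
        return 100 * loan_amnt / person_income
    return loan_amnt / person_income


pct = loan_percent_income(10000, 50000)                     # 20.0
frac = loan_percent_income(10000, 50000, as_percent=False)  # 0.2
```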
### Credit Information Features:
11. cb_person_cred_hist_length: Length of credit history in years
12. credit_score: Credit score of the person
13. previous_loan_defaults_on_file: Indicator of previous loan defaults
14. loan_status: Loan approval status: 1 = approved, 0 = rejected (target variable)
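The variable list above can be captured as a small schema for sanity-checking a file header before analysis. This is a minimal sketch: the column names come from the list, while the names `FEATURE_GROUPS`, `TARGET`, `ALL_COLUMNS`, and `check_header` are hypothetical helpers introduced here.

```python
# Column names grouped as in the metadata list.
FEATURE_GROUPS = {
    "personal": [
        "person_age",
        "person_gender",
        "person_education",
        "person_income",
        "person_emp_exp",
        "person_home_ownership",
    ],
    "loan": [
        "loan_amnt",
        "loan_intent",
        "loan_int_rate",
        "loan_percent_income",
    ],
    "credit": [
        "cb_person_cred_hist_length",
        "credit_score",
        "previous_loan_defaults_on_file",
    ],
}
TARGET = "loan_status"
ALL_COLUMNS = [col for group in FEATURE_GROUPS.values() for col in group] + [TARGET]


def check_header(header):
    """Return (missing, unexpected) column names relative to the schema."""
    missing = [col for col in ALL_COLUMNS if col not in header]
    unexpected = [col for col in header if col not in ALL_COLUMNS]
    return missing, unexpected


# Example: a header that lacks the target column.
missing, unexpected = check_header(ALL_COLUMNS[:-1])
```

Here `missing` is `["loan_status"]` and `unexpected` is empty, which flags a file that cannot be used for supervised training as-is.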
## Target Variable
The target variable 'loan_status' is binary:
- 1 = approved
- 0 = rejected
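Because the target is binary, it is worth checking the share of each class before modelling. The sketch below counts the labels from rows read with the standard `csv` module. It assumes `loan_status` is stored as the integer text `0` or `1`; the helper name `class_balance` and the in-memory sample are hypothetical and carry no information about the real class distribution.

```python
import csv
import io
from collections import Counter


def class_balance(rows, target="loan_status"):
    """Return the share of rejected (0) and approved (1) rows."""
    counts = Counter(int(row[target]) for row in rows)
    unexpected = set(counts) - {0, 1}
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no rows to count")
    return {label: counts[label] / total for label in (0, 1)}


# Made-up sample standing in for the real file.
sample = io.StringIO("loan_status\n1\n0\n0\n0\n")
shares = class_balance(csv.DictReader(sample))  # {0: 0.75, 1: 0.25}
```

For the real data, pass a `csv.DictReader` opened on the dataset file instead of the in-memory sample.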