SBA-Loans-Case-Data-Set

Status: active. Format: ARFF. License: CC0: Public Domain. Visibility: public. Uploaded 23-03-2022 by Onur Yildirim.
  • Computer Systems Machine Learning
Should This Loan be Approved or Denied?

If you like the data set and download it, an upvote would be appreciated.

The Small Business Administration (SBA) was founded in 1953 to assist small businesses in obtaining loans. Small businesses have been the primary source of employment in the United States. Helping small businesses supports job creation, which reduces unemployment. Small business growth also promotes economic growth.

One of the ways the SBA helps small businesses is by guaranteeing bank loans. This guarantee reduces the risk to banks and encourages them to lend to small businesses. If the loan defaults, the SBA covers the amount guaranteed, and the bank suffers a loss for the remaining balance.

There have been several small business success stories, such as FedEx and Apple. However, the rate of default is very high. Many economists believe the banking market works better without the assistance of the SBA. Supporters claim that the social benefits and job creation outweigh any financial costs to the government in defaulted loans.

The Data Set

The original data set is from the U.S. SBA loan database, which includes historical data from 1987 through 2014 (899,164 observations) with 27 variables. The data set includes information on whether the loan was paid off in full or whether the SBA had to charge off any amount, and how much that amount was.

The data set used here is a subset of the original set. It contains loans in the Real Estate and Rental and Leasing industry in California. This file has 2,102 observations and 35 variables. The column Default is an integer, 1 or 0, and I had to change this column to a factor.

For more information on this data set, go to https://amstat.tandfonline.com/doi/full/10.1080/10691898.2018.1434342
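The guarantee mechanism described above can be illustrated with a little arithmetic. The sketch below is a deliberately simplified model, not the SBA's actual settlement procedure: it assumes the SBA covers a fixed fraction of the charged-off amount and the bank absorbs the rest. The function name `split_loss` and the example figures are hypothetical.

```python
from typing import Tuple


def split_loss(charged_off: float, guaranteed_portion: float) -> Tuple[float, float]:
    """Split a charged-off amount into (sba_share, bank_share).

    Simplifying assumption: the SBA covers `guaranteed_portion` (a fraction
    between 0 and 1) of the charged-off amount, and the bank bears the rest.
    """
    if not 0.0 <= guaranteed_portion <= 1.0:
        raise ValueError("guaranteed_portion must be between 0 and 1")
    sba_share = charged_off * guaranteed_portion
    bank_share = charged_off - sba_share
    return sba_share, bank_share


# Made-up example: a 100,000 charge-off with a 75% guarantee.
sba_share, bank_share = split_loss(100_000, 0.75)
print(sba_share, bank_share)
```

Under these assumptions, a higher guaranteed portion shifts more of each defaulted loan's loss from the bank to the SBA.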

35 features

Selected: numeric, 2 unique values, 0 missing
LoanNr_ChkDgt: numeric, 2102 unique values, 0 missing
Name: string, 2005 unique values, 0 missing
City: string, 519 unique values, 0 missing
State: string, 1 unique value, 0 missing
Zip: numeric, 814 unique values, 0 missing
Bank: string, 154 unique values, 3 missing
BankState: string, 21 unique values, 3 missing
NAICS: numeric, 24 unique values, 0 missing
ApprovalDate: numeric, 1367 unique values, 0 missing
ApprovalFY: numeric, 24 unique values, 0 missing
Term: numeric, 170 unique values, 0 missing
NoEmp: numeric, 83 unique values, 0 missing
NewExist: numeric, 3 unique values, 1 missing
CreateJob: numeric, 43 unique values, 0 missing
RetainedJob: numeric, 62 unique values, 0 missing
FranchiseCode: numeric, 33 unique values, 0 missing
UrbanRural: numeric, 3 unique values, 0 missing
RevLineCr: string, 4 unique values, 2 missing
LowDoc: string, 5 unique values, 3 missing
ChgOffDate: numeric, 503 unique values, 1405 missing
DisbursementDate: numeric, 321 unique values, 3 missing
DisbursementGross: numeric, 1182 unique values, 0 missing
BalanceGross: numeric, 1 unique value, 0 missing
MIS_Status: string, 2 unique values, 0 missing
ChgOffPrinGr: numeric, 615 unique values, 0 missing
GrAppv: numeric, 659 unique values, 0 missing
SBA_Appv: numeric, 755 unique values, 0 missing
New: numeric, 2 unique values, 0 missing
RealEstate: numeric, 2 unique values, 0 missing
Portion: numeric, 31 unique values, 0 missing
Recession: numeric, 2 unique values, 0 missing
daysterm: numeric, 170 unique values, 0 missing
xx: numeric, 1212 unique values, 3 missing
Default: numeric, 2 unique values, 0 missing
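Default is listed as a numeric feature with two unique values, and the description mentions converting it to a factor. In Python, the closest counterpart to a factor is a pandas categorical. The sketch below is only an illustration: it assumes pandas is installed, and the three-row frame is a made-up stand-in for the real file (only the column names LoanNr_ChkDgt and Default come from the feature list).

```python
import pandas as pd

# Toy stand-in for the data set; the values are invented for illustration.
df = pd.DataFrame({
    "LoanNr_ChkDgt": [1, 2, 3],
    "Default": [0, 1, 0],
})

# Convert the 0/1 integer column to a categorical (factor-like) column.
df["Default"] = df["Default"].astype("category")

print(df["Default"].dtype)
print(list(df["Default"].cat.categories))
```

Treating the column as categorical signals to downstream tools that 0 and 1 are class labels rather than quantities.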

19 properties

Number of instances (rows) of the dataset: 2102
Number of attributes (columns) of the dataset: 35
Number of distinct values of the target attribute (if it is nominal): (no value listed)
Number of missing values in the dataset: 1423
Number of instances with at least one value missing: 1407
Number of numeric attributes: 27
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0.02
Percentage of numeric attributes: 77.14
Percentage of instances belonging to the most frequent class: (no value listed)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 66.94
Average class difference between consecutive instances: (no value listed)
Percentage of missing values: 1.93
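The derived properties above follow from the raw counts on this page. As a quick consistency check, the sketch below recomputes them from the listed row, column, and missing-value counts. The variable names are illustrative, and rounding to two decimals is an assumption about how the page displays these figures.

```python
# Raw counts as listed on this page.
n_rows = 2102
n_cols = 35
n_numeric = 27
rows_with_missing = 1407

# Per-feature missing counts from the feature list (features with 0 omitted).
missing_per_feature = {
    "Bank": 3,
    "BankState": 3,
    "NewExist": 1,
    "RevLineCr": 2,
    "LowDoc": 3,
    "ChgOffDate": 1405,
    "DisbursementDate": 3,
    "xx": 3,
}

total_missing = sum(missing_per_feature.values())

# Derived properties, rounded to two decimals.
dimensionality = round(n_cols / n_rows, 2)
pct_numeric = round(100 * n_numeric / n_cols, 2)
pct_rows_with_missing = round(100 * rows_with_missing / n_rows, 2)
pct_missing_values = round(100 * total_missing / (n_rows * n_cols), 2)

print(total_missing, dimensionality, pct_numeric,
      pct_rows_with_missing, pct_missing_values)
```

Nearly all of the missing values come from ChgOffDate, which is consistent with a charge-off date existing only for loans that were charged off.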

0 tasks
