German-Credit-Risk

Status: active
Format: ARFF
License: Database: Open Database, Contents: Database Contents
Visibility: public
Uploaded: 24-03-2022 by Dustin Carrion
Context

The original dataset contains 1000 entries with 20 categorical/symbolic attributes prepared by Prof. Hofmann. In this dataset, each entry represents a person who takes credit from a bank. Each person is classified as a good or bad credit risk according to the set of attributes. The link to the original dataset can be found below.

Content

The original dataset is almost impossible to understand because of its complicated system of categories and symbols. Thus, I wrote a small Python script to convert it into a readable CSV file. Several columns are simply ignored, because in my opinion either they are not important or their descriptions are obscure. The selected attributes are:

Age (numeric)
Sex (text: male, female)
Job (numeric: 0 - unskilled and non-resident, 1 - unskilled and resident, 2 - skilled, 3 - highly skilled)
Housing (text: own, rent, or free)
Saving accounts (text: little, moderate, quite rich, rich)
Checking account (numeric, in DM - Deutsche Mark)
Credit amount (numeric, in DM)
Duration (numeric, in months)
Purpose (text: car, furniture/equipment, radio/TV, domestic appliances, repairs, education, business, vacation/others)

Acknowledgements

Source: UCI
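The numeric Job codes and the text categories in the attribute list above can be captured in a small lookup. The following is a minimal sketch, not the uploader's conversion script: the names `JOB_LABELS`, `HOUSING_VALUES`, `SEX_VALUES`, and `describe_person` are hypothetical, and the example record is made up for illustration rather than taken from the dataset.

```python
# Mappings copied from the attribute list in the description.
# All names here are hypothetical helpers, not part of any published script.
JOB_LABELS = {
    0: "unskilled and non-resident",
    1: "unskilled and resident",
    2: "skilled",
    3: "highly skilled",
}
HOUSING_VALUES = {"own", "rent", "free"}
SEX_VALUES = {"male", "female"}


def describe_person(record):
    """Return a readable one-line summary of a record (dict keyed by attribute name)."""
    job = int(record["Job"])
    if job not in JOB_LABELS:
        raise ValueError(f"unknown Job code: {job}")
    if record["Housing"] not in HOUSING_VALUES:
        raise ValueError(f"unknown Housing value: {record['Housing']}")
    if record["Sex"] not in SEX_VALUES:
        raise ValueError(f"unknown Sex value: {record['Sex']}")
    return (
        f"{record['Age']}-year-old {record['Sex']}, "
        f"{JOB_LABELS[job]}, housing: {record['Housing']}"
    )


# Made-up record for illustration only; not a row from the dataset.
example = {"Age": 30, "Sex": "female", "Job": 2, "Housing": "own"}
print(describe_person(example))
```

Rejecting unknown codes with an error, rather than passing them through, makes it obvious when a file uses values outside the documented categories.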

10 features

Feature            Type     Unique values  Missing
Unnamed:_0         numeric  1000           0
Age                numeric  53             0
Sex                string   2              0
Job                numeric  4              0
Housing            string   3              0
Saving_accounts    string   4              183
Checking_account   string   3              394
Credit_amount      numeric  921            0
Duration           numeric  33             0
Purpose            string   8              0
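The per-feature figures above can be cross-checked with a few lines of Python. This is a sketch that only re-encodes the numbers shown on this page; the names `FEATURES`, `total_missing`, `numeric_count`, and `incomplete` are hypothetical, and no dataset file is read.

```python
# Per-feature summary copied from the feature list on this page:
# (name, type, unique values, missing values)
FEATURES = [
    ("Unnamed:_0", "numeric", 1000, 0),
    ("Age", "numeric", 53, 0),
    ("Sex", "string", 2, 0),
    ("Job", "numeric", 4, 0),
    ("Housing", "string", 3, 0),
    ("Saving_accounts", "string", 4, 183),
    ("Checking_account", "string", 3, 394),
    ("Credit_amount", "numeric", 921, 0),
    ("Duration", "numeric", 33, 0),
    ("Purpose", "string", 8, 0),
]

# Total number of missing cells across all features.
total_missing = sum(missing for _, _, _, missing in FEATURES)

# Number of features typed as numeric.
numeric_count = sum(1 for _, kind, _, _ in FEATURES if kind == "numeric")

# Names of the features that have at least one missing value.
incomplete = [name for name, _, _, missing in FEATURES if missing > 0]

print(total_missing)  # 577
print(numeric_count)  # 5
print(incomplete)     # ['Saving_accounts', 'Checking_account']
```

Only the two account columns have missing values. With 1000 unique values, `Unnamed:_0` looks like a leftover row index from the CSV export, although the page does not say so.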

19 properties

Number of instances (rows) of the dataset: 1000
Number of attributes (columns) of the dataset: 10
Number of distinct values of the target attribute (if it is nominal): (no value listed)
Number of missing values in the dataset: 577
Number of instances with at least one value missing: 478
Number of numeric attributes: 5
Number of nominal attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 47.8
Average class difference between consecutive instances: (no value listed)
Percentage of missing values: 5.77
Number of attributes divided by the number of instances: 0.01
Percentage of numeric attributes: 50
Percentage of instances belonging to the most frequent class: (no value listed)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
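The percentage and ratio properties follow from the raw counts by simple arithmetic. The sketch below recomputes them from the counts reported on this page; the variable names are hypothetical, and the last step assumes that only Saving_accounts (183 missing) and Checking_account (394 missing) contain missing values, as the feature list indicates.

```python
# Raw counts as reported on this page.
n_rows = 1000             # instances
n_cols = 10               # attributes
missing_cells = 577       # missing values in the dataset
rows_with_missing = 478   # instances with at least one missing value
numeric_cols = 5          # numeric attributes

# Derived properties.
pct_missing_values = 100 * missing_cells / (n_rows * n_cols)
pct_rows_missing = 100 * rows_with_missing / n_rows
attrs_per_instance = n_cols / n_rows
pct_numeric = 100 * numeric_cols / n_cols

# Inclusion-exclusion over the two incomplete columns:
# rows missing either = missing in A + missing in B - rows missing both.
rows_missing_both = 183 + 394 - rows_with_missing

print(round(pct_missing_values, 2))  # 5.77
print(round(pct_rows_missing, 1))    # 47.8
print(attrs_per_instance)            # 0.01
print(pct_numeric)                   # 50.0
print(rows_missing_both)             # 99
```

The percentage of missing values is taken over all cells (rows times columns), not over rows, which is why it is far lower than the percentage of instances with missing values.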

0 tasks
