OpenML

German-Credit-Risk

active ARFF Database: Open Database, Contents: Database Contents Visibility: public Uploaded 24-03-2022 by Dustin Carrion

Context

The original dataset contains 1000 entries with 20 categorical/symbolic attributes prepared by Prof. Hofmann. In this dataset, each entry represents a person who takes out credit from a bank. Each person is classified as a good or bad credit risk according to the set of attributes. The link to the original dataset can be found below.

Content

It is almost impossible to understand the original dataset due to its complicated system of categories and symbols. Thus, I wrote a small Python script to convert it into a readable CSV file. Several columns are simply ignored, because in my opinion they are either not important or their descriptions are obscure. The selected attributes are:

- Age (numeric)
- Sex (text: male, female)
- Job (numeric: 0 - unskilled and non-resident, 1 - unskilled and resident, 2 - skilled, 3 - highly skilled)
- Housing (text: own, rent, or free)
- Saving accounts (text: little, moderate, quite rich, rich)
- Checking account (numeric, in DM - Deutsche Mark)
- Credit amount (numeric, in DM)
- Duration (numeric, in months)
- Purpose (text: car, furniture/equipment, radio/TV, domestic appliances, repairs, education, business, vacation/others)

Acknowledgements

Source: UCI
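
The Job attribute in the description above is stored as an integer code, not as a label. A minimal sketch of decoding it, assuming only the four codes listed in the description; `JOB_LABELS` and `decode_job` are hypothetical names chosen for illustration and are not part of the dataset or of any OpenML API.

```python
# Hypothetical helper: map the Job codes listed in the description to labels.
JOB_LABELS = {
    0: "unskilled and non-resident",
    1: "unskilled and resident",
    2: "skilled",
    3: "highly skilled",
}


def decode_job(code):
    """Return the label for a Job code; raise ValueError for unknown codes."""
    try:
        return JOB_LABELS[int(code)]
    except KeyError:
        raise ValueError(f"unknown Job code: {code!r}") from None


print(decode_job(2))    # skilled
print(decode_job("0"))  # unskilled and non-resident
```

Accepting both integers and numeric strings is a convenience for values read straight from a CSV file, where every cell arrives as text.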

10 features

Unnamed:_0	numeric	1000 unique values 0 missing
Age	numeric	53 unique values 0 missing
Sex	string	2 unique values 0 missing
Job	numeric	4 unique values 0 missing
Housing	string	3 unique values 0 missing
Saving_accounts	string	4 unique values 183 missing
Checking_account	string	3 unique values 394 missing
Credit_amount	numeric	921 unique values 0 missing
Duration	numeric	33 unique values 0 missing
Purpose	string	8 unique values 0 missing
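
The feature list reports missing values in Saving_accounts and Checking_account. Below is a minimal sketch of how such per-column counts could be reproduced from a CSV export using only the Python standard library. The sample rows are made up for illustration and are not records from the dataset, and the assumption that missing cells appear as an empty string or `NA` may not match the actual file. `count_missing` is a hypothetical helper name.

```python
import csv
import io

# Made-up sample rows that follow the column layout above; not real records.
SAMPLE_CSV = """Unnamed:_0,Age,Sex,Job,Housing,Saving_accounts,Checking_account,Credit_amount,Duration,Purpose
0,40,male,2,own,NA,little,1500,12,car
1,25,female,1,rent,little,NA,3000,24,education
2,52,male,3,free,NA,NA,800,6,repairs
"""

# Assumption: missing cells appear as an empty string or "NA".
MISSING_MARKERS = {"", "NA"}


def count_missing(csv_text):
    """Return {column name: number of missing cells} for CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row.get(name)
            if value is None or value.strip() in MISSING_MARKERS:
                counts[name] += 1
    return counts


missing = count_missing(SAMPLE_CSV)
# Show only the columns that have at least one missing cell.
print({name: n for name, n in missing.items() if n})
```

On the real file, the same function would be expected to report nonzero counts only for the two columns flagged in the feature list, provided the missing-value markers match the assumption above.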