Municipal-Debt-Risk-Analysis

Status: active | Format: ARFF | License: CC0: Public Domain | Visibility: public | Uploaded 24-03-2022 by Elif Ceren Gok

Context
This data has been extracted from the billing systems of 8 municipalities in South Africa over a 2-year period. It is summarised according to each account's total amount billed versus total amount paid. Each account carries an indicator of whether it resulted in a bad debt. This is a classification exercise. The aim is to find out whether it is feasible to estimate the probability of an account becoming a bad debt, so that the number (and value) of accounts at risk of developing into a bad debt can be forecast.

Content
AccCategoryID (Account Category ID): The numeric link in the database to the account category.
AccCategory (Account Category): A classification of the type of account.
AccCategoryAbbr (Account Category Abbreviation): An abbreviation of the account category, to be used for one-hot encoding.
PropertyValue (Property Value): The market value of the property.
PropertySize (Property Size): The size of the property in square metres.
TotalBilling (Total Billing): The total amount billed to the account for all services.
AverageBilling (Average Billing): The average amount billed to the account for all services.
TotalReceipting (Total Receipting): The total amount receipted to the account for all services.
AverageReceipting (Average Receipting): The average amount receipted to the account for all services.
TotalDebt (Total Debt): The total debt aged 90 days or more.
TotalWriteOff (Total Write Off): The total amount of debt that has been written off.
CollectionRatio (Collection Ratio): Total Receipting / Total Billing.
DebtBillingRatio (Debt Billing Ratio): (Total Debt + Total Write Off) / Total Billing. The numerator includes written-off debt as well as the 90-day debt.
TotalElectricityBill (Total Electricity Bill): The total amount billed for electricity. This field is included because electricity is used as a means to recover debt. If an amount is outstanding for any service, the municipality has the right to cut the consumer's electricity connection.
HasIDNo (Has ID No.): Whether the consumer has an ID number. This is similar to a Social Security number in the US and can be useful in legal proceedings. Collecting debt from a consumer without ID number details is much harder. The field also indicates that the account is held by a person rather than a business. However, it is not very reliable, because it is often captured incorrectly or not at all.
BadDebtIndic (Bad Debt Indicator): 1 = considered to be a bad debt, 0 = not considered to be a bad debt.

Inspiration
I welcome any feedback on the dataset and on my methodology for classifying and modelling it. The kernel I have run against this dataset is my first, and I am now working on a second attempt with different parameters. Any advice or criticism will be much appreciated.
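The two ratio definitions and the forecasting aim described above can be sketched in a few lines. This is a minimal illustration under stated assumptions. It is not the author's kernel. Returning `None` when nothing was billed is an assumption, because the description does not say how zero-billing accounts are handled. The forecast takes per-account probabilities from some hypothetical classifier. It uses the expectation identities: the expected number of bad debts is the sum of the probabilities, and the expected value is the probability-weighted sum of each account's amount at risk. Using the 90-day debt (presumably the `total90debt` column) as that amount is likewise a hypothetical choice.

```python
from typing import Optional, Sequence, Tuple


def collection_ratio(total_billing: float, total_receipting: float) -> Optional[float]:
    """Total Receipting / Total Billing, as defined in the description.

    Returns None when nothing was billed (assumed handling; not from the source).
    """
    if total_billing == 0:
        return None
    return total_receipting / total_billing


def debt_billing_ratio(
    total_billing: float, total_debt: float, total_write_off: float
) -> Optional[float]:
    """(Total Debt + Total Write Off) / Total Billing, as defined in the description."""
    if total_billing == 0:
        return None
    return (total_debt + total_write_off) / total_billing


def forecast_bad_debt(
    probabilities: Sequence[float], amounts_at_risk: Sequence[float]
) -> Tuple[float, float]:
    """Expected number and expected value of accounts that become bad debt.

    probabilities[i] is a (hypothetical) model's estimated P(bad debt) for
    account i; amounts_at_risk[i] is the amount that would be lost (assumed
    here to be the account's 90-day debt). By linearity of expectation the
    expected count is sum(p_i) and the expected value is sum(p_i * v_i).
    """
    if len(probabilities) != len(amounts_at_risk):
        raise ValueError("probabilities and amounts_at_risk must have the same length")
    expected_count = sum(probabilities)
    expected_value = sum(p * v for p, v in zip(probabilities, amounts_at_risk))
    return expected_count, expected_value


# Illustrative numbers only, not taken from the dataset.
print(collection_ratio(1000.0, 750.0))          # 0.75
print(debt_billing_ratio(1000.0, 200.0, 50.0))  # 0.25
print(forecast_bad_debt([0.1, 0.5, 0.9], [100.0, 200.0, 300.0]))
```

Summing probabilities gives an unbiased count forecast only if the model's probabilities are well calibrated. A classifier trained on this data would need that checked before its outputs are used this way.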

16 features

accountcategoryid (numeric): 12 unique values, 0 missing
accountcategory (string): 12 unique values, 0 missing
acccatabbr (string): 12 unique values, 0 missing
propertyvalue (numeric): 11951 unique values, 0 missing
propertysize (numeric): 17424 unique values, 0 missing
totalbilling (numeric): 33209 unique values, 0 missing
avgbilling (numeric): 6467 unique values, 0 missing
totalreceipting (numeric): 27765 unique values, 0 missing
avgreceipting (numeric): 10536 unique values, 0 missing
total90debt (numeric): 25001 unique values, 0 missing
totalwriteoff (numeric): 6129 unique values, 0 missing
collectionratio (numeric): 1967 unique values, 0 missing
debtbillingratio (numeric): 7467 unique values, 0 missing
totalelecbill (numeric): 13717 unique values, 0 missing
hasidno (numeric): 2 unique values, 0 missing
baddebt (numeric): 2 unique values, 0 missing
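The description says that the account category abbreviation (the `acccatabbr` column above) is intended for one-hot encoding. With 12 unique values, that column would expand into 12 indicator columns. The sketch below is a plain-Python illustration of that step. The category codes in it are hypothetical placeholders, not the dataset's real abbreviations. `accountcategoryid` and `accountcategory` also have 12 unique values, so they appear to encode the same categories. If so, encoding only one of the three would avoid duplicated features, but that is an assumption to verify against the data.

```python
from typing import Dict, List, Sequence


def one_hot(values: Sequence[str], prefix: str = "acccatabbr") -> Dict[str, List[int]]:
    """Map each distinct category to a 0/1 indicator column.

    Categories are sorted so the column order is deterministic; each row
    gets exactly one 1 across all indicator columns.
    """
    categories = sorted(set(values))
    return {
        f"{prefix}_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }


# Hypothetical abbreviations for illustration; the real column has 12 values.
rows = ["RES", "BUS", "RES", "GOV"]
for name, column in one_hot(rows).items():
    print(name, column)
# acccatabbr_BUS [0, 1, 0, 0]
# acccatabbr_GOV [0, 0, 0, 1]
# acccatabbr_RES [1, 0, 1, 0]
```

For a linear model, one of the indicator columns is often dropped to avoid perfect collinearity with the intercept. Tree-based models generally do not need this.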

19 properties

Number of instances (rows) of the dataset: 138509
Number of attributes (columns) of the dataset: 16
Number of distinct values of the target attribute (if it is nominal): not reported
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 14
Number of nominal attributes: 0
Average class difference between consecutive instances: not reported
Percentage of missing values: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 87.5
Percentage of instances belonging to the most frequent class: not reported
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: not reported
Percentage of instances belonging to the least frequent class: not reported
Number of instances belonging to the least frequent class: not reported
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
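Several of these properties follow directly from the reported counts. The class-frequency properties are not reported on this page. The sketch below shows how they could be computed from the `baddebt` target column. It uses a short hypothetical label list in place of the real 138509 rows. The attribute-to-instance ratio is 16 / 138509 ≈ 0.000116, which is why it displays as 0 above. The numeric share is 14 / 16 = 87.5%.

```python
from collections import Counter
from typing import Dict, Sequence


def class_balance(labels: Sequence[int]) -> Dict[str, float]:
    """Counts and percentages of the most and least frequent target classes."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    n = len(labels)
    most = max(counts.values())
    least = min(counts.values())
    return {
        "most_frequent_count": most,
        "most_frequent_pct": 100.0 * most / n,
        "least_frequent_count": least,
        "least_frequent_pct": 100.0 * least / n,
    }


# Hypothetical baddebt labels (1 = bad debt, 0 = not), not real data.
print(class_balance([0, 0, 0, 1]))

# Properties derived from the counts reported above.
n_instances, n_attributes, n_numeric = 138509, 16, 14
print(n_attributes / n_instances)        # about 0.000116, displayed as 0
print(100.0 * n_numeric / n_attributes)  # 87.5
```

If the real target turns out to be strongly imbalanced, plain accuracy would be a misleading metric for the classification exercise in the description. Class-aware metrics or resampling would then be worth considering.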

0 tasks