Context
This data was extracted from the billing systems of 8 municipalities in South Africa over a 2-year period and summarised per account by total amount billed versus total amount paid. Each account has an indicator of whether it resulted in a Bad Debt.
This is a classification exercise. The aim is to find out whether it is feasible to estimate the probability of an account becoming a Bad Debt, so that the number (and value) of accounts at risk of developing into a Bad Debt can be forecast.
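The forecasting aim can be sketched as follows, assuming some classifier has already produced a probability of Bad Debt for each account. The expected number of Bad Debt accounts is then the sum of those probabilities, and the expected value is the probability-weighted sum of an amount at risk. The function name, the probabilities and the amounts below are hypothetical, and treating a single balance as the amount at risk is an assumption rather than something the dataset prescribes.

```python
def forecast_bad_debt(accounts):
    """Estimate the expected count and value of Bad Debt accounts.

    accounts: iterable of (probability_of_bad_debt, amount_at_risk) pairs.
    """
    accounts = list(accounts)  # allow generators, since we iterate twice
    # Expected number of Bad Debt accounts: sum of the per-account probabilities.
    expected_count = sum(p for p, _ in accounts)
    # Expected value at risk: each amount weighted by its probability.
    expected_value = sum(p * amount for p, amount in accounts)
    return expected_count, expected_value


# Hypothetical model outputs and amounts, invented for illustration only.
example_accounts = [(0.5, 1000.0), (0.25, 400.0), (0.75, 2000.0)]
expected_count, expected_value = forecast_bad_debt(example_accounts)
print(expected_count, expected_value)
```

Summing probabilities rather than counting accounts above a fixed threshold keeps the forecast sensitive to how confident the model is about each account.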
Content
AccCategoryID: (Account Category ID) The numeric link in the database to the Account Category
AccCategory: (Account Category) A classification of the type of account
AccCategoryAbbr: (Account Category Abbreviation) An abbreviation of the account type classification, intended for one-hot encoding
PropertyValue: (Property Value) The market value of the property
PropertySize: (Property Size) The size of the property in square metres
TotalBilling: (Total Billing) The total amount billed to the account for all services
AverageBilling: (Average Billing) The average amount billed to the account for all services
TotalReceipting: (Total Receipting) The total amount receipted to the account for all services
AverageReceipting: (Average Receipting) The average amount receipted to the account for all services
TotalDebt: (Total Debt) The total debt that is aged 90 days or more
TotalWriteOff: (Total Write Off) The Total amount of debt that has been written off
CollectionRatio: (Collection Ratio) The ratio of Total Receipting to Total Billing (i.e. Total Receipting / Total Billing)
DebtBillingRatio: (Debt Billing Ratio) The ratio of Total Debt plus Total Write Off to Total Billing (i.e. (Total Debt + Total Write Off) / Total Billing)
TotalElectricityBill: (Total Electricity Bill) The total amount billed for electricity. This field is included because electricity is used as a means to recover debt, i.e. if an amount is outstanding for any service, the municipality has the right to cut a consumer's electricity connection.
HasIDNo: (Has ID No.) The consumer has an ID number. This is similar to a Social Security number in the US and can be useful in legal proceedings. It is much harder to collect debt from a consumer without any ID number details. In addition, this field denotes that the account is held by a person and not a business. However, it is not very reliable, as it is often not captured properly or at all.
BadDebtIndic: (Bad Debt Indicator) 1 = Is considered to be a Bad Debt, 0 = Not considered to be a Bad Debt
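The two ratio fields and the one-hot encoding mentioned above can be reproduced from the raw columns. The following is a minimal sketch in plain Python, assuming each account is a dictionary keyed by the column names listed here. The helper names, the example amounts and the category abbreviations ("RES", "BUS", "GOV") are hypothetical, and returning None when Total Billing is zero is an assumption about how to treat such accounts.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None  # assumption: leave the ratio undefined for unbilled accounts
    return numerator / denominator


def derive_features(account, categories):
    """Add the two ratio fields and one-hot columns for AccCategoryAbbr."""
    features = dict(account)  # copy, so the input record is not modified
    features["CollectionRatio"] = safe_ratio(
        account["TotalReceipting"], account["TotalBilling"]
    )
    features["DebtBillingRatio"] = safe_ratio(
        account["TotalDebt"] + account["TotalWriteOff"], account["TotalBilling"]
    )
    # One-hot encoding: one 0/1 column per known category abbreviation.
    abbr = features.pop("AccCategoryAbbr")
    for category in categories:
        features["AccCategoryAbbr_" + category] = int(abbr == category)
    return features


# Hypothetical account and category list, invented for illustration only.
example = {
    "AccCategoryAbbr": "RES",
    "TotalBilling": 2000.0,
    "TotalReceipting": 1500.0,
    "TotalDebt": 300.0,
    "TotalWriteOff": 200.0,
}
features = derive_features(example, ["RES", "BUS", "GOV"])
print(features["CollectionRatio"], features["DebtBillingRatio"])
```

Note that TotalDebt, TotalWriteOff and DebtBillingRatio are closely tied to what makes an account a Bad Debt, so it may be worth checking whether using them as model inputs leaks the target.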
Inspiration
I welcome any feedback on the dataset, as well as on my methodology in classifying and modelling it. The kernel that I have run against this dataset is my first, and I am now working on a second attempt with different parameters. Any advice or criticism will be much appreciated.