OpenML
FICO-HELOC-cleaned

Status: active
Format: ARFF
License: Unknown (Kaggle)/Custom (FICO website)
Visibility: public
Uploaded 04-06-2023 by Matthias Feurer


This dataset is from the "Explainable Machine Learning Challenge":

> The Explainable Machine Learning Challenge is a collaboration between Google, FICO and academics at Berkeley, Oxford, Imperial, UC Irvine and MIT, to generate new research in the area of algorithmic explainability. Teams will be challenged to create machine learning models with both high accuracy and explainability; they will use a real-world financial dataset provided by FICO. Designers and end users of machine learning algorithms will both benefit from more interpretable and explainable algorithms. Machine learning model designers will benefit from Model explanations, written explanations describing the functioning of a trained model. These might include information about which variables or examples are particularly important, they might explain the logic used by an algorithm, and/or characterize input/output relationships between variables and predictions. We expect teams to tell the story of their model such that these explanations will be qualitatively evaluated by data scientists at FICO.

Further information can be retrieved from the [FICO website](https://community.fico.com/s/explainable-machine-learning-challenge).

Notes

* We have obtained the dataset from [Kaggle](https://www.kaggle.com/datasets/averkiyoliabev/home-equity-line-of-creditheloc)
* This is a cleaned version of the Kaggle dataset, in which we have removed all rows that only contained `-9`, a special value according to the FAQ.
* Please request access to the data on the FICO website to obtain the full description of the features.
* In this version we have encoded the special values (-9, -8, -7) as missing values to make the data more amenable to non-tree models.
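The two cleaning steps described in the notes can be sketched as follows. This is a minimal illustration using only the Python standard library, not the uploader's actual script: the function names, the sample rows, and the `"Good"`/`"Bad"` labels are hypothetical, and it assumes that "rows that only contained `-9`" refers to every column except the target.

```python
import math

# Sentinel codes named in the dataset notes. Their individual meanings are
# documented by FICO and are not reproduced here.
SPECIAL_VALUES = {-9, -8, -7}


def drop_all_minus_nine(rows, target="RiskPerformance"):
    """Drop rows in which every non-target value equals -9 (assumed reading)."""
    return [
        row for row in rows
        if not all(value == -9 for name, value in row.items() if name != target)
    ]


def encode_special_as_missing(rows):
    """Return a copy of `rows` with each special value replaced by NaN."""
    return [
        {name: (math.nan if value in SPECIAL_VALUES else value)
         for name, value in row.items()}
        for row in rows
    ]


# Hypothetical rows using three of the real column names.
rows = [
    {"RiskPerformance": "Bad", "ExternalRiskEstimate": 55, "MSinceMostRecentDelq": -7},
    {"RiskPerformance": "Good", "ExternalRiskEstimate": -9, "MSinceMostRecentDelq": -9},
    {"RiskPerformance": "Good", "ExternalRiskEstimate": 70, "MSinceMostRecentDelq": -8},
]
cleaned = encode_special_as_missing(drop_all_minus_nine(rows))
```

Dropping the all-`-9` rows first keeps the row filter simple, because after encoding the three codes can no longer be told apart.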

24 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| RiskPerformance (target) | nominal | 2 | 0 |
| ExternalRiskEstimate | numeric | 60 | 10 |
| MSinceOldestTradeOpen | numeric | 524 | 239 |
| MSinceMostRecentTradeOpen | numeric | 111 | 0 |
| AverageMInFile | numeric | 236 | 0 |
| NumSatisfactoryTrades | numeric | 73 | 0 |
| NumTrades60Ever2DerogPubRec | numeric | 18 | 0 |
| NumTrades90Ever2DerogPubRec | numeric | 16 | 0 |
| PercentTradesNeverDelq | numeric | 71 | 0 |
| MSinceMostRecentDelq | numeric | 84 | 4840 |
| MaxDelq2PublicRecLast12M | nominal | 9 | 0 |
| MaxDelqEver | nominal | 7 | 0 |
| NumTotalTrades | numeric | 87 | 0 |
| NumTradesOpeninLast12M | numeric | 18 | 0 |
| PercentInstallTrades | numeric | 95 | 0 |
| MSinceMostRecentInqexcl7days | numeric | 25 | 2331 |
| NumInqLast6M | numeric | 26 | 0 |
| NumInqLast6Mexcl7days | numeric | 26 | 0 |
| NetFractionRevolvingBurden | numeric | 126 | 186 |
| NetFractionInstallBurden | numeric | 137 | 3419 |
| NumRevolvingTradesWBalance | numeric | 29 | 156 |
| NumInstallTradesWBalance | numeric | 18 | 861 |
| NumBank2NatlTradesWHighUtilization | numeric | 17 | 583 |
| PercentTradesWBalance | numeric | 93 | 18 |

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset. | 9871 |
| Number of attributes (columns) of the dataset. | 24 |
| Number of distinct values of the target attribute (if it is nominal). | 2 |
| Number of missing values in the dataset. | 12643 |
| Number of instances with at least one value missing. | 7369 |
| Number of numeric attributes. | 21 |
| Number of nominal attributes. | 3 |
| Number of attributes divided by the number of instances. | 0 |
| Percentage of numeric attributes. | 87.5 |
| Percentage of instances belonging to the most frequent class. | 52.03 |
| Percentage of nominal attributes. | 12.5 |
| Number of instances belonging to the most frequent class. | 5136 |
| Percentage of instances belonging to the least frequent class. | 47.97 |
| Number of instances belonging to the least frequent class. | 4735 |
| Number of binary attributes. | 1 |
| Percentage of binary attributes. | 4.17 |
| Percentage of instances having missing values. | 74.65 |
| Average class difference between consecutive instances. | 0.56 |
| Percentage of missing values. | 5.34 |
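Several of the percentages listed here follow directly from the raw counts on this page. The short check below recomputes them. It is only an arithmetic sketch: the variable names are made up for illustration, and the rounding to two decimals is an assumption about how the page displays values.

```python
# Counts copied from this page.
n_rows = 9871
n_cols = 24
n_numeric = 21
n_binary = 1
majority_count = 5136
minority_count = 4735
rows_with_missing = 7369

# Non-zero per-feature missing counts, in the order the features are listed.
missing_per_feature = [10, 239, 4840, 2331, 186, 3419, 156, 861, 583, 18]
total_missing = sum(missing_per_feature)

# Missing cells as a share of all cells (rows x columns).
pct_missing_values = round(100 * total_missing / (n_rows * n_cols), 2)
pct_rows_with_missing = round(100 * rows_with_missing / n_rows, 2)
pct_majority = round(100 * majority_count / n_rows, 2)
pct_minority = round(100 * minority_count / n_rows, 2)
pct_numeric = round(100 * n_numeric / n_cols, 2)
pct_binary = round(100 * n_binary / n_cols, 2)

# About 0.0024, which the page shows rounded to 0.
attributes_per_instance = n_cols / n_rows
```

The per-feature missing counts sum to the reported dataset total of 12643, and the two class counts sum to the 9871 rows.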

1 tasks

0 runs - estimation_procedure: 4-fold Crossvalidation - target_feature: RiskPerformance
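To make the estimation procedure concrete, here is a minimal standard-library sketch of a 4-fold split over the two classes of `RiskPerformance`. The page does not say whether the OpenML procedure shuffles or stratifies, so the stratified, seeded shuffle below is an assumption. The function name `stratified_folds` and the placeholder labels `"A"` and `"B"` are hypothetical; only the class counts come from this page.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=4, seed=0):
    """Assign every row index to exactly one of `n_folds` test folds.

    Indices are shuffled within each class and dealt out round-robin,
    so each fold keeps roughly the overall class balance.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


# Placeholder labels with the class counts reported for this dataset.
labels = ["A"] * 5136 + ["B"] * 4735
folds = stratified_folds(labels)

for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # A model would be fitted on train_idx and scored on test_idx here.
```

Each row is held out exactly once, so the four test folds together cover all 9871 instances.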