OpenML
audit-data

Status: active. Format: ARFF. Visibility: public (publicly available). Uploaded 25-05-2021 by Hage Tuin.


Author: Nishtha Hooda, CSED, TIET, Patiala
Source: [UCI](https://archive.ics.uci.edu/ml/datasets/Audit+Data) - 2018
Please cite: [Hooda, Nishtha, Seema Bawa, and Prashant Singh Rana. 'Fraudulent Firm Classification: A Case Study of an External Audit.' Applied Artificial Intelligence 32.1 (2018): 48-64.](https://doi.org/10.1080/08839514.2018.1451032)

The goal of the research is to help auditors by building a classification model that can predict fraudulent firms on the basis of present and historical risk factors.

The sectors and the number of firms in each are: Irrigation (114), Public Health (77), Buildings and Roads (82), Forest (70), Corporate (47), Animal Husbandry (95), Communication (1), Electrical (4), Land (5), Science and Technology (3), Tourism (1), Fisheries (41), Industries (37), Agriculture (200), for 777 firms in total.

The original data was split into a trial dataset and an audit dataset. In this version the two are concatenated into a single dataset, and two features (trial and audit) have been added to indicate which of the original datasets each row came from.
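The concatenation step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the uploader's actual procedure: it assumes each source is a list of row dictionaries, that the trial and audit flags are encoded as 1.0/0.0 (the description only says they indicate the source), and the function `concat_with_indicators` and the toy rows are hypothetical placeholders rather than real dataset values.

```python
# Minimal sketch (assumption, not the uploader's actual script): stack trial
# and audit rows into one table and add 0/1 source flags. The function name and
# the toy rows below are hypothetical placeholders, not real dataset values.
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def concat_with_indicators(trial: List[Row], audit: List[Row]) -> List[Row]:
    """Concatenate two row lists and add 'trial' and 'audit' indicator columns."""
    # Union of all column names, in first-seen order.
    columns: List[str] = []
    for row in trial + audit:
        for name in row:
            if name not in columns:
                columns.append(name)

    combined: List[Row] = []
    for source, rows in (("trial", trial), ("audit", audit)):
        for row in rows:
            # A column the source file lacks is filled with None (missing).
            out: Row = {name: row.get(name) for name in columns}
            # Assumed encoding: 1.0 marks the row's source file, 0.0 the other.
            out["trial"] = 1.0 if source == "trial" else 0.0
            out["audit"] = 1.0 if source == "audit" else 0.0
            combined.append(out)
    return combined


if __name__ == "__main__":
    trial_rows: List[Row] = [{"PARA_A": 1.0, "SCORE_A": 2.0, "Risk": 0.0}]
    audit_rows: List[Row] = [{"PARA_A": 1.0, "Score_A": 0.2, "Risk": 0.0}]
    for r in concat_with_indicators(trial_rows, audit_rows):
        print(r)
```

In this toy run the trial row ends up with `Score_A` missing and the audit row with `SCORE_A` missing, the kind of pattern that shows up in the per-feature missing counts.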

37 features

| Feature | Type | Unique values | Missing |
|---|---|---|---|
| Risk (target) | numeric | 2 | 0 |
| Sector_score | numeric | 13 | 0 |
| LOCATION_ID | string | 45 | 0 |
| PARA_A | numeric | 363 | 0 |
| Score_A | numeric | 3 | 776 |
| Risk_A | numeric | 363 | 776 |
| PARA_B | numeric | 358 | 0 |
| Score_B | numeric | 3 | 776 |
| Risk_B | numeric | 360 | 776 |
| TOTAL | numeric | 471 | 0 |
| numbers | numeric | 5 | 0 |
| Score_B.1 | numeric | 3 | 776 |
| Risk_C | numeric | 5 | 776 |
| Money_Value | numeric | 328 | 2 |
| Score_MV | numeric | 3 | 776 |
| Risk_D | numeric | 328 | 776 |
| District_Loss | numeric | 3 | 776 |
| PROB | numeric | 3 | 776 |
| RiSk_E | numeric | 5 | 776 |
| History | numeric | 7 | 0 |
| Prob | numeric | 3 | 776 |
| Risk_F | numeric | 7 | 776 |
| Score | numeric | 17 | 0 |
| Inherent_Risk | numeric | 584 | 776 |
| CONTROL_RISK | numeric | 11 | 776 |
| Detection_Risk | numeric | 1 | 776 |
| Audit_Risk | numeric | 601 | 776 |
| audit | numeric | 2 | 0 |
| trial | numeric | 2 | 0 |
| SCORE_A | numeric | 3 | 776 |
| SCORE_B | numeric | 3 | 776 |
| Marks | numeric | 3 | 776 |
| MONEY_Marks | numeric | 3 | 776 |
| District | numeric | 3 | 776 |
| Loss | numeric | 3 | 776 |
| LOSS_SCORE | numeric | 3 | 776 |
| History_score | numeric | 3 | 776 |
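As a consistency check, the per-feature missing counts above can be summed and compared with the 19402 missing values reported in the dataset properties. The sketch below transcribes the table's Missing column; the helper `missing_summary` is hypothetical. Twenty-five columns have exactly 776 missing values, half of the 1552 rows, which appears consistent with each of those columns existing in only one of the two original files, though the page does not state this explicitly.

```python
from typing import Dict

N_INSTANCES = 1552

# Missing-value counts per column, transcribed from the feature table.
MISSING: Dict[str, int] = {
    "Risk": 0, "Sector_score": 0, "LOCATION_ID": 0, "PARA_A": 0,
    "Score_A": 776, "Risk_A": 776, "PARA_B": 0, "Score_B": 776,
    "Risk_B": 776, "TOTAL": 0, "numbers": 0, "Score_B.1": 776,
    "Risk_C": 776, "Money_Value": 2, "Score_MV": 776, "Risk_D": 776,
    "District_Loss": 776, "PROB": 776, "RiSk_E": 776, "History": 0,
    "Prob": 776, "Risk_F": 776, "Score": 0, "Inherent_Risk": 776,
    "CONTROL_RISK": 776, "Detection_Risk": 776, "Audit_Risk": 776,
    "audit": 0, "trial": 0, "SCORE_A": 776, "SCORE_B": 776,
    "Marks": 776, "MONEY_Marks": 776, "District": 776, "Loss": 776,
    "LOSS_SCORE": 776, "History_score": 776,
}


def missing_summary(missing: Dict[str, int], n_rows: int) -> Dict[str, float]:
    """Total missing cells, number of half-missing columns, and % of missing cells."""
    total = sum(missing.values())
    # Columns missing in exactly half of the rows.
    half_missing = sum(1 for m in missing.values() if m == n_rows // 2)
    # Share of all cells (rows x columns) that are missing.
    pct = 100.0 * total / (n_rows * len(missing))
    return {"total": total, "half_missing_columns": half_missing,
            "pct_missing": round(pct, 2)}


if __name__ == "__main__":
    print(missing_summary(MISSING, N_INSTANCES))
    # → {'total': 19402, 'half_missing_columns': 25, 'pct_missing': 33.79}
```

The arithmetic is 25 × 776 + 2 = 19402 missing cells out of 1552 × 37 = 57424, or about 33.79%.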

19 properties

| Property | Value |
|---|---|
| Number of instances (rows) of the dataset | 1552 |
| Number of attributes (columns) of the dataset | 37 |
| Number of distinct values of the target attribute (if it is nominal) | 0 |
| Number of missing values in the dataset | 19402 |
| Number of instances with at least one value missing | 1552 |
| Number of numeric attributes | 36 |
| Number of nominal attributes | 0 |
| Number of attributes divided by the number of instances | 0.02 |
| Percentage of numeric attributes | 97.3 |
| Percentage of instances belonging to the most frequent class | |
| Percentage of nominal attributes | 0 |
| Number of instances belonging to the most frequent class | |
| Percentage of instances belonging to the least frequent class | |
| Number of instances belonging to the least frequent class | |
| Number of binary attributes | 0 |
| Percentage of binary attributes | 0 |
| Percentage of instances having missing values | 100 |
| Average class difference between consecutive instances | 0.72 |
| Percentage of missing values | 33.79 |
