OpenML


audit-data

active ARFF Publicly available Visibility: public Uploaded 25-05-2021 by Hage Tuin


Author: Nishtha Hooda, CSED, TIET, Patiala

Source: [UCI](https://archive.ics.uci.edu/ml/datasets/Audit+Data) - 2018

Please cite: [Hooda, Nishtha, Seema Bawa, and Prashant Singh Rana. 'Fraudulent Firm Classification: A Case Study of an External Audit.' Applied Artificial Intelligence 32.1 (2018): 48-64.](https://doi.org/10.1080/08839514.2018.1451032)

The goal of the research is to help auditors by building a classification model that can predict fraudulent firms on the basis of present and historical risk factors.

The sectors and the number of firms in each are: Irrigation (114), Public Health (77), Buildings and Roads (82), Forest (70), Corporate (47), Animal Husbandry (95), Communication (1), Electrical (4), Land (5), Science and Technology (3), Tourism (1), Fisheries (41), Industries (37), Agriculture (200).

The original dataset was split into a trial dataset and an audit dataset. Here the two are concatenated into a single dataset, and two features (trial and audit) have been added to indicate which of the original datasets each row came from.
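The concatenation described above can be sketched roughly as follows. This is a minimal illustration using pandas, not the uploader's actual procedure: the two small frames and their values are invented stand-ins, and only the column names (Sector_score, Audit_Risk, Marks, Risk, audit, trial) are taken from this page. Columns that exist in only one of the two frames come out as missing in the rows from the other frame, which may be why many features in the listing report 776 missing values; that reading is an inference, not something the page states.

```python
import pandas as pd

# Toy stand-ins for the two original files; all values are invented.
audit_df = pd.DataFrame(
    {"Sector_score": [3.89, 2.72], "Audit_Risk": [1.7, 0.5], "Risk": [1, 0]}
)
trial_df = pd.DataFrame(
    {"Sector_score": [3.89, 2.72], "Marks": [2, 4], "Risk": [1, 0]}
)

# Add the two indicator features recording where each row came from.
audit_df = audit_df.assign(audit=1, trial=0)
trial_df = trial_df.assign(audit=0, trial=1)

# Stack the rows; columns present in only one frame are filled with NaN.
combined = pd.concat([audit_df, trial_df], ignore_index=True, sort=False)

# Count missing values per column, as the feature listing does.
missing_per_column = combined.isna().sum()
print(missing_per_column)
```

Filtering on `combined["audit"] == 1` or `combined["trial"] == 1` would then recover the rows of either original dataset.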

37 features

Risk (target)	numeric	2 unique values 0 missing
Sector_score	numeric	13 unique values 0 missing
LOCATION_ID	string	45 unique values 0 missing
PARA_A	numeric	363 unique values 0 missing
Score_A	numeric	3 unique values 776 missing
Risk_A	numeric	363 unique values 776 missing
PARA_B	numeric	358 unique values 0 missing
Score_B	numeric	3 unique values 776 missing
Risk_B	numeric	360 unique values 776 missing
TOTAL	numeric	471 unique values 0 missing
numbers	numeric	5 unique values 0 missing
Score_B.1	numeric	3 unique values 776 missing
Risk_C	numeric	5 unique values 776 missing
Money_Value	numeric	328 unique values 2 missing
Score_MV	numeric	3 unique values 776 missing
Risk_D	numeric	328 unique values 776 missing
District_Loss	numeric	3 unique values 776 missing
PROB	numeric	3 unique values 776 missing
RiSk_E	numeric	5 unique values 776 missing
History	numeric	7 unique values 0 missing
Prob	numeric	3 unique values 776 missing
Risk_F	numeric	7 unique values 776 missing
Score	numeric	17 unique values 0 missing
Inherent_Risk	numeric	584 unique values 776 missing
CONTROL_RISK	numeric	11 unique values 776 missing
Detection_Risk	numeric	1 unique values 776 missing
Audit_Risk	numeric	601 unique values 776 missing
audit	numeric	2 unique values 0 missing
trial	numeric	2 unique values 0 missing
SCORE_A	numeric	3 unique values 776 missing
SCORE_B	numeric	3 unique values 776 missing
Marks	numeric	3 unique values 776 missing
MONEY_Marks	numeric	3 unique values 776 missing
District	numeric	3 unique values 776 missing
Loss	numeric	3 unique values 776 missing
LOSS_SCORE	numeric	3 unique values 776 missing
History_score	numeric	3 unique values 776 missing
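Several feature names in the listing differ only by letter case (for example Score_A and SCORE_A, or PROB and Prob), so any tool that normalizes column names to lower case would merge them. A small sketch for spotting such collisions is shown below; the helper name `case_collisions` is hypothetical, and the list is a subset of the names above chosen for illustration.

```python
from collections import defaultdict

# A subset of the column names from the feature listing.
columns = [
    "Score_A", "SCORE_A", "Score_B", "SCORE_B",
    "PROB", "Prob", "Risk_A", "History",
]


def case_collisions(names):
    """Group names that become identical once lower-cased (hypothetical helper)."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    # Keep only the groups where more than one original name collides.
    return {key: group for key, group in groups.items() if len(group) > 1}


collisions = case_collisions(columns)
for lowered, originals in collisions.items():
    print(lowered, "<-", originals)
```

Keeping the original, case-sensitive names when loading the data avoids this ambiguity.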