Within-project-Defect-Prediction-for-Ansible

Status: active. Format: ARFF. License: CC0: Public Domain. Visibility: public. Uploaded 23-03-2022 by Elif Ceren Gok.
Tags: Computer Systems, Machine Learning

Context

Infrastructure-as-code (IaC) is the DevOps strategy that allows management and provisioning of infrastructure through the definition of machine-readable files and automation around them, rather than physical hardware configuration or interactive configuration tools. On the one hand, although IaC is an increasingly widely adopted practice, little is known about how to best maintain, speedily evolve, and continuously improve the code behind the IaC strategy in a measurable fashion. On the other hand, source code measurements are often computed and analyzed to evaluate different quality aspects of the software developed. In particular, Infrastructure-as-Code is simply "code", and as such it is as prone to defects as code written in any other programming language. This dataset targets the YAML-based Ansible language to devise defect prediction approaches for IaC based on machine learning.

Content

The dataset contains metrics extracted from 86 open-source GitHub repositories based on the Ansible language that satisfied the following criteria:
  • The repository has at least one push event to its master branch in the last six months;
  • The repository has at least 2 releases;
  • At least 11 of the files in the repository are IaC scripts;
  • The repository has at least 2 core contributors;
  • The repository has evidence of continuous integration practice, such as the presence of a .travis.yaml file;
  • The repository has a comments ratio of at least 0.2;
  • The repository has a commit frequency of at least 2 per month on average;
  • The repository has an issue frequency of at least 0.023 events per month on average;
  • The repository has evidence of a license, such as the presence of a LICENSE.md file;
  • The repository has at least 190 source lines of code.

Metrics are grouped into three categories:
  • IaC-oriented: metrics of structural properties derived from the source code of infrastructure scripts;
  • Delta: metrics that capture the amount of change in a file between two successive releases, collected for each IaC-oriented metric;
  • Process: metrics that capture aspects of the development process rather than aspects of the code itself.

Description of the process metrics in this dataset can be found here.

Acknowledgements

Thanks to the open-source community.

Inspiration

Which source code properties and which properties of the development process are good predictors of defects in Infrastructure-as-Code scripts?
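
The three metric categories can be recovered from the column names alone. The sketch below is one possible way to do that, not the dataset authors' own tooling. The `delta_` prefix and the identifier columns come from the feature list on this page. The set `ASSUMED_PROCESS_METRICS` is an assumption, inferred from the columns that have no missing values and describe commit history. The helper names `categorize` and `group_columns` are hypothetical.

```python
# Assumption: these are the process metrics. The set is inferred from the
# feature list (history-related columns with 0 missing values), not taken
# from the dataset's own documentation.
ASSUMED_PROCESS_METRICS = {
    "additions", "additions_avg", "additions_max",
    "change_set_avg", "change_set_max",
    "code_churn_avg", "code_churn_count", "code_churn_max",
    "commits_count", "contributors_count",
    "deletions", "deletions_avg", "deletions_max",
    "highest_contributor_experience", "hunks_median",
    "minor_contributors_count",
}
# String-typed columns that identify a row rather than measure it.
IDENTIFIER_COLUMNS = {"commit", "committed_at", "filepath", "repository"}
TARGET_COLUMN = "failure_prone"


def categorize(column):
    """Map a column name to target, identifier, delta, process, or iac."""
    if column == TARGET_COLUMN:
        return "target"
    if column in IDENTIFIER_COLUMNS:
        return "identifier"
    if column.startswith("delta_"):
        return "delta"
    if column in ASSUMED_PROCESS_METRICS:
        return "process"
    # Everything else is treated as an IaC-oriented (structural) metric.
    return "iac"


def group_columns(columns):
    """Group column names by category, preserving their input order."""
    groups = {}
    for column in columns:
        groups.setdefault(categorize(column), []).append(column)
    return groups


example = group_columns(
    ["failure_prone", "commit", "num_tasks", "delta_num_tasks", "code_churn_avg"]
)
print(example)
```

Under this grouping, the 113 features would split into 1 target, 4 identifiers, 16 process metrics, 46 IaC-oriented metrics, and 46 delta metrics.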

113 features

failure_prone (target, numeric): 2 unique values, 0 missing
additions (numeric): 374 unique values, 0 missing
additions_avg (numeric): 221 unique values, 0 missing
additions_max (numeric): 286 unique values, 0 missing
avg_play_size (numeric): 178 unique values, 3439 missing
avg_task_size (numeric): 288 unique values, 3439 missing
change_set_avg (numeric): 36 unique values, 0 missing
change_set_max (numeric): 121 unique values, 0 missing
code_churn_avg (numeric): 343 unique values, 0 missing
code_churn_count (numeric): 475 unique values, 0 missing
code_churn_max (numeric): 378 unique values, 0 missing
commit (string): 3420 unique values, 0 missing
commits_count (numeric): 53 unique values, 0 missing
committed_at (string): 3420 unique values, 0 missing
contributors_count (numeric): 19 unique values, 0 missing
deletions (numeric): 276 unique values, 0 missing
deletions_avg (numeric): 142 unique values, 0 missing
deletions_max (numeric): 210 unique values, 0 missing
filepath (string): 10829 unique values, 0 missing
highest_contributor_experience (numeric): 1468 unique values, 0 missing
hunks_median (numeric): 69 unique values, 0 missing
lines_blank (numeric): 137 unique values, 3439 missing
lines_code (numeric): 502 unique values, 3439 missing
lines_comment (numeric): 163 unique values, 3439 missing
minor_contributors_count (numeric): 15 unique values, 0 missing
num_authorized_key (numeric): 5 unique values, 3439 missing
num_block_error_handling (numeric): 6 unique values, 3439 missing
num_blocks (numeric): 19 unique values, 3439 missing
num_commands (numeric): 43 unique values, 3439 missing
num_conditions (numeric): 109 unique values, 3439 missing
num_decisions (numeric): 89 unique values, 3439 missing
num_deprecated_keywords (numeric): 22 unique values, 3439 missing
num_deprecated_modules (numeric): 5 unique values, 3439 missing
num_distinct_modules (numeric): 93 unique values, 3439 missing
num_external_modules (numeric): 25 unique values, 3439 missing
num_fact_modules (numeric): 6 unique values, 3439 missing
num_file_exists (numeric): 9 unique values, 3439 missing
num_file_mode (numeric): 29 unique values, 3439 missing
num_file_modules (numeric): 23 unique values, 3439 missing
num_filters (numeric): 85 unique values, 3439 missing
num_ignore_errors (numeric): 32 unique values, 3439 missing
num_import_playbook (numeric): 20 unique values, 3439 missing
num_import_role (numeric): 26 unique values, 3439 missing
num_import_tasks (numeric): 22 unique values, 3439 missing
num_include (numeric): 33 unique values, 3439 missing
num_include_role (numeric): 14 unique values, 3439 missing
num_include_tasks (numeric): 24 unique values, 3439 missing
num_include_vars (numeric): 12 unique values, 3439 missing
num_keys (numeric): 435 unique values, 3439 missing
num_lookups (numeric): 16 unique values, 3439 missing
num_loops (numeric): 44 unique values, 3439 missing
num_math_operations (numeric): 1 unique values, 3439 missing
num_names_with_vars (numeric): 32 unique values, 3439 missing
num_parameters (numeric): 164 unique values, 3439 missing
num_paths (numeric): 55 unique values, 3439 missing
num_plays (numeric): 28 unique values, 3439 missing
num_prompts (numeric): 11 unique values, 3439 missing
num_regex (numeric): 27 unique values, 3439 missing
num_roles (numeric): 23 unique values, 3439 missing
num_suspicious_comments (numeric): 11 unique values, 3439 missing
num_tasks (numeric): 96 unique values, 3439 missing
num_tokens (numeric): 1192 unique values, 3439 missing
num_unique_names (numeric): 112 unique values, 3439 missing
num_uri (numeric): 18 unique values, 3439 missing
num_vars (numeric): 54 unique values, 3439 missing
text_entropy (numeric): 551 unique values, 3439 missing
delta_avg_play_size (numeric): 119 unique values, 13034 missing
delta_avg_task_size (numeric): 127 unique values, 13034 missing
delta_lines_blank (numeric): 85 unique values, 13034 missing
delta_lines_code (numeric): 332 unique values, 13034 missing
delta_lines_comment (numeric): 104 unique values, 13034 missing
delta_num_authorized_key (numeric): 7 unique values, 13034 missing
delta_num_block_error_handling (numeric): 4 unique values, 13034 missing
delta_num_blocks (numeric): 18 unique values, 13034 missing
delta_num_commands (numeric): 35 unique values, 13034 missing
delta_num_conditions (numeric): 89 unique values, 13034 missing
delta_num_decisions (numeric): 76 unique values, 13034 missing
delta_num_deprecated_keywords (numeric): 20 unique values, 13034 missing
delta_num_deprecated_modules (numeric): 3 unique values, 13034 missing
delta_num_distinct_modules (numeric): 82 unique values, 13034 missing
delta_num_external_modules (numeric): 22 unique values, 13034 missing
delta_num_fact_modules (numeric): 6 unique values, 13034 missing
delta_num_file_exists (numeric): 11 unique values, 13034 missing
delta_num_file_mode (numeric): 21 unique values, 13034 missing
delta_num_file_modules (numeric): 18 unique values, 13034 missing
delta_num_filters (numeric): 59 unique values, 13034 missing
delta_num_ignore_errors (numeric): 28 unique values, 13034 missing
delta_num_import_playbook (numeric): 14 unique values, 13034 missing
delta_num_import_role (numeric): 13 unique values, 13034 missing
delta_num_import_tasks (numeric): 19 unique values, 13034 missing
delta_num_include (numeric): 29 unique values, 13034 missing
delta_num_include_role (numeric): 14 unique values, 13034 missing
delta_num_include_tasks (numeric): 28 unique values, 13034 missing
delta_num_include_vars (numeric): 11 unique values, 13034 missing
delta_num_keys (numeric): 287 unique values, 13034 missing
delta_num_lookups (numeric): 14 unique values, 13034 missing
delta_num_loops (numeric): 34 unique values, 13034 missing
delta_num_math_operations (numeric): 1 unique values, 13034 missing
delta_num_names_with_vars (numeric): 23 unique values, 13034 missing
delta_num_parameters (numeric): 132 unique values, 13034 missing
delta_num_paths (numeric): 44 unique values, 13034 missing
delta_num_plays (numeric): 21 unique values, 13034 missing
delta_num_prompts (numeric): 17 unique values, 13034 missing
delta_num_regex (numeric): 18 unique values, 13034 missing
delta_num_roles (numeric): 24 unique values, 13034 missing
delta_num_suspicious_comments (numeric): 11 unique values, 13034 missing
delta_num_tasks (numeric): 82 unique values, 13034 missing
delta_num_tokens (numeric): 654 unique values, 13034 missing
delta_num_unique_names (numeric): 72 unique values, 13034 missing
delta_num_uri (numeric): 11 unique values, 13034 missing
delta_num_vars (numeric): 46 unique values, 13034 missing
delta_text_entropy (numeric): 586 unique values, 13034 missing
repository (string): 139 unique values, 0 missing
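
The feature list shows three levels of missingness. Commit-history columns such as `commits_count` have no missing values, the structural metrics have 3439, and the `delta_` metrics have 13034. So the number of usable rows depends on which feature subset a model needs. The sketch below illustrates that trade-off on made-up rows. The helper `complete_rows`, the toy values, and the use of `None` for a missing value are assumptions for illustration, not part of the dataset.

```python
def complete_rows(rows, columns):
    """Return the rows in which every listed column has a value (is not None)."""
    return [row for row in rows if all(row.get(col) is not None for col in columns)]


# Toy rows (fabricated for illustration only). None marks a missing value.
toy_rows = [
    {"repository": "org/a", "commits_count": 5, "num_tasks": 12,
     "delta_num_tasks": 2, "failure_prone": 1},
    {"repository": "org/a", "commits_count": 3, "num_tasks": 7,
     "delta_num_tasks": None, "failure_prone": 0},
    {"repository": "org/b", "commits_count": 1, "num_tasks": None,
     "delta_num_tasks": None, "failure_prone": 0},
]

# Each richer feature subset keeps fewer complete rows.
process_only = complete_rows(toy_rows, ["commits_count"])
with_iac = complete_rows(toy_rows, ["commits_count", "num_tasks"])
with_delta = complete_rows(toy_rows, ["commits_count", "num_tasks", "delta_num_tasks"])

print(len(process_only), len(with_iac), len(with_delta))
```

Dropping incomplete rows is only one option. Imputation, or tree-based models that tolerate missing values, may suit a given experiment better.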

19 properties

Number of instances (rows) of the dataset: 227272
Number of attributes (columns) of the dataset: 113
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 757758
Number of instances with at least one value missing: 13078
Number of numeric attributes: 109
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 96.46
Percentage of instances belonging to the most frequent class: (no value listed)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 5.75
Average class difference between consecutive instances: 0.9
Percentage of missing values: 2.95
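
Several of the percentage properties can be recomputed from the raw counts on this page, which is a quick consistency check. The sketch below uses only numbers listed above. The variable names are illustrative.

```python
# Raw counts as listed in the properties on this page.
n_instances = 227272
n_attributes = 113
n_numeric = 109
n_missing_values = 757758
n_rows_with_missing = 13078

# Percentage of numeric attributes.
pct_numeric = round(100 * n_numeric / n_attributes, 2)

# Percentage of instances having at least one missing value.
pct_rows_with_missing = round(100 * n_rows_with_missing / n_instances, 2)

# Percentage of missing cells over the full rows-by-columns table.
pct_missing_values = round(100 * n_missing_values / (n_instances * n_attributes), 2)

print(pct_numeric, pct_rows_with_missing, pct_missing_values)
# prints: 96.46 5.75 2.95
```

All three results match the listed properties, so the reported counts and percentages agree with each other.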

0 tasks
