OpenML
Code_Smells_Blob


Status: active. Format: ARFF. Visibility: public. Uploaded 10-08-2021 by Jan van Rijn.
  • Computer Systems Machine Learning


This dataset combines records from the MLCQ dataset with metrics extracted using the PMD tool and the Understand tool, to determine whether a file contains code smells. Note that the records are at the (sub)class level. This is a classification task: the default target (severity) should be binarized with a static threshold, preferably between 0.5 and 2.5. Please read the publication carefully to understand how to use this dataset.
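The binarization step described above could look roughly like the following sketch. The function name `binarize_severity`, the default threshold of 1.0 (one value inside the suggested 0.5 to 2.5 range), and the choice of a strict "greater than" comparison are all assumptions made for illustration; the publication should be consulted for the threshold and comparison actually intended.

```python
def binarize_severity(severities, threshold=1.0):
    """Map numeric severity scores to 0/1 labels using a static threshold.

    Assumption: scores strictly above the threshold count as "smelly" (1),
    everything else as "not smelly" (0). The dataset description only says
    the threshold should preferably lie between 0.5 and 2.5.
    """
    return [1 if s > threshold else 0 for s in severities]


# Hypothetical severity values, not taken from the dataset.
labels = binarize_severity([0.0, 0.5, 1.0, 2.0, 3.0], threshold=1.0)
print(labels)  # [0, 0, 0, 1, 1]
```

Because the threshold is static, the same cut-off applies to every record, so changing it changes the class balance of the resulting binary target.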

67 features

severity (target): numeric, 22 unique values, 0 missing
repository (ignore): string, 431 unique values, 0 missing
package (ignore): string, 2076 unique values, 0 missing
filename (ignore): string, 2249 unique values, 0 missing
code_name (ignore): string, 2325 unique values, 0 missing
commit_hash (ignore): string, 431 unique values, 0 missing
smell (ignore): string, 1 unique value, 0 missing
AvgCyclomatic: numeric, 21 unique values, 33669 missing
AvgCyclomaticModified: numeric, 21 unique values, 33669 missing
AvgCyclomaticStrict: numeric, 24 unique values, 33669 missing
AvgEssential: numeric, 13 unique values, 33669 missing
AvgLine: numeric, 74 unique values, 33669 missing
AvgLineBlank: numeric, 20 unique values, 33669 missing
AvgLineCode: numeric, 64 unique values, 33669 missing
AvgLineComment: numeric, 29 unique values, 33669 missing
CountClassBase: numeric, 9 unique values, 33669 missing
CountClassCoupled: numeric, 77 unique values, 33669 missing
CountClassCoupledModified: numeric, 77 unique values, 33669 missing
CountClassDerived: numeric, 29 unique values, 33669 missing
CountDeclClass: numeric, 0 unique values, 83943 missing
CountDeclClassMethod: numeric, 32 unique values, 33669 missing
CountDeclClassVariable: numeric, 26 unique values, 33669 missing
CountDeclExecutableUnit: numeric, 0 unique values, 83943 missing
CountDeclFile: numeric, 0 unique values, 83943 missing
CountDeclFunction: numeric, 0 unique values, 83943 missing
CountDeclInstanceMethod: numeric, 64 unique values, 33669 missing
CountDeclInstanceVariable: numeric, 41 unique values, 33669 missing
CountDeclMethod: numeric, 66 unique values, 33669 missing
CountDeclMethodAll: numeric, 139 unique values, 33669 missing
CountDeclMethodDefault: numeric, 27 unique values, 33669 missing
CountDeclMethodPrivate: numeric, 29 unique values, 33669 missing
CountDeclMethodProtected: numeric, 21 unique values, 33669 missing
CountDeclMethodPublic: numeric, 58 unique values, 33669 missing
CountInput: numeric, 0 unique values, 83943 missing
CountLine: numeric, 428 unique values, 33669 missing
CountLineBlank: numeric, 122 unique values, 33669 missing
CountLineCode: numeric, 338 unique values, 33669 missing
CountLineCodeDecl: numeric, 159 unique values, 33669 missing
CountLineCodeExe: numeric, 251 unique values, 33669 missing
CountLineComment: numeric, 188 unique values, 33669 missing
CountOutput: numeric, 0 unique values, 83943 missing
CountPath: numeric, 0 unique values, 83943 missing
CountPathLog: numeric, 0 unique values, 83943 missing
CountSemicolon: numeric, 217 unique values, 33669 missing
CountStmt: numeric, 268 unique values, 33669 missing
CountStmtDecl: numeric, 140 unique values, 33669 missing
CountStmtExe: numeric, 223 unique values, 33669 missing
Cyclomatic: numeric, 0 unique values, 83943 missing
CyclomaticModified: numeric, 0 unique values, 83943 missing
CyclomaticStrict: numeric, 0 unique values, 83943 missing
Essential: numeric, 0 unique values, 83943 missing
Knots: numeric, 0 unique values, 83943 missing
MaxCyclomatic: numeric, 48 unique values, 33669 missing
MaxCyclomaticModified: numeric, 47 unique values, 33669 missing
MaxCyclomaticStrict: numeric, 54 unique values, 33669 missing
MaxEssential: numeric, 28 unique values, 33669 missing
MaxEssentialKnots: numeric, 0 unique values, 83943 missing
MaxInheritanceTree: numeric, 11 unique values, 33669 missing
MaxNesting: numeric, 10 unique values, 33669 missing
MinEssentialKnots: numeric, 0 unique values, 83943 missing
PercentLackOfCohesion: numeric, 83 unique values, 33669 missing
PercentLackOfCohesionModified: numeric, 96 unique values, 33669 missing
RatioCommentToCode: numeric, 285 unique values, 33669 missing
SumCyclomatic: numeric, 125 unique values, 33669 missing
SumCyclomaticModified: numeric, 123 unique values, 33669 missing
SumCyclomaticStrict: numeric, 133 unique values, 33669 missing
SumEssential: numeric, 84 unique values, 33669 missing
WOC: numeric, 320 unique values, 919 missing
NOPA: numeric, 35 unique values, 195 missing
NOAM: numeric, 43 unique values, 195 missing
WMC: numeric, 276 unique values, 195 missing
TCC: numeric, 900 unique values, 25678 missing
ATFD: numeric, 234 unique values, 195 missing
class_name (ignore): string, 16678 unique values, 195 missing
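Several features in the list above report 0 unique values and 83943 missing, which equals the number of instances, so those columns carry no information. A minimal sketch of how such columns might be flagged from per-column missing counts follows; the helper names `empty_columns` and `missing_fraction` are hypothetical, and only a handful of counts from the list are copied in for illustration.

```python
N_INSTANCES = 83943  # number of rows reported for this dataset

# A few (column name -> missing count) pairs copied from the feature list.
missing_counts = {
    "AvgCyclomatic": 33669,
    "CountDeclClass": 83943,
    "CountPath": 83943,
    "Knots": 83943,
    "WMC": 195,
    "TCC": 25678,
}


def empty_columns(counts, n_rows):
    """Return, sorted by name, the columns whose values are all missing."""
    return sorted(name for name, missing in counts.items() if missing >= n_rows)


def missing_fraction(counts, n_rows):
    """Return the share of missing values per column, between 0 and 1."""
    return {name: missing / n_rows for name, missing in counts.items()}


print(empty_columns(missing_counts, N_INSTANCES))
# ['CountDeclClass', 'CountPath', 'Knots']
```

Whether to drop the fully empty columns, and how to treat the partially missing ones, is a modelling decision that this sketch does not make.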

19 properties

Number of instances (rows) of the dataset: 83943
Number of attributes (columns) of the dataset: 67
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 2801627
Number of instances with at least one value missing: 83943
Number of numeric attributes: 67
Number of nominal attributes: 0
Number of attributes divided by the number of instances: 0
Percentage of numeric attributes: 100
Percentage of instances belonging to the most frequent class: (no value listed)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 100
Average class difference between consecutive instances: 0.99
Percentage of missing values: 49.81
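As a quick sanity check, the reported percentage of missing values can be reproduced from the other properties, assuming it is computed as missing cells divided by instances times attributes. The variable names below are illustrative only.

```python
n_instances = 83943    # reported number of rows
n_attributes = 67      # reported number of columns
n_missing = 2801627    # reported number of missing values

# Total number of cells in the table, assuming every row has every attribute.
total_cells = n_instances * n_attributes

# Share of cells that are missing, as a percentage.
pct_missing = 100 * n_missing / total_cells
print(round(pct_missing, 2))  # 49.81

# Attributes per instance is far below 1, which is why it is displayed as 0.
attrs_per_instance = n_attributes / n_instances
print(attrs_per_instance < 0.001)  # True
```

The result agrees with the listed 49.81, which suggests the 67 attributes counted here are the ones not marked "(ignore)".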
