QSAR-DATASET-FOR-DRUG-TARGET-CHEMBL4481


Status: deactivated. Format: ARFF. Visibility: public. Uploaded 14-07-2016 by Noureddin Sadawi.
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit: pseudo-pCI50) of several compounds against the drug target with ChEMBL ID CHEMBL4481 (TID: 12425). It has 913 rows and 138 features, not counting the molecule identifier (molecule_id) and the class feature (pXC50). The features are molecular descriptors generated from SMILES strings. Missing values were imputed using the median, and feature selection was also applied.
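The median imputation mentioned above can be illustrated with a minimal sketch. It assumes the median is taken per descriptor column, which the description does not state explicitly; the function name impute_median and the toy values are hypothetical and not taken from the dataset.

```python
from statistics import median


def impute_median(rows):
    """Replace None entries in each column with that column's median.

    Assumption: the median is computed per column over observed values.
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        medians.append(median(observed))
    return [
        [medians[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Toy descriptor matrix with made-up values; None marks a missing entry.
toy = [
    [1.0, 10.0],
    [None, 20.0],
    [3.0, None],
    [5.0, 40.0],
]
imputed = impute_median(toy)
```

A column with no observed values at all would make statistics.median raise an error, so a real pipeline would need to decide how to handle that case.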

140 features

pXC50 (target) - numeric - 437 unique values - 0 missing
molecule_id (row identifier) - nominal - 913 unique values - 0 missing
H.049 - numeric - 5 unique values - 0 missing
GMTIV - numeric - 791 unique values - 0 missing
SMTIV - numeric - 786 unique values - 0 missing
Eta_betaS - numeric - 105 unique values - 0 missing
P_VSA_LogP_6 - numeric - 103 unique values - 0 missing
nHAcc - numeric - 12 unique values - 0 missing
ATS1s - numeric - 603 unique values - 0 missing
Chi1_EA.ed. - numeric - 524 unique values - 0 missing
SpMax4_Bh.m. - numeric - 573 unique values - 0 missing
ON1 - numeric - 229 unique values - 0 missing
Chi0_AEA.bo. - numeric - 395 unique values - 0 missing
Chi0_AEA.dm. - numeric - 395 unique values - 0 missing
Chi0_AEA.ed. - numeric - 395 unique values - 0 missing
Chi0_AEA.ri. - numeric - 395 unique values - 0 missing
Chi0_EA - numeric - 395 unique values - 0 missing
SpMaxA_AEA.ri. - numeric - 285 unique values - 0 missing
Chi0_EA.ed. - numeric - 531 unique values - 0 missing
Dz - numeric - 200 unique values - 0 missing
IVDM - numeric - 239 unique values - 0 missing
SpMaxA_EA.bo. - numeric - 270 unique values - 0 missing
Chi1_AEA.bo. - numeric - 470 unique values - 0 missing
Chi1_AEA.dm. - numeric - 470 unique values - 0 missing
Chi1_AEA.ed. - numeric - 470 unique values - 0 missing
Chi1_AEA.ri. - numeric - 470 unique values - 0 missing
Chi1_EA - numeric - 470 unique values - 0 missing
X1 - numeric - 392 unique values - 0 missing
LPRS - numeric - 554 unique values - 0 missing
Chi1_EA.ri. - numeric - 774 unique values - 0 missing
Chi0_EA.ri. - numeric - 739 unique values - 0 missing
SpMax5_Bh.e. - numeric - 538 unique values - 0 missing
SpMaxA_EA - numeric - 208 unique values - 0 missing
Eig14_AEA.ed. - numeric - 304 unique values - 0 missing
SpMaxA_AEA.ed. - numeric - 294 unique values - 0 missing
SNar - numeric - 156 unique values - 0 missing
MW - numeric - 556 unique values - 0 missing
CATS2D_04_AA - numeric - 8 unique values - 0 missing
IDDM - numeric - 316 unique values - 0 missing
SdNH - numeric - 196 unique values - 0 missing
SpMax5_Bh.m. - numeric - 613 unique values - 0 missing
BID - numeric - 194 unique values - 0 missing
nSK - numeric - 40 unique values - 0 missing
Psi_e_0 - numeric - 748 unique values - 0 missing
IDM - numeric - 487 unique values - 0 missing
SpAD_AEA.ri. - numeric - 785 unique values - 0 missing
X1Mad - numeric - 736 unique values - 0 missing
SpAD_AEA.dm. - numeric - 780 unique values - 0 missing
SpAD_EA - numeric - 551 unique values - 0 missing
MPC01 - numeric - 47 unique values - 0 missing
MWC01 - numeric - 47 unique values - 0 missing
nBO - numeric - 47 unique values - 0 missing
SRW02 - numeric - 47 unique values - 0 missing
Eig08_AEA.dm. - numeric - 579 unique values - 0 missing
SpAD_AEA.bo. - numeric - 644 unique values - 0 missing
P_VSA_MR_7 - numeric - 62 unique values - 0 missing
XMOD - numeric - 712 unique values - 0 missing
SpMax5_Bh.i. - numeric - 541 unique values - 0 missing
Xu - numeric - 552 unique values - 0 missing
SpMaxA_EA.ed. - numeric - 364 unique values - 0 missing
IDET - numeric - 544 unique values - 0 missing
piPC01 - numeric - 92 unique values - 0 missing
SCBO - numeric - 92 unique values - 0 missing
CATS2D_06_AL - numeric - 18 unique values - 0 missing
Eig13_AEA.ed. - numeric - 314 unique values - 0 missing
SpMax8_Bh.s. - numeric - 568 unique values - 0 missing
SpMaxA_EA.ri. - numeric - 247 unique values - 0 missing
SpMaxA_AEA.bo. - numeric - 279 unique values - 0 missing
X2 - numeric - 504 unique values - 0 missing
ZM1Mad - numeric - 657 unique values - 0 missing
Xt - numeric - 135 unique values - 0 missing
C.032 - numeric - 2 unique values - 0 missing
SpMax5_Bh.v. - numeric - 577 unique values - 0 missing
X0 - numeric - 257 unique values - 0 missing
SpAD_EA.ri. - numeric - 791 unique values - 0 missing
SpAD_AEA.ed. - numeric - 554 unique values - 0 missing
Eig07_AEA.dm. - numeric - 616 unique values - 0 missing
N.075 - numeric - 6 unique values - 0 missing
NaaN - numeric - 6 unique values - 0 missing
nHet - numeric - 13 unique values - 0 missing
ATSC7e - numeric - 464 unique values - 0 missing
IDMT - numeric - 549 unique values - 0 missing
RDSQ - numeric - 553 unique values - 0 missing
C.042 - numeric - 3 unique values - 0 missing
MWC02 - numeric - 101 unique values - 0 missing
ZM1 - numeric - 101 unique values - 0 missing
P_VSA_m_2 - numeric - 664 unique values - 0 missing
Eig12_AEA.bo. - numeric - 381 unique values - 0 missing
NdNH - numeric - 3 unique values - 0 missing
Psi_e_1 - numeric - 730 unique values - 0 missing
SM02_AEA.bo. - numeric - 309 unique values - 0 missing
Eig05_AEA.bo. - numeric - 495 unique values - 0 missing
Eig15_AEA.ed. - numeric - 309 unique values - 0 missing
Eig12_AEA.ed. - numeric - 319 unique values - 0 missing
C.027 - numeric - 5 unique values - 0 missing
ON0 - numeric - 139 unique values - 0 missing
SRW04 - numeric - 146 unique values - 0 missing
X3 - numeric - 533 unique values - 0 missing
SpMax4_Bh.p. - numeric - 523 unique values - 0 missing
Eta_epsi - numeric - 495 unique values - 0 missing
Eig04_EA.bo. - numeric - 514 unique values - 0 missing
SM14_AEA.ri. - numeric - 514 unique values - 0 missing
ATS4m - numeric - 685 unique values - 0 missing
BBI - numeric - 67 unique values - 0 missing
MPC02 - numeric - 67 unique values - 0 missing
SM02_EA - numeric - 67 unique values - 0 missing
nPyrimidines - numeric - 2 unique values - 0 missing
X4 - numeric - 542 unique values - 0 missing
X0sol - numeric - 340 unique values - 0 missing
SM03_AEA.bo. - numeric - 415 unique values - 0 missing
SpMax3_Bh.m. - numeric - 554 unique values - 0 missing
S1K - numeric - 514 unique values - 0 missing
SssO - numeric - 224 unique values - 0 missing
X1sol - numeric - 442 unique values - 0 missing
MWC03 - numeric - 191 unique values - 0 missing
ZM2 - numeric - 191 unique values - 0 missing
Ui - numeric - 27 unique values - 0 missing
Eig08_EA.bo. - numeric - 486 unique values - 0 missing
Eig09_EA - numeric - 363 unique values - 0 missing
SM03_AEA.dm. - numeric - 363 unique values - 0 missing
Eig05_EA.bo. - numeric - 526 unique values - 0 missing
SM15_AEA.ri. - numeric - 526 unique values - 0 missing
Eig07_EA.ri. - numeric - 574 unique values - 0 missing
Eta_alpha - numeric - 360 unique values - 0 missing
GGI10 - numeric - 145 unique values - 0 missing
Eig07_EA - numeric - 393 unique values - 0 missing
SM15_AEA.bo. - numeric - 393 unique values - 0 missing
ATS1m - numeric - 491 unique values - 0 missing
Eig04_EA.ri. - numeric - 589 unique values - 0 missing
CATS2D_02_DD - numeric - 4 unique values - 0 missing
Eig07_AEA.ri. - numeric - 589 unique values - 0 missing
Eig14_EA.bo. - numeric - 343 unique values - 0 missing
ATSC2e - numeric - 386 unique values - 0 missing
ATS5m - numeric - 704 unique values - 0 missing
ATSC6e - numeric - 509 unique values - 0 missing
nRCONHR - numeric - 2 unique values - 0 missing
SM04_AEA.bo. - numeric - 498 unique values - 0 missing
NaasC - numeric - 11 unique values - 0 missing
Eig10_AEA.ri. - numeric - 513 unique values - 0 missing
ATS8s - numeric - 560 unique values - 0 missing
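As a rough illustration of how the 140 columns divide into roles, the sketch below separates the row identifier (molecule_id) and the target (pXC50) from the descriptor columns, leaving the 138 descriptors mentioned in the description. The helper name split_roles, the placeholder identifier, and the row values are hypothetical; only the column names come from the feature list.

```python
def split_roles(header, row, id_col="molecule_id", target_col="pXC50"):
    """Split one data row into (row identifier, target, descriptor dict)."""
    record = dict(zip(header, row))
    molecule_id = record.pop(id_col)
    target = float(record.pop(target_col))
    descriptors = {name: float(value) for name, value in record.items()}
    return molecule_id, target, descriptors


# Shortened header and a made-up row, for illustration only.
header = ["molecule_id", "pXC50", "H.049", "GMTIV", "nHAcc"]
row = ["EXAMPLE_ID", "6.5", "0", "12.3", "4"]

molecule_id, y, x = split_roles(header, row)

# With the full 140-column header, two columns are set aside,
# which leaves the descriptor features.
n_descriptors = 140 - 2
```

Keeping the identifier out of the descriptor dictionary matters because molecule_id is unique for every row and would carry no generalizable signal for a model.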

107 properties

Number of instances (rows) of the dataset: 913
Number of attributes (columns) of the dataset: 140
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 139
Number of nominal attributes: 1
Number of binary attributes: 0
Number of attributes divided by the number of instances: 0.15
Percentage of binary attributes: 0
Percentage of numeric attributes: 99.29
Percentage of nominal attributes: 0.71
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: -0.01
Minimum of means among attributes of the numeric type: -0.32
First quartile of means among attributes of the numeric type: 1.45
Second quartile (Median) of means among attributes of the numeric type: 3.99
Third quartile of means among attributes of the numeric type: 13.73
Maximum of means among attributes of the numeric type: 16821.9
Mean of means among attributes of the numeric type: 288.19
Minimum standard deviation of attributes of the numeric type: 0.08
First quartile of standard deviation of attributes of the numeric type: 0.54
Second quartile (Median) of standard deviation of attributes of the numeric type: 1.34
Third quartile of standard deviation of attributes of the numeric type: 6.02
Maximum standard deviation of attributes of the numeric type: 18229.56
Mean standard deviation of attributes of the numeric type: 300.84
Minimum skewness among attributes of the numeric type: -1.15
First quartile of skewness among attributes of the numeric type: -0.4
Second quartile (Median) of skewness among attributes of the numeric type: 0.21
Third quartile of skewness among attributes of the numeric type: 0.61
Maximum skewness among attributes of the numeric type: 2.2
Mean skewness among attributes of the numeric type: 0.21
Minimum kurtosis among attributes of the numeric type: -1.58
First quartile of kurtosis among attributes of the numeric type: -0.99
Second quartile (Median) of kurtosis among attributes of the numeric type: -0.82
Third quartile of kurtosis among attributes of the numeric type: 0.27
Maximum kurtosis among attributes of the numeric type: 8.81
Mean kurtosis among attributes of the numeric type: -0.11

No value is listed for the remaining properties:
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump
Maximum entropy among attributes.
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump
Second quartile (Median) of mutual information between the nominal attributes and the target attribute.
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump
Minimal mutual information between the nominal attributes and the target attribute.
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
Maximum mutual information between the nominal attributes and the target attribute.
The minimal number of distinct values among attributes of the nominal type.
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
The maximum number of distinct values among attributes of the nominal type.
Third quartile of entropy among attributes.
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001
Percentage of instances belonging to the least frequent class.
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001
Average entropy of the attributes.
Number of instances belonging to the least frequent class.
Third quartile of mutual information between the nominal attributes and the target attribute.
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes
First quartile of entropy among attributes.
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001
Average mutual information between the nominal attributes and the target attribute.
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001
An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
First quartile of mutual information between the nominal attributes and the target attribute.
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Standard deviation of the number of distinct values among attributes of the nominal type.
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001
Average number of distinct values among the attributes of the nominal type.
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2
Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Error rate achieved by the landmarker weka.classifiers.lazy.IBk
Percentage of instances belonging to the most frequent class.
Second quartile (Median) of entropy among attributes.
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2
Entropy of the target attribute values.
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk
Number of instances belonging to the most frequent class.
Minimal entropy among attributes.
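Several of the properties above follow directly from the row and column counts, and the skewness and kurtosis figures are moment-based statistics. The sketch below reproduces the count-based values and shows one common way to compute skewness and excess kurtosis. The negative kurtosis values suggest an excess-kurtosis convention, but the exact estimator used to produce the listed numbers is an assumption here; the function name skew_and_excess_kurtosis is hypothetical.

```python
# Counts taken from the property list.
n_instances = 913
n_attributes = 140
n_numeric = 139
n_nominal = 1

dimensionality = round(n_attributes / n_instances, 2)   # attributes per instance
pct_numeric = round(100 * n_numeric / n_attributes, 2)  # percentage numeric
pct_nominal = round(100 * n_nominal / n_attributes, 2)  # percentage nominal


def skew_and_excess_kurtosis(values):
    """Population (biased) skewness and excess kurtosis from central moments.

    Assumption: other estimators (e.g. bias-corrected ones) give slightly
    different numbers; this is only one common definition.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# Made-up symmetric sample: skewness 0, flatter than a normal distribution.
skew, kurt = skew_and_excess_kurtosis([1, 2, 3, 4, 5])
```

Under this convention a normal distribution has excess kurtosis 0, so negative values indicate lighter tails than a normal distribution.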

12 tasks

2 runs - estimation_procedure: Custom 10-fold Crossvalidation - target_feature: pXC50
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
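The first task above evaluates models on pXC50 with 10-fold cross-validation. The fold assignment of the "Custom" procedure is not given on this page, so the sketch below shows only a plain shuffled 10-fold split over the 913 row indices; the function name k_fold_indices and the seed are assumptions for illustration.

```python
import random


def k_fold_indices(n_rows, k=10, seed=0):
    """Yield (train, test) index lists for a shuffled k-fold split."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 913 rows, as in this dataset.
splits = list(k_fold_indices(913))
```

Because 913 is not divisible by 10, three folds hold 92 rows and seven hold 91; every row appears in exactly one test fold.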