QSAR-DATASET-FOR-DRUG-TARGET-CHEMBL2959


Status: deactivated. Format: ARFF. Publicly available. Visibility: public. Uploaded 14-07-2016 by Noureddin Sadawi.
This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit: pseudo-pCI50) of several compounds on the drug target with ChEMBL_ID CHEMBL2959 (TID: 10918). It has 1238 rows and 138 features, not counting the molecule ID (molecule_id) and the class feature (pXC50). The features are molecular descriptors generated from SMILES strings. Missing values were imputed with the median, and feature selection was also applied.
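The description says missing values were filled in with the median. The dataset's actual preprocessing code is not shown, so the following is only a minimal sketch of column-wise median imputation. It assumes missing entries are marked as `None`; the function name `impute_median` and the toy values are hypothetical and not taken from the dataset.

```python
from statistics import median


def impute_median(rows, missing=None):
    """Replace missing entries in each column with that column's median.

    Assumes every column has at least one observed value; otherwise
    statistics.median raises StatisticsError.
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        # Collect only the observed (non-missing) values of column j.
        observed = [row[j] for row in rows if row[j] is not missing]
        medians.append(median(observed))
    # Build a new matrix; the input rows are left unchanged.
    return [
        [medians[j] if row[j] is missing else row[j] for j in range(n_cols)]
        for row in rows
    ]


# Toy descriptor matrix (hypothetical values, for illustration only).
toy = [
    [1.0, 10.0],
    [None, 20.0],
    [3.0, None],
    [5.0, 40.0],
]
imputed = impute_median(toy)
```

After imputation every cell holds a number, which is consistent with the "0 missing" count reported for each feature on this page.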

140 features

Feature | Type | Unique values | Missing values
pXC50 (target) | numeric | 301 | 0
molecule_id (row identifier) | nominal | 1238 | 0
SpMin8_Bh.i. | numeric | 538 | 0
SpMin7_Bh.v. | numeric | 542 | 0
D.Dtr09 | numeric | 668 | 0
SpMax8_Bh.v. | numeric | 576 | 0
SpMin7_Bh.p. | numeric | 545 | 0
SpMax8_Bh.p. | numeric | 562 | 0
SpMin6_Bh.p. | numeric | 507 | 0
D.Dtr05 | numeric | 824 | 0
SpMin8_Bh.e. | numeric | 539 | 0
SpMin7_Bh.e. | numeric | 528 | 0
SpMax8_Bh.e. | numeric | 565 | 0
SpMin5_Bh.s. | numeric | 528 | 0
Eig14_AEA.bo. | numeric | 649 | 0
SpMax8_Bh.i. | numeric | 539 | 0
SpMin8_Bh.v. | numeric | 540 | 0
SpMin8_Bh.m. | numeric | 538 | 0
SpMin6_Bh.v. | numeric | 504 | 0
SpMax1_Bh.i. | numeric | 249 | 0
ATS2i | numeric | 711 | 0
Eig14_EA.ri. | numeric | 730 | 0
MPC10 | numeric | 302 | 0
Eig14_AEA.ri. | numeric | 728 | 0
ATS2e | numeric | 688 | 0
Psi_i_0 | numeric | 1047 | 0
Eig14_EA | numeric | 623 | 0
SM08_AEA.dm. | numeric | 623 | 0
ATS2v | numeric | 623 | 0
ATSC5m | numeric | 1208 | 0
Eig15_EA.ed. | numeric | 742 | 0
SM10_AEA.ri. | numeric | 742 | 0
Eig15_EA.ri. | numeric | 771 | 0
ATS1i | numeric | 645 | 0
Eig15_AEA.bo. | numeric | 651 | 0
Sp | numeric | 939 | 0
Eig15_EA.bo. | numeric | 744 | 0
MPC07 | numeric | 196 | 0
MPC09 | numeric | 280 | 0
Si | numeric | 983 | 0
ATS1p | numeric | 627 | 0
C.034 | numeric | 6 | 0
Chi0_EA.bo. | numeric | 942 | 0
Eig15_EA | numeric | 632 | 0
SM09_AEA.dm. | numeric | 632 | 0
SpMax1_Bh.e. | numeric | 248 | 0
ATS1e | numeric | 636 | 0
TIC1 | numeric | 1080 | 0
Eig08_AEA.ri. | numeric | 698 | 0
ATS1v | numeric | 621 | 0
SpMin6_Bh.m. | numeric | 513 | 0
SpMin4_Bh.v. | numeric | 475 | 0
SpMax6_Bh.i. | numeric | 543 | 0
Eig15_AEA.ri. | numeric | 769 | 0
SpMax5_Bh.e. | numeric | 580 | 0
ATS2p | numeric | 672 | 0
CID | numeric | 483 | 0
VvdwZAZ | numeric | 1015 | 0
ATS4e | numeric | 777 | 0
SpMaxA_EA.ri. | numeric | 122 | 0
SpMax4_Bh.v. | numeric | 597 | 0
Eig14_AEA.ed. | numeric | 619 | 0
Se | numeric | 969 | 0
Eig15_AEA.dm. | numeric | 700 | 0
SpMax4_Bh.p. | numeric | 592 | 0
ATS5i | numeric | 799 | 0
Sv | numeric | 981 | 0
Eig05_EA.ri. | numeric | 681 | 0
Chi0_EA.ri. | numeric | 1099 | 0
IVDM | numeric | 309 | 0
SpMax6_Bh.e. | numeric | 568 | 0
X1Kup | numeric | 1074 | 0
Eig14_EA.ed. | numeric | 701 | 0
SM09_AEA.ri. | numeric | 701 | 0
X3v | numeric | 1107 | 0
ON1V | numeric | 931 | 0
SpMaxA_EA | numeric | 125 | 0
ATS4i | numeric | 796 | 0
Eig09_AEA.ri. | numeric | 676 | 0
TIC2 | numeric | 1100 | 0
TIC3 | numeric | 903 | 0
X1 | numeric | 748 | 0
nBT | numeric | 70 | 0
SpMaxA_AEA.ed. | numeric | 221 | 0
ATS3e | numeric | 713 | 0
Chi0_EA.ed. | numeric | 938 | 0
Chi0_AEA.bo. | numeric | 696 | 0
Chi0_AEA.dm. | numeric | 696 | 0
Chi0_AEA.ed. | numeric | 696 | 0
Chi0_AEA.ri. | numeric | 696 | 0
Chi0_EA | numeric | 696 | 0
Eig15_AEA.ed. | numeric | 589 | 0
X1Per | numeric | 1082 | 0
SNar | numeric | 171 | 0
Eig08_EA.ri. | numeric | 680 | 0
ISIZ | numeric | 69 | 0
nAT | numeric | 69 | 0
SsssN | numeric | 421 | 0
TIC5 | numeric | 824 | 0
IDM | numeric | 794 | 0
SpMin4_Bh.i. | numeric | 516 | 0
SpMaxA_AEA.ri. | numeric | 156 | 0
MDDD | numeric | 970 | 0
LPRS | numeric | 1002 | 0
S1K | numeric | 846 | 0
SpMin5_Bh.m. | numeric | 520 | 0
ATS3i | numeric | 734 | 0
SpMin7_Bh.m. | numeric | 538 | 0
Xt | numeric | 115 | 0
TIC4 | numeric | 834 | 0
X1MulPer | numeric | 1081 | 0
IDET | numeric | 996 | 0
SpMin4_Bh.m. | numeric | 487 | 0
SpMax8_Bh.m. | numeric | 587 | 0
SpMin8_Bh.s. | numeric | 459 | 0
ATS5e | numeric | 817 | 0
S0K | numeric | 262 | 0
SAtot | numeric | 1081 | 0
IDMT | numeric | 1000 | 0
SpMin4_Bh.p. | numeric | 474 | 0
GMTI | numeric | 984 | 0
ON1 | numeric | 279 | 0
ATSC2m | numeric | 1093 | 0
SpMaxA_AEA.bo. | numeric | 168 | 0
RDCHI | numeric | 825 | 0
ON0V | numeric | 689 | 0
CSI | numeric | 682 | 0
Eig05_AEA.ri. | numeric | 664 | 0
X0 | numeric | 398 | 0
BID | numeric | 131 | 0
Eig08_AEA.dm. | numeric | 698 | 0
NsssN | numeric | 5 | 0
ATSC2p | numeric | 1039 | 0
Eig03_EA | numeric | 523 | 0
SM11_AEA.bo. | numeric | 523 | 0
Psi_e_0 | numeric | 1069 | 0
ATS3p | numeric | 704 | 0
Eig08_EA | numeric | 625 | 0
SM02_AEA.dm. | numeric | 625 | 0
IDDM | numeric | 343 | 0

107 properties

Property | Value
Number of instances (rows) of the dataset. | 1238
Number of attributes (columns) of the dataset. | 140
Number of distinct values of the target attribute (if it is nominal). | 0
Number of missing values in the dataset. | 0
Number of instances with at least one value missing. | 0
Number of numeric attributes. | 139
Number of nominal attributes. | 1
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2 | not reported
Entropy of the target attribute values. | not reported
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk | not reported
Number of instances belonging to the most frequent class. | not reported
Minimal entropy among attributes. | not reported
Second quartile (Median) of kurtosis among attributes of the numeric type. | -0.02
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3 | not reported
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump | not reported
Maximum entropy among attributes. | not reported
Minimum kurtosis among attributes of the numeric type. | -0.94
Second quartile (Median) of means among attributes of the numeric type. | 3.93
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3 | not reported
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump | not reported
Maximum kurtosis among attributes of the numeric type. | 5.36
Minimum of means among attributes of the numeric type. | -0.15
Second quartile (Median) of mutual information between the nominal attributes and the target attribute. | not reported
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3 | not reported
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump | not reported
Maximum of means among attributes of the numeric type. | 25940.42
Minimal mutual information between the nominal attributes and the target attribute. | not reported
Second quartile (Median) of skewness among attributes of the numeric type. | -0.5
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1 | not reported
Number of attributes divided by the number of instances. | 0.11
Maximum mutual information between the nominal attributes and the target attribute. | not reported
The minimal number of distinct values among attributes of the nominal type. | not reported
Percentage of binary attributes. | 0
Second quartile (Median) of standard deviation of attributes of the numeric type. | 0.63
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1 | not reported
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation. | not reported
The maximum number of distinct values among attributes of the nominal type. | not reported
Minimum skewness among attributes of the numeric type. | -1.91
Percentage of instances having missing values. | 0
Third quartile of entropy among attributes. | not reported
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1 | not reported
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001 | not reported
Maximum skewness among attributes of the numeric type. | 1.47
Minimum standard deviation of attributes of the numeric type. | 0.03
Percentage of missing values. | 0
Third quartile of kurtosis among attributes of the numeric type. | 1.22
Average class difference between consecutive instances. | 0.34
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2 | not reported
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001 | not reported
Maximum standard deviation of attributes of the numeric type. | 17422.74
Percentage of instances belonging to the least frequent class. | not reported
Percentage of numeric attributes. | 99.29
Third quartile of means among attributes of the numeric type. | 18.62
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | not reported
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | not reported
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2 | not reported
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001 | not reported
Average entropy of the attributes. | not reported
Number of instances belonging to the least frequent class. | not reported
Percentage of nominal attributes. | 0.71
Third quartile of mutual information between the nominal attributes and the target attribute. | not reported
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | not reported
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2 | not reported
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001 | not reported
Mean kurtosis among attributes of the numeric type. | 0.64
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes | not reported
First quartile of entropy among attributes. | not reported
Third quartile of skewness among attributes of the numeric type. | 0.03
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | not reported
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3 | not reported
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001 | not reported
Mean of means among attributes of the numeric type. | 328.94
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes | not reported
First quartile of kurtosis among attributes of the numeric type. | -0.51
Third quartile of standard deviation of attributes of the numeric type. | 4.43
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | not reported
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3 | not reported
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001 | not reported
Average mutual information between the nominal attributes and the target attribute. | not reported
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes | not reported
First quartile of means among attributes of the numeric type. | 1.41
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1 | not reported
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1 | not reported
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | not reported
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3 | not reported
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001 | not reported
An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation. | not reported
Number of binary attributes. | 0
First quartile of mutual information between the nominal attributes and the target attribute. | not reported
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1 | not reported
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | not reported
Standard deviation of the number of distinct values among attributes of the nominal type. | not reported
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001 | not reported
Average number of distinct values among the attributes of the nominal type. | not reported
First quartile of skewness among attributes of the numeric type. | -0.76
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2 | not reported
Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | not reported
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk | not reported
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001 | not reported
Mean skewness among attributes of the numeric type. | -0.35
First quartile of standard deviation of attributes of the numeric type. | 0.26
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2 | not reported
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W | not reported
Error rate achieved by the landmarker weka.classifiers.lazy.IBk | not reported
Percentage of instances belonging to the most frequent class. | not reported
Mean standard deviation of attributes of the numeric type. | 201.4
Second quartile (Median) of entropy among attributes. | not reported
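Several of the simple properties above follow directly from the counts listed on this page (1238 instances, 140 attributes, 139 numeric, 1 nominal). As a quick consistency check, the arithmetic can be sketched as follows. The variable names are illustrative and are not OpenML identifiers, and rounding to two decimals is an assumption based on how the values are displayed.

```python
# Counts taken from the property list on this page.
n_instances = 1238
n_attributes = 140
n_numeric = 139
n_nominal = 1

# "Number of attributes divided by the number of instances."
dimensionality = n_attributes / n_instances

# "Percentage of numeric attributes." and "Percentage of nominal attributes."
pct_numeric = 100 * n_numeric / n_attributes
pct_nominal = 100 * n_nominal / n_attributes

# Round to two decimals to compare with the displayed values.
summary = {
    "dimensionality": round(dimensionality, 2),
    "pct_numeric": round(pct_numeric, 2),
    "pct_nominal": round(pct_nominal, 2),
}
print(summary)
```

The rounded results agree with the listed values 0.11, 99.29, and 0.71.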

12 tasks

2 runs - estimation_procedure: Custom 10-fold Crossvalidation - target_feature: pXC50
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
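The first task above uses a custom 10-fold cross-validation with pXC50 as the target. The page does not give the custom fold assignment, so the sketch below only illustrates a generic shuffled 10-fold split over the 1238 row indices. The function name `k_fold_splits` and the seed are hypothetical choices, not part of the task definition.

```python
import random


def k_fold_splits(n_rows, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for shuffled k-fold CV."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatability
    # Striding gives k disjoint folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_splits(1238, k=10))
fold_sizes = sorted(len(test) for _, test in splits)
print(fold_sizes)
```

With 1238 rows, eight folds hold 124 rows and two hold 123. In each round a regression model would be fit on the training indices and scored on the held-out fold.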