QSAR-DATASET-FOR-DRUG-TARGET-CHEMBL2534


Status: deactivated. Format: ARFF. Visibility: public. Uploaded 14-07-2016 by Noureddin Sadawi.


This dataset contains QSAR data (from ChEMBL version 17) giving activity values (in pseudo-pCI50 units) for compounds tested against the drug target CHEMBL2534 (TID: 10612). It has 1042 rows and 139 descriptor features, not counting the row identifier (molecule_id) and the target (pXC50). The features are molecular descriptors generated from SMILES strings. Missing values were imputed with the median, and feature selection was also applied.
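
The description states that missing values were imputed with the median but does not give the procedure. The following is a minimal sketch of per-column median imputation, assuming missing entries are represented as None. The function name impute_median and the toy values are hypothetical and are not taken from the dataset.

```python
from statistics import median


def impute_median(rows):
    """Return a copy of rows where each None is replaced by its column median.

    Assumption: every column has at least one observed (non-None) value;
    otherwise statistics.median raises StatisticsError.
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        medians.append(median(observed))
    return [
        [medians[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]


# Toy descriptor matrix (made-up values, three compounds by three descriptors).
toy = [
    [1.0, None, 3.0],
    [2.0, 5.0, None],
    [4.0, 7.0, 9.0],
]
imputed = impute_median(toy)
print(imputed)
```

After imputation every column is complete, which is consistent with the "0 missing" entries reported for each feature on this page.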

141 features

pXC50 (target) - numeric, 220 unique values, 0 missing
molecule_id (row identifier) - nominal, 1042 unique values, 0 missing
P_VSA_e_3 - numeric, 330 unique values, 0 missing
CATS2D_07_AP - numeric, 4 unique values, 0 missing
piPC09 - numeric, 759 unique values, 0 missing
TPSA.NO. - numeric, 588 unique values, 0 missing
SpMin1_Bh.p. - numeric, 198 unique values, 0 missing
piPC10 - numeric, 762 unique values, 0 missing
SpMin1_Bh.i. - numeric, 224 unique values, 0 missing
nPyridines - numeric, 5 unique values, 0 missing
piPC08 - numeric, 737 unique values, 0 missing
piPC07 - numeric, 717 unique values, 0 missing
CATS2D_07_DA - numeric, 8 unique values, 0 missing
nR10 - numeric, 4 unique values, 0 missing
P_VSA_i_4 - numeric, 417 unique values, 0 missing
Hy - numeric, 491 unique values, 0 missing
SpMin1_Bh.m. - numeric, 217 unique values, 0 missing
H.050 - numeric, 14 unique values, 0 missing
nHDon - numeric, 14 unique values, 0 missing
piPC06 - numeric, 696 unique values, 0 missing
P_VSA_p_2 - numeric, 588 unique values, 0 missing
piPC05 - numeric, 629 unique values, 0 missing
SsNH2 - numeric, 399 unique values, 0 missing
N.075 - numeric, 6 unique values, 0 missing
NaaN - numeric, 6 unique values, 0 missing
TPSA.Tot. - numeric, 612 unique values, 0 missing
CATS2D_04_PL - numeric, 7 unique values, 0 missing
SssO - numeric, 350 unique values, 0 missing
C.028 - numeric, 5 unique values, 0 missing
SAdon - numeric, 108 unique values, 0 missing
CATS2D_03_PL - numeric, 5 unique values, 0 missing
GATS1s - numeric, 461 unique values, 0 missing
SaaaC - numeric, 563 unique values, 0 missing
GATS1e - numeric, 439 unique values, 0 missing
N.069 - numeric, 3 unique values, 0 missing
CATS2D_00_DD - numeric, 6 unique values, 0 missing
CATS2D_00_DP - numeric, 6 unique values, 0 missing
CATS2D_00_PP - numeric, 6 unique values, 0 missing
NsNH2 - numeric, 6 unique values, 0 missing
NssO - numeric, 7 unique values, 0 missing
CATS2D_02_AP - numeric, 7 unique values, 0 missing
CATS2D_08_AP - numeric, 5 unique values, 0 missing
D.Dtr10 - numeric, 194 unique values, 0 missing
H.049 - numeric, 6 unique values, 0 missing
NaaaC - numeric, 7 unique values, 0 missing
CATS2D_02_PL - numeric, 5 unique values, 0 missing
CATS2D_09_PP - numeric, 2 unique values, 0 missing
CATS2D_07_AA - numeric, 9 unique values, 0 missing
CATS2D_08_DA - numeric, 10 unique values, 0 missing
SaaN - numeric, 778 unique values, 0 missing
Eig04_EA.ed. - numeric, 779 unique values, 0 missing
SM13_AEA.dm. - numeric, 779 unique values, 0 missing
SM15_EA.bo. - numeric, 729 unique values, 0 missing
CATS2D_02_AL - numeric, 17 unique values, 0 missing
SM13_EA.bo. - numeric, 722 unique values, 0 missing
PCD - numeric, 757 unique values, 0 missing
SpMin6_Bh.m. - numeric, 490 unique values, 0 missing
SpMin1_Bh.s. - numeric, 326 unique values, 0 missing
SpMax1_Bh.e. - numeric, 249 unique values, 0 missing
nArOR - numeric, 5 unique values, 0 missing
SpMin1_Bh.v. - numeric, 200 unique values, 0 missing
Eig13_AEA.dm. - numeric, 664 unique values, 0 missing
SM14_EA.bo. - numeric, 723 unique values, 0 missing
CATS2D_09_DP - numeric, 4 unique values, 0 missing
MATS1s - numeric, 300 unique values, 0 missing
Eig01_EA.bo. - numeric, 439 unique values, 0 missing
SM11_AEA.ri. - numeric, 439 unique values, 0 missing
SpDiam_EA.bo. - numeric, 442 unique values, 0 missing
SpMax_EA.bo. - numeric, 439 unique values, 0 missing
Chi0_EA.dm. - numeric, 845 unique values, 0 missing
X1sol - numeric, 748 unique values, 0 missing
Eig12_EA.ed. - numeric, 689 unique values, 0 missing
SM07_AEA.ri. - numeric, 689 unique values, 0 missing
SpMaxA_AEA.ed. - numeric, 233 unique values, 0 missing
C.032 - numeric, 3 unique values, 0 missing
SpMax8_Bh.i. - numeric, 537 unique values, 0 missing
CATS2D_06_PL - numeric, 9 unique values, 0 missing
P_VSA_v_2 - numeric, 665 unique values, 0 missing
SpMin6_Bh.p. - numeric, 503 unique values, 0 missing
SM08_EA.bo. - numeric, 678 unique values, 0 missing
SpMax8_Bh.v. - numeric, 530 unique values, 0 missing
Rperim - numeric, 33 unique values, 0 missing
Eig11_EA - numeric, 587 unique values, 0 missing
SM05_AEA.dm. - numeric, 587 unique values, 0 missing
SpMin1_Bh.e. - numeric, 217 unique values, 0 missing
SpMaxA_EA.ed. - numeric, 328 unique values, 0 missing
SM09_EA.bo. - numeric, 718 unique values, 0 missing
CATS2D_06_DL - numeric, 17 unique values, 0 missing
SM12_EA.bo. - numeric, 715 unique values, 0 missing
SpMin6_Bh.v. - numeric, 501 unique values, 0 missing
SM10_EA.bo. - numeric, 710 unique values, 0 missing
SpMax8_Bh.e. - numeric, 534 unique values, 0 missing
CATS2D_04_DL - numeric, 16 unique values, 0 missing
MPC10 - numeric, 309 unique values, 0 missing
X0sol - numeric, 535 unique values, 0 missing
SM11_EA.bo. - numeric, 706 unique values, 0 missing
SpMin8_Bh.e. - numeric, 509 unique values, 0 missing
ATS5e - numeric, 714 unique values, 0 missing
Eig12_EA - numeric, 594 unique values, 0 missing
SM06_AEA.dm. - numeric, 594 unique values, 0 missing
ATS7i - numeric, 778 unique values, 0 missing
SpMax1_Bh.v. - numeric, 271 unique values, 0 missing
SpMin7_Bh.m. - numeric, 509 unique values, 0 missing
ATS5i - numeric, 728 unique values, 0 missing
Chi0_AEA.bo. - numeric, 663 unique values, 0 missing
Chi0_AEA.dm. - numeric, 663 unique values, 0 missing
Chi0_AEA.ed. - numeric, 663 unique values, 0 missing
Chi0_AEA.ri. - numeric, 663 unique values, 0 missing
Chi0_EA - numeric, 663 unique values, 0 missing
Eta_betaS_A - numeric, 129 unique values, 0 missing
IC5 - numeric, 564 unique values, 0 missing
Eig09_EA - numeric, 562 unique values, 0 missing
SM03_AEA.dm. - numeric, 562 unique values, 0 missing
SNar - numeric, 187 unique values, 0 missing
ATS7e - numeric, 767 unique values, 0 missing
X1 - numeric, 681 unique values, 0 missing
nR06 - numeric, 7 unique values, 0 missing
SpMin7_Bh.v. - numeric, 516 unique values, 0 missing
Eta_betaS - numeric, 98 unique values, 0 missing
SpMax8_Bh.p. - numeric, 526 unique values, 0 missing
Sp - numeric, 833 unique values, 0 missing
IDET - numeric, 894 unique values, 0 missing
S0K - numeric, 248 unique values, 0 missing
X1Mad - numeric, 948 unique values, 0 missing
Sv - numeric, 886 unique values, 0 missing
Eig15_EA.bo. - numeric, 680 unique values, 0 missing
LPRS - numeric, 897 unique values, 0 missing
X2sol - numeric, 836 unique values, 0 missing
P_VSA_MR_7 - numeric, 161 unique values, 0 missing
Xu - numeric, 878 unique values, 0 missing
SpMaxA_EA - numeric, 134 unique values, 0 missing
ATS1e - numeric, 608 unique values, 0 missing
GMTI - numeric, 878 unique values, 0 missing
X5 - numeric, 853 unique values, 0 missing
Eig15_EA.ri. - numeric, 727 unique values, 0 missing
ATS1i - numeric, 617 unique values, 0 missing
CID - numeric, 481 unique values, 0 missing
Chi1_EA.ri. - numeric, 961 unique values, 0 missing
ON1 - numeric, 258 unique values, 0 missing
Psi_i_0 - numeric, 925 unique values, 0 missing
IVDM - numeric, 315 unique values, 0 missing

107 properties

Properties with a reported value:

Number of instances (rows) of the dataset: 1042
Number of attributes (columns) of the dataset: 141
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 140
Number of nominal attributes: 1
Minimum kurtosis among attributes of the numeric type: -0.83
Second quartile (Median) of means among attributes of the numeric type: 3.78
Maximum kurtosis among attributes of the numeric type: 951.19
Minimum of means among attributes of the numeric type: -0.4
Maximum of means among attributes of the numeric type: 13799.85
Second quartile (Median) of skewness among attributes of the numeric type: 0.91
Number of attributes divided by the number of instances: 0.14
Percentage of binary attributes: 0
Second quartile (Median) of standard deviation of attributes of the numeric type: 0.83
Minimum skewness among attributes of the numeric type: -2.03
Percentage of instances having missing values: 0
Maximum skewness among attributes of the numeric type: 30.34
Minimum standard deviation of attributes of the numeric type: 0.03
Percentage of missing values: 0
Third quartile of kurtosis among attributes of the numeric type: 51.92
Average class difference between consecutive instances: 0.47
Maximum standard deviation of attributes of the numeric type: 82867.5
Percentage of numeric attributes: 99.29
Third quartile of means among attributes of the numeric type: 13.74
Percentage of nominal attributes: 0.71
Mean kurtosis among attributes of the numeric type: 59.21
Third quartile of skewness among attributes of the numeric type: 4.31
Mean of means among attributes of the numeric type: 122.52
First quartile of kurtosis among attributes of the numeric type: 0.87
Third quartile of standard deviation of attributes of the numeric type: 3.79
First quartile of means among attributes of the numeric type: 0.89
Number of binary attributes: 0
First quartile of skewness among attributes of the numeric type: -0.51
Mean skewness among attributes of the numeric type: 3.02
First quartile of standard deviation of attributes of the numeric type: 0.37
Mean standard deviation of attributes of the numeric type: 630.17
Second quartile (Median) of kurtosis among attributes of the numeric type: 3.07

Properties listed without a value:

Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump
Maximum entropy among attributes.
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump
Second quartile (Median) of mutual information between the nominal attributes and the target attribute.
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump
Minimal mutual information between the nominal attributes and the target attribute.
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
Maximum mutual information between the nominal attributes and the target attribute.
The minimal number of distinct values among attributes of the nominal type.
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
The maximum number of distinct values among attributes of the nominal type.
Third quartile of entropy among attributes.
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001
Percentage of instances belonging to the least frequent class.
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001
Average entropy of the attributes.
Number of instances belonging to the least frequent class.
Third quartile of mutual information between the nominal attributes and the target attribute.
Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes
First quartile of entropy among attributes.
Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes
Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001
Average mutual information between the nominal attributes and the target attribute.
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1
Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001
An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
First quartile of mutual information between the nominal attributes and the target attribute.
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Standard deviation of the number of distinct values among attributes of the nominal type.
Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001
Average number of distinct values among the attributes of the nominal type.
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1
Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2
Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk
Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001
Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
Error rate achieved by the landmarker weka.classifiers.lazy.IBk
Percentage of instances belonging to the most frequent class.
Second quartile (Median) of entropy among attributes.
Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2
Entropy of the target attribute values.
Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk
Number of instances belonging to the most frequent class.
Minimal entropy among attributes.
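
Several of the count-based properties above follow directly from the instance and attribute counts, and the per-attribute statistics (skewness, kurtosis) are moment-based. The sketch below recomputes the ratios from the counts on this page and shows one common definition of skewness and excess kurtosis. The page does not say which estimators were used, so the population-moment formulas here are an assumption, and the helper names are hypothetical.

```python
from statistics import fmean

# Counts reported on this page.
n_instances = 1042
n_attributes = 141
n_numeric = 140
n_nominal = 1

dimensionality = round(n_attributes / n_instances, 2)   # attributes per instance
pct_numeric = round(100 * n_numeric / n_attributes, 2)  # percentage of numeric attributes
pct_nominal = round(100 * n_nominal / n_attributes, 2)  # percentage of nominal attributes


def central_moment(values, order):
    """Population central moment of the given order."""
    mu = fmean(values)
    return fmean([(v - mu) ** order for v in values])


def skewness(values):
    """Population skewness: m3 / m2^1.5 (assumed estimator)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2^2 - 3 (assumed estimator)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


print(dimensionality, pct_numeric, pct_nominal)

# Toy column (made-up values): symmetric, so its skewness is zero.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
print(skewness(sample), excess_kurtosis(sample))
```

The three ratios reproduce the reported 0.14, 99.29 and 0.71. A reported minimum kurtosis of -0.83 would be impossible for raw kurtosis, which suggests the page reports excess kurtosis, but that is an inference rather than something the page states.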

12 tasks

2 runs - estimation_procedure: Custom 10-fold Crossvalidation - target_feature: pXC50
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
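
The first task above evaluates models with 10-fold cross-validation on pXC50. The page labels the procedure "Custom" and does not publish the fold assignment, so the sketch below only illustrates a generic shuffled 10-fold split over the 1042 rows. The function name k_fold_indices and the seed are hypothetical.

```python
import random


def k_fold_indices(n_rows, k=10, seed=0):
    """Yield (train, test) index lists for a shuffled k-fold split.

    Assumption: plain random folds; the task's own custom splits are not
    reproduced here.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(1042, k=10))
fold_sizes = [len(test) for _, test in splits]
print(fold_sizes)
```

Each row appears in exactly one test fold, so every compound contributes to one held-out prediction of pXC50.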