QSAR-DATASET-FOR-DRUG-TARGET-CHEMBL2842

Status: deactivated. Format: ARFF. Publicly available. Visibility: public. Uploaded 15-07-2016 by Noureddin Sadawi.


This dataset contains QSAR data (from ChEMBL version 17) showing activity values (unit is pseudo-pCI50) of several compounds on drug target ChEMBL_ID: CHEMBL2842 (TID: 11400). It has 1487 rows and 274 features, not counting the molecule identifier (molecule_id) and the target (pXC50). The features are molecular descriptors generated from SMILES strings. Missing values were imputed with the median, and feature selection was also applied.
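The description only states that missing values were imputed by choosing the median. A minimal sketch of what column-wise median imputation could look like is given below; the helper name `impute_median`, the use of `None` for a missing entry, and the toy matrix are all illustrative assumptions, not the procedure actually used to build this dataset.

```python
from statistics import median


def impute_median(rows):
    """Replace None entries in each column with that column's median.

    Hypothetical helper: the dataset description does not say how the
    median imputation was implemented.
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        # Median is computed over the observed (non-missing) values only.
        observed = [row[j] for row in rows if row[j] is not None]
        medians.append(median(observed))
    return [
        [medians[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Toy descriptor matrix (made-up numbers, two descriptors, four molecules).
toy = [
    [1.0, 10.0],
    [None, 20.0],
    [3.0, None],
    [5.0, 40.0],
]
filled = impute_median(toy)
print(filled)
```

After imputation every cell holds a number, which matches the "0 missing" entries reported for every feature of this dataset.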

276 features

Feature | Type | Unique values | Missing
pXC50 (target) | numeric | 612 | 0
molecule_id (row identifier) | nominal | 1487 | 0
Eig03_AEA.dm. | numeric | 570 | 0
CATS2D_02_DD | numeric | 4 | 0
MWC06 | numeric | 640 | 0
nCONN | numeric | 3 | 0
DECC | numeric | 728 | 0
D.Dtr07 | numeric | 139 | 0
SpMaxA_EA.ed. | numeric | 293 | 0
P_VSA_LogP_2 | numeric | 424 | 0
CATS2D_07_DL | numeric | 13 | 0
GATS8e | numeric | 813 | 0
IC5 | numeric | 682 | 0
nR07 | numeric | 3 | 0
ATS6v | numeric | 881 | 0
SpDiam_EA.bo. | numeric | 316 | 0
SM15_EA.bo. | numeric | 701 | 0
Eig01_EA.bo. | numeric | 312 | 0
ATS5s | numeric | 880 | 0
SM11_AEA.ri. | numeric | 312 | 0
Eig06_AEA.bo. | numeric | 612 | 0
SpMax_EA.bo. | numeric | 312 | 0
Eig09_EA | numeric | 600 | 0
C.041 | numeric | 3 | 0
SM03_AEA.dm. | numeric | 600 | 0
X2v | numeric | 1315 | 0
ATSC8p | numeric | 1370 | 0
NssNH | numeric | 5 | 0
Psi_i_1 | numeric | 1292 | 0
cRo5 | numeric | 2 | 0
SM03_AEA.ed. | numeric | 583 | 0
DLS_01 | numeric | 4 | 0
Eig07_EA.ed. | numeric | 784 | 0
N.072 | numeric | 5 | 0
SM02_AEA.ri. | numeric | 784 | 0
Eig01_EA.ed. | numeric | 460 | 0
X2 | numeric | 974 | 0
SM10_AEA.dm. | numeric | 460 | 0
Eig10_EA.ed. | numeric | 773 | 0
SpMax_EA.ed. | numeric | 460 | 0
Infective.80 | numeric | 2 | 0
Eig01_AEA.ed. | numeric | 338 | 0
SpMax_AEA.ed. | numeric | 338 | 0
Eig01_AEA.bo. | numeric | 320 | 0
SpMax_AEA.bo. | numeric | 320 | 0
DLS_03 | numeric | 5 | 0
SM10_EA.dm. | numeric | 249 | 0
SM12_EA.dm. | numeric | 217 | 0
CATS2D_08_DA | numeric | 9 | 0
SM15_EA.dm. | numeric | 86 | 0
SM13_EA.dm. | numeric | 90 | 0
SM14_EA.dm. | numeric | 203 | 0
SM09_EA.dm. | numeric | 113 | 0
SM11_EA.dm. | numeric | 96 | 0
SM06_EA.dm. | numeric | 355 | 0
Neoplastic.80 | numeric | 2 | 0
SM07_EA.dm. | numeric | 132 | 0
CIC4 | numeric | 651 | 0
SM13_EA.ed. | numeric | 769 | 0
CIC5 | numeric | 629 | 0
LLS_02 | numeric | 5 | 0
SM08_EA.dm. | numeric | 309 | 0
CMC.80 | numeric | 2 | 0
SNar | numeric | 222 | 0
SM14_EA.ed. | numeric | 760 | 0
Inflammat.80 | numeric | 2 | 0
SpMin5_Bh.i. | numeric | 558 | 0
Eig14_EA.ri. | numeric | 743 | 0
IDDE | numeric | 459 | 0
SM15_EA.ed. | numeric | 747 | 0
GATS5s | numeric | 783 | 0
Eig05_AEA.ri. | numeric | 688 | 0
Chi0_EA.ed. | numeric | 985 | 0
Eig12_EA.ri. | numeric | 725 | 0
NssCH2 | numeric | 19 | 0
SpMin5_Bh.e. | numeric | 549 | 0
SpDiam_EA.dm. | numeric | 101 | 0
nN | numeric | 13 | 0
Eig14_EA.bo. | numeric | 649 | 0
Chi1_AEA.bo. | numeric | 909 | 0
Chi1_AEA.dm. | numeric | 909 | 0
Chi1_AEA.ed. | numeric | 909 | 0
Chi1_AEA.ri. | numeric | 909 | 0
Chi1_EA | numeric | 909 | 0
CIC3 | numeric | 685 | 0
CATS2D_04_DA | numeric | 8 | 0
C.031 | numeric | 3 | 0
SpMax6_Bh.i. | numeric | 583 | 0
N.071 | numeric | 4 | 0
Eig14_EA | numeric | 616 | 0
SM08_AEA.dm. | numeric | 616 | 0
Eig01_AEA.dm. | numeric | 425 | 0
SpDiam_AEA.dm. | numeric | 431 | 0
SpMax_AEA.dm. | numeric | 425 | 0
SaaN | numeric | 1222 | 0
SM04_EA.dm. | numeric | 385 | 0
nArNR2 | numeric | 4 | 0
ECC | numeric | 534 | 0
DLS_02 | numeric | 5 | 0
NdO | numeric | 7 | 0
O.058 | numeric | 7 | 0
HVcpx | numeric | 710 | 0
Depressant.80 | numeric | 2 | 0
X1Kup | numeric | 1254 | 0
Eig14_EA.ed. | numeric | 670 | 0
SM09_AEA.ri. | numeric | 670 | 0
P_VSA_e_3 | numeric | 311 | 0
Eig02_EA.bo. | numeric | 424 | 0
SM12_AEA.ri. | numeric | 424 | 0
CMC.50 | numeric | 2 | 0
Sp | numeric | 1004 | 0
SpMax4_Bh.p. | numeric | 576 | 0
IDE | numeric | 720 | 0
SssNH | numeric | 731 | 0
CATS2D_04_AL | numeric | 30 | 0
Eig07_AEA.bo. | numeric | 614 | 0
X1v | numeric | 1259 | 0
Neoplastic.50 | numeric | 2 | 0
C.024 | numeric | 16 | 0
P_VSA_MR_2 | numeric | 253 | 0
Hypertens.80 | numeric | 2 | 0
Eig01_EA | numeric | 298 | 0
SM09_AEA.bo. | numeric | 298 | 0
SpDiam_EA | numeric | 298 | 0
SpMax_EA | numeric | 298 | 0
MPC07 | numeric | 254 | 0
SM14_EA.bo. | numeric | 698 | 0
ATS2p | numeric | 750 | 0
SM13_EA.bo. | numeric | 717 | 0
JGI3 | numeric | 60 | 0
P_VSA_MR_5 | numeric | 1054 | 0
MWC08 | numeric | 670 | 0
BIC0 | numeric | 140 | 0
SpMax5_Bh.v. | numeric | 642 | 0
MPC06 | numeric | 208 | 0
SpMin4_Bh.s. | numeric | 500 | 0
Eta_FL | numeric | 1193 | 0
SIC4 | numeric | 204 | 0
GGI5 | numeric | 511 | 0
X5 | numeric | 977 | 0
SpMin8_Bh.i. | numeric | 586 | 0
MWC07 | numeric | 650 | 0
piPC02 | numeric | 270 | 0
SM02_EA.bo. | numeric | 270 | 0
Eig06_AEA.ed. | numeric | 621 | 0
MPC05 | numeric | 173 | 0
MPC09 | numeric | 360 | 0
CATS2D_08_AA | numeric | 10 | 0
SpMax5_Bh.p. | numeric | 620 | 0
SIC5 | numeric | 198 | 0
X4 | numeric | 983 | 0
SpAD_EA.ed. | numeric | 1028 | 0
VAR | numeric | 284 | 0
SRW06 | numeric | 472 | 0
TPSA.Tot. | numeric | 634 | 0
P_VSA_e_2 | numeric | 1227 | 0
SpDiam_EA.ed. | numeric | 561 | 0
Eig15_EA.bo. | numeric | 682 | 0
SpMin8_Bh.s. | numeric | 528 | 0
Eig15_EA.ri. | numeric | 782 | 0
SM04_AEA.ed. | numeric | 625 | 0
SpMax6_Bh.p. | numeric | 590 | 0
P_VSA_s_5 | numeric | 46 | 0
Eig05_AEA.bo. | numeric | 564 | 0
SM05_EA.dm. | numeric | 135 | 0
Eig15_EA.ed. | numeric | 693 | 0
SM10_AEA.ri. | numeric | 693 | 0
Eig15_AEA.ri. | numeric | 797 | 0
Eig11_EA.ed. | numeric | 739 | 0
SM06_AEA.ri. | numeric | 739 | 0
Eig10_EA | numeric | 584 | 0
SM04_AEA.dm. | numeric | 584 | 0
Eig15_EA | numeric | 611 | 0
SM09_AEA.dm. | numeric | 611 | 0
CATS2D_05_PL | numeric | 5 | 0
ATS8m | numeric | 957 | 0
SpMax7_Bh.p. | numeric | 575 | 0
P_VSA_i_4 | numeric | 386 | 0
ATS6m | numeric | 861 | 0
Eig12_AEA.bo. | numeric | 598 | 0
SM02_AEA.ed. | numeric | 224 | 0
MWC03 | numeric | 215 | 0
ZM2 | numeric | 215 | 0
SM10_EA.ed. | numeric | 778 | 0
nCbH | numeric | 14 | 0
SpDiam_AEA.bo. | numeric | 381 | 0
SRW10 | numeric | 628 | 0
P_VSA_e_5 | numeric | 106 | 0
SpMin6_Bh.p. | numeric | 526 | 0
Eig10_AEA.dm. | numeric | 677 | 0
Eig14_AEA.ri. | numeric | 778 | 0
SpAD_EA.dm. | numeric | 404 | 0
Chi0_AEA.bo. | numeric | 764 | 0
Chi0_AEA.dm. | numeric | 764 | 0
Chi0_AEA.ed. | numeric | 764 | 0
Chi0_AEA.ri. | numeric | 764 | 0
Chi0_EA | numeric | 764 | 0
ATS7s | numeric | 872 | 0
SpAD_EA | numeric | 1025 | 0
SRW04 | numeric | 153 | 0
Qindex | numeric | 37 | 0
CATS2D_04_AA | numeric | 10 | 0
Xindex | numeric | 267 | 0
Eig13_AEA.bo. | numeric | 612 | 0
RDCHI | numeric | 835 | 0
Eig04_AEA.ed. | numeric | 558 | 0
Chi1_EA.ed. | numeric | 938 | 0
SpAD_AEA.ri. | numeric | 1409 | 0
Eig04_AEA.ri. | numeric | 623 | 0
SpMin3_Bh.s. | numeric | 471 | 0
Vindex | numeric | 200 | 0
Eig09_AEA.ed. | numeric | 646 | 0
SM12_EA.ed. | numeric | 780 | 0
SpAD_AEA.ed. | numeric | 1028 | 0
Eig06_EA.bo. | numeric | 618 | 0
CATS2D_07_LL | numeric | 33 | 0
Chi0_EA.ri. | numeric | 1339 | 0
MAXDN | numeric | 915 | 0
NdssC | numeric | 10 | 0
SpAD_AEA.bo. | numeric | 1080 | 0
X3sol | numeric | 986 | 0
Eig08_AEA.bo. | numeric | 617 | 0
SM02_EA.ed. | numeric | 449 | 0
Eig01_AEA.ri. | numeric | 359 | 0
SpMax_AEA.ri. | numeric | 359 | 0
MWC02 | numeric | 107 | 0
ZM1 | numeric | 107 | 0
Eig04_AEA.dm. | numeric | 586 | 0
RDSQ | numeric | 1038 | 0
X3 | numeric | 988 | 0
MWC09 | numeric | 664 | 0
SM12_EA.bo. | numeric | 716 | 0
X5sol | numeric | 977 | 0
Psi_i_0 | numeric | 1188 | 0
Eig12_EA.bo. | numeric | 633 | 0
MPC10 | numeric | 421 | 0
Chi1_EA.ri. | numeric | 1369 | 0
Xt | numeric | 120 | 0
Eig05_EA.dm. | numeric | 55 | 0
BBI | numeric | 72 | 0
MPC02 | numeric | 72 | 0
SM02_EA | numeric | 72 | 0
Eig01_EA.dm. | numeric | 86 | 0
SpMax_EA.dm. | numeric | 86 | 0
Eig04_EA.ed. | numeric | 742 | 0
SM13_AEA.dm. | numeric | 742 | 0
Hypnotic.80 | numeric | 2 | 0
ATS8s | numeric | 931 | 0
SpMin4_Bh.i. | numeric | 502 | 0
Eig04_EA.ri. | numeric | 633 | 0
Uindex | numeric | 1023 | 0
SpAD_AEA.dm. | numeric | 1237 | 0
P_VSA_LogP_4 | numeric | 467 | 0
TWC | numeric | 663 | 0
SpMax8_Bh.i. | numeric | 553 | 0
SpAD_EA.ri. | numeric | 1416 | 0
SM03_EA.dm. | numeric | 76 | 0
ATSC8m | numeric | 1406 | 0
Eig13_AEA.ed. | numeric | 593 | 0
IVDM | numeric | 380 | 0
MPC01 | numeric | 51 | 0
MWC01 | numeric | 51 | 0
nBO | numeric | 51 | 0
SRW02 | numeric | 51 | 0
X1 | numeric | 807 | 0
CID | numeric | 508 | 0
HDcpx | numeric | 232 | 0
DLS_cons | numeric | 44 | 0
ATS8i | numeric | 964 | 0
BIC4 | numeric | 170 | 0
piID | numeric | 913 | 0
MWC10 | numeric | 661 | 0
SpMin7_Bh.m. | numeric | 504 | 0
Eig14_AEA.bo. | numeric | 647 | 0
Eig06_AEA.dm. | numeric | 679 | 0
Eig15_AEA.bo. | numeric | 653 | 0

62 properties

Number of instances (rows) of the dataset: 1487
Number of attributes (columns) of the dataset: 276
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 275
Number of nominal attributes: 1
Third quartile of entropy among attributes: (no value listed)
Maximum kurtosis among attributes of the numeric type: 29.29
Minimum of means among attributes of the numeric type: 0.04
Percentage of instances having missing values: 0
Third quartile of kurtosis among attributes of the numeric type: 2.22
Maximum of means among attributes of the numeric type: 404.89
Minimal mutual information between the nominal attributes and the target attribute: (no value listed)
Percentage of missing values: 0
Third quartile of means among attributes of the numeric type: 10.95
Maximum mutual information between the nominal attributes and the target attribute: (no value listed)
The minimal number of distinct values among attributes of the nominal type: (no value listed)
Percentage of numeric attributes: 99.64
Third quartile of mutual information between the nominal attributes and the target attribute: (no value listed)
The maximum number of distinct values among attributes of the nominal type: (no value listed)
Minimum skewness among attributes of the numeric type: -1.61
Percentage of nominal attributes: 0.36
Third quartile of skewness among attributes of the numeric type: 0.91
Maximum skewness among attributes of the numeric type: 3.93
Minimum standard deviation of attributes of the numeric type: 0.01
First quartile of entropy among attributes: (no value listed)
Third quartile of standard deviation of attributes of the numeric type: 2.58
Maximum standard deviation of attributes of the numeric type: 209.46
Percentage of instances belonging to the least frequent class: (no value listed)
First quartile of kurtosis among attributes of the numeric type: -0.15
Standard deviation of the number of distinct values among attributes of the nominal type: (no value listed)
Average entropy of the attributes: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
First quartile of means among attributes of the numeric type: 1.34
Mean kurtosis among attributes of the numeric type: 1.47
Number of binary attributes: 0
First quartile of mutual information between the nominal attributes and the target attribute: (no value listed)
Mean of means among attributes of the numeric type: 15.17
First quartile of skewness among attributes of the numeric type: -0.26
Average class difference between consecutive instances: 0.03
Average mutual information between the nominal attributes and the target attribute: (no value listed)
First quartile of standard deviation of attributes of the numeric type: 0.28
Entropy of the target attribute values: (no value listed)
An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation: (no value listed)
Second quartile (Median) of entropy among attributes: (no value listed)
Number of attributes divided by the number of instances: 0.19
Average number of distinct values among the attributes of the nominal type: (no value listed)
Second quartile (Median) of kurtosis among attributes of the numeric type: 0.72
Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation: (no value listed)
Mean skewness among attributes of the numeric type: 0.38
Second quartile (Median) of means among attributes of the numeric type: 4.23
Percentage of instances belonging to the most frequent class: (no value listed)
Mean standard deviation of attributes of the numeric type: 5.06
Second quartile (Median) of mutual information between the nominal attributes and the target attribute: (no value listed)
Number of instances belonging to the most frequent class: (no value listed)
Minimal entropy among attributes: (no value listed)
Second quartile (Median) of skewness among attributes of the numeric type: 0.28
Second quartile (Median) of standard deviation of attributes of the numeric type: 0.63
Maximum entropy among attributes: (no value listed)
Minimum kurtosis among attributes of the numeric type: -1.97
Percentage of binary attributes: 0
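
Several of the count-based properties above follow directly from the instance and attribute counts. The short sketch below recomputes three of them as a consistency check; the variable names are illustrative, and rounding to two decimals is an assumption based on how the values are displayed.

```python
# Counts taken from the property list above.
n_instances = 1487
n_attributes = 276
n_numeric = 275
n_nominal = 1

# "Number of attributes divided by the number of instances."
dimensionality = n_attributes / n_instances

# "Percentage of numeric attributes" and "Percentage of nominal attributes."
pct_numeric = 100 * n_numeric / n_attributes
pct_nominal = 100 * n_nominal / n_attributes

print(round(dimensionality, 2), round(pct_numeric, 2), round(pct_nominal, 2))
```

Rounded to two decimals, the three results agree with the listed values 0.19, 99.64 and 0.36.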

12 tasks

2 runs - estimation_procedure: Custom 10-fold Crossvalidation - target_feature: pXC50
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
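
To illustrate what a 10-fold cross-validation over the 1487 rows involves, here is a minimal standard-library sketch of a generic shuffled k-fold split. It does not reproduce the predefined "Custom 10-fold Crossvalidation" splits of the task listed above; the function name `k_fold_indices` and the seed are hypothetical.

```python
import random


def k_fold_indices(n_rows, k=10, seed=0):
    """Return k (train, test) pairs of row indices for a shuffled k-fold split.

    Generic sketch only: the task's own splits are defined elsewhere and
    are not reproduced here.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one row.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


splits = k_fold_indices(1487, k=10)
for train, test in splits:
    print(len(train), len(test))
```

Because 1487 is not divisible by 10, seven folds hold 149 test rows and three hold 148. Each row appears in exactly one test fold, so a model evaluated this way predicts pXC50 once for every molecule.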