mtp

Status: active. Format: ARFF. Publicly available. Visibility: public. Uploaded 28-09-2014 by Joaquin Vanschoren.
  • Social Media Statistics study_130


Author:
Source: Unknown - Date unknown
Please cite:

This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken from the original studies (see below). The other datasets are taken exactly from the original studies. The last attribute in each file is the target.

Original studies:

  • carbolenes: "B. D. Silverman and Daniel E. Platt, J. Med. Chem. 1996, 39, 2129-2140"
  • mtp2: "Bergstrom, C. A. S.; Norinder, U.; Luthman, K.; Artursson, P. Molecular Descriptors Influencing Melting Point and Their Role in Classification of Solid Drugs. J. Chem. Inf. Comput. Sci.; (Article); 2003; 43(4); 1177-1185"
  • chang, cristalli, depreux, doherty, garrat2, garrat, heyl, krystek, lewis, penning, rosowsky, siddiqi, stevenson, strupcz, svensson, thompson, tsutumi, uejling, yokoyama1, yokoyama2: "David E Patterson, Richard D Cramer, Allan M Ferguson, Robert D Clark, Laurence W Weinberger. Neighbourhood Behaviour: A Useful Concept for Validation of "Molecular Diversity" Descriptors. J. Med. Chem. 1996 (39) 3049 - 3059."
  • mtp: "Karthikeyan, M.; Glen, R.C.; Bender, A. General melting point prediction based on a diverse compound dataset and artificial neural networks. J. Chem. Inf. Model.; 2005; 45(3); 581-590"
  • benzo32: "Harrison, P.W. and Barlin, G.B. and Davies, L.P. and Ireland, S.J. and Matyus, P. and Wong, M.G., Syntheses, pharmacological evaluation and molecular modelling of substituted 6-alkoxyimidazo[1,2-b]pyridazines as new ligands for the benzodiazepine receptor, European Journal of Medicinal Chemistry, (31), 1996, 651-662"
  • PHENETYL1: "H. Kubinyi (Ed.): "QSAR: Hansch Analysis and Related Approaches", VCH, Weinheim (Ger), 1993, pp.57-68"
  • pah: "Todeschini, R.; Gramatica, P.; Marengo, E.; Provenzani, R. Weighted Holistic Invariant Molecular Descriptors. Part 2. Theory Development and Applications on Modeling Physico-Chemical Properties of PolyAromatic Hydrocarbons (PAH). Chemom. Intell. Lab. Syst. 1995, 27, 221-229."
  • pdgfr: "R. Guha and P. Jurs. The Development of Linear, Ensemble and Non-linear Models for the Prediction and Interpretation of the Biological Activity of a Set of PDGFR Inhibitors. J. Chem. Inf. Comput. Sci. 2004, 44 (6), 2179-2189"
  • Phen: "Cammarata, A. Interrelationship of the Regression Models Used for Structure-Activity Analyses. J. Med. Chem. 1972, 15, 573-577"
  • topo_2_1, yprop_4_1: "Jun Feng et al, Predictive Toxicology: Benchmarking Molecular Descriptors and Statistical Methods, J. Chem. Inf. Comput. Sci., 2003 (43) 1463-1470"
  • qsabr1, qsabr2: "Damborsky, J., Schultz, T.W., Comparison of the QSAR models for toxicity and biodegradability of anilines and phenols, Chemosphere 34: 429-446, 1997"
  • qsartox: "Blaha, L., Damborsky, J., Nemec, M., QSAR for acute toxicity of saturated and unsaturated halogenated aliphatic compounds, Chemosphere 36: 1345-1365, 1998"
  • qsbr_rw1: "Damborsky, J. et al., Structure-biodegradability relationships for chlorinated dibenzo-p-dioxins and dibenzofurans, In: Wittich, R.-M., Biodegradation of dioxins and furans, R.G. Landes Company, Austin, 1998"
  • qsbr_y2: "Damborsky, J. et al., A mechanistic approach to deriving QSBR- A case study: dehalogenation of haloaliphatic compounds, In: Peijnenburg, W.J.G.M., Damborsky, J., Biodegradability Prediction, Kluwer Academic Publishers"
  • qsbralks: "Damborsky, J. et al., Mechanism-based Quantitative Structure-Biodegradability Relationships for hydrolytic dehalogenation of chloro- and bromo-alkenes, Quantitative Structure-Activity Relationships 17: 450-458, 1998"
  • qsfrdhla: "Damborsky, J., Quantitative structure-function relationships of the single-point mutants of haloalkane dehalogenase: A multivariate approach, Quantitative Structure-Activity Relationships 16: 126-135, 1997"
  • qsfsr1: "Damborsky, J., Quantitative structure-function and structure-stability relationships of purposely modified proteins, Protein Engineering 11: 21-30, 1998"
  • qsfsr2: "Damborsky, J., Quantitative structure-function and structure-stability relationships of purposely modified proteins, Protein Engineering 11: 21-30, 1998"
  • qsprcmpx: "Cajan, M. et al., Stability of Aromatic Amides with Bromide Anion: Quantitative Structure-Property Relationships, Journal of Chemical Information and Computer Sciences, in press, 2000"
  • selwood: "Selwood, D. L.; Livingstone, D. J.; Comley, J. C.; O'Dowd, A. B.; Hudson, A. T.; Jackson, P.; Jandu, K. S.; Rose, V. S.; Stables, J. N. Structure-Activity Relationships of Antifilarial Antimycin Analogues: A Multivariate Pattern Recognition Study J. Med. Chem., 1990, 33, 136-142"
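The description states that the last attribute in each file is the target. As a rough illustration of what that means when reading the ARFF file, here is a minimal sketch that parses a dense, all-numeric ARFF string and splits off the final column. The helper names (parse_numeric_arff, split_target) and the three-attribute sample are assumptions made up for this example; the real file has 203 attributes, and a full ARFF reader would also need to handle quoted names, nominal types, and sparse rows.

```python
def parse_numeric_arff(text):
    """Parse a dense, all-numeric ARFF string into (attribute_names, rows)."""
    names, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if in_data:
            rows.append([float(v) for v in line.split(",")])
        elif line.lower().startswith("@attribute"):
            # Assumes unquoted attribute names without spaces.
            names.append(line.split()[1])
        elif line.lower().startswith("@data"):
            in_data = True
    return names, rows


def split_target(names, rows):
    """Treat the last attribute as the target, as the description states."""
    X = [r[:-1] for r in rows]
    y = [r[-1] for r in rows]
    return names[:-1], names[-1], X, y


# Made-up sample for illustration only; not taken from the mtp data.
SAMPLE = """@relation toy
@attribute oz1 numeric
@attribute oz2 numeric
@attribute oz3 numeric
@data
0.1,0.5,120.0
0.2,0.4,95.5
"""

names, rows = parse_numeric_arff(SAMPLE)
feature_names, target_name, X, y = split_target(names, rows)
```

For the actual dataset the same split would leave 202 input features and oz203 as the target.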

203 features

oz203 (target): numeric, 800 unique values, 0 missing
oz1: numeric, 26 unique values, 0 missing
oz2: numeric, 51 unique values, 0 missing
oz3: numeric, 51 unique values, 0 missing
oz4: numeric, 16 unique values, 0 missing
oz5: numeric, 3523 unique values, 0 missing
oz6: numeric, 3527 unique values, 0 missing
oz7: numeric, 2026 unique values, 0 missing
oz8: numeric, 93 unique values, 0 missing
oz9: numeric, 30 unique values, 0 missing
oz10: numeric, 83 unique values, 0 missing
oz11: numeric, 2768 unique values, 0 missing
oz12: numeric, 2543 unique values, 0 missing
oz13: numeric, 53 unique values, 0 missing
oz14: numeric, 30 unique values, 0 missing
oz15: numeric, 474 unique values, 0 missing
oz16: numeric, 26 unique values, 0 missing
oz17: numeric, 88 unique values, 0 missing
oz18: numeric, 15 unique values, 0 missing
oz19: numeric, 29 unique values, 0 missing
oz20: numeric, 496 unique values, 0 missing
oz21: numeric, 85 unique values, 0 missing
oz22: numeric, 4 unique values, 0 missing
oz23: numeric, 3818 unique values, 0 missing
oz24: numeric, 1420 unique values, 0 missing
oz25: numeric, 4202 unique values, 0 missing
oz26: numeric, 2086 unique values, 0 missing
oz27: numeric, 2 unique values, 0 missing
oz28: numeric, 3393 unique values, 0 missing
oz29: numeric, 49 unique values, 0 missing
oz30: numeric, 6 unique values, 0 missing
oz31: numeric, 40 unique values, 0 missing
oz32: numeric, 7 unique values, 0 missing
oz33: numeric, 9 unique values, 0 missing
oz34: numeric, 3 unique values, 0 missing
oz35: numeric, 11 unique values, 0 missing
oz36: numeric, 19 unique values, 0 missing
oz37: numeric, 4 unique values, 0 missing
oz38: numeric, 6 unique values, 0 missing
oz39: numeric, 53 unique values, 0 missing
oz40: numeric, 1263 unique values, 0 missing
oz41: numeric, 1233 unique values, 0 missing
oz42: numeric, 2279 unique values, 0 missing
oz43: numeric, 1916 unique values, 0 missing
oz44: numeric, 6 unique values, 0 missing
oz45: numeric, 228 unique values, 0 missing
oz46: numeric, 53 unique values, 0 missing
oz47: numeric, 118 unique values, 0 missing
oz48: numeric, 3772 unique values, 0 missing
oz49: numeric, 4372 unique values, 0 missing
oz50: numeric, 4374 unique values, 0 missing
oz51: numeric, 4383 unique values, 0 missing
oz52: numeric, 4375 unique values, 0 missing
oz53: numeric, 1789 unique values, 0 missing
oz54: numeric, 1633 unique values, 0 missing
oz55: numeric, 390 unique values, 0 missing
oz56: numeric, 195 unique values, 0 missing
oz57: numeric, 105 unique values, 0 missing
oz58: numeric, 98 unique values, 0 missing
oz59: numeric, 79 unique values, 0 missing
oz60: numeric, 744 unique values, 0 missing
oz61: numeric, 578 unique values, 0 missing
oz62: numeric, 77 unique values, 0 missing
oz63: numeric, 81 unique values, 0 missing
oz64: numeric, 66 unique values, 0 missing
oz65: numeric, 196 unique values, 0 missing
oz66: numeric, 201 unique values, 0 missing
oz67: numeric, 3811 unique values, 0 missing
oz68: numeric, 4140 unique values, 0 missing
oz69: numeric, 3746 unique values, 0 missing
oz70: numeric, 3810 unique values, 0 missing
oz71: numeric, 4140 unique values, 0 missing
oz72: numeric, 3131 unique values, 0 missing
oz73: numeric, 3906 unique values, 0 missing
oz74: numeric, 4027 unique values, 0 missing
oz75: numeric, 858 unique values, 0 missing
oz76: numeric, 1431 unique values, 0 missing
oz77: numeric, 3949 unique values, 0 missing
oz78: numeric, 418 unique values, 0 missing
oz79: numeric, 2049 unique values, 0 missing
oz80: numeric, 2010 unique values, 0 missing
oz81: numeric, 2049 unique values, 0 missing
oz82: numeric, 2010 unique values, 0 missing
oz83: numeric, 4244 unique values, 0 missing
oz84: numeric, 4256 unique values, 0 missing
oz85: numeric, 3804 unique values, 0 missing
oz86: numeric, 4147 unique values, 0 missing
oz87: numeric, 3740 unique values, 0 missing
oz88: numeric, 3803 unique values, 0 missing
oz89: numeric, 4147 unique values, 0 missing
oz90: numeric, 3125 unique values, 0 missing
oz91: numeric, 3906 unique values, 0 missing
oz92: numeric, 4040 unique values, 0 missing
oz93: numeric, 855 unique values, 0 missing
oz94: numeric, 1423 unique values, 0 missing
oz95: numeric, 3953 unique values, 0 missing
oz96: numeric, 418 unique values, 0 missing
oz97: numeric, 4244 unique values, 0 missing
oz98: numeric, 4256 unique values, 0 missing
oz99: numeric, 227 unique values, 0 missing
oz100: numeric, 538 unique values, 0 missing
oz101: numeric, 944 unique values, 0 missing
oz102: numeric, 2963 unique values, 0 missing
oz103: numeric, 3448 unique values, 0 missing
oz104: numeric, 3889 unique values, 0 missing
oz105: numeric, 3600 unique values, 0 missing
oz106: numeric, 3397 unique values, 0 missing
oz107: numeric, 2468 unique values, 0 missing
oz108: numeric, 4243 unique values, 0 missing
oz109: numeric, 14 unique values, 0 missing
oz110: numeric, 7 unique values, 0 missing
oz111: numeric, 6 unique values, 0 missing
oz112: numeric, 9 unique values, 0 missing
oz113: numeric, 38 unique values, 0 missing
oz114: numeric, 262 unique values, 0 missing
oz115: numeric, 18 unique values, 0 missing
oz116: numeric, 20 unique values, 0 missing
oz117: numeric, 45 unique values, 0 missing
oz118: numeric, 2638 unique values, 0 missing
oz119: numeric, 897 unique values, 0 missing
oz120: numeric, 50 unique values, 0 missing
oz121: numeric, 4007 unique values, 0 missing
oz122: numeric, 250 unique values, 0 missing
oz123: numeric, 789 unique values, 0 missing
oz124: numeric, 327 unique values, 0 missing
oz125: numeric, 229 unique values, 0 missing
oz126: numeric, 188 unique values, 0 missing
oz127: numeric, 455 unique values, 0 missing
oz128: numeric, 9 unique values, 0 missing
oz129: numeric, 746 unique values, 0 missing
oz130: numeric, 85 unique values, 0 missing
oz131: numeric, 973 unique values, 0 missing
oz132: numeric, 4051 unique values, 0 missing
oz133: numeric, 276 unique values, 0 missing
oz134: numeric, 409 unique values, 0 missing
oz135: numeric, 323 unique values, 0 missing
oz136: numeric, 576 unique values, 0 missing
oz137: numeric, 535 unique values, 0 missing
oz138: numeric, 496 unique values, 0 missing
oz139: numeric, 535 unique values, 0 missing
oz140: numeric, 530 unique values, 0 missing
oz141: numeric, 1429 unique values, 0 missing
oz142: numeric, 3875 unique values, 0 missing
oz143: numeric, 3937 unique values, 0 missing
oz144: numeric, 3880 unique values, 0 missing
oz145: numeric, 3423 unique values, 0 missing
oz146: numeric, 4177 unique values, 0 missing
oz147: numeric, 4349 unique values, 0 missing
oz148: numeric, 4422 unique values, 0 missing
oz149: numeric, 4406 unique values, 0 missing
oz150: numeric, 4196 unique values, 0 missing
oz151: numeric, 4371 unique values, 0 missing
oz152: numeric, 4212 unique values, 0 missing
oz153: numeric, 4358 unique values, 0 missing
oz154: numeric, 4276 unique values, 0 missing
oz155: numeric, 4246 unique values, 0 missing
oz156: numeric, 4414 unique values, 0 missing
oz157: numeric, 2327 unique values, 0 missing
oz158: numeric, 4054 unique values, 0 missing
oz159: numeric, 3928 unique values, 0 missing
oz160: numeric, 3157 unique values, 0 missing
oz161: numeric, 3032 unique values, 0 missing
oz162: numeric, 4351 unique values, 0 missing
oz163: numeric, 4407 unique values, 0 missing
oz164: numeric, 4130 unique values, 0 missing
oz165: numeric, 4384 unique values, 0 missing
oz166: numeric, 3927 unique values, 0 missing
oz167: numeric, 4397 unique values, 0 missing
oz168: numeric, 4005 unique values, 0 missing
oz169: numeric, 1926 unique values, 0 missing
oz170: numeric, 1121 unique values, 0 missing
oz171: numeric, 1621 unique values, 0 missing
oz172: numeric, 4199 unique values, 0 missing
oz173: numeric, 4136 unique values, 0 missing
oz174: numeric, 2841 unique values, 0 missing
oz175: numeric, 4257 unique values, 0 missing
oz176: numeric, 4421 unique values, 0 missing
oz177: numeric, 4397 unique values, 0 missing
oz178: numeric, 2246 unique values, 0 missing
oz179: numeric, 3008 unique values, 0 missing
oz180: numeric, 1178 unique values, 0 missing
oz181: numeric, 4385 unique values, 0 missing
oz182: numeric, 4429 unique values, 0 missing
oz183: numeric, 4322 unique values, 0 missing
oz184: numeric, 4427 unique values, 0 missing
oz185: numeric, 4228 unique values, 0 missing
oz186: numeric, 4399 unique values, 0 missing
oz187: numeric, 4386 unique values, 0 missing
oz188: numeric, 4429 unique values, 0 missing
oz189: numeric, 4404 unique values, 0 missing
oz190: numeric, 738 unique values, 0 missing
oz191: numeric, 4413 unique values, 0 missing
oz192: numeric, 4248 unique values, 0 missing
oz193: numeric, 4036 unique values, 0 missing
oz194: numeric, 4419 unique values, 0 missing
oz195: numeric, 4405 unique values, 0 missing
oz196: numeric, 4419 unique values, 0 missing
oz197: numeric, 4303 unique values, 0 missing
oz198: numeric, 4263 unique values, 0 missing
oz199: numeric, 4340 unique values, 0 missing
oz200: numeric, 4356 unique values, 0 missing
oz201: numeric, 4369 unique values, 0 missing
oz202: numeric, 926 unique values, 0 missing
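The per-feature figures in this listing (number of unique values, number of missing values) could be recomputed from the raw rows along the following lines. This is a minimal sketch, assuming rows are lists of floats with missing entries encoded as None or NaN; the function name column_summary and the toy rows are hypothetical, not part of the dataset.

```python
import math


def column_summary(rows):
    """Return one {"unique": ..., "missing": ...} dict per column."""
    summary = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        # Treat both None and NaN as missing (an assumption about encoding).
        present = [v for v in column if v is not None and not math.isnan(v)]
        summary.append({
            "unique": len(set(present)),
            "missing": len(column) - len(present),
        })
    return summary


# Toy rows for illustration only.
toy_rows = [
    [1.0, 2.0],
    [1.0, 3.0],
    [2.0, float("nan")],
]
toy_summary = column_summary(toy_rows)
```

On the real data every column would be expected to report 0 missing, matching the listing.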

107 properties

Number of instances (rows) of the dataset: 4450
Number of attributes (columns) of the dataset: 203
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 203
Number of nominal attributes: 0
Second quartile (Median) of kurtosis among attributes of the numeric type: 4.15
Second quartile (Median) of means among attributes of the numeric type: 0.22
Minimum kurtosis among attributes of the numeric type: -0.36
Maximum kurtosis among attributes of the numeric type: 2091.73
Minimum of means among attributes of the numeric type: 0
Second quartile (Median) of skewness among attributes of the numeric type: 1.15
Maximum of means among attributes of the numeric type: 0.95
Percentage of binary attributes: 0
Second quartile (Median) of standard deviation of attributes of the numeric type: 0.12
Number of attributes divided by the number of instances: 0.05
Percentage of instances having missing values: 0
Minimum skewness among attributes of the numeric type: -36.66
Percentage of missing values: 0
Third quartile of kurtosis among attributes of the numeric type: 13.83
Average class difference between consecutive instances: 0.99
Maximum skewness among attributes of the numeric type: 40.64
Minimum standard deviation of attributes of the numeric type: 0.02
Percentage of numeric attributes: 100
Third quartile of means among attributes of the numeric type: 0.36
Maximum standard deviation of attributes of the numeric type: 0.41
Percentage of nominal attributes: 0
Third quartile of skewness among attributes of the numeric type: 2.34
Mean kurtosis among attributes of the numeric type: 79.79
First quartile of kurtosis among attributes of the numeric type: 1
Third quartile of standard deviation of attributes of the numeric type: 0.14
Mean of means among attributes of the numeric type: 0.27
First quartile of means among attributes of the numeric type: 0.1
Number of binary attributes: 0
First quartile of skewness among attributes of the numeric type: 0.56
First quartile of standard deviation of attributes of the numeric type: 0.09
Mean skewness among attributes of the numeric type: 2.24
Mean standard deviation of attributes of the numeric type: 0.11

Properties listed with no value for this dataset:

  • Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 2
  • Entropy of the target attribute values.
  • Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk
  • Number of instances belonging to the most frequent class.
  • Minimal entropy among attributes.
  • Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 3
  • Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump
  • Maximum entropy among attributes.
  • Second quartile (Median) of mutual information between the nominal attributes and the target attribute.
  • Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 3
  • Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump
  • Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 3
  • Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump
  • Minimal mutual information between the nominal attributes and the target attribute.
  • Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
  • Maximum mutual information between the nominal attributes and the target attribute.
  • The minimal number of distinct values among attributes of the nominal type.
  • Third quartile of entropy among attributes.
  • Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
  • Number of attributes needed to optimally describe the class (under the assumption of independence among attributes). Equals ClassEntropy divided by MeanMutualInformation.
  • The maximum number of distinct values among attributes of the nominal type.
  • Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 1
  • Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .00001
  • Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
  • Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
  • Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .00001
  • Percentage of instances belonging to the least frequent class.
  • Third quartile of mutual information between the nominal attributes and the target attribute.
  • Error rate achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
  • Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
  • Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .00001
  • Average entropy of the attributes.
  • Number of instances belonging to the least frequent class.
  • First quartile of entropy among attributes.
  • Kappa coefficient achieved by the landmarker weka.classifiers.trees.DecisionStump -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
  • Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 2
  • Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .0001
  • Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes
  • Area Under the ROC Curve achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
  • Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
  • Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .0001
  • Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes
  • Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 1
  • Error rate achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
  • Error rate achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
  • Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .0001
  • Average mutual information between the nominal attributes and the target attribute.
  • Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes
  • First quartile of mutual information between the nominal attributes and the target attribute.
  • Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 1
  • Kappa coefficient achieved by the landmarker weka.classifiers.bayes.NaiveBayes -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
  • Kappa coefficient achieved by the landmarker weka.classifiers.trees.RandomTree -depth 3
  • Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.J48 -C .001
  • An estimate of the amount of irrelevant information in the attributes regarding the class. Equals (MeanAttributeEntropy - MeanMutualInformation) divided by MeanMutualInformation.
  • Kappa coefficient achieved by the landmarker weka.classifiers.trees.REPTree -L 1
  • Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
  • Standard deviation of the number of distinct values among attributes of the nominal type.
  • Error rate achieved by the landmarker weka.classifiers.trees.J48 -C .001
  • Average number of distinct values among the attributes of the nominal type.
  • Area Under the ROC Curve achieved by the landmarker weka.classifiers.trees.REPTree -L 2
  • Error rate achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
  • Area Under the ROC Curve achieved by the landmarker weka.classifiers.lazy.IBk
  • Kappa coefficient achieved by the landmarker weka.classifiers.trees.J48 -C .001
  • Second quartile (Median) of entropy among attributes.
  • Error rate achieved by the landmarker weka.classifiers.trees.REPTree -L 2
  • Kappa coefficient achieved by the landmarker weka.classifiers.lazy.IBk -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W
  • Error rate achieved by the landmarker weka.classifiers.lazy.IBk
  • Percentage of instances belonging to the most frequent class.
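Several of the properties above are simple per-attribute statistics (mean, standard deviation, skewness, kurtosis) plus the ratio of attributes to instances. The sketch below shows one way such figures could be computed. It assumes population (biased) moments and excess kurtosis, which is consistent with the negative minimum kurtosis reported but is not confirmed by the source; the exact estimator used to produce the listed values may differ. The function name numeric_moments is made up for this example.

```python
import math


def numeric_moments(values):
    """Return (mean, std, skewness, excess kurtosis) using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        # Constant column: skewness and kurtosis are undefined.
        return mean, 0.0, float("nan"), float("nan")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return mean, math.sqrt(m2), skewness, excess_kurtosis


# Toy column for illustration only.
mean, std, skew, kurt = numeric_moments([1.0, 2.0, 3.0, 4.0, 5.0])

# Attributes divided by instances, using the counts listed on this page.
dimensionality = round(203 / 4450, 2)
```

The last line reproduces the listed ratio of 0.05 from the 203 attributes and 4450 instances; the quartile and mean properties would then be aggregates of these per-attribute statistics across all numeric columns.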

14 tasks

0 runs - estimation_procedure: 10 times 10-fold Crossvalidation - evaluation_measure: mean_absolute_error - target_feature: oz203
0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: mean_absolute_error - target_feature: oz203
0 runs - estimation_procedure: 33% Holdout set - target_feature: oz203
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
0 runs - estimation_procedure: 50 times Clustering
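The regression tasks above pair a 10-fold cross-validation procedure with mean absolute error on the target oz203. As a rough sketch of how that evaluation works, the code below splits indices into folds and scores a mean-predictor baseline on each held-out fold. The baseline model, the function names, and the seed are assumptions for illustration; they are not part of the task definitions, and the platform's own fold assignments would differ.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all pairs."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validated_mae(y, k=10, seed=0):
    """Score a predict-the-training-mean baseline with k-fold cross-validation."""
    scores = []
    for test_idx in kfold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train = [y[i] for i in range(len(y)) if i not in held_out]
        baseline = sum(train) / len(train)  # hypothetical stand-in for a real model
        y_test = [y[i] for i in test_idx]
        scores.append(mean_absolute_error(y_test, [baseline] * len(y_test)))
    return sum(scores) / len(scores)


# Toy target values for illustration only.
toy_y = [float(i) for i in range(20)]
toy_folds = kfold_indices(len(toy_y), 10)
toy_score = cross_validated_mae(toy_y, k=10)
```

The "10 times 10-fold" variant would repeat this with ten different seeds and average the results; a real model would replace the training-mean baseline.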