yprop_4_1

Status: active. Format: ARFF. Publicly available. Visibility: public. Uploaded 06-01-2023 by Leo Grin.


Dataset used in the tabular data benchmark https://github.com/LeoGrin/tabular-benchmark, transformed in the same way. This dataset belongs to the "regression on numerical features" benchmark.

Original link: https://openml.org/d/416

Original description:

Author:
Source: Unknown - Date unknown
Please cite:

This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com/software/adrianacode). The molecules and outputs are taken from the original studies (see below). The other datasets are taken exactly from the original studies. The last attribute in each file is the target.

Original studies:

carbolenes: B. D. Silverman and Daniel E. Platt, J. Med. Chem. 1996, 39, 2129-2140

mtp2: Bergstrom, C. A. S.; Norinder, U.; Luthman, K.; Artursson, P. Molecular Descriptors Influencing Melting Point and Their Role in Classification of Solid Drugs. J. Chem. Inf. Comput. Sci.; (Article); 2003; 43(4); 1177-1185

chang, cristalli, depreux, doherty, garrat2, garrat, heyl, krystek, lewis, penning, rosowsky, siddiqi, stevenson, strupcz, svensson, thompson, tsutumi, uejling, yokoyama1, yokoyama2: David E Patterson, Richard D Cramer, Allan M Ferguson, Robert D Clark, Laurence W Weinberger. Neighbourhood Behaviour: A Useful Concept for Validation of "Molecular Diversity" Descriptors. J. Med. Chem. 1996 (39) 3049 - 3059.

mtp: Karthikeyan, M.; Glen, R.C.; Bender, A. General melting point prediction based on a diverse compound dataset and artificial neural networks. J. Chem. Inf. Model.; 2005; 45(3); 581-590

benzo32: Harrison, P.W. and Barlin, G.B. and Davies, L.P. and Ireland, S.J. and Matyus, P. and Wong, M.G., Syntheses, pharmacological evaluation and molecular modelling of substituted 6-alkoxyimidazo[1,2-b]pyridazines as new ligands for the benzodiazepine receptor, European Journal of Medicinal Chemistry, (31), 1996, 651-662

PHENETYL1: H. Kubinyi (Ed.): "QSAR: Hansch Analysis and Related Approaches", VCH, Weinheim (Ger), 1993, pp. 57-68

pah: Todeschini, R.; Gramatica, P.; Marengo, E.; Provenzani, R. Weighted Holistic Invariant Molecular Descriptors. Part 2. Theory Development and Applications on Modeling Physico-Chemical Properties of PolyAromatic Hydrocarbons (PAH). Chemom. Intell. Lab. Syst. 1995, 27, 221-229.

pdgfr: R. Guha and P. Jurs. The Development of Linear, Ensemble and Non-linear Models for the Prediction and Interpretation of the Biological Activity of a Set of PDGFR Inhibitors. J. Chem. Inf. Comput. Sci. 2004, 44 (6), 2179-2189

Phen: Cammarata, A. Interrelationship of the Regression Models Used for Structure-Activity Analyses. J. Med. Chem. 1972, 15, 573-577

topo_2_1, yprop_4_1: Jun Feng et al, Predictive Toxicology: Benchmarking Molecular Descriptors and Statistical Methods, J. Chem. Inf. Comput. Sci., 2003 (43) 1463-1470

qsabr1, qsabr2: Damborsky, J., Schultz, T.W., Comparison of the QSAR models for toxicity and biodegradability of anilines and phenols, Chemosphere 34: 429-446, 1997

qsartox: Blaha, L., Damborsky, J., Nemec, M., QSAR for acute toxicity of saturated and unsaturated halogenated aliphatic compounds, Chemosphere 36: 1345-1365, 1998

qsbr_rw1: Damborsky, J. et al., Structure-biodegradability relationships for chlorinated dibenzo-p-dioxins and dibenzofurans, In: Wittich, R.-M., Biodegradation of dioxins and furans, R.G. Landes Company, Austin, 1998

qsbr_y2: Damborsky, J. et al., A mechanistic approach to deriving QSBR - A case study: dehalogenation of haloaliphatic compounds, In: Peijnenburg, W.J.G.M., Damborsky, J., Biodegradability Prediction, Kluwer Academic Publishers

qsbralks: Damborsky, J. et al., Mechanism-based Quantitative Structure-Biodegradability Relationships for hydrolytic dehalogenation of chloro- and bromo-alkenes, Quantitative Structure-Activity Relationships 17: 450-458, 1998

qsfrdhla: Damborsky, J., Quantitative structure-function relationships of the single-point mutants of haloalkane dehalogenase: A multivariate approach, Quantitative Structure-Activity Relationships 16: 126-135, 1997

qsfsr1: Damborsky, J., Quantitative structure-function and structure-stability relationships of purposely modified proteins, Protein Engineering 11: 21-30, 1998

qsfsr2: Damborsky, J., Quantitative structure-function and structure-stability relationships of purposely modified proteins, Protein Engineering 11: 21-30, 1998

qsprcmpx: Cajan, M. et al., Stability of Aromatic Amides with Bromide Anion: Quantitative Structure-Property Relationships, Journal of Chemical Information and Computer Sciences, in press, 2000

selwood: Selwood, D. L.; Livingstone, D. J.; Comley, J. C.; O'Dowd, A. B.; Hudson, A. T.; Jackson, P.; Jandu, K. S.; Rose, V. S.; Stables, J. N. Structure-Activity Relationships of Antifilarial Antimycin Analogues: A Multivariate Pattern Recognition Study. J. Med. Chem., 1990, 33, 136-142
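
The original description says that the last attribute in each file is the target. A minimal sketch of that convention is given here, assuming an all-numeric ARFF file. The sample text, the attribute names (f1, f2, y) and the helper parse_numeric_arff are made up for illustration and are not part of the dataset or of any library. On this page the target is also named explicitly (oz252), so selecting it by name is likely safer than relying on position.

```python
# Illustrative only: a tiny hand-written ARFF text, not the real dataset.
SAMPLE_ARFF = """\
@relation toy
@attribute f1 numeric
@attribute f2 numeric
@attribute y numeric
@data
0.1,0.2,1.0
0.3,0.4,2.0
"""


def parse_numeric_arff(text):
    """Return (attribute_names, rows) for an ARFF text with numeric columns only.

    Hypothetical helper: it ignores quoting, sparse rows and nominal types.
    """
    names, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([float(v) for v in line.split(",")])
        elif lower.startswith("@attribute"):
            names.append(line.split()[1])
        elif lower.startswith("@data"):
            in_data = True
    return names, rows


names, rows = parse_numeric_arff(SAMPLE_ARFF)

# Convention from the original description: the last attribute is the target.
feature_names, target_name = names[:-1], names[-1]
X = [row[:-1] for row in rows]
y = [row[-1] for row in rows]
```

For real files a full ARFF reader is the better choice, since this sketch handles only the simplest dense numeric layout.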

43 features

Feature         Type     Unique values  Missing
oz252 (target)  numeric  1336           0
oz1             numeric  800            0
oz2             numeric  26             0
oz3             numeric  586            0
oz4             numeric  1125           0
oz5             numeric  15             0
oz6             numeric  33             0
oz9             numeric  21             0
oz10            numeric  19             0
oz11            numeric  10             0
oz12            numeric  31             0
oz13            numeric  22             0
oz31            numeric  10             0
oz83            numeric  12             0
oz87            numeric  12             0
oz124           numeric  15             0
oz125           numeric  27             0
oz126           numeric  14             0
oz127           numeric  31             0
oz128           numeric  15             0
oz131           numeric  14             0
oz133           numeric  15             0
oz149           numeric  31             0
oz150           numeric  16             0
oz151           numeric  14             0
oz165           numeric  11             0
oz171           numeric  58             0
oz172           numeric  44             0
oz173           numeric  10             0
oz175           numeric  14             0
oz176           numeric  20             0
oz177           numeric  28             0
oz178           numeric  11             0
oz181           numeric  12             0
oz183           numeric  13             0
oz185           numeric  10             0
oz197           numeric  10             0
oz246           numeric  54             0
oz247           numeric  1231           0
oz248           numeric  501            0
oz249           numeric  8379           0
oz250           numeric  1773           0
oz251           numeric  4638           0
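
The per-feature figures above (unique values and missing count) can be recomputed from the raw columns. The sketch below shows one way to do so, assuming each column is held as a list of floats with NaN marking a missing entry. The function summarize_columns and the toy columns are hypothetical and only illustrate the counting.

```python
import math


def summarize_columns(columns):
    """Count unique non-missing values and missing entries per column.

    columns: dict mapping a column name to a list of floats,
    where NaN is assumed to mark a missing value.
    """
    summary = {}
    for name, values in columns.items():
        present = [v for v in values if not math.isnan(v)]
        summary[name] = {
            "unique": len(set(present)),
            "missing": len(values) - len(present),
        }
    return summary


# Toy data, not taken from yprop_4_1.
toy = {
    "a": [1.0, 1.0, 2.0, 3.0],
    "b": [0.5, float("nan"), 0.5, 0.5],
}
result = summarize_columns(toy)
```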

19 properties

Number of instances (rows) of the dataset: 8885
Number of attributes (columns) of the dataset: 43
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 43
Number of nominal attributes: 0
Percentage of missing values: 0
Average class difference between consecutive instances: 0.97
Percentage of numeric attributes: 100
Number of attributes divided by the number of instances: 0
Percentage of instances belonging to the most frequent class: (no value listed)
Percentage of nominal attributes: 0
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
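
Several of these properties are simple counts and ratios over the data table. The sketch below recomputes a few of them for a toy table, assuming rows are lists of floats with NaN for missing values. The function dataset_properties and its key names are hypothetical, not OpenML's own implementation. It also shows that 43 attributes over 8885 instances gives a ratio of roughly 0.0048, which would presumably display as 0 once rounded.

```python
import math


def dataset_properties(rows):
    """Compute a few table-level properties for rows of floats (NaN = missing)."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    n_missing = sum(1 for r in rows for v in r if math.isnan(v))
    rows_with_missing = sum(1 for r in rows if any(math.isnan(v) for v in r))
    total = n_rows * n_cols
    return {
        "instances": n_rows,
        "attributes": n_cols,
        "missing_values": n_missing,
        "instances_with_missing": rows_with_missing,
        "pct_missing": 100.0 * n_missing / total if total else 0.0,
        # attributes divided by instances
        "dimensionality": n_cols / n_rows if n_rows else 0.0,
    }


# Toy table, not taken from yprop_4_1.
toy_rows = [[1.0, 2.0], [3.0, float("nan")], [5.0, 6.0], [7.0, 8.0]]
props = dataset_properties(toy_rows)

# Ratio for the counts reported on this page: 43 attributes, 8885 instances.
page_ratio = 43 / 8885
```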

1 task

0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: root_mean_squared_error - target_feature: oz252
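
The task pairs 10-fold cross-validation with root mean squared error. A generic sketch of that evaluation loop follows, using only the standard library. The fold assignment here is a simple shuffled split and is not the fixed split that the OpenML task defines; the mean-of-training-targets baseline, the function names and the toy targets are all assumptions chosen for illustration.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def cross_validated_rmse(y, k=10, seed=0):
    """Average per-fold RMSE of a baseline that predicts the training mean."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        prediction = sum(train_y) / len(train_y)
        test_y = [y[i] for i in test_idx]
        scores.append(rmse(test_y, [prediction] * len(test_y)))
    return sum(scores) / len(scores)


# Toy targets in [0, 1), standing in for the oz252 column.
rng = random.Random(1)
toy_y = [rng.random() for _ in range(50)]
score = cross_validated_rmse(toy_y, k=10)
```

A real run would replace the baseline with a fitted regressor and use the folds supplied by the task so that results stay comparable across runs.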