{ "data_id": "44039", "name": "yprop_4_1", "exact_name": "yprop_4_1", "version": 2, "version_label": null, "description": "Dataset used in the tabular data benchmark https:\/\/github.com\/LeoGrin\/tabular-benchmark, \n transformed in the same way. This dataset belongs to the \"regression on categorical and\n numerical features\" benchmark. Original description: \n \n**Author**: \n**Source**: Unknown - Date unknown \n**Please cite**: \n\nThis is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com\/software\/adrianacode).\nThe molecules and outputs are taken from the original studies (see below). The other datasets are taken exactly from the original studies. \n\nThe last attribute in each file is the target.\n\nOriginal studies:\n\ncarbolenes\n\"B. D. Silverman and Daniel. E. Platt, J. Med. Chem. 1996, 39, 2129-2140\"\n\nmtp2\n\"Bergstrom, C. A. S.; Norinder, U.; Luthman, K.; Artursson, P. Molecular Descriptors Influencing Melting Point and Their Role in Classification of Solid Drugs. J. Chem. Inf. Comput. Sci.; (Article); 2003; 43(4); 1177-1185\"\n\nchang, cristalli, depreux, doherty, garrat2, garrat, heyl, krystek, lewis, penning, rosowsky, siddiqi, stevenson, strupcz, svensson, thompson, tsutumi, uejling, yokoyama1, yokoyama2\t\n\"David E Patterson, Richard D Cramer, Allan M Ferguson, Robert D Clark, Laurence W Weinberger. Neighbourhood Behaviour: A Useful Concept for Validation of \"\"Molecular Diversity\"\" Descriptors. J. Med. Chem. 1996 (39) 3049 - 3059.\"\n\nmtp\n\"Karthikeyan, M.; Glen, R.C.; Bender, A. General melting point prediction based on a diverse compound dataset and artificial neural networks. J. Chem. Inf. Model.; 2005; 45(3); 581-590\"\n\nbenzo32\n\"Harrison,P.W. and Barlin,G.B. and Davies,L.P. and Ireland,S.J. and Matyus,P. 
and Wong,M.G., Syntheses, pharmacological evaluation and molecular modelling of substituted 6-alkoxyimidazo[1,2-b]pyridazines as new ligands for the benzodiazepine receptor, European Journal of Medicinal Chemistry, (31), 1996, 651-662\"\n\nPHENETYL1\t\n\"H. Kubinyi (Ed.): \"\"QSAR: Hansch Analysis and Related Approaches\"\", VCH, Weinheim (Ger), 1993, pp.57-68\"\n\npah\t\n\"Todeschini, R.; Gramatica, P.; Marengo, E.; Provenzani, R. Weighted Holistic Invariant Molecular Descriptors. Part 2. Theory Development and Applications on Modeling Physico-Chemical Properties of PolyAromatic Hydrocarbons (PAH). Chemom. Intell. Lab. Syst. 1995, 27, 221-229.\"\n\npdgfr\t\n\"R. Guha and P. Jurs. The Development of Linear, Ensemble and Non-linear Models for the Prediction and Interpretation of the Biological Activity of a Set of PDGFR Inhibitors. J. Chem. Inf. Comput. Sci. 2004, 44 (6), 2179-2189\"\n\nPhen\t\n\"Cammarata, A. Interrelationship of the Regression Models Used for Structure-Activity Analyses. J. Med. Chem. 1972, 15, 573-577\"\n\ntopo_2_1, yprop_4_1\t\n\"Jun Feng et al, Predictive Toxicology: Benchmarking Molecular Descriptors and Statistical Methods, J. Chem. Inf. Comput. Sci., 2003 (43) 1463-1470\"\n\nqsabr1, qsabr2\t\n\"Damborsky, J., Schultz, T.W., Comparison of the QSAR models for toxicity and biodegradability of anilines and phenols, Chemosphere 34: 429-446, 1997\"\n\nqsartox\t\n\"Blaha, L., Damborsky, J., Nemec, M., QSAR for acute toxicity of saturated and unsaturated halogenated aliphatic compounds, Chemosphere 36: 1345-1365, 1998\"\n\nqsbr_rw1\t\n\"Damborsky, J. et al., Structure-biodegradability relationships for chlorinated dibenzo-p-dioxins and dibenzofurans, In: Wittich, R.-M., Biodegradation of dioxins and furans, R.G. Landes Company, Austin, 1998\"\n\nqsbr_y2\t\n\"Damborsky, J. 
et al., A mechanistic approach to deriving QSBR- A case study: dehalogenation of haloaliphatic compounds, In: Peijnenburg, W.J.G.M., Damborsky, J., Biodegradability Prediction, Kluwer Academic Publishers\"\n\nqsbralks\t\n\"Damborsky, J. et al., Mechanism-based Quantitative Structure-Biodegradability Relationships for hydrolytic dehalogenation of chloro- and bromo-alkenes, Quantitative Structure-Activity Relationships 17: 450-458, 1998\"\n\nqsfrdhla\t\n\"Damborsky, J., Quantitative structure-function relationships of the single-point mutants of haloalkane dehalogenase: A multivariate approach, Quantitative Structure-Activity Relationships 16: 126-135, 1997\"\n\nqsfsr1\t\n\"Damborsky, J., Quantitative structure-function and structure-stability relationships of purposely modified proteins, Protein Engineering 11: 21-30, 1998\"\n\nqsfsr2\t\n\"Damborsky, J., Quantitative structure-function and structure-stability relationships of purposely modified proteins, Protein Engineering 11: 21-30, 1998\"\n\nqsprcmpx\t\n\"Cajan, M. et al., Stability of Aromatic Amides with Bromide Anion: Quantitative Structure-Property Relationships, Journal of Chemical Information and Computer Sciences, in press, 2000\"\n\nselwood\t\n\"Selwood, D. L.; Livingstone, D. J.; Comley, J. C.; O'Dowd, A. B.; Hudson, A. T.; Jackson, P.; Jandu, K. S.; Rose, V. S.; Stables, J. N. Structure-Activity Relationships of Antifilarial Antimycin Analogues: A Multivariate Pattern Recognition Study J. Med. 
Chem., 1990, 33, 136-142\"", "format": "arff", "uploader": "Leo Grin", "uploader_id": 26324, "visibility": "public", "creator": null, "contributor": "\"Leo Grin\"", "date": "2022-06-18 13:06:56", "update_comment": null, "last_update": "2022-06-18 13:06:56", "licence": "Public", "status": "active", "error_message": null, "url": "https:\/\/old.openml.org\/data\/download\/22103127\/dataset", "default_target_attribute": "oz252", "row_id_attribute": null, "ignore_attribute": null, "runs": 0, "suggest": { "input": [ "yprop_4_1", "Dataset used in the tabular data benchmark https:\/\/github.com\/LeoGrin\/tabular-benchmark, transformed in the same way. This dataset belongs to the \"regression on categorical and numerical features\" benchmark. Original description: This is one of 41 drug design datasets. The datasets with 1143 features are formed using Adriana.Code software (www.molecular-networks.com\/software\/adrianacode). The molecules and outputs are taken from the original studies (see below). 
The other datasets are taken exac " ], "weight": 5 }, "qualities": { "NumberOfInstances": 8885, "NumberOfFeatures": 63, "NumberOfClasses": 0, "NumberOfMissingValues": 0, "NumberOfInstancesWithMissingValues": 0, "NumberOfNumericFeatures": 43, "NumberOfSymbolicFeatures": 20, "NumberOfBinaryFeatures": 20, "PercentageOfBinaryFeatures": 31.746031746031743, "PercentageOfInstancesWithMissingValues": 0, "AutoCorrelation": 0.9733753381359748, "PercentageOfMissingValues": 0, "Dimensionality": 0.007090602138435566, "PercentageOfNumericFeatures": 68.25396825396825, "MajorityClassPercentage": null, "PercentageOfSymbolicFeatures": 31.746031746031743, "MajorityClassSize": null, "MinorityClassPercentage": null, "MinorityClassSize": null }, "tags": [ { "uploader": "38960", "tag": "Computer Systems" }, { "uploader": "38960", "tag": "Machine Learning" } ], "features": [ { "name": "oz252", "index": "62", "type": "numeric", "distinct": "1336", "missing": "0", "target": "1", "min": "0", "max": "1", "mean": "1", "stdev": "0" }, { "name": "oz1", "index": "0", "type": "numeric", "distinct": "800", "missing": "0", "min": "0", "max": "1", "mean": "1", "stdev": "0" }, { "name": "oz2", "index": "1", "type": "numeric", "distinct": "26", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz3", "index": "2", "type": "numeric", "distinct": "586", "missing": "0", "min": "0", "max": "1", "mean": "1", "stdev": "0" }, { "name": "oz4", "index": "3", "type": "numeric", "distinct": "1125", "missing": "0", "min": "0", "max": "1", "mean": "1", "stdev": "0" }, { "name": "oz5", "index": "4", "type": "numeric", "distinct": "15", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz6", "index": "5", "type": "numeric", "distinct": "33", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz9", "index": "6", "type": "numeric", "distinct": "21", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": 
"oz10", "index": "7", "type": "numeric", "distinct": "19", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz11", "index": "8", "type": "numeric", "distinct": "10", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz12", "index": "9", "type": "numeric", "distinct": "31", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz13", "index": "10", "type": "numeric", "distinct": "22", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz31", "index": "11", "type": "numeric", "distinct": "10", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz40", "index": "12", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz42", "index": "13", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz46", "index": "14", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz50", "index": "15", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz69", "index": "16", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz71", "index": "17", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz73", "index": "18", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz79", "index": "19", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz83", "index": "20", "type": "numeric", "distinct": "12", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz87", "index": "21", "type": "numeric", "distinct": "12", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz96", "index": "22", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz100", "index": "23", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz107", "index": "24", 
"type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz108", "index": "25", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz111", "index": "26", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz112", "index": "27", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz113", "index": "28", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz115", "index": "29", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz124", "index": "30", "type": "numeric", "distinct": "15", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz125", "index": "31", "type": "numeric", "distinct": "27", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz126", "index": "32", "type": "numeric", "distinct": "14", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz127", "index": "33", "type": "numeric", "distinct": "31", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz128", "index": "34", "type": "numeric", "distinct": "15", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz131", "index": "35", "type": "numeric", "distinct": "14", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz133", "index": "36", "type": "numeric", "distinct": "15", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz135", "index": "37", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz149", "index": "38", "type": "numeric", "distinct": "31", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz150", "index": "39", "type": "numeric", "distinct": "16", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz151", "index": "40", "type": "numeric", "distinct": 
"14", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz165", "index": "41", "type": "numeric", "distinct": "11", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz171", "index": "42", "type": "numeric", "distinct": "58", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz172", "index": "43", "type": "numeric", "distinct": "44", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz173", "index": "44", "type": "numeric", "distinct": "10", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz175", "index": "45", "type": "numeric", "distinct": "14", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz176", "index": "46", "type": "numeric", "distinct": "20", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz177", "index": "47", "type": "numeric", "distinct": "28", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz178", "index": "48", "type": "numeric", "distinct": "11", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz181", "index": "49", "type": "numeric", "distinct": "12", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz183", "index": "50", "type": "numeric", "distinct": "13", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz185", "index": "51", "type": "numeric", "distinct": "10", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz197", "index": "52", "type": "numeric", "distinct": "10", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz206", "index": "53", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz222", "index": "54", "type": "nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz234", "index": "55", "type": 
"nominal", "distinct": "2", "missing": "0", "distr": [] }, { "name": "oz246", "index": "56", "type": "numeric", "distinct": "54", "missing": "0", "min": "0", "max": "1", "mean": "1", "stdev": "0" }, { "name": "oz247", "index": "57", "type": "numeric", "distinct": "1231", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz248", "index": "58", "type": "numeric", "distinct": "501", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz249", "index": "59", "type": "numeric", "distinct": "8379", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz250", "index": "60", "type": "numeric", "distinct": "1773", "missing": "0", "min": "0", "max": "1", "mean": "0", "stdev": "0" }, { "name": "oz251", "index": "61", "type": "numeric", "distinct": "4638", "missing": "0", "min": "0", "max": "1", "mean": "1", "stdev": "0" } ], "nr_of_issues": 0, "nr_of_downvotes": 0, "nr_of_likes": 0, "nr_of_downloads": 0, "total_downloads": 0, "reach": 0, "reuse": 0, "impact_of_reuse": 0, "reach_of_reuse": 0, "impact": 0 }