Context
AqSolDB, created by the Autonomous Energy Materials Discovery [AMD] research group, consists of aqueous solubility values for 9,982 unique compounds curated from 9 different publicly available aqueous solubility datasets. This openly accessible dataset, the largest of its kind, will serve not only as a useful reference source of measured solubility data but also as a much improved and generalizable training data source for building data-driven models.
Content
In addition to curated experimental solubility values, AqSolDB contains relevant topological and physicochemical 2D descriptors calculated with RDKit, as well as validated molecular representations of each compound.
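As a rough illustration of how 2D descriptors of this kind can be computed with RDKit, the sketch below parses a SMILES string and returns a canonical SMILES plus a handful of descriptors. The function name `describe` and the particular descriptors selected are assumptions made for illustration; they are not necessarily the exact columns or the validation procedure used to build AqSolDB.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def describe(smiles):
    """Return canonical SMILES and a few 2D descriptors, or None if parsing fails.

    Hypothetical helper: the descriptor selection is illustrative only.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for SMILES it cannot parse
        return None
    return {
        "SMILES": Chem.MolToSmiles(mol),  # canonical form
        "MolWt": Descriptors.MolWt(mol),  # average molecular weight
        "MolLogP": Descriptors.MolLogP(mol),  # Crippen logP estimate
        "TPSA": Descriptors.TPSA(mol),  # topological polar surface area
        "NumHDonors": Descriptors.NumHDonors(mol),
        "NumHAcceptors": Descriptors.NumHAcceptors(mol),
        "NumRotatableBonds": Descriptors.NumRotatableBonds(mol),
        "RingCount": Descriptors.RingCount(mol),
    }


if __name__ == "__main__":
    # Ethanol, written in a non-canonical order on purpose
    print(describe("OCC"))
```

Checking for a `None` result before computing descriptors is a simple way to catch molecular representations that RDKit cannot interpret.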
Citation
If you use AqSolDB in your study, please cite the following paper.
Paper: Nature Scientific Data - https://doi.org/10.1038/s41597-019-0151-1
Reproducible code: Code Ocean - https://doi.org/10.24433/CO.1992938.v1
Sources of AqSolDB
eChemPortal - The Global Portal to Information on Chemical Substances. https://www.echemportal.org/.
Meylan, W. M. Preliminary Report: Water Solubility Estimation by Base Compound Modification. Environmental Science Center, Syracuse, NY (1995).
Raevsky, O. A., Grigorev, V. Y., Polianczyk, D. E., Raevskaja, O. E. & Dearden, J. C. Calculation of aqueous solubility of crystalline un-ionized organic chemicals and drugs based on structural similarity and physicochemical descriptors. Journal of Chemical Information and Modeling 54, 683-691 (2014).
Meylan, W. M. & Howard, P. H. Upgrade of PCGEMS Water Solubility Estimation Method. Environmental Science Center, Syracuse, NY (1994).
Huuskonen, J. Estimation of aqueous solubility for a diverse set of organic compounds based on molecular topology. Journal of Chemical Information and Computer Sciences 40, 773-777 (2000).
Wang, J., Hou, T. & Xu, X. Aqueous solubility prediction based on weighted atom type counts and solvent accessible surface areas. Journal of Chemical Information and Modeling 49, 571-581 (2009).
Delaney, J. S. ESOL: estimating aqueous solubility directly from molecular structure. Journal of Chemical Information and Computer Sciences 44, 1000-1005 (2004).
Llinas, A., Glen, R. C. & Goodman, J. M. Solubility challenge: can you predict solubilities of 32 molecules using a database of 100 reliable measurements? Journal of Chemical Information and Modeling 48, 1289-1303 (2008).