OpenML

QSAR-Bioconcentration-classes-dataset

active ARFF CC0: Public Domain Visibility: public Uploaded 24-03-2022 by Dustin Carrion

Source: UCI Machine Learning Repository

Content

A dataset of manually curated bioconcentration factor (BCF) values for 779 chemicals was used to determine the mechanisms of bioconcentration, i.e. to predict whether a chemical (1) is mainly stored within lipid tissues, (2) has additional storage sites (e.g. proteins), or (3) is metabolized/eliminated. The data were randomly split into a training set of 584 compounds (75%) and a test set of 195 compounds (25%), preserving the proportion between the classes. Two QSAR classification trees were developed using the CART (Classification and Regression Trees) machine learning technique coupled with Genetic Algorithms.

The file contains the nine selected Dragon descriptors along with CAS, SMILES, experimental BCF, experimental/predicted KOW, and mechanistic class (1, 2, 3). Further details on model development and performance, along with descriptor definitions and interpretation, are provided in the original manuscript (Grisoni et al., 2016).

Relevant Papers:

F. Grisoni, V. Consonni, M. Vighi, S. Villa, R. Todeschini (2016). Investigating the mechanisms of bioconcentration through QSAR classification trees. Environment International, 88, 198-205.

Citation Request:

The dataset is freeware and may be used if proper reference is given to the authors. Please refer to the following papers:

F. Grisoni, V. Consonni, M. Vighi, S. Villa, R. Todeschini (2016). Investigating the mechanisms of bioconcentration through QSAR classification trees. Environment International, 88, 198-205.

F. Grisoni, V. Consonni, S. Villa, M. Vighi, R. Todeschini (2015). QSAR models for bioconcentration: Is the increase in the complexity justified by more accurate predictions? Chemosphere, 127, 171-179.
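The description mentions a random split that preserves the proportion between the classes (584 of 779 compounds is about 75%). The sketch below shows one common way to do such a stratified split. It is not the authors' procedure or code: the function name `stratified_split`, the seed, and the toy labels are assumptions made for illustration, and the toy labels do not reflect the real class distribution.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), sampling each class separately.

    Hypothetical helper: shuffles the indices of each class on its own and
    sends roughly `test_fraction` of them to the test set, so class
    proportions are approximately the same in both parts.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels for illustration only (NOT the real class counts).
toy_labels = [1] * 40 + [2] * 8 + [3] * 12
train_idx, test_idx = stratified_split(toy_labels)

train_counts = Counter(toy_labels[i] for i in train_idx)
test_counts = Counter(toy_labels[i] for i in test_idx)
print("train:", dict(train_counts))
print("test: ", dict(test_counts))
```

With the real data there is no need to redo the split, since the `Set` column already records which part each compound belongs to.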

14 features

Name	Type	Unique values	Missing values
CAS	string	779	0
SMILES	string	779	0
Set	string	2	0
nHM	numeric	11	0
piPC09	numeric	322	0
PCD	numeric	224	0
X2Av	numeric	63	0
MLOGP	numeric	346	0
ON1V	numeric	261	0
N-072	numeric	4	0
B02[C-N]	numeric	2	0
F04[C-O]	numeric	23	0
Class	numeric	3	0
logBCF	numeric	391	0
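As a rough illustration of how these columns might be used, the sketch below reads rows with the 14 feature names listed above, keeps the nine numeric columns that appear to be the Dragon descriptors, and groups the rows by the value of `Set`. The helper `split_by_set`, the CSV layout, the `Set` values `Train`/`Test`, and every value in the toy rows are assumptions for illustration only; the actual file is distributed as ARFF and its `Set` labels may differ.

```python
import csv
import io

# The nine numeric columns other than Class and logBCF, taken from the
# feature list; assumed here to be the "selected Dragon descriptors (9)".
DESCRIPTORS = [
    "nHM", "piPC09", "PCD", "X2Av", "MLOGP",
    "ON1V", "N-072", "B02[C-N]", "F04[C-O]",
]


def split_by_set(rows):
    """Group (descriptor_vector, class_label) pairs by the 'Set' column."""
    groups = {}
    for row in rows:
        features = [float(row[name]) for name in DESCRIPTORS]
        label = int(float(row["Class"]))  # tolerate "1" as well as "1.0"
        groups.setdefault(row["Set"], []).append((features, label))
    return groups


# Placeholder rows: identifiers and numbers are made up, not real records.
TOY_CSV = """\
CAS,SMILES,Set,nHM,piPC09,PCD,X2Av,MLOGP,ON1V,N-072,B02[C-N],F04[C-O],Class,logBCF
toy-1,C,Train,0,0.0,1.2,0.50,1.5,0.9,0,0,0,1,0.8
toy-2,CC,Train,1,3.1,2.0,0.45,3.2,1.1,0,1,2,3,2.1
toy-3,CCC,Test,0,2.4,1.8,0.48,2.7,1.0,1,0,1,2,1.6
"""

groups = split_by_set(csv.DictReader(io.StringIO(TOY_CSV)))
for set_name, samples in groups.items():
    print(set_name, len(samples), "rows,", len(samples[0][0]), "descriptors")
```

Keeping `CAS`, `SMILES`, and `logBCF` out of the feature vector is a design choice of this sketch: the first two are identifiers, and `logBCF` is an experimental outcome rather than a descriptor.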