

Status: active. Format: ARFF. License: CC BY 4.0. Visibility: public. Uploaded 16-06-2022 by Sebastian Fischer.
Data source
Davide Ballabio, Matteo Cassotti, Viviana Consonni, Roberto Todeschini, Milano Chemometrics and QSAR Research Group, Università degli Studi di Milano-Bicocca, Milano (Italy). This dataset was obtained from the UCI repository.

Dataset description
This dataset was used to develop quantitative regression QSAR models to predict acute aquatic toxicity towards the fish Pimephales promelas (fathead minnow) on a set of 908 chemicals. The model response is LC50, the concentration that causes death in 50% of test fish over a test duration of 96 hours. The model comprises 6 molecular descriptors: MLOGP (molecular properties), CIC0 (information indices), GATS1i (2D autocorrelations), NdssC (atom-type counts), NdsCH (atom-type counts), and SM1_Dz(Z) (2D matrix-based descriptors). Details can be found in the following reference: M. Cassotti, D. Ballabio, R. Todeschini, V. Consonni. A similarity-based QSAR model for predicting acute toxicity towards the fathead minnow (Pimephales promelas), SAR and QSAR in Environmental Research (2015), 26, 217-243; doi: 10.1080/1062936X.2015.1018938

Attribute description
6 molecular descriptors and 1 quantitative experimental response:
1) CIC0
2) SM1_Dz(Z)
3) GATS1i
4) NdsCH
5) NdssC
6) MLOGP
7) quantitative response, LC50 [-LOG(mol/L)]

Related Studies
Please cite the paper referenced above (Cassotti et al., 2015) if you publish results based on the QSAR fish toxicity dataset.

Bibtex
@misc{Dua:2019,
  author = "Dua, Dheeru and Graff, Casey",
  year = "2017",
  title = "{UCI} Machine Learning Repository",
  url = "",
  institution = "University of California, Irvine, School of Information and Computer Sciences"
}

7 features

LC50 (target): numeric, 827 unique values, 0 missing
CIC0: numeric, 502 unique values, 0 missing
SM1_Dz: numeric, 186 unique values, 0 missing
GATS1i: numeric, 557 unique values, 0 missing
NdsCH: numeric, 5 unique values, 0 missing
NdssC: numeric, 7 unique values, 0 missing
MLOGP: numeric, 559 unique values, 0 missing
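
The LC50 target is reported as -LOG(mol/L), so it may help to convert predictions back to a molar concentration when interpreting them. The following is a minimal sketch, assuming the logarithm is base 10; the function names are hypothetical and not part of the dataset or of OpenML.

```python
import math


def neg_log_to_molar(neg_log_lc50: float) -> float:
    """Convert an LC50 value from -log10(mol/L) to mol/L (assumes base-10 log)."""
    return 10.0 ** (-neg_log_lc50)


def molar_to_neg_log(lc50_mol_per_l: float) -> float:
    """Convert an LC50 value from mol/L to -log10(mol/L)."""
    if lc50_mol_per_l <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(lc50_mol_per_l)


if __name__ == "__main__":
    # Illustrative value only, not a row taken from the dataset.
    print(neg_log_to_molar(3.0))
```

On this scale a larger target value corresponds to a lower lethal concentration, that is, a more toxic chemical.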

19 properties

Number of instances (rows) of the dataset.
Number of attributes (columns) of the dataset.
Number of distinct values of the target attribute (if it is nominal).
Number of missing values in the dataset.
Number of instances with at least one value missing.
Number of numeric attributes.
Number of nominal attributes.
Percentage of binary attributes.
Percentage of instances having missing values.
Percentage of missing values.
Average class difference between consecutive instances.
Percentage of numeric attributes.
Number of attributes divided by the number of instances.
Percentage of nominal attributes.
Percentage of instances belonging to the most frequent class.
Number of instances belonging to the most frequent class.
Percentage of instances belonging to the least frequent class.
Number of instances belonging to the least frequent class.
Number of binary attributes.
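
Several of the properties listed above are simple counts and ratios that can be recomputed from the data table. The sketch below assumes the table is held as a list of rows with `None` marking a missing value; the function name and dictionary keys are illustrative choices, and the placeholder table only mimics the shape given in the description (908 rows, 7 columns), not the real values.

```python
from typing import Dict, List, Optional

Row = List[Optional[float]]


def simple_qualities(rows: List[Row]) -> Dict[str, float]:
    """Compute a few count-based dataset properties from a numeric table."""
    n_instances = len(rows)
    n_features = len(rows[0]) if rows else 0
    # Total number of missing cells, and rows with at least one missing cell.
    n_missing = sum(value is None for row in rows for value in row)
    n_incomplete = sum(any(value is None for value in row) for row in rows)
    n_cells = n_instances * n_features
    return {
        "NumberOfInstances": n_instances,
        "NumberOfFeatures": n_features,
        "NumberOfMissingValues": n_missing,
        "NumberOfInstancesWithMissingValues": n_incomplete,
        "PercentageOfMissingValues": 100.0 * n_missing / n_cells if n_cells else 0.0,
        "PercentageOfInstancesWithMissingValues": (
            100.0 * n_incomplete / n_instances if n_instances else 0.0
        ),
        # Number of attributes divided by the number of instances.
        "Dimensionality": n_features / n_instances if n_instances else 0.0,
    }


if __name__ == "__main__":
    # Placeholder table with the same shape as the dataset; values are dummies.
    placeholder = [[0.0] * 7 for _ in range(908)]
    for name, value in simple_qualities(placeholder).items():
        print(name, value)
```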

1 task

0 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: LC50
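
The task's estimation procedure, 10-fold cross-validation, can be illustrated with a plain index splitter. This is a minimal sketch of generic k-fold splitting, not the fixed splits that OpenML stores for the task; the function name, the shuffling, and the seed are assumptions made for illustration.

```python
import random
from typing import List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 10, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs covering all samples once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # The first `extra` folds receive one additional sample each.
    base, extra = divmod(n_samples, k)
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size

    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    # 908 is the number of chemicals given in the dataset description.
    for train, test in k_fold_indices(908, k=10):
        print(len(train), len(test))
```

Each model would be fit on the training indices with LC50 as the target and scored on the held-out fold, with the ten scores averaged at the end.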