OpenML
qsar

Status: active. Format: ARFF. Visibility: public. Uploaded 27-01-2023 by Young Lee.
Tags: Geography, Health, study_340, study_341
The QSAR biodegradation dataset was built by the Milano Chemometrics and QSAR Research Group. The research leading to these results received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under Grant Agreement n. 238701 of the Marie Curie ITN Environmental Chemoinformatics (ECO) project.

The data have been used to develop QSAR (Quantitative Structure-Activity Relationship) models for studying the relationship between chemical structure and the biodegradation of molecules. Experimental biodegradation values for 1055 chemicals were collected from the webpage of the National Institute of Technology and Evaluation of Japan (NITE). Classification models were developed to discriminate ready (356) from not ready (699) biodegradable molecules using three modelling methods: k-Nearest Neighbours, Partial Least Squares Discriminant Analysis, and Support Vector Machines.

Details on the attributes (molecular descriptors) selected in each model can be found in the reference: Mansouri, K., Ringsted, T., Ballabio, D., Todeschini, R., Consonni, V. (2013). Quantitative Structure-Activity Relationship models for ready biodegradability of chemicals. Journal of Chemical Information and Modeling, 53, 867-878.

Source: https://archive.ics.uci.edu/ml/datasets/QSAR+biodegradation

41 features

Feature         Type     Unique values  Missing
class (target)  string   2              0
0               numeric  440            0
1               numeric  1022           0
7               numeric  188            0
11              numeric  384            0
12              numeric  756            0
13              numeric  373            0
14              numeric  510            0
16              numeric  167            0
17              numeric  125            0
21              numeric  352            0
26              numeric  329            0
27              numeric  205            0
29              numeric  470            0
30              numeric  553            0
35              numeric  705            0
36              numeric  624            0
38              numeric  862            0
2               numeric  11             0
4               numeric  16             0
5               numeric  13             0
6               numeric  15             0
8               numeric  15             0
9               numeric  12             0
10              numeric  21             0
15              numeric  24             0
31              numeric  8              0
32              numeric  11             0
33              numeric  16             0
37              numeric  8              0
40              numeric  17             0
39              nominal  5              0
20              nominal  4              0
28              nominal  2              0
23              nominal  2              0
3               nominal  4              0
22              nominal  13             0
34              nominal  8              0
19              nominal  4              0
25              nominal  4              0
24              nominal  2              0

19 properties

Number of instances (rows) of the dataset: 1055
Number of attributes (columns) of the dataset: 41
Number of distinct values of the target attribute (if it is nominal): 2
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 30
Number of nominal attributes: 10
Percentage of instances having missing values: 0
Average class difference between consecutive instances: 1
Percentage of missing values: 0
Percentage of numeric attributes: 73.17
Number of attributes divided by the number of instances: 0.04
Percentage of nominal attributes: 24.39
Percentage of instances belonging to the most frequent class: 66.26
Number of instances belonging to the most frequent class: 699
Percentage of instances belonging to the least frequent class: 33.74
Number of instances belonging to the least frequent class: 356
Number of binary attributes: 3
Percentage of binary attributes: 7.32
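The percentage and ratio properties follow directly from the raw counts listed on this page. The sketch below recomputes them as a consistency check. The variable names and the `pct` helper are illustrative and not part of OpenML, and the rounding to two decimals is an assumption made to match the values displayed here.

```python
# Raw counts copied from the properties listed on this page.
n_instances = 1055
n_attributes = 41
n_numeric = 30
n_nominal = 10
n_binary = 3
n_majority = 699   # most frequent class
n_minority = 356   # least frequent class


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals (assumed display precision)."""
    return round(100 * part / whole, 2)


derived = {
    "pct_numeric_attributes": pct(n_numeric, n_attributes),
    "pct_nominal_attributes": pct(n_nominal, n_attributes),
    "pct_binary_attributes": pct(n_binary, n_attributes),
    "pct_majority_class": pct(n_majority, n_instances),
    "pct_minority_class": pct(n_minority, n_instances),
    "attributes_per_instance": round(n_attributes / n_instances, 2),
}

for name, value in derived.items():
    print(f"{name}: {value}")
```

The two class counts sum to the number of instances (699 + 356 = 1055), and the recomputed figures agree with the listed 73.17, 24.39, 7.32, 66.26, 33.74 and 0.04.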

1 task

0 runs. estimation_procedure: 10-fold cross-validation, evaluation_measure: predictive_accuracy, target_feature: class
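To give a sense of what this task measures, here is a minimal sketch of 10-fold cross-validation scored by predictive accuracy, using a majority-class baseline on labels that only reproduce the class counts above (699 and 356). It does not load the real data. The label names `not_ready` and `ready`, the helper functions, and the plain (non-stratified) shuffled split are all assumptions made for illustration, and OpenML's own splits may differ.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_label(labels):
    """Return the most frequent label in a list."""
    return max(set(labels), key=labels.count)


# Placeholder labels matching the class counts on this page (names are hypothetical).
labels = ["not_ready"] * 699 + ["ready"] * 356

folds = k_fold_indices(len(labels), k=10)
correct = 0
for test_idx in folds:
    held_out = set(test_idx)
    # Train on the other nine folds, then predict the majority label for the held-out fold.
    train_labels = [labels[i] for i in range(len(labels)) if i not in held_out]
    guess = majority_label(train_labels)
    correct += sum(labels[i] == guess for i in test_idx)

# Pooled accuracy over all held-out predictions.
accuracy = correct / len(labels)
print(f"baseline accuracy: {accuracy:.4f}")
```

Because the majority class outnumbers the minority class in every training split, the baseline always predicts the majority label, so its pooled accuracy equals 699/1055, about 0.6626. Any model evaluated on this task should be compared against that figure rather than against 50%.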