

Status: active. Format: ARFF. License: CC-BY. Visibility: public. Uploaded 19-05-2021 by Meilina Reksoprodjo.

Dataset
This is a dataset of physicochemical properties of protein tertiary structure, taken from CASP 5-9. It contains 45730 decoys, with sizes (RMSD) ranging from 0 to 21 Å.

Attribute description
RMSD - Size of the residue (root-mean-square deviation, in Å).
F1 - Total surface area.
F2 - Non-polar exposed area.
F3 - Fractional area of exposed non-polar residue.
F4 - Fractional area of exposed non-polar part of residue.
F5 - Molecular-mass-weighted exposed area.
F6 - Average deviation from standard exposed area of residue.
F7 - Euclidean distance.
F8 - Secondary structure penalty.
F9 - Spatial distribution constraints (N, K value).
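The description does not define RMSD beyond "size of the residue". In its conventional sense, RMSD is the root-mean-square deviation between N matched atom positions of a decoy (x_i) and the native structure (y_i): RMSD = sqrt((1/N) * sum_i ||x_i - y_i||^2), in Å. The sketch below assumes that definition and assumes the two structures are already superimposed (no optimal alignment is performed). The function `rmsd` and the toy coordinates are hypothetical and not taken from the dataset.

```python
import math


def rmsd(decoy, native):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) tuples.

    Assumes atoms are already matched one-to-one and the structures are already
    superimposed; no optimal alignment is performed here.
    """
    if not decoy or len(decoy) != len(native):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(decoy, native):
        # Squared Euclidean distance between one pair of matched atoms
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(decoy))


# Hypothetical two-atom structures; each decoy atom is displaced by 5 Å
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
decoy = [(0.0, 3.0, 4.0), (1.0, 3.0, 4.0)]
print(rmsd(decoy, native))  # 5.0
```

Both atoms are displaced by a (3, 4) offset, so each squared distance is 25 and the RMSD is 5.0 Å.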

10 features

RMSD (numeric): 15903 unique values, 0 missing
F1 (numeric): 39916 unique values, 0 missing
F2 (numeric): 39863 unique values, 0 missing
F3 (numeric): 20089 unique values, 0 missing
F4 (numeric): 40374 unique values, 0 missing
F5 (numeric): 41868 unique values, 0 missing
F6 (numeric): 39155 unique values, 0 missing
F7 (numeric): 39450 unique values, 0 missing
F8 (numeric): 341 unique values, 0 missing
F9 (numeric): 37299 unique values, 0 missing
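Each feature line above reports a column's distinct-value count and missing-value count. A minimal standard-library sketch of how such a summary could be computed, assuming the columns are held as Python lists and missing entries appear as `None` or NaN (in the ARFF file itself they would be written as `?`). The function `summarize_columns` and the toy values are illustrative only.

```python
import math


def summarize_columns(columns):
    """Return {name: (unique_values, missing)} for a dict of column lists.

    Missing values are assumed to be None or float NaN.
    """
    summary = {}
    for name, values in columns.items():
        present = [
            v for v in values
            if v is not None and not (isinstance(v, float) and math.isnan(v))
        ]
        # Distinct values are counted over non-missing entries only
        summary[name] = (len(set(present)), len(values) - len(present))
    return summary


# Toy columns, not taken from the dataset
toy = {
    "RMSD": [1.0, 2.5, 2.5, 4.0],
    "F8": [0.0, 0.0, 3.0, None],
}
for name, (unique, missing) in summarize_columns(toy).items():
    print(f"{name}: {unique} unique values, {missing} missing")
# RMSD: 3 unique values, 0 missing
# F8: 2 unique values, 1 missing
```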

19 properties

Number of instances (rows) of the dataset.
Number of attributes (columns) of the dataset.
Number of distinct values of the target attribute (if it is nominal).
Number of missing values in the dataset.
Number of instances with at least one value missing.
Number of numeric attributes.
Number of nominal attributes.
Number of binary attributes.
Percentage of binary attributes.
Percentage of instances having missing values.
Average class difference between consecutive instances.
Percentage of missing values.
Number of attributes divided by the number of instances.
Percentage of numeric attributes.
Percentage of instances belonging to the most frequent class.
Percentage of nominal attributes.
Number of instances belonging to the most frequent class.
Percentage of instances belonging to the least frequent class.
Number of instances belonging to the least frequent class.
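Several of the properties listed above are simple counts and ratios. The sketch below computes the ones that apply to an all-numeric dataset such as this one; the class-based properties (distinct target values, most and least frequent class) assume a nominal target, whereas RMSD is numeric. The function `dataset_qualities`, its key names, and the toy rows are hypothetical.

```python
import math


def is_missing(value):
    # Assumed encoding: None or float NaN marks a missing value
    return value is None or (isinstance(value, float) and math.isnan(value))


def dataset_qualities(rows, n_attributes):
    """Compute count- and ratio-based properties for an all-numeric dataset.

    `rows` is a list of rows, each holding `n_attributes` values.
    """
    n_instances = len(rows)
    n_missing = sum(is_missing(v) for row in rows for v in row)
    n_rows_missing = sum(any(is_missing(v) for v in row) for row in rows)
    n_cells = n_instances * n_attributes
    return {
        "instances": n_instances,
        "attributes": n_attributes,
        "missing_values": n_missing,
        "instances_with_missing": n_rows_missing,
        "numeric_attributes": n_attributes,  # every attribute is numeric here
        "nominal_attributes": 0,
        "pct_missing_values": 100.0 * n_missing / n_cells,
        "pct_instances_with_missing": 100.0 * n_rows_missing / n_instances,
        "attributes_per_instance": n_attributes / n_instances,
        "pct_numeric_attributes": 100.0,
    }


# Toy rows, not taken from the dataset
rows = [[1.0, 2.0], [None, 3.0], [4.0, 5.0], [6.0, float("nan")]]
for name, value in dataset_qualities(rows, n_attributes=2).items():
    print(name, value)
```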

2 tasks

0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: root_mean_squared_error - target_feature: RMSD
0 runs - estimation_procedure: 33% Holdout set - target_feature: RMSD
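The two tasks above differ in how the data is split for evaluation. The sketch below illustrates both estimation procedures with the standard library: shuffled 10-fold cross-validation scored by root mean squared error, and a 33% holdout split. It is only an illustration under assumptions: OpenML tasks come with their own predefined splits, which these seeded shuffles will not reproduce; averaging per-fold RMSE is one common aggregation, not necessarily OpenML's; and the mean-predictor baseline and all function names are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    # Root mean squared error over paired targets and predictions
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def kfold_indices(n, k=10, seed=0):
    """Shuffle row indices and deal them into k near-equal, disjoint folds."""
    if n < k:
        raise ValueError("need at least k rows")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def holdout_indices(n, test_fraction=0.33, seed=0):
    """Return (train, test) index lists with about test_fraction of rows held out."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def mean_baseline_cv_rmse(y, k=10, seed=0):
    """Average per-fold RMSE of a hypothetical model that predicts the training mean."""
    scores = []
    for test in kfold_indices(len(y), k, seed):
        held_out = set(test)
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        mean = sum(train_y) / len(train_y)
        scores.append(rmse([y[i] for i in test], [mean] * len(test)))
    return sum(scores) / len(scores)


# Toy targets standing in for RMSD values
y = [0.5 * i for i in range(40)]
print(mean_baseline_cv_rmse(y))
train, test = holdout_indices(len(y))
print(len(train), len(test))  # 27 13
```

A real model would replace the training-mean prediction inside `mean_baseline_cv_rmse`; the baseline is useful mainly as a floor that any model fitted on F1 to F9 should beat.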