physiochemical_protein

active ARFF CC BY 4.0 Visibility: public Uploaded 22-12-2022 by Sebastian Fischer
Data Description

This is a data set of physicochemical properties of protein tertiary structure, taken from CASP 5-9. It contains 45730 decoys, with sizes varying from 0 to 21 angstroms. The goal is to predict the size of the residue for a tertiary (3D) protein structure. Once linked in a protein chain, an individual amino acid is called a residue. The target feature, RMSD, is the root mean square deviation of the residue.

Attribute Description

1. *RMSD* - size of the residue
2. *F1* - total surface area
3. *F2* - non-polar exposed area
4. *F3* - fractional area of exposed non-polar residue
5. *F4* - fractional area of exposed non-polar part of residue
6. *F5* - molecular mass weighted exposed area
7. *F6* - average deviation from standard exposed area of residue
8. *F7* - Euclidean distance
9. *F8* - secondary structure penalty
10. *F9* - spatial distribution constraints (N,K value)
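To make the target quantity concrete, here is a minimal sketch of a root mean square deviation between two sets of 3D coordinates. The function name `rmsd` and the toy coordinates are hypothetical and not taken from the dataset, and the sketch assumes the two structures are already superimposed (structural RMSD is normally computed after an optimal alignment, which is omitted here).

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized lists of 3D points.

    Assumes the two point sets are already aligned; no superposition is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared Euclidean distances between corresponding points.
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical toy coordinates (in angstroms), not taken from the dataset.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
decoy = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]

print(rmsd(native, decoy))  # every point is shifted by 1 angstrom, so this prints 1.0
```

An RMSD of 0 means the two point sets coincide; larger values mean the decoy lies further from the reference structure.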

10 features

RMSD (target): numeric, 15903 unique values, 0 missing
F1: numeric, 39916 unique values, 0 missing
F2: numeric, 39863 unique values, 0 missing
F3: numeric, 20089 unique values, 0 missing
F4: numeric, 40374 unique values, 0 missing
F5: numeric, 41868 unique values, 0 missing
F6: numeric, 39155 unique values, 0 missing
F7: numeric, 39450 unique values, 0 missing
F8: numeric, 341 unique values, 0 missing
F9: numeric, 37299 unique values, 0 missing

19 properties

Number of instances (rows) of the dataset: 45730
Number of attributes (columns) of the dataset: 10
Number of distinct values of the target attribute (if it is nominal): 0
Number of missing values in the dataset: 0
Number of instances with at least one value missing: 0
Number of numeric attributes: 10
Number of nominal attributes: 0
Percentage of binary attributes: 0
Percentage of instances having missing values: 0
Percentage of missing values: 0
Average class difference between consecutive instances: -5.82
Percentage of numeric attributes: 100
Number of attributes divided by the number of instances: 0
Percentage of nominal attributes: 0
Percentage of instances belonging to the most frequent class: (no value listed)
Number of instances belonging to the most frequent class: (no value listed)
Percentage of instances belonging to the least frequent class: (no value listed)
Number of instances belonging to the least frequent class: (no value listed)
Number of binary attributes: 0
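Several of these properties follow from the row and column counts by simple arithmetic. The sketch below recomputes three of them from the counts listed on this page. The variable names are illustrative, and the rounding to two decimals is an assumption about how the page displays the attributes-per-instance ratio.

```python
# Counts as listed on this page.
n_instances = 45730
n_attributes = 10
n_numeric = 10
n_nominal = 0

# Percentages of attribute types.
pct_numeric = 100 * n_numeric / n_attributes
pct_nominal = 100 * n_nominal / n_attributes

# Attributes divided by instances: roughly 0.0002, which shows as 0
# if the displayed value is rounded to two decimals (an assumption).
dimensionality = n_attributes / n_instances

print(pct_numeric, pct_nominal, round(dimensionality, 2))  # prints: 100.0 0.0 0.0
```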

2 tasks

0 runs - estimation_procedure: 10-fold Crossvalidation - target_feature: RMSD
0 runs - estimation_procedure: 10-fold Crossvalidation - evaluation_measure: root_mean_squared_error - target_feature: RMSD
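As an illustration of what these tasks evaluate, here is a minimal sketch of 10-fold cross-validation scored with root mean squared error, using only the Python standard library. It is not OpenML's implementation: the fold assignment, the mean-value baseline, and the synthetic target values are all assumptions made for the example, whereas a real task ships its own predefined splits.

```python
import math
import random


def k_fold_indices(n_rows, k=10, seed=0):
    """Shuffle the row indices and deal them into k roughly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def cross_validate_mean_baseline(targets, k=10, seed=0):
    """Score a baseline that always predicts the training-fold mean."""
    scores = []
    for test_idx in k_fold_indices(len(targets), k, seed):
        held_out = set(test_idx)
        train = [targets[i] for i in range(len(targets)) if i not in held_out]
        prediction = sum(train) / len(train)
        test = [targets[i] for i in test_idx]
        scores.append(rmse(test, [prediction] * len(test)))
    return scores


# Synthetic stand-in for the RMSD column (values in the 0 to 21 range), not real data.
rng = random.Random(42)
toy_rmsd = [rng.uniform(0.0, 21.0) for _ in range(200)]

scores = cross_validate_mean_baseline(toy_rmsd)
print(len(scores), sum(scores) / len(scores))
```

Replacing the mean baseline with a regression model trained on F1 to F9 would give a comparable per-fold score for an actual learner.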