Data Description
This is a data set of Physicochemical Properties of Protein Tertiary Structure. The data set is taken from CASP 5-9. There are 45730 decoys and size varying from 0 to 21 armstrong.
The goal of the dataset is to predict the size of the residue for a tertiary protein structure (a 3d protein structure). Once linked in the protein chain, an individual amino acid is called a residue. The target feature is root mean square error of the residue.
Attribute Description
1. *RMSD* - size of the residue
2. *F1* - total surface area
3. *F2* - non polar exposed area
4. *F3* - fractional area of exposed non polar residue
5. *F4* - fractional area of exposed non polar part of residue
6. *F5* - molecular mass weighted exposed area
7. *F6* - average deviation from standard exposed area of residue
8. *F7* - Euclidian distance
9. *F8* - secondary structure penalty
10. *F9* - Spacial Distribution constraints (N,K Value)