This web page provides datasets associated with:
Gao J, Zhang T, Zhang H, Shen S, Ruan J, Kurgan LA, 2010. Accurate prediction of protein folding rates from sequence and sequence-derived residue flexibility and solvent accessibility. Proteins: Structure, Function, and Bioinformatics, accepted.
Three datasets are used:
- the training dataset D62 can be downloaded from the D62 dataset link. This dataset was originally introduced in (Ivankov and Finkelstein, 2004).
- the D8 dataset, which has low sequence identity with respect to the D62 dataset, can be downloaded from the D8 dataset link. This dataset was originally introduced in (Jiang et al. 2009).
- the D16 dataset, which has low sequence identity with respect to the D62 dataset, can be downloaded from the D16 dataset link. This dataset was prepared using depositions from the kineticDB database (Bogatyreva et al. 2009).
Each dataset is a comma-separated text file with four columns: protein ID (from PDB), protein sequence, annotation of the kinetic type (two-state or multi-state), and the folding rate defined as log10(kf).
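
The four-column layout described above can be read with a few lines of Python. The following is a minimal sketch, not code from the paper. It assumes that the files have no header row, that fields are not quoted in any unusual way, and that the columns appear in the order listed. The names FoldingRecord and parse_folding_records are hypothetical, and the sample rows are placeholders that are not taken from D62, D8, or D16. The kinetic type is kept as the raw string from the file, since the exact labels used there are not stated here.

```python
import csv
import io
from typing import List, NamedTuple


class FoldingRecord(NamedTuple):
    """One row of a dataset file (hypothetical container, not from the paper)."""
    pdb_id: str
    sequence: str
    kinetic_type: str  # raw annotation string, e.g. two-state or multi-state
    log10_kf: float    # folding rate, stored as log10(kf)


def parse_folding_records(text: str) -> List[FoldingRecord]:
    """Parse comma-separated text with four columns per row.

    Assumes no header row; blank lines are skipped.
    """
    records = []
    for row in csv.reader(io.StringIO(text)):
        # Skip empty lines and lines that contain only whitespace.
        if not row or all(not field.strip() for field in row):
            continue
        if len(row) != 4:
            raise ValueError(f"expected 4 columns, got {len(row)}: {row!r}")
        pdb_id, sequence, kinetic_type, rate = (field.strip() for field in row)
        records.append(FoldingRecord(pdb_id, sequence, kinetic_type, float(rate)))
    return records


# Placeholder rows for illustration only; NOT taken from D62, D8, or D16.
SAMPLE = (
    "XXXX,ACDEFGHIK,two-state,1.5\n"
    "YYYY,LMNPQRSTVWY,multi-state,-0.25\n"
)

records = parse_folding_records(SAMPLE)
for record in records:
    # Convert log10(kf) back to kf when the raw rate is needed.
    kf = 10 ** record.log10_kf
    print(record.pdb_id, len(record.sequence), record.kinetic_type, kf)
```

To use this with a downloaded file, read the file contents into a string and pass it to parse_folding_records. If the actual files turn out to include a header row, skip the first line before parsing.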