Accurate prediction of protein folding rates from sequence and sequence-derived residue flexibility and solvent accessibility

 
This web page provides the datasets associated with the following publication:


Gao J, Zhang T, Zhang H, Shen S, Ruan J, Kurgan LA, 2010. Accurate prediction of protein folding rates from sequence and sequence-derived residue flexibility and solvent accessibility. Proteins: Structure, Function, and Bioinformatics, accepted


Three datasets are used:
- the training dataset D62 can be downloaded from the D62 dataset link. This dataset was originally introduced in (Ivankov and Finkelstein, 2004).
- the low sequence identity (with respect to the D62 dataset) dataset D8 can be downloaded from D8 dataset link. This dataset was originally introduced in (Jiang et al. 2009).
- the low sequence identity (with respect to the D62 dataset) dataset D16 can be downloaded from D16 dataset link. This dataset was prepared using depositions from the kineticDB database (Bogatyreva et al. 2009).
 
The datasets are in a comma-separated text file format with four columns that correspond to protein id (from PDB), protein sequence, annotation of the kinetic type (two-state or multi-state), and the folding rate defined as log10(kf).
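
The four-column layout described above could be read with a short Python sketch like the one below. It is only an illustration under stated assumptions: the column order follows the description on this page, the files are assumed to have no quoting peculiarities, rows whose last field is not numeric (for example a possible header row) are skipped, and the names `FoldingRecord` and `parse_folding_dataset` are hypothetical. The sample rows are made-up placeholders and are not taken from D62, D8, or D16; the exact labels used for the kinetic type in the real files may also differ.

```python
import csv
import io
from typing import NamedTuple


class FoldingRecord(NamedTuple):
    # Hypothetical container; field names are chosen for illustration only.
    pdb_id: str
    sequence: str
    kinetic_type: str
    log10_kf: float


def parse_folding_dataset(handle):
    """Parse rows of: protein id, sequence, kinetic type, log10(kf)."""
    records = []
    for row in csv.reader(handle):
        # Skip blank lines and rows that do not have exactly four columns.
        if len(row) != 4:
            continue
        pdb_id, sequence, kinetic_type, rate = (field.strip() for field in row)
        try:
            log10_kf = float(rate)
        except ValueError:
            # Non-numeric rate: assumed to be a header row, so it is skipped.
            continue
        records.append(FoldingRecord(pdb_id, sequence, kinetic_type, log10_kf))
    return records


# Made-up placeholder rows, NOT real entries from D62, D8, or D16.
sample = io.StringIO(
    "id1,ACDEFGHIK,two-state,2.5\n"
    "id2,LMNPQRSTVWY,multi-state,-0.5\n"
)
records = parse_folding_dataset(sample)
for rec in records:
    # kf is recovered by undoing the base-10 logarithm.
    print(rec.pdb_id, rec.kinetic_type, len(rec.sequence), 10 ** rec.log10_kf)
```

To read a downloaded file instead of the in-memory sample, the same function can be given an open file handle, for example `open(path, newline="")`, where `path` points to the local copy of the dataset.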