Supplement for manuscript entitled "PFRES: Protein fold classification by using evolutionary information and predicted secondary structure"


This web page provides datasets associated with:


Chen K, Kurgan LA, 2007. PFRES: Protein fold classification by using evolutionary information and predicted secondary structure. Bioinformatics, 23(21):2843-2850

 
The supplementary files include a dataset of 908 sequences that was used to test the PFRES method.
 
Download dataset
 
The dataset includes the following information:
1. PDB id
2. primary sequence, using the single-letter encoding of amino acids
3. protein fold (SCOP fold label)

The fold names are encoded as follows (label in the dataset / name of fold in SCOP / ID in SCOP):
a    Globin-like    (a.1.*.*)
b    Cytochrome c    (a.3.*.*)
c    DNA/RNA-binding 3-helical bundle    (a.4.*.*)
d    Four-helical up-and-down bundle    (a.24.*.*)
e    4-helical cytokines    (a.26.*.*)
f    EF Hand-like    (a.39.*.*)
g    Immunoglobulin-like beta-sandwich    (b.1.*.*)
h    Cupredoxin-like    (b.6.*.*)
i    Viral coat and capsid proteins    (b.121.*.*)
j    Concanavalin A-like lectins/glucanases    (b.29.*.*)
k    SH3-like barrel    (b.34.*.*)
l    OB-fold    (b.40.*.*)
m    beta-Trefoil    (b.42.*.*)
n    Trypsin-like serine proteases    (b.47.*.*)
o    Lipocalins    (b.60.*.*)
p    TIM beta/alpha-barrel    (c.1.*.*)
q    FAD/NAD(P)-binding domain    (c.3.*.*)
r    Flavodoxin-like    (c.23.*.*)
s    NAD(P)-binding Rossmann-fold domains    (c.2.*.*)
t    P-loop containing nucleoside triphosphate hydrolases    (c.37.*.*)
u    Thioredoxin fold    (c.47.*.*)
v    Ribonuclease H-like motif    (c.55.*.*)
w    alpha/beta-Hydrolases    (c.69.*.*)
x    Periplasmic binding protein-like I    (c.93.*.*)
y    beta-Grasp    (d.15.*.*)
z    Ferredoxin-like    (d.58.*.*)
-    Small inhibitors, toxins, lectins    (g.3.*.*)
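
For readers who process the dataset programmatically, the label encoding above can be captured as a lookup table. The following Python sketch is only an illustration and is not part of PFRES; the names FOLD_LABELS and describe_fold are hypothetical, and the entries are transcribed from the table above.

```python
# Dataset label -> (SCOP fold name, SCOP fold ID), transcribed from the table above.
# FOLD_LABELS and describe_fold are illustrative names, not part of PFRES.
FOLD_LABELS = {
    "a": ("Globin-like", "a.1"),
    "b": ("Cytochrome c", "a.3"),
    "c": ("DNA/RNA-binding 3-helical bundle", "a.4"),
    "d": ("Four-helical up-and-down bundle", "a.24"),
    "e": ("4-helical cytokines", "a.26"),
    "f": ("EF Hand-like", "a.39"),
    "g": ("Immunoglobulin-like beta-sandwich", "b.1"),
    "h": ("Cupredoxin-like", "b.6"),
    "i": ("Viral coat and capsid proteins", "b.121"),
    "j": ("Concanavalin A-like lectins/glucanases", "b.29"),
    "k": ("SH3-like barrel", "b.34"),
    "l": ("OB-fold", "b.40"),
    "m": ("beta-Trefoil", "b.42"),
    "n": ("Trypsin-like serine proteases", "b.47"),
    "o": ("Lipocalins", "b.60"),
    "p": ("TIM beta/alpha-barrel", "c.1"),
    "q": ("FAD/NAD(P)-binding domain", "c.3"),
    "r": ("Flavodoxin-like", "c.23"),
    "s": ("NAD(P)-binding Rossmann-fold domains", "c.2"),
    "t": ("P-loop containing nucleoside triphosphate hydrolases", "c.37"),
    "u": ("Thioredoxin fold", "c.47"),
    "v": ("Ribonuclease H-like motif", "c.55"),
    "w": ("alpha/beta-Hydrolases", "c.69"),
    "x": ("Periplasmic binding protein-like I", "c.93"),
    "y": ("beta-Grasp", "d.15"),
    "z": ("Ferredoxin-like", "d.58"),
    "-": ("Small inhibitors, toxins, lectins", "g.3"),
}


def describe_fold(label: str) -> str:
    """Return a readable description for a single-character dataset label."""
    name, scop_id = FOLD_LABELS[label]
    return f"{name} ({scop_id}.*.*)"
```

For example, describe_fold("p") returns "TIM beta/alpha-barrel (c.1.*.*)".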

The primary sequences include X, which denotes a "special" amino acid:
- a residue originally annotated as not being one of the 20 standard amino acids
- or a residue that marks the connection between two distinct segments of one sequence; for example, in a.24.25.1 the domain {A:118-211,A:372-450} consists of two segments, and X marks the connection between them.
X should be ignored while performing predictions and preparing input data (this assumption was followed when evaluating PFRES).
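
One simple way to ignore X when preparing input data is to remove it from each sequence before further processing. The Python sketch below assumes that the sequence has already been read into a string and that X appears in upper case. The function name strip_x and the example sequence are made up for illustration and are not taken from the dataset or from the PFRES implementation.

```python
def strip_x(sequence: str) -> str:
    """Remove every 'X' placeholder from a single-letter amino acid sequence."""
    return sequence.replace("X", "")


# Made-up example sequence (not from the dataset).
cleaned = strip_x("MKVXLLAXXGT")
print(cleaned)  # MKVLLAGT
```

Note that removing X joins the residues on either side of it, so positions in the cleaned sequence no longer match the original numbering.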