Materials
-
Supplement
- Supplementary materials.
-
Training dataset
- Dataset used to develop the method (feature selection and parameterization of
the prediction algorithm) with a 5-fold cross-validation protocol.
-
Test dataset
- Dataset developed using PDB depositions from before April 2008 and used to
evaluate our method and compare it with existing predictors. Shares up to 30%
similarity with the training dataset.
-
Experimental
- Dataset developed using experimentally validated data extracted from publications
between 2008 and 2012. Shares up to 30% similarity with the training dataset.
-
Test 2012
- Dataset developed using PDB depositions from 2012. Shares up to 30%
similarity with the training dataset.
-
Negative dataset
- Dataset developed using PDB depositions between January 2010 and March 2012,
consisting of ordered apo structures.

The datasets are provided as FASTA-like text files with three lines per protein.
The first line contains the sequence ID (the PDB ID of the MoRF segment and the UniProt ID of the parent sequence) and the secondary structure of the MoRF region in the bound state.
The second and third lines contain the amino acid (AA) sequence and the annotation of the MoRF residues (1 = MoRF, 0 = non-MoRF), respectively.
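
The three-line layout described above could be read with a short script along the following lines. This is only a sketch under stated assumptions: the names `MorfRecord`, `parse_records` and `morf_positions` are hypothetical, the header line is kept as raw text because its exact field layout is not specified here, a leading `>` is stripped only if present, and the sample record is made up for illustration and is not taken from the datasets.

```python
from typing import Iterator, List, NamedTuple


class MorfRecord(NamedTuple):
    """One protein entry (hypothetical container, not part of the datasets)."""
    header: str      # raw first line: IDs and secondary structure, left unparsed
    sequence: str    # amino acid sequence (second line)
    annotation: str  # per-residue labels (third line): '1' = MoRF, '0' = non-MoRF


def parse_records(text: str) -> Iterator[MorfRecord]:
    """Yield records from text that holds three non-empty lines per protein."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected three lines per protein")
    for i in range(0, len(lines), 3):
        header, sequence, annotation = lines[i:i + 3]
        if len(sequence) != len(annotation):
            raise ValueError(f"sequence/annotation length mismatch in: {header}")
        if set(annotation) - {"0", "1"}:
            raise ValueError(f"annotation must contain only 0 and 1 in: {header}")
        # Assumption: a FASTA-style '>' may or may not prefix the header line.
        yield MorfRecord(header.lstrip(">"), sequence, annotation)


def morf_positions(record: MorfRecord) -> List[int]:
    """Return 1-based positions of residues annotated as MoRF."""
    return [i + 1 for i, flag in enumerate(record.annotation) if flag == "1"]


# Made-up record used only to exercise the parser.
example = ">hypothetical_header\nMKTAYIAKQR\n0001111000\n"
records = list(parse_records(example))
print(records[0].sequence, morf_positions(records[0]))
```

Checking that the sequence and annotation lines have equal length is a cheap way to catch records that were truncated or misaligned during download.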