CyPred: Accurate and high-throughput sequence-based prediction of cyclic proteins

CyPred webserver

Cyclic proteins (CPs) have circular chains with a continuous cycle of peptide bonds. Their unique structural traits give them greater stability, better receptor selectivity, and improved pharmacodynamic properties compared to their acyclic counterparts, making them promising targets for pharmaceutical and therapeutic applications. This website provides access to CyPred, a high-throughput sequence-based predictor of CPs, and to putative CPs predicted from archaeal, bacterial, and eukaryotic proteomes.

Please follow the three steps below to make predictions:

1. Upload a file with protein sequences, or paste them into the text area

The server accepts up to 75 000 FASTA-formatted protein sequences (maximum file size 40 MB). Either upload a file or enter each protein on a new line in the following text field (see Help for details):

2. Provide your e-mail address (required)

Please provide your e-mail address to be notified when results are ready.

3. Predict:

Click the button to launch the prediction.

Standalone version

CyPred is also available as a standalone application (download: CyPred.zip, 3.2 MB). The package includes a README.txt file that explains how to install and use the software. CyPred should run on any operating system with Java (version 6 or higher) installed.

Materials

    Datasets used to design and evaluate CyPred:
  • TRAINING dataset - Dataset used to develop the classifier, including feature selection and parameterization using 5-fold cross-validation.
  • TEST dataset - Dataset used to perform out-of-sample evaluation of CyPred.
  • TEST_NEW dataset - Second test dataset, composed of proteins deposited in CyBase after July 2011, used to perform additional out-of-sample evaluation of CyPred.
  • PDB80 - Dataset used for evaluation on a representative subset of high-quality (R-factor < 0.25, resolution < 2 Å), non-redundant (at 80% similarity) proteins from PDB.
    Predicted cyclic proteins in fully sequenced proteomes from UniProt version 2011_08:
  • 3 domains - putative cyclic proteins in Archaea, Bacteria, and Eukaryota (combines predictions from the three files below)
  • Archaea - putative cyclic proteins in Archaea
  • Bacteria - putative cyclic proteins in Bacteria
  • Eukaryota - putative cyclic proteins in Eukaryota

Each of the files listed above begins with a header that explains the format of the data in that file.

Help

CyPred accepts either a single protein sequence or multiple protein sequences; the input is limited to 75 000 protein sequences at a time. The user should submit the protein sequence(s) in FASTA format.

The format of the input file is as follows (example for an incomplete proteome of Violaceae):

  1. >protein name (the server trims protein names to the first 12 characters)
  2. protein sequence (one-letter amino acid codes only)
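
Before submitting a large file, it can help to check it locally against the constraints described above. The following Python sketch is not part of CyPred; it is a minimal illustration that assumes the input uses the 20 standard one-letter amino acid codes (the page does not say whether ambiguity codes such as X are accepted). The function names `parse_fasta` and `check_input` and the sample records are hypothetical. It counts sequences against the 75 000 limit, flags unexpected characters, and warns when distinct protein names become identical once trimmed to 12 characters.

```python
from collections import Counter

MAX_SEQUENCES = 75_000   # limit stated on this page
NAME_LENGTH = 12         # the server trims names to the first 12 characters
# Assumption: only the 20 standard one-letter amino acid codes are valid.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_input(text):
    """Return a list of problems found; an empty list means none were found."""
    records = parse_fasta(text)
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(
            f"{len(records)} sequences exceed the limit of {MAX_SEQUENCES}"
        )
    for name, seq in records:
        bad = sorted(set(seq) - AMINO_ACIDS)
        if not seq:
            problems.append(f"{name}: empty sequence")
        elif bad:
            problems.append(f"{name}: unexpected characters {''.join(bad)}")
    # Names that differ only after character 12 become indistinguishable.
    trimmed = Counter(name[:NAME_LENGTH] for name, _ in records)
    for short, count in trimmed.items():
        if count > 1:
            problems.append(f"{count} names collapse to '{short}' after trimming")
    return problems


if __name__ == "__main__":
    # Made-up records for illustration only; not real proteins.
    sample = ">example_protein_1\nACDEFGHIK\n>example_protein_2\nLMNPQRSTV\n"
    for problem in check_input(sample):
        print(problem)
```

Because the two sample names share their first 12 characters, the check reports a collision; giving each protein a short, unique identifier at the start of its name avoids ambiguity in the results.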

Acknowledgments

We acknowledge, with thanks, that the following software was used as a part of this server: