BTNpred method for in-silico prediction of beta-turns

 
This web page provides the datasets and the prediction model associated with the following paper:


Zheng C, Kurgan LA, 2008. Prediction of beta-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments, BMC Bioinformatics, 9:430

  1. Datasets
    Three datasets are used: BT426, developed by Guruprasad and Rajkumar (2000), and BT547 and BT823, developed by Fuchs and Alix (2005).
    In all three datasets, beta-turns were assigned using
    PROMOTIF (Hutchinson and Thornton, 1996); the pairwise sequence identity between any two protein chains is below 25%; every structure was determined by X-ray crystallography at a resolution of 2.0 Å or better; and each chain contains at least one beta-turn. Each dataset comes in three versions:
    - all-sequences: a text file containing the entire dataset, with each protein represented by its PDB id, its sequence, and an annotation of the locations of its beta-turns
    - folds-sequences: a rar file with the seven folds used to validate the BTNpred method, where
    each protein is represented by its PDB id and sequence
    - folds-features: a rar file with the seven folds used to validate the BTNpred method, where each residue is represented by the 90 features used to perform the prediction (these files are in ARFF format, the input format of the
    WEKA platform)
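    Assuming the seven folds are used in the standard seven-fold cross-validation manner (each fold is held out once as the test set while the remaining six form the training set), the train/test pairings can be enumerated as in the minimal Python sketch below. The file names fold1.arff through fold7.arff are hypothetical placeholders, not the names used inside the rar archives.

```python
from typing import Iterator, List, Sequence, Tuple


def cross_validation_splits(folds: Sequence[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training folds, test fold) pairs; each fold is held out exactly once."""
    for i, test_fold in enumerate(folds):
        train_folds = [fold for j, fold in enumerate(folds) if j != i]
        yield train_folds, test_fold


if __name__ == "__main__":
    # Hypothetical file names; substitute the names found in the folds-features archive.
    folds = [f"fold{k}.arff" for k in range(1, 8)]
    for train, test in cross_validation_splits(folds):
        print(f"train on {len(train)} folds, test on {test}")
```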

    The BT426 dataset: all-sequences, folds-sequences, folds-features
    The BT547 dataset: all-sequences, folds-sequences, folds-features
    The BT823 dataset: all-sequences, folds-sequences, folds-features
     
  2. Prediction model
    The model is stored in WEKA's format and implements a Support Vector Machine classifier with an RBF kernel.
    It can be downloaded from here: 
    BTNpred model.
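    An RBF-kernel SVM classifies a feature vector x by the sign of f(x) = sum_i a_i y_i K(x_i, x) + b, where the x_i are the support vectors and K(x_i, x) = exp(-gamma * ||x_i - x||^2). The minimal Python sketch below illustrates only this decision rule. The support vectors, coefficients, bias and gamma are toy values, not parameters of the BTNpred model, and the sketch omits the attribute normalization that WEKA's SMO applies by default, so it cannot reproduce BTNpred predictions.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    if len(x) != len(z):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, dual_coefs, bias, gamma) -> float:
    """f(x) = sum_i c_i * K(sv_i, x) + b, where each c_i already equals a_i * y_i."""
    return sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)) + bias


def svm_predict(x, support_vectors, dual_coefs, bias, gamma) -> int:
    """Return +1 or -1 according to the sign of the decision function."""
    return 1 if svm_decision(x, support_vectors, dual_coefs, bias, gamma) > 0 else -1


if __name__ == "__main__":
    # Toy two-dimensional model, not the 90-feature BTNpred model.
    svs = [[0.0, 0.0], [1.0, 1.0]]
    coefs = [1.0, -1.0]
    print(svm_predict([0.1, 0.0], svs, coefs, bias=0.0, gamma=1.0))
```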

  3. Instructions to perform predictions with BTNpred
    Follow this procedure:
  1. Download and install the WEKA platform. This free, open-source platform can be downloaded from here: http://www.cs.waikato.ac.nz/ml/weka/index_downloading.html
  2. Download the BTNpred model and save it in the root folder of the WEKA installation.
  3. In the same folder, create a file that stores the input for the prediction. An example file can be downloaded from here: example input.
    Note that this file contains the values of the 90 features for each residue plus a dummy class label (prediction); replacing the dummy label with the actual class would allow the evaluation of the prediction results to be automated. The file can contain multiple data lines, so multiple residues and sequences can be predicted in a single run.
  4. Open a command-line window and navigate to the directory that contains the model and the input file.
  5. Execute the following command:
    java -Xmx512m -classpath "%CLASSPATH%;weka.jar" weka.classifiers.functions.SMO -l BTNpred.model -T example.arff -p 0
    where
    -Xmx512m sets the maximum Java heap size to 512 MB, weka.classifiers.functions.SMO names the WEKA class that runs the Support Vector Machine classifier, -l gives the location of the file with the prediction model, -T gives the location of the file with the data to predict, and -p 0 prints only the predictions, without any of the input attributes.
    Additional help on running WEKA models from the command line can be found here:
    http://weka.sourceforge.net/wekadoc/index.php/en%3APrimer
  6. Read the predictions from the screen.
    The first column gives the input (instance) number, the second column gives the predicted turn/non-turn class for the input residue, the third column gives the probability that WEKA assigns to the predicted class, and the last column gives the class label (here the dummy class label) taken from the input file.
    The output for the provided example should read:

    0 Non-Turn 1.0 dummy_class
    1 Non-Turn 1.0 dummy_class
    This means that samples 0 and 1 are both predicted as Non-Turn, while the class stored for them in the input file is dummy_class.
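    Steps 3 through 6 can also be scripted. The Python sketch below is one possible approach, not part of the BTNpred distribution, and all function names in it are hypothetical. format_arff_row formats one residue's 90 feature values plus a class label as an ARFF data line, to be appended below the header copied from the example input. build_weka_command reproduces the command from step 5, joining the classpath with the platform's separator (";" on Windows, ":" on Linux and macOS). parse_predictions assumes exactly the four-column output shown above; other WEKA releases print -p 0 output in a different layout, so the parser may need adjusting. If the class column of the input holds real labels instead of the dummy label, accuracy compares them with the predictions.

```python
import os
import subprocess
from typing import List, NamedTuple, Sequence

N_FEATURES = 90  # BTNpred represents each residue with 90 features


class Prediction(NamedTuple):
    index: int
    predicted: str
    confidence: float
    actual: str


def format_arff_row(features: Sequence[float], label: str = "dummy_class") -> str:
    """One ARFF data line: 90 comma-separated feature values followed by the class label."""
    if len(features) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features, got {len(features)}")
    return ",".join([str(float(v)) for v in features] + [label])


def build_weka_command(model: str = "BTNpred.model", data: str = "example.arff",
                       weka_jar: str = "weka.jar") -> List[str]:
    """Argument list equivalent to the java command in step 5."""
    parts = [p for p in (os.environ.get("CLASSPATH", ""), weka_jar) if p]
    classpath = os.pathsep.join(parts)  # ";" on Windows, ":" elsewhere
    return ["java", "-Xmx512m", "-classpath", classpath,
            "weka.classifiers.functions.SMO", "-l", model, "-T", data, "-p", "0"]


def parse_predictions(output: str) -> List[Prediction]:
    """Parse lines such as '0 Non-Turn 1.0 dummy_class'; any other line is skipped."""
    predictions = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        predictions.append(Prediction(int(fields[0]), fields[1], float(fields[2]), fields[3]))
    return predictions


def accuracy(predictions: Sequence[Prediction]) -> float:
    """Fraction of residues whose predicted class equals the class label in the input file."""
    if not predictions:
        raise ValueError("no predictions to evaluate")
    return sum(p.predicted == p.actual for p in predictions) / len(predictions)


def run_btnpred(model: str = "BTNpred.model", data: str = "example.arff") -> List[Prediction]:
    """Run WEKA (java and weka.jar must be available) and parse its output."""
    result = subprocess.run(build_weka_command(model, data),
                            capture_output=True, text=True, check=True)
    return parse_predictions(result.stdout)


if __name__ == "__main__":
    sample = "0 Non-Turn 1.0 dummy_class\n1 Non-Turn 1.0 dummy_class\n"
    for p in parse_predictions(sample):
        print(p.index, p.predicted, p.actual)
```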