BETArPred method for in-silico prediction of strand residues

This web page provides the datasets and the prediction model associated with:
Kedarisetti KD, Mizianty M, Dick S, Kurgan LA, 2011. Improved sequence-based prediction of strand residues. Journal of Bioinformatics and Computational Biology, 9(1):67-89 
  1. Datasets
    Each dataset is provided in .csv format and contains the PDB ID, the amino acid (AA) sequence, and the secondary structure assigned with DSSP.

      The training dataset can be downloaded from here: training dataset
      The test dataset can be downloaded from here: test dataset
      The CASP8 dataset can be downloaded from here: CASP8 dataset
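
A minimal sketch of how such a dataset file might be read in Python. The column order (PDB ID, sequence, DSSP string), the absence of a header row, and the rule that only the DSSP state "E" counts as a strand residue are assumptions made for illustration; they are not confirmed by this page and should be checked against the downloaded files. The record used at the bottom is made up.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def read_dataset(handle: TextIO, has_header: bool = False) -> Iterator[Tuple[str, str, str]]:
    """Yield (pdb_id, sequence, dssp) tuples from a CSV file.

    Assumption: the first three columns are PDB ID, AA sequence, and the
    DSSP secondary structure string, in that order.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # discard the header row
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or incomplete rows
        pdb_id, sequence, dssp = (field.strip() for field in row[:3])
        if len(sequence) != len(dssp):
            raise ValueError(f"{pdb_id}: sequence and DSSP string differ in length")
        yield pdb_id, sequence, dssp


def strand_labels(dssp: str) -> str:
    """Map a DSSP string to the e/n labels used by the model input.

    Assumption: only the DSSP state 'E' is treated as a strand residue.
    """
    return "".join("e" if state == "E" else "n" for state in dssp)


if __name__ == "__main__":
    # Made-up record, used only to show the call pattern.
    toy_file = io.StringIO("1abc,MKVLA,CEEEC\n")
    for pdb_id, sequence, dssp in read_dataset(toy_file):
        print(pdb_id, sequence, strand_labels(dssp))
```

For a real file, the StringIO object would be replaced with an opened file handle, for example open("training.csv", newline=""), where the file name is hypothetical.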

  2. Prediction model
    The model is stored in WEKA's format and implements a logistic regression classifier.
    It can be downloaded from here: BETArPred model.

  3. Instructions to perform predictions with BETArPred
    The user should use the following procedure:

    1. Download and install the WEKA platform. This free, open-source platform can be downloaded from here: http://www.cs.waikato.ac.nz/ml/weka/index_downloading.html Use version 3.7.
    2. Download the BETArPred model and save it in the root folder where WEKA was installed.
    3. In the same folder, create a file that stores the inputs for the prediction. An example file can be downloaded from here: example input.
      Note that this file includes the values of the nine features plus the class label (the classification target: e for a β-strand residue, n for a non-β-strand residue). The class label can be used to automate the evaluation of the prediction results; the user can enter dummy values if the true outcomes are unknown. The file can include multiple lines of data, which allows predicting multiple residues/sequences in a single run.
    4. Open a command-line window and navigate to the directory where the model and the input file are located.
    5. Execute the following command:
      java -classpath weka.jar weka.classifiers.functions.Logistic -l BETArPred.model -T example.arff -p 0
      where:
      -classpath weka.jar tells Java where to find the WEKA library,
      weka.classifiers.functions.Logistic specifies the WEKA class that implements the logistic regression classifier,
      -l specifies the location of the file with the prediction model,
      -T specifies the location of the file with the data to predict, and
      -p specifies how the results are displayed (-p 0 prints the prediction for each instance without any additional attribute values).
      Additional help with respect to command line execution of models in WEKA can be found here.
    6. Read the prediction(s) from the screen.
      The first column provides the serial number, the second column provides the actual class label (taken from the input file), the third column provides the predicted class label, and the last column provides the probability estimate associated with the prediction. Incorrect predictions are marked with "+". The output for the provided example has two predictions:

      "31 1:n 2:e + 0.644", which means that the sample with serial number 1 is predicted as a β-strand residue (labeled as "e") with a probability of 0.644, while the actual class stored in the input file is a non-β-strand residue (labeled as "n"). The "+" shows that this is an incorrect prediction. "1:n" represents a non-β-strand residue (the letter "n"), which is the first ("1:") class. "2:e" represents a β-strand residue (the letter "e"), which is the second ("2:") class (this is a two-class classification).

      "4 1:e 2:e 0.716", which means that the sample with serial number 4 is predicted as a β-strand residue (labeled as "e") with a probability of 0.716, while the actual class stored in the input file is also a β-strand residue.

    7. For the final prediction, merge the results from the BETArPred model with the strand residues predicted by SSPro, i.e., also mark as strands all residues that SSPro predicts as strands.
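
Steps 6 and 7 can be automated with a short script. The following is a minimal Python sketch, not part of the BETArPred distribution. It assumes that the prediction lines look like the two quoted examples above (serial number, actual label, predicted label, optional "+", probability), that the BETArPred labels are collected into one string per sequence in residue order, and that the SSPro prediction is available as a string of per-residue states in which "E" denotes a strand. All function and variable names are hypothetical.

```python
import re
from typing import List, NamedTuple


class Prediction(NamedTuple):
    serial: int         # first column
    actual: str         # actual class label from the input file ("e" or "n")
    predicted: str      # predicted class label ("e" or "n")
    is_error: bool      # True when the line carries the "+" marker
    probability: float  # probability estimate of the predicted class


# serial  index:actual  index:predicted  [+]  probability
_LINE = re.compile(r"^\s*(\d+)\s+\d+:(\S+)\s+\d+:(\S+)\s+(\+\s+)?(\d+(?:\.\d+)?)\s*$")


def parse_weka_output(text: str) -> List[Prediction]:
    """Extract predictions from the text printed by the command in step 5."""
    predictions = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match is None:
            continue  # header lines and blank lines do not match the pattern
        serial, actual, predicted, flag, probability = match.groups()
        predictions.append(
            Prediction(int(serial), actual, predicted, flag is not None, float(probability))
        )
    return predictions


def merge_with_sspro(betarpred_labels: str, sspro_states: str) -> str:
    """Label a residue "e" if BETArPred predicts "e" or SSPro predicts "E"."""
    if len(betarpred_labels) != len(sspro_states):
        raise ValueError("both predictions must cover the same residues")
    return "".join(
        "e" if label == "e" or state == "E" else "n"
        for label, state in zip(betarpred_labels, sspro_states)
    )


if __name__ == "__main__":
    # The two lines quoted in step 6.
    output = "31 1:n 2:e + 0.644\n4 1:e 2:e 0.716\n"
    for prediction in parse_weka_output(output):
        print(prediction)

    # Made-up four-residue example of the merge in step 7.
    print(merge_with_sspro("nenn", "CCEE"))
```

The merge is a per-residue logical OR, which follows the rule in step 7: a residue that either predictor labels as a strand is kept as a strand in the final prediction.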