OMBBpred method for in-silico prediction of outer membrane beta barrel proteins

This web page provides the datasets and the prediction model associated with:
Mizianty M, Kurgan LA, 2011. Improved Identification of Outer Membrane Beta Barrel Proteins Using Primary Sequence, Predicted Secondary Structure and Evolutionary Information. Proteins 79(1):294-303.
  1. Datasets
    Each dataset is provided in .csv format and lists, for each protein, the PDB ID, the target class (globular protein, integral membrane protein, or outer membrane protein), and the amino acid (AA) sequence.

      The DS1 dataset can be downloaded from here: DS1 dataset
      The DS2 dataset can be downloaded from here: DS2 dataset
      The DS3 dataset can be downloaded from here: fold 1, fold 2, fold 3, fold 4, fold 5
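
A downloaded dataset can be inspected with a few lines of Python. The sketch below is only illustrative: it assumes three columns in the order described above (PDB ID, target class, AA sequence) and no header row, which may not match the actual files, and the class names and sample rows are made-up placeholders rather than entries from DS1-DS3.

```python
import csv
import io
from collections import Counter


def read_dataset(handle):
    """Yield (pdb_id, target_class, sequence) tuples from a CSV handle.

    Assumption: three columns in this order and no header row.
    Adjust the unpacking if the downloaded files are laid out differently.
    """
    for row in csv.reader(handle):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        pdb_id, target_class, sequence = (field.strip() for field in row[:3])
        yield pdb_id, target_class, sequence


# Made-up placeholder rows, NOT taken from the real datasets.
sample = io.StringIO(
    "id01,globular,MKTAYIAK\n"
    "id02,outer membrane,MKKTAIAIAVALAG\n"
    "id03,globular,GSHMLEDP\n"
)

records = list(read_dataset(sample))
class_counts = Counter(target_class for _, target_class, _ in records)
print(class_counts)
```

To read a real file, replace the in-memory sample with an opened file, for example `open("DS1.csv", newline="")`, where the file name is whatever name the download was saved under.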

  2. Prediction model
    The model is in WEKA's format and implements an SVM classifier.
    It can be downloaded from here: OMBBpred model.

  3. Instructions to perform predictions with OMBBpred
    The user should use the following procedure:

    1. Download and install the WEKA platform. This free, open-source platform can be downloaded from here: http://www.cs.waikato.ac.nz/ml/weka/index_downloading.html Use version 3.7.2.
    2. Download the OMBBpred model and save it in the root folder where WEKA was installed.
    3. In the same folder, create a file that stores the inputs for the prediction. An example file can be downloaded from here: example input.
      Note that this file includes the values of the twenty-six features plus the class label (the classification target: 1 for OMBB proteins, 0 for non-OMBB proteins). The label can be used to automate evaluation of the prediction results; use dummy values if the true outcomes are unknown. The file can include multiple data lines, which allows predicting multiple sequences in a single run.
    4. Open a command-line window and navigate to the directory where the model and the input file are located.
    5. Execute the following command (the ";" after weka.jar is the Windows classpath separator; on Linux or macOS, use ":" instead):
      java -classpath weka.jar; weka.classifiers.functions.SMO -l OMBBpred.model -T example.arff -p 0
      where:
      weka.classifiers.functions.SMO specifies the WEKA class that implements the SVM classifier,
      -l specifies the location of the file with the prediction model,
      -T specifies the location of the file with the data to predict, and
      -p specifies how the results are displayed.
      Additional help on running WEKA models from the command line can be found here
    6. Read the prediction(s) from the screen.
      The first column provides the serial number, the second column provides the actual class label (taken from the input file), the third column provides the predicted class label, and the last column provides the probability estimate associated with the prediction (which for the SVM classifier is always 1). Incorrect predictions are marked with "+". The output for the provided example has two predictions:

      "371 2:1 1:0 + 1", which means that the sample with serial number 371 is predicted as non-OMBB (labeled as "0"), while the actual class stored in the input file is OMBB (labeled as "1"). The "+" shows that this is an incorrect prediction. In "2:1", the value after the colon ("1") is the OMBB label and the value before it ("2:") says that this is the second class. Likewise, "1:0" is the non-OMBB label ("0"), which is the first ("1:") class. (This is a two-class classification.)

      "372 2:1 2:1 1", which means that the sample with serial number 372 is predicted as OMBB (labeled as "1"), while the actual class stored in the input file is also OMBB.