OMBBpred method for in-silico prediction of outer membrane beta barrel proteins
This web page provides the datasets and the prediction model associated with:
Mizianty M, Kurgan LA, 2011. Improved Identification of Outer
Membrane Beta Barrel Proteins Using Primary Sequence, Predicted
Secondary Structure and Evolutionary Information. Proteins 79(1):294-303.
- Datasets
Each dataset is provided in .csv format and lists, for each protein, the
PDB ID, the target class (globular protein, integral membrane protein, or
outer membrane protein), and the amino acid (AA) sequence.
The DS1 dataset can be downloaded from here: DS1 dataset
The DS2 dataset can be downloaded from here: DS2 dataset
The DS3 dataset can be downloaded from here: fold 1, fold 2, fold 3, fold 4, fold 5
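As a rough illustration of how these files could be read, the Python sketch below loads one dataset and counts the proteins per target class. It assumes three comma-separated columns in the order described above (PDB ID, target class, AA sequence) and no header row; neither assumption is confirmed by this page, and the function names `read_dataset` and `class_counts` are invented for the example.

```python
import csv
from collections import Counter


def read_dataset(path):
    """Return a list of (pdb_id, target_class, sequence) tuples.

    Assumption: three columns in this order and no header row.
    If the file has a header, it will be returned as the first record.
    """
    records = []
    with open(path, newline="") as fh:
        for row in csv.reader(fh):
            if len(row) < 3:
                continue  # skip blank or malformed lines
            pdb_id, target_class, sequence = (field.strip() for field in row[:3])
            records.append((pdb_id, target_class, sequence))
    return records


def class_counts(records):
    """Count how many proteins fall into each target class."""
    return Counter(target_class for _, target_class, _ in records)
```

For instance, `class_counts(read_dataset("DS1.csv"))` would summarize the class distribution of DS1, where `DS1.csv` stands for whatever local name the downloaded file was saved under.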
- Prediction model
The model is in WEKA's format and implements an SVM classifier.
It can be downloaded from here: OMBBpred model.
- Instructions to perform predictions with OMBBpred
The user should use the following procedure:
- Download and install the WEKA platform.
This free, open-source platform (use version 3.7.2) can be downloaded from here: http://www.cs.waikato.ac.nz/ml/weka/index_downloading.html
- Download and save the OMBBpred model in the root folder where WEKA was installed.
- In the same folder, create a file that stores the inputs for the prediction. An example file can be downloaded from here: example input.
Note that this file includes the values of the twenty-six features plus
the class label (the classification target: 1 for OMBB proteins, 0 for
non-OMBB proteins). The class label can be used to automate evaluation
of the prediction results; use dummy values if the true outcomes are
unknown. The file can include multiple lines of data, which allows
predicting multiple sequences in a single run.
- Open a command-line window and navigate to the directory where the model and the input file are located.
- Execute the following command:
java -classpath weka.jar weka.classifiers.functions.SMO -l OMBBpred.model -T example.arff -p 0
where:
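weka.classifiers.functions.SMO specifies the WEKA class that implements the SVM classifier,
-l specifies the location of the file with the prediction model,
-T specifies the location of the file with the data to predict, and
-p specifies how the results are displayed (per-sample predictions; the value 0 means that no input attributes are printed alongside them).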
Additional help with respect to command line execution of models in WEKA can be found here
- Read the prediction(s) from the screen.
The first column provides the serial number, the second column provides
the actual class label (taken from the input file), the third column
provides the predicted class label, and the last column provides the
probability estimate associated with the prediction (which for the SVM
classifier is always 1). Incorrect predictions are marked with "+". The
output for the provided example has two predictions:
"371 2:1 1:0 + 1" means that the sample with serial number 371 is
predicted as non-OMBB (labeled as "0"), while the actual class stored in
the input file is OMBB (labeled as "1"). The "+" marks this as an
incorrect prediction. In "2:1", the trailing "1" is the OMBB label and
the leading "2:" indicates that OMBB is the second class; in "1:0", the
trailing "0" is the non-OMBB label and the leading "1:" indicates that
non-OMBB is the first class (this is a two-class classification).
"372 2:1 2:1 1" means that the sample with serial number 372 is
predicted as OMBB (labeled as "1"), and the actual class stored in
the input file is also OMBB.
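The procedure above could be scripted along the following lines. This is only a sketch under stated assumptions: the helper names (`write_input`, `build_command`, `parse_predictions`) are invented for illustration; the input file is built by copying the header of the downloaded example file rather than guessing attribute names; and the output pattern is inferred solely from the two sample lines quoted above, so other WEKA versions or options may print something different.

```python
import re

N_FEATURES = 26  # the page states twenty-six features plus the class label


def write_input(template_path, out_path, rows, dummy_label="0"):
    """Copy the ARFF header (up to @data) from the example file and append
    new feature rows, each followed by a dummy class label."""
    header = []
    with open(template_path) as fh:
        for line in fh:
            header.append(line)
            if line.strip().lower() == "@data":
                break
        else:
            raise ValueError("no @data line found in template")
    if not header[-1].endswith("\n"):
        header[-1] += "\n"
    with open(out_path, "w") as fh:
        fh.writelines(header)
        for row in rows:
            if len(row) != N_FEATURES:
                raise ValueError(f"expected {N_FEATURES} features, got {len(row)}")
            fh.write(",".join(str(v) for v in row) + f",{dummy_label}\n")


def build_command(model="OMBBpred.model", data="example.arff", jar="weka.jar"):
    """Return the WEKA call from the instructions as an argument list."""
    return ["java", "-classpath", jar, "weka.classifiers.functions.SMO",
            "-l", model, "-T", data, "-p", "0"]


# serial, actual "index:label", predicted "index:label", optional "+", probability
PREDICTION = re.compile(
    r"^\s*(\d+)\s+(\d+):(\S+)\s+(\d+):(\S+)\s+(\+\s+)?([\d.]+)\s*$"
)


def parse_predictions(text):
    """Extract prediction rows from the printed output; other lines are ignored."""
    results = []
    for line in text.splitlines():
        m = PREDICTION.match(line)
        if m:
            results.append({
                "serial": int(m.group(1)),
                "actual": m.group(3),
                "predicted": m.group(5),
                "error": m.group(6) is not None,
                "probability": float(m.group(7)),
            })
    return results


# The two sample lines quoted in the instructions above.
sample = "371 2:1 1:0 + 1\n372 2:1 2:1 1"
for prediction in parse_predictions(sample):
    print(prediction)
```

The list returned by `build_command()` can be passed to Python's `subprocess.run(..., capture_output=True, text=True)`, and the captured standard output can then be handed to `parse_predictions`. When dummy labels are used in the input file, the "actual" and "error" fields carry no meaning and only "predicted" should be read.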