BETArPred method for in-silico prediction of strand residues
This web page provides the datasets and the prediction model associated with:
Kedarisetti KD, Mizianty M, Dick S, Kurgan LA, 2011. Improved
sequence-based prediction of strand residues. Journal of Bioinformatics
and Computational Biology, 9(1):67-89
- Datasets
Each dataset is provided in .csv format and contains information about
PDB ID, AA sequence, and secondary structure assigned with DSSP.
The training dataset can be downloaded from here: training dataset
The test dataset can be downloaded from here: test dataset
The CASP8 dataset can be downloaded from here: CASP8 dataset
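As an illustration of how these files might be read, here is a minimal Python sketch. It assumes that each row holds the PDB ID, the AA sequence, and the DSSP string in that order, that there is no header row, and that the DSSP state "E" marks a strand residue; none of these details are confirmed by this page, so check them against the downloaded files. The record "1ABC" is made up for the example.

```python
import csv
import io


def read_dataset(handle, has_header=False):
    """Yield (pdb_id, sequence, dssp) tuples from a dataset CSV file.

    Assumed column order: PDB ID, AA sequence, DSSP string.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row if the file has one
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        pdb_id, sequence, dssp = (field.strip() for field in row[:3])
        if len(sequence) != len(dssp):
            raise ValueError(f"{pdb_id}: sequence and DSSP string differ in length")
        yield pdb_id, sequence, dssp


def strand_labels(dssp, strand_states=("E",)):
    """Map a DSSP string to e/n labels (assumption: 'E' means strand)."""
    return "".join("e" if state in strand_states else "n" for state in dssp)


# Hypothetical record, used only to show the calls.
sample = io.StringIO("1ABC,MKVLAT,CEEEEC\n")
records = list(read_dataset(sample))
labels = strand_labels(records[0][2])
print(records[0][0], labels)
```

For a real file, replace the `io.StringIO` object with `open("training.csv", newline="")`, where the file name is whatever the download was saved as.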
- Prediction model
The model is in WEKA's format and implements a logistic regression classifier.
It can be downloaded from here: BETArPred model.
- Instructions to perform predictions with BETArPred
The user should use the following procedure:
- Download and install the WEKA platform.
This free, open-source platform can be downloaded from here: http://www.cs.waikato.ac.nz/ml/weka/index_downloading.html Use version 3.7.
- Download the BETArPred model and save it in the root folder where WEKA was installed.
- In the same folder, create a file that stores the inputs for the prediction. An example file can be downloaded from here: example input.
Note that this file includes the values of the nine features plus the class
label (the classification target: e for a β-strand residue, n for a
non-β-strand residue), which can be used to automate evaluation of
the prediction results (the user can use dummy values if the true outcomes
are unknown). The file can include multiple lines of data, which
allows predicting multiple residues/sequences in a single run.
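The input file could be generated with a short script such as the Python sketch below. The feature names f1 to f9 and the relation name are placeholders, and the features are assumed to be numeric; the real attribute names, types, and order should be copied from the header of the example input file, since WEKA expects the header to match the one the model was trained with. The class values are written as {n,e}, following the class order shown in the prediction output.

```python
def make_arff(samples, feature_names, relation="betarpred_input", dummy_label="n"):
    """Build ARFF text for a list of (features, label) pairs.

    A label that is not 'n' or 'e' (for example None) is replaced with a
    dummy value, for residues whose true class is unknown.
    """
    lines = [f"@relation {relation}", ""]
    # Assumption: all nine features are numeric attributes.
    lines += [f"@attribute {name} numeric" for name in feature_names]
    lines += ["@attribute class {n,e}", "", "@data"]
    for features, label in samples:
        if len(features) != len(feature_names):
            raise ValueError("each sample needs one value per feature")
        values = [repr(float(value)) for value in features]
        values.append(label if label in ("n", "e") else dummy_label)
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"


# Placeholder feature names; take the real ones from the example input file.
names = [f"f{i}" for i in range(1, 10)]
arff_text = make_arff([([0.1] * 9, None), ([0.5] * 9, "e")], names)
print(arff_text)
```

Writing `arff_text` to a file such as example.arff in the WEKA folder gives one data line per residue.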
- Open a command line window and navigate to the directory where the model and the input file are located.
- Execute the following command:
java -classpath weka.jar weka.classifiers.functions.Logistic -l BETArPred.model -T example.arff -p 0
where:
weka.classifiers.functions.Logistic specifies the WEKA class that implements the logistic regression classifier,
-l specifies the location of the file with the prediction model,
-T specifies the location of the file with the data to predict, and
-p requests per-instance predictions; its argument lists the input attributes to print next to each prediction (0 prints none).
Additional help with respect to command line execution of models in WEKA can be found here
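When many input files have to be processed, the command can be launched from a script. The Python sketch below only wraps the command shown above; the helper names are illustrative, and it assumes that java is on the PATH and that weka.jar, the model, and the input file are in the current directory.

```python
import subprocess


def betarpred_command(model="BETArPred.model", data="example.arff", classpath="weka.jar"):
    """Return the argument list for the WEKA call described above."""
    return [
        "java", "-classpath", classpath,
        "weka.classifiers.functions.Logistic",
        "-l", model,   # file with the prediction model
        "-T", data,    # file with the data to predict
        "-p", "0",     # print predictions without extra attributes
    ]


def run_betarpred(**kwargs):
    """Run WEKA and return its screen output as text (requires Java and WEKA)."""
    result = subprocess.run(
        betarpred_command(**kwargs), capture_output=True, text=True, check=True
    )
    return result.stdout


# Show the command without running it.
print(" ".join(betarpred_command()))
```

Calling `run_betarpred(data="my_input.arff")` would return the text that WEKA prints to the screen, with my_input.arff standing in for the user's own file.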
- Read the prediction(s) from the screen.
The first column provides the serial number, the second column provides
the actual class label (taken from the input file), the third column
provides the predicted class label, and the last column provides the
probability estimate associated with the prediction. Incorrect
predictions are marked with "+". The output for the provided example
has two predictions:
"31 1:n 2:e + 0.644", which means that the sample with serial number 1
is predicted as a β-strand residue (labeled as "e") with a
probability of 0.644, while the actual class stored in the input file
is a non-β-strand residue (labeled as "n"). "+" shows that it is an
incorrect prediction. "1:n" represents a non-β-strand residue (the
letter "n"), which is the first ("1:") class. "2:e" represents a
β-strand residue ("e"), which is the second ("2:") class (this is a
two-class classification).
"4 2:e 2:e 0.716", which means that the sample with serial number 4 is
predicted as a β-strand residue (labeled as "e") with a probability
of 0.716, while the actual class stored in the input file is also
a β-strand residue.
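Instead of reading the predictions by eye, the output lines can be parsed with a small script. The Python sketch below assumes the column layout described above (serial number, actual class, predicted class, optional "+", probability) with whitespace between columns; the header line in the example is only an illustration of a non-data line, and any line that does not match the layout is skipped.

```python
import re

# serial number, actual "index:label", predicted "index:label",
# optional "+" error flag, probability estimate
PREDICTION = re.compile(
    r"^\s*(\d+)\s+(\d+):(\S+)\s+(\d+):(\S+)\s+(\+\s+)?(\d*\.?\d+)\s*$"
)


def parse_prediction(line):
    """Return a dict for one prediction line, or None for any other line."""
    match = PREDICTION.match(line)
    if match is None:
        return None
    serial, _, actual, _, predicted, error, probability = match.groups()
    return {
        "serial": int(serial),
        "actual": actual,
        "predicted": predicted,
        "error": error is not None,  # True when the line carries a "+"
        "probability": float(probability),
    }


output_lines = [
    "inst# actual predicted error prediction",  # assumed header line, skipped
    "31 1:n 2:e + 0.644",                       # first example line from above
]
parsed = []
for line in output_lines:
    record = parse_prediction(line)
    if record is not None:
        parsed.append(record)
print(parsed)
```

The "predicted" field of each record is the e/n label of the residue, which is the value needed in the final merging step.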
- For the final prediction, merge the results from the BETArPred model with the strand residues predicted by SSPro, i.e., also mark as strands all residues that SSPro predicts as strands.
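The merging step can be sketched in Python as follows. The sketch reads the instruction as a per-residue union: a residue is a strand if either BETArPred or SSPro predicts it as one. It also assumes that the BETArPred labels are given as a string of e/n characters, that the SSPro prediction is a string of the same length, and that SSPro marks strands with "E"; the two input strings are made up for the example.

```python
def merge_with_sspro(betarpred_labels, sspro_states, sspro_strand="E"):
    """Combine per-residue predictions; strand if either method says strand."""
    if len(betarpred_labels) != len(sspro_states):
        raise ValueError("both predictions must cover the same residues")
    return "".join(
        "e" if label == "e" or state == sspro_strand else "n"
        for label, state in zip(betarpred_labels, sspro_states)
    )


# Made-up predictions for a six-residue fragment.
merged = merge_with_sspro("nneenn", "CCEEEC")
print(merged)
```

In this example the fifth residue is marked as a strand only by SSPro and is therefore a strand in the merged result.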