This web page provides the datasets and the prediction model associated with
Zheng C, Kurgan LA, 2008. Prediction of beta-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments. BMC Bioinformatics, 9:430.
- Datasets
Three datasets are used: BT426, developed by Guruprasad and Rajkumar (2000), and BT547 and BT823, developed by Fuchs and Alix (2005).
The beta-turns were assigned using PROMOTIF (Hutchinson and Thornton, 1996), the pairwise sequence identity between any two protein chains is below 25%, each structure was determined by X-ray crystallography at a resolution of 2.0 Å or better, and each chain contains at least one beta-turn. Each dataset comes in three versions:
- all-sequences: text file containing the entire dataset, with each protein represented by its PDB id, sequence, and annotation of the location of beta-turns
- folds-sequences: rar file that includes the seven folds used to validate the BTNpred method, where each protein is represented by its PDB id and sequence
- folds-features: rar file that includes the seven folds used to validate the BTNpred method, where each residue is represented by the 90 features used to perform prediction (these files are in arff format, which is the input file format of the WEKA platform)
The BT426 dataset: all-sequences, folds-sequences, folds-features
The BT547 dataset: all-sequences, folds-sequences, folds-features
The BT823 dataset: all-sequences, folds-sequences, folds-features
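Because the folds-features files are in arff format, a quick sanity check before passing a file to WEKA is to count the values on each data row. The sketch below is only an illustration and is not part of the BTNpred distribution: the function name check_arff_rows is hypothetical, and it assumes a dense arff file with comma-separated values, no quoted commas, and one class label after the 90 features on each row (91 values in total). Adjust expected_values if the files are laid out differently.

```python
def check_arff_rows(lines, expected_values=91):
    """Return (row_count, bad_rows) for the @data section of a dense arff file.

    bad_rows lists (line_number, value_count) for every data row whose number
    of comma-separated values differs from expected_values.
    Assumption: 90 features + 1 class label = 91 values per row.
    """
    in_data = False
    row_count = 0
    bad_rows = []
    for line_number, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Skip blank lines and arff comments (lines starting with %).
        if not line or line.startswith("%"):
            continue
        if not in_data:
            # The header ends at the @data marker (case-insensitive).
            if line.lower().startswith("@data"):
                in_data = True
            continue
        row_count += 1
        value_count = len(line.split(","))
        if value_count != expected_values:
            bad_rows.append((line_number, value_count))
    return row_count, bad_rows
```

The function accepts any iterable of lines, so an open file handle can be passed directly, for example check_arff_rows(open("example.arff")).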
- Prediction model
The model is in WEKA's format and implements a Support Vector Machine classifier with an RBF kernel.
It can be downloaded from here: BTNpred model.
- Instructions to perform predictions with BTNpred
The user should use the following procedure:
- Download and install the WEKA platform.
This free, open-source platform can be downloaded from here: http://www.cs.waikato.ac.nz/ml/weka/index_downloading.html
- Download the BTNpred model and save it in the root folder where WEKA was installed.
- In the same folder, create a file that stores the input for the prediction. An example file can be downloaded from here: example input.
Note that this file includes the values of the 90 features for each residue plus a dummy class label (prediction), which could be used to automate evaluation of the prediction results. This file can include multiple lines of data, which allows predicting multiple residues and sequences in a single run.
- Open a command line window and navigate to the directory where the model and the input file are located.
- Execute the following command:
java -Xmx512m -classpath "%CLASSPATH%;weka.jar" weka.classifiers.functions.SMO -l BTNpred.model -T example.arff -p 0
where -Xmx512m sets the maximum Java heap size to 512 MB, weka.classifiers.functions.SMO specifies the WEKA class that runs the Support Vector Machine classifier, -l specifies the location of the file with the prediction model, -T specifies the location of the file with the data to predict, and -p specifies how the results are displayed (-p 0 prints the predictions without any attribute values).
The %CLASSPATH% syntax and the ; separator are specific to Windows; on Linux or macOS, use $CLASSPATH and : instead.
Additional help with command line execution of models in WEKA can be found here:
http://weka.sourceforge.net/wekadoc/index.php/en%3APrimer
- Read the prediction from the screen.
The first column provides the input number, the second column provides the predicted turn/non-turn class for the input residue, the third column provides the confidence of the prediction, and the last column provides the class label (dummy class label) given in the file with input data.
The output for the provided example should read:
0 Non-Turn 1.0 dummy_class
1 Non-Turn 1.0 dummy_class
which means that for samples number 0 and 1 the predicted class is Non-Turn, while the actual class stored in the input file was dummy_class.
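The procedure above can also be scripted. The following Python sketch is one possible way to do that, not part of the BTNpred distribution: build_command and parse_predictions are hypothetical helper names, the file names are the ones used in the instructions, and the parser assumes the four-column output shown above (input number, predicted class, a value treated here as the confidence, and the class label from the input file).

```python
import os


def build_command(model="BTNpred.model", data="example.arff", weka_jar="weka.jar"):
    """Build the java command from the instructions as an argument list."""
    # os.pathsep is ";" on Windows and ":" on Linux/macOS, which matches the
    # separator that java expects in -classpath on each platform.
    existing = os.environ.get("CLASSPATH", "")
    classpath = os.pathsep.join(p for p in (existing, weka_jar) if p)
    return [
        "java", "-Xmx512m", "-classpath", classpath,
        "weka.classifiers.functions.SMO",
        "-l", model, "-T", data, "-p", "0",
    ]


def parse_predictions(output):
    """Parse rows of the form '<index> <predicted> <confidence> <actual>'."""
    predictions = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not a four-column result row.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        index, predicted, confidence, actual = fields
        try:
            predictions.append((int(index), predicted, float(confidence), actual))
        except ValueError:
            # Third column was not numeric, so this is not a result row.
            continue
    return predictions
```

With WEKA installed, the two helpers could be combined by running subprocess.run(build_command(), capture_output=True, text=True) and passing the resulting stdout to parse_predictions. Other WEKA versions may print predictions in a different layout, in which case the parser would need adjusting.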