Supporting information for hybridNAP

Benchmark datasets with the annotations of binding residues

These datasets are summarized in Table 2


All proteins (some proteins bind multiple ligands and are included in multiple ligand-specific datasets): Click here to download (8.3 MB)

All proteins (excluding small ligand-binding proteins; some proteins bind multiple ligands and are included in multiple ligand-specific datasets): Click here to download (6.4 MB)

DNA-binding proteins: Click here to download (322 KB)

RNA-binding proteins: Click here to download (312 KB)

Protein-binding proteins: Click here to download (6.2 MB)

Small ligand-binding proteins: Click here to download (5.4 MB)

DNA & protein-binding proteins: Click here to download (151 KB)

RNA & protein-binding proteins: Click here to download (169 KB)

 

The format of the above files is explained here

Click here to download the script to produce the numbers from Table 2

 

Independent datasets with the annotations of binding residues

These datasets are summarized in Table 2


DNA_T proteins: Click here to download (8 KB)

RNA_T proteins: Click here to download (5 KB)

Protein_T proteins: Click here to download (13 KB)

 

The format of the above files is explained here

Click here to download the script to produce the numbers from Table 2

 

Mapping between UniProt and corresponding PDB chains for the benchmark datasets

These datasets are summarized in Table 2


Mapping of IDs between UniProt and PDB for the dataset with all proteins: Click here to download (1.1 MB)
Mapping of sequences with annotations of binding residues between UniProt and PDB for the dataset with all proteins: Click here to download (11.8 MB)

Mapping of IDs between UniProt and PDB for the dataset with DNA-binding proteins: Click here to download (67 KB)
Mapping of sequences with annotations of binding residues between UniProt and PDB for the dataset with DNA-binding proteins: Click here to download (709 KB)

Mapping of IDs between UniProt and PDB for the dataset with RNA-binding proteins: Click here to download (158 KB)
Mapping of sequences with annotations of binding residues between UniProt and PDB for the dataset with RNA-binding proteins: Click here to download (1.0 MB)

Mapping of IDs between UniProt and PDB for the dataset with protein-binding proteins: Click here to download (1.0 MB)
Mapping of sequences with annotations of binding residues between UniProt and PDB for the dataset with protein-binding proteins: Click here to download (9.9 MB)

Mapping of IDs between UniProt and PDB for the dataset with small ligand-binding proteins: Click here to download (893 KB)
Mapping of sequences with annotations of binding residues between UniProt and PDB for the dataset with small ligand-binding proteins: Click here to download (9.0 MB)

Mapping of IDs between UniProt and PDB for the dataset with DNA & Protein-binding proteins: Click here to download (47 KB)
Mapping of sequences with annotations of binding residues between UniProt and PDB for the dataset with DNA & Protein-binding proteins: Click here to download (481 KB)

Mapping of IDs between UniProt and PDB for the dataset with RNA & Protein-binding proteins: Click here to download (135 KB)
Mapping of sequences with annotations of binding residues between UniProt and PDB for the dataset with RNA & Protein-binding proteins: Click here to download (797 KB)

Mapping of IDs between UniProt and PDB for the DNA_T dataset: Click here to download (1 KB)
Mapping of sequences with annotations of binding residues between UniProt and PDB for the DNA_T dataset: Click here to download (9 KB)

Mapping of IDs between UniProt and PDB for the RNA_T dataset: Click here to download (1 KB)
Mapping of sequences with annotations of binding residues between UniProt and PDB for the RNA_T dataset: Click here to download (6 KB)

Mapping of IDs between UniProt and PDB for the Protein_T dataset: Click here to download (3 KB)
Mapping of sequences with annotations of binding residues between UniProt and PDB for the Protein_T dataset: Click here to download (28 KB)

 

The format of the above files is explained here

 

Values of the three hallmarks of binding (RAA, native/putative RSA and ECO) on the benchmark dataset

These datasets were used to compute results summarized in Tables 4, S1, and S2.


DNA-binding proteins: Click here to download (2.9 MB)

RNA-binding proteins: Click here to download (2.5 MB)

Protein-binding proteins: Click here to download (52.7 MB)

DNA & Protein-binding proteins: Click here to download (1.2 MB)

RNA & Protein-binding proteins: Click here to download (1.2 MB)

 

The format of the above files is explained here

 

Predictions from the regression models that are based on the three hallmarks on the benchmark dataset

These datasets were used to compute results summarized in Tables S3 and S4


Predictions based on the three-fold cross-validation on the benchmark dataset (excluding small ligand-binding proteins) using RAA, native RSA and ECO: Click here to download (40 MB)

Predictions based on the three-fold cross-validation on the benchmark dataset (excluding small ligand-binding proteins) using RAA, putative RSA and ECO: Click here to download (45 MB)

Predictions on the whole benchmark dataset (excluding small ligand-binding proteins) using RAA, native RSA and ECO: Click here to download (40 MB)

Predictions on the whole benchmark dataset (excluding small ligand-binding proteins) using RAA, putative RSA and ECO: Click here to download (39 MB)

 

Assignment of proteins into the cross-validation folds is defined here

The format of the above files is explained here

Click here to download the script to produce the numbers from Tables S3 and S4

 

 

100 randomly picked proteins: Click here to download (1 KB)

Predictions of DP-Bind, RNABindR and SPRINGS for the 100 randomly picked proteins: Click here to download (206 KB)

The format of the above file is explained here

Click here to download the script to produce the numbers from the last line in Table S4

 

DNA_T, RNA_T and protein_T test datasets and predictions of DNA-, RNA- and protein-binding residues from hybridNAP and other predictors on these datasets

These datasets were used to compute results summarized in Tables S5, S6 and in the last line in S7


Predictions of BindN+, DBS-PSSM, DP-Bind(klr) and hybridNAP on the DNA_T dataset: Click here to download (124 KB)

Predictions of BindN+, RNABindR, Pprint and hybridNAP on the RNA_T dataset: Click here to download (81 KB)

Predictions of PSIVER, SPRINGS and hybridNAP on the protein_T dataset: Click here to download (107 KB)

 

Click here to download the script to produce the numbers from Table S5

The format of the above files is explained here

 

Non-redundant benchmark training30_DNA dataset that was used to train hybridNAP for prediction of DNA-binding proteins: Click here to download (1 MB)

Non-redundant benchmark training30_RNA dataset that was used to train hybridNAP for prediction of RNA-binding proteins: Click here to download (1 MB)

Non-redundant benchmark training30_protein dataset that was used to train hybridNAP for prediction of protein-binding proteins: Click here to download (1 MB)

 

The format of the above files is explained here

 

Predictions of hybridNAP trained on ten training30small datasets

These datasets were used to compute results summarized in Table S5

 

Ten randomly picked training30small_DNA datasets that were used to train hybridNAP: Click here to download (6 KB)

Ten randomly picked training30small_RNA datasets that were used to train hybridNAP: Click here to download (8 KB)

Ten randomly picked training30small_protein datasets that were used to train hybridNAP: Click here to download (17 KB)

 

The format of the above files is explained here

 

Predictions of hybridNAP trained on ten training30small_DNA datasets and tested on the DNA_T dataset: Click here to download (185 KB)

Predictions of hybridNAP trained on ten training30small_RNA datasets and tested on the RNA_T dataset: Click here to download (117 KB)

Predictions of hybridNAP trained on ten training30small_protein datasets and tested on the protein_T dataset: Click here to download (318 KB)

 

The format of the above files is explained here

Click here to download the script to produce the numbers from Table S5

 

Source code to draw distributions of binding and non-binding residues in the 3-D space defined by the three hallmarks of binding

Source code for the DNA-binding residues: Click here to download (80.4 KB)

Source code for the RNA-binding residues: Click here to download (80.4 KB)

Source code for the protein-binding residues: Click here to download (80.4 KB)

Source code for the DNA & protein-binding residues: Click here to download (80.4 KB)

Source code for the RNA & protein-binding residues: Click here to download (80.4 KB)

 

The source code should be run in Mathematica version 9.0.1. You can download this software here