The PDID database provides access to a comprehensive set of putative and native protein-drug interactions in the structural human proteome. The structural human proteome includes about 10,000 human and human-like proteins with known 3-D structures, where human-like proteins are those with high sequence similarity to human proteins. The database includes data for popular, FDA-approved drugs. The putative protein-drug interactions were generated with three predictors, and the known interactions were collected from, and linked to, three related databases of known protein-drug interactions.
A tutorial that explains how to use PDID is available here.
The structural human proteome was collected from the Protein Data Bank by removing low-resolution structures (resolution worse than 3 Å). Only proteins whose sequences can be mapped to human proteins in the Ensembl database were used: chains with at least 90% sequence identity (measured using BLAST) to any human protein from release 68 of Ensembl were selected. The list of included proteins (identifiers from the Protein Data Bank) is available at http://biomine.cs.vcu.edu/servers/PDID/files/list_proteome.txt. The structural human proteome will be updated periodically in future releases of PDID as new data are deposited into the Protein Data Bank.
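The two selection criteria described above can be sketched as a simple filter. This is only an illustration, not the pipeline used to build PDID: the `Chain` record, its field names, and the placeholder identifiers are hypothetical, and treating the 3 Å and 90% cutoffs as inclusive is an assumption.

```python
from dataclasses import dataclass

# Thresholds taken from the text; inclusive boundaries are an assumption.
MAX_RESOLUTION_A = 3.0    # keep structures resolved at 3 Å or better
MIN_IDENTITY_PCT = 90.0   # minimum BLAST identity to a human Ensembl protein


@dataclass(frozen=True)
class Chain:
    """Hypothetical record for one PDB chain (not a PDID data format)."""
    pdb_id: str
    chain_id: str
    resolution_a: float             # crystallographic resolution in Å
    best_human_identity_pct: float  # best BLAST identity to any human protein


def select_chains(chains):
    """Keep chains that pass both the resolution and the identity filter."""
    return [
        c for c in chains
        if c.resolution_a <= MAX_RESOLUTION_A
        and c.best_human_identity_pct >= MIN_IDENTITY_PCT
    ]


# Placeholder identifiers, not real PDB entries.
candidates = [
    Chain("XXX1", "A", 2.1, 100.0),  # passes both filters
    Chain("XXX2", "A", 3.5, 98.0),   # rejected: resolution worse than 3 Å
    Chain("XXX3", "B", 1.8, 72.0),   # rejected: identity below 90%
]
selected = select_chains(candidates)
```

In this sketch only the first chain is retained; a lower-numbered resolution means a better-resolved structure, which is why the filter keeps values at or below the threshold.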
The database includes the FDA-approved drugs and nutraceuticals found in structures of proteins from the PDB, which were extracted with the help of PDBsum. A structure of a protein-drug complex is required to predict the drug targets that are stored in PDID. The list of included drugs is available at http://biomine.cs.vcu.edu/servers/PDID/files/list_drugs.txt. Additional drugs will be added periodically in future releases of PDID.
The protein-drug interactions made available in PDID include both known and putative (predicted) interactions.
The known interactions were collected from the DrugBank [1], BindingDB [2], and Protein Data Bank [3] resources. These interactions are annotated in PDID as known and are linked to the corresponding databases.
The putative interactions were predicted with three methods:
1. Customized version of the eFindSite method [4, 5] that predicts targets based on similarity of binding pockets using threading.
2. Customized version of the SMAP method [6] that predicts targets based on similarity of binding pockets and protein fold using profile-profile alignment.
3. The ILbind method [7] that predicts targets using consensus of 15 support vector machines and combines similarity based on threading and profile-profile alignment.
The proteins are mapped into the UniProt database [8] using UniProt identifiers to facilitate mapping between PDID, Protein Data Bank, DrugBank, and BindingDB.
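As a rough illustration of how a shared UniProt identifier lets interactions from different sources be combined, the sketch below groups records by protein and drug and labels each pair as known or putative. The record layout, the placeholder identifiers, and the rule that a pair counts as known whenever at least one database reports it are assumptions made for this example, not a description of the PDID implementation.

```python
from collections import defaultdict

# Sources of known interactions named in the text.
KNOWN_SOURCES = {"DrugBank", "BindingDB", "PDB"}


def merge_interactions(records):
    """Group (uniprot_id, drug, source) records by protein-drug pair.

    Returns a dict mapping (uniprot_id, drug) to (status, sorted sources).
    Assumption: a pair is "known" if any known-interaction database lists it,
    otherwise it is "putative".
    """
    sources = defaultdict(set)
    for uniprot_id, drug, source in records:
        sources[(uniprot_id, drug)].add(source)

    merged = {}
    for pair, pair_sources in sources.items():
        status = "known" if pair_sources & KNOWN_SOURCES else "putative"
        merged[pair] = (status, sorted(pair_sources))
    return merged


# Placeholder identifiers, not real UniProt accessions or drug names.
records = [
    ("UNIPROT_A", "drug_x", "DrugBank"),
    ("UNIPROT_A", "drug_x", "ILbind"),
    ("UNIPROT_B", "drug_x", "SMAP"),
    ("UNIPROT_B", "drug_x", "eFindSite"),
]
merged = merge_interactions(records)
```

Here the pair for `UNIPROT_A` is labeled known because a database reports it, while the pair for `UNIPROT_B` is supported only by predictors and is labeled putative.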
References
1. Wishart DS, Knox C, Guo AC, et al. (2006). DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 34:D668-72
2. Liu T, Lin Y, Wen X, et al. (2007). BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res 35:D198-201
3. Berman HM, Westbrook J, Feng Z, et al. (2000). The Protein Data Bank. Nucleic Acids Res 28:235-42
4. Brylinski M and Feinstein WP (2013). eFindSite: Improved prediction of ligand binding sites in protein models using meta-threading, machine learning and auxiliary ligands. J Comput Aided Mol Des 27(6):551-567
5. Feinstein WP and Brylinski M (2014). eFindSite: Enhanced fingerprint-based virtual screening against predicted ligand binding sites in protein models. Mol Inform 33(2):135-50
6. Xie L and Bourne PE (2008). Detecting evolutionary relationships across existing fold space, using sequence order independent profile-profile alignments. Proc Natl Acad Sci USA 105(14):5441-6
7. Hu G, Gao J, Wang K, et al. (2012). Finding protein targets for small biologically relevant ligands across fold space using inverse ligand binding predictions. Structure 20:1815-22
8. The UniProt Consortium (2014). Activities at the Universal Protein Resource (UniProt). Nucleic Acids Res 42:D191-8