CONNECTOR - prediCtOr of compouNd-proteiN intEraCTiOn based on an ensemble of similaRities

CONNECTOR webserver

CONNECTOR is a webserver that predicts the propensity of putative drug-protein interactions. Predictions are based on the similarity between the input drug structure, drug profile, and/or protein sequence and the experimental drug-protein interactions included in its internal database. The webserver accepts any combination of inputs: any single input (drug structure, drug profile, or protein sequence), any pair of inputs, or all three inputs.
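CONNECTOR's actual ensemble model is not described on this page, so the following is only a hypothetical illustration of the general idea of similarity-based scoring, not the server's method. It assumes each drug profile is a set of side-effect terms, uses Jaccard similarity, and scores each target by the highest similarity between the query and any database drug known to interact with that target (a simple nearest-neighbour rule). The names `jaccard` and `score_targets` are illustrative.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two sets of terms (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def score_targets(query: Set[str],
                  profiles: Dict[str, Set[str]],
                  interactions: List[Tuple[str, str]]) -> Dict[str, float]:
    """Hypothetical nearest-neighbour scoring: each target gets the highest
    similarity between the query profile and the profile of any drug
    known to interact with it. Drugs without a profile are skipped."""
    scores: Dict[str, float] = {}
    for drug, target in interactions:
        if drug not in profiles:
            continue
        s = jaccard(query, profiles[drug])
        scores[target] = max(scores.get(target, 0.0), s)
    return scores
```

For example, with profiles `{"D1": {"nausea", "headache"}, "D2": {"rash"}}` and interactions `[("D1", "P1"), ("D2", "P2"), ("D2", "P1")]`, a query of `{"nausea", "headache"}` scores `P1` at 1.0 (via `D1`) and `P2` at 0.0. An ensemble approach would combine several such similarity scores (structure, profile, sequence) rather than rely on one.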

The internal database of CONNECTOR integrates drug-protein interactions collected from the Therapeutic Target Database, the IUPHAR/BPS Guide to Pharmacology database, and Drug2Gene, which combines data from CGDCP, ChEBI, ChEMBL, CTD, DrugBank, HGNC, Ligand Expo, MICAD, NCBI Gene, Pathway Commons DB, PDBsum, PDSP Ki, PharmGKB, PubChem BioAssay, PubChem Compound, PubChem Substance, and UniProt.

The current version of the internal database includes 449 drugs, 1469 protein targets, and 34456 drug-protein interactions.

Please follow the three steps below to make predictions.

Step 1. Provide at least one of the three inputs listed below

Enter the structure of the query drug in SMILES format.

List the side effects of the query drug in CSV format. The side-effect terms must come from this fixed list of terms.

Enter the amino acid sequence of a known protein target on a single line.
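Two of the inputs have format constraints that are easy to get wrong: the protein sequence must be on one line, and the side-effect terms must come from the server's fixed list. The sketch below prepares both locally before pasting them into the form. It assumes the side-effect CSV is a single comma-separated line and that you have downloaded the fixed term list into a set; the helper names `protein_to_one_line` and `side_effects_to_csv` are hypothetical and not part of CONNECTOR.

```python
from typing import Iterable, Set


def protein_to_one_line(fasta_text: str) -> str:
    """Collapse a (possibly multi-line) FASTA record into a single-line
    amino acid sequence, dropping any '>' header line."""
    lines = [ln.strip() for ln in fasta_text.splitlines()]
    seq = "".join(ln for ln in lines if ln and not ln.startswith(">"))
    return seq.upper()


def side_effects_to_csv(terms: Iterable[str], allowed: Set[str]) -> str:
    """Join side-effect terms into one comma-separated line, rejecting
    any term that is not in the fixed vocabulary (assumed CSV layout)."""
    cleaned = [t.strip() for t in terms]
    unknown = [t for t in cleaned if t not in allowed]
    if unknown:
        raise ValueError(f"terms not in the fixed list: {unknown}")
    return ",".join(cleaned)
```

For instance, `protein_to_one_line(">sp|P12345|TEST\nmktay\nIAKQR\n")` returns `"MKTAYIAKQR"`, which can be pasted directly into the sequence field.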

Step 2. Provide your email address (required)

Please enter your email address in the text area below. A link to the prediction results will be sent to this address once they are ready.

Step 3. Click the button to launch the prediction


The benchmark database

    The internal database of CONNECTOR includes four files:
  • drug_structures.tsv - tab-separated list of PubChem identifiers and structures of the 449 drugs in SMILES format.
  • drug_profiles.tsv - tab-separated list of PubChem identifiers and profiles of the 449 drugs. The profiles are based on the list of terms available here.
  • protein_sequences.fasta - FASTA-formatted list of UniProt accession numbers and sequences for the 1469 drug targets.
  • drug_protein_interactions.tsv - tab-separated list of 34456 drug-protein interactions where drugs are identified using PubChem identifiers and drug targets are identified using UniProt accession numbers.
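A minimal sketch for loading the structure, interaction, and sequence files with the Python standard library is shown below. It assumes the TSV files have no header row and that the identifier is in the first column, and that each FASTA header's first word is the UniProt accession; check these assumptions against the downloaded files. The layout of drug_profiles.tsv is not described here, so it is not parsed. All function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def read_two_column_tsv(lines: Iterable[str]) -> Dict[str, str]:
    """Map column 1 (e.g. PubChem ID) to column 2 (e.g. SMILES).
    Assumes no header row; rows with fewer than two fields are skipped."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def read_interactions(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (PubChem ID, UniProt accession) pairs, assuming no header row."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record's identifier (first word after '>') to its sequence."""
    records: Dict[str, str] = {}
    key = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if key is not None:
                records[key] = "".join(chunks)
            key = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if key is not None:
        records[key] = "".join(chunks)
    return records
```

Because each function accepts any iterable of lines, an open file works directly, e.g. `with open("drug_structures.tsv", newline="") as fh: structures = read_two_column_tsv(fh)`.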


We gratefully acknowledge the following software used as part of this server:

  • Chemistry Development Kit - Open Source modular Java libraries for Cheminformatics
  • Weka 3 - Data Mining Software in Java
  • BLAST - Finding regions of similarity between protein sequences