Protein-ligand interactions play a vital role in all biological processes, ranging from metabolic enzyme catalysis to the regulation of complex signaling cascades. Knowledge of the molecular details of these interactions is crucial for a complete understanding of a biological system. The large-scale structural information available on protein-ligand complexes has led to the development of various computational approaches that analyze protein-ligand interactions in terms of atomic contacts, energetic contributions, binding site features, etc. Although various databases store information on similar protein sequences and structures, only a handful provide information on similar protein-ligand interactions and the different attributes of those interactions. The attributes can be roughly divided into: (a) binding site characteristics, which include pocket shape, the nature of residues, interaction profiles with different kinds of atomic probes, etc.; (b) atomic contacts, comprising various types of polar, hydrophobic and aromatic contacts, along with the binding site water molecules that play a crucial role in protein-ligand interaction; and (c) the energetics involved in interactions, derived from in silico scoring functions developed for docking.
Although protein-ligand databases such as BioLiP(1) and PoSSuM(2) exist, none of them analyzes similarities in binding sites by considering all of these attributes, nor do they provide multiple structure alignments of binding sites. In this study we present a database providing PDB-scale information on all similar binding sites. The in-house tools PocketMatch(3) and PocketAlign(4) were used to obtain clusters of similar binding sites from the PDB. Alongside these, computational tools such as fPocket(5), Autodock and EasyMIFs(6) were used to study the other attributes of these interactions across the clusters of similar binding sites.
All protein-ligand complexes present in the PDB were downloaded and the binding sites (all residues within 4.5 Å of any ligand atom) were extracted. Only biologically relevant ligands were retained, and a minimum cut-off of five residues was applied to all binding sites. The 84846 binding sites obtained were then compared using the PocketMatch algorithm. A PMAX cut-off of 0.80 was used to construct a binding site similarity network, and this network was clustered using the MCL algorithm(7). Around 10858 clusters of binding sites were obtained. The attributes described above were derived using LPC(8), fPocket, Autodock and EasyMIFs for each interaction in each cluster.
The following figure depicts the binding site similarity network obtained for the interactions present in the PLIC database. Each node represents a binding site and each edge represents high similarity (PMAX above 0.80) between two sites. Nodes are colored according to the clusters obtained by applying the MCL algorithm to the binding site similarity network.
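The binding site definition used here (residues within 4.5 Å of any ligand atom, with at least five residues per site) can be illustrated with a minimal Python sketch. This is not the pipeline used to build the database. The function names, the input layout (residue identifier plus coordinates) and the residue labels are hypothetical. It also assumes that a residue counts as part of the site if any one of its atoms lies within the cut-off, and that the cut-off is inclusive.

```python
import math

DISTANCE_CUTOFF = 4.5  # angstroms, as quoted in the text
MIN_RESIDUES = 5       # minimum binding site size, as quoted in the text


def binding_site(protein_atoms, ligand_atoms, cutoff=DISTANCE_CUTOFF):
    """Return the set of residue ids with at least one atom near the ligand.

    protein_atoms: list of (residue_id, (x, y, z)) tuples (hypothetical layout)
    ligand_atoms:  list of (x, y, z) tuples
    """
    site = set()
    for residue_id, coord in protein_atoms:
        if residue_id in site:
            continue  # residue already selected through another atom
        # Assumption: inclusive cut-off, measured atom to atom.
        if any(math.dist(coord, lig) <= cutoff for lig in ligand_atoms):
            site.add(residue_id)
    return site


def is_valid_site(site, min_residues=MIN_RESIDUES):
    """Apply the minimum-size filter to an extracted binding site."""
    return len(site) >= min_residues


# Toy coordinates, invented purely for illustration.
ligand = [(0.0, 0.0, 0.0)]
protein = [
    ("ALA10", (1.0, 0.0, 0.0)),
    ("GLY11", (2.0, 0.0, 0.0)),
    ("SER12", (3.0, 0.0, 0.0)),
    ("HIS13", (4.0, 0.0, 0.0)),
    ("ASP14", (9.0, 0.0, 0.0)),  # far atom of ASP14
    ("ASP14", (4.4, 0.0, 0.0)),  # near atom of ASP14, so the residue is kept
    ("LYS15", (6.0, 0.0, 0.0)),  # too far from the ligand
]
site = binding_site(protein, ligand)
```

In this toy case five residues fall within the cut-off, so the site passes the size filter, while `LYS15` is excluded.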
Click here to download the PDF - All_site_clusters.pdf
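To make the network construction and clustering steps more concrete, the following Python sketch builds a thresholded similarity network from pairwise scores and clusters it with a toy version of Markov clustering (expansion followed by inflation). It is an illustration only, not the `mcl` program or the settings used for the database. The site names, the scores, the inflation value of 2.0 and the inclusive treatment of the 0.80 cut-off are all assumptions.

```python
PMAX_CUTOFF = 0.80  # similarity cut-off quoted in the text


def build_network(scores, cutoff=PMAX_CUTOFF):
    """Return (nodes, edges), keeping only pairs at or above the cut-off.

    scores: dict mapping (site_a, site_b) -> similarity (hypothetical layout)
    """
    nodes = sorted({site for pair in scores for site in pair})
    edges = {frozenset(pair) for pair, pmax in scores.items() if pmax >= cutoff}
    return nodes, edges


def normalize_columns(m):
    """Scale every column of a square matrix in place so that it sums to 1."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            m[i][j] /= total
    return m


def mcl(nodes, edges, inflation=2.0, iterations=50, eps=1e-6):
    """Toy Markov clustering; returns a set of frozensets of node names."""
    n = len(nodes)
    index = {name: i for i, name in enumerate(nodes)}
    # Adjacency matrix with self-loops, so no column is ever empty.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for edge in edges:
        a, b = (index[name] for name in edge)
        m[a][b] = m[b][a] = 1.0
    normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: element-wise power, then renormalize each column.
        m = normalize_columns([[v ** inflation for v in row] for row in m])
    clusters = set()
    for i in range(n):
        members = frozenset(nodes[j] for j in range(n) if m[i][j] > eps)
        if members:  # rows that retain flow define a cluster
            clusters.add(members)
    return clusters


# Invented scores for five hypothetical binding sites.
scores = {
    ("s1", "s2"): 0.85, ("s1", "s3"): 0.92, ("s2", "s3"): 0.81,
    ("s4", "s5"): 0.90,
    ("s1", "s4"): 0.50,  # below the cut-off, so no edge is created
}
nodes, edges = build_network(scores)
clusters = mcl(nodes, edges)
```

Here the pair scoring 0.50 is dropped when the network is built, so the two groups of sites never exchange flow and end up in separate clusters.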
Information on all 84846 protein-ligand interactions, along with binding site similarity scores, cluster IDs and their attributes, is stored in a MySQL database. The database can be queried using standard PDB protein/HETATM codes or a simple text search for protein or ligand names. Information on CATH superfamily association, Enzyme Commission number and UniProt accession has been added. A web interface to the database was created in PHP. The interactions between protein and ligand are displayed using a Jmol applet. Multiple structural alignments of the binding sites within a cluster are also provided. The graphical results of the various analyses are displayed using the Highcharts JavaScript library, and jQuery plugins are used to display the results of database queries.
1. Yang, J., Roy, A. and Zhang, Y. (2013) BioLiP: a semi-manually curated database for biologically relevant ligand-protein interactions. Nucleic Acids Res, 41, D1096-1103.
2. Ito, J., Tabei, Y., Shimizu, K., Tsuda, K. and Tomii, K. (2012) PoSSuM: a database of similar protein-ligand binding and putative pockets. Nucleic Acids Res, 40, D541-548.
3. Yeturu, K. and Chandra, N. (2008) PocketMatch: a new algorithm to compare binding sites in protein structures. BMC Bioinformatics, 9, 543.
4. Yeturu, K. and Chandra, N. (2011) PocketAlign a novel algorithm for aligning binding sites in protein structures. J Chem Inf Model, 51, 1725-1736.
5. Schmidtke, P., Le Guilloux, V., Maupetit, J. and Tuffery, P. (2010) fpocket: online tools for protein ensemble pocket detection and tracking. Nucleic Acids Res, 38, W582-589.
6. Ghersi, D. and Sanchez, R. (2009) EasyMIFS and SiteHound: a toolkit for the identification of ligand-binding sites in protein structures. Bioinformatics, 25, 3185-3186.
7. van Dongen, S. and Abreu-Goodger, C. (2012) Using MCL to extract clusters from networks. Methods Mol Biol, 804, 281-295.
8. Sobolev, V., Eyal, E., Gerzon, S., Potapov, V., Babor, M., Prilusky, J. and Edelman, M. (2005) SPACE: a suite of tools for protein structure prediction and analysis based on complementarity and environment. Nucleic Acids Res, 33, W39-43.