Characterizing the pocketome of Mycobacterium tuberculosis and rationalization of polypharmacological target selection.

Obtaining Structural Proteome


The structural proteome of Mycobacterium tuberculosis was annotated by our group in a previous study; the details can be found in this publication. These structures, along with experimental structures available in the PDB, were used for the analysis in this study. Individual structures can be visualized at: http://proline.physics.iisc.ernet.in/Tbstructuralannotation/. Most of these structures were obtained from MODBASE. The details of those structures can be obtained from the csv file below:

Binding Site Detection


A consensus of three different algorithms was used to detect binding sites, given a protein structure as input. They are as follows:
  • PocketDepth - In-house, shape-based method that detects binding sites on a protein structure through calculation of a depth factor.
  • Sitehound - Energetics-based method to detect binding sites.
  • LigsiteCSC - Binding site detection algorithm based on surface-solvent-surface events and the degree of conservation of the involved surface residues.
All the above algorithms are available along with source code for download from their respective websites.
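The consensus idea can be illustrated with a minimal sketch. It assumes that each tool's output has already been parsed into pockets expressed as sets of residue identifiers, and that a residue is kept when at least two of the three tools report it. The voting rule, the function name consensus_residues, and the residue labels are assumptions made for illustration; they are not taken from the study or from any of the three tools.

```python
from collections import Counter


def consensus_residues(predictions, min_votes=2):
    """Return residues reported by at least `min_votes` tools.

    predictions: dict mapping a tool name to a list of pockets,
                 where each pocket is a set of residue identifiers.
    The majority-vote rule is an illustrative assumption.
    """
    votes = Counter()
    for pockets in predictions.values():
        # A tool votes at most once per residue, even if the residue
        # appears in several of its predicted pockets.
        residues = set()
        for pocket in pockets:
            residues.update(pocket)
        votes.update(residues)
    return {res for res, n in votes.items() if n >= min_votes}


# Hypothetical parsed predictions for one protein structure.
example = {
    "PocketDepth": [{"A:10", "A:11", "A:45"}],
    "Sitehound": [{"A:10", "A:45", "A:90"}],
    "LigsiteCSC": [{"A:10", "A:11"}, {"A:200"}],
}
print(sorted(consensus_residues(example)))
```

Raising min_votes to 3 keeps only the residues on which all three methods agree, trading coverage for confidence.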

Binding Site Comparison


Only a handful of algorithms can compare a given pair of binding sites on a large scale. We mainly used two in-house algorithms to carry out the proteome-wide binding site comparison. They are as follows:
  • PocketMatch - Each binding site is represented by 90 lists of sorted distances capturing the shape and chemical nature of the site. The sorted arrays are then aligned using an incremental alignment method and scored to obtain PMScores for pairs of sites.
  • PocketAlign - The algorithm encodes shape descriptors in the form of geometric perspectives, supplemented by chemical group classification. The shape descriptor considers several perspectives with each residue as the focus and captures relative distribution of residues around it in a given site. Residue-wise pairings are computed by comparing the set of perspectives of the first site with that of the second, followed by a greedy approach that incrementally combines residue pairings into a mapping.
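The sorted-distance comparison described for PocketMatch can be sketched in simplified form. The code below is not the PocketMatch implementation: it walks two sorted distance lists with two pointers, counts distances that agree within a tolerance, and normalizes by the larger site. The tolerance of 0.5, the normalization, the list labels, and the function names are all assumptions made for illustration.

```python
def align_sorted(d1, d2, tol=0.5):
    """Count matched distances in two sorted lists using a two-pointer walk."""
    i = j = matches = 0
    while i < len(d1) and j < len(d2):
        diff = d1[i] - d2[j]
        if abs(diff) <= tol:
            # Distances agree within the tolerance: pair them and move on.
            matches += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # d1[i] is too small to match anything left in d2
        else:
            j += 1  # d2[j] is too small to match anything left in d1
    return matches


def similarity(site1, site2, tol=0.5):
    """Fraction of matched distances, relative to the larger site.

    site1, site2: dict mapping a list label to a list of distances.
    """
    matched = total1 = total2 = 0
    for key in set(site1) | set(site2):
        a = sorted(site1.get(key, []))
        b = sorted(site2.get(key, []))
        matched += align_sorted(a, b, tol)
        total1 += len(a)
        total2 += len(b)
    denom = max(total1, total2)
    return matched / denom if denom else 0.0


# Hypothetical sites with two distance lists each (the real method uses 90).
site_a = {"polar-polar": [3.0, 5.2, 7.9], "polar-apolar": [4.1, 6.0]}
site_b = {"polar-polar": [3.2, 5.0, 9.5], "polar-apolar": [4.0]}
print(similarity(site_a, site_b))
```

Because the lists are sorted, each comparison takes time linear in the list lengths, which is what makes an all-against-all comparison across a proteome feasible.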

Network Analysis


The following software and packages were used to analyze the similarity networks obtained.
  • Cytoscape - Cytoscape was used for visualization of the networks obtained.
  • MCODE - The AllegroMCODE plugin of Cytoscape was used to cluster the network and obtain sets of similar binding sites.
  • igraph - The igraph package in R was used to carry out the network analysis and obtain the clustering coefficient of the nodes.
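For readers unfamiliar with the clustering coefficient, the following minimal sketch computes the local value for each node from an edge list of similar binding-site pairs. It is a plain-Python illustration, not the igraph call used in the study; the node names are hypothetical, and assigning 0.0 to nodes with fewer than two neighbours is an assumption of this sketch.

```python
from itertools import combinations


def local_clustering(edges):
    """Local clustering coefficient per node of an undirected graph.

    edges: iterable of (u, v) pairs, e.g. pairs of similar binding sites.
    Nodes with fewer than two neighbours get 0.0 (an assumption here).
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    coeffs = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs[node] = 0.0
            continue
        # Count the edges that exist among the neighbours of this node.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs[node] = 2.0 * links / (k * (k - 1))
    return coeffs


# Hypothetical similarity network: s1, s2, s3 form a triangle, s4 hangs off s3.
network = [("s1", "s2"), ("s2", "s3"), ("s1", "s3"), ("s3", "s4")]
print(local_clustering(network))
```

A value close to 1 means that the sites similar to a given site are also similar to each other, which is the pattern expected inside a tight cluster.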

Other Datasets and Software Used


The other datasets and software used for the analysis in this study are listed below.
  • Binding MOAD Database - The biologically relevant known ligand binding sites present in the PDB were obtained from this database. The MCODE clustering algorithm was validated on this dataset. The binding-site information derived from the Binding MOAD database and used to validate the binding-site network clustering is available in the file BindingMOAD_sites.txt.
  • PROCOGNATE - The PROCOGNATE database was used to validate the pocket detection and ligand associations obtained in this study. The ligand associations downloaded from the PROCOGNATE database, along with SCOP codes for the proteins, are available in the file PROCOGNATE1.6_SCOP.txt.
  • 3D-BLAST - This algorithm was used to assign the SCOP codes for the protein structures.
  • Structural superpositions - TM-Align and Click were used for the structural superpositions.
  • Sequence comparison - All the sequence comparisons were carried out using BLAST.
  • KEGG - The KEGG database was used to obtain the ligand associations for all the proteins in Mtb. The ligand associations can be obtained from this file - Rvprotein_keggligand.csv
  • Drug binding sites - The lists of drug binding sites were obtained from DrugBank and DrugPort. The corresponding lists of binding sites are in Drug_bank_drug_sites.csv (DrugBank) and DrugPort_sites.csv (DrugPort).
  • AutoDock Vina - AutoDock Vina was used to calculate the energy of the drug associations predicted through binding site comparison.
  • OpenBabel - This was used to calculate the Tanimoto chemical similarity scores between the KEGG ligands and PDB ligands.
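The Tanimoto score mentioned above is the ratio of shared fingerprint bits to the total number of distinct bits set in either molecule. The sketch below computes it on fingerprints represented as sets of on-bit positions. It does not call OpenBabel, and the bit positions shown are made up for illustration; how the fingerprints were generated in the study is not specified here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as on-bit positions."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    # Two empty fingerprints share nothing; return 0.0 by convention here.
    return len(a & b) / union if union else 0.0


# Hypothetical on-bit positions for a KEGG ligand and a PDB ligand.
kegg_ligand_bits = {1, 2, 3, 4}
pdb_ligand_bits = {3, 4, 5}
print(tanimoto(kegg_ligand_bits, pdb_ligand_bits))  # 2 shared of 5 distinct bits
```

The score ranges from 0 (no bits in common) to 1 (identical fingerprints), so a threshold on it can be used to decide whether two ligands are chemically similar.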