Protein sequence space is riddled with large gaps, and sequence-based remote homology detection methods are rendered
less effective by the paucity of 'natural linker sequences' that facilitate homology detection. To address this problem, we developed an
algorithm that computationally designs protein-like sequences to link distantly related proteins by 'populating the gaps in
protein sequence space'.
We designed 3,611,010 protein-like sequences between 27,882 pairs of related protein families for 374 multi-membered
SCOP folds (SCOP v1.75). These designed intermediate sequences were annotated with their two parent
families and form the Artificial Sequence database (AS-DB, version 1). The AS-DB was merged with a database of sequences from the SCOP
database and their homologs (SCOP-DB) to create a sequence database of 8,305,931 sequences, referred to as the 'SCOP(v1.75)-NrichD' database,
i.e. Natural sequences from the SCOP database enriched with Designed intermediate sequences.
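The figures above imply a rough composition for the SCOP(v1.75)-NrichD database. The short Python sketch below works through that arithmetic; it assumes the AS-DB was added in full with no de-duplication against SCOP-DB, which the text does not state explicitly, so the derived natural-sequence count is an estimate rather than a reported value. All variable names are illustrative.

```python
# Counts quoted in the text above.
designed_sequences = 3_611_010   # AS-DB, version 1
family_pairs = 27_882            # pairs of related protein families
nrichd_total = 8_305_931         # SCOP(v1.75)-NrichD database

# Assumption: every designed sequence is present in the NrichD database
# and none was merged with or removed against a natural sequence.
natural_sequences = nrichd_total - designed_sequences

# Share of the enriched database made up of designed sequences.
designed_fraction = designed_sequences / nrichd_total

# Average number of designed intermediates per family pair.
mean_designed_per_pair = designed_sequences / family_pairs

print(f"natural sequences (estimated): {natural_sequences:,}")
print(f"designed fraction: {designed_fraction:.1%}")
print(f"mean designed sequences per family pair: {mean_designed_per_pair:.1f}")
```

Under this assumption, designed intermediates make up somewhat less than half of the enriched database, at roughly 130 sequences per family pair on average.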
A similar database, called the 'Pfam(v27.0)-NrichD' database, was generated by 'plugging' the AS-DB into a more generic sequence
database (Pfam database, version 27).
The NrichD databases, the natural sequence databases, and the designed intermediate sequences on their own can be downloaded from the download page: http://proline.biochem.iisc.ernet.in/NRICHD/download
This web-resource provides two key features:
Query NrichD db: The user can run jackhmmer searches (HMMER 3.0 suite) against two databases, the SCOP-NrichD database and the Pfam-NrichD database, to identify remote homologs of a query sequence.
Design Sequences: The user can design protein-like sequences for a SCOP domain family or between two families within a fold. The user can also provide multiple sequence alignment(s) of their own families to generate sequences for or between them.
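The search feature described above could in principle be reproduced locally against a downloaded NrichD database. The Python sketch below is a minimal illustration, not the web-resource's own pipeline: it builds a jackhmmer command line using the standard `-N`, `-E` and `--tblout` options and parses the resulting tabular output. The file names, the iteration count and the E-value cutoff are hypothetical placeholders, and the helper names `build_jackhmmer_command` and `parse_tblout` are invented for this example.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    target: str     # name of the matched database sequence
    evalue: float   # full-sequence E-value
    score: float    # full-sequence bit score


def build_jackhmmer_command(query_fasta: str, database_fasta: str,
                            tblout_path: str, iterations: int = 5,
                            evalue: float = 1e-3) -> List[str]:
    """Return an argument list suitable for subprocess.run().

    The defaults are illustrative assumptions, not the settings
    used by the NrichD web-resource.
    """
    return [
        "jackhmmer",
        "-N", str(iterations),     # maximum number of search iterations
        "-E", str(evalue),         # reporting E-value threshold
        "--tblout", tblout_path,   # per-target tabular output file
        query_fasta,
        database_fasta,
    ]


def parse_tblout(text: str) -> List[Hit]:
    """Parse HMMER3 --tblout text into a list of hits.

    Columns are whitespace-separated: target name, accession,
    query name, accession, full-sequence E-value, score, bias, ...
    Comment lines start with '#'.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        hits.append(Hit(fields[0], float(fields[4]), float(fields[5])))
    return hits


if __name__ == "__main__":
    # Hypothetical file names; replace with real paths before running.
    command = build_jackhmmer_command("query.fasta", "scop_nrichd.fasta",
                                      "hits.tbl")
    print(" ".join(command))
```

To run the search itself, the command list can be passed to `subprocess.run(command, check=True)` on a machine where HMMER is installed, and the contents of `hits.tbl` can then be fed to `parse_tblout`. How designed intermediate hits are distinguished from natural ones depends on the annotation format of the downloaded files, which is not described here.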