NrichD Database

Protein sequence space is riddled with large gaps, and sequence-based remote homology detection methods are less effective because of the paucity of 'natural linker sequences' that facilitate homology detection. To address this problem, we developed an algorithm that computationally designs protein-like sequences to link distantly related proteins by 'populating the gaps in protein sequence space'.
We designed 3,611,010 protein-like sequences between 27,882 pairs of related protein families for 374 multi-membered SCOP folds (SCOP v1.75). These designed intermediate sequences were annotated with their two parent families and form the Artificial Sequence database (AS-DB, version 1). The AS-DB was merged with a database of sequences from the SCOP database and their homologs (SCOP-DB) to create a sequence database referred to as the 'SCOP(v1.75)-NrichD' database, i.e. Natural sequences from the SCOP database enriched with Designed intermediate sequences, comprising 8,305,931 sequences. A similar database, called the 'Pfam(v27.0)-NrichD' database, was generated by plugging the AS-DB into a more generic sequence database (Pfam database, version 27).

The NrichD databases, the natural sequence databases, and the designed intermediate sequences on their own can be downloaded from the download page.

This web-resource provides two key features:

Query NrichD database

The web resource lets users run jackhmmer searches (HMMER3.0 suite) against two databases, namely the SCOP-NrichD database and the Pfam-NrichD database, to identify remote homologs of their query sequence.

Query NrichD db
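
For users who download an NrichD database and prefer to search it locally, the following is a minimal sketch of how a jackhmmer run could be scripted and its tabular output read back. It is an illustration under stated assumptions, not the web resource's own pipeline: the file names, the iteration count, and the E-value threshold are placeholders, and the helper function names are hypothetical. Only standard jackhmmer options (-N, -E, --cpu, --tblout) are used.

```python
import subprocess


def build_jackhmmer_command(query_fasta, database_fasta, tblout_path,
                            iterations=5, evalue=1e-3, cpus=1):
    """Build the argument list for a jackhmmer search.

    All paths and thresholds are illustrative assumptions; adjust them
    to the downloaded NrichD FASTA file and to your own cut-offs.
    """
    return [
        "jackhmmer",
        "-N", str(iterations),      # maximum number of search iterations
        "-E", str(evalue),          # report hits with E-value <= this threshold
        "--cpu", str(cpus),         # number of worker threads
        "--tblout", tblout_path,    # per-target tabular summary of hits
        query_fasta,
        database_fasta,
    ]


def run_search(query_fasta, database_fasta, tblout_path, **options):
    """Run jackhmmer (it must be installed and on PATH) and return parsed hits."""
    command = build_jackhmmer_command(query_fasta, database_fasta,
                                      tblout_path, **options)
    subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
    with open(tblout_path) as handle:
        return parse_tblout(handle)


def parse_tblout(lines):
    """Parse lines of a --tblout file into a list of hit dictionaries.

    Comment lines start with '#'. The whitespace-separated columns used
    here are: target name (1), query name (3), full-sequence E-value (5)
    and full-sequence bit score (6).
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(maxsplit=18)  # the last column is free-text description
        hits.append({
            "target": fields[0],
            "query": fields[2],
            "evalue": float(fields[4]),
            "score": float(fields[5]),
        })
    return hits
```

A call such as run_search("query.fasta", "nrichd.fasta", "hits.tbl", iterations=5) would then return one dictionary per reported target. Whether a given target is a natural or a designed intermediate sequence has to be read from the annotation in the downloaded database; this sketch does not assume any particular header format.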

Design protein-like sequences

Users can design protein-like sequences for a single SCOP domain family or between two families within a fold. Users can also provide multiple sequence alignment(s) of their own families to generate sequences for or between them.

Design Sequences


(1) NrichD database: sequence databases enriched with computationally designed protein-like sequences aid in remote homology detection.
Mudgal R., Sandhya S., Kumar G., Sowdhamini R., Chandra N. and Srinivasan N.
Nucl. Acids Res. (2014) Advance access: doi: 10.1093/nar/gku888

(2) Filling-in void and sparse regions in protein sequence space by protein-like artificial sequences enables remarkable enhancement in remote homology detection capability.
Mudgal R., Sowdhamini R., Chandra N., Srinivasan N. and Sandhya S.
J. Mol. Biol. (2014) 426: 962-979.