We have made available the tar-zipped files of the five databases for remote homology detection
and a separate dataset containing only the designed intermediate sequences.
(1) SCOP(v1.75)-DB (version 1)
This protein sequence database contains sequences with known structures (SCOP v1.75) and their sequence homologues obtained from
non-redundant sequence database searches (UniRef90). The database contains 4,694,921 sequences and is 517 MB in size. At an average
download speed of 1.5 Mbps, it takes 5 to 6 minutes to download.
(2) SCOP(v1.75)-NrichD database (version 1)
The database contains the natural protein sequences in SCOP-DB (version 1), augmented with computationally designed
intermediate sequences (8,305,931 sequences).
Designed intermediate sequences are annotated as "Int" and the two parent SCOP domain families are also provided in the
annotation line. The SCOP codes of the two families are separated by a "|".
For example: ">Int_1|a.1.1.1_1|a.1.1.2_1"
This annotation line denotes a designed intermediate sequence between the SCOP domain families "a.1.1.1" and "a.1.1.2".
The database is 869 MB in size and takes approximately 14 minutes to download at an average download speed of ~1.5 Mbps.
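The annotation format described above can be read programmatically. The following is a minimal Python sketch, not part of the distributed databases: the names IntermediateHeader and parse_intermediate_header are hypothetical, and it assumes that the trailing "_1" on each SCOP code is a suffix that can be stripped to recover the family code (the meaning of that suffix is not documented here).

```python
from typing import NamedTuple, Optional


class IntermediateHeader(NamedTuple):
    """Fields of a designed-intermediate annotation line (hypothetical type)."""
    seq_id: str    # e.g. "Int_1"
    parent_a: str  # first parent SCOP family, e.g. "a.1.1.1"
    parent_b: str  # second parent SCOP family, e.g. "a.1.1.2"


def _family(field: str) -> str:
    # Assumption: the part after the last "_" is a suffix, not part of the
    # SCOP family code, so "a.1.1.1_1" becomes "a.1.1.1".
    return field.rsplit("_", 1)[0]


def parse_intermediate_header(line: str) -> Optional[IntermediateHeader]:
    """Return the parsed fields, or None if the line is not an "Int" header."""
    fields = line.strip().lstrip(">").split("|")
    # A designed intermediate has exactly three "|"-separated fields and an
    # identifier that starts with "Int".
    if len(fields) != 3 or not fields[0].startswith("Int"):
        return None
    return IntermediateHeader(fields[0], _family(fields[1]), _family(fields[2]))


header = parse_intermediate_header(">Int_1|a.1.1.1_1|a.1.1.2_1")
print(header)
```

Lines that do not follow the "Int" pattern, such as headers of natural sequences, return None, so the function can also be used as a filter.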
(3) Pfam(v27.0)-DB (version 1)
This sequence database contains non-redundant sequences from 14,831 Pfam families (version 27) that were downloaded from the Pfam
FTP site (10,626,097 sequences). The database is 983 MB in size, and the average download time at 1.5 Mbps is 15 minutes.
(4) Pfam(v27.0)-NrichD database (version 1)
The database contains sequences from Pfam-DB (version 1) and computationally designed intermediate sequences generated using multiple PSSMs
of SCOP domain families (14,237,107 sequences). This database is 1.4 GB in size and takes about 15 to 20 minutes to download.
(5) Pfam(v31.0)-NrichD database (version 2)
The database contains sequences from Pfam-DB (version 2) and computationally designed intermediate sequences generated using multiple PSSMs
of SCOP domain families (25,438,429 sequences). This database is 5.2 GB in size and takes about 20 minutes to download.
(6) AS-DB: Artificial sequence database (version 1)
This dataset contains 3,611,010 designed intermediate sequences that were designed for 374 folds (SCOP v1.75). The database size is 353 MB and can be downloaded
in 5 minutes.
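After downloading and extracting an archive, the sequence counts quoted above can serve as a quick integrity check. The sketch below is only an illustration: it assumes the extracted file is in FASTA format (header lines starting with ">"), that designed intermediates are marked with an "Int" prefix as described for the NrichD databases, and the function name count_sequences is hypothetical.

```python
from typing import Iterable, Tuple


def count_sequences(lines: Iterable[str]) -> Tuple[int, int]:
    """Count all FASTA records and those whose header starts with ">Int".

    Returns (total_sequences, designed_intermediates).
    """
    total = 0
    designed = 0
    for line in lines:
        if line.startswith(">"):
            total += 1
            # Assumption: designed intermediates use the "Int" header prefix.
            if line.startswith(">Int"):
                designed += 1
    return total, designed


# Small in-memory example; for a real file, pass an open text file handle.
sample = [
    ">Int_1|a.1.1.1_1|a.1.1.2_1\n",
    "MKVLAAGIVG\n",
    ">natural_sequence_1\n",
    "MAAELKKQLE\n",
]
print(count_sequences(sample))
```

Because the function reads one line at a time, it can be applied to the multi-gigabyte files without loading them into memory. The totals it reports can then be compared with the numbers listed for each database.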