DomIns: A Web Resource for Domain
Insertions in Known Protein Structures


1) What is domIns all about?

      Welcome to the new and updated domIns website! DomIns is a web resource that provides comprehensive information on domain insertions in proteins of known structure. To identify insertions, we follow the definition of protein domains used in the SCOP (Structural Classification of Proteins) database. The server currently uses SCOP version 1.71 and the March 2006 version of PDB_Select. The previous version of domIns used SCOP version 1.61 and the April 2002 version of PDB_Select.


2) What are domain insertions?

      In the figure above, the E. coli protein Malonyl-CoA:Acyl Carrier Protein transacylase has two domains: the catalytic domain (coloured blue and green) is interrupted by the insertion of the ACP-binding domain (coloured red and yellow). The parent domain (catalytic domain) has two regions, with residue positions 3-127 and 128-307, in the same domain. The parent and the insert domains belong to two different superfamilies of proteins. A similar arrangement is seen in Streptomyces coelicolor malonyl-CoA:ACP transacylase (PDB code: 1nm2). This is an example of a single insertion, where the parent domain is interrupted by a single insert domain. In multiple insertions, there is more than one insert domain.


3) How does one identify domain insertions?

      Although several schemes exist for classifying protein structures, SCOP is important because it is a manually curated classification of proteins of known structure from the Protein Data Bank, based on their structural and evolutionary relatedness. In SCOP, a protein domain is considered a unit of evolution if it occurs independently or in combination with other domains, on the basis of evidence from proteins of known structure. SCOP has a hierarchical classification scheme whose principal levels are family, superfamily, fold and class. Proteins clustered together into families are clearly evolutionarily related, and the relationship is usually detectable at the sequence level. Proteins brought together into superfamilies have low sequence identity, but their structural and functional features suggest a common evolutionary origin. Superfamilies with similar topology but without evidence for evolutionary relatedness are grouped under a fold. Folds are then grouped into classes based on the secondary structure elements present.
      We have considered only the first five classes (All-alpha, All-beta, alpha/beta, alpha+beta and Small proteins) and the fold and superfamily levels of the SCOP hierarchy for determining insertions. We excluded mono-domain proteins and considered only chains that have at least two domains. In multi-domain proteins, two domains are usually linked in a linear fashion, i.e., the C-terminus of the first domain is covalently linked to the N-terminus of the second domain. We instead looked for domains that are interrupted in the middle by the insertion of another domain. In such cases, the second domain (insert) begins and ends inside the first domain (parent domain). The domains involved in insertions can come from the same or different SCOP superfamilies.
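
The criterion described above (the insert begins and ends inside the parent domain) can be sketched in a few lines of Python. This is only an illustration, not the code behind domIns: the function names, the representation of a domain as a list of inclusive (start, end) residue ranges, and the residue numbers in the usage note are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a domain is a list of inclusive
# (start, end) residue ranges taken from a domain definition.
Segment = Tuple[int, int]


def is_inserted(parent: List[Segment], insert: List[Segment]) -> bool:
    """Return True if `insert` begins and ends inside `parent`."""
    parent_start = min(start for start, _ in parent)
    parent_end = max(end for _, end in parent)
    insert_start = min(start for start, _ in insert)
    insert_end = max(end for _, end in insert)

    # The insert must start after the parent starts and finish before
    # the parent finishes.
    if not (parent_start < insert_start and insert_end < parent_end):
        return False

    # No insert range may overlap a parent range, so the insert sits in
    # a gap between two regions of a discontinuous parent domain.
    return all(
        i_end < p_start or i_start > p_end
        for p_start, p_end in parent
        for i_start, i_end in insert
    )


def find_insertions(domains: Dict[str, List[Segment]]) -> List[Tuple[str, str]]:
    """Return (parent, insert) name pairs for the domains of one chain."""
    pairs = []
    for parent_name, parent_segments in domains.items():
        for insert_name, insert_segments in domains.items():
            if parent_name != insert_name and is_inserted(
                parent_segments, insert_segments
            ):
                pairs.append((parent_name, insert_name))
    return pairs
```

With made-up residue ranges, `find_insertions({"d1": [(1, 100), (181, 300)], "d2": [(101, 180)]})` reports `d2` as inserted into `d1`, whereas two domains linked linearly, such as `[(1, 100)]` followed by `[(101, 200)]`, yield no pairs.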


4) About the access methods, how does one use the “browse all entries” option?

      This option allows you to browse all PDB entries with at least one insertion. There is also an option to view only entries from a non-redundant set of proteins. We have used PDB_Select to obtain a representative list of protein chains from the PDB. PDB_Select contains several lists, each at a different similarity cutoff. The most stringent is the 25% list, in which no two proteins have more than 25% sequence identity (for alignments of 80 or more residues); we have used the 90% list.

The lists can be obtained from:
http://bioinfo.tg.fh-giessen.de/pdbselect/

The algorithm to extract the lists is explained in:
"Selection of a representative set of structures from the Brookhaven Protein Data Bank", Protein Science 1 (1992), 409-417.


5) How can you do a simple search?

      A simple search allows one to look for insertions, given a PDB identifier with or without chain information. A query may return no result for one of the following reasons:
(a) There is no known insertion in the structure
(b) There is no SCOP classification available for the structure
(c) The structure does not belong to one of the true classes (a to g) defined in SCOP
(d) We may have missed the insertion, in which case we would be grateful if you let us know.


6) Can one obtain a list going by insertion type?

      We have categorized known insertions as single or multiple, depending on the number of insert domains in a given chain. In single insertions, a domain belonging to a particular superfamily is inserted into another domain of the same superfamily or of a different superfamily. In multiple insertions, more than one insert, of the same or different superfamilies, is inserted into the parent domain. There is a feature to display the entries belonging to either of these categories.
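
The single/multiple distinction amounts to counting the distinct insert domains found in a chain. A minimal sketch is shown below. It assumes that the insertions of a chain are available as (parent, insert) name pairs; the function name and the domain names are hypothetical and not part of domIns.

```python
from typing import Iterable, Tuple


def insertion_type(pairs: Iterable[Tuple[str, str]]) -> str:
    """Classify a chain from its (parent, insert) domain pairs.

    Returns "none", "single" or "multiple" according to the number of
    distinct insert domains in the chain.
    """
    inserts = {insert for _parent, insert in pairs}
    if not inserts:
        return "none"
    return "single" if len(inserts) == 1 else "multiple"
```

For example, `insertion_type([("d1", "d2")])` gives `"single"`, while `insertion_type([("d1", "d2"), ("d1", "d3")])` gives `"multiple"`.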

7) How can we obtain a list based on insertion combination?

      We have provided a search facility in which insertions are grouped by the combination of SCOP classes. For example, clicking the cell marked 1 will retrieve the list of entries where the parent domain belongs to the alpha/beta class and the insert belongs to the alpha+beta class.
      The list of entries with a specific parent or insert class can also be obtained by clicking the individual classes in the top-most horizontal row (parent classes) or the first vertical column (insert classes). For example, clicking the cell marked 2 will retrieve all entries that have at least one parent domain belonging to the All-alpha class, while clicking the cell marked 3 will retrieve all entries that have at least one insert domain belonging to the alpha+beta class.
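
The grid lookup described above can be pictured as a table keyed by (parent class, insert class). The sketch below is only an illustration under that assumption. The function names and entry identifiers are hypothetical, and the actual resource answers these queries from its database.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

Grid = Dict[Tuple[str, str], Set[str]]


def build_grid(insertions: Iterable[Tuple[str, str, str]]) -> Grid:
    """Group entries by (parent class, insert class).

    `insertions` yields (entry_id, parent_class, insert_class) tuples.
    """
    grid: Grid = defaultdict(set)
    for entry_id, parent_class, insert_class in insertions:
        grid[(parent_class, insert_class)].add(entry_id)
    return grid


def entries_for(
    grid: Grid,
    parent_class: Optional[str] = None,
    insert_class: Optional[str] = None,
) -> List[str]:
    """Return sorted entries matching the given classes.

    Passing both classes corresponds to clicking a single cell; passing
    only one corresponds to clicking a row or column header.
    """
    hits: Set[str] = set()
    for (p_class, i_class), entries in grid.items():
        if parent_class in (None, p_class) and insert_class in (None, i_class):
            hits |= entries
    return sorted(hits)
```

With a grid built from `[("entry1", "alpha/beta", "alpha+beta"), ("entry2", "All-alpha", "alpha+beta")]`, calling `entries_for(grid, parent_class="alpha/beta", insert_class="alpha+beta")` returns only `entry1`, and `entries_for(grid, insert_class="alpha+beta")` returns both entries.
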
For each entry (chain) in the database, we provide the following information: the name of the protein, its biochemical function, the Medline reference for the structure, the number of domains and their boundaries (based on the SCOP domain definition), sequence information, and links to SCOP, CATH, FSSP, PDBSum and MMDB.


8) What sort of software packages/tools went into the making of domIns?

       We have used MySQL and HTML pages to create the resource.


To report comments and suggestions, email aroul@oxfordbiodynamics.com
This page was last updated on 17 August 2007