Protein superfamily

Source: Wikipedia, the free encyclopedia.

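Historically, the similarity of different amino acid sequences has been the most common method of inferring homology. The most conserved sequence regions of a protein often correspond to functionally important regions such as catalytic sites and binding sites, since these regions are less tolerant to sequence changes.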
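One simple way to see which regions are conserved is to compare sequences column by column once they are aligned. The sketch below is illustrative only. It assumes the sequences are already aligned, uses hypothetical toy strings rather than real proteins, and treats a column as "conserved" only when every sequence carries the identical residue. Real analyses usually score similar residues with substitution matrices instead of exact identity.

```python
# Toy, pre-aligned sequences ('-' marks a gap). These are hypothetical
# strings chosen for illustration, not real protein sequences.
ALIGNMENT = [
    "MKHGDSLA-V",
    "MRHGDSIAEV",
    "MKHGDTLA-I",
]


def percent_identity(a: str, b: str) -> float:
    """Percent identity over the aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def conserved_columns(alignment: list[str]) -> list[int]:
    """0-based indices of gap-free columns in which every sequence has the same residue."""
    return [
        i
        for i, column in enumerate(zip(*alignment))
        if "-" not in column and len(set(column)) == 1
    ]


print(conserved_columns(ALIGNMENT))                            # [0, 2, 3, 4, 7]
print(round(percent_identity(ALIGNMENT[0], ALIGNMENT[1]), 1))  # 77.8
```

In a real superfamily alignment, fully conserved columns like these are candidates for catalytic or binding residues, though conservation alone does not prove function.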

Using sequence similarity to infer homology has several limitations. There is no minimum level of sequence similarity guaranteed to produce identical structures. Over long periods of evolution, related proteins may show no detectable sequence similarity to one another. Sequences with many insertions and deletions can also be difficult to align, which makes their homologous regions hard to identify. In the PA clan of proteases, for example, not a single residue is conserved throughout the superfamily, not even those in the catalytic triad. Conversely, the individual families that make up a superfamily are defined on the basis of their sequence alignment, for example the C04 protease family within the PA clan.
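Before sequences can be compared residue by residue, they must be aligned, and insertions and deletions show up as gaps. A classic way to compute a global alignment is dynamic programming in the style of Needleman-Wunsch. The sketch below makes several assumptions that are not from the text: a flat match/mismatch score (+1/−1) and a linear gap penalty (−2). Practical tools instead use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning the prefixes a[:i] and b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align a residue pair
                score[i - 1][j] + gap,  # gap in b (extra residue in a)
                score[i][j - 1] + gap,  # gap in a (extra residue in b)
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("HGDSLA", "HGDA"))
```

As sequences accumulate more insertions and deletions, the gap penalties dominate the score, and the alignment becomes increasingly sensitive to the chosen parameters.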

Nevertheless, sequence similarity is the most commonly used form of evidence to infer relatedness, since known sequences vastly outnumber known tertiary structures.[6] In the absence of structural information, sequence similarity sets the limits on which proteins can be assigned to a superfamily.[6]

Structural similarity

Figure: representative protease structures with their PDB IDs, including West Nile virus protease (1fp7), exfoliatin toxin (1exf), HtrA protease (1l1j), snake venom plasminogen activator (1bqy), chloroplast protease (4fln) and equine arteritis virus protease (1mbm).

Structural alignment programs, such as DALI, use the 3D structure of a protein of interest to find proteins with similar folds.[10] However, on rare occasions, related proteins may evolve to be structurally dissimilar,[11] and relatedness can then only be inferred by other methods.[12][13][14]
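DALI compares proteins through their intramolecular distance matrices rather than by superimposing coordinates, so the comparison does not depend on how each structure is oriented. The sketch below is a simplified score modelled on DALI's elastic similarity score. The parameter values (θ = 0.2, α = 20 Å) are the ones commonly quoted for DALI and are assumptions here. Unlike DALI, this sketch assumes the residue correspondence between the two structures is already known and skips diagonal terms; DALI itself also searches for the best correspondence.

```python
import math

# Parameter values as commonly quoted for DALI's elastic similarity score;
# treat them as assumptions for this sketch.
THETA = 0.2   # tolerated relative deviation between paired distances
ALPHA = 20.0  # envelope width in angstroms; down-weights long-range pairs

Coords = list[tuple[float, float, float]]


def distance_matrix(coords: Coords) -> list[list[float]]:
    """All pairwise distances between residue positions (e.g. C-alpha atoms)."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def elastic_score(coords_a: Coords, coords_b: Coords) -> float:
    """Compare two structures whose residues are ALREADY paired one-to-one.

    Only intramolecular distances are used, so rotating or translating either
    structure does not change the score. Diagonal (i == j) terms are skipped.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of paired residues")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    total = 0.0
    for i in range(len(da)):
        for j in range(len(da)):
            mean = (da[i][j] + db[i][j]) / 2
            if i == j or mean == 0:
                continue
            deviation = abs(da[i][j] - db[i][j]) / mean
            total += (THETA - deviation) * math.exp(-((mean / ALPHA) ** 2))
    return total


trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 3.8)]
# Same shape, rotated 90 degrees about the z axis and shifted.
moved = [(-y + 10.0, x + 10.0, z) for x, y, z in trace]
# Same shape except that the last residue is displaced.
bent = trace[:3] + [(0.0, 3.8, 20.0)]
print(math.isclose(elastic_score(trace, moved), elastic_score(trace, trace)))  # True
print(elastic_score(trace, bent) < elastic_score(trace, trace))                # True
```

Because only internal distances enter the score, the rotated copy scores the same as the original, while the distorted copy scores lower.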

Mechanistic similarity

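The catalytic mechanism of enzymes within a superfamily is commonly conserved, although substrate specificity may differ substantially. However, mechanism alone is not sufficient to infer relatedness. Some catalytic mechanisms have convergently evolved multiple times independently, and so form separate superfamilies,[18][19][20] and some superfamilies display a range of different (though often chemically similar) mechanisms.[15][21]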

Evolutionary significance

Protein superfamilies represent the current limits of our ability to identify common ancestry. They are the largest evolutionary grouping based on direct evidence that is currently possible, and they therefore reflect some of the most ancient evolutionary events currently studied. Some superfamilies have members present in all kingdoms of life, indicating that the superfamily's last common ancestor was already present in the last universal common ancestor of all life (LUCA).[23]

Superfamily members may be in different species, with the ancestral protein being the form of the protein that existed in the ancestral species (orthology). Conversely, the proteins may be in the same species, but evolved from a single protein whose gene was duplicated in the genome (paralogy).
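The distinction between orthology and paralogy can be made mechanical with a gene tree whose internal nodes are labelled with the event that created them. The sketch below follows the common event-based convention: two genes are orthologs if their last common ancestor is a speciation node and paralogs if it is a duplication node. The class, function names and the "human"/"mouse" tree are hypothetical examples, not taken from the text.

```python
from typing import Optional


class GeneNode:
    """A node in a gene tree. Internal nodes record the event that produced them."""

    def __init__(self, name: str, event: Optional[str] = None):
        self.name = name
        self.event = event  # "speciation" or "duplication" for internal nodes
        self.parent: Optional["GeneNode"] = None
        self.children: list["GeneNode"] = []

    def add(self, child: "GeneNode") -> "GeneNode":
        child.parent = self
        self.children.append(child)
        return child


def lineage(node: Optional[GeneNode]) -> list[GeneNode]:
    """The node itself followed by its ancestors, up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def classify(gene_a: GeneNode, gene_b: GeneNode) -> str:
    """Orthologs if the last common ancestor is a speciation node, else paralogs."""
    if gene_a is gene_b:
        raise ValueError("need two different genes")
    ancestors_of_a = set(lineage(gene_a))
    lca = next(node for node in lineage(gene_b) if node in ancestors_of_a)
    return "orthologs" if lca.event == "speciation" else "paralogs"


# Hypothetical tree: one gene was duplicated into copies alpha and beta,
# and each copy was later inherited by two species.
root = GeneNode("ancestral gene", event="duplication")
alpha = root.add(GeneNode("alpha", event="speciation"))
beta = root.add(GeneNode("beta", event="speciation"))
human_alpha = alpha.add(GeneNode("human alpha"))
mouse_alpha = alpha.add(GeneNode("mouse alpha"))
human_beta = beta.add(GeneNode("human beta"))
mouse_beta = beta.add(GeneNode("mouse beta"))

print(classify(human_alpha, mouse_alpha))  # orthologs
print(classify(human_alpha, human_beta))   # paralogs
```

Under this convention, human alpha and mouse beta are also paralogs, because their last common ancestor is the duplication event.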

Diversification

A majority of proteins contain multiple domains: 66–80% of eukaryotic proteins and about 40–60% of prokaryotic proteins are multidomain.[5] Over evolutionary time, domains from many different superfamilies have become combined within the same proteins; in fact, it is very rare to find “consistently isolated superfamilies”.[5][1] When domains do combine, the N- to C-terminal domain order (the "domain architecture") is typically well conserved. Additionally, the number of domain combinations seen in nature is small compared to the number of possibilities, suggesting that selection acts on all combinations.[5]
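The comparison between observed and possible domain combinations is simple counting. If any of N superfamilies could sit next to any other, in either order and with repeats allowed, there would be N² ordered adjacent pairs. The sketch below counts distinct ordered pairs in a set of architectures. The architectures and the single-letter superfamily labels are made up for illustration; they are not real data.

```python
# Hypothetical domain architectures, each listed N- to C-terminal.
# The single letters stand in for domain superfamilies; this is not real data.
ARCHITECTURES = [
    ("A", "B"),
    ("A", "B", "C"),
    ("B", "C"),
    ("A", "B"),
    ("D",),
    ("C", "D"),
]


def observed_pairs(architectures):
    """Distinct ordered pairs of adjacent domains (N-terminal member first)."""
    pairs = set()
    for arch in architectures:
        pairs.update(zip(arch, arch[1:]))
    return pairs


def possible_pairs(n_superfamilies: int) -> int:
    """Ordered adjacent pairs if any superfamily could follow any other, repeats included."""
    return n_superfamilies ** 2


superfamilies = {domain for arch in ARCHITECTURES for domain in arch}
seen = observed_pairs(ARCHITECTURES)
print(f"{len(seen)} of {possible_pairs(len(superfamilies))} possible ordered pairs observed")
# 3 of 16 possible ordered pairs observed
print(("B", "A") in seen)  # False: the reverse order never occurs here
```

Counting ordered pairs, rather than unordered ones, also captures the conserved N- to C-terminal order: in this toy set, B never precedes A.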

Examples

α/β hydrolase superfamily
Members share an α/β sheet, containing 8 strands connected by helices.
Alkaline phosphatase superfamily
Members share an αβα sandwich structure[26] and perform common promiscuous reactions by a shared mechanism.[27]
Globin superfamily
Members share an 8-helix globular globin fold.[28][29]
Immunoglobulin superfamily
Members share a sandwich-like structure of two sheets of antiparallel β-strands (Ig-fold), and are involved in recognition, binding, and adhesion.[30][31]
PA clan
Members share a
Ras superfamily
Members share a common catalytic G domain of a 6-strand β sheet surrounded by 5 α-helices.[33]
RSH superfamily
Members share the capability to hydrolyze and/or synthesize ppGpp alarmones in the stringent response.[34]
Serpin superfamily
Members share a high-energy, stressed fold which can undergo a large conformational change, typically used to inhibit serine and cysteine proteases by disrupting their structure.[9]
TIM barrel superfamily
Members share a large α8β8 barrel structure. It is one of the most common protein folds, and the monophyly of this superfamily is still contested.[35][36]

Protein superfamily resources

Several biological databases document protein superfamilies and protein folds, for example:

  • Pfam - Protein families database of alignments and HMMs
  • PROSITE - Database of protein domains, families and functional sites
  • PIRSF - SuperFamily Classification System
  • PASS2 - Protein Alignment as Structural Superfamilies v2
  • SUPERFAMILY - Library of HMMs representing superfamilies and database of (superfamily and family) annotations for all completely sequenced organisms
  • CATH - Classifications of protein structures into superfamilies, families and domains
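Databases such as Pfam and SUPERFAMILY represent each family or superfamily as a profile hidden Markov model built from a multiple alignment, which also models insertions and deletions. As a rough illustration of the underlying idea, the sketch below builds a simpler, ungapped relative: a position-specific scoring matrix with log-odds scores. This is not what those databases use. The uniform background frequencies, the pseudocount of 1, and the toy motif and query are all assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds scores (bits) per column against a uniform background; no gaps allowed."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*aligned):
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def window_score(pssm: list[dict[str, float]], window: str) -> float:
    """Sum of per-column scores for a window the same length as the profile."""
    return sum(col[aa] for col, aa in zip(pssm, window))


def best_hit(pssm: list[dict[str, float]], sequence: str) -> tuple[int, float]:
    """Start index and score of the best-scoring ungapped window in sequence."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the profile")
    start = max(range(len(sequence) - width + 1),
                key=lambda i: window_score(pssm, sequence[i:i + width]))
    return start, window_score(pssm, sequence[start:start + width])


# Hypothetical toy motif alignment and query; not taken from any database.
profile = build_pssm(["HGDS", "HGDT", "HGDS"])
print(best_hit(profile, "MKLHGDSAV"))  # start index 3
```

A profile HMM generalizes this by adding insert and delete states between match columns, which lets it score homologues whose motifs contain insertions or deletions.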

Similarly, there are algorithms that search the Protein Data Bank (PDB) for proteins with structural homology to a target structure, for example:

  • DALI - Structural alignment based on a distance alignment matrix method

See also

References

External links