Protein family

Source: Wikipedia, the free encyclopedia.
The human cyclophilin family, as represented by the structures of the isomerase domains of some of its members

A protein family is a group of evolutionarily related proteins. In many cases, a protein family has a corresponding gene family, in which each gene encodes a corresponding protein with a 1:1 relationship. The term "protein family" should not be confused with family as it is used in taxonomy.

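Proteins in a family descend from a common ancestor and typically have similar three-dimensional structures, functions, and sequences. Families are sometimes grouped together into larger clades called superfamilies based on structural similarity, even if there is no identifiable sequence homology.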

Currently, over 60,000 protein families have been defined,[5] although ambiguity in the definition of "protein family" leads different researchers to report widely varying numbers.

Terminology and usage

As with many biological terms, the use of protein family is somewhat context dependent; it may indicate large groups of proteins with the lowest possible level of detectable sequence similarity, or very narrow groups of proteins with almost identical sequence, function, and three-dimensional structure, or any kind of group in between. To distinguish between these situations, the term protein superfamily is often used in conjunction with protein family. A superfamily, such as the PA clan of proteases, has far lower sequence conservation than one of the families it contains, the C04 family.


Protein domains and motifs

The concept of protein family was conceived when very few protein structures or sequences were known. At the time, the majority of proteins that were structurally understood were small, single-domain proteins such as myoglobin, hemoglobin, and cytochrome c. Since then, many proteins have been found with multiple independent structural and functional units or domains. Due to evolutionary shuffling, different domains in a protein have evolved independently. This has led to a focus on families of protein domains. A number of online resources are devoted to identifying and cataloging such domains.[12][13]

Different regions of a protein have differing functional constraints (features critical to the structure and function of the protein). For example, the active site of an enzyme requires certain amino-acid residues to be precisely oriented, and a protein-protein binding interface may be constrained in the hydrophobicity or polarity of the amino-acid residues. Functionally constrained regions of proteins evolve more slowly than unconstrained regions such as surface loops, giving rise to discernible blocks of conserved sequence when the sequences of a protein family are compared (see multiple sequence alignment). These blocks are most commonly referred to as motifs, although many other terms are used (blocks, signatures, fingerprints, etc.). Again, many online resources are devoted to identifying and cataloging protein motifs.[14]
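The idea of conserved blocks can be illustrated with a minimal Python sketch. It assumes a toy, already-aligned set of sequences (invented here for illustration, not taken from any real family) and scores each alignment column by the fraction of sequences sharing the most common residue. The function names, the 0.8 threshold, and the minimum block length are all hypothetical choices; real motif databases use more sophisticated statistics.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences sharing the most common residue in each column."""
    n_seqs = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]  # ignore gap characters
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / n_seqs)
    return scores


def conserved_blocks(alignment, threshold=0.8, min_length=3):
    """Return half-open (start, end) column ranges of consecutive conserved columns."""
    scores = column_conservation(alignment)
    blocks, start = [], None
    # A trailing 0.0 acts as a sentinel so a block ending at the last column is closed.
    for i, score in enumerate(scores + [0.0]):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            if i - start >= min_length:
                blocks.append((start, i))
            start = None
    return blocks


# Toy alignment (hypothetical sequences; "-" marks a gap).
toy_alignment = [
    "MKTGHSCAALV",
    "MRSGHSCQ-LV",
    "LKAGHSCTPIV",
    "MQ-GHSCEALI",
]

for start, end in conserved_blocks(toy_alignment):
    print(start, end, toy_alignment[0][start:end])
```

In this toy case the only block that passes the threshold is the invariant stretch in the middle of the alignment, while the variable flanking columns, which stand in for surface loops, are excluded.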

Evolution of protein families

According to current consensus, protein families arise in two ways. First, the separation of a parent species into two genetically isolated descendant species allows a gene/protein to independently accumulate variations (mutations) in the two lineages, resulting in a family of orthologous proteins. Second, a gene duplication may create a second copy of a gene (termed a paralog). Because the original gene is still able to perform its function, the duplicated gene is free to diverge and may acquire new functions (by random mutation).

Certain gene/protein families, especially in eukaryotes, undergo extreme expansions and contractions in the course of evolution, sometimes in concert with whole genome duplications. Expansions are less likely, and losses more likely, for intrinsically disordered proteins and for protein domains whose hydrophobic amino acids are further from the optimal degree of dispersion along the primary sequence.[15] This expansion and contraction of protein families is one of the salient features of genome evolution, but its importance and ramifications are currently unclear.

Phylogenetic tree of RAS superfamily: This tree was created using FigTree (free online software).

Use and importance of protein families

As the total number of sequenced proteins increases and interest expands in proteome analysis, an effort is ongoing to organize proteins into families and to describe their component domains and motifs. Reliable identification of protein families is critical to phylogenetic analysis, functional annotation, and the exploration of the diversity of protein function in a given phylogenetic branch. The Enzyme Function Initiative uses protein families and superfamilies as the basis for development of a sequence/structure-based strategy for large scale functional assignment of enzymes of unknown function.[16]
The algorithmic means for establishing protein families on a large scale are based on a notion of similarity.
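
As a rough illustration of similarity-based grouping, the following Python sketch clusters sequences into families by single linkage: any two sequences whose identity meets a threshold end up in the same group. It is a simplification under stated assumptions. The sequences are invented, they are assumed to be pre-aligned to equal length, and plain percent identity stands in for the alignment scores or profile models that real tools use. The names `percent_identity` and `cluster_families` are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Fraction of identical positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_families(sequences, threshold=0.5):
    """Single-linkage clustering: link every pair at or above the identity threshold."""
    parent = {name: name for name in sequences}

    def find(name):
        # Follow parent links to the cluster representative, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(sequences, 2):
        if percent_identity(sequences[a], sequences[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    families = {}
    for name in sequences:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())


# Toy data (hypothetical sequences, already aligned to the same length).
toy_sequences = {
    "protA": "MKTGHSCAAL",
    "protB": "MKSGHSCQAL",
    "protC": "MRTGHSCAAI",
    "protD": "WWPLNNDEFY",
    "protE": "WWPLNQDEFY",
}

print(cluster_families(toy_sequences, threshold=0.5))
print(cluster_families(toy_sequences, threshold=0.85))
```

Raising the threshold splits the broader group into narrower ones, which mirrors how the number of families reported depends on how strictly "family" is defined.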

Protein family resources

Many biological databases record examples of protein families and allow users to identify if newly identified proteins belong to a known family. Here are a few examples:

  • Pfam - Protein families database of alignments and HMMs
  • PROSITE - Database of protein domains, families and functional sites
  • PIRSF - SuperFamily Classification System
  • PASS2 - Protein Alignment as Structural Superfamilies v2 - PASS2@NCBS[17]
  • SUPERFAMILY - Library of HMMs representing superfamilies and database of (superfamily and family) annotations for all completely sequenced organisms
  • CATH - Classifications of protein structures into superfamilies, families and domains

Similarly, many database-searching algorithms exist, for example:

  • BLAST - Sequence similarity search for nucleotide and protein sequences
  • BLASTp - Protein sequence similarity search
  • OrthoFinder - Method for clustering proteins into families (orthogroups)[18][19]
