Protein superfamily
A protein superfamily is the largest grouping of proteins for which common ancestry can be inferred.
Identification
Superfamilies of proteins are identified using a number of methods. Closely related members can be identified by different methods from those needed to group the most evolutionarily divergent members.
Sequence similarity
Historically, the similarity of different amino acid sequences has been the most common method of inferring homology.
Using sequence similarity to infer homology has several limitations. There is no minimum level of sequence similarity guaranteed to produce identical structures. Over long periods of evolution, related proteins may show no detectable sequence similarity to one another. Sequences with many
Nevertheless, sequence similarity is the most commonly used form of evidence to infer relatedness, since the number of known sequences vastly outnumbers the number of known tertiary structures.[6] In the absence of structural information, sequence similarity constrains the limits of which proteins can be assigned to a superfamily.[6]
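As an illustration of what a sequence-similarity measure looks like in practice, the sketch below computes percent identity between two sequences that have already been aligned. It is a minimal example under stated assumptions, not the method of any particular database: the function name `percent_identity`, the use of `-` as the gap symbol, and the choice to count only columns where both sequences have a residue are all assumptions made here. As noted above, a low value does not rule out homology.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumes both inputs come from the same pairwise alignment, so they
    have equal length. The denominator (ungapped columns only) is one of
    several conventions in use; it is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # columns where those residues are identical
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if res_a == res_b:
            matches += 1

    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy sequences: 6 of the 8 aligned columns are identical.
print(percent_identity("MKTAYIAK", "MKSAYLAK"))
```

Percent identity treats every mismatch equally. Substitution-matrix scores, which weight conservative replacements differently, are a common alternative and are not shown here.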
Structural similarity
Mechanistic similarity
The
Evolutionary significance
Protein superfamilies represent the current limits of our ability to identify common ancestry.
Superfamily members may be in different species, with the ancestral protein being the form of the protein that existed in the ancestral species (orthology). Conversely, the proteins may be in the same species, but evolved from a single protein whose gene was duplicated in the genome (paralogy).
Diversification
A majority of proteins contain multiple domains: between 66 and 80% of eukaryotic proteins and about 40 to 60% of prokaryotic proteins are multidomain.[5] Over time, many of the superfamilies of domains have mixed together. In fact, it is very rare to find "consistently isolated superfamilies".[5][1] When domains do combine, the N- to C-terminal domain order (the "domain architecture") is typically well conserved. Additionally, the number of domain combinations seen in nature is small compared to the number of possibilities, suggesting that selection acts on all combinations.[5]
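The comparison between observed and possible domain combinations can be made concrete with a small sketch. It represents each protein's domain architecture as a tuple of superfamily labels in N- to C-terminal order, then counts how many distinct adjacent domain pairs occur against how many ordered pairs are possible. The protein names, the toy data set and the helper names `adjacent_pairs` and `pair_usage` are hypothetical and chosen only for illustration. Counting adjacent pairs is one simple way to define a "combination", not the definition used by the cited studies.

```python
from collections import Counter

# Hypothetical toy data: protein name -> domain architecture,
# listed from N-terminus to C-terminus.
proteins = {
    "protA": ("SH3", "SH2", "kinase"),
    "protB": ("SH3", "SH2", "kinase"),
    "protC": ("SH2", "kinase"),
    "protD": ("kinase",),
}


def adjacent_pairs(architecture):
    """Return ordered (N-terminal, C-terminal) pairs of neighbouring domains."""
    return list(zip(architecture, architecture[1:]))


def pair_usage(proteins):
    """Count observed adjacent domain pairs and the number of possible ordered pairs."""
    observed = Counter()
    for architecture in proteins.values():
        observed.update(adjacent_pairs(architecture))
    superfamilies = {d for architecture in proteins.values() for d in architecture}
    # Every ordered pair of superfamilies, including a domain next to itself.
    possible = len(superfamilies) ** 2
    return observed, possible


observed, possible = pair_usage(proteins)
multidomain = sum(1 for a in proteins.values() if len(a) > 1)
print(f"{multidomain} of {len(proteins)} proteins are multidomain")
print(f"{len(observed)} of {possible} possible ordered pairs observed")
```

Because the pairs are ordered, ("SH3", "SH2") and ("SH2", "SH3") are counted separately, which reflects the point that domain order is itself conserved.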
Examples
- α/β hydrolase superfamily
- Members share an α/β sheet, containing 8
- Alkaline phosphatase superfamily
- Members share an αβα sandwich structure[26] as well as performing common promiscuous reactions by a common mechanism.[27]
- Globin superfamily
- Members share an 8-
- Immunoglobulin superfamily
- Members share a sandwich-like structure of two
- PA clan
- Members share a
- Ras superfamily
- Members share a common catalytic G domain of a 6-strand β sheet surrounded by 5 α-helices.[33]
- RSH superfamily
- Members share the capability to hydrolyze and/or synthesize ppGpp alarmones in the stringent response.[34]
- Serpin superfamily
- Members share a high-energy, stressed fold which can undergo a large conformational change, typically used to inhibit serine and cysteine proteases by disrupting their structure.[9]
- TIM barrel superfamily
- Members share a large α8β8 barrel structure. It is one of the most common
Protein superfamily resources
Several databases document protein superfamilies, for example:
- Pfam - Protein families database of alignments and HMMs
- PROSITE - Database of protein domains, families and functional sites
- PIRSF - SuperFamily Classification System
- PASS2 - Protein Alignment as Structural Superfamilies v2
- SUPERFAMILY - Library of HMMs representing superfamilies and database of (superfamily and family) annotations for all completely sequenced organisms
- CATH - Classifications of protein structures into superfamilies, families and domains
Similarly, there are algorithms that search the PDB for proteins with structural homology to a target structure, for example:
- DALI - Structural alignment based on a distance alignment matrix method
See also
- Structural alignment
- Protein domains
- Protein family
- Protein mimetic
- Protein structure
- Homology (biology)
- Interolog
- List of gene families
- SUPERFAMILY
- CATH
References
- PMID 20457744.
- PMID 22086950.
- PMID 8687420.
- "Clustal FAQ #Symbols". Clustal. Archived from the original on 24 October 2016. Retrieved 8 December 2014.
- S2CID 13762291.
- PMID 11752317.
- PMID 15954844.
- PMID 22427707.
- PMID 11435447.
- PMID 27131377.
- PMID 19325884.
- S2CID 14936647.
- PMID 15604105.
- PMID 20591649.
- ISBN 9789402410679.
- PMID 26781812.
- PMID 26097079.
- PMID 23382230.
- PMID 12691742.
- PMID 25575902.
- PMID 24271399.
- PMID 15741509.
- S2CID 25258028.
- PMID 19508187.
- PMID 10607665.
- "SCOP". Archived from the original on 29 July 2014. Retrieved 28 May 2014.
- PMID 22885024.
- ISBN 978-0815323051.
- PMID 2926816.
- PMID 7932691.
- PMID 8574878.
- PMID 3186696.
- S2CID 6636339.
- PMID 21858139.
- PMID 12206759.
External links
- Media related to Protein superfamilies at Wikimedia Commons