Sequence homology
Sequence homology is the
Homology among DNA, RNA, or proteins is typically inferred from their nucleotide or amino acid sequence similarity. Significant similarity is strong evidence that two sequences are related by evolutionary changes from a common ancestral sequence. Alignments of multiple sequences are used to indicate which regions of each sequence are homologous.
Identity, similarity, and conservation
The term "percent homology" is often used to mean "sequence similarity": the percentage of identical residues (percent identity), or the percentage of residues conserved with similar physicochemical properties (percent similarity), e.g. leucine and isoleucine, is used to "quantify the homology". Based on the definition of homology specified above, this terminology is incorrect, since sequence similarity is the observation and homology is the conclusion.[3] Sequences are either homologous or not.[3] The term "percent homology" is therefore a misnomer.[4]
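The distinction between percent identity and percent similarity can be illustrated with a minimal Python sketch. It assumes the two sequences are already aligned, and the residue groups in `SIMILAR_GROUPS` are a hypothetical simplification chosen for illustration; alignment tools generally score similarity with substitution matrices rather than fixed groups, and they differ in which columns they count in the denominator.

```python
# Hypothetical grouping of amino acids by broadly similar physicochemical
# properties. This is an illustrative assumption, not a standard scheme.
SIMILAR_GROUPS = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("FYW"),   # aromatic
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("STNQ"),  # polar, uncharged
]


def are_similar(a, b):
    """True if a and b are identical or share one of the illustrative groups."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def percent_identity_and_similarity(seq1, seq2, gap="-"):
    """Compare two pre-aligned sequences column by column.

    Columns with a gap in either sequence are skipped, so both percentages
    are relative to the number of aligned residue pairs.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no aligned residue pairs to compare")
    identical = sum(a == b for a, b in pairs)
    similar = sum(are_similar(a, b) for a, b in pairs)
    return 100.0 * identical / len(pairs), 100.0 * similar / len(pairs)


# Toy aligned peptides (made up for this example).
identity, similarity = percent_identity_and_similarity("MKLV-DE", "MKIVADQ")
```

In this toy alignment the leucine/isoleucine column counts toward similarity but not identity, so the similarity percentage is the higher of the two. Neither number states whether the sequences are homologous; both are observations from which homology may be inferred.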
As with morphological and anatomical structures, sequence similarity might occur because of convergent evolution, or, as with shorter sequences, by chance, meaning that they are not homologous. Homologous sequence regions are also called conserved. This is not to be confused with conservation in amino acid sequences, where the amino acid at a specific position has been substituted with a different one that has functionally equivalent physicochemical properties.
Partial homology can occur where a segment of the compared sequences has a shared origin, while the rest does not. Such partial homology may result from a
Orthology
Homologous sequences are orthologous if they are inferred to be descended from the same ancestral sequence separated by a
For instance, the plant
Orthology is strictly defined in terms of ancestry. Given that the exact ancestry of genes in different organisms is difficult to ascertain due to gene duplication and genome rearrangement events, the strongest evidence that two similar genes are orthologous is usually found by carrying out phylogenetic analysis of the gene lineage. Orthologs often, but not always, have the same function.[7]
Orthologous sequences provide useful information in taxonomic classification and phylogenetic studies of organisms. The pattern of genetic divergence can be used to trace the relatedness of organisms. Two organisms that are very closely related are likely to display very similar DNA sequences between two orthologs. Conversely, an organism that is further removed evolutionarily from another organism is likely to display a greater divergence in the sequence of the orthologs being studied.[citation needed]
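As a rough illustration of how divergence between orthologs can be compared across organisms, the following Python sketch computes the uncorrected proportion of differing sites (often called the p-distance) for every pair of aligned sequences. The species names and sequences are invented for the example, and the uncorrected distance is a deliberate simplification: it tends to underestimate divergence between distant sequences because repeated substitutions at the same site are not counted.

```python
from itertools import combinations


def p_distance(seq1, seq2, gap="-"):
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no aligned sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


# Hypothetical aligned orthologs from three made-up species.
orthologs = {
    "species_A": "ATGGCCAAGTTGACC",
    "species_B": "ATGGCCAAGTTAACC",
    "species_C": "ATGGCTAAATTAACG",
}

# Pairwise distances keyed by an alphabetically ordered pair of species names.
distances = {
    (a, b): p_distance(orthologs[a], orthologs[b])
    for a, b in combinations(sorted(orthologs), 2)
}

# The pair with the smallest distance is the most similar at this ortholog.
closest_pair = min(distances, key=distances.get)
```

Here `closest_pair` is the pair of species whose orthologs differ at the fewest sites. A single gene gives only weak evidence of relatedness, so phylogenetic studies usually combine many orthologs and use explicit substitution models.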
Databases of orthologous genes
Given their tremendous importance for biology and
- eggNOG[10][11]
- GreenPhylDB[12][13] for plants
- InParanoid[14][15] focuses on pairwise ortholog relationships
- OHNOLOGS[16][17] is a repository of the genes retained from whole genome duplications in the vertebrate genomes including human and mouse.
- OMA[18]
- OrthoDB[19] appreciates that the orthology concept is relative to different speciation points by providing a hierarchy of orthologs along the species tree.
- OrthoInspector[20] is a repository of orthologous genes for 4753 organisms covering the three domains of life
- OrthologID[21][22]
- OrthoMaM[23][24][25] for mammals
- OrthoMCL[26][27]
- Roundup[28]
Tree-based
- LOFT[29]
- TreeFam[30][31]
- OrthoFinder[32]
A third category of hybrid approaches uses both heuristic and phylogenetic methods to construct clusters and determine trees, for example:
Paralogy
Paralogous genes are genes that are related via duplication events in the
As an example, in the last common ancestor (LCA), one gene (gene A) may get duplicated to make a separate similar gene (gene B); those two genes will continue to be passed to subsequent generations. During speciation, one environment will favor a mutation in gene A (gene A1), producing a new species with genes A1 and B. Then, in a separate speciation event, one environment will favor a mutation in gene B (gene B1), giving rise to a new species with genes A and B1. The descendants' genes A1 and B1 are paralogous to each other because they are homologs that are related via a duplication event in the last common ancestor of the two species.[1]
Additional classifications of paralogs include alloparalogs (out-paralogs) and symparalogs (in-paralogs). Alloparalogs are paralogs that evolved from gene duplications that preceded the given speciation event. In other words, alloparalogs are paralogs that evolved from duplication events that happened in the LCA of the organisms being compared. The example above is an example of alloparalogy. Symparalogs are paralogs that evolved from gene duplication of paralogous genes in subsequent speciation events. From the example above, if the descendant with genes A1 and B underwent another speciation event where gene A1 duplicated, the new species would have genes B, A1a, and A1b. In this example, genes A1a and A1b are symparalogs.[1]
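The relationships in the example above can be sketched in Python as a small gene tree whose internal nodes are labelled as duplication or speciation events. This is a minimal illustration under stated assumptions, not how any particular database works: the event labels are given rather than inferred, the `Node` class, the species labels "X" and "Y", and the helper names are hypothetical, and a duplication is treated as preceding the speciation whenever its descendants span more than one species (which ignores gene loss).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    event: str = "leaf"  # "speciation", "duplication", or "leaf"
    gene: str = ""       # gene name, set on leaves only
    species: str = ""    # species label, set on leaves only
    children: list = field(default_factory=list)


def make_leaf(gene, species):
    return Node(gene=gene, species=species)


def leaves(node):
    """Return all leaf nodes below (or equal to) node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def lca(node, gene_a, gene_b):
    """Return the deepest node whose subtree contains both genes."""
    for child in node.children:
        names = {leaf.gene for leaf in leaves(child)}
        if gene_a in names and gene_b in names:
            return lca(child, gene_a, gene_b)
    return node


def classify(root, gene_a, gene_b):
    """Classify two distinct genes by the event at their last common ancestor."""
    names = {leaf.gene for leaf in leaves(root)}
    if gene_a == gene_b or not {gene_a, gene_b} <= names:
        raise ValueError("expected two distinct genes present in the tree")
    ancestor = lca(root, gene_a, gene_b)
    if ancestor.event == "speciation":
        return "orthologs"
    # Duplication: decide whether it happened before or after the species split.
    species = {leaf.species for leaf in leaves(ancestor)}
    if len(species) > 1:
        return "alloparalogs (out-paralogs)"
    return "symparalogs (in-paralogs)"


# Gene tree for the example: A/B duplication in the LCA, then speciation
# into hypothetical species X and Y, then a later duplication of A1 in X.
tree = Node("duplication", children=[
    Node("speciation", children=[
        Node("duplication", children=[make_leaf("A1a", "X"),
                                      make_leaf("A1b", "X")]),
        make_leaf("A", "Y"),
    ]),
    Node("speciation", children=[make_leaf("B", "X"), make_leaf("B1", "Y")]),
])

relation_in = classify(tree, "A1a", "A1b")   # later duplication within X
relation_out = classify(tree, "A1a", "B1")   # duplication in the LCA
relation_orth = classify(tree, "B", "B1")    # separated by speciation
```

With this tree, A1a and A1b come out as symparalogs, A1a and B1 as alloparalogs, and B and B1 as orthologs, matching the definitions in the text. In practice the hard part is inferring the event labels, which typically requires reconciling the gene tree with a species tree.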
Paralogous genes can shape the structure of whole genomes and thus explain genome evolution to a large extent. Examples include the Homeobox (Hox) genes in animals. These genes not only underwent gene duplications within chromosomes but also whole genome duplications. As a result, Hox genes in most vertebrates are clustered across multiple chromosomes with the HoxA-D clusters being the best studied.[37]
Another example are the
It is often asserted that orthologs are more functionally similar than paralogs of similar divergence, but several papers have challenged this notion.[38][39][40]
Regulation
Paralogs are often regulated differently, e.g. by having different tissue-specific expression patterns (see Hox genes). However, they can also be regulated differently on the protein level. For instance, Bacillus subtilis encodes two paralogues of glutamate dehydrogenase: GudB is constitutively transcribed whereas RocG is tightly regulated. In their active, oligomeric states, both enzymes show similar enzymatic rates. However, swaps of enzymes and promoters cause severe fitness losses, thus indicating promoter–enzyme coevolution. Characterization of the proteins shows that, compared to RocG, GudB's enzymatic activity is highly dependent on glutamate and pH.[41]
Paralogous chromosomal regions
Sometimes, large regions of chromosomes share gene content similar to other chromosomal regions within the same genome.
Ohnology
Ohnologous genes are paralogous genes that have originated by a process of whole-genome duplication. The name was first given in honour of Susumu Ohno by Ken Wolfe.[51] Ohnologues are useful for evolutionary analysis because all ohnologues in a genome have been diverging for the same length of time (since their common origin in the whole genome duplication). Ohnologues are also known to show greater association with cancers, dominant genetic disorders, and pathogenic copy number variations.[52][53][54][55][56]
Xenology
Homologs resulting from horizontal gene transfer between two organisms are termed xenologs. Xenologs can have different functions if the new environment is vastly different for the horizontally moving gene. In general, though, xenologs typically have similar function in both organisms. The term was coined by Walter Fitch.[5]
Homoeology
Homoeologous (also spelled homeologous) chromosomes or parts of chromosomes are those brought together following
Gametology
Gametology denotes the relationship between homologous genes on non-recombining, opposite
See also
- Deep homology
- EggNOG (database)
- OrthoDB
- Orthologous MAtrix (OMA)
- PhEVER
- Protein family
- Protein superfamily
- TreeFam
- Syntelog
References
- PMID 16285863.
- "Clustal FAQ #Symbols". Clustal. Retrieved 8 December 2014.
- S2CID 42949514.
- ISSN 0882-3383.
- PMID 5449325.
Where the homology is the result of gene duplication so that both copies have descended side by side during the history of an organism (for example, α and β hemoglobin) the genes should be called paralogous (para = in parallel). Where the homology is the result of speciation so that the history of the gene reflects the history of the species (for example α hemoglobin in man and mouse) the genes should be called orthologous (ortho = exact).
- PMID 15630026.
- PMID 20361041.
- PMID 9381173.
- PMID 30893420.
- PMID 19900971.
- PMID 24297252.
- PMID 17986457.
- PMID 20864446.
- PMID 19892828.
- PMID 25429972.
- PMID 26181593.
- "Vertebrate Ohnologs". ohnologs.curie.fr. Retrieved 2018-10-12.
- PMID 29106550.
- PMID 27899580.
- PMID 30380106.
- PMID 16410324.
- PMID 19378138.
- PMID 18053139.
- PMID 24723423.
- PMID 30698751.
- PMID 16381887.
- PMID 21901743.
- PMID 16777906.
- PMID 17346331.
- PMID 18056084.
- PMID 24194607.
- PMID 31727128.
- PMID 19029536.
- PMID 29425291.
- PMID 21097890.
- PMID 16729895.
- PMID 17644373.
- PMID 19368988.
- PMID 21695233.
- Eisen J (20 September 2011). "Special Guest Post & Discussion Invitation from Matthew Hahn on Ortholog Conjecture Paper".
- PMID 28468957.
- PMID 8486346.
- PMID 11144283.
- PMID 7579516.
- PMID 9729879.
- PMID 18578868.
- S2CID 32135432.
- PMID 16801555.
- PMID 11567626.
- S2CID 8263376.
- S2CID 85257685.
- PMID 23168259.
- PMID 24530892.
- PMID 25080083.
- PMID 24368850.
- PMID 20439718.
- PMID 27021699.
- PMID 11110898.