Genome evolution

Source: Wikipedia, the free encyclopedia.

Genome evolution is the process by which a genome changes in structure (sequence) or size over time. The study of genome evolution involves multiple fields such as structural analysis of the genome, the study of genomic parasites, gene and ancient genome duplications, polyploidy, and comparative genomics. The field changes constantly because of the steadily growing number of sequenced genomes, both prokaryotic and eukaryotic, available to the scientific community and the public at large.

Circular representation of the Mycobacterium leprae genome created using JCVI online genome tools.

History

Since the first sequenced genomes became available in the late 1970s, genome sequencing has progressed to include increasingly complex genomes, including the eventual sequencing of the entire human genome in 2001.[2] Comparisons of the genomes of both close relatives and distant ancestors began to reveal the stark differences and similarities between species, as well as the mechanisms by which genomes evolve over time.[citation needed]

Prokaryotic and eukaryotic genomes

Prokaryotes

The principal forces of evolution in prokaryotes and their effects on archaeal and bacterial genomes. The horizontal line shows archaeal and bacterial genome size on a logarithmic scale (in megabase pairs) and the approximate corresponding number of genes (in parentheses). The effects of the main forces of prokaryotic genome evolution are denoted by triangles that are positioned, roughly, over the ranges of genome size for which the corresponding effects are thought to be most pronounced.

Prokaryotic genomes have two main mechanisms of evolution: mutation and horizontal gene transfer.[3] A third mechanism, sexual reproduction, is prominent in eukaryotes and also occurs in bacteria. Prokaryotes can acquire novel genetic material through bacterial conjugation, in which both plasmids and whole chromosomes can be passed between organisms. An often cited example of this process is the transfer of antibiotic resistance by plasmid DNA.[4] Another mechanism of genome evolution is transduction, whereby bacteriophages introduce new DNA into a bacterial genome. The main mechanism of sexual interaction is natural genetic transformation, which involves the transfer of DNA from one prokaryotic cell to another through the intervening medium. Transformation is a common mode of DNA transfer, and at least 67 prokaryotic species are known to be competent for transformation.[5]

Genome evolution in bacteria is well understood because of the thousands of completely sequenced bacterial genomes available. Genetic changes may lead to either increases or decreases in genomic complexity due to adaptive genome streamlining and purifying selection.[6] In general, free-living bacteria have evolved larger genomes with more genes so they can adapt more easily to changing environmental conditions. By contrast, most parasitic bacteria have reduced genomes because their hosts supply many if not most nutrients, so their genomes do not need to encode the enzymes that produce these nutrients.[7][page needed]

Characteristic                   E. coli genome   Human genome
Genome size (base pairs)         4.6 Mb           3.2 Gb
Genome structure                 Circular         Linear
Number of chromosomes            1                46
Presence of plasmids             Yes              No
Presence of histones             No               Yes
DNA segregated in the nucleus    No               Yes
Number of genes                  4,288            20,000
Presence of introns              No*              Yes
Average gene size                700 bp           27,000 bp

* E. coli genes largely contain only exons. However, the genome does contain a small number of self-splicing introns (Group II).[8]
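
As a rough illustration of the table above, the share of each genome that lies within genes can be estimated by multiplying the number of genes by the average gene size and dividing by the genome size. The Python sketch below uses only the figures from the table; the function name is illustrative, and the estimate is crude because it assumes genes do not overlap and because the human average gene size includes introns.

```python
def genic_fraction(num_genes, avg_gene_bp, genome_bp):
    """Rough fraction of a genome covered by genes.

    Computed as (number of genes x average gene size) / genome size.
    Assumes non-overlapping genes, which is a simplification.
    """
    return num_genes * avg_gene_bp / genome_bp


# Values taken from the table above.
e_coli = genic_fraction(4_288, 700, 4.6e6)
human = genic_fraction(20_000, 27_000, 3.2e9)

print(f"E. coli: {e_coli:.0%} of the genome lies within genes")
print(f"Human:   {human:.0%} of the genome lies within genes")
```

With the table's figures this gives roughly 65% for E. coli and 17% for humans. Because the human value counts introns as part of each gene, the protein-coding share of the human genome is considerably smaller than this estimate.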

Eukaryotes

Eukaryotic genomes are generally larger than those of prokaryotes. While the E. coli genome is roughly 4.6 Mb in length, the human genome is roughly 3.2 Gb.

Introns, which are largely absent from prokaryotes, are removed by RNA splicing before translation of the protein can occur. Eukaryotic genomes evolve over time through many mechanisms, including sexual reproduction, which introduces much greater genetic diversity to the offspring than the usual prokaryotic process of replication, in which the offspring are theoretically genetic clones of the parental cell.[citation needed]

Genome size

In eukaryotic organisms, a paradox is observed: the number of genes that make up the genome does not correlate with genome size. In other words, the genome is much larger than would be expected given the total number of protein-coding genes.[12]

Genome size can increase by

Genomes can shrink through deletions. A famous example of such gene decay is the genome of Mycobacterium leprae, the causative agent of leprosy. M. leprae has lost many once-functional genes over time due to the formation of pseudogenes.[13] This is evident from a comparison with its close relative Mycobacterium tuberculosis.[14] M. leprae lives and replicates inside a host and, because of this arrangement, has no need for many of the genes it once carried that allowed it to live and prosper outside the host. Over time, these genes have lost their function through mechanisms such as mutation, becoming pseudogenes. It is beneficial for an organism to rid itself of non-essential genes because doing so makes replicating its DNA faster and less energy-intensive.[15]

An example of increasing genome size over time is seen in filamentous plant pathogens. The genomes of these pathogens have grown larger over time due to repeat-driven expansion. The repeat-rich regions contain genes coding for host interaction proteins. As more repeats are added to these regions, the pathogens increase the possibility of developing new virulence factors through mutation and other forms of genetic recombination. In this way, larger genomes are beneficial for these plant pathogens.[16]

Chromosomal evolution

Chromosome fusion, leading to a reduced number of chromosomes (here a fused human chromosome 2, with 2 separate chromosomes still present in chimpanzees and other apes).

The evolution of genomes is strikingly visible in changes of chromosome number and structure over time. For instance, the ancestral chromosomes corresponding to chimpanzee chromosomes 2A and 2B fused to produce human chromosome 2. Similarly, comparisons between more distantly related species show chromosomes that have been broken up into more parts over the course of evolution. This can be demonstrated by fluorescence in situ hybridization.[17]

Mechanisms

Gene duplication

Duplicate genes are often free of the selective pressure under which genes normally exist. As a result, a large number of mutations may accumulate in the duplicate gene. This may render the gene non-functional or, in some cases, confer some benefit on the organism.[18][19]

Whole genome duplication

Similar to gene duplication, whole genome duplication is the process by which an organism's entire genetic information is copied, once or multiple times; the resulting state is known as polyploidy.[20] This may provide an evolutionary benefit to the organism by supplying it with multiple copies of a gene, thus creating a greater possibility of functional and selectively favored genes. However, tests for enhanced rate and innovation in teleost fishes with duplicated genomes, compared with their close relatives the holostean fishes (without duplicated genomes), found little difference between them for the first 150 million years of their evolution.[21]

In 1997, Wolfe & Shields gave evidence for an ancient duplication of the Saccharomyces cerevisiae (

chromosomes. Based on these observations, they determined that Saccharomyces cerevisiae underwent a whole genome duplication soon after its evolutionary split from Kluyveromyces, a genus of ascomycetous yeasts. Over time, many of the duplicate genes were deleted or rendered non-functional. A number of chromosomal rearrangements broke the original duplicate chromosomes into the current arrangement of homologous chromosomal regions. This idea was further supported by the genome of yeast's close relative Ashbya gossypii.[23] Whole genome duplication is common in fungi as well as in plant species. An extreme example is the common cordgrass (Spartina anglica), which is a dodecaploid, meaning that it contains 12 sets of chromosomes,[24] in stark contrast to the human diploid structure, in which each individual has only two sets of 23 chromosomes.

Transposable elements

Alu sequence, which is present in the genome over one million times.[27]

Mutation

Spontaneous

transcription of the gene targeted by these regulatory elements. Mutations occur constantly in an organism's genome and can have a negative, positive, or neutral effect (no effect at all).[30][31]

Pseudogenes

The proS loci in Mycobacterium leprae and M. tuberculosis, showing 3 pseudogenes (indicated by crosses) in M. leprae that still represent functional genes in M. tuberculosis. Homologous genes are indicated by identical colors and vertical, hatched bars. Modified after Cole et al. 2001.[28]

Often a result of spontaneous

nucleotides. This can shift the reading frame so that the gene no longer codes for the expected protein, introduce a premature stop codon, or alter the promoter region.[32]
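
The effect of a reading-frame shift can be sketched in a few lines of Python. The example is illustrative only: the gene sequence is made up, and the codon table is a small hand-written subset of the standard genetic code that covers just the codons appearing here. Deleting a single nucleotide moves a TAA triplet into frame, so translation ends early.

```python
# Subset of the standard genetic code: only the codons used in this example.
CODON_TABLE = {
    "ATG": "M", "GCT": "A", "CTA": "L", "CTC": "L", "CTG": "L",
    "AGG": "R", "GAA": "E", "TAA": "*",  # "*" marks a stop codon
}


def translate(dna):
    """Translate codon by codon from position 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical gene with codons ATG GCT CTA AGG CTG GAA TAA.
functional = "ATGGCTCTAAGGCTGGAATAA"
# Delete the single nucleotide directly after the start codon.
frameshifted = functional[:3] + functional[4:]

print(translate(functional))    # MALRLE
print(translate(frameshifted))  # ML
```

The intact sequence yields a six-residue peptide, while the frameshifted copy reads ATG CTC TAA and stops after two residues, the kind of truncation that can turn a functional gene into a pseudogene.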

Often cited examples of pseudogenes within the human genome include the once functional olfactory gene families. Over time, many olfactory genes in the human genome became pseudogenes and were no longer able to produce functional proteins, explaining the poor sense of smell humans possess in comparison to their mammalian relatives.[33][34]

Similarly, bacterial pseudogenes commonly arise from adaptation of free-living bacteria to parasitic lifestyles, so that many metabolic genes become superfluous as these species become adapted to their host. Once a parasite obtains nutrients (such as amino acids or vitamins) from its host it has no need to produce these nutrients itself and often loses the genes to make them.[citation needed]

Exon shuffling

transposon mediated shuffling, sexual recombination or non-homologous recombination (also called illegitimate recombination). Exon shuffling may introduce new genes into the genome that can be either selected against and deleted or selectively favored and conserved.[35][36][37]

Genome reduction and gene loss

Many species exhibit genome reduction when subsets of their genes are no longer needed. This typically happens when organisms adapt to a parasitic lifestyle, e.g. when their nutrients are supplied by a host. As a consequence, they lose the genes needed to produce these nutrients. In many cases, both free-living and parasitic species are available for comparison, so the lost genes can be identified. Good examples are the genomes of Mycobacterium tuberculosis and Mycobacterium leprae, the latter of which has a dramatically reduced genome (see figure under pseudogenes above).

Another good example is provided by endosymbiont species. For instance, Polynucleobacter necessarius was first described as a cytoplasmic endosymbiont of the ciliate Euplotes aediculatus. The ciliate dies soon after being cured of the endosymbiont. In the few cases in which P. necessarius is not present, a different and rarer bacterium apparently supplies the same function. No attempt to grow symbiotic P. necessarius outside its hosts has yet been successful, strongly suggesting that the relationship is obligate for both partners. Yet closely related free-living relatives of P. necessarius have been identified. The endosymbionts have a significantly reduced genome compared with their free-living relatives (1.56 Mbp vs. 2.16 Mbp).[38]
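
To put the two genome sizes in proportion, the relative reduction can be worked out directly. The short Python sketch below uses only the two figures quoted above; the function name is illustrative.

```python
def relative_reduction(free_living_mbp, endosymbiont_mbp):
    """Fraction of the free-living genome size that is absent in the endosymbiont."""
    return (free_living_mbp - endosymbiont_mbp) / free_living_mbp


reduction = relative_reduction(2.16, 1.56)
print(f"{reduction:.1%}")  # 27.8%
```

On these figures, the endosymbiont genome is a little over a quarter smaller than that of its free-living relatives.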

Speciation

Cichlids such as Tropheops tropheops from Lake Malawi provide models for genome evolution.

A major question of evolutionary biology is how genomes change to create new species. Speciation requires changes in behavior, morphology, physiology, or metabolism (or combinations thereof). The evolution of genomes during speciation has only recently become open to study, thanks to next-generation sequencing technologies. For instance, cichlid fish in African lakes differ in both morphology and behavior. The genomes of 5 species have revealed that both the sequences and the expression patterns of many genes have changed quickly over a relatively short period of time (100,000 to several million years). Notably, 20% of duplicate gene pairs have gained a completely new tissue-specific expression pattern, indicating that these genes also obtained new functions. Given that gene expression is driven by short regulatory sequences, this suggests that relatively few mutations are required to drive speciation. The cichlid genomes also showed increased evolutionary rates in microRNAs, which are involved in regulating gene expression.[39][40]

Gene expression

Mutations can lead to changed gene function or, probably more often, to changed gene expression patterns. A study of 12 animal species provided strong evidence that tissue-specific gene expression is largely conserved between orthologs in different species. However, paralogs within the same species often have different expression patterns. That is, after duplication, genes often change their expression pattern, for instance by being expressed in another tissue and thereby adopting new roles.[41]

Composition of nucleotides (GC content)

The genetic code is made up of sequences of four

CpG islands, areas of the genome where a cytosine nucleotide occurs next to a guanine nucleotide at a higher frequency than elsewhere. It has also been shown that a broad distribution of GC-content between species within a genus indicates a more ancient ancestry: since the species have had more time to evolve, their GC-content has diverged further.[citation needed]
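
GC-content and CpG enrichment are simple to compute from a sequence. The Python sketch below is a minimal illustration, not a method taken from the sources cited here: the function names and the example sequence are made up, and the observed/expected CpG ratio is one commonly used enrichment measure that compares the number of CG dinucleotides with the number expected from the individual C and G counts.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_observed_expected(seq):
    """Observed CG dinucleotide count divided by the count expected by chance.

    Expected count = (number of C) x (number of G) / sequence length.
    Returns 0.0 when the sequence contains no C or no G.
    """
    seq = seq.upper()
    expected = seq.count("C") * seq.count("G") / len(seq)
    return seq.count("CG") / expected if expected else 0.0


example = "ATCGCGCGTTAGCGCGAT"  # made-up sequence for illustration
print(f"GC-content: {gc_content(example):.2f}")
print(f"CpG observed/expected: {cpg_observed_expected(example):.2f}")
```

A ratio well above what is typical for the surrounding genome is the kind of signal used to flag candidate CpG islands; the thresholds used in practice vary between tools and are not specified here.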

Evolving translation of genetic code

Amino acids are encoded by three-base-long codons, and both glycine and alanine are characterized by codons with guanine-cytosine base pairs at the first two codon positions. These GC pairs give more stability to the DNA structure. It has been hypothesized that, because the first organisms evolved in a high-heat and high-pressure environment, they needed the stability of these GC pairs in their genetic code.[44]
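
In the standard genetic code, the four codons that begin with GC encode alanine and the four that begin with GG encode glycine, so the first two positions of every codon for these amino acids are G or C. The Python sketch below simply enumerates those codons and counts their G and C bases; the helper names are illustrative.

```python
BASES = "ACGT"


def codons_with_prefix(prefix):
    """All three-base codons that start with the given two-base prefix."""
    return [prefix + third for third in BASES]


def gc_count(codon):
    """Number of G or C bases in a codon."""
    return sum(base in "GC" for base in codon)


# Standard genetic code: GCN encodes alanine, GGN encodes glycine (N = any base).
for amino_acid, prefix in [("alanine", "GC"), ("glycine", "GG")]:
    for codon in codons_with_prefix(prefix):
        print(amino_acid, codon, gc_count(codon))
```

Every codon listed has at least two G or C bases out of three, which is the property the stability hypothesis relies on.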

De novo origin of genes

Novel genes can arise from

de novo origin of genes has also been shown in other organisms such as yeast,[47] rice[48] and humans.[49] For instance, Wu et al. (2011) reported 60 putative de novo human-specific genes, all of which are short and all but one of which consist of a single exon.[50] In bacteria, 'grounded' prophages (i.e. integrated phages that cannot produce new phage) are buffer zones that tolerate variation, thereby increasing the probability of de novo gene formation.[51] These grounded prophages and other such genetic elements are sites where genes could be acquired through horizontal gene transfer (HGT).

Origin of life and the first genomes

In order to understand how the genome arose, knowledge is required of the chemical pathways that permit formation of the key building blocks of the genome under plausible prebiotic conditions. According to the RNA world hypothesis free-floating ribonucleotides were present in the primitive soup. These were the fundamental molecules that combined in series to form the original RNA genome. Molecules as complex as RNA must have arisen from small molecules whose reactivity was governed by physico-chemical processes. RNA is composed of purine and pyrimidine nucleotides, both of which are necessary for reliable information transfer, and thus Darwinian natural selection and evolution. Nam et al.[52] demonstrated the direct condensation of nucleobases with ribose to give ribonucleosides in aqueous microdroplets, a key step leading to formation of the RNA genome. Also, a plausible prebiotic process for synthesizing pyrimidine and purine ribonucleotides leading to genome formation using wet-dry cycles was presented by Becker et al.[53]

References

  1. S2CID 4289674.
  2. .
  3. .
  4. .
  5. .
  6. .
  7. .
  8. .
  9. .
  10. .
  11. .
  12. .
  13. .
  14. .
  15. .
  16. .
  17. .
  18. .
  19. .
  20. .
  21. .
  22. .
  23. .
  24. .
  25. .
  26. .
  27. .
  28. .
  29. .
  30. .
  31. .
  32. .
  33. .
  34. .
  35. .
  36. .
  37. .
  38. .
  39. PMID 25186727.
  40. .
  41. .
  42. .
  43. .
  44. .
  45. .
  46. .
  47. .
  48. .
  49. .
  50. .
  51. .
  52. Nam I, Nam HG, Zare RN. Abiotic synthesis of purine and pyrimidine ribonucleosides in aqueous microdroplets. Proc Natl Acad Sci U S A. 2018 Jan 2;115(1):36-40. doi: 10.1073/pnas.1718559115. Epub 2017 Dec 18. PMID 29255025; PMCID: PMC5776833.
  53. Becker S, Feldmann J, Wiedemann S, Okamura H, Schneider C, Iwan K, Crisp A, Rossa M, Amatov T, Carell T. Unified prebiotically plausible synthesis of pyrimidine and purine RNA ribonucleotides. Science. 2019 Oct 4;366(6461):76-82. doi: 10.1126/science.aax2747. PMID 31604305.