Overlapping gene
An overlapping gene (or OLG)[1][2] is a gene whose expressible nucleotide sequence partially overlaps with the expressible nucleotide sequence of another gene.[3] In this way, a single nucleotide sequence may contribute to the function of more than one gene product. Overlapping genes are present in, and a fundamental feature of, both cellular and viral genomes.[2] The current definition of an overlapping gene varies significantly between eukaryotes, prokaryotes, and viruses.[2] In prokaryotes and viruses, overlap must be between coding sequences rather than mRNA transcripts, and is defined when these coding sequences share at least one nucleotide on either the same or opposite strands. In eukaryotes, gene overlap is almost always defined as mRNA transcript overlap. Specifically, a gene overlap in eukaryotes is defined when at least one nucleotide is shared between the boundaries of the primary mRNA transcripts of two or more genes, such that a DNA base mutation at any point of the overlapping region would affect the transcripts of all genes involved. This definition includes 5′ and 3′ untranslated regions (UTRs) along with introns.
Overprinting refers to a type of overlap in which all or part of the sequence of one gene is read in an alternate reading frame from another gene at the same locus.
Classification
Genes may overlap in a variety of ways and can be classified by their positions relative to each other.[3][11][12][13][14]
- Unidirectional or tandem overlap: the 3′ end of one gene overlaps with the 5′ end of another gene on the same strand. This arrangement can be symbolized with the notation → →, where arrows indicate the reading frame from start to end.
- Convergent or end-on overlap: the 3′ ends of the two genes overlap on opposite strands. This can be written as → ←.
- Divergent or tail-on overlap: the 5′ ends of the two genes overlap on opposite strands. This can be written as ← →.
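The three orientations can be told apart mechanically once each gene is reduced to an interval with a strand. The sketch below is a minimal illustration, assuming a hypothetical `Gene` record with 0-based, half-open coordinates on the forward strand and a `strand` of `"+"` or `"-"`. An overlap is counted when the intervals share at least one nucleotide, as in the definitions above. The extra `"nested"` label, for one gene lying entirely inside another, is an assumption added here because containment fits none of the end-based categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    start: int   # 0-based, inclusive, forward-strand coordinate
    end: int     # 0-based, exclusive
    strand: str  # "+" (forward) or "-" (reverse)


def overlap_length(a: Gene, b: Gene) -> int:
    """Number of nucleotides shared by the two genes (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def classify_overlap(a: Gene, b: Gene) -> str:
    if overlap_length(a, b) == 0:
        return "none"
    # Hypothetical extra label: one gene lies entirely inside the other.
    if (b.start <= a.start and a.end <= b.end) or (a.start <= b.start and b.end <= a.end):
        return "nested"
    if a.strand == b.strand:
        return "unidirectional"  # → →
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    # A "+" gene reads left to right, so its 3′ end is at `end`; a "-" gene reads
    # right to left, so its 3′ end is at `start`. If the "+" gene lies to the
    # left, the shared region holds both 3′ ends (convergent); otherwise it
    # holds both 5′ ends (divergent).
    if plus.start < minus.start:
        return "convergent"  # → ←
    return "divergent"  # ← →


print(classify_overlap(Gene(0, 100, "+"), Gene(80, 200, "-")))  # convergent
print(classify_overlap(Gene(0, 100, "-"), Gene(80, 200, "+")))  # divergent
print(classify_overlap(Gene(0, 100, "+"), Gene(90, 300, "+")))  # unidirectional
```

Half-open intervals make the "at least one shared nucleotide" rule a single comparison: genes at `[0, 100)` and `[100, 200)` touch but do not overlap.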
Overlapping genes can also be classified by phases, which describe their relative reading frames:[3][11][12][13][14]
- In-phase overlap occurs when the shared sequences use the same reading frame. This is also known as "phase 0". Unidirectional genes with phase 0 overlap are not considered distinct genes, but rather alternative start sites of the same gene.
- Out-of-phase overlap occurs when the shared sequences use different reading frames. This can occur in "phase 1" or "phase 2", depending on whether the reading frames are offset by 1 or 2 nucleotides. Because a codon is three nucleotides long, an offset of three nucleotides is an in-phase, phase 0 frame.
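For two genes on the same strand, the phase follows from the distance between their start positions taken modulo 3, which is why an offset of three nucleotides falls back to phase 0. A minimal sketch, assuming 0-based start coordinates and a hypothetical convention that measures the downstream gene's offset from the upstream gene's frame; opposite-strand overlaps are not covered.

```python
def same_strand_phase(upstream_start: int, downstream_start: int) -> int:
    """Return the reading-frame offset (0, 1 or 2) between two same-strand genes.

    Hypothetical convention: the offset of the downstream gene's start position
    measured from the upstream gene's reading frame. 0 means in-phase.
    """
    return (downstream_start - upstream_start) % 3


def codons_in_frame(seq: str, offset: int) -> list[str]:
    """Split seq into complete codons, beginning at the given offset."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


print(same_strand_phase(0, 3))   # 0: a 3-nt offset is the same frame
print(same_strand_phase(0, 91))  # 1: 91 = 30 * 3 + 1
print(same_strand_phase(91, 0))  # 2: the same pair, measured from the other gene
print(codons_in_frame("ATGCATGC", 0))  # ['ATG', 'CAT']
print(codons_in_frame("ATGCATGC", 1))  # ['TGC', 'ATG']
```

The third call shows why the phase 1 versus phase 2 label depends on which gene is taken as the reference: an offset of 1 seen from one gene is an offset of 2 seen from the other.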
Studies on overlapping genes suggest that their evolution can be summarized in two possible models.[4] In the first model, the two proteins encoded by the overlapping genes evolve under similar selection pressures. When strong selection against amino acid change prevails, both proteins and the overlap region are highly conserved. Overlapping genes are reasoned to evolve under strict constraints because a single nucleotide substitution can alter the structure and function of both proteins simultaneously. A study on the hepatitis B virus (HBV), whose DNA genome contains numerous overlapping genes, showed that the mean number of synonymous nucleotide substitutions per site was significantly lower in overlapping coding regions than in non-overlapping regions.[15] The same study showed that some of these overlapping regions and their proteins can diverge significantly from the original when selection against amino acid change is weak. For example, the spacer domain of the polymerase and the pre-S1 region of a surface protein of HBV showed 30% and 40% amino acid conservation, respectively.[15] However, these overlap regions are known to be less important for replication than the overlap regions that were highly conserved among different HBV strains, which are absolutely essential for the process.
The second model suggests that the two proteins and their respective overlapping genes evolve under opposite selection pressures: one frame experiences positive selection while the other is under purifying selection. In tombusviruses, the proteins p19 and p22 are encoded by overlapping genes that form a 549 nt coding region, and p19 has been shown to be under positive selection while p22 is under purifying selection.[16] Additional examples are reported in studies of overlapping genes in Sendai virus,[17] potato leafroll virus,[18] and human parvovirus B19.[19] These differing selection pressures are suggested to result from a high rate of nucleotide substitution whose effects differ between the two frames: the substitutions may be mostly non-synonymous in one frame while being mostly synonymous in the other.[4]
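Why one substitution can be neutral for one protein but not for the other is easy to see on a toy overlap. The sketch below translates codons with the standard genetic code (NCBI translation table 1) and classifies a point substitution as synonymous or non-synonymous in a chosen frame. The 12-nt sequence and the function name `substitution_effect` are made up for illustration; real analyses, such as the HBV and tombusvirus studies, estimate substitution rates across many sites and sequences rather than checking a single site.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases in the order T, C, A, G at each position: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def substitution_effect(seq: str, pos: int, new_base: str, frame: int) -> str:
    """Classify a point substitution as synonymous or non-synonymous in one frame.

    `frame` is the 0-based offset (0, 1 or 2) of the first codon of that frame.
    """
    if pos < frame:
        raise ValueError("position lies before the first codon of this frame")
    codon_start = frame + (pos - frame) // 3 * 3
    if codon_start + 3 > len(seq):
        raise ValueError("position lies in an incomplete codon at the end")
    old_codon = seq[codon_start:codon_start + 3]
    k = pos - codon_start  # position of the substitution within the codon
    new_codon = old_codon[:k] + new_base + old_codon[k + 1:]
    if CODON_TABLE[old_codon] == CODON_TABLE[new_codon]:
        return "synonymous"
    return "non-synonymous"


seq = "ATGGCTCTGAAA"  # frame 0: ATG GCT CTG AAA; frame 1: TGG CTC TGA ...
print(substitution_effect(seq, 5, "C", frame=0))  # synonymous: GCT -> GCC (Ala -> Ala)
print(substitution_effect(seq, 5, "C", frame=1))  # non-synonymous: CTC -> CCC (Leu -> Pro)
```

The same T→C change sits in the third (wobble) position of a frame 0 codon but in the second position of a frame 1 codon, which is why it is silent in one reading and changes the amino acid in the other.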
Evolution
Overlapping genes are particularly common in rapidly evolving genomes, such as those of
- By extension of an existing open reading frame (ORF) downstream into a contiguous gene due to the loss of a stop codon;
- By extension of an existing ORF upstream into a contiguous gene due to loss of an initiation codon;
- By generation of a novel ORF within an existing one due to a point mutation.
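The first of these mechanisms, loss of a stop codon, can be reproduced on a toy sequence. In the hypothetical example below, an upstream ORF in frame 0 ends at a TAA stop codon and a downstream ORF begins in frame 1. A single T→C substitution turns the stop codon into CAA (glutamine), so the upstream ORF reads on until the next in-frame stop, which lies inside the downstream ORF. The sequence, the helper names, and the simple ATG-to-stop definition of an ORF are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, frame: int) -> list[tuple[int, int]]:
    """Return ATG-to-stop ORFs in one forward reading frame.

    Coordinates are 0-based and half-open; the end includes the stop codon.
    An ORF without an in-frame stop before the sequence ends is not reported.
    """
    orfs = []
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            orfs.append((start, i + 3))
            start = None
    return orfs


def point_mutation(seq: str, pos: int, base: str) -> str:
    """Return a copy of seq with the base at pos replaced."""
    return seq[:pos] + base + seq[pos + 1:]


ancestral = "ATGAAATAAGATGCCTAAGTGA"
derived = point_mutation(ancestral, 6, "C")  # TAA stop codon -> CAA (Gln)

print(find_orfs(ancestral, 0))  # [(0, 9)]   upstream ORF
print(find_orfs(ancestral, 1))  # [(10, 22)] downstream ORF, no overlap yet
print(find_orfs(derived, 0))    # [(0, 18)]  extended ORF now overlaps (10, 22)
```

After the substitution the two ORFs share 8 nucleotides in different reading frames, an out-of-phase overlap created by a single point mutation.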
The use of the same nucleotide sequence to encode multiple genes may provide
Origins of new genes
In 1977,
Taxonomic distribution
Overlapping genes occur in all
Viruses
The existence of overlapping genes was first identified in the virus
The proportion of viruses with overlapping coding sequences within their genomes varies.[2] Fewer than a quarter of double-stranded RNA viruses contain them, whereas almost three-quarters of Retroviridae and of viruses with single-stranded DNA genomes contain overlapping coding sequences.[37] Segmented viruses, whose genomes are split into separate pieces packaged either in the same capsid or in separate capsids, are more likely to contain an overlapping sequence than non-segmented viruses.[37] RNA viruses have fewer overlapping genes than DNA viruses, which possess lower mutation rates and less restrictive genome sizes.[37][38] The lower mutation rate of DNA viruses facilitates greater genomic novelty and evolutionary exploration within a structurally constrained genome and may be the primary driver of the evolution of overlapping genes.[39][40]
Studies of overprinted viral genes suggest that their protein products tend to be accessory proteins which are not
Prokaryotes
Estimates of gene overlap in
Eukaryotes
Compared to prokaryotic genomes, eukaryotic genomes are often poorly annotated, which makes identifying genuine overlaps relatively challenging.[28] However, examples of validated gene overlaps have been documented in a variety of eukaryotic organisms, including mammals such as mice and humans.[49][50][51][52] Eukaryotes also differ from prokaryotes in the distribution of overlap types: while unidirectional (i.e., same-strand) overlaps are most common in prokaryotes, opposite-strand (antiparallel) overlaps are more common in eukaryotes. Among the opposite-strand overlaps, the convergent orientation is most common.[50] Most studies of eukaryotic gene overlap have found that overlapping genes are extensively subject to genomic reorganization even in closely related species, so the presence of an overlap is not always well conserved.[51][53] Overlap with older or less taxonomically restricted genes is also a common feature of genes likely to have originated de novo in a given eukaryotic lineage.[51][54][55]
Function
The precise functions of overlapping genes seem to vary across the domains of life, but several experiments have shown that they are important for virus lifecycles through proper protein expression and stoichiometry
The retention and evolution of overlapping genes within viruses may also be due to capsid size limitations.[59] Dramatic loss of viability was observed in viruses whose genomes had been engineered to be longer than the wild-type genome.[60] Increasing the single-stranded DNA genome length of ΦX174 by more than 1% results in almost complete loss of infectivity, believed to result from the strict physical constraints imposed by the finite capsid volume.[61] Studies on adeno-associated viruses as gene delivery vectors showed that viral packaging is constrained by cargo size limits, so that delivering large human genes such as CFTR requires multiple vectors.[62][63] It has therefore been suggested that overlapping genes evolved as a means to overcome these physical constraints, encoding more genetic information within the existing sequence rather than by increasing genome length.
Methods in identifying overlapping genes and ORFs
Standardized methods such as genome annotation may be inappropriate for detecting overlapping genes, because they rely on previously curated genes, whereas overlapping genes are generally overlooked and often have atypical sequence composition.[2][64][65][66] Genome annotation standards are also often biased against feature overlaps, such as genes entirely contained within another gene.[67] Furthermore, some bioinformatics pipelines, such as the RAST pipeline, markedly penalize overlaps between predicted ORFs.[68] However, rapid advances in genome-scale protein and RNA measurement tools, along with increasingly sophisticated prediction algorithms, have revealed an avalanche of overlapping genes and ORFs within numerous genomes.[2] Proteogenomic methods have been essential in discovering numerous overlapping genes; they combine techniques such as bottom-up proteomics, ribosome profiling, DNA sequencing, and perturbation. RNA sequencing is also used to identify genomic regions containing overlapping transcripts, and has been used to identify 180,000 alternate ORFs within previously annotated coding regions in humans.[69] Newly discovered ORFs such as these are verified using a variety of reverse genetics techniques, such as CRISPR-Cas9 and catalytically dead Cas9 (dCas9) disruption.[70][71][72] Attempts at proof-by-synthesis are also made to demonstrate conclusively that no undiscovered overlapping genes remain.[73]
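In contrast to pipelines that penalize overlaps, the naive sketch below enumerates ATG-to-stop ORFs in all six reading frames and reports every pair that shares at least one nucleotide. It is a hypothetical illustration only: it does not reproduce RAST or any other published pipeline, uses an arbitrary minimum length (`min_codons`), and ignores alternative start codons, so on real genomes it would report many spurious ORFs that still need the proteogenomic or reverse-genetics evidence described above.

```python
from itertools import combinations

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def six_frame_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Naive ATG-to-stop ORF scan over all six reading frames.

    Returns (start, end, strand) with 0-based, half-open coordinates on the
    forward strand. Length in codons includes the stop codon.
    """
    n = len(seq)
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = strand_seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append((start, end, strand))
                        else:
                            # Map reverse-strand coordinates back onto the forward strand.
                            orfs.append((n - end, n - start, strand))
                    start = None
    return orfs


def overlapping_pairs(orfs):
    """Every pair of ORFs sharing at least one nucleotide, on either strand."""
    return [
        (a, b)
        for a, b in combinations(orfs, 2)
        if max(a[0], b[0]) < min(a[1], b[1])
    ]


orfs = six_frame_orfs("ATGAAACAAGATGCCTAAGTGA")
print(orfs)                     # [(0, 18, '+'), (10, 22, '+')]
print(overlapping_pairs(orfs))  # [((0, 18, '+'), (10, 22, '+'))]
```

Nothing in this scan discourages two ORFs from sharing sequence, so overlaps are reported rather than suppressed; deciding which of the candidates are genuine genes is left to experimental evidence.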
References
- PMID 33001029.
- PMID 34611352.
- PMID 10101192.
- PMID 34073395.
- PMID 6198955.
- PMID 1329098.
- PMID 1329098.
- ISBN 978-0-521-45533-6, retrieved 3 December 2021
- PMID 23966842.
- S2CID 4355527.
- PMID 14659892.
- PMID 15520290.
- PMID 6198955.
- PMID 12047938.
- S2CID 22644652.
- PMID 26871901.
- S2CID 12869504.
- PMID 12075102.
- PMID 27775080.
- S2CID 8818055.
- PMID 18226237.
- PMID 26853049.
- PMID 24312259.
- PMID 25552532.
- ISBN 978-1-4832-7409-6.
- PMID 6585807.
- PMID 30026186.
- PMID 22821011.
- S2CID 4206886.
- PMID 14661029.
- S2CID 4264796.
- S2CID 4175651.
- PMID 31666371.
- Dockrill P (11 November 2020). "Scientists Just Found a Mysteriously Hidden 'Gene Within a Gene' in SARS-CoV-2". ScienceAlert. Retrieved 11 November 2020.
- PMID 20610432.
- PMID 27209091.
- PMID 32071766.
- PMID 20610432.
- PMID 27209091.
- PMID 32452417.
- PMID 19640978.
- PMID 26296474.
- S2CID 12993441.
- PMID 25159814.
- S2CID 21612308.
- PMID 24192837.
- PMID 18627618.
- PMID 26677845.
- PMID 26323763.
- PMID 18410680.
- PMID 17939861.
- PMID 14762064.
- PMID 23777277.
- PMID 23185269.
- PMID 19726446.
- S2CID 222300240.
- PMID 19063901.
- PMID 23079106.
- PMID 11818563.
- PMID 841861.
- S2CID 32443408.
- PMID 19904234.
- S2CID 232761334.
- PMID 30026186.
- PMID 30339683.
- PMID 23966842.
- )
- )
- )
- )
- PMID 33510483.
- PMID 33479206.
- PMID 31719208.