Pseudogene

Pseudogenes are nonfunctional segments of

mRNA transcript. Pseudogenes are usually identified when genome sequence analysis finds gene-like sequences that lack regulatory sequences needed for transcription or translation, or whose coding sequences are obviously defective due to frameshifts or premature stop codons. Pseudogenes are a type of junk DNA.

Most non-bacterial genomes contain many pseudogenes, often as many as functional genes. This is not surprising, since various biological processes are expected to accidentally create pseudogenes, and there are no specialized mechanisms to remove them from genomes. Eventually pseudogenes may be deleted from their genomes by chance, through DNA replication or DNA repair errors, or they may accumulate so many mutational changes that they are no longer recognizable as former genes. Analysis of these degeneration events helps clarify the effects of non-selective processes in genomes.

Pseudogene sequences may be transcribed into RNA at low levels, due to promoter elements inherited from the ancestral gene or arising by new mutations. Although most of these transcripts will have no more functional significance than chance transcripts from other parts of the genome, some have given rise to beneficial regulatory RNAs and new proteins.

Properties

Pseudogenes are usually characterized by a combination of similarity to a known gene and loss of some functionality. That is, although every pseudogene has a DNA sequence that is similar to some functional gene, it is usually unable to produce a functional final protein product.^[1] Pseudogenes are sometimes difficult to identify and characterize in genomes, because the two requirements of similarity and loss of functionality are usually inferred from sequence alignments rather than demonstrated biologically.

Homology is implied by sequence similarity between the DNA sequences of the pseudogene and a known gene. After aligning the two sequences, the percentage of identical base pairs is computed. A high sequence identity means that it is highly likely that these two sequences diverged from a common ancestral sequence (are homologous), and highly unlikely that these two sequences have evolved independently (see Convergent evolution).
Nonfunctionality can manifest itself in many ways. Normally, a gene must go through several steps to a fully functional protein: pre-mRNA processing, translation, and protein folding are all required parts of this process. If any of these steps fails, then the sequence may be considered nonfunctional. In high-throughput pseudogene identification, the most commonly identified disablements are premature stop codons and frameshifts, which almost universally prevent the translation of a functional protein product.
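The sequence identity calculation described above can be sketched as follows. This is a minimal illustration, not the method of any particular alignment tool: it assumes the two sequences have already been aligned to equal length with "-" as the gap character, and it assumes that columns containing a gap are excluded from the denominator (tools differ on this convention). The function name percent_identity is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical bases over the aligned, gap-free columns.

    Assumption: gap columns are skipped entirely; other conventions
    count them as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0  # nothing to compare
    return 100.0 * identical / compared


# Toy aligned sequences (made up for illustration): one substitution in eight columns.
print(percent_identity("ATGCATGC", "ATGAATGC"))
```

A high value from such a comparison only suggests homology; as noted above, it does not by itself demonstrate loss of function.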

Pseudogenes for RNA genes are usually more difficult to discover as they do not need to be translated and thus do not have "reading frames". A number of rRNA pseudogenes have been identified on the basis of changes in rDNA array ends.^[2]

Pseudogenes can complicate molecular genetic studies. For example, amplification of a gene by PCR may simultaneously amplify a pseudogene that shares similar sequences. This is known as PCR bias or amplification bias. Similarly, pseudogenes are sometimes annotated as genes in genome sequences.
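The co-amplification problem can be illustrated with a rough in silico check of whether a primer pair could bind both a gene and a similar pseudogene. This is only a sketch under simplifying assumptions: binding is approximated by counting mismatches in an ungapped window, the tolerance max_mismatches is an arbitrary illustrative parameter, and melting temperature, 3'-end effects, and product length are ignored. All function names and the sequences in the usage example are hypothetical.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def binding_sites(primer: str, template: str, max_mismatches: int = 1) -> List[int]:
    """Start positions where the primer matches the template with few mismatches."""
    primer, template = primer.upper(), template.upper()
    sites = []
    for start in range(len(template) - len(primer) + 1):
        window = template[start:start + len(primer)]
        mismatches = sum(1 for p, t in zip(primer, window) if p != t)
        if mismatches <= max_mismatches:
            sites.append(start)
    return sites


def could_amplify(forward: str, reverse: str, template: str,
                  max_mismatches: int = 1) -> bool:
    """True if a forward site lies upstream of a reverse-primer site."""
    fwd_sites = binding_sites(forward, template, max_mismatches)
    # The reverse primer anneals to the opposite strand, so on the given
    # strand we search for its reverse complement.
    rev_sites = binding_sites(reverse_complement(reverse), template, max_mismatches)
    return any(f + len(forward) <= r for f in fwd_sites for r in rev_sites)


# Toy sequences: the "pseudogene" differs by one base in the forward site
# and a short deletion between the primer sites.
gene = "ACCGTGACTTTTTTTTGGATCCAT"
pseudogene = "ACCGTGATTTTTTGGATCCAT"
forward, reverse = "ACCGTGAC", "ATGGATCC"
print(could_amplify(forward, reverse, gene))
print(could_amplify(forward, reverse, pseudogene))
```

In this toy case both templates pass the check when one mismatch is tolerated, which is the situation in which a pseudogene may be amplified alongside the intended gene.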

Processed pseudogenes often pose a problem for gene prediction programs, which can misidentify them as real genes or exons. It has been proposed that the identification of processed pseudogenes can help improve the accuracy of gene prediction methods.^[3]

In 2014, 140 human pseudogenes were shown to be translated.^[4] However, the function, if any, of the protein products is unknown.

Types and origin

There are four main types of pseudogenes, all with distinct mechanisms of origin and characteristic features. The classifications of pseudogenes are as follows:

Processed

In higher

cDNAs. However, because they are derived from an RNA product, processed pseudogenes also lack the upstream promoters of normal genes; thus, they are considered "dead on arrival", becoming non-functional pseudogenes immediately upon the retrotransposition event.^[10] Nevertheless, these insertions occasionally contribute exons to existing genes, usually via alternatively spliced transcripts.^[11] A further characteristic of processed pseudogenes is frequent truncation of the 5' end relative to the parent sequence, a result of the relatively non-processive retrotransposition mechanism that creates them.^[12] Processed pseudogenes are continually being created in primates.^[13] Human populations, for example, carry distinct sets of processed pseudogenes across their individuals.^[14]

It has been shown that processed pseudogenes accumulate mutations faster than non-processed pseudogenes.^[15]

Non-processed (duplicated)

genome analysis.^[17]^[20] Depending on the evolutionary context, these pseudogenes will either be deleted or become so distinct from the parental genes that they are no longer identifiable. Relatively young pseudogenes can be recognized by their sequence similarity.^[21]

Unitary pseudogenes

Various mutations (such as

ascorbic acid (vitamin C), but it exists as a disabled gene (GULOP) in humans and other primates.^[22]^[23] Another more recent example of a disabled gene links the deactivation of the caspase 12 gene (through a nonsense mutation) to positive selection in humans.^[24]

Examples of pseudogene function

While the vast majority of pseudogenes have lost their function, some cases have emerged in which a pseudogene either re-gained its original or a similar function or evolved a new function. In the human genome, a number of examples have been identified that were originally classified as pseudogenes but later discovered to have a functional, although not necessarily protein-coding, role.^[25]^[26]

Examples include the following:

Protein-coding: "pseudo-pseudogenes"

The rapid proliferation of DNA sequencing technologies has led to the identification of many apparent pseudogenes using gene prediction techniques. Pseudogenes are often identified by the appearance of a premature stop codon in a predicted mRNA sequence, which would, in theory, prevent synthesis (translation) of the normal protein product of the original gene. There have been some reports of translational readthrough of such premature stop codons in mammals. As alluded to in the figure above, a small amount of the protein product of such readthrough may still be recognizable and function at some level. If so, the pseudogene can be subject to natural selection. That appears to have happened during the evolution of Drosophila species.
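The in silico screen described above can be sketched as a scan of a predicted coding sequence for in-frame stop codons. This is a simplified illustration, not the procedure of any specific gene prediction tool: it assumes the sequence starts in frame, uses the stop codons of the standard genetic code, and treats a length that is not a multiple of three as a crude proxy for a frameshift. The function names are hypothetical.

```python
from typing import List

# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cds: str) -> List[int]:
    """0-based codon indices of in-frame stop codons before the final codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def looks_disabled(cds: str) -> bool:
    """Flag a candidate whose length breaks the frame or that has an early stop.

    Assumption: a length not divisible by three is taken as a sign of a
    frameshift, which is a heuristic rather than a proof.
    """
    return len(cds) % 3 != 0 or bool(premature_stops(cds))


# Toy sequences (made up for illustration).
print(looks_disabled("ATGGCCAAGTGA"))   # intact reading frame
print(premature_stops("ATGTAAAAGTGA"))  # stop codon at the second codon
```

As the text notes, a positive flag from such a screen is not proof of nonfunctionality, since translational readthrough can still yield a protein product.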

In 2016 it was reported that four predicted pseudogenes in multiple Drosophila species actually encode proteins with biologically important functions,

neurons. This finding of tissue-specific, biologically functional genes that could have been classified as pseudogenes by in silico analysis complicates the analysis of sequence data.^[27] Another Drosophila pseudo-pseudogene is jingwei,^[28]^[29] which encodes a functional alcohol dehydrogenase enzyme in vivo.^[30]

As of 2012, there were estimated to be approximately 12,000–14,000 pseudogenes in the human genome.^[31] A 2016 proteogenomics analysis using mass spectrometry of peptides identified at least 19,262 human proteins produced from 16,271 genes or clusters of genes, with 8 new protein-coding genes identified that were previously considered pseudogenes.^[32] An earlier analysis found that human PGAM4 (phosphoglycerate mutase),^[33] previously thought to be a pseudogene, is not only functional, but also causes infertility if mutated.^[34]^[35]

A number of pseudo-pseudogenes were also found in prokaryotes, where some stop codon substitutions in essential genes appear to be retained, even positively selected for.^[36]^[37]

Non-protein-coding

siRNAs. Some endogenous

siRNAs appear to be derived from pseudogenes, and thus some pseudogenes play a role in regulating protein-coding transcripts, as reviewed.^[38] One of the many examples is psiPPM1K. Processing of RNAs transcribed from psiPPM1K yields siRNAs that can act to suppress the most common type of liver cancer, hepatocellular carcinoma.^[39] This and much other research has led to considerable excitement about the possibility of targeting pseudogenes with, or using them as, therapeutic agents.^[40]

piRNAs. Some

ceRNA

.

PTEN. The

microRNAs due to its similarity to the PTEN gene, and overexpression of the 3' UTR resulted in an increase of PTEN protein level.^[45]

That is, overexpression of the PTENP1 3' UTR leads to increased regulation and suppression of cancerous tumors. The biology of this system is basically the inverse of the BRAF system described above.

Potogenes. Pseudogenes can, over evolutionary time scales, participate in gene conversion and other mutational events that may give rise to new or newly functional genes. This has led to the concept that pseudogenes could be viewed as potogenes: potential genes for evolutionary diversification.^[46]

Bacterial pseudogenes

Pseudogenes are found in

obligate intracellular parasites. Thus, they do not require many genes that are needed by free-living bacteria, such as genes associated with metabolism and DNA repair. However, there is no fixed order in which functional genes are lost. For example, the oldest pseudogenes in Mycobacterium leprae are in RNA polymerases and the biosynthesis of secondary metabolites, while the oldest ones in Shigella flexneri and Salmonella typhi are in DNA replication, recombination, and repair.^[48]

Since most bacteria that carry pseudogenes are either symbionts or obligate intracellular parasites, their genome size eventually decreases. An extreme example is the genome of Mycobacterium leprae, an obligate parasite and the causative agent of leprosy. It has been reported to have 1,133 pseudogenes, which give rise to approximately 50% of its transcriptome.^[48] The effect of pseudogenes and genome reduction can be further seen by comparison with Mycobacterium marinum, a pathogen from the same family. Mycobacterium marinum has a larger genome than Mycobacterium leprae because it can survive outside the host; its genome must therefore contain the genes needed to do so.^[49]

Although genome reduction removes genes that are no longer needed by eliminating pseudogenes, selective pressures from the host can influence what is retained. In the case of a symbiont from the Verrucomicrobiota phylum, there are seven additional copies of the gene coding the mandelalide pathway.^[50] The hosts, species of Lissoclinum, use mandelalides as part of their defense mechanism.^[50]

The relationship between epistasis and the domino theory of gene loss was observed in Buchnera aphidicola. The domino theory suggests that if one gene of a cellular process becomes inactivated, then selection on the other genes involved relaxes, leading to further gene loss.^[48] A comparison of Buchnera aphidicola and Escherichia coli found that positive epistasis furthers gene loss while negative epistasis hinders it.

References

[1] S2CID 42204036.
[2] PMID 28204512.
[3] PMID 16651666.
[4] PMID 24870542.
[5] Max EE (1986). "Plagiarized Errors and Molecular Genetics". Creation Evolution Journal. 6 (3): 34–46.
[6] Chandrasekaran C, Betrán E (2008). "Origins of new genes and pseudogenes". Nature Education. 1 (1): 181.
[7] PMID 15531153.
[8] S2CID 25083962.
[9] S2CID 32151696.
[10] S2CID 22437436.
[11] PMID 18842134.
[12] PMID 12468100.
[13] PMID 26224704.
[14] PMID 23359205.
[15] PMID 17568002.
[16] Max EE (2003-05-05). "Plagiarized Errors and Molecular Genetics". TalkOrigins Archive. Retrieved 2008-07-22.
[17] PMID 11073452.
[18] PMID 7705642.
[19] PMID 11779815.
[20] PMID 11827946.
[21] doi:10.1016/S0169-5347(03)00033-8.
[22] PMID 1400507.
[23] PMID 8175804.
[24] PMID 16532395.
[25] S2CID 209393216.
[26] PMID 32421357.
[27] PMID 27776356.
[28] S2CID 1665885.
[29] PMID 10958846.
[30] PMID 7682012.
[31] PMID 22951037.
[32] PMID 27250503.
[33] PMID 9370262.
[34] PMID 11961099.
[35] PMID 22590500.
[36] PMID 33672790.
[37] PMID 35489061.
[38] PMID 24823781.
[39] PMID 23376929.
[40] PMID 24279857.
[41] PMID 24178556.
[42] S2CID 5710813.
[43] PMID 25843629.
[44] PMID 9620558.
[45] PMID 20577206.
[46] PMID 14616058.
[47] PMID 25461580.
[48] PMID 16237210.
[49] PMID 28854187.
[50] PMID 29181447.
[51] S2CID 4307207.

Further reading

Gerstein M, Zheng D (August 2006). "The real life of pseudogenes". Scientific American. 295 (2): 48–55. PMID 16866288.

Torrents D, Suyama M, Zdobnov E, Bork P (December 2003). "A genome-wide survey of human pseudogenes". Genome Research. 13 (12): 2559–2567. PMID 14656963.

Bischof JM, Chiang AP, Scheetz TE, Stone EM, Casavant TL, Sheffield VC, Braun TA (June 2006). "Genome-wide identification of pseudogenes capable of disease-causing gene conversion". Human Mutation. 27 (6): 545–552. S2CID 20219423.

Syberg-Olsen MJ, Garber AI, Keeling PJ, McCutcheon JP, Husnik F (July 2022). "Pseudofinder: Detection of Pseudogenes in Prokaryotic Genomes". Molecular Biology and Evolution. 39 (7). PMID 35801562.

External links

Pseudogene interaction database, miRNA-pseudogene and protein-pseudogene interaction maps database

Yale University pseudogene database

Hoppsigen database (homologous processed pseudogenes)

RCPedia - Processed Pseudogene database

