Gene duplication

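Gene duplication (or chromosomal duplication or gene amplification) is a major mechanism through which new genetic material is generated during molecular evolution. Common sources of gene duplication include ectopic recombination, retrotransposition, aneuploidy, polyploidy, and replication slippage.^[1]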

Mechanisms of duplication

Ectopic recombination

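Duplications arise from an event termed unequal crossing-over, which occurs during meiosis between misaligned homologous chromosomes. Repetitive genetic elements such as transposable elements offer one source of repetitive DNA that can facilitate recombination, and they are often found at duplication breakpoints in plants and mammals.^[2]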

Schematic of a region of a chromosome before and after a duplication event

Replication slippage

Replication slippage is an error in DNA replication that can produce duplications of short genetic sequences. During replication, DNA polymerase begins to copy the DNA. At some point, the polymerase dissociates from the DNA and replication stalls. When the polymerase reattaches to the DNA strand, it aligns the replicating strand to an incorrect position and incidentally copies the same section more than once. Replication slippage is often facilitated by repetitive sequences, but requires only a few bases of similarity.^[citation needed]
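The slippage mechanism described above can be illustrated with a toy model. The following is a minimal sketch, not a biochemical simulation: it assumes the new strand can be written with the same letters as the template (complementarity is ignored), and the function name `replicate_with_slippage` and its parameters `slip_at` and `slip_back` are hypothetical names chosen for illustration.

```python
def replicate_with_slippage(template: str, slip_at: int, slip_back: int) -> str:
    """Copy `template`, stalling after `slip_at` bases and resuming
    `slip_back` bases upstream of the stall point.

    Toy model: the bases between the resume point and the stall point
    are copied twice, which produces a short duplication.
    """
    if not 0 <= slip_back <= slip_at <= len(template):
        raise ValueError("need 0 <= slip_back <= slip_at <= len(template)")
    copied_before_stall = template[:slip_at]
    # The polymerase reattaches at the wrong position and re-copies
    # the last `slip_back` bases before continuing to the end.
    copied_after_stall = template[slip_at - slip_back:]
    return copied_before_stall + copied_after_stall


# A template with three CAG repeats; slipping back by one repeat unit
# inside the repeat tract yields a product with four CAG repeats.
template = "GATT" + "CAG" * 3 + "TTC"
product = replicate_with_slippage(template, slip_at=11, slip_back=3)
print(template)
print(product)
```

Because the repeat units are identical, the misaligned strand still pairs correctly with the template, which is why repetitive tracts make this kind of error easier.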

Retrotransposition

Retrotransposons, mainly L1, can occasionally act on cellular mRNA. Transcripts are reverse transcribed to DNA and inserted at random places in the genome, creating retrogenes. The resulting sequences usually lack introns and often contain poly(A) sequences that are also integrated into the genome. Many retrogenes display changes in gene regulation in comparison to their parental gene sequences, which sometimes results in novel functions. Retrogenes can move between different chromosomes to shape chromosomal evolution.^[3]

Aneuploidy

Aneuploidy occurs when nondisjunction at a single chromosome results in an abnormal number of chromosomes. Aneuploidy is often harmful and in mammals regularly leads to spontaneous abortions (miscarriages). Some aneuploid individuals are viable, for example trisomy 21 in humans, which leads to Down syndrome. Aneuploidy often alters gene dosage in ways that are detrimental to the organism; therefore, it is unlikely to spread through populations.

Polyploidy

Polyploidy, or whole genome duplication, is a product of nondisjunction during meiosis that results in additional copies of the entire genome. Polyploidy is common in plants, but it has also occurred in animals, with two rounds of whole genome duplication (the 2R event) in the vertebrate lineage leading to humans.^[4] It also occurred in the hemiascomycete yeasts about 100 million years ago.^[5]^[6]

After a whole genome duplication, there is a relatively short period of genome instability, extensive gene loss, elevated levels of nucleotide substitution, and regulatory network rewiring.^[7]^[8] In addition, gene dosage effects play a significant role.^[9] Most duplicates are therefore lost within a short period, although a considerable fraction survive.^[10] Genes involved in regulation are preferentially retained.^[11]^[12] Furthermore, retention of regulatory genes, most notably the Hox genes, has led to adaptive innovation.

Rapid evolution and functional divergence have been observed at the level of the transcription of duplicated genes, usually through point mutations in short transcription factor binding motifs.^[13]^[14] Furthermore, rapid evolution of protein phosphorylation motifs, usually embedded within rapidly evolving intrinsically disordered regions, is another factor contributing to the survival and rapid adaptation/neofunctionalization of duplicate genes.^[15] Thus, a link seems to exist between gene regulation (at least at the post-translational level) and genome evolution.^[15]

Polyploidy is also a well-known source of speciation, as offspring that have a different number of chromosomes from the parent species are often unable to interbreed with non-polyploid organisms. Whole genome duplications are thought to be less detrimental than aneuploidy because the relative dosage of individual genes should remain the same.
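The dosage argument can be made concrete with a small calculation. This is a sketch under simplifying assumptions: the three-chromosome karyotypes are invented for illustration, and "relative dosage" is defined here as the copy number of each chromosome divided by the median copy number across the genome, which is one convenient choice rather than a standard definition.

```python
from statistics import median
from typing import Dict


def relative_dosage(copies: Dict[str, int]) -> Dict[str, float]:
    """Return each chromosome's copy number relative to the genome median."""
    baseline = median(copies.values())
    return {chrom: n / baseline for chrom, n in copies.items()}


# Hypothetical karyotypes with three chromosomes each.
diploid = {"chrA": 2, "chrB": 2, "chrC": 2}
trisomic = {"chrA": 2, "chrB": 2, "chrC": 3}    # aneuploidy: one extra chrC
tetraploid = {"chrA": 4, "chrB": 4, "chrC": 4}  # whole genome duplication

print(relative_dosage(trisomic))    # genes on chrC are at 1.5x relative dosage
print(relative_dosage(tetraploid))  # every chromosome stays at 1.0
```

In the trisomic case only the genes on one chromosome change relative to the rest of the genome, whereas after a whole genome duplication all ratios are unchanged.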

As an evolutionary event

Rate of gene duplication

Comparisons of genomes demonstrate that gene duplications are common in most species investigated. This is indicated by variable copy numbers (copy number variation) in the genomes of humans^[16]^[17] and fruit flies.^[18] However, it has been difficult to measure the rate at which such duplications occur. Recent studies yielded a first direct estimate of the genome-wide rate of gene duplication in C. elegans, the first multicellular eukaryote for which such an estimate became available. The gene duplication rate in C. elegans is on the order of 10⁻⁷ duplications/gene/generation; that is, in a population of 10 million worms, on average one will carry a new duplication of any given gene per generation. This rate is two orders of magnitude greater than the spontaneous rate of point mutation per nucleotide site in this species.^[19] Older (indirect) studies reported locus-specific duplication rates in bacteria, Drosophila, and humans ranging from 10⁻³ to 10⁻⁷/gene/generation.^[20]^[21]^[22]
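The arithmetic behind these figures can be checked directly. The sketch below uses the rate quoted above (10⁻⁷ per gene per generation) and the stated population of 10 million worms; the gene count of 20,000 is an assumption added for illustration only and is not taken from the text.

```python
def expected_duplications(rate_per_gene: float, n_genes: int, n_individuals: int) -> float:
    """Expected number of new duplications per generation."""
    return rate_per_gene * n_genes * n_individuals


DUP_RATE = 1e-7            # duplications/gene/generation (from the text)
POPULATION = 10_000_000    # worms (from the text)
ASSUMED_GENE_COUNT = 20_000  # assumption: rough gene count, illustration only

# One given gene, across the whole population: about one event per generation.
per_gene_in_population = expected_duplications(DUP_RATE, 1, POPULATION)

# All genes in a single genome: about 0.002 events per generation,
# i.e. roughly one new duplication per 500 worms under the assumed gene count.
per_genome = expected_duplications(DUP_RATE, ASSUMED_GENE_COUNT, 1)

# "Two orders of magnitude greater" than the per-site point mutation rate
# implies a point mutation rate on the order of 1e-9 per site per generation.
implied_point_mutation_rate = DUP_RATE / 100

print(per_gene_in_population, per_genome, implied_point_mutation_rate)
```

Note that the two rates are expressed per different units (per gene versus per nucleotide site), so the comparison is one of order of magnitude rather than a like-for-like ratio.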

Neofunctionalization

Gene duplications are an essential source of genetic novelty that can lead to evolutionary innovation. Duplication creates genetic redundancy, where the second copy of the gene is often free from selective pressure; that is, mutations of it have no deleterious effects on its host organism. If one copy of a gene experiences a mutation that affects its original function, the second copy can serve as a 'spare part' and continue to function correctly. Thus, duplicate genes accumulate mutations faster than a functional single-copy gene over generations of organisms, and it is possible for one of the two copies to develop a new and different function. Examples of such neofunctionalization include the apparent mutation of a duplicated digestive gene in a family of ice fish into an antifreeze gene, duplication leading to a novel snake venom gene,^[23] and the synthesis of 1 beta-hydroxytestosterone in pigs.^[24]

Gene duplication is believed to play a major role in evolution; this stance has been held by members of the scientific community for over 100 years.^[25] Susumu Ohno was one of the most famous developers of this theory in his classic book Evolution by gene duplication (1970).^[26] Ohno argued that gene duplication is the most important evolutionary force since the emergence of the universal common ancestor.^[27] Major genome duplication events can be quite common. It is believed that the entire yeast genome underwent duplication about 100 million years ago.^[28] Plants are the most prolific genome duplicators. For example, wheat is hexaploid (a kind of polyploid), meaning that it has six copies of its genome.

Subfunctionalization

Another possible fate for duplicate genes is that both copies are equally free to accumulate degenerative mutations, so long as any defects are complemented by the other copy. This leads to a neutral "subfunctionalization" (a process of constructive neutral evolution) or DDC (duplication-degeneration-complementation) model,^[29]^[30] in which the functionality of the original gene is distributed between the two copies. Neither gene can be lost, as both now perform important non-redundant functions, but ultimately neither is able to achieve novel functionality.

Subfunctionalization can occur through neutral processes in which mutations accumulate with no detrimental or beneficial effects. However, in some cases subfunctionalization can occur with clear adaptive benefits. If an ancestral gene is pleiotropic and performs two functions, often neither one of these two functions can be changed without affecting the other function. In this way, partitioning the ancestral functions into two separate genes can allow for adaptive specialization of subfunctions, thereby providing an adaptive benefit.^[31]
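The complementation logic of the DDC model can be expressed as a small set comparison. This is a minimal sketch, assuming that a gene's role can be written as a set of discrete subfunctions; the function name, the fate labels, and the example subfunction names are all hypothetical and chosen only for illustration.

```python
from typing import Set


def classify_duplicate_fate(ancestral: Set[str], copy_a: Set[str], copy_b: Set[str]) -> str:
    """Classify a duplicate pair by which ancestral subfunctions each copy retains."""
    a_is_complete = ancestral <= copy_a
    b_is_complete = ancestral <= copy_b
    if a_is_complete and b_is_complete:
        return "redundant"                # both copies still do everything
    if a_is_complete or b_is_complete:
        return "one copy dispensable"     # the degenerated copy can be lost
    if ancestral <= (copy_a | copy_b):
        return "subfunctionalized"        # only the pair together is complete
    return "ancestral function lost"      # defects are no longer complemented


ancestral = {"expression_in_tissue_1", "expression_in_tissue_2"}

print(classify_duplicate_fate(ancestral, {"expression_in_tissue_1"}, {"expression_in_tissue_2"}))
print(classify_duplicate_fate(ancestral, set(ancestral), {"expression_in_tissue_2"}))
```

In the "subfunctionalized" outcome neither copy can be removed without losing an ancestral subfunction, which is why both copies are preserved.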

Loss

Often the resulting genomic variation leads to gene-dosage-dependent neurological disorders such as Rett-like syndrome and Pelizaeus–Merzbacher disease.^[32] Such detrimental mutations are likely to be lost from the population and will not be preserved or develop novel functions. However, many duplications are neither detrimental nor beneficial, and these neutral sequences may be lost or may spread through the population by random fluctuations via genetic drift.

Identifying duplications in sequenced genomes

Criteria and single genome scans

The two genes that exist after a gene duplication event are called paralogs and usually code for proteins with a similar function and/or structure. By contrast, orthologous genes are present in different species and are each originally derived from the same ancestral sequence. (See Homology of sequences in genetics.)

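It is important (but often difficult) to differentiate between paralogs and orthologs in biological research. Experiments on human gene function can often be carried out on other species if a homolog to a human gene can be found in the genome of that species, but only if the homolog is orthologous. If they are paralogs and resulted from a gene duplication event, their functions are likely to be too different. One or more copies of duplicated genes that constitute a gene family may be affected by the insertion of transposable elements, which causes significant variation between their sequences and may ultimately become responsible for divergent evolution. This may also reduce the chances and the rate of gene conversion between the homologs of gene duplicates, because of reduced or absent similarity in their sequences.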

Paralogs can be identified in single genomes through a sequence comparison of all annotated gene models to one another. Such a comparison can be performed on translated amino acid sequences (e.g. BLASTp, tBLASTx) to identify ancient duplications or on DNA nucleotide sequences (e.g. BLASTn, megablast) to identify more recent duplications. Most studies that identify gene duplications require reciprocal-best-hits or fuzzy reciprocal-best-hits, where each paralog must be the other's single best match in a sequence comparison.^[33]
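The reciprocal-best-hits criterion can be sketched in a few lines. The example below is a simplified illustration, not the procedure of any particular study: it assumes the all-against-all comparison has already been run and summarized as a dictionary of (query, subject) scores, and the gene names and score values are invented.

```python
from typing import Dict, Set, Tuple

Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query gene to its single highest-scoring non-self subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query == subject:
            continue  # ignore self-hits
        # Ties are resolved in favor of the first hit seen.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(scores: Scores) -> Set[Tuple[str, str]]:
    """Return gene pairs in which each gene is the other's best hit."""
    best = best_hits(scores)
    return {
        tuple(sorted((query, subject)))
        for query, subject in best.items()
        if best.get(subject) == query
    }


# Hypothetical alignment scores between three annotated genes.
toy_scores = {
    ("geneA", "geneB"): 250.0, ("geneA", "geneC"): 90.0,
    ("geneB", "geneA"): 245.0, ("geneB", "geneC"): 80.0,
    ("geneC", "geneA"): 95.0,  ("geneC", "geneB"): 70.0,
}
print(reciprocal_best_hits(toy_scores))
```

Here geneC's best hit is geneA, but geneA's best hit is geneB, so only the geneA/geneB pair satisfies the reciprocal criterion. A "fuzzy" variant would relax the requirement of a single strict best match, for example by accepting hits within some margin of the top score.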

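Most gene duplications exist as low copy repeats (LCRs), rather than as highly repetitive sequences like transposable elements. They are mostly found in pericentromeric, subtelomeric, and interstitial regions of a chromosome. Many LCRs, due to their size (>1 kb), similarity, and orientation, are highly susceptible to duplications and deletions.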

Genomic microarrays detect duplications

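Technologies such as genomic microarrays, also called array comparative genomic hybridization (array CGH), are used to detect chromosomal abnormalities, such as microduplications, in a high-throughput fashion from genomic DNA samples. In particular, DNA microarray technology can simultaneously monitor the expression levels of thousands of genes across many treatments or experimental conditions, greatly facilitating evolutionary studies of gene regulation after gene duplication or speciation.^[34]^[35]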

Next generation sequencing

Gene duplications can also be identified through the use of next-generation sequencing platforms. The simplest means to identify duplications in genomic resequencing data is through the use of paired-end sequencing reads. Tandem duplications are indicated by sequencing read pairs which map in abnormal orientations. Through a combination of increased sequence coverage and abnormal mapping orientation, it is possible to identify duplications in genomic sequencing data.
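The read-pair logic can be sketched as follows. This is a simplified illustration that assumes an inward-facing paired-end library with both mates mapped to the same chromosome; the class and function names and all thresholds (`max_insert`, `min_everted`, `min_depth_ratio`) are hypothetical values for illustration, not settings from any real variant caller.

```python
from typing import Iterable, NamedTuple


class ReadPair(NamedTuple):
    pos1: int       # leftmost mapped coordinate of mate 1
    reverse1: bool  # True if mate 1 maps to the reverse strand
    pos2: int
    reverse2: bool


def classify_pair(pair: ReadPair, max_insert: int = 500) -> str:
    """Classify a read pair by the orientation and spacing of its mates."""
    mates = sorted([(pair.pos1, pair.reverse1), (pair.pos2, pair.reverse2)])
    (left_pos, left_rev), (right_pos, right_rev) = mates
    if left_rev == right_rev:
        return "same_strand"   # both mates on one strand
    if left_rev:
        return "everted"       # mates point away from each other
    if right_pos - left_pos > max_insert:
        return "too_far"       # inward-facing but too widely spaced
    return "concordant"


def duplication_candidate(pairs: Iterable[ReadPair], region_depth: float,
                          genome_depth: float, min_everted: int = 2,
                          min_depth_ratio: float = 1.3) -> bool:
    """Require both everted pairs and elevated coverage in the region."""
    everted = sum(1 for p in pairs if classify_pair(p) == "everted")
    return everted >= min_everted and region_depth / genome_depth >= min_depth_ratio


pairs = [
    ReadPair(1000, False, 1300, True),   # concordant
    ReadPair(5000, False, 1100, True),   # everted
    ReadPair(4950, False, 1050, True),   # everted
]
print([classify_pair(p) for p in pairs])
print(duplication_candidate(pairs, region_depth=45.0, genome_depth=30.0))
```

Everted pairs are expected at a tandem duplication because a fragment spanning the junction between the two copies has its forward mate near the end of the duplicated segment and its reverse mate near the start, so the mates appear in swapped order on the reference.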

Nomenclature

Human karyotype showing the 22 autosomal chromosome pairs, both the female (XX) and male (XY) versions of the two sex chromosomes, as well as the mitochondrial genome (at bottom left).

The International System for Human Cytogenomic Nomenclature (ISCN) is an international standard for human chromosome nomenclature, which includes band names, symbols, and abbreviated terms used in the description of human chromosomes and chromosome abnormalities. Abbreviations include dup for duplications of parts of a chromosome.^[36] For example, dup(17p12) causes Charcot–Marie–Tooth disease type 1A.^[37]

As amplification

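Gene duplication does not necessarily constitute a lasting change in a species' genome. In fact, such changes often do not last past the initial host organism. From the perspective of molecular genetics, gene amplification is one of many ways in which a gene can be overexpressed. Genetic amplification can occur artificially, as with the use of the polymerase chain reaction technique to amplify short strands of DNA in vitro using enzymes, or it can occur naturally, as described above. A natural duplication can still take place in a somatic cell rather than a germline cell (which would be necessary for a lasting evolutionary change).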

Role in cancer

Duplications of oncogenes are a common cause of many types of cancer. In such cases the genetic duplication occurs in a somatic cell and affects only the genome of the cancer cells themselves, not the entire organism, much less any subsequent offspring. Recent comprehensive patient-level classification and quantification of driver events in TCGA cohorts revealed that there are on average 12 driver events per tumor, of which 1.5 are amplifications of oncogenes.^[38]

Common oncogene amplifications in human cancers
Cancer type	Associated gene amplifications	Prevalence of amplification in cancer type (percent)
Breast cancer	MYC	20%^[39]
Breast cancer	ERBB2 (HER2)	20%^[39]
Breast cancer	CCND1 (Cyclin D1)	15–20%^[39]
Breast cancer	FGFR1	12%^[39]
Breast cancer	FGFR2	12%^[39]
Cervical cancer	MYC	25–50%^[39]
Cervical cancer	ERBB2	20%^[39]
Colorectal cancer	HRAS	30%^[39]
Colorectal cancer	KRAS	20%^[39]
Colorectal cancer	MYB	15–20%^[39]
Esophageal cancer	MYC	40%^[39]
Esophageal cancer	CCND1	25%^[39]
Esophageal cancer	MDM2	13%^[39]
Gastric cancer	CCNE (Cyclin E)	15%^[39]
Gastric cancer	KRAS	10%^[39]
Gastric cancer	MET	10%^[39]
Glioblastoma	ERBB1 (EGFR)	33–50%^[39]
Glioblastoma	CDK4	15%^[39]
Head and neck cancer	CCND1	50%^[39]
Head and neck cancer	ERBB1	10%^[39]
Head and neck cancer	MYC	7–10%^[39]
Hepatocellular cancer	CCND1	13%^[39]
Neuroblastoma	MYCN	20–25%^[39]
Ovarian cancer	MYC	20–30%^[39]
Ovarian cancer	ERBB2	15–30%^[39]
Ovarian cancer	AKT2	12%^[39]
Sarcoma	MDM2	10–30%^[39]
Sarcoma	CDK4	10%^[39]
Small cell lung cancer	MYC	15–20%^[39]

Whole-genome duplications are also frequent in cancers, detected in 30% to 36% of tumors from the most common cancer types.^[40]^[41] Their exact role in carcinogenesis is unclear, but in some cases they lead to loss of chromatin segregation, which causes chromatin conformation changes that in turn lead to oncogenic epigenetic and transcriptional modifications.^[42]

References

[1] doi:10.1016/S0169-5347(03)00033-8.
[2] "Definition of Gene duplication". medterms medical dictionary. MedicineNet. 2012-03-19.
[3] PMID 35741730.
[4] PMID 16128622.
[5] S2CID 4307263.
[6] S2CID 4422074.
[7] S2CID 10054182.
[8] PMID 16555924.
[9] S2CID 4382441.
[10] PMID 11073452.
[11] PMID 16818725.
[12] PMID 16098632.
[13] PMID 16507168.
[14] PMID 16140417.
[15] PMID 20080574.
[16] S2CID 20357402.
[17] PMID 15286789.
[18] S2CID 206512885.
[19] PMID 21295484.
[20] PMID 6789329.
[21] PMID 19114461.
[22] PMID 18059269.
[23] PMID 17233905.
[24] S2CID 1240225.
[25] PMID 15568988.
[26] ISBN 978-0-04-575015-3.
[27] ISBN 978-91-554-5776-1.
[28] S2CID 4422074.
[29] PMID 10101175.
[30] S2CID 1743092.
[31] S2CID 418964.
[32] S2CID 22412305.
[33] PMID 17997610.
[34] PMID 16240409.
[35] PMID 15647348.
[36] "ISCN Symbols and Abbreviated Terms". Coriell Institute for Medical Research. Retrieved 2022-10-27.
[37] OMIM. Updated: 4/23/2014.
[38] PMID 35030162.
[39] ISBN 978-0-07-137050-9.
[40] PMID 30013179.
[41] PMID 33505027.
[42] PMID 36922594.

External links

A bibliography on gene and genome duplication

A brief overview of mutation, gene duplication and translocation


Retrieved from "https://en.wikipedia.org/w/index.php?title=Gene_duplication&oldid=1208807888"
