Promoter (genetics)
In
Promoters control gene expression in bacteria and eukaryotes.[3] For transcription to occur, RNA polymerase must attach to the DNA near a gene, and promoter DNA sequences provide the binding site for this enzyme. In bacteria, the consensus sequence of the -10 element is TATAAT and that of the -35 element is TTGACA; these consensus sequences are conserved on average but are not found intact in most promoters.
Artificial promoters with completely conserved -10 and -35 elements transcribe at lower frequencies than promoters with a few mismatches to the consensus. Closely spaced promoters have been observed in the DNA of all life forms, and such pairs can be arranged in divergent, tandem, or convergent orientations. Two closely spaced promoters will most likely interfere with each other. Gene promoters can also have regulatory elements (enhancers) located several kilobases away from the transcriptional start site.
In eukaryotes, the transcriptional complex can bend DNA back on itself, which allows regulatory sequences to be placed far from the site of transcription. The distal promoter lies upstream of the gene and may contain additional regulatory elements, usually with a weaker influence than the proximal promoter. RNA polymerase II (RNAP II) bound to the promoter at the transcription start site can begin mRNA synthesis. It also typically contains
For genes that share a bidirectional promoter, hypermethylation downregulates both genes, while demethylation upregulates them. Research links non-coding RNAs to the promoter regions of mRNAs. Viral subgenomic promoters range from 24 nucleotides (Sindbis virus) to over 100 nucleotides (Beet necrotic yellow vein virus). Gene expression depends on binding at the promoter, and undesirable changes in some genes can increase a cell's risk of becoming cancerous.
MicroRNA promoters often contain CpG islands. DNA methylation forms 5-methylcytosine by adding a methyl group at the 5 position of the pyrimidine ring of cytosines in CpG sites. Some genes are silenced in cancer by mutation, but a large proportion of cancer-associated gene silencing results from altered DNA methylation. Some promoters are constitutive, while others are regulated. Natural selection may favor weaker (less energetic) transcription factor binding as a way of regulating transcriptional output.
Variations in promoters or transcription factors are associated with some cases of many genetic diseases. Describing a promoter by a single canonical sequence can lead to misunderstandings.
Overview
For transcription to take place, the enzyme that synthesizes RNA, known as
- In bacteria
  - The promoter is recognized by RNA polymerase and an associated sigma factor, which in turn are often brought to the promoter DNA by an activator protein's binding to its own DNA binding site nearby.
- In eukaryotes
  - The process is more complicated, and at least seven different factors are necessary for the binding of an RNA polymerase II to the promoter.
Promoters represent critical elements that can work in concert with other regulatory regions (
Identification of relative location
As promoters are typically immediately adjacent to the gene in question, positions in the promoter are designated relative to the
Relative location in the cell nucleus
In the cell nucleus, promoters appear to be distributed preferentially at the edges of chromosomal territories, likely to allow co-expression of genes on different chromosomes.[6] Furthermore, in humans, promoters show certain structural features characteristic of each chromosome.[6]
Elements
Bacterial
In
- The sequence at -10 (the -10 element) has the consensus sequence TATAAT.
- The sequence at -35 (the -35 element) has the consensus sequence TTGACA.
- The above consensus sequences, while conserved on average, are not found intact in most promoters. On average, only 3 to 4 of the 6 base pairs in each consensus sequence are found in any given promoter. Few natural promoters have been identified to date that possess intact consensus sequences at both the -10 and -35; artificial promoters with complete conservation of the -10 and -35 elements have been found to transcribe at lower frequencies than those with a few mismatches with the consensus.
- The optimal spacing between the -35 and -10 sequences is 17 bp.
- Some promoters contain one or more upstream promoter element (UP element) subsites[7] (consensus sequence 5'-AAAAAARNR-3' when centered in the -42 region; consensus sequence 5'-AWWWWWTTTTT-3' when centered in the -52 region; W = A or T; R = A or G; N = any base).[8]
The above promoter sequences are recognized only by RNA polymerase
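Schematic of a bacterial promoter, read 5' to 3' (upstream ← → downstream): 5'-XXXXXXXPPPPPPXXXXXXPPPPPPXXXXGGGGGGGGGGGGGGGGGGGGXXXX-3', where the first PPPPPP block is the -35 element, the second PPPPPP block is the -10 element, the run of G marks the gene to be transcribed, and X marks other bases.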
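Probability of occurrence of each consensus nucleotide
Element | Position 1 | Position 2 | Position 3 | Position 4 | Position 5 | Position 6 |
---|---|---|---|---|---|---|
-10 sequence | T (77%) | A (76%) | T (60%) | A (61%) | A (56%) | T (82%) |
-35 sequence | T (69%) | T (79%) | G (61%) | A (56%) | C (54%) | A (54%) |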
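The per-position percentages in the table can be turned into a simple position weight matrix. The sketch below is a minimal illustration, assuming a uniform background (25% per base) and assuming that the three non-consensus bases share the leftover probability equally at each position; neither assumption comes from the table itself, and `log_odds_score` is a hypothetical helper name.

```python
import math

# Frequency of the consensus base at each of the six positions (from the table).
MINUS10 = ("TATAAT", [0.77, 0.76, 0.60, 0.61, 0.56, 0.82])
MINUS35 = ("TTGACA", [0.69, 0.79, 0.61, 0.56, 0.54, 0.54])

def log_odds_score(hexamer: str, element) -> float:
    """Sum over positions of log2(P(base | element) / 0.25).

    Assumptions: non-consensus bases split (1 - p) equally, and the
    background model is uniform (0.25 per base).
    """
    consensus, freqs = element
    if len(hexamer) != len(consensus):
        raise ValueError("hexamer must be 6 bp long")
    score = 0.0
    for base, cons, p in zip(hexamer.upper(), consensus, freqs):
        prob = p if base == cons else (1.0 - p) / 3.0
        score += math.log2(prob / 0.25)
    return score

print(round(log_odds_score("TATAAT", MINUS10), 2))
print(round(log_odds_score("TATGAT", MINUS10), 2))
```

Positive scores indicate a hexamer that looks more like the element than like background sequence; each mismatch lowers the score by an amount that depends on how strongly that position is conserved.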
Bidirectional (prokaryotic)
Promoters can be very closely located in the DNA. Such "closely spaced promoters" have been observed in the DNA of all life forms, from humans[9] to prokaryotes,[10] and are highly conserved.[11] They may therefore provide some (presently unknown) advantages. These pairs of promoters can be positioned in divergent, tandem, and convergent orientations. They can also be regulated by transcription factors and can differ in various features, such as the nucleotide distance between them and the strengths of the two promoters. The most important aspect of two closely spaced promoters is that they will most likely interfere with each other. Several studies have explored this using both analytical and stochastic models.[12][13][14] Other studies have measured gene expression from synthetic genes, or from one to a few genes, controlled by bidirectional promoters.[15]
More recently, one study measured most genes controlled by tandem promoters in E. coli.[16] That study measured, and then modeled, two main forms of interference. In the first, an RNAP bound to the downstream promoter blocks the movement of RNAPs elongating from the upstream promoter. In the second, the two promoters are so close that an RNAP sitting on one promoter blocks any other RNAP from reaching the other promoter. These events are possible because a bound RNAP occupies several nucleotides of DNA, including at transcription start sites. Similar events occur when the promoters are in divergent and convergent orientations, and which events are possible also depends on the distance between the promoters.
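The second form of interference (occlusion) can be pictured with a toy geometric model. This sketch assumes that a bound RNAP covers a fixed stretch of DNA around the TSS, here 55 bp upstream to 20 bp downstream; those footprint values, and the names `Promoter` and `occludes`, are illustrative assumptions rather than parameters from the cited study. It only handles tandem promoters on the same strand.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Promoter:
    """A promoter on the top strand, located by its transcription start site."""
    tss: int               # TSS coordinate in bp
    upstream: int = 55     # assumed RNAP footprint upstream of the TSS (bp)
    downstream: int = 20   # assumed RNAP footprint downstream of the TSS (bp)

    def footprint(self) -> tuple[int, int]:
        """Closed interval of DNA covered by an RNAP bound at this promoter."""
        return (self.tss - self.upstream, self.tss + self.downstream)

def occludes(a: Promoter, b: Promoter) -> bool:
    """True if RNAPs bound at a and b would need overlapping DNA, so that an
    RNAP sitting on one promoter prevents binding at the other."""
    a_lo, a_hi = a.footprint()
    b_lo, b_hi = b.footprint()
    return a_lo <= b_hi and b_lo <= a_hi

print(occludes(Promoter(100), Promoter(150)))  # TSSs 50 bp apart
print(occludes(Promoter(100), Promoter(300)))  # TSSs 200 bp apart
```

With these assumed footprints, tandem TSSs closer than about 75 bp occlude each other; a real model would also need binding and elongation kinetics to capture the roadblock effect of an RNAP parked on the downstream promoter.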
Eukaryotic
Gene promoters are typically located upstream of the gene and can have regulatory elements several kilobases away from the transcriptional start site (enhancers). In eukaryotes, the transcriptional complex can cause the DNA to bend back on itself, which allows for placement of regulatory sequences far from the actual site of transcription. Eukaryotic RNA-polymerase-II-dependent promoters can contain a
The TATA element and BRE are usually located close to the transcriptional start site (typically within 30 to 40 base pairs). Eukaryotic promoter regulatory sequences typically bind proteins called transcription factors that are involved in the formation of the transcriptional complex. An example is the
- Core promoter – the minimal portion of the promoter required to properly initiate transcription[18]
  - Includes the transcription start site (TSS) and elements directly upstream
  - A binding site for RNA polymerase
    - RNA polymerase I: transcribes genes encoding 18S, 5.8S and 28S ribosomal RNAs
    - RNA polymerase II: transcribes genes encoding messenger RNA and certain small nuclear RNAs and microRNAs
    - RNA polymerase III: transcribes genes encoding transfer RNA, 5S ribosomal RNA and other small RNAs
  - General transcription factor binding sites, e.g. TATA box, B recognition element
  - Many other elements/motifs may be present. There is no set of "universal elements" found in every core promoter.[23]
- Proximal promoter – the proximal sequence upstream of the gene that tends to contain primary regulatory elements
  - Approximately 250 base pairs upstream of the start site
  - Specific transcription factor binding sites
- Distal promoter – the distal sequence upstream of the gene that may contain additional regulatory elements, often with a weaker influence than the proximal promoter
  - Anything further upstream (but not an enhancer or other regulatory region whose influence is position- and orientation-independent)
  - Specific transcription factor binding sites
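The core/proximal/distal division above can be expressed as a lookup on position relative to the TSS. This is a minimal sketch with hypothetical boundaries: the core is assumed to span -40 to +40 and the proximal promoter to extend to about -250, matching the approximate figures in the list; real boundaries vary between promoters, and enhancers are not modeled at all.

```python
def promoter_region(pos: int, core_upstream: int = 40, core_downstream: int = 40,
                    proximal_upstream: int = 250) -> str:
    """Classify a position relative to the TSS (+1 = TSS, negative = upstream).

    Hypothetical default boundaries: core = -40..+40, proximal = -250..-41,
    distal = further upstream, 'downstream' = past the core on the gene side.
    """
    if pos == 0:
        # Conventional promoter numbering has no position 0.
        raise ValueError("there is no position 0: the TSS is +1, the base before it is -1")
    if -core_upstream <= pos <= core_downstream:
        return "core"
    if pos > core_downstream:
        return "downstream"
    if pos >= -proximal_upstream:
        return "proximal"
    return "distal"

for p in (-30, 1, -100, -1000):
    print(p, promoter_region(p))
```

Keeping the boundaries as parameters reflects the point in the list that no universal set of core elements or fixed extent applies to every promoter.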
Mammalian promoters
Up-regulated expression of genes in mammals is initiated when signals are transmitted to the promoters associated with the genes. Promoter DNA sequences may include different elements such as CpG islands (present in about 70% of promoters), a TATA box (present in about 24% of promoters), initiator (Inr) (present in about 49% of promoters), upstream and downstream TFIIB recognition elements (BREu and BREd) (present in about 22% of promoters), and downstream core promoter element (DPE) (present in about 12% of promoters).[24] The presence of multiple methylated CpG sites in CpG islands of promoters causes stable silencing of genes.[25] However, the presence or absence of the other elements has relatively small effects on gene expression in experiments.[26] Two of them, the TATA box and Inr, caused small but significant increases in expression (45% and 28%, respectively). The BREu and BREd elements significantly decreased expression, by 35% and 20% respectively, and the DPE element had no detected effect on expression.[26]
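Whether a promoter sequence contains a CpG island is usually judged from its GC content and its observed/expected CpG ratio. The sketch below uses the widely cited Gardiner-Garden and Frommer style criteria (length of at least 200 bp, GC content above 50%, observed/expected CpG above 0.6); these thresholds are a swapped-in convention, not something stated in this article, and the function names are hypothetical.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C * G) / length.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def looks_like_cpg_island(seq: str, min_len: int = 200, min_gc: float = 0.5,
                          min_obs_exp: float = 0.6) -> bool:
    """Apply length, GC-content and CpG observed/expected thresholds."""
    gc, oe = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and oe > min_obs_exp

print(cpg_stats("CG" * 100))
print(looks_like_cpg_island("AT" * 150))
```

In practice, genome-wide tools slide such a test along the sequence and merge qualifying windows; the single-window check here is only meant to show what the criteria measure.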
Enhancers are regions of the genome that are major gene-regulatory elements. Enhancers control cell-type-specific gene expression programs, most often by looping through long distances to come into physical proximity with the promoters of their target genes.[30] In a study of brain cortical neurons, 24,937 loops were found bringing enhancers to promoters.[27] Multiple enhancers, each often tens or hundreds of thousands of nucleotides distant from their target genes, loop to their target gene promoters and coordinate with each other to control expression of their common target gene.[30]
The schematic illustration in this section shows an enhancer looping around to come into close physical proximity with the promoter of a target gene. The loop is stabilized by a dimer of a connector protein (e.g. a dimer of CTCF or YY1), with one member of the dimer anchored to its binding motif on the enhancer and the other member anchored to its binding motif on the promoter (represented by the red zigzags in the illustration).[31] Several cell-function-specific transcription factors (there are about 1,600 transcription factors in a human cell[32]) generally bind to specific motifs on an enhancer,[33] and a small combination of these enhancer-bound transcription factors, when brought close to a promoter by a DNA loop, governs the level of transcription of the target gene. Mediator (coactivator), a complex usually consisting of about 26 proteins in an interacting structure, communicates regulatory signals from enhancer DNA-bound transcription factors directly to the RNA polymerase II (pol II) enzyme bound to the promoter.[34]
Enhancers, when active, are generally transcribed from both strands of DNA with RNA polymerases acting in two different directions, producing two eRNAs as illustrated in the Figure.[35] An inactive enhancer may be bound by an inactive transcription factor. Phosphorylation of the transcription factor may activate it and that activated transcription factor may then activate the enhancer to which it is bound (see small red star representing phosphorylation of transcription factor bound to enhancer in the illustration).[36] An activated enhancer begins transcription of its RNA before activating a promoter to initiate transcription of messenger RNA from its target gene.[37]
Bidirectional (mammalian)
Bidirectional promoters are short (<1 kbp) intergenic regions of
Bidirectionally paired genes in the
Some functional classes of genes are more likely to be bidirectionally paired than others. Genes implicated in DNA repair are five times more likely to be regulated by bidirectional promoters than by unidirectional promoters.
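Bidirectionally paired genes can be found in an annotation by looking for head-to-head genes on opposite strands whose start sites are separated by a short (<1 kbp) intergenic region. This is a minimal sketch under simplifying assumptions: each gene is reduced to a strand and a TSS coordinate on the reference strand, and `Gene` and `is_bidirectional_pair` are hypothetical names, not part of any annotation library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    strand: str  # "+" or "-"
    tss: int     # TSS coordinate on the reference (top) strand

def is_bidirectional_pair(a: Gene, b: Gene, max_gap: int = 1000) -> bool:
    """True for head-to-head genes on opposite strands with TSSs < max_gap bp apart.

    A minus-strand gene is transcribed toward lower coordinates, so for the two
    genes to point away from a shared intergenic region its TSS must lie to the
    left of the plus-strand gene's TSS.
    """
    if {a.strand, b.strand} != {"+", "-"}:
        return False
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    gap = plus.tss - minus.tss
    return 0 < gap < max_gap

print(is_bidirectional_pair(Gene("A", "-", 1000), Gene("B", "+", 1400)))
```

Real annotations have multiple TSSs per gene, so a practical pipeline would compare the closest pair of start sites rather than a single coordinate per gene.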
Certain sequence characteristics have been observed in bidirectional promoters, including a lack of
Although the term "bidirectional promoter" refers specifically to promoter regions of
Subgenomic
A subgenomic promoter is a promoter added to a virus for a specific heterologous gene, resulting in the formation of mRNA for that gene alone. Many positive-sense RNA viruses produce these subgenomic mRNAs (sgRNAs) as a common infection strategy, generally to transcribe late viral genes. Subgenomic promoters range from 24 nucleotides (Sindbis virus) to over 100 nucleotides (Beet necrotic yellow vein virus) and are usually found upstream of the transcription start.[47]
Detection
A wide variety of algorithms have been developed to facilitate detection of promoters in genomic sequence, and promoter prediction is a common element of many gene prediction methods. In bacteria, the promoter region contains the -35 and -10 elements upstream of the transcription start site. In general, the more closely these elements match the consensus sequences, the more often the gene is transcribed, although, as noted above, artificial promoters with fully conserved elements transcribe at lower frequencies than some promoters with a few mismatches. Promoter regions as a whole do not follow a set pattern in the way that consensus sequences do.
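As a toy illustration of consensus-based detection (not one of the published prediction algorithms), the sketch below scores every pair of hexamers against TTGACA and TATAAT, allowing spacers of 15 to 19 bp around the optimal 17 bp. The spacer range, the score threshold of 9 matches out of 12, and the function names are assumptions chosen for illustration; only the given strand is scanned.

```python
MINUS35 = "TTGACA"  # -35 consensus
MINUS10 = "TATAAT"  # -10 consensus

def matches(kmer: str, consensus: str) -> int:
    """Count positions where kmer agrees with the consensus."""
    return sum(a == b for a, b in zip(kmer, consensus))

def scan_promoters(dna: str, spacers=range(15, 20), min_score: int = 9):
    """Return (score, start_of_-35, spacer) tuples for candidate promoters, best first.

    score = matches to TTGACA + matches to TATAAT (maximum 12).
    spacers must be in increasing order; ties favor spacers closer to 17 bp.
    """
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - 5):        # every possible -35 hexamer start
        s35 = matches(dna[i:i + 6], MINUS35)
        for spacer in spacers:
            j = i + 6 + spacer           # start of the -10 hexamer
            if j + 6 > len(dna):
                break                    # longer spacers would also run off the end
            score = s35 + matches(dna[j:j + 6], MINUS10)
            if score >= min_score:
                hits.append((score, i, spacer))
    hits.sort(key=lambda h: (-h[0], abs(h[2] - 17), h[1]))
    return hits

demo = "GC" * 5 + "TTGACA" + "A" * 17 + "TATAAT" + "GC" * 5
print(scan_promoters(demo)[0])
```

Because most natural promoters match only 3 to 4 of the 6 bases in each element, a realistic scanner would use position weights and a lower threshold, at the cost of many more false positives.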
Evolutionary change
Changes in promoter sequences are critical in evolution as indicated by the relatively stable number of genes in many lineages. For instance, most vertebrates have roughly the same number of protein-coding genes (about 20,000) which are often highly conserved in sequence, hence much of evolutionary change must come from changes in gene expression.[6][17]
De novo origin of promoters
Given the short sequences of most promoter elements, promoters can rapidly evolve from random sequences. For instance, in E. coli, ~60% of random sequences can evolve expression levels comparable to the wild-type lac promoter with only one mutation, and ~10% of random sequences can serve as active promoters even without evolution.[48]
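Why short elements arise easily by chance can be checked with a binomial calculation. Assuming uniform, independent bases (a simplification real genomes violate), each position of a random hexamer matches a fixed consensus base with probability 1/4, so the chance of matching at least k of 6 positions follows a binomial distribution. This is an illustrative calculation, not the method of the cited study, and `p_at_least` is a hypothetical name.

```python
from fractions import Fraction
from math import comb

def p_at_least(k: int, n: int = 6, p_match: Fraction = Fraction(1, 4)) -> Fraction:
    """Exact probability that a random n-mer matches >= k positions of a fixed consensus."""
    return sum(comb(n, i) * p_match**i * (1 - p_match)**(n - i) for i in range(k, n + 1))

one_element = p_at_least(4)       # e.g. >= 4 of 6 matches to TATAAT
both_elements = one_element ** 2  # -35 and -10 both, at one fixed spacing
print(one_element, float(one_element))
print(both_elements, float(both_elements))
```

Under these assumptions a random position carries a -35/-10 pair with at least 4 of 6 matches in each element at a given spacing with probability 5929/4194304, roughly 0.0014, so a kilobase of random sequence is expected to contain about one such near-consensus pair, consistent with the observation that natural promoters typically match only 3 to 4 of the 6 consensus bases.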
Binding
Initiation of transcription is a multistep sequential process that involves several mechanisms: promoter location, initial reversible binding of RNA polymerase, conformational changes in RNA polymerase, conformational changes in DNA, binding of nucleoside triphosphate (NTP) to the functional RNA polymerase-promoter complex, and nonproductive and productive initiation of RNA synthesis.[49][2]
The promoter binding process is crucial to understanding gene expression. Tuning synthetic genetic systems relies on precisely engineered synthetic promoters with known transcription rates.[2]
Location
Although RNA polymerase
Diseases associated with aberrant function
Most diseases are heterogeneous in cause, meaning that one "disease" is often many different diseases at the molecular level, though symptoms exhibited and response to treatment may be identical. How diseases of different molecular origin respond to treatments is partially addressed in the discipline of pharmacogenomics.
Not listed here are the many kinds of cancers involving aberrant transcriptional regulation owing to creation of chimeric genes through pathological chromosomal translocation. Importantly, intervening in the number or structure of promoter-bound proteins is one key to treating a disease without affecting expression of unrelated genes that share elements with the target gene.[52] Undesirable changes in some genes can increase the potential of a cell to become cancerous.[53]
CpG islands in promoters
In humans, about 70% of promoters located near the transcription start site of a gene (proximal promoters) contain a
Distal promoters also frequently contain CpG islands, such as the promoter of the DNA repair gene
Methylation of CpG islands stably silences genes
In humans, DNA methylation occurs at the 5 position (C5) of the pyrimidine ring of the cytosine residues within
Promoter CpG hyper/hypo-methylation in cancer
Generally, in progression to cancer, hundreds of genes are silenced or activated. Although silencing of some genes in cancers occurs by mutation, a large proportion of carcinogenic gene silencing is a result of altered DNA methylation (see DNA methylation in cancer). DNA methylation causing silencing in cancer typically occurs at multiple CpG sites in the CpG islands that are present in the promoters of protein coding genes.
Altered expressions of microRNAs also silence or activate many genes in progression to cancer (see microRNAs in cancer). Altered microRNA expression occurs through hyper/hypo-methylation of CpG sites in CpG islands in promoters controlling transcription of the microRNAs.
Silencing of DNA repair genes through methylation of CpG islands in their promoters appears to be especially important in progression to cancer (see methylation of DNA repair genes in cancer).
Canonical sequences and wild-type
The usage of the term
In the case of a transcription factor binding site, there may be a single sequence that binds the protein most strongly under specified cellular conditions. This might be called canonical.
However, natural selection may favor less energetic binding as a way of regulating transcriptional output. In this case, we may call the most common sequence in a population the wild-type sequence. It may not even be the most advantageous sequence to have under prevailing conditions.
Recent evidence also indicates that several genes (including the
Synthetic promoter design and engineering
Promoters are important gene regulatory elements used in tuning synthetically designed genetic circuits and metabolic networks. For example, to overexpress an important gene in a network and so increase production of a target protein, synthetic biologists design promoters that upregulate its expression. Automated algorithms can be used to design neutral DNA or insulators that do not trigger gene expression of downstream sequences.[57][2]
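One simple way to approach the "neutral DNA" requirement is rejection sampling: generate random sequences and discard any that contain a near-consensus promoter element. The sketch below is a toy version of that idea, not the automated algorithms cited above; it assumes that avoiding hexamers with more than 3 matches to the bacterial -35 (TTGACA) and -10 (TATAAT) consensus on one strand is a sufficient proxy for "no promoter activity", which real designs would need to verify (including the reverse strand). All names are hypothetical.

```python
import random

MINUS35, MINUS10 = "TTGACA", "TATAAT"

def best_hexamer_match(seq: str, consensus: str) -> int:
    """Highest number of consensus positions matched by any hexamer in seq."""
    return max((sum(a == b for a, b in zip(seq[i:i + 6], consensus))
                for i in range(len(seq) - 5)), default=0)

def design_neutral_spacer(length: int, max_match: int = 3, seed: int = 0,
                          max_tries: int = 100_000) -> str:
    """Rejection-sample a random sequence with no near-consensus -35 or -10 hexamer.

    Toy filter only: it checks the top strand for two hexamer motifs and does
    not model real promoter activity.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    for _ in range(max_tries):
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        if (best_hexamer_match(seq, MINUS35) <= max_match
                and best_hexamer_match(seq, MINUS10) <= max_match):
            return seq
    raise RuntimeError("no acceptable sequence found; relax max_match or shorten length")

print(design_neutral_spacer(40))
```

Rejection sampling is easy to reason about but becomes slow for long sequences, since the chance that a random sequence avoids every near-consensus hexamer drops with length; optimization-based designers avoid this by editing a sequence incrementally.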
Diseases that may be associated with variations
Some cases of many genetic diseases are associated with variations in promoters or transcription factors.
Examples include:
Constitutive vs regulated
Some promoters are called constitutive as they are active in all circumstances in the cell, while others are regulated, becoming active in the cell only in response to specific stimuli.
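The difference between constitutive and regulated promoters is often pictured with simple rate models. This sketch assumes a Hill-function model for an activated promoter, a common modeling convention in synthetic biology that is swapped in here for illustration and not taken from this article; the parameter names (`k_max`, `K`, `n`, `k_basal`) are hypothetical.

```python
def constitutive_rate(signal: float, k: float) -> float:
    """A constitutive promoter fires at a constant rate k; the signal is ignored."""
    return k

def activated_rate(signal: float, k_max: float, K: float,
                   n: float = 2.0, k_basal: float = 0.0) -> float:
    """Hill-function model of a promoter that is switched on by a signal.

    The rate rises from k_basal toward k_basal + k_max; half of k_max is
    added when signal == K, and n sets how switch-like the response is.
    """
    if signal < 0 or K <= 0:
        raise ValueError("signal must be >= 0 and K must be > 0")
    fraction_on = signal**n / (K**n + signal**n)
    return k_basal + k_max * fraction_on

for s in (0.0, 5.0, 50.0):
    print(s, constitutive_rate(s, k=3.0), round(activated_rate(s, k_max=10.0, K=5.0), 2))
```

The constitutive rate is flat across signal levels, whereas the regulated rate stays low without the stimulus and approaches its maximum only when the signal is well above K.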
Tissue-specific promoters
Name | Tissue |
---|---|
POMC | melanocytes, keratinocytes and dermal microvascular endothelial cells |
B29 | B cells |
CD4 | a subset of T cells, natural-killer cells, monocytes and macrophages |
CD14 | Monocytic cells |
CD43 | Leukocytes and platelets |
CD45 | Hematopoietic cells |
CD68 | Macrophages |
Desmin | Muscle |
Elastase | Pancreatic acinar cells |
Endoglin | Endothelial |
Fibronectin | Differentiating cells, healing tissue |
Flt-1 | Endothelial |
GFAP | Astrocytes |
GPIIb | Megakaryocytes |
ICAM-2 | Endothelial |
IFN-beta | Hematopoietic |
Mb | Muscle |
Nphs1 | Podocytes |
OG-2 | Osteoblasts, Odontoblasts |
SP-B | Lung |
SYN1 | Neurons |
WASP | Hematopoietic |
Alb | Liver |
RU5' | Mature neurons |
Nos1 | kidney |
Thy-1 (Thymocyte differentiation antigen 1, CD90) | brain |
Slc6a4 | gastrointestinal tract, female tissues, and lung |
PDGF, PDGFRb, CX3CR1, TRPA1, Krt5, actb, aMHC, 1a1kin, Cck, zDC, cFos, Hand1, Rosa, Insl5, Cart, Sctr, Ins1, Nrsn1, Foxp3, Tph, Cnr1, Pzp, CD23, Cx40, Foxn1, Rspo3, Krt13, Pnoc, ChAT, MMTV, Myh6, Sftpc, Mlc2a, Atf3, Pirt, Dbh, Villin, Vav1, Sox2, Dat, Pdx1, Cal, Gfral, Cr, BAF53b, Cntn2, Nav1.8, ObRb, Advillin, Mrgprd, PV, Pax7, Calb1, Mx1, Nmu, Aldh1, CAG, CD19, Krt14, Vil1, Stra8, E8i, Pf4, UBC, Vip, TCF21, Htr3b, Mgarp, GFAP, vGlut2, Tac1.
Use of the term
When referring to a promoter some authors actually mean promoter +
See also
- Activator (genetics)
- Enhancer (genetics)
- Glossary of gene expression terms
- Operon
- Regulation of gene expression
- Repressor
- Transcription factor
- Promoter bashing
References
- Sharan R (4 January 2007). "Analysis of Biological Networks: Transcriptional Networks – Promoter Sequence Analysis" (PDF). Tel Aviv University. Retrieved 30 December 2012.
- PMID 36056029.
- PMID 35264797.
- PMID 24825771.
- PMID 16380379.
- PMID 23617842.
- PMID 8248780.
- PMID 10465790.
- PMID 12110178.
- PMID 3010319.
- S2CID 3546895.
- PMID 15670592.
- PMID 22370562.
- PMID 31822223.
- PMID 27346626.
- PMID 35100257.
- PMID 23020586.
- PMID 12651739.
- PMID 15572469.
- PMID 9420329.
- S2CID 4373712.
- PMID 26221185.
- PMID 19682982.
- PMID 17123746.
- PMID 11782440.
- PMID 30622120.
- PMID 32451484.
- PMID 33102493.
- S2CID 205485256.
- S2CID 152283312.
- PMID 29224777.
- PMID 29425488.
- PMID 29987030.
- PMID 25693131.
- PMID 29378788.
- PMID 12514134.
- PMID 32810208.
- PMID 14707170.
- PMID 17447839.
- S2CID 8556921.
- PMID 15944140.
- PMID 21689477.
- PMID 16707430.
- PMID 21601935.
- PMID 30332484.
- PMID 17568000.
- PMID 10846080.
- PMID 29670097.
- PMID 9620948.
- PMID 3308887.
- PMID 12732296.
- S2CID 205469320.
- S2CID 678541.
- PMID 16432200.
- PMID 21576262.
- PMID 19626585.
- S2CID 220506228.
- PMID 9847292.
- PMID 10471619.
- PMID 2018842.
- S2CID 4254507.
- Tissue-specific Promoters
- Maloy S. "Expression vectors". San Diego State University.
External links
- ORegAnno – Open Regulatory Annotation Database
- Identifying a Protein Binding Sites on DNA molecule YouTube tutorial video
- Pleiades Promoter Project – a research project with an aim to generate 160 fully characterized, human DNA promoters of less than 4 kb (MiniPromoters) to drive gene expression in defined brain regions of therapeutic interests.
- ENCODE threads Explorer RNA and chromatin modification patterns around promoters. Nature (journal)