TATA box

Source: Wikipedia, the free encyclopedia.
Figure 1. TATA box structural elements. The TATA box consensus sequence is TATAWAW, where W is either A or T.

In molecular biology, the TATA box is a sequence of DNA found in the core promoter region of genes in archaea and eukaryotes.[2] The bacterial homolog of the TATA box is called the Pribnow box, which has a shorter consensus sequence.

The TATA box is considered a non-coding DNA sequence (also known as a cis-regulatory element). It was termed the "TATA box" as it contains a consensus sequence characterized by repeating T and A base pairs.[3] How the term "box" originated is unclear. In the 1980s, while investigating nucleotide sequences in mouse genome loci, the Hogness box sequence was found and "boxed in" at the -31 position.[4] When consensus nucleotides and alternative ones were compared, homologous regions were "boxed" by the researchers.[4] This practice of boxing in sequences sheds light on the origin of the term "box".

The TATA box was first identified in 1978[1] as a component of eukaryotic promoters. Transcription is initiated at the TATA box in TATA-containing genes. The TATA box is the binding site of the TATA-binding protein (TBP) and other transcription factors in some eukaryotic genes. Gene transcription by RNA polymerase II depends on the regulation of the core promoter by long-range regulatory elements such as enhancers and silencers.[5] Without proper regulation of transcription, eukaryotic organisms would not be able to properly respond to their environment.

Based on the sequence and mechanism of TATA box initiation, mutations such as insertions, deletions, and point mutations to this consensus sequence can result in phenotypic changes. These phenotypic changes can then turn into a disease phenotype. Some diseases associated with mutations in the TATA box include gastric cancer, spinocerebellar ataxia, Huntington's disease, blindness, β-thalassemia, immunosuppression, Gilbert's syndrome, and HIV-1. The TATA-binding protein (TBP) could also be targeted by viruses as a means of viral transcription.[6]

History

Discovery

The TATA box was the first eukaryotic core promoter motif to be identified. It was discovered in 1978 by American biochemist David Hogness[1] while he and his graduate student, Michael Goldberg, were on sabbatical at the University of Basel in Switzerland.[7] They first discovered the TATA sequence while analyzing 5' DNA promoter sequences in Drosophila, mammalian, and viral genes.[8][2] The TATA box was found in protein-coding genes transcribed by RNA polymerase II.[2]

Evolutionary history

Most research on the TATA box has been conducted on yeast, human, and Drosophila genomes; however, similar elements have been found in archaea and ancient eukaryotes.[2] In archaeal species, the promoter contains an 8 bp AT-rich sequence located ~24 bp upstream of the transcription start site. This sequence was originally called Box A and is now known to be the sequence that interacts with the archaeal homologue of the TATA-binding protein (TBP). Although some studies have uncovered several similarities between archaeal and eukaryotic TBP, others have detected notable differences. The archaeal protein exhibits greater symmetry in its primary sequence and in the distribution of electrostatic charge, which is important because the higher symmetry lowers the protein's ability to bind the TATA box in a polar manner.[2]

Even though the TATA box is present in many eukaryotic promoters, most promoters do not contain it. One study found that fewer than 30% of 1031 potential promoter regions in humans contain a putative TATA box motif.

TFIID), initiating transcription in TATA-less promoters. The DPE has been identified in three Drosophila TATA-less promoters and in the TATA-less human IRF-1 promoter.[10]

Features

Location

Promoter sequences vary between

contractile apparatus in cells.[5]

The type of core promoter affects the level of transcription and expression of a

TFIID.[11] When promoters use the SAGA/TATA box complex to recruit RNA polymerase II, they are more highly regulated and display higher expression levels than promoters using the TFIID/TBP mode of recruitment.[11]

Analogous sequences

In bacteria, promoter regions may contain a Pribnow box, which serves an analogous purpose to the eukaryotic TATA box. The Pribnow box has a 6 bp region centered around the -10 position and an 8-12 bp sequence around the -35 region that are both conserved.[10]

A CAAT box (also CAT box) is a region of nucleotides with the following consensus sequence: 5’ GGCCAATCT 3’. The CAAT box is located about 75-80 bases upstream of the transcription initiation site and about 150 bases upstream of the TATA box. It binds transcription factors (CAAT TF or CTFs) and thereby stabilizes the nearby preinitiation complex for easier binding of RNA polymerases. CAAT boxes are rarely found in genes that express proteins ubiquitous in all cell types.[10]

Structure

Sequence and prevalence

Figure 2. Mechanism for transcription initiation at the TATA box. Transcription factors, TATA binding protein (TBP), and RNA polymerase II are all recruited to begin transcription.

The TATA box is a component of the eukaryotic core promoter and generally contains the consensus sequence 5'-TATA(A/T)A(A/T)-3'.[3] In yeast, for example, one study found that various Saccharomyces genomes had the consensus sequence 5'-TATA(A/T)A(A/T)(A/G)-3', yet only about 20% of yeast genes even contained the TATA sequence.[12] Similarly, in humans only 24% of genes have promoter regions containing the TATA box.[13] Genes containing the TATA-box tend to be involved in stress-responses and certain types of metabolism and are more highly regulated when compared to TATA-less genes.[12][14] Generally, TATA-containing genes are not involved in essential cellular functions such as cell growth, DNA replication, transcription, and translation because of their highly regulated nature.[14]

The TATA box is usually located 25-35 base pairs upstream of the transcription start site. Genes containing the TATA box usually require additional promoter elements, including an initiator site located just upstream of the transcription start site and a downstream core element (DCE).[3] These additional promoter regions work in conjunction with the TATA box to regulate initiation of transcription in eukaryotes.
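The consensus sequence and typical position described above can be expressed as a simple motif scan. The following is a minimal sketch, not a validated promoter-prediction tool: the function name `find_tata_boxes`, the 0-based `tss_index` argument, the strict 25-35 bp window, and the toy promoter sequence are all assumptions made for illustration, and real promoters often deviate from the consensus.

```python
import re

# 5'-TATA(A/T)A(A/T)-3' written as a regular expression.
# The lookahead lets overlapping candidate motifs be reported as well.
TATA_CONSENSUS = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_tata_boxes(sequence, tss_index, min_upstream=25, max_upstream=35):
    """Return (start, motif, distance) for consensus matches upstream of the TSS.

    `tss_index` is the 0-based index of the transcription start site.
    `distance` counts bases from the first base of the motif to the TSS.
    """
    sequence = sequence.upper()
    hits = []
    for match in TATA_CONSENSUS.finditer(sequence):
        start = match.start(1)
        distance = tss_index - start
        if min_upstream <= distance <= max_upstream:
            hits.append((start, match.group(1), distance))
    return hits


# Hypothetical toy promoter: 10 bp of filler, a TATA box, a 24 bp spacer,
# then a short transcribed region whose first base (index 41) is the TSS.
toy_promoter = "GC" * 5 + "TATAAAA" + "GGC" * 8 + "ATGGCC"
print(find_tata_boxes(toy_promoter, tss_index=41))
```

A scan like this only flags candidate motifs; whether a candidate actually functions as a TATA box depends on the other promoter elements and factors discussed in this article.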

Function

Role in transcription initiation

The TATA-box is the site of

minor groove[15] of the TATA box via a region of antiparallel β sheets in the protein.[16] Three types of molecular interactions contribute to TBP binding to the TATA box:

  1. Four phenylalanine residues (Phe57, Phe74, Phe148, Phe165) on TBP bind to DNA and form kinks in the DNA, forcing the DNA minor groove open.[16][17][18]
  2. Four
    minor groove.[16]
  3. Numerous hydrophobic interactions (~15) form between TBP residues (notably Ile152 and Leu163) and DNA bases, including van der Waals forces.[16][17][18]

Additionally, binding of TBP is facilitated by stabilizing interactions with DNA flanking the TATA box, which consists of GC-rich sequences.[19] These secondary interactions induce bending of the DNA and helical unwinding.[20] The degree of DNA bending is species- and sequence-dependent. For example, one study used the adenovirus TATA promoter sequence (5'-CGCTATAAAAGGGC-3') as a model binding sequence and found that human TBP binding to the TATA box induced a 97° bend toward the major groove, while the yeast TBP protein induced only an 82° bend.[21] X-ray crystallography studies of TBP/TATA-box complexes generally agree that the DNA goes through an ~80° bend during the process of TBP binding.[16][17][18]

The conformational changes induced by

TFIIH.[24] This completes the assembly of the preinitiation complex for eukaryotic transcription.[3] Generally, the TATA box is found at RNA polymerase II promoter regions, although some in vitro studies have demonstrated that RNA polymerase III can recognize TATA sequences.[25]

This cluster of RNA polymerase II and various transcription factors is known as the basal transcriptional complex (BTC). In this state, it only gives a low level of transcription. Other factors must stimulate the BTC to increase transcription levels.[2] One such example of a BTC stimulating region of DNA is the CAAT box. Additional factors, including the Mediator complex, transcriptional regulatory proteins, and nucleosome-modifying enzymes also enhance transcription in vivo.[3]

Interactions

In specific cell types or on specific promoters TBP can be replaced by one of several TBP-related factors (TRF1 in

metazoans, TBPL2/TRF3 in vertebrates), some of which interact with the TATA box similarly to TBP.[26] Interaction of TATA boxes with a variety of activators or repressors can influence the transcription of genes in many ways[citation needed]. Enhancers are long-range regulatory elements that increase promoter activity, while silencers repress promoter activity.

Mutations

Figure 3. Effects on TBP binding to the TATA box from mutations. Wildtype shows transcription done normally. An insertion or deletion shifts the TATA box recognition site which results in a shifted transcription site.[27] Point mutations risk the TBP being unable to bind for initiation.[28]

Mutations to the TATA box can range from a deletion or insertion to a point mutation with varying effects based on the gene that has been mutated. The mutations change the binding of the TATA-binding protein (TBP) for transcription initiation. Thus, there is a resulting change in phenotype based on the gene that is not being expressed (Figure 3).

Insertions or deletions

One of the first studies of TATA box mutations looked at a DNA sequence from Agrobacterium tumefaciens encoding the octopine-type cytokinin gene.[27] This gene has three TATA boxes. A phenotype change was observed only when all three TATA boxes were deleted. An insertion of extra base pairs between the last TATA box and the transcription start site shifted the start site, resulting in a phenotypic change. This original mutation study shows that transcription changes when there is no TATA box to promote it, whereas transcription of a gene still occurs after an insertion into the sequence, although the insertion may affect the nature of the resulting phenotype.
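The start-site shift caused by an insertion can be illustrated with a deliberately simplified model. The sketch below assumes that transcription begins a fixed number of bases downstream of the first base of the TATA box; that fixed spacing (30 bp), the function name `predicted_start`, and the toy sequences are hypothetical and are not taken from the cited study.

```python
SPACING = 30  # assumed fixed distance (bp) from the TATA box to the start site


def predicted_start(sequence, box="TATAAAA", spacing=SPACING):
    """Return the 0-based index where this toy model places the start site.

    Returns None when no TATA box is present or the sequence is too short.
    """
    box_index = sequence.find(box)
    if box_index == -1:
        return None
    start = box_index + spacing
    return start if start < len(sequence) else None


# Toy wild-type promoter: the model's start site falls on the first 'A'
# of the transcribed region "ACTGACTGAC".
wild_type = "GCGC" + "TATAAAA" + "G" * 23 + "ACTGACTGAC"

# Insert 5 bp between the TATA box and the original start site.
insertion = wild_type[:20] + "CCCCC" + wild_type[20:]

# Delete the TATA box entirely.
deletion = wild_type.replace("TATAAAA", "")

for name, seq in (("wild type", wild_type), ("insertion", insertion), ("deletion", deletion)):
    start = predicted_start(seq)
    base = seq[start] if start is not None else None
    print(name, start, base)
```

In this model the insertion leaves the predicted start at the same distance from the TATA box, so it now falls 5 bp upstream of the original first transcribed base. The deletion case returns no start at all, which is a simplification: in the study above, a phenotype change appeared only when all three TATA boxes were removed.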

duplication of the TATA box leads to a significant decrease in enzymatic activity in the scutellum and roots, leaving pollen enzymatic levels unaffected. A deletion of the TATA box leads to a small decrease in enzymatic activity in the scutellum and roots, but a large decrease in enzymatic levels in pollen.[29]

Point mutations

Point mutations to the TATA box likewise produce varying phenotypic changes depending on the gene affected. Studies also show that the position of the mutation within the TATA box sequence affects how strongly TBP binding is hindered.[28] For example, a mutation from TATAAAA to CATAAAA may or may not hinder binding enough to change transcription, because the neighboring sequences can affect whether there is a change.[30] However, in HeLa cells a TATAAAA to TATACAA mutation leads to a 20-fold decrease in transcription.[31] Diseases that can result from insufficient transcription of specific genes include thalassemia,[32] lung cancer,[33] chronic hemolytic anemia,[34] immunosuppression,[35] hemophilia B Leyden,[36] and thrombophlebitis and myocardial infarction.[37]
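A simple way to reason about point mutations such as those above is to record which position changed and whether the variant still matches the 5'-TATA(A/T)A(A/T)-3' consensus. This is an illustrative sketch only: the helper names are hypothetical, and matching or failing the consensus does not by itself predict TBP affinity or transcription levels, which also depend on flanking sequence.

```python
import re

# 5'-TATA(A/T)A(A/T)-3' as a regular expression.
CONSENSUS = re.compile(r"TATA[AT]A[AT]")


def mutated_positions(wild_type, variant):
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(wild_type) != len(variant):
        raise ValueError("point-mutation comparison needs equal-length sequences")
    return [i + 1 for i, (a, b) in enumerate(zip(wild_type, variant)) if a != b]


def matches_consensus(box):
    """Return True if the 7-mer matches the consensus exactly."""
    return CONSENSUS.fullmatch(box) is not None


wild_type = "TATAAAA"
for variant in ("CATAAAA", "TATACAA"):
    print(variant, mutated_positions(wild_type, variant), matches_consensus(variant))
```

Both example variants fall outside the consensus, yet the text reports different transcriptional consequences for them, which is why a consensus check is only a first-pass filter.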

Savinkova et al. have written a simulation to predict the KD value (the equilibrium dissociation constant) for a selected TATA box sequence and TBP.[38] This can be used to directly predict the phenotypic traits resulting from a selected mutation based on how tightly TBP binds to the TATA box.

Diseases

Mutations in the TATA box region affect the binding of the TATA-binding protein (TBP) for transcription initiation, which may cause carriers to have a disease phenotype.

Gastric cancer is correlated with TATA box polymorphism.[39] The TATA box has a binding site for the transcription factor of the PG2 gene. This gene produces PG2 serum, which is used as a biomarker for tumours in gastric cancer. Longer TATA box sequences correlate with higher levels of PG2 serum, indicating gastric cancer conditions. Carriers with shorter TATA box sequences may produce lower levels of PG2 serum.

Several neurodegenerative disorders are associated with TATA box mutations.[40] Two disorders have been highlighted: spinocerebellar ataxia and Huntington's disease. In spinocerebellar ataxia, the disease phenotype is caused by expansion of the polyglutamine repeat in the TATA-binding protein (TBP). Polyglutamine-expanded TBP accumulates, as shown by protein aggregates in brain sections of patients, resulting in a loss of neuronal cells.

Blindness can be caused by excessive cataract formation when the TATA box is targeted by microRNAs to increase the level of oxidative stress genes.[41] MicroRNAs can target the 3'-untranslated region and bind to the TATA box to activate the transcription of oxidative stress related genes.

SNPs in TATA boxes are associated with β-thalassemia, immunosuppression, and neurological disorders.[42] SNPs destabilize the TBP/TATA complex, which significantly decreases the rate at which the TATA-binding protein (TBP) binds to the TATA box. This leads to lower levels of transcription, affecting the severity of the disease. So far, studies have demonstrated this interaction only in vitro, but the results may be comparable in vivo.

Gilbert's syndrome is correlated with UGT1A1 TATA box polymorphism.[43] This poses a risk for developing jaundice in newborns.

MicroRNAs also play a role in the replication of viruses such as HIV-1.[44] Novel HIV-1-encoded microRNAs have been found to enhance production of the virus and to activate HIV-1 latency by targeting the TATA box region.

Clinical significance

Technology

Many of the studies so far have been performed

ACTB gene involved in TATA-binding.[5]

Cancer therapy

topoisomerase II).[45] Cisplatin is a compound that binds covalently to adjacent guanines in the major groove of DNA, which distorts the DNA to allow DNA-binding proteins access to the minor groove.[45] This will destabilize the interaction between the TATA-binding protein (TBP) and the TATA box. The result is to immobilize TBP on DNA in order to down-regulate transcription initiation.

Genetic engineering

TATA box modification

Evolutionary changes have pushed

TFIID activity and subsequently transcription initiation, resulting in a more iron-efficient phenotype.[46]

See also

References