Epigenomics
Epigenomics is the study of the complete set of epigenetic modifications on the genetic material of a cell, known as the epigenome.
Introduction to epigenetics
The mechanisms governing phenotypic plasticity, or the capacity of a cell to change its state in response to stimuli, have long been the subject of research (Phenotypic plasticity 1). The traditional central dogma of biology states that the DNA of a cell is transcribed to RNA, which is translated to proteins, which perform cellular processes and functions.[10] A paradox exists, however: cells exhibit diverse responses to varying stimuli, and cells sharing identical DNA, such as those within a multicellular organism, can have a variety of distinct functions and phenotypes.[11] Classical views attributed phenotypic variation to differences in primary DNA structure, whether through aberrant mutation or an inherited sequence allele.[12] However, while this explains some aspects of variation, it does not explain how tightly coordinated and regulated cellular responses, such as differentiation, are carried out.
A more likely source of cellular plasticity is the regulation of gene expression: two cells with nearly identical DNA can differ because certain genes are differentially expressed.
The finding that DNA methylation and histone modifications are stable, heritable, and also reversible processes that influence gene expression without altering DNA primary structure provided a mechanism for the observed variability in cell gene expression.[12] These modifications were termed epigenetic, from epi- (“on top of”) the genetic material, DNA (Epigenetics 1). The mechanisms governing epigenetic modifications are complex, but with the advent of high-throughput sequencing technology they are now becoming better understood.[12]
Epigenetics
Modifications of the genome that alter gene expression, that cannot be attributed to changes in the primary DNA sequence, and that are heritable mitotically and meiotically are classified as epigenetic modifications. DNA methylation and histone modification are among the best characterized epigenetic processes.[3]
DNA methylation
The first epigenetic modification to be characterized in depth was DNA methylation. As its name implies, DNA methylation is the process by which a methyl group is added to DNA. The enzymes responsible for catalyzing this reaction are the DNA methyltransferases (DNMTs). While DNA methylation is stable and heritable, it can be reversed by an antagonistic group of enzymes known as DNA demethylases. In eukaryotes, methylation is most commonly found at the carbon 5 position of cytosine residues (5mC) that are followed by guanine, termed CpG dinucleotides.[9][14]
DNA methylation patterns vary greatly between species and even within the same organism. Methylation is used quite differently among animals: vertebrates exhibit the highest levels of 5mC, and invertebrates more moderate levels. Some organisms, like Caenorhabditis elegans, have not been shown to have either 5mC or a conventional DNA methyltransferase, which suggests that mechanisms other than DNA methylation are also involved.[11]
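As a minimal illustration of where CpG dinucleotides occur, the Python sketch below scans one DNA strand for cytosines immediately followed by guanine. The function name `cpg_sites` is hypothetical, and the sketch considers sequence context only; it does not model which sites are actually methylated.

```python
def cpg_sites(seq: str) -> list[int]:
    """Return 0-based positions of the C in each CpG dinucleotide (5'-CG-3')
    on the given strand."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


# CpG is palindromic: a CpG whose C is at position i on the top strand pairs
# with a CpG on the bottom strand whose C lies opposite position i + 1.
print(cpg_sites("ACGTTCGA"))  # [1, 5]
```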
Within an organism, DNA methylation levels can also vary throughout development and by region. For example, in mouse primordial
The mechanism by which DNA methylation represses gene expression is a multi-step process. Methylated and unmethylated cytosine residues are distinguished by specific DNA-binding proteins. Binding of these proteins recruits histone deacetylase (HDAC) enzymes, which initiate chromatin remodeling so that the DNA becomes less accessible to transcriptional machinery, such as RNA polymerase, effectively repressing gene expression.[15]
Histone modification
In eukaryotes, genomic DNA is coiled into protein-DNA complexes called chromatin. Histones, the most prevalent type of protein in chromatin, function to condense the DNA; the net positive charge on histones facilitates their binding to DNA, which is negatively charged. The basic repeating units of chromatin, nucleosomes, consist of an octamer of histone proteins (two copies each of H2A, H2B, H3 and H4) with a 146 bp length of DNA wrapped around it. Nucleosomes and the linker DNA connecting them form a chromatin fiber 10 nm in diameter, which can be further condensed.[16][17]
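The repeating structure described above lends itself to simple arithmetic. The sketch below computes how many nucleosomes could fit in a stretch of DNA if each core occupies the 146 bp stated above and consecutive cores are separated by linker DNA. It assumes a hypothetical fixed linker length; real linker lengths vary between species and cell types, and the function name is illustrative only.

```python
CORE_BP = 146  # DNA wrapped around one histone octamer (figure from the text)


def max_nucleosomes(region_bp: int, linker_bp: int) -> int:
    """Largest n such that n * CORE_BP + (n - 1) * linker_bp <= region_bp."""
    if region_bp < CORE_BP:
        return 0
    # Rearranging the inequality gives n <= (region_bp + linker_bp) / (CORE_BP + linker_bp).
    return (region_bp + linker_bp) // (CORE_BP + linker_bp)


# With an assumed 50 bp linker, each repeat unit is 196 bp.
print(max_nucleosomes(2000, 50))  # 10
```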
Chromatin packaging of DNA varies depending on the cell cycle stage and by local DNA region.[18] The degree to which chromatin is condensed is associated with a certain transcriptional state. Unpackaged or loose chromatin is more transcriptionally active than tightly packaged chromatin because it is more accessible to transcriptional machinery. By remodeling chromatin structure and changing the density of DNA packaging, gene expression can thus be modulated.[17]
Chromatin remodeling occurs via post-translational modifications of the N-terminal tails of core histone proteins.
Histone modifications regulate gene expression by two mechanisms: by disrupting the contact between nucleosomes and by recruiting chromatin remodeling ATPases. An example of the first mechanism is the acetylation of lysine residues in the histone tails, which is catalyzed by histone acetyltransferases (HATs). HATs are part of a multiprotein complex that is recruited to chromatin when activators bind to DNA binding sites. Acetylation neutralizes the positive charge on lysine, which helps stabilize chromatin through its affinity for negatively charged DNA. Acetylated histones therefore favor the dissociation of nucleosomes, allowing chromatin to unwind. In this loose chromatin state, DNA is more accessible to transcriptional machinery and expression is activated. The process can be reversed by histone deacetylases, which remove the acetyl groups.[17][19]
The second process involves the recruitment of chromatin remodeling complexes by the binding of activator molecules to corresponding enhancer regions. The nucleosome remodeling complexes reposition nucleosomes by several mechanisms, enabling or disabling accessibility of transcriptional machinery to DNA. The SWI/SNF protein complex in yeast is one example of a chromatin remodeling complex that regulates the expression of many genes through chromatin remodeling.[17][20]
Relation to other genomic fields
Epigenomics shares many commonalities with other genomics fields, in both methodology and abstract purpose. Epigenomics seeks to identify and characterize epigenetic modifications on a global level, similar to the study of the complete set of DNA in genomics or the complete set of proteins in a cell in proteomics.[1][2] The rationale for performing epigenetic analysis on a global level is that it allows inferences about epigenetic modifications that might not be possible through the analysis of specific loci.[16][1] As in the other genomics fields, epigenomics relies heavily on bioinformatics, which combines the disciplines of biology, mathematics and computer science.[21] However, although epigenetic modifications had been known and studied for decades, it is advances in bioinformatics technology that have made analyses on a global scale possible. Many current techniques still draw on older methods, often adapting them to genomic assays, as described in the next section.
Methods
Histone modification assays
The cellular processes of
It is now known that regions sensitive to DNase I correspond to regions of chromatin with loose DNA-histone association. Hypersensitive sites most often represent promoter regions, where DNA must be accessible for DNA-binding transcriptional machinery to function.[23]
ChIP-Chip and ChIP-Seq
Histone modification was first detected on a genome-wide level through the coupling of chromatin immunoprecipitation (ChIP) with DNA microarrays, termed ChIP-chip.
ChIP-chip was used extensively to characterize the global histone modification patterns of yeast. These studies allowed inferences about the function of histone modifications: transcriptional activation or repression was associated with certain histone modifications and with particular regions. While this method was effective, providing nearly full coverage of the yeast epigenome, its use in larger genomes such as the human genome is limited.[16][17]
In order to study histone modifications on a truly genome-wide level, other high-throughput methods were coupled with chromatin immunoprecipitation, namely:
DNA methylation assays
Techniques for characterizing primary DNA sequences could not be directly applied to methylation assays. For example, when DNA was amplified by PCR or bacterial cloning, the methylation pattern was not copied and the information was lost. The DNA hybridization technique used in DNA assays, in which radioactive probes were used to map and identify DNA sequences, could not distinguish between methylated and non-methylated DNA.[26][9]
Restriction endonuclease based methods
Non genome-wide approaches
The earliest methylation detection assays used methylation-sensitive restriction endonucleases. Genomic DNA was digested with both methylation-sensitive and methylation-insensitive restriction enzymes recognizing the same restriction site. The idea was that wherever the site was methylated, only the methylation-insensitive enzyme could cleave at that position. By comparing the restriction fragment sizes generated by the methylation-sensitive enzyme to those of the methylation-insensitive enzyme, it was possible to determine the methylation pattern of the region. This analysis step was done by amplifying the restriction fragments via PCR, separating them by gel electrophoresis and analyzing them by Southern blot with probes for the region of interest.[26][9]
This technique was used to compare the DNA methylation patterns in the human adult and fetal hemoglobin gene loci. Different regions of the locus (the gamma-, delta- and beta-globin genes) were known to be expressed at different stages of development.[27] Consistent with a role of DNA methylation in gene repression, regions associated with high levels of DNA methylation were not actively expressed.[28]
This method was not suitable for studies on the global methylation pattern, or ‘methylome’. Even within specific loci it was not fully representative of the true methylation pattern, as only those restriction sites with corresponding methylation-sensitive and methylation-insensitive restriction enzymes could provide useful information. Further complications could arise when incomplete digestion of DNA by restriction enzymes generated false negative results.[9]
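The comparison of methylation-sensitive and methylation-insensitive digests can be sketched in a few lines of Python. As an assumption for illustration, the sketch uses the isoschizomer pair HpaII and MspI, which both recognize CCGG and cut between the two cytosines; HpaII is blocked when the internal CpG cytosine is methylated, while MspI is not. The function names and the example sequence are hypothetical, only one strand is modelled, and digestion is assumed to be complete.

```python
SITE = "CCGG"  # site recognized by both HpaII and MspI; both cut C^CGG


def site_starts(seq: str) -> list[int]:
    """0-based start positions of every CCGG site on one strand."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(SITE) + 1) if seq.startswith(SITE, i)]


def digest(seq: str, methylated_c: set[int], methylation_sensitive: bool) -> list[int]:
    """Return cut coordinates, assuming complete digestion.

    A cut at coordinate k splits the strand into seq[:k] and seq[k:].
    """
    cuts = []
    for start in site_starts(seq):
        cpg_c = start + 1  # the C of the internal CpG in C[CG]G
        if methylation_sensitive and cpg_c in methylated_c:
            continue  # HpaII-like enzyme cannot cut a methylated site
        cuts.append(start + 1)  # cut between the two cytosines: C^CGG
    return cuts


def fragment_lengths(seq: str, cuts: list[int]) -> list[int]:
    bounds = [0, *cuts, len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


seq = "AAAACCGGAAAACCGGAAAA"
methylated = {13}  # hypothetical: only the second site's CpG is methylated
hpaii = digest(seq, methylated, methylation_sensitive=True)
mspi = digest(seq, methylated, methylation_sensitive=False)
print(fragment_lengths(seq, hpaii))  # [5, 15]
print(fragment_lengths(seq, mspi))   # [5, 8, 7]
# Sites cut by MspI but not by HpaII are inferred to be methylated.
print(sorted(set(mspi) - set(hpaii)))  # [13]
```

In the historical assay the fragment sizes were read from gels and Southern blots rather than computed; the sketch only shows the logic of the comparison.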
Genome wide approaches
DNA methylation profiling on a large scale was first made possible through the Restriction Landmark Genome Scanning (RLGS) technique. Like the locus-specific DNA methylation assay, the technique identified methylated DNA through its digestion by methylation-sensitive enzymes. However, it was the use of two-dimensional gel electrophoresis that allowed methylation to be characterized on a broader scale.[9]
However, it was not until the advent of microarray and next-generation sequencing technology that truly high-resolution, genome-wide DNA methylation profiling became possible.[12] As with RLGS, the endonuclease component is retained, but it is coupled to new technologies. One such approach is differential methylation hybridization (DMH), in which one set of genomic DNA is digested with methylation-sensitive restriction enzymes and a parallel set is not digested. Both sets of DNA are subsequently amplified, each labelled with a different fluorescent dye, and used in two-colour array hybridization. The level of DNA methylation at a given locus is determined by the relative intensity ratio of the two dyes. Adapting next-generation sequencing to DNA methylation assays provides several advantages over array hybridization: sequence-based technology provides higher resolution of allele-specific DNA methylation, can be performed on larger genomes, and does not require the creation of DNA microarrays, which must be adjusted for CpG density to function properly.[9]
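The ratio-based readout of DMH can be sketched as follows. The sketch assumes that intensities from the two channels have already been normalized, uses a hypothetical pseudocount to avoid dividing by zero, and applies an arbitrary illustrative threshold. It reflects the logic described above (fragments protected from methylation-sensitive digestion keep their signal in the digested channel), not any specific published pipeline.

```python
import math


def dmh_log_ratios(digested, undigested, pseudocount=1.0):
    """log2(digested / undigested) per probe. Values near 0 mean the fragment
    was protected from methylation-sensitive digestion, as expected when
    its sites are methylated."""
    return [math.log2((d + pseudocount) / (u + pseudocount))
            for d, u in zip(digested, undigested)]


def call_methylation(log_ratios, threshold=-1.0):
    """Label probes using a hypothetical cutoff on the log2 ratio."""
    return ["methylated" if r > threshold else "unmethylated" for r in log_ratios]


ratios = dmh_log_ratios(digested=[1023.0, 63.0], undigested=[1023.0, 1023.0])
print(ratios)                    # [0.0, -4.0]
print(call_methylation(ratios))  # ['methylated', 'unmethylated']
```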
Bisulfite sequencing
Bisulfite sequencing relies on the chemical conversion of unmethylated cytosines only, such that they can be identified through standard DNA sequencing techniques. Sodium bisulfite and alkaline treatment converts unmethylated cytosine residues into uracil, which is read as thymine after amplification, while leaving methylated cytosine unaltered. Subsequent amplification and sequencing of untreated and bisulfite-treated DNA allows methylated sites to be identified: a cytosine that still reads as cytosine after treatment was methylated, whereas one that reads as thymine was not. Bisulfite sequencing, like the traditional restriction-based methods, was historically limited to the methylation patterns of specific gene loci, until whole-genome sequencing technologies became available. However, unlike traditional restriction-based methods, bisulfite sequencing provided resolution at the nucleotide level.[26][9]
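The conversion and the comparison with untreated sequence can be sketched as below. This is a simplification assuming complete conversion, a single strand, and a read already aligned to the untreated reference without insertions or deletions; the function names are hypothetical.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Simulate complete conversion followed by amplification: unmethylated C
    becomes U and is read as T; methylated C (5mC) stays C."""
    return "".join("T" if base == "C" and i not in methylated else base
                   for i, base in enumerate(seq.upper()))


def call_methylation(reference: str, read: str) -> dict[int, bool]:
    """For every C in the untreated reference: C in the read -> methylated,
    T in the read -> unmethylated. Other mismatches are left uncalled."""
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference.upper(), read.upper())):
        if ref == "C" and obs in ("C", "T"):
            calls[i] = obs == "C"
    return calls


reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated={1, 5})
print(read)                               # ACGTTCGA
print(call_methylation(reference, read))  # {1: True, 4: False, 5: True}
```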
Limitations of the bisulfite technique include the incomplete conversion of cytosine to uracil, which is a source of false positives. Further, bisulfite treatment also causes DNA degradation and requires an additional purification step to remove the sodium bisulfite.[9]
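Because incomplete conversion leaves some unmethylated cytosines reading as C, the non-conversion rate is often estimated from cytosines assumed to be unmethylated, for example in spike-in DNA known to lack methylation. The sketch below computes that rate from aligned reads; which positions count as unmethylated controls is an assumption supplied by the caller, and the function name is hypothetical.

```python
def non_conversion_rate(reference: str, reads: list[str], control_positions: list[int]) -> float:
    """Fraction of reference C's at assumed-unmethylated control positions that
    still read as C after treatment. This approximates how often an
    unmethylated cytosine would be miscalled as methylated."""
    unconverted = converted = 0
    for read in reads:
        for pos in control_positions:
            if reference[pos].upper() != "C" or pos >= len(read):
                continue
            base = read[pos].upper()
            if base == "C":
                unconverted += 1
            elif base == "T":
                converted += 1
    total = unconverted + converted
    return unconverted / total if total else float("nan")


print(non_conversion_rate("CCCC", ["TTTT", "TTTC"], [0, 1, 2, 3]))  # 0.125
```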
Next-generation sequencing is well suited to complementing bisulfite sequencing in genome-wide methylation analysis. While this now allows methylation patterns to be determined at the highest possible resolution, the single-nucleotide level, challenges remain in the assembly step because of the reduced sequence complexity of bisulfite-treated DNA, in which every unmethylated cytosine reads as thymine. Increases in read length help address this challenge, allowing whole-genome shotgun bisulfite sequencing (WGBS) to be performed. The WGBS approach, using an Illumina Genome Analyzer platform, has already been implemented in Arabidopsis thaliana.[9] Reduced representation genomic methods based on bisulfite sequencing exist as well,[29][30] and they are particularly suitable for species with large genome sizes.[31]
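The loss of sequence complexity can be quantified with a simple measure. The sketch below, assuming fully unmethylated DNA so that every C is read as T, compares the Shannon entropy of a sequence's base composition before and after conversion, and the number of distinct k-mers possible with four versus three letters. It illustrates why alignment and assembly become harder; it is not an alignment method, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Entropy in bits per base of the base composition of seq."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def convert_unmethylated(seq: str) -> str:
    """Assume no methylation: every C reads as T after bisulfite treatment."""
    return seq.upper().replace("C", "T")


seq = "ACGTACGTACGT"
print(shannon_entropy(seq))                        # 2.0
print(shannon_entropy(convert_unmethylated(seq)))  # 1.5
# Possible distinct 12-mers: 4**12 before conversion versus 3**12 after.
print(4 ** 12, 3 ** 12)  # 16777216 531441
```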
Chromatin accessibility assays
Chromatin accessibility is a measure of how "accessible" or "open" a region of the genome is to transcription or to the binding of transcription factors. Regions that are inaccessible (for example, because they are bound by nucleosomes) are not actively transcribed by the cell, while open and accessible regions are actively transcribed.[32] Changes in chromatin accessibility are important epigenetic regulatory processes that govern cell- or context-specific gene expression.[33] Assays such as MNase-seq, DNase-seq, ATAC-seq or FAIRE-seq are routinely used to characterize the accessible chromatin landscape of cells. The main feature of all these methods is that they selectively isolate either the DNA sequences that are bound to histones or those that are not. These sequences are then aligned to a reference genome to identify their positions.[34]
MNase-seq and DNase-seq follow the same principle: they employ nucleases to cut the DNA strands that are not bound by nucleosomes or other protein factors, while the bound pieces are protected and can be retrieved and analysed. Since active, unbound regions are destroyed, their detection can only be indirect, by sequencing with a
FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements) begins by crosslinking the DNA with nucleosomes, then shearing the DNA by sonication. The free and crosslinked fragments are separated by a traditional phenol-chloroform extraction: the protein-bound fraction is retained at the interphase, while the unbound DNA moves to the aqueous phase and can be analysed by various methods.[39] Sonication produces random breaks and is therefore not subject to the sequence bias of enzymatic digestion; the larger fragment length (200-700 nt) makes this technique suitable for wider regions, although it cannot resolve single nucleosomes.[34] Unlike the nuclease-based methods, FAIRE-seq allows direct identification of transcriptionally active sites and requires less laborious sample preparation.[40]
ATAC-seq is based on the activity of Tn5 transposase. The transposase is used to insert tags into the genome, with higher frequency in regions not covered by protein factors. The tags are then used as adapters for PCR or other analytical tools.[41]
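Because Tn5 inserts more often into unprotected DNA, a crude accessibility signal is simply the number of insertion sites per genomic window. The toy sketch below bins insertion coordinates and reports windows above a hypothetical count threshold. Window size and threshold are arbitrary assumptions, and practical analyses use dedicated peak callers with statistical background models rather than a fixed cutoff.

```python
from collections import Counter


def accessible_windows(insertions: list[int], window: int = 100,
                       min_count: int = 3) -> list[tuple[int, int]]:
    """Return [start, end) windows containing at least min_count Tn5 insertions."""
    counts = Counter(pos // window for pos in insertions)
    return sorted((w * window, (w + 1) * window)
                  for w, c in counts.items() if c >= min_count)


insertions = [5, 20, 50, 130, 410, 420, 430, 499]  # hypothetical insertion coordinates
print(accessible_windows(insertions))  # [(0, 100), (400, 500)]
```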
Direct detection
Polymerase sensitivity in single-molecule real-time sequencing made it possible for scientists to directly detect epigenetic marks such as methylation as the polymerase moves along the DNA molecule being sequenced.[42] Several projects have demonstrated the ability to collect genome-wide epigenetic data in bacteria.[43][44][45][46]
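In single-molecule real-time sequencing, a modified base is typically detected as a change in polymerase kinetics, commonly summarized as the interpulse duration (IPD) between incorporation events. A minimal sketch, assuming per-position IPD measurements from native DNA and from an unmodified control (for example amplified DNA, which does not retain methylation), compares the two and flags positions whose IPD ratio reaches a hypothetical threshold. The function names and values are illustrative; real tools also model sequence context and statistical significance.

```python
from statistics import mean


def ipd_ratios(native: dict[int, list[float]],
               control: dict[int, list[float]]) -> dict[int, float]:
    """Mean native IPD divided by mean control IPD at each position measured in both."""
    return {pos: mean(native[pos]) / mean(control[pos])
            for pos in native
            if native[pos] and control.get(pos)}


def flag_modified(ratios: dict[int, float], threshold: float = 2.0) -> list[int]:
    """Positions whose IPD ratio reaches the (hypothetical) threshold."""
    return sorted(pos for pos, r in ratios.items() if r >= threshold)


native = {10: [2.0, 3.0, 4.0], 11: [1.0, 1.2]}
control = {10: [1.0, 1.0, 1.0], 11: [1.0, 1.0]}
ratios = ipd_ratios(native, control)
print(flag_modified(ratios))  # [10]
```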
Nanopore sequencing can detect base modifications (e.g. methylation) through the changes they cause in the electrolytic current signal. A
Single-molecule real-time sequencing (SMRT) is a single-molecule DNA sequencing method. Single-molecule real-time sequencing utilizes a zero-mode waveguide (ZMW). A single DNA polymerase enzyme is bound to the bottom of a ZMW with a single molecule of DNA as a template. Each of the four DNA bases is attached to one of four different
In 2010 a team of scientists demonstrated the use of single-molecule real-time sequencing for direct detection of modified nucleotides in the DNA template, including
In 2017, another team proposed combining bisulfite conversion with third-generation single-molecule real-time sequencing. The method, called single-molecule real-time bisulfite sequencing (SMRT-BS), is an accurate targeted CpG methylation analysis method capable of a high degree of multiplexing and long read lengths (1.5 kb) without the need for PCR amplicon sub-cloning.[51]
Theoretical modeling approaches
The first mathematical models of different nucleosome states affecting gene expression were introduced in the 1980s [ref]. Later, this idea was almost forgotten, until experimental evidence indicated a possible role of covalent histone modifications as an epigenetic code.[52] In the following years, high-throughput data uncovered the abundance of epigenetic modifications and their relation to chromatin function, which motivated new theoretical models for the establishment, maintenance and change of these patterns.[53][54] These models are usually formulated in the framework of one-dimensional lattice approaches.[55]
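To give a flavour of such one-dimensional lattice approaches, the sketch below simulates a toy chain of nucleosomes, each in one of three states (labelled A, U and M here as a hypothetical simplification of activating, unmodified and silencing marks). At each step a random site either undergoes a recruited conversion, moving one step toward the state of a nearby modified site, or a random conversion. All parameter names and values are assumptions chosen for illustration; this is not a reimplementation of the models cited above.

```python
import random

STATES = ("A", "U", "M")  # toy labels ordered along the chain A <-> U <-> M


def step_toward(state: str, target: str) -> str:
    """Move one step along A <-> U <-> M toward target (no change if equal)."""
    i, j = STATES.index(state), STATES.index(target)
    if i < j:
        return STATES[i + 1]
    if i > j:
        return STATES[i - 1]
    return state


def simulate(n_sites: int = 60, steps: int = 20000, recruit_prob: float = 0.8,
             reach: int = 3, seed: int = 0, init: str = "U") -> list[str]:
    rng = random.Random(seed)
    lattice = [init] * n_sites
    for _ in range(steps):
        i = rng.randrange(n_sites)
        if rng.random() < recruit_prob:
            # Recruited conversion: a modified site within `reach` pulls site i
            # one step toward its own state.
            j = rng.randrange(max(0, i - reach), min(n_sites, i + reach + 1))
            if j != i and lattice[j] != "U":
                lattice[i] = step_toward(lattice[i], lattice[j])
        else:
            # Random (noisy) conversion one step toward A or M.
            lattice[i] = step_toward(lattice[i], rng.choice(("A", "M")))
    return lattice


print("".join(simulate(seed=1)))
```

Varying `recruit_prob` in such a toy shows the qualitative idea behind these models: strong local recruitment tends to produce extended domains of a single state, while noise alone does not.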
See also
- Epigenetics
- Epigenetic clock
- Genomics
- Human Epigenome Project
- Epigenomics AG
- Single cell epigenomics
Notes
- ^ a b c Russell 2010, p. 217.
- ^ a b Russell 2010, p. 230.
- ^ a b Russell 2010, p. 475.
- S2CID 10911203.
- PMID 27364681.
- ^ "The Potential Epigenetic and Anticancer Power of Dietary Flavones". 2016-10-11.
- ^ PMID 23333102.
- ^ Russell 2010, p. 597.
- ^ S2CID 6780101.
- S2CID 4164029.
- ^ PMID 11782440.
- ^ S2CID 7641577.
- ^ Russell 2010, pp. 518–9.
- ^ Russell 2010, pp. 531–2.
- ^ Russell 2010, pp. 532–3.
- ^ S2CID 6326093.
- ^ S2CID 11691263.
- ^ Russell 2010, pp. 24–7.
- ^ a b Russell 2010, pp. 529–30.
- ^ Russell 2010, p. 530.
- ^ Russell 2010, p. 218.
- PMID 3052270.
- ^ Russell 2010, p. 529.
- ^ a b Gibson & Muse 2009, pp. 229–32.
- ^ Russell 2010, p. 532.
- ^ PMID 10734209.
- ^ Russell 2010, pp. 552–3.
- S2CID 54324289.
- PMID 26818626.
- S2CID 10457615.
- PMID 30121954.
- PMID 25693563.
- PMID 23503198.
- ^ PMID 21241889.
- PMID 1148185.
- PMID 29051481.
- PMID 22955616.
- PMID 24317252.
- PMID 25473421.
- PMID 17179217.
- PMID 24097267.
- PMID 23093720.
- PMID 23434113.
- PMID 23300489.
- PMID 23034806.
- PMID 23138224.
- S2CID 16152628.
- PMID 24167255.
- PMID 29431738.
- PMID 20453866.
- PMID 28986786.
- S2CID 4418993.
- PMID 17991991.
- S2CID 16091877.
- S2CID 103345.
References
- Gibson G, Muse SV (2009). A Primer of Genome Science (3rd ed.). Sunderland: Sinauer Associates. ISBN 978-0-87893-236-8.
- Russell PJ (2010). iGenetics: A Molecular Approach (3rd ed.). San Francisco: Pearson Benjamin Cummings. ISBN 978-0-321-56976-9.
Further reading
- Bodnar JW, Bradley MK (November 1996). "A chromatin switch". Journal of Theoretical Biology. 183 (1): 1–7. PMID 8959107.
- Cartwright IL (October 1987). "Developmental switch in chromatin structure associated with alternate promoter usage in the Drosophila melanogaster alcohol dehydrogenase gene". The EMBO Journal. 6 (10): 3097–101. PMID 3121305.
- "2 Chromatin patterns at transcription factor binding sites". Nature: 1. 2019.