Single-cell sequencing
Single-cell sequencing examines the nucleic acid
Background
A typical human cell consists of about 2 x 3.3 billion base pairs of DNA and 600 million mRNA bases. Usually, a mix of millions of cells is used in sequencing the DNA or RNA using traditional methods like
Recent technical improvements make single-cell sequencing a promising tool for approaching a set of seemingly inaccessible problems. For example, heterogeneous samples, rare cell types, cell lineage relationships, mosaicism of somatic tissues, analyses of microbes that cannot be cultured, and disease evolution can all be elucidated through single-cell sequencing.[5] Single-cell sequencing was selected as the method of the year 2013 by Nature Publishing Group.[6]
Genome (DNA) sequencing
Single-cell DNA genome sequencing involves isolating a single cell, amplifying the whole genome or region of interest, constructing sequencing libraries, and then applying next-generation DNA sequencing (for example
Methods
A list of more than 100 different single-cell omics methods has been published.[12]
Another common method is MALBAC.[15] As done in MDA, this method begins with isothermal amplification, but the primers are flanked with a “common” sequence for downstream PCR amplification. As the preliminary amplicons are generated, the common sequence promotes self-ligation and the formation of “loops” to prevent further amplification. In contrast with MDA, the highly branched DNA network is not formed. Instead, the loops are denatured in another temperature cycle, allowing the fragments to be amplified with PCR. MALBAC has also been implemented in a microfluidic device, but the amplification performance was not significantly improved by encapsulation in nanoliter droplets.[16]
Comparing MDA and MALBAC, MDA results in better genome coverage, but MALBAC provides more even coverage across the genome. MDA could be more effective for identifying SNPs, whereas MALBAC is preferred for detecting copy number variants. While performing MDA with a microfluidic device markedly reduces bias and contamination, the chemistry involved in MALBAC does not demonstrate the same potential for improved efficiency.
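The distinction between genome coverage and evenness of coverage can be made concrete with two simple statistics. The following is a minimal sketch, not a method taken from the cited studies: it assumes the genome has been divided into bins with a read depth per bin, and the two depth profiles are invented toy values chosen only to mimic the qualitative contrast described above.

```python
from statistics import mean, pstdev

def coverage_breadth(depths):
    """Fraction of genomic bins covered by at least one read."""
    return sum(1 for d in depths if d > 0) / len(depths)

def coverage_cv(depths):
    """Coefficient of variation of depth over covered bins (lower = more even)."""
    covered = [d for d in depths if d > 0]
    return pstdev(covered) / mean(covered)

# Hypothetical per-bin read depths, invented purely for illustration.
mda_like = [1, 40, 2, 35, 1, 60, 3, 50, 2, 6]          # broad but uneven
malbac_like = [0, 0, 18, 22, 20, 0, 19, 21, 20, 0]     # narrower but even

for name, depths in (("MDA-like", mda_like), ("MALBAC-like", malbac_like)):
    print(name, coverage_breadth(depths), round(coverage_cv(depths), 3))
```

In this toy comparison the MDA-like profile covers every bin but with widely varying depth, while the MALBAC-like profile misses some bins but is nearly uniform where it is covered. Even depth is what matters for copy number estimates, which rely on depth being proportional to copy number.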
A method particularly suitable for the discovery of genomic structural variation is single-cell DNA template strand sequencing (a.k.a. Strand-seq).[17] Using the principle of single-cell tri-channel processing, which uses joint modelling of read-orientation, read-depth, and haplotype-phase, Strand-seq enables discovery of the full spectrum of somatic structural variation classes ≥200kb in size. Strand-seq overcomes limitations of whole genome amplification based methods for identification of somatic genetic variation classes in single cells,[18] because it is not susceptible to read chimeras that lead to calling artefacts (discussed in detail in the section below), and is less affected by dropouts. The choice of method depends on the goal of the sequencing because each method presents different advantages.[7]
Limitations
MDA of individual cell genomes results in highly uneven genome coverage, i.e. relative overrepresentation and underrepresentation of various regions of the template, leading to loss of some sequences. There are two components to this process: a) stochastic over- and under-amplification of random regions; and b) systematic bias against high %GC regions. The stochastic component may be addressed by pooling single-cell MDA reactions from the same cell type, by employing
Strand-seq overcomes limitations of methods based on whole genome amplification for genetic variant calling: since Strand-seq does not require reads (or read pairs) traversing the boundaries (or breakpoints) of CNVs or copy-balanced structural variant classes, it is less susceptible to common artefacts of single-cell methods based on whole genome amplification, which include variant calling dropouts due to missing reads at the variant breakpoint and read chimeras.[7][18] Strand-seq discovers the full spectrum of structural variation classes of at least 200kb in size, including breakage-fusion-bridge cycles and chromothripsis events, as well as balanced inversions, and copy-number balanced or imbalanced translocations.[18] Structural variant calls made by Strand-seq are resolved by chromosome-length haplotype, which provides additional variant calling specificity.[18] As a current limitation, Strand-seq requires dividing cells for strand-specific labelling using bromodeoxyuridine (BrdU), and the method does not detect variants smaller than 200kb in size, such as mobile element insertions.
Applications
Microbiomes are among the main targets of single-cell genomics due to the difficulty of culturing the majority of microorganisms in most environments. Single-cell genomics is a powerful way to obtain microbial genome sequences without cultivation. This approach has been widely applied to marine, soil, subsurface, organismal, and other types of microbiomes in order to address a wide array of questions related to microbial ecology, evolution, public health and biotechnology potential.[20][21][22][23][24][25][26][27][28]
Cancer sequencing is also an emerging application of scDNAseq. Fresh or frozen tumors may be analyzed and categorized with respect to SCNAs, SNVs, and rearrangements quite well using whole-genome DNAS approaches.[29] Cancer scDNAseq is particularly useful for examining the depth of complexity and compound mutations present in amplified therapeutic targets such as receptor tyrosine kinase genes (EGFR, PDGFRA etc.) where conventional population-level approaches of the bulk tumor are not able to resolve the co-occurrence patterns of these mutations within single cells of the tumor. Such overlap may provide redundancy of pathway activation and tumor cell resistance.
DNA methylome sequencing
Single-cell DNA methylome sequencing quantifies DNA methylation. There are several known types of methylation that occur in nature, including 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC), 6-methyladenine (6mA), and 4-methylcytosine (4mC). In eukaryotes, especially animals, 5mC is widespread along the genome and plays an important role in regulating gene expression by repressing transposable elements.[31] Sequencing 5mC in individual cells can reveal how epigenetic changes across genetically identical cells from a single tissue or population give rise to cells with different phenotypes.
Methods
Bisulfite sequencing has become the gold standard in detecting and sequencing 5mC in single cells.[32] Treatment of DNA with bisulfite converts cytosine residues to uracil, but leaves 5-methylcytosine residues unaffected. Therefore, the only cytosines retained in bisulfite-treated DNA are the methylated ones. To obtain the methylome readout, the bisulfite-treated sequence is aligned to an unmodified genome. Whole genome bisulfite sequencing was achieved in single cells in 2014.[33] The method overcomes the loss of DNA associated with the typical procedure, where sequencing adapters are added prior to bisulfite fragmentation. Instead, the adapters are added after the DNA is treated and fragmented with bisulfite, allowing all fragments to be amplified by PCR.[34] Using deep sequencing, this method captures ~40% of the total CpGs in each cell. With existing technology, DNA cannot be amplified prior to bisulfite treatment, as the 5mC marks will not be copied by the polymerase.
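The logic of the bisulfite readout can be illustrated in silico. The sketch below is a simplified model rather than a real analysis pipeline: it assumes complete conversion, a read from the top strand only, an ungapped alignment to the reference, and no sequencing errors. Uracil is written as T because that is how it is read after amplification and sequencing. The function names and the example sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Model bisulfite treatment: unmethylated C reads as T, 5mC stays C."""
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            out.append("T")   # C -> U, sequenced as T
        else:
            out.append(base)  # 5mC (and all other bases) unchanged
    return "".join(out)

def call_methylation(reference, converted_read):
    """Compare a converted read to the unmodified reference at each reference C.

    Returns {position: True if methylated, False if unmethylated}.
    """
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference, converted_read)):
        if ref == "C":
            if obs == "C":
                calls[i] = True
            elif obs == "T":
                calls[i] = False
    return calls

reference = "ACGTTCGACCGA"   # toy sequence; cytosines at positions 1, 5, 8, 9
methylated = {1, 9}          # assumed methylation state, for illustration

read = bisulfite_convert(reference, methylated)
print(read)                               # ACGTTTGATCGA
print(call_methylation(reference, read))  # {1: True, 5: False, 8: False, 9: True}
```

Real data additionally require handling both strands, incomplete conversion, and alignment against converted versions of the genome, all of which this sketch omits.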
Single-cell reduced representation bisulfite sequencing (scRRBS) is another method.[35] This method leverages the tendency of methylated cytosines to cluster at CpG islands (CGIs) to enrich for areas of the genome with a high CpG content. This reduces the cost of sequencing compared to whole-genome bisulfite sequencing, but limits the coverage of this method. When RRBS is applied to bulk samples, the majority of the CpG sites in gene promoters are detected, but sites in gene promoters only account for 10% of CpG sites in the entire genome.[36] In single cells, 40% of the CpG sites from the bulk sample are detected. To increase coverage, this method can also be applied to a small pool of single cells. In a sample of 20 pooled single cells, 63% of the CpG sites from the bulk sample were detected. Pooling single cells is one strategy to increase methylome coverage, but at the cost of obscuring the heterogeneity in the population of cells.
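The trade-off in pooling can be sketched with sets of detected CpG sites. This is an illustrative toy model, not the analysis used in the cited work: each cell is represented by the set of bulk CpG sites it happened to capture, and the site identifiers and cell contents below are invented.

```python
def pooled_coverage(cells, bulk_sites):
    """Fraction of bulk CpG sites detected in at least one cell of the pool."""
    detected = set().union(*cells) & bulk_sites
    return len(detected) / len(bulk_sites)

# Hypothetical data: 10 CpG sites seen in bulk, three sparsely covered cells.
bulk_sites = set(range(10))
cell_a = {0, 1, 2, 3}
cell_b = {2, 3, 4, 5}
cell_c = {0, 5, 6}

print(pooled_coverage([cell_a], bulk_sites))                  # 0.4
print(pooled_coverage([cell_a, cell_b, cell_c], bulk_sites))  # 0.7
```

The union covers more sites than any single cell, but once the sets are merged it is no longer possible to tell which cell contributed which methylation call, which is the loss of heterogeneity noted above.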
Limitations
While bisulfite sequencing remains the most widely used approach for 5mC detection, the chemical treatment is harsh and fragments and degrades the DNA. This effect is exacerbated when moving from bulk samples to single cells. Other methods to detect DNA methylation include methylation-sensitive restriction enzymes. Restriction enzymes also enable the detection of other types of methylation, such as 6mA with DpnI.[37] Nanopore-based sequencing also offers a route for direct methylation sequencing without fragmentation or modification to the original DNA. Nanopore sequencing has been used to sequence the methylomes of bacteria, which are dominated by 6mA and 4mC (as opposed to 5mC in eukaryotes), but this technique has not yet been scaled down to single cells.[38]
Applications
Single-cell DNA methylation sequencing has been widely used to explore epigenetic differences in genetically similar cells. To validate these methods during their development, the single-cell methylome data of a mixed population were successfully classified by hierarchical clustering to identify distinct cell types.[35] Another application is studying single cells during the first few cell divisions in early development to understand how different cell types emerge from a single embryo.[39] Single-cell whole-genome bisulfite sequencing has also been used to study rare but highly active cell types in cancer such as circulating tumor cells (CTCs).[40]
Transposase-accessible chromatin sequencing (scATAC-seq)
Single-cell transposase-accessible chromatin sequencing maps chromatin accessibility across the genome. A transposase inserts sequencing adapters directly into open regions of chromatin, allowing those regions to be amplified and sequenced.[41]
Methods
The two methods for library preparation in scATAC-Seq are based on split-pool cellular indexing and microfluidics.
Transcriptome sequencing (scRNA-seq)
Standard methods such as
Single-cell RNA sequencing (scRNA-seq) provides the expression profiles of individual cells and is considered the gold standard for defining cell states and phenotypes as of 2020.[44] Although it is impossible to obtain complete information on every RNA expressed by each cell, due to the small amount of material available, gene expression patterns can be identified through gene clustering analyses.[45] This can uncover rare cell types within a cell population that may never have been seen before. For example, one group of scientists performing scRNA-seq on neuroblastoma tumor tissue identified a rare pan-neuroblastoma cancer cell, which may be attractive for novel therapy approaches.[46]
Methods
Current scRNA-seq protocols involve isolating single cells and their RNA, and then following the same steps as bulk RNA-seq:
Challenges for scRNA-Seq include preserving the initial relative abundance of mRNA in a cell and identifying rare transcripts.[49] The reverse transcription step is critical, as the efficiency of the RT reaction determines how much of the cell's RNA population will eventually be analyzed by the sequencer. The processivity of reverse transcriptases and the priming strategies used may affect full-length cDNA production and generate libraries biased toward the 3' or 5' end of genes.
In the amplification step, either PCR or in vitro transcription (IVT) is currently used to amplify cDNA. One of the advantages of PCR-based methods is the ability to generate full-length cDNA. However, differences in PCR efficiency between particular sequences (for instance, due to GC content and snapback structure) may also be exponentially amplified, producing libraries with uneven coverage. On the other hand, while libraries generated by IVT can avoid PCR-induced sequence bias, specific sequences may be transcribed inefficiently, thus causing sequence drop-out or generating incomplete sequences.[1][42] Several scRNA-seq protocols have been published: Tang et al.,[50] STRT,[51] SMART-seq,[52] SORT-seq,[53] CEL-seq,[54] RAGE-seq,[55] Quartz-seq,[56] and C1-CAGE.[57] These protocols differ in terms of strategies for reverse transcription, cDNA synthesis and amplification, and the possibility to accommodate sequence-specific barcodes (i.e., UMIs) or the ability to process pooled samples.[58]
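Because each UMI tags one original molecule, reads that share a cell, a gene, and a UMI can be collapsed into a single count, which removes much of the amplification bias described above. The following is a minimal sketch of that idea under simplifying assumptions: reads are already assigned to a cell barcode and a gene, and UMIs are compared by exact match only. Production tools typically also correct sequencing errors in UMIs, which is omitted here. The function name and input layout are hypothetical.

```python
from collections import defaultdict

def count_umis(reads):
    """Count distinct UMIs per (cell barcode, gene).

    reads: iterable of (cell_barcode, umi, gene) tuples, one per sequenced read.
    """
    seen = defaultdict(set)
    for cell, umi, gene in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}

# Toy reads: the first three are PCR duplicates of one molecule.
reads = [
    ("CELL1", "AACG", "GAPDH"),
    ("CELL1", "AACG", "GAPDH"),
    ("CELL1", "AACG", "GAPDH"),
    ("CELL1", "TTGC", "GAPDH"),
    ("CELL2", "AACG", "GAPDH"),
    ("CELL1", "GGAT", "ACTB"),
]

print(count_umis(reads))
```

Here four reads for GAPDH in CELL1 collapse to two molecules, so a transcript that happened to amplify efficiently is not over-counted.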
In 2017, two approaches, REAP-seq[59] and CITE-seq,[60] were introduced to simultaneously measure single-cell mRNA and protein expression through oligonucleotide-labeled antibodies. Collecting cellular contents following electrophysiological recording using patch-clamp has also allowed development of the Patch-Seq method, which is steadily gaining ground in neuroscience.[61]
Example of a droplet based platform - 10X method
This single-cell RNA sequencing platform allows transcriptomes to be analyzed on a cell-by-cell basis by using microfluidic partitioning to capture single cells and prepare
Overall, in a first stage individual cells are captured separately and lysed, then
The first step of the method is single-cell encapsulation and library preparation. Cells are encapsulated into Gel Beads-in-emulsion (GEMs) by an automated instrument. To form these vesicles, the instrument uses a
The final step of the platform is sequencing. The libraries generated can be used directly for single-cell whole transcriptome sequencing or targeted sequencing workflows. Sequencing is performed using the Illumina dye sequencing method, which is based on the sequencing by synthesis (SBS) principle and uses reversible dye-terminators to identify each single nucleotide. In order to read the transcript sequences on one end, and the barcode and UMI on the other end, paired-end sequencing reads are required.[67]
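The paired-end layout means that one read of each pair carries only the cell barcode and UMI. A minimal sketch of splitting that read is shown below. The lengths used (a 16-nucleotide barcode followed by a 12-nucleotide UMI) are an assumption that depends on the chemistry version and should be checked against the kit documentation; the function name and the example sequence are hypothetical.

```python
def split_barcode_read(read, barcode_len=16, umi_len=12):
    """Split the barcode-bearing read into (cell_barcode, umi).

    The default lengths are assumptions; they vary between chemistry versions.
    """
    if len(read) < barcode_len + umi_len:
        raise ValueError("read is shorter than barcode + UMI")
    barcode = read[:barcode_len]
    umi = read[barcode_len:barcode_len + umi_len]
    return barcode, umi

# Invented example: 16 nt barcode + 12 nt UMI + trailing poly(T) bases.
example_read = "AAACCCAAGAAACACT" + "GGTTCCAAGGTT" + "TTTTTT"
barcode, umi = split_barcode_read(example_read)
print(barcode, umi)
```

The barcode assigns the paired transcript read to a cell, and the UMI identifies the original molecule within that cell.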
The droplet-based platform allows the detection of rare cell types thanks to its high throughput: 500 to 10,000 cells are captured per sample from a single-cell suspension. The protocol is easy to perform and allows a high cell recovery rate of up to 65%. The overall workflow of the droplet-based platform takes 8 hours and so is faster than the microwell-based method (BD Rhapsody), which takes 10 hours. However, it has some limitations, such as the need for fresh samples and the final detection of only 10% of mRNA.
The major difference between the droplet-based method and the microwell-based method is the technique used for partitioning cells.[64]
Limitations
Most RNA-seq methods depend on poly(A) tail capture to enrich mRNA and deplete abundant and uninformative rRNA. Thus, they are often restricted to sequencing polyadenylated mRNA molecules. However, recent studies are now starting to appreciate the importance of non-poly(A) RNA, such as long non-coding RNA and microRNAs, in gene expression regulation. Small-seq is a single-cell method that captures small RNAs (<300 nucleotides) such as microRNAs, fragments of tRNAs and small nucleolar RNAs in mammalian cells.[68] This method uses a combination of “oligonucleotide masks” (that inhibit the capture of highly abundant 5.8S rRNA molecules) and size selection to exclude large RNA species such as other highly abundant rRNA molecules. To target larger non-poly(A) RNAs, such as long non-coding RNA, histone mRNA, circular RNA, and enhancer RNA, size selection is not applicable for depleting the highly abundant ribosomal RNA molecules (18S and 28S rRNA).[69] Single-cell RamDA-Seq is a method that achieves this by performing reverse transcription with random priming (random displacement amplification) in the presence of “not so random” (NSR) primers specifically designed to avoid priming on rRNA molecules.[70] While this method successfully captures full-length total RNA transcripts for sequencing and detects a variety of non-poly(A) RNAs with high sensitivity, it has some limitations. The NSR primers were carefully designed according to rRNA sequences in the specific organism (mouse), and designing new primer sets for other species would take considerable effort. Recently, a CRISPR-based method named scDASH (single-cell depletion of abundant sequences by hybridization) demonstrated another approach to depleting rRNA sequences from single-cell total RNA-seq libraries.[71]
Bacteria and other prokaryotes are currently not amenable to single-cell RNA-seq due to the lack of polyadenylated mRNA. Thus, the development of single-cell RNA-seq methods that do not depend on poly(A) tail capture will also be instrumental in enabling single-cell resolution microbiome studies. Bulk bacterial studies typically apply general rRNA depletion to overcome the lack of polyadenylated mRNA in bacteria, but at the single-cell level, the amount of total RNA found in one cell is too small.[69] The lack of polyadenylated mRNA and the scarcity of total RNA in single bacterial cells are two important barriers limiting the deployment of scRNA-seq in bacteria.
Applications
scRNA-Seq is becoming widely used across biological disciplines including
Using
Some scRNA-seq methods have also been applied to single-cell microorganisms. SMART-seq2 has been used to analyze single-cell eukaryotic microbes, but since it relies on poly(A) tail capture, it has not been applied in prokaryotic cells.[84] Microfluidic approaches such as Drop-seq and the Fluidigm IFC-C1 devices have been used to sequence single malaria parasites or single yeast cells.[85][86] The single-cell yeast study sought to characterize the heterogeneous stress tolerance in isogenic yeast cells before and after the yeast are exposed to salt stress. Single-cell analysis of several transcription factors by scRNA-seq revealed heterogeneity across the population. These results suggest that regulation varies among members of a population to increase the chances of survival for a fraction of the population.
The first single-cell transcriptome analysis in a prokaryotic species was accomplished using the terminator exonuclease enzyme to selectively degrade rRNA and rolling circle amplification (RCA) of mRNA.[87] In this method, the ends of single-stranded DNA were ligated together to form a circle, and the resulting loop was then used as a template for linear RNA amplification. The final product library was then analyzed by microarray, with low bias and good coverage. However, RCA has not been tested with RNA-seq, which typically employs next-generation sequencing. Single-cell RNA-seq for bacteria would be highly useful for studying microbiomes. It would address issues encountered in conventional bulk metatranscriptomics approaches, such as failing to capture species present in low abundance, and failing to resolve heterogeneity among cell populations.
scRNA-Seq has provided considerable insight into the development of embryos and organisms, including the worm
A molecular cell atlas of mouse testes was established to define BDE47-induced prepubertal testicular toxicity using the scRNA-seq approach, providing novel insight into the underlying mechanisms and pathways involved in BDE47-associated testicular injury at single-cell resolution.[98]
Considerations
Isolation of single cells
There are several ways to isolate individual cells prior to whole genome amplification and sequencing.
Number of cells to be sequenced and analyzed
scRNA-Seq
The single-cell RNA-Seq protocols vary in efficiency of RNA capture, which results in differences in the number of transcripts generated from each single cell. Single-cell libraries are usually sequenced to a depth of 1,000,000 reads because a large majority of genes are detected with 500,000 reads.[104] Increasing the number of cells and decreasing the read depth increases the power of identifying major cell populations. However, low read depths may not always provide necessary information about the genes, and the difference in their expression between the cell populations is dependent on the stability and detection of the mRNA molecules.
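The relationship between read depth and gene detection can be explored by downsampling. The sketch below is an illustrative assumption about how such a check might be set up, not the procedure of the cited study: each read is reduced to the gene it maps to, and a fixed number of reads is drawn without replacement using a seeded generator so the result is reproducible. The read composition is invented.

```python
import random

def genes_detected(read_genes, n_reads, seed=0):
    """Number of distinct genes seen in a random subsample of n_reads reads."""
    rng = random.Random(seed)
    sample = rng.sample(read_genes, n_reads)
    return len(set(sample))

# Toy library of 100 reads: a few abundant genes and some rare ones.
read_genes = ["A"] * 50 + ["B"] * 30 + ["C"] * 15 + ["D"] * 4 + ["E"] * 1

for depth in (5, 20, 50, 100):
    print(depth, genes_detected(read_genes, depth))
```

Abundant genes are detected at almost any depth, while rare genes such as "E" in this toy library tend to appear only as the subsample approaches the full library, which is why shallow sequencing can identify major populations yet miss low-abundance transcripts.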
Quality control covariates are used to decide which cells are retained for analysis. Cells are mainly filtered on three covariates: count depth, the number of genes detected, and the fraction of counts from mitochondrial genes. Filtering on these covariates removes low-quality cells so that the remaining cellular signals can be interpreted.
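The three covariates can be computed directly from a cell's gene counts. The following is a minimal sketch under stated assumptions: each cell is a dictionary of gene symbol to count, mitochondrial genes are recognised by the "MT-" prefix used for human gene symbols, and the thresholds are placeholder values chosen for the toy data rather than recommendations. Suitable cut-offs depend on the tissue and protocol.

```python
def qc_metrics(counts, mito_prefix="MT-"):
    """Compute count depth, genes detected, and mitochondrial fraction for one cell."""
    total = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for gene, c in counts.items() if gene.startswith(mito_prefix))
    return {
        "count_depth": total,
        "n_genes": n_genes,
        "mito_fraction": mito / total if total else 0.0,
    }

def passes_qc(metrics, min_counts, min_genes, max_mito_fraction):
    """Keep a cell only if it satisfies all three thresholds."""
    return (
        metrics["count_depth"] >= min_counts
        and metrics["n_genes"] >= min_genes
        and metrics["mito_fraction"] <= max_mito_fraction
    )

# Invented toy cells; thresholds below are hypothetical and scaled to match.
intact_cell = {"ACTB": 50, "GAPDH": 30, "CD3E": 15, "MT-CO1": 5}
damaged_cell = {"ACTB": 5, "MT-CO1": 40, "MT-ND1": 35}

for name, cell in (("intact", intact_cell), ("damaged", damaged_cell)):
    m = qc_metrics(cell)
    print(name, m, passes_qc(m, min_counts=50, min_genes=3, max_mito_fraction=0.2))
```

A high mitochondrial fraction combined with few detected genes is commonly taken as a sign of a damaged cell, which is why the covariates are best considered jointly rather than one at a time.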
See also
- Single-cell analysis
- Single-cell transcriptomics
- Single cell epigenomics
- Tcr-seq
- DNA sequencing
- Whole genome sequencing
References
- S2CID 11575439.
- PMID 29700246.
- PMID 25053837.
- PMID 24499009.
- S2CID 5252333.
- PMID 24524124.
- S2CID 4800650.
- PMID 30266101.
- PMID 29391438.
- PMID 17923430.
- PMID 23918251.
- "Single-Cell-Omics.v2.3.13 @albertvilella". Google Docs. Retrieved 2020-01-01.
- PMID 28729688.
- PMID 28701744.
- PMID 23258894.
- PMID 25233049.
- PMID 23042453.
- S2CID 209464011.
- PMID 24478987.
- PMID 24524132.
- S2CID 2994579.
- S2CID 34343205.
- S2CID 206533092.
- PMID 19390573.
- PMID 23801761.
- S2CID 4394530.
- S2CID 13659345.
- PMID 29170234.
- PMID 24893890.
- PMID 25732828.
- S2CID 206525166.
- PMID 28055307.
- PMID 25042786.
- PMID 22649061.
- PMID 24179143.
- S2CID 24912438.
- PMID 25936837.
- PMID 30546107.
- S2CID 4450377.
- PMID 30633912.
- Stein RA (1 Jul 2019). "Single-Cell Sequencing Sifts through Multiple Omics". Retrieved 1 August 2019.
- S2CID 500845.
- PMID 26000846.
- PMID 34164589.
- hdl:10523/10111.
- PMID 33547074.
- PMID 26000487.
- PMID 26000488.
- PMID 24832513.
- S2CID 16570747.
- PMID 21543516.
- PMID 22820318.
- PMID 27693023.
- PMID 22939981.
- PMID 31311926.
- PMID 23594475.
- PMID 30664627.
- PMID 29394315.
- S2CID 205285357.
- PMID 28759029.
- PMID 30349457.
- Clark, Sheila. "Single cell RNA-seq: An introductory overview and tools for getting started". 10xgenomics.com.
- PMID 27318933.
- PMID 33414681.
- "Chromium Single Cell Gene Expression Solution with Feature Barcoding technology" (PDF). 10xgenomics.com.
- PMID 33662621.
- Kwok, Hin; Lui, Schwan. "Single Cell (10X Genomics)". CPOS HKUMed.
- S2CID 52813142.
- PMID 29434199.
- S2CID 12164981.
- PMID 33520469.
- PMID 29661792.
- PMID 29608178.
- PMID 18695026.
- PMID 29606308.
- PMID 30388455.
- PMID 28094102.
- PMID 29476078.
- S2CID 226309826.
- PMID 35948637.
- PMID 26343579.
- PMID 31332193.
- PMID 31379928.
- PMID 29580379.
- PMID 29094698.
- PMID 29240790.
- PMID 21536723.
- PMID 28818938.
- PMID 29674432.
- PMID 29674431.
- PMID 30262634.
- PMID 30514844.
- PMID 29700229.
- PMID 29700225.
- PMID 32546686.
- PMID 29700227.
- You J. "Science's 2018 Breakthrough of the Year: tracking development cell by cell". Science Magazine. American Association for the Advancement of Science.
- PMID 35875604.
- S2CID 7404545.
- PMID 20981102.
- PMID 18284708.
- PMID 22081019.
- PMID 21808033.
- PMID 28212749.