DNA nanoball sequencing
DNA nanoball sequencing is a
Procedure
DNA nanoball sequencing involves isolating
DNA isolation, fragmentation, and size capture
Cells are
Attaching adapter sequences
Adapter DNA sequences must be attached to the unknown DNA fragment so that DNA segments with known sequences flank the unknown DNA. In the first round of adapter ligation, right (Ad153_right) and left (Ad153_left) adapters are attached to the right and left flanks of the fragmented DNA, and the DNA is amplified by PCR. A splint oligo then hybridizes to the ends of the fragments, which are ligated to form a circle. An exonuclease is added to remove all remaining linear single-stranded and double-stranded DNA products. The result is a completed circular DNA template.[2]
Rolling circle replication
Once a single-stranded circular DNA template containing sample DNA ligated to two unique adapter sequences has been generated, the full sequence is amplified into a long string of DNA. This is accomplished by
DNA nanoball patterned array
To obtain DNA sequence, the DNA nanoballs are attached to a patterned array flow cell. The flow cell is a silicon wafer coated with
Imaging
After each DNA nucleotide incorporation step, the flow cell is imaged to determine which nucleotide base bound to the DNA nanoball. The fluorophore is excited with a
Sequencing data format
The data generated from the DNA nanoballs are written as standard FASTQ files with contiguous bases (no gaps). These files can be used in any data analysis pipeline that is configured to read single-end or paired-end FASTQ files.
For example:
Read 1, from a 100 bp paired-end run from[12]
@CL100011513L1C001R013_126365/1
CTAGGCAACTATAGGTCTCAGTTAAGTCAAATAAAATTCACATCAAATTTTTACTCCCACCATCCCAACACTTTCCTGCCTGGCATATGCCGTGTCTGCC
+
FFFFFFFFFFFGFGFFFFFF;FFFFFFFGFGFGFFFFFF;FFFFGFGFGFFEFFFFFEDGFDFF@FCFGFGCFFFFFEFFEGDFDFFFFFGDAFFEFGFF
Corresponding Read 2:
@CL100011513L1C001R013_126365/2
TGTCTACCATATTCTACATTCCACACTCGGTGAGGGAAGGTAGGCACATAAAGCAATGGCAGTACGGTGTAATACATGCTAATGTAGAGTAAGCACTCAG
+
3E9E<ADEBB:D>E?FD<<@EFE>>ECEF5CE:B6E:CEE?6B>B+@??31/FD:0?@:E9<3FE2/A:/8>9CB&=E<7:-+>;29:7+/5D9)?5F/:
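Records like the two above can be read with a few lines of code. The following is a minimal sketch, not a replacement for an established FASTQ library. It assumes each record occupies exactly four lines (name, bases, a "+" separator, qualities) and that quality characters use the common Phred+33 encoding. The function name parse_fastq is hypothetical, and the example record is Read 1 truncated to its first ten bases for brevity.

```python
from statistics import mean


def parse_fastq(text):
    """Yield (name, sequence, scores) for each four-line FASTQ record."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record number {i // 4 + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        # Assumption: Phred+33 encoding, so score = ASCII code - 33.
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores


# Read 1 from the example, truncated to ten bases for brevity.
example = """@CL100011513L1C001R013_126365/1
CTAGGCAACT
+
FFFFFFFFFF
"""

for name, sequence, scores in parse_fastq(example):
    print(name, len(sequence), mean(scores))
```

Under the Phred+33 assumption, the character F corresponds to a quality score of 37 and the character ; to a score of 26.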
Informatics tips
Reference genome alignment
Default parameters for the popular aligners are sufficient.
Read names
In the FASTQ file created by BGI/MGI sequencers using DNA nanoballs on a patterned array flowcell, the read names look like this:
BGISEQ-500:
CL100025298L1C002R050_244547
MGISEQ-2000:
V100006430L1C001R018613883
Read names can be parsed to extract three variables describing the physical location of the read on the patterned array: (1) tile/region, (2) x coordinate, and (3) y coordinate. Note that, due to the order of these variables, these read names cannot be natively parsed by Picard MarkDuplicates in order to identify optical duplicates. However, as there are none on this platform, this poses no problem to Picard-based data analysis.
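The two read names shown above can be split into their components with a regular expression. The sketch below is illustrative only. The field names (flowcell, lane, column, row, spot, mate) are hypothetical labels inferred from the L, C and R markers in the two examples, not a vendor specification. How these tokens correspond to the tile/region and x and y coordinates is not stated here and is left to the reader. The pattern also assumes three-digit column and row fields, an optional underscore before the final number, and an optional /1 or /2 mate suffix.

```python
import re

# Hypothetical field names: the pattern is inferred from the two example
# read names only and may not cover every instrument or software version.
READ_NAME = re.compile(
    r"^(?P<flowcell>[A-Za-z]+\d+)"   # e.g. CL100025298 or V100006430
    r"L(?P<lane>\d+)"                # lane number after "L"
    r"C(?P<column>\d{3})"            # three digits after "C" (assumed width)
    r"R(?P<row>\d{3})"               # three digits after "R" (assumed width)
    r"_?(?P<spot>\d+)"               # trailing number, underscore optional
    r"(?:/(?P<mate>[12]))?$"         # optional /1 or /2 paired-end suffix
)


def parse_read_name(name):
    """Split a read name into its components, or raise ValueError."""
    match = READ_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised read name: {name!r}")
    fields = match.groupdict()
    return {
        "flowcell": fields["flowcell"],
        "lane": int(fields["lane"]),
        "column": int(fields["column"]),
        "row": int(fields["row"]),
        "spot": int(fields["spot"]),
        "mate": int(fields["mate"]) if fields["mate"] else None,
    }


for read_name in ("CL100025298L1C002R050_244547", "V100006430L1C001R018613883"):
    print(parse_read_name(read_name))
```

Anchoring the pattern at both ends makes unexpected names fail loudly rather than being silently misparsed, which is the safer behaviour when the naming scheme is only known from examples.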
Duplicates
Because DNA nanoballs remain confined to their spots on the patterned array, there are no optical duplicates to contend with during bioinformatics analysis of sequencing reads. It is suggested to run Picard MarkDuplicates with READ_NAME_REGEX=null, which disables optical duplicate detection, as follows:
java -jar picard.jar MarkDuplicates I=input.bam O=marked_duplicates.bam M=marked_dup_metrics.txt READ_NAME_REGEX=null
A test with Picard-friendly, reformatted read names demonstrates the absence of this class of duplicate read:
The single read marked as an optical duplicate is most assuredly artefactual. In any case, the effect on the estimated library size is negligible.
Advantages
DNA nanoball sequencing technology offers some advantages over other sequencing platforms. One advantage is the absence of optical duplicates: DNA nanoballs remain in place on the patterned array and do not interfere with neighboring nanoballs.
Other advantages of DNA nanoball sequencing include the use of high-fidelity Phi 29 DNA polymerase[10] to ensure accurate amplification of the circular template; the compaction of several hundred copies of the circular template into a small area, which produces an intense signal; and the attachment of the fluorophore to the probe at a long distance from the ligation point, which improves ligation.[2]
Disadvantages
The main disadvantage of DNA nanoball sequencing is the short read length of the DNA sequences obtained with this method.
Applications
DNA nanoball sequencing has been used in published studies. For example, Lee et al. used this technology to find mutations present in a lung cancer and compared them with normal lung tissue.
Significance
References
- PMID 28379488.
- S2CID 17309571.
- S2CID 54557996.
- "BGI-Shenzhen Completes Acquisition of Complete Genomics" (Press release). PR Newswire.
- "Revolocity™ Whole Genome Sequencing Technology Overview" (PDF). Complete Genomics. Retrieved 18 November 2017.
- PMID 28379488.
- PMID 19339662.
- PMID 27895807.
- PMID 7173204.
- PMID 2498321.
- PMID 8760890.
- "An updated reference human genome dataset of the BGISEQ-500 sequencer". GigaDB. Retrieved 22 March 2017.
- S2CID 4354035.
- PMID 20220176.
- PMID 20537948.