DNA nanoball sequencing

Source: Wikipedia, the free encyclopedia.
Workflow for DNA nanoball sequencing[1]

DNA nanoball sequencing is a

Beijing Genomics Institute (BGI) refined DNA nanoball sequencing to sequence nucleotide samples on their own platform.[4][5]

Procedure

DNA Nanoball Sequencing involves isolating

adsorbed onto a sequencing flow cell. The color of the fluorescence at each interrogated position is recorded through a high-resolution camera. Bioinformatic tools are used to analyze the fluorescence data and make base calls, and to map or quantify the 50 bp, 100 bp, or 150 bp single- or paired-end reads.[6][2]

DNA Isolation, fragmentation, and size capture

Cells are

lysate. The high-molecular-weight DNA, often several megabase pairs long, is fragmented by physical or enzymatic methods to break the DNA double-strands at random intervals. Bioinformatic mapping of the sequencing reads is most efficient when the sample DNA contains a narrow length range.[7] For small RNA sequencing, selection of the ideal fragment lengths for sequencing is performed by gel electrophoresis;[8] for sequencing of larger fragments, DNA fragments are separated by bead-based size selection.[9]

Attaching adapter sequences

Adapter DNA sequences must be attached to the unknown DNA fragment so that DNA segments with known sequences flank the unknown DNA. In the first round of adapter ligation, right (Ad153_right) and left (Ad153_left) adapters are attached to the right and left flanks of the fragmented DNA, and the DNA is amplified by PCR. A splint oligo then hybridizes to the ends of the fragments, which are ligated to form a circle. An exonuclease is added to remove all remaining linear single-stranded and double-stranded DNA products. The result is a completed circular DNA template.[2]

Rolling circle replication

Once a single-stranded circular DNA template, containing sample DNA ligated to two unique adapter sequences, has been generated, the full sequence is amplified into a long string of DNA. This is accomplished by

nanometers (nm) across. Nanoballs remain separated from each other because they are negatively charged and naturally repel each other, reducing any tangling between different single-stranded DNA lengths.[2]

DNA nanoball creation and adsorption to the patterned array flowcell

DNA nanoball patterned array

To obtain DNA sequence, the DNA nanoballs are attached to a patterned array flow cell. The flow cell is a silicon wafer coated with

hexamethyldisilazane (HMDS), and a photoresist material. The DNA nanoballs are added to the flow cell and selectively bind to the positively-charged aminosilane in a highly ordered pattern, allowing a very high density of DNA nanoballs to be sequenced.[2][11]

Imaging

After each DNA nucleotide incorporation step, the flow cell is imaged to determine which nucleotide base bound to the DNA nanoball. The fluorophore is excited with a

CCD camera. The image is then processed to remove background noise and assess the intensity of each point. The color of each DNA nanoball corresponds to a base at the interrogative position and a computer records the base position information.[2]
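The step from per-spot intensities to a base call can be illustrated with a deliberately simplified Python sketch. It is not the instrument's algorithm: the channel-to-base assignment in `CHANNEL_BASES`, the function name `call_base`, and the `min_signal` threshold are all assumptions made for illustration. The sketch subtracts a background estimate from each of four color channels and reports the base assigned to the brightest remaining channel.

```python
from typing import Sequence

# Hypothetical channel-to-base assignment; the actual dye/base pairing
# is not given in the text and is assumed here for illustration only.
CHANNEL_BASES = ("A", "C", "G", "T")


def call_base(
    intensities: Sequence[float],
    background: Sequence[float],
    min_signal: float = 0.0,
) -> str:
    """Return the base of the brightest background-corrected channel, or 'N'."""
    # Remove the background estimate and clamp negative values to zero.
    corrected = [max(i - b, 0.0) for i, b in zip(intensities, background)]
    # Index of the channel with the strongest remaining signal.
    best = max(range(len(corrected)), key=corrected.__getitem__)
    # Without any signal above the threshold, no call is made.
    if corrected[best] <= min_signal:
        return "N"
    return CHANNEL_BASES[best]


print(call_base([120.0, 900.0, 130.0, 110.0], [100.0] * 4))
```

A production base caller would also have to handle crosstalk between channels and assign a quality score to each call; both are outside the scope of this sketch.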

Sequencing data format

The data generated from the DNA nanoballs are stored as standard FASTQ files with contiguous bases (no gaps). These files can be used in any data analysis pipeline that is configured to read single-end or paired-end FASTQ files.

For example:

Read 1, from a 100bp paired end run from[12]

 @CL100011513L1C001R013_126365/1
 CTAGGCAACTATAGGTCTCAGTTAAGTCAAATAAAATTCACATCAAATTTTTACTCCCACCATCCCAACACTTTCCTGCCTGGCATATGCCGTGTCTGCC
 +
 FFFFFFFFFFFGFGFFFFFF;FFFFFFFGFGFGFFFFFF;FFFFGFGFGFFEFFFFFEDGFDFF@FCFGFGCFFFFFEFFEGDFDFFFFFGDAFFEFGFF

Corresponding Read 2:

 @CL100011513L1C001R013_126365/2
 TGTCTACCATATTCTACATTCCACACTCGGTGAGGGAAGGTAGGCACATAAAGCAATGGCAGTACGGTGTAATACATGCTAATGTAGAGTAAGCACTCAG
 +
 3E9E<ADEBB:D>E?FD<<@EFE>>ECEF5CE:B6E:CEE?6B>B+@??31/FD:0?@:E9<3FE2/A:/8>9CB&=E<7:-+>;29:7+/5D9)?5F/:
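As a minimal sketch of how such records can be read, the Python snippet below parses four-line FASTQ records and converts a quality string into integer Phred scores. It assumes the common Phred+33 encoding, which is consistent with characters such as `+` and `)` appearing in the Read 2 quality line above. The function names `parse_fastq` and `phred_scores` are illustrative and do not belong to any BGI tool, and the example record is shortened with an invented quality string.

```python
import io
from typing import Iterator, List, Tuple


def parse_fastq(handle) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # skip blank lines between records
        sequence = handle.readline().strip()
        handle.readline()  # separator line starting with '+'
        quality = handle.readline().strip()
        if len(sequence) != len(quality):
            raise ValueError(f"length mismatch in record {header.strip()}")
        # Drop the leading '@' from the header line.
        yield header.strip()[1:], sequence, quality


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert a quality string to Phred scores (Phred+33 assumed by default)."""
    return [ord(ch) - offset for ch in quality]


# Shortened, illustrative record reusing the read name of Read 1.
example = io.StringIO("@CL100011513L1C001R013_126365/1\nCTAGG\n+\nFFF;G\n")
records = list(parse_fastq(example))
name, seq, qual = records[0]
print(name, seq, phred_scores(qual))
```

For real analyses, an established FASTQ parser is preferable, since it will also cope with compressed input and malformed records.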

Informatics Tips

Reference Genome Alignment

Default parameters for the popular aligners are sufficient.

Read Names

In the FASTQ file created by BGI/MGI sequencers using DNA nanoballs on a patterned array flowcell, the read names look like this:

Anatomy of a BGI sequencer read name
Anatomy of an MGI sequencer read name

BGISEQ-500: CL100025298L1C002R050_244547

MGISEQ-2000: V100006430L1C001R018613883

Read names can be parsed to extract three variables describing the physical location of the read on the patterned array: (1) tile/region, (2) x coordinate, and (3) y coordinate. Note that, because of the order of these variables, Picard MarkDuplicates cannot natively parse these read names to identify optical duplicates. However, because this platform produces no optical duplicates, this poses no problem for Picard-based data analysis.
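The Python sketch below shows one way to split the two example read names above into their visible components. The regular expression, the field names (`flowcell`, `lane`, `column`, `row`, `suffix`), and the assumption that the `C` and `R` fields are always three digits wide are inferred from those two examples only. How these fields correspond to the tile/region, x, and y values is not stated in the text, so the sketch only separates the fields and does not interpret them.

```python
import re
from typing import Dict, Optional

# Assumed layout: <flowcell ID>L<lane>C<3 digits>R<3 digits>[_]<digits>[/1 or /2]
# Field names are illustrative labels, not official terminology.
READ_NAME = re.compile(
    r"^(?P<flowcell>[A-Z]+\d+)"
    r"L(?P<lane>\d+)"
    r"C(?P<column>\d{3})"
    r"R(?P<row>\d{3})"
    r"_?(?P<suffix>\d+)"
    r"(?:/[12])?$"
)


def parse_read_name(name: str) -> Optional[Dict[str, str]]:
    """Split a read name into its fields, or return None if it does not match."""
    match = READ_NAME.match(name.lstrip("@"))
    return match.groupdict() if match else None


for example in ("CL100025298L1C002R050_244547", "V100006430L1C001R018613883"):
    print(example, parse_read_name(example))
```

Returning `None` for names that do not match lets a caller detect reads from other platforms instead of silently mis-parsing them.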

Duplicates

Because DNA nanoballs remain confined to their spots on the patterned array, there are no optical duplicates to contend with during bioinformatics analysis of sequencing reads. It is suggested to run Picard MarkDuplicates as follows:

java -jar picard.jar MarkDuplicates I=input.bam O=marked_duplicates.bam M=marked_dup_metrics.txt READ_NAME_REGEX=null

A test with Picard-friendly, reformatted read names demonstrates the absence of this class of duplicate read:

Test of Picard MarkDuplicates varying the OPTICAL_DUPLICATE_PIXEL_DISTANCE parameter

The single read marked as an optical duplicate is most assuredly artefactual. In any case, the effect on the estimated library size is negligible.
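The idea behind a pixel-distance threshold can be illustrated with a short Python sketch. It is a simplified illustration, not Picard's implementation: the `ReadLocation` type and the function `within_optical_distance` are hypothetical, and the sketch assumes that two duplicate reads count as optical duplicates only when they lie on the same tile and within the threshold on both axes.

```python
from typing import NamedTuple


class ReadLocation(NamedTuple):
    """Hypothetical physical location of a read on the array."""
    tile: str
    x: int
    y: int


def within_optical_distance(
    a: ReadLocation, b: ReadLocation, pixel_distance: int
) -> bool:
    """True if both reads share a tile and differ by at most pixel_distance per axis."""
    return (
        a.tile == b.tile
        and abs(a.x - b.x) <= pixel_distance
        and abs(a.y - b.y) <= pixel_distance
    )


first = ReadLocation("C001R018", 1000, 2000)
second = ReadLocation("C001R018", 1050, 2040)
print(within_optical_distance(first, second, 100))
print(within_optical_distance(first, second, 10))
```

Raising the threshold can only increase the number of read pairs classified this way, which is why varying the parameter is a useful test for the presence of optical duplicates.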

Advantages

DNA nanoball sequencing technology offers some advantages over other sequencing platforms. One advantage is the eradication of optical duplicates. DNA nanoballs remain in place on the patterned array and do not interfere with neighboring nanoballs.

Other advantages of DNA nanoball sequencing include the use of high-fidelity Phi 29 DNA polymerase[10] to ensure accurate amplification of the circular template, the compaction of several hundred copies of the circular template into a small area, which produces an intense signal, and the attachment of the fluorophore to the probe at a long distance from the ligation point, which improves ligation.[2]

Disadvantages

The main disadvantage of DNA nanoball sequencing is the short read length of the DNA sequences obtained with this method.

DNA repeats, may map to two or more regions of the reference genome. A second disadvantage of this method is that multiple rounds of PCR have to be used. This can introduce PCR bias and possibly amplify contaminants in the template construction phase.[2]
However, these disadvantages are common to all short-read sequencing platforms and are not specific to DNA nanoballs.

Applications

DNA nanoball sequencing has been used in recent studies. Lee et al. used this technology to find mutations that were present in a lung cancer and compared them to normal lung tissue.


Significance

References