Genomic library
A genomic library is a collection of overlapping DNA fragments that together make up the total genomic
There are several kinds of vectors available with various insert capacities. Generally, libraries made from organisms with larger
Genomic libraries are commonly used for sequencing applications. They have played an important role in the whole genome sequencing of several organisms, including humans and several model organisms.[4][5]
History
The first DNA-based genome to be fully sequenced was that of the bacteriophage phi X 174, completed in 1977 by two-time Nobel Prize winner Frederick Sanger and his team, who created a library of the phage for use in DNA sequencing.[6] This success contributed to the growing demand for genome sequencing in support of gene therapy research. Teams are now able to catalog polymorphisms in genomes and investigate candidate genes that contribute to conditions such as Parkinson's disease, Alzheimer's disease, multiple sclerosis, rheumatoid arthritis, and Type 1 diabetes.[7] These capabilities stem from the advance of genome-wide association studies, which the ability to create and sequence genomic libraries made possible. Previously, linkage and candidate-gene studies were among the only approaches.[8]
Genomic library construction
Construction of a genomic library involves creating many recombinant DNA molecules. An organism's genomic DNA is extracted and then digested with a restriction enzyme. For organisms with very small genomes (~10 kb), the digested fragments can be separated by gel electrophoresis. The separated fragments can then be excised and cloned into the vector individually. However, when a large genome is digested with a restriction enzyme, there are far too many fragments to excise individually. The entire set of fragments must instead be cloned together, and the resulting clones separated afterward. In either case, the fragments are ligated into a vector that has been digested with the same restriction enzyme. The vector containing the inserted fragments of genomic DNA can then be introduced into a host organism.[1]
Below are the steps for creating a genomic library from a large genome.
- Extract and purify DNA.
- Digest the DNA with a restriction enzyme. This creates fragments that are similar in size, each containing one or more genes.
- Insert the fragments of DNA into vectors that were cut with the same restriction enzyme. Use the enzyme DNA ligase to seal the DNA fragments into the vector. This creates a large pool of recombinant molecules.
- These recombinant molecules are taken up by a host bacterium by transformation, creating a DNA library.[9][10]
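The digestion step listed above can be illustrated in silico. The following is a minimal sketch, not a laboratory protocol: it assumes a linear, single-stranded representation of the DNA and uses EcoRI (recognition site GAATTC, cut between G and AATTC) purely as an example enzyme. The function name `digest` and its parameters are hypothetical.

```python
import re


def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut a linear DNA string at every occurrence of a recognition site.

    cut_offset is the position inside the site where the cut falls
    (1 for EcoRI, which cuts G^AATTC on the top strand). Only the top
    strand is modelled; sticky ends are not represented.
    """
    dna = dna.upper()
    # A lookahead pattern reports every site, including overlapping ones.
    pattern = f"(?={re.escape(site)})"
    cut_positions = [m.start() + cut_offset for m in re.finditer(pattern, dna)]

    fragments = []
    start = 0
    for pos in cut_positions:
        fragments.append(dna[start:pos])
        start = pos
    fragments.append(dna[start:])
    # Drop empty strings that would appear if a cut fell at either end.
    return [fragment for fragment in fragments if fragment]


fragments = digest("AAGAATTCCCGAATTCTT")
print(fragments)  # ['AAG', 'AATTCCCG', 'AATTCTT']
```

Because the vector is cut with the same enzyme, each fragment end produced this way would be compatible with the vector ends in the ligation step.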
Below is a diagram of the above outlined steps.
Determining titer of library
After a genomic library is constructed with a viral vector, such as lambda phage, the titer of the library can be determined. Calculating the titer allows researchers to approximate how many infectious viral particles were successfully created in the library. To do this, dilutions of the library are used to infect cultures of E. coli of known concentrations. The cultures are then plated on agar plates and incubated overnight. The viral plaques are counted, and the count is used to calculate the total number of infectious viral particles in the library. Most viral vectors also carry a marker that allows clones containing an insert to be distinguished from those that do not. This allows researchers to also determine the percentage of infectious viral particles actually carrying a fragment of the library.[11]
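The arithmetic behind a plaque count can be sketched as follows. This is an illustration using the standard plaque-assay relation (plaques divided by the product of dilution and plated volume); the function names, the plate counts, and the library volume are made-up example values, not data from the source.

```python
def titer_pfu_per_ml(plaques: int, dilution: float, volume_plated_ml: float) -> float:
    """Plaque-forming units per mL of the undiluted library.

    dilution is the fraction of the original concentration that was
    plated (for example 1e-6 for a one-in-a-million dilution).
    """
    return plaques / (dilution * volume_plated_ml)


def recombinant_fraction(recombinant_plaques: int, total_plaques: int) -> float:
    """Fraction of plaques whose marker indicates that an insert is present."""
    return recombinant_plaques / total_plaques


# Hypothetical plate: 150 plaques from 0.1 mL of a 1e-6 dilution.
titer = titer_pfu_per_ml(plaques=150, dilution=1e-6, volume_plated_ml=0.1)

# Hypothetical library volume of 0.5 mL.
total_particles = titer * 0.5

# Hypothetical marker result: 120 of the 150 plaques carry an insert.
fraction = recombinant_fraction(120, 150)

print(f"titer: {titer:.2e} pfu/mL")
print(f"infectious particles in library: {total_particles:.2e}")
print(f"fraction carrying an insert: {fraction:.0%}")
```

Multiplying the total particle count by the recombinant fraction gives an estimate of how many independent clones the library actually contains.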
A similar method can be used to titer genomic libraries made with non-viral vectors, such as
Screening library
In order to isolate clones that contain regions of interest from a library, the library must first be
Another method of screening is with polymerase chain reaction (PCR). Some libraries are stored as pools of clones and screening by PCR is an efficient way to identify pools containing specific clones.[2]
Types of vectors
Genome size varies among organisms, and the cloning vector must be selected accordingly. For a large genome, a vector with a large capacity should be chosen so that a relatively small number of clones is sufficient to cover the entire genome. However, it is often more difficult to characterize an insert contained in a higher capacity vector.[3]
Below is a table of several kinds of vectors commonly used for genomic libraries and the insert size that each generally holds.
Vector type | Insert size (thousands of bases) |
---|---|
Plasmids | up to 10 |
Phage lambda (λ) | up to 25 |
Cosmids | up to 45 |
Bacteriophage P1 | 70 to 100 |
P1 artificial chromosomes (PACs) | 130 to 150 |
Bacterial artificial chromosomes (BACs) | 120 to 300 |
Yeast artificial chromosomes (YACs) | 250 to 2000 |
Plasmids
A
Phage lambda (λ)
Cosmids
Cosmid vectors are plasmids that contain a small region of bacteriophage λ DNA called the cos sequence. This sequence allows the cosmid to be packaged into bacteriophage λ particles. These particles, each containing a linearized cosmid, are introduced into the host cell by transduction. Once inside the host, the cosmids circularize with the aid of the host's DNA ligase and then function as plasmids. Cosmids are capable of carrying inserts up to 40 kb in size.[2]
Bacteriophage P1 vectors
P1 artificial chromosomes
P1 artificial chromosomes (PACs) have features of both P1 vectors and bacterial artificial chromosomes (BACs). Similar to P1 vectors, they contain a plasmid and a lytic replicon as described above. Unlike P1 vectors, they do not need to be packaged into bacteriophage particles for transduction. Instead, they are introduced into E. coli as circular DNA molecules through electroporation, just as BACs are.[2] Also like BACs, they are relatively difficult to prepare because they have a single origin of replication.[14]
Bacterial artificial chromosomes
Yeast artificial chromosomes
How to select a vector
When selecting a vector, one must ensure that the resulting library is representative of the entire genome. Any fragment of the genome produced by a restriction enzyme should have the same chance of being in the library as any other fragment. Furthermore, recombinant molecules should contain inserts large enough that the library size can be handled conveniently.[14] Library size is determined by the number of clones the library needs to contain. The number of clones needed to sample all of the genes depends on the size of the organism's genome and on the average insert size. This is represented by the formula (also known as the Carbon and Clarke formula):[15]

$$N = \frac{\ln(1 - P)}{\ln(1 - f)}$$

where,
$N$ is the necessary number of recombinants[16]
$P$ is the desired probability that any fragment in the genome will occur at least once in the library created
$f$ is the fractional proportion of the genome in a single recombinant

$f$ can be further shown to be:

$$f = \frac{i}{g}$$

where,
$i$ is the insert size
$g$ is the genome size

Thus, increasing the insert size (by choice of vector) reduces the number of clones needed to represent a genome. The ratio of the insert size to the genome size is the proportion of the genome contained in a single clone.[14] Here is the equation with all parts considered:

$$N = \frac{\ln(1 - P)}{\ln\left(1 - \frac{i}{g}\right)}$$
Vector selection example
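The above formula can be used to find the number of clones needed for a 99% probability that a given sequence in a genome is represented in a library made with a vector whose insert size is twenty thousand basepairs (such as the phage lambda vector). The genome size of the organism is three billion basepairs in this example.

$$N = \frac{\ln(1 - 0.99)}{\ln\left(1 - \frac{2.0 \times 10^{4}}{3.0 \times 10^{9}}\right)} \approx \frac{-4.61}{-6.7 \times 10^{-6}} \approx 688{,}060 \text{ clones}$$

Thus, approximately 688,060 clones are required to ensure a 99% probability that a given DNA sequence from this three billion basepair genome will be present in a library using a vector with an insert size of twenty thousand basepairs. This figure is computed from the rounded intermediate values shown; evaluating the expression without rounding gives approximately 690,800 clones.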
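The calculation can be reproduced with a short script. This is a minimal sketch of the formula N = ln(1 − P) / ln(1 − i/g) as defined in this section; the function name `clones_required` is hypothetical, and the capacities in the loop are the upper insert sizes from the vector table in this article. Evaluated without rounding the intermediate values, the phage lambda example comes out near 690,800 clones, slightly above a figure obtained from rounded intermediates.

```python
import math


def clones_required(probability: float, insert_size_bp: float, genome_size_bp: float) -> int:
    """Number of clones N = ln(1 - P) / ln(1 - i/g), rounded up."""
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    if not 0 < insert_size_bp < genome_size_bp:
        raise ValueError("insert size must be positive and smaller than the genome")
    f = insert_size_bp / genome_size_bp
    # log1p(-f) evaluates ln(1 - f) accurately when f is very small.
    n = math.log(1 - probability) / math.log1p(-f)
    return math.ceil(n)


# Worked example from the text: P = 0.99, i = 20 kb, g = 3 Gb.
print(clones_required(0.99, 20_000, 3_000_000_000))

# Upper insert sizes (kb) from the vector table in this article.
VECTOR_CAPACITY_KB = {
    "Plasmid": 10,
    "Phage lambda": 25,
    "Cosmid": 45,
    "Bacteriophage P1": 100,
    "PAC": 150,
    "BAC": 300,
    "YAC": 2000,
}

for name, kb in VECTOR_CAPACITY_KB.items():
    n = clones_required(0.99, kb * 1_000, 3_000_000_000)
    print(f"{name:18s} {n:>10,d} clones")
```

The loop makes the trade-off described above concrete: the larger the insert a vector can hold, the fewer clones are needed for the same probability of coverage.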
Applications
After a library is created, the genome of an organism can be sequenced to elucidate how genes affect an organism or to compare similar organisms at the genome level. The aforementioned genome-wide association studies can identify candidate genes associated with many functional traits. Genes can be isolated through genomic libraries and used in human cell lines or animal models for further research.[17] Furthermore, high-fidelity clones that represent the genome accurately and have no stability issues serve well as intermediates for shotgun sequencing or for the study of complete genes in functional analysis.[10]
Hierarchical sequencing
One major use of genomic libraries is hierarchical shotgun sequencing, which is also called top-down, map-based or clone-by-clone sequencing. This strategy was developed in the 1980s for sequencing whole genomes before high throughput techniques for sequencing were available. Individual clones from genomic libraries can be sheared into smaller fragments, usually 500 bp to 1000 bp, which are more manageable for sequencing.[4] Once a clone from a genomic library is sequenced, the sequence can be used to screen the library for other clones containing inserts which overlap with the sequenced clone. Any new overlapping clones can then be sequenced, forming a contig. This technique, called chromosome walking, can be exploited to sequence entire chromosomes.[2]
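The idea of extending a contig through overlapping clones can be shown with a toy example. This is only a sketch: exact string matching stands in for the library screening step, the walk proceeds in one direction only, and the sequences, the minimum overlap, and the function names are invented for illustration.

```python
def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for length in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def walk(seed: str, clones: list[str], min_overlap: int = 4) -> str:
    """Extend a contig to the right from `seed` using overlapping clones."""
    contig = seed
    remaining = [clone for clone in clones if clone != seed]
    extended = True
    while extended:
        extended = False
        for clone in remaining:
            n = overlap_length(contig, clone, min_overlap)
            # Skip clones with no overlap or that add no new sequence.
            if n and n < len(clone):
                contig += clone[n:]
                remaining.remove(clone)
                extended = True
                break  # rescan the remaining clones against the longer contig
    return contig


library = ["GTACCTGA", "CTGATTAG", "GGGGGGGG"]
print(walk("ATGCGTAC", library))  # ATGCGTACCTGATTAG
```

The unrelated clone is never incorporated because it shares no overlap with the growing contig, which mirrors how only overlapping clones are picked up at each step of a walk.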
Whole genome shotgun sequencing is another method of genome sequencing that does not require a library of high-capacity vectors. Rather, it uses computer algorithms to assemble short sequence reads to cover the entire genome. Because this assembly benefits from positional information, genomic libraries are often used in combination with whole genome shotgun sequencing. A high resolution map can be created by sequencing both ends of inserts from several clones in a genomic library. This map provides sequences of known distances apart, which can be used to help with the assembly of sequence reads acquired through shotgun sequencing.[4] The human genome sequence, which was declared complete in 2003, was assembled using both a BAC library and shotgun sequencing.[18][19]
Genome-wide association studies
Genome-wide association studies are used to find specific gene targets and polymorphisms in human populations. The International HapMap project was created through a partnership of scientists and agencies from several countries to catalog and utilize this data.[20] The goal of this project is to compare genetic sequences of different individuals to elucidate similarities and differences within chromosomal regions.[20] Scientists from all of the participating nations are cataloging these attributes with data from populations of African, Asian, and European ancestry. Such genome-wide assessments may lead to further diagnostic and drug therapies while also helping future teams design therapeutics with genetic features in mind. These concepts are already being exploited in genetic engineering.[20] For example, a research team has constructed a PAC shuttle vector that creates a library representing two-fold coverage of the human genome.[17] This could serve as a valuable resource for identifying genes, or sets of genes, that cause disease. Moreover, these studies can serve as a powerful way to investigate transcriptional regulation, as has been seen in the study of baculoviruses.[21] Overall, advances in genome library construction and DNA sequencing have allowed for efficient discovery of different molecular targets.[5] Assimilation of these features through such efficient methods can hasten the employment of novel drug candidates.
References
- ISBN 978-0-8053-9592-1.
- ISBN 978-0-87969-577-4.
- ISBN 978-0-07-284846-5.
- ISBN 978-0-87893-232-0.
- S2CID 14457634.
- S2CID 4206886.
- PMID 21533026.
- PMID 21353194.
- PMID 11561720.
- S2CID 8208834.
- ISBN 978-0-7637-3329-2.
- Peterson, Daniel; Jeffrey Tomkins; David Frisch (2000). "Construction of Plant Bacterial Artificial Chromosome (BAC) Libraries: An Illustrated Guide". Journal of Agricultural Genomics. 5.
- PMID 8661051.
- "Cloning Genomic DNA". University College London. Retrieved 13 March 2013.
- "Gene libraries". Archived from the original on 2013-03-31. Retrieved 2013-06-05.
- Blaber, Michael. "Genomic Libraries". Retrieved 1 April 2013.
- PMID 22285925.
- PMID 21698376.
- S2CID 206577479.
- "HapMap Homepage".
- PMID 19791511.
Further reading
Klug, Cummings, Spencer, Palladino (2010). Essentials of Genetics. Pearson. pp. 355–264.