User:Estevezj/sandbox/Genomics

Source: Wikipedia, the free encyclopedia.

Genomics is a discipline in

genetic mapping. The field also includes studies of intragenomic phenomena such as heterosis, epistasis, pleiotropy and other interactions between loci and alleles within the genome.[3] In contrast, the investigation of the roles and functions of single genes is a primary focus of molecular biology or genetics and is a common topic of modern medical and biological research. Research of single genes does not fall into the definition of genomics unless the aim of this genetic, pathway, and functional information analysis is to elucidate its effect on, place in, and response to the entire genome's networks.[4]

History

Etymology

While the word 'genome' (from the German Genom, attributed to Hans Winkler) was in use in English as early as 1926,

Bar Harbor, ME) over beer at a meeting held in Maryland on the mapping of the human genome in 1986. [6]

Early sequencing efforts

Following

University of Ghent (Ghent, Belgium) were the first to determine the sequence of a gene: the gene for Bacteriophage MS2 coat protein.[11] Fiers' group expanded on their MS2 coat protein work, determining the complete nucleotide-sequence of bacteriophage MS2-RNA (whose genome encodes just four genes in 3569 base pairs [bp]) and Simian virus 40 in 1976 and 1978, respectively.[12][13]

DNA sequencing technology developed

Frederick Sanger
Walter Gilbert
Frederick Sanger and Walter Gilbert shared half of the 1980 Nobel Prize in chemistry for independently developing methods for the sequencing of DNA.

In addition to his seminal work on the amino acid sequence of insulin,

Maxam-Gilbert method (also known as the chemical method) of DNA sequencing, which involves the preferential cleavage of DNA at known bases and is a less efficient method.[19][20] For their groundbreaking work in the sequencing of nucleic acids, Gilbert and Sanger shared half the 1980 Nobel Prize in chemistry with Paul Berg (recombinant DNA).

Complete genomes

The advent of these technologies resulted in a rapid intensification in the scope and speed of completion of

"Hockey stick" graph showing the exponential growth of public sequence databases.
The number of genome projects has increased as technological improvements continue to lower the cost of sequencing. (A) Exponential growth of genome sequence databases since 1995. (B) The cost in US Dollars (USD) to sequence one million bases. (C) The cost in USD to sequence a 3,000 Mb (human-sized) genome on a log-transformed scale.

Most of the microorganisms whose genomes have been completely sequenced are problematic

Pan troglodytes) are all important model animals in medical research. [20]

A rough draft of the human genome was completed by the Human Genome Project in early 2001, creating much fanfare.[34] This project, completed in 2003, sequenced the entire genome for one specific person, and by 2007 this sequence was declared "finished" (less than one error in 20,000 bases and all chromosomes assembled).[34] In the years since then, the genomes of many other individual people have been sequenced, partly under the auspices of the 1000 Genomes Project, which announced the sequencing of 1,092 genomes in October 2012.[35] Completion of this project was made possible by the development of dramatically more efficient sequencing technologies and required the commitment of significant bioinformatics resources from a large international collaboration.[36] The continued analysis of human genomic data has profound political and social repercussions for human societies.[37]

Next-generation sequencing

[38]


Size comparison of selected genomes.[39][40]

Latin name                            Common name                 Genome size (Kb)
Eukaryotes
Lilium longiflorum                    Easter lily                 90,000,000
Homo sapiens                          Human                       3,200,000
Oryza sativa                          Rice                        420,000
Drosophila melanogaster               Fruit fly                   137,000
Arabidopsis thaliana                  Mustard cress               115,428
Caenorhabditis elegans                Roundworm                   97,000
Saccharomyces cerevisiae              Yeast                       12,069
Eubacteria
Haemophilus influenzae                Pfeiffer's bacillus         1,830
Escherichia coli                      Human colon bacterium       4,639
Helicobacter pylori                   Stomach ulcer bacterium     1,667
Mycobacterium tuberculosis            Tuberculosis                4,411
Yersinia pestis                       Plague                      4,653
Archaea
Halobacterium                         Salt-tolerant archaean      2,014
Methanobacterium thermoautotrophicum  Methane-producing archaean  1,751

Genome analysis

After an organism has been selected, genome projects involve three components: the sequencing of DNA, the assembly of that sequence to create a representation of the original chromosome, and the annotation and analysis of that representation.[3]

BGI or JGI
). Third, the genome sequence is annotated at several levels: DNA, protein, gene pathways, or comparatively.

Sequencing

Historically, sequencing was done in sequencing centers: centralized facilities (ranging from large independent institutions such as the Joint Genome Institute, which sequence dozens of terabases a year, to local molecular biology core facilities) that contain research laboratories with the costly instrumentation and technical support necessary. As sequencing technology continues to improve, however, a new generation of effective, fast-turnaround benchtop sequencers has come within reach of the average academic laboratory.[41][42] On the whole, genome sequencing approaches fall into two broad categories: shotgun and high-throughput (also known as next-generation) sequencing.[3]

Shotgun sequencing
An ABI PRISM 3100 Genetic Analyzer. Such capillary sequencers automated the early efforts of sequencing genomes.

Shotgun sequencing (the term Sanger sequencing is often used interchangeably) is a sequencing method designed for analysis of DNA sequences longer than 1000 base pairs, up to and including entire chromosomes.[43] It is named by analogy with the rapidly expanding, quasi-random firing pattern of a shotgun. Since the chain termination method of DNA sequencing can only be used for fairly short strands (100 to 1000 base pairs), longer DNA sequences must be broken into random small segments which are then sequenced to obtain reads. Multiple overlapping reads for the target DNA are obtained by performing several rounds of this fragmentation and sequencing. Computer programs then use the overlapping ends of different reads to assemble them into a continuous sequence.[43][44]
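The fragment-and-overlap idea described above can be illustrated with a toy sketch in Python. It is not the algorithm of any particular assembler: the function names (shotgun_reads, overlap, greedy_assemble), the greedy merging strategy and the min_len threshold are assumptions made for illustration, and real assemblers must also cope with sequencing errors, repeats and both DNA strands.

```python
import random

def shotgun_reads(genome, read_len, n_reads, seed=0):
    """Sample random fixed-length substrings of a sequence (hypothetical helper)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]

def overlap(a, b, min_len):
    """Length of the longest suffix of a that is a prefix of b, or 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two reads with the largest overlap (toy greedy strategy)."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        # Discard reads fully contained in another read.
        reads = [r for r in reads if not any(r != o and r in o for o in reads)]
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return reads  # whatever is left cannot be joined: separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)

# Three overlapping reads are joined back into one continuous sequence.
contigs = greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTA"])
print(contigs)
```

With too few reads, or reads that overlap by less than min_len, the function returns several disconnected pieces rather than one sequence, which is why several rounds of fragmentation and sequencing are needed in practice.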

The technology underlying shotgun sequencing is the classical chain-termination method, which is based on the selective incorporation of chain-terminating

DNA sequencers) can sequence up to 96 DNA samples in a single batch (run) in up to 48 runs a day.[47]

High-throughput sequencing

The high demand for low-cost sequencing has driven the development of high-throughput sequencing (or next-generation sequencing [NGS]) technologies that parallelize the sequencing process, producing thousands or millions of sequences at once.[48][49] High-throughput sequencing technologies are intended to lower the cost of DNA sequencing beyond what is possible with standard dye-terminator methods. In ultra-high-throughput sequencing as many as 500,000 sequencing-by-synthesis operations may be run in parallel.[50][51]

While the high throughput of NGS methods is associated with the sequencing of large eukaryotic genomes, their scalability also gives them applications in the sequencing of smaller prokaryotic genomes.

Integrated Microbial Genomes project found comparison [46]


454 pyrosequencing
Illumina (Solexa) sequencing
Illumina Genome Analyzer II System. Illumina technologies have set the standard for high throughput massively parallel sequencing.[41]

Illumina, developed a sequencing method based on reversible dye-terminator technology acquired from Manteia Predictive Medicine in 2004. This technology had been invented and developed in late 1996 at Glaxo Wellcome's Geneva Biomedical Research Institute (GBRI), by Dr. Pascal Mayer and Dr. Laurent Farinelli.[52] In this method, DNA molecules and primers are first attached on a slide and amplified with polymerase so that local clonal colonies, initially coined "DNA colonies", are formed. To determine the sequence, four types of reversible terminator bases (RT-bases) are added and non-incorporated nucleotides are washed away. Unlike pyrosequencing, the DNA chains are extended one nucleotide at a time and image acquisition can be performed at a delayed moment, allowing for very large arrays of DNA colonies to be captured by sequential images taken from a single camera.

Decoupling the enzymatic reaction and the image capture allows for optimal throughput and theoretically unlimited sequencing capacity. With an optimal configuration, the ultimately reachable instrument throughput is thus dictated solely by the analog-to-digital conversion rate of the camera, multiplied by the number of cameras and divided by the number of pixels per DNA colony required for visualizing them optimally (approximately 10 pixels/colony). In 2012, with cameras operating at more than 10 MHz A/D conversion rates and available optics, fluidics and enzymatics, throughput can be multiples of 1 million nucleotides/second, corresponding roughly to 1 human genome equivalent at 1x coverage per hour per instrument, and 1 human genome re-sequenced (at approx. 30x) per day per instrument (equipped with a single camera). The camera takes images of the fluorescently labeled nucleotides, then the dye along with the terminal 3' blocker is chemically removed from the DNA, allowing the next cycle.[53]
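The throughput estimate above can be checked with a few lines of arithmetic. The sketch below simply encodes the stated relationship (A/D conversion rate times number of cameras, divided by pixels per colony); the function and constant names are hypothetical, and the human genome size of 3.2 billion bases is taken from the size comparison table (3,200,000 Kb).

```python
def instrument_throughput(adc_rate_hz, n_cameras, pixels_per_colony):
    """Nucleotides read per second under the idealized model described in the text."""
    return adc_rate_hz * n_cameras / pixels_per_colony

HUMAN_GENOME_BP = 3.2e9  # 3,200,000 Kb, from the genome size table

# 10 MHz A/D conversion, a single camera, about 10 pixels per DNA colony.
rate = instrument_throughput(10e6, 1, 10)        # nucleotides per second
hours_at_1x = HUMAN_GENOME_BP / rate / 3600      # time for 1x coverage
hours_at_30x = 30 * hours_at_1x                  # time for approx. 30x coverage

print(f"{rate:.0f} nt/s, {hours_at_1x:.2f} h at 1x, {hours_at_30x:.1f} h at 30x")
```

Under these assumptions the rate is 1 million nucleotides per second, one human genome equivalent takes a little under an hour, and 30x coverage takes roughly 27 hours, consistent with the "per hour" and "per day" figures quoted above.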


Single cell genomics

Assembly

Overlapping reads form contigs; contigs and gaps of known length form scaffolds.
Paired end reads of next generation sequencing data mapped to a reference genome.
Multiple, fragmented sequence reads must be assembled together on the basis of their overlapping areas.
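As a minimal illustration of how contigs and gaps of known length combine into a scaffold, the Python sketch below joins ordered contigs with runs of the placeholder base N, a common convention for unknown sequence. The function name build_scaffold and its inputs are assumptions for illustration; determining contig order and gap sizes (for example from paired-end reads) is the hard part and is not shown.

```python
def build_scaffold(contigs, gap_lengths):
    """Join ordered contigs, filling each gap of known length with 'N' characters."""
    if not contigs:
        raise ValueError("at least one contig is required")
    if len(gap_lengths) != len(contigs) - 1:
        raise ValueError("need exactly one gap length between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_lengths, contigs[1:]):
        parts.append("N" * gap)  # unknown bases, but the distance is known
        parts.append(contig)
    return "".join(parts)

# Two contigs separated by an estimated gap of three bases.
scaffold = build_scaffold(["ACGT", "GGA"], [3])
print(scaffold)
```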

When are genomes finished?

  • genome standards [58]
  • Coverage?

Challenges

  • challenges reviewed:
    • NGS and, [59]
    • Mammalian assembly and, [60]
  • Assembler performance compared (2011) [61][62]

Algorithms

Scaffolding

De-novo vs. mapping assembly

Finishing

Annotation

The DNA sequence alone is of little value without additional analysis.

sequences, and consists of three main steps:[63]

  1. identifying portions of the genome that do not code for proteins
  2. identifying elements on the genome, a process called gene prediction, and
  3. attaching biological information to these elements.

Automatic annotation tools try to perform these steps in silico, as opposed to manual annotation (a.k.a. curation) which involves human expertise and potential experimental verification.[64] Ideally, these approaches co-exist and complement each other in the same annotation pipeline (also see below).

Traditionally, the basic level of annotation is using

Ensembl) rely on both curated data sources as well as a range of different software tools in their automated genome annotation pipeline.[65] Structural annotation consists of the identification of genomic elements, primarily ORFs and their localisation, or gene structure. Functional annotation consists of attaching biological information to genomic elements.
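To give a flavour of the structural annotation step, the following Python sketch locates simple open reading frames (ORFs) in the three forward reading frames of a DNA string. It is a deliberately naive illustration, not the method used by any annotation pipeline named here: the function name find_orfs and the min_codons parameter are hypothetical, only the standard stop codons are considered, and the reverse strand, introns and statistical gene models are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code

def find_orfs(seq, min_codons=2):
    """Return sorted (start, end) pairs of ATG...stop stretches on the forward strand.

    Positions are 0-based and end is exclusive; min_codons counts the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # look for the next ORF in this frame
    return sorted(orfs)

dna = "CCATGAAATAGGG"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

Functional annotation would then attach biological information to each reported interval, for example by comparing its translation against curated protein databases.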

Sequencing pipelines and databases

Genome analysis tools in Integrated Microbial Genomes (v. 2.9) pipeline.

The need for reproducibility and efficient management of the large amounts of data associated with genome projects means that computational pipelines have important applications in genomics.[66]

Research areas

Functional genomics

Functional genomics is a field of

DNA sequence or structures. Functional genomics attempts to answer questions about the function of DNA at the levels of genes, RNA transcripts, and protein products. A key characteristic of functional genomics studies is their genome-wide approach to these questions, generally involving high-throughput methods rather than a more traditional “gene-by-gene” approach.

A major branch of genomics is still concerned with sequencing the genomes of various organisms, but the knowledge of full genomes has created the possibility for the field of functional genomics, mainly concerned with patterns of gene expression under various conditions. The most important tools here are microarrays and bioinformatics.

Evolutionary genomics

Structural genomics

An example of a protein structure determined by the Midwest Center for Structural Genomics.

Structural genomics seeks to describe the

3-dimensional structure of every protein encoded by a given genome.[67][68] This genome-based approach allows for a high-throughput method of structure determination by a combination of experimental and modeling approaches. The principal difference between structural genomics and traditional structural prediction is that structural genomics attempts to determine the structure of every protein encoded by the genome, rather than focusing on one particular protein. With full-genome sequences available, structure prediction can be done more quickly through a combination of experimental and modeling approaches, especially because the availability of a large number of sequenced genomes and previously solved protein structures allows scientists to model protein structure on the structures of previously solved homologs. Structural genomics involves taking a large number of approaches to structure determination, including experimental methods using genomic sequences, modeling-based approaches based on sequence or structural homology to a protein of known structure, and approaches based on chemical and physical principles for a protein with no homology to any known structure. As opposed to traditional structural biology, the determination of a protein structure through a structural genomics effort often (but not always) comes before anything is known regarding the protein function. This raises new challenges in structural bioinformatics, i.e. determining protein function from its 3D structure.[69]

Epigenomics

Epigenomics is the study of the complete set of

epigenetic modifications on the genetic material of a cell, known as the epigenome.[70] Epigenetic modifications are reversible modifications on a cell’s DNA or histones that affect gene expression without altering the DNA sequence (Russell 2010 p. 475). Two of the best-characterized epigenetic modifications are DNA methylation and histone modification. Epigenetic modifications play an important role in gene expression and regulation, and are involved in numerous cellular processes such as differentiation/development and tumorigenesis.[70] The study of epigenetics on a global level has been made possible only recently through the adaptation of genomic high-throughput assays.[71]

Metagenomics

Environmental Shotgun Sequencing (ESS) is a key technique in metagenomics. (A) Sampling from habitat; (B) filtering particles, typically by size; (C) Lysis and DNA extraction; (D) cloning and library construction; (E) sequencing the clones; (F) sequence assembly into contigs and scaffolds.

Sanger sequencing or massively parallel pyrosequencing to get largely unbiased samples of all genes from all the members of the sampled communities.[73] Because of its power to reveal the previously hidden diversity of microscopic life, metagenomics offers a powerful lens for viewing the microbial world that has the potential to revolutionize understanding of the entire living world.[74][75]

Study systems

Viruses and bacteriophages

phage evolution. Bacteriophage genome sequences can be obtained through direct sequencing of isolated bacteriophages, but can also be derived as part of microbial genomes. Analysis of bacterial genomes has shown that a substantial amount of microbial DNA consists of prophage sequences and prophage-like elements.[76] A detailed database mining of these sequences offers insights into the role of prophages in shaping the bacterial genome.[77][78]

Microbes

[79] [80]

Cyanobacteria

At present there are 24

Acaryochloris and Prochloron, the N2-fixing filamentous cyanobacteria Nodularia spumigena, Lyngbya aestuarii and Lyngbya majuscula, as well as bacteriophages infecting marine cyanobacteria. Thus, the growing body of genome information can also be tapped in a more general way to address global problems by applying a comparative approach. Some new and exciting examples of progress in this field are the identification of genes for regulatory RNAs, insights into the evolutionary origin of photosynthesis, or estimation of the contribution of horizontal gene transfer to the genomes that have been analyzed.[81]

Animals

Human genomics

Applications of genomics

Genomics has provided applications in many fields, including

social sciences.[37]

Genomic medicine

[82]

[83] [84] [85] [86]

Synthetic biology and bioengineering

[87]

Social aspects

'Astrological' genomics

[88]

Race and genomics

[89][90][91]

'Omics

See also

References

  1. ^ National Human Genome Research Institute (2010-11-08). "A Brief Guide to Genomics". Genome.gov. Retrieved 2011-12-03.
  2. .
  3. ^ .
  4. ^ National Human Genome Research Institute (2010-11-08). "FAQ About Genetic and Genomic Science". Genome.gov. Retrieved 2011-12-03.
  5. ^ "Genome, n.". Oxford English Dictionary (Third ed.). Oxford University Press. 2008. Retrieved 2012-12-01. (subscription required)
  6. PMID 18166670.
  7. . Retrieved 2012-06-18.
  8. PMID 14299636.
  9. S2CID 40989800.
  10. PMID 5330357.
  11. S2CID 4153893.
  12. S2CID 4289674.
  13. S2CID 1634424. Retrieved 2012-12-20.
  14. ISBN 0071243208, 9780071243209.
  15. ^ Sanger, F. (1980), Nobel lecture: Determination of nucleotide sequences in DNA (PDF), Nobelprize.org, retrieved 2010-10-18
  16. ^ S2CID 4206886.
  17. . Retrieved 2012-12-20.
  18. .
  19. .
  20. ^ a b Darden, Lindley (2010). "Molecular Biology". In Edward N. Zalta (ed.). The Stanford Encyclopedia of Philosophy (Fall 2010 ed.). Retrieved 2012-12-20.
  21. S2CID 4355527. (subscription required)
  22. .
  23. .
  24. .
  25. PMID 7542800.
  26. . (subscription required)
  27. ^ "Complete genomes: Viruses". NCBI. 2011-11-17. Retrieved 2011-11-18.
  28. ^ "Genome Project Statistics". Entrez Genome Project. 2011-10-07. Retrieved 2011-11-18.
  29. ISSN 0362-4331. Retrieved 2012-12-21.
  30. .
  31. ^ "Human gene number slashed". BBC. 2004-10-20. Retrieved 2012-12-21.
  32. S2CID 21797344.
  33. ^ National Human Genome Research Institute (2004-07-14). "Dog Genome Assembled: Canine Genome Now Available to Research Community Worldwide". Genome.gov. Retrieved 2012-01-20.
  34. ^ .
  35. .
  36. .
  37. ^ ISBN 9780226172958, 0226172953.
  38. .
  39. ISBN 0028656067.
  40. .
  41. ^ a b Monya Baker (2012-09-14). "Benchtop sequencers ship off" (Blog). Nature News Blog. Retrieved 2012-12-22.
  42. PMID 22827831.
  43. ^ PMID 461197.
  44. .
  45. PMID 1100841.
  46. ^ .
  47. ^ a b Illumina, Inc. (2012-02-28). An Introduction to Next-Generation Sequencing Technology (PDF). San Diego, California, USA: Illumina, Inc. p. 12. Retrieved 2012-12-28.
  48. S2CID 25688677.
  49. PMID 16468433.
  50. .
  51. .
  52. ^ Kawashima, Eric H. (2005-05-12), Method of nucleic acid amplification, retrieved 2012-12-22
  53. PMID 18576944.
  54. ^ Gilbert, David (2009-04-29). "DNA of Uncultured Organisms Sequenced Using Novel Single-Cell Approach" (Press Release). DOE Joint Genome Institute. Retrieved 2012-12-23.
  55. PMID 19390573.
  56. .
  57. ^ Stein, Richard A. (2009-06-01). "Single-Cell Genomics Clarifies Big Picture". Genetic Engineering & Biotechnology News. Vol. 29, no. 11. Retrieved 2012-12-23.
  58. PMID 19815760.
  59. .
  60. ISBN 0470849746, 9780470849743, 047001153X, 9780470011539. Retrieved 2012-12-23.
  61. .
  62. .
  63. .
  64. .
  65. .
  66. .
  67. PMID 17349043.
  68. .
  69. . Retrieved 2012-12-07.
  70. ^ .
  71. .
  72. ^ Hugenholtz, Philip; Goebel, Brett M.; Pace, Norman R. (1 September 1998). "Impact of Culture-Independent Studies on the Emerging Phylogenetic View of Bacterial Diversity". J. Bacteriol. 180 (18): 4765–74. PMID 9733676.
  73. ^ Eisen, JA (2007). "Environmental Shotgun Sequencing: Its Potential and Challenges for Studying the Hidden World of Microbes". PLOS Biology. 5 (3): e82. PMID 17355177.
  74. ^ Marco, D, ed. (2010). Metagenomics: Theory, Methods and Applications. Caister Academic Press.
  75. ^ Marco, D, ed. (2011). Metagenomics: Current Innovations and Future Trends.
  76. .
  77. .
  78. .
  79. ^ Office of Science (2005-08). Genomics:GTL Roadmap (PDF). US Department of Energy.
  80. ^ Microbiology in the 21st Century: Where Are We and Where Are We Going?. American Academy of Microbiology. 2004.
  81. .
  82. . Retrieved 2012-12-07.
  83. . Retrieved 2012-12-07.
  84. ProQuest 914354372.
  85. . Retrieved 2012-12-07.
  86. . Retrieved 2012-12-07.
  87. ISBN 9780465021758, 0465021751.
  88. .
  89. .
  90. S2CID 32192135. Retrieved 2012-12-07.
  91. . Retrieved 2012-12-07.

External links