International HapMap Project
The International HapMap Project was an organization that aimed to develop a haplotype map (HapMap) of the human genome, to describe the common patterns of human genetic variation. HapMap is used to find genetic variants affecting health, disease and responses to drugs and environmental factors. The information produced by the project is made freely available for research.
The International HapMap Project was a collaboration among researchers at academic centers, non-profit biomedical research groups and private companies in Canada, China (including Hong Kong), Japan, Nigeria, the United Kingdom, and the United States. It officially started with a meeting on 27 to 29 October 2002, and was expected to take about three years. It was carried out in three phases. The complete data obtained in Phase I were published on 27 October 2005.[1] The analysis of the Phase II dataset was published in October 2007.[2] The Phase III dataset was released in spring 2009, and the publication presenting the final results was published in September 2010.[3]
Background
Unlike with the
Although any two unrelated people share about 99.5% of their
Each person has two copies of all
The alleles of nearby SNPs on a single chromosome are correlated. Specifically, if the allele of one SNP for a given individual is known, the alleles of nearby SNPs can often be predicted, a process known as genotype imputation.[7] This is because each SNP arose in evolutionary history as a single point mutation, and was then passed down on the chromosome surrounded by other, earlier, point mutations. SNPs that are separated by a large distance on the chromosome are typically not very well correlated, because recombination occurs in each generation and mixes the allele sequences of the two chromosomes. A sequence of consecutive alleles on a particular chromosome is known as a haplotype.[8]
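The correlation between nearby SNPs described above is usually quantified as linkage disequilibrium, and one standard measure is r². The following is a minimal sketch, assuming phased haplotypes coded as strings of '0'/'1' alleles over the same ordered SNPs; the function names and the six-haplotype panel are hypothetical. The `impute_allele` helper is only a toy illustration of the idea behind imputation (predict an untyped allele from a correlated one); real imputation software uses probabilistic models over many SNPs at once.

```python
from collections import Counter


def ld_r2(haplotypes, i, j):
    """Squared correlation r^2 between biallelic SNPs i and j.

    Each haplotype is a string of '0'/'1' alleles, one character per SNP,
    all describing the same ordered set of SNPs on one chromosome copy.
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n  # frequency of allele 1 at SNP i
    p_b = sum(h[j] == "1" for h in haplotypes) / n  # frequency of allele 1 at SNP j
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0


def impute_allele(reference, tag, target, observed):
    """Toy imputation: guess the allele at `target` given allele `observed` at `tag`.

    Returns the allele seen most often at `target` among reference haplotypes
    that carry `observed` at `tag`, or None if no haplotype matches.
    """
    matches = [h[target] for h in reference if h[tag] == observed]
    if not matches:
        return None
    return Counter(matches).most_common(1)[0][0]


# Hypothetical reference panel: 6 haplotypes over 3 SNPs.
panel = ["000", "000", "110", "110", "111", "011"]
print(round(ld_r2(panel, 0, 1), 3))        # -> 0.5
print(impute_allele(panel, 0, 1, "1"))     # -> 1
```

An r² of 1 would mean the allele at one SNP predicts the other perfectly; values near 0 correspond to the weakly correlated, distant SNPs that recombination has mixed.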
To find the genetic factors involved in a particular disease, one can proceed as follows. First, a region of interest in the genome is identified, possibly from earlier inheritance studies. In this region one locates a set of tag SNPs from the HapMap data; these are SNPs that are highly correlated with all the other SNPs in the region. Next, these tag SNPs are genotyped in several individuals, some with the disease and some without. From the tag SNP genotypes, genotype imputation can then infer (impute) the other SNPs, and thus the entire haplotype, with high confidence. Finally, comparing the two groups shows the likely locations and haplotypes that are involved in the disease.
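Two steps of this workflow can be sketched in code: choosing tag SNPs and comparing cases with controls. The sketch below is illustrative only, assuming a precomputed pairwise r² matrix and an r² threshold of 0.8 (both hypothetical choices). It uses a simple greedy tagging heuristic, which is one common approach rather than the project's prescribed method, and a Pearson chi-square test on allele counts, which is one standard case-control test among several.

```python
import math


def greedy_tags(r2, threshold=0.8):
    """Pick tag SNPs so that every SNP has r^2 >= threshold with some tag.

    r2 is a symmetric matrix (list of lists) with 1.0 on the diagonal,
    so each SNP can always tag itself and the loop terminates.
    """
    n = len(r2)
    untagged = set(range(n))
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs.
        best = max(range(n), key=lambda s: sum(r2[s][t] >= threshold for t in untagged))
        tags.append(best)
        untagged -= {t for t in untagged if r2[best][t] >= threshold}
    return tags


def allelic_chi2(case_counts, control_counts):
    """Pearson chi-square (1 degree of freedom) on a 2x2 table of allele counts.

    Each argument is (count of allele A, count of allele a). Returns (chi2, p).
    """
    table = [list(case_counts), list(control_counts)]
    total = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [table[0][k] + table[1][k] for k in range(2)]
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row[r] * col[c] / total
            chi2 += (table[r][c] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Hypothetical region of 4 SNPs: SNPs 0-2 are strongly correlated, SNP 3 is not.
r2 = [
    [1.0, 0.9, 0.85, 0.1],
    [0.9, 1.0, 0.95, 0.2],
    [0.85, 0.95, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
print(greedy_tags(r2))  # -> [0, 3]

# Hypothetical allele counts at one tag SNP: (A, a) in cases and controls.
chi2, p = allelic_chi2((60, 40), (40, 60))
print(chi2, round(p, 4))  # -> 8.0 0.0047
```

Here two tag SNPs stand in for four, which is the saving that makes genotyping tag SNPs cheaper than genotyping every SNP in the region.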
Samples used
All samples were collected through a community engagement process with appropriate informed consent. The community engagement process was designed to identify and attempt to respond to culturally specific concerns and give participating communities input into the informed consent and sample collection processes.[9]
In Phase III, 11 global ancestry groups were assembled: ASW (African ancestry in Southwest USA); CEU (Utah residents with Northern and Western European ancestry from the CEPH collection); CHB (Han Chinese in Beijing, China); CHD (Chinese in Metropolitan Denver, Colorado); GIH (Gujarati Indians in Houston, Texas); JPT (Japanese in Tokyo, Japan); LWK (Luhya in Webuye, Kenya); MEX (Mexican ancestry in Los Angeles, California); MKK (Maasai in Kinyawa, Kenya); TSI (Tuscans in Italy); YRI (Yoruba in Ibadan, Nigeria).[10]
Phase | ID | Population |
---|---|---|
I/II | CEU | Utah residents with Northern and Western European ancestry from the CEPH collection |
I/II | CHB | Han Chinese in Beijing, China |
I/II | JPT | Japanese in Tokyo, Japan |
I/II | YRI | Yoruba in Ibadan, Nigeria |
III | ASW | African ancestry in the Southwest USA |
III | CHD | Chinese in metropolitan Denver, CO, United States |
III | GIH | Gujarati Indians in Houston, TX, United States |
III | LWK | Luhya in Webuye, Kenya |
III | MKK | Maasai in Kinyawa, Kenya |
III | MXL | Mexican ancestry in Los Angeles, CA, United States |
III | TSI | Tuscans in Italy |
Three combined panels have also been created, which allow better identification of SNPs in groups outside the nine homogeneous samples: CEU+TSI (combined panel of Utah residents with Northern and Western European ancestry from the CEPH collection and Tuscans in Italy); JPT+CHB (combined panel of Japanese in Tokyo, Japan and Han Chinese in Beijing, China); and JPT+CHB+CHD (combined panel of Japanese in Tokyo, Japan, Han Chinese in Beijing, China and Chinese in Metropolitan Denver, Colorado). CEU+TSI, for instance, is a better model of British individuals than is CEU alone.[10]
Scientific strategy
It was expensive in the 1990s to sequence patients’ whole genomes. So the
For Phase I, one common SNP was genotyped every 5,000 bases. Overall, more than one million SNPs were genotyped. The genotyping was carried out by 10 centres using five different genotyping technologies. Genotyping quality was assessed by using duplicate or related samples, and by periodic quality checks in which centres had to genotype common sets of SNPs.
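Quality checks based on duplicate and related samples reduce to simple comparisons of genotype calls. The sketch below is a hypothetical illustration, not the consortium's actual pipeline: it assumes genotypes coded as 0, 1 or 2 copies of a reference allele (None for a failed call), measures agreement between two runs on the same sample, and counts Mendelian inconsistencies in parent-offspring trios, which is one way related samples can expose genotyping errors.

```python
def concordance(calls_a, calls_b):
    """Fraction of SNPs at which two genotype runs on the same sample agree.

    Genotypes are coded as 0, 1 or 2 copies of a reference allele; None
    marks a failed call, and SNPs missing in either run are skipped.
    """
    pairs = [(a, b) for a, b in zip(calls_a, calls_b) if a is not None and b is not None]
    if not pairs:
        return None
    return sum(a == b for a, b in pairs) / len(pairs)


def mendel_consistent(child, mother, father):
    """True if the child's genotype could be inherited from these parents."""
    def transmissible(g):
        # Copies of the reference allele (0 or 1) that a parent with genotype g can pass on.
        return {0: {0}, 1: {0, 1}, 2: {1}}[g]
    return any(m + f == child for m in transmissible(mother) for f in transmissible(father))


def mendel_errors(trio_calls):
    """Count SNPs whose (child, mother, father) genotypes violate Mendelian inheritance."""
    return sum(
        1 for c, m, f in trio_calls
        if None not in (c, m, f) and not mendel_consistent(c, m, f)
    )


# Hypothetical calls from two runs on one DNA sample.
print(concordance([0, 1, 2, 1, None], [0, 1, 1, 1, 2]))  # -> 0.75
# Hypothetical trio calls at four SNPs: (child, mother, father).
print(mendel_errors([(1, 0, 2), (2, 0, 1), (0, 1, 1), (2, 2, 2)]))  # -> 1
```

A child homozygous for an allele that one parent does not carry, as in the second trio, cannot arise by normal inheritance, so such SNPs point to a genotyping error (or a sample mix-up) somewhere in the trio.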
The Canadian team was led by
To obtain enough SNPs to create the Map, the Consortium funded a large re-sequencing project to discover millions of additional SNPs. These were submitted to the public dbSNP database. As a result, by August 2006, the database included more than ten million SNPs, and more than 40% of them were known to be polymorphic. By comparison, at the start of the project, fewer than 3 million SNPs were identified, and no more than 10% of them were known to be polymorphic.
During Phase II, more than two million additional SNPs were genotyped throughout the genome by David R. Cox, Kelly A. Frazer and others at Perlegen Sciences and 500,000 by the company Affymetrix.
Data access
All of the data generated by the project, including SNP frequencies,
Publications
- International HapMap Consortium (2003). "The International HapMap Project". Nature. 426 (6968): 789–796. S2CID 4387110.
- International HapMap Consortium (2004). "Integrating ethics and science in the International HapMap Project". Nature Reviews Genetics. 5 (6): 467–475. PMID 15153999.
- International HapMap Consortium (2005). "A haplotype map of the human genome". Nature. 437 (7063): 1299–1320. PMID 16255080.
- International HapMap Consortium (2007). "A second generation human haplotype map of over 3.1 million SNPs". Nature. 449 (7164): 851–861. PMID 17943122.
- International HapMap 3 Consortium (2010). "Integrating common and rare genetic variation in diverse human populations". Nature. 467 (7311): 52–58. PMID 20811451.
- Deloukas P, Bentley D (2004). "The HapMap project and its application to genetic studies of drug response". The Pharmacogenomics Journal. 4 (2): 88–90. PMID 14676823.
- Thorisson GA, Smith AV, Krishnan L, Stein LD (2005). "The International HapMap Project Web site". Genome Research. 15 (11): 1592–1593. PMID 16251469.
- Terwilliger JD, Hiekkalinna T (2006). "An utter refutation of the 'Fundamental Theorem of the HapMap'". European Journal of Human Genetics. 14 (4): 426–437. PMID 16479260.
- Secko, David (2005). "Phase I of the HapMap Complete". The Scientist.
See also
- Genealogical DNA test
- The 1000 Genomes Project
- Population groups in biomedicine
- Human Variome Project
- Human genetic variation
References
- PMID 16255080.
- PMID 17943122.
- PMID 20811451.
- PMID 32753378.
- "Allele". Genome.gov. National Human Genome Research Institute.
- S2CID 8151693.
- PMID 35046990.
- "Haplotype". Genome.gov. National Human Genome Research Institute. Retrieved 25 June 2022.
- S2CID 10844405.
- International HapMap 3 Consortium et al. (2010). "Integrating common and rare genetic variation in diverse human populations". Nature. 467: 52–58. doi
- PMID 22155605.
- PMID 16251469.