Microsatellite
A microsatellite is a tract of repetitive
Microsatellites and their longer cousins, the minisatellites, together are classified as VNTR (variable number of tandem repeats) DNA. The name "satellite" DNA refers to the early observation that centrifugation of genomic DNA in a test tube separates a prominent layer of bulk DNA from accompanying "satellite" layers of repetitive DNA.[5]
They are widely used for DNA profiling in cancer diagnosis, in kinship analysis (especially paternity testing) and in forensic identification. They are also used in genetic linkage analysis to locate a gene or a mutation responsible for a given trait or disease. Microsatellites are also used in population genetics to measure levels of relatedness between subspecies, groups and individuals.
History
Although the first microsatellite was characterised in 1984 at the University of Leicester by Weller, Jeffreys and colleagues as a polymorphic GGAT repeat in the human myoglobin gene, the term "microsatellite" was introduced later, in 1989, by Litt and Luty.[1] The increasing availability of DNA amplification by PCR at the beginning of the 1990s triggered a large number of studies using the amplification of microsatellites as genetic markers for forensic medicine, for paternity testing, and for positional cloning to find the gene underlying a trait or disease. Prominent early applications include the identifications by microsatellite genotyping of the eight-year-old skeletal remains of a British murder victim (Hagelberg et al. 1991), and of the Auschwitz concentration camp doctor Josef Mengele who escaped to South America following World War II (Jeffreys et al. 1992).[1]
Structures, locations, and functions
A microsatellite is a tract of tandemly repeated (i.e. adjacent) DNA motifs that range in length from one to six or up to ten nucleotides (the exact definition and delineation to the longer minisatellites varies from author to author),[1][6] and are typically repeated 5–50 times. For example, the sequence TATATATATA is a dinucleotide microsatellite, and GTCGTCGTCGTCGTC is a trinucleotide microsatellite (with A being Adenine, G Guanine, C Cytosine, and T Thymine). Repeat units of four and five nucleotides are referred to as tetra- and pentanucleotide motifs, respectively. Most eukaryotes have microsatellites, with the notable exception of some yeast species. Microsatellites are distributed throughout the genome.[7][1][8] The human genome for example contains 50,000–100,000 dinucleotide microsatellites, and lesser numbers of tri-, tetra- and pentanucleotide microsatellites.[9] Many are located in non-coding parts of the human genome and therefore do not produce proteins, but they can also be located in regulatory regions and coding regions.
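The definition above can be illustrated with a small search routine. The following is a minimal sketch, not a published tool: the function name `find_microsatellites` and its default cut-offs (motifs of one to six nucleotides, at least five repeats) are assumptions taken from the ranges quoted in the text, and only perfect (uninterrupted) repeats are detected.

```python
import re


def find_microsatellites(seq, min_unit=1, max_unit=6, min_repeats=5):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    The cut-offs are illustrative assumptions; authors differ on where
    microsatellites end and minisatellites begin.
    """
    seq = seq.upper()
    # Lazy quantifier: the shortest motif that explains the tract is tried
    # first, so TATATATATA is reported as (TA)5 rather than a longer unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# The two example sequences from the text.
print(find_microsatellites("TATATATATA"))
print(find_microsatellites("GTCGTCGTCGTCGTC"))
```

Dedicated repeat finders also handle imperfect repeats and much longer motifs, which this sketch deliberately ignores.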
Microsatellites in non-coding regions may not have any specific function, and therefore might not be
Mutation mechanisms and mutation rates
Unlike
One proposed cause of such length changes is replication slippage, caused by mismatches between DNA strands while being replicated during meiosis.[17] DNA polymerase, the enzyme responsible for reading DNA during replication, can slip while moving along the template strand and continue at the wrong nucleotide. DNA polymerase slippage is more likely to occur when a repetitive sequence (such as CGCGCG) is replicated. Because microsatellites consist of such repetitive sequences, DNA polymerase may make errors at a higher rate in these sequence regions. Several studies have found evidence that slippage is the cause of microsatellite mutations.[18][19] Typically, slippage in each microsatellite occurs about once per 1,000 generations.[20] Thus, slippage changes in repetitive DNA are three orders of magnitude more common than point mutations in other parts of the genome.[21] Most slippage results in a change of just one repeat unit, and slippage rates vary for different allele lengths and repeat unit sizes,[3] and within different species.[22][23][24] If there is a large size difference between individual alleles, then there may be increased instability during recombination at meiosis.[21]
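The slippage process described above can be sketched as a toy simulation. This is an illustration under stated assumptions rather than a fitted model: it assumes a symmetric single-step model (each slippage event adds or removes exactly one repeat unit with equal probability) and a constant rate of about once per 1,000 generations, as quoted in the text. The function name `simulate_slippage` and its parameters are hypothetical.

```python
import random


def simulate_slippage(repeats, generations, rate=1e-3, rng=None, min_repeats=1):
    """Follow one allele's repeat count through a number of generations.

    Assumptions (not from the source): every slippage event changes the
    tract by exactly one repeat unit, gains and losses are equally likely,
    and the rate does not depend on allele length.
    """
    rng = rng or random.Random()
    for _ in range(generations):
        if rng.random() < rate:
            step = rng.choice((-1, 1))
            repeats = max(min_repeats, repeats + step)
    return repeats


# Example: a 20-repeat allele followed for 10,000 generations.
final_length = simulate_slippage(20, 10_000, rng=random.Random(42))
print(final_length)
```

As the text notes, real slippage rates vary with allele length, repeat unit size and species, so a constant-rate model like this one is only a starting point.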
Another possible cause of microsatellite mutations is point mutation, in which only one nucleotide is incorrectly copied during replication. A study comparing human and primate genomes found that most changes in repeat number in short microsatellites appear to be due to point mutations rather than slippage.[25]
Microsatellite mutation rates
Direct estimates of microsatellite mutation rates have been made in numerous organisms, from insects to humans. In the desert locust Schistocerca gregaria, the microsatellite mutation rate was estimated at 2.1 × 10⁻⁴ per generation per locus.[26] The microsatellite mutation rate in human male germ lines is five to six times higher than in female germ lines and ranges from 0 to 7 × 10⁻³ per locus per gamete per generation.[3] In the nematode Pristionchus pacificus, the estimated microsatellite mutation rate ranges from 8.9 × 10⁻⁵ to 7.5 × 10⁻⁴ per locus per generation.[27]
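To give a feel for what per-locus rates of this size mean in practice, the following sketch computes the chance that at least one locus in a panel mutates. It assumes that loci mutate independently and share the same rate, which is a simplification; the function name `prob_any_mutation` and the panel size of 20 loci are hypothetical.

```python
def prob_any_mutation(rate_per_locus, n_loci, generations=1):
    """Probability that at least one of n_loci mutates.

    Assumes independent loci with an identical per-generation rate.
    """
    return 1.0 - (1.0 - rate_per_locus) ** (n_loci * generations)


# Locust estimate from the text, applied to a hypothetical 20-locus panel.
print(prob_any_mutation(2.1e-4, 20))
```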
Microsatellite mutation rates vary with base position relative to the microsatellite, repeat type, and base identity.
Biological effects of microsatellite mutations
Many microsatellites are located in
Effects on proteins
In mammals, 20–40% of proteins contain repeating sequences of amino acids encoded by short sequence repeats.[31] Most of the short sequence repeats within protein-coding portions of the genome have a repeating unit of three nucleotides, since a change in the number of such units does not cause a frame-shift.[32] Each trinucleotide repeating sequence is transcribed and then translated into a repeating series of the same amino acid. In yeasts, the most common repeated amino acids are glutamine, glutamic acid, asparagine, aspartic acid and serine.
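Both points in this paragraph can be checked with a short sketch: an in-frame trinucleotide repeat translates to a run of one amino acid, and only length changes that are a multiple of three nucleotides preserve the reading frame. The helper names `translate` and `preserves_frame` are hypothetical; the codon table is the standard genetic code, and the example assumes the repeat starts in frame.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding-strand DNA sequence read from its first base."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def preserves_frame(motif_length, repeat_change):
    """True if gaining or losing repeat_change units keeps the reading frame."""
    return (motif_length * repeat_change) % 3 == 0


print(translate("CAG" * 5))      # a CAG repeat encodes a run of glutamine (Q)
print(preserves_frame(3, 1))     # one trinucleotide unit gained or lost
print(preserves_frame(2, 1))     # one dinucleotide unit gained or lost
```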
Mutations in these repeating segments can affect the physical and chemical properties of proteins, with the potential for producing gradual and predictable changes in protein action.
Effects on gene regulation
Length changes of microsatellites within promoters and other cis-regulatory regions can change gene expression quickly, between generations. The human genome contains many (>16,000) short sequence repeats in regulatory regions, which provide 'tuning knobs' on the expression of many genes.[30][41]
Length changes in bacterial SSRs can affect
In Ewing sarcoma (a type of painful bone cancer in young humans), a point mutation has created an extended GGAA microsatellite that binds a transcription factor, which in turn activates the EGR2 gene that drives the cancer.[43] In addition, other GGAA microsatellites may influence the expression of genes that contribute to the clinical outcome of Ewing sarcoma patients.[44]
Effects within introns
Microsatellites within introns also influence phenotype, through means that are not currently understood. For example, a GAA triplet expansion in the first intron of the X25 gene appears to interfere with transcription, and causes Friedreich's ataxia.[45] Tandem repeats in the first intron of the Asparagine synthetase gene are linked to acute lymphoblastic leukaemia.[46] A repeat polymorphism in the fourth intron of the NOS3 gene is linked to hypertension in a Tunisian population.[47] Reduced repeat lengths in the EGFR gene are linked with osteosarcomas.[48]
An archaic form of splicing preserved in zebrafish is known to use microsatellite sequences within intronic mRNA for the removal of introns in the absence of U2AF2 and other splicing machinery. It is theorized that these sequences form highly stable cloverleaf configurations that bring the 3' and 5' intron splice sites into close proximity, effectively replacing the spliceosome. This method of RNA splicing is believed to have diverged from human evolution at the formation of tetrapods and to represent an artifact of an RNA world.[49]
Effects within transposons
Almost 50% of the human genome is contained in various types of transposable elements (also called transposons, or 'jumping genes'), and many of them contain repetitive DNA.[50] It is probable that short sequence repeats in those locations are also involved in the regulation of gene expression.[51]
Applications
Microsatellites are used for assessing chromosomal DNA deletions in cancer diagnosis. Microsatellites are widely used for
Cancer diagnosis
In
Forensic and medical fingerprinting
Microsatellite analysis became popular in the field of
The microsatellites in use today for forensic analysis are all tetra- or penta-nucleotide repeats, as these give a high degree of error-free data while being short enough to survive degradation in non-ideal conditions. Even shorter repeat sequences would tend to suffer from artifacts such as PCR stutter and preferential amplification, while longer repeat sequences would suffer more highly from environmental degradation and would amplify less well by
Kinship analysis (paternity testing)
Genetic linkage analysis
During the 1990s and the first several years of the 2000s, microsatellites were the workhorse genetic markers for genome-wide scans to locate any gene responsible for a given phenotype or disease, using segregation observations across generations of a sampled pedigree. Although the rise of higher-throughput and cost-effective single-nucleotide polymorphism (SNP) platforms led to the era of the SNP for genome scans, microsatellites remain highly informative measures of genomic variation for linkage and association studies. Their continued advantage lies in their greater allelic diversity compared with biallelic SNPs; microsatellites can therefore differentiate alleles within a SNP-defined linkage disequilibrium block of interest. In this way, microsatellites have led to the discovery of genes associated with type 2 diabetes (TCF7L2) and prostate cancer (the 8q21 region).[6][64]
Population genetics
Microsatellites were popularized in
Plant breeding
Analysis
Repetitive DNA is not easily analysed by next-generation DNA sequencing methods, because some technologies struggle with homopolymeric tracts. A variety of software approaches have been created for the analysis of raw next-generation DNA sequencing reads to determine the genotype and variants at repetitive loci.[75][76] Microsatellites can be analysed and verified by established PCR amplification and amplicon size determination, sometimes followed by Sanger DNA sequencing.
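The arithmetic behind amplicon size determination can be sketched as follows. The example assumes that the combined length of the non-repeat sequence in the amplicon (primers plus flanking DNA) is known for the locus; the function name `repeats_from_amplicon` and the example sizes are hypothetical, not values for any real marker.

```python
def repeats_from_amplicon(amplicon_bp, flank_bp, motif_bp):
    """Convert a measured amplicon length into a repeat count.

    Returns (whole_repeats, leftover_bases). A non-zero leftover suggests
    a partial repeat or a measurement problem and needs closer inspection.
    """
    repeat_bp = amplicon_bp - flank_bp
    if repeat_bp < 0:
        raise ValueError("amplicon is shorter than the flanking sequence")
    return divmod(repeat_bp, motif_bp)


# Hypothetical tetranucleotide locus with 160 bp of non-repeat sequence.
print(repeats_from_amplicon(200, 160, 4))
print(repeats_from_amplicon(203, 160, 4))
```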
In forensics, the analysis is performed by extracting
Amplification
Microsatellites can be amplified for identification by the
Design of microsatellite primers
If searching for microsatellite markers in specific regions of a genome, for example within a particular intron, primers can be designed manually. This involves searching the genomic DNA sequence for microsatellite repeats, which can be done by eye or by using automated tools such as RepeatMasker. Once the potentially useful microsatellites are determined, the flanking sequences can be used to design oligonucleotide primers which will amplify the specific microsatellite repeat by PCR.
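The last step, taking the flanking sequences as the basis for primers, can be sketched as below. This is only an illustration of the geometry: the helper names, the fixed flank length and the toy sequence are assumptions, and real primer design also weighs melting temperature, GC content and specificity, which this sketch ignores.

```python
def reverse_complement(dna):
    """Reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def candidate_primers(seq, repeat_start, repeat_end, flank=20):
    """Return naive (forward, reverse) primer candidates around a repeat.

    repeat_start and repeat_end are 0-based, end-exclusive coordinates of
    the repeat tract. The forward candidate is the upstream flank; the
    reverse candidate is the reverse complement of the downstream flank.
    """
    upstream = seq[max(0, repeat_start - flank):repeat_start]
    downstream = seq[repeat_end:repeat_end + flank]
    return upstream, reverse_complement(downstream)


# Toy sequence: a (CA)6 tract between two short, made-up flanks.
toy = "GATTACAGG" + "CA" * 6 + "TTGACCGTA"
print(candidate_primers(toy, 9, 21, flank=5))
```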
Random microsatellite primers can be developed by cloning random segments of DNA from the focal species. These random segments are inserted into a plasmid or bacteriophage vector, which is in turn implanted into Escherichia coli bacteria. Colonies are then developed, and screened with fluorescently labelled oligonucleotide sequences that will hybridize to a microsatellite repeat, if present on the DNA segment. If positive clones can be obtained from this procedure, the DNA is sequenced and PCR primers are chosen from sequences flanking such regions to determine a specific locus. This process involves significant trial and error on the part of researchers, as microsatellite repeat sequences must be predicted and primers that are randomly isolated may not display significant polymorphism.[21][83] Microsatellite loci are widely distributed throughout the genome and can be isolated from semi-degraded DNA of older specimens, as all that is needed is a suitable substrate for amplification through PCR.
More recent techniques involve using oligonucleotide sequences consisting of repeats complementary to repeats in the microsatellite to "enrich" the DNA extracted (microsatellite enrichment). The oligonucleotide probe hybridizes with the repeat in the microsatellite, and the probe/microsatellite complex is then pulled out of solution. The enriched DNA is then cloned as normal, but the proportion of successes will now be much higher, drastically reducing the time required to develop the regions for use. However, which probes to use can be a trial and error process in itself.[84]
ISSR-PCR
ISSR (for inter-simple sequence repeat) is a general term for a genome region between microsatellite loci. The complementary sequences to two neighboring microsatellites are used as PCR primers; the variable region between them gets amplified. The limited length of amplification cycles during PCR prevents excessive replication of overly long contiguous DNA sequences, so the result will be a mix of a variety of amplified DNA strands which are generally short but vary much in length.
Sequences amplified by ISSR-PCR can be used for DNA fingerprinting. Since an ISSR may be a conserved or nonconserved region, this technique is not useful for distinguishing individuals, but rather for phylogeographic analyses or possibly for delimiting species; sequence diversity is lower than in SSR-PCR, but still higher than in actual gene sequences. In addition, microsatellite sequencing and ISSR sequencing are mutually assisting, as one produces primers for the other.
Limitations
Repetitive DNA is not easily analysed by next generation DNA sequencing methods, which struggle with homopolymeric tracts.[85] Therefore, microsatellites are normally analysed by conventional PCR amplification and amplicon size determination. The use of PCR means that microsatellite length analysis is prone to PCR limitations like any other PCR-amplified DNA locus. A particular concern is the occurrence of 'null alleles':
- Occasionally, within a sample of individuals such as in paternity testing casework, a mutation in the DNA flanking the microsatellite can prevent the PCR primer from binding and producing an amplicon (creating a "null allele" in a gel assay). Only one allele is then amplified (from the non-mutated homologous chromosome), and the individual may falsely appear to be homozygous. This can cause confusion in paternity casework. It may then be necessary to amplify the microsatellite using a different set of primers.[21][86] Null alleles are caused especially by mutations in the 3' section of the primer-binding site, where extension commences.
- In species or population analysis, for example in conservation work, PCR primers which amplify microsatellites in one individual or species can work in other species. However, the risk of applying PCR primers across different species is that null alleles become likely, whenever sequence divergence is too great for the primers to bind. The species may then artificially appear to have a reduced diversity. Null alleles in this case can sometimes be indicated by an excessive frequency of homozygotes causing deviations from Hardy-Weinberg equilibrium expectations.
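The homozygote excess mentioned in the last point can be quantified by comparing observed heterozygosity with the value expected under Hardy-Weinberg equilibrium. The sketch below is a minimal illustration for a single locus; the function name `heterozygosity_summary` and the toy genotypes are hypothetical. A positive inbreeding coefficient is only a hint: inbreeding or hidden population structure can produce the same signal as null alleles.

```python
from collections import Counter


def heterozygosity_summary(genotypes):
    """Return (observed_het, expected_het, f_is) for one locus.

    genotypes is a list of (allele_a, allele_b) pairs, e.g. repeat counts.
    expected_het is 1 minus the sum of squared allele frequencies, and
    f_is = 1 - observed/expected is positive when homozygotes are in excess.
    """
    n = len(genotypes)
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n
    expected_het = 1.0 - sum(
        (count / total_alleles) ** 2 for count in allele_counts.values()
    )
    observed_het = sum(1 for a, b in genotypes if a != b) / n
    f_is = 1.0 - observed_het / expected_het if expected_het > 0 else 0.0
    return observed_het, expected_het, f_is


# Toy sample scored entirely as homozygotes, as a null allele might cause.
sample = [(10, 10)] * 5 + [(12, 12)] * 5
print(heterozygosity_summary(sample))
```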
See also
- Genetic marker
- Junk DNA
- List of biological databases
- Long interspersed nucleotide elements
- Microsatellite instability
- Mobile element
- Satellite DNA
- Short interspersed repetitive element
- Simple sequence length polymorphism (SSLP)—a search tool
- Snpstr
- Strbase
- Earth Human STR Allele Frequencies Database
- Transposon
- UgMicroSatdb
References
- ^ PMID 19052325.
- PMID 10899146.
- ^ PMID 9585597.
- ^ Short+Tandem+Repeat at the U.S. National Library of Medicine Medical Subject Headings (MeSH)
- ^ PMID 14456492.
- ^ PMID 23006819.
- .
- .
- ISBN 9780443100451.
- ^ S2CID 26672703.
- PMID 28949426.
- S2CID 4356170.
- PMID 8755922.
- ^ PMID 8621672.
- PMID 9755151.
- ^ PMID 10889045.
- PMID 7888752.
- S2CID 22298567.
- PMID 25694621.
- PMID 8401493.
- ^ PMID 21237902.
- PMID 9724780.
- PMID 17437958.
- PMID 15673565.
- ^ S2CID 1424625.
- S2CID 33307624.
- PMID 22973539.
- PMID 26740567.
- S2CID 6086527.
- ^ PMID 26642241.
- S2CID 11102561.
- PMID 7731957.
- PMID 15716087.
- PMID 15596718.
- S2CID 26718314.
- S2CID 22181414.
- S2CID 25142061.
- PMID 16086015.
- ^ S2CID 11203457.
- PMID 17726525.
- ^ PMID 12411608.
- S2CID 18899853.
- PMID 26214589.
- PMID 31511524.
- PMID 9443873.
- PMID 19054556.
- PMID 19111531.
- S2CID 19472307.
- PMID 26566657.
- ^ Scherer S (2008). A short guide to the human genome. New York: Cold Spring Harbor University Press.
- PMID 18348251.
- PMID 10766185.
- S2CID 22893621.
- PMID 22927958.
- PMID 24778007.
- PMID 9823339.
- PMID 31562520.
- S2CID 32129259.
- S2CID 21655592.
- ^ a b Curtis C, Hereward J (August 29, 2017). "From the crime scene to the courtroom: the journey of a DNA sample". The Conversation.
- PMID 11669214.
- ^ Carracedo A. "DNA Profiling". Archived from the original on 2001-09-27. Retrieved 2010-09-20.
- S2CID 28270630.
- PMID 25824869.
- PMID 23550135.
- S2CID 2984426.
- S2CID 22244000.
- S2CID 3063754.
- PMID 7705646.
- PMID 10331287.
- S2CID 46475635.
- S2CID 10811958.
- PMID 24240810.
- ^ Image by Mikael Häggström, MD, using the following source image: "Figure 1 - available via license: Creative Commons Attribution 4.0 International", from the following article: Sitnik R, Torres MA, Bacal NS, Rebello Pinho JR (2006). "Using PCR for molecular monitoring of post-transplantation chimerism". Einstein. 4 (2). Sao Paulo – via ResearchGate.
- S2CID 213733005.
- S2CID 256019433.
- ^ a b "Technology for Resolving STR Alleles". Retrieved 2010-09-20.
- ^ "The National DNA Database" (PDF). Archived (PDF) from the original on 2010-10-13. Retrieved 2010-09-20.
- ^ "House of Lords Select Committee on Science and Technology Written Evidence". Retrieved 2010-09-20.
- ^ "FBI CODIS Core STR Loci". Retrieved 2010-09-20.
- ^ Butler JM (2005). Forensic DNA Typing: Biology, Technology, and Genetics of STR Markers, Second Edition. New York: Elsevier Academic Press.
- ^ Griffiths AJ, Miller JF, Suzuki DT, Lewontin RC, Gelbart WM (1996). Introduction to Genetic Analysis (5th ed.). New York: W.H. Freeman.
- PMID 21236170.
- ^ Kaukinen KH, Supernault KJ, and Miller KM (2004). "Enrichment of tetranucleotide microsatellite loci from invertebrate species". Journal of Shellfish Research. 23 (2): 621.
- S2CID 214786277.
- PMID 15292911.
Further reading
- Caporale LH (2003). "Natural selection and the emergence of a mutation phenotype: an update of the evolutionary synthesis considering mechanisms that affect genome variation". Annual Review of Microbiology. 57: 467–85. PMID 14527288.
- Kashi Y, et al. (1997). "Simple sequence repeats as a source of quantitative genetic variation". Trends Genet. 13 (2): 74–78. PMID 9055609.
- Kinoshita Y, Saze H, Kinoshita T, Miura A, Soppe WJ, Koornneef M, Kakutani T (January 2007). "Control of FWA gene silencing in Arabidopsis thaliana by SINE-related direct repeats". The Plant Journal. 49 (1): 38–45. PMID 17144899.
- Li YC, Korol AB, Fahima T, Beiles A, Nevo E (December 2002). "Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review". Molecular Ecology. 11 (12): 2453–65. PMID 12453231.
- Li YC, Korol AB, Fahima T, Nevo E (June 2004). "Microsatellites within genes: structure, function, and evolution". Molecular Biology and Evolution. 21 (6): 991–1007. PMID 14963101.
- Mattick JS (October 2003). "Challenging the dogma: the hidden layer of non-protein-coding RNAs in complex organisms". BioEssays. 25 (10): 930–9. PMID 14505360.
- Meagher TR, Vassiliadis C (October 2005). "Phenotypic impacts of repetitive DNA in flowering plants". The New Phytologist. 168 (1): 71–80. PMID 16159322.
- Müller KJ, Romano N, Gerstner O, Garcia-Maroto F, Pozzi C, Salamini F, Rohde W (April 1995). "The barley Hooded mutation caused by a duplication in a homeobox gene intron". Nature. 374 (6524): 727–30. S2CID 4344876.
- Pumpernik D, Oblak B, Borstnik B (January 2008). "Replication slippage versus point mutation rates in short tandem repeats of the human genome". Molecular Genetics and Genomics. 279 (1): 53–61. S2CID 20542422.
- Streelman JT, Kocher TD (2002). "Microsatellite variation associated with prolactin expression and growth of salt-challenged Tilapia". Physiol. Genomics. 9 (1): 1–4. S2CID 8360732.
- Vinces MD, Legendre M, Caldara M, Hagihara M, Verstrepen KJ (May 2009). "Unstable tandem repeats in promoters confer transcriptional evolvability". Science. 324 (5931): 1213–6. PMID 19478187.
External links
- All known disease-causing short tandem repeats
- MicroSatellite DataBase
- Search tools:
- FireMuSat2+ Archived 2014-02-21 at the Wayback Machine
- IMEx Archived 2013-09-14 at the Wayback Machine
- Imperfect SSR Finder Archived 2021-07-23 at the Wayback Machine—find perfect or imperfect SSRs in FASTA sequences.
- JSTRING—Java Search for Tandem Repeats In Genomes
- Microsatellite repeats finder
- MISA—MIcroSAtellite identification tool
- MREPATT
- Mreps
- Phobos—a tandem repeat search tool for perfect and imperfect repeats—the maximum pattern size depends only on computational power
- Poly
- SciRoKo
- SSR Finder
- STAR
- Tandem Repeats Finder
- TandemSWAN
- TRED
- TROLL
- Zebrafish Repeats Archived 2019-09-12 at the Wayback Machine