Watterson estimator
In population genetics, the Watterson estimator is a method for describing the genetic diversity in a population. It was developed by Margaret Wu and G. A. Watterson in the 1970s.[1][2] It is computed by counting the number of polymorphic sites in a sample. It is an estimate of the "population mutation rate" (the product of the effective population size and the neutral mutation rate) from the observed nucleotide diversity of a population: $\theta = 4N_e\mu$,[3] where $N_e$ is the effective population size and $\mu$ is the per-generation mutation rate of the population of interest (Watterson 1975). The assumptions made are that there is a sample of $n$ haploid individuals from the population of interest and that mutations follow the infinite-sites model, so that no site mutates more than once.
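The estimate of $\theta$, often denoted as $\hat{\theta}_w$, is

$$\hat{\theta}_w = \frac{K}{a_n},$$

where $K$ is the number of segregating sites (an example of a segregating site would be a single-nucleotide polymorphism) in the sample and

$$a_n = \sum_{i=1}^{n-1} \frac{1}{i}$$

is the $(n-1)$th harmonic number.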
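As a minimal sketch of the calculation, the snippet below counts segregating sites in a small alignment and divides by the harmonic number of the sample size minus one. The function names and the toy sequences are hypothetical and chosen only for illustration; the code assumes aligned, equal-length haploid sequences with no missing data, and it returns a per-locus (not per-site) estimate.

```python
from typing import Sequence


def harmonic_number(m: int) -> float:
    """Return the m-th harmonic number, 1 + 1/2 + ... + 1/m."""
    return sum(1.0 / i for i in range(1, m + 1))


def count_segregating_sites(seqs: Sequence[str]) -> int:
    """Count alignment columns in which more than one allele is observed."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs: Sequence[str]) -> float:
    """Per-locus Watterson estimate: segregating sites / harmonic number of n - 1."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    k = count_segregating_sites(seqs)
    return k / harmonic_number(n - 1)


# Hypothetical toy alignment of four haploid sequences (illustration only).
sample = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACGTACGT"]
theta_w = watterson_theta(sample)
print(theta_w)
```

Dividing the result by the alignment length would give a per-site estimate, which is the form usually compared across loci of different lengths.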
This estimate is based on coalescent theory. Watterson's estimator is commonly used for its simplicity. When its assumptions are met, the estimator is unbiased and the variance of the estimator decreases with increasing sample size or recombination rate. However, the estimator can be biased by population structure. For example, $\hat{\theta}_w$ is downwardly biased in an exponentially growing population. It can also be biased by violation of the infinite-sites mutational model; if multiple mutations can overwrite one another, Watterson's estimator will be biased downward.
Comparing the value of Watterson's estimator, $\hat{\theta}_w$, to the nucleotide diversity, $\pi$, is the basis of Tajima's D, which allows inference of the evolutionary regime of a given locus.
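The comparison can be sketched as follows: compute the nucleotide diversity as the average number of pairwise differences, compute Watterson's estimate from the number of segregating sites, and take their difference. This is only an illustration under stated assumptions: the function names and toy sequences are hypothetical, the sequences are assumed to be aligned haploid samples without missing data, and the difference shown is just the numerator of Tajima's D. The full statistic also divides by an estimate of that difference's standard deviation, which is omitted here.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Average number of differing sites over all pairs of sequences (per locus)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def watterson_theta(seqs: Sequence[str]) -> float:
    """Segregating sites divided by the harmonic number of (sample size - 1)."""
    n = len(seqs)
    k = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    a_n = sum(1.0 / i for i in range(1, n))
    return k / a_n


# Hypothetical toy alignment of four haploid sequences (illustration only).
sample = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACGTACGT"]

pi = nucleotide_diversity(sample)
theta_w = watterson_theta(sample)

# Numerator of Tajima's D only; the normalising standard deviation is not computed.
d_numerator = pi - theta_w
print(pi, theta_w, d_numerator)
```

Under the standard neutral model both quantities estimate the same parameter, so a difference far from zero in either direction is what the full test statistic is designed to detect.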
See also
- Tajima's D
- Coupon collector's problem
- Ewens sampling formula
References
- Yong, Ed (2019-02-11). "The Women Who Contributed to Science but Were Buried in Footnotes". The Atlantic. Retrieved 2019-02-13.
- PMID 30733376.
- PMID 25595553.
- Watterson, G.A. (1975), "On the number of segregating sites in genetical models without recombination.", Theoretical Population Biology, 7 (2): 256–276, PMID 1145509
- McVean, Gil; Awadalla, Philip; Fearnhead, Paul (2002), "A Coalescent-Based Method for Detecting and Estimating Recombination From Gene Sequences", Genetics, 160, 1231–1241.