F-statistics

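In population genetics, F-statistics (also known as fixation indices) describe the statistically expected level of heterozygosity in a population; more specifically, the expected degree of (usually) a reduction in heterozygosity when compared to Hardy–Weinberg expectation.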

F-statistics can also be thought of as a measure of the correlation between genes drawn at different levels of a (hierarchically) subdivided population. This correlation is influenced by several evolutionary processes, such as genetic drift, founder effect, population bottleneck, genetic hitchhiking, meiotic drive, mutation, gene flow, inbreeding, natural selection, or the Wahlund effect, but F-statistics were originally designed to measure the amount of allelic fixation owing to genetic drift.

The concept of F-statistics was developed during the 1920s by the American geneticist Sewall Wright, who was interested in inbreeding in cattle. However, because complete dominance causes the phenotypes of homozygote dominants and heterozygotes to be the same, it was not until the advent of molecular genetics from the 1960s onwards that heterozygosity in populations could be measured.

$F$ can be used to define effective population size.

Definitions and equations

The measures $F_{IS}$, $F_{ST}$, and $F_{IT}$ are related to the amounts of heterozygosity at various levels of population structure. Together, they are called F-statistics, and are derived from $F$, the inbreeding coefficient. In a simple two-allele system with inbreeding, the genotypic frequencies are:

$$p^{2}(1-F)+pF \text{ for } \mathbf{AA};\quad 2pq(1-F) \text{ for } \mathbf{Aa};\quad \text{and } q^{2}(1-F)+qF \text{ for } \mathbf{aa}.$$

The value for $F$ is found by solving the equation for $F$ using heterozygotes in the above inbred population. This becomes one minus the observed frequency of heterozygotes in a population divided by the expected frequency of heterozygotes at Hardy–Weinberg equilibrium:

$$F = 1 - \frac{\operatorname{O}(f(\mathbf{Aa}))}{\operatorname{E}(f(\mathbf{Aa}))} = 1 - \frac{\text{observed frequency}(\mathbf{Aa})}{\text{expected frequency}(\mathbf{Aa})},$$

where the expected frequency at Hardy–Weinberg equilibrium is given by

$$\operatorname{E}(f(\mathbf{Aa})) = 2pq,$$

where $p$ and $q$ are the allele frequencies of $\mathbf{A}$ and $\mathbf{a}$, respectively. $F$ is also the probability that, at any locus, two alleles from a random individual of the population are identical by descent.
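As a minimal sketch of the definitions above, the following Python snippet builds the three genotype frequencies from an allele frequency and an inbreeding coefficient, then recovers $F$ from the heterozygote frequency alone. The function names and the input values (`p = 0.7`, `F = 0.2`) are illustrative assumptions, not taken from any data set.

```python
def genotype_frequencies(p, F):
    """Return (f_AA, f_Aa, f_aa) for allele frequency p and inbreeding coefficient F."""
    q = 1.0 - p
    f_AA = p * p * (1.0 - F) + p * F
    f_Aa = 2.0 * p * q * (1.0 - F)
    f_aa = q * q * (1.0 - F) + q * F
    return f_AA, f_Aa, f_aa


def inbreeding_coefficient(f_Aa, p):
    """Solve f_Aa = 2pq(1 - F) for F, i.e. F = 1 - observed / expected."""
    q = 1.0 - p
    return 1.0 - f_Aa / (2.0 * p * q)


# Hypothetical inputs chosen only for illustration.
p, F = 0.7, 0.2
f_AA, f_Aa, f_aa = genotype_frequencies(p, F)
total = f_AA + f_Aa + f_aa                    # the three frequencies should sum to 1
recovered_F = inbreeding_coefficient(f_Aa, p)

print(f"f(AA) = {f_AA:.4f}, f(Aa) = {f_Aa:.4f}, f(aa) = {f_aa:.4f}")
print(f"sum = {total:.4f}, recovered F = {recovered_F:.4f}")
```

With a positive $F$, the heterozygote frequency falls below the Hardy–Weinberg value $2pq$, and the deficit is split between the two homozygote classes.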

For example, consider the data from E.B. Ford (1971) on a single population of the scarlet tiger moth:

**Table 1:**

| Genotype | White-spotted ($\mathbf{AA}$) | Intermediate ($\mathbf{Aa}$) | Little spotting ($\mathbf{aa}$) | Total |
| --- | --- | --- | --- | --- |
| Number | 1469 | 138 | 5 | 1612 |

From this, the allele frequencies can be calculated, and the expectation of $f(\mathbf{Aa})$ derived:

$$p = \frac{2 \times \mathrm{obs}(AA) + \mathrm{obs}(Aa)}{2 \times (\mathrm{obs}(AA) + \mathrm{obs}(Aa) + \mathrm{obs}(aa))} = 0.954$$

$$q = 1 - p = 0.046$$

$$F = 1 - \frac{\mathrm{obs}(Aa)/n}{2pq} = 1 - \frac{138/1612}{2(0.954)(0.046)} = 0.023$$
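The same arithmetic can be reproduced in a few lines of Python. This is a sketch using the counts from Table 1; the function name `f_from_counts` is an assumption made for illustration.

```python
def f_from_counts(n_AA, n_Aa, n_aa):
    """Return (p, q, F) from genotype counts at a two-allele locus."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele a
    observed_het = n_Aa / n           # observed heterozygote frequency
    expected_het = 2.0 * p * q        # Hardy-Weinberg expectation
    F = 1.0 - observed_het / expected_het
    return p, q, F


# Counts from Table 1 (scarlet tiger moth, Ford 1971).
p, q, F = f_from_counts(1469, 138, 5)
print(f"p = {p:.3f}, q = {q:.3f}, F = {F:.3f}")
# prints: p = 0.954, q = 0.046, F = 0.023
```

Note that the value 0.023 is obtained when $p$ and $q$ are carried at full precision; substituting the rounded values 0.954 and 0.046 into the last formula gives roughly 0.025 instead.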

The different F-statistics look at different levels of population structure. $F_{IT}$ is the inbreeding coefficient of an individual (I) relative to the total (T) population, as above; $F_{IS}$ is the inbreeding coefficient of an individual (I) relative to the subpopulation (S), obtained by applying the calculation above to each subpopulation and averaging the results; and $F_{ST}$ is the effect of subpopulations (S) compared to the total population (T), calculated by solving the equation:

$$(1-F_{IS})(1-F_{ST}) = 1-F_{IT},$$

as shown in the next section.

Partition due to population structure

$F_{IT}$ can be partitioned into $F_{ST}$ due to the Wahlund effect and $F_{IS}$ due to inbreeding.

Consider a population that has a population structure of two levels; one from the individual (I) to the subpopulation (S) and one from the subpopulation to the total (T). Then the total $F$, known here as $F_{IT}$, can be partitioned into $F_{IS}$ and $F_{ST}$:

$$1-F_{IT} = (1-F_{IS})\,(1-F_{ST}).$$

This may be further partitioned for population substructure, and it expands according to the rules of binomial expansion, so that for I partitions:

$$1-F = \prod_{i=0}^{i=I}(1-F_{i,i+1})$$
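The two-level partition can be checked numerically. The sketch below assumes one common way of computing the three coefficients, namely as ratios of heterozygosities: $H_I$ (mean observed heterozygosity within subpopulations), $H_S$ (mean Hardy–Weinberg expected heterozygosity within subpopulations) and $H_T$ (expected heterozygosity in the total population). This differs slightly from averaging per-subpopulation values of $F$. The genotype counts are hypothetical and the subpopulations are weighted equally for simplicity.

```python
def hierarchical_f_statistics(subpops):
    """Return (F_IS, F_ST, F_IT) from per-subpopulation genotype counts.

    subpops is a list of (n_AA, n_Aa, n_aa) tuples. Subpopulations are
    weighted equally, which is a simplifying assumption of this sketch.
    """
    observed, expected, freqs = [], [], []
    for n_AA, n_Aa, n_aa in subpops:
        n = n_AA + n_Aa + n_aa
        p = (2 * n_AA + n_Aa) / (2 * n)
        observed.append(n_Aa / n)          # observed heterozygosity
        expected.append(2 * p * (1 - p))   # expected under Hardy-Weinberg
        freqs.append(p)
    k = len(subpops)
    H_I = sum(observed) / k
    H_S = sum(expected) / k
    p_bar = sum(freqs) / k
    H_T = 2 * p_bar * (1 - p_bar)
    return 1 - H_I / H_S, 1 - H_S / H_T, 1 - H_I / H_T


# Hypothetical genotype counts (AA, Aa, aa) for two subpopulations.
example = [(50, 30, 20), (10, 40, 50)]
F_IS, F_ST, F_IT = hierarchical_f_statistics(example)

lhs = (1 - F_IS) * (1 - F_ST)
rhs = 1 - F_IT
print(f"F_IS = {F_IS:.4f}, F_ST = {F_ST:.4f}, F_IT = {F_IT:.4f}")
print(f"(1 - F_IS)(1 - F_ST) = {lhs:.6f}, 1 - F_IT = {rhs:.6f}")
```

Under these definitions the identity holds exactly, because $(H_I/H_S)(H_S/H_T) = H_I/H_T$; other estimators of the three coefficients need not satisfy it to the same precision.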

Fixation index

A reformulation of the definition of $F$ is one minus the ratio of the average number of differences between pairs of chromosomes sampled within diploid individuals to the average number obtained when sampling chromosomes randomly from the population (excluding the grouping per individual). One can modify this definition and consider a grouping per sub-population instead of per individual. Population geneticists have used that idea to measure the degree of structure in a population.

Unfortunately, there are many definitions of $F_{ST}$, which causes some confusion in the scientific literature. A common definition is the following:

$$F_{ST} = \frac{\operatorname{var}(\mathbf{p})}{p\,(1-p)}$$

where the variance of $\mathbf{p}$ is computed across sub-populations, $p$ is the mean allele frequency over sub-populations, and $p\,(1-p)$ is half the expected frequency of heterozygotes in the total population, $2p\,(1-p)$.
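A minimal Python sketch of this variance-based definition follows. It assumes equally weighted sub-populations and the population (not sample) variance; the allele frequencies are hypothetical. For comparison, it also computes $1 - H_S/H_T$ from expected heterozygosities, which coincides with the variance form under these assumptions.

```python
def fst_from_variance(freqs):
    """F_ST = var(p) / (p_bar * (1 - p_bar)) across sub-populations."""
    k = len(freqs)
    p_bar = sum(freqs) / k
    if p_bar in (0.0, 1.0):
        raise ValueError("locus is monomorphic in the total population")
    var_p = sum((p - p_bar) ** 2 for p in freqs) / k   # population variance
    return var_p / (p_bar * (1 - p_bar))


def fst_from_heterozygosity(freqs):
    """F_ST = 1 - H_S / H_T using Hardy-Weinberg expected heterozygosities."""
    k = len(freqs)
    p_bar = sum(freqs) / k
    H_S = sum(2 * p * (1 - p) for p in freqs) / k      # mean within sub-populations
    H_T = 2 * p_bar * (1 - p_bar)                      # total population
    return 1 - H_S / H_T


# Hypothetical allele frequencies in two sub-populations.
freqs = [0.65, 0.30]
fst_var = fst_from_variance(freqs)
fst_het = fst_from_heterozygosity(freqs)
print(f"variance form: {fst_var:.4f}, heterozygosity form: {fst_het:.4f}")
```

The two forms agree because the mean of $2p_i(1-p_i)$ equals $2\bar{p}(1-\bar{p}) - 2\operatorname{var}(\mathbf{p})$. When the sub-populations are fixed for different alleles (frequencies 0 and 1), both give $F_{ST} = 1$; when all sub-populations share the same frequency, both give 0.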

Fixation index in human populations

It is well established that the genetic diversity among human populations is low,^[3] although the distribution of that diversity was at first only roughly estimated. Early studies argued that 85–90% of the genetic variation is found within individuals residing in the same populations within continents (intra-continental populations) and only an additional 10–15% is found between populations of different continents (continental populations).^[4]^[5]^[6]^[7]^[8] Later studies based on hundreds of thousands of single-nucleotide polymorphisms (SNPs) suggested that the genetic diversity between continental populations is even smaller, accounting for 3 to 7%.^[9]^[10]^[11]^[12]^[13]^[14] A later study based on three million SNPs found that 12% of the genetic variation is found between continental populations and only 1% within them.^[15] Most of these studies have used the $F_{ST}$ statistic^[16] or closely related statistics.^[17]^[18]

References

[1] S2CID 36311175.
[2] PMID 4063030.
[3] PMID 19687804.
[4] ISBN 978-1-4684-9065-7.
[5] PMID 1992475.
[6] PMID 9114021.
[7] PMID 10712212.
[8] PMID 15508000.
[9] PMID 16957813.
[10] PMID 18713460.
[11] PMID 18691889.
[12] PMID 19442770.
[13] PMID 19424496.
[14] PMID 19779445.
[15] PMID 23185452.
[16] JSTOR 2406450.
[17] PMID 1933444.
[18] PMID 1644282.

External links

Shane's Simple Guide to F-Statistics

Analyzing the genetic structure of populations

Wahlund effect, Wright's F-statistics Archived 2005-05-27 at the Wayback Machine

Worked example of calculating F-statistics from genotypic data

IAM based F-statistics

F-statistics for Population Genetics Eco-Tool

Population Structure (slides)


