Statistical distance
In statistics, probability theory, and information theory, a statistical distance quantifies the distance between two statistical objects, which can be two random variables, or two probability distributions or samples, or the distance can be between an individual sample point and a population or a wider sample of points.
A distance between populations can be interpreted as measuring the distance between two probability distributions, and hence such distances are essentially measures of distances between probability measures. Where statistical distance measures relate to the differences between random variables, these may have statistical dependence, and so these distances are not directly related to measures of distances between probability measures. Again, a measure of distance between random variables may relate to the extent of dependence between them, rather than to their individual values.
Many statistical distance measures are not metrics, and some are not symmetric. Some types of distance measures, which generalize squared distance, are referred to as (statistical) divergences.
Terminology
Many terms are used to refer to various notions of distance; these are often confusingly similar, and may be used inconsistently between authors and over time, either loosely or with precise technical meaning. In addition to "distance", similar terms include deviance, deviation, discrepancy, discrimination, and divergence, as well as others such as contrast function and metric. Terms from information theory include cross entropy, relative entropy, discrimination information, and information gain.[1]
Distances as metrics
Metrics
A metric on a set X is a function (called the distance function or simply distance) d : X × X → R+ (where R+ is the set of non-negative real numbers). For all x, y, z in X, this function is required to satisfy the following conditions (a numerical spot-check of these axioms is sketched after the list):
1. d(x, y) ≥ 0 (non-negativity)
2. d(x, y) = 0 if and only if x = y (identity of indiscernibles; conditions 1 and 2 together produce positive definiteness)
3. d(x, y) = d(y, x) (symmetry)
4. d(x, z) ≤ d(x, y) + d(y, z) (subadditivity / triangle inequality)
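As a minimal sketch (not part of the original text; the distributions and helper functions are invented for illustration), the following Python snippet numerically spot-checks all four conditions for the total variation distance between discrete distributions, one of the metrics listed under Examples below:

```python
# Minimal sketch: spot-check the four metric axioms for total variation
# distance on randomly generated discrete distributions.
import itertools
import random

def total_variation(p, q):
    # d(P, Q) = (1/2) * sum_i |p_i - q_i| for probability vectors p, q
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def random_dist(n):
    # Draw a random probability vector of length n.
    w = [random.random() for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

random.seed(0)
dists = [random_dist(4) for _ in range(5)]

eps = 1e-12  # tolerance for floating-point comparisons
for x, y, z in itertools.product(dists, repeat=3):
    assert total_variation(x, y) >= 0                                # (1) non-negativity
    assert total_variation(x, x) == 0                                # (2) d(x, x) = 0
    assert abs(total_variation(x, y) - total_variation(y, x)) < eps  # (3) symmetry
    assert total_variation(x, z) <= total_variation(x, y) + total_variation(y, z) + eps  # (4) triangle
print("all sampled axiom checks passed")
```

A random check like this can only refute the axioms, not prove them; for total variation they hold exactly, since it is half the L1 distance between the probability vectors.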
Generalized metrics
Many statistical distances are not metrics, because they lack one or more properties of proper metrics. For example, pseudometrics violate condition (2), identity of indiscernibles; quasimetrics violate condition (3), symmetry; and semimetrics violate condition (4), the triangle inequality. Statistical distances that satisfy (1) and (2) are referred to as divergences.
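For example (a minimal sketch in Python, not from the original article; the two distributions are invented for illustration), the Kullback–Leibler divergence is non-negative and vanishes only when the distributions coincide, so it satisfies conditions (1) and (2), but it is asymmetric and therefore not a metric:

```python
# Minimal sketch: KL divergence satisfies (1) and (2) but fails symmetry.
import math

def kl_divergence(p, q):
    # D(P || Q) = sum_i p_i * log(p_i / q_i), with 0 * log 0 treated as 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.6, 0.3, 0.1]
q = [0.2, 0.5, 0.3]
print(kl_divergence(p, q))  # roughly 0.396
print(kl_divergence(q, p))  # roughly 0.365: a different value, so symmetry fails
```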
Statistically close
The total variation distance of two distributions X and Y over a finite domain D (often referred to as statistical difference[2] or statistical distance[3] in cryptography) is defined as

Δ(X, Y) = (1/2) Σ_{α ∈ D} |Pr[X = α] − Pr[Y = α]|.
We say that two probability ensembles {X_k} and {Y_k} are statistically close if Δ(X_k, Y_k) is a negligible function in k.
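A minimal sketch of this definition in Python (not from the original; the dictionary representation of a finite distribution is an assumption for illustration). It computes Δ for a fair coin versus a coin biased by 2^−k; the resulting distance equals 2^−k, which is a negligible function of k:

```python
# Minimal sketch: Delta(X, Y) = (1/2) * sum over the domain of |Pr[X=a] - Pr[Y=a]|.
def statistical_distance(px, py):
    # px, py: dicts mapping outcomes of a finite domain to probabilities.
    domain = set(px) | set(py)
    return 0.5 * sum(abs(px.get(a, 0.0) - py.get(a, 0.0)) for a in domain)

fair = {"heads": 0.5, "tails": 0.5}
for k in (1, 4, 8):
    biased = {"heads": 0.5 + 2.0 ** -k, "tails": 0.5 - 2.0 ** -k}
    print(k, statistical_distance(fair, biased))  # prints 2**-k for each k
```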
Examples
Metrics
- Total variation distance (sometimes just called "the" statistical distance)
- Hellinger distance (compared with total variation in the sketch after this list)
- Lévy–Prokhorov metric
- Wasserstein metric: also known as the Kantorovich metric, or earth mover's distance
- Mahalanobis distance
- Amari distance
- Integral probability metrics generalize several metrics or pseudometrics on distributions
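As a sketch (not part of the original article; the distributions are invented for illustration), the following compares two of the metrics above, total variation distance and Hellinger distance, on the same pair of discrete distributions:

```python
# Minimal sketch: total variation vs. Hellinger distance on one pair
# of discrete distributions.
import math

def total_variation(p, q):
    # Half the L1 distance between the probability vectors.
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def hellinger(p, q):
    # (1 / sqrt(2)) times the Euclidean distance between the elementwise
    # square roots of the probability vectors.
    return math.sqrt(sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                         for pi, qi in zip(p, q)) / 2.0)

p = [0.6, 0.3, 0.1]
q = [0.2, 0.5, 0.3]
print(total_variation(p, q))  # 0.4
print(hellinger(p, q))        # roughly 0.31
```

Both values lie in [0, 1], and the two metrics bound each other: H(P, Q)² ≤ TV(P, Q) ≤ √2 · H(P, Q).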
Divergences
- Kullback–Leibler divergence
- Rényi divergence
- Jensen–Shannon divergence (a symmetrized variant of the Kullback–Leibler divergence; see the sketch after this list)
- Bhattacharyya distance (despite its name it is not a distance, as it violates the triangle inequality)
- f-divergence: generalizes several distances and divergences
- Bayes discriminability index: a positive-definite symmetric measure of the overlap of two distributions
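As a sketch (not from the original article; the distributions are invented for illustration), the Jensen–Shannon divergence listed above can be built from the Kullback–Leibler divergence taken against the mixture M = (P + Q)/2; unlike Kullback–Leibler it is symmetric, and its square root is known to be a metric:

```python
# Minimal sketch: Jensen-Shannon divergence via KL against the midpoint mixture.
import math

def kl_divergence(p, q):
    # D(P || Q) with the convention 0 * log 0 = 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]  # midpoint mixture M
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = [0.6, 0.3, 0.1]
q = [0.2, 0.5, 0.3]
print(jensen_shannon(p, q))
print(jensen_shannon(q, p))  # identical output: JS is symmetric by construction
```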
Notes
1. Dodge, Y. (2003), entry for "distance".
2. Goldreich, Oded (2001). Foundations of Cryptography: Basic Tools. Cambridge University Press. ISBN 0-521-79172-3.
3. Reyzin, Leo. (Lecture Notes) Extractors and the Leftover Hash Lemma.
References
- Dodge, Y. (2003). The Oxford Dictionary of Statistical Terms. Oxford University Press. ISBN 0-19-920613-9.