Statistical distance
In statistics, probability theory, and information theory, a statistical distance quantifies the distance between two statistical objects, which can be two random variables, or two probability distributions or samples, or the distance can be between an individual sample point and a population or a wider sample of points.
A distance between populations can be interpreted as measuring the distance between two probability distributions, and hence such distances are essentially measures of distances between probability measures. Where statistical distance measures relate to the differences between random variables, these may have statistical dependence, and so these distances are not directly related to measures of distances between probability measures. Again, a measure of distance between random variables may relate to the extent of dependence between them, rather than to their individual values.
Many statistical distance measures are not metrics, and some are not symmetric. Some types of distance measures, which generalize squared distance, are referred to as (statistical) divergences.
Terminology
Many terms are used to refer to various notions of distance; these are often confusingly similar, and may be used inconsistently between authors and over time, either loosely or with precise technical meaning. In addition to "distance", similar terms include deviance, deviation, discrepancy, discrimination, and divergence, as well as others such as contrast function and metric. Terms from information theory include cross entropy, relative entropy, discrimination information, and information gain.[1]
Distances as metrics
Metrics
A metric on a set X is a function (called the distance function or simply distance) d : X × X → R+ (where R+ is the set of non-negative real numbers). For all x, y, z in X, this function is required to satisfy the following conditions (a numerical spot-check of these axioms is sketched after the list):
1. d(x, y) ≥ 0 (non-negativity)
2. d(x, y) = 0 if and only if x = y (identity of indiscernibles; conditions 1 and 2 together produce positive definiteness)
3. d(x, y) = d(y, x) (symmetry)
4. d(x, z) ≤ d(x, y) + d(y, z) (subadditivity / triangle inequality)
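As a minimal sketch (not part of the original text; the distributions and helper functions are invented for illustration), the following Python snippet numerically spot-checks all four conditions for the total variation distance between discrete distributions, one of the metrics listed under Examples below:

```python
# Minimal sketch: spot-check the four metric axioms for total variation
# distance on randomly generated discrete distributions.
import itertools
import random

def total_variation(p, q):
    # d(P, Q) = (1/2) * sum_i |p_i - q_i| for probability vectors p, q
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def random_dist(n):
    # Draw a random probability vector of length n.
    w = [random.random() for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

random.seed(0)
dists = [random_dist(4) for _ in range(5)]

eps = 1e-12  # tolerance for floating-point comparisons
for x, y, z in itertools.product(dists, repeat=3):
    assert total_variation(x, y) >= 0                                # (1) non-negativity
    assert total_variation(x, x) == 0                                # (2) d(x, x) = 0
    assert abs(total_variation(x, y) - total_variation(y, x)) < eps  # (3) symmetry
    assert total_variation(x, z) <= total_variation(x, y) + total_variation(y, z) + eps  # (4) triangle
print("all sampled axiom checks passed")
```

A random check like this can only refute the axioms, not prove them; for total variation they hold exactly, since it is half the L1 distance between the probability vectors.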
Generalized metrics
Many statistical distances are not metrics, because they lack one or more properties of proper metrics. For example, pseudometrics violate condition (2), identity of indiscernibles; quasimetrics violate condition (3), symmetry; and semimetrics violate condition (4), the triangle inequality. Statistical distances that satisfy (1) and (2) are referred to as divergences.
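For example (a minimal sketch in Python, not from the original article; the two distributions are invented for illustration), the Kullback–Leibler divergence is non-negative and vanishes only when the distributions coincide, so it satisfies conditions (1) and (2), but it is asymmetric and therefore not a metric:

```python
# Minimal sketch: KL divergence satisfies (1) and (2) but fails symmetry.
import math

def kl_divergence(p, q):
    # D(P || Q) = sum_i p_i * log(p_i / q_i), with 0 * log 0 treated as 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.6, 0.3, 0.1]
q = [0.2, 0.5, 0.3]
print(kl_divergence(p, q))  # roughly 0.396
print(kl_divergence(q, p))  # roughly 0.365: a different value, so symmetry fails
```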
Statistically close
The total variation distance of two distributions X and Y over a finite domain D (often referred to as statistical difference[2] or statistical distance[3] in cryptography) is defined as

Δ(X, Y) = (1/2) Σ_{α ∈ D} |Pr[X = α] − Pr[Y = α]|.
We say that two probability ensembles {X_k} and {Y_k} are statistically close if Δ(X_k, Y_k) is a negligible function in k.
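A minimal sketch of this definition in Python (not from the original; the dictionary representation of a finite distribution is an assumption for illustration). It computes Δ for a fair coin versus a coin biased by 2^−k; the resulting distance equals 2^−k, which is a negligible function of k:

```python
# Minimal sketch: Delta(X, Y) = (1/2) * sum over the domain of |Pr[X=a] - Pr[Y=a]|.
def statistical_distance(px, py):
    # px, py: dicts mapping outcomes of a finite domain to probabilities.
    domain = set(px) | set(py)
    return 0.5 * sum(abs(px.get(a, 0.0) - py.get(a, 0.0)) for a in domain)

fair = {"heads": 0.5, "tails": 0.5}
for k in (1, 4, 8):
    biased = {"heads": 0.5 + 2.0 ** -k, "tails": 0.5 - 2.0 ** -k}
    print(k, statistical_distance(fair, biased))  # prints 2**-k for each k
```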
Examples
Metrics
- Total variation distance (sometimes just called "the" statistical distance)
- Hellinger distance (compared with total variation in the sketch after this list)
- Lévy–Prokhorov metric
- Wasserstein metric: also known as the Kantorovich metric, or earth mover's distance
- Mahalanobis distance
- Amari distance
- Integral probability metrics generalize several metrics or pseudometrics on distributions
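As a sketch (not part of the original article; the distributions are invented for illustration), the following compares two of the metrics above, total variation distance and Hellinger distance, on the same pair of discrete distributions:

```python
# Minimal sketch: total variation vs. Hellinger distance on one pair
# of discrete distributions.
import math

def total_variation(p, q):
    # Half the L1 distance between the probability vectors.
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def hellinger(p, q):
    # (1 / sqrt(2)) times the Euclidean distance between the elementwise
    # square roots of the probability vectors.
    return math.sqrt(sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                         for pi, qi in zip(p, q)) / 2.0)

p = [0.6, 0.3, 0.1]
q = [0.2, 0.5, 0.3]
print(total_variation(p, q))  # 0.4
print(hellinger(p, q))        # roughly 0.31
```

Both values lie in [0, 1], and the two metrics bound each other: H(P, Q)² ≤ TV(P, Q) ≤ √2 · H(P, Q).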
Divergences
- Kullback–Leibler divergence
- Rényi divergence
- Jensen–Shannon divergence (a symmetrized variant of the Kullback–Leibler divergence; see the sketch after this list)
- Bhattacharyya distance (despite its name it is not a distance, as it violates the triangle inequality)
- f-divergence: generalizes several distances and divergences
- Bayes discriminability index: a positive-definite symmetric measure of the overlap of two distributions
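As a sketch (not from the original article; the distributions are invented for illustration), the Jensen–Shannon divergence listed above can be built from the Kullback–Leibler divergence taken against the mixture M = (P + Q)/2; unlike Kullback–Leibler it is symmetric, and its square root is known to be a metric:

```python
# Minimal sketch: Jensen-Shannon divergence via KL against the midpoint mixture.
import math

def kl_divergence(p, q):
    # D(P || Q) with the convention 0 * log 0 = 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]  # midpoint mixture M
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = [0.6, 0.3, 0.1]
q = [0.2, 0.5, 0.3]
print(jensen_shannon(p, q))
print(jensen_shannon(q, p))  # identical output: JS is symmetric by construction
```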
Notes
1. Dodge, Y. (2003), entry for "distance".
2. Goldreich, Oded (2001). Foundations of Cryptography: Basic Tools. Cambridge University Press. ISBN 0-521-79172-3.
3. Reyzin, Leo. (Lecture Notes) Extractors and the Leftover Hash Lemma.
References
- Dodge, Y. (2003). The Oxford Dictionary of Statistical Terms. Oxford University Press. ISBN 0-19-920613-9.