In statistical theory, U-statistics are a class of statistics, each defined as the average of a given function applied to all tuples of a fixed size drawn from the sample. The letter "U" stands for unbiased. In elementary statistics, U-statistics arise naturally in producing minimum-variance unbiased estimators.
For example, for every probability distribution, the population median is an estimable parameter. The theory of U-statistics applies to general classes of probability distributions.
History
Many statistics originally derived for particular parametric families have been recognized as U-statistics for general distributions. The theory of U-statistics is used to establish the asymptotic normality and the variance (in finite samples) of such quantities.[3] The theory has been used to study more general statistics as well as stochastic processes, such as random graphs.[4][5][6]
Suppose that a problem involves independent and identically distributed random variables and that estimation of a certain parameter is required. Suppose that a simple unbiased estimate can be constructed based on only a few observations: this defines the basic estimator based on a given number of observations. For example, a single observation is itself an unbiased estimate of the mean, and a pair of observations can be used to derive an unbiased estimate of the variance. The U-statistic based on this estimator is defined as the average (across all combinatorial selections of the given size from the full set of observations) of the basic estimator applied to the sub-samples.
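The construction just described can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper name `u_statistic`, the two kernel functions, and the data values are all assumptions chosen for the example. It averages a basic estimator over every sub-sample of the required size and compares the result with the usual sample mean and sample variance.

```python
from itertools import combinations
from statistics import mean, variance


def u_statistic(kernel, sample, r):
    """Average a symmetric basic estimator over all size-r sub-samples."""
    values = [kernel(*subset) for subset in combinations(sample, r)]
    return sum(values) / len(values)


def mean_kernel(x):
    # A single observation is an unbiased estimate of the mean.
    return x


def variance_kernel(x, y):
    # Half the squared difference of a pair is an unbiased estimate
    # of the variance.
    return (x - y) ** 2 / 2


data = [2.0, 4.0, 4.0, 5.0, 9.0]  # illustrative values only

u_mean = u_statistic(mean_kernel, data, 1)
u_var = u_statistic(variance_kernel, data, 2)

print(u_mean, mean(data))     # both are the sample mean
print(u_var, variance(data))  # both are the sample variance (divisor n - 1)
```

Enumerating all sub-samples costs on the order of `n` choose `r` kernel evaluations, so this direct approach is only practical for small samples or small `r`.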
The theory of U-statistics is not limited to the case of independent and identically distributed random variables or to scalar random variables.[9]
Definition
The term U-statistic, due to Hoeffding (1948), is defined as follows.
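Let $K$ be either the real or complex numbers, and let $f\colon (K^d)^r \to K$ be a $K$-valued function of $r$ $d$-dimensional variables.
For each $n \ge r$ the associated U-statistic $f_n\colon (K^d)^n \to K$ is defined to be the average of the values $f(x_{i_1},\dots,x_{i_r})$ over the set $I_{r,n}$ of $r$-tuples of indices from $\{1,2,\dots,n\}$ with distinct entries.
Formally,
$$f_n(x_1,\dots,x_n) = \frac{1}{\prod_{i=0}^{r-1}(n-i)} \sum_{(i_1,\dots,i_r)\in I_{r,n}} f(x_{i_1},\dots,x_{i_r}).$$
In particular, if $f$ is symmetric the above simplifies to
$$f_n(x_1,\dots,x_n) = \frac{1}{\binom{n}{r}} \sum_{(i_1,\dots,i_r)\in J_{r,n}} f(x_{i_1},\dots,x_{i_r}),$$
where $J_{r,n}$ now denotes the subset of $I_{r,n}$ of increasing tuples.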
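To see why the symmetric case simplifies, it may help to write out the smallest non-trivial instance. The following is a worked sketch under stated assumptions: $r = 2$, $n = 3$, a symmetric function $f$ of two arguments, $I_{2,3}$ for the set of index pairs with distinct entries, and $J_{2,3}$ for the increasing pairs. Each unordered pair appears twice in $I_{2,3}$ and once in $J_{2,3}$, so the factor $2$ cancels against the change of normalizer from $3 \cdot 2$ to $\binom{3}{2}$.

```latex
\begin{aligned}
I_{2,3} &= \{(1,2),(2,1),(1,3),(3,1),(2,3),(3,2)\}, \qquad
J_{2,3} = \{(1,2),(1,3),(2,3)\},\\
f_3(x_1,x_2,x_3)
  &= \frac{1}{3\cdot 2}\sum_{(i_1,i_2)\in I_{2,3}} f(x_{i_1},x_{i_2})
   = \frac{1}{6}\cdot 2\,\bigl[f(x_1,x_2)+f(x_1,x_3)+f(x_2,x_3)\bigr]\\
  &= \frac{1}{\binom{3}{2}}\sum_{(i_1,i_2)\in J_{2,3}} f(x_{i_1},x_{i_2}).
\end{aligned}
```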
U-statistics are very natural in statistical work, particularly in Hoeffding's context of simple random sampling from a finite population, where the defining property is termed 'inheritance on the average'.
Fisher's k-statistics and Tukey's polykays are examples of homogeneous polynomial U-statistics (Fisher, 1929; Tukey, 1950).
For a simple random sample $\varphi$ of size $n$ taken from a population of size $N$, the U-statistic has the property that the average of the sample values $f_n(x_\varphi)$ over all such samples is exactly equal to the population value $f_N(x)$.
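The 'inheritance on the average' property can be checked by brute force on a tiny population. The sketch below is illustrative only: the population values, the sample size, the helper `u_statistic`, and the choice of the variance kernel are assumptions made for the example. Exact rational arithmetic is used so that the equality holds without rounding.

```python
from fractions import Fraction
from itertools import combinations


def u_statistic(kernel, values, r):
    """Average a symmetric kernel over all size-r subsets of `values`."""
    subsets = list(combinations(values, r))
    return sum(kernel(*s) for s in subsets) / len(subsets)


def variance_kernel(x, y):
    # Exact rational value of (x - y)^2 / 2 for integer inputs.
    return Fraction((x - y) ** 2, 2)


population = [1, 3, 4, 7, 8, 12]  # hypothetical population, N = 6
n = 4                             # sample size

# U-statistic evaluated on the whole population.
population_value = u_statistic(variance_kernel, population, 2)

# U-statistic evaluated on every possible simple random sample of size n,
# then averaged with equal weight.
sample_values = [
    u_statistic(variance_kernel, sample, 2)
    for sample in combinations(population, n)
]
average_over_samples = sum(sample_values) / len(sample_values)

print(average_over_samples == population_value)
```

The equality holds because every pair of population members appears in the same number of size-`n` samples, so each kernel value receives equal total weight.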
Examples
Some examples:
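If $f(x) = x$, the U-statistic $f_n(x) = \bar{x}_n = (x_1 + \cdots + x_n)/n$ is the sample mean.
If $f(x_1, x_2) = |x_1 - x_2|$, the U-statistic is the mean pairwise deviation $f_n(x_1,\dots,x_n) = \frac{2}{n(n-1)} \sum_{i>j} |x_i - x_j|$, defined for $n \ge 2$.
If $f(x_1, x_2) = (x_1 - x_2)^2/2$, the U-statistic is the sample variance $f_n(x) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2$ with divisor $n-1$, defined for $n \ge 2$.
The third $k$-statistic $k_{3,n}(x) = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} (x_i - \bar{x}_n)^3$, the sample skewness defined for $n \ge 3$, is a U-statistic.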
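Two of these examples can be verified numerically. The sketch below compares the averaged-kernel form with the closed-form expressions for the mean pairwise deviation and the third k-statistic. The helper `u_statistic_ordered`, the data, and in particular the asymmetric kernel `third_cumulant_kernel` are assumptions introduced for illustration: that kernel is one function whose expectation for independent, identically distributed inputs is the third cumulant, and it is not taken from the text above.

```python
from itertools import permutations


def u_statistic_ordered(kernel, sample, r):
    """Average a kernel over all r-tuples of observations with distinct indices."""
    values = [kernel(*t) for t in permutations(sample, r)]
    return sum(values) / len(values)


def abs_diff(x, y):
    return abs(x - y)


def third_cumulant_kernel(x, y, z):
    # Not symmetric in its arguments; for i.i.d. inputs its expectation is
    # E[X^3] - 3 E[X^2] E[X] + 2 E[X]^3, the third cumulant.
    return x**3 - 3 * x**2 * y + 2 * x * y * z


data = [1.0, 2.0, 2.0, 5.0, 7.0]  # illustrative values only
n = len(data)
xbar = sum(data) / n

# Closed-form expressions.
mpd_closed = 2 / (n * (n - 1)) * sum(
    abs(data[i] - data[j]) for i in range(n) for j in range(i)
)
k3_closed = n / ((n - 1) * (n - 2)) * sum((x - xbar) ** 3 for x in data)

# Same quantities as averages of a kernel over index tuples.
mpd_u = u_statistic_ordered(abs_diff, data, 2)
k3_u = u_statistic_ordered(third_cumulant_kernel, data, 3)

print(abs(mpd_u - mpd_closed) < 1e-9, abs(k3_u - k3_closed) < 1e-9)
```

Averaging over ordered tuples with distinct indices handles the asymmetric kernel directly, at the cost of visiting each unordered triple six times.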
The following case highlights an important point. If $f(x_1, x_2, x_3)$ is the median of three values, $f_n(x_1,\dots,x_n)$ is not the median of $n$ values. However, it is a minimum-variance unbiased estimate of the expected value of the median of three values, not of the median of the population. Similar estimates play a central role where the parameters of a family of probability distributions are being estimated by probability weighted moments or L-moments.
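A small numerical sketch makes the distinction concrete. The data values and helper names below are assumptions chosen for illustration. The U-statistic built from the median-of-three kernel averages the middle value of every triple, and for skewed data this differs from the ordinary sample median.

```python
from itertools import combinations
from statistics import median


def u_statistic(kernel, sample, r):
    """Average a symmetric kernel over all size-r subsets of the sample."""
    values = [kernel(*s) for s in combinations(sample, r)]
    return sum(values) / len(values)


def median_of_three(x, y, z):
    # Middle value of three observations.
    return sorted((x, y, z))[1]


data = [1, 2, 3, 10, 100]  # illustrative, deliberately skewed

u_med3 = u_statistic(median_of_three, data, 3)
sample_median = median(data)

print(u_med3, sample_median)  # 4.8 versus 3
```

Here the ten triples have middle values 2 (three times), 3 (four times), and 10 (three times), giving an average of 4.8, while the median of all five values is 3.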
Koroljuk, V. S.; Borovskich, Yu. V. (1994). Theory of U-statistics. Mathematics and its Applications. Vol. 273 (Translated by P. V. Malyshev and D. V. Malyshev from the 1989 Russian original ed.). Dordrecht: Kluwer Academic Publishers Group. pp. x+552.