Asymptotic theory (statistics)

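In statistics, asymptotic theory, or large sample theory, is a framework for assessing properties of estimators and statistical tests. Within this framework, it is often assumed that the sample size $n$ may grow indefinitely; the properties of estimators and tests are then evaluated in the limit $n \to \infty$. In practice, a limit evaluation is considered to be approximately valid for large finite sample sizes too.^[1]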

Overview

Most statistical problems begin with a dataset of size $n$. Asymptotic theory proceeds by assuming that it is possible (in principle) to keep collecting additional data, so that the sample size grows without bound, i.e. $n \to \infty$. Under this assumption, many results can be obtained that are unavailable for samples of finite size. An example is the weak law of large numbers. The law states that for a sequence of independent and identically distributed (IID) random variables $X_1, X_2, \ldots$ with finite mean, if one value is drawn from each random variable and the average of the first $n$ values is computed as $\overline{X}_n$, then $\overline{X}_n$ converges in probability to the population mean $\operatorname{E}[X_i]$ as $n \to \infty$.^[2]
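Convergence in probability can be made concrete with a small simulation. The sketch below assumes the $X_i$ are IID Exponential(1) draws (so $\operatorname{E}[X_i] = 1$); the function name `exceedance_probability` and the tolerance `eps` are illustrative choices, not part of the source. It estimates $P(|\overline{X}_n - 1| > \varepsilon)$ by Monte Carlo, and the weak law predicts this probability shrinks toward zero as $n$ grows.

```python
import random

POP_MEAN = 1.0  # E[X_i] for X_i ~ Exponential(rate=1), the assumed distribution


def exceedance_probability(n, eps, trials, rng):
    """Monte Carlo estimate of P(|Xbar_n - E[X_i]| > eps) for IID Exponential(1) draws."""
    exceed = 0
    for _ in range(trials):
        # Average of the first n draws: one realisation of Xbar_n.
        sample_mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(sample_mean - POP_MEAN) > eps:
            exceed += 1
    return exceed / trials


rng = random.Random(12345)  # fixed seed so the run is reproducible
for n in (10, 100, 1000):
    p = exceedance_probability(n, eps=0.1, trials=200, rng=rng)
    print(f"n={n:5d}  estimated P(|mean - 1| > 0.1) = {p:.3f}")
```

Any fixed `eps > 0` would do; the law says the exceedance probability tends to zero for every such choice, not only for 0.1.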

In asymptotic theory, the standard approach is $n \to \infty$. For some statistical models, slightly different asymptotic frameworks may be used. For example, with panel data, it is commonly assumed that one dimension of the data remains fixed while the other grows: $T = \text{constant}$ and $N \to \infty$, or vice versa.^[2]

Besides the standard approach to asymptotics, several alternative approaches exist:

Within the local asymptotic normality framework, it is assumed that the value of the "true parameter" in the model varies slightly with $n$, such that the $n$-th model corresponds to $\theta_n = \theta + h/\sqrt{n}$. This approach makes it possible to study the regularity of estimators.
When statistical tests are studied for their power to distinguish alternatives that are close to the null hypothesis, this is done within the so-called "local alternatives" framework: the null hypothesis is $H_0: \theta = \theta_0$ and the alternative is $H_1: \theta = \theta_0 + h/\sqrt{n}$. This approach is especially popular for unit root tests.

There are models where the dimension of the parameter space $\Theta_n$ slowly expands with $n$, reflecting the fact that the more observations there are, the more structural effects can feasibly be incorporated in the model.

In kernel density estimation and kernel regression, an additional parameter is assumed: the bandwidth $h$. In those models, it is typically taken that $h \to 0$ as $n \to \infty$. The rate of convergence must be chosen carefully, though; a common choice is $h \propto n^{-1/5}$.
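The local alternatives framework can be motivated with a short calculation. The sketch below assumes the simplest setting, a one-sided $z$-test with $X_i \sim \mathcal{N}(\theta, \sigma^2)$ and $\sigma$ known (the helper `z_test_power` is hypothetical). Against a fixed alternative the power tends to 1, so it cannot rank tests; against $\theta_0 + h/\sqrt{n}$ the power stays at the non-trivial value $1 - \Phi(z_{1-\alpha} - h/\sigma)$ for every $n$.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def z_test_power(n, theta0, theta1, sigma=1.0, alpha=0.05):
    """Power of the one-sided z-test of H0: theta = theta0 vs theta > theta0.

    Assumes X_i ~ N(theta, sigma**2) with sigma known; the test rejects when
    sqrt(n) * (mean - theta0) / sigma exceeds the (1 - alpha) normal quantile.
    """
    crit = STD_NORMAL.inv_cdf(1 - alpha)
    # Under theta = theta1 the test statistic is N(shift, 1).
    shift = (n ** 0.5) * (theta1 - theta0) / sigma
    return 1 - STD_NORMAL.cdf(crit - shift)


theta0, h = 0.0, 2.0
for n in (25, 100, 400, 1600):
    fixed = z_test_power(n, theta0, theta0 + 0.2)          # fixed alternative
    local = z_test_power(n, theta0, theta0 + h / n ** 0.5)  # local alternative
    print(f"n={n:5d}  fixed-alt power={fixed:.4f}  local-alt power={local:.4f}")
```

The fixed-alternative column climbs toward 1, while the local-alternative column stays near 0.639 for $h = 2$, which is why power comparisons are usually carried out under $h/\sqrt{n}$ alternatives.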
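The bandwidth condition $h \to 0$ with $h \propto n^{-1/5}$ can be illustrated with a Gaussian-kernel density estimate. This is a minimal sketch, assuming a standard normal sample and the common normal-reference rule $h = 1.06\,\hat\sigma\,n^{-1/5}$; the constant 1.06 and the helper names are illustrative assumptions, and only the $n^{-1/5}$ rate comes from the text.

```python
import math
import random
import statistics


def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth h = 1.06 * sigma_hat * n**(-1/5) (assumed constant)."""
    n = len(sample)
    return 1.06 * statistics.stdev(sample) * n ** (-1 / 5)


def gaussian_kde(sample, x, h):
    """Kernel density estimate at x: (1 / (n h)) * sum of K((x - x_i) / h), K = N(0, 1) pdf."""
    n = len(sample)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)


rng = random.Random(7)  # fixed seed; data assumed to be N(0, 1)
for n in (100, 1000, 10000):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    h = rule_of_thumb_bandwidth(sample)
    # The true density at 0 is 1 / sqrt(2 * pi), about 0.3989.
    print(f"n={n:6d}  h={h:.4f}  f_hat(0)={gaussian_kde(sample, 0.0, h):.4f}")
```

Shrinking $h$ reduces smoothing bias, while $nh \to \infty$ keeps the variance under control; the $n^{-1/5}$ rate balances the two for a second-order kernel.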

In many cases, highly accurate results for finite samples can be obtained via numerical methods (i.e., by computer); even in such cases, though, asymptotic analysis can be useful. This point was made by Small (2010, §1.4), as follows.

A primary goal of asymptotic analysis is to obtain a deeper qualitative understanding of quantitative tools. The conclusions of an asymptotic analysis often supplement the conclusions which can be obtained by numerical methods.

Modes of convergence of random variables

Asymptotic properties

Estimators

Consistency

A sequence of estimates is said to be consistent if it converges in probability to the true value of the parameter being estimated:

$$\hat{\theta}_n \xrightarrow{p} \theta_0.$$

That is, roughly speaking, with an infinite amount of data the estimator (the formula for generating the estimates) would, with probability tending to one, give an estimate arbitrarily close to the true value of the parameter being estimated.^[2]
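Consistency does not require unbiasedness. As an assumed illustration, the variance estimator with divisor $n$ (the Gaussian maximum likelihood estimator) has expectation $\frac{n-1}{n}\sigma^2$, so it is biased for every finite $n$, yet it is consistent for $\sigma^2$. The sketch below assumes $X_i \sim \mathcal{N}(0, 2^2)$; the helper name `plugin_variance` is hypothetical.

```python
import random


def plugin_variance(sample):
    """Variance estimate with divisor n (Gaussian MLE); biased by the factor (n - 1) / n."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n


rng = random.Random(2024)  # fixed seed for reproducibility
true_variance = 4.0  # assumed data-generating process: X_i ~ N(0, 2**2)
for n in (10, 1000, 100000):
    sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
    print(f"n={n:6d}  estimate={plugin_variance(sample):.4f}  (target {true_variance})")
```

The bias $-\sigma^2/n$ vanishes as $n \to \infty$ and the estimator's variance shrinks as well, which together give convergence in probability to $\sigma^2$.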

Asymptotic distribution

If it is possible to find sequences of non-random constants $\{a_n\}$, $\{b_n\}$ (possibly depending on the value of $\theta_0$), and a non-degenerate distribution $G$ such that

$$b_n(\hat{\theta}_n - a_n) \xrightarrow{d} G,$$

then the sequence of estimators $\hat{\theta}_n$ is said to have the asymptotic distribution $G$.

Most often, the estimators encountered in practice are asymptotically normal, meaning their asymptotic distribution is the normal distribution, with $a_n = \theta_0$, $b_n = \sqrt{n}$, and $G = \mathcal{N}(0, V)$:

$$\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} \mathcal{N}(0, V).$$
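Asymptotic normality can be checked empirically for a simple estimator. The sketch below assumes $\hat\theta_n$ is the sample mean of IID Exponential(1) draws, so $\theta_0 = 1$ and $V = \operatorname{Var}(X_i) = 1$ by the central limit theorem. It simulates $\sqrt{n}(\hat\theta_n - \theta_0)$ and compares its mean, variance, and central 95% coverage with $\mathcal{N}(0, 1)$. The helper `standardized_means` and the choices `n=200`, `reps=2000` are illustrative.

```python
import math
import random
import statistics


def standardized_means(n, reps, rng):
    """Return `reps` draws of sqrt(n) * (mean of n IID Exp(1) draws - 1).

    For Exp(1), theta_0 = E[X] = 1 and V = Var(X) = 1, so these values
    should be approximately N(0, 1) for large n.
    """
    out = []
    for _ in range(reps):
        mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
        out.append(math.sqrt(n) * (mean - 1.0))
    return out


rng = random.Random(99)  # fixed seed for reproducibility
z = standardized_means(n=200, reps=2000, rng=rng)
print("sample mean of z    :", round(statistics.fmean(z), 3))
print("sample variance of z:", round(statistics.variance(z), 3))
# Under N(0, 1), about 95% of the mass lies in [-1.96, 1.96].
print("share in [-1.96, 1.96]:", sum(abs(v) <= 1.96 for v in z) / len(z))
```

The exponential distribution is skewed, so the normal approximation is not exact at $n = 200$; the agreement improves as $n$ grows, which is the content of the limit statement.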

Asymptotic confidence regions

Asymptotic theorems

References

1. ISBN 978-3110250244
2. ISBN 978-0387759708

Bibliography

Balakrishnan, N.; Ibragimov, I. A. V. B.; Nevzorov, V. B., eds. (2001), Asymptotic Methods in Probability and Statistics with Applications, ISBN 9781461202097

Borovkov, A. A.; Borovkov, K. A. (2010), Asymptotic Analysis of Random Walks, Cambridge University Press

Buldygin, V. V.; Solntsev, S. (1997), Asymptotic Behaviour of Linearly Transformed Sums of Random Variables, Springer, ISBN 9789401155687

Le Cam, Lucien; Yang, Grace Lo (2000), Asymptotics in Statistics (2nd ed.), Springer

Dawson, D.; Kulik, R.; Ould Haye, M.; Szyszkowicz, B.; Zhao, Y., eds. (2015), Asymptotic Laws and Methods in Stochastics, Springer-Verlag

Höpfner, R. (2014), Asymptotic Statistics, Walter de Gruyter

Lin'kov, Yu. N. (2001), Asymptotic Statistical Methods for Stochastic Processes, American Mathematical Society

Oliveira, P. E. (2012), Asymptotics for Associated Random Variables, Springer

Petrov, V. V. (1995), Limit Theorems of Probability Theory, Oxford University Press

Sen, P. K.; Singer, J. M.; Pedroso de Lima, A. C. (2009), From Finite Sample to Asymptotic Methods in Statistics, Cambridge University Press

Shiryaev, A. N.; Spokoiny, V. G. (2000), Statistical Experiments and Decisions: Asymptotic theory, World Scientific

Small, C. G. (2010), Expansions and Asymptotics for Statistics, Chapman & Hall

van der Vaart, A. W. (1998), Asymptotic Statistics, Cambridge University Press
