Cointegration

Cointegration is a statistical property of a collection $(X_1, X_2, \ldots, X_k)$ of time series variables. First, all of the series must be integrated of order $d$ (see Order of integration). Next, if a linear combination of this collection is integrated of order less than $d$, then the collection is said to be cointegrated. Formally, if $(X, Y, Z)$ are each integrated of order $d$, and there exist coefficients $a, b, c$ (not all zero) such that $aX + bY + cZ$ is integrated of order less than $d$, then $X$, $Y$, and $Z$ are cointegrated.

Cointegration has become an important property in contemporary time series analysis. Time series often have trends, either deterministic or stochastic. In an influential paper,^[1] Charles Nelson and Charles Plosser (1982) provided statistical evidence that many US macroeconomic time series (like GNP, wages, employment, etc.) have stochastic trends.

Introduction

If two or more series are individually integrated (in the time series sense) but some linear combination of them has a lower order of integration, then the series are said to be cointegrated. A common example is where the individual series are first-order integrated ($I(1)$) but some (cointegrating) vector of coefficients exists to form a stationary linear combination of them. For example, testing whether there is a statistically significant connection between a futures price and the corresponding spot price can be done by testing for the existence of a cointegrated combination of the two series.
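The idea can be illustrated with a small simulation. The sketch below is not taken from the cited sources; it assumes a hypothetical pair in which $x_t$ is a random walk and $y_t$ equals $2x_t$ plus stationary noise, so that $(1, -2)$ is a cointegrating vector by construction. The coefficient 2, the sample size and the seed are arbitrary choices.

```python
import random

def random_walk(n, rng):
    """Return an I(1) series: the cumulative sum of i.i.d. standard normal shocks."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def variance(series):
    """Sample variance (dividing by the number of observations)."""
    mean = sum(series) / len(series)
    return sum((v - mean) ** 2 for v in series) / len(series)

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 2000
x = random_walk(n, rng)                            # I(1) stochastic trend
y = [2.0 * xt + rng.gauss(0.0, 1.0) for xt in x]   # also I(1): inherits the trend of x

# The combination y - 2x removes the shared trend and leaves stationary noise.
spread = [yt - 2.0 * xt for xt, yt in zip(x, y)]

print("variance of x      :", round(variance(x), 2))
print("variance of y      :", round(variance(y), 2))
print("variance of y - 2x :", round(variance(spread), 2))
```

Each series wanders without a fixed mean, so its sample variance is large and grows with the sample length, while the variance of the combination stays close to that of the added noise.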

History

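The first to introduce and analyse the concept of spurious (or nonsense) regression was Udny Yule in 1926.^[2] Before the 1980s, many economists used linear regressions on non-stationary time series data, an approach that Clive Granger and Paul Newbold showed could produce spurious correlation. Granger's 1987 paper with Robert Engle formalized the cointegrating vector approach, and coined the term.^[6]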

For integrated $I(1)$ processes, Granger and Newbold showed that de-trending does not work to eliminate the problem of spurious correlation, and that the superior alternative is to check for cointegration. Two series with $I(1)$ trends can be cointegrated only if there is a genuine relationship between the two. Thus the standard current methodology for time series regressions is to check all time series involved for integration. If there are $I(1)$ series on both sides of the regression relationship, then it is possible for regressions to give misleading results.

The possible presence of cointegration must be taken into account when choosing a technique to test hypotheses concerning the relationship between two variables having unit roots (i.e. integrated of at least order one).^[3] The usual procedure for testing hypotheses concerning the relationship between non-stationary variables was to run ordinary least squares (OLS) regressions on data which had been differenced. This method is biased if the non-stationary variables are cointegrated.

For example, regressing the consumption series for any country (e.g. Fiji) against the GNP for a randomly selected dissimilar country (e.g. Afghanistan) might give a high R-squared. This is called spurious regression: two integrated $I(1)$ series which are not directly causally related may nonetheless show a significant correlation.
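The spurious regression effect is easy to reproduce numerically. The following sketch is an illustration under stated assumptions, not a reproduction of any cited study: it draws pairs of independent Gaussian random walks (200 observations, 200 replications, both arbitrary choices) and compares the average R-squared from regressing one on the other in levels with the average R-squared after first-differencing.

```python
import random

def random_walk(n, rng):
    """Cumulative sum of i.i.d. standard normal shocks, i.e. an I(1) series."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def r_squared(x, y):
    """R-squared of an OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def diff(series):
    """First differences of a series."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(1)  # fixed seed for reproducibility
n, replications = 200, 200
r2_levels, r2_diffs = [], []
for _ in range(replications):
    x, y = random_walk(n, rng), random_walk(n, rng)  # independent by construction
    r2_levels.append(r_squared(x, y))
    r2_diffs.append(r_squared(diff(x), diff(y)))

mean_r2_levels = sum(r2_levels) / replications
mean_r2_diffs = sum(r2_diffs) / replications
print("mean R^2, levels     :", round(mean_r2_levels, 3))
print("mean R^2, differences:", round(mean_r2_diffs, 3))
```

Although the two series are unrelated by construction, the levels regressions typically report a sizeable R-squared, whereas the regressions on differenced data report values near zero.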

Tests

The three main methods for testing for cointegration are:

Engle–Granger two-step method

If $x_{t}$ and $y_{t}$ both have order of integration d=1 and are cointegrated, then a linear combination of them must be stationary for some value of $\beta$ and $u_{t}$ . In other words:

$y_{t}-\beta x_{t}=u_{t}$

where $u_{t}$ is stationary.

If $\beta$ is known, we can test $u_{t}$ for stationarity with an Augmented Dickey–Fuller test or Phillips–Perron test. If $\beta$ is unknown, we must first estimate it. This is typically done by using ordinary least squares (by regressing $y_{t}$ on $x_{t}$ and an intercept). Then, we can run an ADF test on $u_{t}$ . However, when $\beta$ is estimated, the critical values of this ADF test are non-standard, and increase in absolute value as more regressors are included.^[7]
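The first step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions rather than a reference implementation: the data are simulated with a true $\beta$ of 2, the unit root test is a Dickey–Fuller regression with no augmentation lags and no constant, and the 5% critical value of about -3.34 is an approximate asymptotic figure for two variables with a constant in the cointegrating regression. In applied work, the value appropriate to the sample size and number of regressors should be taken from published tables.

```python
import math
import random

def ols_with_intercept(x, y):
    """Return (intercept, slope) from regressing y on x and a constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def df_t_statistic(u):
    """t-statistic on rho in  du_t = rho * u_{t-1} + e_t  (no constant, no lags)."""
    lagged = u[:-1]
    delta = [b - a for a, b in zip(u, u[1:])]
    suu = sum(v * v for v in lagged)
    rho = sum(v * d for v, d in zip(lagged, delta)) / suu
    resid = [d - rho * v for v, d in zip(lagged, delta)]
    sigma2 = sum(e * e for e in resid) / (len(delta) - 1)
    return rho / math.sqrt(sigma2 / suu)

rng = random.Random(2)  # fixed seed for reproducibility
n = 300
x, level = [], 0.0
for _ in range(n):
    level += rng.gauss(0.0, 1.0)
    x.append(level)                                      # x_t is a random walk, I(1)
y = [1.0 + 2.0 * xt + rng.gauss(0.0, 1.0) for xt in x]   # simulated with beta = 2

# Step 1: estimate beta by OLS and form the residuals u_hat.
intercept_hat, beta_hat = ols_with_intercept(x, y)
u_hat = [yt - intercept_hat - beta_hat * xt for xt, yt in zip(x, y)]

# Test the residuals for a unit root; the null hypothesis is no cointegration.
t_stat = df_t_statistic(u_hat)
CRITICAL_5PCT = -3.34  # ASSUMPTION: approximate asymptotic value, 2 variables, constant

print("beta_hat :", round(beta_hat, 3))
print("DF t-stat:", round(t_stat, 2))
print("reject no cointegration at 5%:", t_stat < CRITICAL_5PCT)
```

Because the residuals come from an estimated regression, the statistic is compared with the more negative cointegration critical value rather than the standard Dickey–Fuller value.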

If the variables are found to be cointegrated, a second-stage regression is conducted. This is a regression of $\Delta y_{t}$ on the differenced regressors $\Delta x_{t}$ and the lagged residuals from the first stage, ${\hat {u}}_{t-1}$ . The second stage regression is given as: $\Delta y_{t}=\Delta x_{t}b+\alpha {\hat {u}}_{t-1}+\varepsilon _{t}$

If the variables are not cointegrated (if we cannot reject the null of no cointegration when testing $u_{t}$ ), then $\alpha =0$ and we estimate a differences model: $\Delta y_{t}=\Delta x_{t}b+\varepsilon _{t}$
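The second-stage regression can be sketched in the same way. The example below is an illustration on simulated data, not a definitive implementation: it assumes $y_t = 2x_t + u_t$ with white-noise $u_t$, for which the implied values are $b = 2$ and $\alpha = -1$, and it fits the regression without a constant, as in the equation above. The helper `ols_two_regressors` is a hypothetical name introduced for this sketch.

```python
import random

def ols_two_regressors(z1, z2, target):
    """OLS of target on z1 and z2 without a constant, via the 2x2 normal equations."""
    s11 = sum(a * a for a in z1)
    s22 = sum(b * b for b in z2)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s1y = sum(a * t for a, t in zip(z1, target))
    s2y = sum(b * t for b, t in zip(z2, target))
    det = s11 * s22 - s12 * s12
    coef1 = (s22 * s1y - s12 * s2y) / det
    coef2 = (s11 * s2y - s12 * s1y) / det
    return coef1, coef2

rng = random.Random(3)  # fixed seed for reproducibility
n = 300
x, level = [], 0.0
for _ in range(n):
    level += rng.gauss(0.0, 1.0)
    x.append(level)                                # x_t is a random walk, I(1)
y = [2.0 * xt + rng.gauss(0.0, 1.0) for xt in x]   # cointegrated with x by construction

# First stage: cointegrating regression of y on x and a constant.
mx, my = sum(x) / n, sum(y) / n
beta_hat = (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))
u_hat = [yt - (my - beta_hat * mx) - beta_hat * xt for xt, yt in zip(x, y)]

# Second stage: regress dy_t on dx_t and the lagged first-stage residual.
dy = [b - a for a, b in zip(y, y[1:])]
dx = [b - a for a, b in zip(x, x[1:])]
u_lag = u_hat[:-1]  # u_hat_{t-1}, aligned with dy_t and dx_t
b_hat, alpha_hat = ols_two_regressors(dx, u_lag, dy)

print("b_hat    :", round(b_hat, 3))
print("alpha_hat:", round(alpha_hat, 3))
```

A negative estimate of $\alpha$ indicates that deviations from the long-run relationship in one period are partly corrected in the next.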

Johansen test

The Johansen test is a test for cointegration that, unlike the Engle–Granger method, allows for more than one cointegrating relationship. Its justification is asymptotic, however, so it relies on large samples. If the sample size is too small, the results will not be reliable and one should use an autoregressive distributed lag (ARDL) model instead.^[8]^[9]
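As a sketch of the usual setup, the test is built on a vector error correction representation of the $k$ series collected in a vector $X_t$. The symbols $\Pi$, $\Gamma_i$, $p$ and $r$, and the matrices $\alpha$ and $\beta$, are standard notation introduced here as assumptions; they are not defined elsewhere in this article, and $\alpha$ and $\beta$ are matrices here rather than the scalars of the Engle–Granger section.

$$\Delta X_t = \Pi X_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \, \Delta X_{t-i} + \varepsilon_t, \qquad \Pi = \alpha \beta'$$

The number of cointegrating relationships is the rank $r$ of $\Pi$. When $0 < r < k$, the $k \times r$ matrix $\beta$ holds the cointegrating vectors in its columns and the $k \times r$ matrix $\alpha$ holds the adjustment coefficients. The test assesses hypotheses about $r$ through the estimated eigenvalues associated with $\Pi$.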

Phillips–Ouliaris cointegration test

Peter C. B. Phillips and Sam Ouliaris (1990) show that residual-based unit root tests applied to the estimated cointegrating residuals do not have the usual Dickey–Fuller distributions under the null hypothesis of no-cointegration.^[10]

Because of the spurious regression phenomenon under the null hypothesis, these tests have asymptotic distributions that depend on (1) the number of deterministic trend terms and (2) the number of variables with which cointegration is being tested. These distributions are known as Phillips–Ouliaris distributions, and critical values have been tabulated. In finite samples, a superior alternative to using these asymptotic critical values is to generate critical values from simulations.
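Generating critical values by simulation can be sketched as follows. This is a simplified illustration, not the Phillips–Ouliaris statistics themselves: it uses a plain residual-based Dickey–Fuller t-statistic with no lags, two variables, a constant in the cointegrating regression, and Gaussian random walks as the data-generating process under the null of no cointegration. The sample size of 100 and the 1000 replications are arbitrary choices.

```python
import math
import random

def random_walk(n, rng):
    """Cumulative sum of i.i.d. standard normal shocks, i.e. an I(1) series."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def residual_df_t(x, y):
    """DF t-statistic on the residuals of an OLS regression of y on x and a constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    u = [(b - my) - slope * (a - mx) for a, b in zip(x, y)]
    lagged = u[:-1]
    delta = [b - a for a, b in zip(u, u[1:])]
    suu = sum(v * v for v in lagged)
    rho = sum(v * d for v, d in zip(lagged, delta)) / suu
    sigma2 = sum((d - rho * v) ** 2 for v, d in zip(lagged, delta)) / (len(delta) - 1)
    return rho / math.sqrt(sigma2 / suu)

def simulated_critical_value(n, replications, level, rng):
    """Empirical lower-tail quantile of the statistic under the null of no cointegration."""
    stats = sorted(residual_df_t(random_walk(n, rng), random_walk(n, rng))
                   for _ in range(replications))
    return stats[int(level * replications)]

rng = random.Random(4)  # fixed seed for reproducibility
crit_5pct = simulated_critical_value(n=100, replications=1000, level=0.05, rng=rng)
print("simulated 5% critical value:", round(crit_5pct, 2))
```

The simulated value should come out well below the standard Dickey–Fuller 5% value of about -1.95, reflecting the non-standard distribution of residual-based tests. More replications reduce the simulation error in the estimated quantile.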

Multicointegration

In practice, cointegration is often used for two $I(1)$ series, but it is more generally applicable and can be used for variables integrated of higher order (to detect correlated accelerations or other second-difference effects). Multicointegration extends the cointegration technique beyond two variables, and occasionally to variables integrated at different orders.

Variable shifts in long time series

Tests for cointegration assume that the cointegrating vector is constant during the period of study. In reality, the long-run relationship between the underlying variables may change, that is, shifts in the cointegrating vector can occur. Possible reasons include technological progress, economic crises, changes in people's preferences and behaviour, policy or regime changes, and organizational or institutional developments. Such shifts are especially likely if the sample period is long. To take this issue into account, tests have been introduced for cointegration with one unknown structural break,^[11] and tests for cointegration with two unknown breaks are also available.^[12]

Bayesian inference

Several Bayesian methods have been proposed to compute the posterior distribution of the number of cointegrating relationships and the cointegrating linear combinations.^[13]

References

[1] doi:10.1016/0304-3932(82)90012-5.

[2] S2CID 126346450.

[3] doi:10.1016/0304-4076(74)90034-7.

[4] S2CID 154550363.

[5] doi:10.1016/0304-4076(81)90079-8.

[6] JSTOR 1913236.

[7] https://www.econ.queensu.ca/sites/econ.queensu.ca/files/wpaper/qed_wp_1227.pdf

[8] Giles, David (19 June 2013). "ARDL Models - Part II - Bounds Tests". Retrieved 4 August 2014.

[9] hdl:10983/25617.

[10] JSTOR 2938339.

[11] doi:10.1016/0304-4076(69)41685-7.

[12] S2CID 153437469.

[13] ISBN 978-1-4039-4155-8.

Further reading

Enders, Walter (2004). "Cointegration and Error-Correction Models". Applied Econometric Time Series (Second ed.). New York: Wiley. pp. 319–386. ISBN 978-0-471-23065-6.

ISBN 978-0-691-01018-2.

ISBN 978-0-521-58782-2.

Murray, Michael P. (1994). "A Drunk and her Dog: An Illustration of Cointegration and Error Correction". doi:10.1080/00031305.1994.10476017. An intuitive introduction to cointegration.

