8 May 16

## What is cointegration?

The discussions here and here are bad because they link the definition to differencing a time series. In fact, to understand cointegration, you need two notions: stationary processes (please read before continuing) and linear dependence.

Definition. We say that vectors $X_1,...,X_n$ are linearly dependent if there exist numbers $a_1,...,a_n$, not all of which are zero, such that the linear combination $a_1X_1+...+a_nX_n$ is a zero vector.

Recall from this post that stationary processes play the role of zero in the set of all processes. Replace in the above definition "vectors" with "processes" and "a zero vector" with "a stationary process" and - voilà - you have the definition of cointegration:

Definition. We say that processes $X_1,...,X_n$ are cointegrated if there exist numbers $a_1,...,a_n$, not all of which are zero, such that the linear combination $a_1X_1+...+a_nX_n$ is a stationary process. Remembering that each process is a collection of random variables indexed by time $t$, we obtain a definition that explicitly involves time: processes $\{X_{1,t}\},...,\{X_{n,t}\}$ are cointegrated if there exist numbers $a_1,...,a_n$, not all of which are zero, such that $a_1X_{1,t}+...+a_nX_{n,t}=u_t$ for all $t$, where $\{u_t\}$ is a stationary process.
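To make the definition concrete, here is a small simulation sketch. Everything in it is an assumption chosen for illustration: the two processes are built from a shared random walk (`trend`), the coefficients $a_1=1$, $a_2=-2$ are picked so that the walk cancels, and the helper `half_variances` is a hypothetical, informal check rather than a formal stationarity test.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000

# Common stochastic trend: a random walk, which is not stationary.
trend = np.cumsum(rng.normal(size=T))

# Two processes that share the trend, each with its own stationary noise.
x1 = 2.0 * trend + rng.normal(size=T)
x2 = trend + rng.normal(size=T)

# With a1 = 1 and a2 = -2 the trend cancels: u_t = x1_t - 2 * x2_t.
a1, a2 = 1.0, -2.0
u = a1 * x1 + a2 * x2


def half_variances(z):
    """Sample variances of the first and second halves of a series."""
    half = len(z) // 2
    return z[:half].var(), z[half:].var()


# For a stationary series the two halves should look alike;
# for a random-walk-driven series they generally do not.
print("x1:", half_variances(x1))
print("u: ", half_variances(u))
```

By construction $u_t$ is just a combination of the two noise terms, so its variance should stay near $1+4=5$ in both halves, while each of `x1` and `x2` wanders with the trend.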

To fully understand the implications, you need to know all the intricacies of linear dependence. I do not want to plunge into that lengthy discussion here. Instead, I want to explain how this definition leads to a regression in the case of two processes.

If $\{X_{1,t}\},\{X_{2,t}\}$ are cointegrated, then there exist numbers $a_1,a_2$, at least one of which is not zero, such that $a_1X_{1,t}+a_2X_{2,t}=u_t$ where $\{u_t\}$ is a stationary process. If $a_1\ne 0$, we can solve for $X_{1,t}$, obtaining $X_{1,t}=\beta X_{2,t}+v_t$ with $\beta=-a_2/a_1$ and $v_t=u_t/a_1$. This is almost a regression, except that the mean of $v_t$ may not be zero. Since $\{v_t\}$ is stationary, its mean $Ev_t$ does not depend on $t$, so we can represent $v_t=(v_t-Ev_t)+Ev_t=w_t+\alpha$, where $\alpha=Ev_t$ and $w_t=v_t-Ev_t$. Then the above equation becomes $X_{1,t}=\alpha+\beta X_{2,t}+w_t$, which is a simple regression with a stationary, zero-mean error $w_t$. The case $a_2\ne 0$ leads to a similar result.

Practical recommendation. To see if $\{X_{1,t}\},\{X_{2,t}\}$ are cointegrated, regress one of them on the other and test the residuals for stationarity.
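The recommendation can be sketched in code as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the data are simulated, the function names (`ols`, `dickey_fuller_t`, `engle_granger_stat`) are hypothetical, and the stationarity check is the simplest Dickey-Fuller statistic with no lagged differences. Because the residuals come from an estimated regression, the statistic should be compared with critical values tabulated for residual-based tests, not with the standard Dickey-Fuller tables.

```python
import numpy as np


def ols(y, X):
    """Least-squares coefficients and residuals for y = X b + e."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b


def dickey_fuller_t(e):
    """t-statistic of rho in  diff(e)_t = rho * e_{t-1} + error  (no lags, no constant)."""
    de, lag = np.diff(e), e[:-1]
    rho = (lag @ de) / (lag @ lag)
    resid = de - rho * lag
    s2 = (resid @ resid) / (len(de) - 1)
    return rho / np.sqrt(s2 / (lag @ lag))


def engle_granger_stat(x1, x2):
    """Regress x1 on a constant and x2, then test the residuals."""
    X = np.column_stack([np.ones_like(x2), x2])
    (alpha, beta), w = ols(x1, X)
    return alpha, beta, dickey_fuller_t(w)


# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(1)
T = 2000
trend = np.cumsum(rng.normal(size=T))
x2 = trend + rng.normal(size=T)
x1 = 3.0 + 2.0 * x2 + rng.normal(size=T)  # cointegrated with x2 by construction
y = np.cumsum(rng.normal(size=T))         # random walk independent of x2

alpha, beta, stat_coint = engle_granger_stat(x1, x2)
_, _, stat_indep = engle_granger_stat(y, x2)

print("cointegrated pair: beta =", beta, " statistic =", stat_coint)
print("independent pair:  statistic =", stat_indep)
```

A strongly negative statistic speaks against a unit root in the residuals, that is, in favor of cointegration. In practice one would probably rely on a library routine instead, for example `coint` from `statsmodels.tsa.stattools`, which reports the statistic together with appropriate critical values.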