8 May 16

## What is cointegration?

What is cointegration? The discussions here and here are bad because they tie the definition to differencing a time series. In fact, to understand cointegration you need two notions: stationary processes (please read about them before continuing) and linear dependence.

Definition. We say that vectors $X_1,...,X_n$ are linearly dependent if there exist numbers $a_1,...,a_n$, not all of which are zero, such that the linear combination $a_1X_1+...+a_nX_n$ is a zero vector.

Recall from this post that stationary processes play the role of zero in the set of all processes. Replace in the above definition "vectors" with "processes" and "a zero vector" with "a stationary process" and - voilà - you have the definition of cointegration:

Definition. We say that processes $X_1,...,X_n$ are cointegrated if there exist numbers $a_1,...,a_n$, not all of which are zero, such that the linear combination $a_1X_1+...+a_nX_n$ is a stationary process. Remembering that each process is a collection of random variables indexed with time moments $t$, we obtain a definition that explicitly involves time: processes $\{X_{1,t}\},...,\{X_{n,t}\}$ are cointegrated if there exist numbers $a_1,...,a_n$, not all of which are zero, such that $a_1X_{1,t}+...+a_nX_{n,t}=u_t$ where $\{u_t\}$ is a stationary process.

To fully understand the implications, you need to know all the intricacies of linear dependence. I do not want to plunge into that lengthy discussion here. Instead, I want to explain how this definition leads to a regression in the case of two processes.

If $\{X_{1,t}\},\{X_{2,t}\}$ are cointegrated, then there exist numbers $a_1,a_2$, at least one of which is not zero, such that $a_1X_{1,t}+a_2X_{2,t}=u_t$ where $\{u_t\}$ is a stationary process. If $a_1\ne 0$, we can solve for $X_{1,t}$, obtaining $X_{1,t}=\beta X_{2,t}+v_t$ with $\beta=-a_2/a_1$ and $v_t=(1/a_1)u_t$. This is almost a regression, except that the mean of $v_t$ may not be zero. We can represent $v_t=(v_t-Ev_t)+Ev_t=w_t+\alpha$, where $\alpha=Ev_t$, $w_t=v_t-Ev_t$. Then the above equation becomes $X_{1,t}=\alpha+\beta X_{2,t}+w_t$, which is a simple regression. The case $a_2\ne 0$ leads to a similar result.

Practical recommendation. To see if $\{X_{1,t}\},\{X_{2,t}\}$ are cointegrated, regress one of them on the other and test the residuals for stationarity.
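The recommendation above can be sketched numerically. In this minimal simulation (all names are illustrative, not from the post), $X_2$ is a random walk and $X_1=2+3X_2+u_t$ with stationary noise $u_t$, so the two series are cointegrated with $(a_1,a_2)=(1,-3)$; regressing $X_1$ on $X_2$ recovers the relationship and leaves stationary residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000

# X2 is a random walk (non-stationary); X1 is a linear function of X2
# plus stationary noise, so X1 and X2 are cointegrated.
x2 = np.cumsum(rng.normal(size=T))
u = rng.normal(size=T)              # stationary u_t
x1 = 2.0 + 3.0 * x2 + u

# Step 1: regress one process on the other (with intercept).
beta, alpha = np.polyfit(x2, x1, 1)

# Step 2: inspect the residuals; for cointegrated series they
# should behave like a stationary process (here, like u_t).
residuals = x1 - (alpha + beta * x2)
```

In practice the residuals would be tested for stationarity with a formal unit-root test (e.g. an augmented Dickey-Fuller test) rather than by inspection; the simulation only illustrates the regression step.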

10 Jan 16

## What is a z score: the scientific explanation

You know what a z score is when you know why people invented it.

As usual, we start with a theoretical motivation. There is a myriad of distributions. Even if we stay within the set of normal distributions, there is an infinite number of them, indexed by their means $\mu(X)=EX$ and standard deviations $\sigma(X)=\sqrt{Var(X)}$. When computers did not exist, people had to use statistical tables. It was impossible to produce statistical tables for an infinite number of distributions, so the problem was to reduce the case of general $\mu(X)$ and $\sigma(X)$ to that of $\mu(X)=0$ and $\sigma(X)=1$.

But we know that this can be achieved by centering and scaling. Combining these two transformations, we obtain the definition of the z score:

$z=\frac{X-\mu(X)}{\sigma(X)}.$

Using the properties of means and variances we see that

$Ez=\frac{E(X-\mu(X))}{\sigma(X)}=0,$ $Var(z)=\frac{Var(X-\mu(X))}{\sigma^2(X)}=\frac{Var(X)}{\sigma^2(X)}=1.$

The transformation leading from $X$ to its z score is sometimes called standardization.
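A quick numerical check of the two properties derived above, with the sample mean and sample standard deviation standing in for $EX$ and $\sigma(X)$ (the data here is an arbitrary simulated sample, not from the post):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=4.0, size=100_000)

# z score: center by the mean, then scale by the standard deviation
z = (x - x.mean()) / x.std()

# By construction, z has mean 0 and standard deviation 1
# (up to floating-point error).
```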

This site promises to tell you the truth about undergraduate statistics. The truth about the z score is that:

(1) Standardization can be applied to any variable with finite variance, not only to normal variables. The z score is a standard normal variable only when the original variable $X$ is normal, contrary to what some sites say.

(2) With modern computers, standardization is not necessary to find critical values for $X$, see Chapter 14 of my book.

9 Jan 16

## Scaling a distribution

Scaling a distribution is as important as centering or demeaning, considered here. The question we want to answer is this: What can you do to a random variable $X$ to obtain another random variable, say, $Y$, whose variance is one? As in the case of centering, geometric considerations could be used, but I want to follow the algebraic approach, which is more powerful.

Hint: in the case of centering, we subtract the mean, $Y=X-EX$. For the problem at hand the suggestion is to use scaling: $Y=aX$, where $a$ is a number to be determined.

Using the fact that variance is homogeneous of degree 2, we have

$Var(Y)=Var(aX)=a^2Var(X)$.

We want $Var(Y)$ to be 1, so solving for $a$ gives $a=1/\sqrt{Var(X)}=1/\sigma(X)$. Thus, division by the standard deviation answers our question: the variable $Y=X/\sigma(X)$ has variance and standard deviation equal to 1.
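The conclusion can be verified numerically, with the sample standard deviation standing in for $\sigma(X)$. Note that the original variable need not be normal; any finite-variance variable works (the exponential distribution below is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=3.0, size=50_000)  # any finite-variance variable

a = 1.0 / x.std()   # a = 1/sigma(X)
y = a * x           # scaled variable

# Var(Y) = a^2 Var(X) = 1 (up to floating-point error).
```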

Note. Always write the standard deviation $\sigma$ with its argument, as $\sigma(X)$.

7 Jan 16

## Mean plus deviation-from-mean decomposition

This is about separating the deterministic and random parts of a variable. This topic can be difficult or easy, depending on how you look at it. The right way to think about it is theoretical.

Everything starts with a simple question: What can you do to a random variable $X$ to obtain a new variable, say, $Y$, whose mean is equal to zero? Intuitively, when you subtract the mean from $X$, the distribution moves to the left or right, depending on the sign of $EX$, so that the distribution of $Y$ is centered on zero. One of my students used this intuition to guess that you should subtract the mean: $Y=X-EX$. The guess should be confirmed by algebra: from this definition

$EY=E(X-EX)=EX-E(EX)=EX-EX=0$

(here we distributed the expectation operator and used the property that the mean of a constant ($EX$) is that constant). By the way, subtracting the mean from a variable is called centering or demeaning.

If you understand the above, you can represent $X$ as

$X = EX+(X-EX).$

Here $\mu=EX$ is the mean and $u=X-EX$ is the deviation from the mean. As was shown above, $Eu=0$. Thus, we obtain the mean plus deviation-from-mean decomposition $X=\mu+u.$ Simple, isn't it? It is so simple that students don't pay attention to it. In fact, it is omnipresent in Statistics because $Var(X)=Var(\mu+u)=Var(u)$ (adding a constant does not change variance). The analysis of $Var(X)$ is reduced to that of $Var(u)$!
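The decomposition and its two properties can be checked on a simulated sample, with the sample mean standing in for $EX$ (the distribution and its parameters below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=2.0, size=100_000)

mu = x.mean()   # deterministic part: the mean
u = x - mu      # random part: deviation from the mean

# Eu = 0 and Var(X) = Var(u), up to floating-point error.
```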