8 May 16

What is cointegration?

What is cointegration? The discussions here and here are bad because they tie the definition to differencing a time series. In fact, to understand cointegration, you need two notions: stationary processes (please read that post before continuing) and linear dependence.

Definition. We say that vectors X_1,...,X_n are linearly dependent if there exist numbers a_1,...,a_n, not all of which are zero, such that the linear combination a_1X_1+...+a_nX_n is a zero vector.

Recall from this post that stationary processes play the role of zero in the set of all processes. In the above definition, replace "vectors" with "processes" and "a zero vector" with "a stationary process" and - voilà - you have the definition of cointegration:

Definition. We say that processes X_1,...,X_n are cointegrated if there exist numbers a_1,...,a_n, not all of which are zero, such that the linear combination a_1X_1+...+a_nX_n is a stationary process. Remembering that each process is a collection of random variables indexed by time moments t, we obtain a definition that explicitly involves time: processes \{X_{1,t}\},...,\{X_{n,t}\} are cointegrated if there exist numbers a_1,...,a_n, not all of which are zero, such that a_1X_{1,t}+...+a_nX_{n,t}=u_t, where \{u_t\} is a stationary process.

To fully understand the implications, you need to know all the intricacies of linear dependence. I do not want to plunge into that lengthy discussion here. Instead, I want to explain how this definition leads to a regression in the case of two processes.

If \{X_{1,t}\},\{X_{2,t}\} are cointegrated, then there exist numbers a_1,a_2, at least one of which is not zero, such that a_1X_{1,t}+a_2X_{2,t}=u_t where \{u_t\} is a stationary process. If a_1\ne 0, we can solve for X_{1,t}, obtaining X_{1,t}=\beta X_{2,t}+v_t with \beta=-a_2/a_1 and v_t=(1/a_1)u_t. This is almost a regression, except that the mean of v_t may not be zero. We can represent v_t=(v_t-Ev_t)+Ev_t=w_t+\alpha, where \alpha=Ev_t and w_t=v_t-Ev_t. Then the above equation becomes X_{1,t}=\alpha+\beta X_{2,t}+w_t, which is a simple regression. The case a_2\ne 0 leads to a similar result.

Practical recommendation. To see if \{X_{1,t}\},\{X_{2,t}\} are cointegrated, regress one of them on the other and test the residuals for stationarity.
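Here is how the recommendation looks in code. This is a minimal sketch in Python: the simulated series, the parameter values and all the names (x1, x2, beta) are my own illustrative choices, not part of the theory above.

```python
# Minimal sketch of the two-step check described above.
# The simulated data and all names here are illustrative assumptions.
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
T = 500

# x2 is a random walk (nonstationary); x1 shares its stochastic trend,
# so x1 - 2*x2 is stationary and the two processes are cointegrated.
x2 = np.cumsum(rng.normal(size=T))
x1 = 1.0 + 2.0 * x2 + rng.normal(size=T)  # alpha=1, beta=2, stationary error

# Step 1: regress one process on the other (with an intercept).
ols = sm.OLS(x1, sm.add_constant(x2)).fit()

# Step 2: test the residuals for stationarity with the ADF test.
adf_stat, p_value = adfuller(ols.resid)[:2]
print(f"beta estimate: {ols.params[1]:.3f}")
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.4f}")
```

One caveat: the tabulated ADF critical values are not exactly right when the test is applied to residuals from an estimated regression; statsmodels also provides a dedicated coint function with critical values adjusted for this.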

10 Jan 16

What is a z score: the scientific explanation

You know what a z score is when you know why people invented it.

As usual, we start with a theoretical motivation. There is a myriad of distributions. Even if we stay within the set of normal distributions, there is an infinite number of them, indexed by their means \mu(X)=EX and standard deviations \sigma(X)=\sqrt{Var(X)}. When computers did not exist, people had to use statistical tables. It was impossible to produce statistical tables for an infinite number of distributions, so the problem was to reduce the case of general \mu(X) and \sigma(X) to that of \mu(X)=0 and \sigma(X)=1.

But we know that this can be achieved by centering and scaling. Combining these two transformations, we obtain the definition of the z score:

z=\frac{X-\mu(X)}{\sigma(X)}.

Using the properties of means and variances we see that

Ez=\frac{E(X-\mu(X))}{\sigma(X)}=0,

Var(z)=\frac{Var(X-\mu(X))}{\sigma^2(X)}=\frac{Var(X)}{\sigma^2(X)}=1.

The transformation leading from X to its z score is sometimes called standardization.
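A quick numerical check of the two displayed properties (a minimal sketch; the sample and the names are my own illustrative choices):

```python
# Numerical check that the z score has mean 0 and variance 1.
# The sample is illustrative; any finite-variance data would do.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=10_000)

z = (x - x.mean()) / x.std()  # standardization: center, then scale
print(z.mean())  # approximately 0
print(z.var())   # 1 up to floating-point error
```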

This site promises to tell you the truth about undergraduate statistics. The truth about the z score is that:

(1) Standardization can be applied to any variable with finite variance, not only to normal variables. The z score is a standard normal variable only when the original variable X is normal, contrary to what some sites say.

(2) With modern computers, standardization is not necessary to find critical values for X; see Chapter 14 of my book.
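To illustrate point (2), here is a minimal sketch (the distribution parameters and the choice of scipy are my own assumptions, not from the book): modern libraries return critical values for any normal distribution directly, without any reduction to the standard normal.

```python
# Illustration of point (2): critical values without standardization.
# The parameters mu and sigma are illustrative.
from scipy.stats import norm

mu, sigma = 5.0, 3.0  # an arbitrary normal distribution N(mu, sigma^2)

# Two-sided 5% critical values of X itself, no z table needed:
lower = norm.ppf(0.025, loc=mu, scale=sigma)
upper = norm.ppf(0.975, loc=mu, scale=sigma)
print(lower, upper)

# The classical route via the z score gives the same answer:
print(mu + sigma * norm.ppf(0.975))  # equals `upper`
```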

9 Jan 16

Scaling a distribution

Scaling a distribution is as important as centering or demeaning, considered here. The question we want to answer is this: what can you do to a random variable X to obtain another random variable, say, Y, whose variance is one? As in the case of centering, geometric considerations can be used, but I want to follow the algebraic approach, which is more powerful.

Hint: in the case of centering, we subtract the mean, Y=X-EX. For the problem at hand, the suggestion is to use scaling: Y=aX, where a is a number to be determined.

Using the fact that variance is homogeneous of degree 2, we have

Var(Y)=Var(aX)=a^2Var(X).

We want Var(Y) to be 1, so solving for a gives a=1/\sqrt{Var(X)}=1/\sigma(X). Thus, division by the standard deviation answers our question: the variable Y=X/\sigma(X) has variance and standard deviation equal to 1.

Note. Always write the standard deviation \sigma together with its argument, as \sigma(X).
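A quick numerical check of both facts (a minimal sketch; the sample and names are illustrative):

```python
# Numerical check that Var(aX) = a^2 Var(X) and that X/sigma(X) has variance 1.
# The sample and all names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=4.0, size=10_000)

a = 0.5
print(np.var(a * x), a**2 * np.var(x))  # homogeneity of degree 2

y = x / x.std()          # scaling by 1/sigma(X)
print(y.var(), y.std())  # both approximately 1
```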

7 Jan 16

Mean plus deviation-from-mean decomposition

This is about separating the deterministic and random parts of a variable. This topic can be difficult or easy, depending on how you look at it. The right way to think about it is theoretical.

Everything starts with a simple question: What can you do to a random variable X to obtain a new variable, say, Y, whose mean is equal to zero? Intuitively, when you subtract the mean from X, the distribution moves to the left or right, depending on the sign of EX, so that the distribution of Y is centered on zero. One of my students used this intuition to guess that you should subtract the mean: Y=X-EX. The guess should be confirmed by algebra: from this definition

EY=E(X-EX)=EX-E(EX)=EX-EX=0

(here we distributed the expectation operator and used the property that the mean of a constant (EX) is that constant). By the way, subtracting the mean from a variable is called centering or demeaning.

If you understand the above, you can represent X as

X = EX+(X-EX).

Here \mu=EX is the mean and u=X-EX is the deviation from the mean. As was shown above, Eu=0. Thus, we obtain the mean plus deviation-from-mean decomposition X=\mu+u. Simple, isn't it? It is so simple that students don't pay attention to it. In fact, it is omnipresent in Statistics because Var(X)=Var(\mu+u)=Var(u) (adding the constant \mu does not change the variance). The analysis of Var(X) is reduced to that of Var(u)!
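A quick numerical illustration (a minimal sketch; the sample and names are my own): demeaning gives a zero-mean deviation, and the variances of X and u coincide.

```python
# Numerical check of X = mu + u with Eu = 0 and Var(X) = Var(u).
# The sample and all names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=7.0, scale=2.0, size=10_000)

mu = x.mean()  # the deterministic part
u = x - mu     # the deviation from the mean (centering/demeaning)

print(u.mean())              # approximately 0
print(np.var(x), np.var(u))  # equal: adding a constant does not change variance
```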