Jul 18

Linear dependence of vectors: definition and principal result

Linear dependence of vectors: definition and principal result

This is a topic most students have trouble with. It's because there is a lot of logic involved, so skimming it is not a good idea. We start with the most common definition.

Definition 1. Let x^{(1)},...,x^{(k)}\in R^n be some vectors. They are called linearly dependent if there exist numbers a_1,...,a_k, not all of which are zero, such that

(1) a_1x^{(1)}+...+a_kx^{(k)}=0.

The sheer length of this definition scares some people and all they remember is equation (1). Stating just (1) instead of the whole definition is like skinning an animal.

Seizing the bull by the horns

The first matter of business is to shorten the definition and relate it to what we already know.

It's better to join the set of numbers a_1,...,a_k into a vector a=(a_1,...,a_k). Then the requirement that "not all of a_1,...,a_k are zero" is equivalent to a single equation a\neq 0. Further, let us write the vectors x^{(1)},...,x^{(k)} as columns and put them into a matrix X=\left( x^{(1)},...,x^{(k)}\right) . Using multiplication of partitioned matrices we see that (1) is equivalent to

(2) Xa=0

and therefore Definition 1 is equivalent to the following.

Definition 2. Vectors x^{(1)},...,x^{(k)}\in R^n are called linearly dependent if the homogeneous equation (2) has nonzero solutions a\in R^k, that is, the null space of X is not \{0\}.

Negating Definition 2 gives the definition of linear independence. Negating Definition 2 is easier than negating Definition 1.

Definition 3. Vectors x^{(1)},...,x^{(k)}\in R^{n} are called linearly independent if N(X)=\{0\} (for any nonzero a\in R^k we have Xa\neq 0 or, alternatively, Xa=0 only for a=0).

Exercise 1. For any matrix X, one has

(3) N(X)=N(X^TX).

Proof. 1) Proving that N(X)\subset N(X^TX). If a\in N(X), then Xa=0 and X^TXa=0, so a\in N(X^TX). 2) Proving that N(X^TX)\subset N(X). If a\in N(X^TX), then X^TXa=0 and 0=(X^TXa)\cdot a=(Xa)\cdot(Xa)=\|Xa\|^2, so a\in N(X).

Exercise 2. X^TX is a square symmetric matrix, for any X.

Proof. If X is of size n\times k, then X^TX is k\times k. It is symmetric: (X^TX)^T=X^T(X^T)^T=X^TX. By the way, some students write (X^TX)^{-1}=X^{-1}(X^T)^{-1}. You cannot do this if X is not square.

Criterion of linear independence. Vectors x^{(1)},...,x^{(k)}\in R^n are linearly independent if and only if \det X^TX\neq 0.

Proof. We are going to use (3). By Exercise 2, A=X^TX is a square matrix and for a square matrix we have the equivalence N(A)=\{0\}\Longleftrightarrow \det A\neq 0. Application of this result proves the statement.

Direct application of Definition 1 can be problematic. To prove that some vectors are linearly dependent, you have to produce a_1,...,a_k such that Definition 1 is satisfied. This usually involves some guesswork, see exercises below. The criterion above doesn't require guessing and can be realized on the computer. The linear independence requirement is common in multiple regression analysis but not all econometricians know this criterion.

Putting some flesh on the bones

These are simple facts you need to know in addition to the above criterion.

Exercise 3. 1) Why do we exclude the case when all a_1,...,a_k are zero?

2) What happens if among x^{(1)},...,x^{(k)} there are zero vectors?

3) Show that in case of two non-zero vectors Definition 1 is equivalent to just proportionality of one vector to another.

Solution. 1) If all a_1,...,a_k are zero, (1) is trivially satisfied, no matter what the vectors are.

2) If one of the vectors x^{(1)},...,x^{(k)} is zero, the coefficient of that vector can be set to one and all others to zero, so such vectors will be linearly dependent by Definition 1.

3) Consider two non-zero vectors x^{(1)},x^{(2)}. If they are linearly dependent, then

(4) a_1x^{(1)}+a_2x^{(2)}=0

where at least one of a_1,a_2 is not zero. Suppose a_1\neq 0. Then a_2\neq 0 because otherwise x^{(1)} would be zero. Hence

(5) x^{(1)}=-a_2/a_1x^{(2)}=cx^{(2)}

where the proportionality coefficient c is not zero. Conversely, (5) implies (4) (you have to produce the coefficients).

Exercise 4. Prove that if to the system of unit vectors we add any vector x\in R^n, the resulting system will be linearly dependent.

Dec 16

Multiple regression through the prism of dummy variables

Agresti and Franklin on p.658 say: The indicator variable for a particular category is binary. It equals 1 if the observation falls into that category and it equals 0 otherwise. I say: For most students, this is not clear enough.

Problem statement

Figure 1. Residential power consumption in 2014 and 2015. Source: http://www.eia.gov/electricity/data.cfm

Residential power consumption in the US has a seasonal pattern. Heating in winter and cooling in summer cause the differences. We want to capture the dependence of residential power consumption PowerC on the season.

 Visual approach to dummy variables

Seasons of the year are categorical variables. We have to replace them with quantitative variables, to be able to use in any mathematical procedure that involves arithmetic operations. To this end, we define a dummy variable (indicator) D_{win} for winter such that it equals 1 in winter and 0 in any other period of the year. The dummies D_{spr},\ D_{sum},\ D_{aut} for spring, summer and autumn are defined similarly. We provide two visualizations assuming monthly observations.

Table 1. Tabular visualization of dummies
Month D_{win} D_{spr} D_{sum} D_{aut} D_{win}+D_{spr}+ D_{sum}+D_{aut}
December 1 0 0 0 1
January 1 0 0 0 1
February 1 0 0 0 1
March 0 1 0 0 1
April 0 1 0 0 1
May 0 1 0 0 1
June 0 0 1 0 1
July 0 0 1 0 1
August 0 0 1 0 1
September 0 0 0 1 1
October 0 0 0 1 1
November 0 0 0 1 1

Figure 2. Graphical visualization of D_spr

The first idea may be wrong

The first thing that comes to mind is to regress PowerC on dummies as in

(1) PowerC=a+bD_{win}+cD_{spr}+dD_{sum}+eD_{aut}+error.

Not so fast. To see the problem, let us rewrite (1) as

(2) PowerC=a\times 1+bD_{win}+cD_{spr}+dD_{sum}+eD_{aut}+error.

This shows that, in addition to the four dummies, there is a fifth variable, which equals 1 across all observations. Let us denote it T (for Trivial). Table 1 shows that

(3) T=D_{win}+D_{spr}+ D_{sum}+D_{aut}.

This makes the next definition relevant. Regressors x_1,...,x_k are called linearly dependent if one of them, say, x_1, can be expressed as a linear combination of the others: x_1=a_2x_2+...+a_kx_k.  In case (3), all coefficients a_i are unities, so we have linear dependence. Using (3), let us replace T in (2). The resulting equation is rearranged as

(4) PowerC=(a+b)D_{win}+(a+c)D_{spr}+(a+d)D_{sum}+(a+e)D_{aut}+error.

Now we see what the problem is. When regressors are linearly dependent, the model is not uniquely specified. (1) and (4) are two different representations of the same model.

What is the way out?

If regressors are linearly dependent, drop them one after another until you get linearly independent ones. For example, dropping the winter dummy, we get

(5) PowerC=a+cD_{spr}+dD_{sum}+eD_{aut}+error.

Here is the estimation result for the two-year data in Figure 1:


This means that:

PowerC=128176 in winter, PowerC=128176-27380 in spring,

PowerC=128176+5450 in summer, and PowerC=128176-22225 in autumn.

It is revealing that cooling requires more power than heating. However, the summer coefficient is not significant. Here is the Excel file with the data and estimation result.

The category that has been dropped is called a base (or reference) category. Thus, the intercept in (5) measures power consumption in winter. The dummy coefficients in (5) measure deviations of power consumption in respective seasons from that in winter.

Here is the question I ask my students

We want to see how beer consumption BeerC depends on gender and income Inc. Let M and F denote the dummies for males and females, resp. Correct the following model and interpret the resulting coefficients:


Final remark

When a researcher includes all categories plus the trivial regressor, he/she falls into what is called a dummy trap. The problem of linear dependence among regressors is usually discussed under the heading of multiple regression. But since the trivial regressor is present in simple regression too, it might be a good idea to discuss it earlier.

Linear dependence/independence of regressors is an exact condition for existence of the OLS estimator. That is, if regressors are linearly dependent, then the OLS estimator doesn't exist, in which case the question about its further properties doesn't make sense. If, on the other hand, regressors are linearly independent, then the OLS estimator exists, and further properties can be studied, such as unbiasedness, variance and efficiency.