Aug 18

Final touches on linear independence

Final touches on linear independence

Correctness of the dimension definition

The main definition of linear dependence is of type

(1) \sum_{j=1}^ma_jx^{(j)}=0 for some nonzero a\in R^m (a non-trivial annihilating set of coefficients exists).

Exercise 1. Prove that the main definition is equivalent to the following: vectors x^{(1)},...,x^{(m)}\in R^n are called linearly dependent if one of them can be expressed as a linear combination of the others.

Proof. This exercise is about passing from (1) to

(2) x^{(j)}=\sum_{i\neq j}b_ix^{(i)} (one is a function of the others).

If there is dependence of the first type, it can be solved for a vector whose coefficient is different from zero. Dependence of the second type can be converted to the first type by throwing everything to one side.

Application. In multiple regression, we try to explain the behavior of a random vector y with the help of the vectors x^{(1)},...,x^{(m)} (called regressors) and a random vector u

(3) y=a_1x^{(1)}+...+a_mx^{(m)}+u.

Suppose that these regressors are linearly dependent. By Exercise 1, we can plug x^{(j)} in (3) and the behavior of y can be explained with the help of a smaller number of regressors. After eliminating all regressors that depend on the others, we obtain a more economical model. See a related discussion.

Exercise 2 (going from a small system to large and back) a) If some subsystem of x^{(1)},...,x^{(m)} is linearly dependent, then the whole system is linearly dependent. b) If the system x^{(1)},...,x^{(m)} is linearly independent, then any its subsystem is linearly independent.

Proof. a) If we have a non-trivial annihilating set of coefficients for a subsystem, it can be completed with zeros to obtain a non-trivial annihilating set of coefficients for the whole system. b) Proof by contradiction. Assume that some subsystem is linearly dependent. Then by part a) the whole system is too.

Theorem (correctness of the dimension definition) Any basis in R^n consists of n vectors.

The proof in Halmos (Finite-dimensional vector spaces, Springer, 1987) is the shortest I could find. To understand it, you will need Exercise 2. I don't give the proof here because I don't find the proof of general interest. Note that the theorem itself is important for any statement involving the dimension notion.

Using square matrices to study non-square ones

We'll need the upper bound on the rank:

(1) rank(A)\leq \min \{n,k\} if A is of size n\times k.

Exercise 3. rank(A^T)=rank(A^TA).

Proof. We know that R^n=N(A^T)\oplus\text{Img}(A). Applying A^T to both sides we get \text{Img}(A^T)=\text{Img}(A^TA). [In detail: any x\in R^n can be represented as x=y+z, where y\in N(A^T), z\in\text{Img}(A), z=Au for some u. Hence, applying A^T, we get A^Tx=A^Ty+A^Tz=A^Tz=A^TAu. When x runs over R^n, the left side runs over \text{Img}(A^T), while the right side runs over \text{Img}(A^TA).] If two subspaces coincide, their dimensions are the same.

Exercise 4. In R^n any set of vectors x^{(1)},...,x^{(m)} with m>n is linearly dependent.

Proof. Put X=(x^{(1)},...,x^{(m)}). X is of size n\times m. By Exercise 3 and (1)

(2) rank(X^TX)=rank(X^T)\leq n<m.

Suppose x^{(1)},...,x^{(m)} are linearly independent. By the criterion of linear independence then \det X^TX\neq 0. Since X^TX is a square matrix of size m\times m, this implies that \text{Img}(X^TX)=R^m and, consequently, rank(X^TX)=m, which contradicts (2).

Leave a Reply

You must be logged in to post a comment.