15 Aug 18

Law and order in the set of matrices

The law is to feel every little fact by touch. The order is discussed below.

Why do complex numbers come right before this topic?

The analog of a conjugate number is the transpose A^T. How can you tell this? Using Exercise 2 we see that (cx)\cdot {y}=x\cdot (\bar{c}y). This looks similar to the identity (Ax)\cdot y=x\cdot(A^Ty) (see Exercise 4). Therefore the mapping c\rightarrow \bar{c} is similar to A\rightarrow A^{T}.

Once you know this, it's easy to come up with a couple of ideas.

Idea 1. From the characterization of real numbers (see Equation (4)) we see that matrices satisfying A=A^T (symmetric matrices) correspond to real numbers and should be better in some way than non-symmetric ones.

Idea 2. From Equation (3) we see that the matrix A^TA should be symmetric and non-negative.

What is a non-negative matrix?

The set of real numbers is ordered in the sense that for any two real numbers a,b either a\geq b or a<b is true. The most important property we used in my class is this: if a\geq b and c>0, then ac\geq bc (multiplying an inequality by a positive number preserves its sign). Since any two numbers can be compared in this way, the order is complete. One way in which symmetric matrices are better than general ones is that for symmetric matrices an order can be defined. The price paid for higher dimensions is that this order is not complete (some symmetric matrices are not comparable).

Exercise 1. For the matrix A=\left(\begin{array}{cc}a_{11}&a_{12}\\a_{12}&a_{22}\end{array}\right) and vector x\in R^2 find the expression Q_A(x)=x^TAx. What is the value of this expression at x=0?

Solution. Q_A(x)=a_{11}x_1^2+2a_{12}x_1x_2+a_{22}x_2^2 and Q_A(0)=0.

Definition 1. The function Q_A(x)=x^TAx is called a quadratic form of the matrix A. Here A is symmetric of size n\times n and x\in R^n.

Discussion. 1) The fact that A appears as a subscript while the argument is x means that A is fixed and x varies.

2) While the argument is a vector, the value of this function is a real number: Q_A acts from R^n to R.

3) Q_A(x) does not contain constant or linear terms (of type c and ax_i). It contains only quadratic terms (write x_1x_2=x_1^1x_2^1 to see that the total power is 2); that is why it is called a quadratic form and not a quadratic function.

Definition 2. We say that A is positive if Q_A(x)>0 for all nonzero x\in R^n and non-negative if Q_A(x)\geq 0 for all x\in R^n. (Most sources say positive definite instead of just positive and non-negative definite instead of just non-negative. I prefer a shorter terminology. If you don't understand why in the definition of positivity we require nonzero x, go back to Exercise 1). As with numbers, for two symmetric matrices A,B of the same size, we write A>B or A\geq B if A-B is positive or non-negative, respectively. Continuing this idea, we can say that A is negative if -A is positive.

More on motivation. A legitimate order A\geq B could also be obtained by comparing the two matrices element-wise. Definition 2 is motivated instead by the fact that quadratic forms arise in the multivariate Taylor expansion.

Sylvester's criterion is the standard practical tool for determining positivity or non-negativity. In one case, however, the check is simple.

Exercise 2. Show that A^TA is symmetric and non-negative.

Solution. The symmetry is straightforward and has been shown before. Non-negativity is not difficult either: Q_{A^TA}(x)=x^TA^TAx=(Ax)^TAx=\|Ax\|^2\ge 0.
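If you want to see Exercise 2 "in action", here is a minimal numerical sketch (NumPy; the matrix A and the test vectors are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))      # an arbitrary (non-square) matrix
G = A.T @ A                      # the matrix from Exercise 2

# Symmetry: G equals its transpose.
assert np.allclose(G, G.T)

# Non-negativity: Q_G(x) = x^T G x = ||Ax||^2 >= 0 for any x.
for _ in range(1000):
    x = rng.normal(size=2)
    q = x @ G @ x
    assert q >= -1e-12           # allow tiny negative values from rounding
    assert np.isclose(q, np.linalg.norm(A @ x) ** 2)
```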

Geometry

In good cases the graph of a quadratic form is an elliptic paraboloid; in worse cases the surface has various other names. Geometrically, the inequality A\geq B means that the graph of Q_A lies nowhere below the graph of Q_B (at the origin the two graphs always coincide). In particular, A\geq 0 means that the graph of Q_A lies nowhere below the horizontal plane.

Examples. All examples are matrices of size 2\times 2.

Figure 1. Quadratic form of identity (elliptic paraboloid)

1) The identity matrix I is positive because Q_I(x)=x_1^2+x_2^2, see Figure 1.

Figure 2. Parabolic cylinder

2) The matrix A=\left(\begin{array}{cc}1&0\\0&0\end{array}\right) is non-negative. Its quadratic form Q_A(x)=x_1^2 grows when |x_1| grows and stays flat when x_2 changes and x_1 is fixed, see Figure 2.

Figure 3. Hyperbolic paraboloid

3) The matrix B=\left(\begin{array}{cc}1&0\\0&-1\end{array}\right) is neither positive, non-negative, negative, nor non-positive. Its quadratic form Q_B(x)=x_1^2-x_2^2 is a parabola opening upward when the second argument is fixed and a parabola opening downward when the first argument is fixed, see Figure 3. When a surface behaves like this around a point, that point is called a saddle point.
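These classifications can also be obtained mechanically. One common numerical test (not the only one) looks at the eigenvalues of a symmetric matrix: all positive means positive, all non-negative means non-negative, and mixed signs indicate a saddle. Below is a minimal sketch applying this test to the matrices I, A, B from the examples:

```python
import numpy as np

def classify(M, tol=1e-12):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    w = np.linalg.eigvalsh(M)            # eigenvalues of a symmetric matrix
    if np.all(w > tol):
        return "positive"
    if np.all(w >= -tol):
        return "non-negative"
    if np.all(w < -tol):
        return "negative"
    if np.all(w <= tol):
        return "non-positive"
    return "indefinite (saddle)"

I = np.eye(2)
A = np.array([[1.0, 0.0], [0.0, 0.0]])
B = np.array([[1.0, 0.0], [0.0, -1.0]])

for name, M in [("I", I), ("A", A), ("B", B)]:
    print(name, classify(M))   # I: positive, A: non-negative, B: indefinite (saddle)
```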

7 Aug 18

Summary and questions for repetition

I was planning to cover the Moore-Penrose inverse, which allows one to solve the equation Ax=y (in a generalized sense) for any A (not necessarily square). Now my feeling is that it would be too much for a standard linear algebra course. This is the most easily accessible source.

  1. Give the definition and example of an orthonormal system. Prove that elements of such a system are linearly independent.
  2. "To know how a matrix acts on vectors, it is enough to know how it acts on the elements of an orthonormal basis." Explain.
  3. How can you reveal the elements of a matrix A from (Ax)\cdot y?
  4. A linear mapping from one Euclidean space to another generates a matrix. Prove.
  5. Prove that an inverse of a linear mapping is linear.
  6. What is a linear mapping in the one-dimensional case?
  7. In case of a square matrix, what are the four equivalent conditions for the equation Ax=y to be good (uniquely solvable for all y)?
  8. Give two equivalent definitions of linear independence.
  9. List the simple facts about linear dependence that students need to learn first.
  10. Prove the criterion of linear independence.
  11. Let the vectors x^{(1)},...,x^{(k)} be linearly dependent and consider the regression model y=\beta_1x^{(1)}+...+\beta_kx^{(k)}+u. Show that here the coefficients \beta_1,...,\beta_k cannot be uniquely determined (this is a purely algebraic fact, you don't need to know anything about multiple regression).
  12. Define a basis. Prove that if x^{(1)},...,x^{(k)} is a basis and x is decomposed as x=a_1x^{(1)}+...+a_kx^{(k)}, then the coefficients a_1,...,a_k are unique. Prove further that they are linear functions of x.
  13. Prove that the terms in the orthogonal sum of two subspaces have intersection zero.
  14. Prove dimension additivity.
  15. Prove that a matrix and its adjoint have the same rank.
  16. Prove the rank-nullity theorem.
  17. Prove the upper bound on the matrix rank in terms of the matrix dimensions.
  18. Vectors are linearly dependent if and only if one of them can be expressed as a linear combination of the others.
  19. What can be said about linear (in)dependence if some vectors are added to or removed from the system of vectors?
  20. Prove that if the number of vectors in a system is larger than the space dimension, then such a system is linearly dependent.
  21. Give a list of all properties of rank that you've learned so far.
2 Aug 18

Rank of a matrix and the rank-nullity theorem

If you find proofs shorter than mine, shoot me an email.

Rank and its properties

Definition 1. The dimension of the image \text{Img}(A) is called the rank of the matrix A and denoted \text{rank}(A).

Exercise 1. For any matrix A

(1) \text{rank}(A)=\text{rank}(A^{T}).

Proof. Suppose A is of size n\times k. We know that

(2) R^k=N(A)\oplus\text{Img}(A^T).

By (2) any x\in R^k can be represented as

(3) x=y+z, with y\in N(A), z\in\text{Img}(A^T) and y\perp z.

From (3) Ax=Ay+Az=Az. By the definition of dimension, \text{Img}(A^T) is spanned by some linearly independent vectors z^{(1)},...,z^{(m)} with m=\dim\text{Img}(A^T). Hence z=\sum_{i=1}^ma_iz^{(i)} with some a_i depending on z and Ax=\sum_{i=1}^ma_iAz^{(i)}. This shows that \text{Img}(A) is spanned by Az^{(1)},...,Az^{(m)}. If we prove linear independence of these vectors, (1) will follow.

Suppose \sum_{i=1}^mb_iAz^{(i)}=0 with some b_{i}. Then A\left(\sum_{i=1}^mb_iz^{(i)}\right)=0. This tells us that \sum_{i=1}^mb_iz^{(i)}\in N(A). Since the vector \sum_{i=1}^mb_iz^{(i)} also belongs to \text{Img}(A^T), by Exercise 3 it is zero. By linear independence, \sum_{i=1}^mb_iz^{(i)}=0 is possible only when b_1=...=b_m=0. This proves the linear independence of Az^{(1)},...,Az^{(m)} and the statement.

Exercise 2 (rank-nullity theorem) If A is n\times k, then \dim N(A)+\dim \text{Img}(A)=k.

Proof. Exercise 4 and equation (2) imply k=\dim N(A)+\dim\text{Img}(A^T). But we know from Exercise 1 that \dim\text{Img}(A^T)=\dim\text{Img}(A).

Exercise 3. Let A be of size n\times k. Then \text{rank}(A)\leq\min\{n,k\}.

Proof. From Exercise 2 we see that \text{rank}(A)=\dim\text{Img}(A)\leq k. Applying this inequality to A^T we get \text{rank}(A)=\text{rank}(A^T)=\dim\text{Img}(A^T)\leq n.
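All three statements are easy to probe numerically. Here is a minimal sketch (a random matrix; NumPy's matrix_rank and an SVD-based nullity count are my own choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 3
A = rng.normal(size=(n, k))

rank_A = np.linalg.matrix_rank(A)
rank_At = np.linalg.matrix_rank(A.T)

# Nullity = number of (numerically) zero singular values among the k of them.
s = np.linalg.svd(A, compute_uv=False)
nullity = k - np.sum(s > 1e-10)

assert rank_A == rank_At                 # Exercise 1
assert nullity + rank_A == k             # Exercise 2 (rank-nullity)
assert rank_A <= min(n, k)               # Exercise 3
```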

1 Aug 18

Basis and dimension

Definition 1. We say that vectors x^{(1)},...,x^{(m)} form a basis in a subspace L if 1) L is spanned by x^{(1)},...,x^{(m)} and 2) these vectors are linearly independent. The number of vectors in a basis is called the dimension of L, denoted \dim L.

An orthonormal basis is a special type of basis: in addition to the above conditions 1)-2), the basis vectors form an orthonormal system. For the definition of dimension to be correct, the number of vectors in any basis must be the same; we do not prove this fact.

Exercise 1. In R^n the unit vectors are linearly independent. Prove this fact 1) directly and 2) using the properties of an orthonormal system.

Direct proof. Any x\in R^n can be represented as

(1) x=x_1e_1+...+x_ne_n.

If the right side is zero, then x=0 and all x_i are zero.

Proof using orthonormality.  If the right side in (1) is zero, then x_i=x\cdot e_i=0 for all i.

Exercise 2. \dim R^{n}=n.

Proof. (1) shows that R^n is spanned by e_1,...,e_n. Besides, they are linearly independent by Exercise 1.

Exercise 3. If a vector x belongs to both terms in the orthogonal sum of two subspaces L=L_1\oplus L_2, then it is zero. This means that L_1\cap L_2=\{0\}.

Proof. This is because any element of L_1 is orthogonal to any element of L_2, so x is orthogonal to itself, 0=x\cdot x=\|x\|^2 and x=0.

Exercise 4 (dimension additivity) Let L=L_1\oplus L_2 be an orthogonal sum of two subspaces. Then \dim L=\dim L_1+\dim L_2.

Proof. Let l_i=\dim L_i, i=1,2. By definition, L_1 is spanned by some linearly independent vectors y^{(1)},...,y^{(l_{1})} and L_2 is spanned by some linearly independent vectors z^{(1)},...,z^{(l_{2})}. Any x\in L can be decomposed as x=y+z, y\in L_1, z\in L_2. Since y,z can be further decomposed as y=\sum_{i=1}^{l_1}a_iy^{(i)}, z=\sum_{i=1}^{l_2}b_iz^{(i)}, the system y^{(1)},...,y^{(l_1)}, z^{(1)},...,z^{(l_2)} spans L.

Moreover, this system is linearly independent. If

\sum_{i=1}^{l_1}a_iy^{(i)}+\sum_{i=1}^{l_2}b_iz^{(i)}=0,

then

L_1\ni\sum_{i=1}^{l_1}a_iy^{(i)}=-\sum_{i=1}^{l_2}b_iz^{(i)}\in L_2.

By Exercise 3 then \sum_{i=1}^{l_1}a_iy^{(i)}=0, \sum_{i=1}^{l_2}b_iz^{(i)}=0. By linear independence of the vectors in the two systems all coefficients a_i,b_i must be zero.

The conclusion is that \dim L=l_1+l_2.

30 Jul 18

Linear dependence of vectors: definition and principal result

This is a topic most students have trouble with. It's because there is a lot of logic involved, so skimming it is not a good idea. We start with the most common definition.

Definition 1. Let x^{(1)},...,x^{(k)}\in R^n be some vectors. They are called linearly dependent if there exist numbers a_1,...,a_k, not all of which are zero, such that

(1) a_1x^{(1)}+...+a_kx^{(k)}=0.

The sheer length of this definition scares some people and all they remember is equation (1). Stating just (1) instead of the whole definition is like skinning an animal.

Seizing the bull by the horns

The first order of business is to shorten the definition and relate it to what we already know.

It's better to collect the numbers a_1,...,a_k into a vector a=(a_1,...,a_k). Then the requirement that "not all of a_1,...,a_k are zero" is equivalent to the single condition a\neq 0. Further, let us write the vectors x^{(1)},...,x^{(k)} as columns and put them into a matrix X=\left( x^{(1)},...,x^{(k)}\right). Using multiplication of partitioned matrices we see that (1) is equivalent to

(2) Xa=0

and therefore Definition 1 is equivalent to the following.

Definition 2. Vectors x^{(1)},...,x^{(k)}\in R^n are called linearly dependent if the homogeneous equation (2) has nonzero solutions a\in R^k, that is, the null space of X is not \{0\}.

Negating Definition 2 gives the definition of linear independence. Negating Definition 2 is easier than negating Definition 1.

Definition 3. Vectors x^{(1)},...,x^{(k)}\in R^{n} are called linearly independent if N(X)=\{0\} (for any nonzero a\in R^k we have Xa\neq 0 or, alternatively, Xa=0 only for a=0).

Exercise 1. For any matrix X, one has

(3) N(X)=N(X^TX).

Proof. 1) Proving that N(X)\subset N(X^TX). If a\in N(X), then Xa=0 and X^TXa=0, so a\in N(X^TX). 2) Proving that N(X^TX)\subset N(X). If a\in N(X^TX), then X^TXa=0 and 0=(X^TXa)\cdot a=(Xa)\cdot(Xa)=\|Xa\|^2, so a\in N(X).

Exercise 2. X^TX is a square symmetric matrix, for any X.

Proof. If X is of size n\times k, then X^TX is k\times k. It is symmetric: (X^TX)^T=X^T(X^T)^T=X^TX. By the way, some students write (X^TX)^{-1}=X^{-1}(X^T)^{-1}. You cannot do this if X is not square.

Criterion of linear independence. Vectors x^{(1)},...,x^{(k)}\in R^n are linearly independent if and only if \det X^TX\neq 0.

Proof. We are going to use (3). By Exercise 2, A=X^TX is a square matrix and for a square matrix we have the equivalence N(A)=\{0\}\Longleftrightarrow \det A\neq 0. Application of this result proves the statement.

Direct application of Definition 1 can be problematic. To prove that some vectors are linearly dependent, you have to produce a_1,...,a_k such that Definition 1 is satisfied. This usually involves some guesswork, see the exercises below. The criterion above doesn't require guessing and can be implemented on a computer. The linear independence requirement is common in multiple regression analysis, but not all econometricians know this criterion.
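As an illustration of the criterion, here is a minimal sketch that tests linear independence of column vectors by computing \det X^TX (with a tolerance, since floating-point determinants are rarely exactly zero; the example vectors are arbitrary):

```python
import numpy as np

def linearly_independent(columns, tol=1e-10):
    """Apply the criterion: columns are independent iff det(X^T X) != 0."""
    X = np.column_stack(columns)
    return abs(np.linalg.det(X.T @ X)) > tol

x1 = np.array([1.0, 0.0, 2.0])
x2 = np.array([0.0, 1.0, 1.0])
x3 = 2 * x1 - 3 * x2                         # deliberately dependent on x1, x2

print(linearly_independent([x1, x2]))        # True
print(linearly_independent([x1, x2, x3]))    # False
```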

Putting some flesh on the bones

These are simple facts you need to know in addition to the above criterion.

Exercise 3. 1) Why do we exclude the case when all a_1,...,a_k are zero?

2) What happens if among x^{(1)},...,x^{(k)} there are zero vectors?

3) Show that in case of two non-zero vectors Definition 1 is equivalent to just proportionality of one vector to another.

Solution. 1) If all a_1,...,a_k are zero, (1) is trivially satisfied, no matter what the vectors are.

2) If one of the vectors x^{(1)},...,x^{(k)} is zero, the coefficient of that vector can be set to one and all others to zero, so such vectors will be linearly dependent by Definition 1.

3) Consider two non-zero vectors x^{(1)},x^{(2)}. If they are linearly dependent, then

(4) a_1x^{(1)}+a_2x^{(2)}=0

where at least one of a_1,a_2 is not zero. Suppose a_1\neq 0. Then a_2\neq 0 because otherwise x^{(1)} would be zero. Hence

(5) x^{(1)}=-(a_2/a_1)x^{(2)}=cx^{(2)}

where the proportionality coefficient c is not zero. Conversely, (5) implies (4) (you have to produce the coefficients).

Exercise 4. Prove that if to the system of unit vectors we add any vector x\in R^n, the resulting system will be linearly dependent.

28 Jul 18

Solvability of an equation with a square matrix

Everywhere in this section we assume that A is square of size k\times k. The main problem is about solvability of the equation Ax=y.

Dissecting the problem

Exercise 1. If A is invertible, then N(A)=\{0\} and \text{Img}(A)=R^k.

Proof. If Ax=0, then x=A^{-1}Ax=0, so N(A)=\{0\} (you could also say that A is one-to-one). From AA^{-1}=I we have AA^{-1}x=x. This tells us that x=Ay with y=A^{-1}x. We see that any x belongs to the image of A, so \text{Img}(A)=R^k.

Exercise 2. If N(A)=\{0\}, then A is invertible.

Proof. Let f^{-1} denote the inverse of f(x)=Ax in the general sense: f^{-1}(Ax)=x. It is defined on the image of A. We know that f^{-1} is given by some matrix B: BAx=x for all x. Hence, BAe_i=e_i for i=1,...,k. Putting these equations side by side we get

(1) BA=BAI=I.

(In detail: the identity matrix is partitioned as I=(e_1,...,e_k), so BAI =(BAe_1,...,BAe_k)=(e_1,...,e_k)=I). Thus, B is the left inverse of A, and we've seen before that for square matrices this implies existence of the right inverse, invertibility of A and the equation B=A^{-1}.

Exercise 3. If \text{Img}(A)=R^k, then A is invertible.

Proof. If \text{Img}(A)=R^k, then by the second characterization of matrix image N(A^T)=(\text{Img}(A))^{\perp}=\{0\}. Applying Exercise 2 to A^T, we see that the transpose is invertible and hence \det A=\det A^{T}\neq 0. This implies invertibility of A.

Collecting the pieces together

Exercise 4. The following conditions are equivalent:

a) N(A)=\{0\}, b) \text{Img}(A)=R^k, c) \det A\neq 0, d) A is invertible.

Proof. The equivalence c) \Longleftrightarrow d) has been established before. For the implication d) \Longrightarrow a)+b) see Exercise 1. By Exercise 2 we have a) \Longrightarrow d). By Exercise 3, b) \Longrightarrow d).

Conclusion. The condition \det A\neq 0 is the easiest to check. If it holds, then the equation Ax=y has a unique solution for any y\in R^k.
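In computational terms the conclusion reads as follows (a small sketch with an arbitrary matrix and right-hand side chosen for illustration):

```python
import numpy as np

# An arbitrary square matrix and right-hand side.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0])

# Mirrors the conclusion: det A != 0 guarantees a unique solution for any y.
# (In numerical work one would look at the condition number rather than the
# raw determinant, but the logic is the same.)
if abs(np.linalg.det(A)) > 1e-12:
    x = np.linalg.solve(A, y)
    assert np.allclose(A @ x, y)
    print("unique solution:", x)
else:
    print("det A = 0: no solution or infinitely many solutions")
```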

Remark 1. By negating Exercise 4, we see that the following conditions are equivalent:

a^\prime) N(A)\neq \{0\}, b^\prime) \text{Img}(A)\neq R^k, c^\prime) \det A=0, d^\prime) A^{-1} does not exist.

Remark 2. By going through the proof one can check that the result of Exercise 4 holds if R^k is replaced by C^k (the set of vectors with complex coordinates).

26 Jul 18

Is the inverse of a linear mapping linear?

Orthonormal basis

Exercise 1. I) Let e_j denote unit vectors. They possess properties

(1) e_i\cdot e_j=0 for all i\neq j, \left\Vert e_{i}\right\Vert =1 for all i.

II) For any x\in R^n we have the representation

(2) x=\sum_jx_je_j.

III) In (2) the coefficients can be found as x_{j}=x\cdot e_{j}:

(3) x=\sum_j(x\cdot e_j)e_j.

Proof. I) (1) is proved by direct calculation. II) To prove (2) we write

x=(x_1,...,x_n)=x_1(1,0,...,0)+...+x_n(0,...,0,1)=\sum_jx_je_j.

III) If we have (2), then it's easy to see that by (1) x\cdot e_i=\sum_jx_j(e_j\cdot e_i)=x_i.

Definition 1. Any system of vectors that satisfies (1) is called an orthonormal system. An orthonormal system is called complete if for any x\in R^n we have the decomposition (3). Exercise 1 shows that the system of unit vectors is a complete orthonormal system. A complete orthonormal system is called an orthonormal basis.
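The decomposition (3) is not tied to the standard unit vectors: any complete orthonormal system works. Here is a minimal numerical sketch (an orthonormal basis produced by a QR factorization is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
# Columns of Q form an orthonormal basis of R^n (QR of a random matrix).
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
e = [Q[:, j] for j in range(n)]

# Check (1): pairwise orthogonality and unit norms.
assert np.allclose(Q.T @ Q, np.eye(n))

# Check (3): x = sum_j (x . e_j) e_j for an arbitrary x.
x = rng.normal(size=n)
x_reconstructed = sum((x @ ej) * ej for ej in e)
assert np.allclose(x, x_reconstructed)
```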

Analyzing a linear mapping

Exercise 2. Let A be a matrix of size n\times k. Suppose you don't know the elements of A but you know the products (Ax)\cdot y for all x,y. How can you reveal the elements of A from (Ax)\cdot y? How do you express Ax using the elements you have found?

Solution. Let us try unit vectors as x,y:

(4) (Ae_j)\cdot e_i=\left(\begin{array}{c}A_1e_j\\...\\A_ne_j\end{array}\right)\cdot e_i=A_ie_j=a_{ij}.

One can check that ( Ax)\cdot e_i=A_ix=\sum_ja_{ij}x_j. Hence, from (3) we have the answer to the second question:

(5) Ax=\sum_i[\left(Ax\right)\cdot e_i]e_i=\sum_i\left(\sum_ja_{ij}x_j\right)e_i=\sum_{i,j}x_ja_{ij}e_i.
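Formula (4) can be turned into a small "recovery" procedure: given only a black box that returns (Ax)\cdot y, rebuild the matrix entry by entry. A minimal sketch, assuming the black box is available as a Python function:

```python
import numpy as np

def recover_matrix(pairing, n, k):
    """Rebuild A (n x k) from the values (Ax).y, using a_ij = (A e_j) . e_i."""
    A = np.zeros((n, k))
    for i in range(n):
        for j in range(k):
            e_i = np.eye(n)[:, i]
            e_j = np.eye(k)[:, j]
            A[i, j] = pairing(e_j, e_i)      # formula (4)
    return A

# Example: the black box secretly uses this matrix.
A_true = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
pairing = lambda x, y: (A_true @ x) @ y

assert np.allclose(recover_matrix(pairing, 2, 3), A_true)
```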

We know that a mapping generated by a matrix is linear. The converse is also true: a linear mapping is given by a matrix:

Exercise 3. Suppose a mapping f:R^k\rightarrow R^n is linear: f(ax+by)=af(x)+bf(y) for any numbers a,b and vectors x,y. Then there exists a matrix A of size n\times k such that f(x)=Ax for all x.

Proof. Guided by (4), in our case put a_{ij}=f(e_j)\cdot e_i.

Applying (3) to f(x) we get

(6) f(x)=\sum_i[f(x)\cdot e_i]e_i

(the summation index j is replaced by i on purpose). Plugging (2) into (6),

f(x)=\sum_i\left[f\left(\sum_jx_je_j\right)\cdot e_i\right]e_i= (both f and scalar product are linear)

=\sum_i\left[\left(\sum_jx_jf\left(e_j\right)\right)\cdot e_i\right]e_i=\sum_{i,j}x_j(f\left(e_j\right)\cdot e_i)e_i= (applying (5))

=\sum_{i,j}x_ja_{ij}e_i=Ax.

Exercise 4. An inverse of a linear mapping is linear (and given by a matrix by Exercise 3).

Proof. Let f(x)=Ax be a linear mapping and suppose its inverse f^{-1} in the general sense exists. Then f^{-1}(Ax)=x for all x. Let us put x=ay+bz for arbitrary numbers a,b and vectors y,z. Then we have f^{-1}(A(ay+bz))=ay+bz or, using linearity of A,

f^{-1}(aAy+bAz)=ay+bz.

Putting Ay=u, Az=v we get

f^{-1}(au+bv)=af^{-1}(u)+bf^{-1}(v).

Thus, f^{-1} is linear.
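For an invertible matrix the conclusion can be observed directly: A^{-1}(au+bv)=aA^{-1}u+bA^{-1}v. A minimal check with an arbitrary invertible matrix chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.array([[2.0, 1.0], [0.0, 1.0]])   # invertible (det = 2)
A_inv = np.linalg.inv(A)

a, b = 1.5, -0.7
u, v = rng.normal(size=2), rng.normal(size=2)

# Linearity of the inverse mapping.
assert np.allclose(A_inv @ (a * u + b * v), a * (A_inv @ u) + b * (A_inv @ v))
```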

23 Jul 18

Geometry of linear equations: questions for repetition

This section revolves around the concepts of linearity, linear subspaces and orthogonality. As usual, you are expected to produce at least those proofs that I give.

Prove linearity of the mapping generated by a matrix. This fundamental fact will have many implications. Do you think any linear mapping is generated by some matrix? Do you think a mapping inverse to a linear mapping should be linear?

What is the first characterization of a matrix image? In other sources it is called a column space. Later on it will be linked to the notions of linear independence and rank.

How would you write Ax (A is a matrix and x is a vector), if A is partitioned a) into rows and b) into columns?

Show that a span of some vectors is a linear subspace. Note that the one- and two-dimensional examples of linear subspaces we considered are special cases of subspaces described by linear systems of equations.

Can spans of two different systems of vectors coincide? Give an example.

Show that the image and null space of a matrix are linear subspaces. These facts indicate that the subspace notion is an adequate tool for the problems at hand.

What is the description of the set of solutions of Ax=y? It is pretty general. For example, the structure of solutions of an ordinary linear differential equation is the same.

Illustrate geometrically the theorem on second orthocomplement and explain why it holds.

What is the relationship between (Ax)\cdot y and x\cdot (A^Ty)?

Derive the second characterization of matrix image.

What are the geometric conditions for the solvability and uniqueness of solutions of an inhomogeneous equation?

23 Jul 18

Geometry of linear equations: orthogonal complement and equation solvability

From orthogonality of vectors to orthogonality of subspaces

Definition 1. Let L be a linear subspace of R^{n}. Its orthogonal complement L^{\perp } is defined as the set of all vectors orthogonal to all elements of L: L^{\perp }=\{ x\in R^n:x\cdot y=0 \text{ for all}\ y\in L\}.

Exercise 1. Let L=\{ (x_1,0):x_1\in R\} be the x axis on the plane. Find L^{\perp}.

Solution. The condition x\cdot y=x_1y_1=0 for all x=(x_1,0)\in L implies y_1=0. The component y_2 is free. Thus, L^{\perp}=\{ (0,y_2):y_2\in R\} is the y axis.

Similarly, the orthogonal complement (L^{\perp})^{\perp} of the y axis is the x axis. This fact is generalized as follows:

Theorem (on second orthocomplement) For any linear subspace L, one has (L^{\perp})^{\perp}=L.

This statement is geometrically simple but the proof is rather complex and will be omitted (see Akhiezer & Glazman, Theory of Linear Operators in Hilbert Space, Dover Publications, 1993, Chapter 1, sections 6 and 7). It's good to keep in mind what makes it possible. For any set A\subset R^n, its orthogonal complement A^{\perp} can be defined in the same way: A^{\perp}=\{ x\in R^n:x\cdot y=0 \text{ for all }y\in A\}. As in Exercise 1, you can show that if A=\{(1,0)\} contains just the unit vector of the x axis, then A^{\perp} is the y axis and therefore (A^{\perp})^{\perp} is the x axis. Thus, in this example we have a strict inclusion A\subset (A^{\perp})^{\perp}. The equality is achieved only for linear subspaces.

Exercise 2. In R^3, consider the x axis L=\{(x_1,0,0):x_1\in R\}. Can you tell what L^{\perp} is?

Exercise 3. \{ 0\}^{\perp}=R^n and (R^{n})^{\perp}=\{ 0\} .

Link between image of a matrix and null space of its transpose

Exercise 4. The rule for a transpose of a product (AB)^T=B^TA^T implies that for any x\in R^k, y\in R^n

(1) (Ax)\cdot y=(Ax)^Ty=x^TA^Ty=x\cdot (A^Ty).

Exercise 5 (second characterization of matrix image) \text{Img}(A)=N(A^T)^{\perp}.

Proof. Let us prove that

(2) \text{Img}(A)^{\perp}=N(A^T).

To this end, we first prove the inclusion \text{Img}(A)^{\perp}\subset N(A^T). Let y\in \text{Img}(A)^{\perp}. For an arbitrary x, we have Ax\in \text{Img}(A). Hence by (1) 0=(Ax)\cdot y=x\cdot(A^Ty). Since x is arbitrary, we can put x=A^Ty and obtain \|A^Ty\|^2=0. This implies A^Ty=0 and y\in N(A^T).

Conversely, to prove N(A^T)\subset\text{Img}(A)^{\perp}, suppose y\in N(A^T). Then A^Ty=0 and for an arbitrary x we have 0=x\cdot(A^Ty)=(Ax)\cdot y. Since Ax runs over \text{Img}(A), this means that y\in\text{Img}(A)^{\perp}.

Passing to orthogonal complements in (2), by the theorem on second orthocomplement we get what we need.

The importance of Exercise 5 is explained by the fact that the null space of a matrix is easier to describe analytically than the image. For the next summary, you might want to review the conclusion on the role of the null space.

Summary. 1) In order to see if Ax=y has solutions, check if y is orthogonal to N(A^T). In particular, if A^T is one-to-one, then N(A^T)=\{0\} and Ax=y has solutions for all y (see Exercise 3 above).

2) If A is one-to-one, then a solution of Ax=y, if it exists, is unique.

3) If both A and A^T are one-to-one, then Ax=y has a unique solution for any y.
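The summary can be tested numerically: compute an orthonormal basis of N(A^T) from the SVD of A and check whether y is orthogonal to it. Below is a minimal sketch with a rank-deficient A chosen for illustration:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])               # rank 1, so Img(A) is a line in R^3

# Columns of U beyond the rank span N(A^T) (the left null space of A).
U, s, Vt = np.linalg.svd(A)
rank = np.sum(s > 1e-10)
N_At = U[:, rank:]                        # orthonormal basis of N(A^T)

def solvable(y, tol=1e-10):
    """Ax = y has a solution iff y is orthogonal to N(A^T)."""
    return np.all(np.abs(N_At.T @ y) < tol)

print(solvable(A @ np.array([1.0, 1.0])))   # True: y is in Img(A)
print(solvable(np.array([1.0, 0.0, 0.0])))  # False: y is not in Img(A)
```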

19 Jul 18

Geometry of linear equations: structure of image and null space

The subspace notion allows us to describe the algebraic structure of the set of solutions of Ax=y.

Definition 1. The special case Ax=0 is called a homogeneous equation. Obviously, x=0 satisfies it, but there may be other solutions. The equation Ax=y with general y is called an inhomogeneous equation. We address the questions of existence and uniqueness of its solutions.

Structure of the image of A

Recall Basic observation 1: The image \text{Img}(A)=\{ Ax:x\in R^k\} is the set of y=Ax for which the inhomogeneous equation has solutions.

Exercise 1. \text{Img}(A) is a linear subspace in R^n.

Proof. This follows from the first characterization of the matrix image. Here is a direct proof. Suppose y_1,y_2\in \text{Img}(A). Then there exist x_1,x_2\in R^k such that Ax_1=y_1, Ax_2=y_2. By linearity this gives ay_1+by_2=aAx_1+bAx_2=A(ax_1+bx_2). Thus for y=ay_1+by_2 we have found x=ax_1+bx_2 such that y=Ax, which means ay_1+by_2\in \text{Img}(A) for any a,b and \text{Img}(A) is a subspace.

Structure of the null space of A

Definition 2. The set of solutions x of the homogeneous equation Ax=0 is called the null space of A. It is denoted N(A)=\{ x:Ax=0\}.

Exercise 2. The null space of A is a linear subspace of D(A).

Proof. Suppose x_1,x_2\in N(A) so that Ax_1=0, Ax_2=0. Then by linearity A(ax_1+bx_2)=aAx_1+bAx_2=0, so ax_1+bx_2\in N(A) for any a,b and N(A) is a linear subspace.

Description of the set of solutions of Ax=y

Intuition. In R^{3}, straight lines and planes that don't contain the origin can be obtained by shifting straight lines and planes that do (geometry should dominate the algebra at this point, see Figure 1). This is generalized in the next definition.

Figure 1. Shifting a subspace gives a hyperplane

Definition 3. Let x\in R^n be any vector and let L be a subspace. x+L=\{x+l:l\in L\} denotes the shift of L by x; it is obtained by adding x to each element of L. Some people call x+L a hyperplane.

Exercise 3. As we know, the equation Ax=y has solutions if and only if y\in \text{Img}(A). Let us fix y\in \text{Img}(A) and let x_{0} be some solution of Ax=y. Then the set of all solutions of this equation is x_0+N(A) (in detail: any other solution x of that equation can be obtained by adding an element z of the null space to x_0: x=x_0+z). This is written as \{ x:Ax=y\}=x_0+N(A).

Proof. Let Ax_0=y and let x be any solution of Ax=y. Then A(x-x_0)=0, so x-x_0\in N(A), that is, x-x_{0}=z for some z\in N(A) and x=x_0+z. Conversely, if z\in N(A), then A(x_0+z)=Ax_0+Az=y, so every element of x_0+N(A) is a solution. This proves the statement.

Conclusion. If N(A)=\{0\}, then A is one-to-one and we have uniqueness of solutions of the inhomogeneous equation; otherwise, N(A) can serve as a measure of non-uniqueness.
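The description \{x:Ax=y\}=x_0+N(A) can also be checked numerically: take a particular solution (for example the least-squares one returned by lstsq), add an arbitrary null-space vector and verify that the result still solves the equation. A minimal sketch with a rank-deficient matrix chosen for illustration:

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])           # N(A) is spanned by (1, -1, 0)
y = np.array([2.0, 3.0])

# A particular solution x0 (least-squares; exact here since y is in Img(A)).
x0, *_ = np.linalg.lstsq(A, y, rcond=None)

# An orthonormal basis of N(A) from the SVD (rows of Vt beyond the rank).
U, s, Vt = np.linalg.svd(A)
rank = np.sum(s > 1e-10)
N_A = Vt[rank:].T                         # columns span N(A)

# Any x0 + z with z in N(A) solves Ax = y.
z = N_A @ np.array([5.0])                 # an arbitrary element of N(A)
assert np.allclose(A @ (x0 + z), y)
```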