15
Aug 18

## Law and order in the set of matrices

The law is to feel by touch every little fact. The order is discussed below.

### Why do complex numbers come right before this topic?

The analog of a conjugate number is the transpose $A^T.$ How can you tell this? Using Exercise 2 we see that $(cx)\cdot {y}=x\cdot (\bar{c}y).$ This looks similar to the identity $(Ax)\cdot y=x\cdot(A^Ty)$ (see Exercise 4). Therefore the mapping $c\rightarrow \bar{c}$ is similar to $A\rightarrow A^{T}.$

Once you know this, it's easy to come up with a couple of ideas.

Idea 1. From the characterization of real numbers (see Equation (4)) we see that matrices that satisfy $A=A^T$ (symmetric matrices) correspond to real numbers and should be better in some way than asymmetric ones.

Idea 2. From Equation (3) we see that the matrix $A^TA$ should be symmetric and non-negative.

### What is a non-negative matrix?

The set of real numbers is ordered in the sense that for any two real numbers $a,b$ we can say that either $a\geq b$ or $a is true. The most important property that we used in my class is this: if $a\geq b$ and $c>0,$ then $ac\geq bc$ (the sign of an inequality is preserved if the inequality is multiplied by a positive number). Since any two numbers can be compared like that, it is a complete order.

One way in which symmetric matrices are better than more general ones is that for symmetric matrices one can define order. The limitation caused by dimensionality is that this order is not complete (some symmetric matrices are not comparable).

Exercise 1. For the matrix $A=\left(\begin{array}{cc}a_{11}&a_{12}\\a_{12}&a_{22}\end{array}\right)$ and vector $x\in R^2$ find the expression $Q_A(x)=x^TAx.$ What is the value of this expression at $x=0?$

Solution$Q_A(x)=x_1^2+2a_{12}x_1x_2+a_{22}x_2^2$ and $Q_A(0)=0.$

Definition 1. The function $Q_A(x)=x^TAx$ is called a quadratic form of the matrix $A.$ Here $A$ is symmetric of size $n\times n$ and $x\in R^n.$

Discussion. 1) The facts that $A$ is in the subscript and the argument is $x$ mean that $A$ is fixed and $x$ is changing.

2) While the argument is a vector, the value of this function is a real number: $Q_A$ acts from $R^n$ to $R.$

3) $Q_A(x)$ does not contain constant or linear terms (of type $c$ and $ax_i).$ It contains only quadratic terms (write $x_1x_2=x_1^1x_2^1$ to see that the total power is 2), that's why it is called a quadratic form and not a quadratic function.

Definition 2. We say that $A$ is positive if $Q_A(x)>0$ for all nonzero $x\in R^n$ and non-negative if $Q_A(x)\geq 0$ for all $x\in R^n.$ (Most sources say positive definite instead of just positive and non-negative definite instead of just non-negative. I prefer a shorter terminology. If you don't understand why in the definition of positivity we require nonzero $x,$ go back to Exercise 1). As with numbers, for two symmetric matrices $A,B$ of the same size, we write $A>B$ or $A\geq B$ if $A-B$ is positive or non-negative, respectively. Continuing this idea, we can say that $A$ is negative if $-A$ is positive.

More on motivation. A legitimate definition of order $A\geq B$ would obtain if we compared the two matrices element-wise. Definition 2 is motivated by the fact that quadratic forms arise in the multivariate Taylor decomposition.

Sylvester's criterion is the only practical tool for determining positivity or non-negativity. However, in one case this is simple.

Exercise 2. Show that $A^TA$ is symmetric and non-negative.

Solution. The symmetry is straightforward and has been shown before. Non-negativity is not difficult either: $Q_{A^TA}(x)=x^TA^TAx=(Ax)^TAx=\|Ax\|^2\ge 0.$

### Geometry

The graph of a quadratic form in good cases is an elliptic paraboloid and has various other names in worse cases. Geometrically, the definition of the inequality $A\geq B$ means that the graph of $A$ is everywhere above the graph of $B$ (at the origin they always coincide). In particular, $A\geq 0$ means that the graph of $A$ is everywhere above the horizontal plane.

Examples. All examples are matrices of size $2\times 2$.

Figure 1. Quadratic form of identity (elliptic paraboloid)

1) The identity matrix $I$ is positive because $Q_I(x)=x_1^2+x_2^2$, see Figure 1.

Figure 2. Parabolic cylinder

2) The matrix $A=\left(\begin{array}{cc}1&0\\0&0\end{array}\right)$ is non-negative. Its quadratic form $Q_A(x)=x_1^2$ grows when $|x_1|$ grows and stays flat when $x_2$ changes and $x_1$ is fixed, see Figure 2.

Figure 3. Hyperbolic paraboloid

3) The matrix $B=\left(\begin{array}{cc}1&0\\0&-1\end{array}\right)$ is not positive or non-negative or negative or non-positive. Its quadratic form $Q_B(x)$$=x_1^2-x_2^2$ is a parabola with branches looking upward when the second argument is fixed and a parabola with branches looking downward when the first argument is fixed, see Figure 3. When a surface behaves like that around some point, that point is called a saddle point.

7
Aug 18

## Summary and questions for repetition

I was planning to cover the Moore-Penrose inverse which allows one to solve the equation $Ax=y$ for any $A$ (not necessarily square). Now my feeling is that it would be too much for a standard linear algebra course. This is the most easily accessible sourse.

1. Give the definition and example of an orthonormal system. Prove that elements of such a system are linearly independent.

2. "To know how a matrix acts on vectors, it is enough to know how it acts on the elements of an orthonormal basis." Explain.

3. How can you reveal the elements of a matrix $A$ from $(Ax)\cdot y$?

4. A linear mapping from one Euclidean space to another generates a matrix. Prove.

5. Prove that an inverse of a linear mapping is linear.

6. What is a linear mapping in the one-dimensional case?

7. In case of a square matrix, what are the four equivalent conditions for the equation $Ax=y$ to be good (uniquely solvable for all $y$)?

8. Give two equivalent definitions of linear independence.

9. List the simple facts about linear dependence that students need to learn first.

10. Prove the criterion of linear independence.

11. Let the vectors $x^{(1)},...,x^{(k)}$ be linearly dependent and consider the regression model $y=\beta_1x^{(1)}+...+\beta_kx^{(k)}+u.$ Show that here the coefficients $\beta_1,...,\beta_k$ cannot be uniquely determined (this is a purely algebraic fact, you don't need to know anything about multiple regression).

12. Define a basis. Prove that if $x^{(1)},...,x^{(k)}$ is a basis and $x$ is decomposed as $x=a_1x^{(1)}+...+a_kx^{(k)},$ then the coefficients $a_1,...,a_k$ are unique. Prove further that they are linear functions of $x.$

13. Prove that the terms in the orthogonal sum of two subspaces have intersection zero.

15. Prove that a matrix and its transpose have the same rank.

16. Prove the rank-nullity theorem.

17. Prove the upper bound on the matrix rank in terms of the matrix dimensions.

18. Vectors are linearly independent if and only if one of them can be expressed as a linear combination of the others.

19. What can be said about linear (in)dependence if some vectors are added to or removed from the system of vectors?

20. Prove that if the number of vectors in a system is larger than the space dimension, then such a system is linearly dependent.

21. Give a list of all properties of rank that you've learned so far.

22. Prove the inequality $\text{rank}(AB)\le \min\{\text{rank}(A),\text{rank}(B)\}$ and use it to give an alternative proof of the rank-nullity theorem.

23. Prove that the definition of the space dimension does not depend on the basis used.

2
Aug 18

## Rank of a matrix and the rank-nullity theorem

If you find proofs shorter than mine, shoot me an email.

### Rank and its properties

Definition 1. The dimension of the image $\text{Img}(A)$ is called a rank of the matrix $A$ and denoted $\text{rank}(A)$.

Exercise 1. For any matrix $A$

(1) $\text{rank}(A)=\text{rank}(A^{T}).$

Proof. Suppose $A$ is of size $n\times k$. We know that

(2) $R^k=N(A)\oplus\text{Img}(A^T).$

By (2) any $x\in R^k$ can be represented as

(3) $x=y+z,$ with $y\in N(A),$ $z\in\text{Img}(A^T)$ and $y\perp z.$

From (3) $Ax=Ay+Az=Az.$ By Definition 1, $\text{Img}(A^T)$ is spanned by some linearly independent vectors $z^{(1)},...,z^{(m)}$ with $m=\dim\text{Img}(A^T).$ Hence $z=\sum_{i=1}^ma_iz^{(i)}$ with some $a_i$ depending on $z$ and $Ax=\sum_{i=1}^ma_iAz^{(i)}.$ This shows that $\text{Img}(A)$ is spanned by $Az^{(1)},...,Az^{(m)}.$ If we prove linear independence of these vectors, (1) will follow.

Suppose $\sum_{i=1}^mb_iAz^{(i)}=0$ with some $b_{i}.$ Then $A\left(\sum_{i=1}^mb_iz^{(i)}\right)=0.$ This tells us that $\sum_{i=1}^mb_iz^{(i)}\in N(A).$ Since the vector $\sum_{i=1}^mb_iz^{(i)}$ also belongs to $\text{Img}(A^T),$ by Exercise 3 it is zero. By linear independence, $\sum_{i=1}^mb_iz^{(i)}=0$ is possible only when $b_1=...=b_m=0.$ This proves the linear independence of $Az^{(1)},...,Az^{(m)}$ and the statement.

Exercise 2 (rank-nullity theorem) If $A$ is $n\times k,$ then $\dim N(A)+\dim \text{Img}(A)=k.$

Proof. Exercise 4 and equation (2) imply $k=\dim N(A)+\dim\text{Img}(A^T).$ But we know from Exercise 1 that $\dim\text{Img}(A^T)=\dim\text{Img}(A).$

Exercise 3. Let $A$ be of size $n\times k.$ Then $\text{rank}(A)\leq\min\{n,k\}.$

Proof. From Exercise 2 we see that $\text{rank}(A)=\dim\text{Img}(A)\leq k.$ Applying this inequality to $A^T$ we get $\text{rank}(A)=\text{rank}(A^T)=\dim\text{Img}(A^T)\leq n.$

1
Aug 18

## Basis and dimension

### Basis and dimension

Definition 1. We say that vectors $x^{(1)},...,x^{(m)}$ form a basis in a subspace $L$ if 1) it is spanned by $x^{(1)},...,x^{(m)}$ and 2) these vectors are linearly independent. The number of vectors in the basis is called a dimension of $L$ and the notation is $\dim L.$

An orthogonal basis is a special type of a basis, when in addition to the above conditions 1)-2) the basis vectors are orthonormal. For the dimension definition to be correct, the number of vectors in any basis should be the same. We prove correctness in a separate post.

Exercise 1. In $R^n$ the unit vectors are linearly independent. Prove this fact 1) directly and 2) using the properties of an orthonormal system.

Direct proof. Any $x\in R^n$ can be represented as

(1) $x=x_1e_1+...+x_ne_n.$

If the right side is zero, then $x=0$ and all $x_i$ are zero.

Proof using orthonormality.  If the right side in (1) is zero, then $x_i=x\cdot e_i=0$ for all $i.$

Exercise 2. $\dim R^{n}=n.$

Proof. (1) shows that $R^n$ is spanned by $e_1,...,e_n.$ Besides, they are linearly independent by Exercise 1.

Definition 2. Let $L_1,\ L_2$ be two subspaces such that any element of one is orthogonal to any element of the other. Then the set $\{x_1+x_2:x_1\in L_1,\ x_2\in L_2\}$ is called an orthogonal sum of $L_1,\ L_2$ and denoted $L=L_1\oplus L_2.$

Exercise 3. If a vector $x$ belongs to both terms in the orthogonal sum of two subspaces $L=L_1\oplus L_2$, then it is zero. This means that $L_1\cap L_2=\{0\}.$

Proof. This is because any element of $L_1$ is orthogonal to any element of $L_2,$ so $x$ is orthogonal to itself, $0=x\cdot x=\|x\|^2$ and $x=0.$

Exercise 4 (dimension additivity) Let $L=L_1\oplus L_2$ be an orthogonal sum of two subspaces. Then $\dim L=\dim L_1+\dim L_2.$

Proof. Let $l_i=\dim L_i,$ $i=1,2.$ By definition, $L_1$ is spanned by some linearly independent vectors $y^{(1)},...,y^{(l_{1})}$ and $L_2$ is spanned by some linearly independent vectors $z^{(1)},...,z^{(l_{2})}.$ Any $x\in L$ can be decomposed as $x=y+z,$ $y\in L_1,$ $z\in L_2.$ Since $y,z$ can be further decomposed as $y=\sum_{i=1}^{l_1}a_iy^{(i)},$ $z=\sum_{i=1}^{l_2}b_iz^{(i)},$ the system $y^{(1)},...,y^{(l_1)}, z^{(1)},...,z^{(l_2)}$ spans $L.$

Moreover, this system is linearly independent. If

$\sum_{i=1}^{l_1}a_iy^{(i)}+\sum_{i=1}^{l_2}b_iz^{(i)}=0,$

then

$L_1\ni\sum_{i=1}^{l_1}a_iy^{(i)}=-\sum_{i=1}^{l_2}b_iz^{(i)}\in L_2.$

By Exercise 3 then $\sum_{i=1}^{l_1}a_iy^{(i)}=0,$ $\sum_{i=1}^{l_2}b_iz^{(i)}=0.$ By linear independence of the vectors in the two systems all coefficients $a_i,b_i$ must be zero.

The conclusion is that $\dim L=l_1+l_2.$

30
Jul 18

## Linear dependence of vectors: definition and principal result

This is a topic most students have trouble with. It's because there is a lot of logic involved, so skimming it is not a good idea. We start with the most common definition.

Definition 1. Let $x^{(1)},...,x^{(k)}\in R^n$ be some vectors. They are called linearly dependent if there exist numbers $a_1,...,a_k,$ not all of which are zero, such that

(1) $a_1x^{(1)}+...+a_kx^{(k)}=0.$

The sheer length of this definition scares some people and all they remember is equation (1). Stating just (1) instead of the whole definition is like skinning an animal.

### Seizing the bull by the horns

The first matter of business is to shorten the definition and relate it to what we already know.

It's better to join the set of numbers $a_1,...,a_k$ into a vector $a=(a_1,...,a_k).$ Then the requirement that "not all of $a_1,...,a_k$ are zero" is equivalent to a single equation $a\neq 0.$ Further, let us write the vectors $x^{(1)},...,x^{(k)}$ as columns and put them into a matrix $X=\left( x^{(1)},...,x^{(k)}\right) .$ Using multiplication of partitioned matrices we see that (1) is equivalent to

(2) $Xa=0$

and therefore Definition 1 is equivalent to the following.

Definition 2. Vectors $x^{(1)},...,x^{(k)}\in R^n$ are called linearly dependent if the homogeneous equation (2) has nonzero solutions $a\in R^k,$ that is, the null space of $X$ is not $\{0\}.$

Negating Definition 2 gives the definition of linear independence. Negating Definition 2 is easier than negating Definition 1.

Definition 3. Vectors $x^{(1)},...,x^{(k)}\in R^{n}$ are called linearly independent if $N(X)=\{0\}$ (for any nonzero $a\in R^k$ we have $Xa\neq 0$ or, alternatively, $Xa=0$ only for $a=0$).

Exercise 1. For any matrix $X,$ one has

(3) $N(X)=N(X^TX).$

Proof. 1) Proving that $N(X)\subset N(X^TX).$ If $a\in N(X),$ then $Xa=0$ and $X^TXa=0,$ so $a\in N(X^TX).$ 2) Proving that $N(X^TX)\subset N(X).$ If $a\in N(X^TX),$ then $X^TXa=0$ and $0=(X^TXa)\cdot a=(Xa)\cdot(Xa)=\|Xa\|^2,$ so $a\in N(X).$

Exercise 2. $X^TX$ is a square symmetric matrix, for any $X.$

Proof. If $X$ is of size $n\times k,$ then $X^TX$ is $k\times k.$ It is symmetric: $(X^TX)^T=X^T(X^T)^T=X^TX.$ By the way, some students write $(X^TX)^{-1}=X^{-1}(X^T)^{-1}.$ You cannot do this if $X$ is not square.

Criterion of linear independence. Vectors $x^{(1)},...,x^{(k)}\in R^n$ are linearly independent if and only if $\det X^TX\neq 0.$

Proof. We are going to use (3). By Exercise 2, $A=X^TX$ is a square matrix and for a square matrix we have the equivalence $N(A)=\{0\}\Longleftrightarrow \det A\neq 0.$ Application of this result proves the statement.

Direct application of Definition 1 can be problematic. To prove that some vectors are linearly dependent, you have to produce $a_1,...,a_k$ such that Definition 1 is satisfied. This usually involves some guesswork, see exercises below. The criterion above doesn't require guessing and can be realized on the computer. The linear independence requirement is common in multiple regression analysis but not all econometricians know this criterion.

### Putting some flesh on the bones

These are simple facts you need to know in addition to the above criterion.

Exercise 3. 1) Why do we exclude the case when all $a_1,...,a_k$ are zero?

2) What happens if among $x^{(1)},...,x^{(k)}$ there are zero vectors?

3) Show that in case of two non-zero vectors Definition 1 is equivalent to just proportionality of one vector to another.

Solution. 1) If all $a_1,...,a_k$ are zero, (1) is trivially satisfied, no matter what the vectors are.

2) If one of the vectors $x^{(1)},...,x^{(k)}$ is zero, the coefficient of that vector can be set to one and all others to zero, so such vectors will be linearly dependent by Definition 1.

3) Consider two non-zero vectors $x^{(1)},x^{(2)}.$ If they are linearly dependent, then

(4) $a_1x^{(1)}+a_2x^{(2)}=0$

where at least one of $a_1,a_2$ is not zero. Suppose $a_1\neq 0.$ Then $a_2\neq 0$ because otherwise $x^{(1)}$ would be zero. Hence

(5) $x^{(1)}=-a_2/a_1x^{(2)}=cx^{(2)}$

where the proportionality coefficient $c$ is not zero. Conversely, (5) implies (4) (you have to produce the coefficients).

Exercise 4. Prove that if to the system of unit vectors we add any vector $x\in R^n$, the resulting system will be linearly dependent.

28
Jul 18

## Solvability of an equation with a square matrix

Everywhere in this section we assume that $A$ is square of size $k\times k.$ The main problem is about solvability of the equation $Ax=y$.

### Dissecting the problem

Exercise 1. If $A$ is invertible, then $N(A)=\{0\}$ and $\text{Img}(A)=R^k.$

Proof. If $Ax=0,$ then $x=A^{-1}Ax=0,$ so $N(A)=\{0\}$ (you could also say that $A$ is one-to-one). From $AA^{-1}=I$ we have $AA^{-1}x=x.$ This tells us that $x=Ay$ with $y=A^{-1}x.$ We see that any $x$ belongs to the image of $A,$ so $\text{Img}(A)=R^k.$

Exercise 2. If $N(A)=\{0\},$ then $A$ is invertible.

Proof. Since $A$ is one-to-one, we can use the inverse $f^{-1}$ of $f(x)=Ax$ in the general sense: $f^{-1}(Ax)=x$. It is defined on the image of $A.$ We know that $f^{-1}$ is given by some matrix $B$: $BAx=x$ for all $x.$ Hence, $BAe_i=e_i$ for $i=1,...,k.$ Putting these equations side by side we get

(1) $BA=BAI=I.$

(In detail: the identity matrix is partitioned as $I=(e_1,...,e_k),$ so $BAI$ $=(BAe_1,...,BAe_k)=(e_1,...,e_k)=I$). Thus, $B$ is the left inverse of $A,$ and we've seen before that for square matrices this implies existence of the right inverse, invertibility of $A$ and the equation $B=A^{-1}$.

Exercise 3. If $\text{Img}(A)=R^k,$ then $A$ is invertible.

Proof. If $\text{Img}(A)=R^k,$ then by the second characterization of matrix image $N(A^T)=(\text{Img}(A))^{\perp}=\{0\}.$ Applying Exercise 2 to $A^T,$ we see that the transpose is invertible and hence $\det A=\det A^{T}\neq 0.$ This implies invertibility of $A.$

### Collecting the pieces together

Exercise 4. The following conditions are equivalent:

a) $N(A)=\{0\},$ b) $\text{Img}(A)=R^k,$ c) $\det A\neq 0,$ d) $A$ is invertible.

Proof. The equivalence c) $\Longleftrightarrow$ d) has been established before. For the implication d) $\Longrightarrow$ a)+b) see Exercise 1. By Exercise 2 we have a) $\Longrightarrow$ d). By Exercise 3, b) $\Longrightarrow$ d.

Conclusion. The condition $\det A\neq 0$ is the easiest to check. If it holds, then the equation $Ax=y$ has a unique solution for any $y\in R^k.$

Remark 1. By negating Exercise 4, we see that the following conditions are equivalent:

$a^\prime) N(A)\neq \{0\},$ $b^\prime) \text{Img}(A)\neq R^k,$ $c^\prime) \det A=0,$ $d^\prime) A^{-1}$ does not exists.

Remark 2. By going through the proof one can check that the result of Exercise 4 holds if $R^k$ is replaced by $C^k$ (the set of vectors with complex coordinates).

26
Jul 18

## Is the inverse of a linear mapping linear?

### Orthonormal basis

Exercise 1. I) Let $e_j$ denote unit vectors. They possess properties

(1) $e_i\cdot e_j=0$ for all $i\neq j,$ $\left\Vert e_{i}\right\Vert =1$ for all $i.$

II) For any $x\in R^n$ we have the representation

(2) $x=\sum_jx_je_j.$

III) In (2) the coefficients can be found as $x_{j}=x\cdot e_{j}:$

(3) $x=\sum_j(x\cdot e_j)e_j.$

Proof. I) (1) is proved by direct calculation. II) To prove (2) we write

$x=(x_1,...,x_n)=x_1(1,0,...,0)+...+x_n(0,...,0,1)=\sum_jx_je_j.$

III) If we have (2), then it's easy to see that by (1) $x\cdot e_i=\sum_jx_j(e_j\cdot e_i)=x_i.$

Definition 1. Any system of vectors that satisfies (1) is called an orthonormal system. An orthonormal system is called complete if for any $x\in R^n$ we have the decomposition (3). Exercise 1 shows that our system of unit vectors is complete orthonormal. A complete orthonormal system is called an orthonormal basis.

### Analyzing a linear mapping

Exercise 2. Let $A$ be a matrix of size $n\times k.$ Suppose you don't know the elements of $A$ but you know the products $(Ax)\cdot y$ for all $x,y.$ How can you reveal the elements of $A$ from $(Ax)\cdot y?$ How do you express $Ax$ using the elements you define?

Solution. Let us partition $A$ into rows and suppose the elements $a_{ij}$ are known. Let us try unit vectors as $x,y:$

(4) $(Ae_j)\cdot e_i=\left(\begin{array}{c}A_1e_j\\...\\A_ne_j\end{array}\right)\cdot e_i=A_ie_j=a_{ij}.$

Using (2) and (4) one can check that $(Ax)\cdot e_i=A_ix=\sum_ja_{ij}x_j.$ Hence, from (3) we have the answer to the second question:

(5) $Ax=\sum_i[\left(Ax\right)\cdot e_i]e_i=\sum_i\left(\sum_ja_{ij}x_j\right)e_i=\sum_{i,j}x_ja_{ij}e_i.$

The above calculation means that when $a_{ij}$ are unknown, we can define them by $a_{ij}=(Ae_j)\cdot e_i$ and then the action of $A$ on $x$ will be described by the last expression in (5).

We know that a mapping generated by a matrix is linear. The converse is also true: a linear mapping is given by a matrix:

Exercise 3. Suppose a mapping $f:R^k\rightarrow R^n$ is linear: $f(ax+by)=af(x)+bf(y)$ for any numbers $a,b$ and vectors $x,y.$ Then there exists a matrix $A$ of size $n\times k$ such that $f(x)=Ax$ for all $x.$

Proof. Based on (4) in our case put $a_{ij}=f(e_j)\cdot e_i$. Applying (3) to $f(x)$ we get

(6) $f(x)=\sum_i[f(x)\cdot e_i]e_i$

(the summation index $j$ is replaced by $i$ on purpose). Plugging (2) in (6)

$f(x)=\sum_i\left[f\left(\sum_jx_je_j\right)\cdot e_i\right]e_i=$ (both $f$ and scalar product are linear)

$=\sum_i\left[\left(\sum_jx_jf\left(e_j\right)\right)\cdot e_i\right]e_i=\sum_{i,j}x_j(f\left(e_j\right)\cdot e_i)e_i$$=\sum_{i,j}x_ja_{ij}e_i=Ax.$

The last equation is the definition of $A$.

Exercise 4. An inverse of a linear mapping is linear (and given by a matrix by Exercise 3).

Proof. Let $f(x)=Ax$ be a linear mapping and suppose its inverse $f^{-1}$ in the general sense exists. Then $f^{-1}(Ax)=x$ for all $x.$ Let us put $x=ay+bz$ for arbitrary numbers $a,b$ and vectors $y,z.$ Then we have $f^{-1}(A(ay+bz))=ay+bz$ or, using linearity of $A,$

$f^{-1}(aAy+bAz)=ay+bz.$

Putting $Ay=u,$ $Az=v$ we get

$f^{-1}(au+bv)=af^{-1}(u)+bf^{-1}(v).$

Thus, $f^{-1}$ is linear.

23
Jul 18

## Geometry of linear equations: questions for repetition

This section evolves around the concepts of linearity, linear subspaces and orthogonality. As usual, you are expected to produce at least those proofs that I give.

1. Prove linearity of the mapping generated by a matrix. This fundamental fact will have many implications. Do you think any linear mapping is generated by some matrix? Do you think a mapping inverse to a linear mapping should be linear?

2. What is the first characterization of a matrix image? In other sources it is called a column space. Later on it will be linked to the notions of linear independence and rank.

3. How would you write $Ax$ ($A$ is a matrix and $x$ is a vector), if $A$ is partitioned a) into rows and b) into columns?

4. Show that a span of some vectors is a linear subspace. Note that one- and two-dimensional examples of linear subspaces we considered are special cases of subspaces described by linear systems of equations.

5. Can spans of two different systems of vectors coincide? Give an example.

6. Show that the image and null space of a matrix are linear subspaces. This facts indicate that the subspace notion is an adequate tool for problems at hand.

7. What is the description of the set of solutions of $Ax=y$? It is pretty general. For example, the structure of solutions of an ordinary linear differential equation is the same.

8. Illustrate geometrically the theorem on second orthocomplement and explain why it holds.

9. What is the relationship between $(Ax)\cdot y$ and $x\cdot (A^Ty)$?

10. Derive the second characterization of matrix image.

11. What are the geometric conditions for the solvability and uniqueness of solutions of an inhomogeneous equation?

23
Jul 18

## Geometry of linear equations: orthogonal complement and equation solvability

### From orthogonality of vectors to orthogonality of subspaces

Definition 1. Let $L$ be a linear subspace of $R^{n}.$ Its orthogonal complement $L^{\perp }$ is defined as the set of all vectors orthogonal to all elements of $L:$ $L^{\perp }=\{ x\in R^n:x\cdot y=0$ $\text{ for all}\ y\in L\}.$

Exercise 1. Let $L=\{ (x_1,0):x_1\in R\}$ be the $x$ axis on the plane. Find $L^{\perp}.$

Solution. The condition $x\cdot y=x_1y_1=0$ for all $x=(x_1,x_2)\in L$ implies $y_1=0.$ The component $y_2$ is free. Thus, $L^{\perp}=\{ (0,y_2):y_2\in R\}$ is the $y$ axis.

Similarly, the orthogonal complement $(L^{\perp})^{\perp}$ of the $y$ axis is the $x$ axis. This fact is generalized as follows:

Theorem (on second orthocomplement) For any linear subspace $L,$ one has $(L^{\perp})^{\perp}=L$.

This statement is geometrically simple but the proof is rather complex and will be omitted (see Akhiezer & Glazman, Theory of Linear Operators in Hilbert Space, Dover Publications, 1993, Chapter 1, sections 6 and 7). It's good to keep in mind what makes it possible. For any set $A\subset R^n,$ its orthogonal complement $A^{\perp}$ can be defined in the same way: $A^{\perp}=\{ x\in R^n:x\cdot y=0$ $\text{ for all }y\in A\}.$ As in Exercise 1, you can show that if $A=\{(1,0)\}$ contains just the unit vector of the $x$ axis, then $A^{\perp}$ is the $y$ axis and therefore $(A^{\perp})^{\perp}$ is the $x$ axis. Thus, in this example we have a strict inclusion $A\subset (A^{\perp})^{\perp}.$ The equality is achieved only for linear subspaces.

Exercise 2. In $R^3,$ consider the $x$ axis $L=\{(x_1,0,0):x_1\in R\}.$ Can you tell what $L^{\perp}$ is?

Exercise 3. $\{ 0\}^{\perp}=R^n$ and $(R^{n})^{\perp}=\{ 0\} .$

### Link between image of a matrix and null space of its transpose

Exercise 4. The rule for a transpose of a product $(AB)^T=B^TA^T$ implies that for any $x\in R^k,$ $y\in R^n$

(1) $(Ax)\cdot y=(Ax)^Ty=x^TA^Ty=x\cdot (A^Ty).$

Exercise 5 (second characterization of matrix image) $\text{Img}(A)=N(A^T)^{\perp}$ (and, consequently, $\text{Img}(A^T)=N(A)^{\perp}$).

Proof. Let us prove that

(2) $\text{Img}(A)^{\perp}=N(A^T)$.

To this end, we first prove the inclusion $\text{Img}(A)^{\perp}\subset N(A^T).$ Let $y\in \text{Img}(A)^{\perp}$. For an arbitrary $x,$ we have $Ax\in \text{Img}(A).$ Hence by (1) $0=(Ax)\cdot y=x\cdot(A^Ty).$ Since $x$ is arbitrary, we can put $x=A^Ty$ and obtain $\|A^Ty\|^2=0.$ This implies $A^Ty=0$ and $y\in N(A^T).$

Conversely, to prove $N(A^T)\subset\text{Img}(A)^{\perp},$ suppose $y\in N(A^T).$ Then $A^Ty=0$ and for an arbitrary $x$ we have $0=x\cdot(A^Ty)=(Ax)\cdot y.$ Since $Ax$ runs over $\text{Img}(A),$ this means that $y\in\text{Img}(A)^{\perp}.$

Passing to orthogonal complements in (2), by the theorem on second orthocomplement we get what we need.

Exercise 6. Assume that $A$ is of size $n\times k$. Exercise 5 and Exercise 3 imply $R^n=\text{Img}(A)\oplus N(A^T)$ and $R^k=\text{Img}(A^T)\oplus N(A)$.

The importance of Exercise 5 is explained by the fact that the null space of a matrix is easier to describe analytically than the image. For the next summary, you might want to review the conclusion on the role of the null space.

Summary. 1) In order to see if $Ax=y$ has solutions, check if $y$ is orthogonal to $N(A^T).$ In particular, if $A^T$ is one-to-one, then $N(A^T)=\{0\}$ and $Ax=y$ has solutions for all $y$ (see Exercise 3 above).

2) If $A$ is one-to-one, then $Ax=y$ may have only unique solutions.

3) If both $A$ and $A^T$ are one-to-one, then $Ax=y$ has a unique solution for any $y.$

19
Jul 18

## Geometry of linear equations: structure of image and null space

Definition 1. The subspace notion allows us to describe the algebraic structure of the set of solutions of $Ax=y.$ The special case $Ax=0$ is called a homogeneous equation. Obviously, $x=0$ satisfies it but there may be other solutions. The equation $Ax=y$ is called an inhomogeneous equation. We address the questions of existence and uniqueness of its solutions.

### Structure of the image of $A$$A$

Recall Basic observation 1: The image $\text{Img}(A)=\{ Ax:x\in R^k\}$ is the set of $y=Ax$ for which the inhomogeneous equation has solutions.

Exercise 1. $\text{Img}(A)$ is a linear subspace in $R^n.$

Proof. This follows from the first characterization of the matrix image. Here is a direct proof. Suppose $y_1,y_2\in \text{Img}(A).$ Then there exist $x_1,x_2\in R^k$ such that $Ax_1=y_1,$ $Ax_2=y_2.$ By linearity this gives $ay_1+by_2=aAx_1+bAx_2=A(ax_1+bx_2).$ Thus for $y=ay_1+by_2$ we have found $x=ax_1+bx_2$ such that $y=Ax,$ which means $ay_1+by_2\in \text{Img}(A)$ for any $a,b$ and $\text{Img}(A)$ is a subspace.

### Structure of the null space of $A$$A$

Definition 2. The set of solutions $x$ of the homogeneous equation $Ax=0$ is called the null space of $A.$ It is denoted $N(A)=\{ x:Ax=0\}.$

Exercise 2. The null space of $A$ is a linear subspace of $D(A).$

Proof. Suppose $x_1,x_2\in N(A)$ so that $Ax_1=0,$ $Ax_2=0.$ Then by linearity $A(ax_1+bx_2)=aAx_1+bAx_2=0,$ so $ax_1+bx_2\in N(A)$ for any $a,b$ and $N(A)$ is a linear subspace.

### Description of the set of solutions of $Ax=y$$Ax=y$

Intuition. In $R^{3},$ straight lines and planes that don't contain the origin can be obtained by shifting straight lines and planes that do (geometry should dominate the algebra at this point, see Figure 1). This is generalized in the next definition.

Figure 1. Shifting a subspace gives a hyperplane

Definition 3. Let $x\in R^n$ be any vector and let $L$ be a subspace. $x+L=\{x+l:l\in L\}$ denotes a shift of $L$ by $x$ and it is obtained by adding to $x$ all elements of $L.$ Some people call $x+L$hyperplane.

Exercise 3. As we know, the equation $Ax=y$ has solutions if and only if $y\in \text{Img}(A).$ Let us fix $y\in \text{Img}(A)$ and let $x_{0}$ be some solution of $Ax=y$. Then the set of all solutions of this equation is $x_0+N(A)$ (in detail: any other solution $x$ of that equation can be obtained by adding an element $z$ of the null space to $x_0$: $x=x_0+z$). This is written as $\{ x:Ax=y\}=x_0+N(A).$

Proof. Let $Ax_0=y$ and let $x$ be any solution of $Ax=y.$ Then $A(x-x_0)=0,$ $x-x_0\in N(A)$ and $x-x_{0}=z$ for some $z\in N(A).$ We obtain $x=x_0+z$ which proves the inclusion $\{ x:Ax=y\}\subset x_0+N(A).$ Conversely, if $x=x_0+z$ with $z\in N(A)$, then $Ax=Ax_0+Az=y$ and we obtain $x_0+N(A)\subset \{ x:Ax=y\}.$

Conclusion. If $N(A)=\{0\}$, then $A$ is one-to-one and we have uniqueness of solutions of the inhomogeneous equation; otherwise, $N(A)$ can serve as a measure of non-uniqueness.