Definition. 1) Adding some row multiplied by a constant to another row or 2) adding some column multiplied by a constant to another column or 3) permuting rows or columns is called an elementary operation. Accordingly, matrices that realize them are called elementary matrices.

The law is to feel by touch every little fact. The order is discussed below.

Why do complex numbers come right before this topic?

The analog of a conjugate number is the transpose How can you tell this? Using Exercise 2 we see that This looks similar to the identity (see Exercise 4). Therefore the mapping is similar to

Once you know this, it's easy to come up with a couple of ideas.

Idea 1. From the characterization of real numbers (see Equation (4)) we see that matrices that satisfy (symmetric matrices) correspond to real numbers and should be better in some way than asymmetric ones.

Idea 2. From Equation (3) we see that the matrix should be symmetric and non-negative.

What is a non-negative matrix?

The set of real numbers is ordered in the sense that for any two real numbers we can say that either or is true. The most important property that we used in my class is this: if and then (the sign of an inequality is preserved if the inequality is multiplied by a positive number). Since any two numbers can be compared like that, it is a complete order.

One way in which symmetric matrices are better than more general ones is that for symmetric matrices one can define order. The limitation caused by dimensionality is that this order is not complete (some symmetric matrices are not comparable).

Exercise 1. For the matrix and vector find the expression What is the value of this expression at

Solution. and

Definition 1. The function is called a quadratic form of the matrix Here is symmetric of size and

Discussion. 1) The facts that is in the subscript and the argument is mean that is fixed and is changing.

2) While the argument is a vector, the value of this function is a real number: acts from to

3) does not contain constant or linear terms (of type and It contains only quadratic terms (write to see that the total power is 2), that's why it is called a quadratic form and not a quadratic function.

Definition 2. We say that is positive if for all nonzero and non-negative if for all (Most sources say positive definite instead of just positive and non-negative definite instead of just non-negative. I prefer a shorter terminology. If you don't understand why in the definition of positivity we require nonzero go back to Exercise 1). As with numbers, for two symmetric matrices of the same size, we write or if is positive or non-negative, respectively. Continuing this idea, we can say that is negative if is positive.

More on motivation. A legitimate definition of order would obtain if we compared the two matrices element-wise. Definition 2 is motivated by the fact that quadratic forms arise in the multivariate Taylor decomposition.

Sylvester's criterion is the only practical tool for determining positivity or non-negativity. However, in one case this is simple.

Exercise 2. Show that is symmetric and non-negative.

Solution. The symmetry is straightforward and has been shown before. Non-negativity is not difficult either:

Geometry

The graph of a quadratic form in good cases is an elliptic paraboloid and has various other names in worse cases. Geometrically, the definition of the inequality means that the graph of is everywhere above the graph of (at the origin they always coincide). In particular, means that the graph of is everywhere above the horizontal plane.

Examples. All examples are matrices of size .

Figure 1. Quadratic form of identity (elliptic paraboloid)

1) The identity matrix is positive because , see Figure 1.

Figure 2. Parabolic cylinder

2) The matrix is non-negative. Its quadratic form grows when grows and stays flat when changes and is fixed, see Figure 2.

Figure 3. Hyperbolic paraboloid

3) The matrix is not positive or non-negative or negative or non-positive. Its quadratic form is a parabola with branches looking upward when the second argument is fixed and a parabola with branches looking downward when the first argument is fixed, see Figure 3. When a surface behaves like that around some point, that point is called a saddle point.

I was planning to cover the Moore-Penrose inverse which allows one to solve the equation for any (not necessarily square). Now my feeling is that it would be too much for a standard linear algebra course. This is the most easily accessible sourse.

Give the definition and example of an orthonormal system. Prove that elements of such a system are linearly independent.

"To know how a matrix acts on vectors, it is enough to know how it acts on the elements of an orthonormal basis." Explain.

How can you reveal the elements of a matrix from ?

A linear mapping from one Euclidean space to another generates a matrix. Prove.

Let the vectors be linearly dependent and consider the regression model Show that here the coefficients cannot be uniquely determined (this is a purely algebraic fact, you don't need to know anything about multiple regression).

Define a basis. Prove that if is a basis and is decomposed as then the coefficients are unique. Prove further that they are linear functions of

From (3) By Definition 1, is spanned by some linearly independent vectors with Hence with some depending on and This shows that is spanned by If we prove linear independence of these vectors, (1) will follow.

Suppose with some Then This tells us that Since the vector also belongs to by Exercise 3 it is zero. By linear independence, is possible only when This proves the linear independence of and the statement.

Exercise 2 (rank-nullity theorem) If is then

Proof. Exercise 4 and equation (2) imply But we know from Exercise 1 that

Exercise 3. Let be of size Then

Proof. From Exercise 2 we see that Applying this inequality to we get

Definition 1. We say that vectors form a basis in a subspace if 1) it is spanned by and 2) these vectors are linearly independent. The number of vectors in the basis is called a dimension of and the notation is

An orthogonal basis is a special type of a basis, when in addition to the above conditions 1)-2) the basis vectors are orthonormal. For the dimension definition to be correct, the number of vectors in any basis should be the same. We prove correctness in a separate post.

Exercise 1. In the unit vectors are linearly independent. Prove this fact 1) directly and 2) using the properties of an orthonormal system.

Direct proof. Any can be represented as

(1)

If the right side is zero, then and all are zero.

Proof using orthonormality. If the right side in (1) is zero, then for all

Exercise 2.

Proof. (1) shows that is spanned by Besides, they are linearly independent by Exercise 1.

Definition 2. Let be two subspaces such that any element of one is orthogonal to any element of the other. Then the set is called an orthogonal sum of and denoted

Exercise 3. If a vector belongs to both terms in the orthogonal sum of two subspaces , then it is zero. This means that

Proof. This is because any element of is orthogonal to any element of so is orthogonal to itself, and

Exercise 4 (dimension additivity) Let be an orthogonal sum of two subspaces. Then

Proof. Let By definition, is spanned by some linearly independent vectors and is spanned by some linearly independent vectors Any can be decomposed as Since can be further decomposed as the system spans

Moreover, this system is linearly independent. If

then

By Exercise 3 then By linear independence of the vectors in the two systems all coefficients must be zero.

Linear dependence of vectors: definition and principal result

This is a topic most students have trouble with. It's because there is a lot of logic involved, so skimming it is not a good idea. We start with the most common definition.

Definition 1. Let be some vectors. They are called linearly dependent if there exist numbers not all of which are zero, such that

(1)

The sheer length of this definition scares some people and all they remember is equation (1). Stating just (1) instead of the whole definition is like skinning an animal.

Seizing the bull by the horns

The first matter of business is to shorten the definition and relate it to what we already know.

It's better to join the set of numbers into a vector Then the requirement that "not all of are zero" is equivalent to a single equation Further, let us write the vectors as columns and put them into a matrix Using multiplication of partitioned matrices we see that (1) is equivalent to

(2)

and therefore Definition 1 is equivalent to the following.

Definition 2. Vectors are called linearly dependent if the homogeneous equation (2) has nonzero solutions that is, the null space of is not

Negating Definition 2 gives the definition of linear independence. Negating Definition 2 is easier than negating Definition 1.

Definition 3. Vectors are called linearly independent if (for any nonzero we have or, alternatively, only for ).

Exercise 1. For any matrix one has

(3)

Proof. 1) Proving that If then and so 2) Proving that If then and so

Exercise 2. is a square symmetric matrix, for any

Proof. If is of size then is It is symmetric: By the way, some students write You cannot do this if is not square.

Criterion of linear independence. Vectors are linearly independent if and only if

Proof. We are going to use (3). By Exercise 2, is a square matrix and for a square matrix we have the equivalence Application of this result proves the statement.

Direct application of Definition 1 can be problematic. To prove that some vectors are linearly dependent, you have to produce such that Definition 1 is satisfied. This usually involves some guesswork, see exercises below. The criterion above doesn't require guessing and can be realized on the computer. The linear independence requirement is common in multiple regression analysis but not all econometricians know this criterion.

Putting some flesh on the bones

These are simple facts you need to know in addition to the above criterion.

Exercise 3. 1) Why do we exclude the case when all are zero?

2) What happens if among there are zero vectors?

3) Show that in case of two non-zero vectors Definition 1 is equivalent to just proportionality of one vector to another.

Solution. 1) If all are zero, (1) is trivially satisfied, no matter what the vectors are.

2) If one of the vectors is zero, the coefficient of that vector can be set to one and all others to zero, so such vectors will be linearly dependent by Definition 1.

3) Consider two non-zero vectors If they are linearly dependent, then

(4)

where at least one of is not zero. Suppose Then because otherwise would be zero. Hence

(5)

where the proportionality coefficient is not zero. Conversely, (5) implies (4) (you have to produce the coefficients).

Exercise 4. Prove that if to the system of unit vectors we add any vector , the resulting system will be linearly dependent.

Everywhere in this section we assume that is square of size The main problem is about solvability of the equation .

Dissecting the problem

Exercise 1. If is invertible, then and

Proof. If then so (you could also say that is one-to-one). From we have This tells us that with We see that any belongs to the image of so

Exercise 2. If then is invertible.

Proof. Since is one-to-one, we can use the inverse of in the general sense: . It is defined on the image of We know that is given by some matrix : for all Hence, for Putting these equations side by side we get

(1)

(In detail: the identity matrix is partitioned as so ). Thus, is the left inverse of and we've seen before that for square matrices this implies existence of the right inverse, invertibility of and the equation .

Exercise 4. The following conditions are equivalent:

a) b) c) d) is invertible.

Proof. The equivalence c) d) has been established before. For the implication d) a)+b) see Exercise 1. By Exercise 2 we have a) d). By Exercise 3, b) d.

Conclusion. The condition is the easiest to check. If it holds, then the equation has a unique solution for any

Remark 1. By negating Exercise 4, we see that the following conditions are equivalent:

does not exists.

Remark 2. By going through the proof one can check that the result of Exercise 4 holds if is replaced by (the set of vectors with complex coordinates).

Exercise 1. I) Let denote unit vectors. They possess properties

(1) for all for all

II) For any we have the representation

(2)

III) In (2) the coefficients can be found as

(3)

Proof. I) (1) is proved by direct calculation. II) To prove (2) we write

III) If we have (2), then it's easy to see that by (1)

Definition 1. Any system of vectors that satisfies (1) is called an orthonormal system. An orthonormal system is called complete if for any we have the decomposition (3). Exercise 1 shows that our system of unit vectors is complete orthonormal. A complete orthonormal system is called an orthonormal basis.

Analyzing a linear mapping

Exercise 2. Let be a matrix of size Suppose you don't know the elements of but you know the products for all How can you reveal the elements of from How do you express using the elements you define?

Solution. Let us partition into rows and suppose the elements are known. Let us try unit vectors as

(4)

Using (2) and (4) one can check that Hence, from (3) we have the answer to the second question:

(5)

The above calculation means that when are unknown, we can define them by and then the action of on will be described by the last expression in (5).

Exercise 3. Suppose a mapping is linear: for any numbers and vectors Then there exists a matrix of size such that for all

Proof. Based on (4) in our case put . Applying (3) to we get

(6)

(the summation index is replaced by on purpose). Plugging (2) in (6)

(both and scalar product are linear)

The last equation is the definition of .

Exercise 4. An inverse of a linear mapping is linear (and given by a matrix by Exercise 3).

Proof. Let be a linear mapping and suppose its inverse in the general sense exists. Then for all Let us put for arbitrary numbers and vectors Then we have or, using linearity of

Putting we get

Thus, is linear.

Remark. In all of the above it is important that are unit vectors. For a different basis, the results drastically change.

Geometry of linear equations: questions for repetition

This section evolves around the concepts of linearity, linear subspaces and orthogonality. As usual, you are expected to produce at least those proofs that I give.

Prove linearity of the mapping generated by a matrix. This fundamental fact will have many implications. Do you think any linear mapping is generated by some matrix? Do you think a mapping inverse to a linear mapping should be linear?

How would you write ( is a matrix and is a vector), if is partitioned a) into rows and b) into columns?

Show that a span of some vectors is a linear subspace. Note that one- and two-dimensional examples of linear subspaces we considered are special cases of subspaces described by linear systems of equations.

Can spans of two different systems of vectors coincide? Give an example.

What is the description of the set of solutions of ? It is pretty general. For example, the structure of solutions of an ordinary linear differential equation is the same.

Geometry of linear equations: orthogonal complement and equation solvability

From orthogonality of vectors to orthogonality of subspaces

Definition 1. Let be a linear subspace of Its orthogonal complement is defined as the set of all vectors orthogonal to all elements of

Exercise 1. Let be the axis on the plane. Find

Solution. The condition for all implies The component is free. Thus, is the axis.

Similarly, the orthogonal complement of the axis is the axis. This fact is generalized as follows:

Theorem (on second orthocomplement) For any linear subspace one has .

This statement is geometrically simple but the proof is rather complex and will be omitted (see Akhiezer & Glazman, Theory of Linear Operators in Hilbert Space, Dover Publications, 1993, Chapter 1, sections 6 and 7). It's good to keep in mind what makes it possible. For any set its orthogonal complement can be defined in the same way: As in Exercise 1, you can show that if contains just the unit vector of the axis, then is the axis and therefore is the axis. Thus, in this example we have a strict inclusion The equality is achieved only for linear subspaces.

Exercise 2. In consider the axis Can you tell what is?

Exercise 3. and

Link between image of a matrix and null space of its transpose

Exercise 5 (second characterization of matrix image) (and, consequently, ).

Proof. Let us prove that

(2) .

To this end, we first prove the inclusion Let . For an arbitrary we have Hence by (1) Since is arbitrary, we can put and obtain This implies and

Conversely, to prove suppose Then and for an arbitrary we have Since runs over this means that

Passing to orthogonal complements in (2), by the theorem on second orthocomplement we get what we need.

Definition 2. Let be two subspaces. We say that is their orthogonal sum if 1) every element can be decomposed as with and 2) every element of is orthogonal to every element of Orthogonality of to implies which, in turn, guarantees uniqueness of the representation

Exercise 6. Assume that is of size . Exercise 5 and Exercise 3 imply and .

The importance of Exercise 5 is explained by the fact that the null space of a matrix is easier to describe analytically than the image. For the next summary, you might want to review the conclusion on the role of the null space.

Summary. 1) In order to see if has solutions, check if is orthogonal to In particular, if is one-to-one, then and has solutions for all (see Exercise 3 above).

2) If is one-to-one, then may have only unique solutions.

3) If both and are one-to-one, then has a unique solution for any