Geometry of linear equations: structure of image and null space

Definition 1. The subspace notion allows us to describe the algebraic structure of the set of solutions of The special case is called a homogeneous equation. Obviously, satisfies it but there may be other solutions. The equation is called an inhomogeneous equation. We address the questions of existence and uniqueness of its solutions.

Structure of the image of

Recall Basic observation 1: The image is the set of for which the inhomogeneous equation has solutions.

Exercise 1. is a linear subspace in

Proof. This follows from the first characterization of the matrix image. Here is a direct proof. Suppose Then there exist such that By linearity this gives Thus for we have found such that which means for any and is a subspace.

Structure of the null space of

Definition 2. The set of solutions of the homogeneous equation is called the null space of It is denoted

Exercise 2. The null space of is a linear subspace of

Proof. Suppose so that Then by linearity so for any and is a linear subspace.

Description of the set of solutions of

Intuition. In straight lines and planes that don't contain the origin can be obtained by shifting straight lines and planes that do (geometry should dominate the algebra at this point, see Figure 1). This is generalized in the next definition.

Figure 1. Shifting a subspace gives a hyperplane

Definition 3. Let be any vector and let be a subspace. denotes a shift of by and it is obtained by adding to all elements of Some people call a hyperplane.

Exercise 3. As we know, the equation has solutions if and only if Let us fix and let be some solution of . Then the set of all solutions of this equation is (in detail: any other solution of that equation can be obtained by adding an element of the null space to : ). This is written as

Proof. Let and let be any solution of Then and for some We obtain which proves the statement.

Conclusion. If , then is one-to-one and we have uniqueness of solutions of the inhomogeneous equation; otherwise, can serve as a measure of non-uniqueness.

Geometry of linear equations: linear spaces and subspaces

is a linear space in the following sense: for any and any the linear combination belongs to From earlier exercises we know that and are straight lines drawn through the vectors Therefore is a plane drawn through those vectors. We can say that is a linear space in the following sense: for any , the whole plane drawn through is contained in

Subsets of may have this property.

Example 1. On the plane, take any straight line passing through the origin (the slope doesn't matter). The equation of such a line is

(1)

where at least one of the coefficients is different from zero. If we can solve the equation for and get a more familiar form If we get a vertical line If two points satisfy (1), then you can check that their linear combination also satisfies (1). A straight line that does not pass through the origin is described by with

Example 2. In , take any straight line or any 2D plane passing through the origin. A straight line is described by a system of two equations and a plane is described by one equation You can do the algebra as above to show that linear combinations of elements of these sets belong to these sets. However, I suggest using the geometric interpretation of linear operations to show that these are examples of subspaces. A straight line or a plane that does not pass through the origin is not a subspace, see Figure 1. The straight line does not pass through the origin. If we take vectors from it, their sum, found by the parallelogram rule, does not belong to .

Figure 1. A hyperplane is not a subspace

Definition 1. The set is called a linear subspace of if for any it contains the whole plane drawn through By induction, any linear combination of any elements of a subspace belongs to it.

The definition does not exclude the extreme cases and . These two cases are called trivial.

Sometimes we are interested in how a subspace is generated.

Definition 2. Take any vectors and consider the set of all linear combinations

(2) with

This set is a linear subspace (because a linear combination of expressions of type (2) is again of this type) and it is called a span of We also say that is spanned by

General idea. Let be a function with the domain We would like to know for which the equation has solutions, and if it does, then how many (one or more). For the existence part, let the argument run over and see what set the values belong to. It is the image This definition directly implies

Basic observation 1. The equation has solutions if and only if

For the uniqueness part, fix and see for which arguments the value equals that This is the counter-image of

Basic observation 2. If consists of one point, you have uniqueness.

See how this works for the function This function is not linear. The function generated by a matrix is linear, and a lot more can be said in addition to the above observations.

Matrix as a mapping

Definition 1. Let be a matrix of size It generates a mapping according to ( is written as a column). Following the common practice, we identify the mapping with the matrix

Exercise 1. The mapping we just defined is linear: for any vectors and numbers one has

Exercise 2 (first characterization of matrix image) Show that the image consists of linear combinations of the columns of

Solution. Partitioning into columns, for any we have

(2)

This means that when runs over the images are linear combinations of the column-vectors

Remark. In (2) we silently used the multiplication rule for partitioned matrices. Here is the statement of the rule in a simple situation. Let be two matrices compatible for multiplication. Let us partition them into smaller matrices

.

Then the product can be found as if those blocks were numbers:

The only requirement for this to be true is that the blocks be compatible for muliplication. You will not be bored with the proof. In (2) the multiplication is performed as if were numbers.

Euclidean space geometry: questions for repetition

How to study definitions and statements

For a definition, choose the shortest and most geometrical formulation. If it's still difficult to remember, repeat it aloud several times. Speaking aloud sends more energy to your brain. Think out equivalent definitions and consequences. As you do this, new neurons will be created in your brain, and that requires time.

Unlike definitions, statements require proofs. If you use a particular fact in your proof, state it explicitly. My criterion of understanding is this: how could I guess a result like this? Can I generalize it or find an analogy? Repeat the theory in large chunks. Explain clearly, as if lecturing. This will allow you to concentrate on logic, as opposed to memorizing.

Roll up your sleeves

Prove or solve the next statements and exercises.

Statement 1. A scalar product is a function of two vector arguments such that

1) it is linear in the first argument when the second argument is fixed and linear in the second argument when the first one is fixed (this property is called bilinearity);

2) it is symmetric,

3) the scalar product of a vector by itself is non-negative:

4) if and only if

Statement 2. Once we have a scalar product, we can define an associated norm by It has the properties:

Statement 3. Once we have a norm, we can define a distance between two points by It satisfies the triangle inequality for any three points

Statement 4. With a scalar product and the associated norm at hand we can define the notions of orthogonality and cosine of an angle between two vectors. The Pythagoras theorem becomes a simple consequence of definitions.

Exercise 1. Derive the expression for the norm of a linear combination.

Exercise 2. What condition on the centers and radii is sufficient for inclusion of balls

Exercise 3. Prove that the upper part of the plane is an open set.

Exercise 4. Describe the geometry related to the quadratic equation

1) How do we know if the branches of the parabola look upward or downward?

2) What is the link between the discriminant and the following situations: a) the parabola does not touch or cross the axis, b) the parabola touches (is tangent to) the axis and c) the parabola crosses the axis at two different points?

Exercise 5. Prove and interpret the parallelogram law:

Euclidean space geometry: Cauchy-Schwarz inequality

At first glance, the scalar product has nothing to do with the distance notion. It has, and the Cauchy-Schwarz inequality provides one of the links.

Cauchy-Schwarz inequality

Statement (Cauchy-Schwarz inequality) (I) For any vectors one has .

(II) If the inequality sign turns into equality, , then is proportional to : .

Proof. (I) If at least one of the vectors is zero, both sides of the inequality are 0 and there is nothing to prove. To exclude the trivial case, suppose that none of is zero and, therefore, are positive. Consider a real-valued function of a real number defined by . Here we have a norm of a linear combination .

We see that is a parabola with branches looking upward (because the senior coefficient is positive). By nonnegativity of the norm, and the parabola lies above the horizontal axis in the plane. Hence, the quadratic equation may have at most one real root. This means that the discriminant of the equation is non-positive: Applying square roots to both sides of we finish the proof of the first part.

(II) In case of the equality sign the discriminant is 0. Therefore the parabola touches the horizontal axis where . But we know that this implies which is just another way of writing .

Do you think this proof is tricky? During the long history of development of mathematics, mathematicians have invented many tricks, small and large. No matter how smart you are, you cannot reinvent all of them. Studying the existing tricks is a necessity. By the way, the definition of what is tricky and what is not is strictly personal and time-dependent.

Definition. For any two nonzero vectors from the Cauchy-Schwarz inequality we have Therefore we can define the cosine of the angle between by

This definition agrees with the definition of orthogonality: means that the angle between is and should be zero.

Remark. Every bit of information about scalar products, norms and the Cauchy-Schwarz is true in the infinite-dimensional case. For applications in Optimization see this post and that post.

Playing with balls

Once we have a norm, we can define the distance between by For any and let us put This is a set of points whose distance from is less than Therefore it is a ball centered at and with radius

Exercise 2. (Application of the triangle inequality) Consider two balls and where so the first ball is larger. Under what condition the smaller ball is contained in the larger ball:

(1)

Solution. There are two possible ways to solve this exercise. One is to look at the geometry first and then try to translate it to math. The other is just try to see the solution from calculations. We follow the second way.

The inclusion relationship in terms of sets (1) is equivalent to a point-wise statement: any element of the smaller ball belongs to the larger ball. This means that we have to start with the assumption and arrive to Using the triangle inequality we have

We want to be smaller than This is achieved if we require

Exercise 3. A set is called open if any its element belongs to together with some ball Prove that is open.

Proof. Take any We have to produce such From the previous exercise we have this inclusion if .

Definition 1. The scalar product of two vectors is defined by .

Remark. If matrix notation is of essence and are written as column vectors, we have The first notation is better when we want to emphasize symmetry

Linearity. The scalar product is linear in the first argument when the second argument is fixed: for any vectors and numbers one has

(1)

Proof.

Special cases. 1) Homogeneity: by setting we get 2) Additivity: by setting we get

Exercise 1. Formulate and prove the corresponding properties of the scalar product with respect to the second argument.

Definition 2. The vectors are called orthogonal if

Exercise 2. 1) The zero vector is orthogonal to any other vector. 2) If are orthogonal, then any vectors proportional to them are also orthogonal. 3) The unit vectors in are defined by (the unit is in the th place, all other components are zeros), Check that they are pairwise orthogonal.

Norm

Exercise 3. On the plane find the distance between a point and the origin.

Figure 1. Pythagoras theorem

Once I introduce the notation on a graph (Figure 1), everybody easily finds the distance to be using the Pythagoras theorem. Equally easily, almost everybody fails to connect this simple fact with the ensuing generalizations.

Definition 3. The norm in is defined by It is interpreted as the distance from point to the origin and also the length of the vector .

Exercise 4. 1) Can the norm be negative? We know that, in general, there are two square roots of a positive number: one is positive and the other is negative. The positive one is called an arithmetic square root. Here we are using the arithmetic square root.

2) Using the norm can you define the distance between points

3) The relationship between the norm and scalar product:

(2)

True or wrong?

4) Later on we'll prove that Explain why this is called a triangle inequality. For this, you need to recall the parallelogram rule.

5) How much is If what can you say about

Norm of a linear combination. For any vectors and numbers one has

The combination of these words may sound frightening. In fact, if you want to succeed with matrix algebra, you need to start drawing inspiration from geometry as early as possible.

Sum of vectors

Definition. The set of all -dimensional vectors with is denoted and is called a Euclidean space.

is a plane. The space we live in is Our intuition doesn't work in dimensions higher than 3 but most facts we observe in real life on the plane and in the 3-dimensional space have direct analogs in higher dimensions. Keep in mind that can be called a vector or a point in depending on the context. When we think of it as a vector, we associate with it an arrow that starts at the origin and ends at the point

Careful inspection shows that the sum of two vectors is found using the parallelogram rule in Figure 1. The rule itself comes from physics: if two forces are applied to a point, their resultant force is found by the parallelogram rule. Whatever works in real life is guaranteed to work in Math.

Figure 1. Sum of vectors

Exercise 1. Let be the unit vector of the axis and the unit vector of the axis. Find the sum Generalization: if runs over the whole axis and runs over the whole axis, what is the set of resulting sums Further generalization: on the plane take two straight lines and that pass through the origin and are not parallel to one another. If runs over and runs over , what is the set of resulting sums

This seemingly innocuous exercise leads to profound ideas, to be considered later. The answer for the last question is that the sums will cover the whole This fact is written like this: Note that I intentionally use words that emphasize movement and geometry: "runs over" and "covers the whole". One of the differences between elementary and higher mathematics is that the former deals with fixed elements and the latter with sets within which there are movement and change.

Multiplication of a vector by a number

If is a number and is a vector, we put (scaling or multiplication by a number). Scaling by a positive number means lengthening it in case and shortening in case . Scaling by a negative number means, additionally, reverting the direction of , see Figure 2.

Figure 2. Scaling a vector

Exercise 2. Take a nonzero vector and find the set of all products where runs over

The answer is that is the straight line drawn through the vector or, alternatively, the straight line drawn through the origin and point

The expression is called a linear combination of vectors with coefficients To find you first scale and and then add the results.

Exercise 3. Let be two nonparallel vectors on the plane. Describe the set verbally and geometrically.

Verbally, this is the set of all linear combinations that result when run over Geometrically, this will be the whole plane

From minimum to infimum: Math is just a logical game

The true Math is a continuous exercise in logic. A good teacher makes that logic visible and tangible. A genius teacher does the logic mentally and only displays the result. I am trying to be as good a teacher as humanly possible.

Steps to develop a new definition

Step 1. Start with a simple example.

What is the minimum of the set ? In my class everybody says .

Step 2. Give the most visual definition.

Here is a good candidate: the minimum of a set is its leftmost point.

Step 3. Formalize the definition you gave.

We said "its point". This means should be an element of . "Leftmost" is formalized as for any . Thus the formal definition should be: the number is called a minimum of a set if

1) for any and

2) .

Step 4. Look at bad cases when the definition fails. Try to come up with a generalized definition that would cover the bad cases.

Here is a bad case: meet the infinity

Let . In my class, some students suggested and . Can you see why neither is good? See the explanation below if you can't. I would try . It still satisfies part 1) of the above definition but does not belong to . This is the place to stretch one's imagination.

Statement 1. No element of satisfies the formal definition above.

Proof. Take any element . Then it is larger than 1 and the number is halfway between and 1. We have shown that to the left of any there is another element of . So no element of is leftmost.

In the game of chess, the one who thinks several moves ahead wins. A stronger version of Statement 1 is the following.

Statement 2. To the left of any there are infinitely many elements of .

Proof. Indeed, set The numbers approach 1 from the right. As increases, they become closer and closer to 1. For any given , infinitely many of these numbers will satisfy .

Step 5. When thinking about a new definition, try to exclude undesirable outcomes.

Since for condition 2) is not satisfied, one might want to omit it altogether. However, leaving only part 1) would give a bad result. For example, would satisfy such a "definition", and it would be bad for two reasons. Firstly, there is no uniqueness. We could take any number . Secondly, if you take any , in the interval there are no elements of , so there is no reason to call such a number a minimum of .

Definition. A number is called an infimum of (denoted ) if

I) for any and

II) in , there is a sequence approaching .

The above discussion shows that .

Exercise 1. Repeat all of the above for . First define the maximum of a set. Then modify the definition to obtain what is called a supremum.

Exercise 2. Let , where is the basis of the natural log and is the ratio Length of circumference/Diameter of that circumference. Find the infimum and supremum of . Simply naming them is not enough; you have to prove that both parts of the definitions are satisfied.

Exercise 3. The infimum is also defined as the largest lower bound. Can you prove equivalence of the two definitions?

Exercise 4. The supremum is also defined as the least upper bound. Can you prove equivalence of the two definitions?

Exercise 5. Consider any two sets and their union. What is the relationship between the infimums of the three sets?

See if this definition of a function is better than others

Try the definitions from the Khan Academy, Math is Fun, Wikipedia, Paul's Online Math Notes, and there are some others. Most of them are obsessed with the input-output terminology, which is not used by professionals. I think my explanation is more visual.

"Sending letters" is associated with arrows

Situation: every citizen from the city of sends letters to some citizen(s) of the city of . In Figure 1, the arrows show that citizen sends letters to three citizens of : they are , while citizen sends letters to only . Let us write if sends letters to . Here, is called an argument (some people say an input) and is called a value (some people say an output).

Figure 1. This is not a function

Definition. is called a function if for each argument, the corresponding value is unique.

The situation depicted in Figure 1 does not describe a function, because sends letters to more than one person. The visualization of a function is given in Figure 2, where arrows going from one citizen of to more that one citizen of are excluded.

Figure 2. This is a function

We finalize the function definition with some remarks. In theory, of course, instead of cities, we talk about sets. Also, instead of "sending letters" we say "the function maps to ". is called an image of .

The set is called the domain of the function . Go to the beginning of this section and see that we said "every citizen from the city of ..." This means that our function is defined everywhere in the domain. In practice, when you are given a function, you may have to describe the domain by removing all elements where your function is not defined.

Here is a detective story

One citizen of mysteriously dies. Before dying, he manages to tell those around him that he had visited his correspondent in and was probably poisoned during the visit. The local detective's task will be simpler if the poor guy had been receiving letters from only one person. We write if receives letters from . If nobody in receives letters from more than one sender, then satisfies the definition of a function.

Definition. We say that the inverse function exists if satisfies the definition of a function. Equivalently,

(1) whenever , their images should be different: .

In Figure 2, the inverse does not exist. In Figure 3, it does. The function takes us along the red arrow from to its image . The inverse function takes us back along the yellow arrow from to , which is called a counterimage (or preimage) of .

Figure 3. This function has an inverse

How to find the derivative of the inverse

From now on the arguments and values will be real.

The direct function maps to and the inverse function maps back to . This gives an obvious identity: . Similarly, . Since this is true for all , we can differentiate this equation using the chain rule to get

Hence, which is often written as with

Conditions sufficient for existence of the inverse

Condition 1. Let the function be strictly monotone. If it is increasing, for example, then this condition means that implies . Obviously, this implies (1) and the inverse exists.

Condition 2 (condition sufficient for Condition 1). Suppose the derivative of does not change sign. For example, if the derivative is everywhere positive (think about the exponential function), then the function is strictly increasing and we can apply Condition 1.

Example. The function is not one-to-one on the real line. If we take as its domain , it will be one-to-one (it is strictly monotone and its derivative is positive on the set of positive numbers). On this set, it's inverse is . The equations and take the form and . The equations and are equivalent.

Roughly speaking, there are two types of Math: formula Math, where all calculations can be seen, and abstract (or invisible) Math, which happens in the head. In my Optimization class last year, students had serious problems with abstract Math. The problems started at the definitions level, which meant the students couldn’t move any further. This post is my attempt to help them with abstract Math. There is established terminology that people use to describe rules of logic. Good luck with that terminology. Below I introduce my own. Trying to explain something at an elementary level is always a thankless task, so I apologize in advance for lapses.

Math uses the usual human logic

Example 1. In each subject, our students have to take two exams: one at KBTU and another (at a later date) at the University of London (UoL). Experience shows that the UoL grade on average is 30 points lower than mine. The passing grade at the UoL is 40. Therefore I call my student successful if he/she gets at least 70 points (out of 100) in my class.

Each definition has a preamble, which provides logical grounds for the definition. All objects under consideration are elements of one large set, which I call an encompassing set. The set we are defining is a subset of the encompassing set. We use a certain property to separate elements in the set we are defining from those which are not its elements. This defining property should make sense in the encompassing set.

In Example 1, the preamble is “In each subject … passing grade at the UoL is 40.” The encompassing set is “Students taking my class”, the set we are defining is “Successful students”. The defining property is “Getting at least 70 in my class”. This property wouldn’t make sense if we used “All KZ citizens” as the encompassing set.

Let denote the encompassing set, the set we are defining and its complement. The original definition (of ) is called a direct definition. The definition of is its opposite. The direct definition and its opposite should be stated in such a way that and a) do not intersect and b) together cover the whole encompassing set. In proofs by contradiction often it is necessary to formulate the opposite definition.

The set can be defined using two types of definitions. In the first type we directly negate the property that is used to define . I call such a definition a negative opposite. In the second type we use the opposite of the property . I call such a definition a positive opposite. The opposite of “Getting at least 70 in my class” is “Getting less than 70 in my class”.

Example 2. I call a student unsuccessful if he/she does not get a grade of at least 70 in my class (negative opposite). I call a student unsuccessful if he/she gets a grade lower than 70 in my class (positive opposite).

In the negative opposite we don’t change the defining property; we simply say that it is not satisfied. In the positive opposite we pass from the property to its opposite and say that the elements satisfy the opposite property. The negative opposite contains negation “does not”. The positive opposite does not contain it (it may contain negations in more complex cases).

... but applies it to unusual objects

One and the same definition can be formulated in several different (however, equivalent) ways. Always try to find the most geometric form and then establish equivalence of different definitions.

Example 3. A set is called bounded from above if there is a number such that is a subset of a half-infinite interval : . Any such is called an upper bound for .

I start with this version of the definition because it is the most visual. Note that we require existence of a number with a certain property. A slightly shorter version of the above definition is:

A set is called bounded from above if is a subset of a half-infinite interval , for some .

In this version, the word “some” indicates existence. When we say “for all”, we affirm universality. A replacement of an existence requirement by a universal requirement drastically changes the definition. See what we get from Example 3 by such a replacement.

Example 4. A set is called bounded from above if for any number , is a subset of a half-infinite interval . Equivalently, a set is called bounded from above if is a subset of a half-infinite interval , for any .

Can you tell for which this definition holds? Some people can answer this question without hesitation. If you are not one of them, try to move the number . Many mathematical arguments require a choice of an object that would allow the researcher to prove or disprove a statement. In Example 4, we say “for any M”, and it is UP TO YOU to try any and select the ones which show what is going on.

Example 5. An inclusion relation in terms of sets equivalently in terms of set elements takes the form for all . If we replace the set relation by the element-wise relation in Example 3, we obtain a longer definition: a set is called bounded from above if there is a number such that for all .

This is the place people start having problems. The errors that I saw include confusion of “there is” (existence) and “for all” (universality). See what is wrong with the following three definitions (what does not correspond to the intuition of being “bounded from above”; we also don't want to be empty).

Example6. A set is called bounded from above if for any number there exists such that .

Example 7. A set is called bounded from above if for any number and for any one has .

Example 8. A set is called bounded from above if there is such that for any number one has . Example 9. A set is called bounded from above if for any there exists a number such that .

Remark. In Example 6, first is chosen and depends on it. In Example 7, the choices of and are independent. In Example 8, is fixed and is arbitrary. Finally, in Example 9, first is chosen and the choice of depends on .

In the next exercises, start with the most graphic version. In case of doubt, do everything step by step (describe the encompassing set, defining property etc.).

Exercise 1. Define a set bounded from below using a) sets and b) element-wise terminology. In both cases formulate the opposite definition and the definition of a lower bound.

Exercise 2. Define a bounded set (that is, bounded from below and above) using a) sets and b) element-wise terminology. In both cases formulate the opposite definition.