Jul 18

Euclidean space geometry: Cauchy-Schwarz inequality

At first glance, the scalar product and the distance notion are largely independent entities. As a matter of fact, they are intimately related, and the Cauchy-Schwarz inequality provides one of the links.

Cauchy-Schwarz inequality

Statement (Cauchy-Schwarz inequality) (I) For any vectors x,y one has |{x\cdot y}|\leq \Vert x\Vert\Vert y\Vert .

(II) If the inequality sign turns into equality, |{x\cdot y}|=\Vert x\Vert\Vert y\Vert , then y is proportional to x: y=ax for some number a (here x is assumed to be nonzero).

Proof. (I) If at least one of the vectors is zero, both sides of the inequality are 0 and there is nothing to prove. To exclude the trivial case, suppose that neither x nor y is zero and, therefore, \Vert x\Vert,\Vert y\Vert are positive. Consider a real-valued function of a real number t defined by f(t)=\Vert tx+y\Vert^2. By the formula for the norm of a linear combination, f(t)=t^2\Vert x\Vert^2+2tx\cdot y+\Vert y\Vert^2.

We see that the graph of f(t) is a parabola with branches looking upward (because the leading coefficient \Vert x\Vert^2 is positive). By nonnegativity of the squared norm, f(t)\geq 0, so the parabola lies on or above the horizontal axis in the (t,f) plane. Hence, the quadratic equation f(t)=0 may have at most one real root. This means that the discriminant of the equation is non-positive. Dividing the discriminant by 4, we get D=[x\cdot y]^2-\Vert x\Vert^2\Vert y\Vert^2\leq 0. Applying square roots to both sides of [x\cdot y]^2\leq\Vert x\Vert^2\Vert y\Vert^2 we finish the proof of the first part.

(II) In case of the equality sign the discriminant is 0. Therefore the parabola touches the horizontal axis at some point t where f(t)=\Vert tx+y\Vert^2=0. But we know that this implies tx+y=0, which is just another way of writing y=ax with a=-t.
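The proof can be checked numerically. The following is a minimal sketch, not part of the original argument: the helper names dot, norm and reduced_discriminant and the sample vectors are chosen here for illustration. It evaluates the quantity D from the proof for a generic pair of vectors and for a proportional pair.

```python
import math

def dot(x, y):
    """Scalar product of two equal-length sequences."""
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    """Euclidean norm, the square root of x . x."""
    return math.sqrt(dot(x, x))

def reduced_discriminant(x, y):
    """D = (x . y)^2 - ||x||^2 ||y||^2, the quantity shown to be <= 0 in the proof."""
    return dot(x, y) ** 2 - dot(x, x) * dot(y, y)

# Illustrative vectors (assumptions, not taken from the post).
x = [1, 2, 3]
y = [4, -5, 6]
z = [-2, -4, -6]  # z = -2 * x, so equality is expected

print(reduced_discriminant(x, y))  # negative: strict inequality
print(reduced_discriminant(x, z))  # zero: proportional vectors
print(abs(dot(x, y)) <= norm(x) * norm(y))
```

Integer components are used so that D is computed exactly, without rounding.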

Do you think this proof is tricky? During the long history of development of mathematics, mathematicians have invented many tricks, small and large. No matter how smart you are, you cannot reinvent all of them. Studying the existing tricks is a necessity. By the way, the definition of what is tricky and what is not is strictly personal and time-dependent.

Consequences of the Cauchy-Schwarz inequality

Exercise 1. Prove the triangle inequality.

Proof. From the expression for the norm of a linear combination

\Vert x+y\Vert^2=\Vert x\Vert^2+2x\cdot y+\Vert y\Vert^2           (using Cauchy-Schwarz)

\leq\Vert x\Vert^2+2\Vert x\Vert\Vert y\Vert+\Vert y\Vert^2=(\Vert x\Vert+\Vert y\Vert )^2

which gives \Vert x+y\Vert\leq\Vert x\Vert+\Vert y\Vert.
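As a sanity check of the triangle inequality, here is a short sketch under stated assumptions: the function triangle_gap, the fixed random seed and the tolerance are illustrative choices, not part of the proof. It measures the gap \Vert x\Vert+\Vert y\Vert-\Vert x+y\Vert on random vectors in R^5.

```python
import math
import random

def norm(x):
    """Euclidean norm of a sequence of numbers."""
    return math.sqrt(sum(xi * xi for xi in x))

def triangle_gap(x, y):
    """||x|| + ||y|| - ||x + y||; the triangle inequality says this is >= 0."""
    s = [xi + yi for xi, yi in zip(x, y)]
    return norm(x) + norm(y) - norm(s)

random.seed(0)  # arbitrary fixed seed, for reproducibility only
gaps = [
    triangle_gap(
        [random.uniform(-1, 1) for _ in range(5)],
        [random.uniform(-1, 1) for _ in range(5)],
    )
    for _ in range(1000)
]

# A tiny negative tolerance allows for floating-point rounding.
print(min(gaps) >= -1e-12)
# For vectors pointing in the same direction the gap vanishes.
print(triangle_gap([3, 4], [6, 8]))
```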

Definition. For any two nonzero vectors from the Cauchy-Schwarz inequality we have \left|\frac{x\cdot y}{\Vert x\Vert\Vert y\Vert}\right|\leq 1. Therefore we can define the cosine of the angle \widehat{x,y} between x,y by

\cos (\widehat{x,y})=\frac{x\cdot y}{\Vert x\Vert\Vert y\Vert}.

This definition agrees with the definition of orthogonality: {x\cdot y=0} means that the angle between x,y is \pi/2 and \cos (\widehat{x,y}) should be zero.
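The definition of the angle translates directly into code. This is a minimal sketch: the function name angle is hypothetical, and the clamping to [-1,1] is an added safeguard against floating-point rounding, not part of the definition (mathematically, Cauchy-Schwarz already guarantees that the ratio lies in that interval).

```python
import math

def angle(x, y):
    """Angle between nonzero vectors x and y, in radians."""
    d = sum(xi * yi for xi, yi in zip(x, y))
    nx = math.sqrt(sum(xi * xi for xi in x))
    ny = math.sqrt(sum(yi * yi for yi in y))
    if nx == 0 or ny == 0:
        raise ValueError("the angle is defined only for nonzero vectors")
    c = d / (nx * ny)
    c = max(-1.0, min(1.0, c))  # guard against rounding slightly outside [-1, 1]
    return math.acos(c)

print(angle([1, 0], [0, 1]))    # orthogonal vectors
print(angle([1, 0], [1, 1]))    # 45 degrees, in radians
print(angle([1, 2], [-2, -4]))  # opposite directions
```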

Remark. Every bit of information about scalar products, norms and the Cauchy-Schwarz inequality remains true in the infinite-dimensional case. For applications in Optimization see this post and that post.

Playing with balls

Once we have a norm, we can define the distance between x,y by \text{dist}(x,y)=\Vert x-y\Vert . For any c\in R^n and r>0 let us put B(c,r)=\{ x\in R^n:\Vert x-c\Vert <r\}. This is a set of points whose distance from c is less than r. Therefore it is a ball centered at c and with radius r.

Exercise 2. (Application of the triangle inequality) Consider two balls B(C,R) and B(c,r), where R>r, so the first ball is larger. Under what condition is the smaller ball contained in the larger ball:

(1) B(c,r)\subset B(C,R)?

Solution. There are two possible ways to solve this exercise. One is to look at the geometry first and then try to translate it to math. The other is just to try to see the solution from calculations. We follow the second way.

The inclusion relationship in terms of sets (1) is equivalent to a point-wise statement: any element x of the smaller ball belongs to the larger ball. This means that we have to start with the assumption \Vert x-c\Vert<r and arrive at \Vert x-C\Vert<R. Using the triangle inequality we have

\Vert x-C\Vert=\Vert x-c+c-C\Vert\leq  \Vert x-c\Vert +\Vert c-C\Vert <r+\Vert  c-C\Vert.

We want \Vert x-C\Vert to be smaller than R. This is achieved if we require r+\Vert c-C\Vert\leq R, because then \Vert x-C\Vert<r+\Vert c-C\Vert\leq R.
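The condition found in the solution is easy to test in code. The sketch below is only an illustration: the names dist and ball_contained and the sample balls are assumptions made here. It checks the condition r+\Vert c-C\Vert\leq R, which the argument above shows to be sufficient for the inclusion.

```python
import math

def dist(x, y):
    """Euclidean distance ||x - y||."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def ball_contained(c, r, C, R):
    """Check the condition r + ||c - C|| <= R.

    When it holds, B(c, r) is contained in B(C, R).
    """
    return r + dist(c, C) <= R

print(ball_contained([0, 0], 1, [0, 0], 2))    # concentric balls
print(ball_contained([1, 0], 1, [0, 0], 2))    # small ball touches the boundary from inside
print(ball_contained([1.5, 0], 1, [0, 0], 2))  # small ball sticks out
```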

Exercise 3. A set A\subset R^n is called open if every element a\in A belongs to A together with some ball B(a,\varepsilon ). Prove that B(c,r) is open.

Proof. Take any a\in B(c,r). We have to produce \varepsilon>0 such that B(a,\varepsilon )\subset B(c,r). Since \Vert a-c\Vert<r, the number \varepsilon =r-\Vert a-c\Vert is positive. The argument from the previous exercise gives the inclusion because \varepsilon+\Vert a-c\Vert=r\leq r.
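To see the choice of \varepsilon at work, here is a rough numerical illustration on the plane. The helper safe_radius, the sample ball, the seed and the sampling scheme are all assumptions made for this sketch. It samples points of B(a,\varepsilon ) and confirms that each of them lies in B(c,r).

```python
import math
import random

def dist(x, y):
    """Euclidean distance ||x - y||."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def safe_radius(a, c, r):
    """epsilon = r - ||a - c||; it is positive exactly when a lies in B(c, r)."""
    return r - dist(a, c)

c, r = [0.0, 0.0], 1.0  # the unit ball on the plane (illustrative choice)
a = [0.6, 0.0]          # a point of that ball
eps = safe_radius(a, c, r)

random.seed(1)  # arbitrary fixed seed, for reproducibility only
inside = True
for _ in range(1000):
    theta = random.uniform(0, 2 * math.pi)
    rho = 0.999 * eps * random.random()  # strictly less than eps
    p = [a[0] + rho * math.cos(theta), a[1] + rho * math.sin(theta)]
    inside = inside and dist(p, c) < r

print(eps, inside)
```

A finite sample proves nothing, of course; the proof is the triangle inequality argument.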

Jul 18

Euclidean space geometry: scalar product, norm and distance

Learning this material has spillover effects for Stats because everything in this section has analogs for means, variances and covariances.

Scalar product

Definition 1. The scalar product of two vectors x,y\in R^n is defined by x\cdot y=\sum_{i=1}^nx_iy_i. The motivation has been provided earlier.

Remark. When matrix notation is needed and x,y are written as column vectors, we have x\cdot y=x^Ty. The first notation is better when we want to emphasize symmetry x\cdot y=y\cdot x.

Linearity. The scalar product is linear in the first argument when the second argument is fixed: for any vectors x,y,z and numbers a,b one has

(1) (ax+by)\cdot z=a(x\cdot z)+b(y\cdot z).

Proof. (ax+by)\cdot z=\sum_{i=1}^n(ax_i+by_i)z_i=\sum_{i=1}^n(ax_iz_i+by_iz_i)

=a\sum_{i=1}^nx_iz_i+b\sum_{i=1}^ny_iz_i=ax\cdot z+by\cdot z.

Special cases. 1) Homogeneity: by setting b=0 we get (ax)\cdot z=a(x\cdot  z). 2) Additivity: by setting a=b=1 we get (x+y)\cdot z=x\cdot z+y\cdot  z.
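The definition and the linearity property (1) can be tried out on concrete numbers. This is a minimal sketch: the names dot and lin_comb and the sample vectors are illustrative assumptions. Integer components keep the arithmetic exact.

```python
def dot(x, y):
    """Scalar product: the sum of x_i * y_i over all coordinates."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(xi * yi for xi, yi in zip(x, y))

def lin_comb(a, x, b, y):
    """The vector a*x + b*y, computed coordinate by coordinate."""
    return [a * xi + b * yi for xi, yi in zip(x, y)]

# Illustrative data (assumptions, not taken from the post).
x, y, z = [1, 2, 3], [0, -1, 4], [2, 2, 1]
a, b = 3, -2

lhs = dot(lin_comb(a, x, b, y), z)   # (ax + by) . z
rhs = a * dot(x, z) + b * dot(y, z)  # a(x . z) + b(y . z)
print(lhs, rhs)

# Symmetry of the scalar product.
print(dot(x, y) == dot(y, x))
```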

Exercise 1. Formulate and prove the corresponding properties of the scalar product with respect to the second argument.

Definition 2. The vectors x,y are called orthogonal if x\cdot y=0.

Exercise 2. 1) The zero vector is orthogonal to any other vector. 2) If x,y are orthogonal, then any vectors proportional to them are also orthogonal. 3) The unit vectors in R^n are defined by e_i=(0,...,1,...,0) (the unit is in the ith place, all other components are zeros), i=1,...,n. Check that they are pairwise orthogonal.


Exercise 3. On the plane find the distance between a point x and the origin.

Figure 1. Pythagoras theorem

Once I introduce the notation on a graph (Figure 1), everybody easily finds the distance to be \text{dist}(0,x)=\sqrt{x_1^2+x_2^2} using the Pythagoras theorem. Equally easily, almost everybody fails to connect this simple fact with the ensuing generalizations.

Definition 3. The norm in R^n is defined by \left\Vert x\right\Vert=\sqrt{\sum_{i=1}^nx_i^2}. It is interpreted as the distance from point x to the origin and also the length of the vector x.

Exercise 4. 1) Can the norm be negative? We know that, in general, there are two square roots of a positive number: one is positive and the other is negative. The positive one is called an arithmetic square root. Here we are using the arithmetic square root.

2) Using the norm can you define the distance between points x,y\in R^n?

3) The relationship between the norm and scalar product:

(2) \left\Vert x\right\Vert =\sqrt{x\cdot x}.

True or false?

4) Later on we'll prove that \Vert x+y\Vert\leq\Vert x\Vert+\Vert{ y}\Vert . Explain why this is called a triangle inequality. For this, you need to recall the parallelogram rule.

5) What is \left\Vert 0\right\Vert ? If \left\Vert x\right\Vert =0, what can you say about x?

Norm of a linear combination. For any vectors x,y and numbers a,b one has

(3) \left\Vert ax+by\right\Vert^2=a^2\left\Vert x\right\Vert^2+2ab(x\cdot y)+b^2\left\Vert y\right\Vert^2.

Proof. From (2) we have

\left\Vert ax+by\right\Vert^2=\left(ax+by\right)\cdot\left(ax+by\right)     (using linearity in the first argument)

=ax\cdot\left(ax+by\right)+by\cdot\left(ax+by\right)         (using linearity in the second argument)

=a^2x\cdot x+abx\cdot y+bay\cdot x+b^2y\cdot y (applying symmetry of the scalar product and (2))

=a^2\left\Vert x\right\Vert^2+2ab(x\cdot y)+b^2\left\Vert y\right\Vert^2.

Pythagoras theorem. If x,y are orthogonal, then \left\Vert x+y\right\Vert^2=\left\Vert x\right\Vert^2+\left\Vert y\right\Vert^2.

This is immediate from (3).
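Formula (3) and the Pythagoras theorem can both be verified on small integer vectors. The sketch below works with squared norms so that no square roots and no rounding are involved; the names dot and norm_sq and the sample data are assumptions made for the illustration.

```python
def dot(x, y):
    """Scalar product of two equal-length sequences."""
    return sum(xi * yi for xi, yi in zip(x, y))

def norm_sq(x):
    """Squared Euclidean norm, x . x."""
    return dot(x, x)

# Formula (3) on illustrative data.
x, y = [1, 2], [3, -1]
a, b = 2, -3
lhs = norm_sq([a * xi + b * yi for xi, yi in zip(x, y)])
rhs = a ** 2 * norm_sq(x) + 2 * a * b * dot(x, y) + b ** 2 * norm_sq(y)
print(lhs, rhs)

# Pythagoras theorem: u and v are orthogonal, so the cross term vanishes.
u, v = [1, 2], [-2, 1]
print(dot(u, v))
print(norm_sq([ui + vi for ui, vi in zip(u, v)]), norm_sq(u) + norm_sq(v))
```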

Norm homogeneity. Review the definition of the absolute value and the equation |a|=\sqrt{a^2}. The norm is homogeneous of degree 1:

\left\Vert ax\right\Vert=\sqrt{(ax)\cdot (ax)}=\sqrt{{a^2x\cdot x}}=|a|\left\Vert x\right\Vert.
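A quick check of homogeneity, as a sketch with illustrative numbers (the function names norm and scale are assumptions): scaling a vector by a negative number multiplies its norm by the absolute value of that number.

```python
import math

def norm(x):
    """Euclidean norm of a sequence of numbers."""
    return math.sqrt(sum(xi * xi for xi in x))

def scale(a, x):
    """The vector a*x."""
    return [a * xi for xi in x]

x = [3, 4]
a = -3
print(norm(scale(a, x)), abs(a) * norm(x))
```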