
Euclidean space geometry: scalar product, norm and distance

Learning this material has spillover effects for Stats because everything in this section has analogs for means, variances and covariances.

Scalar product

Definition 1. The scalar product of two vectors x,y\in R^n is defined by x\cdot y=\sum_{i=1}^nx_iy_i. The motivation has been provided earlier.

Remark. If matrix notation is needed and x,y are written as column vectors, we have x\cdot y=x^Ty. The first notation is preferable when we want to emphasize the symmetry x\cdot y=y\cdot x.
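For readers who like to see definitions in code, here is a minimal sketch in plain Python (the name scalar_product is mine, not a standard library function):

```python
def scalar_product(x, y):
    """Scalar product x . y = sum over i of x_i * y_i (Definition 1)."""
    assert len(x) == len(y), "vectors must live in the same R^n"
    return sum(xi * yi for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0]
y = [4.0, -1.0, 0.5]
print(scalar_product(x, y))   # 1*4 + 2*(-1) + 3*0.5 = 3.5
print(scalar_product(y, x))   # symmetry: same value 3.5
```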

Linearity. The scalar product is linear in the first argument when the second argument is fixed: for any vectors x,y,z and numbers a,b one has

(1) (ax+by)\cdot z=a(x\cdot z)+b(y\cdot z).

Proof. (ax+by)\cdot z=\sum_{i=1}^n(ax_i+by_i)z_i=\sum_{i=1}^n(ax_iz_i+by_iz_i)

=a\sum_{i=1}^nx_iz_i+b\sum_{i=1}^ny_iz_i=a(x\cdot z)+b(y\cdot z).

Special cases. 1) Homogeneity: by setting b=0 we get (ax)\cdot z=a(x\cdot z). 2) Additivity: by setting a=b=1 we get (x+y)\cdot z=x\cdot z+y\cdot z.
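A quick numeric spot-check of (1), reusing scalar_product from the sketch above:

```python
# Verify (ax + by) . z = a(x . z) + b(y . z) on concrete numbers.
a, b = 2.0, -3.0
x, y, z = [1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [3.0, 3.0, 3.0]

ax_plus_by = [a * xi + b * yi for xi, yi in zip(x, y)]
lhs = scalar_product(ax_plus_by, z)
rhs = a * scalar_product(x, z) + b * scalar_product(y, z)
assert lhs == rhs   # both sides equal 18.0 here
```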

Exercise 1. Formulate and prove the corresponding properties of the scalar product with respect to the second argument.

Definition 2. The vectors x,y are called orthogonal if x\cdot y=0.

Exercise 2. 1) The zero vector is orthogonal to any vector. 2) If x,y are orthogonal, then any vectors proportional to them are also orthogonal. 3) The unit vectors in R^n are defined by e_i=(0,...,1,...,0) (the unit is in the ith place, all other components are zeros), i=1,...,n. Check that they are pairwise orthogonal.
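Item 3 is easy to verify numerically; a small sketch (the helper e is my own shorthand for e_i), reusing scalar_product:

```python
def e(i, n):
    """Unit vector e_i in R^n (1-based index i)."""
    return [1.0 if k == i - 1 else 0.0 for k in range(n)]

n = 4
for i in range(1, n + 1):
    for j in range(1, n + 1):
        if i != j:
            assert scalar_product(e(i, n), e(j, n)) == 0.0  # pairwise orthogonal
```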

Norm

Exercise 3. On the plane, find the distance between a point x=(x_1,x_2) and the origin.

Figure 1. Pythagoras theorem

Once I introduce the notation on a graph (Figure 1), everybody easily finds the distance to be \text{dist}(0,x)=\sqrt{x_1^2+x_2^2} using the Pythagoras theorem. Equally easily, almost everybody fails to connect this simple fact with the ensuing generalizations.

Definition 3. The norm in R^n is defined by \left\Vert x\right\Vert=\sqrt{\sum_{i=1}^nx_i^2}. It is interpreted as the distance from point x to the origin and also the length of the vector x.
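In code the definition is one line; a minimal sketch using Python's math module:

```python
import math

def norm(x):
    """Norm from Definition 3: sqrt of the sum of squared components."""
    return math.sqrt(sum(xi ** 2 for xi in x))

print(norm([3.0, 4.0]))   # sqrt(9 + 16) = 5.0, the classic 3-4-5 triangle
```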

Exercise 4. 1) Can the norm be negative? We know that, in general, there are two square roots of a positive number: one is positive and the other is negative. The positive one is called an arithmetic square root. Here we are using the arithmetic square root.

2) Using the norm can you define the distance between points x,y\in R^n?

3) The relationship between the norm and scalar product:

(2) \left\Vert x\right\Vert =\sqrt{x\cdot x}.

True or false?

4) Later on we'll prove that \Vert x+y\Vert\leq\Vert x\Vert+\Vert y\Vert. Explain why this is called a triangle inequality. For this, you need to recall the parallelogram rule.

5) How much is \left\Vert 0\right\Vert ? If \left\Vert x\right\Vert =0, what can you say about x?
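Here is a sketch of one natural answer to item 2, plus a numeric spot-check of the triangle inequality from item 4, reusing norm from above; treat it as a hint rather than the official solution:

```python
def dist(x, y):
    """Distance between points as the norm of their difference."""
    return norm([xi - yi for xi, yi in zip(x, y)])

x, y = [1.0, 2.0], [4.0, 6.0]
print(dist(x, y))   # norm([-3, -4]) = 5.0

s = [xi + yi for xi, yi in zip(x, y)]
assert norm(s) <= norm(x) + norm(y)   # ||x + y|| <= ||x|| + ||y||
```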

Norm of a linear combination. For any vectors x,y and numbers a,b one has

(3) \left\Vert ax+by\right\Vert^2=a^2\left\Vert x\right\Vert^2+2ab(x\cdot y)+b^2\left\Vert y\right\Vert^2.

Proof. From (2) we have

\left\Vert ax+by\right\Vert^2=\left(ax+by\right)\cdot\left(ax+by\right)     (using linearity in the first argument)

=ax\cdot\left(ax+by\right)+by\cdot\left(ax+by\right)         (using linearity in the second argument)

=a^2x\cdot x+abx\cdot y+bay\cdot x+b^2y\cdot y (applying symmetry of the scalar product and (2))

=a^2\left\Vert x\right\Vert^2+2ab(x\cdot y)+b^2\left\Vert y\right\Vert^2.

Pythagoras theorem. If x,y are orthogonal, then \left\Vert x+y\right\Vert^2=\left\Vert x\right\Vert^2+\left\Vert y\right\Vert^2.

This is immediate from (3) with a=b=1: the cross term 2(x\cdot y) vanishes because x,y are orthogonal.
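Both identity (3) and the Pythagoras theorem are easy to confirm on concrete vectors, reusing the helpers above:

```python
a, b = 1.5, -2.0
x = [1.0, 2.0, 2.0]
y = [2.0, 0.0, -1.0]   # x . y = 2 + 0 - 2 = 0, so x and y are orthogonal

# Identity (3)
lhs = norm([a * xi + b * yi for xi, yi in zip(x, y)]) ** 2
rhs = a**2 * norm(x)**2 + 2*a*b * scalar_product(x, y) + b**2 * norm(y)**2
assert abs(lhs - rhs) < 1e-9

# Pythagoras theorem: with x . y = 0, ||x + y||^2 = ||x||^2 + ||y||^2
s = [xi + yi for xi, yi in zip(x, y)]
assert abs(norm(s)**2 - (norm(x)**2 + norm(y)**2)) < 1e-9
```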

Norm homogeneity. Review the definition of the absolute value and the equation |a|=\sqrt{a^2}. The norm is homogeneous of degree 1:

\left\Vert ax\right\Vert=\sqrt{(ax)\cdot (ax)}=\sqrt{{a^2x\cdot x}}=|a|\left\Vert x\right\Vert.
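And a one-line check of homogeneity with a negative coefficient, where the absolute value matters:

```python
a, x = -3.0, [1.0, 2.0, 2.0]
assert abs(norm([a * xi for xi in x]) - abs(a) * norm(x)) < 1e-12  # ||ax|| = |a| ||x||
```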


Properties of standard deviation

Properties of standard deviation are divided into two parts. The definitions and their immediate consequences are given here. Both variance and standard deviation are used to measure the variability of values of a random variable around its mean. Then why use both of them? The why will be explained in another post.

Properties of standard deviation: definitions and consequences

Definition. For a random variable X, the quantity \sigma (X) = \sqrt {Var(X)} is called its standard deviation.
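For a discrete random variable given by its values and probabilities, the definition translates into a few lines of Python; a minimal sketch (the helper names mean, variance, sd are mine):

```python
import math

def mean(values, probs):
    """Expected value of a discrete random variable."""
    return sum(v * p for v, p in zip(values, probs))

def variance(values, probs):
    """Mean squared deviation from the mean."""
    m = mean(values, probs)
    return sum(p * (v - m) ** 2 for v, p in zip(values, probs))

def sd(values, probs):
    """Standard deviation: arithmetic square root of the variance."""
    return math.sqrt(variance(values, probs))

values, probs = [0, 1, 2], [0.25, 0.5, 0.25]
print(sd(values, probs))   # sqrt(0.5), roughly 0.707
```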

Digression about square roots and absolute values

In general, there are two square roots of a positive number, one positive and the other negative. The positive one is called an arithmetic square root. The arithmetic root is applied here to Var(X) \ge 0 (see properties of variance), so standard deviation is always nonnegative.
Definition. The absolute value of a real number a is defined by
(1) |a|=a if a is nonnegative and |a|=-a if a is negative.
This two-part definition is a stumbling block for many students, so making them plug in a few numbers is a must. It is introduced to measure the distance from point a to the origin. For example, dist(3,0)=|3|=3 and dist(-3,0)=|-3|=3. More generally, for any points a,b on the real line the distance between them is given by dist(a,b)=|a-b|.

By squaring both sides in Eq. (1) we obtain |a|^2={a^2}. Application of the arithmetic square root gives

(2) |a|=\sqrt {a^2}.

This is the equation we need right now.
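Plugging in a few numbers, as suggested above, confirms that definition (1) and equation (2) agree:

```python
import math

for a in [3, -3, 0, 2.5, -0.1]:
    abs_by_def = a if a >= 0 else -a                      # two-part definition (1)
    assert abs(abs_by_def - math.sqrt(a ** 2)) < 1e-12    # equation (2)
```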

Back to standard deviation

Property 1. Standard deviation is homogeneous of degree 1. Indeed, using homogeneity of variance and equation (2), we have

\sigma (aX) =\sqrt{Var(aX)}=\sqrt{{a^2}Var(X)}=|a|\sigma(X).

Unlike homogeneity of expected values, here we have an absolute value of the scaling coefficient a.
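A spot-check of Property 1 with a negative scaling coefficient, reusing sd from the sketch above:

```python
a = -2.0
values, probs = [0, 1, 2], [0.25, 0.5, 0.25]
scaled = [a * v for v in values]
assert abs(sd(scaled, probs) - abs(a) * sd(values, probs)) < 1e-12  # sigma(aX) = |a| sigma(X)
```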

Property 2. Cauchy-Schwarz inequality. (Part 1) For any random variables X,Y one has

(3) |Cov(X,Y)|\le\sigma(X)\sigma(Y).

(Part 2) If the inequality sign in (3) turns into equality, |Cov(X,Y)|=\sigma (X)\sigma (Y), then Y is a linear function of X: Y = aX + b, with some constants a,b.
Proof. (Part 1) If at least one of the variables is constant, both sides of the inequality are 0 and there is nothing to prove. To exclude this trivial case, let X,Y be non-constant, so that Var(X),\ Var(Y) are positive. Consider the real-valued function of a real number t defined by f(t)=Var(tX+Y). Expanding the variance of a linear combination gives

f(t)=t^2Var(X)+2tCov(X,Y)+Var(Y).

We see that f(t) is a parabola opening upward (because the leading coefficient Var(X) is positive). By nonnegativity of variance, f(t)\ge 0 and the parabola lies on or above the horizontal axis in the (t,f) plane. Hence, the quadratic equation f(t)=0 may have at most one real root. This means that the discriminant of the equation is non-positive:

D=4\left[Cov(X,Y)^2-Var(X)Var(Y)\right]\le 0.

Applying arithmetic square roots to both sides of Cov(X,Y)^2\le Var(X)Var(Y), we finish the proof of the first part.

(Part 2) In case of the equality sign the discriminant is 0. Therefore the parabola touches the horizontal axis at some point t_0 where f(t_0)=Var(t_0X+Y)=0. But we know that zero variance implies t_0X+Y=c for some constant c, which is just another way of writing Y=aX+b (with a=-t_0,\ b=c).
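Inequality (3) is easy to illustrate numerically. A sketch for a small joint distribution of (X,Y); the list-of-triples representation (x, y, probability) is my own convention, not a library format:

```python
import math

pts = [(0, 0, 0.25), (1, 1, 0.25), (1, 2, 0.25), (2, 1, 0.25)]
EX = sum(x * p for x, y, p in pts)                       # E[X] = 1
EY = sum(y * p for x, y, p in pts)                       # E[Y] = 1
cov = sum((x - EX) * (y - EY) * p for x, y, p in pts)    # Cov(X,Y) = 0.25
sx = math.sqrt(sum((x - EX) ** 2 * p for x, y, p in pts))
sy = math.sqrt(sum((y - EY) ** 2 * p for x, y, p in pts))
assert abs(cov) <= sx * sy   # inequality (3): 0.25 <= 0.5
```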

Comment. (3) explains one of the main properties of the correlation:

-1\le\rho(X,Y)=\frac{Cov(X,Y)}{\sigma(X)\sigma(Y)}\le 1.
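With the same numbers, the correlation indeed lands inside [-1, 1]:

```python
rho = cov / (sx * sy)
print(rho)   # 0.5 here; (3) guarantees -1 <= rho <= 1
```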