18 Nov 18

## Application: Ordinary Least Squares estimator

### Generalized Pythagoras theorem

Exercise 1. Let $P$ be a projector (a symmetric idempotent matrix) and denote $Q=I-P.$ Then $\Vert x\Vert^2=\Vert Px\Vert^2+\Vert Qx\Vert^2$ for any $x\in R^n.$

Proof. By the properties of the scalar product

$\Vert x\Vert^2=\Vert Px+Qx\Vert^2=\Vert Px\Vert^2+2(Px)\cdot (Qx)+\Vert Qx\Vert^2.$

$P$ is symmetric and idempotent, so

$(Px)\cdot (Qx)=(Px)\cdot[(I-P)x]=x\cdot[(P-P^2)x]=0.$

This proves the statement.
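As a quick numerical sanity check (a sketch, not part of the original notes), one can build the orthogonal projector onto the column space of a random matrix and verify the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))        # full-rank 5x2 matrix (almost surely)
P = X @ np.linalg.inv(X.T @ X) @ X.T   # projector onto the column space of X
Q = np.eye(5) - P

# P is symmetric and idempotent
assert np.allclose(P, P.T) and np.allclose(P @ P, P)

x = rng.standard_normal(5)
# generalized Pythagoras: ||x||^2 = ||Px||^2 + ||Qx||^2
lhs = np.linalg.norm(x) ** 2
rhs = np.linalg.norm(P @ x) ** 2 + np.linalg.norm(Q @ x) ** 2
assert np.allclose(lhs, rhs)
```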

### Ordinary Least Squares (OLS) estimator derivation

Problem statement. A vector $y\in R^n$ (the dependent vector) and vectors $x^{(1)},...,x^{(k)}\in R^n$ (independent vectors or regressors) are given. The OLS estimator is defined as the vector $\beta \in R^k$ which minimizes the total sum of squares $TSS=\sum_{i=1}^n(y_i-x_i^{(1)}\beta_1-...-x_i^{(k)}\beta_k)^2.$

Denoting $X=(x^{(1)},...,x^{(k)}),$ we see that $TSS=\Vert y-X\beta\Vert^2$ and that finding the OLS estimator means approximating $y$ with vectors from the image $\text{Img}\,X.$ The vectors $x^{(1)},...,x^{(k)}$ should be linearly independent; otherwise the solution will not be unique.

Assumption. $x^{(1)},...,x^{(k)}$ are linearly independent. This, in particular, implies that $k\leq n.$

Exercise 2. Show that the OLS estimator is

(2) $\hat{\beta}=(X^TX)^{-1}X^Ty.$

Proof. Put $P=X(X^TX)^{-1}X^T.$ This matrix is symmetric and idempotent (check that $P^T=P$ and $P^2=P$), so Exercise 1 applies to it. Moreover, $PX=X(X^TX)^{-1}X^TX=X,$ so $P$ doesn't change vectors of the form $X\beta$: $PX\beta=X\beta.$ Denoting also $Q=I-P$ we have

$\Vert y-X\beta\Vert^2=\Vert y-Py+Py-X\beta\Vert^2$

$=\Vert Qy+P(y-X\beta)\Vert^2$ (by Exercise 1 applied to $y-X\beta,$ since $Qy=Q(y-X\beta)$)

$=\Vert Qy\Vert^2+\Vert P(y-X\beta)\Vert^2.$

This shows that $\Vert Qy\Vert^2$ is a lower bound for $\Vert y-X\beta\Vert^2.$ This lower bound is achieved when the second term is made zero. From

$P(y-X\beta)=Py-X\beta =X(X^TX)^{-1}X^Ty-X\beta=X[(X^TX)^{-1}X^Ty-\beta]$

we see that the second term is zero if $\beta$ satisfies (2).

Usually the above derivation is applied to a dependent vector of the form $y=X\beta+e,$ where $e$ is a random vector with mean zero, but the derivation holds without this assumption. See also the simplified derivation of the OLS estimator.
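A short numerical illustration of formula (2) (a sketch with NumPy; the data below are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X = rng.standard_normal((n, k))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(n)  # y = X beta + e

# OLS by formula (2): beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# cross-check against the library least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```

In practice one would call a least-squares routine rather than invert $X^TX$ explicitly, but the explicit formula is the point here.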

7 Jul 18

## Euclidean space geometry: scalar product, norm and distance

Learning this material has spillover effects for Stats because everything in this section has analogs for means, variances and covariances.

### Scalar product

Definition 1. The scalar product of two vectors $x,y\in R^n$ is defined by $x\cdot y=\sum_{i=1}^nx_iy_i$. The motivation has been provided earlier.

Remark. If matrix notation is needed and $x,y$ are written as column vectors, we have $x\cdot y=x^Ty.$ The first notation is better when we want to emphasize the symmetry $x\cdot y=y\cdot x.$

Linearity. The scalar product is linear in the first argument when the second argument is fixed: for any vectors $x,y,z$ and numbers $a,b$ one has

(1) $(ax+by)\cdot z=a(x\cdot z)+b(y\cdot z).$

Proof. $(ax+by)\cdot z=\sum_{i=1}^n(ax_i+by_i)z_i=\sum_{i=1}^n(ax_iz_i+by_iz_i)$

$=a\sum_{i=1}^nx_iz_i+b\sum_{i=1}^ny_iz_i=a(x\cdot z)+b(y\cdot z).$

Special cases. 1) Homogeneity: by setting $b=0$ we get $(ax)\cdot z=a(x\cdot z).$ 2) Additivity: by setting $a=b=1$ we get $(x+y)\cdot z=x\cdot z+y\cdot z.$
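These properties are easy to check numerically (a quick sketch; the vectors and scalars are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([-1.0, 0.0, 4.0])
z = np.array([2.0, 2.0, -1.0])
a, b = 3.0, -2.0

# linearity in the first argument, formula (1)
assert np.isclose((a * x + b * y) @ z, a * (x @ z) + b * (y @ z))
# special cases: homogeneity (b = 0) and additivity (a = b = 1)
assert np.isclose((a * x) @ z, a * (x @ z))
assert np.isclose((x + y) @ z, x @ z + y @ z)
```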

Exercise 1. Formulate and prove the corresponding properties of the scalar product with respect to the second argument.

Definition 2. The vectors $x,y$ are called orthogonal if $x\cdot y=0.$

Exercise 2. 1) The zero vector is orthogonal to any other vector. 2) If $x,y$ are orthogonal, then any vectors proportional to them are also orthogonal. 3) The unit vectors in $R^n$ are defined by $e_i=(0,...,1,...,0)$ (the unit is in the $i$th place, all other components are zeros), $i=1,...,n.$ Check that they are pairwise orthogonal.
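Part 3) can be verified in a couple of lines (a sketch; the rows of the identity matrix are exactly the unit vectors $e_1,...,e_n$):

```python
import numpy as np

n = 4
E = np.eye(n)  # row i is the unit vector e_{i+1}

# pairwise orthogonality: e_i . e_j = 0 for i != j
for i in range(n):
    for j in range(n):
        if i != j:
            assert E[i] @ E[j] == 0.0
```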

### Norm

Exercise 3. On the plane find the distance between a point $x$ and the origin.

Figure 1. Pythagoras theorem

Once I introduce the notation on a graph (Figure 1), everybody easily finds the distance to be $\text{dist}(0,x)=\sqrt{x_1^2+x_2^2}$ using the Pythagoras theorem. Equally easily, almost everybody fails to connect this simple fact with the ensuing generalizations.

Definition 3. The norm in $R^n$ is defined by $\left\Vert x\right\Vert=\sqrt{\sum_{i=1}^nx_i^2}.$ It is interpreted as the distance from point $x$ to the origin and also the length of the vector $x$.

Exercise 4. 1) Can the norm be negative? We know that, in general, there are two square roots of a positive number: one is positive and the other is negative. The positive one is called an arithmetic square root. Here we are using the arithmetic square root.

2) Using the norm can you define the distance between points $x,y\in R^n?$

3) The relationship between the norm and scalar product:

(2) $\left\Vert x\right\Vert =\sqrt{x\cdot x}.$

True or false?

4) Later on we'll prove that $\Vert x+y\Vert\leq\Vert x\Vert+\Vert y\Vert.$ Explain why this is called the triangle inequality. For this, you need to recall the parallelogram rule.

5) How much is $\left\Vert 0\right\Vert ?$ If $\left\Vert x\right\Vert =0,$ what can you say about $x?$
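For parts 2) and 3), a minimal numerical sketch (defining the distance between $x$ and $y$ as $\Vert x-y\Vert$ is the intended answer to part 2):

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([0.0, 0.0])

# norm as in Definition 3, and its relation (2) to the scalar product
assert np.isclose(np.linalg.norm(x), np.sqrt(x @ x))
# distance between x and y defined via the norm
assert np.isclose(np.linalg.norm(x - y), 5.0)  # the 3-4-5 right triangle
```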

Norm of a linear combination. For any vectors $x,y$ and numbers $a,b$ one has

(3) $\left\Vert ax+by\right\Vert^2=a^2\left\Vert x\right\Vert^2+2ab(x\cdot y)+b^2\left\Vert y\right\Vert^2.$

Proof. From (2) we have

$\left\Vert ax+by\right\Vert^2=\left(ax+by\right)\cdot\left(ax+by\right)$     (using linearity in the first argument)

$=ax\cdot\left(ax+by\right)+by\cdot\left(ax+by\right)$         (using linearity in the second argument)

$=a^2x\cdot x+abx\cdot y+bay\cdot x+b^2y\cdot y$ (applying symmetry of the scalar product and (2))

$=a^2\left\Vert x\right\Vert^2+2ab(x\cdot y)+b^2\left\Vert y\right\Vert^2.$
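Formula (3) can be checked on random vectors (a sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
x, y = rng.standard_normal(6), rng.standard_normal(6)
a, b = 1.5, -0.7

# norm of a linear combination, formula (3)
lhs = np.linalg.norm(a * x + b * y) ** 2
rhs = a**2 * (x @ x) + 2 * a * b * (x @ y) + b**2 * (y @ y)
assert np.allclose(lhs, rhs)
```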

Pythagoras theorem. If $x,y$ are orthogonal, then $\left\Vert x+y\right\Vert^2=\left\Vert x\right\Vert^2+\left\Vert y\right\Vert^2.$

This is immediate from (3).
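With two orthogonal vectors this is immediate to confirm numerically (a sketch; the vectors below are chosen so that $x\cdot y=0$):

```python
import numpy as np

x = np.array([1.0, 1.0, 0.0])
y = np.array([1.0, -1.0, 2.0])
assert x @ y == 0.0  # x and y are orthogonal

# Pythagoras: ||x + y||^2 = ||x||^2 + ||y||^2
lhs = np.linalg.norm(x + y) ** 2
rhs = np.linalg.norm(x) ** 2 + np.linalg.norm(y) ** 2
assert np.allclose(lhs, rhs)
```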

Norm homogeneity. Review the definition of the absolute value and the equation $|a|=\sqrt{a^2}$. The norm is homogeneous of degree 1:

$\left\Vert ax\right\Vert=\sqrt{(ax)\cdot (ax)}=\sqrt{{a^2x\cdot x}}=|a|\left\Vert x\right\Vert$.
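The homogeneity property, checked for a negative scalar to see why the absolute value is needed (a sketch):

```python
import numpy as np

x = np.array([3.0, -4.0])
a = -2.0
# ||ax|| = |a| ||x||, not a ||x||, since a norm cannot be negative
assert np.isclose(np.linalg.norm(a * x), abs(a) * np.linalg.norm(x))
```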