18
Nov 18

## Application: Ordinary Least Squares estimator

### Generalized Pythagoras theorem

Exercise 1. Let $P$ be an orthogonal projector, that is, a symmetric and idempotent matrix ($P^T=P,$ $P^2=P$), and denote $Q=I-P.$ Then $\Vert x\Vert^2=\Vert Px\Vert^2+\Vert Qx\Vert^2.$

Proof. By the scalar product properties

$\Vert x\Vert^2=\Vert Px+Qx\Vert^2=\Vert Px\Vert^2+2(Px)\cdot (Qx)+\Vert Qx\Vert^2.$

$P$ is symmetric and idempotent, so

$(Px)\cdot (Qx)=(Px)\cdot[(I-P)x]=x\cdot[P^T(I-P)x]=x\cdot[(P-P^2)x]=0.$

This proves the statement.
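A quick numerical check of Exercise 1 may help build intuition. The sketch below is only an illustration: the matrix `A`, the dimensions and the random seed are assumptions, not part of the original text. It builds a symmetric idempotent $P$ from a random matrix and compares both sides of the identity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: projector onto the column space of a random
# 5x2 matrix A (A has full column rank with probability one).
A = rng.standard_normal((5, 2))
P = A @ np.linalg.inv(A.T @ A) @ A.T
Q = np.eye(5) - P

x = rng.standard_normal(5)

lhs = np.dot(x, x)                                   # ||x||^2
rhs = np.dot(P @ x, P @ x) + np.dot(Q @ x, Q @ x)    # ||Px||^2 + ||Qx||^2
cross = np.dot(P @ x, Q @ x)                         # (Px).(Qx), the cross term

print(np.isclose(lhs, rhs), np.isclose(cross, 0.0))
```

The cross term vanishes only up to floating-point rounding, which is why the comparison uses `np.isclose` rather than exact equality.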

### Ordinary Least Squares (OLS) estimator derivation

Problem statement. A vector $y\in R^n$ (the dependent vector) and vectors $x^{(1)},...,x^{(k)}\in R^n$ (independent vectors or regressors) are given. The OLS estimator is defined as the vector $\beta \in R^k$ that minimizes the total sum of squares $TSS=\sum_{i=1}^n(y_i-x_i^{(1)}\beta_1-...-x_i^{(k)}\beta_k)^2,$ where $x_i^{(j)}$ denotes the $i$-th coordinate of $x^{(j)}.$

Denoting $X=(x^{(1)},...,x^{(k)})$ the $n\times k$ matrix with the regressors as columns, we see that $TSS=\Vert y-X\beta\Vert^2$ and that finding the OLS estimator means approximating $y$ with vectors from the image $\text{Img}X.$ The vectors $x^{(1)},...,x^{(k)}$ should be linearly independent; otherwise the solution will not be unique.

Assumption. $x^{(1)},...,x^{(k)}$ are linearly independent. This, in particular, implies that $k\leq n.$
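To see why the Assumption matters, here is a small sketch with made-up regressors in $R^4$ (the vectors `x1`, `x2` and the coefficient vectors are hypothetical). When a third column is the sum of the first two, the rank of $X$ drops below the number of columns and two different coefficient vectors produce the same fitted vector $X\beta$, so a minimizer cannot be unique.

```python
import numpy as np

# Hypothetical regressors in R^4.
x1 = np.array([1.0, 0.0, 2.0, 1.0])
x2 = np.array([0.0, 1.0, 1.0, 3.0])

X_indep = np.column_stack([x1, x2])           # linearly independent columns
X_dep = np.column_stack([x1, x2, x1 + x2])    # third column = x1 + x2

rank_indep = np.linalg.matrix_rank(X_indep)   # equals the number of columns
rank_dep = np.linalg.matrix_rank(X_dep)       # less than the number of columns

# Two different coefficient vectors that give the same X_dep @ beta.
beta_a = np.array([1.0, 1.0, 0.0])
beta_b = np.array([0.0, 0.0, 1.0])
same_fit = np.allclose(X_dep @ beta_a, X_dep @ beta_b)

print(rank_indep, rank_dep, same_fit)
```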

Exercise 2. Show that the OLS estimator is

(2) $\hat{\beta}=(X^TX)^{-1}X^Ty.$

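Proof. Under the Assumption, $X^TX$ is invertible, so we can define $P=X(X^TX)^{-1}X^T.$ This matrix is symmetric and idempotent, so Exercise 1 applies to it. Since $X\beta$ belongs to the image of $P,$ $P$ doesn't change it: $X\beta=PX\beta.$ Denoting also $Q=I-P$ we have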

$\Vert y-X\beta\Vert^2=\Vert y-Py+Py-X\beta\Vert^2$

$=\Vert Qy+P(y-X\beta)\Vert^2$ (by Exercise 1 applied to $y-X\beta,$ using $Q(y-X\beta)=Qy$)

$=\Vert Qy\Vert^2+\Vert P(y-X\beta)\Vert^2.$

This shows that $\Vert Qy\Vert^2$ is a lower bound for $\Vert y-X\beta\Vert^2.$ This lower bound is achieved when the second term is made zero. From

$P(y-X\beta)=Py-X\beta =X(X^TX)^{-1}X^Ty-X\beta=X[(X^TX)^{-1}X^Ty-\beta]$

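we see that the second term is zero if $\beta$ satisfies (2). Because the columns of $X$ are linearly independent, $X[(X^TX)^{-1}X^Ty-\beta]=0$ holds for no other $\beta,$ so the minimizer is unique.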
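The proof can be checked numerically. The following is a minimal sketch on simulated data (the sizes `n`, `k`, the seed and the arbitrary competitor `beta` are assumptions made for illustration). It computes $\hat{\beta}$ from formula (2) by solving the normal equations, compares it with NumPy's least-squares routine, and verifies the decomposition $\Vert y-X\beta\Vert^2=\Vert Qy\Vert^2+\Vert P(y-X\beta)\Vert^2$ used above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 3
X = rng.standard_normal((n, k))   # full column rank with probability one
y = rng.standard_normal(n)

# Formula (2): solve (X^T X) beta = X^T y instead of forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Projector onto Img X and its complement.
P = X @ np.linalg.solve(X.T @ X, X.T)
Q = np.eye(n) - P

beta = rng.standard_normal(k)                    # arbitrary competitor
total = np.sum((y - X @ beta) ** 2)              # ||y - X beta||^2
lower_bound = np.sum((Q @ y) ** 2)               # ||Qy||^2
second_term = np.sum((P @ (y - X @ beta)) ** 2)  # ||P(y - X beta)||^2
minimum = np.sum((y - X @ beta_hat) ** 2)        # value at the OLS estimator

print(np.allclose(beta_hat, beta_lstsq))
print(np.isclose(total, lower_bound + second_term))
print(np.isclose(minimum, lower_bound))
```

Solving the normal equations with `np.linalg.solve` is a design choice for this sketch; it avoids computing $(X^TX)^{-1}$ explicitly but is algebraically the same as (2).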

Usually the above derivation is applied to a dependent vector of the form $y=X\beta+e,$ where $e$ is a random vector with mean zero. However, the derivation holds without this assumption. See also simplified derivation of the OLS estimator.