30 Nov 18

## Application: estimating sigma squared

Consider multiple regression

(1) $y=X\beta +e$

where

(a) the regressors are assumed deterministic, (b) the number of regressors $k$ is smaller than the number of observations $n,$ (c) the regressors are linearly independent, so that $\det (X^TX)\neq 0,$ and (d) the errors are homoscedastic and uncorrelated,

(2) $Var(e)=\sigma^2I.$

Students usually remember that $\beta$ should be estimated but pay little attention to the estimation of $\sigma^2.$ This is partly because $\sigma^2$ does not appear explicitly in the regression equation and partly because the result on estimating the error variance is more involved than the result on the OLS estimator of $\beta .$

Definition 1. Let $\hat{\beta}=(X^TX)^{-1}X^Ty$ be the OLS estimator of $\beta$. $\hat{y}=X\hat{\beta}$ is called the fitted value and $r=y-\hat{y}$ is called the residual.
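Definition 1 can be illustrated numerically. The sketch below simulates a small regression (the sample size, coefficients, and noise level are chosen arbitrarily for illustration) and computes $\hat{\beta},$ $\hat{y},$ and $r$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3                       # assumed sample size and number of regressors
X = rng.normal(size=(n, k))        # design matrix, treated as fixed per assumption (a)
beta = np.array([1.0, -2.0, 0.5])  # true coefficients, chosen for illustration
sigma = 1.5
e = rng.normal(scale=sigma, size=n)
y = X @ beta + e

# OLS estimator, fitted value, and residual from Definition 1
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X^T X)^{-1} X^T y
y_hat = X @ beta_hat                          # fitted value
r = y - y_hat                                 # residual
```

A quick sanity check: the residual is orthogonal to every column of $X$ ($X^Tr=0$), which is the first-order condition of the least-squares problem.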

Exercise 1. Using the projectors $P=X(X^TX)^{-1}X^T$ and $Q=I-P$ show that $\hat{y}=Py$ and $r=Qe.$

Proof. The first equation follows directly from the definitions: $\hat{y}=X\hat{\beta}=X(X^TX)^{-1}X^Ty=Py.$ From the model, $r=y-\hat{y}=X\beta +e-P(X\beta +e).$ Since $PX\beta=X\beta$ ($P$ leaves the column space of $X$ fixed), we obtain $r=e-Pe=Qe.$
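The projector properties invoked in this proof can be verified numerically. The sketch below (with an arbitrary simulated design) checks that $P$ is idempotent, that $PX=X,$ and that the identities $\hat{y}=Py$ and $r=Qe$ of Exercise 1 hold:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 40, 4                       # assumed dimensions
X = rng.normal(size=(n, k))
beta = np.ones(k)                  # arbitrary true coefficients
e = rng.normal(size=n)
y = X @ beta + e

P = X @ np.linalg.inv(X.T @ X) @ X.T
Q = np.eye(n) - P

# Projector properties used in the proof
assert np.allclose(P @ P, P)       # P is idempotent
assert np.allclose(P @ X, X)       # P fixes the column space of X, so PXb = Xb

# Exercise 1: y_hat = Py and r = Qe
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(X @ beta_hat, P @ y)
assert np.allclose(y - X @ beta_hat, Q @ e)
```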

Definition 2. The OLS estimator of $\sigma^2$ is defined by $s^2=\Vert r\Vert^2/(n-k).$

Exercise 2. Prove that $s^2$ is unbiased: $Es^2=\sigma^2.$

Proof. Using projector properties we have

$\Vert r\Vert^2=(Qe)^TQe=e^TQ^TQe=e^TQe.$

Expectations of the type $Ee^Te$ and $Eee^T$ would be easy to find from (2). However, we need $Ee^TQe,$ where the matrix $Q$ sits between the error vectors. The next calculation shows how this difficulty is overcome.

$E\Vert r\Vert^2=Ee^TQe$

$=Etr(e^TQe)$ ($e^TQe$ is a scalar, so it equals its own trace)

$=Etr(Qee^T)$ (the trace is invariant under cyclic permutations: $tr(AB)=tr(BA)$)

$=tr(QEee^T)$ (the regressors, and hence $Q,$ are deterministic, so by linearity $E$ passes inside the trace)

$=\sigma^2tr(Q)$ (applying (2): $Eee^T=\sigma^2I$).

$tr(P)=k$ because the trace of a projector equals the dimension of its image, and the image of $P$ is the column space of $X,$ which has dimension $k$ by assumptions (b) and (c). Since $P+Q=I,$ it follows that $tr(Q)=n-k.$ Thus, $E\Vert r\Vert^2=\sigma^2(n-k)$ and $Es^2=\sigma^2.$
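The whole argument can be checked by simulation. The sketch below (arbitrary design, $\sigma^2=4$) verifies $tr(P)=k$ and $tr(Q)=n-k,$ then averages $s^2$ over many error draws; by the result just proved, the average should be close to $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, sigma2 = 30, 5, 4.0          # assumed dimensions and error variance
X = rng.normal(size=(n, k))        # fixed design, reused across replications
P = X @ np.linalg.inv(X.T @ X) @ X.T
Q = np.eye(n) - P

# tr(P) = k and tr(Q) = n - k
assert np.isclose(np.trace(P), k)
assert np.isclose(np.trace(Q), n - k)

# Monte Carlo average of s^2 over many error draws
reps = 20_000
s2 = np.empty(reps)
for i in range(reps):
    e = rng.normal(scale=np.sqrt(sigma2), size=n)
    r = Q @ e                      # residual r = Qe (taking beta = 0 w.l.o.g.)
    s2[i] = r @ r / (n - k)

print(s2.mean())                   # should be close to sigma2 = 4.0
```

Setting $\beta=0$ loses no generality here because, as shown above, $r=Qe$ does not depend on $\beta.$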