18 Feb 16

## Simple regression: before and after estimation

Simple regression offers a useful comparison of what we have before and after estimation.

### Before anything

Initially, we have only observations

(1) $(x_1,y_1),...,(x_n,y_n)$.

### Before estimation

Then we assume a dependence between the $y$'s and the $x$'s of the form

(2) $y_i=\beta_0+\beta_1x_i+u_i,\ i=1,...,n.$

Here $\beta_0,\beta_1$ are unknown parameters to be estimated and $u_i$ are random errors that satisfy the basic assumption

(3) $Eu_i=0.$

It is convenient to call $\beta_0+\beta_1x_i$ the linear part of model (2).

### After estimation

The OLS estimators of $\beta_1,\beta_0$ are, respectively, $\hat{\beta}_1=\frac{Cov(x,y)}{Var(x)},\ \hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}.$
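These formulas are easy to verify numerically. The sketch below (with made-up sample data, assuming NumPy is available) computes $\hat{\beta}_1$ as the sample covariance of $x$ and $y$ over the sample variance of $x$, and cross-checks the result against NumPy's least-squares fit:

```python
import numpy as np

# Hypothetical sample data (x_1, y_1), ..., (x_n, y_n); any equal-length arrays work.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Slope: sample covariance of x and y divided by sample variance of x.
beta1_hat = np.cov(x, y, bias=True)[0, 1] / np.var(x)
# Intercept: ybar minus the slope times xbar.
beta0_hat = y.mean() - beta1_hat * x.mean()

# Cross-check against numpy's built-in degree-1 least-squares fit.
check_beta1, check_beta0 = np.polyfit(x, y, deg=1)
```

Note that it does not matter whether covariance and variance use the $1/n$ or $1/(n-1)$ normalization, as long as both use the same one: the factor cancels in the ratio.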

Using these estimators, we define the fitted value $\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1x_i$ which mimics the linear part. To mimic the errors, we define residuals $e_i=y_i-\hat{y}_i$. These definitions give a sample analog of (2):

(2') $y_i=\hat{\beta}_0+\hat{\beta}_1x_i+e_i.$

The residuals also possess the property

(3') $\bar{e}=0$

which is a sample analog of (3).
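Continuing the numerical sketch above (same made-up data, NumPy assumed), the fitted values and residuals can be built directly from the estimators, and the sample mean of the residuals comes out as zero up to floating-point error, illustrating (3'):

```python
import numpy as np

# Same hypothetical data as before.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

beta1_hat = np.cov(x, y, bias=True)[0, 1] / np.var(x)
beta0_hat = y.mean() - beta1_hat * x.mean()

y_hat = beta0_hat + beta1_hat * x   # fitted values: sample analog of the linear part
e = y - y_hat                       # residuals: sample analog of the errors

# (3'): the residuals average to zero by construction, not by luck.
print(e.mean())
```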

### Comparison

| Before estimation | After estimation |
| --- | --- |
| $\beta_0,\beta_1$ are unknown and the errors are unobservable | The estimators, fitted values and residuals are known functions of the observed values (1) |
| (2) is just a product of our imagination | Its analog (2') holds by construction |
| Whether (3) is true or not we don't know | Its analog (3') is always true |

Tricky question. Put $e=(e_1,...,e_n)$. Ask your students to show that if the $x_i$ are deterministic, then under condition (3), along with (3'), one has

(3'') $Ee=0.$

This will reveal whether they know the difference between sample means and population means.
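For reference, one possible sketch of the answer (there are others) starts by combining (2) and (2'):

```latex
e_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i
    = u_i + (\beta_0 - \hat{\beta}_0) + (\beta_1 - \hat{\beta}_1)x_i .
```

With deterministic $x_i$, condition (3) implies $E\hat{\beta}_1=\beta_1$ and $E\hat{\beta}_0=\beta_0$ (unbiasedness of OLS), so taking expectations term by term gives $Ee_i=0$ for every $i$, that is, (3''). The contrast is the point: (3') is an identity about the sample mean that holds in every sample by construction, while (3'') is a statement about population means that requires the assumption (3).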