Feb 16

Simple regression - before and after estimation

Simple regression allows a useful comparison of what we have before and after estimation.

Before anything

Initially, we have only observations

(1) (x_1,y_1),...,(x_n,y_n).

Before estimation

Then we assume a dependence between the y's and the x's of the form

(2)  y_i=\beta_0+\beta_1x_i+u_i,\ i=1,...,n.

Here \beta_0,\beta_1 are unknown parameters to be estimated and u_i are random errors which satisfy the basic assumption

(3) Eu_i=0.

It is convenient to call \beta_0+\beta_1x_i the linear part of model (2).

After estimation

The OLS estimators of \beta_1 and \beta_0 are, respectively,

\hat{\beta}_1=\frac{Cov(x,y)}{Var(x)},\ \hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x},

where Cov(x,y) and Var(x) denote the sample covariance and the sample variance.

Using these estimators, we define the fitted value \hat{y}_i=\hat{\beta}_0+\hat{\beta}_1x_i which mimics the linear part. To mimic the errors, we define residuals e_i=y_i-\hat{y}_i. These definitions give a sample analog of (2):

(2') y_i=\hat{\beta}_0+\hat{\beta}_1x_i+e_i.

The residuals also possess the property

(3') \bar{e}=0

which is a sample analog of (3).
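The two sample analogs can be checked numerically. Below is a minimal sketch with NumPy; the data are simulated from model (2) purely for illustration, and all variable names are hypothetical:

```python
import numpy as np

# Illustrative data: simulated from model (2) with beta_0 = 2, beta_1 = 0.5.
rng = np.random.default_rng(0)
x = np.arange(1.0, 11.0)                      # x_1, ..., x_n (n = 10)
y = 2.0 + 0.5 * x + rng.normal(size=x.size)   # y_i = beta_0 + beta_1 x_i + u_i

# OLS estimators: slope = sample Cov(x, y) / Var(x), intercept from the means.
beta1_hat = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
beta0_hat = y.mean() - beta1_hat * x.mean()

# Fitted values mimic the linear part; residuals mimic the errors.
y_fit = beta0_hat + beta1_hat * x
e = y - y_fit

# (2') holds by construction, and (3') is always true.
assert np.allclose(y, y_fit + e)              # sample analog of (2)
assert np.isclose(e.mean(), 0.0)              # sample analog of (3)
```

Note that the first assertion is trivially true by the definition of e, while the second follows from the normal equations of OLS, not from any property of the data.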


Before estimation | After estimation
--- | ---
\beta_0,\beta_1 are unknown, the errors are unobservable | The estimators, fitted values and residuals are known functions of the observed values (1)
(2) is just a product of our imagination | Its analog (2') holds by construction
Whether (3) is true or not, we don't know | Its analog (3') is always true

Tricky question. Put e=(e_1,...,e_n). Ask your students to show that if the x_i are deterministic, then under condition (3), along with (3'), one has

(3'') Ee=0.

This will reveal whether they know the difference between sample means and population means.
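The distinction can also be shown by simulation: (3') makes the sample mean of the residuals vanish in every single sample, while (3'') is a statement about the population mean of each residual, visible only by averaging over many redrawn error vectors. A Monte Carlo sketch (assuming normal errors purely for illustration; x is deterministic and fixed across replications):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)        # deterministic regressor, fixed throughout
beta0, beta1 = 1.0, -2.0             # true parameters of model (2)

reps = 20000
e_sum = np.zeros_like(x)
for _ in range(reps):
    u = rng.normal(size=x.size)      # errors satisfy (3): Eu_i = 0
    y = beta0 + beta1 * x + u        # model (2)
    b1 = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
    b0 = y.mean() - b1 * x.mean()
    e_sum += y - (b0 + b1 * x)       # residuals from this sample

e_bar = e_sum / reps                 # estimates Ee component by component

# Each component of e_bar is close to 0, illustrating (3'').
# By contrast, (3') only forces the *sample* mean of e to be exactly 0
# within each replication, which is a different statement.
print(np.abs(e_bar).max())           # shrinks toward 0 as reps grows
```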

