18 Feb 16

## Simple regression - before and after estimation

Simple regression offers a useful comparison of what we have before and after estimation.

### Before anything

Initially, we have only observations

(1) $(x_1,y_1),...,(x_n,y_n)$.

### Before estimation

Then we assume a dependence between the $y$'s and the $x$'s of the form

(2)  $y_i=\beta_0+\beta_1x_i+u_i,\ i=1,...,n.$

Here $\beta_0,\beta_1$ are unknown parameters to be estimated and $u_i$ are random errors which satisfy the basic assumption

(3) $Eu_i=0.$

It is convenient to call $\beta_0+\beta_1x_i$ the linear part of model (2).
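As a numerical illustration, model (2) is easy to simulate. The sketch below is mine, not part of the post: the sample size, parameter values and error distribution are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50                      # sample size (illustrative)
beta0, beta1 = 1.0, 2.0     # "true" parameters; in practice these are unknown

x = rng.uniform(0, 10, n)   # regressor values
u = rng.normal(0, 1, n)     # errors satisfying (3): E u_i = 0
y = beta0 + beta1 * x + u   # model (2): linear part plus error
```

Note that only the pairs $(x_i,y_i)$ would be observed; the parameters and the errors exist only in the simulation.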

### After estimation

The OLS estimators of $\beta_1,\beta_0$ are, respectively,

$\hat{\beta}_1=\frac{Cov(x,y)}{Var(x)},\ \hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}.$
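These two formulas translate directly into code. The following is a minimal sketch under made-up data; the variable names are mine, and the result is cross-checked against NumPy's own least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)   # illustrative data

# OLS slope: sample covariance of (x, y) over sample variance of x
beta1_hat = np.cov(x, y, bias=True)[0, 1] / np.var(x)
# OLS intercept: y-bar minus slope times x-bar
beta0_hat = y.mean() - beta1_hat * x.mean()

# cross-check against NumPy's degree-1 least-squares fit
b1_check, b0_check = np.polyfit(x, y, 1)
```

Here `bias=True` and the default of `np.var` both divide by $n$, so the two sample moments are on the same footing and the ratio is the OLS slope.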

Using these estimators, we define the fitted value $\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1x_i$ which mimics the linear part. To mimic the errors, we define residuals $e_i=y_i-\hat{y}_i$. These definitions give a sample analog of (2):

(2') $y_i=\hat{\beta}_0+\hat{\beta}_1x_i+e_i.$

The residuals also possess the property

(3') $\bar{e}=0$

which is a sample analog of (3).
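Both facts can be checked numerically. In the sketch below (simulated data; all numbers are illustrative), (2') holds exactly because the residuals are defined as $y_i-\hat{y}_i$, and $\bar{e}=0$ holds up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.uniform(0, 5, n)
y = 0.5 + 1.5 * x + rng.normal(0, 1, n)

beta1_hat = np.cov(x, y, bias=True)[0, 1] / np.var(x)
beta0_hat = y.mean() - beta1_hat * x.mean()

y_fit = beta0_hat + beta1_hat * x    # fitted values, mimicking the linear part
e = y - y_fit                        # residuals, mimicking the errors

# (2') holds by construction: y_i = fitted value + residual, exactly
assert np.allclose(y, y_fit + e)
# (3') the residuals average to zero (up to rounding)
assert abs(e.mean()) < 1e-10
```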

### Comparison

| Before estimation | After estimation |
|---|---|
| $\beta_0,\beta_1$ are unknown, the errors are unobservable | The estimators, fitted values and residuals are known functions of the observed values (1) |
| (2) is just a product of our imagination | Its analog (2') holds by construction |
| Whether (3) is true or not we don't know | Its analog (3') is always true |

Tricky question. Put $e=(e_1,...,e_n)$. Ask your students to show that if $x_i$ are deterministic, then under condition (3) along with (3') one has

(3'') $Ee=0.$

This will reveal if they know the difference between sample means and population means.
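Property (3'') is about the population mean, so a single sample cannot exhibit it; it can, however, be checked by Monte Carlo. The sketch below (my own, with invented parameters) keeps $x$ fixed across draws, as the deterministic-regressor assumption requires, and averages each residual over many samples.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 30, 5000
x = np.linspace(0, 10, n)        # deterministic regressor, fixed across draws

e_sum = np.zeros(n)
for _ in range(reps):
    u = rng.normal(0, 1, n)      # errors with E u_i = 0, as in (3)
    y = 1.0 + 2.0 * x + u
    b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    b0 = y.mean() - b1 * x.mean()
    e_sum += y - (b0 + b1 * x)   # accumulate residuals componentwise

e_mean = e_sum / reps            # Monte Carlo estimate of E e
# each component of e_mean should be close to zero
```

In contrast to (3'), which holds in every sample, here each individual $Ee_i$ is zero, and the Monte Carlo averages approach zero only as the number of replications grows.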
