Proving unbiasedness of OLS estimators - the do's and don'ts

### Groundwork

Here we derived the OLS estimators. To distinguish between sample and population means, the variance and covariance in the slope estimator will carry the subscript $u$ (for "uniform", see the rationale here).

(1) $\hat{b}=\frac{Cov_u(X,Y)}{Var_u(X)}$,

(2) $\hat{a}=\bar{Y}-\hat{b}\bar{X}$.

These equations are used in conjunction with the model

(3) $Y_i=a+bX_i+e_i$,

where we remember that

(4) $Ee_i=0$ for all $i$.

Since (2) depends on (1), we have to start with unbiasedness of the slope estimator.
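If you want to compute along, here is a minimal Python sketch of the estimators (1) and (2). It assumes the data are plain lists of numbers. The function names `cov_u`, `var_u` and `ols` are hypothetical helpers and do not come from any library. The subscript-$u$ moments use the weight $1/n$, which is the same as computing them under the uniform distribution on the sample.

```python
def cov_u(x, y):
    """Covariance with 1/n weighting, i.e. under the uniform distribution on the sample."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n


def var_u(x):
    """Variance with 1/n weighting: Var_u(X) = Cov_u(X, X)."""
    return cov_u(x, x)


def ols(x, y):
    """Return (a_hat, b_hat) computed from equations (1) and (2)."""
    v = var_u(x)
    if v == 0:
        # A constant regressor gives Var_u(X) = 0, and (1) would divide by zero
        raise ValueError("Var_u(X) = 0, so the OLS slope is undefined")
    b_hat = cov_u(x, y) / v                            # slope, equation (1)
    a_hat = sum(y) / len(y) - b_hat * sum(x) / len(x)  # intercept, equation (2)
    return a_hat, b_hat


# Noise-free data on the line Y = 1 + 2X: the estimators recover the line
print(ols([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))  # → (1.0, 2.0)
```

The guard against $Var_u(X)=0$ reflects the existence condition: without variation in the regressor, the slope cannot be estimated.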

### Using the right representation is critical

We have to show that $E\hat{b}=b$.

**Step 1. Don't** apply the expectation directly to (1). **Do** separate in (1) what is supposed to be $E\hat{b}$, namely $b$. To reveal the role of errors in (1), plug (3) in (1) and use linearity of covariance with respect to each argument when the other argument is fixed:

$$Cov_u(X,Y)=Cov_u(X,a+bX+e)=Cov_u(X,a)+bCov_u(X,X)+Cov_u(X,e).$$

Here $Cov_u(X,a)=0$ (a constant is uncorrelated with any variable) and $Cov_u(X,X)=Var_u(X)$ (the covariance of $X$ with itself is its variance), so

(5) $\hat{b}=\frac{bVar_u(X)+Cov_u(X,e)}{Var_u(X)}=b+\frac{Cov_u(X,e)}{Var_u(X)}$.

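Equation (5) is the mean-plus-deviation-from-the-mean decomposition: once we show that the second term has zero expectation, $b$ will be the mean of $\hat{b}$. Many students think that $Cov_u(X,e)=0$ because of (4). **No!** $Cov_u(X,e)$ is built from the sample means $\bar{X}$ and $\bar{e}$. It does not involve the population mean $Ee_i$, and in general it is a nonzero random variable.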
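A quick numerical check of (5) can also illustrate why $Cov_u(X,e)$ need not vanish. The parameter values $a=1$, $b=2$, the regressor values and the normal errors below are illustrative assumptions and are not part of the text. The sketch draws one sample from model (3), computes $\hat{b}$ by (1), and compares it with the right-hand side of (5).

```python
import random


def cov_u(x, y):
    """Covariance with 1/n weighting (the subscript-u covariance)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n


# Illustrative true parameters and a fixed regressor (assumed values)
a, b = 1.0, 2.0
x = [1.0, 2.0, 4.0, 7.0, 11.0]

rng = random.Random(0)
e = [rng.gauss(0.0, 1.0) for _ in x]            # errors drawn with Ee_i = 0
y = [a + b * xi + ei for xi, ei in zip(x, e)]   # model (3)

b_hat = cov_u(x, y) / cov_u(x, x)               # estimator (1)
rhs_5 = b + cov_u(x, e) / cov_u(x, x)           # right-hand side of (5)

print(b_hat - rhs_5)   # zero up to floating-point rounding
print(cov_u(x, e))     # a sample quantity: generally nonzero even though Ee_i = 0
```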

**Step 2**. It pays to take one more step to develop (5). Write out the numerator in (5) using summation:

$$Cov_u(X,e)=\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})(e_i-\bar{e}).$$

**Don't** write out $\bar{e}$ as a sum! The presence of two summations confuses many students.

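Multiplying out the parentheses and using the fact that $\sum_{i=1}^n(X_i-\bar{X})=0$, we have

$$Cov_u(X,e)=\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})e_i-\bar{e}\,\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})=\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})e_i.$$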

To simplify calculations, denote $a_i=\frac{X_i-\bar{X}}{nVar_u(X)}$. Then the slope estimator becomes

(6) $\hat{b}=b+\sum_{i=1}^na_ie_i$.

This is the *critical representation*.
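The two identities behind (6) can be checked numerically. The sketch below uses assumed values ($a=1$, $b=2$, a fixed regressor and normal errors), and `weights` is a hypothetical helper that returns the coefficients $a_i$. It checks two things. First, it confirms that the $\bar{e}$ term from Step 2 drops out. Second, it confirms that $\hat{b}$ computed from (1) equals $b+\sum a_ie_i$.

```python
import random


def cov_u(x, y):
    """Covariance with 1/n weighting (the subscript-u covariance)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n


def weights(x):
    """Hypothetical helper: the coefficients a_i = (X_i - Xbar) / (n Var_u(X)) from (6)."""
    n = len(x)
    x_bar = sum(x) / n
    denom = sum((xi - x_bar) ** 2 for xi in x)   # equals n * Var_u(X)
    return [(xi - x_bar) / denom for xi in x]


a, b = 1.0, 2.0                      # assumed true parameters
x = [1.0, 2.0, 4.0, 7.0, 11.0]       # assumed fixed regressor
rng = random.Random(1)
e = [rng.gauss(0.0, 1.0) for _ in x]
y = [a + b * xi + ei for xi, ei in zip(x, e)]   # model (3)

n = len(x)
x_bar = sum(x) / n
# Step 2: the e-bar term drops out because the deviations X_i - Xbar sum to zero
numerator_short = sum((xi - x_bar) * ei for xi, ei in zip(x, e)) / n

w = weights(x)
b_hat = cov_u(x, y) / cov_u(x, x)                     # estimator (1)
b_hat_6 = b + sum(wi * ei for wi, ei in zip(w, e))    # critical representation (6)

print(cov_u(x, e) - numerator_short)   # zero up to rounding
print(b_hat - b_hat_6)                 # zero up to rounding
```

Note that the $a_i$ depend only on the regressor and never on the errors. This is what makes the next step work.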

### Unbiasedness of the slope estimator

**Convenience condition**. *The regressor is deterministic*. I call it a convenience condition because it's just a matter of mathematical expedience, and later on we'll study ways to bypass it.

From (6), by linearity of means and remembering that the deterministic coefficients $a_i$ behave like constants,

(7) $E\hat{b}=b+\sum_{i=1}^na_iEe_i=b$

by (4). This proves unbiasedness.
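A Monte Carlo sketch makes (7) tangible. It keeps the regressor fixed across replications, which is the convenience condition. In each replication it redraws the errors with zero mean, and at the end it averages the slope estimates. The parameter values, the normal errors and the number of replications are assumptions. The average only approximates $E\hat{b}$, and the simulation error shrinks roughly like $1/\sqrt{\text{reps}}$.

```python
import random


def slope(x, y):
    """OLS slope (1): Cov_u(X, Y) / Var_u(X)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    var = sum((xi - x_bar) ** 2 for xi in x) / n
    return cov / var


a, b = 1.0, 2.0                    # assumed true parameters
x = [1.0, 2.0, 4.0, 7.0, 11.0]     # deterministic regressor: the same in every replication
rng = random.Random(42)
reps = 20_000

total = 0.0
for _ in range(reps):
    e = [rng.gauss(0.0, 1.0) for _ in x]          # fresh errors with Ee_i = 0
    y = [a + b * xi + ei for xi, ei in zip(x, e)] # model (3)
    total += slope(x, y)

mean_b_hat = total / reps
print(mean_b_hat)   # close to b = 2.0
```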

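Note that (7) features both kinds of means: the sample mean $\bar{X}$ hidden in $a_i$ and the population mean $Ee_i$. You don't really know the difference between population and sample means until you see them working in the same formula.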

### Unbiasedness of the intercept estimator

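As above, we plug (3) in (2): $\hat{a}=\bar{Y}-\hat{b}\bar{X}=a+b\bar{X}+\bar{e}-\hat{b}\bar{X}=a+(b-\hat{b})\bar{X}+\bar{e}$. Applying expectation, with $\bar{X}$ deterministic, $E\hat{b}=b$ by (7) and $E\bar{e}=\frac{1}{n}\sum_{i=1}^nEe_i=0$ by (4):

$$E\hat{a}=a+(b-E\hat{b})\bar{X}+E\bar{e}=a.$$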
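The same kind of simulation can be used for the intercept. This is a sketch under assumed values: fixed regressor, standard normal errors, $a=1$, $b=2$. The helper `ols` is hypothetical. It computes (1) with the common factor $1/n$ cancelled from the numerator and the denominator, and then computes (2). The average of $\hat{a}$ over replications approximates $E\hat{a}$.

```python
import random


def ols(x, y):
    """Return (a_hat, b_hat) from (1) and (2); the 1/n factors in Cov_u and Var_u cancel."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat


a, b = 1.0, 2.0                    # assumed true parameters
x = [1.0, 2.0, 4.0, 7.0, 11.0]     # deterministic regressor
rng = random.Random(7)
reps = 20_000

total_a = 0.0
for _ in range(reps):
    e = [rng.gauss(0.0, 1.0) for _ in x]          # errors with Ee_i = 0
    y = [a + b * xi + ei for xi, ei in zip(x, e)] # model (3)
    a_hat, _ = ols(x, y)
    total_a += a_hat

mean_a_hat = total_a / reps
print(mean_a_hat)   # close to a = 1.0
```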

### Conclusion

Since (1) involves division by $Var_u(X)$, the condition $Var_u(X)\neq0$ is the **main condition for existence** of OLS estimators. From the above proof we see that (4) is the **main condition for unbiasedness**.
