26 Aug 16

## Derivation of OLS estimators: the do's and don'ts


Here I give the shortest rigorous derivation of the OLS estimators for the simple regression $y=a+bx+e$, indicating the pitfalls along the way.

If you need just an easy way to obtain the estimators and don't care about rigor, see this post.

### Definition and problem setup

Observations come in pairs $(x_1,y_1),...,(x_n,y_n)$. We want to approximate the y's by a linear function of the x's, and therefore we are interested in minimizing the residuals $r_i=y_i-a-bx_i$. It is impossible to choose the two variables $a,b$ so as to minimize all $n$ quantities at the same time. The compromise is achieved by minimizing the residual sum of squares $RSS=\sum r_i^2=\sum(y_i-a-bx_i)^2$. The OLS estimators are the values $a,b$ that minimize RSS. We find them by applying the first order conditions, which are necessary for optima of functions.
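The setup can be sketched numerically. This is a minimal illustration with made-up data (the function name `rss` and the sample values are mine, not from the post):

```python
def rss(a, b, xs, ys):
    """Residual sum of squares for the candidate line y = a + b*x."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical observations (x_i, y_i)
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

# RSS for one candidate line, y = 2x; OLS picks the (a, b) making this smallest
print(rss(0.0, 2.0, xs, ys))
```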

### Applying FOC's

Don't square out the residuals. This leads to large expressions, which most students fail to handle.

Do apply the chain rule: $[f(g(t))]'=f'(g(t))g'(t)$, where $f$ is called the external function and $g$ the internal function. In the case of the squared residual $r_i^2=(y_i-a-bx_i)^2$ we have $f(g)=g^2$ and $g(a,b)=y_i-a-bx_i$. Therefore $\frac{\partial r_i^2}{\partial a}=(-2)(y_i-a-bx_i)$, $\frac{\partial r_i^2}{\partial b}=(-2)(y_i-a-bx_i)x_i$. Summing over $i$, equating the results to zero and dropping the factor $-2$ gives the system of equations

(1) $\sum(y_i-a-bx_i)=0$, $\sum(y_i-a-bx_i)x_i=0$.
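The first order conditions (1) say that at the optimum the residuals sum to zero and are orthogonal to the x's. A quick numeric check, using my own hypothetical data and the values that formulas (4)-(5) below produce for it:

```python
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

# OLS solution for this data (as given by formulas (4) and (5) in the text)
a_hat, b_hat = 0.0, 1.9

# Fitted residuals r_i = y_i - a - b*x_i
res = [y - a_hat - b_hat * x for x, y in zip(xs, ys)]

# Both conditions in (1) hold up to floating-point error
print(sum(res), sum(r * x for r, x in zip(res, xs)))
```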

### Solving for $a,b$

Don't carry the summation signs. They are the trees that prevent many students from seeing the forest.

Do replace them with sample means as soon as possible, using $\sum x_i=n\bar{x}$. Equations (1) become

(2) $\overline{y-a-bx}=0$, $\overline{(y-a-bx)x}=0$.

Notice that $n$ has been dropped and the subscript $i$ disappears together with the summation signs. The general linearity property of expectations $E(ax+by)=aEx+bEy$ (where $a,b$ are numbers and $x,y$ are random variables) is true for sample means too: $\overline{(ax+by)}=a\bar{x}+b\bar{y}$. It is used to rewrite equations (2) as

(3) $\bar{y}-a-b\bar{x}=0$, $\overline{yx}-a\bar{x}-b\overline{x^2}=0$.

### Rearranging the result

Most students can solve the system of two equations (3) for the two unknowns, so I skip this step. The solutions are

(4) $\hat{b}=\frac{\overline{yx}-\bar{x}\bar{y}}{\overline{x^2}-{\bar{x}}^2}$,

(5) $\hat{a}=\bar{y}-\hat{b}\bar{x}$.
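Formulas (4) and (5) translate directly into sample-mean arithmetic. A minimal sketch with made-up data (the variable names are mine):

```python
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
n = len(xs)

# Sample means used in equations (3)-(5)
xbar = sum(xs) / n
ybar = sum(ys) / n
xybar = sum(x * y for x, y in zip(xs, ys)) / n   # mean of x*y
x2bar = sum(x * x for x in xs) / n               # mean of x^2

b_hat = (xybar - xbar * ybar) / (x2bar - xbar ** 2)  # equation (4)
a_hat = ybar - b_hat * xbar                          # equation (5)
print(b_hat, a_hat)
```

Note that (5) uses $\hat{b}$, so the slope must be computed first.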

Do put hats on the resulting $a,b$: they are the estimators we have been looking for. Don't put hats on during the derivation, because there $a,b$ are still variables.

Don't leave equation (4) in this form. Do use equations (5) from this post to rewrite equation (4) as

(6) $\hat{b}=\frac{\overline{(x-\bar{x})(y-\bar{y})}}{\overline{(x-\bar{x})^2}}=\frac{Cov(x,y)}{Var(x)}$.

Don't plug this expression in (5). In practice, (6) is calculated first and then the result is plugged in (5).
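The equivalence of (4) and (6) can be verified numerically: the covariance-over-variance form gives the same slope as the raw-means form. A sketch with the same hypothetical data as above:

```python
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# Equation (6): sample covariance over sample variance (divisor n)
cov_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
var_x = sum((x - xbar) ** 2 for x in xs) / n
b_hat = cov_xy / var_x

# As the text says: compute (6) first, then plug the result into (5)
a_hat = ybar - b_hat * xbar
print(b_hat, a_hat)
```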
