26 Aug 16

Derivation of OLS estimators: the do's and don'ts

Here I give the shortest rigorous derivation of the OLS estimators for the simple regression y=a+bx+e, pointing out the pitfalls along the way.

If you just need an easy way to obtain the estimators and don't care about rigor, see this post.

Definition and problem setup

Observations come in pairs (x_1,y_1),...,(x_n,y_n). We want to approximate the y's by a linear function of the x's and are therefore interested in making the residuals r_i=y_i-a-bx_i small. It is impossible to choose two parameters a,b so as to minimize n quantities at the same time. The compromise is to minimize the residual sum of squares RSS=\sum r_i^2=\sum(y_i-a-bx_i)^2. The OLS estimators are the values of a,b that minimize RSS. We find them by applying the first order conditions, which are necessary for an optimum of a function.
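To make RSS concrete, here is a minimal numeric sketch; numpy and the data values are my own illustrative assumptions, not part of the post:

```python
import numpy as np

# Illustrative data (made up for this sketch)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def rss(a, b, x, y):
    """Residual sum of squares for the line y = a + b*x."""
    residuals = y - a - b * x
    return np.sum(residuals ** 2)

print(rss(0.0, 2.0, x, y))   # RSS for one candidate pair (a, b)
print(rss(0.1, 1.95, x, y))  # a nearby candidate; OLS picks the minimizer
```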

Applying FOC's

Don't square out the residuals. This leads to large expressions, which most students fail to handle.

Do apply the chain rule: [f(g(t))]'=f'(g(t))g'(t), where f is called the external function and g the internal function. In the case of the squared residual r_i^2=(y_i-a-bx_i)^2 we have f(g)=g^2 and g(a,b)=y_i-a-bx_i. Therefore \frac{\partial r_i^2}{\partial a}=(-2)(y_i-a-bx_i), \frac{\partial r_i^2}{\partial b}=(-2)(y_i-a-bx_i)x_i. Summing over i, equating the results to zero and dropping the factor -2 gives the system of equations

(1) \sum(y_i-a-bx_i)=0, \sum(y_i-a-bx_i)x_i=0.
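As a quick sanity check on the chain-rule step, a short symbolic sketch (sympy is my addition, not something the post uses) confirms the two partial derivatives:

```python
import sympy as sp

a, b, x_i, y_i = sp.symbols('a b x_i y_i')
r_sq = (y_i - a - b * x_i) ** 2   # squared residual r_i^2

# The chain rule gives -2*(y_i - a - b*x_i) and -2*(y_i - a - b*x_i)*x_i;
# both differences below simplify to zero.
print(sp.simplify(sp.diff(r_sq, a) + 2 * (y_i - a - b * x_i)))
print(sp.simplify(sp.diff(r_sq, b) + 2 * (y_i - a - b * x_i) * x_i))
```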

Solving for a,b

Don't carry the summation signs. They are the trees that prevent many students from seeing the forest.

Do replace them with sample means ASAP, using \sum x_i=n\bar{x}. Dividing equations (1) by n gives

(2) \overline{y-a-bx}=0, \overline{(y-a-bx)x}=0.

Notice that n has been dropped and the subscript i disappears together with the summation signs. The general linearity property of expectations E(ax+by)=aEx+bEy (where a,b are numbers and x,y are random variables) is true for sample means too: \overline{(ax+by)}=a\bar{x}+b\bar{y}. It is used to rewrite equations (2) as

(3) \bar{y}-a-b\bar{x}=0, \overline{yx}-a\bar{x}-b\overline{x^2}=0.
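The linearity of sample means used here is easy to verify numerically; a minimal sketch with numpy and arbitrary made-up numbers (both my assumptions):

```python
import numpy as np

# Illustrative data; a, b are arbitrary numbers
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([5.0, 4.0, 6.0, 7.0])
a, b = 2.5, -0.7

# Linearity of the sample mean: mean(a*x + b*y) = a*mean(x) + b*mean(y)
print(np.mean(a * x + b * y))
print(a * np.mean(x) + b * np.mean(y))
```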

Rearranging the result

Most students can solve the system of two equations (3) in two unknowns, so I skip this step. The solutions are

(4) \hat{b}=\frac{\overline{yx}-\bar{x}\bar{y}}{\overline{x^2}-{\bar{x}}^2},

(5) \hat{a}=\bar{y}-\hat{b}\bar{x}.

Do put the hats on the resulting a,b: they are the estimators we have been looking for. Don't put the hats during the derivation, because there a,b are variables.
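A small numeric sketch (numpy and invented data, my assumptions) computes \hat{b} and \hat{a} from formulas (4)-(5) and cross-checks them against numpy's built-in least-squares polynomial fit:

```python
import numpy as np

# Illustrative data (made up for this sketch)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Formulas (4) and (5) written with sample means
b_hat = (np.mean(y * x) - np.mean(x) * np.mean(y)) / (np.mean(x ** 2) - np.mean(x) ** 2)
a_hat = np.mean(y) - b_hat * np.mean(x)

# Cross-check against numpy's least-squares fit of a degree-1 polynomial
slope, intercept = np.polyfit(x, y, 1)
print(b_hat, slope)      # should agree
print(a_hat, intercept)  # should agree
```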

Don't leave equation (4) in this form. Do use equations (5) from this post to rewrite equation (4) as

(6) \hat{b}=\frac{\overline{(x-\bar{x})(y-\bar{y})}}{\overline{(x-\bar{x})^2}}=\frac{Cov(x,y)}{Var(x)}.

Don't plug this expression in (5). In practice, (6) is calculated first and then the result is plugged in (5).
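To illustrate that forms (4) and (6) give the same slope, a short sketch (again numpy and invented data, my assumptions) computes it both ways, using the n-divisor versions of variance and covariance so that the common factor 1/n cancels:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Form (4): difference of mean products
b_hat_4 = (np.mean(y * x) - np.mean(x) * np.mean(y)) / (np.mean(x ** 2) - np.mean(x) ** 2)

# Form (6): sample covariance over sample variance (n-divisor versions)
b_hat_6 = np.mean((x - np.mean(x)) * (y - np.mean(y))) / np.mean((x - np.mean(x)) ** 2)

print(b_hat_4, b_hat_6)  # the two forms coincide
```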
