5
Apr 17

Maximum likelihood: application to linear model





We have to remember that a model and a method are not the same. Application of the least squares method to the linear model gives OLS estimators. Here we apply the Maximum Likelihood (ML) method to the same model.

Assumptions and first order conditions for maximizing likelihood

We assume that the observations satisfy

(1) y_i=\beta _1+\beta _2x_i+u_i,\ i=1,...,n.
Our task is to find ML estimators of \beta _1,\beta _2,\sigma^2. To apply the ML method, we assume that the regressor x is deterministic. Then the error u_i is the only random term on the right side of (1).

Step 1. Suppose that u_1,...,u_n are independent normal with mean 0 and variance \sigma ^2. (This implies that the errors are uncorrelated and identically distributed.) The density of u_i is

(2) p(t)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp(-\frac{t^2}{2\sigma^2})
(the argument is denoted t to avoid confusion with the regressor x).
From (1) we see that y_i is normal, as a linear transformation of u_i, with mean \beta_1+\beta_2x_i and variance \sigma^2. By equation (2) in that post, the density of y_i (with x_i fixed) is
f(x_i,y_i|\beta_1,\beta_2,\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(y_i-\beta_1-\beta_2x_i)^2}{2\sigma^2}}.

Step 2. The likelihood function, by definition, is the joint density, considered as a function of the parameters. Because the observations are independent, it is the product of the individual densities:
L(\beta_1,\beta_2,\sigma^2|y,x)=\prod_{i=1}^n\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(y_i-\beta_1-\beta_2x_i)^2}{2\sigma^2}}

=(2\pi\sigma^2)^{-n/2}e^{-\sum_{i=1}^n\frac{(y_i-\beta_1-\beta_2x_i)^2}{2\sigma^2}}=(2\pi\sigma^2)^{-n/2}e^{-\frac{RSS}{2\sigma^2}}

(see this post for the definition of RSS; here RSS=\sum_{i=1}^n(y_i-\beta_1-\beta_2x_i)^2 is viewed as a function of \beta_1,\beta_2).

Step 3. The log-likelihood is
\lambda(\beta_1,\beta_2,\sigma^2|y,x)=-\frac{n}{2}\log(2\pi)-\frac{n}{2}\log(\sigma^2)-\frac{RSS}{2\sigma^2}.
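As a sanity check on Steps 2 and 3, the sketch below evaluates the log-likelihood in two ways: by the closed formula with RSS, and as the sum of the logs of the individual normal densities. The function names and the toy data are illustrative assumptions, not part of the original post.

```python
import numpy as np

def log_likelihood(beta1, beta2, sigma2, x, y):
    """Closed-form Gaussian log-likelihood of y_i = beta1 + beta2*x_i + u_i."""
    n = len(y)
    rss = np.sum((y - beta1 - beta2 * x) ** 2)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * n * np.log(sigma2) - rss / (2 * sigma2)

def log_likelihood_from_densities(beta1, beta2, sigma2, x, y):
    """Same quantity, computed as the sum of logs of the individual densities."""
    resid = y - beta1 - beta2 * x
    dens = np.exp(-resid ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return np.sum(np.log(dens))

# Hypothetical toy data, chosen only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

closed_form = log_likelihood(0.0, 2.0, 0.5, x, y)
from_product = log_likelihood_from_densities(0.0, 2.0, 0.5, x, y)
print(closed_form, from_product)  # the two values should agree up to rounding
```

Working with the log turns the product into a sum, which is why the closed formula depends on the data only through RSS.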

The first-order conditions are
\frac{\partial\lambda}{\partial\beta_1}=\frac{\partial\lambda}{\partial\beta_2}=\frac{\partial\lambda}{\partial\sigma^2}=0
(technically, it is easier to differentiate with respect to \sigma^2 than to \sigma). We obtain a system of three equations for determining the parameters:
\frac{\partial\lambda}{\partial\beta_1}=-\frac{1}{2\sigma^2}\frac{\partial RSS}{\partial\beta_1}=0, \frac{\partial\lambda}{\partial\beta_2}=-\frac{1}{2\sigma^2}\frac{\partial RSS}{\partial\beta_2}=0,

\frac{\partial\lambda}{\partial\sigma^2}=-\frac{n}{2\sigma^2}+\frac{RSS}{2\sigma^4}=0.
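The partial derivatives of the log-likelihood can be checked numerically. The sketch below compares the analytic expressions, written out with \partial RSS/\partial\beta_1=-2\sum(y_i-\beta_1-\beta_2x_i) and \partial RSS/\partial\beta_2=-2\sum(y_i-\beta_1-\beta_2x_i)x_i, against central finite differences. The function names, the evaluation point and the data are assumptions made for illustration.

```python
import numpy as np

def log_lik(theta, x, y):
    """Gaussian log-likelihood; theta = (beta1, beta2, sigma2)."""
    b1, b2, s2 = theta
    n = len(y)
    rss = np.sum((y - b1 - b2 * x) ** 2)
    return -0.5 * n * np.log(2 * np.pi * s2) - rss / (2 * s2)

def analytic_grad(theta, x, y):
    """Partial derivatives of the log-likelihood in beta1, beta2, sigma2."""
    b1, b2, s2 = theta
    n = len(y)
    resid = y - b1 - b2 * x
    rss = np.sum(resid ** 2)
    drss_db1 = -2.0 * np.sum(resid)
    drss_db2 = -2.0 * np.sum(resid * x)
    return np.array([
        -drss_db1 / (2 * s2),                  # d lambda / d beta1
        -drss_db2 / (2 * s2),                  # d lambda / d beta2
        -n / (2 * s2) + rss / (2 * s2 ** 2),   # d lambda / d sigma2
    ])

def numeric_grad(theta, x, y, h=1e-6):
    """Central finite-difference approximation of the same gradient."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros(3)
    for k in range(3):
        step = np.zeros(3)
        step[k] = h
        grad[k] = (log_lik(theta + step, x, y) - log_lik(theta - step, x, y)) / (2 * h)
    return grad

# Hypothetical data and an arbitrary evaluation point
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
theta = (0.5, 1.5, 0.8)

g_a = analytic_grad(theta, x, y)
g_n = numeric_grad(theta, x, y)
print(g_a)
print(g_n)  # should match g_a up to finite-difference error
```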

ML estimators and discussion

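The first two equations are equivalent to \frac{\partial RSS}{\partial\beta_1}=\frac{\partial RSS}{\partial\beta_2}=0, which are exactly the first-order conditions for minimizing RSS. Hence the ML estimators of \beta_1,\beta_2 are the same as the OLS estimators: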

\hat{\beta_1}^{ML}=\hat{\beta_1}^{OLS}, \hat{\beta_2}^{ML}=\hat{\beta_2}^{OLS}.

We know by the Gauss-Markov theorem that these estimators are most efficient in the set of linear unbiased estimators. The third equation gives

\hat{\sigma}^2_{ML}=\frac{RSS}{n},

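which is different from \hat{\sigma}^2_{OLS}=\frac{RSS}{n-2}. The OLS estimator is unbiased, while the ML estimator is biased downward in finite samples: E\hat{\sigma}^2_{ML}=\frac{n-2}{n}\sigma^2. The ML estimator is, however, asymptotically efficient: as n\to\infty its bias vanishes and its variance approaches the Cramér-Rao lower bound.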
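The difference between the two variance estimators can be seen in a small simulation. The sketch below generates data from model (1) with a deterministic regressor and normal errors, computes the closed-form estimators, and averages RSS/n and RSS/(n-2) over many replications. The parameter values, sample size, seed and the helper name fit are all assumptions chosen for illustration.

```python
import numpy as np

def fit(x, y):
    """Return (beta1_hat, beta2_hat, RSS/n, RSS/(n-2)) for the simple linear model."""
    n = len(y)
    xbar, ybar = x.mean(), y.mean()
    b2 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    b1 = ybar - b2 * xbar
    rss = np.sum((y - b1 - b2 * x) ** 2)
    return b1, b2, rss / n, rss / (n - 2)

rng = np.random.default_rng(0)          # fixed seed for reproducibility
n, beta1, beta2, sigma2 = 10, 1.0, 2.0, 4.0
x = np.linspace(0.0, 9.0, n)            # deterministic regressor
reps = 20000

s2_ml = np.empty(reps)
s2_ols = np.empty(reps)
for r in range(reps):
    u = rng.normal(0.0, np.sqrt(sigma2), size=n)   # independent N(0, sigma2) errors
    y = beta1 + beta2 * x + u
    _, _, s2_ml[r], s2_ols[r] = fit(x, y)

mean_ml, mean_ols = s2_ml.mean(), s2_ols.mean()
mse_ml = np.mean((s2_ml - sigma2) ** 2)
mse_ols = np.mean((s2_ols - sigma2) ** 2)
print(mean_ml, mean_ols)   # simulated means of the two variance estimators
print(mse_ml, mse_ols)     # simulated mean squared errors
```

With these settings the average of RSS/n should land near \frac{n-2}{n}\sigma^2=3.2 and the average of RSS/(n-2) near \sigma^2=4. Under normality RSS/\sigma^2 has a chi-square distribution with n-2 degrees of freedom, so the mean squared errors are 2\sigma^4/n for RSS/n and 2\sigma^4/(n-2) for RSS/(n-2): the biased ML estimator has the smaller mean squared error.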
