26
May 20

5
Apr 17

## Maximum likelihood: application to linear model

Keep in mind that a model and an estimation method are not the same thing. Applying the least squares method to the linear model gives the OLS estimators. Here we apply the Maximum Likelihood (ML) method to the same model.

### Assumptions and first order conditions for maximizing likelihood

We assume that the observations satisfy

(1) $y_i=\beta _1+\beta _2x_i+u_i,\ i=1,...,n.$
Our task is to find the ML estimators of $\beta _1,\beta _2,\sigma^2$. To carry out the ML procedure, we assume that the regressor $x$ is deterministic. Then the error is the only random term on the right side of (1).

Step 1. Suppose that $u_1,...,u_n$ are independent normal with mean $0$ and variance $\sigma ^2$. (This implies that the errors are uncorrelated and identically distributed.) The density of $u_i$ is

(2) $p(t)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp(-\frac{t^2}{2\sigma^2}).$
From (1) we see that $y_i$ is normal, as a linear transformation of $u_i$, with mean $\beta_1+\beta_2x_i$ and variance $\sigma^2$. By equation (2) in that post, the density of observation $(x_i,y_i)$ is
$f(x_i,y_i|\beta_1,\beta_2,\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(y_i-\beta_1-\beta_2x_i)^2}{2\sigma^2}}.$

Step 2. The likelihood function is, by definition, the joint density considered as a function of the parameters. Because the observations are independent, it is the product of the individual densities:
$L(\beta_1,\beta_2,\sigma^2|y,x)=\prod_{i=1}^n\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(y_i-\beta_1-\beta_2x_i)^2}{2\sigma^2}}$

$=(2\pi\sigma^2)^{-n/2}e^{-\sum_{i=1}^n\frac{(y_i-\beta_1-\beta_2x_i)^2}{2\sigma^2}}=(2\pi\sigma^2)^{-n/2}e^{-\frac{RSS}{2\sigma^2}}$

where $RSS=\sum_{i=1}^n(y_i-\beta_1-\beta_2x_i)^2$ is the residual sum of squares (see this post for the definition of RSS).

Step 3. The log-likelihood is
$\lambda(\beta_1,\beta_2,\sigma^2|y,x)=-\frac{n}{2}\log(2\pi)-\frac{n}{2}\log(\sigma^2)-\frac{RSS}{2\sigma^2}.$
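
The closed form in Step 3 can be checked numerically against the sum of the log densities from Step 1. The following is a minimal sketch in plain Python; the function names (`rss`, `log_likelihood`, `log_density`), the data and the parameter values are made up for illustration and are not part of the derivation.

```python
import math

def rss(beta1, beta2, x, y):
    """Residual sum of squares for given intercept and slope."""
    return sum((yi - beta1 - beta2 * xi) ** 2 for xi, yi in zip(x, y))

def log_likelihood(beta1, beta2, sigma2, x, y):
    """Closed-form log-likelihood from Step 3."""
    n = len(y)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * n * math.log(sigma2)
            - rss(beta1, beta2, x, y) / (2 * sigma2))

def log_density(yi, xi, beta1, beta2, sigma2):
    """Log of the normal density of a single observation (Step 1)."""
    resid = yi - beta1 - beta2 * xi
    return -0.5 * math.log(2 * math.pi * sigma2) - resid ** 2 / (2 * sigma2)

# Illustrative (made-up) data and parameter values.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta1, beta2, sigma2 = 0.0, 2.0, 0.5

closed_form = log_likelihood(beta1, beta2, sigma2, x, y)
summed = sum(log_density(yi, xi, beta1, beta2, sigma2) for xi, yi in zip(x, y))

# The two numbers should agree up to floating-point rounding.
print(closed_form, summed)
```

Working with the log-likelihood rather than the likelihood also avoids numerical underflow when $n$ is large, because a product of many small densities becomes a sum.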

The first-order conditions are
$\frac{\partial\lambda}{\partial\beta_1}=\frac{\partial\lambda}{\partial\beta_2}=\frac{\partial\lambda}{\partial\sigma^2}=0$
(technically, it is easier to differentiate with respect to $\sigma^2$ than to $\sigma$). We obtain a system of three equations for determining the parameters:
$\frac{\partial\lambda}{\partial\beta_1}=-\frac{1}{2\sigma^2}\frac{\partial RSS}{\partial\beta_1}=0,$ $\frac{\partial\lambda}{\partial\beta_2}=-\frac{1}{2\sigma^2}\frac{\partial RSS}{\partial\beta_2}=0,$

$\frac{\partial\lambda}{\partial\sigma^2}=-\frac{n}{2\sigma^2}+\frac{RSS}{2\sigma^4}=0.$
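
It may help to write the system out explicitly. The following is a standard calculation, spelled out here for convenience. Since $\sigma^2>0$, the first two conditions hold exactly when the derivatives of $RSS$ vanish:

$\frac{\partial RSS}{\partial\beta_1}=-2\sum_{i=1}^n(y_i-\beta_1-\beta_2x_i)=0,$ $\frac{\partial RSS}{\partial\beta_2}=-2\sum_{i=1}^nx_i(y_i-\beta_1-\beta_2x_i)=0.$

These are the normal equations of the least squares method, and they do not involve $\sigma^2$. Multiplying the third condition by $2\sigma^4$ gives

$-n\sigma^2+RSS=0.$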

### ML estimators and discussion

From the first two equations we see that the ML estimators of $\beta_1,\beta_2$ are the same as OLS estimators:

$\hat{\beta}_1^{ML}=\hat{\beta}_1^{OLS},$ $\hat{\beta}_2^{ML}=\hat{\beta}_2^{OLS}.$
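
The coincidence of the ML and OLS estimators of the coefficients can be illustrated numerically: for any fixed $\sigma^2$, moving the coefficients away from the OLS values should lower the log-likelihood. The sketch below uses plain Python; the helper names (`ols_fit`, `rss`, `log_likelihood`), the data, the step size and the two values of $\sigma^2$ are illustrative assumptions.

```python
import math

def ols_fit(x, y):
    """Closed-form OLS intercept and slope for simple regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2

def rss(beta1, beta2, x, y):
    """Residual sum of squares for given intercept and slope."""
    return sum((yi - beta1 - beta2 * xi) ** 2 for xi, yi in zip(x, y))

def log_likelihood(beta1, beta2, sigma2, x, y):
    """Log-likelihood of the normal linear model."""
    n = len(y)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * n * math.log(sigma2)
            - rss(beta1, beta2, x, y) / (2 * sigma2))

# Illustrative (made-up) data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b1, b2 = ols_fit(x, y)

# For each fixed sigma^2, compare the log-likelihood at the OLS point
# with its values at nearby perturbed coefficients.
steps = (-0.1, 0.0, 0.1)
ols_is_best = True
for sigma2 in (0.5, 2.0):
    at_ols = log_likelihood(b1, b2, sigma2, x, y)
    for d1 in steps:
        for d2 in steps:
            if d1 == 0.0 and d2 == 0.0:
                continue  # skip the OLS point itself
            if log_likelihood(b1 + d1, b2 + d2, sigma2, x, y) >= at_ols:
                ols_is_best = False

print(b1, b2, ols_is_best)
```

A grid of perturbations is only a sanity check, not a proof. The proof is the observation that $\sigma^2$ enters the first two conditions only as a positive factor.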

We know by the Gauss-Markov theorem that these estimators are most efficient in the set of linear unbiased estimators. The third equation gives

$\hat{\sigma}^2_{ML}=\frac{RSS}{n},$

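which differs from the unbiased estimator $\hat{\sigma}^2_{OLS}=\frac{RSS}{n-2}.$ The ML estimator is biased downward in finite samples, since under the assumptions above $E\hat{\sigma}^2_{ML}=\frac{n-2}{n}\sigma^2$, but the bias vanishes as $n\to\infty$. Its efficiency is an asymptotic property: in large samples its variance approaches the Cramér-Rao lower bound.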
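
A small simulation can illustrate how the two variance estimators behave in a finite sample. The sketch below is an illustration only: the true parameter values, the sample size $n=5$, the number of replications and the seed are arbitrary choices, and `ols_rss` is a hypothetical helper name.

```python
import random

def ols_rss(x, y):
    """Fit simple OLS in closed form and return the residual sum of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))

random.seed(0)  # fixed seed so that the run is reproducible

# Assumed true parameters and a deterministic regressor.
beta1, beta2, sigma2 = 1.0, 2.0, 4.0
x = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(x)
reps = 20000
sigma = sigma2 ** 0.5

total_rss = 0.0
for _ in range(reps):
    # Generate y from model (1) with independent normal errors.
    y = [beta1 + beta2 * xi + random.gauss(0.0, sigma) for xi in x]
    total_rss += ols_rss(x, y)

mean_rss = total_rss / reps
mean_sigma2_ml = mean_rss / n          # average of RSS/n
mean_sigma2_ols = mean_rss / (n - 2)   # average of RSS/(n-2)

print(mean_sigma2_ml, mean_sigma2_ols)
```

Under the normality assumption $RSS/\sigma^2$ has a chi-square distribution with $n-2$ degrees of freedom, so with these settings the first average should be close to $\frac{3}{5}\cdot 4=2.4$ and the second close to $4$.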