### Maximum likelihood: application to linear model

We have to remember that a model and a method are not the same. Application of the least squares method to the linear model gives OLS estimators. Here we apply the Maximum Likelihood (ML) method to the same model.

### Assumptions and first order conditions for maximizing likelihood

We assume that the observations satisfy

$$y_i=\beta_1+\beta_2x_i+e_i,\quad i=1,\dots,n. \tag{1}$$

Our task is to find ML estimators of $\beta_1,\beta_2,\sigma^2$. To be able to carry out the ML procedure, we assume that the regressor $x_i$ is deterministic. Then on the right side of (1) the error $e_i$ is the only random term.

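**Step 1**. Suppose that $e_1,\dots,e_n$ are independent normal with mean $0$ and variance $\sigma^2$. (This implies that the errors are uncorrelated and identically distributed.) The density of $e_i$ is

$$f_{e_i}(t)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{t^2}{2\sigma^2}\right). \tag{2}$$

From (1) we see that $y_i$ is normal, as a linear transformation of $e_i$, with mean $\beta_1+\beta_2x_i$ and variance $\sigma^2$. By equation (2) in that post, the density of observation $y_i$ is

$$f_{y_i}(t)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(t-\beta_1-\beta_2x_i)^2}{2\sigma^2}\right).$$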

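**Step 2**. The likelihood function, by definition, is the joint density evaluated at the observed values and considered as a function of the parameters. Because the observations are independent, it is the product of the individual densities:

$$L(\beta_1,\beta_2,\sigma^2)=\prod_{i=1}^n\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y_i-\beta_1-\beta_2x_i)^2}{2\sigma^2}\right)=(2\pi\sigma^2)^{-n/2}\exp\left(-\frac{RSS}{2\sigma^2}\right),$$

where $RSS=\sum_{i=1}^n(y_i-\beta_1-\beta_2x_i)^2$ (see this post for the definition of RSS).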
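As a quick numerical check of Step 2, the sketch below evaluates the likelihood in two ways: as a product of normal densities and through the closed form that depends on the data only via RSS. The function names, the toy data and the trial parameter values are illustrative assumptions, not part of the original derivation.

```python
import math

def rss(beta1, beta2, x, y):
    """Residual sum of squares at the given (not necessarily optimal) parameters."""
    return sum((yi - beta1 - beta2 * xi) ** 2 for xi, yi in zip(x, y))

def likelihood_product(beta1, beta2, sigma2, x, y):
    """Likelihood as the product of the normal densities of the observations."""
    value = 1.0
    for xi, yi in zip(x, y):
        z = (yi - beta1 - beta2 * xi) ** 2 / (2 * sigma2)
        value *= math.exp(-z) / math.sqrt(2 * math.pi * sigma2)
    return value

def likelihood_closed_form(beta1, beta2, sigma2, x, y):
    """Same likelihood written as (2*pi*sigma2)^(-n/2) * exp(-RSS / (2*sigma2))."""
    n = len(y)
    return (2 * math.pi * sigma2) ** (-n / 2) * math.exp(-rss(beta1, beta2, x, y) / (2 * sigma2))

# Toy data and trial parameter values (assumptions made for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
lik_a = likelihood_product(0.0, 2.0, 0.5, x, y)
lik_b = likelihood_closed_form(0.0, 2.0, 0.5, x, y)
print(lik_a, lik_b, math.isclose(lik_a, lik_b, rel_tol=1e-9))
```

For long samples the product of densities underflows in floating point, which is one practical reason to work with the logarithm of the likelihood.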

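**Step 3**. The log-likelihood is

$$\log L=-\frac{n}{2}\log(2\pi)-\frac{n}{2}\log\sigma^2-\frac{1}{2\sigma^2}\sum_{i=1}^n(y_i-\beta_1-\beta_2x_i)^2.$$

The first-order conditions are

$$\frac{\partial\log L}{\partial\beta_1}=0,\quad\frac{\partial\log L}{\partial\beta_2}=0,\quad\frac{\partial\log L}{\partial\sigma^2}=0$$

(technically, it is easier to differentiate with respect to $\sigma^2$ than to $\sigma$). We obtain a system of three equations for determining the parameters:

$$\sum_{i=1}^n(y_i-\beta_1-\beta_2x_i)=0,\quad\sum_{i=1}^n(y_i-\beta_1-\beta_2x_i)x_i=0,\quad-\frac{n}{2\sigma^2}+\frac{1}{2\sigma^4}\sum_{i=1}^n(y_i-\beta_1-\beta_2x_i)^2=0.$$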

### ML estimators and discussion

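From the first two equations we see that the ML estimators of $\beta_1,\beta_2$ are the same as the OLS estimators:

$$\hat{\beta}_2=\frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n(x_i-\bar{x})^2},\quad\hat{\beta}_1=\bar{y}-\hat{\beta}_2\bar{x}.$$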

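We know by the Gauss-Markov theorem that these estimators are most efficient in the set of linear unbiased estimators. The third equation gives

$$\hat{\sigma}^2_{ML}=\frac{1}{n}\sum_{i=1}^n(y_i-\hat{\beta}_1-\hat{\beta}_2x_i)^2=\frac{RSS}{n},$$

which is different from the unbiased estimator $s^2=RSS/(n-2)$ usually paired with OLS. The ML estimator is biased downward in finite samples, but it has a smaller variance than $s^2$. It is nonlinear in the observations, so the Gauss-Markov theorem does not cover it; the relevant benchmark is the Cramér-Rao lower bound, which ML estimators attain asymptotically under regularity conditions.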
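The estimators discussed in this section can be checked numerically. The sketch below is a minimal illustration under stated assumptions: `fit_ml`, `average_variance_estimates`, the toy data and the simulation settings (true intercept 1, slope 2, $\sigma^2=1$, $n=10$) are all made up for the example. It computes the closed-form slope and intercept, the ML variance estimate $RSS/n$ and the unbiased estimate $RSS/(n-2)$, verifies the two first-order conditions for the coefficients, and then averages both variance estimates over simulated samples.

```python
import random

def fit_ml(x, y):
    """Closed-form ML estimates for y_i = beta1 + beta2 * x_i + e_i with normal errors."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta2 = s_xy / s_xx              # slope, identical to the OLS slope
    beta1 = y_bar - beta2 * x_bar    # intercept, identical to the OLS intercept
    residuals = [yi - beta1 - beta2 * xi for xi, yi in zip(x, y)]
    rss = sum(r * r for r in residuals)
    # ML variance estimate divides by n; the unbiased one divides by n - 2.
    return beta1, beta2, rss / n, rss / (n - 2), residuals

def average_variance_estimates(n=10, sigma2=1.0, reps=5000, seed=0):
    """Average RSS/n and RSS/(n-2) over simulated samples (hypothetical settings)."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, n + 1)]   # deterministic regressor
    total_ml = total_s2 = 0.0
    for _ in range(reps):
        y = [1.0 + 2.0 * xi + rng.gauss(0.0, sigma2 ** 0.5) for xi in x]
        _, _, sigma2_ml, s2, _ = fit_ml(x, y)
        total_ml += sigma2_ml
        total_s2 += s2
    return total_ml / reps, total_s2 / reps

# Toy data (an assumption made for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta1, beta2, sigma2_ml, s2, residuals = fit_ml(x, y)
print("beta1, beta2:", beta1, beta2)
print("sigma2_ml, s2:", sigma2_ml, s2)

# First-order conditions: residuals sum to zero and are orthogonal to x.
foc1 = sum(residuals)
foc2 = sum(r * xi for r, xi in zip(residuals, x))
print("first-order conditions:", foc1, foc2)

mean_ml, mean_s2 = average_variance_estimates()
print("average RSS/n and RSS/(n-2):", mean_ml, mean_s2)
```

With these settings the average of $RSS/n$ should come out near $\sigma^2(n-2)/n=0.8$ and the average of $RSS/(n-2)$ near $1$, which illustrates the finite-sample bias of the ML variance estimator.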