Maximum likelihood: application to linear model
We have to remember that a model and a method are not the same thing. Applying the least squares method to the linear model produces the OLS estimators. Here we apply the Maximum Likelihood (ML) method to the same model.
Assumptions and first order conditions for maximizing likelihood
We assume that the observations satisfy

(1) $y_i = a + bx_i + e_i, \quad i = 1, \dots, n.$

Our task is to find ML estimators of $a$, $b$ and $\sigma^2$. To be able to implement the ML method, we assume that the regressor $x_i$ is deterministic. Then on the right side of (1) the error $e_i$ is the only random term.
Step 1. Suppose that the errors $e_1, \dots, e_n$ are independent normal with mean $0$ and variance $\sigma^2$. (This implies that the errors are uncorrelated and identically distributed.) The density of $e_i$ is

$f_{e_i}(t) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{t^2}{2\sigma^2} \right).$
From (1) we see that $y_i$ is normal, as a linear transformation of $e_i$: it has mean $a + bx_i$ and variance $\sigma^2$. By the formula for the density of a linear transformation of a normal variable, the density of observation $y_i$ is

$f_{y_i}(t) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(t - a - bx_i)^2}{2\sigma^2} \right).$
Step 2. The likelihood function, by definition, is the joint density, considered as a function of the parameters. Because of the independence of the observations, it can be obtained as a product of these densities:

$L(a, b, \sigma^2) = \prod_{i=1}^n f_{y_i}(y_i) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - a - bx_i)^2 \right) = \frac{1}{(2\pi\sigma^2)^{n/2}} e^{-RSS/(2\sigma^2)}$
(see this post for the definition of RSS).
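As a quick numerical sanity check of the product formula above, the following sketch (with made-up data and trial parameter values, using NumPy) verifies that the product of the individual normal densities equals the closed-form expression in terms of RSS:

```python
import numpy as np

# synthetic data from a hypothetical linear model (illustrative values only)
rng = np.random.default_rng(0)
n = 20
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.7, size=n)

def normal_density(e, s2):
    # density of N(0, s2) evaluated at e
    return np.exp(-e**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

# likelihood evaluated at arbitrary trial parameters (a, b, s2)
a, b, s2 = 0.8, 1.9, 0.6
residuals = y - a - b * x

# product of the n individual densities
L_product = np.prod(normal_density(residuals, s2))

# closed form: (2*pi*s2)^(-n/2) * exp(-RSS / (2*s2))
RSS = np.sum(residuals**2)
L_closed = (2 * np.pi * s2) ** (-n / 2) * np.exp(-RSS / (2 * s2))

assert np.isclose(L_product, L_closed)
```

The two expressions agree because the exponents add across the product, turning the sum of squared residuals into RSS.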
Step 3. The log-likelihood is

$\log L = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log \sigma^2 - \frac{RSS}{2\sigma^2}.$
The first-order conditions are

$\frac{\partial \log L}{\partial a} = 0, \quad \frac{\partial \log L}{\partial b} = 0, \quad \frac{\partial \log L}{\partial \sigma^2} = 0$

(technically, it is easier to differentiate with respect to $\sigma^2$ than with respect to $\sigma$). We obtain a system of three equations for determining the parameters:

$\frac{1}{\sigma^2} \sum_{i=1}^n (y_i - a - bx_i) = 0,$

$\frac{1}{\sigma^2} \sum_{i=1}^n x_i (y_i - a - bx_i) = 0,$

$-\frac{n}{2\sigma^2} + \frac{RSS}{2\sigma^4} = 0.$
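The first-order conditions can be checked numerically: the sketch below (synthetic data, illustrative values) plugs the OLS intercept and slope together with $\hat{\sigma}^2 = RSS/n$ into the three conditions and confirms that all three vanish:

```python
import numpy as np

# synthetic data from a hypothetical linear model (illustrative values only)
rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.7, size=n)

# candidate solution: OLS slope/intercept and sigma^2 = RSS/n
b_hat = np.cov(x, y, bias=True)[0, 1] / np.var(x)
a_hat = y.mean() - b_hat * x.mean()
res = y - a_hat - b_hat * x
s2_hat = np.mean(res**2)  # RSS / n

# the three first-order conditions, evaluated at the candidate solution
foc_a = np.sum(res) / s2_hat
foc_b = np.sum(x * res) / s2_hat
foc_s2 = -n / (2 * s2_hat) + np.sum(res**2) / (2 * s2_hat**2)

# all three vanish (up to floating-point error)
assert abs(foc_a) < 1e-8
assert abs(foc_b) < 1e-8
assert abs(foc_s2) < 1e-8
```

The first two conditions are exactly the OLS normal equations (divided by $\sigma^2$), which is why the ML and OLS estimators of the slope and intercept coincide.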
ML estimators and discussion
From the first two equations we see that the ML estimators of $a$ and $b$ are the same as the OLS estimators:

$\hat{b} = \frac{\widehat{\mathrm{Cov}}(x, y)}{\widehat{\mathrm{Var}}(x)}, \quad \hat{a} = \bar{y} - \hat{b}\bar{x}.$
We know by the Gauss-Markov theorem that these estimators are the most efficient in the set of linear unbiased estimators. The third equation gives

$\hat{\sigma}^2_{ML} = \frac{RSS}{n},$

which is different from the unbiased OLS estimator $s^2 = \frac{RSS}{n-2}$. The ML estimator is biased downward, but under normality it has a smaller mean squared error than $s^2$. More generally, ML estimators are asymptotically efficient: as the sample size grows, their variance attains the Cramér-Rao lower bound.
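The bias/MSE trade-off between the two variance estimators can be illustrated by simulation. The sketch below (synthetic data with a true $\sigma^2 = 1$; all values are made up for illustration) repeatedly fits the model and compares $RSS/n$ with $RSS/(n-2)$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10
x = np.linspace(0.0, 1.0, n)
sigma2 = 1.0          # true error variance in the simulated model
reps = 20000

ml_est, ols_est = [], []
for _ in range(reps):
    y = 1.0 + 2.0 * x + rng.normal(scale=np.sqrt(sigma2), size=n)
    X = np.column_stack([np.ones(n), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ beta) ** 2)
    ml_est.append(rss / n)          # ML estimator: biased downward
    ols_est.append(rss / (n - 2))   # unbiased estimator

ml_est, ols_est = np.array(ml_est), np.array(ols_est)

# the ML estimator has the smaller mean squared error in this normal model
mse_ml = np.mean((ml_est - sigma2) ** 2)
mse_ols = np.mean((ols_est - sigma2) ** 2)
assert mse_ml < mse_ols
```

Under normality the exact values are $\mathrm{MSE}(RSS/n) = 2\sigma^4/n$ versus $\mathrm{MSE}(RSS/(n-2)) = 2\sigma^4/(n-2)$, so the simulation merely confirms a known inequality.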