Loosely speaking, the Gauss-Markov theorem states that the OLS estimator is the most efficient. Without algebra, you cannot take a single step further, whether toward the precise theoretical statement or toward an application.
Why do we care about linearity?
The concept of linearity has come up many times in my posts. Here we have to start from scratch in order to apply it to estimators.
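The slope $b$ in the simple regression
$$y_i = a + b x_i + e_i, \quad i = 1, \dots, n,$$
can be estimated by
$$\hat{b}(y,x) = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}.$$
Note that the notation makes explicit the dependence of the estimator on $y$ and $x$. Imagine that we have two sets of observations: $(x_1, y_1^{(1)}), \dots, (x_n, y_n^{(1)})$ and $(x_1, y_1^{(2)}), \dots, (x_n, y_n^{(2)})$ (the x coordinates are the same but the y coordinates are different). In addition, the regressor is deterministic. The x's could be spatial units and the y's temperature measurements at these units at two different moments.
Definition. We say that $\hat{b}(y,x)$ is linear with respect to $y$ if for any two vectors $y^{(1)}, y^{(2)}$ and numbers $c_1, c_2$ we have
$$\hat{b}(c_1 y^{(1)} + c_2 y^{(2)}, x) = c_1 \hat{b}(y^{(1)}, x) + c_2 \hat{b}(y^{(2)}, x).$$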
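The definition can be checked numerically. The sketch below assumes the usual slope formula $\hat{b} = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$. Because $\sum (x_i - \bar{x}) = 0$, this equals $\sum w_i y_i$ with weights $w_i = (x_i - \bar{x}) / \sum_j (x_j - \bar{x})^2$ that depend only on $x$, which is why linearity in $y$ should hold. The function name `ols_slope`, the sample size, and the simulated data are my own choices for illustration.

```python
import random

def ols_slope(y, x):
    """OLS slope in simple regression: sum((x_i - xbar)(y_i - ybar)) / sum((x_i - xbar)^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

random.seed(0)
n = 20
x = [random.uniform(0, 10) for _ in range(n)]   # the same x coordinates for both samples
y1 = [random.gauss(0, 1) for _ in range(n)]     # first set of y coordinates (made-up data)
y2 = [random.gauss(5, 3) for _ in range(n)]     # second set of y coordinates (made-up data)
c1, c2 = 2.5, -1.3                              # arbitrary numbers

# Left side: estimator applied to the linear combination of the y vectors.
combo = [c1 * u + c2 * v for u, v in zip(y1, y2)]
lhs = ols_slope(combo, x)
# Right side: the same linear combination of the two estimates.
rhs = c1 * ols_slope(y1, x) + c2 * ols_slope(y2, x)

print(lhs, rhs)  # the two numbers should agree up to floating-point rounding
```

A numerical check on one sample is of course not a proof; it only illustrates what the definition asks for.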
In addition to knowing how to establish linearity, it's a good idea to be able to see when something is not linear. Recall that linearity implies homogeneity of degree 1. Hence, if something is not homogeneous of degree 1, it cannot be linear. The OLS estimator is not linear in $x$ because it is homogeneous of degree -1 in $x$:
$$\hat{b}(y, cx) = c^{-1} \hat{b}(y, x) \quad \text{for any } c \neq 0.$$
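Here is a small sketch of the homogeneity argument, again assuming the usual slope formula. Multiplying all x coordinates by a number $c$ should divide the estimate by $c$, while multiplying all y coordinates by $c$ should multiply it by $c$. The data and the value of `c` are made up for the illustration.

```python
import random

def ols_slope(y, x):
    """OLS slope in simple regression: sum((x_i - xbar)(y_i - ybar)) / sum((x_i - xbar)^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

random.seed(1)
n = 15
x = [random.uniform(1, 10) for _ in range(n)]
y = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]   # made-up data
c = 4.0

base = ols_slope(y, x)
scaled_x = ols_slope(y, [c * xi for xi in x])   # rescale the regressor
scaled_y = ols_slope([c * yi for yi in y], x)   # rescale the dependent variable

print(scaled_x, base / c)   # should agree: degree -1 in x
print(scaled_y, base * c)   # should agree: degree 1 in y
```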
Students don't have problems remembering the acronym BLUE: the OLS estimator is the Best Linear Unbiased Estimator. Decoding this acronym starts from the end.
- An estimator, by definition, is a function of sample data.
- Unbiasedness of OLS estimators is thoroughly discussed here.
- Linearity of the slope estimator with respect to $y$ has been proved above. Linearity with respect to $x$ is not required.
- Now we look at the class of all slope estimators that are unbiased and linear with respect to $y$. As an exercise, show that the instrumental variables estimator belongs to this class.
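For the exercise in the last bullet, a numerical sanity check may help before attempting the proof. The sketch assumes the IV slope estimator has the form $\sum (z_i - \bar{z})(y_i - \bar{y}) / \sum (z_i - \bar{z})(x_i - \bar{x})$ with a deterministic instrument $z$. The particular instrument (an indicator of the upper half of the x values), the function name `iv_slope`, and the data are hypothetical choices of mine.

```python
import math

def iv_slope(y, x, z):
    """Assumed IV slope: sum((z_i - zbar)(y_i - ybar)) / sum((z_i - zbar)(x_i - xbar))."""
    n = len(x)
    x_bar, y_bar, z_bar = sum(x) / n, sum(y) / n, sum(z) / n
    num = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    den = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    return num / den

x = [float(i) for i in range(1, 21)]            # deterministic regressor 1, ..., 20
z = [1.0 if xi > 10.5 else 0.0 for xi in x]     # hypothetical instrument: upper-half indicator

# Linearity in y: compare both sides of the definition on two arbitrary y vectors.
y1 = [math.sin(xi) for xi in x]
y2 = [math.sqrt(xi) for xi in x]
c1, c2 = 3.0, -0.7
lhs = iv_slope([c1 * u + c2 * v for u, v in zip(y1, y2)], x, z)
rhs = c1 * iv_slope(y1, x, z) + c2 * iv_slope(y2, x, z)
print(lhs, rhs)   # should agree up to rounding

# On noise-free data y = a + b*x the estimator returns b. Together with
# linearity in y, this is the key step in showing unbiasedness.
a, b = 1.0, 2.0
exact = iv_slope([a + b * xi for xi in x], x, z)
print(exact)      # should be b up to rounding
```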
Gauss-Markov Theorem. Under the classical assumptions, the OLS estimator of the slope has the smallest variance in the class of all slope estimators that are unbiased and linear with respect to $y$.
In particular, the OLS estimator of the slope is at least as efficient as the IV estimator. The beauty of this result is that you don't need expressions for their variances (even though they can be derived).
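The comparison can be illustrated by simulation. This is only a sketch under assumptions of my own: a deterministic regressor, independent normal errors with constant variance, made-up values of `a`, `b`, `sigma`, and a hypothetical instrument `z` (an indicator of the upper half of the x values). Both estimators are written through one helper, `linear_slope`, because setting `z = x` in the assumed IV formula gives back the OLS slope.

```python
import random
import statistics

def linear_slope(y, x, z):
    """sum((z_i - zbar)(y_i - ybar)) / sum((z_i - zbar)(x_i - xbar)).
    With z = x this is the OLS slope; with another z it is the assumed IV slope."""
    n = len(x)
    x_bar, y_bar, z_bar = sum(x) / n, sum(y) / n, sum(z) / n
    num = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    den = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    return num / den

random.seed(42)
n = 20
x = [float(i) for i in range(1, n + 1)]         # deterministic regressor
z = [1.0 if xi > 10.5 else 0.0 for xi in x]     # hypothetical instrument
a, b, sigma = 1.0, 2.0, 3.0                     # made-up true parameters
reps = 2000                                     # number of simulated samples

ols_draws, iv_draws = [], []
for _ in range(reps):
    # Same x in every replication; only the errors (and hence y) change.
    y = [a + b * xi + random.gauss(0, sigma) for xi in x]
    ols_draws.append(linear_slope(y, x, x))
    iv_draws.append(linear_slope(y, x, z))

var_ols = statistics.variance(ols_draws)
var_iv = statistics.variance(iv_draws)
print("mean OLS:", statistics.mean(ols_draws), "mean IV:", statistics.mean(iv_draws))
print("var  OLS:", var_ols, "var  IV:", var_iv)
```

Both sample means should be close to the true slope, and the sample variance of the OLS draws should come out smaller than that of the IV draws. A simulation does not replace the theorem; it shows its conclusion in one particular design.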
Remark. Even the above formulation is incomplete. In fact, the pair consisting of the intercept estimator and the slope estimator is efficient. Stating and proving this requires matrix algebra.
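To give a flavor of the matrix formulation, here is a minimal sketch that assumes the standard formula $(\hat{a}, \hat{b})' = (X'X)^{-1}X'y$ with $X$ having a column of ones and a column of x values. It does not prove efficiency of the pair. It only shows that intercept and slope are jointly a linear function of $y$, with weights depending on $x$ alone. The function name `ols_weight_matrix` and the tiny data set are mine.

```python
def ols_weight_matrix(x):
    """Rows of A = (X'X)^{-1} X' for X = [1, x], so that (a_hat, b_hat)' = A y."""
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    det = n * sxx - sx * sx                        # determinant of X'X = [[n, sx], [sx, sxx]]
    # (X'X)^{-1} = (1/det) * [[sxx, -sx], [-sx, n]], applied to each column (1, x_i)' of X'
    row_a = [(sxx - sx * xi) / det for xi in x]    # weights producing the intercept
    row_b = [(n * xi - sx) / det for xi in x]      # weights producing the slope
    return row_a, row_b

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0 + 2.0 * xi for xi in x]                   # noise-free line with a = 1, b = 2

row_a, row_b = ols_weight_matrix(x)
a_hat = sum(w * yi for w, yi in zip(row_a, y))
b_hat = sum(w * yi for w, yi in zip(row_b, y))
print(a_hat, b_hat)   # should recover the line's intercept and slope up to rounding
```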