The instrumental variables (IV) estimator is one of the most important alternatives to the OLS estimator.
Review the large-sample approach to the OLS estimator (the second approach). We need the properties of probability limits and the conditions sufficient for consistency of the OLS estimator.
Besides, recall our convention regarding the notation of sample versus population characteristics.
We want to estimate the slope $b$ in the simple regression

(1) $y = a + bx + e$

assuming that $x$ is stochastic. We established that if

$\operatorname{Cov}(x, e) = 0$

then the OLS estimator of the slope is consistent. If the last condition is violated:

$\operatorname{Cov}(x, e) \neq 0$

then there is no consistency. This is called an endogeneity problem, and it may occur for various reasons. One of them is omission of relevant variables: if the true model is $y = a + bx + cw + u$ but we erroneously assume (1), then the error in (1) is $e = cw + u$. Most likely, $x$ and $w$ are correlated, and then $x$ and $e$ in (1) will be correlated.
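The omitted-variable mechanism can be checked numerically. The following is a minimal simulation sketch, not part of the original derivation: the coefficient values, the normal distributions, and the names `w` (the omitted variable) and `b_ols` are all illustrative assumptions. It generates data from a true model with an omitted variable and compares the OLS slope with the true slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # large sample, to approximate the probability limit

# Hypothetical parameter values chosen only for illustration
a, b, c = 1.0, 2.0, 3.0

w = rng.normal(size=n)            # omitted variable
x = 0.5 * w + rng.normal(size=n)  # regressor, correlated with w
u = rng.normal(size=n)            # error of the true model
y = a + b * x + c * w + u         # true model

# OLS slope of y on x alone: sample Cov(x, y) / sample Var(x)
b_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Population limit under this design: b + Cov(x, e) / Var(x),
# where e = c*w + u, Cov(x, e) = 0.5 * c and Var(x) = 0.25 + 1
plim_ols = b + 0.5 * c / 1.25

print("true slope:", b)
print("OLS slope:", b_ols)
print("theoretical limit of OLS:", plim_ols)
```

Under these assumptions the OLS slope settles near $b + \operatorname{Cov}(x,e)/\operatorname{Var}(x)$ rather than near $b$, no matter how large the sample is.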
The easiest way to learn is by similarity
Suppose we have found a variable $z$ such that
(2) $\operatorname{Cov}(z, x) \neq 0$ (IV existence condition)
(3) $\operatorname{Cov}(z, e) = 0$ (IV consistency condition).
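Such a variable is called an instrument for $x$. Following the simplified derivation for the OLS estimator, plug (1) into $\operatorname{Cov}(z, y)$:

$\operatorname{Cov}(z, y) = \operatorname{Cov}(z, a + bx + e)$
(using linearity of covariance and the fact that $a$ is a constant)
$= b\operatorname{Cov}(z, x) + \operatorname{Cov}(z, e)$
(formally letting $\operatorname{Cov}(z, e) = 0$)
$= b\operatorname{Cov}(z, x).$

Solving this for $b$ and putting a hat on it, we arrive at the IV estimator:

(4) $\hat{b}_{IV} = \dfrac{\widehat{\operatorname{Cov}}(z, y)}{\widehat{\operatorname{Cov}}(z, x)}$

where $\widehat{\operatorname{Cov}}$ denotes the sample covariance.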
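To obtain the working representation, plug (1) into (4) and use linearity of the sample covariance $\widehat{\operatorname{Cov}}$:

$\hat{b}_{IV} = \dfrac{\widehat{\operatorname{Cov}}(z, a + bx + e)}{\widehat{\operatorname{Cov}}(z, x)} = b + \dfrac{\widehat{\operatorname{Cov}}(z, e)}{\widehat{\operatorname{Cov}}(z, x)}.$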
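Repeating what we did for the OLS estimator, we get consistency from (2) and (3). Since the sample covariances $\widehat{\operatorname{Cov}}$ converge in probability to their population counterparts,

$\operatorname{plim} \hat{b}_{IV} = b + \dfrac{\operatorname{plim} \widehat{\operatorname{Cov}}(z, e)}{\operatorname{plim} \widehat{\operatorname{Cov}}(z, x)} = b + \dfrac{\operatorname{Cov}(z, e)}{\operatorname{Cov}(z, x)} = b.$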
Now you can see why (2) and (3) have been imposed.
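To see the consistency result at work, here is a small simulation sketch. It is an illustration under stated assumptions, not the author's code: the data-generating process, the parameter values, and the helper name `iv_slope` are hypothetical. The instrument `z` is drawn independently of the error, so it is uncorrelated with the error, and it enters the regressor, so it is correlated with the regressor.

```python
import numpy as np


def iv_slope(z, x, y):
    """IV slope: sample Cov(z, y) divided by sample Cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


rng = np.random.default_rng(1)
n = 200_000  # large sample, to approximate the probability limit

# Hypothetical parameter values chosen only for illustration
a, b, c = 1.0, 2.0, 3.0

w = rng.normal(size=n)                # omitted variable, part of the error
z = rng.normal(size=n)                # instrument, independent of w and u
x = 0.5 * w + z + rng.normal(size=n)  # regressor: correlated with w and z
u = rng.normal(size=n)
y = a + b * x + c * w + u             # error of the short model: e = c*w + u

b_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
b_iv = iv_slope(z, x, y)

print("true slope:", b)
print("OLS slope:", b_ols)  # stays away from b because Cov(x, e) != 0
print("IV slope:", b_iv)    # approaches b as n grows
```

In this design the population OLS limit is $b + 1.5/2.25$, while the IV estimate concentrates around $b$. Making `z` enter the error term, or removing it from `x`, would break the consistency or existence condition respectively.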
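Remark 1. Sometimes, in addition to (2) and (3), people say that the instrument should not be perfectly correlated with the regressor. This is because if $z$ and $x$ are perfectly correlated, then $z$ is a linear function of $x$: $z = \alpha + \beta x$, with $\beta \neq 0$, so that under endogeneity $\operatorname{Cov}(z, e) = \beta\operatorname{Cov}(x, e) \neq 0$ and (3) is impossible.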
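The point of Remark 1 can be illustrated with a short sketch. When the instrument is an exact linear function of the regressor, the slope of that linear function cancels in the ratio of covariances, so the IV formula reproduces the OLS slope and inherits its inconsistency. The data-generating process and all numerical values below are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

x = rng.normal(size=n)
e = 0.8 * x + rng.normal(size=n)  # endogenous error: Cov(x, e) != 0
y = 1.0 + 2.0 * x + e             # true slope is 2.0 (hypothetical value)

z = 5.0 - 3.0 * x                 # "instrument" exactly linear in x

b_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print("OLS slope:", b_ols)
print("IV slope with z linear in x:", b_iv)
print("equal up to rounding:", np.isclose(b_iv, b_ols))
```

Both numbers are close to $2.8$ here, not to the true slope $2$, because the linear "instrument" carries exactly the same correlation with the error as the regressor does.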
Remark 2. An instrument is not the same thing as a proxy. If $x$ cannot be measured and we replace it by a close variable $\tilde{x}$ that can be measured (which is then called a proxy for $x$), instead of (1) we obtain $y = a + b\tilde{x} + e$, and the slope estimator will be the usual OLS estimator, not IV.