Feb 17

Instrumental variables estimator

The instrumental variables (IV) estimator is one of the most important alternatives to the OLS estimator.


Review the simplified derivation of the OLS estimator. For the OLS estimator there is also a rigorous derivation; for the IV estimator, the simplified derivation is the only one.

Review the large sample approach to the OLS estimator (second approach). We need the properties of probability limits and the conditions sufficient for consistency of the OLS estimator.

Besides, recall our convention regarding the notation of sample versus population characteristics.

Problem statement

We want to estimate the slope in simple regression

(1) y_i=a+bx_i+e_i

assuming that x_i is stochastic. We established that if

Var(x)\neq 0 (existence condition)


Cov(x,e)=0 (consistency condition)

then the OLS estimator of the slope is consistent. If the last condition is violated:

Cov(x,e)\ne 0

then there is no consistency. This is called an endogeneity problem and it may occur for various reasons. One of them is omission of relevant variables: if the true model is y_i=a+bx_i+cz_i+v_i but we erroneously assume (1), then the error in (1) is e_i=cz_i+v_i. Most likely, x_i,z_i are correlated and then in (1) x_i,e_i will be correlated.
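The omitted-variable case can be checked numerically. The following is a minimal simulation sketch, not part of the original derivation: the parameter values, the coefficient 0.8 linking x to z, the normal distributions and the helper name cov_u are all assumptions chosen for illustration. It shows that the error of the misspecified model (1) is correlated with x and that the OLS slope settles away from b even in a large sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
a, b, c = 1.0, 2.0, 3.0            # illustrative true parameters (assumed)

z = rng.normal(size=n)             # relevant variable that will be omitted
x = 0.8 * z + rng.normal(size=n)   # regressor correlated with z (assumed link)
v = rng.normal(size=n)             # error of the true model
y = a + b * x + c * z + v          # true model

e = c * z + v                      # error of the misspecified model (1)


def cov_u(p, q):
    """Sample covariance with 1/n normalization (hypothetical helper name)."""
    return np.mean((p - p.mean()) * (q - q.mean()))


ols_slope = cov_u(x, y) / cov_u(x, x)

print("sample Cov(x, e):", cov_u(x, e))   # clearly nonzero: endogeneity
print("OLS slope:", ols_slope, "true b:", b)
```

Under these assumptions the population values are Cov(x,e)=2.4 and Var(x)=1.64, so the OLS slope should approach b+2.4/1.64 rather than b.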

The easiest way to learn is by similarity

Suppose we have found a variable z such that

(2) Cov(z,x)\neq 0 (IV existence condition)


(3) Cov(z,e)=0 (IV consistency condition).

Such a variable z is called an instrument for x. Following the simplified derivation for the OLS estimator, plug (1) in

Cov_u(z,y)=Cov_u(z,a+bx+e) (using linearity of covariance)

=Cov_u(z,a)+bCov_u(z,x)+Cov_u(z,e) (a is a constant, so Cov_u(z,a)=0; formally letting Cov_u(z,e)=0)

=bCov_u(z,x).

Solving this for b and putting a hat on it, we arrive at the IV estimator:

(4) \hat{b}=\frac{Cov_u(z,y)}{Cov_u(z,x)}.
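Formula (4) is a ratio of two sample covariances, so it takes only a few lines to compute. The sketch below is an illustration under stated assumptions: the function names iv_slope and cov_u, the data-generating process and all numeric values are hypothetical, chosen so that z satisfies (2) and (3) while x is endogenous.

```python
import numpy as np


def cov_u(p, q):
    """Sample covariance with 1/n normalization (hypothetical helper name)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.mean((p - p.mean()) * (q - q.mean()))


def iv_slope(z, x, y):
    """IV estimator (4): Cov_u(z, y) / Cov_u(z, x)."""
    return cov_u(z, y) / cov_u(z, x)


rng = np.random.default_rng(1)
n = 200_000
a, b = 1.0, 2.0                            # illustrative true parameters (assumed)

z = rng.normal(size=n)                     # instrument: moves x, unrelated to e
shock = rng.normal(size=n)                 # common shock that makes x endogenous
x = 0.5 * z + shock + rng.normal(size=n)   # Cov(z, x) = 0.5, so (2) holds
e = 2.0 * shock + rng.normal(size=n)       # Cov(z, e) = 0, so (3) holds; Cov(x, e) = 2
y = a + b * x + e

b_iv = iv_slope(z, x, y)
b_ols = cov_u(x, y) / cov_u(x, x)

print("IV slope: ", b_iv)    # close to b
print("OLS slope:", b_ols)   # pulled away from b by Cov(x, e) != 0
```

The 1/n normalization in cov_u does not matter for the result, since any common normalization cancels in the ratio.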


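To obtain the working representation, plug (1) in (4):

\hat{b}=\frac{Cov_u(z,a+bx+e)}{Cov_u(z,x)}=\frac{bCov_u(z,x)+Cov_u(z,e)}{Cov_u(z,x)}=b+\frac{Cov_u(z,e)}{Cov_u(z,x)}.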


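Repeating what we did for the OLS estimator, we get consistency from (2) and (3):

\text{plim}\,\hat{b}=b+\frac{\text{plim}\,Cov_u(z,e)}{\text{plim}\,Cov_u(z,x)}=b+\frac{Cov(z,e)}{Cov(z,x)}=b.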

Now you can see why (2) and (3) have been imposed.
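Both steps can be illustrated numerically. Plugging (1) in (4) implies that in every sample the estimation error \hat{b}-b equals Cov_u(z,e)/Cov_u(z,x), and under (2) and (3) this ratio shrinks as the sample grows. The sketch below checks this; the data-generating process, the sample sizes and the helper name cov_u are assumptions made for illustration only.

```python
import numpy as np


def cov_u(p, q):
    """Sample covariance with 1/n normalization (hypothetical helper name)."""
    return np.mean((p - p.mean()) * (q - q.mean()))


rng = np.random.default_rng(2)
a, b = 1.0, 2.0                                # illustrative true parameters (assumed)

results = []                                   # tuples (n, b_hat, error_ratio)
for n in (1_000, 100_000, 1_000_000):
    z = rng.normal(size=n)                     # instrument satisfying (2) and (3)
    shock = rng.normal(size=n)                 # source of endogeneity in x
    x = 0.5 * z + shock + rng.normal(size=n)
    e = 2.0 * shock + rng.normal(size=n)
    y = a + b * x + e

    b_hat = cov_u(z, y) / cov_u(z, x)          # IV estimator (4)
    error_ratio = cov_u(z, e) / cov_u(z, x)    # Cov_u(z,e) / Cov_u(z,x)
    results.append((n, b_hat, error_ratio))

for n, b_hat, error_ratio in results:
    # b_hat - b and error_ratio agree up to rounding in every sample
    print(n, b_hat, b_hat - b, error_ratio)
```

In a real application e is unobserved, so error_ratio cannot be computed; it is available here only because the data are simulated.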

Remark 1. Sometimes in addition to (2) and (3) people say that the instrument should not be perfectly correlated with the regressor. This is because if z,x are perfectly correlated, then z is a linear function of x: z=c+dx with d\ne 0. Under endogeneity, Cov(x,e)\ne 0, so Cov(z,e)=dCov(x,e)\ne 0 and (3) is impossible.
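A short calculation, using only linearity of covariance, suggests what happens to the estimator itself in this case. With z=c+dx, formula (4) collapses to the OLS slope, so such an instrument brings nothing new. Here Var_u(x) stands for Cov_u(x,x), a notation assumed by analogy with Cov_u.

```latex
\hat{b}=\frac{Cov_u(z,y)}{Cov_u(z,x)}
       =\frac{Cov_u(c+dx,y)}{Cov_u(c+dx,x)}
       =\frac{d\,Cov_u(x,y)}{d\,Var_u(x)}
       =\frac{Cov_u(x,y)}{Var_u(x)}
```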

Remark 2. An instrument is not the same thing as a proxy. If x cannot be measured and we replace it by a close variable z that can be measured (which then is called a proxy for x), instead of (1) we obtain y_i=a+bz_i+e_i and the slope estimator will be the usual OLS estimator, not IV.

