8 Feb 17

## Instrumental variables estimator

The instrumental variables (IV) estimator is one of the most important alternatives to the OLS estimator.

### Preliminaries

Review What is an OLS estimator - simplified derivation. For the OLS estimator there is also a rigorous derivation; for the IV estimator, the simplified derivation is the only one available.

Review the large sample approach to the OLS estimator (second approach). We need the properties of probability limits and the conditions sufficient for consistency of the OLS estimator.

Also recall our convention for denoting sample versus population characteristics.

### Problem statement

We want to estimate the slope in the simple regression

(1) $y_i=a+bx_i+e_i$

assuming that $x_i$ is stochastic. We established that if

$Var(x)\neq 0$ (existence condition)

and

$Cov(x,e)=0$ (consistency condition)

then the OLS estimator of the slope is consistent. If the last condition is violated, that is, if $Cov(x,e)\ne 0$, then consistency is lost. This is called the endogeneity problem, and it may occur for various reasons. One of them is the omission of relevant variables: if the true model is $y_i=a+bx_i+cz_i+v_i$ with $c\ne 0$ but we erroneously assume (1), then the error in (1) is $e_i=cz_i+v_i$. Most likely, $x_i$ and $z_i$ are correlated, and then $x_i$ and $e_i$ in (1) will be correlated as well.
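The omitted-variable case can be illustrated with a small simulation. This is only a sketch: the parameter values ($a=1$, $b=2$, $c=3$), the normal distributions and the helper names `sample_cov` and `simulate_omitted_variable` are assumptions made for the illustration, not part of the derivation. With these choices $Cov(x,z)=0.5$ and $Var(x)=1.25$, so the OLS slope should settle near $b+c\,Cov(x,z)/Var(x)=3.2$ rather than near $b=2$.

```python
import random


def sample_cov(u, v):
    """Sample covariance with weights 1/n (Cov_u in the text)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    return sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v)) / n


def simulate_omitted_variable(n, a=1.0, b=2.0, c=3.0, seed=0):
    """Draw (x, y) from y = a + b*x + c*z + v, with z left unobserved."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        z_i = rng.gauss(0.0, 1.0)              # omitted regressor (illustrative)
        x_i = 0.5 * z_i + rng.gauss(0.0, 1.0)  # x is correlated with z
        v_i = rng.gauss(0.0, 1.0)              # error of the true model
        x.append(x_i)
        y.append(a + b * x_i + c * z_i + v_i)
    return x, y


x, y = simulate_omitted_variable(50_000)
b_ols = sample_cov(x, y) / sample_cov(x, x)

# Population limit of the OLS slope under these assumptions:
# b + c*Cov(x,z)/Var(x) = 2 + 3*0.5/1.25
b_limit = 2.0 + 3.0 * 0.5 / 1.25
print(f"OLS slope: {b_ols:.3f}, its limit: {b_limit:.3f}, true b: 2.000")
```

Increasing the sample size does not help here: the estimate gets closer to the wrong limit, not to $b$.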

### The easiest way to learn is by similarity

Suppose we have found a variable $z$ such that

(2) $Cov(z,x)\neq 0$ (IV existence condition)

and

(3) $Cov(z,e)=0$ (IV consistency condition).

Such a variable $z$ is called an instrument for $x$. Following the simplified derivation for the OLS estimator, plug (1) into the sample covariance of $z$ and $y$: $Cov_u(z,y)=Cov_u(z,a+bx+e)$ (using linearity of covariance) $=Cov_u(z,a)+bCov_u(z,x)+Cov_u(z,e)$ (the covariance with the constant $a$ is zero; formally letting $Cov_u(z,e)=0$) $=bCov_u(z,x)$.

Solving this for $b$ and putting a hat on it, we arrive at the IV estimator:

(4) $\hat{b}=\frac{Cov_u(z,y)}{Cov_u(z,x)}$.
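Formula (4) is easy to compute directly. The sketch below compares it with the OLS slope on simulated data in which $x$ is endogenous. The data-generating process (an unobserved factor `w` that enters both $x$ and $e$, standard normal draws, $b=2$) and all function names are assumptions chosen for the illustration. Under them $Cov(z,x)=1$ and $Cov(z,e)=0$, so $z$ satisfies (2) and (3), while the OLS slope tends to $b+Cov(x,e)/Var(x)=2+2/3$.

```python
import random


def sample_cov(u, v):
    """Sample covariance with weights 1/n (Cov_u in the text)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    return sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v)) / n


def iv_slope(z, x, y):
    """IV estimator (4): Cov_u(z, y) / Cov_u(z, x)."""
    return sample_cov(z, y) / sample_cov(z, x)


def ols_slope(x, y):
    """OLS slope estimator: Cov_u(x, y) / Var_u(x)."""
    return sample_cov(x, y) / sample_cov(x, x)


def simulate_endogenous(n, a=1.0, b=2.0, seed=0):
    """Draw (z, x, y) with an endogenous x and a valid instrument z."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        z_i = rng.gauss(0.0, 1.0)              # instrument, independent of e
        w_i = rng.gauss(0.0, 1.0)              # unobserved factor in both x and e
        x_i = z_i + w_i + rng.gauss(0.0, 1.0)  # regressor: Cov(z, x) = 1
        e_i = 2.0 * w_i + rng.gauss(0.0, 1.0)  # error: Cov(x, e) = 2, Cov(z, e) = 0
        z.append(z_i)
        x.append(x_i)
        y.append(a + b * x_i + e_i)
    return z, x, y


z, x, y = simulate_endogenous(50_000)
b_iv = iv_slope(z, x, y)
b_ols = ols_slope(x, y)
print(f"IV slope: {b_iv:.3f}, OLS slope: {b_ols:.3f}, true b: 2.000")
```

The design choice worth noting is that the instrument replaces $x$ only in the first argument of both covariances; $x$ itself must still be observed to compute the denominator.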

### Consistency

To obtain the working representation, plug (1) into (4) and use $Cov_u(z,a)=0$: $\hat{b}=\frac{Cov_u(z,a+bx+e)}{Cov_u(z,x)}$ $=\frac{Cov_u(z,a)+bCov_u(z,x)+Cov_u(z,e)}{Cov_u(z,x)}$ $=b+\frac{Cov_u(z,e)}{Cov_u(z,x)}$.

Repeating what we did for the OLS estimator, we get consistency from (2) and (3): $\text{plim}\hat{b}=\text{plim}\left[b+\frac{Cov_u(z,e)}{Cov_u(z,x)}\right]=\text{plim}b+\text{plim}\frac{Cov_u(z,e)}{Cov_u(z,x)}$ $=b+\frac{\text{plim}Cov_u(z,e)}{\text{plim}Cov_u(z,x)}=b+\frac{Cov(z,e)}{Cov(z,x)}=b.$

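Now you can see why (2) and (3) have been imposed: (2) keeps the limit of the denominator away from zero, so that the probability limit of the ratio equals the ratio of the limits, and (3) makes the limit of the numerator vanish.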
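To see what goes wrong without (3), here is a sketch with a deliberately invalid instrument. The setup is hypothetical: $z$ is contaminated by the same unobserved factor `w` that enters the error, so that in the population $Cov(z,e)=1$ and $Cov(z,x)=1.75$. The code checks two things: the working representation $\hat{b}=b+Cov_u(z,e)/Cov_u(z,x)$ holds exactly in the sample, and the estimate approaches $b+Cov(z,e)/Cov(z,x)$ instead of $b$.

```python
import random


def sample_cov(u, v):
    """Sample covariance with weights 1/n (Cov_u in the text)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    return sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v)) / n


def simulate_invalid_instrument(n, a=1.0, b=2.0, seed=0):
    """Draw (z, x, y, e) where z violates (3): Cov(z, e) = 1 in the population."""
    rng = random.Random(seed)
    z, x, y, e = [], [], [], []
    for _ in range(n):
        w_i = rng.gauss(0.0, 1.0)              # unobserved factor
        z_i = 0.5 * w_i + rng.gauss(0.0, 1.0)  # "instrument" contaminated by w
        x_i = z_i + w_i + rng.gauss(0.0, 1.0)  # regressor: Cov(z, x) = 1.75
        e_i = 2.0 * w_i + rng.gauss(0.0, 1.0)  # error, correlated with z through w
        z.append(z_i)
        x.append(x_i)
        e.append(e_i)
        y.append(a + b * x_i + e_i)
    return z, x, y, e


b = 2.0
z, x, y, e = simulate_invalid_instrument(50_000, b=b)
b_iv = sample_cov(z, y) / sample_cov(z, x)

# Working representation: the estimate equals b plus the sample ratio.
representation = b + sample_cov(z, e) / sample_cov(z, x)

# Population limit under these assumptions: b + Cov(z,e)/Cov(z,x) = 2 + 1/1.75
b_limit = b + 1.0 / 1.75
print(f"IV slope: {b_iv:.3f}, representation: {representation:.3f}, limit: {b_limit:.3f}")
```

In real data the error $e$ is not observed, so the representation is a tool for analysis, not something that can be computed; it is available here only because the data are simulated.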

Remark 1. Sometimes, in addition to (2) and (3), people require that the instrument not be perfectly correlated with the regressor. The reason is that if $z$ and $x$ are perfectly correlated, then $z$ is a linear function of $x$: $z=c+dx$ with $d\ne 0$. Then, under endogeneity, $Cov(z,e)=dCov(x,e)\ne 0$, and (3) is impossible.
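The same conclusion can be reached from the estimator itself. As a short worked check, substitute $z=c+dx$ with $d\ne 0$ into (4) and use linearity of covariance (the notation $Var_u(x)=Cov_u(x,x)$ for the sample variance is assumed here):

$\hat{b}=\frac{Cov_u(c+dx,y)}{Cov_u(c+dx,x)}=\frac{dCov_u(x,y)}{dCov_u(x,x)}=\frac{Cov_u(x,y)}{Var_u(x)}$.

With a perfectly correlated instrument the IV formula collapses to the OLS slope estimator, which is inconsistent when $Cov(x,e)\ne 0$.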

Remark 2. An instrument is not the same thing as a proxy. If $x$ cannot be measured and we replace it with a closely related variable $z$ that can be measured (which is then called a proxy for $x$), then instead of (1) we obtain $y_i=a+bz_i+e_i$, and the slope estimator will be the usual OLS estimator, not the IV estimator.