Jan 17

## Regressions with stochastic regressors 1: applying conditioning

The convenience condition states that the regressor in simple regression is deterministic. Here we look at how this assumption can be avoided using conditional expectation and variance. The general idea: check which parts of the proofs don't go through with stochastic regressors and modify the assumptions accordingly. It turns out that only the assumptions concerning the error term need to be replaced by their conditional counterparts.

### Unbiasedness in case of stochastic regressors

We consider the slope estimator for the simple regression

$y_i=a+bx_i+e_i$

assuming that $x_i$ is stochastic.

First, recall the critical representation (6) derived earlier:

(1) $\hat{b}=b+\frac{1}{n}\sum a_i(x)e_i$, where $a_i(x)=(x_i-\bar{x})/Var_u(x).$
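As a quick numerical check (a sketch using NumPy; the sample size and parameter values are made up for illustration), the representation above reproduces the textbook OLS slope exactly, for any realization of the data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, a, b = 50, 1.0, 2.0            # hypothetical true intercept and slope
x = rng.normal(size=n)            # stochastic regressor
e = rng.normal(size=n)            # errors with E(e|x) = 0
y = a + b * x + e

# Textbook OLS slope: sample covariance over sample variance
var_u = np.mean((x - x.mean()) ** 2)   # Var_u(x), uncorrected sample variance
b_hat_ols = np.mean((x - x.mean()) * (y - y.mean())) / var_u

# Representation (1): b_hat = b + (1/n) * sum_i a_i(x) e_i
a_i = (x - x.mean()) / var_u
b_hat_repr = b + np.mean(a_i * e)

print(np.isclose(b_hat_ols, b_hat_repr))  # the two expressions coincide
```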

The usual linearity of means, $E(aX + bY) = aEX + bEY$, which is applied to prove unbiasedness, doesn't work here because the coefficients are now stochastic (in other words, they are not constant). But we have generalized linearity, which for the purposes of this proof can be written as

(2) $E(a(x)S+b(x)T|x)=a(x)E(S|x)+b(x)E(T|x).$

Let us replace the unbiasedness condition by its conditional version:

A3'. Conditional unbiasedness condition: $E(e_i|x)=0$.

Then (1) and (2) give

(3) $E(\hat{b}|x)=b+\frac{1}{n}\sum a_i(x)E(e_i|x)=b,$

which can be called conditional unbiasedness. Next applying the law of iterated expectations $E[E(S|x)]=ES$ we obtain unconditional unbiasedness: $E\hat{b}=E[E(\hat{b}|x)]=Eb=b.$
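A small Monte Carlo sketch (illustrative settings, not from the text) makes this tangible: even though a fresh random $x$ is drawn in every replication, the average of $\hat{b}$ across replications settles near the true $b$, as unconditional unbiasedness predicts:

```python
import numpy as np

rng = np.random.default_rng(42)
n, a, b, reps = 30, 1.0, 2.0, 20_000   # hypothetical simulation settings

b_hats = np.empty(reps)
for r in range(reps):
    x = rng.exponential(size=n)        # stochastic, non-normal regressor
    e = rng.normal(size=n)             # satisfies A3': E(e_i|x) = 0
    y = a + b * x + e
    # OLS slope: sample covariance over uncorrected sample variance
    b_hats[r] = np.cov(x, y, bias=True)[0, 1] / np.var(x)

print(b_hats.mean())  # close to b = 2 despite the random regressor
```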

### Variance in case of stochastic regressors

As one can guess, we have to replace efficiency conditions by their conditional versions:

A4'. Conditional uncorrelatedness of errors. Assume that $E(e_ie_j|x)=0$ for all $i\ne j$.

A5'. Conditional homoscedasticity. All errors have the same conditional variances: $E(e_i^2|x)=\sigma^2$ for all $i$ ( $\sigma^2$ is a constant).

Now we can derive the conditional variance expression, using properties from this post:

$Var(\hat{b}|x)=Var\left(b+\frac{1}{n}\sum_i a_i(x)e_i\Big|x\right)$

(dropping a constant doesn't affect variance)

$=Var\left(\frac{1}{n}\sum_i a_i(x)e_i\Big|x\right)$

(for conditionally uncorrelated variables, conditional variance is additive)

$=\sum_i Var\left(\frac{1}{n}a_i(x)e_i\Big|x\right)$

(conditional variance is homogeneous of degree 2)

$=\frac{1}{n^2}\sum_i a_i^2(x)Var(e_i|x)$

(applying conditional homoscedasticity)

$=\frac{1}{n^2}\sum_i a_i^2(x)\sigma^2$

(plugging in $a_i(x)$)

$=\frac{1}{n^2}\sum_i(x_i-\bar{x})^2\sigma^2/Var^2_u(x)$

(using the notation of sample variance, $\sum_i(x_i-\bar{x})^2=nVar_u(x)$)

(4) $=\frac{1}{n}Var_u(x)\sigma^2/Var^2_u(x)=\sigma^2/(nVar_u(x)).$
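The conditional variance formula (4) can be checked by simulation (a sketch with made-up parameter values): draw $x$ once and hold it fixed, redraw only the errors, and compare the spread of $\hat{b}$ with $\sigma^2/(nVar_u(x))$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma, b, reps = 25, 0.8, 2.0, 50_000   # hypothetical settings

x = rng.uniform(size=n)                    # draw x once, then hold it fixed
var_u = np.var(x)                          # Var_u(x), uncorrected sample variance

b_hats = np.empty(reps)
for r in range(reps):
    e = rng.normal(scale=sigma, size=n)    # fresh errors, x fixed: conditioning on x
    y = b * x + e                          # intercept set to 0 for brevity
    b_hats[r] = np.mean((x - x.mean()) * (y - y.mean())) / var_u

print(b_hats.var(), sigma**2 / (n * var_u))  # simulated vs. formula (4)
```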

Finally, using the law of total variance $Var(S)=Var(E(S|x))+E[Var(S|x)]$ and equations (3) and (4) we obtain

(5) $Var(\hat{b})=Var(b)+E[\sigma^2/(nVar_u(x))]=\frac{\sigma^2}{n}E\left[\frac{1}{Var_u(x)}\right]$

(here $Var(b)=0$ because $b$ is a constant).
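Formula (5) can also be verified numerically (a sketch under assumed parameter values): now both $x$ and the errors are redrawn in every replication, and the simulated unconditional variance of $\hat{b}$ is compared with $\frac{\sigma^2}{n}E[1/Var_u(x)]$, where the expectation is itself estimated by averaging over the replications:

```python
import numpy as np

rng = np.random.default_rng(7)
n, sigma, b, reps = 20, 1.5, 2.0, 40_000   # hypothetical settings

b_hats = np.empty(reps)
inv_var_u = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)                 # x is redrawn: fully stochastic regressor
    e = rng.normal(scale=sigma, size=n)    # A4'-A5': uncorrelated, homoscedastic given x
    y = b * x + e                          # intercept set to 0 for brevity
    var_u = np.var(x)                      # Var_u(x)
    b_hats[r] = np.mean((x - x.mean()) * (y - y.mean())) / var_u
    inv_var_u[r] = 1.0 / var_u

lhs = b_hats.var()                         # simulated Var(b_hat)
rhs = sigma**2 / n * inv_var_u.mean()      # (sigma^2 / n) * E[1 / Var_u(x)]
print(lhs, rhs)                            # the two numbers should be close
```

Note that the averaging of $1/Var_u(x)$ over replications is exactly the expectation that distinguishes (5) from the deterministic-regressor case.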

### Conclusion

Replacing the three assumptions about the error term by their conditional counterparts allows us to obtain almost perfect analogs of the usual properties of OLS estimators: the usual (unconditional) unbiasedness, plus the estimator variance, in which the part containing the regressor is averaged to account for its randomness. If you think that solving the problem of stochastic regressors requires nothing more than a couple of mathematical tricks, I agree with you.