Regressions with stochastic regressors 2: two approaches
We consider the slope estimator for the simple regression
$y_i = a + bx_i + e_i$
assuming that $x_i$ is stochastic.
First approach: the sample size is fixed. The unbiasedness and efficiency conditions are replaced by their analogs conditioned on $x$. The outcome is that the slope estimator is unbiased and its variance is the average (over the distribution of $x$) of the variance we have in the case of a deterministic regressor. See that post for the details.
Second approach: the sample size goes to infinity. The main tools used are the properties of probability limits and laws of large numbers. The outcome is that, in the limit, the sample characteristics are replaced by their population cousins and the slope estimator is consistent. This is what we focus on here.
A brush-up on convergence in probability
Review the intuition and formal definition. This is the summary:
Fact 1. Convergence in probability (which applies to sequences of random variables) is a generalization of the notion of convergence of number sequences. In particular, if $\{a_n\}$ is a numerical sequence that converges to a number $a$, $\lim_{n\to\infty}a_n=a$, then, treating $a_n$ as a random variable, we have convergence in probability $\text{plim}_{n\to\infty}a_n=a$.
Fact 2. For those who are familiar with the theory of limits of numerical sequences, from the previous fact it should be clear that convergence in probability preserves arithmetic operations. That is, for any sequences of random variables $\{X_n\},\{Y_n\}$ such that the limits $\text{plim}\,X_n$ and $\text{plim}\,Y_n$ exist, we have

$\text{plim}(X_n\pm Y_n)=\text{plim}\,X_n\pm\text{plim}\,Y_n,\qquad \text{plim}(X_nY_n)=(\text{plim}\,X_n)(\text{plim}\,Y_n),$

and if $\text{plim}\,Y_n\neq 0$ then

$\text{plim}\dfrac{X_n}{Y_n}=\dfrac{\text{plim}\,X_n}{\text{plim}\,Y_n}.$

This makes convergence in probability very handy. Convergence in distribution doesn't have such properties.
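To make Fact 2 concrete, here is a minimal simulation sketch (not part of the original post; the names $X_n$, $Y_n$ and the uniform distributions are my own choices). Sample means play the role of $X_n$ and $Y_n$, so their sums, products and ratios should approach the corresponding combinations of the population means as $n$ grows.

```python
# Minimal illustration of Fact 2 (the setup below is a made-up example):
# X_n and Y_n are sample means, so plim X_n = E[U] and plim Y_n = E[V] by the
# law of large numbers, and sums, products and ratios should settle near the
# corresponding combinations of E[U] and E[V] as n grows.
import numpy as np

rng = np.random.default_rng(0)

for n in (10, 1_000, 100_000):
    U = rng.uniform(0.0, 2.0, size=n)   # E[U] = 1
    V = rng.uniform(1.0, 3.0, size=n)   # E[V] = 2, bounded away from 0
    X_n, Y_n = U.mean(), V.mean()
    print(n, X_n + Y_n, X_n * Y_n, X_n / Y_n)  # approach 3, 2 and 0.5
```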
A brush-up on laws of large numbers
See the site map for several posts about this. Here we apply the Chebyshev inequality to prove the law of large numbers for sample means. A generalization is given in the Theorem at the end of that post. Here is a further intuitive generalization:
Normally, unbiased sample characteristics converge in probability to their population counterparts.
Example 1. We know that the sample variance $s^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2$ unbiasedly estimates the population variance $\sigma^2$: $Es^2=\sigma^2$. The intuitive generalization says that then

(1) $\text{plim}\,s^2=\sigma^2$.
Here I argue that, for the purposes of obtaining some identities from the general properties of means, instead of the sample variance it's better to use the variance defined by $\widehat{\text{Var}}(X)=\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})^2$ (with division by $n$ instead of $n-1$). Using Facts 1 and 2 we get from (1) that

(2) $\text{plim}\,\widehat{\text{Var}}(X)=\text{plim}\,\frac{n-1}{n}s^2=\lim\frac{n-1}{n}\cdot\text{plim}\,s^2=\sigma^2$

(the sample variance converges in probability to the population variance). Here we use $\lim_{n\to\infty}\frac{n-1}{n}=1$.
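A quick numerical check of (1) and (2) may help (this sketch is mine, not the post's; the normal distribution with $\sigma^2=4$ is an arbitrary choice for the demo): both versions of the sample variance drift toward the population variance as $n$ grows.

```python
# Sketch illustrating (1)-(2): the unbiased sample variance (division by n-1)
# and the version with division by n both converge to the population variance.
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 4.0  # population variance chosen for the illustration

for n in (10, 1_000, 100_000):
    X = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=n)
    s2_unbiased = X.var(ddof=1)   # division by n - 1
    var_n = X.var(ddof=0)         # division by n
    print(n, s2_unbiased, var_n)  # both approach sigma2 = 4
```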
Example 2. Similarly, the sample covariance converges in probability to the population covariance:

(3) $\text{plim}\,\widehat{\text{Cov}}(X,Y)=\text{Cov}(X,Y),$

where by definition $\widehat{\text{Cov}}(X,Y)=\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})(Y_i-\bar{Y})$.
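The same kind of check works for (3); the bivariate setup below is a made-up example of mine in which $Y=2X+\text{noise}$, so $\text{Cov}(X,Y)=2\,\text{Var}(X)=2$.

```python
# Sketch illustrating (3): the sample covariance (division by n) approaches
# the population covariance, here Cov(X, Y) = 2 by construction.
import numpy as np

rng = np.random.default_rng(2)

for n in (10, 1_000, 100_000):
    X = rng.normal(size=n)             # Var(X) = 1
    Y = 2.0 * X + rng.normal(size=n)   # Cov(X, Y) = 2
    cov_hat = np.mean((X - X.mean()) * (Y - Y.mean()))  # division by n
    print(n, cov_hat)                  # approaches 2
```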
Proving consistency of the slope estimator
Here (see equation (5)) I derived the representation of the OLS estimator of the slope:

$\hat{b}=b+\dfrac{\widehat{\text{Cov}}(x,e)}{\widehat{\text{Var}}(x)}.$
Using preservation of arithmetic operations for convergence in probability, we get

(4) $\text{plim}\,\hat{b}=b+\dfrac{\text{plim}\,\widehat{\text{Cov}}(x,e)}{\text{plim}\,\widehat{\text{Var}}(x)}=b+\dfrac{\text{Cov}(x,e)}{\text{Var}(x)}.$

In the last equality we used (2) and (3). From (4) we see what conditions should be imposed for the slope estimator to converge to a spike at the true slope:
$\text{Var}(x)\neq 0$ (existence condition)

and

$\text{Cov}(x,e)=0$ (consistency condition).
Under these conditions, we have $\text{plim}\,\hat{b}=b$ (this is called consistency).
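A short Monte Carlo sketch can make the consistency statement tangible (this is my own illustration; the data-generating values $a=1$, $b=0.5$ and the standard normal $x$ and $e$ are assumptions, not taken from the post). Since $e$ is generated independently of $x$, the consistency condition holds and $\hat{b}$ should concentrate around $b$ as $n$ grows.

```python
# Monte Carlo sketch of consistency: with Var(x) != 0 and Cov(x, e) = 0,
# the OLS slope estimator concentrates around the true slope b as n grows.
import numpy as np

rng = np.random.default_rng(3)
a, b = 1.0, 0.5  # true intercept and slope chosen for the illustration

for n in (10, 1_000, 100_000):
    x = rng.normal(size=n)   # stochastic regressor, Var(x) = 1
    e = rng.normal(size=n)   # error independent of x, so Cov(x, e) = 0
    y = a + b * x + e
    # OLS slope as sample covariance over sample variance
    b_hat = np.mean((x - x.mean()) * (y - y.mean())) / np.mean((x - x.mean()) ** 2)
    print(n, b_hat)          # approaches b = 0.5
```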
Conclusion. In a way, the second approach is technically simpler than the first.