Binary choice models: theoretical obstacles (problems with the linear probability model and binary choice models)
What's wrong with the linear probability model
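Recall the problem statement: the dependent variable $y_i$ can take only two values, 0 or 1, and the independent variables are joined into the index $\theta_i = x_i\beta$. The linear probability model

$$P(y_i = 1) = \theta_i, \qquad P(y_i = 0) = 1 - \theta_i \tag{1}$$

is equivalently written in linear regression form as

$$y_i = \theta_i + u_i \tag{2}$$

with $u_i = y_i - \theta_i$.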
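Here is a minimal simulation sketch of the linear probability model and its regression form. It assumes that the hypothetical index values lie in [0, 1]; the function name `simulate_lpm` and the chosen values are for illustration only. Each $y_i$ is drawn with $P(y_i = 1) = \theta_i$, and the implied error $u_i = y_i - \theta_i$ can only equal $1 - \theta_i$ or $-\theta_i$.

```python
import random

def simulate_lpm(thetas, seed=0):
    """Draw y_i from the linear probability model and return the pairs (y_i, u_i),
    where u_i = y_i - theta_i is the regression error.
    Assumes every index value theta_i lies in [0, 1]."""
    rng = random.Random(seed)
    draws = []
    for theta in thetas:
        y = 1 if rng.random() < theta else 0  # P(y_i = 1) = theta_i
        u = y - theta                          # u_i = y_i - theta_i
        draws.append((y, u))
    return draws

# Hypothetical index values for three observations
thetas = [0.2, 0.5, 0.9]
for theta, (y, u) in zip(thetas, simulate_lpm(thetas)):
    # u is 1 - theta when y = 1 and -theta when y = 0
    print(f"theta={theta}, y={y}, u={u:+.2f}")
```

Whatever the seed, every printed error is one of the two values $1 - \theta_i$ or $-\theta_i$.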
Let's study the error term. If $y_i = 1$, then by (2) the value of $u_i$ is $1 - \theta_i$, and by (1) the probability of this event is $\theta_i$. If $y_i = 0$, then the value of $u_i$ is $-\theta_i$, and by (1) the probability of this event is $1 - \theta_i$. We can summarize this information in a table:
| Values of $u_i$ | Corresponding probabilities |
|---|---|
| $1 - \theta_i$ | $\theta_i$ |
| $-\theta_i$ | $1 - \theta_i$ |
For each observation, the error $u_i$ takes only two values, so it is a discrete variable. In particular, it is not continuous, much less normal. Since the index $\theta_i$ changes from one observation to another, the errors are not identically distributed.
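It's easy to find the mean and variance of $u_i$. The mean is

$$Eu_i = (1 - \theta_i)\,\theta_i + (-\theta_i)(1 - \theta_i) = 0$$

(this is good). Since $u_i = y_i - \theta_i$ differs from the Bernoulli variable $y_i$ by a constant, its variance is

$$\operatorname{Var}(u_i) = \operatorname{Var}(y_i) = \theta_i(1 - \theta_i),$$

which is bad: it changes with $\theta_i$, so the errors are heteroscedastic. Besides, for this variance to be positive, the index $\theta_i$ must stay strictly between 0 and 1. The linear index does not enforce this restriction.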
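You can check the mean and variance mechanically from the table of error values. This sketch uses hypothetical helper names. It encodes the two-point distribution of $u_i$ for a given $\theta_i$ and computes its moments exactly. The results show that the variance $\theta_i(1 - \theta_i)$ differs across observations. It also turns negative once $\theta_i$ leaves [0, 1], where the table no longer describes genuine probabilities.

```python
def error_distribution(theta):
    """Two-point distribution of u_i from the table: (value, probability) pairs."""
    return [(1 - theta, theta), (-theta, 1 - theta)]

def error_moments(theta):
    """Exact mean and variance of u_i computed directly from its distribution."""
    dist = error_distribution(theta)
    mean = sum(value * prob for value, prob in dist)
    var = sum((value - mean) ** 2 * prob for value, prob in dist)
    return mean, var

# Different observations have different index values, hence different error variances
for theta in [0.1, 0.5, 0.9]:
    mean, var = error_moments(theta)
    print(f"theta={theta}: mean={mean:.1e}, var={var:.4f}, "
          f"theta*(1-theta)={theta * (1 - theta):.4f}")

# An index outside [0, 1] makes 1 - theta a negative "probability"
# and the computed "variance" negative as well
print(error_moments(1.2))
```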
Why binary choice models have no error term
We know the general specification of a binary choice model:

$$P(y_i = 1) = F(\theta_i).$$

Here $F$ is the distribution function of some random variable, say $v$. Let's see what happens if we include the error term, as in

$$P(y_i = 1) = F(\theta_i + u_i). \tag{3}$$
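As a first approximation, it is natural to consider identically distributed errors $u_i$ that are independent of $v$. By definition,

$$F(\theta_i + u_i) = P(v \le \theta_i + u_i) = P(v - u_i \le \theta_i). \tag{4}$$

The variables $v - u_i$ are then identically distributed. Denote their common distribution function by $G$. From (3) and (4) we have

$$P(y_i = 1) = G(\theta_i).$$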
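The passage from (3) and (4) to the new distribution function can be spelled out. This uses the assumption, implicit in the argument, that $v$ is independent of $u_i$. Write $G$ for the common distribution function of $v - u_i$. Write $H$ for the common distribution function of the errors; $H$ is a symbol introduced only for this step. Conditioning on $u_i$ gives

$$G(t) = P(v - u_i \le t) = E\big[P(v \le t + u_i \mid u_i)\big] = E\big[F(t + u_i)\big] = \int_{-\infty}^{\infty} F(t + s)\,dH(s).$$

So $G$ is an average of shifted copies of $F$, and for nondegenerate errors it generally differs from $F$.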
Thus, including the error term as in (3) simply replaces the distribution function in the model specification with a different one. In probit and logit we fix convenient distribution functions from the very beginning: the standard normal and the logistic, respectively. We don't want to change them by introducing errors whose distribution may be inconvenient.
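A small numerical sketch makes this concrete. It relies on three assumptions made purely for illustration: $F$ is the standard logistic distribution function (as in logit), the errors are $u_i \sim N(0, \sigma^2)$, and they are independent of $v$. Conditioning on $u_i$ gives $P(v - u_i \le \theta \mid u_i) = F(\theta + u_i)$. So the new distribution function $G(\theta)$ can be estimated by averaging $F(\theta + u_i)$ over simulated errors. The names `logistic_cdf`, `G_estimate` and the default $\sigma = 1$ are hypothetical.

```python
import math
import random

def logistic_cdf(t):
    """F: standard logistic distribution function, as used in logit."""
    return 1.0 / (1.0 + math.exp(-t))

def G_estimate(theta, sigma=1.0, n=100_000, seed=1):
    """Monte Carlo estimate of G(theta) = P(v - u <= theta).

    Illustrative assumptions: v has distribution function F = logistic_cdf,
    u ~ N(0, sigma^2), and v is independent of u. Since
    P(v - u <= theta | u) = F(theta + u), G(theta) is the average of F(theta + u).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = rng.gauss(0.0, sigma)  # one simulated error
        total += logistic_cdf(theta + u)
    return total / n

for theta in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(f"theta={theta:+.1f}  F={logistic_cdf(theta):.3f}  G~{G_estimate(theta):.3f}")
```

Under these assumptions the estimated $G$ should sit closer to 1/2 than $F$ away from $\theta = 0$, because $u_i$ adds spread to $v$. The resulting link function is no longer the logistic one, even though the added errors are normal.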