Binary choice models: theoretical obstacles

### What's wrong with the linear probability model

Recall the problem statement: the dependent variable $y_i$ can take only two values, 0 or 1, and the independent variables are combined into the index $x_i\beta$. The linear probability model

(1) $P(y_i = 1) = x_i\beta$

is equivalently written in linear regression form as

(2) $y_i = x_i\beta + u_i$, with $Eu_i = 0$.

Let's study the error term. If $y_i = 1$, from (2) the value of $u_i$ is $1 - x_i\beta$, and from (1) the probability of this event is $x_i\beta$. If $y_i = 0$, then the value of $u_i$ is $-x_i\beta$, and by (1) the probability of this event is $1 - x_i\beta$. We can summarize this information in a table:

| Values of $u_i$ | Corresponding probabilities |
|---|---|
| $1 - x_i\beta$ | $x_i\beta$ |
| $-x_i\beta$ | $1 - x_i\beta$ |

For each observation, *the error is a binary variable*. In particular, it is not continuous, much less normal. Since the index $x_i\beta$ changes with the observation, *the errors are not identically distributed*.
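The two-point nature of the error is easy to see in simulation. Below is a minimal sketch (assuming NumPy is available; the index values are hypothetical) that draws $y_i$ from the model and checks that each error $u_i = y_i - x_i\beta$ takes one of exactly two values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical index values p_i = x_i * beta, kept inside (0, 1)
p = np.array([0.2, 0.5, 0.8])

# Draw y_i ~ Bernoulli(p_i) and form the LPM error u_i = y_i - p_i
y = (rng.random(p.size) < p).astype(float)
u = y - p

# Each error is either 1 - p_i (when y_i = 1) or -p_i (when y_i = 0):
# a two-point distribution, not continuous and certainly not normal
for p_i, u_i in zip(p, u):
    assert np.isclose(u_i, 1 - p_i) or np.isclose(u_i, -p_i)
```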

It's easy to find the mean and variance of $u_i$. The mean is

$Eu_i = (1 - x_i\beta)\,x_i\beta + (-x_i\beta)(1 - x_i\beta) = 0$

(this is good). The variance is

$V(u_i) = Eu_i^2 = (1 - x_i\beta)^2\,x_i\beta + (x_i\beta)^2(1 - x_i\beta) = x_i\beta(1 - x_i\beta),$

which is bad (*heteroscedasticity*): the variance changes with the observation. Besides, for this variance to be positive, the index $x_i\beta$ should stay between 0 and 1.
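Both moment formulas can be checked by Monte Carlo. A minimal sketch (NumPy assumed; the value of the index is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
p = 0.3  # hypothetical value of the index x_i * beta

# Draw y_i ~ Bernoulli(p) and form the LPM error u_i = y_i - p
y = (rng.random(n) < p).astype(float)
u = y - p

# Sample moments should be close to Eu = 0 and V(u) = p(1 - p) = 0.21
print(u.mean())  # close to 0
print(u.var())   # close to 0.21
```

Repeating this with several values of `p` shows the variance moving with the index, which is exactly the heteroscedasticity noted above.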

### Why there is no error term in binary choice models

We know the general specification of a binary choice model:

$P(y_i = 1) = F(x_i\beta)$.

Here $F$ is the distribution function of some variable, say $z$, so that $F(t) = P(z \le t)$. Let's see what happens if we include an error term, as in

(3) $P(y_i = 1) = F(x_i\beta + u_i)$.

It is natural, as a first approximation, to consider identically distributed errors. By definition,

(4) $F(x_i\beta + u_i) = P(z \le x_i\beta + u_i) = P(z - u_i \le x_i\beta)$.

The variables $z - u_i$ are distributed identically. Denoting by $G$ their common distribution function, from (3) and (4) we have

$P(y_i = 1) = G(x_i\beta)$.

Thus, including the error term as in (3) merely replaces one distribution function by another in the model specification. In probit and logit, we fix good distribution functions (the standard normal and the logistic, respectively) from the very beginning and don't want to change them by introducing (possibly bad) errors.
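The collapse of the error term into a new distribution function can be illustrated numerically. In the sketch below the probit choice $z \sim N(0,1)$ and the error law $u_i \sim N(0,1)$, independent of $z$, are illustrative assumptions; then $z - u_i \sim N(0, 2)$, so the model with the error is again a binary choice model, just with a different distribution function $G$:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
index = 0.7  # hypothetical value of x_i * beta

# "Probit with an error inside F": P(y_i = 1) = P(z <= index + u_i),
# with z ~ N(0, 1) and u_i ~ N(0, 1) independent
z = rng.standard_normal(n)
u = rng.standard_normal(n)
mc = np.mean(z <= index + u)

# Same probability as G(index), where G is the cdf of z - u ~ N(0, 2)
def G(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0 * 2.0)))

# The two probabilities agree up to simulation noise
print(mc, G(index))
```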