Mar 17

Binary choice models: theoretical obstacles (problems with the linear probability model and binary choice models)

What's wrong with the linear probability model

Recall the problem statement: the dependent variable y_i can take only two values, 0 or 1, and the independent variables are combined into the index Index_i=\beta_0+\beta_1x_{1i}+...+\beta_kx_{ki}. The linear probability model

(1) P(y_i=1|x_i)=Index_i

is equivalently written in linear regression form as

(2) y_i=Index_i+u_i with E(u_i|x_i)=0.

Let's study the error term. If y_i=1, from (2) the value of u_i is u_i=y_i-Index_i=1-Index_i. From (1) we know the probability of this event. If y_i=0, then the value of u_i is u_i=y_i-Index_i=-Index_i and by (1) the probability of this event is 1-P(y_i=1|x_i)=1-Index_i. We can summarize this information in a table:

Table 1. Error properties
Values of u_i    Corresponding probabilities
1-Index_i        Index_i
-Index_i         1-Index_i

For each observation, the error is a binary variable. In particular, it's not continuous, much less normal. Since the index changes with the observation, the errors are not identically distributed.
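To make Table 1 concrete, here is a minimal sketch that builds the two-point distribution of the error for two observations. The index values 0.2 and 0.7 are hypothetical, chosen only for illustration, and the function name is not from the original text.

```python
def error_distribution(index):
    """Two-point distribution of u_i from Table 1 as (value, probability) pairs."""
    return [(1 - index, index), (-index, 1 - index)]

# Hypothetical index values for two different observations
for index in (0.2, 0.7):
    print(f"Index_i = {index}")
    for value, prob in error_distribution(index):
        print(f"  u_i = {value:+.2f} with probability {prob:.2f}")
```

Both the values and the probabilities change with the index, which is why the errors cannot be identically distributed across observations.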

It's easy to find the mean and variance of u_i. By Table 1, the mean is

Eu_i=(1-Index_i)Index_i+(-Index_i)(1-Index_i)=0

(this is good). The variance is

Var(u_i)=Eu_i^2-(Eu_i)^2=(1-Index_i)^2Index_i+(Index_i)^2(1-Index_i)=(1-Index_i)Index_i[(1-Index_i)+Index_i]=(1-Index_i)Index_i,

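which is bad: the variance changes with the observation (heteroscedasticity). Besides, for this variance to be positive, the index should stay strictly between 0 and 1, and nothing in the linear specification guarantees that.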
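The moments can be checked numerically straight from Table 1. The sketch below is only an illustration: the function name and the index values are assumptions, and the last value (1.2) is deliberately outside [0, 1] to show what goes wrong there.

```python
def error_moments(index):
    """Mean and variance of u_i computed from the two rows of Table 1."""
    values = (1 - index, -index)
    probs = (index, 1 - index)  # valid probabilities only if 0 <= index <= 1
    mean = sum(v * p for v, p in zip(values, probs))
    second_moment = sum(v ** 2 * p for v, p in zip(values, probs))
    return mean, second_moment - mean ** 2

# Hypothetical index values; 1.2 violates the requirement 0 <= Index_i <= 1
for index in (0.1, 0.5, 0.9, 1.2):
    mean, var = error_moments(index)
    print(f"Index_i = {index}: mean = {mean:.4f}, "
          f"variance = {var:.4f}, (1-Index_i)Index_i = {(1 - index) * index:.4f}")
```

The variance differs across the first three values, and for the index 1.2 the formula returns a negative number, which is not a valid variance.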

Why there is no error term in binary choice models

We know the general specification of a binary choice model:

P(y_i=1|x_i)=F(Index_i).

Here F is the distribution function of some random variable, say X. Let's see what happens if we include the error term, as in

(3) P(y_i=1|x_i)=F(Index_i+u_i).

It is natural, as a first approximation, to consider identically distributed errors. By definition,

(4) F(Index_i+u_i)=P(X\le Index_i+u_i)=P(X-u_i\le Index_i).

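If the errors u_i are identically distributed and independent of X, the variables Z_i=X-u_i are identically distributed. Denoting by Z a variable with their common distribution and by F_Z its distribution function, from (3) and (4) we have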

P(y_i=1|x_i)=P(X-u_i\le Index_i)=P(Z\le Index_i)=F_Z(Index_i).

Thus, including the error term in (3) only replaces the distribution function F in the model specification with another one, F_Z. In probit and logit, we fix good distribution functions from the very beginning and don't want to change them by introducing (possibly bad) errors.
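A small simulation illustrates the change of distribution function. It is a sketch under assumptions that are not in the text: X is standard normal (as in probit), the error u_i is normal with mean 0 and standard deviation 1.5, and the two are independent, so Z=X-u_i is normal with variance 1+1.5^2. The index value 0.8 and all function names are hypothetical.

```python
import math
import random

def normal_cdf(z, sd=1.0):
    """Distribution function of a N(0, sd^2) variable."""
    return 0.5 * (1.0 + math.erf(z / (sd * math.sqrt(2.0))))

def simulated_prob(index, error_sd, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(X - u <= index), X ~ N(0,1), u ~ N(0, error_sd^2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        x = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, error_sd)
        if x - u <= index:
            hits += 1
    return hits / n_draws

index, error_sd = 0.8, 1.5  # assumed values, for illustration only
freq = simulated_prob(index, error_sd)
f_original = normal_cdf(index)                                  # F(Index_i)
f_z = normal_cdf(index, sd=math.sqrt(1.0 + error_sd ** 2))      # F_Z(Index_i)
print(f"simulated P(X - u <= Index) = {freq:.3f}")
print(f"F(Index)   = {f_original:.3f}")
print(f"F_Z(Index) = {f_z:.3f}")
```

The simulated frequency should be close to F_Z(Index_i) and visibly different from F(Index_i): with the error included, the model is no longer a probit with the standard normal distribution function.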