23 Mar 17

## Binary choice models: theoretical obstacles

This post looks at problems with the linear probability model and with binary choice models.

### What's wrong with the linear probability model

Recall the problem statement: the dependent variable $y_i$ takes only two values, 0 or 1, and the independent variables are combined into the index $Index_i=\beta_0+\beta_1x_{1i}+\dots+\beta_kx_{ki}$. The linear probability model

(1) $P(y_i=1|x_i)=Index_i$

is equivalently written in linear regression form as

(2) $y_i=Index_i+u_i$ with $E(u_i|x_i)=0$.
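To see why (1) and (2) say the same thing, it may help to spell out the step in between. Since $y_i$ takes only the values 0 and 1, its conditional mean equals the conditional probability of the value 1:

$$E(y_i|x_i)=1\cdot P(y_i=1|x_i)+0\cdot P(y_i=0|x_i)=P(y_i=1|x_i)=Index_i.$$

Because $Index_i$ depends only on $x_i$, defining $u_i=y_i-Index_i$ gives $E(u_i|x_i)=E(y_i|x_i)-Index_i=0$, which is (2).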

Let's study the error term. If $y_i=1$, then from (2) the value of $u_i$ is $u_i=y_i-Index_i=1-Index_i$, and by (1) the probability of this event is $Index_i$. If $y_i=0$, then the value of $u_i$ is $u_i=y_i-Index_i=-Index_i$, and by (1) the probability of this event is $1-P(y_i=1|x_i)=1-Index_i$. We can summarize this information in a table:

| Values of $u_i$ | Corresponding probabilities |
|---|---|
| $1-Index_i$ | $Index_i$ |
| $-Index_i$ | $1-Index_i$ |

For each observation, the error is a binary variable. In particular, it's not continuous, much less normal. Since the index changes with the observation, the errors are not identically distributed.

It's easy to find the mean and variance of $u_i$. The mean is $Eu_i=(1-Index_i)Index_i-Index_i(1-Index_i)=0$

(this is good). The variance is $Var(u_i)=Eu_i^2-(Eu_i)^2=(1-Index_i)^2Index_i+(Index_i)^2(1-Index_i)= (1-Index_i)Index_i[(1-Index_i)+(Index_i)]=(1-Index_i)Index_i,$

which is bad (heteroscedasticity). Besides, for this variance to be nonnegative, the index must stay between 0 and 1, which a linear function of the regressors does not guarantee.
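The mean and variance derived above can be checked numerically straight from the two-point distribution of $u_i$. The following is a minimal sketch; the function name `lpm_error_moments` and the sample index values are chosen here for illustration only.

```python
def lpm_error_moments(index):
    """Return (mean, variance) of the LPM error u for a given index value.

    The error takes the value 1 - index with probability index (y = 1)
    and the value -index with probability 1 - index (y = 0).
    """
    if not 0.0 <= index <= 1.0:
        raise ValueError("index must lie in [0, 1] to be a probability")
    values = (1.0 - index, -index)   # u when y = 1, u when y = 0
    probs = (index, 1.0 - index)     # P(y = 1 | x), P(y = 0 | x)
    mean = sum(v * p for v, p in zip(values, probs))
    second_moment = sum(v * v * p for v, p in zip(values, probs))
    return mean, second_moment - mean ** 2


# Illustrative index values: the variance changes with the index.
for index in (0.1, 0.5, 0.9):
    mean, var = lpm_error_moments(index)
    print(f"index={index}: mean={mean:.3f}, variance={var:.3f}, "
          f"formula={index * (1 - index):.3f}")
```

The variance is largest at an index of 0.5 and shrinks toward the endpoints, which is the heteroscedasticity in question. An index outside $[0,1]$ is rejected because it cannot be a probability.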

### Why there is no error term in binary choice models

We know the general specification of a binary choice model: $P(y_i=1|x_i)=F(Index_i).$

Here $F$ is the distribution function of some random variable, say $X$. Let's see what happens if we include an error term, as in

(3) $P(y_i=1|x_i)=F(Index_i+u_i).$

It is natural, as a first approximation, to consider identically distributed errors. By definition,

(4) $F(Index_i+u_i)=P(X\le Index_i+u_i)=P(X-u_i\le Index_i)$.

If $X$ is independent of the errors, the variables $Z_i=X-u_i$ are identically distributed. Denoting by $Z$ a variable with their common distribution and by $F_Z$ its distribution function, from (3) and (4) we have $P(y_i=1|x_i)=P(X-u_i\le Index_i)=P(Z\le Index_i)=F_Z(Index_i)$.

Thus, including the error term in (3) amounts to changing the distribution function in the model specification. In probit and logit, we fix good distribution functions from the very beginning and don't want to change them by introducing (possibly bad) errors.
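As an illustration of this change of distribution function, here is a small sketch under assumptions that are not in the text: $X$ is standard normal (the probit case), the error $u$ is normal with mean 0 and standard deviation `sigma_u`, and $X$ and $u$ are independent. Then $Z=X-u$ is normal with variance $1+\sigma_u^2$, so $F_Z$ is no longer the standard normal distribution function. The function names and the numbers 0.8 and 1.5 are hypothetical.

```python
import math
import random


def normal_cdf(z, sd=1.0):
    """Distribution function of N(0, sd^2) evaluated at z."""
    return 0.5 * (1.0 + math.erf(z / (sd * math.sqrt(2.0))))


def simulated_cdf_of_z(index, sigma_u, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(X - u <= index).

    Assumes X ~ N(0, 1) and u ~ N(0, sigma_u^2), drawn independently.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        x = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, sigma_u)
        if x - u <= index:
            hits += 1
    return hits / n_draws


index, sigma_u = 0.8, 1.5                 # illustrative values
sd_z = math.sqrt(1.0 + sigma_u ** 2)      # standard deviation of Z = X - u

print("F(index), original probit:", normal_cdf(index))
print("F_Z(index), closed form:  ", normal_cdf(index, sd_z))
print("F_Z(index), simulated:    ", simulated_cdf_of_z(index, sigma_u))
```

The simulated value should agree with the closed form for $F_Z$ up to Monte Carlo error, and both differ from $F(Index_i)$: the error term has been absorbed into a different distribution function.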