20 Mar 17

Binary choice models

Problem statement

Consider the classical problem of why people choose to drive or use public transportation to go to their jobs. For a given individual i, we observe the decision variable y_i, which takes the value 0 (drive) or 1 (use public transportation), and various variables that impact the decision, such as costs, convenience, availability of parking, etc. We denote the independent variables x_1,...,x_k and their values for a given individual by x_i=(x_{1i},...,x_{ki}). For convenience, the usual expression \beta_0+\beta_1x_{1i}+...+\beta_kx_{ki} that arises on the right-hand side of a multiple regression is called an index and denoted Index_i.

We want to study how people's decisions y_i depend on the index Index_i.
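
As a small numerical illustration (the coefficients and data below are made up, and NumPy is assumed available just for the arithmetic), the index is simply a linear combination of the regressor values for each individual:

```python
import numpy as np

# Hypothetical illustration: k = 2 regressors, n = 5 individuals.
rng = np.random.default_rng(0)
n, k = 5, 2
X = rng.normal(size=(n, k))               # values x_{1i}, x_{2i} for each individual i
beta0, beta = 0.5, np.array([1.0, -2.0])  # made-up coefficients

# Index_i = beta_0 + beta_1 x_{1i} + ... + beta_k x_{ki}
index = beta0 + X @ beta
print(index)                              # one index value per individual
```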

Linear probability model

If you are familiar with multiple regression, the first idea that comes to mind is

(1) y_i=Index_i+u_i.

This turns out to be a bad idea because the range of the variable on the left is bounded and the range of the index is not. Whenever there is a discrepancy between the bounded decision variable and the unbounded index, it has to be made up for by the error term. Thus the error term will certainly be badly behaved. (A detailed analysis shows that it will be heteroscedastic, but this fact is less important than the problem with range boundedness.)
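
A quick simulation makes the range problem concrete. The sketch below (hypothetical data; statsmodels assumed available for OLS) generates binary decisions and fits (1) by least squares; some of the fitted "probabilities" typically fall outside [0,1]:

```python
import numpy as np
import statsmodels.api as sm

# Simulate binary decisions from a made-up data-generating process.
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))   # true P(y_i = 1 | x_i)
y = rng.binomial(1, p_true)

# Fit the linear probability model (1) by OLS.
ols = sm.OLS(y, sm.add_constant(x)).fit()
fitted = ols.fittedvalues
print("fitted values outside [0, 1]:", int(((fitted < 0) | (fitted > 1)).sum()))
```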

The next statement helps us understand the right approach. We need the unbiasedness condition from the first approach to stochastic regressors:

(2) E(u_i|x_i)=0.

Statement. A combination of equations (1)+(2) is equivalent to just one equation

(3) P(y_i=1|x_i)=Index_i.

Proof. Step 1. Since y_i is a Bernoulli variable, by the definition of conditional expectation we have the identity

(4) E(y_i|x_i)=P(y_i=1|x_i)\times 1+P(y_i=0|x_i)\times 0=P(y_i=1|x_i).

Step 2. If (1)+(2) is true, then by (4)

P(y_i=1|x_i)=E(y_i|x_i)=E(Index_i+u_i|x_i)=Index_i,

so (3) holds (see Property 7). Conversely, suppose that (3) is true. Let us write

(5) y_i=P(y_i=1|x_i)+[y_i-P(y_i=1|x_i)]

and denote u_i=y_i-P(y_i=1|x_i). Then using (4) we see that (2) is satisfied:

E(u_i|x_i)=E(y_i|x_i)-E[P(y_i=1|x_i)|x_i]=E(y_i|x_i)-P(y_i=1|x_i)=0

(we use Property 7 again). Combining (5) with (3) gives (1). This completes the proof.
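
The identity (4) and condition (2) are easy to check numerically: once x_i is fixed, P(y_i=1|x_i) is just a number p and y_i is a Bernoulli(p) variable. A minimal sketch with an arbitrary p:

```python
import numpy as np

# Fixing x_i fixes p = P(y_i = 1 | x_i); here p is chosen arbitrarily.
rng = np.random.default_rng(2)
p = 0.3
y = rng.binomial(1, p, size=1_000_000)   # many draws of y_i for the same x_i

print(y.mean())        # approximates E(y_i | x_i); close to p, as in (4)
print((y - p).mean())  # approximates E(u_i | x_i) with u_i = y_i - p; close to 0, as in (2)
```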

This little exercise shows that the linear model (1), together with condition (2), is the same as (3), which is called the linear probability model. (3) has the same problem as (1): the variable on the left is bounded while the index on the right is not. Note also a conceptual problem not seen before: while the decisions y_i are observed, the probabilities P(y_i=1|x_i) are not. This is why one has to use the maximum likelihood method.

Binary choice models

Since we know what a distribution function is, we can guess how to correct (3). A distribution function has the same range as the probability on the left of (3), so the right model should look like this:

(6) P(y_i=1|x_i)=F(Index_i)

where F is some distribution function. Two choices of F are common (both are estimated in the sketch after this list):

(a) F is the distribution function of the standard normal distribution; in this case (6) is called a probit model.

(b) F is the logistic function F(t)=1/(1+e^{-t}); then (6) is called a logit model.
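
Both models are estimated by maximum likelihood. Here is a minimal sketch, assuming statsmodels is available and using simulated data:

```python
import numpy as np
import statsmodels.api as sm

# Made-up data-generating process with a logistic link.
rng = np.random.default_rng(3)
n = 500
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.2 + 1.5 * x))))

X = sm.add_constant(x)
probit_res = sm.Probit(y, X).fit(disp=0)   # F = standard normal distribution function
logit_res = sm.Logit(y, X).fit(disp=0)     # F = logistic function
print(probit_res.params)
print(logit_res.params)   # differ from probit in scale (the two F's have different variances)
```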

Measuring marginal effects

For the linear model (1) the marginal effect of variable x_{ji} on y_i is measured by the derivative \partial y_i/\partial x_{ji}=\beta_j and is constant. If we apply the same idea to (6), we see that the marginal effect is not constant:

(7) \frac{\partial P(y_i=1|x_i)}{\partial x_{ji}}=\frac{\partial F(Index_i)}{\partial Index_i}\frac{\partial Index_i}{\partial x_{ji}}=f(Index_i)\beta_j

where f is the density of F (we use the fact that the derivative of a distribution function is its density, together with the chain rule). In statistical software, the value of f(Index_i) is usually reported at the mean value of the index.
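
A minimal sketch of equation (7) for a probit fit, assuming statsmodels and SciPy are available and using simulated data: the hand computation f(Index_i)\beta_j at the mean index should agree with the packaged marginal-effect routine.

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

# Made-up probit data-generating process.
rng = np.random.default_rng(4)
x = rng.normal(size=500)
y = rng.binomial(1, norm.cdf(-0.2 + 1.5 * x))
X = sm.add_constant(x)
res = sm.Probit(y, X).fit(disp=0)

index_at_mean = X.mean(axis=0) @ res.params        # index evaluated at the mean regressors
manual = norm.pdf(index_at_mean) * res.params[1]   # f(Index) * beta_j, equation (7)
print(manual)
print(res.get_margeff(at="mean").margeff)          # statsmodels' computation; should agree
```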

For the probit model equation (7) gives

\frac{\partial P(y_i=1|x_i)}{\partial x_{ji}}=\frac{1}{\sqrt{2\pi}}\exp(-\frac{Index_i^2}{2})\beta_j

and for the logit

\frac{\partial P(y_i=1|x_i)}{\partial x_{ji}}=\frac{e^{-Index_i}}{(1+e^{-Index_i})^2}\beta_j.
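
As a quick check (SciPy assumed available), the two density factors above coincide with the packaged standard normal and logistic densities at an arbitrary index value:

```python
import numpy as np
from scipy.stats import norm, logistic

idx = 0.7                                              # arbitrary value of Index_i
probit_f = np.exp(-idx**2 / 2) / np.sqrt(2 * np.pi)    # standard normal density
logit_f = np.exp(-idx) / (1 + np.exp(-idx))**2         # logistic density

print(probit_f, norm.pdf(idx))      # should match
print(logit_f, logistic.pdf(idx))   # should match
```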

There is no need to remember these equations if you know the algebra.