Consider the classical problem of why people choose to drive or use public transportation to go to their jobs. For a given individual $i$, we observe the decision variable $y_i$, which takes the value either 0 (drive) or 1 (use public transportation), and various variables that impact the decision, like costs, convenience, availability of parking, etc. We denote the independent variables $x_1,\dots,x_k$ and their values for a given individual $x_{i1},\dots,x_{ik}$. For convenience, the usual expression that arises on the right-hand side of a multiple regression is called an index and denoted $z_i=\beta_0+\beta_1x_{i1}+\dots+\beta_kx_{ik}$.
We want to study how people's decisions $y_i$ depend on the index $z_i$.
Linear probability model
If you are familiar with multiple regression, the first idea that comes to mind is the linear model

(1) $y_i=z_i+u_i$.
This turns out to be a bad idea because the range of the variable $y_i$ on the left is bounded and the range of the index $z_i$ is not. Whenever there is a discrepancy between the bounded decision variable and the unbounded index, it has to be made up for by the error term, so the error term is certain to behave badly. (A detailed analysis shows that it will be heteroscedastic, but this fact is less important than the problem with range boundedness.)
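The boundedness problem is easy to see in a simulation. Below is a minimal sketch (my own illustration, not from the original text) that generates hypothetical 0/1 decisions from one regressor and fits a linear probability model by least squares; the fitted "probabilities" fall outside $[0,1]$ for extreme values of the regressor:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: one regressor (say, commuting cost) and a
# binary decision generated from a logistic probability.
n = 200
x = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-2.0 * x))           # true P(y=1 | x)
y = (rng.uniform(size=n) < p_true).astype(float)  # observed 0/1 decisions

# Linear probability model: regress y on a constant and x by least squares.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

# The unbounded index produces "probabilities" outside [0, 1].
print("fitted range:", fitted.min(), fitted.max())
print("share of fitted values outside [0, 1]:",
      np.mean((fitted < 0) | (fitted > 1)))
```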
The next statement helps to understand the right approach. We need the unbiasedness condition from the first approach to stochastic regressors:

(2) $E(u_i|x_i)=0$, where $x_i=(x_{i1},\dots,x_{ik})$.
Statement. The combination of equations (1)+(2) is equivalent to just one equation

(3) $P(y_i=1|x_i)=z_i$.
Proof. Step 1. Since $y_i$ is a Bernoulli variable, by the definition of conditional expectation we have the identity

(4) $E(y_i|x_i)=1\cdot P(y_i=1|x_i)+0\cdot P(y_i=0|x_i)=P(y_i=1|x_i)$.
Step 2. If (1)+(2) is true, then by (4)

$P(y_i=1|x_i)=E(y_i|x_i)=E(z_i+u_i|x_i)=z_i+E(u_i|x_i)=z_i,$
so (3) holds (see Property 7). Conversely, suppose that (3) is true. Let us write

(5) $y_i=E(y_i|x_i)+\left[y_i-E(y_i|x_i)\right]$
and denote $u_i=y_i-E(y_i|x_i)$. Then using (4) we see that (2) is satisfied:

$E(u_i|x_i)=E(y_i|x_i)-E\left(E(y_i|x_i)|x_i\right)=E(y_i|x_i)-E(y_i|x_i)=0$
(we use Property 7 again). Now (5) combined with (4) and (3) gives (1). The proof is over.
This little exercise shows that the linear model (1)+(2) is the same as (3), which is called a linear probability model. (3) has the same problem as (1): the variable on the left is bounded and the index on the right is not. Note also a conceptual problem unseen before: while the decisions $y_i$ are observed, the probabilities $P(y_i=1|x_i)$ are not. This is why one has to use the maximum likelihood method.
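To make the last point concrete: writing $p_i=P(y_i=1|x_i)$, the likelihood of the observed decisions is the standard Bernoulli likelihood

```latex
L=\prod_{i=1}^{n}p_i^{y_i}(1-p_i)^{1-y_i},
\qquad
\log L=\sum_{i=1}^{n}\left[y_i\log p_i+(1-y_i)\log(1-p_i)\right],
```

and one chooses the parameters that maximize it; the decisions $y_i$ are all that is needed to evaluate it.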
Binary choice models
Since we know what a distribution function is, we can guess how to correct (3). A distribution function has the same range $[0,1]$ as the probability on the left of (3), so the right model should look like this:

(6) $P(y_i=1|x_i)=F(z_i),$
where $F$ is some distribution function. Two choices of $F$ are common:
(a) $F=\Phi$ is the distribution function of the standard normal; in this case (6) is called a probit model.
(b) $F(z)=\dfrac{e^z}{1+e^z}$ is the logistic function; then (6) is called a logit model.
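For illustration, here is a minimal sketch (my own, not from the text) of estimating both models by maximum likelihood on simulated data. It uses `scipy.optimize.minimize` and `scipy.stats.norm`; the data-generating process is a hypothetical probit with coefficients $(-0.5, 1.5)$:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)

# Hypothetical simulated data: index z = -0.5 + 1.5*x, probit decisions.
n = 500
x = rng.normal(size=n)
y = (rng.uniform(size=n) < norm.cdf(-0.5 + 1.5 * x)).astype(float)
X = np.column_stack([np.ones(n), x])

def neg_loglik(beta, F):
    """Negative Bernoulli log-likelihood for P(y=1|x) = F(z), z = X @ beta."""
    p = np.clip(F(X @ beta), 1e-10, 1 - 1e-10)  # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

logistic = lambda z: 1.0 / (1.0 + np.exp(-z))

probit = minimize(neg_loglik, x0=np.zeros(2), args=(norm.cdf,),
                  method="Nelder-Mead")
logit = minimize(neg_loglik, x0=np.zeros(2), args=(logistic,),
                 method="Nelder-Mead")

print("probit estimates:", probit.x)  # close to the true (-0.5, 1.5)
print("logit estimates: ", logit.x)   # larger in absolute value, since the
                                      # logistic distribution has a bigger scale
```

The two sets of coefficients are not directly comparable; it is the marginal effects (next section) that should be compared across models.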
Measuring marginal effects
For the linear model (1) the marginal effect of variable $x_j$ on $y_i$ is measured by the derivative $\partial E(y_i|x_i)/\partial x_{ij}=\beta_j$ and is constant. If we apply the same idea to (6), we see that the marginal effect is not constant:

(7) $\dfrac{\partial P(y_i=1|x_i)}{\partial x_{ij}}=F'(z_i)\beta_j.$
For the probit model equation (7) gives $\phi(z_i)\beta_j$, where $\phi$ is the standard normal density, and for the logit it gives $F(z_i)(1-F(z_i))\beta_j$, because the logistic function satisfies $F'=F(1-F)$. There is no need to remember these equations if you know the algebra.
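As a sanity check on equation (7), the following sketch (with hypothetical coefficient values of my choosing) evaluates the analytic marginal effects for probit and logit and compares them with a finite-difference derivative:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical coefficients and a point at which to evaluate the effect.
beta = np.array([-0.5, 1.5])   # (intercept, slope of x)
x0 = 0.8
z = beta[0] + beta[1] * x0     # index at x0

logistic = lambda t: 1.0 / (1.0 + np.exp(-t))

# Analytic marginal effects of x, equation (7): F'(z) * beta_1.
me_probit = norm.pdf(z) * beta[1]
me_logit = logistic(z) * (1.0 - logistic(z)) * beta[1]

# Numerical check: finite-difference derivative of P(y=1|x) at x0.
h = 1e-6
num_probit = (norm.cdf(beta[0] + beta[1] * (x0 + h)) - norm.cdf(z)) / h
num_logit = (logistic(beta[0] + beta[1] * (x0 + h)) - logistic(z)) / h

print("probit: analytic", me_probit, "numeric", num_probit)
print("logit:  analytic", me_logit, "numeric", num_logit)
```

Unlike in the linear model, re-running this at a different `x0` gives a different marginal effect, which is exactly the non-constancy noted above.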