The word "distribution" is repeated in elementary Stats texts hundreds of times, yet the notion of a distribution function is usually mentioned only in passing or not studied at all. In fact, the distribution function is as important as the density, and in binary choice models it is the king. The full name is cumulative distribution function (cdf), but I am going to stick to the short name (used in advanced texts). This is one of the topics most students don't get on the first attempt (I was no exception).
Example. Electricity consumption sharply increases when everybody starts using air conditioners, and this happens when the temperature exceeds a certain threshold. The utility company would like to know the likelihood of a jump in electricity consumption tomorrow at noon.
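- Consider the probability $P(T \le t_1)$ that the temperature $T$ tomorrow at noon will not exceed $t_1$. How does it relate to the probability $P(T \le t_2)$ for a higher value $t_2 > t_1$? The second probability cannot be smaller, and this can be visualized by comparing the intervals $(-\infty, t_1]$ and $(-\infty, t_2]$: the first is contained in the second.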
- Suppose in the expression $P(T \le t)$ the real number $t$ increases to $+\infty$. What happens to the probability? As the intervals $(-\infty, t]$ extend to the right, they eventually include all possible temperatures, and the probability approaches 1.
- Now think about $t$ going to $-\infty$. Then what happens to $P(T \le t)$? It's the opposite of the previous case. Eventually, all possible temperatures are excluded, and the probability goes to 0.
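The three observations above can be tried out numerically. The sketch below is only an illustration: it assumes that the noon temperature $T$ is normally distributed, and the mean and standard deviation are hypothetical numbers chosen for the example, not values from the text.

```python
import math


def normal_cdf(t, mean, sd):
    """P(T <= t) for a normal random variable T (an assumed model)."""
    return 0.5 * (1.0 + math.erf((t - mean) / (sd * math.sqrt(2.0))))


# Hypothetical parameters, chosen only for illustration.
MEAN_TEMP = 25.0
SD_TEMP = 5.0

# Thresholds ordered from very low to very high.
thresholds = [-50.0, 15.0, 20.0, 25.0, 30.0, 35.0, 100.0]
probs = [normal_cdf(t, MEAN_TEMP, SD_TEMP) for t in thresholds]

for t, p in zip(thresholds, probs):
    print(f"P(T <= {t:6.1f}) = {p:.6f}")
```

Under these assumptions the printed probabilities never decrease as the threshold grows, are practically 0 at the far left and practically 1 at the far right.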
Definition. Let $X$ be a random variable and $x$ a real number. The distribution function $F_X$ of $X$ is defined by $F_X(x) = P(X \le x)$ (the random variable $X$ is fixed and therefore put in the subscript, whereas the real number $x$ changes and serves as the argument).
Distribution function properties
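- $F_X(x)$ is the probability of the event $\{X \le x\}$, so the value $F_X(x)$ always belongs to $[0,1]$.
- As the event becomes wider, the probability increases (or at least does not decrease). This property is called monotonicity and is formally written as follows: if $x_1 < x_2$, then $\{X \le x_1\} \subset \{X \le x_2\}$ and $F_X(x_1) \le F_X(x_2)$.
- As $x$ goes to $+\infty$, the event $\{X \le x\}$ approaches a sure event and $F_X(x)$ tends to 1.
- As $x$ goes to $-\infty$, the event $\{X \le x\}$ approaches an impossible event and $F_X(x)$ tends to 0.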
From this we conclude that the graph of the distribution function may look as in Figure 1.
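The four properties can also be checked on data. The sketch below is not part of the original argument: it builds an empirical distribution function (the share of observations not exceeding $x$) from a simulated sample. The helper name `empirical_cdf`, the standard normal sample and the grid are all assumptions made for the illustration.

```python
import bisect
import random


def empirical_cdf(sample):
    """Return a function F with F(x) = share of observations <= x."""
    data = sorted(sample)
    n = len(data)

    def F(x):
        # bisect_right returns the number of sorted observations <= x.
        return bisect.bisect_right(data, x) / n

    return F


random.seed(0)  # fixed seed so the run is reproducible
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]  # hypothetical data
F = empirical_cdf(sample)

# Grid that starts below the smallest and ends above the largest observation.
lo, hi = min(sample) - 1.0, max(sample) + 1.0
grid = [lo + (hi - lo) * i / 40 for i in range(41)]
values = [F(x) for x in grid]

print("all values in [0, 1]:", all(0.0 <= v <= 1.0 for v in values))
print("monotone:", all(v1 <= v2 for v1, v2 in zip(values, values[1:])))
print("left end:", values[0], "right end:", values[-1])
```

Plotting `values` against `grid` would give a nondecreasing curve that rises from 0 to 1, which is the general shape the properties describe.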
Interval formula in terms of the distribution function
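In many applications we are interested in the probability of the event that $X$ takes values in an interval, $\{a < X \le b\}$. Such a probability can be expressed in terms of the distribution function. Just apply the additivity rule to the set equation $\{X \le b\} = \{X \le a\} \cup \{a < X \le b\}$, in which the two events on the right are disjoint, to get $F_X(b) = F_X(a) + P(a < X \le b)$ and, finally,

$$P(a < X \le b) = F_X(b) - F_X(a). \qquad (1)$$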
Definition. Equation (1) can be called an interval formula.
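To see the interval formula at work, here is a minimal sketch for a fair six-sided die. The die is a hypothetical example, not one used in the text. It compares the difference of two values of the distribution function with a direct count of the outcomes falling in $(a, b]$, using exact fractions to avoid rounding.

```python
from fractions import Fraction

OUTCOMES = range(1, 7)  # faces of a fair die (hypothetical example)


def die_cdf(x):
    """F_X(x) = P(X <= x) for a fair six-sided die."""
    return Fraction(sum(1 for k in OUTCOMES if k <= x), 6)


def interval_prob_direct(a, b):
    """P(a < X <= b) computed by counting the outcomes in (a, b]."""
    return Fraction(sum(1 for k in OUTCOMES if a < k <= b), 6)


a, b = 2, 5
via_formula = die_cdf(b) - die_cdf(a)  # F_X(b) - F_X(a)
direct = interval_prob_direct(a, b)    # outcomes 3, 4, 5

print(via_formula, direct)  # prints: 1/2 1/2
```

For a discrete variable like this one, the strict inequality at the left end matters: including the point $a = 2$ would add the probability $1/6$ and the difference $F_X(b) - F_X(a)$ would no longer match.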