11 Mar 17

Distribution function properties

The word "distribution" is repeated hundreds of times in elementary statistics texts, yet the notion of a distribution function is usually mentioned only tangentially or not studied at all. In fact, the distribution function is as important as the density, and in binary choice models it is the king. The full name is cumulative distribution function (cdf), but I am going to stick to the short name (used in advanced texts). This is one of the topics most students don't get on the first attempt (I was not an exception).

Motivating example

Example. Electricity consumption sharply increases when everybody starts using air conditioners, and this happens when the temperature exceeds 20\,^{\circ}C. The utility company would like to know the likelihood of a jump in electricity consumption tomorrow at noon.

  1. Consider the probability P(T \le 15) that the temperature tomorrow at noon T will not exceed 15\,^{\circ}C. How does it relate to the probability P(T \le 20)? The second probability is at least as large as the first, and this can be visualized by comparing the intervals (-\infty,15] and (-\infty,20]: the first interval is contained in the second.
  2. Suppose in the expression P(T \le t) the real number t increases to +\infty. What happens to the probability? As the intervals extend to the right, they eventually include all possible temperatures, and the probability P(T \le t) approaches 1.
  3. Now think about t going to -\infty. Then what happens to P(T \le t)? It's the opposite of the previous case. Eventually, all possible temperatures are excluded, and the probability P(T \le t) goes to 0.

Generalization

Definition. Let X be a random variable and x a real number. The distribution function F_X of X is defined by F_X(x)=P(X\le x) (the random variable X is fixed and therefore put in the subscript, whereas the real number x changes and serves as the argument).

Distribution function properties

  1. F_X(x) is the probability of the event \{ X\le x\}, so the value F_X(x) always belongs to [0,1].
  2. As the event becomes wider, the probability increases. This property is called monotonicity and is formally written as follows: if x_1\le x_2, then \{X\le x_1\}\subset\{X\le x_2\} and F_X(x_1)\le F_X(x_2).
  3. As x goes to +\infty, the event \{X\le x\} approaches a sure event \{X<+\infty\}=R and F_X(x) tends to 1.
  4. As x goes to -\infty , the event \{X\le x\} approaches an impossible event \{X=-\infty\}=\varnothing and F_X(x) tends to 0.
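
The four properties can be checked numerically for a concrete distribution. Here is a minimal sketch in Python, assuming the standard normal as the example (my choice for illustration; the properties hold for any distribution) and an arbitrary grid of points from -8 to 8. It uses NormalDist from the standard library.

```python
from statistics import NormalDist

# Standard normal chosen only as an illustration; the grid is arbitrary.
F = NormalDist(mu=0.0, sigma=1.0).cdf

grid = [x / 10 for x in range(-80, 81)]   # points from -8 to 8
values = [F(x) for x in grid]

# Property 1: every value of F is a probability.
in_unit_interval = all(0.0 <= v <= 1.0 for v in values)

# Property 2: monotonicity, F(x1) <= F(x2) whenever x1 <= x2.
is_monotone = all(v1 <= v2 for v1, v2 in zip(values, values[1:]))

# Properties 3 and 4: behavior far to the right and far to the left.
right_tail = F(8.0)    # very close to 1
left_tail = F(-8.0)    # very close to 0

print(in_unit_interval, is_monotone, right_tail, left_tail)
```

A check on a finite grid is only a sanity test, not a proof; the limits at infinity are approximated here by evaluating at 8 and -8.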

Figure 1. Distribution function of a normal variable

From this we conclude that the graph of the distribution function may look as in Figure 1.

Interval formula in terms of the distribution function

In many applications we are interested in the probability that X takes values in an interval, that is, in the probability of the event \{a<X\le b\}. Such a probability can be expressed in terms of the distribution function. Just apply the additivity rule to the set equation \{-\infty<X\le b\}=\{-\infty<X\le a\}\cup\{a<X\le b\} (the two events on the right are disjoint) to get

F_X(b)=F_X(a)+P(a<X\le b)

and, finally,

(1) P(a<X\le b)=F_X(b)-F_X(a).

Definition. Equation (1) can be called an interval formula.
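
To see the interval formula at work, here is a small sketch in Python with a toy example that is not part of the text above: a fair six-sided die. The names pmf and cdf are mine. Exact fractions are used so that the two sides of (1) can be compared without rounding.

```python
from fractions import Fraction

FACES = range(1, 7)  # fair six-sided die, an assumed toy example

def pmf(k):
    """P(X = k) for a fair die."""
    return Fraction(1, 6) if k in FACES else Fraction(0)

def cdf(x):
    """Distribution function F_X(x) = P(X <= x)."""
    return sum((pmf(k) for k in FACES if k <= x), Fraction(0))

a, b = 2, 5

# Right side of (1): F_X(b) - F_X(a).
via_formula = cdf(b) - cdf(a)

# Left side of (1): add up the probabilities of the faces with a < k <= b.
direct = sum((pmf(k) for k in FACES if a < k <= b), Fraction(0))

print(via_formula, direct)  # both are 1/2
```

Note that the left end a is excluded and the right end b is included, exactly as in \{a<X\le b\}.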

9 Dec 16

Ditch statistical tables if you have a computer

You don't need statistical tables if you have Excel or Mathematica. Here I give the relevant Excel and Mathematica functions described in Chapter 14 of my book. You can save all the formulas in one spreadsheet or notebook and use it multiple times.

Cumulative Distribution Function of the Standard Normal Distribution

For a given real z, the value of the distribution function of the standard normal is
F(z)=\frac{1}{\sqrt{2\pi }}\int_{-\infty }^{z}\exp (-t^{2}/2)dt.

In Excel, use the formula =NORM.S.DIST(z,TRUE).

In Mathematica, enter CDF[NormalDistribution[0,1],z]
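
If you have neither program at hand, the same value can be obtained in Python. This is a sketch, not something from my book: it relies on the standard identity F(z)=(1+erf(z/\sqrt{2}))/2 and on math.erf from the standard library. The function name std_normal_cdf is mine.

```python
import math

def std_normal_cdf(z):
    """F(z) for the standard normal, via F(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(std_normal_cdf(0.0))             # 0.5
print(round(std_normal_cdf(1.96), 4))  # 0.975
```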

Probability Function of the Binomial Distribution

For a given number of successes x, number of trials n and probability of success p, the probability is

P(Binomial=x)=C_{x}^{n}p^{x}(1-p)^{n-x}.

In Excel, use the formula =BINOM.DIST(x,n,p,FALSE)

In Mathematica, enter PDF[BinomialDistribution[n,p],x]
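
For comparison, here is a sketch of the same formula in Python. I am reading C_{x}^{n} as the number of ways to choose x items out of n, which is what math.comb(n, x) computes; the function name binomial_pmf is mine.

```python
import math

def binomial_pmf(x, n, p):
    """P(Binomial = x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)

# Two successes in four trials with p = 1/2: 6 * (1/2)**4.
print(binomial_pmf(2, 4, 0.5))  # 0.375
```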

Cumulative Binomial Probabilities

For a given cut-off value x, number of trials n and probability p the cumulative probability is

P(Binomial\leq x)=\sum_{t=0}^{x}C_{t}^{n}p^{t}(1-p)^{n-t}.

In Excel, use the formula =BINOM.DIST(x,n,p,TRUE).

In Mathematica, enter CDF[BinomialDistribution[n,p],x]
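
The cumulative sum can also be written out directly. A minimal sketch in Python, assuming x is an integer between 0 and n; the function name binomial_cdf is mine.

```python
import math

def binomial_cdf(x, n, p):
    """P(Binomial <= x): sum of the individual probabilities for t = 0, ..., x."""
    return sum(
        math.comb(n, t) * p**t * (1.0 - p) ** (n - t)
        for t in range(x + 1)
    )

# At most two successes in four trials with p = 1/2: (1 + 4 + 6) / 16.
print(binomial_cdf(2, 4, 0.5))  # 0.6875
```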

Values of the exponential function e^{-\lambda}

In Excel, use the formula =EXP(-lambda)

In Mathematica, enter Exp[-lambda]

Individual Poisson Probabilities

For a given number of successes x and arrival rate \lambda, the probability is

P(Poisson=x)=\frac{e^{-\lambda }\lambda^{x}}{x!}.

In Excel, use the formula =POISSON.DIST(x,lambda,FALSE)

In Mathematica, enter PDF[PoissonDistribution[lambda],x]

Cumulative Poisson Probabilities

For a given cut-off x and arrival rate \lambda, the cumulative probability is

P(Poisson\leq x)=\sum_{t=0}^{x}\frac{e^{-\lambda }\lambda ^{t}}{t!}.

In Excel, use the formula =POISSON.DIST(x,lambda,TRUE)

In Mathematica, enter CDF[PoissonDistribution[lambda],x]
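
Both Poisson formulas are short enough to code by hand. Here is a sketch in Python using only the standard library; the function names and the parameter name lam are mine (lambda is a reserved word in Python), and x is assumed to be a nonnegative integer.

```python
import math

def poisson_pmf(x, lam):
    """P(Poisson = x) = exp(-lam) * lam**x / x!"""
    return math.exp(-lam) * lam**x / math.factorial(x)

def poisson_cdf(x, lam):
    """P(Poisson <= x): sum of the individual probabilities for t = 0, ..., x."""
    return sum(poisson_pmf(t, lam) for t in range(x + 1))

# With lam = 2: P(Poisson <= 1) = exp(-2) * (1 + 2), approximately 0.406.
print(poisson_pmf(0, 2.0), poisson_cdf(1, 2.0))
```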

Cutoff Points of the Chi-Square Distribution Function

For a given probability of the right tail \alpha and degrees of freedom \nu, the cut-off value (critical value) \chi _{\nu,\alpha }^{2} is a solution of the equation

P(\chi _{\nu}^{2}>\chi _{\nu,\alpha }^{2})=\alpha .

In Excel, use the formula =CHISQ.INV.RT(alpha,v)

In Mathematica, enter InverseCDF[ChiSquareDistribution[v],1-alpha]
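
To show what "a solution of the equation" means in practice, here is a sketch in Python that finds the cut-off by bisection. It covers only the special case \nu=2, where the chi-square distribution function has the closed form 1-e^{-x/2}; for other degrees of freedom you would need a numerical cdf, which the standard library does not provide. The function names, the search bracket [0, 1000] and the tolerance are my assumptions.

```python
import math

def chi2_cdf_2df(x):
    """Chi-square distribution function for exactly 2 degrees of freedom."""
    return 1.0 - math.exp(-x / 2.0) if x > 0 else 0.0

def right_tail_cutoff(cdf, alpha, lo=0.0, hi=1000.0, tol=1e-10):
    """Solve 1 - cdf(c) = alpha for c by bisection (cdf must be nondecreasing)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if 1.0 - cdf(mid) > alpha:
            lo = mid   # right tail still larger than alpha: move right
        else:
            hi = mid   # right tail at most alpha: move left
    return (lo + hi) / 2.0

c = right_tail_cutoff(chi2_cdf_2df, 0.05)
print(round(c, 3))                      # 5.991
print(round(-2.0 * math.log(0.05), 3))  # closed-form check for 2 df: 5.991
```

The bisection works because the right-tail probability decreases as the cut-off moves to the right, which is just the monotonicity of the distribution function.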

Cutoff Points for the Student’s t Distribution

For a given probability of the right tail \alpha and degrees of freedom \nu, the cut-off value t_{\nu,\alpha } is a solution of the equation P(t_{\nu}>t_{\nu,\alpha })=\alpha.

In Excel, use the formula =T.INV(1-alpha,v)

In Mathematica, enter InverseCDF[StudentTDistribution[v],1-alpha]

Cutoff Points for the F Distribution

For a given probability of the right tail \alpha, degrees of freedom v_1 (numerator) and v_2 (denominator), the cut-off value F_{v_1,v_2,\alpha } is a solution of the equation P(F_{v_1,v_2}>F_{v_1,v_2,\alpha })=\alpha.

In Excel, use the formula =F.INV.RT(alpha,v1,v2)

In Mathematica, enter InverseCDF[FRatioDistribution[v1,v2],1-alpha]