What is probability: a whole lot of food for thought

### What is probability

**They say**: With a randomized experiment or a random sample or other random phenomenon (such as a simulation), the probability of a particular outcome is the proportion of times that the outcome would occur in a long run of observations (Agresti and Franklin, p.213)

**I say**: When introducing the notion of probability, it is not a good idea to refer to the “long run” (which is the law of large numbers). Firstly, the LLN is a higher-level notion. Secondly, there is no way to deduce various laws of probability from this kind of definition.

Instead, produce a table of simulation results like Table 5.1 (p.211). Be sure to frame the table with a series of definitions. An **elementary event** is a simplest possible event. The largest possible event is called a **sample space**. Emphasize the difference between absolute frequencies and relative frequencies. Note that relative frequencies are *preferable* because they allow one to compare results across different sample sizes. Start calling them *percentages*, so it’s clear that they take values between 0 and 1. Finally, note that the percentages sum to 1. This prepares students for the following definitions.
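A table of this kind is easy to generate in class. The following is a minimal sketch, not the textbook's own simulation: it assumes a fair six-sided die as the experiment, and the function name `frequency_table` and the sample sizes are illustrative choices.

```python
import random
from collections import Counter


def frequency_table(n_rolls, seed=0):
    """Roll a fair die n_rolls times; return {face: (absolute, relative)}."""
    rng = random.Random(seed)  # fixed seed so the table is reproducible
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    # Relative frequency = absolute frequency / sample size
    return {face: (counts[face], counts[face] / n_rolls) for face in range(1, 7)}


if __name__ == "__main__":
    for n in (60, 6000):  # two sample sizes, to compare across them
        table = frequency_table(n)
        print(f"n = {n}")
        for face, (absolute, relative) in table.items():
            print(f"  face {face}: absolute = {absolute:5d}, relative = {relative:.3f}")
        total = sum(relative for _, relative in table.values())
        print(f"  sum of relative frequencies = {total:.3f}")
```

The absolute frequencies of the two runs cannot be compared directly, while the relative frequencies can, and in each run they add up to 1.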

**Probability** is a mathematical measure of the likelihood of an event. It satisfies three axioms:

- probability of any event is between 0 and 1 (inclusive);
- probability of the sample space is 1;
- probability is an additive function of events (it is like area: if two events are disjoint, then the probability of their union is the sum of their probabilities).

Axiom 1 can be reinforced by saying that an event whose probability is 0 is called **impossible** and an event whose probability is 1 is called a **sure event**. Axioms 2 and 3 imply that the sum of probabilities of all elementary events is equal to 1. This means that the sample space is complete (no elementary event relevant to our experiment has been left out). This is why I call Axiom 2 a **completeness axiom**.
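The three axioms can be checked on a small finite sample space. This is only a sketch under an assumed example (a fair die with six equally likely elementary events); the names `elementary`, `sample_space` and `prob` are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Assumed example: a fair die, each elementary event has probability 1/6.
elementary = {face: Fraction(1, 6) for face in range(1, 7)}
sample_space = frozenset(elementary)


def prob(event):
    """Probability of an event (a subset of the sample space),
    obtained by adding the probabilities of its elementary events."""
    return sum((elementary[outcome] for outcome in event), Fraction(0))


even = {2, 4, 6}
low, high = {1, 2}, {5, 6}  # two disjoint events

assert 0 <= prob(even) <= 1                       # Axiom 1
assert prob(sample_space) == 1                    # Axiom 2 (completeness)
assert prob(low | high) == prob(low) + prob(high) # Axiom 3 (additivity)
assert prob(set()) == 0                           # the empty event is impossible
```

Dropping one face from `elementary` makes the second assertion fail, which is one way to show students what an incomplete sample space looks like.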

### Geometry is a magic wand

Immediately illustrate all of the above geometrically. Let the sample space be a unit square. Events are its subsets. Probability of an event is its area. Geometry is a powerful tool to explain probability rules. After playing with Venn diagrams, most students can deduce the additivity rule

(1) $P(A\cup B)=P(A)+P(B)-P(A\cap B)$

and complement rule

(2) $P(A^c)=1-P(A)$.

Challenge your students with de Morgan’s laws

(3) $(A\cup B)^c=A^c\cap B^c,\quad (A\cap B)^c=A^c\cup B^c$.

Watch how your students write equations (1), (2) and (3): set operations are applied to sets and arithmetic operations are applied to numbers.
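The geometric picture can also be tried numerically. The sketch below is an illustration under stated assumptions: the unit square is cut into a 10 by 10 grid of equal cells, events are sets of cells, and probability is the share of cells covered. The names `SQUARE`, `rectangle` and `area`, and the two rectangles, are hypothetical choices. Set operations act on the events and arithmetic acts on their areas.

```python
from fractions import Fraction

N = 10  # the unit square is cut into N x N equal cells (illustrative choice)
SQUARE = frozenset((i, j) for i in range(N) for j in range(N))


def rectangle(x0, x1, y0, y1):
    """Event made of the cells with x0 <= i < x1 and y0 <= j < y1."""
    return frozenset((i, j) for i in range(x0, x1) for j in range(y0, y1))


def area(event):
    """Probability of an event = its area = share of cells it covers."""
    return Fraction(len(event), len(SQUARE))


A = rectangle(0, 6, 0, 5)   # 30 cells
B = rectangle(4, 10, 3, 8)  # 30 cells, overlapping A in a 2 x 2 block

# Additivity rule for events that may overlap
assert area(A | B) == area(A) + area(B) - area(A & B)
# Complement rule
assert area(SQUARE - A) == 1 - area(A)
# De Morgan's laws: statements about sets, not about numbers
assert SQUARE - (A | B) == (SQUARE - A) & (SQUARE - B)
assert SQUARE - (A & B) == (SQUARE - A) | (SQUARE - B)
```

A grid keeps every area exact, so the checks hold as equalities rather than approximations.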

Don’t give independence until after all this has been done. Independence and its consequences are a more advanced topic.

### If I were a student, I would ask these questions

The notions of disjoint events and a cover are easy to understand geometrically. The formula for probability of a complement on p. 221 implicitly uses them. Why is disjointness introduced later, on p.223?

I would like to see in the text the link between logic and geometry:

- logical “and” corresponds to intersection of events,
- logical “or” corresponds to a union of events,
- logical negation corresponds to a complement,
- logically, “at least one” is the opposite of “none”.
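The correspondence in the list above can be sketched with ordinary sets. The example assumes a hypothetical experiment (two rolls of a die) and builds each event twice: once from a logical condition and once from a set operation.

```python
from itertools import product

# Assumed sample space: all ordered pairs from two rolls of a die.
omega = set(product(range(1, 7), repeat=2))

first_six = {w for w in omega if w[0] == 6}
second_six = {w for w in omega if w[1] == 6}

# Events described by logical conditions
both = {w for w in omega if w[0] == 6 and w[1] == 6}
either = {w for w in omega if w[0] == 6 or w[1] == 6}
not_first = {w for w in omega if not w[0] == 6}
none = {w for w in omega if 6 not in w}

assert both == first_six & second_six    # "and" is intersection
assert either == first_six | second_six  # "or" is union
assert not_first == omega - first_six    # negation is complement
assert either == omega - none            # "at least one" is the opposite of "none"
```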

The multiplication rule (p. 225): does it follow from another axiom or property or is it true by definition?

What is the reason for division in the definition of conditional probability (p.231)?

Is it obvious that if A and B are dependent events, then so are A and the complement of B, and so are the complement of A and B, and so are the complement of A and the complement of B (p.237)? Will this be true if “dependent” is replaced by “not mutually exclusive”?

See the answers to these and other questions in my book.
