Demystifying sampling distributions: too much talking about nothing

### What we know about sample means

Let $X_1, \dots, X_n$ be an independent identically distributed sample and consider its sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$.

**Fact 1**. The sample mean is an unbiased estimator of the population mean $\mu = EX_i$:

(1) $E\bar{X} = \frac{1}{n}\sum_{i=1}^n EX_i = \frac{1}{n}\,n\mu = \mu$

(use linearity of means).

**Fact 2**. The variance of the sample mean is

(2) $\mathrm{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}(X_i) = \frac{\sigma^2}{n}$, where $\sigma^2 = \mathrm{Var}(X_i)$

(use homogeneity of variance of degree 2 and additivity of variance for independent variables). Hence

**Fact 3**. Together these two properties imply that the sample mean becomes more concentrated around the population mean as the sample size increases: by (2), its variance tends to zero as $n$ grows (this is the idea behind the law of large numbers; I have a couple more posts about it).

**Fact 4**. Finally, the z scores of sample means, $z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, stabilize to a standard normal distribution (the central limit theorem).
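Facts 1–4 can be checked by simulation. The sketch below (not from the post; the population, sample size, and seed are illustrative choices) draws many i.i.d. samples from a deliberately skewed population and inspects the resulting sample means.

```python
# Illustration of Facts 1-4 by simulation (parameters are illustrative).
import numpy as np

rng = np.random.default_rng(0)

n = 50           # sample size
reps = 100_000   # number of simulated samples

# Population: exponential with mean 2 and variance 4 (skewed, far from normal).
samples = rng.exponential(scale=2.0, size=(reps, n))
means = samples.mean(axis=1)

print(means.mean())   # close to the population mean 2 (Fact 1)
print(means.var())    # close to sigma^2 / n = 4 / 50 = 0.08 (Fact 2)

# Fact 4: z scores of the sample means are approximately standard normal,
# so about 95% of them fall within +/- 1.96.
z = (means - 2.0) / (2.0 / np.sqrt(n))
print(np.mean(np.abs(z) < 1.96))
```

Increasing `n` tightens the spread of the means (Fact 3) and makes the normal approximation in Fact 4 more accurate.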

### What is a sampling distribution?

The **sampling distribution** of a statistic is the probability distribution that specifies probabilities for the possible values the statistic can take (Agresti and Franklin, p.308). After this definition, the authors go on to discuss the above four facts. Note that none of them requires knowing what a sampling distribution is. The ONLY sampling distribution that appears explicitly in AP Statistics is the binomial. However, in the book the binomial is given in Section 6.3, before sampling distributions, which are the subject of Chapter 7. Section 7.3 explains that the binomial is a sampling distribution, but that section is optional. Thus the whole of Chapter 7 (almost 40 pages) is redundant.

### Then what are sampling distributions for?

Here is a simple example that explains their role. Consider the binomial with two observations of an unfair coin. It involves two random variables, $X_1$ and $X_2$, and is therefore described by a joint distribution whose sample space consists of pairs of their values.

**Table 1**. Sample space for the pair $(X_1, X_2)$

| | Coin 1 = 0 | Coin 1 = 1 |
|---|---|---|
| **Coin 2 = 0** | (0,0) | (0,1) |
| **Coin 2 = 1** | (1,0) | (1,1) |

Each coin independently takes values 0 and 1 (shown in the margins); the sample space contains four pairs of these values (shown in the main body of the table). The corresponding probability distribution is given by the table

**Table 2**. Joint probabilities for the pair (here $p$ is the probability of the value 0 and $q = 1 - p$ that of the value 1; by independence, each cell is the product of the marginal probabilities)

| | Coin 1 = 0 (prob. $p$) | Coin 1 = 1 (prob. $q$) |
|---|---|---|
| **Coin 2 = 0 (prob. $p$)** | $p^2$ | $pq$ |
| **Coin 2 = 1 (prob. $q$)** | $qp$ | $q^2$ |

Since we are counting only the number of successes, the outcomes (0,1) and (1,0) for the purposes of our experiment are the same. Hence, joining indistinguishable outcomes, we obtain a smaller sample space

**Table 3**. Sampling distribution for the binomial

| # of successes | Corresponding probabilities |
|---|---|
| 0 | $p^2$ |
| 1 | $2pq$ |
| 2 | $q^2$ |

The last table is the sampling distribution for the binomial with sample size 2. All the sampling distribution does is *replace* the large joint distribution (Tables 1 and 2) by the smaller distribution in Table 3. The beauty of the proofs of equations (1) and (2) is that they do not depend on which distribution is used (the distribution is hidden in the expected value operator).
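The reduction from the joint distribution to the sampling distribution can be sketched in a few lines of code. This is a minimal illustration (variable names and the value of $p$ are my own choices, not from the post), with $p$ the probability of 0 and $q$ that of 1 as in Table 2.

```python
# Collapse the four-point joint distribution (Tables 1-2) into the
# three-point sampling distribution of the number of successes (Table 3).
from itertools import product

p = 0.3          # probability of the value 0 (illustrative)
q = 1.0 - p      # probability of the value 1

# Joint distribution over the four pairs, by independence.
joint = {(x1, x2): (p if x1 == 0 else q) * (p if x2 == 0 else q)
         for x1, x2 in product((0, 1), repeat=2)}

# Join indistinguishable outcomes: (0,1) and (1,0) both give one success.
sampling = {}
for (x1, x2), prob in joint.items():
    k = x1 + x2
    sampling[k] = sampling.get(k, 0.0) + prob

print(sampling)   # probabilities p^2, 2pq, q^2 for 0, 1, 2 successes
```

The four-element dictionary `joint` plays the role of Tables 1 and 2, and the three-element dictionary `sampling` is Table 3.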

Unless you want your students to appreciate the reduction in the sample space brought about by sampling distributions, it is not worth discussing them. See Wikipedia for examples other than the binomial.

