Dec 22

Final exam in Advanced Statistics ST2133, 2022

Unlike most UoL exams, here I tried to relate the theory to practical issues.

KBTU International School of Economics

Compiled by Kairat Mynbaev

The total for this exam is 41 points. You have two hours.

Everywhere provide detailed explanations. When answering please clearly indicate question numbers. You don’t need a calculator. As long as the formula you provide is correct, the numerical value does not matter.

Question 1. (12 points)

a) (2 points) At a casino, two players are playing on slot machines. Their payoffs X,Y are standard normal and independent. Find the joint density of the payoffs.

b) (4 points) Two other players watch the first two players and start to argue about which will be larger: the sum U = X + Y or the difference V = X - Y. Find the joint density of (U,V). Are the variables U,V independent? Find their marginal densities.

c) (2 points) Are U,V normal? Why? What are their means and variances?

d) (2 points) Which probability is larger: P(U > V) or P(U < V)?

e) (2 points) In this context interpret the conditional expectation E(U|V = v). How much is it?

Reminder. The density of a normal variable X \sim N(\mu,\sigma^2) is f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}.

Question 2. (9 points) The distribution of a call duration X of one Kcell [largest mobile operator in KZ] customer is exponential: f_X(x) = \lambda e^{-\lambda x}, x \ge 0, and f_X(x) = 0, x < 0. The number N of customers making calls simultaneously is distributed as Poisson: P(N = n) = e^{-\mu}\frac{\mu^n}{n!}, n = 0,1,2,... Thus the total call duration for all customers is S_N = X_1 + ... + X_N for N \ge 1. We put S_0 = 0. Assume that customers make their decisions about calling independently.

a) (3 points) Find the general formula (when X_1,...,X_n are identically distributed and the X_i, N are independent, but not necessarily exponential and Poisson as above) for the moment generating function of S_N, explaining all steps.

b) (3 points) Find the moment generating functions of X, N and S_N for your particular distributions.

c) (3 points) Find the mean and variance of S_N. Based on the equations you obtained, can you suggest estimators of the parameters \lambda, \mu?

Remark. Direct observations on the exponential and Poisson distributions are not available. We have to infer their parameters by observing S_N. This explains the importance of the technique used in Question 2.
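For readers who want a numerical sanity check of Question 2, the setup is easy to simulate. The Python sketch below (parameter values \lambda = 2, \mu = 3 are arbitrary) draws N from a Poisson distribution and sums N exponential durations; by Wald's identity the average of the simulated totals should be close to EN \cdot EX = \mu/\lambda.

```python
import math
import random

def poisson_draw(rng, mu):
    """One Poisson(mu) draw by Knuth's multiplication method."""
    L = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def simulate_total_duration(lam, mu, trials=100_000, seed=0):
    """Totals S_N = X_1 + ... + X_N with N ~ Poisson(mu) and
    X_i ~ Exponential(lam), all independent (S_0 = 0)."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(lam) for _ in range(poisson_draw(rng, mu)))
            for _ in range(trials)]

# The average total duration should be close to EN * EX = mu / lam
totals = simulate_total_duration(lam=2.0, mu=3.0)
print(sum(totals) / len(totals))  # close to 1.5
```

Comparing the simulated mean and variance with the answers obtained via the MGF in parts b)-c) is a useful self-check.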

Question 3. (8 points)

a) (2 points) For a non-negative random variable X prove the Markov inequality P(X > c) \le \frac{1}{c}EX, c > 0.

b) (2 points) Prove the Chebyshev inequality P(|X - EX| > c) \le \frac{1}{c^2}Var(X) for an arbitrary random variable X with finite variance.

c) (4 points) We say that a sequence of random variables \{X_n\} converges in probability to a random variable X if P(|X_n - X| > \varepsilon) \to 0 as n \to \infty for any \varepsilon > 0. Suppose that EX_n = \mu for all n and that Var(X_n) \to 0 as n \to \infty. Prove that then \{X_n\} converges in probability to \mu.

Remark. Question 3 leads to the simplest example of a law of large numbers: if \left\{ X_n \right\} are i.i.d. with finite variance, then their sample mean converges to their population mean in probability.
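The inequalities in Question 3 are easy to check numerically. A minimal Python sketch (X uniform on [0,1] is an arbitrary choice, with EX = 1/2 and Var(X) = 1/12) compares the empirical tail probability with the Chebyshev bound:

```python
import random

def chebyshev_check(c, trials=100_000, seed=1):
    """Compare the empirical P(|X - EX| > c) with the Chebyshev bound
    Var(X)/c^2 for X uniform on [0, 1] (EX = 1/2, Var(X) = 1/12)."""
    rng = random.Random(seed)
    tail = sum(abs(rng.random() - 0.5) > c for _ in range(trials)) / trials
    bound = (1 / 12) / c ** 2
    return tail, bound

tail, bound = chebyshev_check(0.4)
print(tail, bound)  # the tail probability (about 0.2) sits below the bound
```

The bound is far from tight here, which is typical: Chebyshev trades precision for complete generality.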

Question 4. (8 points)

a) (4 points) Define a distribution function. Give its properties, with intuitive explanations.

b) (4 points) Is a sum of two distribution functions a distribution function? Is a product of two distribution functions a distribution function?

Remark. The answer for part a) is here and the one for part b) is based on it.

Question 5. (4 points) The Rakhat factory prepares prizes for kids for the upcoming New Year event. Each prize contains one type of chocolates and one type of candies. The chocolates and candies are chosen randomly from two production lines, the total number of items is always 10 and all selections are equally likely.

a) (2 points) What proportion of prepared prizes contains three or more chocolates?

b) (2 points) 100 prizes have been sent to an orphanage. What is the probability that 50 of those prizes contain no more than two chocolates?

Jan 17

The law of large numbers proved

The law of large numbers overview

I have already several posts about the law of large numbers:

  1. start with the intuition, which is illustrated using Excel;
  2. simulations in Excel show that convergence is not as fast as some textbooks claim;
  3. to distinguish the law of large numbers from the central limit theorem read this;
  4. the ultimate purpose is the application to simple regression with a stochastic regressor.

Here we busy ourselves with the proof.

Measuring deviation of a random variable from a constant

Let X be a random variable and c some constant. We want a measure of X differing from the constant by a given number \varepsilon or more. The set where X differs from c by \varepsilon>0 or more is the outside of the segment [c-\varepsilon,c+\varepsilon], that is, \{|X-c|\ge\varepsilon\}=\{X\le c-\varepsilon\}\cup\{X\ge c+\varepsilon\}.

Figure 1. Measuring the outside of interval

Now suppose X has a density p(t). It is natural to measure the set \{|X-c|\ge\varepsilon\} by the probability P(|X-c|\ge\varepsilon). This is illustrated in Figure 1.

Convergence to a spike formalized

Figure 2. Convergence to a spike

Once again, check out the idea. Consider a sequence of random variables \{T_n\} and a parameter \tau. Fix some \varepsilon>0 and consider a corridor [\tau-\varepsilon,\tau+\varepsilon] of width 2\varepsilon around \tau. For \{T_n\} to converge to a spike at \tau we want the area P(|T_n-\tau|\ge\varepsilon) to go to zero as we move along the sequence to infinity. This is illustrated in Figure 2, where, say, T_1 has a flat density and the density of T_{1000} is chisel-shaped. In the latter case the area P(|T_n-\tau|\ge\varepsilon) is much smaller than in the former. The math of this phenomenon is such that P(|T_n-\tau|\ge\varepsilon) should go to zero for any \varepsilon>0 (the narrower the corridor, the further to infinity we should move along the sequence).

Definition. Let \tau be some parameter and let \{T_n\} be a sequence of its estimators. We say that \{T_n\} converges to \tau in probability or, alternatively, that \{T_n\} consistently estimates \tau, if P(|T_n-\tau|\ge\varepsilon)\rightarrow 0 as n\rightarrow\infty for any \varepsilon>0.

The law of large numbers in its simplest form

Let \{X_n\} be an i.i.d. sample from a population with mean \mu and variance \sigma^2. This is the situation from the standard Stats course. We need two facts about the sample mean \bar{X}: it is unbiased,

(1) E\bar{X}=\mu,

and its variance tends to zero

(2) Var(\bar{X})=\sigma^2/n\rightarrow 0 as n\rightarrow\infty.

Then for any \varepsilon>0

P(|\bar{X}-\mu|\ge \varepsilon)=P(|\bar{X}-E\bar{X}|\ge \varepsilon) (by (1))

\le\frac{1}{\varepsilon^2}Var(\bar{X}) (by the Chebyshev inequality)

=\frac{\sigma^2}{n\varepsilon^2}\rightarrow 0 as n\rightarrow\infty (by (2)).

Since this is true for any \varepsilon>0, the sample mean is a consistent estimator of the population mean. This proves the law of large numbers in its simplest form.
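The convergence just proved can also be watched in a simulation. The Python sketch below (a fair coin, an arbitrary choice with \mu = 0.5) estimates P(|\bar{X}-\mu|\ge\varepsilon) for two sample sizes; the probability drops sharply as n grows.

```python
import random

def tail_probability(n, eps, trials=2_000, seed=2):
    """Fraction of simulated samples (fair-coin flips, mu = 0.5) whose
    sample mean deviates from mu by eps or more."""
    rng = random.Random(seed)
    far = 0
    for _ in range(trials):
        xbar = sum(rng.random() < 0.5 for _ in range(n)) / n
        far += abs(xbar - 0.5) >= eps
    return far / trials

print(tail_probability(100, 0.1))   # a few percent
print(tail_probability(1000, 0.1))  # essentially zero
```

With \varepsilon = 0.1 the corridor is two standard deviations wide at n = 100 but more than six at n = 1000, which explains the sharp drop.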

Final remarks

The above proof applies in the next more general situation.

Theorem. Let \tau be some parameter and let \{T_n\} be a sequence of its estimators such that: a) ET_n=\tau for any n and b) Var(T_n)\rightarrow 0. Then \{T_n\} converges in probability to \tau.

This statement is often used on the Econometrics exams of the University of London.

In the unbiasedness definition the sample size is fixed. In the consistency definition it tends to infinity. The above theorem says that unbiasedness for all n plus Var(T_n)\rightarrow 0 are sufficient for consistency.

Jan 17

Review of Agresti and Franklin

Review of Agresti and Franklin "Statistics: The Art and Science of Learning from Data", 3rd edition

Who is this book for?

On the Internet you can find both positive and negative reviews. The ones that I saw on Goodreads.com and Amazon.com do not say much about the pros and cons. Here I try to be more specific.

The main limitation of the book is that it adheres to the College Board statement that "it is a one semester, introductory, non-calculus-based, college course in statistics". Hence, there are no derivations and no links between formulas. You will not find explanations of why Statistics works. As a result, there is too much emphasis on memorization. After reading the book, you will likely not have an integral view of statistical methods.

I have seen students who understand such texts well. Generally, they have an excellent memory and better-than-average imagination. But such students are better off reading more advanced books. A potential reader has to lower his/her expectations. I imagine a person who is not interested in taking a more advanced Stats course later. The motivation of that person would be: a) to understand the ways Statistics is applied and/or b) to pass AP Stats just because it is a required course. The review is written on the premise that this is the intended readership.

What I like

  1. The number and variety of exercises. This is good for an instructor who teaches large classes. Having authored several books, I can assure you that inventing many exercises is the most time-consuming part of this business.
  2. The authors have come up with good visual embellishments of graphs and tables, summarized in "A Guide to Learning From the Art in This Text" at the end of the book.
  3. The book has generous left margins. Sometimes they contain reminders about the past material. Otherwise, the reader can use them for notes.
  4. MINITAB is prohibitively expensive, but the Student Edition of MINITAB is provided on the accompanying CD.

What I don't like

  1. I counted about 140 high-resolution photos that have nothing to do with the subject matter. They hardly add to the educational value of the book but certainly add to its cost. This bad trend in introductory textbooks is fueled to a considerable extent by Pearson Education.
  2. 800+ pages, even after slashing all appendices and unnecessary illustrations, is a lot of reading for one semester. Even if you memorize all of those pages, during the AP test it will be difficult to pull out of your memory exactly the page you need to answer a particular question.
  3. In an introductory text, one has to refrain from giving too much theory. Still, I don't like some choices made by the authors. The learning curve is flat. As a way of gentle introduction to algebra, verbal descriptions of formulas are normal. But sticking to verbal descriptions until p. 589 is too much. This reminds me of a train trip in Kazakhstan. You enter the steppe through the western border and two days later you see the same endless steppe; just the train station is different.
  4. At the theoretical level, many topics are treated superficially. You can find a lot of additional information in my posts named "The pearls of AP Statistics". Here is the list of most important additions: regression and correlation should be decoupled; the importance of sampling distributions is overstated; probability is better explained without reference to the long run; the difference between the law of large numbers and central limit theorem should be made clear; the rate of convergence in the law of large numbers is not that fast; the law of large numbers is intuitively simple; the uniform distribution can also be made simple; to understand different charts, put them side by side; the Pareto chart is better understood as a special type of a histogram; instead of using the software on the provided CD, try to simulate in Excel yourself.
  5. Using outdated Texas instruments calculators contradicts the American Statistical Association recommendation to "Use technology for developing concepts and analyzing data".


If I wanted to save time and didn't intend to delve into theory, I would prefer a concise book that directly addresses the questions given on the AP test. However, decide for yourself: read the Preface to see how much fantasy has been put into the book, and you may well want to read it.

Oct 16

The pearls of AP Statistics 31

Demystifying sampling distributions: too much talking about nothing

What we know about sample means

Let X_1,...,X_n be an independent identically distributed sample and consider its sample mean \bar{X}.

Fact 1. The sample mean is an unbiased estimator of the population mean:

(1) E\bar{X}=\frac{1}{n}(EX_1+...+EX_n)=\frac{1}{n}(\mu+...+\mu)=\mu

(use linearity of means).

Fact 2. Variance of the sample mean is

(2) Var(\bar{X})=\frac{1}{n^2}(Var(X_1)+...+Var(X_n))=\frac{1}{n^2}(\sigma^2(X)+...+\sigma^2(X))=\frac{\sigma^2(X)}{n}

(use homogeneity of variance of degree 2 and additivity of variance for independent variables). Hence \sigma(\bar{X})=\frac{\sigma(X)}{\sqrt{n}}.

Fact 3. The implication of these two properties is that the sample mean becomes more concentrated around the population mean as the sample size increases (see at least the law of large numbers; I have a couple more posts about this).

Fact 4. Finally, the z scores of sample means stabilize to a standard normal distribution (the central limit theorem).

What is a sampling distribution?

The sampling distribution of a statistic is the probability distribution that specifies probabilities for the possible values the statistic can take (Agresti and Franklin, p.308). After this definition, the authors go ahead and discuss the above four facts. Note that none of them requires the knowledge of what the sampling distribution is. The ONLY sampling distribution that appears explicitly in AP Statistics is the binomial. However, in the book the binomial is given in Section 6.3, before sampling distributions, which are the subject of Chapter 7. Section 7.3 explains that the binomial is a sampling distribution but that section is optional. Thus the whole Chapter 7 (almost 40 pages) is redundant.

Then what are sampling distributions for?

Here is a simple example that explains their role. Consider the binomial X_1+X_2 of two observations on an unfair coin. It involves two random variables and therefore is described by a joint distribution with the sample space consisting of pairs of values

Table 1. Sample space for pair (X_1,X_2)

                Coin 1
                0        1
Coin 2   0    (0,0)    (0,1)
         1    (1,0)    (1,1)

Each coin independently takes values 0 and 1 (shown in the margins); the sample space contains four pairs of these values (shown in the main body of the table). The corresponding probability distribution is given by the table

Table 2. Joint probabilities for pair (X_1,X_2)

                Coin 1
                p        q
Coin 2   p    p^2      pq
         q    pq       q^2

Since we are counting only the number of successes, the outcomes (0,1) and (1,0) for the purposes of our experiment are the same. Hence, joining indistinguishable outcomes, we obtain a smaller sample space

Table 3. Sampling distribution for binomial X_1+X_2

# of successes   Corresponding probability
0                p^2
1                2pq
2                q^2

The last table is the sampling distribution for the binomial with sample size 2. All the sampling distribution does is replace a large joint distribution Table 1+Table 2 by a smaller distribution Table 3. The beauty of proofs of equations (1) and (2) is that they do not depend on which distribution is used (the distribution is hidden in the expected value operator).
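The reduction from Tables 1-2 to Table 3 is easy to automate. A Python sketch (with p the probability of the value 0 and q = 1 - p, following the convention of Table 2):

```python
from itertools import product
from collections import defaultdict

def sampling_distribution(p, n=2):
    """Collapse the joint distribution of n independent coin observations
    (value 0 with probability p, value 1 with probability q = 1 - p)
    into the sampling distribution of the number of successes."""
    q = 1 - p
    dist = defaultdict(float)
    for outcome in product((0, 1), repeat=n):   # sample space, as in Table 1
        prob = 1.0
        for x in outcome:                       # joint probability, as in Table 2
            prob *= q if x == 1 else p
        dist[sum(outcome)] += prob              # join indistinguishable outcomes
    return dict(dist)

print(sampling_distribution(0.3))  # {0: p^2, 1: 2pq, 2: q^2} with p = 0.3
```

Increasing n shows how quickly the joint distribution (2^n outcomes) outgrows the sampling distribution (n + 1 values).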

Unless you want your students to appreciate the reduction in the sample space brought about by sampling distributions, it is not worth discussing them. See Wikipedia for examples other than the binomial.

Sep 16

The pearls of AP Statistics 26

What is probability: a whole lot of food for thought

What is probability

They say: With a randomized experiment or a random sample or other random phenomenon (such as a simulation), the probability of a particular outcome is the proportion of times that the outcome would occur in a long run of observations (Agresti and Franklin, p.213)

I say: When introducing the notion of probability, it is not a good idea to refer to the “long run” (which is the law of large numbers). Firstly, the LLN is a higher-level notion. Secondly, there is no way to deduce various laws of probability from this kind of definition.

Instead, produce a table with simulation results of type Table 5.1 (p.211). Be sure to frame the table with a series of definitions. An elementary event is a simplest possible event. The largest possible event is called a sample space. Emphasize the difference between absolute frequencies and relative frequencies. Note that relative frequencies are preferable because they allow one to compare results across different sample sizes. Start calling them proportions, so it's clear that they take values between 0 and 1. Finally, note that the proportions sum to 1. This prepares students for the following definitions.

Probability is a mathematical measure of the likelihood of an event. It satisfies three axioms:

  1. probability of any event is between 0 and 1 (inclusive);
  2. probability of the sample space is 1;
  3. probability is an additive function of events (it is like area: if two events are disjoint, then the probability of their union is a sum of their probabilities).

Axiom 1 can be reinforced by saying that an event whose probability is 0 is called impossible and an event whose probability is 1 is called a sure event. Axioms 2 and 3 imply that the sum of probabilities of all elementary events is equal to 1. This means that the sample space is complete (no elementary event relevant to our experiment has been left out). This is why I call Axiom 2 a completeness axiom.

Geometry is a magic wand

Immediately illustrate all of the above geometrically. Let the sample space be a unit square. Events are its subsets. Probability of an event is its area. Geometry is a powerful tool to explain probability rules. After playing with Venn diagrams, most students can deduce the additivity rule

(1) P(A\cup B)=P(A)+P(B)-P(A\cap B)

and complement rule

(2) P(A^c)=1-P(A).

Challenge your students with de Morgan’s laws

(3) \overline{A\cap B}=\bar{A}\cup \bar{B},\ \overline{A\cup B}=\bar{A}\cap \bar{B}.

Watch how your students write equations (1), (2) and (3): set operations are applied to sets and arithmetic operations are applied to numbers.
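These rules can even be machine-checked on a toy sample space. A Python sketch with ten equally likely elementary events (the particular sets A, B are arbitrary):

```python
# Toy sample space with ten equally likely elementary events
U = frozenset(range(10))
A = frozenset({0, 1, 2, 3, 4})
B = frozenset({3, 4, 5, 6})

def P(event):
    """Probability of an event = its share of the sample space."""
    return len(event) / len(U)

# (1) additivity rule (up to floating-point rounding)
assert abs(P(A | B) - (P(A) + P(B) - P(A & B))) < 1e-12
# (2) complement rule
assert abs(P(U - A) - (1 - P(A))) < 1e-12
# (3) de Morgan's laws: set operations act on sets, not numbers
assert U - (A & B) == (U - A) | (U - B)
assert U - (A | B) == (U - A) & (U - B)
print("all rules verified")
```

Note how the code mirrors the point above: the de Morgan assertions compare sets, while the additivity and complement assertions compare numbers.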

Don’t give independence until after all this has been done. Independence and its consequences are a more advanced topic.

If I were a student, I would ask these questions

The notions of disjoint events and a cover are easy to understand geometrically. The formula for probability of a complement on p. 221 implicitly uses them. Why is disjointness introduced later, on p.223?

I would like to see in the text the link between logic and geometry:

  • logical “and” corresponds to intersection of events,
  • logical “or” corresponds to a union of events,
  • logical negation corresponds to a complement,
  • logically, “at least one” is the opposite of “none”.

The multiplication rule (p. 225): does it follow from another axiom or property or is it true by definition?

What is the reason for division in the definition of conditional probability (p.231)?

Is it obvious that if A and B are dependent events, then so are A and the complement of B, and so are the complement of A and B, and so are the complements of A and B (p.237)? Will this remain true if “dependent” is replaced by “not mutually exclusive”?

See the answers to these and other questions in my book.


Sep 16

The pearls of AP Statistics 25

Central Limit Theorem versus Law of Large Numbers

They say: The Central Limit Theorem (CLT). Describes the Expected Shape of the Sampling Distribution for Sample Mean \bar{X}. For a random sample of size n from a population having mean μ and standard deviation σ, then as the sample size n increases, the sampling distribution of the sample mean \bar{X} approaches an approximately normal distribution. (Agresti and Franklin, p.321)

I say: There are at least three problems with this statement.

Problem 1. With any notion or statement, I would like to know its purpose in the first place. The primary purpose of the law of large numbers is to estimate population parameters. The Central Limit Theorem may be a nice theoretical result, but why do I need it? The motivation is similar to the one we use for introducing the z score. There is a myriad of distributions. Only some standard distributions have been tabulated. Suppose we have a sequence of variables X_n, none of which have been tabulated. Suppose also that, as n increases, those variables become close to a normal variable in the sense that the cumulative probabilities (areas under their respective densities) become close:

(1) P(X_n\le a)\rightarrow P(normal\le a) for all a.

Then we can use tables developed for normal variables to approximate P(X_n\le a). This justifies using (1) as the definition of a new convergence type called convergence in distribution.

Problem 2. Having introduced convergence (1), we need to understand what it means in terms of densities (distributions). As illustrated in Excel, the law of large numbers means convergence to a spike. In particular, the sample mean converges to a mass concentrated at μ (densities contract to one point). Referring to the sample mean in the context of the CLT is misleading, because the CLT is about the stabilization of densities.


Figure 1. Law of large numbers with n=100, 1000, 10000

Figure 1 appeared in my posts before; I just added n=10,000 to show that densities do not stabilize.

Figure 2. Central limit theorem with n=100, 1000, 10000


In Figure 2, for clarity I use line plots instead of histograms. The density for n=100 is very rugged. The blue line (for n=1000) is more rugged than the orange (for n=10,000). Convergence to a normal shape is visible, although slow.

Main problem. It is not the sample means that converge to a normal distribution; it is their z scores

\frac{\bar{X}-E\bar{X}}{\sigma(\bar{X})}

that do. Specifically,

P\left(\frac{\bar{X}-E\bar{X}}{\sigma(\bar{X})}\le a\right)\rightarrow P(z\le a) for all a

where z is a standard normal variable.

In my simulations I used sample means for Figure 1 and z scores of sample means for Figure 2. In particular, z scores always have means equal to zero, and that can be seen in Figure 2. In your class, you can use the Excel file. As usual, you have to enable macros.
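For those who prefer code to Excel, the same experiment takes a few lines of Python: simulate z scores of sample means from a non-normal population (uniform here, an arbitrary choice) and compare their empirical CDF with the standard normal CDF.

```python
import math
import random

def normal_cdf(a):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(a / math.sqrt(2)))

def z_score_cdf(a, n=100, trials=20_000, seed=3):
    """Empirical P(z score of the sample mean <= a) for a uniform [0,1]
    population (mu = 1/2, sigma = 1/sqrt(12))."""
    rng = random.Random(seed)
    mu, se = 0.5, (1 / math.sqrt(12)) / math.sqrt(n)
    count = 0
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        count += (xbar - mu) / se <= a
    return count / trials

print(z_score_cdf(1.0), normal_cdf(1.0))  # both close to 0.84
```

The agreement for several values of a is exactly convergence (1) at work.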

Sep 16

All you need to know about the law of large numbers

All about the law of large numbers: properties and applications

Level 1: estimation of population parameters

The law of large numbers is a statement about convergence which is called convergence in probability and denoted \text{plim}. The precise definition is rather complex but the intuition is simple: it is convergence to a spike at the parameter being estimated. Usually, any unbiasedness statement has its analog in terms of the corresponding law of large numbers.

Example 1. The sample mean unbiasedly estimates the population mean: E\bar{X}=EX. Its analog: the sample mean converges to a spike at the population mean: \text{plim}\bar{X}=EX. See the proof based on the Chebyshev inequality.

Example 2. The sample variance unbiasedly estimates the population variance: Es^2=Var(X) where s^2=\frac{\sum(X_i-\bar{X})^2}{n-1}. Its analog: the sample variance converges to a spike at the population variance:

(1) \text{plim}\,s^2=Var(X).

Example 3. The sample covariance s_{X,Y}=\frac{\sum(X_i-\bar{X})(Y_i-\bar{Y})}{n-1} unbiasedly estimates the population covariance: Es_{X,Y}=Cov(X,Y). Its analog: the sample covariance converges to a spike at the population covariance:

(2) \text{plim}\,s_{X,Y}=Cov(X,Y).
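Example 2 can be watched numerically: as the sample size grows, the sample variance settles at the population variance. A Python sketch with a uniform [0,1] population (for which Var(X) = 1/12, an arbitrary choice):

```python
import random

def sample_variance(xs):
    """Unbiased sample variance with the n - 1 denominator."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def variance_path(sizes, seed=4):
    """Sample variances of uniform [0,1] samples of increasing size;
    they converge to Var(X) = 1/12."""
    rng = random.Random(seed)
    return {n: sample_variance([rng.random() for _ in range(n)]) for n in sizes}

print(variance_path([100, 10_000]))  # values approach 1/12, about 0.0833
```

Replacing sample_variance with the sample covariance of paired draws illustrates Example 3 in the same way.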

Up one level: convergence in probability is just convenient

Whether or not to use convergence in probability is a matter of expedience. For usual limits of sequences we know the properties which I call preservation of arithmetic operations:

\lim(a_n\pm b_n)=\lim a_n\pm \lim b_n,

\lim(a_n\times b_n)=\lim a_n\times\lim b_n,

\lim(a_n/b_n)=\lim a_n/\lim b_n (provided \lim b_n\neq 0).

Convergence in probability has exactly the same properties; just replace \lim with \text{plim}.

Next level: making regression estimation more plausible

Using convergence in probability allows us to handle stochastic regressors and avoid the unrealistic assumption that regressors are deterministic.

Convergence in probability and in distribution are two types of convergence of random variables that are widely used in the Econometrics course of the University of London.

Aug 16

The pearls of AP Statistics 22

The law of large numbers - a bird's view

They say: In 1689, the Swiss mathematician Jacob Bernoulli proved that as the number of trials increases, the proportion of occurrences of any given outcome approaches a particular number (such as 1/6) in the long run. (Agresti and Franklin, p.213).

I say: The expression “law of large numbers” appears in the book 13 times, yet its meaning is never clearly explained. The closest approximation to the truth is the above sentence about Jacob Bernoulli. To see if this explanation works, tell it to your students and ask what they understood. To me, this is a clear case when withholding theory harms understanding.

Intuition comes first. I ask my students: if you flip a fair coin 100 times, what do you expect the proportion of ones to be? Absolutely everybody replies correctly, just the form of the answer may be different (50-50 or 0.5 or 50 out of 100). Then I ask: probably it will not be exactly 0.5 but if you flip the coin 1000 times, do you expect the proportion to be closer to 0.5? Everybody says: Yes. Next I ask: Suppose the coin is unfair and the probability of 1 appearing is 0.7. What would you expect the proportion to be close to in large samples? Most students come up with the right answer: 0.7. Congratulations, you have discovered what is called a law of large numbers!

Then we give a theoretical format to our little discovery. p=0.7 is a population parameter. Flipping a coin n times we obtain observations X_1,...,X_n. The proportion of ones is the sample mean \bar{X}=\frac{X_1+...+X_n}{n}. The law of large numbers says two things: 1) as the sample size increases, the sample mean approaches the population mean. 2) At the same time, its variation about the population mean becomes smaller and smaller.

Part 1) is clear to everybody. To corroborate statement 2), I give two facts. Firstly, we know that the standard deviation of the sample mean is \frac{\sigma}{\sqrt{n}}. From this we see that as n increases, the standard deviation of the sample mean decreases and the values of the sample mean become more and more concentrated around the population mean. We express this by saying that the sample mean converges to a spike. Secondly, I produce two histograms. With the sample size n=100, the histogram has two modes (each at just 10%), at 0.69 and 0.72, while 0.7 was used as the population mean in my simulations. Besides, the spread of the values is large. With n=1000, the mode (27%) is at the true value 0.7, and the spread is low.

Histogram of proportions with n=100


Histogram of proportions with n=1000

Finally, we relate our little exercise to practical needs. In practice, the true mean is never known. But we can obtain a sample and calculate its mean. With a large sample size, the sample mean will be close to the truth. More generally, take any other population parameter, such as its standard deviation, and calculate the sample statistic that estimates it, such as the sample standard deviation. Again, the law of large numbers applies and the sample statistic will be close to the population parameter. The histograms have been obtained as explained here and here. Download the Excel file.
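For teachers who prefer code to Excel, the whole exercise fits in a few lines of Python; the sketch below measures the spread of simulated proportions around the true p = 0.7 for the two sample sizes used above.

```python
import random

def proportion_spread(n, p=0.7, reps=2_000, seed=5):
    """Mean and standard deviation of the proportion of ones across
    many simulated samples of an unfair coin with P(1) = p."""
    rng = random.Random(seed)
    props = [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]
    mean = sum(props) / reps
    sd = (sum((x - mean) ** 2 for x in props) / reps) ** 0.5
    return mean, sd

print(proportion_spread(100))   # mean near 0.7, sd near sqrt(0.21/100)
print(proportion_spread(1000))  # same mean, roughly 1/sqrt(10) of the spread
```

The tenfold increase in sample size shrinks the spread by about \sqrt{10}, in line with the \frac{\sigma}{\sqrt{n}} formula.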

Aug 16

The pearls of AP Statistics 19

Make it easy, keep 'em busy - law of large numbers illustrated

They say: Use the "Simulating the Probability of Head With a Fair Coin" applet on the text CD or other software to illustrate the long-run definition of probability by simulating short-term and long-term results of flipping a balanced coin (Agresti and Franklin, p.216)

I say: Since it is not explained how to use "other software", the readers are in fact forced to use the applet from the text CD. This is nothing but extortion. This exercise requires the students to write down 30 observations (which is not so bad; I also practice this in a more entertaining way). But the one following it invites them to simulate 100 observations. The two exercises are too simple and time-consuming, and their sole purpose is to illustrate the law of large numbers introduced on p. 213. The students have but one life and the authors want them to lose it for AP Statistics?

The same objective can be achieved in a more efficient and instructive manner. The students are better off if they see how to model the law of large numbers in Excel. As I promised, here is the explanation.

Law of large numbers

Video: Law of large numbers

Here is the Excel file.

Dec 15

Population mean versus sample mean

Equations involving both population and sample means are especially confusing for students. One of them is unbiasedness of the sample mean E\bar{X}=EX. In the Econometrics context there are many relations of this type. They need to be emphasized and explained many times until everybody understands the difference.

On the practical side, the first thing to understand is that the population mean uses all population elements and the population distribution, which are usually unknown. On the other hand, the sample mean uses only the sample and is known, as long as the sample is known.

On the theoretical side, we know that 1) as the sample size increases, the sample mean tends to the population mean (law of large numbers), 2) the population mean of the sample mean equals the population mean (unbiasedness), 3) for a discrete uniformly distributed variable with a finite number of elements, the population mean equals the sample mean (see equation (4) in that post) if the sample is the whole population, and 4) if the population mean equals \mu, that does not mean that every sample from that population has sample mean equal to \mu.
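Point 4) is easy to demonstrate: draw several samples from one population and compare their sample means. A Python sketch (the population N(10, 4) is an arbitrary choice):

```python
import random

def sample_means(mu=10.0, sigma=2.0, n=1_000, samples=5, seed=6):
    """Sample means of several samples from the same N(mu, sigma^2)
    population: all close to mu, yet no two are the same."""
    rng = random.Random(seed)
    return [sum(rng.gauss(mu, sigma) for _ in range(n)) / n
            for _ in range(samples)]

print(sample_means())  # five different numbers, all near 10
```

Each run produces sample means that differ from \mu and from each other, while the population mean stays fixed at 10.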

For the preliminary material on properties of means see this post.