14 Dec 16

It’s time to modernize the AP Stats curriculum

The suggestions below are based on the College Board AP Statistics Course Description, Effective Fall 2010. To quote this description, “AP teachers are encouraged to develop or maintain their own curriculum that either includes or exceeds each of these expectations; such courses will be authorized to use the ‘AP’ designation.” However, AP teachers are constrained by the statement that “The Advanced Placement Program offers a course description and exam in statistics to secondary school students who wish to complete studies equivalent to a one semester, introductory, non-calculus-based, college course in statistics.”

Too much material for a one-semester course

I tried to teach AP Stats in one semester following the College Board description and methodology, that is, with no derivations, giving only recipes, and concentrating on applications. The students were really stretched, didn’t remember anything after completing the course, and the course’s usefulness for the subsequent calculus-based course was minimal.

Suggestion. Reduce the number of topics and concentrate on those that require going all the way from (again citing the description) Exploring Data to Sampling and Experimentation to Anticipating Patterns to Statistical Inference. Simple regression is such a topic.

I would drop the stem-and-leaf plot, because it is stupid, and the chi-square tests for goodness of fit, homogeneity of proportions and independence, along with ANOVA, because they are too advanced and look too vague without the right explanation. Instead of going wide, it is better to go deeper, building upon what students already know. I’ll post a couple of regression applications.

“Introductory” should not mean stupefying

Statistics has its own specifics. Even I, with my extensive experience in Math, made quite a few discoveries for myself while studying Stats. Textbook authors, in their attempts to make the exposition accessible, often replace true statistical ideas with after-the-fact intuition, or formulas with their verbal descriptions. See, for example, the z score.

Using TI-83+ and TI-84 graphing calculators is like pairing a Tesla electric car with candles for generating electricity. The sole purpose of these calculators is to prevent cheating. The inclination to cheat is a sign of low understanding and the best proof that the College Board strategy is wrong.

Once you say “this course is non-calculus-based”, you close many doors

When we format a document in Word, we don’t care how the formatting is implemented technically, and we don’t need to know anything about programming. It looks like the same attitude is imparted to students of Stats. Few people notice the big difference: when we format a document, we have an idea of what we want and test the result against that idea; in Stats, the idea has to be translated into a formula, and the software output has to be translated back into a formula for interpretation.

I understand that, for the majority of Stats students, the amount of algebra I use in some of my posts is not accessible. However, the opposite tendency of telling students that they don’t need to remember any formulas is unproductive. It’s only by memorizing and reproducing equations that they can build their algebraic proficiency. Stats is largely a mental science, and to improve a mental activity, you have to engage in it.

Suggestion. Instead of “this course is non-calculus-based”, say: the course develops the ability to interpret equations and translate ideas to formulas.

Follow a logical sequence

The way most AP Stats books are written does not give any idea of what comes from where. When I was an undergraduate student, I was looking for explanations, and I would have hated reading one of today’s AP Stats textbooks. For those who think, memorizing a bunch of recipes without seeing the logical links is a nightmare. In some cases, the absence of logic leads to statements that are plain wrong. Just following the logical sequence will put the pieces of the puzzle together.

8 Oct 16

The pearls of AP Statistics 32

Student's t distribution: one-line explanation of its origin

They say: We’ll now learn about a confidence interval that applies even for small sample sizes… Suppose we knew the standard deviation, \sigma/\sqrt{n}, of the sample mean. Then, with the additional assumption that the population is normal, with small n we could use the formula \bar{x}\pm z\sigma/\sqrt{n}, for instance with z = 1.96 for 95% confidence. In practice, we don’t know the population standard deviation σ. Substituting the sample standard deviation s for σ to get se=s/\sqrt{n} then introduces extra error. This error can be sizeable when n is small. To account for this increased error, we must replace the z-score by a slightly larger score, called a t-score. The confidence interval is then a bit wider. (Agresti and Franklin, p.369)

I say: The opening statement in italic (We’ll now learn about...) creates the wrong impression that the task at hand is to address small sample sizes. The next part in italic (To account for this increased error...) confuses the reader further by implying that

1) using the sample standard deviation instead of the population standard deviation and

2) replacing the z score by the t score

are two separate acts. They are not: see equation (3) below. The last proposition in italic (The confidence interval is then a bit wider) is true. It confused me to the extent that I made a wrong remark in the first version of this post, see Remark 4 below.

Preliminaries

William Gosset published his result under a pseudonym ("Student"), and that result was later modified by Ronald Fisher into what we now know as Student's t distribution. With his statistic, Gosset wanted to address small sample sizes. The modern explanation is different: the t statistic arises from replacing the unknown population variance by its estimator, the sample variance, and it works regardless of the sample size. If we take a couple of facts on trust, the explanation is just a one-line formula.

Let X_1,...,X_n be a sample of independent observations from a normal population.

Fact 1. The z-score of the sample mean

(1) z_0=\frac{\bar{X}-E\bar{X}}{\sigma(\bar{X})}=\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}

is a standard normal variable.

Fact 2. The sample variance s^2=\frac{1}{n-1}\sum(X_i-\bar{X})^2 upon scaling becomes a chi-square variable. More precisely, the variable

(2) \chi^2_{n-1}=\frac{(n-1)s^2}{\sigma^2}

is a chi-square with n-1 degrees of freedom.

Fact 3. The variables in (1) and (2) are independent.
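Facts 2 and 3 are the ones usually taken on trust. For teachers with Mathematica at hand, here is a minimal simulation sketch that at least makes Fact 2 plausible (the sample size and population parameters below are arbitrary):

n = 8; mu = 5; sigma = 2;
chis = Table[(n - 1)*Variance[RandomVariate[NormalDistribution[mu, sigma], n]]/sigma^2, {10000}];
(* the histogram of the simulated variable should follow the chi-square density with n-1 degrees of freedom *)
Show[Histogram[chis, Automatic, "PDF"], Plot[PDF[ChiSquareDistribution[n - 1], x], {x, 0, 25}]]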

Intuitive introduction to t distribution

When a population parameter is unknown, replace it by its estimator. Following this general statistical idea, in the situation when \sigma is unknown, instead of (1) consider

(3) t=\frac{\bar{X}-\mu}{s/\sqrt{n}}=\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\cdot\frac{1}{\sqrt{s^2/\sigma^2}}=\frac{z_0}{\sqrt{\chi^2_{n-1}/(n-1)}}

(the middle expression is obtained by dividing and multiplying by \sigma; the last equality uses (1) and (2)).

By definition and because the numerator and denominator are independent, the last expression is a t distribution with n-1 degrees of freedom. This is all there is to it.
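If you want students to see (3) rather than just read it, a minimal Mathematica sketch (the sample size, population parameters and number of replications are arbitrary) simulates the left-hand side of (3) and compares it with the t density:

n = 5; mu = 10; sigma = 3;
tstats = Table[With[{x = RandomVariate[NormalDistribution[mu, sigma], n]},
    (Mean[x] - mu)/(StandardDeviation[x]/Sqrt[n])], {10000}];
(* the simulated statistic should follow the t density with n-1 degrees of freedom *)
Show[Histogram[tstats, Automatic, "PDF"], Plot[PDF[StudentTDistribution[n - 1], x], {x, -6, 6}]]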

Concluding remarks

Remark 1. When I give the definitions of the chi-square, t and F statistics, my students are often surprised, because there is no reference to samples. To be precise, the way I define them, they are random variables rather than statistics, so "distribution" or "variable" would be more appropriate than "statistic". A statistic, by definition, is a function of observations. The variable we start with in (3) is, obviously, a statistic, and equation (3) means that this statistic is distributed as t with n-1 degrees of freedom.

Remark 2. Many AP Stats books claim that a sum of normal variables is normal. In fact, for this to be true we need independence of the summands. Under our assumption of independent observations and normality, the sum X_1+...+X_n is normal. The variable in (1) is normal as a linear transformation of this sum. Since its mean is zero and its variance is 1, it is a standard normal. We have proved Fact 1. The proofs of Facts 2 and 3 are much more complex.

Remark 3. The t statistic is not used for large samples not because it does not work for large n but because for large n it is close to the z score.
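A quick way to see this in Mathematica (the sample sizes below are arbitrary) is to compare the 97.5% quantiles of the t distribution with the standard normal critical value 1.96:

Table[{n, N[Quantile[StudentTDistribution[n - 1], 0.975]]}, {n, {5, 30, 100, 1000}}]
(* the t critical values 2.776, 2.045, 1.984, 1.962 approach the normal value 1.960 *)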

Remark 4. Taking the t distribution defined here as a standard t, we can define a general t as its linear transformation, GeneralT=\sigma*StandardT+\mu (similarly to general normals). Since the standard deviation of the standard t is not 1, the standard deviation of the general t defined this way will not be \sigma. The general t is needed to use the Mathematica function StudentTCI (confidence interval for Student's t). The t score that arises in estimation is the standard t, and in this case confidence intervals based on t are indeed wider than those based on z. I apologize for my previous wrong comment and am posting this video. See the updated Mathematica file.
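For a numerical check of the last statement, one can compare the t-based and z-based margins of error computed from the same sample standard deviation. The sketch below uses hypothetical values of n and s:

n = 10; s = 2.5;   (* hypothetical sample size and sample standard deviation *)
zcr = Quantile[NormalDistribution[0, 1], 0.975];
tcr = Quantile[StudentTDistribution[n - 1], 0.975];
{zcr*s/Sqrt[n], tcr*s/Sqrt[n]}   (* the t-based margin of error is the larger of the two *)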

23 Sep 16

The pearls of AP Statistics 30

Where do the confidence interval and margin of error come from?

They say: A confidence interval is an interval containing the most believable values for a parameter. The probability that this method produces an interval that contains the parameter is called the confidence level. This is a number chosen to be close to 1, most commonly 0.95... The key is the sampling distribution of the point estimate. This distribution tells us the probability that the point estimate will fall within any certain distance of the parameter (Agresti and Franklin, p.352)... The margin of error measures how accurate the point estimate is likely to be in estimating a parameter. It is a multiple of the standard deviation of the sampling distribution of the estimate, such as 1.96 x (standard deviation) when the sampling distribution is a normal distribution (p.353)

I say: Confidence intervals, invented by Jerzy Neyman, were an important contribution to statistical science. The logic behind them is substantial. Some math is better hidden from students, but not in this case. The authors have in mind complex notions involving math and try to convey them verbally. Instead of hoping that students will mentally recreate those notions, why not give them directly?

Motivation

I ask my students what kind of information they would prefer:

a) I predict the price S of Apple stock to be $114 tomorrow or

b) Tomorrow the price of Apple stock is expected to stay within $1 distance from $114 with probability 95%, that is P(113<S<115)=0.95.

Everybody says statement b) is better. A follow-up question: Do you want the probability in statement b) to be high or low? Unanimous answer: High. A series of definitions follows.

An interval (a,b) containing the values of a random variable S with high probability

(1) P(a<S<b)=p

is called a confidence interval. The value p, which controls the probability, is called the confidence level, and the number \alpha=1-p is called the level of significance. The interpretation of \alpha is that P(S\ falls\ outside\ of\ (a,b))=\alpha, as follows from (1).

How to find a confidence interval

We want the confidence level to be close to 1 and the significance level to be close to zero. In applications, we choose them and we need to find the interval (a,b) from equation (1).

Step 1. Consider a standard normal variable z; then (1) becomes P(a<z<b)=p. Note that usually it is impossible to find two unknowns from one equation. Therefore we look for a symmetric interval, in which case we have to solve

(2) P(-a<z<a)=p

for a. The solution a=z_{cr} is called the critical value corresponding to the confidence level p or significance level \alpha=1-p. It is impossible to find it by hand; that's why people use statistical tables. In Mathematica, the critical value is given by

z_{cr}=Max[NormalCI[0, 1, ConfidenceLevel -> p]].

Geometrically, it is obvious that, as the confidence level approaches 1, the critical value goes to infinity, see the video or download the Mathematica file.
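The same critical value can also be obtained from the quantile function. The hypothetical confidence levels in this minimal sketch illustrate how z_{cr} grows as p approaches 1:

Table[{p, Quantile[NormalDistribution[0, 1], 1 - (1 - p)/2]}, {p, {0.9, 0.95, 0.99, 0.999, 0.999999}}]
(* the critical values 1.64, 1.96, 2.58, 3.29, 4.89 grow without bound *)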

Step 2. In case of a general normal variable, plug its z-score in (2):

(3) P(-z_{cr}<\frac{X-\mu}{\sigma}<z_{cr})=p.

The event -z_{cr}<\frac{X-\mu}{\sigma}<z_{cr} is the same as -z_{cr}\sigma<X-\mu<z_{cr}\sigma which is the same as \mu-z_{cr}\sigma<X<\mu+z_{cr}\sigma. Hence, their probabilities are the same:

(4) P(-z_{cr}<\frac{X-\mu}{\sigma}<z_{cr})=P(\mu-z_{cr}\sigma<X<\mu+z_{cr}\sigma)=p.

We have found the confidence interval (\mu-z_{cr}\sigma,\mu+z_{cr}\sigma) for a normal variable. This explains where the margin of error z_{cr}\sigma comes from.
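To make the margin of error concrete, here is a small numerical sketch with made-up values of \mu and \sigma:

mu = 100; sigma = 15; p = 0.95;   (* hypothetical population parameters and confidence level *)
zcr = Quantile[NormalDistribution[0, 1], 1 - (1 - p)/2];
{mu - zcr*sigma, mu + zcr*sigma}   (* the 95% confidence interval, roughly (70.6, 129.4) *)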

Step 3. In case of a random variable which is not necessarily normal we can use the central limit theorem. z-scores of sample means z=\frac{\bar{X}-E\bar{X}}{\sigma(\bar{X})}, for example, approach the standard normal. Instead of (3) we have an approximation

P(-z_{cr}<\frac{\bar{X}-E\bar{X}}{\sigma(\bar{X})}<z_{cr})\approx p.

Then instead of (4) we get

P(E\bar{X}-z_{cr}\sigma(\bar{X})<\bar{X}<E\bar{X}+z_{cr}\sigma(\bar{X}))\approx p.
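The quality of this approximation can be checked by simulation. A minimal Mathematica sketch, with an arbitrarily chosen non-normal population (exponential with mean 1, so that \sigma(\bar{X})=1/\sqrt{n}) and an arbitrary n:

n = 50; p = 0.95; reps = 10000;
zcr = Quantile[NormalDistribution[0, 1], 1 - (1 - p)/2];
hits = Table[With[{x = RandomVariate[ExponentialDistribution[1], n]},
    Abs[(Mean[x] - 1)/(1/Sqrt[n])] < zcr], {reps}];
N[Count[hits, True]/reps]   (* fraction of samples whose z-score falls inside (-zcr, zcr); close to 0.95 *)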

 

In my classes, I insist that logically interconnected facts should be given in one place. To consolidate this information, I give my students the case of one-sided intervals as an exercise.

4 Sep 16

The pearls of AP Statistics 25

Central Limit Theorem versus Law of Large Numbers

They say: The Central Limit Theorem (CLT). Describes the Expected Shape of the Sampling Distribution for Sample Mean \bar{X}. For a random sample of size n from a population having mean μ and standard deviation σ, then as the sample size n increases, the sampling distribution of the sample mean \bar{X} approaches an approximately normal distribution. (Agresti and Franklin, p.321)

I say: There are at least three problems with this statement.

Problem 1. With any notion or statement, I would like to know its purpose in the first place. The primary purpose of the law of large numbers is to estimate population parameters. The Central Limit Theorem may be a nice theoretical result, but why do I need it? The motivation is similar to the one we use for introducing the z score. There is a myriad of distributions. Only some standard distributions have been tabulated. Suppose we have a sequence of variables X_n, none of which have been tabulated. Suppose also that, as n increases, those variables become close to a normal variable in the sense that the cumulative probabilities (areas under their respective densities) become close:

(1) P(X_n\le a)\rightarrow P(normal\le a) for all a.

Then we can use tables developed for normal variables to approximate P(X_n\le a). This justifies using (1) as the definition of a new convergence type called convergence in distribution.

Problem 2. Having introduced convergence (1), we need to understand what it means in terms of densities (distributions). As illustrated in Excel, the law of large numbers means convergence to a spike: the distribution of the sample mean converges to a mass concentrated at μ (the densities contract to one point). Referring to the sample mean in the context of the CLT is misleading, because the CLT is about the stabilization of densities.

Figure 1. Law of large numbers with n=100, 1000, 10000

Figure 1 appeared in my posts before; I just added n=10,000 to show that the densities do not stabilize.

Figure 2. Central limit theorem with n=100, 1000, 10000

 

In Figure 2, for clarity I use line plots instead of histograms. The density for n=100 is very rugged. The blue line (for n=1000) is more rugged than the orange (for n=10,000). Convergence to a normal shape is visible, although slow.

Main problem. It is not the sample means that converge to a normal distribution. It is their z scores

z=\frac{\bar{X}-E\bar{X}}{\sigma(\bar{X})}

that do. Specifically,

P(\frac{\bar{X}-E\bar{X}}{\sigma(\bar{X})}\le a)\rightarrow P(z\le a) for all a

where z is a standard normal variable.

In my simulations I used sample means for Figure 1 and z scores of sample means for Figure 2. In particular, z scores always have means equal to zero, and that can be seen in Figure 2. In your class, you can use the Excel file. As usual, you have to enable macros.
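For those who prefer Mathematica to Excel, here is a minimal sketch of the same experiment (the uniform population and the numbers of replications are arbitrary): the histogram of sample means contracts to a spike at \mu as n grows, while the histogram of their z scores keeps the standard normal shape.

pop = UniformDistribution[{0, 1}]; mu = 1/2; sd = 1/Sqrt[12];
means[n_] := Table[Mean[RandomVariate[pop, n]], {5000}];
zscores[n_] := Table[(Mean[RandomVariate[pop, n]] - mu)/(sd/Sqrt[n]), {5000}];
Histogram[{means[100], means[1000]}, Automatic, "PDF"]       (* law of large numbers: densities contract to mu *)
Histogram[{zscores[100], zscores[1000]}, Automatic, "PDF"]   (* central limit theorem: densities stabilize around N(0,1) *)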

10 Jan 16

What is a z score: the scientific explanation

You know what a z score is when you know why people invented it.

As usual, we start with a theoretical motivation. There is a myriad of distributions. Even if we stay within the set of normal distributions, there is an infinite number of them, indexed by their means \mu(X)=EX and standard deviations \sigma(X)=\sqrt{Var(X)}. When computers did not exist, people had to use statistical tables. It was impossible to produce statistical tables for an infinite number of distributions, so the problem was to reduce the case of general \mu(X) and \sigma(X) to that of \mu(X)=0 and \sigma(X)=1.

We know that this can be achieved by centering and scaling. Combining these two transformations, we obtain the definition of the z score:

z=\frac{X-\mu(X)}{\sigma(X)}.

Using the properties of means and variances we see that

Ez=\frac{E(X-\mu(X))}{\sigma(X)}=0, Var(z)=\frac{Var(X-\mu(X))}{\sigma^2(X)}=\frac{Var(X)}{\sigma^2(X)}=1.

The transformation leading from X to its z score is sometimes called standardization.
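A two-line Mathematica check of these properties on simulated data (the exponential population below is an arbitrary non-normal example):

x = RandomVariate[ExponentialDistribution[1/3], 100000];   (* population mean 3, population standard deviation 3 *)
z = (x - 3)/3;                                             (* the z score, computed with the population parameters *)
{Mean[z], Variance[z]}                                     (* approximately {0, 1} *)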

This site promises to tell you the truth about undergraduate statistics. The truth about the z score is that:

(1) Standardization can be applied to any variable with finite variance, not only to normal variables. The z score is a standard normal variable only when the original variable X is normal, contrary to what some sites say.

(2) With modern computers, standardization is not necessary to find critical values for X; see Chapter 14 of my book.
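For illustration, a short Mathematica sketch with arbitrary parameters: the critical value for X can be taken directly from the distribution of X, and standardization gives the same number.

mu = 100; sigma = 15;
Quantile[NormalDistribution[mu, sigma], 0.975]           (* critical value for X found directly *)
mu + sigma*Quantile[NormalDistribution[0, 1], 0.975]     (* the same value via the z score *)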