Mar 19

AP Statistics the Genghis Khan way 1

AP Statistics the Genghis Khan way 1

Recently I enjoyed reading Jack Weatherford's "Genghis Khan and the Making of the Modern World" (2004). I was reading the book with a specific question in mind: what were the main reasons of the success of the Mongols? Here you can see the list of their innovations, some of which were in fact adapted from the nations they subjugated. But what was the main driving force behind those innovations? The conclusion I came to is that Genghis Khan was a genial psychologist. He used what he knew about individual and social psychology to constantly improve the government of his empire.

I am no Genghis Khan but I try to base my teaching methods on my knowledge of student psychology.

Problems and suggested solutions

Steven Krantz in his book (How to teach mathematics : Second edition, 1998, don't remember the page) says something like this: If you want your students to do something, arrange your classes so that they do it in the class.

Problem 1. Students mechanically write down what the teacher says and writes.

Solution. I don't allow my students to write while I am explaining the material. When I explain, their task is to listen and try to understand. I invite them to ask questions and prompt me to write more explanations and comments. After they all say "We understand", I clean the board and then they write down whatever they understood and remembered.

Problem 2. Students are not used to analyze what they read or write.

Solution. After students finish their writing, I ask them to exchange notebooks and check each other's writings. It's easier for them to do this while everything is fresh in their memory. I bought and distributed red pens. When they see that something is missing or wrong, they have to write in red. Errors or omissions must stand out. Thus, right there in the class students repeat the material twice.

Problem 3. Students don't study at home.

Solution. I let my students know in advance what the next quiz will be about. Even with this knowledge, most of them don't prepare at home. Before the quiz I give them about half an hour to repeat and discuss the material (this is at least the third repetition). We start the quiz when they say they are ready.

Problem 4. Students don't understand that active repetition (writing without looking at one's notes) is much more productive than passive repetition (just reading the notes).

Solution. Each time before discussion sessions I distribute scratch paper and urge students to write, not just read or talk. About half of them follow my recommendation. Their desire to keep their notebooks neat is not their last consideration. The solution to Problem 1 also hinges upon active repetition.

Problem 5. If students work and are evaluated individually, usually there is no or little interaction between them.

Solution. My class is divided in teams (currently I have teams of two to six people). I randomly select one person from each team to write the quiz. That person's grade is the team's grade. This forces better students to coach others and weaker students to seek help.

Problem 6. Some students don't want to work in teams. They are usually either good students, who don't want to suffer because of weak team members, or weak students, who don't want their low grades to harm other team members.

Solution. The good students usually argue that it's not fair if their grade becomes lower because of somebody else's fault. My answer to them is that the meaning of fairness depends on the definition. In my grading scheme, 30 points out of 100 is allocated for team work and the rest for individual achievements. Therefore I never allow good students to work individually. I want them to be my teaching assistants and help other students. While doing so, I tell them that I may reward good students with a bonus in the end of the semester. In some cases I allow weak students to write quizzes individually but only if the team so requests. The request of the weak student doesn't matter. The weak student still has to participate in team discussions.

Problem 7. There is no accumulation of theoretical knowledge (flat learning curve).

Solution. a) Most students come from high school with little experience in algebra. I raise the level gradually and emphasize understanding. Students never see multiple choice questions in my classes. They also know that right answers without explanations will be discarded.

b) Normally, during my explanations I fill out the board. The amount of the information the students have to remember is substantial and increases over time. If you know a better way to develop one's internal vision, let me know.

c) I don't believe in learning the theory by doing applied exercises. After explaining the theory I formulate it as a series of theoretical exercises. I give the theory in large, logically consistent blocks for students to see the system. Half of exam questions are theoretical (students have to provide proofs and derivations) and the other half - applied.

d) The right motivation can be of two types: theoretical or applied, and I never substitute one for another.

Problem 8. In low-level courses you need to conduct frequent evaluations to keep your students in working shape. Multiply that by the number of students, and you get a serious teaching overload.

Solution. Once at a teaching conference in Prague my colleague from New York boasted that he grades 160 papers per week. Evaluating one paper per team saves you from that hell.


In the beginning of the academic year I had 47 students. In the second semester 12 students dropped the course entirely or enrolled in Stats classes taught by other teachers. Based on current grades, I expect 15 more students to fail. Thus, after the first year I'll have about 20 students in my course (if they don't fail other courses). These students will master statistics at the level of my book.

Jan 17

Review of Albert and Rossman

Review of Albert and Rossman "Workshop Statistics: Discovery with Data, A Bayesian Approach", Key College Publishing, 2001

Who is this book for?

In this review I concentrate on how this book is similar to and different from  Agresti and Franklin. The book contains almost no formulas and in this respect is even more basic than Agresti and Franklin. The emphasis of the book is on the Bayesian approach, which is not mainstream Statistics, and this makes it stand out from the crowd. The advantages of this emphasis are described on p.11 (avoiding the notion of a sampling distribution, making the course shorter and doing with less recipes).

What I like

The text is business-like. Just one page (p.1) explains the difference between descriptive and inferential statistics, without even mentioning these names.

The book urges the instructor to reduce the amount of lecturing and rely more on active learning. It often prompts the student to think about ideas before providing the theoretical answer. That's what I like to do in my class. Activity 3-12 (Wrong Conclusions) pursues the same purpose.

The description of basic features of a data distribution on p.22 is concise and clear.

What I don't like

The definition of a categorical variable (p.5) does not allow one to distinguish it from a numerical one. See my explanation.

No attempt is made to improve students' algebra skills.  This is what undermines the attempt to explain the Bayesian approach.

Like Agresti and Franklin, the authors make the study of regression dependent on the correlation coefficient. See Correlation and regression are two separate entities.

The logical sequence is broken. In particular, probabilities are introduced after regression.

The normal distribution, one of the pillars of Statistics, is given too late (in Chapter 18).


As much as I like the idea of active learning, I cannot recommend the book, for the simple reason that it doesn't comply with the College Board curriculum.

Not being a Bayesian specialist, I was hoping to pick up something useful for myself. That hope didn't realize. The five chapters on the Bayesian approach are nothing more than just a collection of recipes accompanied by numerical examples. Even the Bayes theorem is not stated. If I were to write such a book, I would write it as a complement to a widely adopted text. This would allow me to avoid repeating the common stuff (graphical illustration of statistical data, measures of center and spread, probabilities etc.) and give more theory.

Sep 16

The pearls of AP Statistics 29

Normal distributions: sometimes it is useful to breast the current

The usual way of defining normal variables is to introduce the whole family of normal distributions and then to say that the standard normal is a special member of this family. Here I show that, for didactic purposes, it is better to do the opposite.

Standard normal distribution

The standard normal distribution z is defined by its probability density


Usually students don't remember this equation, and they don't need to. The point is to emphasize that this is a specific density, not a generic "bell shape".


Figure 1. Standard normal density

From the plot of the density (Figure 1) they can guess that the mean of this variable is zero.







Figure 2. Plot of xp(x)

Alternatively, they can look at the definition of the mean of a continuous random variable Ez=\int_\infty^\infty xp(x)dx. Here the function f(x)=xp(x) has the shape given in Figure 2, where the positive area to the right of the origin exactly cancels out with the negative area to the left of the origin. Since an integral means the area under the function curve, it follows that

(1) Ez=0.


To find variance, we use the shortcut:

Var(z)=Ez^2-(Ez)^2=Ez^2=\int_{-\infty}^\infty x^2p(x)dx=2\int_0^\infty x^2p(x)dx=1.

Figure 3. Plot of x^2p(x)


The total area under the curve is twice the area to the right of the origin, see Figure 3. Here the last integral has been found using Mathematica. It follows that

(2) \sigma(z)=\sqrt{Var(z)}=1.

General normal distribution


Figure $. Visualization of linear transformation - click to view video

Fix some positive \sigma and real \mu. A (general) normal variable \mu is defined as a linear transformation of z:

(3) X=\sigma z+\mu.

Changing \mu moves the density plot to the left (if \mu is negative) and to the right (if \mu is positive). Changing \sigma makes the density peaked or flat. See video. Enjoy the Mathematica file.




Properties follow like from the horn of plenty:

A) Using (1) and (3) we easily find the mean of X:

EX=\sigma Ez+\mu=\mu.

B) From (2) and (3) we have

Var(X)=Var(\sigma z)=\sigma^2Var(z)=\sigma^2

(the constant \mu does not affect variance and variance is homogeneous of degree 2).

C) Solving (3) for z gives us the z-score:


D) Moreover, we can prove that a linear transformation of a normal variable is normal. Indeed, let X be defined by (3) and let Y be its linear transformation: Y=\delta X+\nu. Then

Y=\delta (\sigma z+\mu)+\nu=\delta\sigma z+(\delta\mu+\nu)

is a linear transformation of the standard normal and is therefore normal.

Remarks. 1) In all of the above, no derivation is longer than one line. 2) Reliance on geometry improves understanding. 3) Only basic properties of means and variances are used. 4) With the traditional way of defining the normal distribution using the equation


there are two problems. Nobody understands this formula and it is difficult to extract properties of the normal variable from it.

Compare the above exposition with that of Agresti and Franklin: a) The normal distribution is symmetric, bell-shaped, and characterized by its mean μ and standard deviation σ (p.277) and b) The Standard Normal Distribution has Mean = 0 and Standard Deviation = 1 (p.285). It is the same old routine: remember this, remember that.

Aug 16

The pearls of AP Statistics 22

The law of large numbers - a bird's view

They say: In 1689, the Swiss mathematician Jacob Bernoulli proved that as the number of trials increases, the proportion of occurrences of any given outcome approaches a particular number (such as 1/6) in the long run. (Agresti and Franklin, p.213).

I say: The expression “law of large numbers” appears in the book 13 times, yet its meaning is never clearly explained. The closest approximation to the truth is the above sentence about Jacob Bernoulli. To see if this explanation works, tell it to your students and ask what they understood. To me, this is a clear case when withholding theory harms understanding.

Intuition comes first. I ask my students: if you flip a fair coin 100 times, what do you expect the proportion of ones to be? Absolutely everybody replies correctly, just the form of the answer may be different (50-50 or 0.5 or 50 out of 100). Then I ask: probably it will not be exactly 0.5 but if you flip the coin 1000 times, do you expect the proportion to be closer to 0.5? Everybody says: Yes. Next I ask: Suppose the coin is unfair and the probability of 1 appearing is 0.7. What would you expect the proportion to be close to in large samples? Most students come up with the right answer: 0.7. Congratulations, you have discovered what is called a law of large numbers!

Then we give a theoretical format to our little discovery. p=0.7 is a population parameter. Flipping a coin n times we obtain observations X_1,...,X_n. The proportion of ones is the sample mean \bar{X}=\frac{X_1+...+X_n}{n}. The law of large numbers says two things: 1) as the sample size increases, the sample mean approaches the population mean. 2) At the same time, its variation about the population mean becomes smaller and smaller.

Part 1) is clear to everybody. To corroborate statement 2), I give two facts. Firstly, we know that the standard deviation of the sample mean is \frac{\sigma}{\sqrt{n}}. From this we see that as n increases, the standard deviation of the sample mean decreases and the values of the sample mean become more and more concentrated around the population mean. We express this by saying that the sample mean converges to a spike. Secondly, I produce two histograms. With the sample size n=100, there are two modes (just 1o%) of the histogram at 0.69 and 0.72, while 0.7 was used as the population mean in my simulations. Besides, the spread of the values is large. With n=1000, the mode (27%) is at the true value 0.7, and the spread is low.

Histogram of proportions with n=100


Histogram of proportions with n=1000

Finally, we relate our little exercise to practical needs. In practice, the true mean is never known. But we can obtain a sample and calculate its mean. With a large sample size, the sample mean will be close to the truth. More generally, take any other population parameter, such as its standard deviation, and calculate the sample statistic that estimates it, such as the sample standard deviation. Again, the law of large numbers applies and the sample statistic will be close to the population parameter. The histograms have been obtained as explained here and here. Download the Excel file.

Aug 16

The pearls of AP Statistics 21

Turn a boring piece of theory into a creative exercise

Here is Figure 6.5 from Agresti and Franklin:

Figure 6.5I say: why make students memorize this? Why give it as another axiom when they can deduce this themselves? I do it in two steps, to separate the logic from arithmetics. First, I ask two questions: 1) If you have a symmetric bell-shaped distribution and you know the area of the left tail, what will be the area of the right tail symmetric to it? 2) With the same distribution and tails, how much will be the area between the tails if you know the area of one tail? I didn't see a single student who wouldn't be able to answer these questions. Then I tell the students to look up the tail area in the statistical table in Appendix A and find the areas in Figure 6.5. Initially this can be done for the standard normal (μ=0, σ=1) and then generalized for other normals.

Certainly, this takes more time than just giving a ready recipe, but who said the shortest way is the best? By the way, this exercise satisfies the requirements for exercises used for team competitions.

Wolfram Alpha is a good free resource about anything related to Math (if you have access to the Internet). In particular, if you want your students to visualize what happens when σ changes, tell them to enter the command

Plot[Evaluate@Table[PDF[NormalDistribution[0, \[Sigma]], x], {\[Sigma], {.75, 1, 2}}], {x, -6, 6}, Filling -> Axis]

Similarly, to see the effect of changing μ, they can enter

Plot[Evaluate@Table[PDF[NormalDistribution[\[Mu], 1.5], x], {\[Mu], {-1, 1, 2}}], {x, -6, 6}, Filling -> Axis]
Aug 16

The pearls of AP Statistics 19

Make it easy, keep 'em busy - law of large numbers illustrated

They say: Use the "Simulating the Probability of Head With a Fair Coin" applet on the text CD or other software to illustrate the long-run definition of probability by simulating short-term and long-term results of flipping a balanced coin (Agresti and Franklin, p.216)

I say: Since it is not explained how to use "other software", the readers are in fact forced to use the applet from the text CD. This is nothing but extortion. This exercise requires the students to write down 30 observations (which is not so bad, I also practice this in a more entertaining way). But the one following it invites them to simulate 100 observations. The two exercises are too simple, time consuming and their sole purpose is to illustrate the law of large numbers introduced on p. 213. The students have but one life and the authors want them to lose it for AP Statistics?

The same objective can be achieved in a more efficient and instructive manner. The students are better off if they see how to model the law of large numbers in Excel. As I promised, here is the explanation.

Law of large numbers

Law of large numbers - click to view video




Here is the Excel file.

Aug 16

The pearls of AP Statistics 18

Better see once than hear a thousand times: the error in regression model

They say: The regression line is introduced in Chapter 2 of Agresti and Franklin. The true regression model y=a+bx+e is never mentioned (here e is the error term). In Chapter 12 (p.583) the existence of the error term is acknowledged in section "The Regression Model Also Allows Variability About the Line" and Figure 12.4.

I say: the formal treatment of the true model, error term and their implications for inference is beyond the scope of this book. The informal understanding can be enhanced by the following illustrations. In both cases the true intercepts, slopes, sigmas and error distributions are the same. The differences between the observations and regression lines are caused by randomness. Download the Excel file with simulations. Press F9 to see different realizations.

Simulations steps:

  1. The user can change the intercept, slope and sigma to his/her liking.
  2. The x values are just natural numbers.
  3. The errors are obtained from rand() by centering and scaling
  4. The y values are generated using the regression formulas
  5. The estimated slope and intercept are Excel functions
  6. They are used to calculate the fitted values
  7. For the second sample steps 3-6 are repeated


Regression line and observations1

Figure 1. Regression line and observations for sample 1


Regression line and observations2

Figure 2. Regression line and observations for sample 2


2 regression lines

Figure 3. Comparison of two regression lines

Aug 16

The pearls of AP Statistics 11

When a student has problems, the culprit may be narrow internal vision.

The next situation is way too familiar: the student has absolutely normal cognitive skills, knows all the theory, understands the problem statement but doesn't see the solution. And then we embark on explanations. However, in most cases the root of the problem lies elsewhere.

Internal vision is the ability to imagine and hold in the working memory a complex picture or structure. An example will help explain its importance.

Wide vision

Wide vision - click to see the video

There is a computer game, called Toybox, which is based mainly on simulating the laws of physics. In most cases the user has to find a move that triggers off a series of events, eventually leading to a construction collapsing. In the first video, for example, one of the canons has a red button. Pressing that button causes canons shooting, and the last cannonball destroys the castle. If you see the whole picture, spotting that special canon with a red button is easy.



Narrow vision

Narrow vision - click to see the video

Finding the solution is much more difficult, if you don’t see the whole picture at once and have to scan it by parts, as in the second video. A person with a narrow internal vision behaves as if illuminating the picture with a narrow flashlight. When he/she sees only parts of the picture, it’s difficult to remember them and find the solution. If the student can’t solve the problem, the teacher starts thinking that perhaps he/she doesn’t know the laws of physics and explains the solution.


Egg dominoes

Nonstandard problem - click to see the video


If in the next problem there is a canon with a red button, the student will find the solution. But what if the key to solution is different, as in the third video?

The same problem occurs when studying sciences. Just having a narrow internal vision may prevent your student from seeing the solution. A wide internal vision is a prerequisite for logic and its underdevelopment may be the main culprit when logic leaves much to be desired. Making students study and repeat the theory in large pieces develops internal vision. Multiple explanations of the same theory by the teacher do not!

Aug 16

The pearls of AP Statistics 10

Pirates are everywhere! But this post is about team work.

Here I said that the TI-83+ and TI-84 graphing calculators are so primitive that it is impossible to upload files to them. Well, if you are wrong, it's never too late to admit that. Here is a video on YouTube showing that these calculators are real dinosaurs. There are many other cool ways to bypass the restrictions set by the College Board. Where there is demand there is supply.

teamThis is what I do to prevent cheating. 1) Use only open-end questions that require more understanding than memorization. 2) Improve students' preparation. If they know the material, no reason for them to cheat. 3) And, of course, administer exams/quizzes in a way that prevents cheating. Here is how:

1) I give a list of questions in advance. Sample question: Describe a situation in which: a) The mode is preferred to the mean and median, b) The mean is preferred to the mode and median, c) The median is preferred to the mean and mode. Summarize your conclusions in a table.

2) I divide students in teams and tell them to study and discuss the questions. The main rule is that the quiz is written by randomly selected team reps. Team members get the grade received by their rep. Joint responsibility and natural mutual help are what drive the team work.

3) When the students say they are ready for the quiz, I select the team reps and seat them in the front row. Based on the list of questions they worked on, I make several smaller lists that they can answer in at most 30 min, and distribute those smaller lists randomly among the reps. They cannot cheat because they are sitting right in front of me, answer different lists and are not allowed to use any devices. Most importantly, they know that the material for the next quiz will be even more difficult, and if they don't know the current material, they will have serious problems later. While they are writing, other students are working on a new list.

This method also saves me tons of time. Instead of checking answers of all students I check only answers of reps, and those answers are not based on the large list. My impression is that this method will not work with US students, who are big individualists.

Jul 16

The pearls of AP Statistics 5

Reducing memorization: Don't do for your students what they can do on their own

They say: The proportion of the observations that fall in a certain category is the frequency (count) of observations in that category divided by the total number of observations. The percentage is the proportion multiplied by 100. Proportions and percentages are also called relative frequencies and serve as a way to summarize the measurements in categories of a categorical variable. A frequency table is a listing of possible values for a variable, together with the number of observations for each value. (Agresti and Franklin, p.26) Insight. Don’t mistake the frequencies as values of a variable. They are merely a summary of how many times the observation (a shark attack) occurred in each category (region) (same source, p.27)

I say: Each time I see something in bold font, I think: is this something I have to remember? In this case, I would like my students to remember only the definitions of relative frequencies and frequency tables. Everything else is something the students have to understand after having worked through examples. Don't make your students memorize every little detail. Make them understand as much as possible and memorize as little as possible. Explaining and requiring minutiae is the best way to kill researchers in your students. I would not make this post if the book did not have many trivialities like this.