Jan 17

Review of Agresti and Franklin "Statistics: The Art and Science of Learning from Data", 3rd edition

Who is this book for?

On the Internet you can find both positive and negative reviews. The ones that I saw on Goodreads.com and Amazon.com do not say much about the pros and cons. Here I try to be more specific.

The main limitation of the book is that it adheres to the College Board statement that "it is a one semester, introductory, non-calculus-based, college course in statistics". Hence, there are no derivations and no links between formulas. You will not find explanations of why Statistics works. As a result, there is too much emphasis on memorization. After reading the book, you will likely not have an integral view of statistical methods.

I have seen students who understand such texts well. Generally, they have an excellent memory and better-than-average imagination. But such students are better off reading more advanced books. A potential reader has to lower his/her expectations. I imagine a person who is not interested in taking a more advanced Stats course later. The motivation of that person would be: a) to understand the ways Statistics is applied and/or b) to pass AP Stats just because it is a required course. The review is written on the premise that this is the intended readership.

What I like

  1. The number and variety of exercises. This is good for an instructor who teaches large classes. Having authored several books, I can assure you that inventing many exercises is the most time-consuming part of this business.
  2. The authors have come up with good visual embellishments of graphs and tables, summarized in "A Guide to Learning From the Art in This Text" at the end of the book.
  3. The book has generous left margins. Sometimes they contain reminders about the past material. Otherwise, the reader can use them for notes.
  4. MINITAB is prohibitively expensive, but the Student Edition of MINITAB is provided on the accompanying CD.

What I don't like

  1. I counted about 140 high-resolution photos that have nothing to do with the subject matter. They hardly add to the educational value of the book but certainly add to its cost. This bad trend in introductory textbooks is fueled to a considerable extent by Pearson Education.
  2. 800+ pages, even after slashing all appendices and unnecessary illustrations, is a lot of reading for one semester. Even if you memorize all of them, during the AP test it will be difficult to pull from your memory exactly the page you need to answer a particular question.
  3. In an introductory text, one has to refrain from giving too much theory. Still, I don't like some choices made by the authors. The learning curve is flat. As a gentle introduction to algebra, verbal descriptions of formulas are normal. But sticking to verbal descriptions until p. 589 is too much. This reminds me of a train trip in Kazakhstan. You enter the steppe through the western border, and two days later you see the same endless steppe; just the train station is different.
  4. At the theoretical level, many topics are treated superficially. You can find a lot of additional information in my posts named "The pearls of AP Statistics". Here is the list of most important additions: regression and correlation should be decoupled; the importance of sampling distributions is overstated; probability is better explained without reference to the long run; the difference between the law of large numbers and central limit theorem should be made clear; the rate of convergence in the law of large numbers is not that fast; the law of large numbers is intuitively simple; the uniform distribution can also be made simple; to understand different charts, put them side by side; the Pareto chart is better understood as a special type of a histogram; instead of using the software on the provided CD, try to simulate in Excel yourself.
  5. Using outdated Texas Instruments calculators contradicts the American Statistical Association recommendation to "Use technology for developing concepts and analyzing data".


If I wanted to save time and did not intend to delve into theory, I would prefer a concise book that directly addresses the questions given on the AP test. To decide for yourself, read the Preface and see how much imagination has been put into the book; you may well want to read it.

Dec 16

Testing for structural changes: a topic suitable for AP Stats

Problem statement

Economic data are volatile but sometimes changes in them look more permanent than transitory.

Figure 1. US GDP from agriculture. Source: http://www.tradingeconomics.com/united-states/gdp-from-agriculture

Figure 1 shows fluctuations of US GDP from agriculture. There have been ups and downs throughout 2005-2016, but overall the trend was up until 2013 and has been down since then. We want an objective, statistical confirmation that the change in 2013 was structural and substantial rather than a random fluctuation.

Chow test steps

  1. Divide the observed sample in two parts, A and B, at the point where you suspect the structural change (or break) has occurred. Run three regressions: one for A, another for B and a third one for the whole sample (pooled regression). Get the residual sums of squares from each of them, denoted RSS_A, RSS_B and RSS_p, respectively.
  2. Let n_A and n_B be the numbers of observations in the two subsamples and suppose there are k coefficients in your regression (for Figure 1, we would regress GDP on a time variable, so the number of coefficients would be 2, including the intercept). The Chow test statistic is defined by

F=\frac{(RSS_p-RSS_A-RSS_B)/k}{(RSS_A+RSS_B)/(n_A+n_B-2k)}.

This statistic is distributed as F with k,n_A+n_B-2k degrees of freedom. The null hypothesis is that the coefficients are the same for the two subsamples and the alternative is that they are not. If the statistic is larger than the critical value at your chosen level of significance, splitting the sample in two is beneficial (better describes the data). If the statistic is not larger than the critical value, the pooled regression better describes the data.
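The two steps above can be sketched in code. This is a minimal Python illustration (the post itself works with a chart of real GDP data; here numpy is assumed, and `chow_test` and the simulated tent-shaped series are my own hypothetical names and data, not the author's):

```python
import numpy as np


def chow_test(y, split, k=2):
    """Chow test statistic for a structural break at index `split`.

    Fits a linear trend (k = 2 coefficients: intercept and slope) to
    each subsample and to the pooled sample, then compares residual
    sums of squares as in the formula above."""
    t = np.arange(len(y), dtype=float)

    def rss(tt, yy):
        coeffs = np.polyfit(tt, yy, 1)            # slope and intercept
        resid = yy - np.polyval(coeffs, tt)
        return float(np.sum(resid ** 2))

    rss_a = rss(t[:split], y[:split])             # subsample A
    rss_b = rss(t[split:], y[split:])             # subsample B
    rss_p = rss(t, y)                             # pooled regression
    n = len(y)
    return ((rss_p - rss_a - rss_b) / k) / ((rss_a + rss_b) / (n - 2 * k))


# A series that trends up for 20 periods and down for the next 20,
# plus noise: a clear structural break in the middle
rng = np.random.default_rng(0)
trend = np.concatenate([np.arange(20.0), 20.0 - np.arange(20.0)])
y = trend + rng.normal(0, 1, size=40)

f_stat = chow_test(y, split=20)   # compare with the F(2, 36) critical value
```

A large value of `f_stat` relative to the F critical value at your chosen significance level indicates that splitting the sample describes the data better.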

Figure 2. Splitting is better (there is a structural change)

In Figure 2, the gray lines are the fitted lines for the two subsamples. They fit the data much better than the orange line (the fitted line for the whole sample).

Figure 3. Pooling is better

In Figure 3, pooling is better because the intercept and slope are about the same and pooling amounts to increasing the sample size.

Download the Excel file used for simulation. See the video.

Sep 16

The pearls of AP Statistics 25

Central Limit Theorem versus Law of Large Numbers

They say: The Central Limit Theorem (CLT). Describes the Expected Shape of the Sampling Distribution for Sample Mean \bar{X}. For a random sample of size n from a population having mean μ and standard deviation σ, then as the sample size n increases, the sampling distribution of the sample mean \bar{X} approaches an approximately normal distribution. (Agresti and Franklin, p.321)

I say: There are at least three problems with this statement.

Problem 1. With any notion or statement, I would like to know its purpose in the first place. The primary purpose of the law of large numbers is to estimate population parameters. The Central Limit Theorem may be a nice theoretical result, but why do I need it? The motivation is similar to the one we use for introducing the z score. There is a myriad of distributions. Only some standard distributions have been tabulated. Suppose we have a sequence of variables X_n, none of which have been tabulated. Suppose also that, as n increases, those variables become close to a normal variable in the sense that the cumulative probabilities (areas under their respective densities) become close:

(1) P(X_n\le a)\rightarrow P(normal\le a) for all a.

Then we can use tables developed for normal variables to approximate P(X_n\le a). This justifies using (1) as the definition of a new convergence type called convergence in distribution.

Problem 2. Having introduced convergence (1), we need to understand what it means in terms of densities (distributions). As illustrated in Excel, the law of large numbers means convergence to a spike. In particular, the sample mean converges to a mass concentrated at μ (the densities contract to one point). Referring to the sample mean in the context of the CLT is misleading, because the CLT is about the stabilization of densities.


Figure 1. Law of large numbers with n=100, 1000, 10000

Figure 1 appeared in my posts before; I have just added n=10,000 to show that the densities do not stabilize.

Figure 2. Central limit theorem with n=100, 1000, 10000


In Figure 2, for clarity I use line plots instead of histograms. The density for n=100 is very rugged. The blue line (for n=1000) is more rugged than the orange (for n=10,000). Convergence to a normal shape is visible, although slow.

Main problem. It is not the sample means that converge to a normal distribution. It is their z scores

\frac{\bar{X}-E\bar{X}}{\sigma(\bar{X})}

that do. Specifically,

P(\frac{\bar{X}-E\bar{X}}{\sigma(\bar{X})}\le a)\rightarrow P(z\le a) for all a

where z is a standard normal variable.

In my simulations I used sample means for Figure 1 and z scores of sample means for Figure 2. In particular, z scores always have means equal to zero, and that can be seen in Figure 2. In your class, you can use the Excel file. As usual, you have to enable macros.
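The distinction is easy to check numerically. Here is a Python sketch (the post's simulations are in Excel; numpy is assumed, and the uniform population, sample size and number of draws are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)


def z_scores_of_means(n, draws=10000):
    """Draw `draws` samples of size n from a uniform [0, 1] population,
    take each sample mean, and convert it to a z score using the
    population mean and the standard deviation of the sample mean."""
    mu, sigma = 0.5, np.sqrt(1 / 12)              # uniform [0, 1] moments
    means = rng.random((draws, n)).mean(axis=1)   # sample means
    return (means - mu) / (sigma / np.sqrt(n))    # z scores of the means

z = z_scores_of_means(n=1000)
share_within = np.mean(np.abs(z) < 1.96)          # should be close to 0.95
```

The raw sample means pile up in a spike around 0.5 (the law of large numbers), while their z scores keep mean 0 and standard deviation 1 and settle into a normal shape (the CLT).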

Aug 16

The pearls of AP Statistics 22

The law of large numbers - a bird's view

They say: In 1689, the Swiss mathematician Jacob Bernoulli proved that as the number of trials increases, the proportion of occurrences of any given outcome approaches a particular number (such as 1/6) in the long run. (Agresti and Franklin, p.213).

I say: The expression “law of large numbers” appears in the book 13 times, yet its meaning is never clearly explained. The closest approximation to the truth is the above sentence about Jacob Bernoulli. To see if this explanation works, tell it to your students and ask what they understood. To me, this is a clear case when withholding theory harms understanding.

Intuition comes first. I ask my students: if you flip a fair coin 100 times, what do you expect the proportion of ones to be? Absolutely everybody replies correctly, just the form of the answer may be different (50-50 or 0.5 or 50 out of 100). Then I ask: probably it will not be exactly 0.5 but if you flip the coin 1000 times, do you expect the proportion to be closer to 0.5? Everybody says: Yes. Next I ask: Suppose the coin is unfair and the probability of 1 appearing is 0.7. What would you expect the proportion to be close to in large samples? Most students come up with the right answer: 0.7. Congratulations, you have discovered what is called a law of large numbers!

Then we give a theoretical format to our little discovery. p=0.7 is a population parameter. Flipping a coin n times we obtain observations X_1,...,X_n. The proportion of ones is the sample mean \bar{X}=\frac{X_1+...+X_n}{n}. The law of large numbers says two things: 1) as the sample size increases, the sample mean approaches the population mean. 2) At the same time, its variation about the population mean becomes smaller and smaller.

Part 1) is clear to everybody. To corroborate statement 2), I give two facts. Firstly, we know that the standard deviation of the sample mean is \frac{\sigma}{\sqrt{n}}. From this we see that as n increases, the standard deviation of the sample mean decreases and the values of the sample mean become more and more concentrated around the population mean. We express this by saying that the sample mean converges to a spike. Secondly, I produce two histograms. With the sample size n=100, the histogram has two modes (each at just 10%), at 0.69 and 0.72, while 0.7 was used as the population mean in my simulations. Besides, the spread of the values is large. With n=1000, the mode (27%) is at the true value 0.7, and the spread is low.

Histogram of proportions with n=100


Histogram of proportions with n=1000

Finally, we relate our little exercise to practical needs. In practice, the true mean is never known. But we can obtain a sample and calculate its mean. With a large sample size, the sample mean will be close to the truth. More generally, take any other population parameter, such as its standard deviation, and calculate the sample statistic that estimates it, such as the sample standard deviation. Again, the law of large numbers applies and the sample statistic will be close to the population parameter. The histograms have been obtained as explained here and here. Download the Excel file.
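The whole coin-flipping experiment is easy to replicate outside Excel. Here is a Python sketch (numpy is assumed; the sample sizes and the 1000 repetitions are my choices, not those of the original file):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 0.7                                    # true probability of 1

spreads = {}
for n in (100, 1000, 10000):
    flips = rng.random((1000, n)) < p      # 1000 samples of n coin flips
    props = flips.mean(axis=1)             # sample proportions (sample means)
    spreads[n] = props.std()               # spread around p shrinks with n
```

Theory says the spread is \sqrt{p(1-p)/n}, so it should drop by a factor of about \sqrt{10} at each step, which is exactly what the simulation shows.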

Aug 16

The pearls of AP Statistics 19

Make it easy, keep 'em busy - law of large numbers illustrated

They say: Use the "Simulating the Probability of Head With a Fair Coin" applet on the text CD or other software to illustrate the long-run definition of probability by simulating short-term and long-term results of flipping a balanced coin (Agresti and Franklin, p.216)

I say: Since it is not explained how to use "other software", the readers are in fact forced to use the applet from the text CD. This is nothing but extortion. This exercise requires the students to write down 30 observations (which is not so bad; I also practice this, in a more entertaining way). But the one following it invites them to simulate 100 observations. The two exercises are too simple and time-consuming, and their sole purpose is to illustrate the law of large numbers introduced on p. 213. The students have but one life, and the authors want them to lose it to AP Statistics?

The same objective can be achieved in a more efficient and instructive manner. The students are better off if they see how to model the law of large numbers in Excel. As I promised, here is the explanation.

Law of large numbers

Law of large numbers - click to view video




Here is the Excel file.

Aug 16

The pearls of AP Statistics 18

Better see once than hear a thousand times: the error in regression model

They say: The regression line is introduced in Chapter 2 of Agresti and Franklin. The true regression model y=a+bx+e is never mentioned (here e is the error term). In Chapter 12 (p.583) the existence of the error term is acknowledged in section "The Regression Model Also Allows Variability About the Line" and Figure 12.4.

I say: the formal treatment of the true model, error term and their implications for inference is beyond the scope of this book. The informal understanding can be enhanced by the following illustrations. In both cases the true intercepts, slopes, sigmas and error distributions are the same. The differences between the observations and regression lines are caused by randomness. Download the Excel file with simulations. Press F9 to see different realizations.

Simulations steps:

  1. The user can change the intercept, slope and sigma to his/her liking.
  2. The x values are just natural numbers.
  3. The errors are obtained from rand() by centering and scaling.
  4. The y values are generated using the regression formula.
  5. The estimated slope and intercept are computed with Excel functions.
  6. They are used to calculate the fitted values.
  7. For the second sample, steps 3-6 are repeated.
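The seven steps above can be mirrored in Python (the post uses Excel; numpy is assumed, and the particular parameter values below are arbitrary, not taken from the file):

```python
import numpy as np

rng = np.random.default_rng(3)
a, b, sigma = 2.0, 0.5, 1.5            # step 1: intercept, slope, sigma
n = 30
x = np.arange(1, n + 1, dtype=float)   # step 2: x values are natural numbers


def one_sample():
    # step 3: centered and scaled uniform errors, mimicking Excel's rand():
    # mean 0 and standard deviation sigma
    e = (rng.random(n) - 0.5) * np.sqrt(12) * sigma
    y = a + b * x + e                    # step 4: true regression model
    b_hat, a_hat = np.polyfit(x, y, 1)   # step 5: estimated slope, intercept
    fitted = a_hat + b_hat * x           # step 6: fitted values
    return y, a_hat, b_hat, fitted


y1, a1, b1, fit1 = one_sample()
y2, a2, b2, fit2 = one_sample()          # step 7: a second, independent sample
```

The two samples share the same true model; the differences between their fitted lines are caused entirely by the randomly drawn errors, which is the point of Figures 1-3.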



Figure 1. Regression line and observations for sample 1



Figure 2. Regression line and observations for sample 2



Figure 3. Comparison of two regression lines

Jul 16

The pearls of AP Statistics 3

Importance of simulation in Excel for elementary stats courses

I would say it is very important because it allows students to see data generation in real time.

They say: Simulating Randomness and Variability. To get a feel for randomness and variability, let’s simulate taking an exit poll of voters using the sample from a population applet on the text CD. (Agresti and Franklin, p.19)

I say: Why use a population applet from the text CD? It's certainly good for the authors, because without the text itself the students wouldn't have the CD. It's not so good for the students, because the book's price on the publisher's web site is $221.80. Most students and many teachers of elementary statistics don't know that all simulation can be done in Excel. See the examples: Exercise 2.1, Exercise 2.2, Exercise 2.3, Exercise 2.4, Example 2.2, and this post about active learning. Even the law of large numbers and the central limit theorem can be illustrated in Excel. Using Excel is better because 1) most people and businesses have it and 2) Microsoft Office comes with a powerful programming language called Visual Basic for Applications. I have used it on many occasions and it has saved me hundreds of hours (examples will be provided later).



Jan 16

Active learning - away from boredom of lectures

Active learning is the key to success.


Figure 1. Excel file - click to view video

The beginning of an introductory course in Statistics contains many simple definitions. The combination simplicity+multitude makes it boring if the teacher follows the usual lecture format. Instead, I suggest that my students read the book, collect a sample of a simulated random variable, and describe that sample. The ensuing team work and class discussion make the course much livelier. The Excel file used in the video can be downloaded from here. The video explains how to enable the macros embedded in the file.

The Excel file simulates seven different variables, among them deterministic and random, categorical and numerical, discrete and continuous; there is also a random vector. When you press the "Observation" button, the file produces new observations on all seven variables. The students have to collect observations on assigned variables and provide descriptive statistics. They work on assignments in teams of up to six members. The seven variables in the file are enough to engage up to 42 students.


Dec 15

Modeling a pair of random variables and scatterplot definition - Exercise 2.5

Modeling a pair of random variables is very simple in Excel.


Modeling a pair of random variables - click to view video

Obviously, a pair of random variables is a more complex object than one variable, so be alert when you see one. The mathematical name for a pair of variables is a vector. Both components of the vector are constructed as normal variables, using a uniformly distributed variable combined with the inverse cumulative distribution function of a normal variable. On a scatterplot, we plot the values of one variable against those of the other. This definition is illustrated in the graph.

The simulation can easily be modified to obtain negatively correlated variables. Simulating uncorrelated variables is a little trickier. If you just use the uniform distribution (the command is =rand()) in the first and second columns, the two columns will be uncorrelated. However, it is easy to construct variables that are clearly dependent (say, they follow a parabola on the scatterplot) but have a correlation close to zero. Elementary Statistics texts usually skip this detail.
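The parabola example can be verified in a few lines. This is a Python sketch (the post works in Excel; numpy is assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.random(10000) - 0.5   # uniform variable, centered at zero
y = x ** 2                    # perfectly dependent: y lies on a parabola of x

# Yet the (linear) correlation is close to zero, because correlation
# measures only linear association, not dependence in general
r = np.corrcoef(x, y)[0, 1]
```

For a variable symmetric around zero, the covariance of x and x^2 is exactly zero in theory, so the sample correlation hovers near zero even though y is a deterministic function of x.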

Dec 15

Modeling a sample from a normal distribution in Excel - Exercise 2.4

Modeling a sample from a normal distribution.

Read Exercise 2.4 in the book and Unit 6.6 for the theory. In this video, along with the sample from a normal distribution and a histogram, we construct the density and cumulative distribution function with the same mean and standard deviation.

Simulation steps

  1. μ and σ are chosen by the user. I select them so as to model the temperature distribution in Almaty.
  2. The temp (temperature) values are just the integers from 14 to 38.
  3. Use the Excel command norm.dist with the required arguments. The last argument should be "false" to produce a density.
  4. The last argument should be "true" to give a cumulative distribution.
  5. A combination of the commands norm.inv and rand gives simulated temperature values.
  6. Next we define the bins required by the Excel Histogram tool, which is part of the Data Analysis ToolPak (it needs to be installed and activated).
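For readers without Excel, the same steps can be sketched in Python (numpy is assumed; the values of μ and σ below are my own guesses for Almaty temperatures, not the values from the file):

```python
import math
import numpy as np

rng = np.random.default_rng(5)
mu, sigma = 26.0, 4.0             # step 1: chosen by the user
temp = np.arange(14, 39)          # step 2: integers from 14 to 38

# steps 3-4: density and cumulative distribution on the grid
# (Excel: norm.dist(t, mu, sigma, FALSE) and norm.dist(t, mu, sigma, TRUE))
density = (np.exp(-((temp - mu) ** 2) / (2 * sigma ** 2))
           / (sigma * math.sqrt(2 * math.pi)))
cdf = np.array([0.5 * (1 + math.erf((t - mu) / (sigma * math.sqrt(2))))
                for t in temp])

# step 5: simulated temperatures (Excel: norm.inv(rand(), mu, sigma))
sample = rng.normal(mu, sigma, size=1000)

# step 6: bins and histogram counts, like the Data Analysis ToolPak
counts, bins = np.histogram(sample, bins=temp)
```

Re-running the sampling line plays the role of pressing F9 in Excel: the histogram counts change with each new sample while the density and cdf stay fixed.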

The fact that Excel maintains links between data and results is very handy. That is, each time the data is renewed, the histogram will change. Pressing F9 (recalculate) you can see the histogram changing together with the sample.

Another interesting fact is that sometimes randomly generated numbers are not realistic. If observed in practice, they could be called outliers. But here they are due to the imperfect nature of the normal distribution, which can take very large (negative or positive) values with a positive probability.


Modeling a sample from a normal distribution - click to view video