Aug 16

The pearls of AP Statistics 22

The law of large numbers - a bird's-eye view

They say: In 1689, the Swiss mathematician Jacob Bernoulli proved that as the number of trials increases, the proportion of occurrences of any given outcome approaches a particular number (such as 1/6) in the long run. (Agresti and Franklin, p.213).

I say: The expression “law of large numbers” appears in the book 13 times, yet its meaning is never clearly explained. The closest approximation to the truth is the above sentence about Jacob Bernoulli. To see if this explanation works, tell it to your students and ask what they understood. To me, this is a clear case where withholding theory harms understanding.

Intuition comes first. I ask my students: if you flip a fair coin 100 times, what do you expect the proportion of ones to be? Absolutely everybody replies correctly, just the form of the answer may be different (50-50 or 0.5 or 50 out of 100). Then I ask: probably it will not be exactly 0.5 but if you flip the coin 1000 times, do you expect the proportion to be closer to 0.5? Everybody says: Yes. Next I ask: Suppose the coin is unfair and the probability of 1 appearing is 0.7. What would you expect the proportion to be close to in large samples? Most students come up with the right answer: 0.7. Congratulations, you have discovered what is called a law of large numbers!

Then we give a theoretical format to our little discovery. p=0.7 is a population parameter. Flipping a coin n times we obtain observations X_1,...,X_n. The proportion of ones is the sample mean \bar{X}=\frac{X_1+...+X_n}{n}. The law of large numbers says two things: 1) as the sample size increases, the sample mean approaches the population mean. 2) At the same time, its variation about the population mean becomes smaller and smaller.

Part 1) is clear to everybody. To corroborate statement 2), I give two facts. Firstly, we know that the standard deviation of the sample mean is \frac{\sigma}{\sqrt{n}}. From this we see that as n increases, the standard deviation of the sample mean decreases and the values of the sample mean become more and more concentrated around the population mean. We express this by saying that the sample mean converges to a spike. Secondly, I produce two histograms. With the sample size n=100, the histogram has two modes (just 10%) at 0.69 and 0.72, while 0.7 was used as the population mean in my simulations. Besides, the spread of the values is large. With n=1000, the mode (27%) is at the true value 0.7, and the spread is low.

Histogram of proportions with n=100


Histogram of proportions with n=1000
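For readers who prefer code to Excel, here is a minimal Python sketch of the same kind of simulation. It is not the procedure behind the histograms above: the function name simulate_proportions, the number of repetitions (1000) and the seed are my assumptions. It only illustrates that the spread of the sample proportion shrinks roughly like \frac{\sigma}{\sqrt{n}} when p=0.7.

```python
import random
import statistics


def simulate_proportions(p, n, repetitions, seed=0):
    """Return `repetitions` sample proportions, each from n flips of a coin with P(1) = p."""
    rng = random.Random(seed)  # fixed seed (an assumption) so that runs are reproducible
    proportions = []
    for _ in range(repetitions):
        ones = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(ones / n)
    return proportions


p = 0.7  # population mean used in the post
spreads = {}
for n in (100, 1000):
    props = simulate_proportions(p, n, repetitions=1000)  # 1000 repetitions is an assumption
    spreads[n] = statistics.pstdev(props)
    theoretical = (p * (1 - p) / n) ** 0.5  # sigma / sqrt(n) for a 0/1 variable
    print(f"n={n}: mean={statistics.mean(props):.4f}, "
          f"sd={spreads[n]:.4f}, sigma/sqrt(n)={theoretical:.4f}")
```

The simulated standard deviation for n=1000 should come out clearly smaller than for n=100, which is statement 2) in numbers.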

Finally, we relate our little exercise to practical needs. In practice, the true mean is never known. But we can obtain a sample and calculate its mean. With a large sample size, the sample mean will be close to the truth. More generally, take any other population parameter, such as its standard deviation, and calculate the sample statistic that estimates it, such as the sample standard deviation. Again, the law of large numbers applies and the sample statistic will be close to the population parameter. The histograms have been obtained as explained here and here. Download the Excel file.
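The remark about other population parameters can be checked in the same way. The sketch below is only an illustration under my own assumptions (the helper name sample_sd, the sample sizes and the seed are not from the Excel file): it compares the sample standard deviation of 0/1 coin flips with the population value \sqrt{p(1-p)} for p=0.7.

```python
import random
import statistics


def sample_sd(p, n, seed=0):
    """Sample standard deviation of n flips of a coin with P(1) = p."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    flips = [1 if rng.random() < p else 0 for _ in range(n)]
    return statistics.stdev(flips)


p = 0.7
population_sd = (p * (1 - p)) ** 0.5  # standard deviation of a 0/1 variable

for n in (100, 10_000, 100_000):  # hypothetical sample sizes
    print(f"n={n}: sample sd={sample_sd(p, n):.4f}, population sd={population_sd:.4f}")
```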

Aug 16

The pearls of AP Statistics 11

When a student has problems, the culprit may be narrow internal vision.

The next situation is way too familiar: the student has absolutely normal cognitive skills, knows all the theory, understands the problem statement but doesn't see the solution. And then we embark on explanations. However, in most cases the root of the problem lies elsewhere.

Internal vision is the ability to imagine and hold in the working memory a complex picture or structure. An example will help explain its importance.

Wide vision

Wide vision - click to see the video

There is a computer game called Toybox, which is based mainly on simulating the laws of physics. In most cases the user has to find a move that triggers a series of events that eventually makes a construction collapse. In the first video, for example, one of the cannons has a red button. Pressing that button makes the cannons fire, and the last cannonball destroys the castle. If you see the whole picture, spotting that special cannon with the red button is easy.



Narrow vision

Narrow vision - click to see the video

Finding the solution is much more difficult if you don’t see the whole picture at once and have to scan it in parts, as in the second video. A person with a narrow internal vision behaves as if illuminating the picture with a narrow flashlight. When he/she sees only parts of the picture, it’s difficult to remember them and find the solution. If the student can’t solve the problem, the teacher starts thinking that perhaps he/she doesn’t know the laws of physics and explains the solution.


Egg dominoes

Nonstandard problem - click to see the video

If in the next problem there is a cannon with a red button, the student will find the solution. But what if the key to the solution is different, as in the third video?

The same problem occurs when studying sciences. Just having a narrow internal vision may prevent your student from seeing the solution. A wide internal vision is a prerequisite for logic and its underdevelopment may be the main culprit when logic leaves much to be desired. Making students study and repeat the theory in large pieces develops internal vision. Multiple explanations of the same theory by the teacher do not!

Aug 16

The pearls of AP Statistics 10

Pirates are everywhere! But this post is about team work.

Here I said that the TI-83+ and TI-84 graphing calculators are so primitive that it is impossible to upload files to them. Well, if you are wrong, it's never too late to admit that. Here is a video on YouTube showing that these calculators are real dinosaurs. There are many other cool ways to bypass the restrictions set by the College Board. Where there is demand there is supply.

This is what I do to prevent cheating. 1) Use only open-ended questions that require more understanding than memorization. 2) Improve students' preparation. If they know the material, there is no reason for them to cheat. 3) And, of course, administer exams/quizzes in a way that prevents cheating. Here is how:

1) I give a list of questions in advance. Sample question: Describe a situation in which: a) The mode is preferred to the mean and median, b) The mean is preferred to the mode and median, c) The median is preferred to the mean and mode. Summarize your conclusions in a table.

2) I divide students into teams and tell them to study and discuss the questions. The main rule is that the quiz is written by randomly selected team reps. Team members get the grade received by their rep. Joint responsibility and natural mutual help are what drive the teamwork.

3) When the students say they are ready for the quiz, I select the team reps and seat them in the front row. Based on the list of questions they worked on, I make several smaller lists that they can answer in at most 30 min, and distribute those smaller lists randomly among the reps. They cannot cheat because they are sitting right in front of me, answer different lists and are not allowed to use any devices. Most importantly, they know that the material for the next quiz will be even more difficult, and if they don't know the current material, they will have serious problems later. While they are writing, other students are working on a new list.
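The random part of steps 2) and 3) can be sketched in a few lines of Python. This is only an illustration, not a tool I use in class: the function names, the team and student names and the question labels are hypothetical, and splitting the big list by shuffling is an assumption (the smaller lists can just as well be composed by hand).

```python
import random


def pick_reps(teams, rng):
    """Pick one randomly selected representative per team."""
    return {team: rng.choice(members) for team, members in teams.items()}


def split_questions(questions, n_lists, rng):
    """Shuffle the big list and deal it into n_lists smaller lists."""
    shuffled = questions[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_lists] for i in range(n_lists)]


def assign_lists(reps, small_lists, rng):
    """Hand the smaller lists to the reps in random order (one list per rep)."""
    order = small_lists[:]
    rng.shuffle(order)
    return {rep: order[i] for i, rep in enumerate(reps.values())}


rng = random.Random(1)  # fixed seed only to make the example reproducible
teams = {  # hypothetical teams
    "Team 1": ["Ann", "Bob", "Cy"],
    "Team 2": ["Dee", "Eli", "Fay"],
}
questions = [f"Question {i}" for i in range(1, 9)]  # hypothetical big list

reps = pick_reps(teams, rng)
small_lists = split_questions(questions, n_lists=len(reps), rng=rng)
assignment = assign_lists(reps, small_lists, rng)

for rep, rep_questions in assignment.items():
    print(rep, rep_questions)
```

Since nobody knows in advance who will be the rep or which smaller list the rep will get, every team member has to prepare the whole big list.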

This method also saves me tons of time. Instead of checking answers of all students I check only answers of reps, and those answers are not based on the large list. My impression is that this method will not work with US students, who are big individualists.

Jul 16

The pearls of AP Statistics 5

Reducing memorization: Don't do for your students what they can do on their own

They say: The proportion of the observations that fall in a certain category is the frequency (count) of observations in that category divided by the total number of observations. The percentage is the proportion multiplied by 100. Proportions and percentages are also called relative frequencies and serve as a way to summarize the measurements in categories of a categorical variable. A frequency table is a listing of possible values for a variable, together with the number of observations for each value. (Agresti and Franklin, p.26) Insight. Don’t mistake the frequencies as values of a variable. They are merely a summary of how many times the observation (a shark attack) occurred in each category (region) (same source, p.27)

I say: Each time I see something in bold font, I think: is this something I have to remember? In this case, I would like my students to remember only the definitions of relative frequencies and frequency tables. Everything else is something the students have to understand after having worked through examples. Don't make your students memorize every little detail. Make them understand as much as possible and memorize as little as possible. Explaining and requiring minutiae is the best way to kill researchers in your students. I would not make this post if the book did not have many trivialities like this.
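Here is the kind of example I mean, written as a small Python sketch. The data are made up for illustration (they are not the shark attack data from the book), and the function name frequency_table is my own. The sketch builds a frequency table and adds the relative frequencies as proportions and percentages.

```python
from collections import Counter


def frequency_table(observations):
    """Map each category to (frequency, proportion, percentage)."""
    counts = Counter(observations)
    total = len(observations)
    return {
        category: (count, count / total, 100 * count / total)
        for category, count in counts.items()
    }


# Hypothetical observations of a categorical variable (region of each event)
regions = ["A", "B", "A", "C", "A", "B"]

table = frequency_table(regions)
for category, (count, proportion, percentage) in sorted(table.items()):
    print(f"{category}: frequency={count}, proportion={proportion:.3f}, "
          f"percentage={percentage:.1f}%")
```

After working through one such example, students see by themselves that the frequencies are counts of observations in each category, not values of the variable.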

Feb 16

Teaching methodology dilemma: Is lecturing good or bad?

I have come across a nice paper by G. Gibbs, Twenty terrible reasons for lecturing, SCED Occasional Paper No. 8, Birmingham, 1981, available here. The author discusses one by one the following statements:

1.1 "Lectures should last an hour. If I can stay awake for an hour, so can they".

1.2 "It's the only way to make sure the ground is covered".

1.3 "Lectures are the best way to get facts across".

1.4 "Lectures are the best way to get students to think".

1.5 "Lectures are inspirational: they improve students' attitudes towards the subject, and students like them".

1.6 "Lecturers make sure that students have a proper set of notes".

1.7 "Students are incapable of, or unwilling to, work alone, so it's good for them to have full timetables".

1.8 "The criticisms one can make of lecturing only apply to bad lecturing".

1.9 "The value of lectures can only be judged in the context of other teaching and learning activities which make up the course".

The main conclusion is

"I believe both institutions and validating bodies ought to be asking serious questions about courses which appear to be based primarily on lecturing as the dominant teaching method"

and it is supported by a deep analysis and references to many studies done at US universities.

Jan 16

Active learning - away from boredom of lectures

Active learning is the key to success.

Active learning

Figure 1. Excel file - click to view video

The beginning of an introductory course in Statistics has many simple definitions. The combination of simplicity and multitude makes it boring if the teacher follows the usual lecture format. Instead, I suggest that my students read the book, collect a sample on a simulated random variable and describe that sample. The ensuing teamwork and class discussion make the course much livelier. The Excel file used in the video can be downloaded from here. The video explains how to enable macros embedded in the file.

The Excel file simulates seven different variables, among them deterministic and random, categorical and numerical, discrete and continuous; there is also a random vector. When you press the "Observation" button, the file produces new observations on all seven variables. The students have to collect observations on assigned variables and provide descriptive statistics. They work on assignments in teams of up to six members. The seven variables in the file are enough to engage up to 42 students.
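For those without Excel, the idea can be imitated in Python. The sketch below is not a port of the file: it has only three variables instead of seven, and the variable names, their distributions and the sample size of 50 are all assumptions made for illustration. Each call to observe() plays the role of one press of the "Observation" button.

```python
import random
import statistics
from collections import Counter

rng = random.Random(0)  # fixed seed (an assumption) so that the run is reproducible


def observe():
    """One press of a hypothetical "Observation" button: one new value per variable."""
    return {
        "color": rng.choice(["red", "green", "blue"]),  # categorical
        "die": rng.randint(1, 6),                       # discrete numerical
        "height": rng.gauss(170, 10),                   # continuous numerical
    }


sample = [observe() for _ in range(50)]  # 50 observations is an assumption

colors = [obs["color"] for obs in sample]
die = [obs["die"] for obs in sample]
height = [obs["height"] for obs in sample]

summary = {
    "color counts": Counter(colors),
    "die mean": statistics.mean(die),
    "die median": statistics.median(die),
    "height mean": statistics.mean(height),
    "height sd": statistics.stdev(height),
}
for name, value in summary.items():
    print(name, value)

# Capacity stated in the post: 7 variables, teams of up to 6 members
print("students engaged:", 7 * 6)
```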