These problems are among the most difficult. It's important to work out a general approach to such problems. All references are to J. Abdey, Advanced statistics: distribution theory, ST2133, University of London, 2021.
Step 1. Conditioning is usually suggested by the problem statement: $Y$ is conditioned on $X$.
Your life will be easier if you follow the notation used in the guide: use $p$ for probability mass functions (discrete variables) and $f$ for (probability) density functions (continuous variables).
a) If $Y$ and $X$ both are discrete (Example 5.1, Example 5.13, Example 5.18): $p_Y(y)=\sum_x p_{Y|X}(y|x)\,p_X(x)$.
b) If $Y$ and $X$ both are continuous (Activity 5.6): $f_Y(y)=\int f_{Y|X}(y|x)\,f_X(x)\,dx$.
c) If $Y$ is discrete, $X$ is continuous (Example 5.2, Activity 5.5): $p_Y(y)=\int p_{Y|X}(y|x)\,f_X(x)\,dx$.
d) If $Y$ is continuous, $X$ is discrete (Activity 5.12): $f_Y(y)=\sum_x f_{Y|X}(y|x)\,p_X(x)$.
In all cases you need to figure out the set of values $x$ over which to sum or integrate.
Step 2. Write out the conditional densities/probabilities with the same arguments as in your conditional equation.
Step 3. Reduce the result to one of the known distributions using the completeness axiom.
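Let $X$ denote the number of hurricanes which form in a given year, and let $Y$ denote the number of these which make landfall. Suppose each hurricane has a probability $\pi$ of making landfall independent of other hurricanes. Given the number of hurricanes $x$, then $Y$ can be thought of as the number of successes in $x$ independent and identically distributed Bernoulli trials. We can write this as $Y|X=x\sim\text{Bin}(x,\pi)$. Suppose we also have that $X\sim\text{Pois}(\lambda)$. Find the distribution of $Y$ (noting that $X\ge Y$).
Step 1. The number of hurricanes $X$ takes values $x=0,1,2,\dots$ and is distributed as Poisson. The number of landfalls $Y$ for a given $X=x$ is binomial with values $y=0,1,\dots,x$. It follows that $x\ge y$, so the sum runs over $x=y,y+1,\dots$
Write the general formula for conditional probability:
$$p_Y(y)=\sum_{x=y}^{\infty}p_{Y|X}(y|x)\,p_X(x).$$
Step 2. Specifying the distributions:
$$p_X(x)=e^{-\lambda}\frac{\lambda^x}{x!},\quad x=0,1,2,\dots,\qquad p_{Y|X}(y|x)=\frac{x!}{y!(x-y)!}\pi^y(1-\pi)^{x-y},\quad y=0,1,\dots,x.$$
Step 3. Reduce the result to one of the known distributions:
$$p_Y(y)=\sum_{x=y}^{\infty}\frac{x!}{y!(x-y)!}\pi^y(1-\pi)^{x-y}e^{-\lambda}\frac{\lambda^x}{x!}$$
(pull out of summation everything that does not depend on summation variable $x$)
$$=\frac{e^{-\lambda}\pi^y}{y!}\sum_{x=y}^{\infty}\frac{(1-\pi)^{x-y}\lambda^x}{(x-y)!}$$
(replace $x-y=z$ to better see the structure)
$$=\frac{e^{-\lambda}(\lambda\pi)^y}{y!}\sum_{z=0}^{\infty}\frac{(\lambda(1-\pi))^z}{z!}$$
(using the completeness axiom for the Poisson variable with parameter $\lambda(1-\pi)$, the sum equals $e^{\lambda(1-\pi)}$)
$$=\frac{e^{-\lambda}(\lambda\pi)^y}{y!}e^{\lambda(1-\pi)}=e^{-\lambda\pi}\frac{(\lambda\pi)^y}{y!},\quad y=0,1,2,\dots$$
Thus $Y\sim\text{Pois}(\lambda\pi)$.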
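A quick numerical check of the hurricane example may help to see that the three steps lead where they should. The sketch below assumes the number of hurricanes is Poisson with mean `lam` and each hurricane makes landfall with probability `pi`; the values `lam = 6.0` and `pi = 0.3`, the function names and the truncation of the infinite sum at 100 terms are all illustrative assumptions, not part of the guide. If the derivation is right, the summed probabilities should agree with a Poisson distribution with mean `lam * pi`.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability P(K = k) for mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)


def landfall_pmf(y, lam, pi, terms=100):
    """P(Y = y) obtained by summing over the number of hurricanes x >= y.

    Each term is the binomial probability of y landfalls out of x
    hurricanes times the Poisson probability of x hurricanes.
    The infinite sum is truncated after `terms` terms (an assumption
    that is harmless for moderate lam because the tail is tiny).
    """
    total = 0.0
    for x in range(y, y + terms):
        binomial = math.comb(x, y) * pi**y * (1 - pi) ** (x - y)
        total += binomial * poisson_pmf(x, lam)
    return total


lam, pi = 6.0, 0.3  # hypothetical parameter values
for y in range(5):
    # The two printed columns should coincide up to rounding error.
    print(y, landfall_pmf(y, lam, pi), poisson_pmf(y, lam * pi))
```

The check does not replace the derivation; it only confirms the answer for particular parameter values.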
Its content, organization and level justify its adoption as a textbook for introductory statistics for Econometrics in most American or European universities. The book's table of contents is somewhat standard; the innovation comes in a presentation that is crisp, concise, precise and directly relevant to the Econometrics course that will follow. I think instructors and students will appreciate the absence of unnecessary verbiage that permeates many existing textbooks.
Having read Professor Mynbaev's previous books and research articles, I was not surprised by his clear writing and precision. However, I was surprised by an informal and almost conversational one-on-one style of writing which should please most students. The informality belies a careful presentation where great care has been taken to present the material in a pedagogical manner.
In my book I explained how one can use Excel to do statistical simulations and replace statistical tables commonly used in statistics courses. Here I go one step further by providing a free statistical calculator that replaces the following tables from the book by Newbold et al.:
Table 1 Cumulative Distribution Function, F(z), of the Standard Normal Distribution Table
Table 2 Probability Function of the Binomial Distribution
Table 5 Individual Poisson Probabilities
Table 7a Upper Critical Values of Chi-Square Distribution with Degrees of Freedom
Table 8 Upper Critical Values of Student’s t Distribution with Degrees of Freedom
Tables 9a, 9b Upper Critical Values of the F Distribution
The calculator is just a Google sheet with statistical functions, see Picture 1:
Picture 1. Calculator using Google sheet
How to use Calculator
Open an account at gmail.com, if you haven't already. Open Google Drive.
Install Google sheets on your phone.
Find the sheet on my Google drive and copy it to your Google drive (File/Make a copy). An icon of my calculator will appear in your drive. That's not the file, it's just a link to my file. To the right of it there are three dots indicating options. One of them is "Make a copy", so use that one. The copy will be in your drive. After that you can delete the link to my file. You might want to rename "Copy of Calculator" as "Calculator".
Open the file on your drive using Google sheets. Your Calculator is ready!
When you click a cell, you can enter what you need either in the formula bar at the bottom or directly in the cell. You can also see the functions I embedded in the sheet.
In cell A1, for example, you can enter any legitimate formula with numbers, arithmetic signs, and Google sheet functions. Be sure to start it with =, + or - and to press the checkmark on the right of the formula bar after you finish.
The cells below A1 replace the tables listed above. Beside each function there is a verbal description and, further to the right, a graphical illustration (which is not in Picture 1).
On the tab named Regression you can calculate the slope and intercept. The sample size must be 10.
Keep in mind that tables for continuous distributions need two functions. For example, in the case of the standard normal distribution, one function takes you from a probability (the area of the left tail) to the cutting value on the horizontal axis. The other function goes from the cutting value on the horizontal axis to the probability.
Feel free to add new sheets or functions as you may need. You will have to do this on a tablet or computer.
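The two directions needed for a continuous distribution can be sketched outside the sheet as well. The example below is only an illustration in Python's standard library, not the contents of the calculator: `left_tail_probability` and `cutting_value` are hypothetical names, and the corresponding Google Sheets built-ins are presumably `NORM.S.DIST` and `NORM.S.INV`, though which functions the sheet actually uses is an assumption.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def left_tail_probability(z):
    """From the cutting value z on the horizontal axis to the left-tail area."""
    return std_normal.cdf(z)


def cutting_value(p):
    """From the left-tail area p (0 < p < 1) to the cutting value."""
    return std_normal.inv_cdf(p)


print(round(left_tail_probability(1.96), 4))  # 0.975
print(round(cutting_value(0.975), 2))         # 1.96
```

The two functions are inverses of each other, which is why a printed table can be read in both directions while a calculator needs a separate function for each.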
I love all kinds of music. However, while I work I prefer quiet music. People have different names for it: chill, chillout, lounge, smooth jazz etc. I am not good at those names; I just want something that allows me to concentrate when I want and is also a pleasure to listen to when I make it louder. There is plenty of such music on Youtube.
So I tune in to one of the online radio stations; for example, this one is good:
When I like a melody, I select its name, right-click and then click "Search Google". Most of the time it lands me on the corresponding Youtube video. Here comes the magic: there is a nice FREE program called 4K Video Downloader. You can download the whole video or you can extract just the audio. This is how my chillout collection reached 1000. They have other good products too.
Last semester I tried to explain theory through numerical examples. The results were terrible. Even the best students didn't live up to my expectations. The midterm grades were so low that I did something I had never done before: I allowed my students to write an analysis of the midterm at home. Those who were able to verbally articulate the answers to me received a bonus that allowed them to pass the semester.
This semester I made a U-turn. I announced that in the first half of the semester we would concentrate on theory, and we followed this methodology. Out of 35 students, 20 significantly improved their performance and 15 remained where they were.
a. Define the density of a random variable $X$. Draw the density of heights of adults, making simplifying assumptions if necessary. Don't forget to label the axes.
b. According to your plot, how much is the integral Explain.
c. Why can't the density be negative?
d. Why should the total area under the density curve be 1?
e. Where are basketball players on your graph? Write down the corresponding expression for probability.
f. Where are dwarfs on your graph? Write down the corresponding expression for probability.
This question is about the interval formula. In each case students have to write the equation for the probability and the corresponding integral of the density. At this level, I don't talk about the distribution function and introduce the density by the interval formula.
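The interval formula says that the probability of landing between $a$ and $b$ is the area under the density over that interval, $P(a\le X\le b)=\int_a^b p(t)\,dt$. The sketch below illustrates it numerically without using the distribution function. It assumes, purely for illustration, that adult heights are normal with mean 170 cm and standard deviation 10 cm, and that "basketball players" means taller than 200 cm; these numbers, the cut-off at 300 cm and the function name are hypothetical.

```python
from statistics import NormalDist

# Hypothetical model of adult heights in centimetres.
heights = NormalDist(mu=170.0, sigma=10.0)


def interval_probability(density, a, b, steps=10_000):
    """Approximate the integral of `density` over [a, b] by the midpoint rule."""
    width = (b - a) / steps
    area = 0.0
    for i in range(steps):
        midpoint = a + (i + 0.5) * width
        area += density(midpoint) * width
    return area


# Probability of a height within one standard deviation of the mean.
print(interval_probability(heights.pdf, 160.0, 180.0))

# Basketball players: the right tail, cut off at 300 cm where the
# density is negligible (an assumption of this sketch).
print(interval_probability(heights.pdf, 200.0, 300.0))

# Total area under the density curve, which should be close to 1.
print(interval_probability(heights.pdf, 70.0, 270.0))
```

Questions e and f then amount to choosing the right interval: a right tail for the tallest people and a left tail for the shortest.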
Recently I enjoyed reading Jack Weatherford's "Genghis Khan and the Making of the Modern World" (2004). I was reading the book with a specific question in mind: what were the main reasons for the success of the Mongols? Here you can see the list of their innovations, some of which were in fact adapted from the nations they subjugated. But what was the main driving force behind those innovations? The conclusion I came to is that Genghis Khan was a brilliant psychologist. He used what he knew about individual and social psychology to constantly improve the government of his empire.
I am no Genghis Khan but I try to base my teaching methods on my knowledge of student psychology.
Problem 1. Students mechanically write down what the teacher says and writes.
Solution. I don't allow my students to write while I am explaining the material. When I explain, their task is to listen and try to understand. I invite them to ask questions and prompt me to write more explanations and comments. After they all say "We understand", I clean the board and then they write down whatever they understood and remembered.
Problem 2. Students are not used to analyzing what they read or write.
Solution. After students finish their writing, I ask them to exchange notebooks and check each other's writings. It's easier for them to do this while everything is fresh in their memory. I bought and distributed red pens. When they see that something is missing or wrong, they have to write in red. Errors or omissions must stand out. Thus, right there in the class students repeat the material twice.
Problem 3. Students don't study at home.
Solution. I let my students know in advance what the next quiz will be about. Even with this knowledge, most of them don't prepare at home. Before the quiz I give them about half an hour to repeat and discuss the material (this is at least the third repetition). We start the quiz when they say they are ready.
Problem 4. Students don't understand that active repetition (writing without looking at one's notes) is much more productive than passive repetition (just reading the notes).
Solution. Each time before discussion sessions I distribute scratch paper and urge students to write, not just read or talk. About half of them follow my recommendation. Their desire to keep their notebooks neat is not their last consideration. The solution to Problem 1 also hinges upon active repetition.
Problem 5. If students work and are evaluated individually, usually there is no or little interaction between them.
Problem 6. Some students don't want to work in teams. They are usually either good students, who don't want to suffer because of weak team members, or weak students, who don't want their low grades to harm other team members.
Solution. The good students usually argue that it's not fair if their grade becomes lower because of somebody else's fault. My answer to them is that the meaning of fairness depends on the definition. In my grading scheme, 30 points out of 100 are allocated for team work and the rest for individual achievements. Therefore I never allow good students to work individually. I want them to be my teaching assistants and help other students. While doing so, I tell them that I may reward good students with a bonus at the end of the semester. In some cases I allow weak students to write quizzes individually but only if the team so requests. The request of the weak student doesn't matter. The weak student still has to participate in team discussions.
Problem 7. There is no accumulation of theoretical knowledge (flat learning curve).
Solution. a) Most students come from high school with little experience in algebra. I raise the level gradually and emphasize understanding. Students never see multiple choice questions in my classes. They also know that right answers without explanations will be discarded.
b) Normally, during my explanations I fill out the board. The amount of the information the students have to remember is substantial and increases over time. If you know a better way to develop one's internal vision, let me know.
c) I don't believe in learning the theory by doing applied exercises. After explaining the theory I formulate it as a series of theoretical exercises. I give the theory in large, logically consistent blocks for students to see the system. Half of the exam questions are theoretical (students have to provide proofs and derivations) and the other half are applied.
d) The right motivation can be of two types: theoretical or applied, and I never substitute one for another.
Problem 8. In low-level courses you need to conduct frequent evaluations to keep your students in working shape. Multiply that by the number of students, and you get a serious teaching overload.
Solution. Once at a teaching conference in Prague my colleague from New York boasted that he grades 160 papers per week. Evaluating one paper per team saves you from that hell.
At the beginning of the academic year I had 47 students. In the second semester 12 students dropped the course entirely or enrolled in Stats classes taught by other teachers. Based on current grades, I expect 15 more students to fail. Thus, after the first year I'll have about 20 students in my course (if they don't fail other courses). These students will master statistics at the level of my book.
This year I am teaching AP Statistics. If things continue the way they are, about half of the class will fail. Here is my diagnosis and how I am handling the problem.
On the surface, the students lack algebra training but I think the problem is deeper: many of them have underdeveloped cognitive abilities. Their perception is slow, memory is limited, analytical abilities are rudimentary and they are not used to working at home. Limited resources require careful allocation.
Short and intuitive names are better than two-word professional names.
Instead of "sample space" or "probability space" say "universe". The universe is the widest possible event, and nothing exists outside it.
Instead of "elementary event" say "atom". Simplest possible events are called atoms. This corresponds to the theoretical notion of an atom in measure theory (an atom is a measurable set which has positive measure and contains no set of smaller positive measure).
Then the formulation of classical probability becomes short. Let $n$ denote the number of atoms in the universe and let $n_A$ be the number of atoms in event $A$. If all atoms are equally likely (have equal probabilities), then $P(A)=n_A/n$.
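The classical probability rule (number of atoms in the event divided by the number of atoms in the universe) can be sketched in a few lines. The die example, the variable names and the function name below are illustrative assumptions; the only thing taken from the text is the rule itself, which requires equally likely atoms.

```python
from fractions import Fraction


def classical_probability(event, universe):
    """Probability of `event` when all atoms of `universe` are equally likely."""
    if not event <= universe:
        raise ValueError("the event must consist of atoms of the universe")
    return Fraction(len(event), len(universe))


# Hypothetical example: one roll of a fair die.
universe = frozenset({1, 2, 3, 4, 5, 6})  # six atoms
even = frozenset({2, 4, 6})               # event "the outcome is even"

print(classical_probability(even, universe))      # 1/2
print(classical_probability(universe, universe))  # 1
```

Exact fractions are used instead of floating-point numbers so that the answer looks the way a student would write it by hand.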
The clumsy "mutually exclusive events" are better replaced by more visual "disjoint sets". Likewise, instead of "collectively exhaustive events" say "events that cover the universe".
The combination of "mutually exclusive" and "collectively exhaustive" events is beyond comprehension for many. I say: if events are disjoint and cover the universe, we call them tiles. To support this definition, play one of the jigsaw puzzles onscreen (Video 1) and produce the picture from Figure 1.
Figure 1. Tiles (disjoint events that cover the universe)
The philosophy of team work
We are in the same boat. I mean the big boat. Not the class. Not the university. It's the whole country. We depend on each other. Failure of one may jeopardize the well-being of everybody else.
You work in teams. You help each other to learn. My lectures and your presentations are just the beginning of the journey of knowledge into your heads. I cannot control how it settles there. Be my teaching assistants, share your big and little discoveries with your classmates.
I don't just preach about you helping each other. I force you to work in teams. 30% of the final grade is allocated to team work. Team work means joint responsibility. You work on assignments together. I randomly select a team member for reporting. His or her grade is what each team member gets.
This kind of team work is incompatible with the Western obsession with grade privacy. If I say my grade is nobody's business, by extension I consider the level of my knowledge a private issue. This will prevent me from asking for help and admitting my errors. The situation when students hide their errors and weaknesses from others also goes against the ethics of many workplaces. In my class all grades are public knowledge.
In some situations, keeping the grade private is technically impossible. Conducting a competition without announcing the points won is impossible. If I catch a student cheating, I announce the failing grade immediately, as a warning to others.
To those of you who think team-based learning is unfair to better students I repeat: 30% of the final grade is given for team work, not for personal achievements. The other 70% is where you can shine personally.
Breaking the wall of silence
Team work serves several purposes.
Firstly, joint responsibility helps break communication barriers. See in Video 2 my students working in teams on classroom assignments. The situation when a weaker student is too proud to ask for help and a stronger student doesn't want to offend by offering help is not acceptable. One can ask for help or offer help without losing respect for each other.
Video 2. Teams working on assignments
Secondly, it turns on resources that are otherwise idle. Explaining something to somebody is the best way to improve your own understanding. The better students master a kind of leadership that is especially valuable in a modern society. For the weaker students, feeling responsible for a team improves motivation.
Thirdly, I save time by having to grade fewer student papers.
On exams and quizzes I mercilessly punish the students for Yes/No answers without explanations. There are no half-points for half-understanding. This, in combination with the team work and open grades policy, allows me to achieve my main objective: students are eager to talk to me about their problems.
Set operations and probability
After studying the basics of set operations and probabilities we had a midterm exam. It revealed that about one-third of students didn't understand this material and some of that misunderstanding came from high school. During the review session I wanted to see if they were ready for a frank discussion and told them: "Those who don't understand probabilities, please raise your hands", and about one-third raised their hands. I invited two of them to work at the board.
Video 3. Translating verbal statements to sets, with accompanying probabilities
Many teachers think that Venn diagrams explain everything about sets because they are visual. No, for some students they are not visual enough. That's why I prepared a simple teaching aid (see Video 3) and explained the task to the two students as follows:
I am shooting at the target. The target is a square with two circles on it, one red and the other blue. The target is the universe (the bullet cannot hit points outside it). The probability of a set is its area. I am going to tell you one statement after another. You write that statement in the first column of the table. In the second column write the mathematical expression for the set. In the third column write the probability of that set, together with any accompanying formulas that you can come up with. The formulas should reflect the relationships between relevant areas.
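Table 1. Set operations and probabilities
Here $\Omega$ is the universe, $A$ is the red circle and $B$ is the blue circle.
| Statement | Set | Probability |
|---|---|---|
| 1. The bullet hit the universe | $\Omega$ | $P(\Omega)=1$ |
| 2. The bullet didn't hit the universe | $\varnothing$ | $P(\varnothing)=0$ |
| 3. The bullet hit the red circle | $A$ | $P(A)$ |
| 4. The bullet didn't hit the red circle | $A^c$ | $P(A^c)=1-P(A)$ |
| 5. The bullet hit both the red and blue circles | $A\cap B$ | $P(A\cap B)$ (in general, this is not equal to $P(A)P(B)$) |
| 6. The bullet hit $A$ or $B$ (or both) | $A\cup B$ | $P(A\cup B)=P(A)+P(B)-P(A\cap B)$ |
| 7. The bullet hit $A$ but not $B$ | $A\setminus B$ | $P(A\setminus B)=P(A)-P(A\cap B)$ |
| 8. The bullet hit $B$ but not $A$ | $B\setminus A$ | $P(B\setminus A)=P(B)-P(A\cap B)$ |
| 9. The bullet hit either $A$ or $B$ (but not both) | $(A\setminus B)\cup(B\setminus A)$ | $P(A)+P(B)-2P(A\cap B)$ |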
During the process, I was illustrating everything on my teaching aid. This exercise allows the students to relate verbal statements to sets and further to their areas. The main point is that people need to see the logic, and that logic should be repeated several times through similar exercises.
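The nine statements of Table 1 can also be replayed with finite sets, which gives students one more repetition of the same logic. The sketch below is a discrete stand-in for the target, not the teaching aid itself: the universe is assumed to consist of 20 equally likely atoms, the sets `red` and `blue` are arbitrary overlapping subsets, and all names are hypothetical. Probability is the share of atoms, the discrete analogue of area.

```python
from fractions import Fraction

# Hypothetical discrete target: 20 equally likely atoms.
universe = set(range(20))
red = set(range(2, 10))    # stands in for the red circle
blue = set(range(7, 15))   # stands in for the blue circle


def prob(event):
    """Share of the universe's atoms that belong to the event."""
    return Fraction(len(event), len(universe))


rows = [
    ("hit the universe", universe),
    ("didn't hit the universe", set()),
    ("hit the red circle", red),
    ("didn't hit the red circle", universe - red),
    ("hit both circles", red & blue),
    ("hit red or blue (or both)", red | blue),
    ("hit red but not blue", red - blue),
    ("hit blue but not red", blue - red),
    ("hit either red or blue (but not both)", red ^ blue),
]

for number, (statement, event) in enumerate(rows, start=1):
    print(number, statement, prob(event))

# Relationships between the areas that the students are asked to find.
assert prob(universe - red) == 1 - prob(red)
assert prob(red | blue) == prob(red) + prob(blue) - prob(red & blue)
assert prob(red - blue) == prob(red) - prob(red & blue)
assert prob(red ^ blue) == prob(red) + prob(blue) - 2 * prob(red & blue)
```

Changing `red` and `blue` to other subsets, including disjoint ones, shows which formulas survive in general and which depend on the particular picture.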