Let $X$ be a random variable. The function $F_X(x)=P(X\le x)$, where $x$ runs over the real numbers, is called the distribution function of $X$. In statistics, many formulas are derived with the help of $F_X$. The motivation and properties are given here.
Oftentimes, working with the distribution function is an intermediate step to obtain a density $f_X$ using the link $f_X(x)=F_X'(x)$.
The series of exercises below shows just how useful the distribution function is.
Exercise 1. Let $Y$ be a linear transformation of $X$, that is, $Y=aX+b$, where $a>0$ and $b\in R$. Find the link between $F_X$ and $F_Y$. Find the link between $f_X$ and $f_Y$.
The solution is here.
The more general case of a nonlinear transformation can also be handled:
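Exercise 2. Let $Y=g(X)$ where $g$ is a deterministic function. Suppose that $g$ is strictly monotone, differentiable and $g^{-1}$ exists. Find the link between $F_X$ and $F_Y$. Find the link between $f_X$ and $f_Y$.
Solution. The result differs depending on whether $g$ is increasing or decreasing. Let's assume the latter, so that $x_1\le x_2$ is equivalent to $g(x_1)\ge g(x_2)$. Also for simplicity suppose that $P(X=c)=0$ for any $c$. Then $F_Y(y)=P(g(X)\le y)=P(X\ge g^{-1}(y))=1-P(X<g^{-1}(y))=1-F_X(g^{-1}(y)).$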
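To see the decreasing case at work, here is a small numerical sketch of the link $F_Y(y)=1-F_X(g^{-1}(y))$. The choice of a standard normal $X$ and of $g(x)=-2x+1$ is an assumption made only for illustration; it is convenient because $Y=-2X+1$ is then normal with mean 1 and standard deviation 2, so the formula can be checked against a directly computed distribution function.

```python
from statistics import NormalDist

# Assumed example (not from the original post): X ~ N(0, 1), g(x) = -2x + 1.
X = NormalDist(0.0, 1.0)


def g_inverse(y):
    """Inverse of the strictly decreasing map g(x) = -2x + 1."""
    return (1.0 - y) / 2.0


def F_Y(y):
    """Distribution function of Y = g(X) for decreasing g: 1 - F_X(g^{-1}(y))."""
    return 1.0 - X.cdf(g_inverse(y))


# Independent check: Y = -2X + 1 is normal with mean 1 and standard deviation 2.
Y_direct = NormalDist(1.0, 2.0)

for y in (-3.0, 0.0, 1.0, 2.5):
    print(y, F_Y(y), Y_direct.cdf(y))
```

The two printed columns should agree up to rounding error, which is what the formula predicts.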
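Exercise 5. Let $X,Y$ be two independent random variables. Find $F_{\max\{X,Y\}}$ and $F_{\min\{X,Y\}}$.
Solution. The inequality $\max\{X,Y\}\le x$ holds if and only if both $X\le x$ and $Y\le x$ hold. This means that the event $\{\max\{X,Y\}\le x\}$ coincides with the event $\{X\le x\}\cap\{Y\le x\}$. It follows by independence that $F_{\max\{X,Y\}}(x)=P(X\le x,Y\le x)=P(X\le x)P(Y\le x)=F_X(x)F_Y(x).$
For $\min\{X,Y\}$ we need one more trick: pass to the complementary event by writing $F_{\min\{X,Y\}}(x)=P(\min\{X,Y\}\le x)=1-P(\min\{X,Y\}>x).$ Now we can use the fact that the event $\{\min\{X,Y\}>x\}$ coincides with the event $\{X>x\}\cap\{Y>x\}$. Hence, by independence, $F_{\min\{X,Y\}}(x)=1-P(X>x)P(Y>x)=1-(1-F_X(x))(1-F_Y(x)).$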
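The two formulas for independent variables, $F_{\max}(x)=F_X(x)F_Y(x)$ and $F_{\min}(x)=1-(1-F_X(x))(1-F_Y(x))$, are easy to try out numerically. The sketch below assumes, purely for illustration, that $X$ and $Y$ are exponential with rates 0.5 and 1.5. In that case the minimum is known to be exponential with rate 0.5 + 1.5 = 2, which gives an independent check.

```python
import math


def exp_cdf(rate):
    """Return the distribution function of an exponential variable with the given rate."""
    def cdf(x):
        return 1.0 - math.exp(-rate * x) if x > 0 else 0.0
    return cdf


def cdf_max(F_X, F_Y):
    """Distribution function of max{X, Y} for independent X, Y."""
    return lambda x: F_X(x) * F_Y(x)


def cdf_min(F_X, F_Y):
    """Distribution function of min{X, Y} for independent X, Y."""
    return lambda x: 1.0 - (1.0 - F_X(x)) * (1.0 - F_Y(x))


# Assumed rates, chosen only for the illustration.
F_X, F_Y = exp_cdf(0.5), exp_cdf(1.5)
F_max, F_min = cdf_max(F_X, F_Y), cdf_min(F_X, F_Y)
F_min_known = exp_cdf(2.0)  # minimum of independent exponentials: rates add up

for x in (0.1, 0.5, 1.0, 3.0):
    print(x, F_max(x), F_min(x), F_min_known(x))
```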
Unlike most UoL exams, here I tried to relate the theory to practical issues.
KBTU International School of Economics
Compiled by Kairat Mynbaev
The total for this exam is 41 points. You have two hours.
Everywhere provide detailed explanations. When answering please clearly indicate question numbers. You don’t need a calculator. As long as the formula you provide is correct, the numerical value does not matter.
Question 1. (12 points)
a) (2 points) At a casino, two players are playing on slot machines. Their payoffs are standard normal and independent. Find the joint density of the payoffs.
b) (4 points) Two other players watch the first two players and start to argue what will be larger: the sum or the difference . Find the joint density. Are variables independent? Find their marginal densities.
c) (2 points) Are normal? Why? What are their means and variances?
d) (2 points) Which probability is larger: or ?
e) (2 points) In this context interpret the conditional expectation . How much is it?
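Reminder. The density of a normal variable $X\sim N(\mu,\sigma^2)$ is $f_X(x)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$.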
Question 2. (9 points) The distribution of a call duration of one Kcell [largest mobile operator in KZ] customer is exponential: The number of customers making calls simultaneously is distributed as Poisson: Thus the total call duration for all customers is for . We put . Assume that customers make their decisions about calling independently.
a) (3 points) Find the general formula (when are identically distributed and are independent but not necessarily exponential and Poisson, as above) for the moment generating function of explaining all steps.
b) (3 points) Find the moment generating functions of , and for your particular distributions.
c) (3 points) Find the mean and variance of . Based on the equations you obtained, can you suggest estimators of parameters ?
Remark. Direct observations on the exponential and Poisson distributions are not available. We have to infer their parameters by observing . This explains the importance of the technique used in Question 2.
Question 3. (8 points)
a) (2 points) For a non-negative random variable $X$ prove the Markov inequality $P(X\ge c)\le\frac{EX}{c}$, $c>0$.
b) (2 points) Prove the Chebyshev inequality $P(|X-EX|\ge c)\le\frac{Var(X)}{c^2}$ for an arbitrary random variable $X$.
c) (4 points) We say that the sequence of random variables $\{X_n\}$ converges in probability to a random variable $X$ if $P(|X_n-X|>\varepsilon)\to 0$ as $n\to\infty$ for any $\varepsilon>0$. Suppose that $EX_n=\mu$ for all $n$ and that $Var(X_n)\to 0$ as $n\to\infty$. Prove that then $X_n$ converges in probability to $\mu$.
Remark. Question 3 leads to the simplest example of a law of large numbers: if $X_1,...,X_n$ are i.i.d. with finite variance, then their sample mean converges to their population mean in probability.
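The remark can be illustrated numerically. For i.i.d. variables with variance $\sigma^2$ the sample mean has variance $\sigma^2/n$, so the Chebyshev inequality bounds the probability of a deviation of size $\varepsilon$ or more by $\sigma^2/(n\varepsilon^2)$, which tends to zero. The sketch below compares this bound with the exact probability for fair coin flips; the coin example and the value $\varepsilon=1/10$ are assumptions chosen only to make the exact calculation possible.

```python
from fractions import Fraction
from math import comb


def chebyshev_bound(variance, n, eps):
    """Chebyshev bound on P(|sample mean - population mean| >= eps) for n i.i.d. draws."""
    return min(1, variance / (n * eps ** 2))


def exact_tail_fair_coin(n, eps):
    """Exact P(|mean of n fair coin flips - 1/2| >= eps), from the binomial distribution."""
    favourable = sum(
        comb(n, k) for k in range(n + 1)
        if abs(Fraction(k, n) - Fraction(1, 2)) >= eps
    )
    return Fraction(favourable, 2 ** n)


eps = Fraction(1, 10)
coin_variance = Fraction(1, 4)  # variance of one fair coin flip coded as 0/1

for n in (10, 100, 1000):
    bound = chebyshev_bound(coin_variance, n, eps)
    exact = exact_tail_fair_coin(n, eps)
    print(n, float(exact), float(bound))
```

The exact probability is always below the bound, and both shrink as $n$ grows. The bound is crude, but it is all that is needed for the proof.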
Question 4. (8 points)
a) (4 points) Define a distribution function. Give its properties, with intuitive explanations.
b) (4 points) Is a sum of two distribution functions a distribution function? Is a product of two distribution functions a distribution function?
Remark. The answer for part a) is here and the one for part b) is based on it.
Question 5. (4 points) The Rakhat factory prepares prizes for kids for the upcoming New Year event. Each prize contains one type of chocolates and one type of candies. The chocolates and candies are chosen randomly from two production lines, the total number of items is always 10 and all selections are equally likely.
a) (2 points) What proportion of prepared prizes contains three or more chocolates?
b) (2 points) 100 prizes have been sent to an orphanage. What is the probability that 50 of those prizes contain no more than two chocolates?
There is a problem I gave on the midterm that does not require much imagination. Just know the definitions and do the technical work, so I was hoping we could put this behind us. Turned out we could not and thus you see this post.
Problem. Suppose the joint density of variables is given by
I. Find .
II. Find marginal densities of . Are independent?
III. Find conditional densities .
IV. Find .
When solving a problem like this, the first thing to do is to give the theory. You may not be able to finish without errors the long calculations but your grade will be determined by the beginning theoretical remarks.
I. Finding the normalizing constant
Any density should satisfy the completeness axiom: the area under the density curve (or in this case the volume under the density surface) must be equal to one: $\int\int_{R^2}f_{X,Y}(x,y)\,dx\,dy=1.$ The constant chosen to satisfy this condition is called a normalizing constant. The integration in general is over the whole plane and the first task is to express the above integral as an iterated integral. This is where the domain where the density is not zero should be taken into account. There is little you can do without geometry. One example of how to do this is here.
The shape of the area where the density is not zero is determined by a) the extreme values of $x,y$ and b) the relationship between them. The extreme values are 0 and 1 for both $x$ and $y$, meaning that this area is contained in the unit square $[0,1]\times[0,1]$. The inequality $y<x$ means that we cut out of this square the triangle below the line $y=x$ (it is really the lower triangle because if from a point on the line $y=x$ we move down vertically, $x$ will stay the same and $y$ will become smaller than $x$).
In the iterated integral:
a) the lower and upper limits of integration for the inner integral are the boundaries for the inner variable; they may depend on the outer variable but not on the inner variable.
b) the lower and upper limits of integration for the outer integral are the extreme values for the outer variable; they must be constant.
This is illustrated in Pane A of Figure 1.
Figure 1. Integration order
Always take the inner integral in parentheses to show that you are dealing with an iterated integral.
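a) In the inner integral, integrating over $x$ means moving along the blue arrows from the boundary $x=y$ to the boundary $x=1$. The boundaries may depend on $y$ but not on $x$ because the outer integral is over $y$.
b) In the outer integral put the extreme values for the outer variable. Thus,
$\int\int_{R^2}f_{X,Y}(x,y)\,dx\,dy=\int_0^1\left(\int_y^1 f_{X,Y}(x,y)\,dx\right)dy.$
Check that if we first integrate over $y$ (vertically along the red arrows, see Pane B in Figure 1) then the equation
$\int\int_{R^2}f_{X,Y}(x,y)\,dx\,dy=\int_0^1\left(\int_0^x f_{X,Y}(x,y)\,dy\right)dx$
results.
In fact, from the definition of the area one can see that the inner interval for $x$ is $[y,1]$ and for $y$ it is $[0,x]$.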
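The rules for the limits can be checked numerically. The sketch below is only an illustration under stated assumptions: the density is taken to be nonzero on the triangle $0<y<x<1$, and the unnormalized density $g(x,y)=x+y$ is hypothetical (it is not the density from the midterm). Both integration orders should give the same value, from which the normalizing constant follows.

```python
def integrate(func, a, b, steps=400):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


def g(x, y):
    """Hypothetical unnormalized density on the triangle 0 < y < x < 1."""
    return x + y


# Order 1: inner integral over x from y to 1, outer integral over y from 0 to 1.
x_first = integrate(lambda y: integrate(lambda x: g(x, y), y, 1.0), 0.0, 1.0)

# Order 2: inner integral over y from 0 to x, outer integral over x from 0 to 1.
y_first = integrate(lambda x: integrate(lambda y: g(x, y), 0.0, x), 0.0, 1.0)

# The normalizing constant makes the total volume equal to one.
c = 1.0 / x_first

print(x_first, y_first, c)
```

Note that in each inner integral the limits depend only on the outer variable, and the outer limits are the constants 0 and 1, exactly as the rules a) and b) require.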
Suppose in a box we have coins and banknotes of only two denominations: $1 and $5 (see Figure 1).
Figure 1. Illustration of two variables
We pull one out randomly. The division of cash by type (coin or banknote) divides the sample space (shown as a square, lower left picture) with probabilities and (they sum to one). The division by denomination ($1 or $5) divides the same sample space differently, see the lower right picture, with the probabilities to pull out $1 and $5 equal to and , resp. (they also sum to one). This is summarized in the tables
Variable 1: Cash type
Prob
coin
banknote
Variable 2: Denomination
Prob
$1
$5
Now we can consider joint events and probabilities (see Figure 2, where the two divisions are combined).
Figure 2. Joint probabilities
For example, if we pull out a random it can be a and $1 and the corresponding probability is The two divisions of the sample space generate a new division into four parts. Then geometrically it is obvious that we have four identities:
Adding over denominations:
Adding over cash types:
Formally, here we use additivity of probability for disjoint events
In words: we can recover own probabilities of variables 1,2 from joint probabilities.
Generalization
Suppose we have two discrete random variables $X,Y$ taking values $x_1,...,x_n$ and $y_1,...,y_m$, resp., and their own probabilities are $P(X=x_i)=p_i^X$, $P(Y=y_j)=p_j^Y$. Denote the joint probabilities $P(X=x_i,Y=y_j)=p_{ij}$. Then we have the identities
(1) $\sum_{j=1}^m p_{ij}=p_i^X,\ \sum_{i=1}^n p_{ij}=p_j^Y$ ($n+m$ equations).
In words: to obtain the marginal probability of one variable (say, $Y$) sum over the values of the other variable (in this case, $X$).
The name marginal probabilities is used for $p_i^X,p_j^Y$ because in the two-dimensional table they arise as a result of summing table entries along columns or rows and are displayed in the margins.
Analogs for continuous variables with densities
Suppose we have two continuous random variables $X,Y$ and their own densities are $f_X$ and $f_Y$. Denote the joint density $f_{X,Y}$. Then replacing in (1) sums by integrals and probabilities by densities we get
(2) $\int_R f_{X,Y}(x,y)\,dy=f_X(x),\ \int_R f_{X,Y}(x,y)\,dx=f_Y(y).$
In words: to obtain one marginal density (say, $f_Y$) integrate out the other variable (in this case, $x$).
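For the discrete case, the rule "sum over the values of the other variable" takes a few lines of code. The joint probabilities below are hypothetical numbers chosen for the cash type and denomination example; they are not taken from the figures.

```python
from fractions import Fraction

# Hypothetical joint probabilities P(cash type, denomination); illustrative numbers only.
joint = {
    ("coin", 1): Fraction(3, 10),
    ("coin", 5): Fraction(1, 10),
    ("banknote", 1): Fraction(2, 10),
    ("banknote", 5): Fraction(4, 10),
}


def marginal(joint, axis):
    """Sum joint probabilities over the other variable (axis 0 keeps type, axis 1 keeps denomination)."""
    result = {}
    for key, p in joint.items():
        result[key[axis]] = result.get(key[axis], Fraction(0)) + p
    return result


by_type = marginal(joint, 0)          # adding over denominations
by_denomination = marginal(joint, 1)  # adding over cash types

print(by_type)
print(by_denomination)
```

Each set of marginal probabilities sums to one, as it should, because the joint probabilities do.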
Its content, organization and level justify its adoption as a textbook for introductory statistics for Econometrics in most American or European universities. The book's table of contents is somewhat standard, the innovation comes in a presentation that is crisp, concise, precise and directly relevant to the Econometrics course that will follow. I think instructors and students will appreciate the absence of unnecessary verbiage that permeates many existing textbooks.
Having read Professor Mynbaev's previous books and research articles I was not surprised with his clear writing and precision. However, I was surprised with an informal and almost conversational one-on-one style of writing which should please most students. The informality belies a careful presentation where great care has been taken to present the material in a pedagogical manner.
Carlos Martins-Filho Professor of Economics University of Colorado at Boulder Boulder, USA
In my book I explained how one can use Excel to do statistical simulations and replace statistical tables commonly used in statistics courses. Here I go one step further by providing a free statistical calculator that replaces the following tables from the book by Newbold et al.:
Table 1 Cumulative Distribution Function, F(z), of the Standard Normal Distribution Table
Table 2 Probability Function of the Binomial Distribution
Table 5 Individual Poisson Probabilities
Table 7a Upper Critical Values of Chi-Square Distribution with Degrees of Freedom
Table 8 Upper Critical Values of Student’s t Distribution with Degrees of Freedom
Tables 9a, 9b Upper Critical Values of the F Distribution
The calculator is just a Google sheet with statistical functions, see Picture 1:
Picture 1. Calculator using Google sheet
How to use Calculator
Open an account at gmail.com, if you haven't already. Open Google Drive.
Install Google sheets on your phone.
Find the sheet on my Google drive and copy it to your Google drive (File/Make a copy). An icon of my calculator will appear in your drive. That's not the file, it's just a link to my file. To the right of it there are three dots indicating options. One of them is "Make a copy", so use that one. The copy will be in your drive. After that you can delete the link to my file. You might want to rename "Copy of Calculator" as "Calculator".
Open the file on your drive using Google sheets. Your Calculator is ready!
When you click a cell, you can enter what you need either in the formula bar at the bottom or directly in the cell. You can also see the functions I embedded in the sheet.
In cell A1, for example, you can enter any legitimate formula with numbers, arithmetic signs, and Google sheet functions. Be sure to start it with =,+ or - and to press the checkmark on the right of the formula bar after you finish.
The cells below A1 replace the tables listed above. Beside each function there is a verbal description and further to the right - a graphical illustration (which is not in Picture 1).
On the tab named Regression you can calculate the slope and intercept. The sample size must be 10.
Keep in mind that tables for continuous distributions need two functions. For example, in case of the standard normal distribution one function allows you to go from probability (area of the left tail) to the cutting value on the horizontal axis. The other function goes from the cutting value on the horizontal axis to probability.
Feel free to add new sheets or functions as you may need. You will have to do this on a tablet or computer.
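The remark about continuous distributions needing two functions can be illustrated outside the sheet. The sketch below uses Python's standard library rather than the Google sheet functions, so it is a parallel illustration and not a description of what the Calculator contains. One function goes from the cutting value to the left-tail probability, the other goes back.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def left_tail_probability(z):
    """From the cutting value z on the horizontal axis to the area of the left tail."""
    return std_normal.cdf(z)


def cutting_value(p):
    """From the area p of the left tail back to the cutting value on the horizontal axis."""
    return std_normal.inv_cdf(p)


p = left_tail_probability(1.96)
print(p)
print(cutting_value(p))      # recovers the cutting value, up to rounding
print(cutting_value(0.975))  # the familiar two-sided 5% critical value
```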
Last semester I tried to explain theory through numerical examples. The results were terrible. Even the best students didn't live up to my expectations. The midterm grades were so low that I did something I had never done before: I allowed my students to write an analysis of the midterm at home. Those who were able to verbally articulate the answers to me received a bonus that allowed them to pass the semester.
This semester I made a U-turn. I announced that in the first half of the semester we would concentrate on theory, and we followed this methodology. Out of 35 students, 20 significantly improved their performance and 15 remained where they were.
a. Define the density of a random variable $X$. Draw the density of heights of adults, making simplifying assumptions if necessary. Don't forget to label the axes.
b. According to your plot, how much is the integral Explain.
c. Why can't the density be negative?
d. Why should the total area under the density curve be 1?
e. Where are basketball players on your graph? Write down the corresponding expression for probability.
f. Where are dwarfs on your graph? Write down the corresponding expression for probability.
This question is about the interval formula. In each case students have to write the equation for the probability and the corresponding integral of the density. At this level, I don't talk about the distribution function and introduce the density by the interval formula.
a. Derive linearity of covariance in the first argument when the second is fixed.
b. How much is covariance if one of its arguments is a constant?
c. What is the link between variance and covariance? If you know one of these functions, can you find the other (there should be two answers)? (4 points)
a. Define the density of a random variable $X$. Draw the density of work experience of adults, making simplifying assumptions if necessary. Don't forget to label the axes.
b. According to your plot, how much is the integral Explain.
c. Why can't the density be negative?
d. Why should the total area under the density curve be 1?
e. Where are retired people on your graph? Write down the corresponding expression for probability.
f. Where are young people (up to 25 years old) on your graph? Write down the corresponding expression for probability.
This year I am teaching AP Statistics. If things continue the way they are, about half of the class will fail. Here is my diagnosis and how I am handling the problem.
On the surface, the students lack algebra training but I think the problem is deeper: many of them have underdeveloped cognitive abilities. Their perception is slow, memory is limited, analytical abilities are rudimentary and they are not used to working at home. Limited resources require careful allocation.
Terminology
Short and intuitive names are better than two-word professional names.
Instead of "sample space" or "probability space" say "universe". The universe is the widest possible event, and nothing exists outside it.
Instead of "elementary event" say "atom". Simplest possible events are called atoms. This corresponds to the theoretical notion of an atom in measure theory (an atom is a measurable set which has positive measure and contains no set of smaller positive measure).
Then the formulation of classical probability becomes short. Let $n$ denote the number of atoms in the universe and let $n(A)$ be the number of atoms in event $A$. If all atoms are equally likely (have equal probabilities), then $P(A)=\frac{n(A)}{n}$.
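Counting atoms is easy to automate. In the sketch below the universe consists of the 36 equally likely outcomes of rolling two dice; the dice example and the function name `probability` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# The universe: all 36 equally likely atoms (first die, second die).
universe = set(product(range(1, 7), repeat=2))


def probability(event, universe):
    """Classical probability: number of atoms in the event over number of atoms in the universe."""
    return Fraction(len(event & universe), len(universe))


sum_is_seven = {atom for atom in universe if sum(atom) == 7}
doubles = {atom for atom in universe if atom[0] == atom[1]}

print(probability(sum_is_seven, universe))
print(probability(doubles, universe))
print(probability(universe, universe))
```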
The clumsy "mutually exclusive events" are better replaced by more visual "disjoint sets". Likewise, instead of "collectively exhaustive events" say "events that cover the universe".
The combination "mutually exclusive" and "collectively exhaustive" events is beyond comprehension for many. I say: if events are disjoint and cover the universe, we call them tiles. To support this definition, play onscreen one of jigsaw puzzles (Video 1) and produce the picture from Figure 1.
Figure 1. Tiles (disjoint events that cover the universe)
The philosophy of team work
We are in the same boat. I mean the big boat. Not the class. Not the university. It's the whole country. We depend on each other. Failure of one may jeopardize the well-being of everybody else.
You work in teams. You help each other to learn. My lectures and your presentations are just the beginning of the journey of knowledge into your heads. I cannot control how it settles there. Be my teaching assistants, share your big and little discoveries with your classmates.
I don't just preach about you helping each other. I force you to work in teams. 30% of the final grade is allocated to team work. Team work means joint responsibility. You work on assignments together. I randomly select a team member for reporting. His or her grade is what each team member gets.
This kind of team work is incompatible with the Western obsession with grades privacy. If I say my grade is nobody's business, by extension I consider the level of my knowledge a private issue. This will prevent me from asking for help and admitting my errors. The situation when students hide their errors and weaknesses from others also goes against the ethics of many workplaces. In my class all grades are public knowledge.
In some situations, keeping the grade private is technically impossible. Conducting a competition without announcing the points won is impossible. If I catch a student cheating, I announce the failing grade immediately, as a warning to others.
To those of you who think team-based learning is unfair to better students I repeat: 30% of the final grade is given for team work, not for personal achievements. The other 70% is where you can shine personally.
Breaking the wall of silence
Team work serves several purposes.
Firstly, joint responsibility helps break communication barriers. See in Video 2 my students working in teams on classroom assignments. The situation when a weaker student is too proud to ask for help and a stronger student doesn't want to offend by offering help is not acceptable. One can ask for help or offer help without losing respect for each other.
Video 2. Teams working on assignments
Secondly, it turns on resources that are otherwise idle. Explaining something to somebody is the best way to improve your own understanding. The better students master a kind of leadership that is especially valuable in a modern society. For the weaker students, feeling responsible for a team improves motivation.
Thirdly, I save time by having to grade fewer student papers.
On exams and quizzes I mercilessly punish the students for Yes/No answers without explanations. There are no half-points for half-understanding. This, in combination with the team work and open grades policy allows me to achieve my main objective: students are eager to talk to me about their problems.
Set operations and probability
After studying the basics of set operations and probabilities we had a midterm exam. It revealed that about one-third of students didn't understand this material and some of that misunderstanding came from high school. During the review session I wanted to see if they were ready for a frank discussion and told them: "Those who don't understand probabilities, please raise your hands", and about one-third raised their hands. I invited two of them to work at the board.
Video 3. Translating verbal statements to sets, with accompanying probabilities
Many teachers think that the Venn diagrams explain everything about sets because they are visual. No, for some students they are not visual enough. That's why I prepared a simple teaching aid (see Video 3) and explained the task to the two students as follows:
I am shooting at the target. The target is a square with two circles on it, one red and the other blue. The target is the universe (the bullet cannot hit points outside it). The probability of a set is its area. I am going to tell you one statement after another. You write that statement in the first column of the table. In the second column write the mathematical expression for the set. In the third column write the probability of that set, together with any accompanying formulas that you can come up with. The formulas should reflect the relationships between relevant areas.
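Table 1. Set operations and probabilities

| Statement | Set | Probability |
|---|---|---|
| 1. The bullet hit the universe | $\Omega$ | $P(\Omega)=1$ |
| 2. The bullet didn't hit the universe | $\varnothing$ | $P(\varnothing)=0$ |
| 3. The bullet hit the red circle | $A$ | $P(A)$ |
| 4. The bullet didn't hit the red circle | $A^c=\Omega\setminus A$ | $P(A^c)=1-P(A)$ |
| 5. The bullet hit both the red and blue circles | $A\cap B$ | $P(A\cap B)$ (in general, this is not equal to $P(A)P(B)$) |
| 6. The bullet hit $A$ or $B$ (or both) | $A\cup B$ | $P(A\cup B)=P(A)+P(B)-P(A\cap B)$ (additivity rule) |
| 7. The bullet hit $A$ but not $B$ | $A\setminus B$ | $P(A\setminus B)=P(A)-P(A\cap B)$ |
| 8. The bullet hit $B$ but not $A$ | $B\setminus A$ | $P(B\setminus A)=P(B)-P(A\cap B)$ |
| 9. The bullet hit either $A$ or $B$ (but not both) | $(A\setminus B)\cup(B\setminus A)$ | $P(A)+P(B)-2P(A\cap B)$ |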
During the process, I was illustrating everything on my teaching aid. This exercise allows the students to relate verbal statements to sets and further to their areas. The main point is that people need to see the logic, and that logic should be repeated several times through similar exercises.
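The same exercise can be repeated on a computer with a finite universe, where the probability of a set is its share of the atoms instead of its area. The sets below are hypothetical stand-ins for the red and blue circles, chosen so that they overlap.

```python
from fractions import Fraction

universe = set(range(20))  # 20 equally likely atoms
red = set(range(0, 10))    # stand-in for the red circle
blue = set(range(7, 15))   # stand-in for the blue circle, overlapping the red one


def P(event):
    """Probability of an event as its share of the universe."""
    return Fraction(len(event), len(universe))


both = red & blue             # hit both circles
either = red | blue           # hit red or blue (or both)
red_only = red - blue         # hit red but not blue
blue_only = blue - red        # hit blue but not red
exactly_one = red ^ blue      # hit either one but not both
not_red = universe - red      # didn't hit the red circle

print(P(either), P(red) + P(blue) - P(both))        # additivity rule
print(P(red_only), P(red) - P(both))
print(P(exactly_one), P(red) + P(blue) - 2 * P(both))
print(P(both), P(red) * P(blue))                    # these differ in general
```

Each printed pair except the last one contains two equal numbers, which is the relationship between areas that the students are asked to write in the third column.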
Reevaluating probabilities based on a piece of evidence
This actually has to do with Bayes' theorem. However, in simple problems one can use a dead simple approach: just find probabilities of all elementary events. This post builds upon the post on Significance level and power of test, including the notation. Be sure to review that post.
Here is an example from the guide for Quantitative Finance by A. Patton (University of London course code FN3142).
Activity 7.2 Consider a test that has a Type I error rate of 5%, and power of 50%.
Suppose that, before running the test, the researcher thinks that both the null and the alternative are equally likely.
If the test indicates a rejection of the null hypothesis, what is the probability that the null is false?
If the test indicates a failure to reject the null hypothesis, what is the probability that the null is true?
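Denote events R = {Reject null}, A = {fAil to reject null}; T = {null is True}; F = {null is False}. Then we are given:
(1) $P(T)=P(F)=1/2$,
(2) $P(R|T)=\frac{P(R\cap T)}{P(T)}=0.05,\ P(R|F)=\frac{P(R\cap F)}{P(F)}=0.5.$
(1) and (2) show that we can find $P(R\cap T)$ and $P(R\cap F)$ and therefore also $P(A\cap T)=P(T)-P(R\cap T)$ and $P(A\cap F)=P(F)-P(R\cap F)$. Once we know probabilities of elementary events, we can find everything about everything.
Figure 1. Elementary events
Answering the first question: just plug probabilities in $P(F|R)=\frac{P(R\cap F)}{P(R)}=\frac{P(R\cap F)}{P(R\cap T)+P(R\cap F)}.$
Answering the second question: just plug probabilities in $P(T|A)=\frac{P(A\cap T)}{P(A)}=\frac{P(A\cap T)}{P(A\cap T)+P(A\cap F)}.$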
Patton uses the Bayes' theorem and the law of total probability. The solution suggested above uses only additivity of probability.
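The arithmetic for Activity 7.2 can be carried out along these lines. The sketch uses only the data stated in the activity (Type I error rate 5%, power 50%, equal prior probabilities); the variable names are mine. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

# Data from Activity 7.2.
p_T = Fraction(1, 2)            # null is true
p_F = Fraction(1, 2)            # null is false
p_R_given_T = Fraction(5, 100)  # Type I error rate
p_R_given_F = Fraction(50, 100) # power

# Probabilities of the four elementary events.
joint = {
    ("R", "T"): p_R_given_T * p_T,
    ("A", "T"): (1 - p_R_given_T) * p_T,
    ("R", "F"): p_R_given_F * p_F,
    ("A", "F"): (1 - p_R_given_F) * p_F,
}

# Additivity: P(R) and P(A) are sums of elementary probabilities.
p_R = joint[("R", "T")] + joint[("R", "F")]
p_A = joint[("A", "T")] + joint[("A", "F")]

p_F_given_R = joint[("R", "F")] / p_R  # first question
p_T_given_A = joint[("A", "T")] / p_A  # second question

print(p_F_given_R, float(p_F_given_R))
print(p_T_given_A, float(p_T_given_A))
```

So a rejection raises the probability that the null is false from 1/2 to 10/11, while a failure to reject raises the probability that the null is true from 1/2 to 19/29.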