27 Dec 22

Final exam in Advanced Statistics ST2133, 2022

Unlike most UoL exams, here I tried to relate the theory to practical issues.

KBTU International School of Economics

Compiled by Kairat Mynbaev

The total for this exam is 41 points. You have two hours.

Provide detailed explanations everywhere. When answering, please clearly indicate question numbers. You don't need a calculator: as long as the formula you provide is correct, the numerical value does not matter.

Question 1. (12 points)

a) (2 points) At a casino, two players are playing on slot machines. Their payoffs X,Y are standard normal and independent. Find the joint density of the payoffs.

b) (4 points) Two other players watch the first two players and start arguing about which will be larger: the sum U = X + Y or the difference V = X - Y. Find the joint density of U and V. Are the variables U,V independent? Find their marginal densities.

c) (2 points) Are U,V normal? Why? What are their means and variances?

d) (2 points) Which probability is larger: P(U > V) or P\left( {U < V} \right)?

e) (2 points) In this context, interpret the conditional expectation E\left( {U|V = v} \right). What is its value?

Reminder. The density of a normal variable X \sim N\left( {\mu ,{\sigma ^2}} \right) is {f_X}\left( x \right) = \frac{1}{{\sqrt {2\pi {\sigma ^2}} }}{e^{ - \frac{{{{\left( {x - \mu } \right)}^2}}}{{2{\sigma ^2}}}}}.
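Answers to Question 1 can be checked by simulation. Below is a minimal Monte Carlo sketch in Python (standard library only). The sample size, the seed and the window of width 0.2 around v = 1 are arbitrary illustrative choices and are not part of the exam. The code draws independent standard normal X, Y and forms U = X + Y, V = X - Y. It then reports the sample variances, the sample covariance, the frequency of U > V, and a local average of U over draws with V near 1.

```python
import random

def simulate_sum_and_difference(n=200_000, seed=1):
    """Draw n pairs of independent N(0,1) payoffs X, Y; return lists of U = X + Y and V = X - Y."""
    rng = random.Random(seed)
    us, vs = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rng.gauss(0.0, 1.0)
        us.append(x + y)
        vs.append(x - y)
    return us, vs

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    """Sample covariance; covariance(a, a) is the sample variance."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

us, vs = simulate_sum_and_difference()
var_u = covariance(us, us)       # should be close to 2
var_v = covariance(vs, vs)       # should be close to 2
cov_uv = covariance(us, vs)      # should be close to 0
share_u_greater = mean([u > v for u, v in zip(us, vs)])  # estimate of P(U > V)

# Crude estimate of E(U | V = v) at v = 1: average U over draws with V close to 1.
window = [u for u, v in zip(us, vs) if abs(v - 1.0) < 0.1]
cond_mean_u = mean(window)

print(f"Var(U)={var_u:.3f}  Var(V)={var_v:.3f}  Cov(U,V)={cov_uv:.3f}")
print(f"P(U>V)~{share_u_greater:.3f}  E(U|V=1)~{cond_mean_u:.3f}")
```

Suppose the covariance comes out near zero, both variances near 2 and the frequency near 1/2. That agrees with the answers to parts b) to e), but a simulation cannot replace the derivation the exam asks for.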

Question 2. (9 points) The call duration X of a Kcell customer (Kcell is the largest mobile operator in Kazakhstan) is exponentially distributed: {f_X}\left( x \right) = \lambda {e^{ - \lambda x}},\,\,x \ge 0,\,\,{f_X}\left( x \right) = 0,\,\,x < 0. The number N of customers making calls simultaneously has a Poisson distribution: P\left( {N = n} \right) = {e^{ - \mu }}\frac{{{\mu ^n}}}{{n!}},\,\,n = 0,1,2,... Thus the total call duration for all customers is {S_N} = {X_1} + ... + {X_N} for N \ge 1. We put {S_0} = 0. Assume that customers make their decisions about calling independently.

a) (3 points) Find the general formula for the moment generating function of S_N, explaining all steps. Here {X_1},...,{X_n} are assumed to be independent and identically distributed, the X_i are independent of N, and the distributions need not be the exponential and Poisson distributions given above.

b) (3 points) Find the moment generating functions of X, N and {S_N} for the particular distributions given above.

c) (3 points) Find the mean and variance of {S_N}. Based on the equations you obtained, can you suggest estimators of the parameters \lambda ,\mu ?

Remark. Direct observations on the exponential and Poisson distributions are not available. We have to infer their parameters by observing {S_N}. This explains the importance of the technique used in Question 2.
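The remark can be illustrated with a small simulation. The Python sketch below (standard library only) generates S_N for the hypothetical parameter values \lambda=0.5 and \mu=4. It compares the sample mean, the sample variance and the empirical moment generating function with the compound Poisson results E{S_N}=\mu/\lambda, Var(S_N)=2\mu/\lambda^2 and M_{S_N}(t)=\exp(\mu t/(\lambda-t)) for t<\lambda, which are the results parts b) and c) lead to. It then recovers the parameters by the method of moments: \hat\lambda=2m/v and \hat\mu=2m^2/v, where m and v are the sample mean and variance of S_N. The parameter values, sample size and seed are illustrative assumptions. Python's random module has no Poisson sampler, so N is drawn with Knuth's multiplication method.

```python
import math
import random

def poisson_draw(mu, rng):
    """Knuth's multiplication method for one Poisson(mu) draw (adequate for small mu)."""
    threshold = math.exp(-mu)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_total_duration(lam, mu, reps, seed=2022):
    """Return reps independent draws of S_N = X_1 + ... + X_N, with S_0 = 0."""
    rng = random.Random(seed)
    totals = []
    for _ in range(reps):
        n = poisson_draw(mu, rng)
        # random.expovariate takes the rate, so each X_i has mean 1/lam
        totals.append(sum(rng.expovariate(lam) for _ in range(n)))
    return totals

def method_of_moments(totals):
    """Solve m = mu/lam and v = 2*mu/lam**2 for (lam, mu)."""
    m = sum(totals) / len(totals)
    v = sum((s - m) ** 2 for s in totals) / (len(totals) - 1)
    lam_hat = 2 * m / v
    mu_hat = lam_hat * m          # equals 2*m**2/v
    return m, v, lam_hat, mu_hat

lam, mu = 0.5, 4.0                # hypothetical "true" parameters
totals = simulate_total_duration(lam, mu, reps=100_000)
m, v, lam_hat, mu_hat = method_of_moments(totals)

t = 0.1                           # any t < lam keeps the MGF finite
mgf_empirical = sum(math.exp(t * s) for s in totals) / len(totals)
mgf_formula = math.exp(mu * t / (lam - t))

print(f"mean {m:.3f} vs {mu / lam}, variance {v:.3f} vs {2 * mu / lam**2}")
print(f"MGF at t={t}: {mgf_empirical:.3f} vs {mgf_formula:.3f}")
print(f"lambda_hat={lam_hat:.3f}, mu_hat={mu_hat:.3f}")
```

Only the totals S_N enter method_of_moments. This mirrors the situation described in the remark, where the individual X_i and N are never observed.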

Question 3. (8 points)

a) (2 points) For a non-negative random variable X prove the Markov inequality P\left( {X > c} \right) \le \frac{1}{c}EX,\,\,\,c > 0.

b) (2 points) Prove the Chebyshev inequality P\left( {|X - EX| > c} \right) \le \frac{1}{c^2}Var\left( X \right) for an arbitrary random variable X.

c) (4 points) We say that the sequence of random variables \left\{ X_n \right\} converges in probability to a random variable X if P\left( {|{X_n} - X| > \varepsilon } \right) \to 0 as n \to \infty for any \varepsilon > 0. Suppose that E{X_n} = \mu for all n and that Var\left(X_n \right) \to 0 as n \to \infty . Prove that in this case \left\{X_n\right\} converges in probability to \mu .

Remark. Question 3 leads to the simplest example of a law of large numbers: if \left\{ X_n \right\} are i.i.d. with finite variance, then their sample mean converges to their population mean in probability.
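The law of large numbers in the remark can be watched at work in a short simulation. In this Python sketch (standard library only), the X_i are assumed, purely for illustration, to be i.i.d. uniform on [0, 1], so that \mu=1/2 and Var(X_i)=1/12. For each sample size n, the code estimates P(|\bar X_n-\mu|>\varepsilon) by repeated sampling. It compares the estimate with the Chebyshev bound Var(\bar X_n)/\varepsilon^2=\frac{1}{12n\varepsilon^2}. The choices \varepsilon=0.05, the sample sizes and the number of replications are arbitrary.

```python
import random

def deviation_frequency(n, eps, reps, rng):
    """Share of reps in which the mean of n Uniform(0,1) draws misses 1/2 by more than eps."""
    misses = 0
    for _ in range(reps):
        sample_mean = sum(rng.random() for _ in range(n)) / n
        if abs(sample_mean - 0.5) > eps:
            misses += 1
    return misses / reps

rng = random.Random(3)
eps = 0.05
sizes = [10, 100, 1000]
frequencies = [deviation_frequency(n, eps, reps=1000, rng=rng) for n in sizes]
bounds = [(1 / 12) / (n * eps**2) for n in sizes]   # Chebyshev: Var(sample mean) / eps^2

for n, freq, bound in zip(sizes, frequencies, bounds):
    print(f"n={n:5d}  simulated P={freq:.3f}  Chebyshev bound={bound:.3f}")
```

For n=10 the bound (about 3.33) exceeds 1 and says nothing. It still shrinks like 1/n, and that is all the argument in part c) needs.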

Question 4. (8 points)

a) (4 points) Define a distribution function. Give its properties, with intuitive explanations.

b) (4 points) Is a sum of two distribution functions a distribution function? Is a product of two distribution functions a distribution function?

Remark. The answer to part a) is here, and the answer to part b) is based on it.

Question 5. (4 points) The Rakhat factory prepares prizes for kids for the upcoming New Year event. Each prize contains one type of chocolates and one type of candies. The chocolates and candies are chosen randomly from two production lines; the total number of items is always 10, and all selections are equally likely.

a) (2 points) What proportion of prepared prizes contains three or more chocolates?

b) (2 points) 100 prizes have been sent to an orphanage. What is the probability that 50 of those prizes contain no more than two chocolates?

14 Jan 17

Inductive introduction to Chebyshev inequality

Chebyshev inequality - enigma or simplicity itself?

Let's go back to the very basics. The true probability distribution is usually unknown. This is why we cannot work with individual values and their probabilities, and we work with various averages instead. However, as you will see below, the Chebyshev inequality answers a question about the behavior of certain probabilities.

Motivation

Table 1. Income distribution

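Income (H-dollars) | Percentage | P(Income>=c) | Chebyshev bound | Bound/true
10 | 0.027 | 1 | 5.05 | 5.05
20 | 0.066 | 0.973 | 2.525 | 2.595
30 | 0.123 | 0.907 | 1.683 | 1.856
40 | 0.179 | 0.784 | 1.263 | 1.611
50 | 0.202 | 0.606 | 1.01 | 1.667
60 | 0.179 | 0.403 | 0.842 | 2.089
70 | 0.123 | 0.225 | 0.721 | 3.204
80 | 0.066 | 0.102 | 0.631 | 6.186
90 | 0.027 | 0.036 | 0.561 | 15.583
100 | 0.009 | 0.009 | 0.505 | 56.111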
The first two columns of Table 1 describe the income distribution (income values and their probabilities, or percentages of the population) in a hypothetical country H. For example, 0.027 is the proportion of the population that has an income of 10 H-dollars.

Figure 1. Income distribution

I simulated this distribution using Excel's normal distribution function, in such a way as to get relatively low percentages of poor and rich people (see Figure 1).

Suppose the government wants to increase the income tax on wealthy people and use the resulting tax revenue to support the low-income population. The question is what part of the population will be affected. The percentage of the population with income higher than or equal to a given cut-off level c is given by

(1) P(Income\ge c).

For example, the government may decide to impose a higher tax on wealthy people with Income\ge 90, in which case the proportion of affected people will be 0.027+0.009=0.036. Suppose the tax revenue is used only to support the poorest people, those with an income of 10 H-dollars. The proportion of people who benefit from this decision can also be expressed using probability (1), because P(Income=10)=1-P(Income\ge 20). It's easy to see that probability (1) is a cumulative probability: to find it, we sum all probabilities, starting from the last row up to the row in which Income=c. Denoting by I_j the income levels and by p_j the corresponding probabilities, we have

(2) P(Income\ge c)=\sum_{I_j\ge c}p_j.

The third column of Table 1 contains these probabilities for all cut-off values.
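Formula (2) is just a running sum taken from the bottom of the table. The Python sketch below recomputes the third column from the first two. It also computes the mean income quoted in the next paragraph. The printed percentages are rounded and add up to 1.001, so the recomputed tail probabilities can differ from column 3 in the third decimal place.

```python
incomes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
probs = [0.027, 0.066, 0.123, 0.179, 0.202, 0.179, 0.123, 0.066, 0.027, 0.009]

def tail_probability(c):
    """P(Income >= c) as in (2): add p_j over all income levels I_j >= c."""
    return sum(p for income, p in zip(incomes, probs) if income >= c)

tails = {c: tail_probability(c) for c in incomes}
mean_income = sum(income * p for income, p in zip(incomes, probs))

for c in incomes:
    print(f"P(Income >= {c:3d}) = {tails[c]:.3f}")
print(f"Mean income = {mean_income:.1f}")
```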

Question. The true probabilities are usually unknown, but the mean is normally available (to obtain the GDP per capita, just divide the GDP by the head count). In our case the mean income is 50.5. What can be said about (1) if the cut-off value and the mean are known?

Chebyshev's answer

Chebyshev noticed that for those j over which we sum in (2) we have c\le I_j or, equivalently, 1\le I_j/c. His answer is obtained in two steps:

P(Income\ge c)=\sum_{I_j\ge c}p_j\times 1

\le\sum_{I_j\ge c}p_j\frac{I_j}{c} (replacing 1 by I_j/c\ge 1 can only increase the sum)

\le\sum_jp_j\frac{I_j}{c}=\frac{1}{c}EIncome (including all j can only increase the sum further, because incomes are nonnegative, so every added term p_jI_j/c is nonnegative).

Thus, we cannot find the exact value of (1), but we can give an upper bound: P(Income\ge c)\le\frac{1}{c}EIncome. The fourth column of Table 1 contains the Chebyshev bounds. The fifth column contains the ratios of the bounds to the true values from column 3. It shows that the bounds are reasonably good for middle incomes and badly miss the mark for low and high incomes. Note also that for c\le 50 the bound exceeds 1 and therefore carries no information, since any probability is at most 1.
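Columns 4 and 5 follow directly from the mean income 50.5 and the tail probabilities in column 3. A short Python sketch, taking the printed column 3 as given:

```python
cutoffs = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
true_tails = [1.0, 0.973, 0.907, 0.784, 0.606, 0.403, 0.225, 0.102, 0.036, 0.009]  # column 3
mean_income = 50.5

bounds = [mean_income / c for c in cutoffs]             # column 4: EIncome / c
ratios = [b / p for b, p in zip(bounds, true_tails)]    # column 5: bound / true value

for c, b, r in zip(cutoffs, bounds, ratios):
    flag = "  (exceeds 1, uninformative)" if b > 1 else ""
    print(f"c={c:3d}  bound={b:.3f}  bound/true={r:.3f}{flag}")
```

Every ratio is at least 1, as the inequality guarantees. The smallest ratios occur near the middle of the distribution.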

The above proof applies to any nonnegative random variable X and any positive c, and we state the result as the simplest form of the Chebyshev inequality:

(3) P(X\ge c)\le\frac{1}{c}EX.

Extensions

  1. If X changes sign, its absolute value is nonetheless nonnegative, so P(|X|\ge c)\le\frac{1}{c}E|X|.
  2. It is more interesting to bound the probability of deviation of X from its mean EX. For this, just plug |X-EX| into (3): P(|X-EX|\ge c)\le\frac{1}{c}E|X-EX|.
  3. One more step allows us to obtain Var(X) instead of E|X-EX| on the right-hand side. Note that the events |X-EX|\ge c and |X-EX|^2\ge c^2 are equivalent. Therefore

P(|X-EX|\ge c)=P(|X-EX|^2\ge c^2)\le\frac{1}{c^2}E|X-EX|^2=\frac{1}{c^2}Var(X).

The result we have obtained, P(|X-EX|\ge c)\le \frac{1}{c^2}Var(X), will be referred to as the Chebyshev inequality.
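The final form can also be checked on the income data of Table 1. This Python sketch computes the variance of income from the first two columns. It then compares P(|Income-EIncome|\ge c) with Var(Income)/c^2 for a few cut-off values; the values of c are arbitrary illustrative choices. The percentages in the table are rounded and sum to 1.001; the sketch uses them as printed, which has a negligible effect here.

```python
incomes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
probs = [0.027, 0.066, 0.123, 0.179, 0.202, 0.179, 0.123, 0.066, 0.027, 0.009]

mean_income = sum(i * p for i, p in zip(incomes, probs))
variance = sum(p * (i - mean_income) ** 2 for i, p in zip(incomes, probs))

def deviation_probability(c):
    """P(|Income - EIncome| >= c), summed directly from the table."""
    return sum(p for i, p in zip(incomes, probs) if abs(i - mean_income) >= c)

checks = []
for c in [10, 20, 30, 40]:
    true_value = deviation_probability(c)
    bound = variance / c**2              # Chebyshev: Var(X) / c^2
    checks.append((c, true_value, bound))
    print(f"c={c}  P={true_value:.3f}  Var/c^2={bound:.3f}")
```

In every row the true probability stays below the bound, as the inequality guarantees.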

Digression

A long time ago I read a joke about P.L. Chebyshev. He traveled to Paris to give a talk titled "On the optimal fabric cutout". The best Parisian fashion designers gathered to listen to his presentation. They left the room after he said: "For simplicity, let us imagine that the human body is ball-shaped."