Mar 23

Full solution to Example 2.15 from the guide ST2134


Students tend to miss the ideas needed for this example for two reasons: in the guide there is a reference to ST104b Statistics 2, which nobody consults, and the short notation U of the statistic conceals the essence.

Recommendation: for the statistic U=T\left( X\right) always use T\left(X\right) rather than U. Similarly, if x=\left( x_{1},...,x_{n}\right) is the observed sample, use T\left( x\right) rather than u.

Example 2.15. Let X=\left( X_{1},X_{2},...,X_{n}\right) be a random sample (meaning i.i.d. variables) from a Pois(\lambda ) distribution, and let U=T(X)=\sum_{i=1}^{n}X_{i}. Show that T\left( X\right) is a sufficient statistic.

Solution. There are two ways to solve this problem. One is to use the definition of a sufficient statistic, and the other is to apply the factorization theorem. It is a good idea to announce at the very beginning which way you go. We apply the definition, so we have to show that the density of X conditional on T\left( X\right) does not depend on \lambda .

Step 1. The density of X_{i} is p_{X_{i}}\left( x_{i},\lambda \right)=e^{-\lambda }\frac{\lambda ^{x_{i}}}{x_{i}!}, x_{i}=0,1,2,... and by independence
p_{X_{1},...,X_{n}}\left( x_{1},...,x_{n},\lambda \right) =e^{-\lambda }  \frac{\lambda ^{x_{1}}}{x_{1}!}...e^{-\lambda }\frac{\lambda ^{x_{n}}}{x_{n}!  }=e^{-n\lambda }\frac{\lambda ^{\Sigma x_{i}}}{x_{1}!...x_{n}!}.

Step 2. We need to characterize the distribution of S_{n}=\sum_{i=1}^{n}X_{i}, and this is accomplished with the MGF. For one Poisson variable X we have
M_{X}\left( t\right) =Ee^{tX}=\sum_{x=0}^{\infty }e^{tx}e^{-\lambda }  \frac{\lambda ^{x}}{x!}=e^{-\lambda }\sum_{x=0}^{\infty }\frac{\left(  e^{t}\lambda \right) ^{x}}{x!}
=e^{-\lambda }e^{e^{t}\lambda }\sum_{x=0}^{\infty }e^{-e^{t}\lambda }\frac{  \left( e^{t}\lambda \right) ^{x}}{x!}=e^{\lambda \left( e^{t}-1\right) }.
Here we used the completeness axiom (the probabilities sum to one) \sum_{x=0}^{\infty }e^{-\lambda _{1}}\frac{\left( \lambda _{1}\right) ^{x}}{x!}=1 with \lambda_{1}=e^{t}\lambda .
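The closed form from Step 2 can be checked numerically. The following is a minimal sketch, not part of the guide: the function names, the truncation at 100 terms and the value of lam are illustrative assumptions.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Pois(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def mgf_by_series(t, lam, terms=100):
    """Approximate E exp(tX) by truncating the defining series (assumed cutoff)."""
    return sum(math.exp(t * x) * poisson_pmf(x, lam) for x in range(terms))

def mgf_closed_form(t, lam):
    """The closed form exp(lam * (e^t - 1)) obtained in Step 2."""
    return math.exp(lam * (math.exp(t) - 1))

lam = 2.0  # illustrative value
for t in (-1.0, 0.0, 0.5, 1.0):
    # the two columns should agree up to rounding
    print(t, mgf_by_series(t, lam), mgf_closed_form(t, lam))
```

The truncation is harmless here because the terms of the series decay factorially.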

Step 3. For the sum, this result implies
M_{S_{n}}\left( t\right) =Ee^{t\Sigma X_{i}}=E\left(  e^{tX_{1}}...e^{tX_{n}}\right)=Ee^{tX_{1}}...Ee^{tX_{n}} (by independence)
=\left( Ee^{tX}\right) ^{n} (since X_{1},...,X_{n} have identical distribution)
=e^{n\lambda \left( e^{t}-1\right) } (by Step 2).
By Step 2 we know that X\sim Pois\left( n\lambda \right) implies M_{X}\left( t\right) =e^{n\lambda \left( e^{t}-1\right) }. As we just showed, the MGF of S_{n} is the same. The uniqueness theorem says:

if random variables X and Y have the same MGFs, then their distributions are the same.

It follows that S_{n}\sim Pois\left( n\lambda \right) and that
p_{\Sigma X_{i}}\left( \Sigma x_{i},\lambda \right) =e^{-n\lambda }\frac{  \left( n\lambda \right) ^{\Sigma x_{i}}}{\left( \Sigma x_{i}\right) !},\ \Sigma x_{i}=0,1,2,...
(which is written as e^{-n\lambda }\frac{\left( n\lambda \right) ^{u}}{u!} in the guide, and to me this is not transparent).
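The conclusion that the sum is Pois(n\lambda ) can also be verified without MGFs, by convolving the Poisson mass function with itself. This is only a sanity check under illustrative assumptions (the values of lam, n and the support size are mine), not a replacement for the argument above.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Pois(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def convolve(p, q):
    """pmf of the sum of two independent variables with pmfs p, q on 0, 1, 2, ..."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

lam, n, support = 1.5, 4, 60  # illustrative values
single = [poisson_pmf(x, lam) for x in range(support)]
pmf_sum = single
for _ in range(n - 1):
    pmf_sum = convolve(pmf_sum, single)

# Entries with index below `support` are not affected by the truncation,
# because a sum equal to s only involves summands not exceeding s.
max_gap = max(abs(pmf_sum[s] - poisson_pmf(s, n * lam)) for s in range(support))
print(max_gap)
```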

Step 4. To check that the conditional density does not depend on the parameter, we recall that the conditional density along the level set simplifies to (see Guide, p.30)
P\left( X=x|U=u\right) =\frac{P\left( X=x\right) }{P\left( U=u\right) }
(no joint density in the numerator). In our situation the full expression for the ratio on the right is
\frac{P( X=x) }{P( U=u) } =\frac{p_{X_1,...,X_n}  ( x_1,...,x_n,\lambda ) }{p_{\Sigma X_i}( \Sigma  x_i,\lambda ) }=e^{-n\lambda }\frac{\lambda ^{\Sigma x_i}}{  x_1!...x_n!}\left( e^{-n\lambda }\frac{( n\lambda ) ^{\Sigma  x_i}}{( \Sigma x_i) !}\right) ^{-1}
=\frac{\left( \Sigma x_{i}\right) !}{n^{\Sigma  x_{i}}\prod\nolimits_{i=1}^{n}x_{i}!}.
As there is no \lambda in the result, \sum X_{i} is sufficient for \lambda .
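A quick numerical illustration of Step 4: the ratio of the joint density to the density of the sum gives the same number for every \lambda . The sample x and the values of lam below are arbitrary choices for illustration, and the function names are my own.

```python
import math

def joint_pmf(x, lam):
    """Joint pmf of an i.i.d. Pois(lam) sample x = (x_1, ..., x_n), as in Step 1."""
    n, s = len(x), sum(x)
    return math.exp(-n * lam) * lam**s / math.prod(math.factorial(xi) for xi in x)

def sum_pmf(s, n, lam):
    """pmf of the sum, which is Pois(n * lam) by Step 3."""
    return math.exp(-n * lam) * (n * lam)**s / math.factorial(s)

def conditional_pmf(x, lam):
    """P(X = x | T(X) = T(x)) computed as the ratio from Step 4."""
    return joint_pmf(x, lam) / sum_pmf(sum(x), len(x), lam)

def closed_form(x):
    """(sum x_i)! / (n^(sum x_i) * prod x_i!), the final expression of Step 4."""
    n, s = len(x), sum(x)
    return math.factorial(s) / (n**s * math.prod(math.factorial(xi) for xi in x))

x = (2, 0, 3, 1)  # arbitrary observed sample
for lam in (0.3, 1.0, 4.5):
    print(lam, conditional_pmf(x, lam), closed_form(x))
```

The closed form is a multinomial probability with equal cell probabilities 1/n, which is another way to see that \lambda has dropped out.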

Exercise. Do Example 2.16 from the Guide following this format.

Dec 22

Final exam in Advanced Statistics ST2133, 2022


Unlike most UoL exams, here I tried to relate the theory to practical issues.

KBTU International School of Economics

Compiled by Kairat Mynbaev

The total for this exam is 41 points. You have two hours.

Everywhere provide detailed explanations. When answering please clearly indicate question numbers. You don’t need a calculator. As long as the formula you provide is correct, the numerical value does not matter.

Question 1. (12 points)

a) (2 points) At a casino, two players are playing on slot machines. Their payoffs X,Y are standard normal and independent. Find the joint density of the payoffs.

b) (4 points) Two other players watch the first two players and start to argue what will be larger: the sum U = X + Y or the difference V = X - Y. Find the joint density of U and V. Are the variables U,V independent? Find their marginal densities.

c) (2 points) Are U,V normal? Why? What are their means and variances?

d) (2 points) Which probability is larger: P(U > V) or P\left( {U < V} \right)?

e) (2 points) In this context interpret the conditional expectation E\left( {U|V = v} \right). How much is it?

Reminder. The density of a normal variable X \sim N\left( {\mu ,{\sigma ^2}} \right) is {f_X}\left( x \right) = \frac{1}{{\sqrt {2\pi {\sigma ^2}} }}{e^{ - \frac{{{{\left( {x - \mu } \right)}^2}}}{{2{\sigma ^2}}}}}.

Question 2. (9 points) The distribution of a call duration X of one Kcell [largest mobile operator in KZ] customer is exponential: {f_X}\left( x \right) = \lambda {e^{ - \lambda x}},\,\,x \ge 0,\,\,{f_X}\left( x \right) = 0,\,\,x < 0. The number N of customers making calls simultaneously is distributed as Poisson: P\left( {N = n} \right) = {e^{ - \mu }}\frac{{{\mu ^n}}}{{n!}},\,\,n = 0,1,2,... Thus the total call duration for all customers is {S_N} = {X_1} + ... + {X_N} for N \ge 1. We put {S_0} = 0. Assume that customers make their decisions about calling independently.

a) (3 points) Find the general formula (when {X_1},...,{X_n} are identically distributed and X,N are independent but not necessarily exponential and Poisson, as above) for the moment generating function of S_N explaining all steps.

b) (3 points) Find the moment generating functions of X, N and {S_N} for your particular distributions.

c) (3 points) Find the mean and variance of {S_N}. Based on the equations you obtained, can you suggest estimators of parameters \lambda ,\mu ?

Remark. Direct observations on the exponential and Poisson distributions are not available. We have to infer their parameters by observing {S_N}. This explains the importance of the technique used in Question 2.

Question 3. (8 points)

a) (2 points) For a non-negative random variable X prove the Markov inequality P\left( {X > c} \right) \le \frac{1}{c}EX,\,\,\,c > 0.

b) (2 points) Prove the Chebyshev inequality P\left( {|X - EX| > c} \right) \le \frac{1}{c^2}Var\left( X \right) for an arbitrary random variable X.

c) (4 points) We say that the sequence of random variables \left\{ X_n \right\} converges in probability to a random variable X if P\left( {|{X_n} - X| > \varepsilon } \right) \to 0 as n \to \infty for any \varepsilon > 0.  Suppose that E{X_n} = \mu for all n and that Var\left(X_n \right) \to 0 as n \to \infty . Prove that then \left\{X_n\right\} converges in probability to \mu .

Remark. Question 3 leads to the simplest example of a law of large numbers: if \left\{ X_n \right\} are i.i.d. with finite variance, then their sample mean converges to their population mean in probability.
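To get a feel for the remark, one can compare the exact probability P(|\bar{X}-\mu |>\varepsilon ) with the Chebyshev bound Var(\bar{X})/\varepsilon ^{2}. The sketch below assumes Poisson observations (an illustrative choice that is not part of the question) and uses the fact that a sum of n i.i.d. Pois(\lambda ) variables is Pois(n\lambda ); the truncation point is also an assumption.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Pois(lam), computed on the log scale to avoid overflow."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def tail_prob(n, lam, eps):
    """P(|Xbar - lam| > eps), summing the Pois(n*lam) pmf of S_n = n * Xbar."""
    mean = n * lam
    upper = int(mean + 20 * math.sqrt(mean)) + 50  # truncation point (assumption)
    return sum(poisson_pmf(s, mean) for s in range(upper)
               if abs(s / n - lam) > eps)

def chebyshev_bound(n, lam, eps):
    """Var(Xbar) / eps^2 with Var(Xbar) = lam / n."""
    return lam / (n * eps**2)

lam, eps = 2.0, 0.5  # illustrative values
for n in (10, 100, 1000):
    print(n, tail_prob(n, lam, eps), chebyshev_bound(n, lam, eps))
```

Both columns shrink as n grows, and the exact probability stays below the bound.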

Question 4. (8 points)

a) (4 points) Define a distribution function. Give its properties, with intuitive explanations.

b) (4 points) Is a sum of two distribution functions a distribution function? Is a product of two distribution functions a distribution function?

Remark. The answer for part a) is here and the one for part b) is based on it.

Question 5. (4 points) The Rakhat factory prepares prizes for kids for the upcoming New Year event. Each prize contains one type of chocolates and one type of candies. The chocolates and candies are chosen randomly from two production lines, the total number of items is always 10 and all selections are equally likely.

a) (2 points) What proportion of prepared prizes contains three or more chocolates?

b) (2 points) 100 prizes have been sent to an orphanage. What is the probability that 50 of those prizes contain no more than two chocolates?

Mar 22

Midterm Spring 2022

Blueprint for exam versions

This is the exam I administered in my class in Spring 2022. By replacing the Poisson distribution with other random variables the UoL examiners can obtain a large variety of versions with which to torture Advanced Statistics students. On the other hand, for the students the answers below can be a blueprint to fend off any assaults.

During the semester my students were encouraged to analyze and collect information in documents typed in Scientific Word or LyX. The exam was an open-book online assessment. Papers typed in Scientific Word or LyX were preferred and copying from previous analysis was welcomed. This policy would be my preference if I were to study a subject as complex as Advanced Statistics. The students were given just two hours on the assumption that they had done the preparations diligently. Below I give the model answers right after the questions.

Midterm Spring 2022

You have to clearly state all required theoretical facts. Number all equations that you need to use in later calculations and reference them as necessary. Answer the questions in the order they are asked. When you don't know the answer, leave some space. For each unexplained fact I subtract one point. Put your name in the file name.

In questions 1-9 X is the Poisson variable.

Question 1

Define X and derive the population mean and population variance of the sum S_{n}=\sum_{i=1}^{n}X_{i} where X_{i} is an i.i.d. sample from X.

Answer. X is defined by P\left( X=x\right) =e^{-\lambda }\frac{\lambda ^{x}}{x!},\  x=0,1,... Using EX=\lambda and Var\left( X\right) =\lambda (ST2133 p.80) we have

ES_{n}=\sum EX_{i}=n\lambda , Var\left( S_{n}\right) =\sum V\left(  X_{i}\right) =n\lambda

(by independence and identical distribution). [Some students derived EX=\lambda , Var\left( X\right) =\lambda instead of respective equations for sample means].

Question 2

Derive the MGF of the standardized sample mean.

Answer. Knowing this derivation is a must because it is a combination of three important facts.

a) Let z_{n}=\frac{\bar{X}-E\bar{X}}{\sigma \left( \bar{X}\right) }. Since \bar{X}=S_{n}/n, we have z_{n}=\frac{S_{n}/n-E\left( S_{n}/n\right) }{\sigma \left( S_{n}/n\right) }=\frac{S_{n}-ES_{n}}{\sigma \left( S_{n}\right) }, so standardizing \bar{X} and S_{n} gives the same result.

b) The MGF of S_{n} is expressed through the MGF of X:

M_{S_{n}}\left( t\right)  =Ee^{S_{n}t}=Ee^{X_{1}t+...+X_{n}t}=Ee^{X_{1}t}...e^{X_{n}t}=

(independence) =Ee^{X_{1}t}...Ee^{X_{n}t}= (identical distribution) =\left[ M_{X}\left( t\right) \right] ^{n}.

c) If X is a linear transformation of Y, X=a+bY, then

M_{X}\left( t\right) =Ee^{Xt}=Ee^{\left( a+bY\right) t}=e^{at}Ee^{Y\left( bt\right) }=e^{at}M_{Y}\left( bt\right) .

When answering the question we assume any i.i.d. sample from a population with mean \mu and population variance \sigma ^{2}:

Putting in c) a=-\frac{ES_{n}}{\sigma \left( S_{n}\right) }, b=\frac{1}{\sigma \left( S_{n}\right) } and using a) we get

M_{z_{n}}\left( t\right) =E\exp \left( \frac{S_{n}-ES_{n}}{\sigma \left(  S_{n}\right) }t\right) =e^{-ES_{n}t/\sigma \left( S_{n}\right)  }M_{S_{n}}\left( t/\sigma \left( S_{n}\right) \right)

(using b) and ES_{n}=n\mu , Var\left( S_{n}\right) =n\sigma ^{2})

=e^{-ES_{n}t/\sigma \left( S_{n}\right) }\left[ M_{X}\left( t/\sigma \left( S_{n}\right) \right) \right] ^{n}=e^{-n\mu t/\left( \sqrt{n}\sigma \right) }\left[ M_{X}\left( t/\left( \sqrt{n}\sigma \right) \right) \right] ^{n}.

This is a general result which for the Poisson distribution can be specified as follows. From ST2133, example 3.38 we know that M_{X}\left( t\right)=\exp \left( \lambda \left( e^{t}-1\right) \right) . Therefore, we obtain

M_{z_{n}}\left( t\right) =e^{-\sqrt{n\lambda }t}\left[ \exp \left( \lambda \left( e^{t/\sqrt{n\lambda }}-1\right) \right) \right] ^{n}=e^{-t\sqrt{n\lambda }+n\lambda \left( e^{t/\sqrt{n\lambda }}-1\right) }

[Instead of M_{z_n} some students gave M_X.]
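The general formula M_{z_{n}}\left( t\right) =e^{-n\mu t/\left( \sqrt{n}\sigma \right) }\left[ M_{X}\left( t/\left( \sqrt{n}\sigma \right) \right) \right] ^{n} can be checked for the Poisson case (\mu =\lambda , \sigma =\sqrt{\lambda }) against a direct summation of E\exp (tz_{n}). This is a sketch under the assumption that S_{n}\sim Pois(n\lambda ); the names, the values of n, lam and the cutoff are illustrative.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Pois(lam), computed on the log scale."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def mgf_poisson(t, lam):
    """M_X(t) = exp(lam * (e^t - 1))."""
    return math.exp(lam * (math.exp(t) - 1))

def mgf_z_formula(t, n, lam):
    """General result of this answer with mu = lam and sigma = sqrt(lam)."""
    mu, sigma = lam, math.sqrt(lam)
    scale = math.sqrt(n) * sigma
    return math.exp(-n * mu * t / scale) * mgf_poisson(t / scale, lam) ** n

def mgf_z_direct(t, n, lam, terms=400):
    """E exp(t * z_n) summed directly, assuming S_n ~ Pois(n * lam)."""
    mean = n * lam
    sd = math.sqrt(mean)
    return sum(math.exp(t * (s - mean) / sd) * poisson_pmf(s, mean)
               for s in range(terms))

n, lam = 5, 2.0  # illustrative values
for t in (-0.5, 0.3, 0.7):
    print(t, mgf_z_formula(t, n, lam), mgf_z_direct(t, n, lam))
```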

Question 3

Derive the cumulant generating function of the standardized sample mean.

Answer. Again, there are a couple of useful general facts.

I) Decomposition of MGF around zero. The series e^{x}=\sum_{i=0}^{\infty }  \frac{x^{i}}{i!} leads to

M_{X}\left( t\right) =Ee^{tX}=E\left( \sum_{i=0}^{\infty }\frac{t^{i}X^{i}}{  i!}\right) =\sum_{i=0}^{\infty }\frac{t^{i}}{i!}E\left( X^{i}\right)  =\sum_{i=0}^{\infty }\frac{t^{i}}{i!}\mu _{i}

where \mu _{i}=E\left( X^{i}\right) are moments of X and \mu  _{0}=EX^{0}=1. Differentiating this equation yields

M_{X}^{(k)}\left( t\right) =\sum_{i=k}^{\infty }\frac{t^{i-k}}{\left(  i-k\right) !}\mu _{i}

and setting t=0 gives the rule for finding moments from MGF: \mu  _{k}=M_{X}^{(k)}\left( 0\right) .

II) Decomposition of the cumulant generating function around zero. K_{X}\left( t\right) =\log M_{X}\left( t\right) can also be decomposed into its Taylor series:

K_{X}\left( t\right) =\sum_{i=0}^{\infty }\frac{t^{i}}{i!}\kappa _{i}

where the coefficients \kappa _{i} are called cumulants and can be found using \kappa _{k}=K_{X}^{(k)}\left( 0\right) . Since

K_{X}^{\prime }\left( t\right) =\frac{M_{X}^{\prime }\left( t\right) }{  M_{X}\left( t\right) } and K_{X}^{\prime \prime }\left( t\right) =\frac{  M_{X}^{\prime \prime }\left( t\right) M_{X}\left( t\right) -\left(  M_{X}^{\prime }\left( t\right) \right) ^{2}}{M_{X}^{2}\left( t\right) }

we have

\kappa _{0}=\log M_{X}\left( 0\right) =0, \kappa _{1}=\frac{M_{X}^{\prime  }\left( 0\right) }{M_{X}\left( 0\right) }=\mu _{1},

\kappa _{2}=\mu  _{2}-\mu _{1}^{2}=EX^{2}-\left( EX\right) ^{2}=Var\left( X\right) .

Thus, for any random variable X with mean \mu and variance \sigma ^{2} we have

K_{X}\left( t\right) =\mu t+\frac{\sigma ^{2}t^{2}}{2}+ terms of higher order for t small.
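For the Poisson distribution K_{X}\left( t\right) =\lambda \left( e^{t}-1\right) , so the first two cumulants should both equal \lambda . A minimal numerical check by finite differences follows; the step size h and the value of lam are assumptions chosen for illustration.

```python
import math

def cgf_poisson(t, lam):
    """K_X(t) = log M_X(t) = lam * (e^t - 1) for X ~ Pois(lam)."""
    return lam * math.expm1(t)

def first_derivative(f, t, h=1e-4):
    """Central difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

def second_derivative(f, t, h=1e-4):
    """Central difference approximation of f''(t)."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / h**2

lam = 3.0  # illustrative value

def K(t):
    return cgf_poisson(t, lam)

kappa1 = first_derivative(K, 0.0)   # approximates the mean
kappa2 = second_derivative(K, 0.0)  # approximates the variance
print(kappa1, kappa2)
```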

III) If X=a+bY then by c)

K_{X}\left( t\right) =K_{a+bY}\left( t\right) =\log \left[ e^{at}M_{Y}\left( bt\right) \right] =at+K_{Y}\left( bt\right) .

IV) By b)

K_{S_{n}}\left( t\right) =\log \left[ M_{X}\left( t\right) \right]  ^{n}=nK_{X}\left( t\right) .

Using III), z_{n}=\frac{S_{n}-ES_{n}}{\sigma \left( S_{n}\right) } and then IV) we have

K_{z_{n}}\left( t\right) =\frac{-ES_{n}}{\sigma \left( S_{n}\right) }  t+K_{S_{n}}\left( \frac{t}{\sigma \left( S_{n}\right) }\right) =\frac{-ES_{n}  }{\sigma \left( S_{n}\right) }t+nK_{X}\left( \frac{t}{\sigma \left(  S_{n}\right) }\right) .

For the last term on the right we use the approximation around zero from II):

K_{z_{n}}\left( t\right) =\frac{-ES_{n}}{\sigma \left( S_{n}\right) }  t+nK_{X}\left( \frac{t}{\sigma \left( S_{n}\right) }\right) \approx \frac{  -ES_{n}}{\sigma \left( S_{n}\right) }t+n\mu \frac{t}{\sigma \left(  S_{n}\right) }+n\frac{\sigma ^{2}}{2}\left( \frac{t}{\sigma \left(  S_{n}\right) }\right) ^{2}

=-\frac{n\mu }{\sqrt{n}\sigma }t+n\mu \frac{t}{\sqrt{n}\sigma }+n\frac{  \sigma ^{2}}{2}\left( \frac{t}{\sigma \left( S_{n}\right) }\right)  ^{2}=t^{2}/2.

[Important. Why are the above steps necessary? Passing from the series M_{X}\left( t\right) =\sum_{i=0}^{\infty }\frac{t^{i}}{i!}\mu _{i} to the series for K_{X}\left( t\right) =\log M_{X}\left( t\right) is not straightforward and can easily lead to errors. In the Poisson case it is not advisable to derive K_{z_{n}} from M_{z_{n}}\left( t\right) =e^{-t\sqrt{n\lambda }+n\lambda \left( e^{t/\sqrt{n\lambda }}-1\right) }.]

Question 4

Prove the central limit theorem using the cumulant generating function you obtained.

Answer. In the previous question we proved that around zero

K_{z_{n}}\left( t\right) \rightarrow \frac{t^{2}}{2}.

This implies that

(1) M_{z_{n}}\left( t\right) \rightarrow e^{t^{2}/2} for each t around zero.

But we know that for a normal X\sim N\left( \mu ,\sigma ^{2}\right) the MGF is M_{X}\left( t\right) =\exp \left( \mu t+\frac{\sigma ^{2}t^{2}}{2}\right) (ST2133 example 3.42) and hence for the standard normal

(2) M_{z}\left( t\right) =e^{t^{2}/2}.

Theorem (link between pointwise convergence of MGFs of \left\{  X_{n}\right\} and convergence in distribution of \left\{ X_{n}\right\} ) Let \left\{ X_{n}\right\} be a sequence of random variables and let X be some random variable. If M_{X_{n}}\left( t\right) converges for each t from a neighborhood of zero to M_{X}\left( t\right), then X_{n} converges in distribution to X.

Using (1), (2) and this theorem we finish the proof that z_{n} converges in distribution to the standard normal, which is the central limit theorem.
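The convergence K_{z_{n}}\left( t\right) \rightarrow t^{2}/2 can be watched numerically in the Poisson case, using the exact expression K_{z_{n}}\left( t\right) =\frac{-ES_{n}}{\sigma \left( S_{n}\right) }t+nK_{X}\left( \frac{t}{\sigma \left( S_{n}\right) }\right) from Question 3 with K_{X}\left( t\right) =\lambda \left( e^{t}-1\right) . The values of lam, t and n below are illustrative assumptions.

```python
import math

def cgf_poisson(t, lam):
    """K_X(t) = lam * (e^t - 1) for X ~ Pois(lam)."""
    return lam * math.expm1(t)

def cgf_z(t, n, lam):
    """K_{z_n}(t) = -E S_n * t / sigma(S_n) + n * K_X(t / sigma(S_n))."""
    mean_s, sd_s = n * lam, math.sqrt(n * lam)
    return -mean_s * t / sd_s + n * cgf_poisson(t / sd_s, lam)

lam, t = 2.0, 1.0  # illustrative values
gaps = [abs(cgf_z(t, n, lam) - t**2 / 2) for n in (10, 100, 1000, 10000)]
print(gaps)  # the distance to t^2/2 shrinks as n grows
```

The leading error term is of order t^{3}/\sqrt{n\lambda }, which is why the gap decreases slowly.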

Question 5

State the factorization theorem and apply it to show that U=\sum_{i=1}^{n}X_{i} is a sufficient statistic.

Answer. The solution is given on p.180 of ST2134. For x_{i}=0,1,2,...,\ i=1,...,n, the joint density is

(3) f_{X}\left( x,\lambda \right) =\prod\limits_{i=1}^{n}e^{-\lambda }  \frac{\lambda ^{x_{i}}}{x_{i}!}=\frac{\lambda ^{\Sigma x_{i}}e^{-n\lambda }}{\Pi _{i=1}^{n}x_{i}!}.

To satisfy the Fisher-Neyman factorization theorem set

g\left( \sum x_{i},\lambda \right) =\lambda ^{\Sigma x_{i}}e^{-n\lambda },\ h\left( x\right) =\frac{1}{\Pi _{i=1}^{n}x_{i}!}

and then we see that \sum x_{i} is a sufficient statistic for \lambda .
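The factorization can be confirmed on a concrete sample: the joint density (3) should equal g times h, with g=\lambda ^{\Sigma x_{i}}e^{-n\lambda } and h=1/\Pi x_{i}!. In the sketch below the sample, the values of lam and the extra argument n of g are illustrative choices.

```python
import math

def joint_pmf(x, lam):
    """Joint pmf (3) of an i.i.d. Pois(lam) sample, as a product of marginals."""
    return math.prod(math.exp(-lam) * lam**xi / math.factorial(xi) for xi in x)

def g(total, lam, n):
    """Factor that depends on the data only through the sum of the sample."""
    return lam**total * math.exp(-n * lam)

def h(x):
    """Factor that does not involve lam."""
    return 1 / math.prod(math.factorial(xi) for xi in x)

x = (1, 4, 0, 2, 2)  # arbitrary sample
for lam in (0.5, 2.0, 3.7):
    print(lam, joint_pmf(x, lam), g(sum(x), lam, len(x)) * h(x))
```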

Question 6

Find a minimal sufficient statistic for \lambda stating all necessary theoretical facts.

Answer. Characterization of minimal sufficiency. A statistic T\left( X\right) is minimal sufficient if and only if the level sets of T coincide with the sets on which the ratio f_{X}\left( x,\theta \right) /f_{X}\left( y,\theta \right) does not depend on \theta .

From (3)

f_{X}\left( x,\lambda \right) /f_{X}\left( y,\lambda \right) =\frac{\lambda  ^{\Sigma x_{i}}e^{-n\lambda }}{\Pi _{i=1}^{n}x_{i}!}\left[ \frac{\lambda  ^{\Sigma y_{i}}e^{-n\lambda }}{\Pi _{i=1}^{n}y_{i}!}\right] ^{-1}=\lambda ^{  \left[ \Sigma x_{i}-\Sigma y_{i}\right] }\frac{\Pi _{i=1}^{n}y_{i}!}{\Pi  _{i=1}^{n}x_{i}!}.

The expression on the right does not depend on \lambda if and only if \Sigma x_{i}=\Sigma y_{i}. The last condition describes the level sets of T\left( X\right) =\sum X_{i}. Thus it is minimal sufficient.
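A small numerical illustration of the criterion: the ratio of joint densities is constant in \lambda for two samples with equal sums and varies with \lambda otherwise. The samples and the values of lam are arbitrary illustrative choices.

```python
import math

def joint_pmf(x, lam):
    """Joint pmf (3) of an i.i.d. Pois(lam) sample."""
    return math.prod(math.exp(-lam) * lam**xi / math.factorial(xi) for xi in x)

def ratio(x, y, lam):
    """f_X(x, lam) / f_X(y, lam)."""
    return joint_pmf(x, lam) / joint_pmf(y, lam)

x = (2, 0, 3)       # sum is 5
y_same = (1, 4, 0)  # sum is 5, same level set as x
y_diff = (1, 1, 1)  # sum is 3, different level set
lams = (0.5, 1.0, 3.0)

same_sum = [ratio(x, y_same, lam) for lam in lams]
diff_sum = [ratio(x, y_diff, lam) for lam in lams]
print(same_sum)  # constant in lam: prod y_i! / prod x_i!
print(diff_sum)  # changes with lam like lam^(5 - 3)
```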

Question 7

Find the Method of Moments estimator of the population mean.

Answer. The idea of the method is to take some population property (for example, EX=\lambda ) and replace the population characteristic (in this case EX) by its sample analog (\bar{X}) to obtain an MM estimator. In our case \hat{\lambda}_{MM}=\bar{X}. [Try to do this for the Gamma distribution].

Question 8

Find the Fisher information.

Answer. From equation (3) in Question 5 the log-likelihood is

l_{X}\left( \lambda ,x\right) =-n\lambda +\sum x_{i}\log \lambda -\sum \log  \left( x_{i}!\right) .

Hence the score function is (see Example 2.30 in ST2134)

s_{X}\left( \lambda ,x\right) =\frac{\partial }{\partial \lambda }l_{X}\left( \lambda ,x\right) =-n+\frac{1}{\lambda }\sum x_{i}.

Differentiating once more gives

\frac{\partial ^{2}}{\partial \lambda ^{2}}l_{X}\left( \lambda ,x\right) =-\frac{1}{\lambda ^{2}}\sum x_{i}

and the Fisher information is

I_{X}\left( \lambda \right) =-E\left( \frac{\partial ^{2}}{\partial \lambda  ^{2}}l_{X}\left( \lambda ,x\right) \right) =\frac{1}{\lambda ^{2}}E\sum  X_{i}=\frac{n\lambda }{\lambda ^{2}}=\frac{n}{\lambda }.
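The value n/\lambda can be checked numerically in two ways: as minus the expected second derivative (the route taken above) and as the expected squared score. The second route relies on the standard identity I=E\left( s^{2}\right) , which is not derived in this answer. Both computations assume \sum X_{i}\sim Pois(n\lambda ); the names, cutoff and parameter values are illustrative.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Pois(lam), computed on the log scale."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def score(total, n, lam):
    """Score -n + (sum x_i) / lam, a function of the data through the sum only."""
    return -n + total / lam

def info_from_second_derivative(n, lam, terms=400):
    """-E of the second derivative, i.e. E(sum X_i) / lam^2."""
    mean = n * lam
    return sum(s / lam**2 * poisson_pmf(s, mean) for s in range(terms))

def info_from_score(n, lam, terms=400):
    """E of the squared score (standard identity, assumed here)."""
    mean = n * lam
    return sum(score(s, n, lam) ** 2 * poisson_pmf(s, mean) for s in range(terms))

n, lam = 5, 2.0  # illustrative values
print(info_from_second_derivative(n, lam), info_from_score(n, lam), n / lam)
```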

Question 9

Derive the Cramer-Rao lower bound for V\left( \bar{X}\right) for a random sample.

Answer. (See Example 3.17 in ST2134) Since \bar{X} is an unbiased estimator of \lambda by Question 1, from the Cramer-Rao theorem we know that

V\left( \bar{X}\right) \geq \frac{1}{I_{X}\left( \lambda \right) }=\frac{  \lambda }{n}

and in fact this lower bound is attained, since by Question 1 V\left( \bar{X}\right) =Var\left( S_{n}\right) /n^{2}=\lambda /n.