22
Mar 22

Blueprint for exam versions

This is the exam I administered in my class in Spring 2022. By replacing the Poisson distribution with other distributions the UoL examiners can obtain a large variety of versions with which to torture Advanced Statistics students. On the other hand, for the students the answers below can be a blueprint to fend off any assaults.

During the semester my students were encouraged to analyze and collect information in documents typed in Scientific Word or LyX. The exam was an open-book online assessment. Papers typed in Scientific Word or LyX were preferred and copying from previous analysis was welcomed. This policy would be my preference if I were to study a subject as complex as Advanced Statistics. The students were given just two hours on the assumption that they had done the preparations diligently. Below I give the model answers right after the questions.

Midterm Spring 2022

You have to clearly state all required theoretical facts. Number all equations that you need to use in later calculations and reference them as necessary. Answer the questions in the order they are asked. When you don't know the answer, leave some space. For each unexplained fact I subtract one point. Put your name in the file name.

In questions 1-9 $X$ is the Poisson variable.

Question 1

Define $X$ and derive the population mean and population variance of the sum $S_{n}=\sum_{i=1}^{n}X_{i}$ where $X_{i}$ is an i.i.d. sample from $X$.

Answer. $X$ is defined by $P\left( X=x\right) =e^{-\lambda }\frac{\lambda ^{x}}{x!},\ x=0,1,...$ Using $EX=\lambda$ and $Var\left( X\right) =\lambda$ (ST2133 p.80) we have

$ES_{n}=\sum EX_{i}=n\lambda ,$ $Var\left( S_{n}\right) =\sum Var\left( X_{i}\right) =n\lambda$

(by independence and identical distribution). [Some students derived $EX=\lambda ,$ $Var\left( X\right) =\lambda$ instead of the respective equations for the sum $S_{n}$].
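The two formulas can be checked numerically. The sketch below is only an illustration under my own assumptions: the values `lam = 1.5` and `n = 4`, the truncation point `cutoff = 60` and all helper names are hypothetical. It builds the distribution of $S_{n}$ by convolving the Poisson probabilities $n$ times (which is what independence allows) and compares its mean and variance with $n\lambda$.

```python
import math

def poisson_pmf(lam, cutoff):
    """Poisson probabilities P(X = x) for x = 0, ..., cutoff - 1 (tail truncated)."""
    return [math.exp(-lam) * lam**x / math.factorial(x) for x in range(cutoff)]

def convolve(p, q):
    """pmf of the sum of two independent variables with pmfs p and q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def sum_pmf(lam, n, cutoff=60):
    """pmf of S_n = X_1 + ... + X_n for an i.i.d. Poisson(lam) sample."""
    base = poisson_pmf(lam, cutoff)
    result = base
    for _ in range(n - 1):
        result = convolve(result, base)
    return result

def mean_and_variance(pmf):
    """Mean and variance of a variable taking values 0, 1, 2, ... with the given pmf."""
    mean = sum(s * p for s, p in enumerate(pmf))
    second = sum(s * s * p for s, p in enumerate(pmf))
    return mean, second - mean**2

lam, n = 1.5, 4  # arbitrary illustrative values
mean, var = mean_and_variance(sum_pmf(lam, n))
print(mean, var, n * lam)
```

Up to truncation and rounding errors, both printed moments should agree with $n\lambda$.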

Question 2

Derive the MGF of the standardized sample mean.

Answer. Knowing this derivation is a must because it is a combination of three important facts.

a) Let $z_{n}=\frac{\bar{X}-E\bar{X}}{\sigma \left( \bar{X}\right) }.$ Since $\bar{X}=S_{n}/n,$ we have $z_{n}=\frac{S_{n}/n-E\left( S_{n}/n\right) }{\sigma \left( S_{n}/n\right) }=\frac{S_{n}-ES_{n}}{\sigma \left( S_{n}\right) },$ so standardizing $\bar{X}$ and $S_{n}$ gives the same result.

b) The MGF of $S_{n}$ is expressed through the MGF of $X$:

$M_{S_{n}}\left( t\right) =Ee^{S_{n}t}=Ee^{X_{1}t+...+X_{n}t}=Ee^{X_{1}t}...e^{X_{n}t}=$

(independence) $=Ee^{X_{1}t}...Ee^{X_{n}t}=$ (identical distribution) $=\left[ M_{X}\left( t\right) \right] ^{n}.$

c) If $X$ is a linear transformation of $Y,$ $X=a+bY,$ then

$M_{X}\left( t\right) =Ee^{Xt}=Ee^{\left( a+bY\right) t}=e^{at}Ee^{Y\left( bt\right) }=e^{at}M_{Y}\left( bt\right) .$

When answering the question we assume any i.i.d. sample from a population with mean $\mu$ and population variance $\sigma ^{2}$:

Putting in c) $a=-\frac{ES_{n}}{\sigma \left( S_{n}\right) },$ $b=\frac{1}{\sigma \left( S_{n}\right) }$ and using a) we get

$M_{z_{n}}\left( t\right) =E\exp \left( \frac{S_{n}-ES_{n}}{\sigma \left( S_{n}\right) }t\right) =e^{-ES_{n}t/\sigma \left( S_{n}\right) }M_{S_{n}}\left( t/\sigma \left( S_{n}\right) \right)$

(using b) and $ES_{n}=n\mu ,$ $Var\left( S_{n}\right) =n\sigma ^{2}$)

$=e^{-ES_{n}t/\sigma \left( S_{n}\right) }\left[ M_{X}\left( t/\sigma \left( S_{n}\right) \right) \right] ^{n}=e^{-n\mu t/\left( \sqrt{n}\sigma \right) }\left[ M_{X}\left( t/\left( \sqrt{n}\sigma \right) \right) \right] ^{n}.$

This is a general result which for the Poisson distribution can be specialized as follows. From ST2133, example 3.38 we know that $M_{X}\left( t\right)=\exp \left( \lambda \left( e^{t}-1\right) \right)$. Therefore, with $\mu =\lambda$ and $\sigma =\sqrt{\lambda }$ (so that $\sqrt{n}\sigma =\sqrt{n\lambda }$) we obtain

$M_{z_{n}}\left( t\right) =e^{-t\sqrt{n\lambda }}\left[ \exp \left( \lambda \left( e^{t/\sqrt{n\lambda }}-1\right) \right) \right] ^{n}=e^{-t\sqrt{n\lambda }+n\lambda \left( e^{t/\sqrt{n\lambda }}-1\right) }.$

[Instead of $M_{z_n}$ some students gave $M_X$.]
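A numerical cross-check of the general formula can be sketched as follows. The values of `n`, `lam`, `t` and the truncation point `cutoff` are arbitrary choices of mine, and the function names are hypothetical. One function evaluates $e^{-n\mu t/\left( \sqrt{n}\sigma \right) }\left[ M_{X}\left( t/\left( \sqrt{n}\sigma \right) \right) \right] ^{n}$ with the Poisson MGF, the other computes $Ee^{tz_{n}}$ directly from the distribution of $S_{n}$. For the latter I assume the standard fact that a sum of $n$ independent Poisson($\lambda$) variables is Poisson($n\lambda$), which is consistent with b) because $\left[ M_{X}\left( t\right) \right] ^{n}=\exp \left( n\lambda \left( e^{t}-1\right) \right)$.

```python
import math

def mgf_poisson(t, lam):
    """MGF of a Poisson(lam) variable: exp(lam * (e^t - 1))."""
    return math.exp(lam * math.expm1(t))

def mgf_z_general(t, n, lam):
    """General formula for the MGF of z_n with mu = lam and sigma = sqrt(lam)."""
    mu, sigma = lam, math.sqrt(lam)
    scale = math.sqrt(n) * sigma  # sigma(S_n)
    return math.exp(-n * mu * t / scale) * mgf_poisson(t / scale, lam) ** n

def mgf_z_direct(t, n, lam, cutoff=400):
    """E exp(t * z_n) from the pmf of S_n, assumed to be Poisson(n * lam)."""
    m = n * lam
    total = 0.0
    for s in range(cutoff):
        log_p = -m + s * math.log(m) - math.lgamma(s + 1)  # log P(S_n = s)
        total += math.exp(log_p + t * (s - m) / math.sqrt(m))
    return total

n, lam, t = 5, 2.0, 0.7  # arbitrary illustrative values
general = mgf_z_general(t, n, lam)
direct = mgf_z_direct(t, n, lam)
print(general, direct)
```

The two printed numbers should coincide up to rounding, which is a quick way to catch a wrong power of $n$ in the exponent.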

Question 3

Derive the cumulant generating function of the standardized sample mean.

Answer. Again, there are a couple of useful general facts.

I) Decomposition of MGF around zero. The series $e^{x}=\sum_{i=0}^{\infty } \frac{x^{i}}{i!}$ leads to

$M_{X}\left( t\right) =Ee^{tX}=E\left( \sum_{i=0}^{\infty }\frac{t^{i}X^{i}}{ i!}\right) =\sum_{i=0}^{\infty }\frac{t^{i}}{i!}E\left( X^{i}\right) =\sum_{i=0}^{\infty }\frac{t^{i}}{i!}\mu _{i}$

where $\mu _{i}=E\left( X^{i}\right)$ are moments of $X$ and $\mu _{0}=EX^{0}=1.$ Differentiating this equation yields

$M_{X}^{(k)}\left( t\right) =\sum_{i=k}^{\infty }\frac{t^{i-k}}{\left( i-k\right) !}\mu _{i}$

and setting $t=0$ gives the rule for finding moments from MGF: $\mu _{k}=M_{X}^{(k)}\left( 0\right) .$
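As a worked illustration of this rule (my addition, using only the Poisson MGF $M_{X}\left( t\right) =\exp \left( \lambda \left( e^{t}-1\right) \right)$ quoted in Question 2), the first two derivatives are

$M_{X}^{\prime }\left( t\right) =\lambda e^{t}M_{X}\left( t\right) ,$ $M_{X}^{\prime \prime }\left( t\right) =\lambda e^{t}M_{X}\left( t\right) +\lambda e^{t}M_{X}^{\prime }\left( t\right) =\left( \lambda e^{t}+\lambda ^{2}e^{2t}\right) M_{X}\left( t\right) .$

Setting $t=0$ and using $M_{X}\left( 0\right) =1$ gives

$\mu _{1}=\lambda ,$ $\mu _{2}=\lambda +\lambda ^{2},$ $Var\left( X\right) =\mu _{2}-\mu _{1}^{2}=\lambda ,$

in agreement with the values used in Question 1.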

II) Decomposition of the cumulant generating function around zero. $K_{X}\left( t\right) =\log M_{X}\left( t\right)$ can also be decomposed into its Taylor series:

$K_{X}\left( t\right) =\sum_{i=0}^{\infty }\frac{t^{i}}{i!}\kappa _{i}$

where the coefficients $\kappa _{i}$ are called cumulants and can be found using $\kappa _{k}=K_{X}^{(k)}\left( 0\right)$. Since

$K_{X}^{\prime }\left( t\right) =\frac{M_{X}^{\prime }\left( t\right) }{ M_{X}\left( t\right) }$ and $K_{X}^{\prime \prime }\left( t\right) =\frac{ M_{X}^{\prime \prime }\left( t\right) M_{X}\left( t\right) -\left( M_{X}^{\prime }\left( t\right) \right) ^{2}}{M_{X}^{2}\left( t\right) }$

we have

$\kappa _{0}=\log M_{X}\left( 0\right) =0,$ $\kappa _{1}=\frac{M_{X}^{\prime }\left( 0\right) }{M_{X}\left( 0\right) }=\mu _{1},$

$\kappa _{2}=\mu _{2}-\mu _{1}^{2}=EX^{2}-\left( EX\right) ^{2}=Var\left( X\right) .$

Thus, for any random variable $X$ with mean $\mu$ and variance $\sigma ^{2}$ we have

$K_{X}\left( t\right) =\mu t+\frac{\sigma ^{2}t^{2}}{2}+$ terms of higher order for $t$ small.
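For instance (an illustration of mine, again assuming the Poisson MGF from Question 2), for the Poisson variable

$K_{X}\left( t\right) =\log \exp \left( \lambda \left( e^{t}-1\right) \right) =\lambda \left( e^{t}-1\right) ,$ $K_{X}^{(k)}\left( t\right) =\lambda e^{t},$ $\kappa _{k}=\lambda$ for all $k\geq 1,$

so the expansion reads

$K_{X}\left( t\right) =\lambda t+\frac{\lambda t^{2}}{2}+\frac{\lambda t^{3}}{6}+...$

Its first two terms match $\mu t+\frac{\sigma ^{2}t^{2}}{2}$ with $\mu =\sigma ^{2}=\lambda .$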

III) If $X=a+bY$ then by c)

$K_{X}\left( t\right) =K_{a+bY}\left( t\right) =\log \left[ e^{at}M_{Y}\left( bt\right) \right] =at+K_{Y}\left( bt\right) .$

IV) By b)

$K_{S_{n}}\left( t\right) =\log \left[ M_{X}\left( t\right) \right] ^{n}=nK_{X}\left( t\right) .$

Using III), $z_{n}=\frac{S_{n}-ES_{n}}{\sigma \left( S_{n}\right) }$ and then IV) we have

$K_{z_{n}}\left( t\right) =\frac{-ES_{n}}{\sigma \left( S_{n}\right) } t+K_{S_{n}}\left( \frac{t}{\sigma \left( S_{n}\right) }\right) =\frac{-ES_{n} }{\sigma \left( S_{n}\right) }t+nK_{X}\left( \frac{t}{\sigma \left( S_{n}\right) }\right) .$

For the last term on the right we use the approximation around zero from II):

$K_{z_{n}}\left( t\right) =\frac{-ES_{n}}{\sigma \left( S_{n}\right) } t+nK_{X}\left( \frac{t}{\sigma \left( S_{n}\right) }\right) \approx \frac{ -ES_{n}}{\sigma \left( S_{n}\right) }t+n\mu \frac{t}{\sigma \left( S_{n}\right) }+n\frac{\sigma ^{2}}{2}\left( \frac{t}{\sigma \left( S_{n}\right) }\right) ^{2}$

$=-\frac{n\mu }{\sqrt{n}\sigma }t+n\mu \frac{t}{\sqrt{n}\sigma }+n\frac{ \sigma ^{2}}{2}\left( \frac{t}{\sqrt{n}\sigma }\right) ^{2}=t^{2}/2.$

[Important. Why are the above steps necessary? Passing from the series $M_{X}\left( t\right) =\sum_{i=0}^{\infty }\frac{t^{i}}{i!}\mu _{i}$ to the series for $K_{X}\left( t\right) =\log M_{X}\left( t\right)$ is not straightforward and can easily lead to errors. In the case of the Poisson it is not advisable to derive $K_{z_{n}}$ from $M_{z_{n}}\left( t\right) =e^{-t\sqrt{n\lambda }+n\lambda \left( e^{t/\sqrt{n\lambda }}-1\right) }$.]
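To see the quality of the approximation, one can evaluate the exact expression $K_{z_{n}}\left( t\right) =\frac{-ES_{n}}{\sigma \left( S_{n}\right) }t+nK_{X}\left( \frac{t}{\sigma \left( S_{n}\right) }\right)$ for the Poisson case and compare it with $t^{2}/2$. This is a rough sketch: `lam`, `t` and the sample sizes are arbitrary values I picked, the function names are hypothetical, and $K_{X}\left( t\right) =\lambda \left( e^{t}-1\right)$ is the logarithm of the Poisson MGF from Question 2.

```python
import math

def cgf_poisson(t, lam):
    """K_X(t) = log M_X(t) = lam * (e^t - 1) for a Poisson(lam) variable."""
    return lam * math.expm1(t)

def cgf_z(t, n, lam):
    """K_{z_n}(t) = -E S_n / sigma(S_n) * t + n * K_X(t / sigma(S_n))."""
    mean_sum = n * lam
    sd_sum = math.sqrt(n * lam)
    return -mean_sum / sd_sum * t + n * cgf_poisson(t / sd_sum, lam)

lam, t = 2.0, 1.0  # arbitrary illustrative values
gaps = [abs(cgf_z(t, n, lam) - t**2 / 2) for n in (10, 1_000, 100_000)]
print(gaps)
```

The gaps should shrink as $n$ grows, which reflects the higher-order terms dropped in II) becoming negligible.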

Question 4

Prove the central limit theorem using the cumulant generating function you obtained.

Answer. In the previous question we proved that around zero

$K_{z_{n}}\left( t\right) \rightarrow \frac{t^{2}}{2}.$

This implies that

(1) $M_{z_{n}}\left( t\right) \rightarrow e^{t^{2}/2}$ for each $t$ around zero.

But we know that for a normal $X$ with mean $\mu$ and variance $\sigma ^{2}$ the MGF is $M_{X}\left( t\right) =\exp \left( \mu t+\frac{\sigma ^{2}t^{2}}{2}\right)$ (ST2133 example 3.42) and hence for the standard normal $z$ (with $\mu =0,$ $\sigma ^{2}=1$)

(2) $M_{z}\left( t\right) =e^{t^{2}/2}.$

Theorem (link between pointwise convergence of MGFs of $\left\{ X_{n}\right\}$ and convergence in distribution of $\left\{ X_{n}\right\}$). Let $\left\{ X_{n}\right\}$ be a sequence of random variables and let $X$ be some random variable. If $M_{X_{n}}\left( t\right)$ converges for each $t$ from a neighborhood of zero to $M_{X}\left( t\right)$, then $X_{n}$ converges in distribution to $X.$

Using (1), (2) and this theorem we finish the proof that $z_{n}$ converges in distribution to the standard normal, which is the central limit theorem.
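Convergence in distribution can also be observed numerically. The following sketch compares the distribution function of $z_{n}$ with the standard normal one at a few points. The value of `lam`, the evaluation points and the function names are hypothetical choices of mine. I also assume the standard fact that $S_{n}$ is Poisson($n\lambda$) for an i.i.d. Poisson($\lambda$) sample.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def z_cdf(x, n, lam):
    """P(z_n <= x), assuming S_n is Poisson(n * lam)."""
    m = n * lam
    top = math.floor(m + x * math.sqrt(m))  # z_n <= x  <=>  S_n <= m + x * sqrt(m)
    total = 0.0
    for s in range(0, top + 1):
        total += math.exp(-m + s * math.log(m) - math.lgamma(s + 1))
    return total

def max_gap(n, lam, points=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Largest difference between the two distribution functions at the given points."""
    return max(abs(z_cdf(x, n, lam) - normal_cdf(x)) for x in points)

lam = 2.0  # arbitrary illustrative value
gap_small = max_gap(10, lam)
gap_large = max_gap(1000, lam)
print(gap_small, gap_large)
```

The gap for the larger sample should be visibly smaller, as the theorem predicts.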

Question 5

State the factorization theorem and apply it to show that $U=\sum_{i=1}^{n}X_{i}$ is a sufficient statistic.

Answer. The solution is given on p.180 of ST2134. For $x_{i}=0,1,...,$ $i=1,...,n,$ the joint density is

(3) $f_{X}\left( x,\lambda \right) =\prod\limits_{i=1}^{n}e^{-\lambda } \frac{\lambda ^{x_{i}}}{x_{i}!}=\frac{\lambda ^{\Sigma x_{i}}e^{-n\lambda }}{\Pi _{i=1}^{n}x_{i}!}.$

To satisfy the Fisher-Neyman factorization theorem set

$g\left( \sum x_{i},\lambda \right) =\lambda ^{\Sigma x_{i}}e^{-n\lambda },\ h\left( x\right) =\frac{1}{\Pi _{i=1}^{n}x_{i}!}$

and then we see that $\sum x_{i}$ is a sufficient statistic for $\lambda .$
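The factorization of (3) can be verified on a concrete sample. In this small sketch the data `x`, the parameter value `lam` and the function names are hypothetical. The function `g` sees the data only through their sum, while `h` does not involve $\lambda$.

```python
import math

def joint_pmf(x, lam):
    """Product of Poisson probabilities for the sample x = (x_1, ..., x_n)."""
    return math.prod(math.exp(-lam) * lam**xi / math.factorial(xi) for xi in x)

def g(total, lam, n):
    """Factor that depends on the data only through the sum of observations."""
    return lam**total * math.exp(-n * lam)

def h(x):
    """Factor that does not involve lam."""
    return 1.0 / math.prod(math.factorial(xi) for xi in x)

x = (3, 0, 2, 5)  # hypothetical data
lam = 1.7         # arbitrary parameter value
lhs = joint_pmf(x, lam)
rhs = g(sum(x), lam, len(x)) * h(x)
print(lhs, rhs)
```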

Question 6

Find a minimal sufficient statistic for $\lambda$ stating all necessary theoretical facts.

Answer. Characterization of minimal sufficiency. A statistic $T\left( X\right)$ is minimal sufficient if and only if level sets of $T$ coincide with sets on which the ratio $f_{X}\left( x,\theta \right) /f_{X}\left( y,\theta \right)$ does not depend on $\theta .$

From (3)

$f_{X}\left( x,\lambda \right) /f_{X}\left( y,\lambda \right) =\frac{\lambda ^{\Sigma x_{i}}e^{-n\lambda }}{\Pi _{i=1}^{n}x_{i}!}\left[ \frac{\lambda ^{\Sigma y_{i}}e^{-n\lambda }}{\Pi _{i=1}^{n}y_{i}!}\right] ^{-1}=\lambda ^{ \left[ \Sigma x_{i}-\Sigma y_{i}\right] }\frac{\Pi _{i=1}^{n}y_{i}!}{\Pi _{i=1}^{n}x_{i}!}.$

The expression on the right does not depend on $\lambda$ if and only if $\Sigma x_{i}=\Sigma y_{i}.$ The last condition describes level sets of $T\left( X\right) =\sum X_{i}.$ Thus it is minimal sufficient.
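The behavior of the ratio is easy to observe numerically. In the sketch below the three samples, the grid of $\lambda$ values and the function names are hypothetical. For two samples with equal sums the ratio is the same for every $\lambda$ and equals $\Pi y_{i}!/\Pi x_{i}!$. For samples with different sums it changes with $\lambda$.

```python
import math

def joint_pmf(x, lam):
    """Product of Poisson probabilities for the sample x = (x_1, ..., x_n)."""
    return math.prod(math.exp(-lam) * lam**xi / math.factorial(xi) for xi in x)

def ratio(x, y, lam):
    """Likelihood ratio f(x, lam) / f(y, lam)."""
    return joint_pmf(x, lam) / joint_pmf(y, lam)

x = (3, 0, 2, 5)        # hypothetical data, sum 10
y_same = (4, 4, 1, 1)   # different data with the same sum
y_other = (1, 1, 1, 1)  # data with a different sum

lams = (0.5, 1.0, 2.0, 4.0)
same_sum_ratios = [ratio(x, y_same, lam) for lam in lams]
other_sum_ratios = [ratio(x, y_other, lam) for lam in lams]
print(same_sum_ratios)
print(other_sum_ratios)
```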

Question 7

Find the Method of Moments estimator of the population mean.

Answer. The idea of the method is to take some population property (for example, $EX=\lambda$) and replace the population characteristic (in this case $EX$) by its sample analog ($\bar{X}$) to obtain an MM estimator. In our case $\hat{\lambda}_{MM}= \bar{X}.$ [Try to do this for the Gamma distribution].

Question 8

Find the Fisher information.

Answer. From (3) in Question 5 the log-likelihood is

$l_{X}\left( \lambda ,x\right) =-n\lambda +\sum x_{i}\log \lambda -\sum \log \left( x_{i}!\right) .$

Hence the score function is (see Example 2.30 in ST2134)

$s_{X}\left( \lambda ,x\right) =\frac{\partial }{\partial \lambda } l_{X}\left( \lambda ,x\right) =-n+\frac{1}{\lambda }\sum x_{i}.$

Then

$\frac{\partial ^{2}}{\partial \lambda ^{2}}l_{X}\left( \lambda ,x\right) =- \frac{1}{\lambda ^{2}}\sum x_{i}$

and the Fisher information is

$I_{X}\left( \lambda \right) =-E\left( \frac{\partial ^{2}}{\partial \lambda ^{2}}l_{X}\left( \lambda ,x\right) \right) =\frac{1}{\lambda ^{2}}E\sum X_{i}=\frac{n\lambda }{\lambda ^{2}}=\frac{n}{\lambda }.$

Question 9

Derive the Cramer-Rao lower bound for $V\left( \bar{X}\right)$ for a random sample.

Answer. (See Example 3.17 in ST2134) Since $\bar{X}$ is an unbiased estimator of $\lambda$ by Question 1, from the Cramer-Rao theorem we know that

$V\left( \bar{X}\right) \geq \frac{1}{I_{X}\left( \lambda \right) }=\frac{ \lambda }{n}$

and in fact by Question 1 this lower bound is attained, because $V\left( \bar{X}\right) =Var\left( S_{n}\right) /n^{2}=\lambda /n.$
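Both the Fisher information and the attainment of the bound can be checked with truncated sums over the Poisson probabilities. This is a sketch under my own assumptions: the values of `n` and `lam`, the truncation point `cutoff` and the function names are hypothetical.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson(lam) variable."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def fisher_information(n, lam, cutoff=100):
    """Minus the expected second derivative of the log-likelihood, n observations.

    For one observation the second derivative is -x / lam**2, so its expectation
    is a (truncated) sum over the Poisson probabilities.
    """
    per_obs = sum(poisson_pmf(x, lam) * x / lam**2 for x in range(cutoff))
    return n * per_obs

def variance_of_sample_mean(n, lam, cutoff=100):
    """Var(X-bar) = Var(X) / n, with Var(X) computed from the pmf."""
    mean = sum(poisson_pmf(x, lam) * x for x in range(cutoff))
    second = sum(poisson_pmf(x, lam) * x * x for x in range(cutoff))
    return (second - mean**2) / n

n, lam = 8, 2.5  # arbitrary illustrative values
info = fisher_information(n, lam)
bound = 1.0 / info
var_mean = variance_of_sample_mean(n, lam)
print(info, bound, var_mean)
```

The first number should be close to $n/\lambda$ and the last two should agree, both being close to $\lambda /n.$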

5
Feb 22

Sufficiency and minimal sufficiency

Sufficient statistic

I find that in the notation of a statistic it is better to reflect the dependence on the argument. So I write $T\left( X\right)$ for a statistic, where $X$ is a sample, instead of a faceless $U$ or $V.$

Definition 1. The statistic $T\left( X\right)$ is called sufficient for the parameter $\theta$ if the distribution of $X$ conditional on $T\left( X\right)$ does not depend on $\theta .$

The main results on sufficiency and minimal sufficiency become transparent if we look at them from the point of view of Maximum Likelihood (ML) estimation.

Let $f_{X}\left( x,\theta \right)$ be the joint density of the vector $X=\left( X_{1},...,X_{n}\right)$, where $\theta$ is a parameter (possibly a vector). The ML estimator is obtained by maximizing over $\theta$ the function $f_{X}\left( x,\theta \right)$ with $x=\left(x_{1},...,x_{n}\right)$ fixed at the observed data. The estimator depends on the data and can be denoted $\hat{\theta}_{ML}\left( x\right) .$

Fisher-Neyman theorem. $T\left( X\right)$ is sufficient for $\theta$ if and only if the joint density can be represented as

(1) $f_{X}\left( x,\theta \right) =g\left( T\left( x\right) ,\theta \right) k\left( x\right)$

where, as the notation suggests, $g$ depends on $x$ only through $T\left(x\right)$ and $k$ does not depend on $\theta .$

Maximizing the left side of (1) is the same thing as maximizing $g\left(T\left( x\right) ,\theta \right)$ because $k$ does not depend on $\theta .$ But this means that $\hat{\theta}_{ML}\left( x\right)$ depends on $x$ only through $T\left( x\right) .$ A sufficient statistic is all you need to find the ML estimator. This interpretation is easier to understand than the definition of sufficiency.

Minimal sufficient statistic

Definition 2. A sufficient statistic $T\left( X\right)$ is called minimal sufficient if for any other sufficient statistic $S\left( X\right)$ there exists a function $g$ such that $T\left( X\right) =g\left( S\left( X\right) \right) .$

A level set is a set of type $\left\{ x:T\left( x\right) =c\right\} ,$ for a constant $c$ (which in general can be a constant vector). See the visualization of level sets. A level set is also called a preimage and denoted $T^{-1}\left( c\right) =\left\{ x:T\left(x\right) =c\right\} .$ When $T$ is one-to-one the preimage contains just one point. When $T$ is not one-to-one some preimages contain more than one point. The wider the preimage is, the less information about the sample the statistic carries (because many data sets are mapped to a single point and you cannot tell one data set from another by looking at the statistic value). This is illustrated by the following example.

Example 1. On the plane $R^2$ define two statistics: $U_1(X)=(X_1,X_2)$ and $U_2(X)=(X_1+X_2)/2$. For $U_1$ the level set $\{U_1(x)=c=(c_1,c_2)\},\ c\in R^2$, consists of just one point and knowing the statistic value is equivalent to knowing the whole sample. For $U_2$ the level set $\{U_2(x)=c\},\ c\in R$ is a straight line. If we know $U_2$, we know the sample mean but not the separate observations.
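Example 1 can be mimicked on a small grid of integer points standing in for the plane. The grid and the function names below are my own hypothetical choices. The code groups the points by the value of each statistic, so each group is a level set restricted to the grid.

```python
from collections import defaultdict
from itertools import product

def level_sets(statistic, samples):
    """Group samples by the value of the statistic: value -> list of samples."""
    groups = defaultdict(list)
    for x in samples:
        groups[statistic(x)].append(x)
    return dict(groups)

def u1(x):
    """U_1 returns the whole sample (x1, x2)."""
    return x

def u2(x):
    """U_2 returns the sample mean (x1 + x2) / 2."""
    return (x[0] + x[1]) / 2

# Integer grid {0, 1, 2} x {0, 1, 2}, used only for illustration.
grid = list(product(range(3), repeat=2))

sets_u1 = level_sets(u1, grid)
sets_u2 = level_sets(u2, grid)
print(sets_u2[1.0])
```

Every level set of `u1` holds a single point, while the level set of `u2` for the value 1 holds all grid points on the line $x_{1}+x_{2}=2.$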

In the definition of the minimal sufficient statistic we have

$\left\{x:T\left( x\right) =c\right\} =\left\{ x:g\left( S\left( x\right) \right)=c\right\} =\left\{ x:S\left( x\right) \in g^{-1}\left( c\right) \right\} .$

Since $g^{-1}\left( c\right)$ generally contains more than one point, this shows that the level sets of $T\left( X\right)$ are generally wider than those of $S\left( X\right) .$ Since this is true for any sufficient $S\left( X\right) ,$ $T\left( X\right)$ carries less information about $X$ than any other sufficient statistic.

Definition 2 is an existence statement and is difficult to verify directly because it contains the words "for any" and "exists". Again it's better to relate it to ML estimation.

Suppose for two sets of data $x,y$ there is a positive number $k\left(x,y\right)$ such that

(2) $f_{X}\left( x,\theta \right) =k\left( x,y\right) f_{X}\left( y,\theta\right) .$

Maximizing the left side we get the estimator $\hat{\theta}_{ML}\left(x\right) .$ Maximizing $f_{X}\left( y,\theta \right)$ we get $\hat{\theta}_{ML}\left( y\right) .$ Since $k\left( x,y\right)$ does not depend on $\theta ,$ (2) tells us that

$\hat{\theta}_{ML}\left( x\right) =\hat{\theta}_{ML}\left( y\right) .$

Thus, if two sets of data $x,y$ satisfy (2), the ML method cannot distinguish between $x$ and $y$ and supplies the same estimator. Let us call $x,y$ indistinguishable if there is a positive number $k\left( x,y\right)$ such that (2) is true.
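Here is a small numerical illustration of indistinguishable data sets. The Poisson model, the two samples, the grid of parameter values and the function names are all assumptions I make for the example, since the argument above is general. The two samples have the same sum, so their likelihoods differ by a factor that does not depend on the parameter, and a grid search returns the same maximizer for both.

```python
import math

def log_likelihood(x, lam):
    """Poisson log-likelihood, used here only as an illustrative model."""
    return sum(-lam + xi * math.log(lam) - math.lgamma(xi + 1) for xi in x)

def ml_on_grid(x, grid):
    """Grid value of lam that maximizes the log-likelihood of the sample x."""
    return max(grid, key=lambda lam: log_likelihood(x, lam))

grid = [0.01 * k for k in range(1, 1001)]  # lam from 0.01 to 10.00
x = (3, 0, 2, 5)  # hypothetical data
y = (4, 4, 1, 1)  # different data with the same sum

ml_x = ml_on_grid(x, grid)
ml_y = ml_on_grid(y, grid)
print(ml_x, ml_y)
```

A grid search is crude, but it makes the point without any calculus: the maximizer depends on the data only through what the likelihood can distinguish.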

An equation $T\left( x\right) =T\left( y\right)$ means that $x,y$ belong to the same level set.

Characterization of minimal sufficiency. A statistic $T\left( X\right)$ is minimal sufficient if and only if its level sets coincide with sets of indistinguishable $x,y.$

The advantage of this formulation is that it relates a geometric notion of level sets to the ML estimator properties. The formulation in the guide by J. Abdey is:

A statistic $T\left( X\right)$ is minimal sufficient if and only if the equality $T\left( x\right) =T\left( y\right)$ is equivalent to (2).

Rewriting (2) as

(3) $f_{X}\left( x,\theta \right) /f_{X}\left( y,\theta \right) =k\left(x,y\right)$

we get a practical way of finding a minimal sufficient statistic: form the ratio on the left of (3) and find the sets along which the ratio does not depend on $\theta .$ Those sets will be level sets of $T\left( X\right) .$