4 Mar 24

## AR(1) model: Tesla stock versus Tesla return

Question. You run two AR(1) regressions: 1) for the Tesla stock price $Y_{t}$, $Y_{t}=\alpha +\beta Y_{t-1}+\varepsilon _{t}$, and 2) for its return $R_{t}$, $R_{t}=\phi +\psi R_{t-1}+\delta _{t}$. Here the errors $\varepsilon _{t},\delta _{t}$ are i.i.d. normal with mean $0$ and variance $\sigma ^{2}.$ Based on the 5-year chart of the stock price (see Chart 1), what would you expect about the coefficients $\alpha ,$ $\beta ,$ $\phi$ and $\psi ?$

Chart 1. 5-year chart of TSLA stock. Source: barchart.com

Answer. Suppose that instead of the time series model $Y_{t}=\alpha +\beta Y_{t-1}+\varepsilon _{t}$ we have a simple regression $Y_{t}=\alpha +\beta X_{t}+\varepsilon _{t}$ and on the stock chart we have the values of $X_{t}$ on the horizontal axis and the values of $Y_{t}$ on the vertical axis. Then instead of the time series chart we would have a scatterplot. Drawing a straight line to approximate the cloud of observed pairs $\left(X_{t},Y_{t}\right)$, we can see that both $\alpha$ and $\beta$ must be positive (see Chart 2). The same intuition applies to the time series model $Y_{t}=\alpha +\beta Y_{t-1}+\varepsilon _{t}.$

Chart 2. Same chart of Tesla stock viewed as a scatterplot with fitted line

Table 1 contains estimation results for the first model.

| Coefficient | Estimate | p-value |
|---|---|---|
| $\alpha$ | 152.282 | 0.023 |
| $\beta$ | 0.9973 | 0.000 |

The fundamental difference between the stock and its return is that the return cannot be trending for extended periods of time. The intuition is that if, for example, the return for some stock is persistently positive, then everybody starts investing in it and seeing sizable profits. However, the paper profits must be realized sooner or later, which means investors at some point will start selling the stock and the return becomes negative. As a result, the return must oscillate around zero. This intuition is confirmed in Chart 3, which displays the return for Tesla stock, and in Chart 4, which is a nonparametric estimation of the density of that return.

Chart 3. Chart for return on Tesla, from Stata

The straight line that approximates the cloud of observed pairs $\left(R_{t-1},R_{t}\right)$ should be very close to the $x$ axis. That is, both $\phi$ and $\psi$ should be very close to zero.

Chart 4. The density of return on Tesla is centered almost at zero

See estimation results in Table 2.

| Coefficient | Estimate | p-value |
|---|---|---|
| $\phi$ | 0.0018 | 0.106 |
| $\psi$ | -0.0056 | 0.795 |
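The qualitative predictions above are easy to check on simulated data. Below is a minimal sketch of fitting an AR(1) by OLS; the values $\alpha =1$, $\beta =0.99$ and the sample size are made-up illustrations, not the TSLA estimates from Table 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a persistent AR(1) "price" series: Y_t = alpha + beta*Y_{t-1} + eps_t
alpha, beta, n = 1.0, 0.99, 2000
y = np.empty(n)
y[0] = alpha / (1 - beta)              # start at the stationary mean
for t in range(1, n):
    y[t] = alpha + beta * y[t - 1] + rng.normal()

# OLS of Y_t on a constant and Y_{t-1}
X = np.column_stack([np.ones(n - 1), y[:-1]])
coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
alpha_hat, beta_hat = coef
print(alpha_hat, beta_hat)             # beta_hat comes out close to one
```

A persistent series like a stock price yields $\hat{\beta }$ close to one, in line with the intuition above; running the same regression on simulated returns (white noise) gives both estimates near zero.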

22 Nov 23

## Simple tools for combinatorial problems

Before solving the problem, it is useful to compare the case of independent events with that of dependent events.

Suppose the events $A_{1},A_{2},...,A_{n}$ are independent (in the context of the problem, it will be drawings with replacement). Then by definition the joint probability is the product of individual probabilities:

(1) $P\left( A_{1}\cap ... \cap A_{n}\right) =P\left( A_{1}\right) ... P\left(A_{n}\right) .$

Now assume that the event $A_{1}$ occurs first, $A_{2}$ occurs second, ..., $A_{n}$ occurs last, and each subsequent event depends on the previous ones (as in the case of drawings without replacement). Then

$P\left( A_{1}\cap A_{2}\right) =\frac{P\left( A_{1}\cap A_{2}\right) }{P\left( A_{1}\right) }P\left( A_{1}\right) =P\left( A_{2}|A_{1}\right)P\left( A_{1}\right) .$

Similarly, by multiplying and dividing many times, we get

(2) $P\left( A_{1}\cap ... \cap A_{n}\right) =P\left(A_{n}|A_{1},...,A_{n-1}\right) P\left( A_{n-1}|A_{1},...,A_{n-2}\right)... P\left( A_{1}\right) .$

Equation (2) is called the chain rule for probabilities. Several of my students have been able to solve the problem without explicitly using (2). Still, it is advisable to use (2) or other relevant theoretical properties to achieve clarity and avoid errors.
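The chain rule (2) is easy to verify by brute force on a small example. The sketch below uses an assumed bag of 2 red and 3 green balls and compares the chain-rule product with direct counting over all equally likely orderings.

```python
from fractions import Fraction
from itertools import permutations

# Bag with 2 red (R) and 3 green (G) balls, drawn without replacement.
balls = ['R', 'R', 'G', 'G', 'G']

# P(first three draws are G, G, R) by brute force: count over all equally
# likely orderings of the five (distinguishable) balls.
perms = list(permutations(range(5)))
hits = sum(balls[p[0]] == 'G' and balls[p[1]] == 'G' and balls[p[2]] == 'R'
           for p in perms)
p_count = Fraction(hits, len(perms))

# The same probability via the chain rule (2):
# P(G1) P(G2|G1) P(R3|G1,G2) = (3/5)(2/4)(2/3)
p_chain = Fraction(3, 5) * Fraction(2, 4) * Fraction(2, 3)
print(p_count, p_chain)   # both are 1/5
```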

## Problem statement and solution

Suppose there are $n\geq 2$ red balls and $3$ green balls in a bag. All balls with the same color are indistinguishable.

### Part i.

Suppose one ball is drawn at a time at random with replacement from the bag. Let $X$ be the number of balls drawn until a red ball is obtained (including the red ball). Write down the probability mass function of $X$.

Solution. Most students answer that this is a geometric distribution with probabilities given by $q^{x-1}p,$ $x=1,2,...$ where $p$ is the probability of success. Without specifying $p$ (the probability of drawing a red ball) the answer is incomplete. Since $p=\frac{n}{n+3},$ we have

$q^{x-1}p=\left( \frac{3}{n+3}\right) ^{x-1}\frac{n}{n+3}=\frac{n3^{x-1}}{\left( n+3\right) ^{x}}.$
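A quick simulation confirms the formula above; the value $n=4$ is an assumption made for illustration.

```python
import random

random.seed(42)

n = 4                      # assumed number of red balls; 3 green balls
p = n / (n + 3)            # chance of red on every draw (with replacement)

def draw_X():
    """Number of draws until (and including) the first red ball."""
    x = 1
    while random.random() >= p:
        x += 1
    return x

trials = 200_000
samples = [draw_X() for _ in range(trials)]
for x in (1, 2, 3):
    pmf = n * 3 ** (x - 1) / (n + 3) ** x    # formula above
    freq = sum(s == x for s in samples) / trials
    print(x, round(pmf, 4), round(freq, 4))
```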

### Part ii.

Now suppose one ball at a time is drawn at random without replacement from the bag. Let $Y$ be the number of balls drawn until a red ball is obtained (including the red ball). Write down the probability mass function of $Y$.

Solution. Denote by $R_{i}$ the event that the $i$th ball drawn is red and by $G_{i}$ the event that it is green. Since drawing stops at the first red ball, the only way $R_{3}$ can occur is by obtaining $G_{1},G_{2}$ before it. Hence, $R_{3}$ equals $\left( G_{1},G_{2},R_{3}\right) .$ Besides, $R_{1},...,R_{4}$ are the only (mutually exclusive) possibilities and it remains to find their probabilities.

Obviously, $P\left( R_{1}\right) =\frac{n}{n+3}.$

Next, using (2)

$P\left( R_{2}\right) =P\left( G_{1},R_{2}\right) =P\left( G_{1}\right)P\left( R_{2}|G_{1}\right) =\frac{3}{n+3}\frac{n}{n+2}.$

Further,

$P\left( R_{3}\right) =P\left( G_{1},G_{2},R_{3}\right) =P\left(G_{1}\right) P\left( G_{2}|G_{1}\right) P\left( R_{3}|G_{1},G_{2}\right) =\frac{3}{n+3}\frac{2}{n+2}\frac{n}{n+1}.$

Finally,

$P\left( R_{4}\right) =P\left( G_{1},G_{2},G_{3},R_{4}\right)$

$=P\left( G_{1}\right) P\left( G_{2}|G_{1}\right) P\left(G_{3}|G_{1},G_{2}\right) P\left( R_{4}|G_{1},G_{2},G_{3}\right) =\frac{3}{n+3}\frac{2}{n+2}\frac{1}{n+1}\frac{n}{n}.$

The results can be summarized in a table:

$\begin{array}{cc} \textrm{Value of }Y & \textrm{Prob}\\1 & \frac{n}{n+3} \\2 & \frac{3}{n+3}\frac{n}{n+2} \\3 & \frac{3}{n+3}\frac{2}{n+2}\frac{n}{n+1} \\4 & \frac{3}{n+3}\frac{2}{n+2}\frac{1}{n+1}\frac{n}{n}\end{array}$

This distribution is not one of the standard types.
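The four probabilities in the table should sum to one, which can be checked exactly with rational arithmetic; a small sketch:

```python
from fractions import Fraction

# Exact check that the four probabilities sum to one for several n.
for n in range(2, 10):
    p1 = Fraction(n, n + 3)
    p2 = Fraction(3, n + 3) * Fraction(n, n + 2)
    p3 = Fraction(3, n + 3) * Fraction(2, n + 2) * Fraction(n, n + 1)
    p4 = Fraction(3, n + 3) * Fraction(2, n + 2) * Fraction(1, n + 1)
    assert p1 + p2 + p3 + p4 == 1
print("sums to 1 for n = 2, ..., 9")
```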

### Part iii.

Suppose two balls at a time are drawn at random from the bag and returned before the next double draw. Let $Z$ denote the number of double draws performed until two green balls are obtained. Show that the probability of drawing two green balls in one double draw is

$\frac{6}{\left( n+2\right) \left( n+3\right) }.$

Hence, show that the probability mass function for $Z$ is

$P\left( Z=z\right) =\frac{6\left( n^{2}+5n\right) ^{z-1}}{\left( n+3\right)^{z}\left( n+2\right) ^{z}},$ $z=1,2,...$

Solution. Using the same notation as before,

$P\left( G_{1},G_{2}\right) =P\left( G_{2}|G_{1}\right) P\left( G_{1}\right)=\frac{2}{n+2}\frac{3}{n+3}.$

All other events (two red balls or one green and one red) are considered a failure. Thus we have a geometric distribution with

$p=\frac{6}{\left( n+2\right) \left( n+3\right) },$

$q=1-\frac{6}{\left( n+2\right) \left( n+3\right) }=\frac{n^{2}+5n+6-6}{\left( n+2\right) \left( n+3\right) }=\frac{n^{2}+5n}{\left( n+2\right) \left( n+3\right) }$

and

$q^{z-1}p=\left[ \frac{n^{2}+5n}{\left( n+2\right) \left( n+3\right) }\right]^{z-1}\frac{6}{\left( n+2\right) \left( n+3\right) }=\frac{6\left( n^{2}+5n\right) ^{z-1}}{\left( n+2\right) ^{z}\left( n+3\right) ^{z}},$ $z=1,2,...$
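The pmf of $Z$ can also be checked by simulating the double draws; the value $n=5$ is an assumption made for illustration.

```python
import random

random.seed(1)

n = 5                                  # assumed number of red balls; 3 green
bag = ['R'] * n + ['G'] * 3

def draw_Z():
    """Double draws (pair returned each time) until two greens come out."""
    z = 1
    while sorted(random.sample(bag, 2)) != ['G', 'G']:
        z += 1
    return z

trials = 100_000
samples = [draw_Z() for _ in range(trials)]
for z in (1, 2, 3):
    pmf = 6 * (n * n + 5 * n) ** (z - 1) / ((n + 3) ** z * (n + 2) ** z)
    freq = sum(s == z for s in samples) / trials
    print(z, round(pmf, 4), round(freq, 4))
```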

14 Sep 23

## The magic of the distribution function

Let $X$ be a random variable. The function $F_{X}\left( x\right) =P\left( X\leq x\right) ,$ where $x$ runs over real numbers, is called the distribution function of $X.$ In statistics, many formulas are derived with the help of $F_{X}\left( x\right) .$ The motivation and properties are given here.

Oftentimes, working with the distribution function is an intermediate step to obtain a density $f_{X}$ using the link

$F_{X}\left( x\right) =\int_{-\infty }^{x}f_{X}\left( t\right) dt.$

A series of exercises below show just how useful the distribution function is.

Exercise 1. Let $Y$ be a linear transformation of $X,$ that is, $Y=\sigma X+\mu ,$ where $\sigma >0$ and $\mu \in R.$ Find the link between $F_{X}$ and $F_{Y}.$ Find the link between $f_{X}$ and $f_{Y}.$

The solution is here.

The more general case of a nonlinear transformation can also be handled:

Exercise 2. Let $Y=g\left( X\right)$ where $g$ is a deterministic function. Suppose that $g$ is strictly monotone and differentiable. Then $g^{-1}$ exists. Find the link between $F_{X}$ and $F_{Y}.$ Find the link between $f_{X}$ and $f_{Y}.$

Solution. The result differs depending on whether $g$ is increasing or decreasing. Let's assume the latter, so that $x_{1}\leq x_{2}$ is equivalent to $g\left( x_{1}\right) \geq g\left( x_{2}\right) .$ Also for simplicity suppose that $P\left( X=c\right) =0$ for any $c\in R.$ Then

$F_{Y}\left( y\right) =P\left( g\left( X\right) \leq y\right) =P\left( X\geq g^{-1}\left( y\right) \right) =1-P\left( X\leq g^{-1}\left( y\right) \right)=1-F_{X}\left( g^{-1}\left( y\right) \right) .$

Differentiation of this equation produces

$f_{Y}\left( y\right)=-f_{X}\left( g^{-1}\left( y\right) \right) \left( g^{-1}\left( y\right) \right) ^{\prime }=f_{X}\left( g^{-1}\left( y\right) \right) \left\vert\left( g^{-1}\left( y\right) \right) ^{\prime }\right\vert$

(the derivative of $g^{-1}$ is negative).

For an example when $g$ is not invertible see the post about the chi-squared distribution.
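For a concrete check of the decreasing case, take $X$ exponential with parameter 1 and $g\left( x\right) =e^{-x}$; the formula above then says $Y=e^{-X}$ is uniform on $(0,1)$. A quick simulation sketch:

```python
import math
import random

random.seed(0)

# X ~ Exp(1) with density f_X(x) = e^{-x} on (0, inf); g(x) = e^{-x} is
# strictly decreasing, g^{-1}(y) = -ln(y) and (g^{-1})'(y) = -1/y < 0.
# The formula gives f_Y(y) = f_X(-ln y) * |-1/y| = y * (1/y) = 1 on (0,1),
# so Y = e^{-X} should be Uniform(0,1).
samples = [math.exp(-random.expovariate(1.0)) for _ in range(100_000)]

# Empirical check: the fraction of Y-values in (a, b] should be near b - a.
for a, b in [(0.0, 0.25), (0.25, 0.5), (0.5, 1.0)]:
    freq = sum(a < v <= b for v in samples) / len(samples)
    print((a, b), round(freq, 3))
```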

Exercise 3. Suppose $T=X/Y$ where $X$ and $Y$ are independent, have densities $f_{X},f_{Y}$ and $Y>0.$ What are the distribution function and density of $T?$

Solution. By independence the joint density $f_{X,Y}$ equals $f_{X}f_{Y},$ so

$F_{T}\left( t\right) =P\left( T\leq t\right) =P\left( X\leq tY\right) = \underset{x\leq ty}{\int \int }f_{X}\left( x\right) f_{Y}\left( y\right) dxdy$

(converting a double integral to an iterated integral and remembering that $f_{Y}$ is zero on the left half-axis)

$=\int_{0}^{\infty }\left( \int_{-\infty }^{ty}f_{X}\left( x\right) dx\right) f_{Y}\left( y\right)dy=\int_{0}^{\infty }F_{X}\left( ty\right) f_{Y}\left( y\right) dy.$

Now by the Leibniz integral rule

(1) $f_{T}\left( t\right) =\int_{0}^{\infty }f_{X}\left( ty\right) yf_{Y}\left( y\right) dy.$

A different method is indicated in Activity 4.11, p.207 of J.Abdey, Guide ST2133.
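Formula (1) can be checked numerically. In the sketch below the choice $X\sim N(0,1)$, $Y\sim \text{Exp}(1)$ is an assumption made for illustration; probabilities computed by integrating the density from (1) are compared with Monte Carlo estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Densities for the assumed example: X ~ N(0,1), Y ~ Exp(1) (so Y > 0).
f_X = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
f_Y = lambda y: np.exp(-y)

y_grid = np.linspace(1e-6, 30, 20_000)
dy = y_grid[1] - y_grid[0]

def f_T(t):
    # formula (1): integral over y of f_X(t y) * y * f_Y(y)
    return np.sum(f_X(t * y_grid) * y_grid * f_Y(y_grid)) * dy

def prob(a, b, steps=400):
    # midpoint rule for the integral of f_T over (a, b)
    h = (b - a) / steps
    ts = a + h * (np.arange(steps) + 0.5)
    return sum(f_T(t) for t in ts) * h

t_samples = rng.standard_normal(200_000) / rng.exponential(1.0, 200_000)

for a, b in [(-1, 0), (0, 1), (1, 3)]:
    p_formula = prob(a, b)
    p_mc = np.mean((t_samples > a) & (t_samples <= b))
    print((a, b), round(p_formula, 3), round(p_mc, 3))
```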

Exercise 4. Let $X,Y$ be two independent random variables with densities $f_{X},f_{Y}$. Find $F_{X+Y}$ and $f_{X+Y}.$

See this post.

Exercise 5. Let $X_{1},X_{2}$ be two independent random variables. Find $F_{\max \left\{ X_{1},X_{2}\right\} }$ and $F_{\min \left\{ X_{1},X_{2}\right\} }.$

Solution. The inequality $\max \left\{ X_{1},X_{2}\right\} \leq x$ holds if and only if both $X_{1}\leq x$ and $X_{2}\leq x$ hold. This means that the event $\left\{ \max \left\{ X_{1},X_{2}\right\} \leq x\right\}$ coincides with the event $\left\{ X_{1}\leq x\right\} \cap \left\{ X_{2}\leq x\right\}.$ It follows by independence that

(2) $F_{\max \left\{ X_{1},X_{2}\right\} }\left( x\right) =P\left( \max \left\{ X_{1},X_{2}\right\} \leq x\right) =P\left( \left\{ X_{1}\leq x\right\} \cap \left\{ X_{2}\leq x\right\} \right)$

$=P(X_{1}\leq x)P\left( X_{2}\leq x\right) =F_{X_{1}}\left( x\right) F_{X_{2}}\left( x\right) .$

For $\min \left\{ X_{1},X_{2}\right\}$ we need one more trick, namely, pass to the complementary event by writing

$F_{\min \left\{ X_{1},X_{2}\right\} }\left(x\right) =P\left( \min \left\{ X_{1},X_{2}\right\} \leq x\right) =1-P\left(\min \left\{ X_{1},X_{2}\right\} >x\right) .$

Now we can use the fact that the event $\left\{ \min \left\{ X_{1},X_{2}\right\} >x\right\}$ coincides with the event $\left\{ X_{1}>x\right\} \cap \left\{ X_{2}>x\right\} .$ Hence, by independence

(3) $F_{\min \left\{ X_{1},X_{2}\right\} }\left( x\right) =1-P\left( \left\{X_{1}>x\right\} \cap \left\{ X_{2}>x\right\} \right) =1-P\left(X_{1}>x\right) P\left( X_{2}>x\right)$

$=1-\left[ 1-P\left( X_{1}\leq x\right) \right] \left[ 1-P\left( X_{2}\leq x\right) \right] =1-\left( 1-F_{X_{1}}\left( x\right) \right) \left(1-F_{X_{2}}\left( x\right) \right) .$

Equations (2) and (3) can be differentiated to obtain the links in terms of densities.
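Equations (2) and (3) are easy to confirm by simulation; a sketch with two independent uniforms on $(0,1)$, for which $F_{X_{1}}\left( x\right) =F_{X_{2}}\left( x\right) =x$:

```python
import random

random.seed(3)

# Check F_max(x) = F1(x)F2(x) and F_min(x) = 1 - (1-F1(x))(1-F2(x)) for two
# independent Uniform(0,1) variables, where F1(x) = F2(x) = x on (0,1).
trials = 100_000
pairs = [(random.random(), random.random()) for _ in range(trials)]

x = 0.7
emp_max = sum(max(a, b) <= x for a, b in pairs) / trials
emp_min = sum(min(a, b) <= x for a, b in pairs) / trials
print(round(emp_max, 3))   # theory: x^2 = 0.49
print(round(emp_min, 3))   # theory: 1 - (1 - x)^2 = 0.91
```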

27 Dec 22

## Final exam in Advanced Statistics ST2133, 2022

Unlike most UoL exams, here I tried to relate the theory to practical issues.

KBTU International School of Economics

Compiled by Kairat Mynbaev

The total for this exam is 41 points. You have two hours.

Everywhere provide detailed explanations. When answering please clearly indicate question numbers. You don’t need a calculator. As long as the formula you provide is correct, the numerical value does not matter.

Question 1. (12 points)

a) (2 points) At a casino, two players are playing on slot machines. Their payoffs $X,Y$ are standard normal and independent. Find the joint density of the payoffs.

b) (4 points) Two other players watch the first two players and start to argue about what will be larger: the sum $U = X + Y$ or the difference $V = X - Y$. Find the joint density of $U,V$. Are the variables $U,V$ independent? Find their marginal densities.

c) (2 points) Are $U,V$ normal? Why? What are their means and variances?

d) (2 points) Which probability is larger: $P(U > V)$ or $P\left( {U < V} \right)$?

e) (2 points) In this context interpret the conditional expectation $E\left( {U|V = v} \right)$. How much is it?

Reminder. The density of a normal variable $X \sim N\left( {\mu ,{\sigma ^2}} \right)$ is ${f_X}\left( x \right) = \frac{1}{{\sqrt {2\pi {\sigma ^2}} }}{e^{ - \frac{{{{\left( {x - \mu } \right)}^2}}}{{2{\sigma ^2}}}}}$.

Question 2. (9 points) The distribution of a call duration $X$ of one Kcell [largest mobile operator in KZ] customer is exponential: ${f_X}\left( x \right) = \lambda {e^{ - \lambda x}},\,\,x \ge 0,\,\,{f_X}\left( x \right) = 0,\,\,x < 0.$ The number $N$ of customers making calls simultaneously is distributed as Poisson: $P\left( {N = n} \right) = {e^{ - \mu }}\frac{{{\mu ^n}}}{{n!}},\,\,n = 0,1,2,...$ Thus the total call duration for all customers is ${S_N} = {X_1} + ... + {X_N}$ for $N \ge 1$. We put ${S_0} = 0$. Assume that customers make their decisions about calling independently.

a) (3 points) Find the general formula for the moment generating function of ${S_N}$ (for the case when ${X_1},{X_2},...$ are identically distributed and independent of $N$, not necessarily exponential and Poisson as above), explaining all steps.

b) (3 points) Find the moment generating functions of $X$, $N$ and ${S_N}$ for your particular distributions.

c) (3 points) Find the mean and variance of ${S_N}$. Based on the equations you obtained, can you suggest estimators of parameters $\lambda ,\mu$?

Remark. Direct observations on the exponential and Poisson distributions are not available. We have to infer their parameters by observing ${S_N}$. This explains the importance of the technique used in Question 2.
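The answers to Question 2c can be checked by simulation: for the exponential-Poisson pair, $ES_{N}=EN\cdot EX=\mu /\lambda$ and $Var\left( S_{N}\right) =EN\cdot Var\left( X\right) +\left( EX\right) ^{2}Var\left( N\right) =2\mu /\lambda ^{2}.$ The parameter values in the sketch below are made-up illustrations.

```python
import math
import random

random.seed(7)
lam, mu = 2.0, 3.0                 # assumed exponential rate and Poisson mean

def poisson(mu):
    """Poisson draw via Knuth's multiplication method (fine for small mu)."""
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p < L:
            return k
        k += 1

def S_N():
    # S_N = X_1 + ... + X_N with N ~ Poisson(mu), X_i ~ Exp(lam); S_0 = 0
    return sum(random.expovariate(lam) for _ in range(poisson(mu)))

trials = 200_000
s = [S_N() for _ in range(trials)]
mean = sum(s) / trials
var = sum((v - mean) ** 2 for v in s) / trials
print(round(mean, 3), mu / lam)            # E S_N = mu/lam = 1.5
print(round(var, 3), 2 * mu / lam ** 2)    # Var S_N = 2 mu/lam^2 = 1.5
```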

Question 3. (8 points)

a) (2 points) For a non-negative random variable $X$ prove the Markov inequality $P\left( {X > c} \right) \le \frac{1}{c}EX,\,\,\,c > 0.$

b) (2 points) Prove the Chebyshev inequality $P\left( {|X - EX| > c} \right) \le \frac{1}{c^2}Var\left( X \right)$ for an arbitrary random variable $X$.

c) (4 points) We say that the sequence of random variables $\left\{ X_n \right\}$ converges in probability to a random variable $X$ if $P\left( {|{X_n} - X| > \varepsilon } \right) \to 0$ as $n \to \infty$ for any $\varepsilon > 0$.  Suppose that $E{X_n} = \mu$ for all $n$ and that $Var\left(X_n \right) \to 0$ as $n \to \infty$. Prove that then $\left\{X_n\right\}$ converges in probability to $\mu$.

Remark. Question 3 leads to the simplest example of a law of large numbers: if $\left\{ X_n \right\}$ are i.i.d. with finite variance, then their sample mean converges to their population mean in probability.
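The statement of Question 3c can be illustrated numerically: for the sample mean of $n$ i.i.d. uniforms, $Var\left( X_{n}\right) =1/\left( 12n\right) \rightarrow 0$, so the exceedance probability shrinks. The tolerance $\varepsilon$ and grid of $n$ below are arbitrary choices.

```python
import random

random.seed(11)

# X_n = mean of n i.i.d. Uniform(0,1) draws has E X_n = 1/2 and
# Var(X_n) = 1/(12n) -> 0, so P(|X_n - 1/2| > eps) -> 0.
eps, trials = 0.05, 2_000
results = {}
for n in (10, 100, 1000):
    exceed = 0
    for _ in range(trials):
        xbar = sum(random.random() for _ in range(n)) / n
        exceed += abs(xbar - 0.5) > eps
    results[n] = exceed / trials
print(results)   # the exceedance frequency shrinks as n grows
```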

Question 4. (8 points)

a) (4 points) Define a distribution function. Give its properties, with intuitive explanations.

b) (4 points) Is a sum of two distribution functions a distribution function? Is a product of two distribution functions a distribution function?

Remark. The answer for part a) is here and the one for part b) is based on it.

Question 5. (4 points) The Rakhat factory prepares prizes for kids for the upcoming New Year event. Each prize contains one type of chocolates and one type of candies. The chocolates and candies are chosen randomly from two production lines, the total number of items is always 10 and all selections are equally likely.

a) (2 points) What proportion of prepared prizes contains three or more chocolates?

b) (2 points) 100 prizes have been sent to an orphanage. What is the probability that 50 of those prizes contain no more than two chocolates?

24 Oct 22

## A problem to do once and never come back

There is a problem I gave on the midterm that does not require much imagination: just know the definitions and do the technical work. I was hoping we could put it behind us, but it turned out we could not, and thus you see this post.

Problem. Suppose the joint density of variables $X,Y$ is given by

$f_{X,Y}(x,y)=\left\{ \begin{array}{cl}k\left( e^{x}+e^{y}\right) & \text{for }0<y<x<1, \\ 0 & \text{otherwise.}\end{array}\right.$

I. Find $k$.

II. Find marginal densities of $X,Y$. Are $X,Y$ independent?

III. Find conditional densities $f_{X|Y},\ f_{Y|X}$.

IV. Find $EX,\ EY$.

When solving a problem like this, the first thing to do is to give the theory. You may not be able to finish the long calculations without errors, but your grade will be determined by the opening theoretical remarks.

### I. Finding the normalizing constant

Any density should satisfy the completeness axiom: the area under the density curve (or, in this case, the volume under the density surface) must be equal to one: $\int \int f_{X,Y}(x,y)dxdy=1.$ The constant $k$ chosen to satisfy this condition is called a normalizing constant. The integration in general is over the whole plane $R^{2}$, and the first task is to express the above integral as an iterated integral. This is where the domain where the density is not zero should be taken into account. There is little you can do without geometry. One example of how to do this is here.

The shape of the area $A=\left\{ (x,y):0<y<x<1\right\}$ is determined by a) the extreme values of $x,y$ and b) the relationship between them. The extreme values are 0 and 1 for both $x$ and $y$, meaning that $A$ is contained in the square $\left\{ (x,y):0\leq x,y\leq 1\right\} .$ The inequality $y<x$ means that we cut out of this square the triangle below the line $y=x$ (it is really the lower triangle because if from a point on the line $y=x$ we move down vertically, $x$ will stay the same and $y$ will become smaller than $x$).

In the iterated integral:

a) the lower and upper limits of integration for the inner integral are the boundaries for the inner variable; they may depend on the outer variable but not on the inner variable.

b) the lower and upper limits of integration for the outer integral are the extreme values for the outer variable; they must be constant.

This is illustrated in Pane A of Figure 1.

Figure 1. Integration order

Always take the inner integral in parentheses to show that you are dealing with an iterated integral.

a) In the inner integral integrating over $x$ means moving along blue arrows from the boundary $x=y$ to the boundary $x=1.$ The boundaries may depend on $y$ but not on $x$ because the outer integral is over $y.$

b) In the outer integral put the extreme values for the outer variable. Thus,

$\underset{A}{\int \int }f_{X,Y}(x,y)dxdy=\int_{0}^{1}\left(\int_{y}^{1}f_{X,Y}(x,y)dx\right) dy.$

Check that if we first integrate over $y$ (vertically along red arrows, see Pane B in Figure 1) then the equation

$\underset{A}{\int \int }f_{X,Y}(x,y)dxdy=\int_{0}^{1}\left(\int_{0}^{x}f_{X,Y}(x,y)dy\right) dx$

results.

In fact, from the definition $A=\left\{ (x,y):0<y<x<1\right\}$ one can see that the inner interval for $x$ is $\left[ y,1\right]$ and for $y$ it is $\left[ 0,x\right] .$
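For this particular integrand the inner integrals can be computed in closed form, so both integration orders are easy to verify numerically; a sketch (the midpoint rule and grid size are arbitrary choices):

```python
import numpy as np

# The inner integrals in closed form:
#   inner over x (Pane A): int_y^1 (e^x + e^y) dx = e - y e^y
#   inner over y (Pane B): int_0^x (e^x + e^y) dy = x e^x + e^x - 1
# Integrate each over the outer variable with a midpoint rule; both orders
# must give the same number (it equals e - 1).
m = 100_000
g = (np.arange(m) + 0.5) / m          # midpoints of a grid on (0, 1)

order_A = np.mean(np.e - g * np.exp(g))            # outer integral over y
order_B = np.mean(g * np.exp(g) + np.exp(g) - 1)   # outer integral over x
print(round(order_A, 6), round(order_B, 6), round(np.e - 1, 6))
```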

### II. Marginal densities

I can't say about this more than I said here.

The condition for independence of $X,Y$ is $f_{X,Y}\left( x,y\right) =f_{X}\left( x\right) f_{Y}\left( y\right)$ (this is a direct analog of the independence condition for events $P\left( A\cap B\right) =P\left( A\right) P\left( B\right)$). In words: the joint density decomposes into a product of individual densities.

### III. Conditional densities

In this case the easiest is to recall the definition of conditional probability $P\left( A|B\right) =\frac{P\left( A\cap B\right) }{P\left(B\right) }.$ The definition of conditional densities $f_{X|Y},\ f_{Y|X}$ is quite similar:

(2) $f_{X|Y}\left( x|y\right) =\frac{f_{X,Y}\left( x,y\right) }{f_{Y}\left( y\right) },\ f_{Y|X}\left( y|x\right) =\frac{f_{X,Y}\left( x,y\right) }{f_{X}\left( x\right) }$.

Of course, $f_{Y}\left( y\right)$ and $f_{X}\left( x\right)$ here are the marginal densities, obtained by integrating the other variable out of the joint density.

### IV. Finding expected values of $X,Y$

The usual definition $EX=\int xf_{X}\left( x\right) dx$ takes an equivalent form using the marginal density:

$EX=\int x\left( \int f_{X,Y}\left( x,y\right) dy\right) dx=\int \int xf_{X,Y}\left( x,y\right) dydx.$

Which equation to use is a matter of convenience.

Another replacement in the usual definition gives the definition of conditional expectations:

$E\left( X|Y\right) =\int xf_{X|Y}\left( x|y\right) dx,$ $E\left( Y|X\right) =\int yf_{Y|X}\left( y|x\right) dy.$

Note that these are random variables: $E\left( X|Y=y\right)$ depends on $y$ and $E\left( Y|X=x\right)$ depends on $x.$

### Solution to the problem

Being a lazy guy, for the problem this post is about I provide answers found in Mathematica:

I. $k=\frac{1}{e-1}\approx 0.581977$

II. $f_{X}\left( x\right) =k\left( -1+e^{x}\left( 1+x\right) \right)$ for $x\in \left[ 0,1\right] ,$ $f_{Y}\left( y\right) =k\left( e-e^{y}y\right)$ for $y\in \left[ 0,1\right]$ (the factor $k$ is needed for each marginal to integrate to one).

It is readily seen that the independence condition is not satisfied.

III. $f_{X|Y}\left( x|y\right) =\frac{e^{x}+e^{y}}{e-e^{y}y}$ for $0<y<x<1$ (the constant $k$ cancels in the ratio),

$f_{Y|X}\left( y|x\right) =\frac{e^{x}+e^{y}}{-1+e^{x}\left( 1+x\right) }$ for $0<y<x<1.$

IV. $EX=0.709012,$ $EY=0.372965.$
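These Mathematica numbers are easy to double-check with a crude Riemann sum over the triangle; a sketch (the grid size is an arbitrary choice):

```python
import numpy as np

# Recover k and EX directly from the joint density k(e^x + e^y) on the
# triangle {0 < y < x < 1} with a midpoint Riemann sum.
m = 1000
g = (np.arange(m) + 0.5) / m
X, Y = np.meshgrid(g, g, indexing='ij')   # X[i,j] = g[i], Y[i,j] = g[j]

raw = (np.exp(X) + np.exp(Y)) * (Y < X)   # density without k, zero off the triangle
k = 1 / (raw.sum() / m ** 2)              # normalizing constant, ~0.581977
EX = (X * raw).sum() / m ** 2 * k         # ~0.709012
print(round(k, 4), round(EX, 4))
```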

24 Oct 22

## Marginal probabilities and densities

This is to help everybody, from those who study Basic Statistics up to Advanced Statistics ST2133.

### Discrete case

Suppose in a box we have coins and banknotes of only two denominations: \$1 and \$5 (see Figure 1).

Figure 1. Illustration of two variables

We pull one out randomly. The division of cash by type (coin or banknote) divides the sample space (shown as a square, lower left picture) with probabilities $p_{c}$ and $p_{b}$ (they sum to one). The division by denomination (\$1 or \$5) divides the same sample space differently, see the lower right picture, with the probabilities of pulling out \$1 and \$5 equal to $p_{1}$ and $p_{5}$, respectively (they also sum to one). This is summarized in the tables

| Variable 1: Cash type | Prob |
|---|---|
| coin | $p_{c}$ |
| banknote | $p_{b}$ |

| Variable 2: Denomination | Prob |
|---|---|
| \$1 | $p_{1}$ |
| \$5 | $p_{5}$ |

Now we can consider joint events and probabilities (see Figure 2, where the two divisions are combined).

Figure 2. Joint probabilities

For example, if we pull out a random item, it can be a coin of denomination \$1, and the corresponding probability is $P\left( \text{item}=\text{coin},\ \text{value}=\$1\right) =p_{c1}.$ The two divisions of the sample space generate a new division into four parts. Then geometrically it is obvious that we have four identities:

Adding over denominations: $p_{c1}+p_{c5}=p_{c},$ $p_{b1}+p_{b5}=p_{b},$

Adding over cash types: $p_{c1}+p_{b1}=p_{1},$ $p_{c5}+p_{b5}=p_{5}.$

Formally, here we use additivity of probability for disjoint events

$P\left( A\cup B\right) =P\left( A\right) +P\left( B\right) .$

In words: we can recover the individual (marginal) probabilities of variables 1 and 2 from the joint probabilities.

### Generalization

Suppose we have two discrete random variables $X,Y$ taking values $x_{1},...,x_{n}$ and $y_{1},...,y_{m},$ resp., and their own probabilities are $P\left( X=x_{i}\right) =p_{i}^{X},$ $P\left(Y=y_{j}\right) =p_{j}^{Y}.$ Denote the joint probabilities $P\left(X=x_{i},Y=y_{j}\right) =p_{ij}.$ Then we have the identities

(1) $\sum_{j=1}^mp_{ij}=p_{i}^{X},$ $\sum_{i=1}^np_{ij}=p_{j}^{Y}$ ($n+m$ equations).

In words: to obtain the marginal probability of one variable (say, $Y$) sum over the values of the other variable (in this case, $X$).

The name marginal probabilities is used for $p_{i}^{X},p_{j}^{Y}$ because in the two-dimensional table they arise as a result of summing table entries along columns or rows and are displayed in the margins.
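In code, marginal probabilities are just row and column sums of the joint table; a sketch with a made-up joint distribution:

```python
import numpy as np

# Joint probabilities for the cash example: rows = cash type (coin, banknote),
# columns = denomination ($1, $5). The numbers are made up but sum to one.
joint = np.array([[0.30, 0.10],    # p_c1, p_c5
                  [0.25, 0.35]])   # p_b1, p_b5

p_type = joint.sum(axis=1)    # add over denominations: p_c, p_b
p_denom = joint.sum(axis=0)   # add over cash types: p_1, p_5
print(p_type, p_denom)        # marginals [0.4 0.6] and [0.55 0.45]
```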

### Analogs for continuous variables with densities

Suppose we have two continuous random variables $X,Y$ and their own densities are $f_{X}$ and $f_{Y}.$ Denote the joint density $f_{X,Y}$. Then replacing in (1) sums by integrals and probabilities by densities we get

(2) $\int_R f_{X,Y}\left( x,y\right) dy=f_{X}\left( x\right) ,\ \int_R f_{X,Y}\left( x,y\right) dx=f_{Y}\left( y\right) .$

In words: to obtain one marginal density (say, $f_{Y}$) integrate out the other variable (in this case, $x$).

5 May 22

## Vector autoregression (VAR)

Suppose we are observing two stocks and their respective returns are $x_{t},y_{t}.$ To take into account their interdependence, we consider a vector autoregression

(1) $\left\{\begin{array}{c} x_{t}=a_{1}x_{t-1}+b_{1}y_{t-1}+u_{t} \\ y_{t}=a_{2}x_{t-1}+b_{2}y_{t-1}+v_{t}\end{array}\right.$

Try to repeat for this system the analysis from Section 3.5 (Application to an AR(1) process) of the Guide by A. Patton and you will see that the difficulties are insurmountable. However, matrix algebra allows one to overcome them, with proper adjustment.

### Problem

A) Write this system in a vector format

(2) $Y_{t}=\Phi Y_{t-1}+U_{t}.$

What should be $Y_{t},\Phi ,U_{t}$ in this representation?

B) Assume that the error $U_{t}$ in (1) satisfies

(3) $E_{t-1}U_{t}=0,\ EU_{t}U_{t}^{T}=\Sigma ,~EU_{t}U_{s}^{T}=0$ for $t\neq s$ with some symmetric matrix $\Sigma =\left(\begin{array}{cc}\sigma _{11} & \sigma _{12} \\\sigma _{12} & \sigma _{22} \end{array}\right) .$

What does this assumption mean in terms of the components of $U_{t}$ from (2)? What is $\Sigma$ if the errors in (1) satisfy

(4) $E_{t-1}u_{t}=E_{t-1}v_{t}=0,~Eu_{t}^{2}=Ev_{t}^{2}=\sigma ^{2},$ $Eu_{s}u_{t}=Ev_{s}v_{t}=0$ for $t\neq s,$ $Eu_{s}v_{t}=0$ for all $s,t?$

C) Suppose (1) is stationary. The stationarity condition is expressed in terms of eigenvalues of $\Phi$ but we don't need it. However, we need its implication:

(5) $\det \left( I-\Phi \right) \neq 0$.

Find $\mu =EY_{t}.$

D) Find $Cov(Y_{t-1},U_{t}).$

E) Find $\gamma _{0}\equiv V\left( Y_{t}\right) .$

F) Find $\gamma _{1}=Cov(Y_{t},Y_{t-1}).$

G) Find $\gamma _{2}.$

Solution

A) It takes some practice to see that with the notation

$Y_{t}=\left(\begin{array}{c}x_{t} \\y_{t}\end{array}\right) ,$ $\Phi =\left(\begin{array}{cc} a_{1} & b_{1} \\a_{2} & b_{2}\end{array}\right) ,$ $U_{t}=\left( \begin{array}{c}u_{t} \\v_{t}\end{array}\right)$

the system (1) becomes (2).

B) The equations in (3) look like this:

$E_{t-1}U_{t}=\left(\begin{array}{c}E_{t-1}u_{t} \\ E_{t-1}v_{t}\end{array}\right) =0,$ $EU_{t}U_{t}^{T}=\left( \begin{array}{cc}Eu_{t}^{2} & Eu_{t}v_{t} \\Eu_{t}v_{t} & Ev_{t}^{2} \end{array}\right) =\left(\begin{array}{cc} \sigma _{11} & \sigma _{12} \\ \sigma _{12} & \sigma _{22}\end{array} \right) ,$

$EU_{t}U_{s}^{T}=\left(\begin{array}{cc} Eu_{t}u_{s} & Eu_{t}v_{s} \\Ev_{t}u_{s} & Ev_{t}v_{s} \end{array}\right) =0.$

Equalities of matrices are understood element-wise, so we get a series of scalar equations $E_{t-1}u_{t}=0,...,Ev_{t}v_{s}=0$ for $t\neq s.$

Conversely, the scalar equations from (4) give

$E_{t-1}U_{t}=0,\ EU_{t}U_{t}^{T}=\left(\begin{array}{cc} \sigma ^{2} & 0 \\0 & \sigma ^{2}\end{array} \right) ,~EU_{t}U_{s}^{T}=0$ for $t\neq s$.

C) (2) implies $EY_{t}=\Phi EY_{t-1}+EU_{t}=\Phi EY_{t-1}$ or by stationarity $\mu =\Phi \mu$ or $\left( I-\Phi \right) \mu =0.$ Hence (5) implies $\mu =0.$

D) From (2) we see that $Y_{t-1}$ is determined by information available at time $t-1$ (the information set $I_{t-1}$). Therefore by the LIE

$Cov(Y_{t-1},U_{t})=E\left( Y_{t-1}-EY_{t-1}\right) U_{t}^{T}=E\left[ \left( Y_{t-1}-EY_{t-1}\right) E_{t-1}U_{t}^{T}\right] =0,$

$Cov\left( U_{t},Y_{t-1}\right) =\left[ Cov(Y_{t-1},U_{t})\right] ^{T}=0.$

E) Using the previous post

$\gamma _{0}\equiv V\left( \Phi Y_{t-1}+U_{t}\right) =\Phi V\left( Y_{t-1}\right) \Phi ^{T}+Cov\left( U_{t},Y_{t-1}\right) \Phi ^{T}+\Phi Cov(Y_{t-1},U_{t})+V\left( U_{t}\right)$

$=\Phi \gamma _{0}\Phi ^{T}+\Sigma$

(by stationarity and (3)). Thus, $\gamma _{0}-\Phi \gamma _{0}\Phi ^{T}=\Sigma$ and $\gamma _{0}=\sum_{s=0}^{\infty }\Phi ^{s}\Sigma\left( \Phi ^{T}\right) ^{s}$ (see previous post).

F) Using the previous result we have

$\gamma _{1}=Cov(Y_{t},Y_{t-1})=Cov(\Phi Y_{t-1}+U_{t},Y_{t-1})=\Phi Cov(Y_{t-1},Y_{t-1})+Cov(U_{t},Y_{t-1})$

$=\Phi Cov(Y_{t-1},Y_{t-1})=\Phi \gamma _{0}=\Phi \sum_{s=0}^{\infty }\Phi ^{s}\Sigma\left( \Phi ^{T}\right) ^{s}.$

G) Similarly,

$\gamma _{2}=Cov(Y_{t},Y_{t-2})=Cov(\Phi Y_{t-1}+U_{t},Y_{t-2})=\Phi Cov(Y_{t-1},Y_{t-2})+Cov(U_{t},Y_{t-2})$

$=\Phi Cov(Y_{t-1},Y_{t-2})=\Phi \gamma _{1}=\Phi ^{2}\sum_{s=0}^{\infty }\Phi ^{s}\Sigma\left( \Phi ^{T}\right) ^{s}.$

Autocorrelations require a little more effort and I leave them out.
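The fixed-point property $\gamma _{0}-\Phi \gamma _{0}\Phi ^{T}=\Sigma$ from part E) is easy to verify numerically by truncating the series; the matrices $\Phi ,\Sigma$ below are made-up, with $\Phi$ chosen stable.

```python
import numpy as np

# Verify gamma_0 = sum_s Phi^s Sigma (Phi^T)^s satisfies
# gamma_0 - Phi gamma_0 Phi^T = Sigma. This Phi has eigenvalues 0.6 and 0.3,
# so the series converges and can be truncated.
Phi = np.array([[0.5, 0.2],
                [0.1, 0.4]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])

gamma0 = np.zeros((2, 2))
P = np.eye(2)
for _ in range(200):          # partial sum of the series
    gamma0 += P @ Sigma @ P.T
    P = P @ Phi

resid = gamma0 - Phi @ gamma0 @ Phi.T - Sigma
print(np.abs(resid).max())    # ~0
```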

5 May 22

## Vector autoregressions: preliminaries

Suppose we are observing two stocks and their respective returns are $x_{t},y_{t}.$ A vector autoregression for the pair $x_{t},y_{t}$ is one way to take into account their interdependence. This theory is undeservedly omitted from the Guide by A. Patton.

### Required minimum in matrix algebra

Matrix notation and summation are very simple.

Matrix multiplication is a little more complex. Make sure to read Global idea 2 and the compatibility rule.

The general approach to study matrices is to compare them to numbers. Here you see the first big No: matrices do not commute, that is, in general $AB\neq BA.$

The idea behind matrix inversion is pretty simple: we want an analog of the property $a\times \frac{1}{a}=1$ that holds for numbers.

Some facts about determinants have very complicated proofs and it is best to stay away from them. But a couple of ideas should be clear from the very beginning. Determinants are defined only for square matrices. The relationship of determinants to matrix invertibility explains the role of determinants. If $A$ is square, it is invertible if and only if $\det A\neq 0$ (this is an equivalent of the condition $a\neq 0$ for numbers).

Here is an illustration of how determinants are used. Suppose we need to solve the equation $AX=Y$ for $X,$ where $A$ and $Y$ are known. Assuming that $\det A\neq 0$ we can premultiply the equation by $A^{-1}$ to obtain $A^{-1}AX=A^{-1}Y.$ (Because of lack of commutativity, we need to keep the order of the factors). Using intuitive properties $A^{-1}A=I$ and $IX=X$ we obtain the solution: $X=A^{-1}Y.$ In particular, we see that if $\det A\neq 0,$ then the equation $AX=0$ has a unique solution $X=0.$

Let $A$ be a square matrix and let $X,Y$ be two vectors. $A,Y$ are assumed to be known and $X$ is unknown. We want to check that $X=\sum_{s=0}^{\infty }A^{s}Y\left( A^{T}\right) ^{s}$ solves the equation $X-AXA^{T}=Y$, assuming the series converges (which it does when the powers of $A$ decay sufficiently fast). (Note that for this equation the trick used to solve $AX=Y$ does not work.) Just plug $X:$

$\sum_{s=0}^{\infty }A^{s}Y\left( A^{T}\right) ^{s}-A\sum_{s=0}^{\infty }A^{s}Y\left( A^{T}\right) ^{s}A^{T}$

$=Y+\sum_{s=1}^{\infty }A^{s}Y\left(A^{T}\right) ^{s}-\sum_{s=1}^{\infty }A^{s}Y\left( A^{T}\right) ^{s}=Y$

(write out a couple of first terms in the sums if summation signs frighten you).
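A numeric check of this plug-in argument; the matrices $A,Y$ are made up, with $A$ small enough for the series to converge.

```python
import numpy as np

# Plug X = sum_s A^s Y (A^T)^s into X - A X A^T and check the result is Y.
# A is chosen with small entries (Gershgorin circles inside the unit disk),
# so the series converges.
A = np.array([[0.20, 0.10, 0.00],
              [0.00, 0.30, 0.10],
              [0.10, 0.00, 0.25]])
Y = np.array([[1.0, 0.5, 0.0],
              [0.5, 2.0, 0.3],
              [0.0, 0.3, 1.5]])

X = np.zeros((3, 3))
P = np.eye(3)
for _ in range(300):          # partial sum of the series
    X += P @ Y @ P.T
    P = P @ A

resid = np.abs(X - A @ X @ A.T - Y).max()
print(resid)                  # ~0
```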

Transposition is a geometrically simple operation. We need only the property $\left( AB\right) ^{T}=B^{T}A^{T}.$

### Variance and covariance

Property 1. Variance of a random vector $X$ and covariance of two random vectors $X,Y$ are defined by

$V\left( X\right) =E\left( X-EX\right) \left( X-EX\right) ^{T},$ $Cov\left( X,Y\right) =E\left( X-EX\right) \left( Y-EY\right) ^{T},$

respectively.

Note that when $EX=0,$ variance becomes

$V\left( X\right) =EXX^{T}=\left( \begin{array}{ccc}EX_{1}^{2} & ... & EX_{1}X_{n} \\ ... & ... & ... \\ EX_{1}X_{n} & ... & EX_{n}^{2}\end{array}\right) .$

Property 2. Let $X,Y$ be random vectors and suppose $A,B$ are constant matrices. We want an analog of $V\left( aX+bY\right) =a^{2}V\left( X\right) +2ab\,Cov\left( X,Y\right) +b^{2}V\left( Y\right) .$ In the next calculation we have to remember that the multiplication order cannot be changed.

$V\left( AX+BY\right) =E\left[ AX+BY-E\left( AX+BY\right) \right] \left[ AX+BY-E\left( AX+BY\right) \right] ^{T}$

$=E\left[ A\left( X-EX\right) +B\left( Y-EY\right) \right] \left[ A\left( X-EX\right) +B\left( Y-EY\right) \right] ^{T}$

$=E\left[ A\left( X-EX\right) \right] \left[ A\left( X-EX\right) \right] ^{T}+E\left[ B\left( Y-EY\right) \right] \left[ A\left( X-EX\right) \right] ^{T}$

$+E\left[ A\left( X-EX\right) \right] \left[ B\left( Y-EY\right) \right] ^{T}+E\left[ B\left( Y-EY\right) \right] \left[ B\left( Y-EY\right) \right] ^{T}$

(applying $\left( AB\right) ^{T}=B^{T}A^{T}$)

$=AE\left( X-EX\right) \left( X-EX\right) ^{T}A^{T}+BE\left( Y-EY\right) \left( X-EX\right) ^{T}A^{T}$

$+AE\left( X-EX\right) \left( Y-EY\right) ^{T}B^{T}+BE\left( Y-EY\right) \left( Y-EY\right) ^{T}B^{T}$

$=AV\left( X\right) A^{T}+BCov\left( Y,X\right) A^{T}+ACov(X,Y)B^{T}+BV\left( Y\right) B^{T}.$
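This identity is purely algebraic, so it can be checked mechanically. A sketch with NumPy (the joint covariance matrix and the matrices $A,B$ are made-up):

```python
import numpy as np

# Made-up covariance of the stacked vector Z = (X, Y), with X, Y in R^2
Sigma = np.array([[2.0, 0.5, 0.3, 0.1],
                  [0.5, 1.0, 0.2, 0.0],
                  [0.3, 0.2, 1.5, 0.4],
                  [0.1, 0.0, 0.4, 1.0]])
Vx  = Sigma[:2, :2]   # V(X)
Vy  = Sigma[2:, 2:]   # V(Y)
Cxy = Sigma[:2, 2:]   # Cov(X, Y); Cov(Y, X) is its transpose

A = np.array([[1.0, 2.0], [0.0, 1.0]])
B = np.array([[0.5, 0.0], [1.0, 1.0]])

# V(AX + BY) computed directly as [A B] Sigma [A B]^T ...
C = np.hstack([A, B])
direct = C @ Sigma @ C.T

# ... and via the formula derived above
formula = A @ Vx @ A.T + B @ Cxy.T @ A.T + A @ Cxy @ B.T + B @ Vy @ B.T
```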

22
Mar 22

## Blueprint for exam versions

This is the exam I administered in my class in Spring 2022. By replacing the Poisson distribution with other random variables the UoL examiners can obtain a large variety of versions with which to torture Advanced Statistics students. On the other hand, for the students the answers below can be a blueprint to fend off any assaults.

During the semester my students were encouraged to analyze and collect information in documents typed in Scientific Word or LyX. The exam was an open-book online assessment. Papers typed in Scientific Word or LyX were preferred and copying from previous analysis was welcomed. This policy would be my preference if I were to study a subject as complex as Advanced Statistics. The students were given just two hours on the assumption that they had done the preparations diligently. Below I give the model answers right after the questions.

## Midterm Spring 2022

You have to clearly state all required theoretical facts. Number all equations that you need to use in later calculations and reference them as necessary. Answer the questions in the order they are asked. When you don't know the answer, leave some space. For each unexplained fact I subtract one point. Put your name in the file name.

In questions 1-9 $X$ is the Poisson variable.

### Question 1

Define $X$ and derive the population mean and population variance of the sum $S_{n}=\sum_{i=1}^{n}X_{i}$ where $X_{1},...,X_{n}$ is an i.i.d. sample from $X$.

Answer. $X$ is defined by $P\left( X=x\right) =e^{-\lambda }\frac{\lambda ^{x}}{x!},\ x=0,1,...$ Using $EX=\lambda$ and $Var\left( X\right) =\lambda$ (ST2133 p.80) we have

$ES_{n}=\sum EX_{i}=n\lambda ,$ $Var\left( S_{n}\right) =\sum Var\left( X_{i}\right) =n\lambda$

(by independence and identical distribution). [Some students derived $EX=\lambda ,$ $Var\left( X\right) =\lambda$ instead of the respective equations for the sum $S_{n}$].
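A quick Monte Carlo sanity check of $ES_{n}=n\lambda$ and $Var\left( S_{n}\right) =n\lambda$ (the values of $\lambda ,$ $n$ and the number of simulations are made-up):

```python
import numpy as np

# Simulate many copies of S_n for a Poisson sample (made-up lambda and n)
rng = np.random.default_rng(42)
lam, n, nsim = 3.0, 10, 200_000

sums = rng.poisson(lam, size=(nsim, n)).sum(axis=1)  # nsim draws of S_n

mean_Sn = sums.mean()  # should be close to n*lam = 30
var_Sn = sums.var()    # should also be close to n*lam = 30
```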

### Question 2

Derive the MGF of the standardized sample mean.

Answer. Knowing this derivation is a must because it is a combination of three important facts.

a) Let $z_{n}=\frac{\bar{X}-E\bar{X}}{\sigma \left( \bar{X}\right) }.$ Then $z_{n}=\frac{S_{n}/n-E\left( S_{n}/n\right) }{\sigma \left( S_{n}/n\right) }=\frac{S_{n}-ES_{n}}{\sigma \left( S_{n}\right) },$ so standardizing $\bar{X}$ and $S_{n}$ gives the same result.

b) The MGF of $S_{n}$ is expressed through the MGF of $X$:

$M_{S_{n}}\left( t\right) =Ee^{S_{n}t}=Ee^{X_{1}t+...+X_{n}t}=Ee^{X_{1}t}...e^{X_{n}t}=$

(independence) $=Ee^{X_{1}t}...Ee^{X_{n}t}=$ (identical distribution) $=\left[ M_{X}\left( t\right) \right] ^{n}.$

c) If $X$ is a linear transformation of $Y,$ $X=a+bY,$ then

$M_{X}\left( t\right) =Ee^{Xt}=Ee^{\left( a+bY\right) t}=e^{at}Ee^{Y\left( bt\right) }=e^{at}M_{Y}\left( bt\right) .$

When answering the question we assume any i.i.d. sample from a population with mean $\mu$ and population variance $\sigma ^{2}$:

Putting in c) $a=-\frac{ES_{n}}{\sigma \left( S_{n}\right) },$ $b=\frac{1}{\sigma \left( S_{n}\right) }$ and using a) we get

$M_{z_{n}}\left( t\right) =E\exp \left( \frac{S_{n}-ES_{n}}{\sigma \left( S_{n}\right) }t\right) =e^{-ES_{n}t/\sigma \left( S_{n}\right) }M_{S_{n}}\left( t/\sigma \left( S_{n}\right) \right)$

(using b) and $ES_{n}=n\mu ,$ $Var\left( S_{n}\right) =n\sigma ^{2}$)

$=e^{-ES_{n}t/\sigma \left( S_{n}\right) }\left[ M_{X}\left( t/\sigma \left( S_{n}\right) \right) \right] ^{n}=e^{-n\mu t/\left( \sqrt{n}\sigma \right) }\left[ M_{X}\left( t/\left( \sqrt{n}\sigma \right) \right) \right] ^{n}.$

This is a general result which for the Poisson distribution can be specified as follows. From ST2133, example 3.38 we know that $M_{X}\left( t\right)=\exp \left( \lambda \left( e^{t}-1\right) \right)$. Since here $\mu =\sigma ^{2}=\lambda$ and therefore $\sigma \left( S_{n}\right) =\sqrt{n\lambda },$ we obtain

$M_{z_{n}}\left( t\right) =e^{-\sqrt{n\lambda }t}\left[ \exp \left( \lambda \left( e^{t/\sqrt{n\lambda }}-1\right) \right) \right] ^{n}=e^{-t\sqrt{n\lambda }+n\lambda \left( e^{t/\sqrt{n\lambda }}-1\right) }.$

[Instead of $M_{z_n}$ some students gave $M_X$.]
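A numerical sketch (the values of $\lambda$ and $t$ are made-up) showing that this $M_{z_{n}}\left( t\right)$ approaches $e^{t^{2}/2}$ as $n$ grows, which is what the central limit theorem suggests:

```python
import math

# MGF of the standardized sample mean for the Poisson case,
# using sigma(S_n) = sqrt(n*lambda)
def mgf_zn(t, n, lam):
    s = math.sqrt(n * lam)  # sigma(S_n)
    return math.exp(-s * t + n * lam * (math.exp(t / s) - 1.0))

lam, t = 2.0, 0.7            # made-up values
limit = math.exp(t**2 / 2)   # MGF of the standard normal at t

err_small_n = abs(mgf_zn(t, 10, lam) - limit)
err_large_n = abs(mgf_zn(t, 100_000, lam) - limit)
```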

### Question 3

Derive the cumulant generating function of the standardized sample mean.

Answer. Again, there are a couple of useful general facts.

I) Decomposition of MGF around zero. The series $e^{x}=\sum_{i=0}^{\infty } \frac{x^{i}}{i!}$ leads to

$M_{X}\left( t\right) =Ee^{tX}=E\left( \sum_{i=0}^{\infty }\frac{t^{i}X^{i}}{ i!}\right) =\sum_{i=0}^{\infty }\frac{t^{i}}{i!}E\left( X^{i}\right) =\sum_{i=0}^{\infty }\frac{t^{i}}{i!}\mu _{i}$

where $\mu _{i}=E\left( X^{i}\right)$ are moments of $X$ and $\mu _{0}=EX^{0}=1.$ Differentiating this equation yields

$M_{X}^{(k)}\left( t\right) =\sum_{i=k}^{\infty }\frac{t^{i-k}}{\left( i-k\right) !}\mu _{i}$

and setting $t=0$ gives the rule for finding moments from MGF: $\mu _{k}=M_{X}^{(k)}\left( 0\right) .$
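The rule $\mu _{k}=M_{X}^{(k)}\left( 0\right)$ is easy to verify with a computer algebra system. A sketch with SymPy for the Poisson case, recovering $EX=\lambda$ and $Var\left( X\right) =\lambda$:

```python
import sympy as sp

t, lam = sp.symbols('t lam', positive=True)

# Poisson MGF (ST2133, example 3.38)
M = sp.exp(lam * (sp.exp(t) - 1))

# Moments from the MGF: mu_k = M^{(k)}(0)
mu1 = sp.simplify(sp.diff(M, t, 1).subs(t, 0))  # EX = lambda
mu2 = sp.simplify(sp.diff(M, t, 2).subs(t, 0))  # EX^2 = lambda + lambda^2
variance = sp.simplify(mu2 - mu1**2)            # Var(X) = lambda
```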

II) Decomposition of the cumulant generating function around zero. $K_{X}\left( t\right) =\log M_{X}\left( t\right)$ can also be decomposed into its Taylor series:

$K_{X}\left( t\right) =\sum_{i=0}^{\infty }\frac{t^{i}}{i!}\kappa _{i}$

where the coefficients $\kappa _{i}$ are called cumulants and can be found using $\kappa _{k}=K_{X}^{(k)}\left( 0\right)$. Since

$K_{X}^{\prime }\left( t\right) =\frac{M_{X}^{\prime }\left( t\right) }{ M_{X}\left( t\right) }$ and $K_{X}^{\prime \prime }\left( t\right) =\frac{ M_{X}^{\prime \prime }\left( t\right) M_{X}\left( t\right) -\left( M_{X}^{\prime }\left( t\right) \right) ^{2}}{M_{X}^{2}\left( t\right) }$

we have

$\kappa _{0}=\log M_{X}\left( 0\right) =0,$ $\kappa _{1}=\frac{M_{X}^{\prime }\left( 0\right) }{M_{X}\left( 0\right) }=\mu _{1},$

$\kappa _{2}=\mu _{2}-\mu _{1}^{2}=EX^{2}-\left( EX\right) ^{2}=Var\left( X\right) .$

Thus, for any random variable $X$ with mean $\mu$ and variance $\sigma ^{2}$ we have

$K_{X}\left( t\right) =\mu t+\frac{\sigma ^{2}t^{2}}{2}+$ terms of higher order for $t$ small.

III) If $X=a+bY$ then by c)

$K_{X}\left( t\right) =K_{a+bY}\left( t\right) =\log \left[ e^{at}M_{Y}\left( bt\right) \right] =at+K_{Y}\left( bt\right) .$

IV) By b)

$K_{S_{n}}\left( t\right) =\log \left[ M_{X}\left( t\right) \right] ^{n}=nK_{X}\left( t\right) .$

Using III), $z_{n}=\frac{S_{n}-ES_{n}}{\sigma \left( S_{n}\right) }$ and then IV) we have

$K_{z_{n}}\left( t\right) =\frac{-ES_{n}}{\sigma \left( S_{n}\right) } t+K_{S_{n}}\left( \frac{t}{\sigma \left( S_{n}\right) }\right) =\frac{-ES_{n} }{\sigma \left( S_{n}\right) }t+nK_{X}\left( \frac{t}{\sigma \left( S_{n}\right) }\right) .$

For the last term on the right we use the approximation around zero from II):

$K_{z_{n}}\left( t\right) =\frac{-ES_{n}}{\sigma \left( S_{n}\right) } t+nK_{X}\left( \frac{t}{\sigma \left( S_{n}\right) }\right) \approx \frac{ -ES_{n}}{\sigma \left( S_{n}\right) }t+n\mu \frac{t}{\sigma \left( S_{n}\right) }+n\frac{\sigma ^{2}}{2}\left( \frac{t}{\sigma \left( S_{n}\right) }\right) ^{2}$

$=-\frac{n\mu }{\sqrt{n}\sigma }t+n\mu \frac{t}{\sqrt{n}\sigma }+n\frac{ \sigma ^{2}}{2}\left( \frac{t}{\sigma \left( S_{n}\right) }\right) ^{2}=t^{2}/2.$

[Important. Why are the above steps necessary? Passing from the series $M_{X}\left( t\right) =\sum_{i=0}^{\infty }\frac{t^{i}}{i!}\mu _{i}$ to the series for $K_{X}\left( t\right) =\log M_{X}\left( t\right)$ is not straightforward and can easily lead to errors. In the Poisson case it is not advisable to derive $K_{z_{n}}$ directly from $M_{z_{n}}\left( t\right) =e^{-t\sqrt{n\lambda }+n\lambda \left( e^{t/\sqrt{n\lambda }}-1\right) }$.]
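That said, a computer algebra system can expand $K_{z_{n}}$ without the error-prone manual steps. A SymPy sketch for the Poisson case, where $\sigma \left( S_{n}\right) =\sqrt{n\lambda }$:

```python
import sympy as sp

t, n, lam = sp.symbols('t n lam', positive=True)

# Cumulant generating function of z_n in the Poisson case
K = -sp.sqrt(n * lam) * t + n * lam * (sp.exp(t / sp.sqrt(n * lam)) - 1)

# Taylor expansion in t around 0, up to (and excluding) order 4
expansion = sp.series(K, t, 0, 4).removeO()
# The linear terms cancel, leaving t^2/2 plus a term of order 1/sqrt(n*lam)
# that vanishes as n -> infinity
```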

### Question 4

Prove the central limit theorem using the cumulant generating function you obtained.

Answer. In the previous question we proved that around zero

$K_{z_{n}}\left( t\right) \rightarrow \frac{t^{2}}{2}.$

This implies that

(1) $M_{z_{n}}\left( t\right) \rightarrow e^{t^{2}/2}$ for each $t$ around zero.

But we know that for a normal $X$ with mean $\mu$ and variance $\sigma ^{2}$ the MGF is $M_{X}\left( t\right) =\exp \left( \mu t+\frac{\sigma ^{2}t^{2}}{2}\right)$ (ST2133 example 3.42) and hence for the standard normal ($\mu =0,$ $\sigma =1$)

(2) $M_{z}\left( t\right) =e^{t^{2}/2}.$

Theorem (link between pointwise convergence of MGFs of $\left\{ X_{n}\right\}$ and convergence in distribution of $\left\{ X_{n}\right\}$) Let $\left\{ X_{n}\right\}$ be a sequence of random variables and let $X$ be some random variable. If $M_{X_{n}}\left( t\right)$ converges for each $t$ from a neighborhood of zero to $M_{X}\left( t\right)$, then $X_{n}$ converges in distribution to $X.$

Using (1), (2) and this theorem we finish the proof that $z_{n}$ converges in distribution to the standard normal, which is the central limit theorem.

### Question 5

State the factorization theorem and apply it to show that $U=\sum_{i=1}^{n}X_{i}$ is a sufficient statistic.

Answer. The solution is given on p.180 of ST2134. For $x_{i}=0,1,...,$ $i=1,...,n,$ the joint density is

(3) $f_{X}\left( x,\lambda \right) =\prod\limits_{i=1}^{n}e^{-\lambda } \frac{\lambda ^{x_{i}}}{x_{i}!}=\frac{\lambda ^{\Sigma x_{i}}e^{-n\lambda }}{\Pi _{i=1}^{n}x_{i}!}.$

To satisfy the Fisher-Neyman factorization theorem set

$g\left( \sum x_{i},\lambda \right) =\lambda ^{\Sigma x_{i}}e^{-n\lambda },\ h\left( x\right) =\frac{1}{\Pi _{i=1}^{n}x_{i}!}$

and then we see that $\sum x_{i}$ is a sufficient statistic for $\lambda .$
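The factorization can be checked numerically on any sample. A small Python sketch (the sample and the value of $\lambda$ are made-up; here $g$ also carries $n$ as an argument):

```python
import math

# Joint Poisson density f(x, lambda)
def f(x, lam):
    return math.prod(math.exp(-lam) * lam**xi / math.factorial(xi)
                     for xi in x)

# Factors from the Fisher-Neyman factorization
def g(s, lam, n):
    return lam**s * math.exp(-n * lam)

def h(x):
    return 1.0 / math.prod(math.factorial(xi) for xi in x)

x, lam = [2, 0, 3, 1], 1.7  # made-up sample and parameter
lhs = f(x, lam)
rhs = g(sum(x), lam, len(x)) * h(x)
```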

### Question 6

Find a minimal sufficient statistic for $\lambda$ stating all necessary theoretical facts.

Answer. Characterization of minimal sufficiency: a statistic $T\left( X\right)$ is minimal sufficient if and only if the level sets of $T$ coincide with the sets on which the ratio $f_{X}\left( x,\theta \right) /f_{X}\left( y,\theta \right)$ does not depend on $\theta .$

From (3)

$f_{X}\left( x,\lambda \right) /f_{X}\left( y,\lambda \right) =\frac{\lambda ^{\Sigma x_{i}}e^{-n\lambda }}{\Pi _{i=1}^{n}x_{i}!}\left[ \frac{\lambda ^{\Sigma y_{i}}e^{-n\lambda }}{\Pi _{i=1}^{n}y_{i}!}\right] ^{-1}=\lambda ^{ \left[ \Sigma x_{i}-\Sigma y_{i}\right] }\frac{\Pi _{i=1}^{n}y_{i}!}{\Pi _{i=1}^{n}x_{i}!}.$

The expression on the right does not depend on $\lambda$ if and only if $\Sigma x_{i}=\Sigma y_{i}.$ The last condition describes the level sets of $T\left( X\right) =\sum X_{i}.$ Thus it is minimal sufficient.

### Question 7

Find the Method of Moments estimator of the population mean.

Answer. The idea of the method is to take some population property (for example, $EX=\lambda$) and replace the population characteristic (in this case $EX$) by its sample analog ($\bar{X}$) to obtain an MM estimator. In our case $\hat{\lambda}_{MM}=\bar{X}.$ [Try to do this for the Gamma distribution].

### Question 8

Find the Fisher information.

Answer. From Problem 5 the log-likelihood is

$l_{X}\left( \lambda ,x\right) =-n\lambda +\sum x_{i}\log \lambda -\sum \log \left( x_{i}!\right) .$

Hence the score function is (see Example 2.30 in ST2134)

$s_{X}\left( \lambda ,x\right) =\frac{\partial }{\partial \lambda } l_{X}\left( \lambda ,x\right) =-n+\frac{1}{\lambda }\sum x_{i}.$

Then

$\frac{\partial ^{2}}{\partial \lambda ^{2}}l_{X}\left( \lambda ,x\right) =- \frac{1}{\lambda ^{2}}\sum x_{i}$

and the Fisher information is

$I_{X}\left( \lambda \right) =-E\left( \frac{\partial ^{2}}{\partial \lambda ^{2}}l_{X}\left( \lambda ,x\right) \right) =\frac{1}{\lambda ^{2}}E\sum X_{i}=\frac{n\lambda }{\lambda ^{2}}=\frac{n}{\lambda }.$
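The same computation can be delegated to SymPy (here $S$ stands for $\sum x_{i},$ and the term $-\sum \log \left( x_{i}!\right)$ is dropped because it vanishes after differentiation in $\lambda$):

```python
import sympy as sp

lam, n, S = sp.symbols('lam n S', positive=True)  # S = sum of x_i

# Log-likelihood up to the term free of lambda
l = -n * lam + S * sp.log(lam)

score = sp.diff(l, lam)      # -n + S/lam
second = sp.diff(l, lam, 2)  # -S/lam**2

# Fisher information: -E[second derivative], using E(sum X_i) = n*lam
info = sp.simplify(-second.subs(S, n * lam))  # n/lam
```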

### Question 9

Derive the Cramer-Rao lower bound for $V\left( \bar{X}\right)$ for a random sample.

Answer. (See Example 3.17 in ST2134) Since $\bar{X}$ is an unbiased estimator of $\lambda$ by Problem 1, from the Cramer-Rao theorem we know that

$V\left( \bar{X}\right) \geq \frac{1}{I_{X}\left( \lambda \right) }=\frac{ \lambda }{n}$

and in fact by Problem 1 this lower bound is attained.

19
Feb 22

## Estimation of parameters of a normal distribution

Here we show that the knowledge of the distribution of $s^{2}$ for linear regression allows one to do without the long calculations contained in the guide ST2134 by J. Abdey.

Theorem. Let $y_{1},...,y_{n}$ be independent observations from $N\left( \mu,\sigma ^{2}\right)$. 1) $s^{2}\left( n-1\right) /\sigma ^{2}$ is distributed as $\chi _{n-1}^{2}.$ 2) The estimators $\bar{y}$ and $s^{2}$ are independent. 3) $Es^{2}=\sigma ^{2},$ 4) $Var\left( s^{2}\right) =\frac{2\sigma ^{4}}{n-1},$ 5) $\frac{s^{2}-\sigma ^{2}}{\sqrt{2\sigma ^{4}/\left(n-1\right) }}$ converges in distribution to $N\left( 0,1\right) .$

Proof. We can write $y_{i}=\mu +e_{i}$ where $e_{i}$ is distributed as $N\left( 0,\sigma ^{2}\right) .$ Putting $\beta =\mu ,\ y=\left(y_{1},...,y_{n}\right) ^{T},$ $e=\left( e_{1},...,e_{n}\right) ^{T}$ and $X=\left( 1,...,1\right) ^{T}$ (a vector of ones) we satisfy (1) and (2). Since $X^{T}X=n,$ we have $\hat{\beta}=\bar{y}.$ Further,

$r\equiv y-X\hat{ \beta}=\left( y_{1}-\bar{y},...,y_{n}-\bar{y}\right) ^{T}$

and

$s^{2}=\left\Vert r\right\Vert ^{2}/\left( n-1\right) =\sum_{i=1}^{n}\left( y_{i}-\bar{y}\right) ^{2}/\left( n-1\right) .$

Thus 1) and 2) follow from results for linear regression.

3) For a normal variable $X$ its moment generating function is $M_{X}\left( t\right) =\exp \left(\mu t+\frac{1}{2}\sigma ^{2}t^{2}\right)$ (see Guide ST2133, 2021, p.88). For the standard normal we get

$M_{z}^{\prime }\left( t\right) =\exp \left( \frac{1}{2}t^{2}\right) t,$ $M_{z}^{\prime \prime }\left( t\right) =\exp \left( \frac{1}{2}t^{2}\right) (t^{2}+1),$

$M_{z}^{\prime \prime \prime}\left( t\right) =\exp \left( \frac{1}{2}t^{2}\right) (t^{3}+3t),$ $M_{z}^{(4)}\left( t\right) =\exp \left( \frac{1}{2}t^{2}\right) (t^{4}+6t^{2}+3).$

Applying the general property $EX^{r}=M_{X}^{\left( r\right) }\left( 0\right)$ (same guide, p.84) we see that

$Ez=0,$ $Ez^{2}=1,$ $Ez^{3}=0,$ $Ez^{4}=3,$

$Var(z)=1,$ $Var\left( z^{2}\right) =Ez^{4}-\left( Ez^{2}\right) ^{2}=3-1=2.$

Therefore, since by part 1) $s^{2}$ is distributed as $\frac{\sigma ^{2}}{n-1}\left( z_{1}^{2}+...+z_{n-1}^{2}\right)$ with independent standard normal $z_{i},$

$Es^{2}=\frac{\sigma ^{2}}{n-1}E\left( z_{1}^{2}+...+z_{n-1}^{2}\right) =\frac{\sigma ^{2}}{n-1}\left( n-1\right) =\sigma ^{2}.$

4) By independence of standard normals

$Var\left( s^{2}\right) =\left( \frac{\sigma ^{2}}{n-1}\right) ^{2}\left[ Var\left( z_{1}^{2}\right) +...+Var\left( z_{n-1}^{2}\right) \right] =\frac{\sigma ^{4}}{\left( n-1\right) ^{2}}2\left( n-1\right) =\frac{2\sigma ^{4}}{n-1}.$

5) By standardizing $s^{2}$ we have $\frac{s^{2}-Es^{2}}{\sigma \left(s^{2}\right) }=\frac{s^{2}-\sigma ^{2}}{\sqrt{2\sigma ^{4}/\left( n-1\right) }}$ and this converges in distribution to $N\left( 0,1\right)$ by the central limit theorem.
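Parts 3) and 4) of the theorem are easy to check by simulation. A Monte Carlo sketch with NumPy (the values of $\mu ,\sigma ,n$ and the number of simulations are made-up):

```python
import numpy as np

# Simulate many normal samples and look at the distribution of s^2
rng = np.random.default_rng(1)
mu, sigma, n, nsim = 5.0, 2.0, 8, 400_000  # made-up values

y = rng.normal(mu, sigma, size=(nsim, n))
s2 = y.var(axis=1, ddof=1)  # the unbiased variance estimator s^2

mean_s2 = s2.mean()  # should be near sigma^2 = 4
var_s2 = s2.var()    # should be near 2*sigma^4/(n-1) = 32/7
```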