Jun 21

Solution to Question 1 from UoL exam 2020

The assessment was an open-book take-home online exam with a 24-hour window. No attempt was made to prevent cheating except a warning, which is a pretty realistic approach. Before an exam it's a good idea to look through my checklist.

Question 1. Consider the following ARMA(1,1) process:

(1) z_{t}=\gamma +\alpha z_{t-1}+\varepsilon _{t}+\theta \varepsilon _{t-1}

where \varepsilon _{t} is a zero-mean white noise process with variance \sigma ^{2}. Assume |\alpha |<1, which makes sure z_{t} is covariance stationary, |\theta |<1, and \alpha +\theta \neq 0 (so that the AR and MA parts do not cancel out).

(a) [20 marks] Calculate the conditional and unconditional means of z_{t}, that is, E_{t-1}[z_{t}] and E[z_{t}].

(b) [20 marks] Set \alpha =0. Derive the autocovariance and autocorrelation function of this process for all lags as functions of the parameters \theta and \sigma .

(c) [30 marks] Assume now \alpha \neq 0. Calculate the conditional and unconditional variances of z_{t}, that is, Var_{t-1}[z_{t}] and Var[z_{t}].

Hint: for the unconditional variance, you might want to start by deriving the unconditional covariance between the variable and the innovation term, i.e., Cov[z_{t},\varepsilon _{t}].

(d) [30 marks] Derive the autocovariance and autocorrelation for lags of 1 and 2 as functions of the parameters of the model.

Hint: use the hint of part (c).


Part (a)

Reminder: The definition of a zero-mean white noise process is

(2) E\varepsilon _{t}=0, Var(\varepsilon _{t})=E\varepsilon_{t}^{2}=\sigma ^{2} for all t and Cov(\varepsilon _{j},\varepsilon_{i})=E\varepsilon _{j}\varepsilon _{i}=0 for all i\neq j.

A variable indexed t-1 is known at moment t-1 and at all later moments and behaves like a constant for conditioning at such moments.

Moment t is future relative to t-1. The future is unpredictable, and the best guess about the future error is zero.

The recurrent relationship in (1) shows that

(3) z_{t-1}=\gamma +\alpha z_{t-2}+... does not depend on the information that arrives at time t and later.

Hence, using also linearity of conditional means,

(4) E_{t-1}z_{t}=E_{t-1}\gamma +\alpha E_{t-1}z_{t-1}+E_{t-1}\varepsilon _{t}+\theta E_{t-1}\varepsilon _{t-1}=\gamma +\alpha z_{t-1}+\theta\varepsilon _{t-1}.

The law of iterated expectations (LIE): application of E_{t-1}, based on information available at time t-1, and subsequent application of E, based on no information, gives the same result as application of E.

Ez_{t}=E[E_{t-1}z_{t}]=E\gamma +\alpha Ez_{t-1}+\theta E\varepsilon _{t-1}=\gamma +\alpha Ez_{t-1}.

Since z_{t} is covariance stationary, its means across times are the same, so Ez_{t}=\gamma +\alpha Ez_{t} and Ez_{t}=\frac{\gamma }{1-\alpha }.
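This result is easy to verify numerically. Below is a quick simulation sketch (not part of the exam solution; the parameter values and the simulation length are arbitrary illustrative choices): the sample mean of a long simulated path should be close to \gamma /(1-\alpha ).

```python
import numpy as np

# Quick numerical check: simulate z_t = gamma + alpha*z_{t-1} + eps_t + theta*eps_{t-1}
# with arbitrary illustrative parameters and compare the sample mean with gamma/(1-alpha).
rng = np.random.default_rng(0)
gamma, alpha, theta, sigma = 1.0, 0.5, 0.3, 1.0

n = 200_000
eps = rng.normal(0.0, sigma, size=n)
z = np.empty(n)
z[0] = gamma / (1 - alpha)              # start at the unconditional mean
for t in range(1, n):
    z[t] = gamma + alpha * z[t - 1] + eps[t] + theta * eps[t - 1]

print(z.mean(), gamma / (1 - alpha))    # the two numbers should be close
```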

Part (b)

With \alpha =0 we get z_{t}=\gamma +\varepsilon _{t}+\theta\varepsilon _{t-1} and from part (a) Ez_{t}=\gamma . Using (2), we find variance

Var(z_{t})=E(z_{t}-Ez_{t})^{2}=E(\varepsilon _{t}^{2}+2\theta \varepsilon_{t}\varepsilon _{t-1}+\theta ^{2}\varepsilon _{t-1}^{2})=(1+\theta^{2})\sigma ^{2}

and first autocovariance

(5) \gamma_{1}=Cov(z_{t},z_{t-1})=E(z_{t}-Ez_{t})(z_{t-1}-Ez_{t-1})=E(\varepsilon_{t}+\theta \varepsilon _{t-1})(\varepsilon _{t-1}+\theta \varepsilon_{t-2})=\theta E\varepsilon _{t-1}^{2}=\theta \sigma ^{2}.

Second and higher autocovariances are zero because the subscripts of epsilons don't overlap.

Autocorrelation function: \rho _{0}=\frac{Cov(z_{t},z_{t})}{\sqrt{Var(z_{t})Var(z_{t})}}=1 (this is always true),

\rho _{1}=\frac{Cov(z_{t},z_{t-1})}{\sqrt{Var(z_{t})Var(z_{t-1})}}=\frac{\theta \sigma ^{2}}{(1+\theta ^{2})\sigma ^{2}}=\frac{\theta }{1+\theta ^{2}}, \rho _{j}=0 for j>1.

This is characteristic of MA processes: their autocorrelations are zero starting from some point.
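The MA(1) autocorrelations can also be checked by simulation. A small sketch (the value of \theta and the sample size are my arbitrary choices): the sample autocorrelation at lag 1 should be near \theta /(1+\theta ^{2}), and near zero at lag 2.

```python
import numpy as np

# Sanity check of the MA(1) result: rho_1 = theta/(1+theta^2), rho_j = 0 for j > 1.
rng = np.random.default_rng(1)
theta, n = 0.6, 200_000

eps = rng.normal(0.0, 1.0, size=n + 1)
z = eps[1:] + theta * eps[:-1]          # the constant gamma would not affect the ACF

def sample_rho(x, k):
    """Sample autocorrelation of x at lag k (k >= 1)."""
    d = x - x.mean()
    return (d[:-k] * d[k:]).sum() / (d * d).sum()

print(sample_rho(z, 1))                 # should be near 0.6/1.36
print(sample_rho(z, 2))                 # should be near 0
```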

Part (c)

If we replace all expectations in the definition of variance, we obtain the definition of conditional variance. From (1) and (4)

Var_{t-1}(z_{t})=E_{t-1}(z_{t}-E_{t-1}z_{t})^{2}=E_{t-1}\varepsilon_{t}^{2}=\sigma ^{2}.

By the law of total variance

(6) Var(z_{t})=EVar_{t-1}(z_{t})+Var(E_{t-1}z_{t})=\sigma ^{2}+Var(\gamma+\alpha z_{t-1}+\theta \varepsilon _{t-1})=

(an additive constant does not affect variance)

=\sigma ^{2}+Var(\alpha z_{t-1}+\theta \varepsilon _{t-1})=\sigma^{2}+\alpha ^{2}Var(z_{t})+2\alpha \theta Cov(z_{t-1},\varepsilon_{t-1})+\theta ^{2}Var(\varepsilon _{t-1}).

By the LIE and (3)

Cov(z_{t-1},\varepsilon _{t-1})=Cov(\gamma +\alpha z_{t-2}+\varepsilon _{t-1}+\theta \varepsilon _{t-2},\varepsilon _{t-1})=\alpha Cov(z_{t-2},\varepsilon _{t-1})+E\varepsilon _{t-1}^{2}+\theta EE_{t-2}\varepsilon _{t-2}\varepsilon _{t-1}=\sigma ^{2}+\theta E(\varepsilon _{t-2}E_{t-2}\varepsilon _{t-1}).

Here Cov(z_{t-2},\varepsilon _{t-1})=0 by (3) and E_{t-2}\varepsilon _{t-1}=0, so

(7) Cov(z_{t-1},\varepsilon _{t-1})=\sigma ^{2}.

This equation leads to

Var(z_{t})=Var(\gamma +\alpha z_{t-1}+\varepsilon _{t}+\theta \varepsilon _{t-1})=\alpha ^{2}Var(z_{t-1})+Var(\varepsilon _{t})+\theta ^{2}Var(\varepsilon _{t-1})+

+2\alpha Cov(z_{t-1},\varepsilon _{t})+2\alpha \theta Cov(z_{t-1},\varepsilon _{t-1})+2\theta Cov(\varepsilon _{t},\varepsilon _{t-1})=\alpha ^{2}Var(z_{t})+\sigma ^{2}+\theta ^{2}\sigma ^{2}+2\alpha \theta \sigma ^{2}

and, finally,

(8) Var(z_{t})=\frac{(1+2\alpha \theta +\theta ^{2})\sigma ^{2}}{1-\alpha ^{2}}.
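Formula (8) can be checked against a simulated path (a sketch with arbitrary illustrative parameters; the sample variance should be close to the theoretical value):

```python
import numpy as np

# Simulation check of (8): Var(z_t) = (1 + 2*alpha*theta + theta^2) * sigma^2 / (1 - alpha^2).
rng = np.random.default_rng(2)
gamma, alpha, theta, sigma = 0.5, 0.7, 0.4, 1.0

n = 300_000
eps = rng.normal(0.0, sigma, size=n)
z = np.empty(n)
z[0] = gamma / (1 - alpha)              # start at the unconditional mean
for t in range(1, n):
    z[t] = gamma + alpha * z[t - 1] + eps[t] + theta * eps[t - 1]

var_theory = (1 + 2 * alpha * theta + theta**2) * sigma**2 / (1 - alpha**2)
print(z.var(), var_theory)              # the two numbers should be close
```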

Part (d)

From (7)

(9) Cov(z_{t-1},\varepsilon _{t-2})=Cov(\gamma +\alpha z_{t-2}+\varepsilon _{t-1}+\theta \varepsilon _{t-2},\varepsilon _{t-2})=\alpha Cov(z_{t-2},\varepsilon _{t-2})+\theta Var(\varepsilon _{t-2})=(\alpha +\theta )\sigma ^{2}.

It follows that

Cov(z_{t},z_{t-1})=Cov(\gamma +\alpha z_{t-1}+\varepsilon _{t}+\theta \varepsilon _{t-1},\gamma +\alpha z_{t-2}+\varepsilon _{t-1}+\theta \varepsilon _{t-2})=

(a constant is not correlated with anything)

=\alpha ^{2}Cov(z_{t-1},z_{t-2})+\alpha Cov(z_{t-1},\varepsilon _{t-1})+\alpha \theta Cov(z_{t-1},\varepsilon _{t-2})+

+\alpha Cov(\varepsilon _{t},z_{t-2})+Cov(\varepsilon _{t},\varepsilon _{t-1})+\theta Cov(\varepsilon _{t},\varepsilon _{t-2})+

+\theta \alpha Cov(\varepsilon _{t-1},z_{t-2})+\theta Var(\varepsilon _{t-1})+\theta ^{2}Cov(\varepsilon _{t-1},\varepsilon _{t-2}).

From (7) Cov(z_{t-2},\varepsilon _{t-2})=\sigma ^{2} and from (9) Cov(z_{t-1},\varepsilon _{t-2})=(\alpha +\theta )\sigma ^{2}.

From (3) Cov(\varepsilon _{t},z_{t-2})=Cov(\varepsilon _{t-1},z_{t-2})=0.

Using also the white noise properties and stationarity of z_{t}

Cov(z_{t},z_{t-1})=Cov(z_{t-1},z_{t-2})=\gamma _{1},

we are left with

\gamma _{1}=\alpha ^{2}\gamma _{1}+\alpha \sigma ^{2}+\alpha \theta (\alpha +\theta )\sigma ^{2}+\theta \sigma ^{2}=\alpha ^{2}\gamma _{1}+(1+\alpha \theta )(\alpha +\theta )\sigma ^{2},

\gamma _{1}=\frac{(1+\alpha \theta )(\alpha +\theta )\sigma ^{2}}{1-\alpha ^{2}}

and using (8)

\rho _{0}=1, \rho _{1}=\frac{(1+\alpha \theta )(\alpha +\theta )}{1+2\alpha \theta +\theta ^{2}}.

The finish line is close.

Cov(z_{t},z_{t-2})=Cov(\gamma +\alpha z_{t-1}+\varepsilon _{t}+\theta \varepsilon _{t-1},\gamma +\alpha z_{t-3}+\varepsilon _{t-2}+\theta \varepsilon _{t-3})=

=\alpha ^{2}Cov(z_{t-1},z_{t-3})+\alpha Cov(z_{t-1},\varepsilon _{t-2})+\alpha \theta Cov(z_{t-1},\varepsilon _{t-3})+

+\alpha Cov(\varepsilon _{t},z_{t-3})+Cov(\varepsilon _{t},\varepsilon _{t-2})+\theta Cov(\varepsilon _{t},\varepsilon _{t-3})+

+\theta \alpha Cov(\varepsilon _{t-1},z_{t-3})+\theta Cov(\varepsilon _{t-1},\varepsilon _{t-2})+\theta ^{2}Cov(\varepsilon _{t-1},\varepsilon _{t-3}).

This simplifies to

(10) Cov(z_{t},z_{t-2})=\alpha ^{2}Cov(z_{t-1},z_{t-3})+\alpha (\alpha +\theta )\sigma ^{2}+\alpha \theta Cov(z_{t-1},\varepsilon _{t-3}).

By (7)

Cov(z_{t-1},\varepsilon _{t-3})=Cov(\gamma +\alpha z_{t-2}+\varepsilon _{t-1}+\theta \varepsilon _{t-2},\varepsilon _{t-3})=\alpha Cov(z_{t-2},\varepsilon _{t-3})=

=\alpha Cov(\gamma +\alpha z_{t-3}+\varepsilon _{t-2}+\theta \varepsilon _{t-3},\varepsilon _{t-3})=\alpha (\alpha \sigma ^{2}+\theta \sigma ^{2})=\alpha (\alpha +\theta )\sigma ^{2}.

Finally, using (10)

\gamma _{2}=\alpha ^{2}\gamma _{2}+\alpha (\alpha +\theta )\sigma ^{2}+\alpha ^{2}\theta (\alpha +\theta )\sigma ^{2}=\alpha ^{2}\gamma _{2}+\alpha (1+\alpha \theta )(\alpha +\theta )\sigma ^{2},

\gamma _{2}=\frac{\alpha (1+\alpha \theta )(\alpha +\theta )\sigma ^{2}}{1-\alpha ^{2}}=\alpha \gamma _{1},

\rho _{2}=\frac{\alpha (1+\alpha \theta )(\alpha +\theta )}{1+2\alpha \theta +\theta ^{2}}=\alpha \rho _{1}.
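A simulation check of the autocorrelations (the standard ARMA(1,1) pattern is \gamma _{k}=\alpha \gamma _{k-1} for k\geq 2, hence \rho _{2}=\alpha \rho _{1}). The parameters below are arbitrary illustrative values:

```python
import numpy as np

# Check rho_1 = (1+alpha*theta)(alpha+theta)/(1+2*alpha*theta+theta^2)
# and the standard ARMA(1,1) recursion rho_2 = alpha*rho_1 by simulation.
rng = np.random.default_rng(3)
alpha, theta = 0.5, 0.3

n = 400_000
eps = rng.normal(0.0, 1.0, size=n)
z = np.empty(n)
z[0] = 0.0
for t in range(1, n):
    z[t] = alpha * z[t - 1] + eps[t] + theta * eps[t - 1]   # gamma = 0 for simplicity

def sample_rho(x, k):
    """Sample autocorrelation of x at lag k (k >= 1)."""
    d = x - x.mean()
    return (d[:-k] * d[k:]).sum() / (d * d).sum()

rho1 = (1 + alpha * theta) * (alpha + theta) / (1 + 2 * alpha * theta + theta**2)
print(sample_rho(z, 1), rho1)           # the two numbers should be close
print(sample_rho(z, 2), alpha * rho1)   # the two numbers should be close
```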

A couple of errors have been corrected on June 22, 2021. Hope this is final.

May 18

The Newey-West estimator: uncorrelated and correlated data

I hate long posts, but here we have no choice: to understand what is going on, we have to go through all the ideas and calculations. One page of formulas in A. Patton's guide to FN3142 Quantitative Finance becomes, in my rendition, three posts.

Preliminaries and autocovariance function

Let X_1,...,X_n be random variables. We need to recall that the variance of the vector X=(X_1,...,X_n)^T is

(1) V(X)=\left(\begin{array}{cccc}V(X_1)&Cov(X_1,X_2)&...&Cov(X_1,X_n) \\Cov(X_2,X_1)&V(X_2)&...&Cov(X_2,X_n) \\...&...&...&... \\Cov(X_n,X_1)&Cov(X_n,X_2)&...&V(X_n) \end{array}\right).

With the help of this matrix we derived two expressions for variance of a linear combination:

(2) V\left(\sum_{i=1}^na_iX_i\right)=\sum_{i=1}^na_i^2V(X_i)

for uncorrelated variables and

(3) V\left(\sum_{i=1}^na_iX_i\right)=\sum_{i=1}^na_i^2V(X_i)+2\sum_{i=1}^{n-1}\sum_{j=i+1}^na_ia_jCov(X_i,X_j)

when there is autocorrelation.

In a time series context X_1,...,X_n are observations along time. 1,...,n stand for moments in time and the sequence X_1,...,X_n is called a time series. We need to recall the definition of a stationary process. Of that definition, we will use only the part about covariances: Cov(X_i,X_j) depends only on the distance |i-j| between the time moments i,j. For example, in the top right corner of (1) we have Cov(X_1,X_n), which depends only on n-1.

Preamble. Let X_1,...,X_n be a stationary time series. Firstly, Cov(X_i,X_{i+k}) depends only on k. Secondly, for all integer k=0,\pm 1,\pm 2,... denoting j=i+k we have

(4) Cov(X_i,X_{i+k})=Cov(X_{j-k},X_j)=Cov(X_j,X_{j-k}).

Definition. The autocovariance function is defined by

(5) \gamma_k=Cov(X_i,X_{k+i}) for all integer k=0,\pm 1,\pm 2,...

In particular,

(6) \gamma_0=Cov(X_i,X_i)=V(X_i) for all i.

The preamble shows that definition (5) is correct (the right side in (5) depends only on k and not on i). Because of (4) we have symmetry \gamma_{-k}=\gamma _k, so negative k can be excluded from consideration.

With (5) and (6) for a stationary series (1) becomes

(7) V(X)=\left(\begin{array}{cccc}\gamma_0&\gamma_1&...&\gamma_{n-1} \\ \gamma_1&\gamma_0&...&\gamma_{n-2} \\...&...&...&... \\ \gamma_{n-1}&\gamma_{n-2}&...&\gamma_0\end{array}\right).
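Matrix (7) is a symmetric Toeplitz matrix: its (i,j) entry equals \gamma_{|i-j|}. A small sketch building it from a vector of autocovariances (the numerical values are arbitrary):

```python
import numpy as np

# Build the covariance matrix (7): entry (i, j) is gamma_{|i-j|}.
g = np.array([2.0, 1.2, 0.5, 0.1])      # gamma_0, gamma_1, gamma_2, gamma_3 (n = 4)
idx = np.arange(len(g))
V = g[np.abs(idx[:, None] - idx[None, :])]   # fancy indexing by the lag matrix |i-j|
print(V)
```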

Estimating variance of a sample mean

Uncorrelated observations. Suppose X_1,...,X_n are uncorrelated observations from the same population with variance \sigma^2. From (2) we get

(8) V\left(\frac{1}{n}\sum_{i=1}^nX_i\right) =\frac{1}{n^2}\sum_{i=1}^nV(X_i)=\frac{n\sigma^2}{n^2}=\frac{\sigma^2}{n}.

This is a theoretical relationship. To actually obtain an estimator of the variance of the sample mean, we need to replace \sigma^2 by some estimator. It is known that

(9) s^2=\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})^2

consistently estimates \sigma^2. Plugging it in (8), we see that the variance of the sample mean is consistently estimated by

\hat{V}(\bar{X})=\frac{s^2}{n}=\frac{1}{n^2}\sum_{i=1}^n(X_i-\bar{X})^2.

This is the estimator derived on p.151 of Patton's guide.

Correlated observations. In this case we use (3):

V\left( \frac{1}{n}\sum_{i=1}^nX_i\right) =\frac{1}{n^{2}}\left[\sum_{i=1}^nV(X_i)+2\sum_{i=1}^{n-1}\sum_{j=i+1}^nCov(X_i,X_j)\right].

Here visualization comes in handy. The sums in the square brackets include all terms on the main diagonal of (7) and above it. That is, we have n copies of \gamma_0, n-1 copies of \gamma_{1},..., 2 copies of \gamma _{n-2} and 1 copy of \gamma _{n-1}. The sum in the brackets is

\sum_{i=1}^nV(X_i)+2\sum_{i=1}^{n-1}\sum_{j=i+1}^nCov(X_i,X_j)=n\gamma_0+2[(n-1)\gamma_1+...+2\gamma_{n-2}+\gamma_{n-1}]=n\gamma_0+2\sum_{k=1}^{n-1}(n-k)\gamma_k.

Thus we obtain the first equation on p.152 of Patton's guide (it's up to you to match the notation):

(10) V(\bar{X})=\frac{1}{n}\gamma_0+\frac{2}{n}\sum_{k=1}^{n-1}(1-\frac{k}{n})\gamma_k.

As above, this is just a theoretical relationship. \gamma_0=V(X_i)=\sigma^2 is estimated by (9). Ideally, the estimator of \gamma_k=Cov(X_i,X_{k+i}) is obtained by replacing all population means by sample means:

(11) \hat{\gamma}_k=\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})(X_{k+i}-\bar{X}).

There are two problems with this estimator, though. The first problem is that when i runs from 1 to n, k+i runs from k+1 to k+n. To exclude out-of-sample values, the summation in (11) is reduced:

(12) \hat{\gamma}_k=\frac{1}{n-k}\sum_{i=1}^{n-k}(X_i-\bar{X})(X_{k+i}-\bar{X}).
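A direct implementation of (12), including the \gamma_0 case (9), as a sketch (the data vector below is an arbitrary toy example):

```python
import numpy as np

# Lag-k sample autocovariance with the shortened summation range
# and 1/(n-k) normalization, as in formula (12); k = 0 gives formula (9).
def gamma_hat(x, k):
    n = len(x)
    d = np.asarray(x, dtype=float) - np.mean(x)
    if k == 0:
        return (d ** 2).mean()
    return (d[: n - k] * d[k:]).sum() / (n - k)

x = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 2.0]
print(gamma_hat(x, 0), gamma_hat(x, 1))  # → 0.5 0.0
```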

The second problem is that the sum in (12) becomes too small when k is close to n. For example, for k=n-1 (12) contains just one term (there is no averaging). Therefore the upper limit of summation n-1 in (10) is replaced by some function M(n) that tends to infinity slower than n. The result is the estimator

\hat{V}(\bar{X})=\frac{1}{n}\hat{\gamma}_0+\frac{2}{n}\sum_{k=1}^{M(n)}\left(1-\frac{k}{M(n)}\right)\hat{\gamma}_k,

where \hat{\gamma}_0 is given by (9) and \hat{\gamma}_k is given by (12). This is almost the Newey-West estimator from p.152. The only difference is that instead of \frac{k}{M(n)} they use \frac{k}{M(n)+1}, and I have no idea why. One explanation is that for low n, M(n) can be zero, so they just wanted to avoid division by zero.
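Putting the pieces together, here is a sketch of the estimator with the (1-\frac{k}{M(n)+1}) weights used in Patton's guide. The truncation rule M(n)=\lfloor 4(n/100)^{2/9}\rfloor is one common choice from the literature, not something fixed by the text above:

```python
import numpy as np

# Newey-West estimator of Var(X_bar) with Bartlett weights (1 - k/(M+1)).
def newey_west_var_mean(x, M=None):
    x = np.asarray(x, dtype=float)
    n = len(x)
    if M is None:
        M = int(4 * (n / 100.0) ** (2.0 / 9.0))   # a common truncation rule
    d = x - x.mean()
    total = (d ** 2).mean()                       # gamma_hat_0, formula (9)
    for k in range(1, M + 1):
        gk = (d[: n - k] * d[k:]).sum() / (n - k) # gamma_hat_k, formula (12)
        total += 2.0 * (1.0 - k / (M + 1.0)) * gk
    return total / n                              # estimate of V(X_bar)

# For uncorrelated data the result should be close to sigma^2/n.
rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, size=50_000)
print(newey_west_var_mean(x), 1.0 / 50_000)
```

For serially correlated data the lag terms kick in, and the estimate exceeds the naive s^2/n whenever the autocovariances are positive.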