10 May 18

## The Newey-West estimator: uncorrelated and correlated data

I hate long posts, but here we must by necessity go through all the ideas and calculations to understand what is going on. One page of formulas in A. Patton's guide to FN3142 Quantitative Finance becomes three posts in my rendition.

## Preliminaries and autocovariance function

Let $X_1,...,X_n$ be random variables. We need to recall that the variance of the vector $X=(X_1,...,X_n)^T$ is

(1) $V(X)=\left(\begin{array}{cccc}V(X_1)&Cov(X_1,X_2)&...&Cov(X_1,X_n) \\ Cov(X_2,X_1)&V(X_2)&...&Cov(X_2,X_n) \\ ...&...&...&... \\ Cov(X_n,X_1)&Cov(X_n,X_2)&...&V(X_n)\end{array}\right).$

With the help of this matrix we derived two expressions for the variance of a linear combination:

(2) $V\left(\sum_{i=1}^na_iX_i\right)=\sum_{i=1}^na_i^2V(X_i)$

for uncorrelated variables and

(3) $V\left(\sum_{i=1}^na_iX_i\right)=\sum_{i=1}^na_i^2V(X_i)+2\sum_{i=1}^{n-1}\sum_{j=i+1}^na_ia_jCov(X_i,X_j)$

when there is autocorrelation.
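Formulas (2) and (3) are easy to verify numerically: with a covariance matrix $V$ for $(X_1,X_2,X_3)$, the variance of $\sum a_iX_i$ is the quadratic form $a^TVa$, and expanding it term by term reproduces (3). A minimal Python sketch, where both the covariance matrix and the coefficients are made up for the check:

```python
# Hypothetical covariance matrix for (X_1, X_2, X_3): symmetric, with
# variances on the diagonal and covariances off it, as in (1)
V = [[4.0, 1.2, 0.5],
     [1.2, 3.0, 0.8],
     [0.5, 0.8, 2.0]]
a = [1.0, -2.0, 0.5]  # coefficients of the linear combination
n = len(a)

# quadratic form a^T V a = V(sum of a_i X_i)
quad = sum(a[i] * V[i][j] * a[j] for i in range(n) for j in range(n))

# formula (3): squared terms plus twice the upper-triangle covariances
formula3 = (sum(a[i] ** 2 * V[i][i] for i in range(n))
            + 2 * sum(a[i] * a[j] * V[i][j]
                      for i in range(n - 1) for j in range(i + 1, n)))

print(quad, formula3)  # the two numbers agree
```

Setting the off-diagonal entries of `V` to zero makes `formula3` collapse to (2).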

In a time series context $X_1,...,X_n$ are observations along time. $1,...,n$ stand for moments in time and the sequence $X_1,...,X_n$ is called a time series. We need to recall the definition of a stationary process. Of that definition, we will use only the part about covariances: $Cov(X_i,X_j)$ depends only on the distance $|i-j|$ between the time moments $i,j.$ For example, in the top right corner of (1) we have $Cov(X_1,X_n),$ which depends only on $n-1.$

Preamble. Let $X_1,...,X_n$ be a stationary time series. Firstly, $Cov(X_i,X_{i+k})$ depends only on $k.$ Secondly, for all integer $k=0,\pm 1,\pm 2,...$ denoting $j=i+k$ we have

(4) $Cov(X_i,X_{i+k})=Cov(X_{j-k},X_j)=Cov(X_j,X_{j-k}).$

Definition. The autocovariance function is defined by

(5) $\gamma_k=Cov(X_i,X_{k+i})$ for all integer $k=0,\pm 1,\pm 2,...$

In particular,

(6) $\gamma_0=Cov(X_i,X_i)=V(X_i)$ for all $i.$

The preamble shows that definition (5) is correct (the right side in (5) depends only on $k$ and not on $i$). Because of (4) we have symmetry $\gamma_{-k}=\gamma _k,$ so negative $k$ can be excluded from consideration.

With (5) and (6) for a stationary series (1) becomes

(7) $V(X)=\left(\begin{array}{cccc}\gamma_0&\gamma_1&...&\gamma_{n-1} \\ \gamma_1&\gamma_0&...&\gamma_{n-2} \\ ...&...&...&... \\ \gamma_{n-1}&\gamma_{n-2}&...&\gamma_0\end{array}\right).$
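Matrix (7) has Toeplitz structure: the $(i,j)$ entry is $\gamma_{|i-j|}$. A short sketch that builds it from a hypothetical autocovariance sequence:

```python
gamma = [2.0, 1.0, 0.4, 0.1]  # hypothetical gamma_0, ..., gamma_3
n = len(gamma)

# entry (i, j) of matrix (7) is gamma_{|i-j|}
V = [[gamma[abs(i - j)] for j in range(n)] for i in range(n)]

for row in V:
    print(row)
# the main diagonal is gamma_0, the k-th off-diagonal is gamma_k
```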

## Estimating variance of a sample mean

Uncorrelated observations. Suppose $X_1,...,X_n$ are uncorrelated observations from the same population with variance $\sigma^2.$ From (2)
we get

(8) $V\left(\frac{1}{n}\sum_{i=1}^nX_i\right) =\frac{1}{n^2}\sum_{i=1}^nV(X_i)=\frac{n\sigma^2}{n^2}=\frac{\sigma^2}{n}.$

This is a theoretical relationship. To actually obtain an estimator of the variance of the sample mean, we need to replace $\sigma^2$ by some estimator. It is known that

(9) $s^2=\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})^2$

consistently estimates $\sigma^2.$ Plugging it in (8) we see that variance of the sample mean is consistently estimated by

$\hat{V}=\frac{1}{n}s^2=\frac{1}{n^2}\sum_{i=1}^n(X_i-\bar{X})^2.$

This is the estimator derived on p.151 of Patton's guide.
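On made-up data the computation is one line per step: estimator (9) for $\sigma^2$, then division by $n$.

```python
X = [2.1, 1.9, 2.5, 2.0, 2.3, 1.8]  # made-up uncorrelated observations
n = len(X)
xbar = sum(X) / n

s2 = sum((x - xbar) ** 2 for x in X) / n  # estimator (9)
V_hat = s2 / n                            # estimated variance of the sample mean

print(xbar, V_hat)
```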

Correlated observations. In this case we use (3):

$V\left( \frac{1}{n}\sum_{i=1}^nX_i\right) =\frac{1}{n^{2}}\left[\sum_{i=1}^nV(X_i)+2\sum_{i=1}^{n-1}\sum_{j=i+1}^nCov(X_i,X_j)\right]$.

Here visualization comes in handy. The sums in the square brackets include all terms on the main diagonal of (7) and above it. That is, we have $n$ copies of $\gamma_0,$ $n-1$ copies of $\gamma_{1}$,..., 2 copies of $\gamma _{n-2}$ and 1 copy of $\gamma _{n-1}.$ The sum in the brackets is

$\sum_{i=1}^nV(X_i)+2\sum_{i=1}^{n-1}\sum_{j=i+1}^nCov(X_i,X_j)=n\gamma_0+2[(n-1)\gamma_1+...+2\gamma_{n-2}+\gamma _{n-1}]=n\gamma_0+2\sum_{k=1}^{n-1}(n-k)\gamma_k.$

Thus we obtain the first equation on p.152 of Patton's guide (it's up to you to match the notation):

(10) $V(\bar{X})=\frac{1}{n}\gamma_0+\frac{2}{n}\sum_{k=1}^{n-1}(1-\frac{k}{n})\gamma_k.$
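Formula (10) can be sanity-checked numerically: $V(\bar{X})$ also equals $1/n^2$ times the sum of all entries of matrix (7), so the two computations must coincide. A sketch with a hypothetical $\gamma$ sequence:

```python
gamma = [2.0, 1.0, 0.4, 0.1]  # hypothetical gamma_0, ..., gamma_3
n = len(gamma)

# V(X_bar) as (1/n^2) times the sum of all entries of matrix (7)
V_matrix = sum(gamma[abs(i - j)] for i in range(n) for j in range(n)) / n ** 2

# V(X_bar) by formula (10)
V_formula = gamma[0] / n + (2 / n) * sum((1 - k / n) * gamma[k]
                                         for k in range(1, n))

print(V_matrix, V_formula)  # identical
```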

As above, this is just a theoretical relationship. $\gamma_0=V(X_i)=\sigma^2$ is estimated by (9). Ideally, the estimator of $\gamma_k=Cov(X_i,X_{k+i})$ is obtained by replacing all population means by sample means:

(11) $\hat{\gamma}_k=\frac{1}{n}\sum_{i=1}^n(X_i-\bar{X})(X_{k+i}-\bar{X}).$

There are two problems with this estimator, though. The first problem is that when $i$ runs from $1$ to $n,$ $k+i$ runs from $k+1$ to $k+n.$ To exclude out-of-sample values, the summation in (11) is reduced:

(12) $\hat{\gamma}_k=\frac{1}{n-k}\sum_{i=1}^{n-k}(X_i-\bar{X})(X_{k+i}-\bar{X}).$

The second problem is that the sum in (12) becomes too small when $k$ is close to $n.$ For example, for $k=n-1$ (12) contains just one term (there is no averaging). Therefore the upper limit of summation $n-1$ in (10) is replaced by some function $M(n)$ that tends to infinity slower than $n.$ The result is the estimator

$\hat{V}=\frac{1}{n}\hat{\gamma}_0+\frac{2}{n}\sum_{k=1}^{M(n)}(1-\frac{k}{M(n)})\hat{\gamma}_k$

where $\hat{\gamma}_0$ is given by (9) and $\hat{\gamma}_k$ is given by (12). This is almost the Newey-West estimator from p.152. The only difference is that instead of $\frac{k}{M(n)}$ they use $\frac{k}{M(n)+1}$, and I have no idea why. One explanation is that for low $n$, $M(n)$ can be zero, so they just wanted to avoid division by zero.
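The whole construction fits in a few lines of Python. This is a minimal sketch, not a production implementation; in particular, the bandwidth rule $M(n)=\lfloor 4(n/100)^{2/9}\rfloor$ used as a default below is a common choice but an assumption on my part, not something from the derivation above. The weights are the $\frac{k}{M(n)+1}$ variant from p.152.

```python
def newey_west_variance(X, M=None):
    """Newey-West estimate of the variance of the sample mean of X."""
    n = len(X)
    xbar = sum(X) / n
    if M is None:
        # hypothetical bandwidth rule, an assumption not from the derivation
        M = int(4 * (n / 100) ** (2 / 9))

    def gamma_hat(k):
        # estimator (12); for k = 0 it reduces to (9)
        return sum((X[i] - xbar) * (X[i + k] - xbar)
                   for i in range(n - k)) / (n - k)

    # formula (10) truncated at M, with Bartlett weights 1 - k/(M + 1)
    V = gamma_hat(0) / n
    for k in range(1, M + 1):
        V += (2 / n) * (1 - k / (M + 1)) * gamma_hat(k)
    return V

# made-up short series for illustration
X = [1.0, 2.0, 1.5, 2.5, 2.0, 3.0, 2.5, 3.5]
print(newey_west_variance(X, M=2))
```

With `M=0` the loop is empty and the function returns exactly the uncorrelated-case estimator $s^2/n$ from the previous section.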

25 Oct 16

## Properties of variance

### All properties of variance in one place

> Certainty is the mother of quiet and repose, and uncertainty the cause of variance and contentions. (Edward Coke)

Preliminaries: study properties of means with proofs.

Definition. Yes, uncertainty leads to variance, and we measure it by $Var(X)=E(X-EX)^2$. It is useful to call $X-EX$ the deviation from the mean and to realize that $E(X-EX)=0$, so the mean of the deviation from the mean cannot serve as a measure of variation of $X$ around $EX$.

Property 1. Variance of a linear combination. For any random variables $X,Y$ and numbers $a,b$ one has
(1) $Var(aX + bY)=a^2Var(X)+2abCov(X,Y)+b^2Var(Y).$
The term $2abCov(X,Y)$ in (1) is called an interaction term. See this post for the definition and properties of covariance.
Proof.
$Var(aX + bY)=E[aX + bY -E(aX + bY)]^2$

(using linearity of means)
$=E(aX + bY-aEX -bEY)^2$

(grouping by variable)
$=E[a(X-EX)+b(Y-EY)]^2$

(squaring out)
$=E[a^2(X-EX)^2+2ab(X-EX)(Y-EY)+b^2(Y-EY)^2]$

(using linearity of means and definitions of variance and covariance)
$=a^2Var(X) + 2abCov(X,Y) +b^2Var(Y).$
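The identity just proved can be verified exactly on a small discrete joint distribution; the distribution and the coefficients below are made up for the check.

```python
# made-up joint distribution of (X, Y): {(x, y): probability}
joint = {(0, 1): 0.2, (0, 3): 0.3, (2, 1): 0.4, (2, 3): 0.1}
a, b = 2.0, -1.0

def E(f):
    # expectation of f(X, Y) under the joint distribution
    return sum(p * f(x, y) for (x, y), p in joint.items())

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
VarX = E(lambda x, y: (x - EX) ** 2)
VarY = E(lambda x, y: (y - EY) ** 2)
Cov = E(lambda x, y: (x - EX) * (y - EY))

lhs = E(lambda x, y: (a * x + b * y - (a * EX + b * EY)) ** 2)  # Var(aX + bY)
rhs = a ** 2 * VarX + 2 * a * b * Cov + b ** 2 * VarY           # formula (1)

print(lhs, rhs)  # identical
```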
Property 2. Variance of a sum. Setting $a=b=1$ in (1) we obtain
$Var(X + Y) = Var(X) + 2Cov(X,Y)+Var(Y).$

Property 3. Homogeneity of degree 2. Choose $b=0$ in (1) to get
$Var(aX)=a^2Var(X).$
Exercise. What do you think is larger: $Var(X+Y)$ or $Var(X-Y)$?
Property 4. If we add a constant to a variable, its variance does not change: $Var(X+c)=E[X+c-E(X+c)]^2=E(X+c-EX-c)^2=E(X-EX)^2=Var(X).$
Property 5. Variance of a constant is zero: $Var(c)=E(c-Ec)^2=0$.

Property 6. Nonnegativity. Since the squared deviation from mean $(X-EX)^2$ is nonnegative, its expectation is nonnegative: $E(X-EX)^2\ge 0$.

Property 7. Only a constant can have variance equal to zero: if $Var(X)=0$, then $E(X-EX)^2 =(x_1-EX)^2p_1 +...+(x_n-EX)^2p_n=0$, see the definition of the expected value. Since all probabilities are positive and every term is nonnegative, each term must vanish, so $x_i=EX$ for all $i$, which means that $X$ is identically constant.

Property 8. Shortcut for variance. We have an identity $E(X-EX)^2=EX^2-(EX)^2$. Indeed, squaring out gives

$E(X-EX)^2 =E(X^2-2XEX+(EX)^2)$

(distributing expectation)

$=EX^2-2E(XEX)+E(EX)^2$

(expectation of a constant is constant)

$=EX^2-2(EX)^2+(EX)^2=EX^2-(EX)^2$.
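A quick numerical check of the shortcut, treating a small made-up sample as a uniform discrete distribution:

```python
X = [1.0, 4.0, 4.0, 7.0]  # made-up values, each with probability 1/4
n = len(X)
EX = sum(X) / n
EX2 = sum(x * x for x in X) / n

direct = sum((x - EX) ** 2 for x in X) / n  # E(X - EX)^2
shortcut = EX2 - EX ** 2                    # EX^2 - (EX)^2

print(direct, shortcut)
```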

All of the above properties apply to any random variables. The next one is an exception in the sense that it applies only to uncorrelated variables.

Property 9. If variables are uncorrelated, that is $Cov(X,Y)=0$, then from (1) we have $Var(aX + bY)=a^2Var(X)+b^2Var(Y).$ In particular, letting $a=b=1$, we get additivity: $Var(X+Y)=Var(X)+Var(Y).$ Recall that the expected value is always additive.

Generalizations. If all $X_i$ are uncorrelated, then $Var(\sum a_iX_i)=\sum a_i^2Var(X_i)$ and $Var(\sum X_i)=\sum Var(X_i).$
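Additivity can be checked exactly by taking $X$ and $Y$ independent, so that $Cov(X,Y)=0$; the marginals below are made up, and independence means the joint distribution is their product.

```python
px = {0: 0.5, 2: 0.5}  # made-up marginal of X
py = {1: 0.3, 5: 0.7}  # made-up marginal of Y
# independence: each joint probability is the product of the marginals
joint = {(x, y): p * q for x, p in px.items() for y, q in py.items()}

def E(f):
    # expectation of f(X, Y) under the joint distribution
    return sum(p * f(x, y) for (x, y), p in joint.items())

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
VarX = E(lambda x, y: (x - EX) ** 2)
VarY = E(lambda x, y: (y - EY) ** 2)
VarSum = E(lambda x, y: (x + y - EX - EY) ** 2)  # Var(X + Y)

print(VarSum, VarX + VarY)  # equal, since Cov(X, Y) = 0
```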

So far I have counted 12 of my posts where properties of variance are used.