May 18

## Different faces of vector variance: again visualization helps

In the previous post we defined variance of a column vector $X$ with $n$ components by

$V(X)=E(X-EX)(X-EX)^T.$

In terms of elements this is the same as:

(1) $V(X)=\left(\begin{array}{cccc}V(X_1)&Cov(X_1,X_2)&...&Cov(X_1,X_n)\\Cov(X_2,X_1)&V(X_2)&...&Cov(X_2,X_n)\\...&...&...&...\\Cov(X_n,X_1)&Cov(X_n,X_2)&...&V(X_n)\end{array}\right).$
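The structure of (1) is easy to see numerically. Below is a minimal sketch in Python: the covariance matrix `Sigma_true` and the sample size are made-up numbers for illustration; the sample analogue of $E(X-EX)(X-EX)^T$ is computed directly and compared with its entries.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical example: three correlated components, 100000 draws.
Sigma_true = np.array([[2.0, 0.5, 0.3],
                       [0.5, 1.0, 0.2],
                       [0.3, 0.2, 1.5]])
X = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma_true, size=100_000)

# Sample analogue of V(X) = E(X - EX)(X - EX)^T.
D = X - X.mean(axis=0)
V = D.T @ D / len(X)
# Entry (i, j) of V estimates Cov(X_i, X_j); the diagonal holds the variances,
# and the matrix is symmetric, exactly as in (1).
print(np.round(V, 2))
```

The printed matrix is close to `Sigma_true`, with variances on the diagonal and covariances off it.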

## So why is knowing the structure of this matrix so important?

Let $X_1,...,X_n$ be random variables and let $a_1,...,a_n$ be numbers. In the derivation of the variance of the slope estimator for simple regression we have to deal with an expression of the type

(2) $V\left(\sum_{i=1}^na_iX_i\right).$

Question 1. How do you multiply a sum by a sum? I mean, how do you use summation signs to find the product $\left(\sum_{i=1}^na_i\right)\left(\sum_{i=1}^nb_i\right)$?

Answer 1. Whenever you have problems with summation signs, try to do without them. The product

$\left(a_1+...+a_n\right)\left(b_1+...+b_n\right)=a_1b_1+...+a_1b_n+...+a_nb_1+...+a_nb_n$

should contain ALL products $a_ib_j.$ Again, a matrix visualization will help:

$\left(\begin{array}{ccc}a_1b_1&...&a_1b_n\\...&...&...\\a_nb_1&...&a_nb_n\end{array}\right).$

The product we are looking for should contain all elements of this matrix. So the answer is

(3) $\left(\sum_{i=1}^na_i\right)\left(\sum_{i=1}^nb_i\right)=\sum_{i=1}^n\sum_{j=1}^na_ib_j.$

Formally, we can write $\sum_{i=1}^nb_i=\sum_{j=1}^nb_j$ (the sum does not depend on the name of the summation index; this is another point many students miss) and then perform the multiplication in (3).
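Identity (3) is straightforward to verify numerically. Here is a short sketch in Python; the vectors and their length are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
a = rng.normal(size=n)
b = rng.normal(size=n)

# Left side of (3): product of the two sums.
lhs = a.sum() * b.sum()
# Right side of (3): the double sum, which picks up every product a_i * b_j
# exactly once -- all elements of the matrix above.
rhs = sum(a[i] * b[j] for i in range(n) for j in range(n))
assert np.isclose(lhs, rhs)
```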

Question 2. What is the expression for (2) in terms of covariances of components?

Answer 2. If you understand Answer 1 and know the relationship between variances and covariances, it should be clear that

(4) $V\left(\sum_{i=1}^na_iX_i\right)=Cov(\sum_{i=1}^na_iX_i,\sum_{i=1}^na_iX_i)$

$=Cov(\sum_{i=1}^na_iX_i,\sum_{j=1}^na_jX_j)=\sum_{i=1}^n\sum_{j=1}^na_ia_jCov(X_i,X_j).$
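Formula (4) can be sanity-checked with a concrete covariance matrix. In matrix form the left side is the quadratic form $a^TV(X)a$, so with a made-up $\Sigma$ and made-up weights $a_i$ (both hypothetical) the double sum must reproduce it:

```python
import numpy as np

# Hypothetical covariance matrix of (X_1, X_2, X_3) and weights a_i.
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
a = np.array([1.0, -2.0, 0.5])
n = len(a)

# Matrix form of the left side of (4): a^T V(X) a.
quad = a @ Sigma @ a
# Right side of (4): the double sum over all pairs (i, j).
double_sum = sum(a[i] * a[j] * Sigma[i, j]
                 for i in range(n) for j in range(n))
assert np.isclose(quad, double_sum)
```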

Question 3. In light of (1), separate variances from covariances in (4).

Answer 3. When $i=j,$ we have $Cov(X_i,X_j)=V(X_i),$ which are diagonal elements of (1). Otherwise, for $i\neq j$ we get off-diagonal elements of (1). So the answer is

(5) $V\left(\sum_{i=1}^na_iX_i\right)=\sum_{i=1}^na_i^2V(X_i)+\sum_{i\neq j}a_ia_jCov(X_i,X_j).$

Once again, in the first sum on the right we have only variances. In the second sum, the indices $i,j$ are assumed to run from $1$ to $n$, excluding the diagonal $i=j.$
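The split in (5) can be checked numerically by summing the diagonal and off-diagonal terms separately (the covariance matrix and weights below are made-up numbers):

```python
import numpy as np

# Made-up covariance matrix of (X_1, X_2, X_3) and weights a_i.
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
a = np.array([1.0, -2.0, 0.5])
n = len(a)

total = a @ Sigma @ a  # V(sum a_i X_i) as a quadratic form
# First sum in (5): only the diagonal (variance) terms.
var_part = sum(a[i]**2 * Sigma[i, i] for i in range(n))
# Second sum in (5): all off-diagonal (covariance) terms, i != j.
cov_part = sum(a[i] * a[j] * Sigma[i, j]
               for i in range(n) for j in range(n) if i != j)
assert np.isclose(total, var_part + cov_part)
```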

Corollary. If the $X_i$ are uncorrelated, then the second sum in (5) disappears:

(6) $V\left(\sum_{i=1}^na_iX_i\right)=\sum_{i=1}^na_i^2V(X_i).$

This fact has been used (with a slightly different explanation) in the derivation of the variance of the slope estimator for simple regression.
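In the uncorrelated case the covariance matrix is diagonal, so (6) holds exactly as a matrix identity. A quick sketch (the variances and weights are made-up numbers):

```python
import numpy as np

# Uncorrelated components: all covariances vanish, so V(X) is diagonal.
Sigma = np.diag([2.0, 1.0, 1.5])
a = np.array([1.0, -2.0, 0.5])
n = len(a)

full = a @ Sigma @ a  # exact V(sum a_i X_i)
diag_only = sum(a[i]**2 * Sigma[i, i] for i in range(n))
assert np.isclose(full, diag_only)  # (6): only the variance terms survive
```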

Question 4. Note that the matrix (1) is symmetric (elements above the main diagonal equal their mirror siblings below that diagonal). This means that some terms in the second sum on the right of (5) are repeated twice. If you group equal terms in (5), what do you get?

Answer 4. The idea is to write

$a_ia_jCov(X_i,X_j)+a_ia_jCov(X_j,X_i)=2a_ia_jCov(X_i,X_j),$

that is, to join equal elements above and below the main diagonal in (1). For this, you need to figure out how to write the sum of the elements that are above the main diagonal. Make a bigger version of (1) (with more off-diagonal elements) to see that the elements above the main diagonal are listed by the sum $\sum_{i=1}^{n-1}\sum_{j=i+1}^n.$ This sum can also be written as $\sum_{1\leq i<j\leq n}.$ Hence, (5) is the same as

(7) $V\left(\sum_{i=1}^na_iX_i\right)=\sum_{i=1}^na_i^2V(X_i)+2\sum_{i=1}^{n-1}\sum_{j=i+1}^na_ia_jCov(X_i,X_j)$

$=\sum_{i=1}^na_i^2V(X_i)+2\sum_{1\leq i<j\leq n}a_ia_jCov(X_i,X_j).$

Unlike (6), this equation is applicable when there is autocorrelation.
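The grouping in (7) can also be checked numerically: summing the upper triangle once and doubling it must give the same value as the full off-diagonal sum in (5). The covariance matrix and weights below are made-up numbers.

```python
import numpy as np

# Made-up covariance matrix and weights, as a numerical check of (7).
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
a = np.array([1.0, -2.0, 0.5])
n = len(a)

total = a @ Sigma @ a
var_part = sum(a[i]**2 * Sigma[i, i] for i in range(n))
# Upper triangle only: j runs strictly above i, so each pair is counted once.
upper = sum(a[i] * a[j] * Sigma[i, j]
            for i in range(n - 1) for j in range(i + 1, n))
assert np.isclose(total, var_part + 2 * upper)
```

By symmetry of (1), doubling the upper-triangle sum recovers both halves of the off-diagonal part.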