21
Feb 16

## Summation sign rules: identities for simple regression

There are many sources on the Internet that cover the basics of summation signs, some relatively simple and some pretty advanced. My purpose is more specific: to show how to obtain a couple of identities in terms of summation signs from general properties of variance and covariance.

Shortcut for covariance. This is the name of the following identity:

(1) $E(X-EX)(Y-EY)=E(XY)-(EX)(EY)$

where on the left we have the definition of $Cov(X,Y)$ and on the right we have an alternative expression (a shortcut) for the same thing. Letting $X=Y$ in (1) we get a shortcut for variance:

(2) $E(X-EX)^2=E(X^2)-(EX)^2,$

see the direct proof here. Again, on the left we have the definition of $Var(X)$ and on the right a shortcut for the same.
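For reference, (1) itself follows from linearity of the mean. A sketch, treating $EX$ and $EY$ as constants when the product is expanded:

$E(X-EX)(Y-EY)=E[XY-X(EY)-(EX)Y+(EX)(EY)]$ $=E(XY)-(EX)(EY)-(EX)(EY)+(EX)(EY)=E(XY)-(EX)(EY).$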

In another post I mentioned that

> for a discrete uniformly distributed variable with a finite number of elements, the population mean equals the sample mean if the sample is the whole population.

This is what it means. The most useful definition of a discrete random variable is this: it is a table of values and probabilities of the following type:

Table 1. Discrete random variable

| Values | $X_1$ | ... | $X_n$ |
|---|---|---|---|
| Probabilities | $p_1$ | ... | $p_n$ |

Here $X_1,...,X_n$ are the values and $p_1,...,p_n$ are the probabilities (they sum to one). With this table, it is easy to define the mean of $X$:

(3) $EX=\sum_{i=1}^nX_ip_i.$

A variable like this is called uniformly distributed if all probabilities are the same:

Table 2. Uniformly distributed variable

| Values | $X_1$ | ... | $X_n$ |
|---|---|---|---|
| Probabilities | $1/n$ | ... | $1/n$ |

In this case (3) becomes

(4) $EX=\sum_{i=1}^nX_i\frac{1}{n}=\frac{1}{n}\sum_{i=1}^nX_i=\bar{X}.$
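Definitions (3) and (4) are easy to check numerically. The following is only a minimal sketch: the function name `discrete_mean` and the numbers in the tables are made up for illustration.

```python
def discrete_mean(values, probs):
    """Mean of a discrete random variable given as a values+probabilities table, as in (3)."""
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to one")
    return sum(x * p for x, p in zip(values, probs))

# Hypothetical general table: unequal probabilities, formula (3)
general_mean = discrete_mean([1.0, 2.0, 3.0], [0.2, 0.5, 0.3])

# Hypothetical uniform table: every probability equals 1/n, formula (4)
values = [4.0, 7.0, 1.0, 8.0]
n = len(values)
population_mean = discrete_mean(values, [1.0 / n] * n)
sample_mean = sum(values) / n  # the usual sample mean X-bar

print(general_mean)
print(population_mean, sample_mean)
```

With equal probabilities the weighted sum in (3) and the ordinary average coincide, which is all that (4) says.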

This explains the statement from my post. Using (4), equations (1) and (2) can be rewritten as

(5) $\overline{(X-\bar{X})(Y-\bar{Y})}=\overline{XY}-\bar{X}\bar{Y},\ \overline{(X-\bar{X})^2}=\overline{X^2}-(\bar{X})^2.$

Try to write this using summation signs. For example, the first identity in (5) becomes

$\frac{1}{n}\sum_{i=1}^n\big(X_i-\frac{1}{n}\sum_{i=1}^nX_i\big)\big(Y_i-\frac{1}{n}\sum_{i=1}^nY_i\big)$ $=\frac{1}{n}\sum_{i=1}^nX_iY_i-\big(\frac{1}{n}\sum_{i=1}^nX_i\big)\big(\frac{1}{n}\sum_{i=1}^nY_i\big).$

This is crazy and trying to prove this directly would be even crazier.
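A numerical check is much less painful than a direct proof. The sketch below evaluates both sides of each identity in (5) on a small made-up data set; the helper names `mean` and `check_shortcuts` and the data are assumptions for illustration only.

```python
def mean(xs):
    """Sample mean, i.e. the bar operation in (5)."""
    return sum(xs) / len(xs)

def check_shortcuts(x, y):
    """Return both sides of the covariance and variance identities in (5)."""
    x_bar, y_bar = mean(x), mean(y)
    # Left sides: definitions of sample covariance and variance (division by n)
    cov_def = mean([(a - x_bar) * (b - y_bar) for a, b in zip(x, y)])
    var_def = mean([(a - x_bar) ** 2 for a in x])
    # Right sides: the shortcuts
    cov_short = mean([a * b for a, b in zip(x, y)]) - x_bar * y_bar
    var_short = mean([a * a for a in x]) - x_bar ** 2
    return cov_def, cov_short, var_def, var_short

# Hypothetical data
x = [1.0, 2.0, 4.0, 7.0]
y = [3.0, 1.0, 5.0, 9.0]
cov_def, cov_short, var_def, var_short = check_shortcuts(x, y)
print(cov_def, cov_short)
print(var_def, var_short)
```

Note that all averages here divide by $n$, not $n-1$, because they are means with respect to the uniform distribution.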

Remark. Let $X_1,...,X_n$ be a sample from an arbitrary distribution. Regardless of the parent distribution, the artificial uniform distribution from Table 2 can still be applied to the sample. To avoid confusion with the expected value $E$ with respect to the parent distribution, instead of (4) we can write

(6) $E_uX=\bar{X}$

where the subscript $u$ stands for "uniform". With that understanding, equations (5) are still true. The power of this approach is that all expressions in (5) are random variables, which allows for further application of the expected value $E$ with respect to the parent distribution.
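The Remark can be illustrated by simulation. In the sketch below the parent distribution is assumed to be exponential with mean 2 (an arbitrary choice, as are the sample size and the number of repetitions). For each sample the variance shortcut from (5) is checked, and then the sample means are averaged to approximate $E\bar{X}$, which by the standard unbiasedness property equals the parent mean.

```python
import random

def mean(xs):
    """Mean with respect to the uniform distribution on the sample, as in (6)."""
    return sum(xs) / len(xs)

rng = random.Random(0)      # fixed seed so the run is reproducible
parent_mean = 2.0           # assumed parent: exponential with mean 2
n, n_samples = 10, 20000    # assumed sample size and number of samples

sample_means = []
max_gap = 0.0
for _ in range(n_samples):
    x = [rng.expovariate(1.0 / parent_mean) for _ in range(n)]
    x_bar = mean(x)
    # Both sides of the second identity in (5), computed on this sample
    var_def = mean([(a - x_bar) ** 2 for a in x])
    var_short = mean([a * a for a in x]) - x_bar ** 2
    max_gap = max(max_gap, abs(var_def - var_short))
    sample_means.append(x_bar)  # X-bar is itself a random variable

# Averaging X-bar over many samples approximates E applied to X-bar
avg_of_means = mean(sample_means)
print(max_gap)
print(avg_of_means)
```

The gap between the two sides of (5) stays at rounding-error level for every sample, while the average of the sample means settles near the parent mean.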