21 Feb 16

Summation sign rules: identities for simple regression

There are many sources on the Internet. This and this are relatively simple, while this one is pretty advanced. They cover the basics. My purpose is more specific: to show how to obtain a couple of identities in terms of summation signs from general properties of variance and covariance.

Shortcut for covariance. This is the name of the following identity:

(1) E(X-EX)(Y-EY)=E(XY)-(EX)(EY)

where on the left we have the definition of Cov(X,Y) and on the right an alternative expression (a shortcut) for the same thing. Letting X=Y in (1), we get a shortcut for variance:

(2) E(X-EX)^2=E(X^2)-(EX)^2,

see the direct proof here. Again, on the left we have the definition of Var(X) and on the right a shortcut for the same.
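Both shortcuts are easy to verify numerically. Here is a minimal sketch in Python; the particular distribution below is made up for illustration and is not from the post:

```python
import numpy as np

# Values of the pair (X, Y) and their joint probabilities.
# These numbers are illustrative assumptions, not from the post.
x = np.array([1.0, 2.0, 5.0])
y = np.array([3.0, 1.0, 4.0])
p = np.array([0.2, 0.5, 0.3])  # probabilities sum to one

EX = np.sum(x * p)
EY = np.sum(y * p)

# Shortcut for covariance, identity (1): both sides must agree
lhs_cov = np.sum((x - EX) * (y - EY) * p)
rhs_cov = np.sum(x * y * p) - EX * EY
assert np.isclose(lhs_cov, rhs_cov)

# Letting X = Y gives the shortcut for variance, identity (2)
lhs_var = np.sum((x - EX) ** 2 * p)
rhs_var = np.sum(x ** 2 * p) - EX ** 2
assert np.isclose(lhs_var, rhs_var)
```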

In this post I mentioned that

for a discrete uniformly distributed variable with a finite number of elements, the population mean equals the sample mean if the sample is the whole population.

This is what it means. The most useful definition of a discrete random variable is this: it is a values+probabilities table of the following type:

Table 1. Discrete random variable with n values
Values: X_1, ..., X_n
Probabilities: p_1, ..., p_n

Here X_1,...,X_n are the values and p_1,...,p_n are the probabilities (they sum to one). With this table, it is easy to define the mean of X:

(3) EX=\sum_{i=1}^nX_ip_i.
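In code, Table 1 is just two arrays, and the mean in (3) is their elementwise product summed up. A minimal sketch in Python (the values and probabilities are my own, purely for illustration):

```python
import numpy as np

values = np.array([0.0, 1.0, 4.0])  # X_1, ..., X_n
probs = np.array([0.5, 0.3, 0.2])   # p_1, ..., p_n, summing to one

EX = np.sum(values * probs)  # equation (3): EX = sum of X_i * p_i
print(EX)                    # 1.1
```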

A variable like this is called uniformly distributed if all probabilities are the same:

Table 2. Uniformly distributed discrete random variable with n values
Values: X_1, ..., X_n
Probabilities: 1/n, ..., 1/n

In this case (3) becomes

(4) EX=\sum_{i=1}^nX_i\frac{1}{n}=\frac{1}{n}\sum_{i=1}^nX_i=\bar{X}.
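Numerically, (4) just says that the weighted sum in (3) collapses to the ordinary average once every weight equals 1/n. A minimal check in Python (the values are made up):

```python
import numpy as np

values = np.array([0.0, 1.0, 4.0])
n = len(values)
probs = np.full(n, 1 / n)  # uniform: every p_i equals 1/n

EX = np.sum(values * probs)           # equation (3) with p_i = 1/n
assert np.isclose(EX, values.mean())  # equation (4): EX = X-bar
```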

This explains the statement from my post. Using (4), equations (1) and (2) can be rewritten as

(5) \overline{(X-\bar{X})(Y-\bar{Y})}=\overline{XY}-\bar{X}\bar{Y},\ \overline{(X-\bar{X})^2}=\overline{X^2}-(\bar{X})^2.

Try to write this using summation signs. For example, the first identity in (5) becomes

\frac{1}{n}\sum_{i=1}^n\Big(X_i-\frac{1}{n}\sum_{j=1}^nX_j\Big)\Big(Y_i-\frac{1}{n}\sum_{j=1}^nY_j\Big) =\frac{1}{n}\sum_{i=1}^nX_iY_i-\Big(\frac{1}{n}\sum_{i=1}^nX_i\Big)\Big(\frac{1}{n}\sum_{i=1}^nY_i\Big).

This is crazy and trying to prove this directly would be even crazier.
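Short of a direct proof, the identity can at least be confirmed numerically. A minimal sketch in Python (the sample size and random seed are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily
n = 100
X = rng.normal(size=n)
Y = rng.normal(size=n)

# First identity in (5), written with means instead of summation signs
lhs = np.mean((X - X.mean()) * (Y - Y.mean()))
rhs = np.mean(X * Y) - X.mean() * Y.mean()
assert np.isclose(lhs, rhs)
```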

Remark. Let X_1,...,X_n be a sample from an arbitrary distribution. Regardless of the parent distribution, the artificial uniform distribution from Table 2 can still be applied to the sample. To avoid confusion with the expected value E with respect to the parent distribution, instead of (4) we can write

(6) E_uX=\bar{X}

where the subscript u stands for "uniform". With that understanding, equations (5) are still true. The power of this approach is that all expressions in (5) are random variables, which allows for further application of the expected value E with respect to the parent distribution.
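To illustrate the remark, here is a sketch (the normal parent distribution and the sample size of 50 are assumptions, chosen arbitrarily): draw a sample, compute E_u as the sample mean, and check that the second identity in (5) holds exactly regardless of the parent.

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(loc=2.0, scale=3.0, size=50)  # parent chosen arbitrarily

E_u = sample.mean()  # E_u X = X-bar, equation (6)

# Second identity in (5) holds exactly, whatever the parent distribution is
lhs = np.mean((sample - E_u) ** 2)
rhs = np.mean(sample ** 2) - E_u ** 2
assert np.isclose(lhs, rhs)
```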
