2 Jan 17

Conditional variance properties

Preliminaries

Review Properties of conditional expectation, especially the summary, where I introduce a new notation for conditional expectation. Everywhere I use the notation E_Y\pi for the expectation of \pi conditional on Y instead of E(\pi|Y).

This post and the previous one on conditional expectation show that conditioning is a pretty advanced notion. Many introductory books use the condition E_xu=0 (the expected value of the error term u conditional on the regressor x is zero). Because of the complexity of conditioning, I think it's better to avoid this kind of assumption as much as possible.

Conditional variance properties

Replacing usual expectations by their conditional counterparts in the definition of variance, we obtain the definition of conditional variance:

(1) Var_Y(X)=E_Y(X-E_YX)^2.
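
To see definition (1) in action, here is a minimal Python sketch (the joint distribution is made up purely for illustration) that computes Var_Y(X) for a small discrete joint distribution:

```python
import numpy as np

# Hypothetical joint distribution of (X, Y) on a 2x2 grid:
# rows index values of X, columns index values of Y.
x_vals = np.array([0.0, 1.0])
y_vals = np.array([0.0, 1.0])
p = np.array([[0.1, 0.3],
              [0.2, 0.4]])  # P(X=x_i, Y=y_j); entries sum to 1

for j, y in enumerate(y_vals):
    p_x_given_y = p[:, j] / p[:, j].sum()              # conditional pmf of X given Y=y
    e_x = (x_vals * p_x_given_y).sum()                 # E_Y X on the event {Y=y}
    var_x = ((x_vals - e_x) ** 2 * p_x_given_y).sum()  # definition (1)
    print(f"Y={y}: E_Y X = {e_x:.4f}, Var_Y(X) = {var_x:.4f}")
```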

Property 1. If X,Y are independent, then X-EX and Y are also independent and conditioning doesn't change variance:

Var_Y(X)=E_Y(X-EX)^2=E(X-EX)^2=Var(X),

see Conditioning in case of independence.
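
A quick numerical illustration of Property 1 (the marginals below are made up): under independence the joint distribution is the product of the marginals, so conditioning on any value of Y leaves the variance unchanged.

```python
import numpy as np

x_vals = np.array([0.0, 1.0])
p_x = np.array([0.4, 0.6])          # marginal of X (made up)
p_y = np.array([0.3, 0.7])          # marginal of Y (made up)
p = np.outer(p_x, p_y)              # independence: joint = product of marginals

e_x = (x_vals * p_x).sum()
var_x = ((x_vals - e_x) ** 2 * p_x).sum()   # unconditional Var(X)

for j in range(len(p_y)):
    p_x_given_y = p[:, j] / p[:, j].sum()   # equals p_x under independence
    e = (x_vals * p_x_given_y).sum()
    assert np.isclose(((x_vals - e) ** 2 * p_x_given_y).sum(), var_x)
```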

Property 2. Generalized homogeneity of degree 2: if a is a deterministic function, then a^2(Y) can be pulled out:

Var_Y(a(Y)X)=E_Y[a(Y)X-E_Y(a(Y)X)]^2=E_Y[a(Y)X-a(Y)E_YX]^2

=E_Y[a^2(Y)(X-E_YX)^2]=a^2(Y)E_Y(X-E_YX)^2=a^2(Y)Var_Y(X).

Property 3. Shortcut for conditional variance:

(2) Var_Y(X)=E_Y(X^2)-(E_YX)^2.

Proof.

Var_Y(X)=E_Y(X-E_YX)^2=E_Y[X^2-2XE_YX+(E_YX)^2]

(distributing conditional expectation)

=E_YX^2-2E_Y(XE_YX)+E_Y(E_YX)^2

(applying Properties 2 and 6 from this Summary with a(Y)=E_YX)

=E_YX^2-2(E_YX)^2+(E_YX)^2=E_YX^2-(E_YX)^2.
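
The shortcut (2) is easy to check numerically; the sketch below reuses the same kind of made-up joint distribution:

```python
import numpy as np

x_vals = np.array([0.0, 1.0])
p = np.array([[0.1, 0.3],
              [0.2, 0.4]])  # hypothetical P(X=x_i, Y=y_j)

for j in range(p.shape[1]):
    p_x_given_y = p[:, j] / p[:, j].sum()
    e_x = (x_vals * p_x_given_y).sum()                     # E_Y X
    e_x2 = (x_vals ** 2 * p_x_given_y).sum()               # E_Y(X^2)
    var_def = ((x_vals - e_x) ** 2 * p_x_given_y).sum()    # definition (1)
    var_shortcut = e_x2 - e_x ** 2                         # shortcut (2)
    assert np.isclose(var_def, var_shortcut)
```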

Property 4. The law of total variance:

(3) Var(X)=Var(E_YX)+E[Var_Y(X)].

Proof. By the shortcut for usual variance and the law of iterated expectations,

Var(X)=EX^2-(EX)^2=E[E_Y(X^2)]-[E(E_YX)]^2

(replacing E_Y(X^2) from (2))

=E[Var_Y(X)]+E(E_YX)^2-[E(E_YX)]^2

(the last two terms give the shortcut for variance of E_YX)

=E[Var_Y(X)]+Var(E_YX).
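
The law of total variance can also be verified numerically. A sketch with a hypothetical 2x2 joint distribution of (X,Y):

```python
import numpy as np

x_vals = np.array([0.0, 1.0])
p = np.array([[0.1, 0.3],
              [0.2, 0.4]])          # hypothetical P(X=x_i, Y=y_j)

p_y = p.sum(axis=0)                  # marginal of Y
p_x = p.sum(axis=1)                  # marginal of X

e_x_given_y = (x_vals[:, None] * p / p_y).sum(axis=0)    # E_Y X, one value per y
var_x_given_y = (((x_vals[:, None] - e_x_given_y) ** 2) * p / p_y).sum(axis=0)

var_x = ((x_vals - (x_vals * p_x).sum()) ** 2 * p_x).sum()                # Var(X)
var_of_cond_mean = ((e_x_given_y - (e_x_given_y * p_y).sum()) ** 2 * p_y).sum()
mean_of_cond_var = (var_x_given_y * p_y).sum()

assert np.isclose(var_x, var_of_cond_mean + mean_of_cond_var)  # identity (3)
```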

Before we move further we need to define conditional covariance by

Cov_Y(S,T) = E_Y(S - E_YS)(T - E_YT)

(everywhere usual expectations are replaced by conditional ones). We say that random variables S,T are conditionally uncorrelated if Cov_Y(S,T) = 0.

Property 5. Conditional variance of a linear combination. For any random variables S,T and functions a(Y),b(Y) one has

Var_Y(a(Y)S + b(Y)T)=a^2(Y)Var_Y(S)+2a(Y)b(Y)Cov_Y(S,T)+b^2(Y)Var_Y(T).

The proof is quite similar to that in the case of usual variances, so we leave it to the reader. In particular, if S,T are conditionally uncorrelated, then the interaction term disappears:

Var_Y(a(Y)S + b(Y)T)=a^2(Y)Var_Y(S)+b^2(Y)Var_Y(T).
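
Property 5 lends itself to a numerical check too. In the sketch below the joint distribution of (S,T,Y) and the functions a,b are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint pmf of (S, T, Y) on a 2x2x2 grid.
p = rng.random((2, 2, 2))
p /= p.sum()
s_vals = np.array([0.0, 1.0])
t_vals = np.array([0.0, 1.0])
a = lambda y: 2.0 + y            # arbitrary deterministic functions of Y
b = lambda y: 1.0 - 3.0 * y

for k, y in enumerate([0.0, 1.0]):
    pk = p[:, :, k] / p[:, :, k].sum()           # conditional pmf of (S,T) given Y=y
    S, T = s_vals[:, None], t_vals[None, :]
    eS, eT = (S * pk).sum(), (T * pk).sum()
    varS = ((S - eS) ** 2 * pk).sum()
    varT = ((T - eT) ** 2 * pk).sum()
    covST = ((S - eS) * (T - eT) * pk).sum()
    Z = a(y) * S + b(y) * T                      # the linear combination
    eZ = (Z * pk).sum()
    lhs = ((Z - eZ) ** 2 * pk).sum()
    rhs = a(y) ** 2 * varS + 2 * a(y) * b(y) * covST + b(y) ** 2 * varT
    assert np.isclose(lhs, rhs)
```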

25 Oct 16

Properties of variance

All properties of variance in one place

Certainty is the mother of quiet and repose, and uncertainty the cause of variance and contentions. Edward Coke

Preliminaries: study properties of means with proofs.

Definition. Yes, uncertainty leads to variance, and we measure it by Var(X)=E(X-EX)^2. It is useful to use the name deviation from mean for X-EX and realize that E(X-EX)=0, so that the mean of the deviation from mean cannot serve as a measure of variation of X around EX.

Property 1. Variance of a linear combination. For any random variables X,Y and numbers a,b one has
(1) Var(aX + bY)=a^2Var(X)+2abCov(X,Y)+b^2Var(Y).
The term 2abCov(X,Y) in (1) is called an interaction term. See this post for the definition and properties of covariance.
Proof.
Var(aX + bY)=E[aX + bY -E(aX + bY)]^2

(using linearity of means)
=E(aX + bY-aEX -bEY)^2

(grouping by variable)
=E[a(X-EX)+b(Y-EY)]^2

(squaring out)
=E[a^2(X-EX)^2+2ab(X-EX)(Y-EY)+b^2(Y-EY)^2]

(using linearity of means and definitions of variance and covariance)
=a^2Var(X) + 2abCov(X,Y) +b^2Var(Y).
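
Since (1) holds for any distribution, it holds in particular for the empirical distribution of a sample, which gives an easy numerical check (the sample below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)    # correlated with x
a, b = 2.0, -3.0

cov_xy = np.cov(x, y, bias=True)[0, 1]    # 1/n convention, matching np.var
lhs = np.var(a * x + b * y)
rhs = a**2 * np.var(x) + 2 * a * b * cov_xy + b**2 * np.var(y)
assert np.isclose(lhs, rhs)               # (1) holds exactly for the empirical distribution
```
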
Property 2. Variance of a sum. Setting a=b=1 in (1) we obtain
Var(X + Y) = Var(X) + 2Cov(X,Y)+Var(Y).

Property 3. Homogeneity of degree 2. Choose b=0 in (1) to get
Var(aX)=a^2Var(X).
Exercise. What do you think is larger: Var(X+Y) or Var(X-Y)?
Property 4. If we add a constant to a variable, its variance does not change: Var(X+c)=E[X+c-E(X+c)]^2=E(X+c-EX-c)^2=E(X-EX)^2=Var(X)
Property 5. Variance of a constant is zero: Var(c)=E(c-Ec)^2=0.

Property 6. Nonnegativity. Since the squared deviation from mean (X-EX)^2 is nonnegative, its expectation is nonnegative: E(X-EX)^2\ge 0.

Property 7. Only a constant can have variance equal to zero: If Var(X)=0, then E(X-EX)^2 =(x_1-EX)^2p_1 +...+(x_n-EX)^2p_n=0, see the definition of the expected value. Since all probabilities are positive, we conclude that x_i=EX for all i, which means that X is identically constant.

Property 8. Shortcut for variance. We have an identity E(X-EX)^2=EX^2-(EX)^2. Indeed, squaring out gives

E(X-EX)^2 =E(X^2-2XEX+(EX)^2)

(distributing expectation)

=EX^2-2E(XEX)+E(EX)^2

(expectation of a constant is constant)

=EX^2-2(EX)^2+(EX)^2=EX^2-(EX)^2.
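
A one-line numerical check of the shortcut (any sample will do):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(size=10_000)          # arbitrary sample
assert np.isclose(np.mean((x - x.mean()) ** 2), np.mean(x ** 2) - x.mean() ** 2)
```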

All of the above properties apply to any random variables. The next one is an exception in the sense that it applies only to uncorrelated variables.

Property 9. If variables are uncorrelated, that is Cov(X,Y)=0, then from (1) we have Var(aX + bY)=a^2Var(X)+b^2Var(Y). In particular, letting a=b=1, we get additivity: Var(X+Y)=Var(X)+Var(Y). Recall that the expected value is always additive.

Generalizations. Var(\sum a_iX_i)=\sum a_i^2Var(X_i) and Var(\sum X_i)=\sum Var(X_i) if all X_i are pairwise uncorrelated.
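
A Monte Carlo illustration of the generalizations: with independent (hence uncorrelated) X_i the two sides agree only approximately, because sample covariances are close to zero but not exactly zero:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100_000, 5
X = rng.normal(size=(k, n))                  # independent rows, hence uncorrelated
a = np.array([1.0, -2.0, 0.5, 3.0, -1.0])

lhs = np.var(a @ X)                          # sample Var(sum a_i X_i)
rhs = (a ** 2 * X.var(axis=1)).sum()         # sum a_i^2 Var(X_i)
print(lhs, rhs)                              # close but not equal: sample covariances
                                             # are near zero, not exactly zero
```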

Among my posts, I have counted 12 so far that use properties of variance.

21 Feb 16

Summation sign rules: identities for simple regression

Summation sign rules: identities for simple regression

There are many sources on the Internet. This and this are relatively simple, while this one is pretty advanced. They cover the basics. My purpose is more specific: to show how to obtain a couple of identities in terms of summation signs from general properties of variance and covariance.

Shortcut for covariance. This is a name of the following identity

(1) E(X-EX)(Y-EY)=E(XY)-(EX)(EY)

where on the left we have the definition of Cov(X,Y) and on the right we have an alternative expression (a shortcut) for the same thing. Letting X=Y in (1) we get a shortcut for variance:

(2) E(X-EX)^2=E(X^2)-(EX)^2,

see the direct proof here. Again, on the left we have the definition of Var(X) and on the right a shortcut for the same.

In this post I mentioned that

for a discrete uniformly distributed variable with a finite number of elements, the population mean equals the sample mean if the sample is the whole population.

This is what it means. The most useful definition of a discrete random variable is this: it is a table of values and probabilities of the type

Table 1. Discrete random variable with n values 
Values X_1 ... X_n
Probabilities p_1 ... p_n

Here X_1,...,X_n are the values and p_1,...,p_n are the probabilities (they sum to one). With this table, it is easy to define the mean of X:

(3) EX=\sum_{i=1}^nX_ip_i.

A variable like this is called uniformly distributed if all probabilities are the same:

Table 2. Uniformly distributed discrete random variable with n values
Values X_1 ... X_n
Probabilities 1/n ... 1/n

In this case (3) becomes

(4) EX=\bar{X}.

This explains the statement from my post. Using (4), equations (1) and (2) can be rewritten as

(5) \overline{(X-\bar{X})(Y-\bar{Y})}=\overline{XY}-\bar{X}\bar{Y},\ \overline{(X-\bar{X})^2}=\overline{X^2}-(\bar{X})^2.

Try to write this using summation signs. For example, the first identity in (5) becomes

\frac{1}{n}\sum_{i=1}^n\big(X_i-\frac{1}{n}\sum_{j=1}^nX_j\big)\big(Y_i-\frac{1}{n}\sum_{j=1}^nY_j\big)

=\frac{1}{n}\sum_{i=1}^nX_iY_i-\big(\frac{1}{n}\sum_{i=1}^nX_i\big)\big(\frac{1}{n}\sum_{i=1}^nY_i\big).

This is crazy and trying to prove this directly would be even crazier.
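
Verifying it numerically, however, is easy. A short numpy check of the first identity in (5) on an arbitrary sample:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=50)
Y = rng.normal(size=50)
xbar, ybar = X.mean(), Y.mean()

lhs = np.mean((X - xbar) * (Y - ybar))    # left side of the first identity in (5)
rhs = np.mean(X * Y) - xbar * ybar        # right side
assert np.isclose(lhs, rhs)
```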

Remark. Let X_1,...,X_n be a sample from an arbitrary distribution. Regardless of the parent distribution, the artificial uniform distribution from Table 2 can still be applied to the sample. To avoid confusion with the expected value E with respect to the parent distribution, instead of (4) we can write

(6) E_uX=\bar{X}

where the subscript u stands for "uniform". With that understanding, equations (5) are still true. The power of this approach is that all expressions in (5) are random variables which allows for further application of the expected value E with respect to the parent distribution.