2 Jan 17

## Conditional variance properties

### Preliminaries

Review the post Properties of conditional expectation, especially its Summary, where I introduce a new notation for conditional expectation. Throughout, I use the notation $E_Y\pi$ for the expectation of $\pi$ conditional on $Y$ instead of $E(\pi|Y)$.

This post and the previous one on conditional expectation show that conditioning is a fairly advanced notion. Many introductory books use the condition $E_xu=0$ (the expected value of the error term $u$ conditional on the regressor $x$ is zero). Because conditioning is complex, I think it's better to avoid this kind of assumption as much as possible.

### Conditional variance properties

Replacing usual expectations by their conditional counterparts in the definition of variance, we obtain the definition of conditional variance:

(1) $Var_Y(X)=E_Y(X-E_YX)^2.$

Property 1. If $X,Y$ are independent, then $E_YX=EX$ and $X-EX$ is also independent of $Y$, so conditioning doesn't change variance:

$Var_Y(X)=E_Y(X-EX)^2=E(X-EX)^2=Var(X).$
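
Here is a minimal numerical check of Property 1 in Python. It rests on my own assumptions. The marginal probabilities are made up, and independence is imposed by multiplying them. `cond_var` is a hypothetical helper that applies definition (1) column by column. I use `fractions.Fraction` so the equalities hold exactly.

```python
from fractions import Fraction as F

# Made-up marginals; independence is imposed via P(X=x, Y=y) = P(X=x) * P(Y=y).
px = {0: F(1, 4), 4: F(3, 4)}
py = {0: F(1, 3), 1: F(2, 3)}
joint = {(x, y): px[x] * py[y] for x in px for y in py}


def cond_var(y):
    """Var_Y(X) at Y = y, from definition (1) with weights P(X=x, Y=y) / P(Y=y)."""
    col = {x: p / py[y] for (x, yj), p in joint.items() if yj == y}
    mean = sum(x * w for x, w in col.items())
    return sum((x - mean) ** 2 * w for x, w in col.items())


# Unconditional mean and variance of X.
ex = sum(x * p for x, p in px.items())
var_x = sum((x - ex) ** 2 * p for x, p in px.items())

for y in py:
    print(y, cond_var(y), var_x)  # conditional vs. unconditional variance
```

For every value of $Y$, the conditional variance matches $Var(X)$.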

Property 2. Generalized homogeneity of degree 2: if $a$ is a deterministic function, then $a^2(Y)$ can be pulled out:

$Var_Y(a(Y)X)=E_Y[a(Y)X-E_Y(a(Y)X)]^2=E_Y[a(Y)X-a(Y)E_YX]^2$ $=E_Y[a^2(Y)(X-E_YX)^2]=a^2(Y)E_Y(X-E_YX)^2=a^2(Y)Var_Y(X).$
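
To see Property 2 on numbers, here is a small Python sketch. Its assumptions are my own. The joint distribution is made up (and $X,Y$ are dependent in it), `cond_E` and `cond_var` are hypothetical helpers, and $a(y)=y+2$ is an arbitrary deterministic function.

```python
from fractions import Fraction as F

# Made-up joint distribution P(X = x, Y = y); X and Y are dependent here.
joint = {(0, 0): F(1, 8), (2, 0): F(3, 8), (1, 1): F(1, 4), (5, 1): F(1, 4)}


def cond_E(g, y):
    """E_Y g(X, Y) at Y = y: weighted average of g over the column Y = y."""
    py = sum(p for (_, yj), p in joint.items() if yj == y)
    return sum(g(x, yj) * p for (x, yj), p in joint.items() if yj == y) / py


def cond_var(g, y):
    """Var_Y g(X, Y) at Y = y, from definition (1)."""
    m = cond_E(g, y)
    return cond_E(lambda x, yj: (g(x, yj) - m) ** 2, y)


def a(y):
    """Hypothetical deterministic function a(Y)."""
    return y + 2


for y in (0, 1):
    lhs = cond_var(lambda x, yj: a(yj) * x, y)         # Var_Y(a(Y) X)
    rhs = a(y) ** 2 * cond_var(lambda x, yj: x, y)     # a^2(Y) Var_Y(X)
    print(y, lhs, rhs)
```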

Property 3. Shortcut for conditional variance:

(2) $Var_Y(X)=E_Y(X^2)-(E_YX)^2.$

Proof.

$Var_Y(X)=E_Y(X-E_YX)^2=E_Y[X^2-2XE_YX+(E_YX)^2]$

(distributing conditional expectation)

$=E_YX^2-2E_Y(XE_YX)+E_Y(E_YX)^2$

(applying Property 2 from the Summary of the post on conditional expectation with $a(Y)=E_YX$, and Property 6 with $f(Y)=(E_YX)^2$)

$=E_YX^2-2(E_YX)^2+(E_YX)^2=E_YX^2-(E_YX)^2.$
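
The shortcut can be checked against definition (1) with a short Python sketch. The joint distribution is made up and `cond_E` is a hypothetical helper. Exact `Fraction` arithmetic makes the two results compare equal with `==`.

```python
from fractions import Fraction as F

# Made-up joint distribution P(X = x, Y = y).
joint = {(0, 0): F(1, 8), (2, 0): F(3, 8), (1, 1): F(1, 4), (5, 1): F(1, 4)}


def cond_E(g, y):
    """E_Y g(X, Y) at Y = y: weighted average of g over the column Y = y."""
    py = sum(p for (_, yj), p in joint.items() if yj == y)
    return sum(g(x, yj) * p for (x, yj), p in joint.items() if yj == y) / py


results = {}
for y in (0, 1):
    m = cond_E(lambda x, yj: x, y)                          # E_Y X
    by_definition = cond_E(lambda x, yj: (x - m) ** 2, y)   # formula (1)
    by_shortcut = cond_E(lambda x, yj: x ** 2, y) - m ** 2  # formula (2)
    results[y] = (by_definition, by_shortcut)
    print(y, by_definition, by_shortcut)
```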

Property 4. The law of total variance:

(3) $Var(X)=Var(E_YX)+E[Var_Y(X)].$

Proof. By the shortcut for usual variance and the law of iterated expectations

$Var(X)=EX^2-(EX)^2=E[E_Y(X^2)]-[E(E_YX)]^2$

(by (2), $E_Y(X^2)=Var_Y(X)+(E_YX)^2$; take the expectation of both sides)

$=E[Var_Y(X)]+E(E_YX)^2-[E(E_YX)]^2$

(the last two terms give the shortcut for variance of $E_YX$)

$=E[Var_Y(X)]+Var(E_YX).$
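
A numerical check of the law of total variance (3) is sketched below in Python. The joint distribution is made up and the helper names are my own. Both sides of (3) are computed separately, and the outer averages use the weights $P(Y=y)$.

```python
from fractions import Fraction as F

# Made-up joint distribution P(X = x, Y = y).
joint = {(0, 0): F(1, 8), (2, 0): F(3, 8), (1, 1): F(1, 4), (5, 1): F(1, 4)}


def cond_E(g, y):
    """E_Y g(X, Y) at Y = y: weighted average of g over the column Y = y."""
    py = sum(p for (_, yj), p in joint.items() if yj == y)
    return sum(g(x, yj) * p for (x, yj), p in joint.items() if yj == y) / py


ys = sorted({y for _, y in joint})
py = {y: sum(p for (_, yj), p in joint.items() if yj == y) for y in ys}

# Left side of (3): the unconditional variance of X.
ex = sum(x * p for (x, _), p in joint.items())
var_x = sum((x - ex) ** 2 * p for (x, _), p in joint.items())

# E_Y X and Var_Y(X) for each value of Y.
cm = {y: cond_E(lambda x, yj: x, y) for y in ys}
cv = {y: cond_E(lambda x, yj: (x - cm[yj]) ** 2, y) for y in ys}

# Right side of (3): Var(E_Y X) + E[Var_Y(X)].
mean_cm = sum(cm[y] * py[y] for y in ys)
var_of_cond_mean = sum((cm[y] - mean_cm) ** 2 * py[y] for y in ys)
mean_of_cond_var = sum(cv[y] * py[y] for y in ys)

print(var_x, var_of_cond_mean + mean_of_cond_var)
```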

Before we move further we need to define conditional covariance by

$Cov_Y(S,T) = E_Y(S - E_YS)(T - E_YT)$

(everywhere usual expectations are replaced by conditional ones). We say that random variables $S,T$ are conditionally uncorrelated if $Cov_Y(S,T) = 0$.

Property 5. Conditional variance of a linear combination. For any random variables $S,T$ and functions $a(Y),b(Y)$ one has

$Var_Y(a(Y)S + b(Y)T)=a^2(Y)Var_Y(S)+2a(Y)b(Y)Cov_Y(S,T)+b^2(Y)Var_Y(T).$

The proof is quite similar to that in the case of usual variances, so we leave it to the reader. In particular, if $S,T$ are conditionally uncorrelated, then the interaction term disappears:

$Var_Y(a(Y)S + b(Y)T)=a^2(Y)Var_Y(S)+b^2(Y)Var_Y(T).$
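
Since the proof of Property 5 is left to the reader, a numerical check may be reassuring. In this Python sketch, the joint distribution, the variables $S=X$ and $T=X^2+Y$, and the functions $a(y)=2-y$ and $b(y)=y+1$ are all hypothetical choices of mine. $S,T$ are conditionally correlated here, so the interaction term matters.

```python
from fractions import Fraction as F

# Made-up joint distribution P(X = x, Y = y).
joint = {(0, 0): F(1, 8), (2, 0): F(3, 8), (1, 1): F(1, 4), (5, 1): F(1, 4)}


def cond_E(g, y):
    """E_Y g(X, Y) at Y = y: weighted average of g over the column Y = y."""
    py = sum(p for (_, yj), p in joint.items() if yj == y)
    return sum(g(x, yj) * p for (x, yj), p in joint.items() if yj == y) / py


def cond_cov(g, h, y):
    """Cov_Y(g, h) at Y = y: conditional mean of the product of conditional deviations."""
    mg, mh = cond_E(g, y), cond_E(h, y)
    return cond_E(lambda x, yj: (g(x, yj) - mg) * (h(x, yj) - mh), y)


# Hypothetical random variables S, T and deterministic functions a(Y), b(Y).
def S(x, y):
    return x


def T(x, y):
    return x * x + y


def a(y):
    return 2 - y


def b(y):
    return y + 1


def combo(x, y):
    """The linear combination a(Y) S + b(Y) T."""
    return a(y) * S(x, y) + b(y) * T(x, y)


checks = []
for y in (0, 1):
    lhs = cond_cov(combo, combo, y)  # Var_Y of the combination = its Cov_Y with itself
    rhs = (a(y) ** 2 * cond_cov(S, S, y)
           + 2 * a(y) * b(y) * cond_cov(S, T, y)
           + b(y) ** 2 * cond_cov(T, T, y))
    checks.append((lhs, rhs))
    print(y, lhs, rhs)
```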

18 Jul 16

## Properties of conditional expectation

### Background

A company sells a product and may offer a discount. We denote by $X$ the sales volume and by $Y$ the discount amount (per unit). For simplicity, both variables take only two values. They depend on each other. If the sales are high, the discount may be larger. A higher discount, in its turn, may attract more buyers. At the same level of sales, the discount may vary depending on the vendor's costs. With the same discount, the sales vary with consumer preferences. Along with the sales and discount, we consider a third variable that depends on both of them. It can be the profit $\pi$.

### Formalization

The sales volume $X$ takes values $x_1,x_2$ with probabilities $p_i^X=P(X=x_i)$, $i=1,2$. Similarly, the discount $Y$ takes values $y_1,y_2$ with probabilities $p_i^Y=P(Y=y_i)$, $i=1,2$. The joint events have joint probabilities denoted $P(X=x_i,Y=y_j)=p_{i,j}$. The profit in the event $X=x_i,Y=y_j$ is denoted $\pi_{i,j}$. This information is summarized in Table 1.

Table 1

| | $y_1$ | $y_2$ | |
|---|---|---|---|
| $x_1$ | $\pi_{1,1},\ p_{1,1}$ | $\pi_{1,2},\ p_{1,2}$ | $p_1^X$ |
| $x_2$ | $\pi_{2,1},\ p_{2,1}$ | $\pi_{2,2},\ p_{2,2}$ | $p_2^X$ |
| | $p_1^Y$ | $p_2^Y$ | |

Comments. In the left-most column and upper-most row we have values of the sales and discount. In the "margins" (last row and last column) we put probabilities of those values. In the main body of the table we have profit values and their probabilities. It follows that the expected profit is

(1) $E\pi=\pi_{1,1}p_{1,1}+\pi_{1,2}p_{1,2}+\pi_{2,1}p_{2,1}+\pi_{2,2}p_{2,2}.$
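
To make (1) concrete, here is a Python sketch with made-up numbers for Table 1. In it, sales take the values 100 and 200 and the per-unit discount takes the values 1 and 2. The profit rule `profit(x, y) = x * (5 - y)` is a hypothetical choice, not something from the post.

```python
from fractions import Fraction as F

# Made-up joint probabilities p_{i,j} for sales x in {100, 200} and discount y in {1, 2}.
joint = {(100, 1): F(2, 5), (200, 1): F(1, 10),
         (100, 2): F(1, 10), (200, 2): F(2, 5)}


def profit(x, y):
    """Hypothetical profit pi_{i,j}: a margin of 5 per unit minus the discount y."""
    return x * (5 - y)


# Equation (1): each profit value weighted by its joint probability.
expected_profit = sum(profit(x, y) * p for (x, y), p in joint.items())
print(expected_profit)  # 510
```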

### Conditioning

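Suppose that the vendor fixes the discount at $y_1$. Then only the column containing this value is relevant. To get numbers that satisfy the completeness axiom (the probabilities sum to 1), we divide the probabilities in this column by their sum $p_1^Y=p_{11}+p_{21}$ and define conditional probabilities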

$P(X=x_1|Y=y_1)=\frac{p_{11}}{p_1^Y},\ P(X=x_2|Y=y_1)=\frac{p_{21}}{p_1^Y}.$

This allows us to define conditional expectation

(2) $E(\pi|Y=y_1)=\pi_{11}\frac{p_{11}}{p_1^Y}+\pi_{21}\frac{p_{21}}{p_1^Y}.$

Similarly, if the discount is fixed at $y_2$,

(3) $E(\pi|Y=y_2)=\pi_{12}\frac{p_{12}}{p_2^Y}+\pi_{22}\frac{p_{22}}{p_2^Y}.$

Equations (2) and (3) are joined in the notation $E(\pi|Y)$.
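
Equations (2) and (3) can be evaluated in a few lines of Python. The sketch uses made-up numbers for Table 1 and a hypothetical profit rule. `p_y` and `cond_E` are my own helper names. `cond_E` restricts the sum to one column and divides by its total.

```python
from fractions import Fraction as F

# Made-up joint probabilities p_{i,j} for sales x in {100, 200} and discount y in {1, 2}.
joint = {(100, 1): F(2, 5), (200, 1): F(1, 10),
         (100, 2): F(1, 10), (200, 2): F(2, 5)}


def profit(x, y):
    """Hypothetical profit pi_{i,j}: a margin of 5 per unit minus the discount y."""
    return x * (5 - y)


def p_y(y):
    """Marginal p_j^Y: the sum of the column Y = y in Table 1."""
    return sum(p for (_, yj), p in joint.items() if yj == y)


def cond_E(g, y):
    """E(g | Y = y) as in (2)-(3): weights are the conditional probabilities p_{ij} / p_j^Y."""
    return sum(g(x, yj) * p / p_y(y) for (x, yj), p in joint.items() if yj == y)


print(cond_E(profit, 1), cond_E(profit, 2))  # 480 540
```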

Property 1. While the usual expectation (1) is a number, the conditional expectation $E(\pi|Y)$ is a function of the value of $Y$ on which the conditioning is being done. Since it is a function of $Y$, it is natural to consider it a random variable defined by the next table

Table 2

| Values | Probabilities |
|---|---|
| $E(\pi\mid Y=y_1)$ | $p_1^Y$ |
| $E(\pi\mid Y=y_2)$ | $p_2^Y$ |

Property 2. Law of iterated expectations: the mean of the conditional expectation equals the usual mean. Indeed, using Table 2, we have

$E[E(\pi|Y)]=E(\pi|Y=y_1)p_1^Y+E(\pi|Y=y_2)p_2^Y$ (applying (2) and (3))

$=\left[\pi_{11}\frac{p_{11}}{p_1^Y}+\pi_{21}\frac{p_{21}}{p_1^Y}\right]p_1^Y+\left[\pi_{12}\frac{p_{12}}{p_2^Y}+\pi_{22}\frac{p_{22}}{p_2^Y}\right]p_2^Y$ $=\pi_{1,1}p_{1,1}+\pi_{1,2}p_{1,2}+\pi_{2,1}p_{2,1}+\pi_{2,2}p_{2,2}=E\pi.$
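
The law of iterated expectations can also be confirmed numerically. This Python sketch uses made-up numbers for Table 1 and a hypothetical profit rule. It treats $E(\pi|Y)$ as the random variable in Table 2 and averages it with the weights $p_j^Y$.

```python
from fractions import Fraction as F

# Made-up joint probabilities p_{i,j} for sales x in {100, 200} and discount y in {1, 2}.
joint = {(100, 1): F(2, 5), (200, 1): F(1, 10),
         (100, 2): F(1, 10), (200, 2): F(2, 5)}


def profit(x, y):
    """Hypothetical profit pi_{i,j}: a margin of 5 per unit minus the discount y."""
    return x * (5 - y)


def p_y(y):
    """Marginal p_j^Y: the sum of the column Y = y."""
    return sum(p for (_, yj), p in joint.items() if yj == y)


def cond_E(g, y):
    """E(g | Y = y) with conditional weights p_{ij} / p_j^Y."""
    return sum(g(x, yj) * p / p_y(y) for (x, yj), p in joint.items() if yj == y)


# Table 2: E(pi | Y) takes the value cond_E(profit, y) with probability p_y(y).
lhs = sum(cond_E(profit, y) * p_y(y) for y in (1, 2))
rhs = sum(profit(x, y) * p for (x, y), p in joint.items())  # equation (1)
print(lhs, rhs)  # 510 510
```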

Property 3. Generalized homogeneity. In the usual homogeneity $E(aX)=aEX$, $a$ is a number. In the generalized homogeneity

(4) $E(a(Y)\pi|Y)=a(Y)E(\pi|Y),$

$a(Y)$ is allowed to be a function of the variable on which we are conditioning. See for yourself: using (2), for instance,

$E(a(y_1)\pi|Y=y_1)=a(y_1)\pi_{11}\frac{p_{11}}{p_1^Y}+a(y_1)\pi_{21}\frac{p_{21}}{p_1^Y}$ $=a(y_1)\left[\pi_{11}\frac{p_{11}}{p_1^Y}+\pi_{21}\frac{p_{21}}{p_1^Y}\right]=a(y_1)E(\pi|Y=y_1).$
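
The same check can be run for both values of $Y$ in Python. The Table 1 numbers and the profit rule below are made up, and $a(y)=2y$ is an arbitrary hypothetical function of the conditioning variable.

```python
from fractions import Fraction as F

# Made-up joint probabilities p_{i,j} for sales x in {100, 200} and discount y in {1, 2}.
joint = {(100, 1): F(2, 5), (200, 1): F(1, 10),
         (100, 2): F(1, 10), (200, 2): F(2, 5)}


def profit(x, y):
    """Hypothetical profit pi_{i,j}: a margin of 5 per unit minus the discount y."""
    return x * (5 - y)


def cond_E(g, y):
    """E(g | Y = y) with conditional weights p_{ij} / p_j^Y."""
    py = sum(p for (_, yj), p in joint.items() if yj == y)
    return sum(g(x, yj) * p for (x, yj), p in joint.items() if yj == y) / py


def a(y):
    """Hypothetical deterministic function of the conditioning variable."""
    return 2 * y


for y in (1, 2):
    lhs = cond_E(lambda x, yj: a(yj) * profit(x, yj), y)  # E(a(Y) pi | Y = y)
    rhs = a(y) * cond_E(profit, y)                       # a(y) E(pi | Y = y)
    print(y, lhs, rhs)
```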

Property 4. Additivity. For any random variables $S,T$ we have

(5) $E(S+T|Y)=E(S|Y)+E(T|Y).$

The proof is left as an exercise.

Property 5. Generalized linearity. For any random variables $S,T$ and functions $a(Y),b(Y)$ equations (4) and (5) imply

$E(a(Y)S+b(Y)T|Y)=a(Y)E(S|Y)+b(Y)E(T|Y).$
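
As an illustration of generalized linearity (with additivity as the special case $a=b=1$), here is a Python sketch. Its choices are hypothetical: $S$ is the sales volume, $T$ is the profit, and $a(y)=y$, $b(y)=3-y$. The Table 1 numbers are made up.

```python
from fractions import Fraction as F

# Made-up joint probabilities p_{i,j} for sales x in {100, 200} and discount y in {1, 2}.
joint = {(100, 1): F(2, 5), (200, 1): F(1, 10),
         (100, 2): F(1, 10), (200, 2): F(2, 5)}


def cond_E(g, y):
    """E(g | Y = y) with conditional weights p_{ij} / p_j^Y."""
    py = sum(p for (_, yj), p in joint.items() if yj == y)
    return sum(g(x, yj) * p for (x, yj), p in joint.items() if yj == y) / py


def sales(x, y):
    """S: the sales volume X."""
    return x


def profit(x, y):
    """T: hypothetical profit, a margin of 5 per unit minus the discount y."""
    return x * (5 - y)


def a(y):
    return y      # hypothetical a(Y)


def b(y):
    return 3 - y  # hypothetical b(Y)


checks = []
for y in (1, 2):
    lhs = cond_E(lambda x, yj: a(yj) * sales(x, yj) + b(yj) * profit(x, yj), y)
    rhs = a(y) * cond_E(sales, y) + b(y) * cond_E(profit, y)
    checks.append((lhs, rhs))
    print(y, lhs, rhs)
```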

Property 6. Conditioning in case of independence. This property has to do with the informational aspect of conditioning. The usual expectation (1) takes into account all contingencies. Equations (2) and (3) are based on the assumption that one contingency for $Y$ has been realized, so that the other one becomes irrelevant. Therefore $E(\pi|Y)$ is considered an updated version of (1) that takes into account the arrival of new information that the value of $Y$ has been fixed. Now we can state the property itself: if $X,Y$ are independent, then $E(X|Y)=EX$, that is, conditioning on $Y$ does not improve our knowledge of $EX$.

Proof. In case of independence we have $p_{i,j}=p_i^Xp_j^Y$ for all $i,j$, so that

$E(X|Y=y_j)=x_1\frac{p_{1j}}{p_j^Y}+x_2\frac{p_{2j}}{p_j^Y}=x_1p_1^X+x_2p_2^X=EX.$
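
The proof can be mirrored numerically. In this Python sketch, the marginals are made up and the joint table is built as $p_{i,j}=p_i^Xp_j^Y$. The helper `cond_E_X` is my own name. It computes $E(X|Y=y_j)$ from the column $Y=y_j$.

```python
from fractions import Fraction as F

# Made-up marginals; independence means p_{ij} = p_i^X * p_j^Y.
px = {100: F(3, 5), 200: F(2, 5)}
py = {1: F(1, 4), 2: F(3, 4)}
joint = {(x, y): px[x] * py[y] for x in px for y in py}


def cond_E_X(y):
    """E(X | Y = y) with conditional weights p_{ij} / p_j^Y."""
    col = sum(p for (_, yj), p in joint.items() if yj == y)
    return sum(x * p for (x, yj), p in joint.items() if yj == y) / col


ex = sum(x * p for x, p in px.items())
print(ex, [cond_E_X(y) for y in py])  # the conditional means equal EX
```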

Property 7. Conditioning in case of complete dependence. Conditioning of $Y$ on $Y$ gives the most precise information: $E(Y|Y)=Y$ (if we condition $Y$ on $Y$, we know everything about it and there is no averaging). More generally, $E(f(Y)|Y)=f(Y)$ for any deterministic function $f$.

Proof. If we condition $Y$ on $Y$, the conditional probabilities become

$P(Y=y_1|Y=y_1)=1,\ P(Y=y_2|Y=y_1)=0.$

Hence, (2) with $\pi$ replaced by $f(Y)$ gives

$E(f(Y)|Y=y_1)=f(y_1)\times 1+f(y_2)\times 0=f(y_1).$

Conditioning on $Y=y_2$ is treated similarly.
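
A small Python sketch shows the same effect inside the two-variable table. It conditions on $Y$ a quantity that depends on $Y$ only, so the averaging within a column has nothing to average. The Table 1 numbers are made up and $f(y)=y^2+1$ is a hypothetical deterministic function.

```python
from fractions import Fraction as F

# Made-up joint probabilities p_{i,j} for sales x in {100, 200} and discount y in {1, 2}.
joint = {(100, 1): F(2, 5), (200, 1): F(1, 10),
         (100, 2): F(1, 10), (200, 2): F(2, 5)}


def cond_E(g, y):
    """E(g | Y = y) with conditional weights p_{ij} / p_j^Y."""
    py = sum(p for (_, yj), p in joint.items() if yj == y)
    return sum(g(x, yj) * p for (x, yj), p in joint.items() if yj == y) / py


def f(y):
    """Hypothetical deterministic function of Y."""
    return y ** 2 + 1


for y in (1, 2):
    print(y, cond_E(lambda x, yj: f(yj), y), f(y))  # E(f(Y) | Y = y) equals f(y)
```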

### Summary

Not many people know that using the notation $E_Y\pi$ for conditional expectation instead of $E(\pi|Y)$ makes everything much clearer. I rewrite the above properties using this notation:

1. Law of iterated expectations: $E(E_Y\pi)=E\pi$
2. Generalized homogeneity: $E_Y(a(Y)\pi)=a(Y)E_Y\pi$
3. Additivity: For any random variables $S,T$ we have $E_Y(S+T)=E_YS+E_YT$
4. Generalized linearity: For any random variables $S,T$ and functions $a(Y),b(Y)$ one has $E_Y(a(Y)S+b(Y)T)=a(Y)E_YS+b(Y)E_YT$
5. Conditioning in case of independence: if $X,Y$ are independent, then $E_YX=EX$
6. Conditioning in case of complete dependence: $E_Yf(Y)=f(Y)$ for any deterministic function $f$.