Oct 17

Conditional-mean-plus-remainder representation

Conditional-mean-plus-remainder representation: we separate the main part from the remainder and derive the properties of the remainder. My post on properties of conditional expectation is an elementary introduction to conditioning. This is my first post in Quantitative Finance.

A brush-up on conditional expectations

  1. Notation. Let X be a random variable and let I be an information set. Instead of the usual notation E(X|I) for conditional expectation, in large expressions it's better to use the notation with I in the subscript: E_IX=E(X|I).

  2. Generalized homogeneity. If f(I) depends only on information I, then E_I(f(I)X)=f(I)E_I(X) (a function of known information is known and behaves like a constant). A special case is E_I(f(I))=f(I)E_I(1)=f(I). With f(I)=E_I(X) we get E_I(E_I(X))=E_I(X). This shows that conditioning is a projector: if you project a point in a 3D space onto a 2D plane and then project the image of the point onto the same plane, the result is the same image as after a single projection.

  3. Additivity. E_I(X+Y)=E_IX+E_IY.

  4. Law of iterated expectations (LIE). If two information sets satisfy I_1\subset I_2, then E_{I_1}E_{I_2}X=E_{I_1}X. I like the geometric explanation in terms of projectors. Projecting a point onto a plane and then projecting the result onto a straight line in that plane is the same as projecting the point directly onto the straight line.
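The properties above can be checked exactly on a small finite example. The sketch below is only an illustration under assumptions that are not in the text: the sample space has 8 equally likely outcomes, an information set is modeled as a partition of the outcomes, and the helper name `cond_exp` and the numbers are made up. Conditional expectation then reduces to averaging within each cell of the partition.

```python
from fractions import Fraction

# Hypothetical sample space: 8 equally likely outcomes, indexed 0..7.
X = [Fraction(v) for v in (3, 1, 4, 1, 5, 9, 2, 6)]

# An information set is modeled as a partition: labels[k] is the cell of outcome k.
I2 = [0, 0, 1, 1, 2, 2, 3, 3]  # finer information
I1 = [0, 0, 0, 0, 1, 1, 1, 1]  # coarser information (each I1 cell is a union of I2 cells)


def cond_exp(values, labels):
    """E_I(values) as a random variable: each outcome is mapped to its cell average."""
    cells = {}
    for v, c in zip(values, labels):
        cells.setdefault(c, []).append(v)
    means = {c: sum(vs) / len(vs) for c, vs in cells.items()}
    return [means[c] for c in labels]


E2 = cond_exp(X, I2)
E1 = cond_exp(X, I1)

# Generalized homogeneity: f(I2) is constant on each cell of I2.
f = [Fraction(c + 1) for c in I2]
fX = [fi * x for fi, x in zip(f, X)]
homogeneity_holds = cond_exp(fX, I2) == [fi * e for fi, e in zip(f, E2)]

# Projector property: conditioning twice on the same information changes nothing.
projector_holds = cond_exp(E2, I2) == E2

# Law of iterated expectations: E_{I1} E_{I2} X = E_{I1} X.
lie_holds = cond_exp(E2, I1) == E1

print(homogeneity_holds, projector_holds, lie_holds)
```

Exact fractions are used instead of floats so that the identities can be compared with `==` rather than up to rounding error.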

Conditional-mean-plus-remainder representation

This is a direct generalization of the mean-plus-deviation-from-mean decomposition. There we wrote X=EX+(X-EX) and denoted \mu=EX,~\varepsilon=X-EX to obtain X=\mu+\varepsilon with the property E\varepsilon=0.

Here we write X=E_IX+(X-E_IX) and denote the remainder by \varepsilon=X-E_IX. Then the representation is

(1) X=E_IX+\varepsilon.

Properties. 1) E_I\varepsilon=E_IX-E_IE_IX=E_IX-E_IX=0 by the projector property (remember, this is a random variable identically equal to zero, not the number zero).

2) Conditional covariance is obtained from the usual covariance by replacing all usual expectations by conditional ones. Thus, by definition,

Cov_I(X,Y)=E_I[(X-E_IX)(Y-E_IY)].

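For the components in (1) we have

Cov_I(E_IX,\varepsilon)=E_I[(E_IX-E_IE_IX)(\varepsilon-E_I\varepsilon)]=E_I[(E_IX-E_IX)\varepsilon]=0.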

3) Var_I(\varepsilon)=E_I(\varepsilon-E_I\varepsilon)^{2}=E_I(X-E_IX)^2=Var_I(X).
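The three properties of the remainder can be verified on a toy example. This is a sketch under assumptions of my own choosing, not part of the derivation: 8 equally likely outcomes, the information set I given as a partition, and hypothetical helpers `cond_exp` and `cond_cov`.

```python
from fractions import Fraction

# Hypothetical data: 8 equally likely outcomes; I is a partition given by cell labels.
X = [Fraction(v) for v in (3, 1, 4, 1, 5, 9, 2, 6)]
I = [0, 0, 1, 1, 2, 2, 3, 3]


def cond_exp(values, labels):
    """E_I(values): each outcome is mapped to the average over its cell."""
    cells = {}
    for v, c in zip(values, labels):
        cells.setdefault(c, []).append(v)
    means = {c: sum(vs) / len(vs) for c, vs in cells.items()}
    return [means[c] for c in labels]


def cond_cov(u, v, labels):
    """Cov_I(u, v) = E_I[(u - E_I u)(v - E_I v)]; cond_cov(u, u, .) is Var_I(u)."""
    eu, ev = cond_exp(u, labels), cond_exp(v, labels)
    products = [(a - ma) * (b - mb) for a, ma, b, mb in zip(u, eu, v, ev)]
    return cond_exp(products, labels)


main = cond_exp(X, I)                    # E_I X, the main part
eps = [x - m for x, m in zip(X, main)]   # the remainder in (1)
zero = [Fraction(0)] * len(X)            # the random variable identically equal to zero

mean_eps = cond_exp(eps, I)              # property 1: equals zero
cov_main_eps = cond_cov(main, eps, I)    # property 2: equals zero
var_eps = cond_cov(eps, eps, I)          # property 3: equals Var_I(X)
var_X = cond_cov(X, X, I)

print(mean_eps == zero, cov_main_eps == zero, var_eps == var_X)
```

Note that `zero` is a list with one entry per outcome, which mirrors the remark that E_I\varepsilon is a random variable and not a number.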

Jan 17

Conditional variance properties


Review Properties of conditional expectation, especially the summary, where I introduce a new notation for conditional expectation. Everywhere I use the notation E_Y\pi for expectation of \pi conditional on Y, instead of E(\pi|Y).

This post and the previous one on conditional expectation show that conditioning is a pretty advanced notion. Many introductory books use the condition E_xu=0 (the expected value of the error term u conditional on the regressor x is zero). Because of the complexity of conditioning, I think it's better to avoid this kind of assumption as much as possible.

Conditional variance properties

Replacing usual expectations by their conditional counterparts in the definition of variance, we obtain the definition of conditional variance:

(1) Var_Y(X)=E_Y(X-E_YX)^2.

Property 1. If X,Y are independent, then X-EX and Y are also independent and conditioning doesn't change variance:

Var_Y(X)=E_Y(X-E_YX)^2=E_Y(X-EX)^2=E(X-EX)^2=Var(X),

see Conditioning in case of independence.

Property 2. Generalized homogeneity of degree 2: if a is a deterministic function, then a^2(Y) can be pulled out:

Var_Y(a(Y)X)=E_Y[a(Y)X-E_Y(a(Y)X)]^2=E_Y[a^2(Y)(X-E_YX)^2]=a^2(Y)Var_Y(X).

Property 3. Shortcut for conditional variance:

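(2) Var_Y(X)=E_Y(X^2)-(E_YX)^2.

Proof.

Var_Y(X)=E_Y(X-E_YX)^2=E_Y[X^2-2XE_YX+(E_YX)^2]

(distributing conditional expectation)

=E_Y(X^2)-2E_Y(XE_YX)+E_Y(E_YX)^2

(applying Properties 2 and 6 from this Summary with a(Y)=E_YX)

=E_Y(X^2)-2(E_YX)^2+(E_YX)^2=E_Y(X^2)-(E_YX)^2.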
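Properties 1 to 3 can be checked exactly on a small product space. The setup below is an assumption made for illustration only: all pairs (x, y) are equally likely, which makes X and Y independent, and the helpers `cond_exp`, `cond_var` and `mean` are hypothetical names. The variable Z is deliberately built to depend on Y, so that the shortcut (2) is tested in a case where conditioning matters.

```python
from fractions import Fraction
from itertools import product

# Hypothetical product space: all (x, y) pairs equally likely, so X and Y are independent.
outcomes = list(product((1, 2, 6), (0, 1)))
X = [Fraction(x) for x, _ in outcomes]
Y = [y for _, y in outcomes]


def cond_exp(values, labels):
    """E_Y(values): each outcome is mapped to the average over outcomes with the same label."""
    cells = {}
    for v, c in zip(values, labels):
        cells.setdefault(c, []).append(v)
    means = {c: sum(vs) / len(vs) for c, vs in cells.items()}
    return [means[c] for c in labels]


def cond_var(values, labels):
    """Definition (1): Var_Y(X) = E_Y (X - E_Y X)^2."""
    m = cond_exp(values, labels)
    return cond_exp([(v - mv) ** 2 for v, mv in zip(values, m)], labels)


def mean(values):
    """Usual (unconditional) expectation under equal probabilities."""
    return sum(values) / len(values)


# Property 1: under independence, Var_Y(X) equals Var(X) for every outcome.
var_X = mean([(x - mean(X)) ** 2 for x in X])
prop1_holds = cond_var(X, Y) == [var_X] * len(X)

# Property 2 with a(Y) = Y + 2: a^2(Y) can be pulled out.
a = [Fraction(y + 2) for y in Y]
aX = [ai * x for ai, x in zip(a, X)]
prop2_holds = cond_var(aX, Y) == [ai ** 2 * v for ai, v in zip(a, cond_var(X, Y))]

# Property 3 (shortcut) for Z = X(Y + 1), which is not independent of Y.
Z = [x * (y + 1) for x, y in zip(X, Y)]
second_moment = cond_exp([z * z for z in Z], Y)
shortcut = [m2 - m ** 2 for m2, m in zip(second_moment, cond_exp(Z, Y))]
prop3_holds = cond_var(Z, Y) == shortcut

print(prop1_holds, prop2_holds, prop3_holds)
```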


Property 4. The law of total variance:

(3) Var(X)=Var(E_YX)+E[Var_Y(X)].

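Proof. By the shortcut for usual variance and the law of iterated expectations

Var(X)=E(X^2)-(EX)^2=E[E_Y(X^2)]-(E[E_YX])^2

(replacing E_Y(X^2) from (2))

=E[Var_Y(X)+(E_YX)^2]-(E[E_YX])^2=E[Var_Y(X)]+E(E_YX)^2-(E[E_YX])^2

(the last two terms give the shortcut for variance of E_YX)

=E[Var_Y(X)]+Var(E_YX).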
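A quick exact check of the law of total variance (3) on made-up data may help. The assumptions here are mine: 8 equally likely outcomes, Y given by labels with cells of unequal size, and hypothetical helper names `mean`, `var`, `cond_exp` and `cond_var`.

```python
from fractions import Fraction

# Hypothetical data: 8 equally likely outcomes; Y splits them into cells of sizes 3 and 5.
X = [Fraction(v) for v in (3, 1, 4, 1, 5, 9, 2, 6)]
Y = [0, 0, 0, 1, 1, 1, 1, 1]


def mean(values):
    """Usual expectation under equal probabilities."""
    return sum(values) / len(values)


def var(values):
    """Usual variance E(X - EX)^2."""
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])


def cond_exp(values, labels):
    """E_Y(values): each outcome is mapped to the average over its cell."""
    cells = {}
    for v, c in zip(values, labels):
        cells.setdefault(c, []).append(v)
    means = {c: sum(vs) / len(vs) for c, vs in cells.items()}
    return [means[c] for c in labels]


def cond_var(values, labels):
    """Var_Y(X) = E_Y (X - E_Y X)^2."""
    m = cond_exp(values, labels)
    return cond_exp([(v - mv) ** 2 for v, mv in zip(values, m)], labels)


total = var(X)                   # Var(X)
between = var(cond_exp(X, Y))    # Var(E_Y X)
within = mean(cond_var(X, Y))    # E[Var_Y(X)]

print(total, between, within, total == between + within)
```

The two terms on the right of (3) are often read as the variance between cells and the average variance within cells, which is why the variables are named `between` and `within`.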


Before we move further we need to define conditional covariance by

Cov_Y(S,T) = E_Y(S - E_YS)(T - E_YT)

(everywhere usual expectations are replaced by conditional ones). We say that random variables S,T are conditionally uncorrelated if Cov_Y(S,T) = 0.

Property 5. Conditional variance of a linear combination. For any random variables S,T and functions a(Y),b(Y) one has

Var_Y(a(Y)S + b(Y)T)=a^2(Y)Var_Y(S)+2a(Y)b(Y)Cov_Y(S,T)+b^2(Y)Var_Y(T).

The proof is quite similar to that in the case of usual variances, so we leave it to the reader. In particular, if S,T are conditionally uncorrelated, then the interaction term disappears:

Var_Y(a(Y)S + b(Y)T)=a^2(Y)Var_Y(S)+b^2(Y)Var_Y(T).
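For readers who want to see the steps behind Property 5, here is one possible derivation. It relies only on additivity, generalized homogeneity and the definitions of conditional variance and covariance given above. The symbol U is shorthand introduced here and is not used elsewhere in the post.

```latex
\begin{align*}
U &= a(Y)S + b(Y)T, \\
E_Y U &= a(Y)E_Y S + b(Y)E_Y T
  && \text{(additivity, generalized homogeneity)} \\
U - E_Y U &= a(Y)(S - E_Y S) + b(Y)(T - E_Y T), \\
Var_Y(U) &= E_Y (U - E_Y U)^2 \\
  &= a^2(Y)E_Y(S - E_Y S)^2
     + 2a(Y)b(Y)E_Y\left[(S - E_Y S)(T - E_Y T)\right]
     + b^2(Y)E_Y(T - E_Y T)^2 \\
  &= a^2(Y)Var_Y(S) + 2a(Y)b(Y)Cov_Y(S,T) + b^2(Y)Var_Y(T).
\end{align*}
```

The only difference from the unconditional case is that a(Y) and b(Y) are pulled out of E_Y as known functions rather than as constants.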