Dec 21

Analysis of problems with conditioning

These problems are among the most difficult. It's important to work out a general approach to such problems. All references are to J. Abdey,  Advanced statistics: distribution theory, ST2133, University of London, 2021.

General scheme

Step 1. Conditioning is usually suggested by the problem statement: Y is conditioned on X.

Your life will be easier if you follow the notation used in the guide: use p for probability mass functions (discrete variables) and f for (probability) density functions (continuous variables).

a) If Y|X and X both are discrete (Example 5.1, Example 5.13, Example 5.18):

p_{Y}\left( y\right) =\sum_{Set}p_{Y\vert X}\left( y\vert x\right) p_{X}\left(  x\right) .

b) If Y|X and X both are continuous (Activity 5.6):

f_{Y}\left( y\right) =\int_{Set}f_{Y\vert X}\left( y\vert x\right) f_{X}\left(  x\right) dx.

c) If Y|X is discrete, X is continuous (Example 5.2, Activity 5.5):

p_{Y}\left( y\right) =\int_{Set}p_{Y\vert X}\left( y\vert x\right) f_{X}\left(  x\right) dx

d) If Y|X is continuous, X is discrete (Activity 5.12):

f_{Y}\left( y\right) =\sum_{Set}f_{Y\vert X}\left( y\vert x\right) p_{X}\left(  x\right) .

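In all cases you need to determine the set Set over which to sum or integrate: it consists of the values of x for which both factors under the sum or integral are nonzero.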

Step 2. Write out the conditional densities/probabilities with the same arguments
as in your conditional equation.

Step 3. Reduce the result to one of the known distributions using the completeness axiom.

Example 5.1

Let X denote the number of hurricanes which form in a given year, and let Y denote the number of these which make landfall. Suppose each hurricane has a probability \pi of making landfall, independently of other hurricanes. Given the number of hurricanes x, Y can be thought of as the number of successes in x independent and identically distributed Bernoulli trials. We can write this as Y|X=x\sim Bin(x,\pi ). Suppose we also have that X\sim Pois(\lambda ). Find the distribution of Y (noting that X\geq Y ).


Step 1. The number of hurricanes X takes values 0,1,2,... and is distributed as Poisson. The number of landfalls for a given X=x is binomial with values y=0,...,x. It follows that Set=\{x:x\ge y\}.

Write the general formula from case a) (the law of total probability) with this set:

p_{Y}\left( y\right) =\sum_{x=y}^{\infty }p_{Y\vert X}\left( y\vert x\right)  p_{X}\left( x\right) .

Step 2. Specifying the distributions (from here on, \mu denotes the Poisson parameter called \lambda in the problem statement):

p_{X}\left( x\right) =e^{-\mu }\frac{\mu ^{x}}{x!}, where x=0,1,2,...,


P\left( Bin\left( x,\pi \right) =y\right) =p_{Y\vert X}\left( y\vert x\right)  =C_{x}^{y}\pi ^{y}\left( 1-\pi \right) ^{x-y} where y\leq x.

Step 3. Reduce the result to one of known distributions:

p_{Y}\left( y\right) =\sum_{x=y}^{\infty }C_{x}^{y}\pi ^{y}\left( 1-\pi  \right) ^{x-y}e^{-\mu }\frac{\mu ^{x}}{x!}

(pull out of the summation everything that does not depend on the summation variable)

=\frac{e^{-\mu }\mu ^{y}}{y!}\pi ^{y}\sum_{x=y}^{\infty }\frac{1}{\left(  x-y\right) !}\left( \mu \left( 1-\pi \right) \right) ^{x-y}

(replace x-y=z to better see the structure)

=\frac{e^{-\mu }\mu ^{y}}{y!}\pi ^{y}\sum_{z=0}^{\infty }\frac{1}{z!}\left(  \mu \left( 1-\pi \right) \right) ^{z}

(using the completeness axiom \sum_{x=0}^{\infty }\frac{\mu ^{x}}{x!}=e^{\mu } for the Poisson variable, with \mu \left( 1-\pi \right) in place of \mu )

=\frac{e^{-\mu }}{y!}\left( \mu \pi \right) ^{y}e^{\mu \left( 1-\pi \right)  }=\frac{e^{-\mu \pi }}{y!}\left( \mu \pi \right) ^{y}=p_{Pois(\mu \pi  )}\left( y\right) .
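
The result of Example 5.1 can be checked numerically. The sketch below truncates the infinite sum from Step 1 and compares it with the Pois(\mu \pi ) probabilities. The values of mu and pi, the truncation length and the function names are illustrative assumptions, not part of the guide.

```python
import math

def poisson_pmf(k, mu):
    # p_X(k) = e^{-mu} mu^k / k!
    return math.exp(-mu) * mu**k / math.factorial(k)

def binomial_pmf(y, x, pi):
    # p_{Y|X}(y|x) = C_x^y pi^y (1 - pi)^(x - y), defined for y <= x
    return math.comb(x, y) * pi**y * (1 - pi)**(x - y)

def marginal_y(y, mu, pi, terms=100):
    # Truncated version of sum_{x=y}^{infinity} p_{Y|X}(y|x) p_X(x)
    return sum(binomial_pmf(y, x, pi) * poisson_pmf(x, mu)
               for x in range(y, y + terms))

mu, pi = 3.0, 0.4  # assumed values, chosen only for illustration

# Pairs (truncated sum, Pois(mu * pi) probability) for y = 0, ..., 5
comparison = [(marginal_y(y, mu, pi), poisson_pmf(y, mu * pi)) for y in range(6)]
for y, (lhs, rhs) in enumerate(comparison):
    print(y, lhs, rhs)
```

The two printed columns should agree up to rounding error, since the discarded tail of the sum is negligible for these parameter values.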


Jul 16

Properties of conditional expectation


A company sells a product and may offer a discount. We denote by X the sales volume and by Y the discount amount (per unit). For simplicity, both variables take only two values. They depend on each other. If the sales are high, the discount may be larger. A higher discount, in its turn, may attract more buyers. At the same level of sales, the discount may vary depending on the vendor's costs. With the same discount, the sales vary with consumer preferences. Along with the sales and discount, we consider a third variable that depends on both of them. It can be the profit \pi.


The sales volume X takes values x_1,x_2 with probabilities p_i^X=P(X=x_i),\ i=1,2. Similarly, the discount Y takes values y_1,y_2 with probabilities p_i^Y=P(Y=y_i),\ i=1,2. The joint events have joint probabilities denoted P(X=x_i,Y=y_j)=p_{i,j}. The profit in the event X=x_i,Y=y_j is denoted \pi_{i,j}. This information is summarized in Table 1.

Table 1. Values and probabilities of the profit function
X \ Y       y_1                    y_2                    P(X=x_i)
x_1         \pi_{1,1},\ p_{1,1}    \pi_{1,2},\ p_{1,2}    p_1^X
x_2         \pi_{2,1},\ p_{2,1}    \pi_{2,2},\ p_{2,2}    p_2^X
P(Y=y_j)    p_1^Y                  p_2^Y

Comments. In the left-most column and upper-most row we have values of the sales and discount. In the "margins" (last row and last column) we put probabilities of those values. In the main body of the table we have profit values and their probabilities. It follows that the expected profit is

(1) E\pi=\pi_{1,1}p_{1,1}+\pi_{1,2}p_{1,2}+\pi_{2,1}p_{2,1}+\pi_{2,2}p_{2,2}.


Suppose that the vendor fixes the discount at y_1. Then only the column containing this value is relevant. To get numbers that satisfy the completeness axiom, we define conditional probabilities

P(X=x_1|Y=y_1)=\frac{p_{11}}{p_1^Y},\ P(X=x_2|Y=y_1)=\frac{p_{21}}{p_1^Y}.

This allows us to define conditional expectation

(2) E(\pi|Y=y_1)=\pi_{11}\frac{p_{11}}{p_1^Y}+\pi_{21}\frac{p_{21}}{p_1^Y}.

Similarly, if the discount is fixed at y_2,

(3) E(\pi|Y=y_2)=\pi_{12}\frac{p_{12}}{p_2^Y}+\pi_{22}\frac{p_{22}}{p_2^Y}.

Equations (2) and (3) are joined in the notation E(\pi|Y).
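
Equations (2) and (3) can be evaluated on a small numerical example. The profit values and joint probabilities below are hypothetical, chosen only to make the arithmetic easy to follow; the function names are also assumptions.

```python
# Hypothetical 2x2 example: rows index x_1, x_2; columns index y_1, y_2.
profit = [[10.0, 6.0], [20.0, 12.0]]  # pi_{i,j}
joint = [[0.1, 0.3], [0.4, 0.2]]      # p_{i,j}, summing to 1

def marginal_y(joint, j):
    # p_j^Y: sum of the joint probabilities in column j
    return sum(row[j] for row in joint)

def cond_expectation(values, joint, j):
    # E(V | Y = y_j): weight column j by the conditional probabilities p_{i,j} / p_j^Y
    p_y = marginal_y(joint, j)
    return sum(values[i][j] * joint[i][j] / p_y for i in range(len(joint)))

e_given_y1 = cond_expectation(profit, joint, 0)  # equation (2)
e_given_y2 = cond_expectation(profit, joint, 1)  # equation (3)
print(e_given_y1, e_given_y2)
```

With these numbers both marginal probabilities of Y equal 0.5, so (2) gives approximately 18 and (3) approximately 8.4.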

Property 1. While the usual expectation (1) is a number, the conditional expectation E(\pi|Y) is a function of the value of Y on which the conditioning is being done. Since it is a function of Y, it is natural to consider it a random variable defined by the next table

Table 2. Conditional expectation is a random variable
Values           Probabilities
E(\pi|Y=y_1)     p_1^Y
E(\pi|Y=y_2)     p_2^Y

Property 2. Law of iterated expectations: the mean of the conditional expectation equals the usual mean. Indeed, using Table 2, we have

E[E(\pi|Y)]=E(\pi|Y=y_1)p_1^Y+E(\pi|Y=y_2)p_2^Y (applying (2) and (3))

=\left[\pi_{11}\frac{p_{11}}{p_1^Y}+\pi_{21}\frac{p_{21}}{p_1^Y}\right]p_1^Y+\left[\pi_{12}\frac{p_{12}}{p_2^Y}+\pi_{22}\frac{p_{22}}{p_2^Y}\right]p_2^Y =\pi_{1,1}p_{1,1}+\pi_{1,2}p_{1,2}+\pi_{2,1}p_{2,1}+\pi_{2,2}p_{2,2}=E\pi.
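
The law of iterated expectations can also be verified numerically. The sketch below uses hypothetical profit values and joint probabilities (not taken from the text) and compares the mean of the random variable from Table 2 with the usual mean (1).

```python
# Hypothetical 2x2 example: rows index x_1, x_2; columns index y_1, y_2.
profit = [[10.0, 6.0], [20.0, 12.0]]  # pi_{i,j}
joint = [[0.1, 0.3], [0.4, 0.2]]      # p_{i,j}, summing to 1

# Usual expectation, equation (1)
expected_profit = sum(profit[i][j] * joint[i][j]
                      for i in range(2) for j in range(2))

# Marginal probabilities p_j^Y of the discount
p_y = [joint[0][j] + joint[1][j] for j in range(2)]

# Values of E(pi | Y = y_j), equations (2) and (3)
cond_values = [sum(profit[i][j] * joint[i][j] / p_y[j] for i in range(2))
               for j in range(2)]

# Mean of the random variable defined by Table 2
iterated = sum(value * prob for value, prob in zip(cond_values, p_y))

print(expected_profit, iterated)
```

Both printed numbers should coincide up to rounding error (approximately 13.2 for these inputs).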

Property 3. Generalized homogeneity. In the usual homogeneity E(aX)=aEX, a is a number. In the generalized homogeneity

(4) E(a(Y)\pi|Y)=a(Y)E(\pi|Y),

a(Y) is allowed to be a  function of the variable on which we are conditioning. See for yourself: using (2), for instance,

E(a(y_1)\pi|Y=y_1)=a(y_1)\pi_{11}\frac{p_{11}}{p_1^Y}+a(y_1)\pi_{21}\frac{p_{21}}{p_1^Y} =a(y_1)\left[\pi_{11}\frac{p_{11}}{p_1^Y}+\pi_{21}\frac{p_{21}}{p_1^Y}\right]=a(y_1)E(\pi|Y=y_1).
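
A numerical check of generalized homogeneity (4) is sketched below. The profit values, joint probabilities, discount values and the function a are all hypothetical and serve only as an illustration.

```python
# Hypothetical 2x2 example: rows index x_1, x_2; columns index y_1, y_2.
profit = [[10.0, 6.0], [20.0, 12.0]]  # pi_{i,j}
joint = [[0.1, 0.3], [0.4, 0.2]]      # p_{i,j}, summing to 1
y = [0.05, 0.10]                      # assumed discount values y_1, y_2

def a(y_value):
    # Assumed function of the conditioning variable
    return 1.0 - y_value

def cond_expectation(values, joint, j):
    # E(V | Y = y_j), where values[i][j] is V on the event X = x_i, Y = y_j
    p_y = sum(row[j] for row in joint)
    return sum(values[i][j] * joint[i][j] / p_y for i in range(len(joint)))

# The variable a(Y) * pi takes the value a(y_j) * pi_{i,j} on the event (i, j)
scaled = [[a(y[j]) * profit[i][j] for j in range(2)] for i in range(2)]

lhs = [cond_expectation(scaled, joint, j) for j in range(2)]             # E(a(Y) pi | Y = y_j)
rhs = [a(y[j]) * cond_expectation(profit, joint, j) for j in range(2)]   # a(y_j) E(pi | Y = y_j)
print(lhs, rhs)
```

The two lists should agree up to rounding error, because a(y_j) is constant within the column on which we condition.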

Property 4. Additivity. For any random variables S,T we have

(5) E(S+T|Y)=E(S|Y)+E(T|Y).

The proof is left as an exercise.

Property 5. Generalized linearity. For any random variables S,T and functions a(Y),b(Y) equations (4) and (5) imply

E(a(Y)S+b(Y)T|Y)=a(Y)E(S|Y)+b(Y)E(T|Y).

Property 6. Conditioning in case of independence. This property has to do with the informational aspect of conditioning. The usual expectation (1) takes into account all contingencies. (2) and (3) are based on the assumption that one contingency for Y has been realized, so that the other one becomes irrelevant. Therefore E(\pi|Y) is considered  an updated version of (1) that takes into account the arrival of new information that the value of Y has been fixed. Now we can state the property itself: if X,Y are independent, then E(X|Y)=EX, that is, conditioning on Y does not improve our knowledge of EX.

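Proof. In case of independence we have p_{i,j}=p_i^Xp_j^Y for all i,j, so that, taking X in place of \pi in (2) and (3),

E(X|Y=y_j)=x_1\frac{p_{1,j}}{p_j^Y}+x_2\frac{p_{2,j}}{p_j^Y}=x_1p_1^X+x_2p_2^X=EX for j=1,2.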

Property 7. Conditioning in case of complete dependence. Conditioning of Y on Y gives the most precise information: E(Y|Y)=Y (if we condition Y on Y, we know everything about it and there is no averaging). More generally, E(f(Y)|Y)=f(Y) for any deterministic function f.

Proof. If we condition Y on Y, the conditional probabilities become

P(Y=y_1|Y=y_1)=1,\ P(Y=y_2|Y=y_1)=0.

Hence (2), with f(Y) in place of \pi and these conditional probabilities, gives

E(f(Y)|Y=y_1)=f(y_1)\times 1+f(y_2)\times 0=f(y_1).

Conditioning on Y=y_2 is treated similarly.
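
Properties 6 and 7 can be illustrated numerically as well. In the sketch below the sales values, discount values, marginal probabilities and the function f are hypothetical. For Property 6 the joint probabilities are built as products of the marginals; for Property 7 the joint distribution of the pair (Y, Y) puts all mass on the diagonal.

```python
def cond_expectation(values, joint, j):
    # E(V | Y = y_j), where values[i][j] is V on the event indexed by (i, j)
    p_col = sum(row[j] for row in joint)
    return sum(values[i][j] * joint[i][j] / p_col for i in range(len(joint)))

x = [100.0, 250.0]  # assumed sales values x_1, x_2
p_x = [0.3, 0.7]    # assumed marginal probabilities of X
y = [0.05, 0.10]    # assumed discount values y_1, y_2
p_y = [0.6, 0.4]    # assumed marginal probabilities of Y

# Property 6: under independence p_{i,j} = p_i^X * p_j^Y
joint_indep = [[p_x[i] * p_y[j] for j in range(2)] for i in range(2)]
x_values = [[x[i] for j in range(2)] for i in range(2)]  # X equals x_i on event (i, j)
mean_x = sum(x[i] * p_x[i] for i in range(2))
cond_x = [cond_expectation(x_values, joint_indep, j) for j in range(2)]
print(mean_x, cond_x)  # each conditional mean should match the usual mean

# Property 7: conditioning Y on Y, so only the events with i == j have positive probability
def f(t):
    # Assumed deterministic function
    return t ** 2

joint_yy = [[p_y[j] if i == j else 0.0 for j in range(2)] for i in range(2)]
f_values = [[f(y[i]) for j in range(2)] for i in range(2)]
cond_f = [cond_expectation(f_values, joint_yy, j) for j in range(2)]
print(cond_f, [f(value) for value in y])  # the two lists should match
```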


Not many people know that using the notation E_Y\pi for conditional expectation instead of E(\pi|Y) makes everything much clearer. I rewrite the above properties using this notation:

  1. Law of iterated expectations: E(E_Y\pi)=E\pi
  2. Generalized homogeneity: E_Y(a(Y)\pi)=a(Y)E_Y\pi
  3. Additivity: For any random variables S,T we have E_Y(S+T)=E_YS+E_YT
  4. Generalized linearity: For any random variables S,T and functions a(Y),b(Y) one has E_Y(a(Y)S+b(Y)T)=a(Y)E_YS+b(Y)E_YT
  5. Conditioning in case of independence: if X,Y are independent, then E_YX=EX
  6. Conditioning in case of complete dependence: E_Yf(Y)=f(Y) for any deterministic function f.