24 Oct 22

## A problem to do once and never come back

There is a problem I gave on the midterm that does not require much imagination: just know the definitions and do the technical work. I was hoping we could put this behind us. It turned out we could not, and thus you see this post.

Problem. Suppose the joint density of variables $X,Y$ is given by

$f_{X,Y}(x,y)=\left\{ \begin{array}{cl}k\left( e^{x}+e^{y}\right) & \text{for }0<y<x<1,\\ 0 & \text{otherwise.}\end{array}\right.$

I. Find $k$.

II. Find marginal densities of $X,Y$. Are $X,Y$ independent?

III. Find conditional densities $f_{X|Y},\ f_{Y|X}$.

IV. Find $EX,\ EY$.

When solving a problem like this, the first thing to do is to state the theory. You may not be able to finish the long calculations without errors, but your grade will be determined mainly by the opening theoretical remarks.

### I. Finding the normalizing constant

Any density must satisfy the completeness axiom: the area under the density curve (or, in this case, the volume under the density surface) must equal one: $\int \int f_{X,Y}(x,y)dxdy=1.$ The constant $k$ chosen to satisfy this condition is called a normalizing constant. In general the integration is over the whole plane $R^{2}$, and the first task is to express the above integral as an iterated integral. This is where the domain on which the density is nonzero must be taken into account. There is little you can do without geometry. One example of how to do this is here.

The shape of the area $A=\left\{ (x,y):0<y<x<1\right\}$ is determined by a) the extreme values of $x,y$ and b) the relationship between them. The extreme values are 0 and 1 for both $x$ and $y$, meaning that $A$ is contained in the square $\left\{ (x,y):0<x<1,\ 0<y<1\right\}.$ The inequality $y<x$ means that out of this square we keep only the triangle below the line $y=x$ (it is really the lower triangle because if from a point on the line $y=x$ we move down vertically, $x$ stays the same and $y$ becomes smaller than $x$).

In the iterated integral:

a) the lower and upper limits of integration for the inner integral are the boundaries for the inner variable; they may depend on the outer variable but not on the inner variable.

b) the lower and upper limits of integration for the outer integral are the extreme values for the outer variable; they must be constant.

This is illustrated in Pane A of Figure 1.

Figure 1. Integration order

Always take the inner integral in parentheses to show that you are dealing with an iterated integral.

a) In the inner integral integrating over $x$ means moving along blue arrows from the boundary $x=y$ to the boundary $x=1.$ The boundaries may depend on $y$ but not on $x$ because the outer integral is over $y.$

b) In the outer integral put the extreme values for the outer variable. Thus,

$\underset{A}{\int \int }f_{X,Y}(x,y)dxdy=\int_{0}^{1}\left(\int_{y}^{1}f_{X,Y}(x,y)dx\right) dy.$

Check that if we first integrate over $y$ (vertically along red arrows, see Pane B in Figure 1) then the equation

$\underset{A}{\int \int }f_{X,Y}(x,y)dxdy=\int_{0}^{1}\left(\int_{0}^{x}f_{X,Y}(x,y)dy\right) dx$

results.

In fact, from the definition $A=\left\{ (x,y):0<y<x<1\right\}$ one can see that the inner interval for $x$ is $\left[ y,1\right]$ and for $y$ it is $\left[ 0,x\right] .$
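The two integration orders are easy to check numerically. A minimal sketch, assuming SciPy is available; the integrand is the one above with $k$ temporarily set to 1:

```python
# Check that both iterated-integral orders over the triangle
# A = {(x, y): 0 < y < x < 1} give the same value for e^x + e^y.
from math import e, exp

from scipy.integrate import quad

def f(x, y):
    return exp(x) + exp(y)  # the joint density with k = 1

# Order 1: outer over y in [0, 1], inner over x in [y, 1]
order1, _ = quad(lambda y: quad(lambda x: f(x, y), y, 1)[0], 0, 1)

# Order 2: outer over x in [0, 1], inner over y in [0, x]
order2, _ = quad(lambda x: quad(lambda y: f(x, y), 0, x)[0], 0, 1)

print(order1, order2)  # both equal e - 1
```

Both orders return $e-1\approx 1.71828$, which already gives the normalizing constant for Part I.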

### II. Marginal densities

The condition for independence of $X,Y$ is $f_{X,Y}\left( x,y\right) =f_{X}\left( x\right) f_{Y}\left( y\right)$ (this is a direct analog of the independence condition for events $P\left( A\cap B\right) =P\left( A\right) P\left( B\right)$). In words: the joint density decomposes into a product of individual densities.

### III. Conditional densities

In this case the easiest is to recall the definition of conditional probability $P\left( A|B\right) =\frac{P\left( A\cap B\right) }{P\left(B\right) }.$ The definition of conditional densities $f_{X|Y},\ f_{Y|X}$ is quite similar:

(2) $f_{X|Y}\left( x|y\right) =\frac{f_{X,Y}\left( x,y\right) }{f_{Y}\left( y\right) },\ f_{Y|X}\left( y|x\right) =\frac{f_{X,Y}\left( x,y\right) }{f_{X}\left( x\right) }$.

Of course, $f_{Y}\left( y\right)$ and $f_{X}\left( x\right)$ here are the marginal densities obtained by integrating the joint density.
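A quick sanity check for formulas (2): for each fixed value of the condition, a conditional density must integrate to one. A sketch (assuming SciPy) with a toy uniform joint density on the triangle $0<y<x<1$, not the density from the problem:

```python
# For fixed y, integrate f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y) over x;
# the result must be 1 no matter which y we condition on.
from scipy.integrate import quad

def check_conditional(joint, marginal_y, y, x_lo, x_hi):
    total, _ = quad(lambda x: joint(x, y) / marginal_y(y), x_lo, x_hi)
    return total

# toy joint on the triangle 0 < y < x < 1: uniform, so f = 2 there
joint = lambda x, y: 2.0
marg_y = lambda y: 2.0 * (1 - y)   # integral of 2 over x in [y, 1]
print(check_conditional(joint, marg_y, 0.3, 0.3, 1.0))  # 1.0
```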

### IV. Finding expected values of $X,Y$

The usual definition $EX=\int xf_{X}\left( x\right) dx$ takes an equivalent form using the marginal density:

$EX=\int x\left( \int f_{X,Y}\left( x,y\right) dy\right) dx=\int \int xf_{X,Y}\left( x,y\right) dydx.$

Which equation to use is a matter of convenience.

Another replacement in the usual definition gives the definition of conditional expectations:

$E\left( X|Y\right) =\int xf_{X|Y}\left( x|y\right) dx,$ $E\left( Y|X\right) =\int yf_{Y|X}\left( y|x\right) dy.$

Note that these are random variables: $E\left( X|Y=y\right)$ depends on $y$ and $E\left( Y|X=x\right)$ depends on $x.$

### Solution to the problem

Being a lazy guy, for the problem this post is about I provide the answers found with Mathematica:

I. $k=\frac{1}{e-1}\approx 0.581977$

II. $f_{X}\left( x\right) =k\left( -1+e^{x}\left( 1+x\right) \right)$ for $x\in \left[ 0,1\right],$ $f_{Y}\left( y\right) =k\left( e-e^{y}y\right)$ for $y\in \left[ 0,1\right] .$

It is readily seen that the independence condition is not satisfied.

III. $f_{X|Y}\left( x|y\right) =\frac{e^{x}+e^{y}}{e-e^{y}y}$ for $0<y<x<1$ (the constant $k$ cancels),

$f_{Y|X}\left( y|x\right) =\frac{e^{x}+e^{y}}{-1+e^{x}\left( 1+x\right) }$ for $0<y<x<1.$

IV. $EX=0.709012,$ $EY=0.372965.$
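These answers are easy to verify numerically. A sketch assuming SciPy; the marginals are computed directly from the joint density rather than from the closed-form expressions, so this also cross-checks Part II:

```python
# Verify k, the marginals, the expected values, and the failure of
# independence for f(x, y) = k(e^x + e^y) on 0 < y < x < 1.
from math import e, exp

from scipy.integrate import quad

k = 1 / (e - 1)  # normalizing constant from Part I

def f_X(x):   # marginal of X: integrate the joint over y in [0, x]
    return quad(lambda y: k * (exp(x) + exp(y)), 0, x)[0]

def f_Y(y):   # marginal of Y: integrate the joint over x in [y, 1]
    return quad(lambda x: k * (exp(x) + exp(y)), y, 1)[0]

EX = quad(lambda x: x * f_X(x), 0, 1)[0]
EY = quad(lambda y: y * f_Y(y), 0, 1)[0]
print(round(EX, 6), round(EY, 6))   # 0.709012 0.372965

# independence fails: joint != product of marginals at, e.g., (0.8, 0.2)
x0, y0 = 0.8, 0.2
print(abs(k * (exp(x0) + exp(y0)) - f_X(x0) * f_Y(y0)) > 1e-3)  # True
```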

13 Apr 19

## Checklist for Quantitative Finance FN3142

Students of FN3142 often think that they can get by with a few technical tricks. The questions below are mostly about the intuition that helps to understand and apply those tricks.

Everywhere we assume that $...,Y_{t-1},Y_t,Y_{t+1},...$ is a time series and $...,I_{t-1},I_t,I_{t+1},...$ is a sequence of corresponding information sets. It is natural to assume that $I_t\subset I_{t+1}$ for all $t.$ We use the short conditional expectation notation: $E_tX=E(X|I_t)$.

### Questions

Question 1. How do you calculate conditional expectation in practice?

Question 2. How do you explain $E_t(E_tX)=E_tX$?

Question 3. Simplify each of $E_tE_{t+1}X$ and $E_{t+1}E_tX$ and explain intuitively.

Question 4. $\varepsilon _t$ is a shock at time $t$. Positive and negative shocks are equally likely. What is your best prediction now for tomorrow's shock? What is your best prediction now for the shock that will happen the day after tomorrow?

Question 5. How and why do you predict $Y_{t+1}$ at time $t$? What is the conditional mean of your prediction?

Question 6. What is the error of such a prediction? What is its conditional mean?

Question 7. Answer the previous two questions replacing $Y_{t+1}$ by $Y_{t+p}$.

Question 8. What is the mean-plus-deviation-from-mean representation (conditional version)?

Question 9. How is the representation from Q.8 reflected in variance decomposition?

Question 10. What is a canonical form? State and prove all properties of its parts.

Question 11. Define conditional variance for white noise process and establish its link with the unconditional one.

Question 12. How do you define the conditional density in case of two variables, when one of them serves as the condition? Use it to prove the LIE.

Question 13. Write down the joint distribution function for a) independent observations and b) for serially dependent observations.

Question 14. If one variable is a linear function of another, what is the relationship between their densities?

Question 15. What can you say about the relationship between $a,b$ if $f(a)=f(b)$? Explain geometrically the definition of the quasi-inverse function.

Answer 1. Conditional expectation is a complex notion. There are several definitions of differing levels of generality and complexity. See one of them here and another in Answer 12.

The point of this exercise is that any definition requires a lot of information, and in practice there is no way to apply any of them to actually calculate a conditional expectation. Then why does the theory juggle conditional expectations? The efficient market hypothesis comes to the rescue: it is posited that all observed market data incorporate all available information and, in particular, stock prices are already conditioned on $I_t.$

Answers 2 and 3. This is the best explanation I have.

Answer 4. Since positive and negative shocks are equally likely, the best prediction is $E_t\varepsilon _{t+1}=0$ (I call this equation a martingale condition). Similarly, $E_t\varepsilon _{t+2}=0$ but in this case I prefer to see an application of the LIE: $E_{t}\varepsilon _{t+2}=E_t(E_{t+1}\varepsilon _{t+2})=E_t0=0.$

Answer 5. The best prediction is $\hat{Y}_{t+1}=E_tY_{t+1}$ because it minimizes $E_t(Y_{t+1}-f(I_t))^2$ among all functions $f$ of current information $I_t.$ Formally, you can use the first order condition

$\frac{d}{df(I_t)}E_t(Y_{t+1}-f(I_t))^2=-2E_t(Y_{t+1}-f(I_t))=0$

to find that $f(I_t)=E_tf(I_t)=E_tY_{t+1}$ is the minimizing function. By the projector property
$E_t\hat{Y}_{t+1}=E_tE_tY_{t+1}=E_tY_{t+1}=\hat{Y}_{t+1}.$
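The minimization property shows up clearly in a simulation. A sketch assuming NumPy; the model $Y_{t+1}=0.5Y_t+\varepsilon_{t+1}$ and the competing coefficient $0.8$ are illustrative choices, not from the guide:

```python
# The conditional mean E_t Y_{t+1} = 0.5 * Y_t has smaller mean squared
# prediction error than any other predictor built from I_t.
import numpy as np

rng = np.random.default_rng(0)
y_t = rng.normal(size=100_000)
y_next = 0.5 * y_t + rng.normal(size=y_t.size)   # Y_{t+1} = 0.5 Y_t + eps

mse_best = np.mean((y_next - 0.5 * y_t) ** 2)    # conditional-mean predictor
mse_other = np.mean((y_next - 0.8 * y_t) ** 2)   # some other linear predictor
print(mse_best < mse_other)                      # True
```

Here `mse_best` is close to the shock variance (one, in this toy setup), which is the irreducible prediction error of Answer 6.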

Answer 6. It is natural to define the prediction error by

$\hat{\varepsilon}_{t+1}=Y_{t+1}-\hat{Y}_{t+1}=Y_{t+1}-E_tY_{t+1}.$

By the projector property $E_t\hat{\varepsilon}_{t+1}=E_tY_{t+1}-E_tY_{t+1}=0$.

Answer 7. To generalize, just change the subscripts. For the prediction we have to use two subscripts: the notation $\hat{Y}_{t,t+p}$ means that we are trying to predict what happens at a future date $t+p$ based on info set $I_t$ (time $t$ is like today). Then by definition $\hat{Y} _{t,t+p}=E_tY_{t+p},$ $\hat{\varepsilon}_{t,t+p}=Y_{t+p}-E_tY_{t+p}.$

Answer 8. Answer 7, obviously, implies $Y_{t+p}=\hat{Y}_{t,t+p}+\hat{\varepsilon}_{t,t+p}.$ The simple case is here.

Answer 9. See the law of total variance and change it to reflect conditioning on $I_t.$

Answer 11. Combine conditional variance definition with white noise definition.

Answer 12. The conditional density is defined similarly to the conditional probability. Let $X,Y$ be two random variables. Denote by $p_X$ the density of $X$ and by $p_{X,Y}$ the joint density of the pair. Then the conditional density of $Y$ conditional on $X$ is defined as $p_{Y|X}(y|x)=\frac{p_{X,Y}(x,y)}{p_X(x)}.$ After this we can define the conditional expectation $E(Y|X)=\int yp_{Y|X}(y|x)dy.$ With these definitions one can prove the Law of Iterated Expectations:

$E[E(Y|X)]=\int E(Y|x)p_X(x)dx=\int \left( \int yp_{Y|X}(y|x)dy\right) p_X(x)dx$

$=\int \int y\frac{p_{X,Y}(x,y)}{p_X(x)}p_X(x)dxdy=\int \int yp_{X,Y}(x,y)dxdy=EY.$
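The LIE is easy to confirm on simulated data. A sketch assuming NumPy, with a toy discrete $X$ of my choosing; the group means estimate $E(Y|X=k)$, and their weighted average recovers $EY$:

```python
# Discrete check of E[E(Y|X)] = EY: condition on each value of X,
# average within groups, then average the group means with weights P(X = k).
import numpy as np

rng = np.random.default_rng(1)
x = rng.integers(0, 3, size=200_000)          # X takes values 0, 1, 2
y = x + rng.normal(size=x.size)               # Y depends on X

cond_mean = np.array([y[x == k].mean() for k in range(3)])   # E(Y|X=k)
weights = np.array([(x == k).mean() for k in range(3)])      # P(X=k)
print(np.isclose(weights @ cond_mean, y.mean()))             # True
```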

This is an illustration to Answer 1 and a prelim to Answer 13.

Answer 13. Understanding this answer is essential for Section 8.6 on maximum likelihood of Patton's guide.

a) In case of independent observations $X_1,...,X_n$ the joint density of the vector $X=(X_1,...,X_n)$ is a product of individual densities:

$p_X(x_1,...,x_n)=p_{X_1}(x_1)...p_{X_n}(x_n).$

b) In the time series context it is natural to assume that the next observation depends on the previous ones, that is, for each $t,$ $X_t$ depends on $X_1,...,X_{t-1}$ (serially dependent observations). Therefore we should work with conditional densities $p_{X_1,...,X_t|X_1,...,X_{t-1}}.$ From Answer 12 we can guess how to make conditional densities appear:

$p_{X_1,...,X_n}(x_1,...,x_n)= \frac{p_{X_1,...,X_n}(x_1,...,x_n)}{p_{X_1,...,X_{n-1}}(x_1,...,x_{n-1})} \frac{p_{X_1,...,X_{n-1}}(x_1,...,x_{n-1})}{p_{X_1,...,X_{n-2}}(x_1,...,x_{n-2})}... \frac{p_{X_1,X_2}(x_1,x_2)}{p_{X_1}(x_1)}p_{X_1}(x_1).$

The fractions on the right are recognized as conditional densities. The resulting expression is pretty awkward:

$p_{X_1,...,X_n}(x_1,...,x_n)=p_{X_1,...,X_n|X_1,...,X_{n-1}}(x_1,...,x_n|x_1,...,x_{n-1})\times$

$\times p_{X_1,...,X_{n-1}|X_1,...,X_{n-2}}(x_1,...,x_{n-1}|x_1,...,x_{n-2})... \times$

$p_{X_1,X_2|X_1}(x_1,x_2|x_1)p_{X_1}(x_1).$
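This factorization is exactly what makes time-series maximum likelihood tractable: each fraction becomes a one-step-ahead conditional density, and the joint log-density becomes a sum. A sketch assuming NumPy/SciPy and a Gaussian AR(1), which is my illustrative choice, not a model from the guide:

```python
# For an AR(1) X_t = phi * X_{t-1} + eps_t with Gaussian shocks, the
# conditional density of X_t given the past is N(phi * x_{t-1}, sigma^2),
# so the joint log-density is one unconditional term plus a sum of
# one-step conditional terms.
import numpy as np
from scipy.stats import norm

def ar1_loglik(x, phi, sigma):
    # stationary (unconditional) density for the first observation
    ll = norm.logpdf(x[0], scale=sigma / np.sqrt(1 - phi ** 2))
    # one conditional density term per remaining observation
    ll += norm.logpdf(x[1:], loc=phi * x[:-1], scale=sigma).sum()
    return ll

rng = np.random.default_rng(2)
x = rng.normal(size=5)
print(ar1_loglik(x, phi=0.5, sigma=1.0))
```

With $\phi=0$ the observations are independent and the sum collapses to the product formula of part a), which is a useful consistency check.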

Answer 14. The answer given here helps one understand how to pass from the density of the standard normal to that of the general normal.

Answer 15. This elementary explanation of the function definition can be used in the fifth grade. Note that conditions sufficient for existence of the inverse are not satisfied in a case as simple as the distribution function of the Bernoulli variable (when the graph of the function has flat pieces and is not continuous). Therefore we need a more general definition of an inverse. Those who think that this question is too abstract can check out UoL exams, where examinees are required to find Value at Risk when the distribution function is a step function. To understand the idea, do the following:

a) Draw a graph of a good function $f$ (continuous and increasing).

b) Fix some value $y_0$ in the range of this function and identify the region $\{y:y\ge y_0\}$.

c) Find the solution $x_0$ of the equation $f(x)=y_0$. By definition, $x_0=f^{-1}(y_0).$ Identify the region $\{x:f(x)\ge y_0\}$.

d) Note that $x_0=\min\{x:f(x)\ge y_0\}$. In general, for bad functions the minimum here may not exist. Therefore minimum is replaced by infimum, which gives us the definition of the quasi-inverse:

$x_0=\inf\{x:f(x)\ge y_0\}$.
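The quasi-inverse can be computed mechanically. A stdlib-only sketch for the Bernoulli distribution function mentioned above; the grid search is a crude stand-in for the infimum, and $p=0.3$ is an illustrative choice:

```python
# Quasi-inverse F^{-1}(y0) = inf{x : F(x) >= y0}, applied to the
# Bernoulli(p) distribution function, which is a step function.
def bernoulli_cdf(x, p=0.3):
    if x < 0:
        return 0.0
    if x < 1:
        return 1.0 - p          # jump of size 1 - p at x = 0
    return 1.0                  # jump of size p at x = 1

def quasi_inverse(cdf, y0, grid):
    # infimum of {x : F(x) >= y0}, approximated over a fine grid
    return min(x for x in grid if cdf(x) >= y0)

grid = [i / 1000 - 1 for i in range(3001)]        # grid on [-1, 2]
print(quasi_inverse(bernoulli_cdf, 0.5, grid))    # 0.0
print(quasi_inverse(bernoulli_cdf, 0.9, grid))    # 1.0
```

Note how the flat piece of the step function makes the ordinary inverse undefined at $y_0=0.5$, while the quasi-inverse still returns an answer; this is precisely the situation in the UoL Value at Risk questions.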