13 Apr 19

## Checklist for Quantitative Finance FN3142

Students of FN3142 often think that they can get by with a few technical tricks. The questions below are mostly about the intuition that helps one understand and apply those tricks.

Everywhere we assume that $...,Y_{t-1},Y_t,Y_{t+1},...$ is a time series and $...,I_{t-1},I_t,I_{t+1},...$ is a sequence of corresponding information sets. It is natural to assume that $I_t\subset I_{t+1}$ for all $t.$ We use the short conditional expectation notation: $E_tX=E(X|I_t)$.

### Questions

Question 1. How do you calculate conditional expectation in practice?

Question 2. How do you explain $E_t(E_tX)=E_tX$?

Question 3. Simplify each of $E_tE_{t+1}X$ and $E_{t+1}E_tX$ and explain intuitively.

Question 4. $\varepsilon _t$ is a shock at time $t$. Positive and negative shocks are equally likely. What is your best prediction now for tomorrow's shock? What is your best prediction now for the shock that will happen the day after tomorrow?

Question 5. How and why do you predict $Y_{t+1}$ at time $t$? What is the conditional mean of your prediction?

Question 6. What is the error of such a prediction? What is its conditional mean?

Question 7. Answer the previous two questions replacing $Y_{t+1}$ by $Y_{t+p}$.

Question 8. What is the mean-plus-deviation-from-mean representation (conditional version)?

Question 9. How is the representation from Q.8 reflected in variance decomposition?

Question 10. What is a canonical form? State and prove all properties of its parts.

Question 11. Define the conditional variance for a white noise process and establish its link with the unconditional one.

Question 12. How do you define the conditional density in the case of two variables, when one of them serves as the condition? Use it to prove the LIE.

Question 13. Write down the joint density function for a) independent observations and b) serially dependent observations.

Question 14. If one variable is a linear function of another, what is the relationship between their densities?

Question 15. What can you say about the relationship between $a,b$ if $f(a)=f(b)$? Explain geometrically the definition of the quasi-inverse function.

Answer 1. Conditional expectation is a complex notion. There are several definitions of differing levels of generality and complexity. See one of them here and another in Answer 12.

The point of this exercise is that any definition requires a lot of information, and in practice there is no way to apply any of them to actually calculate a conditional expectation. Then why do they juggle conditional expectations in theory? The efficient market hypothesis comes to the rescue: it is posited that all observed market data incorporate all available information and, in particular, that stock prices are already conditioned on $I_t.$

Answers 2 and 3. This is the best explanation I have.

Answer 4. Since positive and negative shocks are equally likely, the best prediction is $E_t\varepsilon _{t+1}=0$ (I call this equation a martingale condition). Similarly, $E_t\varepsilon _{t+2}=0$ but in this case I prefer to see an application of the LIE: $E_{t}\varepsilon _{t+2}=E_t(E_{t+1}\varepsilon _{t+2})=E_t0=0.$

Answer 5. The best prediction is $\hat{Y}_{t+1}=E_tY_{t+1}$ because it minimizes $E_t(Y_{t+1}-f(I_t))^2$ among all functions $f$ of current information $I_t.$ Formally, you can use the first order condition

$\frac{d}{df(I_t)}E_t(Y_{t+1}-f(I_t))^2=-2E_t(Y_{t+1}-f(I_t))=0$

to find that $f(I_t)=E_tf(I_t)=E_tY_{t+1}$ is the minimizing function. By the projector property
$E_t\hat{Y}_{t+1}=E_tE_tY_{t+1}=E_tY_{t+1}=\hat{Y}_{t+1}.$
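The minimizing property can be checked numerically. The sketch below is only an illustration under an assumed model that is not part of the checklist: an AR(1) process $Y_{t+1}=\phi Y_t+\varepsilon_{t+1}$ with standard normal shocks, so that $E_tY_{t+1}=\phi Y_t$. The values of `phi` and `y_t` are hypothetical.

```python
import random

random.seed(0)
phi, y_t = 0.5, 2.0        # hypothetical AR(1) coefficient and today's value
cond_mean = phi * y_t      # E_t Y_{t+1} under the assumed model

# Draws of Y_{t+1} given today's information (Y_t = y_t is known)
draws = [phi * y_t + random.gauss(0.0, 1.0) for _ in range(20000)]

def mse(forecast):
    """Sample analogue of E_t (Y_{t+1} - forecast)^2."""
    return sum((y - forecast) ** 2 for y in draws) / len(draws)

# Compare the conditional mean with forecasts shifted away from it
candidates = [cond_mean + d for d in (-1.0, -0.5, 0.0, 0.5, 1.0)]
losses = {c: mse(c) for c in candidates}
best = min(losses, key=losses.get)

# Sample analogue of the conditional mean of the prediction error
mean_error = sum(y - cond_mean for y in draws) / len(draws)
print(best, round(mean_error, 3))
```

Among the candidates, the forecast equal to $E_tY_{t+1}$ gives the smallest sample loss, and the average prediction error is close to zero.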

Answer 6. It is natural to define the prediction error by

$\hat{\varepsilon}_{t+1}=Y_{t+1}-\hat{Y}_{t+1}=Y_{t+1}-E_tY_{t+1}.$

By linearity and the projector property, $E_t\hat{\varepsilon}_{t+1}=E_tY_{t+1}-E_tE_tY_{t+1}=E_tY_{t+1}-E_tY_{t+1}=0$.

Answer 7. To generalize, just change the subscripts. For the prediction we have to use two subscripts: the notation $\hat{Y}_{t,t+p}$ means that we are trying to predict what happens at a future date $t+p$ based on info set $I_t$ (time $t$ is like today). Then by definition $\hat{Y} _{t,t+p}=E_tY_{t+p},$ $\hat{\varepsilon}_{t,t+p}=Y_{t+p}-E_tY_{t+p}.$
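As an illustration of the $p$-step notation, here is a minimal sketch under the same kind of assumption as before: a hypothetical AR(1) model $Y_{t+1}=\phi Y_t+\varepsilon_{t+1}$, which is not part of the checklist. Applying $E_t$ to the recursion and using $E_t\varepsilon_{t+k}=0$ gives $\hat{Y}_{t,t+p}=\phi^pY_t$; the simulation checks that the error $\hat{\varepsilon}_{t,t+p}$ averages out to roughly zero.

```python
import random

random.seed(1)
phi, y_t, p = 0.8, 1.5, 3   # hypothetical coefficient, today's value, horizon

def forecast(y_today, horizon):
    """E_t Y_{t+p} for the assumed AR(1): iterate the recursion without shocks."""
    y_hat = y_today
    for _ in range(horizon):
        y_hat = phi * y_hat  # each future shock has zero conditional mean
    return y_hat

def simulate(y_today, horizon):
    """One realized path value Y_{t+p}, shocks included."""
    y = y_today
    for _ in range(horizon):
        y = phi * y + random.gauss(0.0, 1.0)
    return y

y_hat = forecast(y_t, p)
errors = [simulate(y_t, p) - y_hat for _ in range(20000)]
mean_error = sum(errors) / len(errors)
print(y_hat, round(mean_error, 3))
```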

Answer 8. Answer 7, obviously, implies $Y_{t+p}=\hat{Y}_{t,t+p}+\hat{\varepsilon}_{t,t+p}.$ The simple case is here.

Answer 9. See the law of total variance and change it to reflect conditioning on $I_t.$
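A small exact check may help fix the idea. The sketch below uses a hypothetical joint distribution of two discrete variables, with $X$ playing the role of the information set; it is an unconditional toy version, not the time series statement itself. It splits $Y$ into the conditional mean $E(Y|X)$ plus the deviation from it and verifies that the two parts are uncorrelated, so their variances add up to $Var(Y)$.

```python
from fractions import Fraction as F

# Hypothetical joint pmf of (X, Y); X stands in for the information set
joint = {(0, -1): F(1, 4), (0, 1): F(1, 4), (1, 0): F(1, 8), (1, 2): F(3, 8)}

def expect(g):
    """E g(X, Y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

def cond_mean(x0):
    """E(Y | X = x0) = sum_y y * p(x0, y) / p_X(x0)."""
    p_x = sum(p for (x, _), p in joint.items() if x == x0)
    return sum(p * y for (x, y), p in joint.items() if x == x0) / p_x

mean_y = expect(lambda x, y: y)
var_y = expect(lambda x, y: (y - mean_y) ** 2)
var_pred = expect(lambda x, y: (cond_mean(x) - mean_y) ** 2)  # Var of E(Y|X)
var_err = expect(lambda x, y: (y - cond_mean(x)) ** 2)        # Var of the deviation
cov = expect(lambda x, y: (cond_mean(x) - mean_y) * (y - cond_mean(x)))

print(var_y, var_pred, var_err, cov)
```

The variance of the deviation equals the expected conditional variance, which is the form in which the decomposition usually appears.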

Answer 11. Combine conditional variance definition with white noise definition.

Answer 12. The conditional density is defined similarly to the conditional probability. Let $X,Y$ be two random variables. Denote by $p_X$ the density of $X$ and by $p_{X,Y}$ the joint density. Then the conditional density of $Y$ conditional on $X$ is defined as $p_{Y|X}(y|x)=\frac{p_{X,Y}(x,y)}{p_X(x)}.$ After this we can define the conditional expectation $E(Y|X=x)=\int yp_{Y|X}(y|x)dy,$ written $E(Y|x)$ for short. With these definitions one can prove the Law of Iterated Expectations:

$E[E(Y|X)]=\int E(Y|x)p_X(x)dx=\int \left( \int yp_{Y|X}(y|x)dy\right) p_X(x)dx$

$=\int \int y\frac{p_{X,Y}(x,y)}{p_X(x)}p_X(x)dxdy=\int \int yp_{X,Y}(x,y)dxdy=EY.$

This is an illustration of Answer 1 and a preliminary to Answer 13.
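The same steps can be traced in a discrete analogue, where sums replace integrals and a probability table replaces the joint density. The table below is hypothetical and serves only as a sketch of the definitions above.

```python
from fractions import Fraction as F

# Hypothetical joint pmf p_{X,Y}(x, y), the discrete analogue of a joint density
joint = {(0, 1): F(1, 6), (0, 3): F(1, 3), (1, 1): F(1, 4), (1, 5): F(1, 4)}

xs = sorted({x for x, _ in joint})

# Marginal p_X(x): sum the joint pmf over y
p_x = {x0: sum(p for (x, _), p in joint.items() if x == x0) for x0 in xs}

# Conditional pmf p_{Y|X}(y|x) = p_{X,Y}(x, y) / p_X(x)
cond = {(x, y): p / p_x[x] for (x, y), p in joint.items()}

# E(Y | X = x) for each x
cond_exp = {x0: sum(y * q for (x, y), q in cond.items() if x == x0) for x0 in xs}

lhs = sum(cond_exp[x] * p_x[x] for x in xs)      # E[E(Y|X)]
rhs = sum(y * p for (_, y), p in joint.items())  # EY
print(lhs, rhs)
```

Dividing by $p_X(x)$ and then multiplying by it again is exactly the cancellation used in the proof.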

Answer 13. Understanding this answer is essential for Section 8.6 of Patton's guide, on maximum likelihood.

a) In case of independent observations $X_1,...,X_n$ the joint density of the vector $X=(X_1,...,X_n)$ is a product of individual densities:

$p_X(x_1,...,x_n)=p_{X_1}(x_1)...p_{X_n}(x_n).$

b) In the time series context it is natural to assume that the next observation depends on the previous ones, that is, for each $t,$ $X_t$ depends on $X_1,...,X_{t-1}$ (serially dependent observations). Therefore we should work with conditional densities $p_{X_1,...,X_t|X_1,...,X_{t-1}}.$ From Answer 12 we can guess how to make conditional densities appear:

$p_{X_1,...,X_n}(x_1,...,x_n)=\frac{p_{X_1,...,X_n}(x_1,...,x_n)}{ p_{X_1,...,X_{n-1}}(x_1,...,x_{n-1})}\frac{p_{X_1,...,X_{n-1}}(x_1,...,x_{n-1})}{ p_{X_1,...,X_{n-2}}(x_1,...,x_{n-2})}...\frac{p_{X_1,X_2}(x_1,x_2)}{p_{X_1}(x_1)}p_{X_1}(x_1).$

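The fractions on the right are recognized as conditional densities. The resulting expression is pretty awkward:

$p_{X_1,...,X_n}(x_1,...,x_n)=p_{X_1,...,X_n|X_1,...,X_{n-1}}(x_1,...,x_n|x_1,...,x_{n-1})\times$ $p_{X_1,...,X_{n-1}|X_1,...,X_{n-2}}(x_1,...,x_{n-1}|x_1,...,x_{n-2})\times ...\times$ $p_{X_1,X_2|X_1}(x_1,x_2|x_1)p_{X_1}(x_1).$

Since the conditioning variables are repeated in each factor, the first factor is simply the conditional density of $X_n$ given $X_1,...,X_{n-1}$, and similarly for the others.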
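To see the difference between cases a) and b) in numbers, here is a sketch with a hypothetical two-state chain. It makes an extra assumption that the text does not: the dependence is first-order Markov, so each conditional factor depends only on the previous observation. The initial and transition probabilities are made up for illustration.

```python
from fractions import Fraction as F
from itertools import product

initial = {0: F(1, 2), 1: F(1, 2)}   # p_{X_1}
# transition[prev][cur] = P(X_t = cur | X_{t-1} = prev), hypothetical values
transition = {0: {0: F(3, 4), 1: F(1, 4)}, 1: {0: F(1, 3), 1: F(2, 3)}}

def joint_dependent(seq):
    """Case b): p(x_1) times the chain of conditional factors p(x_t | x_{t-1})."""
    prob = initial[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        prob *= transition[prev][cur]
    return prob

def joint_independent(seq, marginal):
    """Case a): product of individual probabilities."""
    prob = F(1)
    for x in seq:
        prob *= marginal[x]
    return prob

n = 4
# The factorized joint pmf must sum to one over all sequences of length n
total = sum(joint_dependent(s) for s in product((0, 1), repeat=n))
p_dep = joint_dependent((0, 0, 0, 0))
p_ind = joint_independent((0, 0, 0, 0), initial)
print(total, p_dep, p_ind)
```

Taking logs turns either product into a sum of terms, one per observation.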

Answer 14. The answer given here helps one understand how to pass from the density of the standard normal to that of the general normal.
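For reference, if $Y=a+bX$ with $b\neq 0$, then $p_Y(y)=\frac{1}{|b|}p_X\left(\frac{y-a}{b}\right)$. The sketch below applies this to the standard normal density and compares the result with the normal density from Python's `statistics` module; the values of `mu` and `sigma` are hypothetical.

```python
from statistics import NormalDist

def density_of_linear(p_x, a, b):
    """Density of Y = a + b*X given the density p_x of X (requires b != 0)."""
    return lambda y: p_x((y - a) / b) / abs(b)

std_normal = NormalDist(0.0, 1.0)
mu, sigma = 1.0, 2.0                 # hypothetical mean and standard deviation

# Y = mu + sigma * Z with Z standard normal
p_y = density_of_linear(std_normal.pdf, mu, sigma)

# Library density of N(mu, sigma^2) for comparison
reference = NormalDist(mu, sigma)
points = [-3.0, 0.0, 1.0, 2.5]
max_gap = max(abs(p_y(y) - reference.pdf(y)) for y in points)
print(max_gap)
```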

Answer 15. This elementary explanation of the function definition can be used in the fifth grade. Note that conditions sufficient for existence of the inverse are not satisfied in a case as simple as the distribution function of the Bernoulli variable (when the graph of the function has flat pieces and is not continuous). Therefore we need a more general definition of an inverse. Those who think that this question is too abstract can check out UoL exams, where examinees are required to find Value at Risk when the distribution function is a step function. To understand the idea, do the following:

a) Draw a graph of a good function $f$ (continuous and increasing).

b) Fix some value $y_0$ in the range of this function and identify the region $\{y:y\ge y_0\}$.

c) Find the solution $x_0$ of the equation $f(x)=y_0$. By definition, $x_0=f^{-1}(y_0).$ Identify the region $\{x:f(x)\ge y_0\}$.

d) Note that $x_0=\min\{x:f(x)\ge y_0\}$. In general, for bad functions the minimum here may not exist. Therefore the minimum is replaced by the infimum, which gives us the definition of the quasi-inverse:

$x_0=\inf\{x:f(x)\ge y_0\}$.
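For a step distribution function the definition can be applied mechanically. The sketch below is a minimal illustration for a discrete distribution, using a Bernoulli variable with a hypothetical success probability of 3/10. Because such a distribution function is right-continuous, the infimum is attained at a point of the support.

```python
from fractions import Fraction as F

def quasi_inverse(support, probs, y0):
    """inf{x : F(x) >= y0} for a discrete distribution function F, 0 < y0 <= 1.

    Walk through the support in increasing order and return the first point
    at which the cumulative probability reaches y0.
    """
    cumulative = F(0)
    for x, p in sorted(zip(support, probs)):
        cumulative += p
        if cumulative >= y0:
            return x
    raise ValueError("y0 exceeds the total probability")

# Bernoulli variable: P(X = 0) = 7/10, P(X = 1) = 3/10 (hypothetical values)
bernoulli = ([0, 1], [F(7, 10), F(3, 10)])

q_half = quasi_inverse(*bernoulli, F(1, 2))    # level inside the first jump
q_flat = quasi_inverse(*bernoulli, F(7, 10))   # level equal to the flat piece
q_high = quasi_inverse(*bernoulli, F(9, 10))   # level inside the second jump
print(q_half, q_flat, q_high)
```

No $x$ solves $F(x)=1/2$ or $F(x)=9/10$ here, and infinitely many solve $F(x)=7/10$, yet the quasi-inverse returns a single value in each case.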