Apr 20

FN3142 Chapter 13. Risk management and Value-at-Risk: Models

Chapter 13 is divided into 5 parts. For each part, there is a video with the supporting pdf file. Both have been created in Notability using an iPad. All files are here.

Part 1. Distribution function with two examples and generalized inverse function.

Part 2. Value-at-Risk definition

Part 3. Empirical distribution function and its estimation

Part 4. Models based on flexible distributions

Part 5. Semiparametric models, nonparametric estimation of densities and historical simulation.

In addition, in the subchapter named Expected shortfall you can find extra material. It is not in the subject guide, but it was required by one of the past UoL exams.

Mar 20

FN3142 Chapter 12. Forecast comparison and combining

Like most universities, we switched to online teaching because of COVID-19. This is what I prepared for my Quantitative Finance course and want to make available for everybody. The lecture is divided into three parts. Download the pdf and watch the corresponding video.

Diebold-Mariano test

Chapter 12. Part 1.mp4

Chapter 12. Forecast comparison and combination. Part 1.pdf

Chapter 12. Part 2.mp4

Chapter 12. Forecast comparison and combination. Part 2.pdf

Chapter 12. Part 3.mp4

Chapter 12. Part 3. Forecast encompassing and combining

See another post on Newey-West estimator


Oct 19

Leverage effect: the right definition and explanation

The guide by Andrew Patton for Quantitative Finance FN3142 states that "stock returns are negatively correlated with changes in volatility: that is, volatility tends to rise following bad news (a negative return) and fall following good news (a positive return)", with reference to Black (1976). This is not quite so, as can be seen from the following chart.

S&P500 versus VIX: leverage effect


The candlebars (in green and light red) show the index S&P 500, an average of the stock prices of the 500 largest publicly traded companies. The continuous purple line shows the VIX, one of the widely used measures of volatility. Between the yellow vertical lines at A and B the return was predominantly positive, yet at the beginning of that period volatility was high. The graph clearly shows that the negative correlation is between asset prices and volatility, not between returns and volatility. Thus the proper definition of the leverage effect is "negative correlation between asset prices and volatility". There are different explanations of the effect; here is the one I prefer.

At all times, market participants try to maximize profits and minimize losses. This motivation results in different behaviors during the market cycle.

Near the bottom (below line at E)

During the slump to the left of point A. Out of fear that a great depression is coming, everybody is dumping stocks, trying to stay in cash and gold as much as possible. That's why the price drops quickly and volatility is high.

During the recovery to the right of point A. Some investors consider many stocks cheap and try to load up, buying stocks in large quantities. Others are not convinced that the recovery has started. Opposing opinions and swift purchases, made possible by large amounts of cash on hand, increase volatility.

Near the top (above line at F)

To the left of point B. Stocks are bought in small quantities, for the following reasons: 1) to avoid a sharp increase in price, 2) out of fear that the rally will soon end, and 3) not much cash is left in portfolios, so it's mainly portfolio rebalancing (buy stocks with potential to grow, sell those that have stalled).

To the right of point B. Investors sell stocks in small quantities, to take profits and in anticipation of a new downturn.

The main difference between what happens at the bottom and at the top is in the relative amount of cash and stocks in portfolios. This is why near the top there is little volatility. Of course, there are other reasons. Look at what happened between points C and D. The S&P level was relatively high, but there was a lot of uncertainty about the US-China trade war. Trump with his tweets contributed a lot to that volatility. This article claims that somebody made billions of dollars trading on his tweets. Insider trading is prohibited but not for the mighty of this world.

Final remark. If the recovery from the trough at point A took just three months, why worry and sell stocks during the fall? There are three answers. Firstly, after the big recession of 2008, it took the markets five years to fully recover to the pre-2008 level, and that's what scares everybody. Secondly, the best stocks after the recovery will not be the same as before the fall. Thirdly, one can make money on the way down too, provided one has spare cash.

May 19

Question 1 from UoL exam 2016, Zone B, Post 2

For the problem statement and first part of the solution see Question 1 from UoL exam 2016, Zone B, Post 1.

Let R denote the return on P_1+P_2. From Table 1 we can derive the probabilities table for this return:

Table 2. Joint table of returns on separate portfolios

R_2 \ R_1   0                -100
0           0.96^2           0.04\cdot 0.96
-100        0.04\cdot 0.96   0.04^2

From Table 2 we conclude that the return on the combined portfolio looks as follows:

Table 3. Total return

R \text{Prob}
0 0.96^2
-50 2\cdot 0.04\cdot 0.96
-100 0.04^2

Table 3 shows that

F_R(x)=0 for x<-100,

F_R(x)=0.04^2=0.0016 for -100\leq x<-50,

F_R(x)=2\cdot 0.04\cdot 0.96+0.0016=0.0784 for -50\leq x<0 and

F_R(x)=0.96^2+0.0784=1 for x\geq 0.

Try to follow the procedure used in Post 1 and you will see that

F_R^{-1}(y)=+\infty for y>1,

F_R^{-1}(y)=0 for 0.0784<y\leq 1,

F_R^{-1}(y)=-50 for 0.0016<y\leq 0.0784,

F_R^{-1}(y)=-100 for 0<y\leq 0.0016 and

F_R^{-1}(y)=-\infty for y\leq 0.

This implies VaR_R^\alpha=F_R^{-1}(0.05)=-50.

(b) The subadditivity definition requires amounts opposite in sign to ours. That is, we define \widetilde{VaR^\alpha} from P(X\leq -\widetilde{VaR^\alpha})=\alpha and then say that VaR thus defined is sub-additive if \widetilde{VaR^\alpha}(P_1+P_2)\leq \widetilde{VaR^\alpha}(P_1)+\widetilde{VaR^\alpha}(P_2). We have been using the definition P(X\leq VaR^\alpha)=\alpha. It's easy to see that \widetilde{VaR^\alpha}=-VaR^\alpha. Thus, in our case we have \widetilde{VaR_R^{0.05}}=50, which is not smaller than \widetilde{VaR_{R_1}^{0.05}}+\widetilde{VaR_{R_2}^{0.05}}=0. Sub-additivity does not hold in this example. The absence of sub-additivity means that the riskiness of the whole portfolio, as measured by VaR, may exceed the sum of the riskinesses of its parts.

(c) The problem uses the definition of the expected shortfall that yields positive values. I use everywhere the definition that gives negative values: ES^\alpha=E_t[R|R\leq VaR_{t+1}^\alpha]. Since the setup is static, this is the same as ES^\alpha=E[R|R\leq VaR^\alpha]. By definition, E(X|A)=\frac{E(X1_A)}{P(A)}, so ES^\alpha=\frac{E(R1_{\{R\leq VaR^\alpha\}})}{P(R\leq VaR^\alpha)}.

In Post 1 we found that VaR^\alpha=0 for each of R_1,R_2. The condition R_i\leq VaR^\alpha=0 places no restriction on R_i, so from Table 1

E(R_i1_{\{R_i\leq VaR^\alpha \}})=ER_i=0\cdot 0.96-100\cdot 0.04=-4,\ P(R_i\leq VaR^\alpha)=1.

As a result, ES_i^\alpha=-4\%.

Since VaR_R^\alpha=F_R^{-1}(0.05)=-50, from Table 3

E(R1_{\{R\leq -50\}})=-50\cdot 2\cdot 0.04\cdot 0.96-100\cdot 0.04^2=-4,

P(R\leq -50)=1-0.9216=0.0784.

Therefore ES_R^\alpha=-4/0.0784=-51.02. Converting everything to positive values, we have 51.02>4+4, so sub-additivity does not hold for ES either in this example. In theory expected shortfall is sub-additive; here the property fails because of the bad behavior of the generalized inverse for distribution functions of discrete random variables.
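The arithmetic above can be reproduced with a short Python sketch (all names are mine, for illustration): it rebuilds Table 3, applies the generalized inverse F^{-1}(y)=\inf\{x:F(x)\geq y\}, and recovers VaR=-50 and ES\approx -51.02.

```python
# Illustrative sketch (my names, not from the exam): rebuild Table 3
# and compute VaR and ES via the generalized inverse
# F^{-1}(y) = inf{x : F(x) >= y}.

p_default = 0.04

# Table 3: return (in %) on the combined portfolio with its probability
outcomes = {0.0: (1 - p_default) ** 2,
            -50.0: 2 * p_default * (1 - p_default),
            -100.0: p_default ** 2}

def cdf(x):
    """F_R(x) = P(R <= x) for the discrete return R."""
    return sum(p for r, p in outcomes.items() if r <= x)

def quasi_inverse(y):
    """inf{x : F_R(x) >= y}; scanning the support suffices for a step CDF."""
    return min(x for x in sorted(outcomes) if cdf(x) >= y)

alpha = 0.05
var_R = quasi_inverse(alpha)            # -50.0, as in the post

# ES = E[R | R <= VaR] = E[R 1_{R <= VaR}] / P(R <= VaR)
num = sum(r * p for r, p in outcomes.items() if r <= var_R)   # -4
den = cdf(var_R)                                              # 0.0784
es_R = num / den                                              # about -51.02
```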

The returns in percentages can be easily converted to those in dollars.

May 19

Question 1 from UoL exam 2016, Zone B, Post 1

There is a hidden mine in this question, and it is caused by discreteness of the distribution function. We had a lively discussion of this oddity in my class. The answer will be given in two posts.

Question. Two corporations each have a 4% chance of going bankrupt and the event that one of the two companies will go bankrupt is independent of the event that the other company will go bankrupt. Each company has outstanding bonds. A bond from any of the two companies will return R=0% if the corporation does not go bankrupt, and if it goes bankrupt you lose the face value of the investment, i.e., R=-100%. Suppose an investor buys $1000 worth of bonds of the first corporation, which is then called portfolio P_1, and similarly, an investor buys $1000 worth of bonds of the second corporation, which is then called portfolio P_2.

(a) [40 marks] Calculate the VaR at \alpha=5\% critical level for each portfolio and for the joint portfolio P_1+P_2.

(b) [30 marks] Is VaR sub-additive in this example? Explain why the absence of sub-additivity may be a concern for risk managers.

(c) [30 marks] The expected shortfall ES^\alpha at the \alpha =5\% critical level can be defined as ES^\alpha=-E_t[R|R<-VaR_{t+1}^\alpha]. Calculate the expected shortfall for the portfolios P_1, P_2 and P_1+P_2. Is this risk measure sub-additive?

Solution. a) The return on each portfolio is a binary variable described by Table 1:

Table 1. Return on separate portfolios

R_i Prob
0 0.96
-100 0.04

Therefore the distribution function F_{R_i}(x) of the return is a piece-wise constant function equal to 0 for x<-100, to 0.04 for -100\leq x<0 and to 1 for x\geq 0, see Example 3. For instance, if x\geq 0 we can write

F_{R_i}(x)=P(R_i\leq x)=P(R_i<-100)+P(R_i=-100)+P(-100<R_i<0)+P(R_i=0)+P(0<R_i\leq  x)=0.04+0.96=1.

Diagram 1. Return distribution function

Since this function is not one-to-one, its usual inverse does not exist and we have to use the quasi-inverse, see Answer 15. As with the distribution function, we need to look at different cases.

If y>1, drawing a horizontal line at y we see that the set \{x:F_{R_i}(x)\geq y\} is empty and the infimum of an empty set is by definition +\infty (can you guess why?)

For any 1\geq y>0.04, we have \{x:F_{R_i}(x)\geq y\}=[0,+\infty ), so F_{R_i}^{-1}(y)=0.

Next, for 0<y\leq 0.04 we have \{x:F_{R_i}(x)\geq y\}=[-100,+\infty ) and F_{R_i}^{-1}(y)=-100.

Finally, if y\leq 0, we get \{x:F_{R_i}(x)\geq  y\}=(-\infty ,+\infty ) and F_{R_i}^{-1}(y)=-\infty .

Generalized inverse example

The resulting function is bad in two ways. Firstly, it takes infinite values. In applications to VaR this should not concern us because the bad values occur in the ranges y>1 and y\leq 0.

Secondly, for practically interesting values of y\in (0,1), the graph of F_{R_i}^{-1} has flat pieces, which may be problematic. By definition, VaR^\alpha is the solution to the equation

P(R_i\leq VaR^\alpha)=\alpha .

This means that we should have VaR^\alpha=F_{R_i}^{-1}(\alpha). When we plug this value in F_{R_i}(x), we are supposed to get \alpha . However, here we don't, in general. For example, VaR^{0.02}=F_{R_i}^{-1}(0.02)=-100, while F_{R_{i}}(-100)=0.04\neq 0.02.

This happens because the usual inverse does not exist. When the usual inverse exists, we have two identities F^{-1}(F(x))=x and F(F^{-1}(y))=y. Here both are violated.

Now the definition of VaR gives VaR_{R_i}^\alpha=F_{R_i}^{-1}(0.05)=0 for each portfolio.
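As a quick illustration (variable names are mine), the quasi-inverse of this step function can be computed by scanning the support, and the failure of F(F^{-1}(y))=y is visible directly:

```python
# Illustrative sketch (my names): the quasi-inverse of the step function
# from Table 1 and the failure of the identity F(F^{-1}(y)) = y.

outcomes = {0.0: 0.96, -100.0: 0.04}   # Table 1: return -> probability

def F(x):
    """Distribution function of R_i: piecewise constant with two jumps."""
    return sum(p for r, p in outcomes.items() if r <= x)

def F_inv(y):
    """Quasi-inverse inf{x : F(x) >= y}, computed over the support."""
    return min(x for x in sorted(outcomes) if F(x) >= y)

var_5 = F_inv(0.05)        # 0.0: the 5% VaR found in the post
var_2 = F_inv(0.02)        # -100.0
print(F(F_inv(0.02)))      # prints 0.04, not 0.02: F(F^{-1}(y)) = y fails
```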

To be continued.

Apr 19

Solution to Question 3b) from UoL exam 2018, Zone A

I thought that after all the work we've done with my students the answer to this question would be obvious. It was not, so I am sharing it.

Question. Consider a position consisting of a $20,000 investment in asset X and a $20,000 investment in asset Y. Assume that returns on these two assets are i.i.d. normal with mean zero, that the daily volatilities of both assets are 3%, and that the correlation coefficient between their returns is 0.4. What is the 10-day VaR at the \alpha =1\% critical level for the portfolio?

Solution. First we have to work with returns and then translate the result into dollars.

Let R_X, R_Y be the daily returns on the two assets. We are given that ER_X=ER_Y=0, \sigma (R_X)=\sigma(R_Y)=0.03, \rho(R_X,R_Y)=0.4.

Since the total investment is $40,000, the shares of the investment are s_X=s_Y=20,000/40,000=0.5. Therefore the daily return on the portfolio is R=0.5R_X+0.5R_Y, see Exercise 2.

It follows that ER=0.5ER_X+0.5ER_Y=0,

Var(R)=0.5^2\sigma^2(R_X)+2\cdot 0.5\cdot 0.5\rho (R_X,R_Y)\sigma (R_X)\sigma(R_Y)+0.5^2\sigma^2(R_Y)=0.5^2\cdot  0.03^2(1+0.8+1)=0.015^2\cdot 2.8.

These figures are for daily returns. We need to make sure that R is normally distributed. The sufficient condition for this is that the returns R_X, R_Y are jointly normally distributed. It is not mentioned in the problem statement, and we have to assume that it is satisfied.

Let R_i denote the return on day i. Under continuous compounding the daily returns are summed: if we invest M_0 initially, after the first day we have M_1=M_0e^{R_1}, after the second day we have M_2=M_1e^{R_2}=M_0e^{R_1+R_2} and so on. So the 10-day return is r=R_1+...+R_{10}.

Since the daily returns are independent and identically distributed, by additivity of variance we have

Var(r)=\sum Var(R_i)=10\cdot 0.015^2\cdot 2.8, \sigma (r)=0.015\sqrt{28}=0.079, Er=0.

r is normally distributed because it is a sum of independent normal variables. It remains to apply the VaR formula for normal distributions

VaR^\alpha=Er+\sigma(r)\Phi^{-1}(\alpha).

From the table of the distribution function of the standard normal, \Phi^{-1}(0.01)=-2.33. Thus, VaR^{\alpha}=0.079\cdot (-2.33)=-0.184. This translates to a loss of 0.184\times 40,000\approx 7,362. Thus, with probability 1% the loss can be $7,362 or more.
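For readers who want to reproduce the arithmetic, here is a short Python sketch (my variable names throughout; the standard library's statistics.NormalDist replaces the printed normal table, so it returns the quantile -2.326 rather than the rounded -2.33, and the dollar figure comes out near $7,386 instead of $7,362):

```python
from math import sqrt
from statistics import NormalDist  # standard-library normal distribution

# Sketch of the 10-day VaR computation (my variable names throughout).
sigma_daily = 0.03      # daily volatility of each asset
rho = 0.4               # correlation between the two returns
w = 0.5                 # equal weights: $20,000 out of $40,000 in each
horizon = 10            # days
notional = 40_000

# One-day portfolio variance: w^2 s^2 + 2 w^2 rho s^2 + w^2 s^2
var_1d = w ** 2 * sigma_daily ** 2 * (1 + 2 * rho + 1)
sigma_10d = sqrt(horizon * var_1d)        # about 0.0794 = 0.015 * sqrt(28)

z = NormalDist().inv_cdf(0.01)            # about -2.326 (table: -2.33)
var_10d = sigma_10d * z                   # about -0.185 as a return
dollar_var = var_10d * notional           # about -$7,386
```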

Apr 19

Checklist for Quantitative Finance FN3142

Students of FN3142 often think that they can get by with a few technical tricks. The questions below are mostly about the intuition that helps to understand and apply those tricks.

Everywhere we assume that ...,Y_{t-1},Y_t,Y_{t+1},... is a time series and ...,I_{t-1},I_t,I_{t+1},... is a sequence of corresponding information sets. It is natural to assume that I_t\subset I_{t+1} for all t. We use the short conditional expectation notation: E_tX=E(X|I_t).


Question 1. How do you calculate conditional expectation in practice?

Question 2. How do you explain E_t(E_tX)=E_tX?

Question 3. Simplify each of E_tE_{t+1}X and E_{t+1}E_tX and explain intuitively.

Question 4. \varepsilon _t is a shock at time t. Positive and negative shocks are equally likely. What is your best prediction now for tomorrow's shock? What is your best prediction now for the shock that will happen the day after tomorrow?

Question 5. How and why do you predict Y_{t+1} at time t? What is the conditional mean of your prediction?

Question 6. What is the error of such a prediction? What is its conditional mean?

Question 7. Answer the previous two questions replacing Y_{t+1} by Y_{t+p} .

Question 8. What is the mean-plus-deviation-from-mean representation (conditional version)?

Question 9. How is the representation from Q.8 reflected in variance decomposition?

Question 10. What is a canonical form? State and prove all properties of its parts.

Question 11. Define conditional variance for white noise process and establish its link with the unconditional one.

Question 12. How do you define the conditional density in case of two variables, when one of them serves as the condition? Use it to prove the LIE.

Question 13. Write down the joint distribution function for a) independent observations and b) for serially dependent observations.

Question 14. If one variable is a linear function of another, what is the relationship between their densities?

Question 15. What can you say about the relationship between a,b if f(a)=f(b)? Explain geometrically the definition of the quasi-inverse function.


Answer 1. Conditional expectation is a complex notion. There are several definitions of differing levels of generality and complexity. See one of them here and another in Answer 12.

The point of this exercise is that any definition requires a lot of information, and in practice there is no way to apply any of them to actually calculate a conditional expectation. Why, then, is conditional expectation used so much in theory? The efficient market hypothesis comes to the rescue: it is posited that all observed market data incorporate all available information and, in particular, stock prices are already conditioned on I_t.

Answers 2 and 3. This is the best explanation I have.

Answer 4. Since positive and negative shocks are equally likely, the best prediction is E_t\varepsilon _{t+1}=0 (I call this equation a martingale condition). Similarly, E_t\varepsilon _{t+2}=0 but in this case I prefer to see an application of the LIE: E_{t}\varepsilon _{t+2}=E_t(E_{t+1}\varepsilon _{t+2})=E_t0=0.

Answer 5. The best prediction is \hat{Y}_{t+1}=E_tY_{t+1} because it minimizes E_t(Y_{t+1}-f(I_t))^2 among all functions f of current information I_t. Formally, you can use the first order condition

\frac{d}{df}E_t(Y_{t+1}-f)^2=-2E_t(Y_{t+1}-f)=0

to find that f(I_t)=E_tf(I_t)=E_tY_{t+1} is the minimizing function. By the projector property, the conditional mean of the prediction is the prediction itself: E_t\hat{Y}_{t+1}=E_t(E_tY_{t+1})=E_tY_{t+1}=\hat{Y}_{t+1}.

Answer 6. It is natural to define the prediction error by

\hat{\varepsilon}_{t+1}=Y_{t+1}-\hat{Y}_{t+1}=Y_{t+1}-E_tY_{t+1}.

By the projector property E_t\hat{\varepsilon}_{t+1}=E_tY_{t+1}-E_tY_{t+1}=0.

Answer 7. To generalize, just change the subscripts. For the prediction we have to use two subscripts: the notation \hat{Y}_{t,t+p} means that we are trying to predict what happens at a future date t+p based on info set I_t (time t is like today). Then by definition \hat{Y} _{t,t+p}=E_tY_{t+p}, \hat{\varepsilon}_{t,t+p}=Y_{t+p}-E_tY_{t+p}.

Answer 8. Answer 7, obviously, implies Y_{t+p}=\hat{Y}_{t,t+p}+\hat{\varepsilon}_{t,t+p}. The simple case is here.

Answer 9. See the law of total variance and change it to reflect conditioning on I_t.

Answer 10. See canonical form.

Answer 11. Combine conditional variance definition with white noise definition.

Answer 12. The conditional density is defined similarly to the conditional probability. Let X,Y be two random variables. Denote p_X the density of X and p_{X,Y} the joint density. Then the conditional density of Y conditional on X is defined as p_{Y|X}(y|x)=\frac{p_{X,Y}(x,y)}{p_X(x)}. After this we can define the conditional expectation E(Y|X)=\int yp_{Y|X}(y|x)dy. With these definitions one can prove the Law of Iterated Expectations:

E[E(Y|X)]=\int E(Y|x)p_X(x)dx=\int \left( \int yp_{Y|X}(y|x)dy\right)  p_X(x)dx

=\int \int y\frac{p_{X,Y}(x,y)}{p_X(x)}p_X(x)dxdy=\int \int  yp_{X,Y}(x,y)dxdy=EY.

This is an illustration to Answer 1 and a prelim to Answer 13.
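A tiny numeric check of E[E(Y|X)]=EY on a made-up 2x2 joint distribution (the numbers and names are illustrative only):

```python
# A tiny numeric check of the LIE, E[E(Y|X)] = EY, on a made-up
# 2x2 joint distribution (all numbers illustrative).

p = {(0, 1): 0.1, (0, 4): 0.3, (1, 1): 0.4, (1, 4): 0.2}  # p(x, y)

def p_X(x):
    """Marginal density of X."""
    return sum(q for (a, _), q in p.items() if a == x)

def E_Y_given(x):
    """E(Y | X = x) = sum_y y p(x, y) / p_X(x)."""
    return sum(y * q for (a, y), q in p.items() if a == x) / p_X(x)

lhs = sum(E_Y_given(x) * p_X(x) for x in (0, 1))   # E[E(Y|X)]
rhs = sum(y * q for (_, y), q in p.items())        # EY directly
```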

Answer 13. Understanding this answer is essential for Section 8.6 on maximum likelihood in Patton's guide.

a) In case of independent observations X_1,...,X_n the joint density of the vector X=(X_1,...,X_n) is a product of individual densities:

p_{X_1,...,X_n}(x_1,...,x_n)=p_{X_1}(x_1)\cdots p_{X_n}(x_n).
b) In the time series context it is natural to assume that the next observation depends on the previous ones, that is, for each t, X_t depends on X_1,...,X_{t-1} (serially dependent observations). Therefore we should work with conditional densities p_{X_1,...,X_t|X_1,...,X_{t-1}}. From Answer 12 we can guess how to make conditional densities appear:

p_{X_1,...,X_n}(x_1,...,x_n)= \frac{p_{X_1,...,X_n}(x_1,...,x_n)}{p_{X_1,...,X_{n-1}}(x_1,...,x_{n-1})} \frac{p_{X_1,...,X_{n-1}}(x_1,...,x_{n-1})}{p_{X_1,...,X_{n-2}}(x_1,...,x_{n-2})}... \frac{p_{X_1,X_2}(x_1,x_2)}{p_{X_1}(x_1)}p_{X_1}(x_1).

The fractions on the right are recognized as conditional densities. The resulting expression is pretty awkward:

p_{X_1,...,X_n}(x_1,...,x_n)=p_{X_1,...,X_n|X_1,...,X_{n-1}}(x_1,...,x_n|x_1,...,x_{n-1})

\times p_{X_1,...,X_{n-1}|X_1,...,X_{n-2}}(x_1,...,x_{n-1}|x_1,...,x_{n-2})\times ...

\times p_{X_1,X_2|X_1}(x_1,x_2|x_1)p_{X_1}(x_1).
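As a sanity check of the factorization, here is a sketch on a toy two-state Markov chain (the chain and all names are mine, purely for illustration):

```python
# Sanity check of the chain-rule factorization on a toy two-state
# Markov chain with n = 3 observations (all names illustrative).
from itertools import product

init = {0: 0.5, 1: 0.5}                                  # p(x1)
trans = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.6}

def p_joint(x):
    """p(x1, x2, x3) for the chain."""
    prob = init[x[0]]
    for a, b in zip(x, x[1:]):
        prob *= trans[(a, b)]
    return prob

def p_marginal(prefix):
    """p(x1, ..., xk), summing out the remaining observations."""
    return sum(p_joint(prefix + rest)
               for rest in product((0, 1), repeat=3 - len(prefix)))

# The telescoping product of conditional densities reproduces the joint one
for x in product((0, 1), repeat=3):
    chain = (p_joint(x) / p_marginal(x[:2])) \
            * (p_marginal(x[:2]) / p_marginal(x[:1])) \
            * p_marginal(x[:1])
    assert abs(chain - p_joint(x)) < 1e-12
```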
Answer 14. The answer given here helps one understand how to pass from the density of the standard normal to that of the general normal.

Answer 15. This elementary explanation of the function definition can be used in the fifth grade. Note that conditions sufficient for existence of the inverse are not satisfied in a case as simple as the distribution function of the Bernoulli variable (when the graph of the function has flat pieces and is not continuous). Therefore we need a more general definition of an inverse. Those who think that this question is too abstract can check out UoL exams, where examinees are required to find Value at Risk when the distribution function is a step function. To understand the idea, do the following:

a) Draw a graph of a good function f (continuous and increasing).

b) Fix some value y_0 in the range of this function and identify the region \{y:y\ge y_0\}.

c) Find the solution x_0 of the equation f(x)=y_0. By definition, x_0=f^{-1}(y_0). Identify the region \{x:f(x)\ge y_0\}.

d) Note that x_0=\min\{x:f(x)\ge y_0\}. In general, for bad functions the minimum here may not exist. Therefore minimum is replaced by infimum, which gives us the definition of the quasi-inverse:

x_0=\inf\{x:f(x)\ge y_0\}.

Oct 18

Law of iterated expectations: geometric aspect

There will be a separate post on projectors. In the meantime, we'll have a look at simple examples that explain a lot about conditional expectations.

Examples of projectors

The name "projector" is almost self-explanatory. Imagine a point and a plane in the three-dimensional space. Draw a perpendicular from the point to the plane. The intersection of the perpendicular with the plane is the point's projection onto that plane. Note that if the point already belongs to the plane, its projection equals the point itself. Besides, instead of projecting onto a plane we can project onto a straight line.

The above description translates into the following equations. For any x\in R^3 define

P_2x=(x_1,x_2,0) and P_1x=(x_1,0,0).

P_2 projects R^3 onto the plane L_2=\{(x_1,x_2,0):x_1,x_2\in R\} (which is two-dimensional) and P_1 projects R^3 onto the straight line L_1=\{(x_1,0,0):x_1\in R\} (which is one-dimensional).

Property 1. Double application of a projector amounts to single application.

Proof. We do this just for one of the projectors. Applying the definition of P_2 three times we get

(1) P_2[P_2x]=P_2(x_1,x_2,0)=(x_1,x_2,0)=P_2x.

Property 2. A successive application of two projectors yields the projection onto a subspace of a smaller dimension.

Proof. If we apply first P_2 and then P_1, the result is

(2) P_1[P_2x]=P_1(x_1,x_2,0)=(x_1,0,0)=P_1x.

If we change the order of projectors, we have

(3) P_2[P_1x]=P_2(x_1,0,0)=(x_1,0,0)=P_1x.

Exercise 1. Show that both projectors are linear.

Exercise 2. Like any other linear operator in a Euclidean space, these projectors are given by some matrices. What are they?
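A possible answer to Exercise 2, sketched with plain-list linear algebra (treat it as a hint; the matrices below are my answer, not the post's):

```python
# A possible answer to Exercise 2 (my matrices), with plain-list
# linear algebra to verify Properties 1 and 2.

P2 = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]   # projects onto the (x1, x2) plane
P1 = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]   # projects onto the x1 axis

def matmul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Property 1: double application equals single application
assert matmul(P2, P2) == P2 and matmul(P1, P1) == P1
# Property 2: either order of composition projects onto the smaller subspace
assert matmul(P1, P2) == matmul(P2, P1) == P1
```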

The simple truth about conditional expectation

In the time series setup, we have a sequence of information sets ...\subset I_t\subset I_{t+1}\subset ... (it's natural to assume that with time the amount of available information increases). Denote

E_tX=E(X|I_t)

the expectation of X conditional on I_t. For each t,

E_t is a projector onto the space of random functions that depend only on the information set I_t.

Property 1. Double application of conditional expectation gives the same result as single application:

(4) E_t(E_tX)=E_tX

(E_tX is already a function of I_t, so conditioning it on I_t doesn't change it).

Property 2. A successive conditioning on two different information sets is the same as conditioning on the smaller one:

(5) E_tE_{t+1}X=E_tX,

(6) E_{t+1}E_tX=E_tX.

Property 3. Conditional expectation is a linear operator: for any variables X,Y and numbers a,b

E_t(aX+bY)=aE_tX+bE_tY.
It's easy to see that (4)-(6) are similar to (1)-(3), respectively, but I prefer to use different names for (4)-(6). I call (4) a projector property. (5) is known as the Law of Iterated Expectations, see my post on the informational aspect for more intuition. (6) holds simply because at time t+1 the expectation E_tX is known and behaves like a constant.

Summary. (4)-(6) are easy to remember as one property: the smaller information set wins, E_sE_tX=E_{\min\{s,t\}}X.

Oct 18

Law of iterated expectations: informational aspect

The notion of Brownian motion will help us. Suppose we observe a particle that moves back and forth randomly along a straight line. The particle starts at zero at time zero. The movement can be visualized by plotting time on the horizontal axis and the position of the particle on the vertical axis. W(t) denotes the random position of the particle at time t.

Figure 1. Unconditional expectation

In Figure 1, various paths starting at the origin are shown in different colors. The intersections of the paths with vertical lines at times 0.5, 1 and 1.5 show the positions of the particle at these times. The deviations of those positions from y=0 to the upside and downside are assumed to be equally likely (more precisely, they are normal variables with mean zero and variance t).

Unconditional expectation

“In the beginning there was nothing, which exploded.” ― Terry Pratchett, Lords and Ladies

If we are at the origin (like the Big Bang), nothing has happened yet and EW(t)=0 is the best prediction for any moment t>0 we can make (shown by the blue horizontal line in Figure 1). The usual, unconditional expectation EX corresponds to the empty information set.

Conditional expectation

Figure 2. Conditional expectation

In Figure 2, suppose we are at t=2. The dark blue path between t=0 and t=2 has been realized. We know that the particle has reached the point W(2) at that time. With this knowledge, we see that the paths starting at this point will have the average

(1) E(W(t)|W(2))=W(2), t>2.

This is because the particle will continue moving randomly, with the up and down moves being equally likely. Prediction (1) is shown by the horizontal light blue line between t=2 and t=4. In general, this prediction is better than EW(t)=0.

Note that for different realized paths, W(2) takes different values. Therefore E(W(t)|W(2)), for fixed t>2, is a random variable: it is a function of W(2), the value we condition the expectation on.

Law of iterated expectations

Law of iterated expectations

Figure 3. Law of iterated expectations

Suppose you are at time t=2 (see Figure 3). You send many agents to the future t=3 to fetch the information about what will happen. They bring you the data on the means E(W(t)|W(3)) they see (shown by horizontal lines between t=3 and t=4). Since there are many possible future realizations, you have to average the future means. For this, you will use the distributional belief you have at time t=2. The result is E[E(W(t)|W(3))|W(2)]. Since the up and down moves are equally likely, your distribution at time t=2 is symmetric around W(2). Therefore the above average will be equal to E(W(t)|W(2)). This is the Law of Iterated Expectations, also called the tower property:

(2) E[E(W(t)|W(3))|W(2)]=E(W(t)|W(2)).

The knowledge of all of the future predictions E(W(t)|W(3)), upon averaging, does not improve or change our current prediction E(W(t)|W(2)).
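The tower property can be checked exactly on a discrete cousin of Brownian motion, a symmetric ±1 random walk (an illustrative sketch, not part of the original argument):

```python
# Tower property checked exactly on a discrete cousin of Brownian
# motion: a symmetric +/-1 random walk over 4 steps (illustrative).
from itertools import product

paths = list(product((-1, 1), repeat=4))      # all 16 equally likely paths

def W(path, t):
    """Position of the walk after t steps."""
    return sum(path[:t])

w2 = 0                                        # the observed value of W(2)
sub = [p for p in paths if W(p, 2) == w2]     # paths consistent with it

def E_W4_given_W3(w3):
    """E(W(4) | W(3) = w3), by enumeration."""
    cond = [p for p in paths if W(p, 3) == w3]
    return sum(W(p, 4) for p in cond) / len(cond)

# Average the future means over the distribution of W(3) given W(2) = 0 ...
tower = sum(E_W4_given_W3(W(p, 3)) for p in sub) / len(sub)
# ... and compare with the direct prediction E(W(4) | W(2) = 0)
direct = sum(W(p, 4) for p in sub) / len(sub)
```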

For a full mathematical treatment of conditional expectation see Lecture 10 by Gordan Zitkovic.

Sep 18

Portfolio analysis: return on portfolio

Exercise 1. Suppose a portfolio contains n_1 shares of stock 1 whose price is S_1 and n_2 shares of stock 2 whose price is S_2. Stock prices fluctuate and are random variables. Numbers of shares are assumed fixed and are deterministic. What is the expected value of the portfolio?

Solution. The portfolio value is its market price V=n_1S_1+n_2S_2. Since this is a linear combination, the expected value is EV=n_1ES_1+n_2ES_2.

In fact, portfolio analysis is a little different from what Exercise 1 suggests. To explain the difference, we start by fixing two points of view.

View 1. I hold a portfolio of stocks. I may have inherited it, and it does not matter how much it cost at the moment it was formed. If I want to sell it, I am interested in knowing its market value. In this situation the numbers of shares in my portfolio, which are constant, and the market prices of stocks, which are random, determine the market value of the portfolio, defined in Exercise 1. The value of the portfolio is a linear combination of stock prices.

View 2. I have a certain amount of money M^0 to invest. Being a gambler, I am not interested in holding a portfolio forever. I am thinking about buying a portfolio of stocks now and selling it, say, in a year at price M^1. In this case I am interested in the rate of return defined by r=\frac{M^1-M^0}{M^0}. M^0 is considered deterministic (current prices are certain) and M^1 is random (future prices are unpredictable). Thus the rate of return is random.

We pursue the second view (prevalent in finance). As often happens in economics and finance, the result depends on how one frames the problem. Suppose the initial amount M^0 is invested in n assets. Denoting M_i^0 the amount invested in asset i, we have M^0=\sum\limits_{i = 1}^nM_i^0. Denoting s_i=M_i^0/{M^0} the share (percentage) of M_i^0 in the total investment M^0, we have

(1) M_i^0=s_iM^0,\ M^0=\sum\limits_{i = 1}^ns_iM^0.

The initial shares s_i are deterministic.

Let M_i^1 be what becomes of M_i^0 in one year and let M^1=\sum\limits_{i = 1}^nM_i^1 be the total value of the investment at the end of the year. Since different assets grow at different rates, generally it is not true that M_i^1 =s_iM^1. Denote r_i=\frac{M_i^1-M_i^0}{M_i^0} the rate of return on asset i. Then

(2) M_i^1=(1+r_i)M_i^0, M^1=\sum\limits_{i = 1}^n(1+r_i)M_i^0.

Exercise 2. Show that the rate of return on the portfolio is a linear combination of the rates of return on the separate assets, with the initial investment shares as coefficients.

Solution. Using Equations (1) and (2) we get

(3) r=\frac{M^1-M^0}{M^0}=\frac{\sum(1+r_i)M_i^0-\sum M_i^0}{M^0}=\frac{\sum r_iM_i^0}{M^0}=\frac{\sum r_is_iM^0}{M^0}=\sum s_ir_i .

Once you know this equation you can find the mean and variance of the rate of return on the portfolio in terms of investment shares and rates of return on assets.
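Equation (3) is easy to verify numerically; here is a minimal sketch with made-up dollar amounts:

```python
# Numeric check of equation (3): r = sum_i s_i r_i (made-up numbers).

M0 = [4_000.0, 6_000.0]       # initial dollar amounts in two assets
r_assets = [0.10, -0.05]      # one-year rates of return on the assets

shares = [m / sum(M0) for m in M0]                 # s_i = M_i^0 / M^0
M1 = [(1 + r) * m for r, m in zip(r_assets, M0)]   # end-of-year values

r_portfolio = (sum(M1) - sum(M0)) / sum(M0)        # definition of r
r_combo = sum(s * r for s, r in zip(shares, r_assets))  # equation (3)
```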