25 Nov 18

## Eigenvalues and eigenvectors of a projector

Exercise 1. Find eigenvalues and eigenvectors of a projector.

Solution. We know that a projector does not change elements of its image: $Px=x$ for all $x\in\text{Img}(P).$ This means that $\lambda =1$ is an eigenvalue of $P.$ Moreover, if $\{x_i:i=1,...,\dim\text{Img}(P)\}$ is any orthonormal basis in $\text{Img}(P),$ each $x_i$ is an eigenvector of $P$ corresponding to the eigenvalue $\lambda =1.$

Since $P$ maps to zero all elements of the null space $N(P),$ $\lambda =0$ is another eigenvalue. If $\{y_i:i=1,...,\dim N(P)\}$ is any orthonormal basis in $N(P),$ each $y_i$ is an eigenvector of $P$ corresponding to the eigenvalue $\lambda =0.$

A projector cannot have eigenvalues other than $0$ and $1.$ This is proved as follows. Suppose $Px=\lambda x$ with some nonzero $x.$ Applying $P$ to both sides of this equation, we get $Px=P^2x=\lambda Px=\lambda ^2x.$ It follows that $\lambda x=\lambda^2x$ and (because $x\neq 0$) $\lambda =\lambda^2.$ The last equation has only two roots: $0$ and $1.$

We have $\dim\text{Img}(P)+\dim N(P)=n$ because $R^n$ is an orthogonal sum of $N(P)$ and $\text{Img}(P)$.  Combining the systems $\{x_i\},$ $\{y_i\}$ we get an orthonormal basis in $R^{n}$ consisting of eigenvectors of $P$.
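This spectral picture is easy to check numerically. Below is a minimal sketch in numpy; the specific matrix $X$ and the formula $P=X(X^TX)^{-1}X^T$ (a standard construction, derived in a later section) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative projector: P projects R^5 onto the column space of a
# random 5x2 matrix X, via the standard formula P = X (X^T X)^{-1} X^T.
X = rng.standard_normal((5, 2))
P = X @ np.linalg.inv(X.T @ X) @ X.T

# P is symmetric, so eigvalsh applies; the eigenvalues must be 0 or 1,
# with multiplicities dim N(P) = 3 and dim Img(P) = 2.
eigenvalues = np.linalg.eigvalsh(P)   # returned in ascending order
```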

### Trace of a projector

Recall that for a square matrix, its trace is defined as the sum of its diagonal elements.

Exercise 2. Prove that $tr(AB)=tr(BA)$ if both products $AB$ and $BA$ are square. It is convenient to call this property trace-commuting (we know that in general matrices do not commute).

Proof. Assume that $A$ is of size $n\times m$ and $B$ is of size $m\times n.$ For both products we only need the diagonal elements:

$AB=\left(\begin{array}{ccc} a_{11}&...&a_{1m} \\...&...&... \\a_{n1}&...&a_{nm}\end{array} \right)\left(\begin{array}{ccc} b_{11}&...&b_{1n} \\...&...&... \\b_{m1}&...&b_{mn}\end{array} \right)=\left(\begin{array}{ccc} \sum_ia_{1i}b_{i1}&...&... \\...&...&... \\...&...&\sum_ia_{ni}b_{in}\end{array} \right)$

$BA=\left(\begin{array}{ccc} b_{11}&...&b_{1n} \\...&...&... \\b_{m1}&...&b_{mn}\end{array} \right)\left(\begin{array}{ccc} a_{11}&...&a_{1m} \\...&...&... \\a_{n1}&...&a_{nm}\end{array} \right)=\left(\begin{array}{ccc} \sum_ja_{j1}b_{1j}&...&... \\...&...&... \\...&...&\sum_ja_{jm}b_{mj} \end{array}\right)$

All we have to do is change the order of summation:

$tr(AB)=\sum_j\sum_ia_{ji}b_{ij}=\sum_i\sum_ja_{ji}b_{ij}=tr(BA).$
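A quick numerical illustration of trace-commuting with rectangular matrices (the sizes here are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# A is 3x5 and B is 5x3, so AB is 3x3 while BA is 5x5.
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 3))

# The traces agree even though AB and BA have different sizes.
t1, t2 = np.trace(A @ B), np.trace(B @ A)
```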

Exercise 3. Find the trace of a projector.

Solution. In Exercise 1 we established that the projector $P$ has $p=\dim\text{Img}(P)$ eigenvalues $\lambda =1$ and $n-p$ eigenvalues $\lambda =0.$ Since $P$ is symmetric, it has a diagonal representation $P=UDU^{-1}$ with $p$ ones and $n-p$ zeros on the diagonal of the diagonal matrix $D.$ By Exercise 2

$tr(P)=tr(UDU^{-1})=tr(DU^{-1}U)=tr(D)=p$.
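Numerically, with an assumed $6\times 3$ basis matrix (a sketch; any full-rank $X$ would do):

```python
import numpy as np

rng = np.random.default_rng(2)

# Projector onto the column space of a random 6x3 matrix:
# here n = 6 and p = dim Img(P) = 3.
X = rng.standard_normal((6, 3))
P = X @ np.linalg.inv(X.T @ X) @ X.T

# tr(P) counts the unit eigenvalues, so it recovers p exactly.
trace_P = np.trace(P)
```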

14 Nov 18

## Constructing a projector onto a given subspace

Let $L$ be a subspace of $R^n.$ Let $k=\dim L\ (\leq n)$ and fix some basis $x^{(1)},...,x^{(k)}$ in $L.$ Define the matrix $X=(x^{(1)},...,x^{(k)})$ of size $n\times k$ (the vectors are written as column vectors).

Exercise 1. a) With the above notation, the matrix $(X^TX)^{-1}$ exists. b) The matrix $P=X(X^TX)^{-1}X^T$ exists. c) $P$ is a projector.

Proof. a) If $X^TXz=0$ for some $z,$ then $0=z^TX^TXz=\|Xz\|^2,$ so $Xz=0,$ and by linear independence of the basis vectors $z=0.$ Thus $A=X^TX$ is nonsingular and its inverse $A^{-1}$ exists. We also know that $A$ and its inverse are symmetric:

(1) $A^T=A,$ $(A^{-1})^T=A^{-1}.$

b) To see that $P$ exists, just count the dimensions: $X$ is $n\times k,$ $(X^TX)^{-1}$ is $k\times k,$ and $X^T$ is $k\times n,$ so the product $P$ is defined and has size $n\times n.$

c) Let's prove that $P$ is a projector. (1) allows us to make the proof compact. $P$ is idempotent:

$P^2=(XA^{-1}X^T)(XA^{-1}X^T)=XA^{-1}(X^TX)A^{-1}X^T$

$=X(A^{-1}A)A^{-1}X^T=XA^{-1}X^T=P.$

$P$ is symmetric:

$P^T=[XA^{-1}X^T]^T=(X^T)^T(A^{-1})^TX^T=XA^{-1}X^T=P.$
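Both defining properties can be verified numerically (a sketch with a random, hence almost surely linearly independent, basis as an assumption):

```python
import numpy as np

rng = np.random.default_rng(3)

# Columns of X form a basis of a 2-dimensional subspace L of R^4.
X = rng.standard_normal((4, 2))
A = X.T @ X                       # k x k and invertible
P = X @ np.linalg.inv(A) @ X.T    # n x n

idempotent = np.allclose(P @ P, P)   # P^2 = P
symmetric = np.allclose(P.T, P)      # P^T = P
```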

Exercise 2. $P$ projects onto $L$: $\text{Img}(P)=L.$

Proof. First we show that $\text{Img}(P)\subseteq L.$ Put

(2) $y=A^{-1}X^Tx,$

for any $x\in R^n.$ Then

(3) $Px=XA^{-1}X^Tx=Xy=\sum_j x^{(j)}y_j\in L.$

This shows that $\text{Img}(P)\subseteq L.$

Let's prove the opposite inclusion. Any element of $L$ is of form

(4) $x=Xz$

with some $z.$ Plugging (4) in (3) we have $Px=XA^{-1}X^TXz=Xz$ which shows that $\text{Img}(P)\supseteq L.$
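Both inclusions can be checked numerically; the matrix $X$ below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(4)

X = rng.standard_normal((5, 2))
P = X @ np.linalg.inv(X.T @ X) @ X.T

# L ⊆ Img(P): P leaves every column of X unchanged, which is the
# computation Px = XA^{-1}X^T Xz = Xz with x = Xz from (4).
basis_fixed = np.allclose(P @ X, X)

# Img(P) ⊆ L: Px equals Xy with y = A^{-1}X^T x from (2), a linear
# combination of the columns of X.
x = rng.standard_normal(5)
y = np.linalg.solve(X.T @ X, X.T @ x)
in_L = np.allclose(P @ x, X @ y)
```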

Exercise 3. Let $L_1$ be a subspace of $R^n$ and let $L_2=(L_1)^\perp$. Then $R^n=L_1\oplus L_2$.

Proof. Since any element of $L_1$ is orthogonal to any element of $L_2$, we only have to show that any $x\in R^n$ can be represented as $x=l_1+l_2$ with $l_j\in L_j$. Let $P$ be the projector from Exercise 1, where $L=L_1$. Put $l_1=Px$, $l_2=(I-P)x$. $l_1\in L_1=\text{Img}(P)$ is obvious. For any $y\in R^n$, $(Py)\cdot l_2=y\cdot (P(I-P)x)=0$ because $P(I-P)=P-P^2=0$, so $l_2$ is orthogonal to $L_1$. By definition of $L_2$, we have $l_2\in L_2$.
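The decomposition is easy to see in numbers (a sketch; $L_1$ is taken to be the column space of an assumed random matrix):

```python
import numpy as np

rng = np.random.default_rng(5)

# L1 is the column space of X; P projects onto L1.
X = rng.standard_normal((5, 2))
P = X @ np.linalg.inv(X.T @ X) @ X.T

x = rng.standard_normal(5)
l1, l2 = P @ x, (np.eye(5) - P) @ x

decomposes = np.allclose(l1 + l2, x)   # x = l1 + l2
orthogonal = np.isclose(l1 @ l2, 0)    # l1 is orthogonal to l2
```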

9 Nov 18

## Geometry and algebra of projectors

Projectors are geometrically so simple that they should have been discussed at the beginning of this course. I am giving them only now because their applications are more advanced.

### Motivating example

Let $L$ be the $x$-axis and $L^\perp$ the $y$-axis on the plane. Let $P$ be the projector onto $L$ along $L^\perp$ and let $Q$ be the projector onto $L^\perp$ along $L.$ This geometry translates into the following definitions:

$L=\{(x,0):x\in R\},$ $L^\perp=\{(0,y):y\in R\},$ $P(x,y)=(x,0),$ $Q(x,y)=(0,y).$

The theory is modeled on the following observations.

a) $P$ leaves the elements of $L$ unchanged and sends to zero all elements of $L^\perp.$

b) $L$ is the image of $P$ and $L^\perp$ is the null space of $P.$

c) Any element of the image of $P$ is orthogonal to any element of the image of $Q.$

d) Any $x$ can be represented as $x=(x_1,0)+(0,x_2)=Px+Qx.$ It follows that $I=P+Q.$
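Observations a)-d) can be verified with the matrices of these two projectors (the test point is an arbitrary assumption):

```python
import numpy as np

# Matrices of the two projectors on the plane.
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])   # projects onto the x-axis L
Q = np.array([[0.0, 0.0],
              [0.0, 1.0]])   # projects onto the y-axis L-perp

x = np.array([3.0, 4.0])
assert np.allclose(P @ x, [3.0, 0.0])   # P(x, y) = (x, 0)
assert np.allclose(Q @ x, [0.0, 4.0])   # Q(x, y) = (0, y)
assert np.allclose(P + Q, np.eye(2))    # I = P + Q
```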

For more simple examples, see my post on conditional expectations.

### Formal approach

Definition 1. A square matrix $P$ is called a projector if it satisfies two conditions: 1) $P^2=P$ ($P$ is idempotent; for some reason, students remember this term better than others) and 2) $P^T=P$ ($P$ is symmetric).

Exercise 1. Denote $L_P=\{x:Px=x\}$ the set of points $x$ that are left unchanged by $P.$ Then $L_P$ is the image of $P$ (and therefore a subspace).

Proof. Indeed, the image of $P$ consists of points $y=Px.$ For any such $y,$ we have $Py=P^2x=Px=y,$ so $y$ belongs to $L_P.$ Conversely, any $x\in L_P$ satisfies $x=Px$ and therefore belongs to the image of $P.$

Exercise 2. a) The null space and image of $P$ are orthogonal. b) We have an orthogonal decomposition $R^n=N(P)\oplus \text{Img}(P).$

Proof. a) If $x\in \text{Img}(P)$ and $y\in N(P),$ then $Py=0$ and by Exercise 1 $Px=x.$ Therefore $x\cdot y=(Px)\cdot y=x\cdot (Py)=0.$ This shows that $\text{Img}(P)\perp N(P).$

b) For any $x$ write $x=Px+(I-P)x.$ Here $Px\in \text{Img}(P)$ and $(I-P)x\in N(P)$ because $P(I-P)x=(P-P^2)x=0.$

Exercise 3. a) Along with $P,$ the matrix $Q=I-P$ is also a projector. b) $\text{Img}(Q)=N(P)$ and $N(Q)=\text{Img}(P).$

Proof. a) $Q$ is idempotent: $Q^2=(I-P)^2=I-2P+P^2=I-P=Q.$ $Q$ is symmetric: $Q^T=I^T-P^T=I-P=Q.$

b) By Exercise 1

$\text{Img}(Q)=\{x:Qx=x\}=\{x:(I-P)x=x\}=\{x:Px=0\}=N(P).$

Since $P=I-Q,$ this equation implies $N(Q)=\text{Img}(P).$

It follows that, as with $P,$ the set $L_Q=\{x:Qx=x\}$ is the image of $Q$ and it consists of points that are not changed by $Q.$
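Exercise 3 can be confirmed numerically; the projector below (built from an assumed random basis matrix by the construction of the previous section) is illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)

X = rng.standard_normal((4, 2))
P = X @ np.linalg.inv(X.T @ X) @ X.T
Q = np.eye(4) - P

# a) Q is again a projector: idempotent and symmetric.
q_is_projector = np.allclose(Q @ Q, Q) and np.allclose(Q.T, Q)

# b) Img(Q) = N(P): P annihilates every vector of the form Qx.
x = rng.standard_normal(4)
annihilated = np.allclose(P @ (Q @ x), 0)
```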

18 Oct 18

## Law of iterated expectations: geometric aspect

There will be a separate post on projectors. In the meantime, we'll have a look at simple examples that explain a lot about conditional expectations.

### Examples of projectors

The name "projector" is almost self-explanatory. Imagine a point and a plane in three-dimensional space. Draw a perpendicular from the point to the plane. The intersection of the perpendicular with the plane is the point's projection onto that plane. Note that if the point already belongs to the plane, its projection equals the point itself. Besides, instead of projecting onto a plane we can project onto a straight line.

The above description translates into the following equations. For any $x\in R^3$ define

$P_2x=(x_1,x_2,0)$ and $P_1x=(x_1,0,0).$

$P_2$ projects $R^3$ onto the plane $L_2=\{(x_1,x_2,0):x_1,x_2\in R\}$ (which is two-dimensional) and $P_1$ projects $R^3$ onto the straight line $L_1=\{(x_1,0,0):x_1\in R\}$ (which is one-dimensional).

Property 1. Double application of a projector amounts to single application.

Proof. We do this just for one of the projectors. Using the definitions three times we get

(1) $P_2[P_2x]=P_2(x_1,x_2,0)=(x_1,x_2,0)=P_2x.$

Property 2. A successive application of two projectors yields the projection onto a subspace of a smaller dimension.

Proof. If we apply first $P_2$ and then $P_1$, the result is

(2) $P_1[P_2x]=P_1(x_1,x_2,0)=(x_1,0,0)=P_1x.$

If we change the order of projectors, we have

(3) $P_2[P_1x]=P_2(x_1,0,0)=(x_1,0,0)=P_1x.$

Exercise 1. Show that both projectors are linear.

Exercise 2. Like any other linear operator in a Euclidean space, these projectors are given by some matrices. What are they?
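One way to approach Exercise 2 and check Properties 1 and 2 at the same time (a sketch; the test point is an arbitrary assumption): since both projectors act coordinate-wise, their matrices are diagonal.

```python
import numpy as np

# Matrices of the coordinate-wise projectors.
P2 = np.diag([1.0, 1.0, 0.0])   # onto the plane L2
P1 = np.diag([1.0, 0.0, 0.0])   # onto the line L1

x = np.array([2.0, 3.0, 5.0])
assert np.allclose(P2 @ (P2 @ x), P2 @ x)   # double application
assert np.allclose(P1 @ (P2 @ x), P1 @ x)   # successive application
assert np.allclose(P2 @ (P1 @ x), P1 @ x)   # order does not matter here
```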

### The simple truth about conditional expectation

In the time series setup, we have a sequence of information sets $... \subset I_t \subset I_{t+1} \subset...$ (it's natural to assume that with time the amount of available information increases). Denote

$E_tX=E(X|I_t)$

the expectation of $X$ conditional on $I_t$. For each $t$, $E_t$ is a projector onto the space of random functions that depend only on the information set $I_t$.

Property 1. Double application of conditional expectation gives the same result as single application:

(4) $E_t(E_tX)=E_tX$

($E_tX$ is already a function of $I_t$, so conditioning it on $I_t$ doesn't change it).

Property 2. A successive conditioning on two different information sets is the same as conditioning on the smaller one:

(5) $E_tE_{t+1}X=E_tX,$

(6) $E_{t+1}E_tX=E_tX.$

Property 3. Conditional expectation is a linear operator: for any variables $X,Y$ and numbers $a,b$

$E_t(aX+bY)=aE_tX+bE_tY.$

It's easy to see that (4)-(6) are similar to (1)-(3), respectively, but I prefer to use different names for (4)-(6). I call (4) a projector property. (5) is known as the Law of Iterated Expectations, see my post on the informational aspect for more intuition. (6) holds simply because at time $t+1$ the expectation $E_tX$ is known and behaves like a constant.

Summary. (4)-(6) are easy to remember as one property. The smaller information set wins: $E_sE_tX=E_{\min\{s,t\}}X.$
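The summary can be illustrated in a finite-sample model where an information set is a partition of the sample points and conditioning means averaging within groups; the particular nested partitions below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

# Nested partitions of 8 sample points: the fine partition
# (more information) refines the coarse one (less information).
coarse = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # plays the role of I_t
fine   = np.array([0, 0, 1, 1, 2, 2, 3, 3])   # plays the role of I_{t+1}

def cond_mean(values, groups):
    """Conditional expectation given a partition: the within-group mean."""
    out = np.empty_like(values)
    for g in np.unique(groups):
        out[groups == g] = values[groups == g].mean()
    return out

X = rng.standard_normal(8)

# The smaller information set wins: E_s E_t X = E_min(s,t) X.
lie = np.allclose(cond_mean(cond_mean(X, fine), coarse), cond_mean(X, coarse))
known = np.allclose(cond_mean(cond_mean(X, coarse), fine), cond_mean(X, coarse))
proj = np.allclose(cond_mean(cond_mean(X, fine), fine), cond_mean(X, fine))
```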

4 Oct 17

## Conditional-mean-plus-remainder representation

Conditional-mean-plus-remainder representation: we separate the main part from the remainder and find out the properties of the remainder. My post on properties of conditional expectation is an elementary introduction to conditioning. This is my first post in Quantitative Finance.

### A brush-up on conditional expectations

1. Notation. Let $X$ be a random variable and let $I$ be an information set. Instead of the usual notation $E(X|I)$ for conditional expectation, in large expressions it's better to use the notation with $I$ in the subscript: $E_IX=E(X|I).$

2. Generalized homogeneity. If $f(I)$ depends only on information $I,$ then $E_I(f(I)X)=f(I)E_I(X)$ (a function of known information is known and behaves like a constant). A special case is $E_I(f(I))=f(I)E_I(1)=f(I).$ With $f(I)=E_I(X)$ we get $E_I(E_I(X))=E_I(X).$ This shows that conditioning is a projector: if you project a point in 3D space onto a 2D plane and then project the image onto the same plane, the result is the same as after a single projection.

3. Additivity. $E_I(X+Y)=E_IX+E_IY.$

4. Law of iterated expectations (LIE). If we know about two information sets that $I_1\subset I_2,$ then $E_{I_1}E_{I_2}X=E_{I_1}X.$ I like the geometric explanation in terms of projectors. Projecting a point onto a plane and then projecting the result onto a straight line is the same as projecting the point directly onto the straight line.

### Conditional-mean-plus-remainder representation

This is a direct generalization of the mean-plus-deviation-from-mean decomposition. There we wrote $X=EX+(X-EX)$ and denoted $\mu=EX,~\varepsilon=X-EX$ to obtain $X=\mu+\varepsilon$ with the property $E\varepsilon=0.$

Here we write $X=E_IX+(X-E_IX)$ and denote $\varepsilon=X-E_IX$ the remainder. Then the representation is

(1) $X=E_IX+\varepsilon.$

Properties. 1) $E_I\varepsilon=E_IX-E_IX=0$ (remember, this is a random variable identically equal to zero, not the number zero).

2) Conditional covariance is obtained from the usual covariance by replacing all usual expectations by conditional. Thus, by definition,

$Cov_I(X,Y)=E_I(X-E_IX)(Y-E_IY).$

For the components in (1) we have

$Cov_I(E_IX,\varepsilon)=E_I(E_IX-E_IE_IX)(\varepsilon-E_I\varepsilon)=E_I(E_IX-E_IX)\varepsilon=0.$

3) $Var_I(\varepsilon)=E_I(\varepsilon-E_I\varepsilon)^{2}=E_I(X-E_IX)^2=Var_I(X).$
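Properties 1)-3) can be checked in the same finite-sample model of conditioning as group averaging (the partition below is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(8)

# Information set I modeled as a partition into two groups of three points.
groups = np.array([0, 0, 0, 1, 1, 1])

def cond_mean(values, g):
    """Conditional expectation given a partition: the within-group mean."""
    out = np.empty_like(values)
    for k in np.unique(g):
        out[g == k] = values[g == k].mean()
    return out

X = rng.standard_normal(6)
mean_part = cond_mean(X, groups)
eps = X - mean_part                     # the remainder in (1)

# Property 1: E_I(eps) = 0 (as a random variable).
prop1 = np.allclose(cond_mean(eps, groups), 0)

# Property 2: Cov_I(E_I X, eps) = E_I[(E_I X - E_I E_I X)(eps - E_I eps)] = 0.
cov = cond_mean((mean_part - cond_mean(mean_part, groups)) * eps, groups)
prop2 = np.allclose(cov, 0)

# Property 3: Var_I(eps) = Var_I(X).
prop3 = np.allclose(cond_mean(eps**2, groups),
                    cond_mean((X - mean_part)**2, groups))
```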