25 Nov 18

Eigenvalues and eigenvectors of a projector

Exercise 1. Find the eigenvalues and eigenvectors of a projector.

Solution. We know that a projector doesn't change elements of its image: Px=x for all x\in\text{Img}(P). This means that \lambda =1 is an eigenvalue of P. Moreover, if \{x_i:i=1,...,\dim\text{Img}(P)\} is any orthonormal system in \text{Img}(P), each x_i is an eigenvector of P corresponding to the eigenvalue \lambda =1.

Since P maps to zero all elements of the null space N(P), \lambda =0 is another eigenvalue. If \{y_i:i=1,...,\dim N(P)\} is any orthonormal system in N(P), each y_i is an eigenvector of P corresponding to the eigenvalue \lambda =0.

A projector cannot have eigenvalues other than 0 and 1. This is proved as follows. Suppose Px=\lambda x with some nonzero x. Applying P to both sides of this equation, we get Px=P^2x=\lambda Px=\lambda ^2x. It follows that \lambda x=\lambda^2x and (because x\neq 0) \lambda =\lambda^2. The last equation has only two roots: 0 and 1.

We have \dim\text{Img}(P)+\dim N(P)=n because R^n is an orthogonal sum of N(P) and \text{Img}(P).  Combining the systems \{x_i\}, \{y_i\} we get an orthonormal basis in R^{n} consisting of eigenvectors of P.
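
As a quick numerical illustration, here is a minimal numpy sketch. The rank-one matrix vv^T/(v\cdot v), a standard projector onto the line spanned by v, serves as a ready-made example; it is not derived in this post.

    import numpy as np

    # projector onto the line spanned by v in R^3: P = v v^T / (v.v)
    v = np.array([1.0, 2.0, 2.0])
    P = np.outer(v, v) / v.dot(v)

    eigvals, eigvecs = np.linalg.eigh(P)   # eigh: eigenvalues of a symmetric matrix
    print(np.round(eigvals, 10))           # approximately [0, 0, 1]: only 0 and 1 occur
    print(np.allclose(P @ v, v))           # True: v is an eigenvector with lambda = 1

The eigenvalue 1 appears with multiplicity \dim\text{Img}(P)=1 and the eigenvalue 0 with multiplicity n-\dim\text{Img}(P)=2, in line with the solution above.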

Trace of a projector

Recall that for a square matrix, its trace is defined as the sum of its diagonal elements.

Exercise 2. Prove that tr(AB)=tr(BA) whenever both products AB and BA exist (they are then automatically square). It is convenient to call this property trace-commuting (we know that in general matrices do not commute).

Proof. Assume that A is of size n\times m and B is of size m\times n. For both products we need only to find the diagonal elements:

AB=\left(\begin{array}{ccc}  a_{11}&...&a_{1m} \\...&...&... \\a_{n1}&...&a_{nm}\end{array}  \right)\left(\begin{array}{ccc}  b_{11}&...&b_{1n} \\...&...&... \\b_{m1}&...&b_{mn}\end{array}  \right)=\left(\begin{array}{ccc}  \sum_ia_{1i}b_{i1}&...&... \\...&...&... \\...&...&\sum_ia_{ni}b_{in}\end{array}  \right)

BA=\left(\begin{array}{ccc}  b_{11}&...&b_{1n} \\...&...&... \\b_{m1}&...&b_{mn}\end{array}  \right)\left(\begin{array}{ccc}  a_{11}&...&a_{1m} \\...&...&... \\a_{n1}&...&a_{nm}\end{array}  \right)=\left(\begin{array}{ccc}  \sum_ja_{j1}b_{1j}&...&... \\...&...&... \\...&...&\sum_ja_{jm}b_{mj}  \end{array}\right)

All we have to do is change the order of summation:

tr(AB)=\sum_j\sum_ia_{ji}b_{ij}=\sum_i\sum_ja_{ji}b_{ij}=tr(BA).
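
A quick numerical check of trace-commuting, sketched with random rectangular matrices (the sizes 4\times 7 and 7\times 4 are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(4, 7))   # n x m
    B = rng.normal(size=(7, 4))   # m x n

    # AB is 4 x 4 and BA is 7 x 7, yet the traces coincide
    print(np.isclose(np.trace(A @ B), np.trace(B @ A)))   # True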

Exercise 3. Find the trace of a projector.

Solution. In Exercise 1 we established that the projector P has p=\dim\text{Img}(P) eigenvalues \lambda =1 and n-p eigenvalues \lambda =0. P is symmetric, so it has a diagonal representation P=UDU^{-1} with an orthogonal U, and the diagonal matrix D has p ones and n-p zeros on its diagonal. By Exercise 2

tr(P)=tr(UDU^{-1})=tr(DU^{-1}U)=tr(D)=p.
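
Numerically, one can build a projector from a random matrix X by the formula P=X(X^TX)^{-1}X^T (constructed in the next section) and compare its trace with the dimension of its image; a minimal sketch:

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 6, 3
    X = rng.normal(size=(n, p))              # n x p, full column rank with probability one
    P = X @ np.linalg.inv(X.T @ X) @ X.T     # projector onto the column space of X

    print(np.isclose(np.trace(P), p))        # True: tr(P) = dim Img(P) = p
    print(np.linalg.matrix_rank(P) == p)     # True: the image is p-dimensional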

14 Nov 18

Constructing a projector onto a given subspace

Let L be a subspace of R^n. Let k=\dim L\ (\leq n) and fix some basis x^{(1)},...,x^{(k)} in L. Define the matrix X=(x^{(1)},...,x^{(k)}) of size n\times k (the vectors are written as column vectors).

Exercise 1. a) With the above notation, the matrix (X^TX)^{-1} exists. b) The matrix P=X(X^TX)^{-1}X^T exists. c) P is a projector.

Proof. a) A=X^TX is invertible: if X^TXz=0 for some z, then \|Xz\|^2=z^TX^TXz=0, so Xz=0 and, by linear independence of the basis vectors (the columns of X), z=0. Thus the determinant of A is not zero and the inverse A^{-1} exists. We also know that A and its inverse are symmetric:

(1) A^T=A, (A^{-1})^T=A^{-1}.

b) To see that P exists, just check that the dimensions match: X is n\times k, (X^TX)^{-1} is k\times k and X^T is k\times n, so the product P is defined and has size n\times n.

c) Let's prove that P is a projector. (1) allows us to make the proof compact. P is idempotent:

P^2=(XA^{-1}X^T)(XA^{-1}X^T)=XA^{-1}(X^TX)A^{-1}X^T

=X(A^{-1}A)A^{-1}X^T=XA^{-1}X^T=P.

P is symmetric:

P^T=[XA^{-1}X^T]^T=(X^T)^T(A^{-1})^TX^T=XA^{-1}X^T=P.
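
A minimal numerical sketch of Exercise 1: the basis of L is taken to be the columns of a random n\times k matrix, which are linearly independent with probability one.

    import numpy as np

    rng = np.random.default_rng(2)
    n, k = 5, 2
    X = rng.normal(size=(n, k))              # columns x^(1), ..., x^(k) form a basis of L
    A = X.T @ X                              # k x k, invertible
    P = X @ np.linalg.inv(A) @ X.T           # P = X(X^T X)^{-1} X^T

    print(np.allclose(P @ P, P))             # True: P is idempotent
    print(np.allclose(P.T, P))               # True: P is symmetric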

Exercise 2. P projects onto L: \text{Img}(P)=L.

Proof. First we show that \text{Img}(P)\subseteq L. Put

(2) y=A^{-1}X^Tx,

for any x\in R^n. Then

(3) Px=XA^{-1}X^Tx=Xy=\sum_jx^{(j)}y_j\in L.

This shows that \text{Img}(P)\subseteq L.

Let's prove the opposite inclusion. Any element of L is of form

(4) x=Xz

with some z. Plugging (4) in (3) we have Px=XA^{-1}X^TXz=Xz which shows that \text{Img}(P)\supseteq L.

Exercise 3. Let L_1 be a subspace of R^n and let L_2=(L_1)^\perp. Then R^n=L_1\oplus L_2.

Proof. Since any element of L_1 is orthogonal to any element of L_2, we only have to show that any x\in R^n can be represented as x=l_1+l_2 with l_j\in L_j. Let P be the projector from Exercise 1, where L=L_1. Put l_1=Px, l_2=(I-P)x. l_1\in L_1=\text{Img}(P) is obvious. For any y\in R^n, by symmetry of P we have (Py)\cdot l_2=y\cdot (Pl_2)=y\cdot (P(I-P)x)=0 because P(I-P)=P-P^2=0, so l_2 is orthogonal to L_1. By definition of L_2, we have l_2\in L_2.
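
Exercises 2 and 3 can also be checked numerically; a sketch with the same random construction (repeated here so the snippet is self-contained):

    import numpy as np

    rng = np.random.default_rng(3)
    n, k = 5, 2
    X = rng.normal(size=(n, k))              # columns form a basis of L = L_1
    P = X @ np.linalg.inv(X.T @ X) @ X.T

    # Exercise 2: Img(P) = L
    print(np.allclose(P @ X, X))             # True: P leaves the basis vectors unchanged
    print(np.linalg.matrix_rank(P) == k)     # True: dim Img(P) = dim L

    # Exercise 3: orthogonal decomposition of an arbitrary x
    x = rng.normal(size=n)
    l1, l2 = P @ x, x - P @ x                # l1 = Px, l2 = (I - P)x
    print(np.allclose(l1 + l2, x))           # True: x = l1 + l2
    print(np.isclose(l1 @ l2, 0.0))          # True: l1 is orthogonal to l2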

9 Nov 18

Geometry and algebra of projectors

Projectors are geometrically so simple that they should have been discussed somewhere at the beginning of this course. I am covering them only now because their applications are more advanced.

Motivating example

Let L be the x-axis and L^\perp the y-axis on the plane. Let P be the projector onto L along L^\perp and let Q be the projector onto L^\perp along L. This geometry translates into the following definitions:

L=\{(x,0):x\in R\}, L^\perp=\{(0,y):y\in R\}, P(x,y)=(x,0), Q(x,y)=(0,y).

The theory is modeled on the following observations.

a) P leaves the elements of L unchanged and sends to zero all elements of L^\perp.

b) L is the image of P and L^\perp is the null space of P.

c) Any element of the image of P is orthogonal to any element of the image of Q.

d) Any x can be represented as x=(x_1,0)+(0,x_2)=Px+Qx. It follows that I=P+Q.
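
In code the motivating example looks as follows (a small numpy sketch; the matrices of P and Q are read off directly from the definitions above):

    import numpy as np

    P = np.array([[1.0, 0.0], [0.0, 0.0]])     # projector onto the x-axis L
    Q = np.array([[0.0, 0.0], [0.0, 1.0]])     # projector onto the y-axis L-perp

    x = np.array([3.0, -2.0])
    print(P @ x, Q @ x)                        # [3. 0.] [ 0. -2.]
    print(np.isclose((P @ x) @ (Q @ x), 0.0))  # True, observation c)
    print(np.allclose(P + Q, np.eye(2)))       # True, observation d): I = P + Q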

For simpler examples, see my post on conditional expectations.

Formal approach

Definition 1. A square matrix P is called a projector if it satisfies two conditions: 1) P^2=P (P is idempotent; for some reason, students remember this term better than others) and 2) P^T=P (P is symmetric).

Exercise 1. Denote L_P=\{x:Px=x\} the set of points x that are left unchanged by P. Then L_P is the image of P (and therefore a subspace).

Proof. Indeed, the image of P consists of points y=Px. For any such y we have Py=P^2x=Px=y, so y belongs to L_P. Conversely, if x\in L_P, then x=Px, so x belongs to the image of P.

Exercise 2. a) The null space and image of P are orthogonal. b) We have an orthogonal decomposition R^n=N(P)\oplus \text{Img}(P).

Proof. a) If x\in \text{Img}(P) and y\in N(P), then Py=0 and by Exercise 1 Px=x. Therefore x\cdot y=(Px)\cdot y=x\cdot (Py)=0. This shows that \text{Img}(P)\perp N(P).

b) For any x write x=Px+(I-P)x. Here Px\in \text{Img}(P) and (I-P)x\in N(P) because P(I-P)x=(P-P^2)x=0.

Exercise 3. a) Along with P, the matrix Q=I-P is also a projector. b) \text{Img}(Q)=N(P) and N(Q)=\text{Img}(P).

Proof. a) Q is idempotent: Q^2=(I-P)^2=I-2P+P^2=I-P=Q. Q is also symmetric: Q^T=I^T-P^T=I-P=Q.

b) By Exercise 2

\text{Img}(Q)=\{x:Qx=x\}=\{x:(I-P)x=x\}=\{x:Px=0\}=N(P).

Since P=I-Q, this equation implies N(Q)=\text{Img}(P).

It follows that, as with P, the set L_Q=\{x:Qx=x\} is the image of Q and it consists of points that are not changed by Q.
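
A short numerical check of Exercise 3, with a sample P built by the formula P=X(X^TX)^{-1}X^T from the post on constructing a projector onto a given subspace:

    import numpy as np

    rng = np.random.default_rng(5)
    X = rng.normal(size=(5, 2))
    P = X @ np.linalg.inv(X.T @ X) @ X.T       # a sample projector
    Q = np.eye(5) - P

    print(np.allclose(Q @ Q, Q), np.allclose(Q.T, Q))   # True True: Q is a projector
    print(np.allclose(P @ Q, 0.0))                      # True: Q maps everything into N(P)
    print(np.allclose(Q @ P, 0.0))                      # True: P maps everything into N(Q)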

18 Oct 18

Law of iterated expectations: geometric aspect

There will be a separate post on projectors. In the meantime, we'll have a look at simple examples that explain a lot about conditional expectations.

Examples of projectors

The name "projector" is almost self-explanatory. Imagine a point and a plane in three-dimensional space. Draw a perpendicular from the point to the plane. The intersection of the perpendicular with the plane is the point's projection onto that plane. Note that if the point already belongs to the plane, its projection equals the point itself. Besides, instead of projecting onto a plane we can project onto a straight line.

The above description translates into the following equations. For any x\in R^3 define

P_2x=(x_1,x_2,0) and P_1x=(x_1,0,0).

P_2 projects R^3 onto the plane L_2=\{(x_1,x_2,0):x_1,x_2\in R\} (which is two-dimensional) and P_1 projects R^3 onto the straight line L_1=\{(x_1,0,0):x_1\in R\} (which is one-dimensional).

Property 1. Double application of a projector amounts to single application.

Proof. We do this just for one of the projectors. Using the definition of P_2 three times we get

(1) P_2[P_2x]=P_2(x_1,x_2,0)=(x_1,x_2,0)=P_2x.

Property 2. Successive application of the two projectors, in either order, yields the projection onto the smaller subspace.

Proof. If we apply first P_2 and then P_1, the result is

(2) P_1[P_2x]=P_1(x_1,x_2,0)=(x_1,0,0)=P_1x.

If we change the order of projectors, we have

(3) P_2[P_1x]=P_2(x_1,0,0)=(x_1,0,0)=P_1x.

Exercise 1. Show that both projectors are linear.

Exercise 2. Like any other linear operator in a Euclidean space, these projectors are given by some matrices. What are they?
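
Here is a small numerical sketch of Properties 1 and 2; it also gives away part of Exercise 2, since in the standard basis the definitions of P_1 and P_2 above correspond to the diagonal matrices below.

    import numpy as np

    P2 = np.diag([1.0, 1.0, 0.0])              # matrix of P_2 (projects onto the plane L_2)
    P1 = np.diag([1.0, 0.0, 0.0])              # matrix of P_1 (projects onto the line L_1)

    x = np.array([2.0, -1.0, 5.0])
    print(np.allclose(P2 @ (P2 @ x), P2 @ x))  # True, Property 1
    print(np.allclose(P1 @ (P2 @ x), P1 @ x))  # True, equation (2)
    print(np.allclose(P2 @ (P1 @ x), P1 @ x))  # True, equation (3)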

The simple truth about conditional expectation

In the time series setup, we have a sequence of information sets ... \subset I_t \subset I_{t+1} \subset... (it's natural to assume that with time the amount of available information increases). Denote

E_tX=E(X|I_t)

the expectation of X conditional on I_t. For each t,

E_t is a projector onto the space of random functions that depend only on the information set I_t.

Property 1. Double application of conditional expectation gives the same result as single application:

(4) E_t(E_tX)=E_tX

(E_tX is already a function of I_t, so conditioning it on I_t doesn't change it).

Property 2. A successive conditioning on two different information sets is the same as conditioning on the smaller one:

(5) E_tE_{t+1}X=E_tX,

(6) E_{t+1}E_tX=E_tX.

Property 3. Conditional expectation is a linear operator: for any variables X,Y and numbers a,b

E_t(aX+bY)=aE_tX+bE_tY.

It's easy to see that (4)-(6) are similar to (1)-(3), respectively, but I prefer to use different names for (4)-(6). I call (4) a projector property. (5) is known as the Law of Iterated Expectations, see my post on the informational aspect for more intuition. (6) holds simply because at time t+1 the expectation E_tX is known and behaves like a constant.

Summary. (4)-(6) are easy to remember as one property: the smaller information set wins, E_sE_tX=E_{\min\{s,t\}}X.
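
The summary is easy to see in a simulation. In the sketch below (the names and the data-generating process are my own illustration, not part of the theory), the information set at time t is a coin flip z1, at time t+1 it is the pair (z1, z2), and conditional expectations are replaced by empirical group means.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    z1 = rng.integers(0, 2, n)                 # information available at time t
    z2 = rng.integers(0, 2, n)                 # additional information available at time t+1
    x = z1 + z2 + rng.normal(0.0, 1.0, n)      # the random variable of interest

    def cond_mean(values, groups):
        # empirical conditional expectation: replace each observation by its group mean
        out = np.empty(len(values))
        for g in np.unique(groups):
            mask = groups == g
            out[mask] = values[mask].mean()
        return out

    e_t1 = cond_mean(x, 2 * z1 + z2)           # E_{t+1}X: condition on (z1, z2)
    e_t = cond_mean(x, z1)                     # E_tX: condition on z1 only
    print(np.allclose(cond_mean(e_t1, z1), e_t))          # (5): E_t E_{t+1} X = E_t X
    print(np.allclose(cond_mean(e_t, 2 * z1 + z2), e_t))  # (6): E_{t+1} E_t X = E_t X

In both cases the smaller information set wins.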

4 Oct 17

Conditional-mean-plus-remainder representation

We separate the main part from the remainder and establish the properties of the remainder. My post on properties of conditional expectation is an elementary introduction to conditioning. This is my first post in Quantitative Finance.

A brush-up on conditional expectations

  1. Notation. Let X be a random variable and let I be an information set. Instead of the usual notation E(X|I) for conditional expectation, in large expressions it's better to use the notation with I in the subscript: E_IX=E(X|I).

  2. Generalized homogeneity. If f(I) depends only on information I, then E_I(f(I)X)=f(I)E_I(X) (a function of known information is known and behaves like a constant). A special case is E_I(f(I))=f(I)E_I(1)=f(I). With f(I)=E_I(X) we get E_I(E_I(X))=E_I(X). This shows that conditioning is a projector: if you project a point in 3D space onto a 2D plane and then project the image onto the same plane, the result is the same as from a single projection.

  3. Additivity. E_I(X+Y)=E_IX+E_IY.

  4. Law of iterated expectations (LIE). If we know about two information sets that I_1\subset I_2, then E_{I_1}E_{I_2}X=E_{I_1}X. I like the geometric explanation in terms of projectors. Projecting a point onto a plane and then projecting the result onto a straight line is the same as projecting the point directly onto the straight line.

Conditional-mean-plus-remainder representation

This is a direct generalization of the mean-plus-deviation-from-mean decomposition. There we wrote X=EX+(X-EX) and denoted \mu=EX,~\varepsilon=X-EX to obtain X=\mu+\varepsilon with the property E\varepsilon=0.

Here we write X=E_IX+(X-E_IX) and denote \varepsilon=X-E_IX the remainder. Then the representation is

(1) X=E_IX+\varepsilon.

Properties. 1) E_I\varepsilon=E_IX-E_I(E_IX)=E_IX-E_IX=0 (remember, this is a random variable identically equal to zero, not the number zero).

2) Conditional covariance is obtained from the usual covariance by replacing all usual expectations by conditional. Thus, by definition,

Cov_I(X,Y)=E_I(X-E_IX)(Y-E_IY).

For the components in (1) we have

Cov_I(E_IX,\varepsilon)=E_I(E_IX-E_IE_IX)(\varepsilon-E_I\varepsilon)=E_I(E_IX-E_IX)\varepsilon=0.

3) Var_I(\varepsilon)=E_I(\varepsilon-E_I\varepsilon)^{2}=E_I(X-E_IX)^2=Var_I(X).
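
Properties 1)-3) can be checked numerically with the same group-means device as in the law of iterated expectations post above; the discrete signal info standing for the information set I is my own illustrative choice.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    info = rng.integers(0, 3, n)               # a discrete signal playing the role of I
    x = info + rng.normal(0.0, 1.0, n)         # X depends on I plus noise

    def cond_mean(values, groups):
        # empirical conditional expectation: replace each observation by its group mean
        out = np.empty(len(values))
        for g in np.unique(groups):
            mask = groups == g
            out[mask] = values[mask].mean()
        return out

    e_i_x = cond_mean(x, info)                 # E_I X
    eps = x - e_i_x                            # the remainder in (1)

    print(np.allclose(cond_mean(eps, info), 0.0))          # 1) E_I eps = 0
    print(np.allclose(cond_mean(e_i_x * eps, info), 0.0))  # 2) E_I((E_I X) eps) = 0, i.e. Cov_I(E_I X, eps) = 0
    var_i_x = cond_mean(x**2, info) - e_i_x**2             # Var_I(X) = E_I X^2 - (E_I X)^2
    print(np.allclose(cond_mean(eps**2, info), var_i_x))   # 3) Var_I(eps) = Var_I(X)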