Principal component analysis is a general method based on diagonalization of the variance matrix. We consider it in a financial context. The variance matrix measures the riskiness of a portfolio. We want to see which stocks contribute most to the portfolio risk. The surprise is that the answer is given not in terms of the vector of returns itself but in terms of a linear transformation of it.
With such a matrix $U$, instead of the vector of returns $X$ we can consider its transformation $Y=UX$, for which $V(Y)=UV(X)U'=\Lambda$.
We know that $V(Y)$ has the variances $V(Y_1),\dots,V(Y_n)$ on the main diagonal. It follows that $V(Y_i)=\lambda_i$ for all $i$. Variance is a measure of riskiness. Thus, once the eigenvalues are ordered from largest to smallest, the transformed variables are put in the order of declining risk. What follows is the realization of this idea using sample data.
In a sampling context, all population means should be replaced by their sample counterparts. Let $x_t$ be the vector of observations on $X$ at time $t$. These observations are put side by side into a matrix $\mathcal{X}=(x_1,\dots,x_T)$, where $T$ is the number of moments in time. The population mean $EX$ is estimated by the sample mean $\bar{x}=\frac{1}{T}\sum_{t=1}^{T}x_t$.
The variance matrix $V(X)$ is estimated by
$$\hat{V}=\frac{1}{T}(\mathcal{X}-\bar{x}l')(\mathcal{X}-\bar{x}l')',$$
where $l$ is a $T\times 1$ vector of ones. It is this matrix that is diagonalized: $U\hat{V}U'=\Lambda$.
In general, the eigenvalues in $\Lambda$ are not ordered. Ordering them and at the same time changing places of the rows of $U$ correspondingly, we get a new orthogonal matrix $U_1$ (this requires a small proof) such that the eigenvalues in $\Lambda_1=U_1\hat{V}U_1'$ will be ordered from largest to smallest. There is a lot more to say about the method and its applications.
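The procedure above can be sketched in numpy. This is a minimal illustration with simulated returns (the data and dimensions are made up); note that `np.linalg.eigh` returns eigenvectors as columns, while in the text the rows of $U$ play that role, hence the transpose.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: T observations of n stock returns, one column per moment in time.
n, T = 4, 500
X = rng.normal(size=(n, T))

xbar = X.mean(axis=1, keepdims=True)      # sample mean vector
Xc = X - xbar                             # demeaned observations
V = Xc @ Xc.T / T                         # sample variance matrix

lam, U = np.linalg.eigh(V)                # V = U diag(lam) U'
order = np.argsort(lam)[::-1]             # reorder eigenvalues: declining risk
lam, U = lam[order], U[:, order]

Y = U.T @ Xc                              # transformed variables (principal components)
# The sample variances of the rows of Y reproduce the ordered eigenvalues.
```

The first rows of `Y` are the components carrying the most risk.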
A nonzero complex number in polar form is $z=|z|e^{i\theta}$, where $|z|$ is the absolute value of $z$ and $\theta$ is a real angle, so that $|e^{i\theta}|=1$. The matrix analog of this form obtains when the absolute value $|A|=(A'A)^{1/2}$ from Definition 1 is used and an orthogonal matrix plays the role of $e^{i\theta}$.
Exercise 2. For a symmetric matrix $A$, its determinant equals the product of its eigenvalues: $\det A=\lambda_1\cdots\lambda_n$.
Exercise 3. Let $\det A\neq 0$. Put $U=A|A|^{-1}$. Then $U$ is orthogonal and the polar form of $A$ is $A=U|A|$.
Proof. Let $\mu_1,\dots,\mu_n$ be the eigenvalues of $A'A$. They are real and non-negative. In fact they are all positive because by Exercise 2 their product equals $\det(A'A)=(\det A)^2>0$.
Hence, $|A|^{-1}=(A'A)^{-1/2}$ exists and it is symmetric. $U=A|A|^{-1}$ also exists and is orthogonal: $U'U=|A|^{-1}A'A|A|^{-1}=|A|^{-1}|A|^2|A|^{-1}=I$. Finally, $U|A|=A|A|^{-1}|A|=A$ from the definition of $U$.
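The construction in the proof translates directly into numpy. A minimal sketch with an arbitrarily chosen invertible matrix: $|A|$ is computed as the symmetric square root of $A'A$ via its eigendecomposition.

```python
import numpy as np

# An arbitrary matrix with det A != 0, as required by Exercise 3
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

# |A| = (A'A)^{1/2} via the eigendecomposition of the symmetric matrix A'A
mu, Q = np.linalg.eigh(A.T @ A)           # eigenvalues are positive since det A != 0
absA = Q @ np.diag(np.sqrt(mu)) @ Q.T     # symmetric square root

U = A @ np.linalg.inv(absA)               # U = A|A|^{-1}
# U is orthogonal and A = U|A| is the polar form.
```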
Definition 1. For a square matrix $A$, all non-negative integer powers $A^k$ are defined. This allows us to define an analytical function of a matrix, $f(A)=\sum_{k=0}^{\infty}a_kA^k$, whenever the function $f$ has the Taylor decomposition $f(x)=\sum_{k=0}^{\infty}a_kx^k$.
Example 1. The exponential function has the decomposition $e^x=\sum_{k=0}^{\infty}\frac{x^k}{k!}$. Hence, $e^{At}=\sum_{k=0}^{\infty}\frac{A^kt^k}{k!}$. Differentiating this term by term gives
$$\frac{d}{dt}e^{At}=\sum_{k=1}^{\infty}\frac{A^kt^{k-1}}{(k-1)!}=Ae^{At}$$
(the constant term disappears).
This means that $x(t)=e^{At}$ solves the differential equation $x'(t)=Ax(t)$. To satisfy the initial condition $x(0)=x_0$, instead of $e^{At}$ we can consider $x(t)=e^{At}x_0$. This matrix function solves the initial value problem $x'(t)=Ax(t)$, $x(0)=x_0$.
Calculating all powers of a matrix directly can be a time-consuming business. The process is facilitated by knowledge of the diagonal representation.
Exercise 2. If $A$ is symmetric, so that $UAU'=\Lambda$, then $A^k=U'\Lambda^kU$ for all non-negative integer $k$. Hence, $f(A)=U'f(\Lambda)U$, where $f(\Lambda)=\mathrm{diag}[f(\lambda_1),\dots,f(\lambda_n)]$.
Proof. The equation
$$A^2=(U'\Lambda U)(U'\Lambda U)=U'\Lambda UU'\Lambda U=U'\Lambda^2U$$
shows that to square $A$ it is enough to square $\Lambda$. In a similar fashion we can find all non-negative integer powers of $A$.
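Powering the eigenvalues instead of the matrix is easy to check numerically. A minimal sketch with a small symmetric matrix chosen for illustration (numpy keeps the eigenvectors in the columns of `U`, so the transposes are swapped relative to the text):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])                # symmetric

lam, U = np.linalg.eigh(A)                # A = U diag(lam) U'
k = 5
Ak = U @ np.diag(lam**k) @ U.T            # power the eigenvalues, not the matrix
# Ak coincides with A multiplied by itself k times.
```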
3. Linear differential equation
Example 1 about the exponential matrix function involves a bit of guessing. With the diagonal representation at hand, we can obtain the same result in a more logical way.
One-dimensional case. The equation $x'(t)=ax(t)$ is equivalent to $\frac{x'(t)}{x(t)}=a$. Upon integration this gives $\log x(t)=at+C$, so that $x(t)=e^{at}x(0)$.
General case. In case of an $n\times n$ matrix $A$, the first equation of (2) is $x_1'(t)=a_{11}x_1(t)+\dots+a_{1n}x_n(t)$. The fact that the $x$'s on the right are mixed up makes the direct solution difficult. The idea is to split the system and separate the $x$'s.
Premultiplying (2) by $U$ we have $Ux'(t)=UAx(t)$ or, denoting $y(t)=Ux(t)$ and using (1), $y'(t)=\Lambda y(t)$. The last system is a collection of one-dimensional equations $y_i'(t)=\lambda_iy_i(t)$, $i=1,\dots,n$. Let $y_0=Ux_0$ be the initial vector. From (4) $y_i(t)=e^{\lambda_it}y_{0,i}$. In matrix form this amounts to $y(t)=e^{\Lambda t}y_0$. Hence, $x(t)=U'y(t)=U'e^{\Lambda t}Ux_0$ and, as in Exercise 2, $x(t)=e^{At}x_0$.
This is the same solution obtained above. The difference is that here we assume symmetry and, as a consequence, can use Example 2.
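The eigendecomposition recipe for $e^{At}x_0$ can be verified numerically. A minimal sketch with a symmetric matrix chosen so that the exact solution is known in closed form; the finite-difference check confirms that the constructed function solves $x'(t)=Ax(t)$.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])                # symmetric
x0 = np.array([1.0, 0.0])

lam, U = np.linalg.eigh(A)                # columns of U are eigenvectors

def x(t):
    # x(t) = U e^{Lambda t} U' x0 = e^{At} x0
    return U @ (np.exp(lam * t) * (U.T @ x0))

t, h = 0.7, 1e-6
deriv = (x(t + h) - x(t - h)) / (2 * h)   # central finite difference
# deriv should match A x(t); for this A, x(t) = (cosh t, sinh t)
```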
We are looking for conditions under which a matrix in some orthonormal basis takes a diagonal form.
Definition 1. We say that $A$ is diagonalizable if there exists an orthonormal basis in which the matrix of $A$ is diagonal: $UAU'=\Lambda$ for the corresponding orthogonal matrix $U$.
The usual way of solving a mathematical problem is this. Look for necessary conditions. Then try to find weak sufficient conditions. If they coincide with the necessary ones, you have an exact solution to the problem.
Exercise 1 (necessary condition). Suppose $A$ is diagonalizable by an orthogonal matrix. Then it is symmetric.
Exercise 2. If $A$ is symmetric and a subspace $L$ is invariant with respect to $A$, then the orthogonal complement $L^{\perp}$ is also an invariant subspace of $A$.
Proof. Let $y\in L^{\perp}$. We need to show that $Ay\in L^{\perp}$. Take any $x\in L$. Since $L$ is invariant, we have $Ax\in L$, so $(Ax,y)=0$. Hence, by Exercise 1, $(x,Ay)=(Ax,y)=0$. Since $x\in L$ is arbitrary, it follows that $Ay\in L^{\perp}$.
Exercise 3 (sufficient condition). If $A$ is symmetric, then it is diagonalizable by an orthogonal matrix.
Proof. Let $A$ be symmetric. By Exercise 3, $A$ has at least one eigenvector $u_1$. Denote $L_1$ the set of vectors orthogonal to $u_1$. It has dimension $n-1$ and it is invariant by Exercise 2 (the span of $u_1$ is invariant by Example 2). By Exercise 4, $A$ has an eigenvector $u_2$ in $L_1$.
Let $L_2$ be the set of vectors from $L_1$ which are orthogonal to $u_2$. As above, $\dim L_2=n-2$, $L_2$ is invariant, and in $L_2$ there is an eigenvector $u_3$ of $A$.
Continuing in this way, eventually we obtain a system $u_1,\dots,u_n$. Since $u_1$ is orthogonal to $L_1$ and all of $u_2,\dots,u_n$ belong to $L_1$, $u_1$ is orthogonal to all of $u_2,\dots,u_n$. Similarly, at each step $u_k$ is orthogonal to all of $u_{k+1},\dots,u_n$. Thus, the system is orthogonal. Its elements can be scaled to satisfy $\|u_k\|=1$ for all $k$.
The resulting system will be an orthonormal basis. Its completeness follows from the fact that there are $n$ elements in this system and they are linearly independent.
Denoting $\lambda_k$ the eigenvalue corresponding to $u_k$, we have $Au_k=\lambda_ku_k$. This means that $A$ is of diagonal form in the basis $u_1,\dots,u_n$. The transition matrix $U$ from the basis consisting of unit vectors to $u_1,\dots,u_n$ is orthogonal. If $x=\sum_k\xi_ku_k$, then $\xi=(\xi_1,\dots,\xi_n)'$ is the vector of coefficients of $x$ in the new basis. By Exercise 3, in this basis the matrix of $A$ is $\Lambda=\mathrm{diag}[\lambda_1,\dots,\lambda_n]$.
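The whole theorem can be observed numerically: `np.linalg.eigh` produces exactly the orthonormal eigenvector basis constructed in the proof. A minimal sketch with a randomly generated symmetric matrix (the seed and size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(4, 4))
A = (B + B.T) / 2                         # symmetrize an arbitrary matrix

lam, U = np.linalg.eigh(A)                # orthonormal eigenvector basis in the columns
Lam = U.T @ A @ U                         # the matrix of A in that basis is diagonal
```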
Here we consider properties of symmetric matrices that will be used to prove their diagonalizability.
What are the differences between $\mathbb{R}^n$ and $\mathbb{C}^n$?
Vectors in both spaces have $n$ coordinates. In $\mathbb{R}^n$ we can multiply vectors by real numbers and in $\mathbb{C}^n$ by complex numbers. This affects the notions of linear combinations, linear independence, dimension and scalar product. We indicate only the differences to watch for.
If in $\mathbb{C}^n$ we multiply vectors only by real numbers, it becomes a space of dimension $2n$. Let's take $n=1$ to see why.
Example 1. If we take $e=1$, then any complex number $z$ is a multiple of $e$: $z=z\cdot e$, with the scaling coefficient $z$ itself. Thus, $\mathbb{C}$ is a one-dimensional space in this sense. On the other hand, if only multiplication by real numbers is allowed, then we can take $\{1,i\}$ as a basis; then $z=a\cdot 1+b\cdot i$ with real $a,b$, and $\mathbb{C}$ is two-dimensional. To avoid confusion, just use scaling by the right numbers.
The scalar product in $\mathbb{R}^n$ is given by $(x,y)=\sum_{j=1}^{n}x_jy_j$ and in $\mathbb{C}^n$ by $(x,y)=\sum_{j=1}^{n}x_j\bar{y}_j$. As a result, for the second scalar product we have $(x,ay)=\bar{a}(x,y)$ for complex $a$ (some people call this antilinearity, to distinguish it from the linearity $(x,ay)=a(x,y)$ that holds for real $a$).
Definition 1. For a matrix $A$ with possibly complex entries we denote $A^{*}=\bar{A}'$ (conjugate each entry and transpose). The matrix $A^{*}$ is called an adjoint or a conjugate of $A$.
Thus, when considering matrices in $\mathbb{C}^n$, conjugation should be used instead of transposition. In particular, instead of symmetry $A=A'$, the equation $A=A^{*}$ should be used. Matrices satisfying the last equation are called self-adjoint. The theory of self-adjoint matrices in $\mathbb{C}^n$ is very similar to that of symmetric matrices in $\mathbb{R}^n$. Keeping in mind two applications (time series analysis and optimization), we consider only square matrices with real entries. Even in this case one is forced to work with $\mathbb{C}^n$ from time to time because, in general, eigenvalues can be complex numbers.
General properties of symmetric matrices
When we extend $A$ from $\mathbb{R}^n$ to $\mathbb{C}^n$, $Ax$ is defined by the same expression as before, but $x$ is allowed to be from $\mathbb{C}^n$ and the scalar product in $\mathbb{R}^n$ is replaced by the scalar product in $\mathbb{C}^n$. The extension is denoted $A_{\mathbb{C}}$.
Exercise 2. If $A$ is symmetric, then all eigenvalues of $A_{\mathbb{C}}$ are real.
Proof. Suppose $\lambda$ is an eigenvalue of $A_{\mathbb{C}}$: $Ax=\lambda x$ with some nonzero $x\in\mathbb{C}^n$. Using Exercise 1 and the symmetry of $A$ we have $\lambda(x,x)=(Ax,x)=(x,Ax)=\bar{\lambda}(x,x)$. Since $(x,x)>0$, it follows that $\lambda=\bar{\lambda}$, that is, $\lambda$ is real.
Exercise 3. If $A$ is symmetric, then it has at least one real eigenvector.
Proof. We know that $A_{\mathbb{C}}$ has at least one complex eigenvalue $\lambda$. By Exercise 2, this eigenvalue must be real. Thus, we have $Az=\lambda z$ with some nonzero $z\in\mathbb{C}^n$. Separating the real and imaginary parts of $z=x+iy$, we have $Ax=\lambda x$, $Ay=\lambda y$ with some $x,y\in\mathbb{R}^n$. At least one of $x,y$ is not zero. Thus a real eigenvector exists.
We need to generalize Exercise 3 to the case when $A$ acts in a subspace. This is done in the next two exercises.
Definition 2. A subspace $L$ is called an invariant subspace of $A$ if $A(L)\subseteq L$, that is, $Ax\in L$ for all $x\in L$.
Example 2. If $u$ is an eigenvector of $A$, then the subspace spanned by $u$ is an invariant subspace of $A$. This is because $Au=\lambda u$ implies $A(tu)=t\lambda u$ for any number $t$.
Exercise 4. If $A$ is symmetric and $L$ is a non-trivial invariant subspace of $A$, then $A$ has an eigenvector in $L$.
Proof. By the definition of an invariant subspace, the restriction of $A$ to $L$, defined by $A|_{L}x=Ax$ for $x\in L$, acts from $L$ to $L$ and is symmetric. By Exercise 3, applied to $A|_{L}$, it has an eigenvector in $L$, which is also an eigenvector of $A$.
Exercise 5. a) If $\lambda$ is a (real) eigenvalue of $A$, then it is an eigenvalue of $A_{\mathbb{C}}$. b) If $\lambda$ is a real eigenvalue of $A_{\mathbb{C}}$, then it is an eigenvalue of $A$. This is summarized as $\sigma(A)=\sigma(A_{\mathbb{C}})\cap\mathbb{R}$; see the spectrum notation.
See if you can prove this yourself following the ideas used above.
Of course, nothing can be simpler than the zero and identity matrices. What is the next level of simplicity? In the one-dimensional case, consider a linear mapping $f:\mathbb{R}\to\mathbb{R}$. Any real number $x$ can be written as $x=x\cdot 1$. By homogeneity $f(x)=xf(1)$. Denoting $a=f(1)$, we see that $f$ is just multiplication by the number $a$: $f(x)=ax$.
Similarly, in two dimensions the mapping $f(x_1,x_2)=(\lambda_1x_1,x_2)$ is scaling by $\lambda_1$ along the $x_1$-axis. Matrices of type
$$\Lambda=\begin{pmatrix}\lambda_1&\cdots&0\\\vdots&\ddots&\vdots\\0&\cdots&\lambda_n\end{pmatrix}$$
are called diagonal and the short notation is $\Lambda=\mathrm{diag}[\lambda_1,\dots,\lambda_n]$. Any simpler than that, and we'll be in the kindergarten.
It remains to name the things and link them to what we know.
Definition 1. Based on (1), we say that $\lambda$ is an eigenvalue of the matrix $A$ if there exists a nonzero vector $x$ such that
$$Ax=\lambda x.\qquad(2)$$
Such an $x$ is called an eigenvector corresponding to $\lambda$.
With a zero $x$, (2) would be true for any $\lambda$, and there would be no definition. Since $A$ is a real matrix, $\lambda$ in principle cannot be complex if $x\in\mathbb{R}^n$.
Definition 2. If one $x$ satisfies (2), then the whole straight line passing through $x$ satisfies (2). Moreover, if we have two eigenvectors $x,y$ corresponding to the same $\lambda$, then for any linear combination we have $A(ax+by)=aAx+bAy=\lambda(ax+by)$. Thus the set of all eigenvectors corresponding to one eigenvalue, completed with zero, is a subspace. We call it an eigenspace corresponding to $\lambda$. Its dimension is called a multiplicity of $\lambda$.
Exercise 1. a) The eigenspace corresponding to $\lambda$ coincides with the null space $N(A-\lambda I)$, and the multiplicity of $\lambda$ equals $\dim N(A-\lambda I)$. b) The set of eigenvalues coincides with the set of roots of the equation
$$\det(A-\lambda I)=0.\qquad(3)$$
Proof. a) This is obvious because (2) is equivalently written as
$$(A-\lambda I)x=0.\qquad(4)$$
b) By a), $\lambda$ is an eigenvalue exactly when (4) has a nonzero solution, which happens exactly when $\det(A-\lambda I)=0$.
Keep in mind that part b) of Exercise 1 implicitly assumes that both sets are of the same nature (both are subsets of either $\mathbb{R}$ or $\mathbb{C}$).
Definition 3. Equation (3) is called a characteristic equation and its roots are called characteristic roots.
After finding the characteristic roots we can plug them into (4) to find the eigenvectors.
Definition 4. The set of eigenvalues of $A$ is called the spectrum of $A$ and denoted $\sigma(A)$.
Remark. When convenient, we use the following notation: $A$ as a mapping from $\mathbb{R}^n$ to $\mathbb{R}^n$ will be denoted $A_{\mathbb{R}}$, and $A$ as a mapping from $\mathbb{C}^n$ to $\mathbb{C}^n$ will be denoted $A_{\mathbb{C}}$. We have to remember that there are two versions of statement b) from Exercise 1: $\sigma(A_{\mathbb{R}})$ coincides with the set of real characteristic roots; $\sigma(A_{\mathbb{C}})$ coincides with the set of complex characteristic roots.
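The distinction between $\sigma(A_{\mathbb{R}})$ and $\sigma(A_{\mathbb{C}})$ can be seen on a small example. A minimal numpy sketch with the 90-degree rotation matrix: its characteristic roots are $\pm i$, so the complex spectrum is nonempty while the real spectrum is empty.

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])               # rotation by 90 degrees

# Characteristic polynomial det(A - lambda I) = lambda^2 + 1
coeffs = np.poly(A)                       # coefficients [1, 0, 1]
roots = np.roots(coeffs)                  # the characteristic roots +i and -i
# sigma(A_C) = {i, -i}; sigma(A_R) is empty: no real characteristic roots.
```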
Digression on determinants
As much as possible I try to avoid using explicit formulas for the determinant, but here we have to use one. The simplest version is this:
$$\det A=\sum_{\pi}\mathrm{sign}(\pi)\,a_{1\pi(1)}a_{2\pi(2)}\cdots a_{n\pi(n)},$$
where the sum is over all permutations $\pi$ of the set $\{1,\dots,n\}$.
Perhaps a verbal description is better than this formula:
1) The determinant is a sum of products of elements of the matrix.
2) Each product contains $n$ factors and is obtained as follows: take one element from the first row and cross out the column it belongs to; take one element from the second row, not from the column you have crossed out, and cross out the column it belongs to; take one element from the third row, not from the two columns you have crossed out, and cross out the column it belongs to. Continue like that. After each step, the number of crossed-out columns increases by 1. The first factor can be chosen in $n$ ways, the second in $n-1$ ways, ..., and for the last factor there will be only one choice.
3) The sign depends on the choices made and doesn't matter at this point.
Let's see what this implies for finding eigenvalues.
Exercise 2. $\det(A-\lambda I)$ is a polynomial of degree $n$ in $\lambda$.
Proof. The verbal description shows that $\det(A-\lambda I)$ contains the product of diagonal elements $(a_{11}-\lambda)\cdots(a_{nn}-\lambda)$ which, in turn, contains the power $(-\lambda)^n$. Because of the crossing-out procedure, the other products contain powers of $\lambda$ lower than $n$, and there is no way the power $\lambda^n$ can be cancelled out. Besides, $\det(A-\lambda I)$ may contain only non-negative integer powers of $\lambda$ not exceeding $n$. This proves the statement.
Exercise 3. In $\mathbb{C}^n$ any matrix has at least one eigenvector.
Proof. Exercise 1 reduces the problem of finding eigenvalues to the problem of finding characteristic roots. By the fundamental theorem of algebra, every non-constant single-variable polynomial has at least one complex root. Thus we have at least one eigenvalue and, from (4), at least one eigenvector.
The reduction of the problem of finding eigenvalues to the problem of finding characteristic roots has a drawback. Even when $A$ has real entries, the characteristic roots may be complex. Watch later how this problem is handled.
Since the origin is unchanged under any linear mapping, Exercise 2 gives the following geometric interpretation of an orthogonal matrix: angles and vector lengths are preserved, while the origin stays in place. Strictly speaking, in case $\det U=1$ we have a rotation around the origin and in case $\det U=-1$ a reflection.
Another interpretation is suggested by the next exercise.
Exercise 3. If $u_1,\dots,u_n$ is an orthonormal basis, then the matrix $U=(u_1,\dots,u_n)$ with these vectors as columns is orthogonal.
Proof. Orthonormality means that $(u_i,u_j)=1$ if $i=j$ and $(u_i,u_j)=0$ if $i\neq j$. Since $(u_i,u_j)$ is the $(i,j)$-th element of $U'U$, this gives $U'U=I$, which implies orthogonality of $U$.
Exercise 4. Let $u_1,\dots,u_n$ and $v_1,\dots,v_n$ be two orthonormal bases. Let $P=V^{-1}U$ be the transition matrix from coordinates in the first basis to coordinates in the second. Then $P$ is orthogonal.
Proof. By Exercise 3, both $U$ and $V$ are orthogonal. Hence, by Exercise 1, $V^{-1}$ is orthogonal. It suffices to show that a product of two orthogonal matrices $A,B$ is orthogonal: $(AB)'(AB)=B'A'AB=B'B=I$.
Remark. Equation (1) shows that the columns of an orthogonal matrix constitute an orthonormal basis. The equation $UU'=I$ shows that the same is true for the rows.
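Both interpretations are easy to check numerically. A minimal sketch with a plane rotation by an arbitrary angle: the columns and rows are orthonormal, the determinant is 1, and lengths are preserved.

```python
import numpy as np

theta = 0.3                               # an arbitrary rotation angle
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([3.0, 4.0])
# U'U = UU' = I (orthonormal columns and rows), det U = 1 (a rotation),
# and |Ux| = |x| (lengths are preserved).
```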
The basis consisting of unit vectors $e_1,\dots,e_n$ is simple in that the coefficients in the representation $x=\sum_jx_je_j$ are exactly the components of the vector $x$. With other types of bases it is not like that: the dependence of the coefficients $\xi_j$ in
$$x=\sum_j\xi_ju_j\qquad(1)$$
on $x$ for a general basis $u_1,\dots,u_n$ is not so simple.
Exercise 1. Put the basis vectors side by side, $U=(u_1,\dots,u_n)$, and write the vector of coefficients as a column vector $\xi=(\xi_1,\dots,\xi_n)'$. Then (1) becomes $x=U\xi$, so that $\xi=U^{-1}x$.
The explicit formula from Exercise 1 shows, in particular, that the vector of coefficients is uniquely determined by $x$ and depends linearly on $x$. The coefficients of $x$ in another basis $v_1,\dots,v_n$,
$$x=\sum_j\eta_jv_j,\qquad(2)$$
may be different from those in (1). For future applications, we need to know how the coefficients in one basis are related to those in another. Put the basis vectors side by side, $V=(v_1,\dots,v_n)$, and write $\eta=(\eta_1,\dots,\eta_n)'$ as a column vector.
Exercise 2. Let $u_1,\dots,u_n$ and $v_1,\dots,v_n$ be two bases in $\mathbb{R}^n$. Then
$$\eta=V^{-1}U\xi.\qquad(3)$$
Proof. With our notation, (1) and (2) become $x=U\xi$ and $x=V\eta$. Thus, $V\eta=U\xi$ and (3) follows.
Definition 1. The matrix $V^{-1}U$ in (3) is called a transition matrix from the basis $u_1,\dots,u_n$ to the basis $v_1,\dots,v_n$.
Changing bases to analyze matrices
Suppose we want to analyze a matrix $A$. Fix a basis $u_1,\dots,u_n$ and take any $x$. We can decompose $x$ as in (1). Then we have a vector of coefficients $\xi$. $x$ can be considered an original and $\xi$ its reflection in a mirror, or a clone in a parallel world. Instead of applying $A$ to $x$, we can apply it to its reflection to obtain $A\xi$. To get back to the original world, we can use $A\xi$ as a vector of coefficients of a new vector $UA\xi$ and call this vector an image of $x$ under a new mapping $A_u$:
$$A_ux=UA\xi.\qquad(4)$$
The transition $x\to\xi$ is unique and the transition $A\xi\to UA\xi$ is also unique, so definition (4) is correct.
Exercise 3. Show that $A_u=UAU^{-1}$.
Proof. By Exercise 1, $\xi=U^{-1}x$, so (4) can be written as $A_ux=UAU^{-1}x$. Since this is true for all $x$, the statement follows.
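The "parallel world" computation and the formula $A_u=UAU^{-1}$ can be checked side by side. A minimal sketch; the matrix $A$ and the basis (the columns of $U$) are arbitrary choices for illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [3.0, 0.0, 1.0]])

# A basis of R^3 put side by side as columns (any invertible U works)
U = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
Uinv = np.linalg.inv(U)

x = np.array([1.0, 2.0, 3.0])
xi = Uinv @ x                             # coefficients of x in the basis (Exercise 1)
y = U @ (A @ xi)                          # apply A to the reflection, then come back
Au = U @ A @ Uinv                         # the matrix of the new mapping (Exercise 3)
# Au @ x reproduces y, the image of x under the new mapping.
```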