29 Sep 18

Questions for repetition

  1. Describe the steps leading to a full definition of the set of complex numbers (imaginary unit, complex numbers, operations with them, absolute value, conjugate number).
  2. What are matrix analogs of a) real numbers, b) conjugation and c) an absolute value of a complex number (without proof)?
  3. Write out and prove the properties of the scalar product in C^n.
  4. Assuming the Euler formula known, derive the polar form of a complex number. What can you say about the angle \theta in the polar form?
  5. Prove that a quadratic form of a matrix is homogeneous of degree 2.
  6. Prove that a quadratic form takes values in the set of real numbers.
  7. Define positive and non-negative matrices. State Sylvester's criterion.
  8. Illustrate the Sylvester criterion geometrically.
  9. Show that A^TA is symmetric and non-negative.
  10. Give the definition of a basis and derive the formula for the coefficients \xi in the decomposition x=\sum\xi_iu_i for an arbitrary vector x.
  11. How are the decomposition coefficients transformed when one passes from one basis to another?
  12. Give the full picture behind the similarity definition.
  13. Prove that for an orthogonal matrix, a) the inverse and transpose are the same, b) the transpose and inverse are orthogonal.
  14. An orthogonal matrix preserves scalar products, norms and angles.
  15. If you put elements of an orthonormal basis side by side, the resulting matrix will be orthogonal.
  16. The transition matrix from one orthonormal basis to another is orthogonal.
  17. Show that the product of two diagonal matrices is diagonal.
  18. Define eigenvalues and eigenvectors. Why are we interested in them?
  19. What is the link between eigenvalues and a characteristic equation of a matrix?
  20. Prove that \det(A-\lambda I) is a polynomial of degree n in \lambda, if A is of size n\times n.
  21. Prove that in C^n any matrix has at least one eigenvector.
  22. A symmetric matrix in C^n has only real eigenvalues.
  23. If A is symmetric, then it has at least one real eigenvector.
  24. A symmetric matrix has at least one eigenvector in any nontrivial invariant subspace.
  25. What is the relationship between the spectra of A_C and A_R?
  26. A matrix diagonalizable by an orthogonal matrix must be symmetric.
  27. For a symmetric matrix, an orthogonal complement of an invariant subspace is invariant.
  28. Main theorem. A symmetric matrix is diagonalizable by an orthogonal matrix.
22 Sep 18

Applications of the diagonal representation IV

Principal component analysis is a general method based on diagonalization of the variance matrix. We consider it in a financial context. The variance matrix measures the riskiness of the portfolio. We want to see which stocks contribute most to the portfolio risk. The surprise is that the answer is given not in terms of the vector of returns but in terms of its linear transformation.

8. Principal component analysis (PCA)

Let R be a column-vector of returns on p stocks with the variance matrix V(R)=E(R-ER)(R-ER)^{T}. The idea is to find an orthogonal matrix W such that W^{-1}V(R)W=D is a diagonal matrix D=diag[\lambda_1,...,\lambda_p] with \lambda_1\geq...\geq\lambda_p.

With such a matrix, instead of R we can consider its transformation Y=W^{-1}R for which

V(Y)=W^{-1}V(R)(W^{-1})^T=W^{-1}V(R)W=D.

We know that V(Y) has variances V(Y_1),...,V(Y_p) on the main diagonal. It follows that V(Y_i)=\lambda_i for all i. Variance is a measure of riskiness. Thus, the transformed variables Y_1,...,Y_p are put in the order of declining risk. What follows is the realization of this idea using sample data.

In a sampling context, all population means should be replaced by their sample counterparts. Let R^{(t)} be the p\times 1 vector of observations on R at time t. These observations are put side by side into a matrix \mathbb{R}=(R^{(1)},...,R^{(n)}) where n is the number of time periods. The population mean ER is estimated by the sample mean

\bar{\mathbb{R}}=\frac{1}{n}\sum_{t=1}^nR^{(t)}.

The variance matrix V(R) is estimated by

\hat{V}=\frac{1}{n-1}(\mathbb{R}-\bar{\mathbb{R}}l)(\mathbb{R}-\bar{\mathbb{R}}l)^T

where l is a 1\times n vector of ones. It is this matrix that is diagonalized: W^{-1}\hat{V}W=D.

In general, the eigenvalues in D are not ordered. Ordering them and at the same time changing places of the rows of W^{-1} correspondingly we get a new orthogonal matrix W_1 (this requires a small proof) such that the eigenvalues in W_1^{-1}\hat{V}W_1=D_1 will be ordered. There is a lot more to say about the method and its applications.
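
To make the sample procedure concrete, here is a minimal numpy sketch on made-up return data. The names V_hat, W1 and lam are mine, not part of the post, and np.linalg.eigh is used to diagonalize the symmetric matrix \hat{V}.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 4, 250                      # p stocks, n time periods (made-up data)
R = rng.normal(size=(p, n))        # columns are the vectors R^(t)

R_bar = R.mean(axis=1, keepdims=True)          # sample mean, p x 1
V_hat = (R - R_bar) @ (R - R_bar).T / (n - 1)  # sample variance matrix, p x p

# eigh returns eigenvalues in ascending order for a symmetric matrix;
# reverse the order (equivalently, permute the columns of W) to get
# lambda_1 >= ... >= lambda_p
lam, W = np.linalg.eigh(V_hat)
order = np.argsort(lam)[::-1]
lam, W1 = lam[order], W[:, order]

# transformed returns Y = W1^{-1} R = W1^T R, ordered by declining risk
Y = W1.T @ R
print(lam)                                   # variances of Y_1, ..., Y_p
print(np.allclose(np.cov(Y), np.diag(lam)))  # sample variance of Y is diagonal
```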

9 Sep 18

Applications of the diagonal representation III

6. Absolute value of a matrix

Exercise 1. For a square matrix A the matrix A^TA is non-negative.

Proof. x^TA^TAx=(Ax)^TAx=\|Ax\|^2\geq 0 for any x.

For a complex number c the absolute value is defined by |c|=(\bar{c}c)^{1/2}. Since transposition of matrices is similar to conjugation of complex numbers, this leads us to the following definition.

Definition 1. By Exercise 1, the matrix A^TA is symmetric and non-negative; by Exercise 1 of Applications I, its eigenvalues are then non-negative. Hence we can define the absolute value of A as the square root of A^TA from the previous post, |A|=(A^TA)^{1/2}.

7. Polar form

A complex nonzero number c in polar form is c=\rho e^{i\theta} where \rho >0 is the absolute value of c and \theta is a real angle, so that |e^{i\theta}|=1. The matrix analog of this form obtains when the condition \rho >0 is replaced by \det A\neq 0, the absolute value of A from Definition 1 is used and an orthogonal matrix plays the role of e^{i\theta}.

Exercise 2. For a symmetric matrix A, its determinant equals the product of its eigenvalues.

Proof. \det A=\det(Udiag[\lambda_1,...,\lambda_n]U^{-1})=(\det U)(\det diag[\lambda_1,...,\lambda_n])(\det U^{-1})=(\det U)^2\det diag[\lambda_1,...,\lambda_n]=\lambda_1...\lambda_n, because for an orthogonal U we have \det U^{-1}=\det U^T=\det U and (\det U)^2=1.

Exercise 3. Let \det A\neq 0. Put U=A|A|^{-1}. Then U is orthogonal and the polar form of A is A=U|A|.

Proof. Let \lambda_1,...,\lambda_n be the eigenvalues of A^TA. They are real and non-negative. In fact they are all positive because by Exercise 2 their product equals \det(A^TA)=(\det A)^2.

Hence, |A|^{-1} exists and it is symmetric. U also exists and is orthogonal: U^TU=(|A|^{-1})^TA^TA|A|^{-1}=|A|^{-1}|A|^2|A|^{-1}=I. Finally, from the definition of U, A=U|A|.
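
A quick numerical check of Exercise 3, assuming a randomly drawn A is invertible (which it is with probability one). The helper sqrtm_sym is my name for the square root construction from the previous post.

```python
import numpy as np

def sqrtm_sym(B):
    """Square root of a symmetric non-negative matrix via its diagonal representation."""
    lam, U = np.linalg.eigh(B)
    return U @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ U.T

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))          # generic A, so det A != 0 almost surely

abs_A = sqrtm_sym(A.T @ A)           # |A| = (A^T A)^{1/2}
U = A @ np.linalg.inv(abs_A)         # U = A|A|^{-1}

print(np.allclose(U.T @ U, np.eye(3)))   # U is orthogonal
print(np.allclose(U @ abs_A, A))         # polar form A = U|A|
```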

9 Sep 18

Applications of the diagonal representation II

4. Square root of a matrix

Definition 1. For a symmetric matrix with non-negative eigenvalues the square root is defined by

(1) A^{1/2}=Udiag[\sqrt{\lambda_1},...,\sqrt{\lambda_n}]U^{-1}.

Exercise 1. (1) is symmetric and satisfies (A^{1/2})^2=A.

Proof. By properties of orthogonal matrices, (U^{-1})^T=U and U^T=U^{-1}, so

(A^{1/2})^T=(U^{-1})^Tdiag[\sqrt{\lambda_1},...,\sqrt{\lambda_n}]U^T=Udiag[\sqrt{\lambda_1},...,\sqrt{\lambda_n}]U^{-1}=A^{1/2},

(A^{1/2})^2=Udiag[\sqrt{\lambda_1},...,\sqrt{\lambda_n}]U^{-1}Udiag[\sqrt{\lambda_1},...,\sqrt{\lambda_n}]U^{-1}=Udiag[\lambda_1,...,\lambda_n]U^{-1}=A.
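
A short numpy illustration of Definition 1 and Exercise 1; the matrix A=BB^T is just a made-up example of a symmetric matrix with non-negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(4, 4))
A = B @ B.T                                # symmetric with non-negative eigenvalues

lam, U = np.linalg.eigh(A)                 # diagonal representation A = U diag(lam) U^T
A_half = U @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ U.T   # formula (1)

print(np.allclose(A_half, A_half.T))       # A^{1/2} is symmetric
print(np.allclose(A_half @ A_half, A))     # (A^{1/2})^2 = A
```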

5. Generalized least squares estimator

The error term e in the multiple regression y=X\beta +e under homoscedasticity and in the absence of autocorrelation satisfies

(2) V(e)=\sigma^2I, where \sigma^2 is some positive number.

The OLS estimator in this situation is given by

(3) \hat{\beta}=(X^TX)^{-1}X^Ty.

Now consider a more general case V(e)=\Omega.

Exercise 2. The variance matrix V(e)=\Omega is always symmetric and non-negative.

Proof. V(e)^T=[E(e-Ee)(e-Ee)^T]^T=V(e),

x^TV(e)x=Ex^T(e-Ee)(e-Ee)^Tx=E\|(e-Ee)^Tx\|^2\geq 0.

Exercise 3. Let's assume that \Omega is positive. Show that \Omega^{-1/2} is symmetric and satisfies (\Omega^{-1/2})^2=\Omega^{-1}.

Proof. By Exercise 1 of Applications I (positivity in terms of eigenvalues), the eigenvalues of \Omega are positive. Hence its inverse \Omega^{-1} exists and is given by \Omega^{-1}=U\Omega_U^{-1}U^T where \Omega_U^{-1}=diag[\lambda_1^{-1},...,\lambda_n^{-1}]. It is symmetric as the inverse of a symmetric matrix. It remains to apply Exercise 1 to A=\Omega^{-1}.

Exercise 4. Find the variance of u=\Omega^{-1/2}e.

Solution. Using the definition of variance of a vector

V(u)=E(u-Eu)(u-Eu)^T=\Omega^{-1/2}V(e)(\Omega^{-1/2})^T=\Omega^{-1/2}\Omega\Omega^{-1/2}=I.

Exercise 4 suggests how to transform y=X\beta +e to satisfy (2). In the equation \Omega^{-1/2}y=\Omega^{-1/2}X\beta +\Omega^{-1/2}e the error u=\Omega^{-1/2}e satisfies assumption (2) (with \sigma^2=1), under which the OLS formula (3) is applicable. Let \tilde{y}=\Omega^{-1/2}y, \tilde{X}=\Omega^{-1/2}X. Then we have \tilde{y}=\tilde{X}\beta +u and from (3) \hat{\beta}=(\tilde{X}^T\tilde{X})^{-1}\tilde{X}^T\tilde{y}. Since \tilde{X}^T=X^T\Omega^{-1/2}, this can be written as

\hat{\beta}=(X^T\Omega^{-1/2}\Omega^{-1/2}X)^{-1}X^T\Omega^{-1/2}\Omega^{-1/2}y=(X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1}y.

This is Aitken's Generalized least squares estimator.
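
The following sketch generates hypothetical heteroscedastic data and checks that OLS on the transformed equation coincides with the closed-form GLS estimator. All names, and the diagonal choice of \Omega, are illustrative assumptions on my part.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 50, 3
X = rng.normal(size=(n, k))
beta = np.array([1.0, -2.0, 0.5])

# a positive definite Omega (hypothetical heteroscedastic errors)
Omega = np.diag(rng.uniform(0.5, 3.0, size=n))
e = rng.multivariate_normal(np.zeros(n), Omega)
y = X @ beta + e

# Omega^{-1/2} via the diagonal representation, then OLS on transformed data
lam, U = np.linalg.eigh(Omega)
Om_inv_half = U @ np.diag(lam ** -0.5) @ U.T
y_t, X_t = Om_inv_half @ y, Om_inv_half @ X
beta_gls_transform = np.linalg.solve(X_t.T @ X_t, X_t.T @ y_t)

# equivalent closed form (X^T Omega^{-1} X)^{-1} X^T Omega^{-1} y
Om_inv = np.linalg.inv(Omega)
beta_gls_direct = np.linalg.solve(X.T @ Om_inv @ X, X.T @ Om_inv @ y)

print(np.allclose(beta_gls_transform, beta_gls_direct))
```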

8 Sep 18

Applications of the diagonal representation I

When A is symmetric, we can use the representation

(1) A=UA_UU^T where A_U=diag[\lambda_1,...,\lambda_n].

1. Matrix positivity in terms of eigenvalues

Exercise 1. Let A be a symmetric matrix. It is positive (non-negative) if and only if its eigenvalues are positive (non-negative).

Proof. By definition, we have to consider the quadratic form x^TAx=x^TUA_UU^Tx=(U^Tx)^TA_UU^Tx. Denoting y=U^Tx, we know that an orthogonal matrix preserves the norm: \|y\|=\|x\|. This allows us to obtain a lower bound for the quadratic form

x^TAx=y^TA_Uy=\sum\lambda_iy_i^2\geq\min_i\lambda_i\sum y_i^2=\min_i\lambda_i\|x\|^2.

If all \lambda_i are positive (non-negative), this lower bound shows that the form is positive for x\neq 0 (non-negative). Conversely, plugging an eigenvector x=u_i into the form gives x^TAx=\lambda_i\|u_i\|^2, so positivity (non-negativity) of the form implies the same for each eigenvalue.
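
A small numerical illustration of Exercise 1; the function name is_positive_definite and the tolerance are my choices.

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Check positivity of a symmetric matrix through its eigenvalues (Exercise 1)."""
    lam = np.linalg.eigvalsh(A)
    return bool(np.all(lam > tol))

A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # eigenvalues 1 and 3
B = np.array([[1.0, 2.0], [2.0, 1.0]])     # eigenvalues 3 and -1
print(is_positive_definite(A), is_positive_definite(B))  # True False
```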

2. Analytical functions of a matrix

Definition 1. For a square matrix A all non-negative integer powers A^m are defined. This allows us to define an analytical function of a matrix f(A)=\sum_{m=0}^\infty\frac{f^{(m)}(0)}{m!}A^m whenever a function f has the Taylor decomposition f(t)=\sum_{m=0}^\infty\frac{f^{(m)}(0)}{m!}t^m, t\in R.

Example 1. The exponential function f(t)=e^t has the decomposition e^t=\sum_{m=0}^\infty\frac{1}{m!}t^m. Hence, e^{At}=\sum_{m=0}^\infty\frac{1}{m!}A^mt^m. Differentiating this gives

\frac{de^{At}}{dt}=\sum_{m=1}^\infty\frac{1}{m!}A^mmt^{m-1} (the constant term disappears) =A\sum_{m=1}^\infty\frac{1}{(m-1)!}A^{m-1}t^{m-1}=Ae^{At}.

This means that e^{At} solves the differential equation x^\prime(t)=Ax(t). To satisfy the initial condition x(t_0)=x_0, instead of e^{At} we can consider x(t)=e^{A(t-t_0)}x_0. This matrix function solves the initial value problem

(2) x^\prime(t)=Ax(t), x(t_{0})=x_0.

Calculating all powers of a matrix can be a time-consuming business. The process is facilitated by the knowledge of the diagonal representation.

Exercise 2. If A is symmetric, then A^m=UA_U^mU^{-1} for all non-negative integer m. Hence, f(A)=U\left(\sum_{m=0}^\infty\frac{f^{(m)}(0)}{m!}A_U^m\right)U^{-1}=Uf(A_U)U^{-1}.

Proof. The equation

(3) A^2=UA_UU^{-1}UA_UU^{-1}=UA_UA_UU^{-1}=UA_U^2U^{-1}=Udiag[\lambda_1^2,...,\lambda_n^2]U^{-1}

shows that to square A it is enough to square A_{U}. In a similar fashion we can find all non-negative integer powers of A.

Example 2. e^{At}=U\left(\sum_{m=0}^\infty\frac{1}{m!}diag[\lambda_{1}^m,...,\lambda_n^m]t^m\right)U^{-1}=Udiag[e^{\lambda_1t},...,e^{\lambda_nt}]U^{-1}.
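
Example 2 can be checked numerically: below, e^{At} computed from the diagonal representation is compared with a truncated Taylor series. The test matrix and the truncation at 30 terms are arbitrary choices of mine.

```python
import numpy as np
from math import factorial

A = np.array([[2.0, 1.0], [1.0, 2.0]])     # symmetric
t = 0.7

# e^{At} from the diagonal representation (Example 2)
lam, U = np.linalg.eigh(A)
expAt_diag = U @ np.diag(np.exp(lam * t)) @ U.T

# e^{At} from a truncated Taylor series (Definition 1)
expAt_series = sum(np.linalg.matrix_power(A * t, m) / factorial(m) for m in range(30))

print(np.allclose(expAt_diag, expAt_series))
```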

3. Linear differential equation

Example 1 about the exponential matrix function involves a bit of guessing. With the diagonal representation at hand, we can obtain the same result in a more logical way.

One-dimensional case. \frac{dx}{dt}=ax is equivalent to \frac{dx}{x}=adt. Upon integration this gives

\log x(t)-\log x(t_0)=\int_{t_0}^td(\log x)=\int_{t_0}^tadt=a(t-t_0)

or

(4) x(t)=e^{a(t-t_0)}x_0.

General case. In case of a 2\times 2 matrix the first equation of (2) is \frac{dx_1}{dt}=a_{11}x_1+a_{12}x_2. The fact that the x's on the right are mixed up makes the direct solution difficult. The idea is to split the system and separate the x's.

Premultiplying (2) by U^{-1} we have \frac{d(U^{-1}x)}{dt}=U^{-1}Ax=U^{-1}AUU^{-1}x or, denoting U^{-1}x=y and using (1), \frac{dy}{dt}=A_Uy. The last system is a collection of one-dimensional equations \frac{dy_i}{dt}=\lambda_iy_i, i=1,...,n. Let y_0=U^{-1}x_0 be the initial vector. From (4) y_i=e^{\lambda_i(t-t_0)}y_{0i}. In matrix form this amounts to y=e^{A_U(t-t_0)}y_0. Hence, as in Exercise 2, x=Uy=Ue^{A_U(t-t_0)}U^{-1}Uy_0=e^{A(t-t_0)}x_0.

This is the same solution obtained above. The difference is that here we assume symmetry and, as a consequence, can use Example 2.
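
Here is a sketch of the decoupling argument in numpy: the solution is assembled from the scalar solutions y_i=e^{\lambda_i(t-t_0)}y_{0i} and checked against the differential equation by a finite difference. The test matrix, initial vector and step size are illustrative.

```python
import numpy as np

A = np.array([[0.0, 1.0], [1.0, 0.0]])   # symmetric, eigenvalues -1 and 1
x0 = np.array([1.0, 2.0])
t0 = 0.0

lam, U = np.linalg.eigh(A)               # A = U diag(lam) U^T

def x(t):
    """Solution of x' = Ax, x(t0) = x0, built from the decoupled scalar solutions."""
    y0 = U.T @ x0                        # initial vector in the eigenbasis
    y = np.exp(lam * (t - t0)) * y0      # y_i = e^{lam_i (t - t0)} y_{0i}
    return U @ y                         # back to the original coordinates

# finite-difference check that x'(t) is approximately A x(t)
t, h = 0.8, 1e-6
print(np.allclose((x(t + h) - x(t - h)) / (2 * h), A @ x(t), atol=1e-5))
```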

3 Sep 18

Diagonalization of symmetric matrices

We are looking for conditions under which a matrix in some orthonormal basis takes a diagonal form.

Definition 1. We say that A is diagonalizable if there exists an orthonormal basis U=(u_1,...,u_n) such that A_U=U^{-1}AU is diagonal: A_U=diag[\lambda_1,...,\lambda_n].

The usual way of solving a mathematical problem is this. Look for necessary conditions. Then try to find weak sufficient conditions. If they coincide with the necessary ones, you have an exact solution to the problem.

Exercise 1 (necessary condition) Suppose A is diagonalizable by an orthogonal matrix. Then it is symmetric.

Proof. A diagonal matrix is obviously symmetric. Hence, from A=UA_UU^{-1}, using Exercise 1 on the orthogonal matrix inverse (U^{-1}=U^T), A^T=(U^{-1})^TA_U^TU^T=(U^T)^TA_UU^{-1}=UA_UU^{-1}=A.

Exercise 2. If A is symmetric and a subspace L\subset R^n is invariant with respect to A, then L^\perp is also an invariant subspace of A.

Proof. Let y\in L^\perp. We need to show that Ay\in L^\perp. Take any x\in L. Since L is invariant, we have Ax\in L, so (Ax)\cdot y=0. By the symmetry of A, 0=(Ax)\cdot y=x\cdot(Ay). Since x\in L is arbitrary, it follows that Ay\in L^\perp.

Exercise 3 (sufficient condition) If A is symmetric, then it is diagonalizable by an orthogonal matrix.

Proof. Let A be symmetric. By Exercise 3 of the previous post, A has at least one eigenvector u_1\in R^n. Denote by L_1 the set of vectors orthogonal to u_1. It has dimension n-1 and it is invariant by Exercise 2. By Exercise 4 of the previous post, A has an eigenvector u_2 in L_1.

Let L_2 be the set of vectors from L_1 which are orthogonal to u_2. As above, \dim L_2=n-2, L_2 is invariant and in L_2 there is an eigenvector u_3 of A.

Continuing in this way, eventually we obtain a system u_1,...,u_n. Since u_1 is orthogonal to L_1 and all of u_2,u_3,... belong to L_1, u_1 is orthogonal to all of u_2,u_3,.... Similarly, at each step u_i is orthogonal to all of u_{i+1},u_{i+2},... Thus, the system u_1,...,u_n is orthogonal. Its elements can be scaled to satisfy \|u_i\|=1 for all i.

The resulting system U=(u_1,...,u_n) will be an orthonormal basis. Its completeness follows from the fact that there are n elements in this system and they are linearly independent.

Denoting \lambda_i the eigenvalue corresponding to u_i, we have Au_i=\lambda_iu_i. This means that A is of diagonal form in the basis U. The transition matrix from the basis consisting of unit vectors e_i to u_i is orthogonal. If x\in R^n, then the same x is the vector of coefficients of x in the basis E=(e_1,...,e_n). By Exercise 3, A_U=U^{-1}AU.
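
Numerically, the main theorem is exactly what np.linalg.eigh delivers: for a symmetric A it returns the eigenvalues and an orthogonal matrix whose columns are eigenvectors. A minimal check (the random matrix is just an example):

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.normal(size=(5, 5))
A = (B + B.T) / 2                    # a symmetric matrix

# eigh returns an orthonormal system of eigenvectors as the columns of U
lam, U = np.linalg.eigh(A)

print(np.allclose(U.T @ U, np.eye(5)))          # U is orthogonal
print(np.allclose(U.T @ A @ U, np.diag(lam)))   # A_U = U^{-1} A U is diagonal
print(np.allclose(A, U @ np.diag(lam) @ U.T))   # A = U A_U U^{-1}
```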

2 Sep 18

General properties of symmetric matrices

Here we consider properties of symmetric matrices that will be used to prove their diagonalizability.

What are the differences between R^n and C^n?

Vectors in both spaces have n coordinates. In R^n we can multiply vectors by real numbers and in C^n - by complex numbers. This affects the notions of linear combinations, linear independence, dimension and scalar product. We indicate only the differences to watch for.

If in C^n we multiply vectors only by real numbers, it becomes a space of dimension 2n. Let's take n=1 to see why.

Example 1. If we take e=1, then any complex number c=a+ib is a multiple of e: c=c1=ce with the scaling coefficient c. Thus, C is a one-dimensional space in this sense. On the other hand, if only multiplication by real numbers is allowed, then we can take e_1=1, e_2=i as a basis and then c=ae_1+be_2 and C is two-dimensional. To avoid confusion, just use scaling by the right numbers.

The scalar product in R^n is given by x\cdot y=\sum x_iy_i and in C^n by x\cdot y=\sum x_i\bar{y}_i. As a result, for the second scalar product we have x\cdot (c_1y+c_2z)=\bar{c}_1x\cdot y+\bar{c}_2x\cdot z for complex c_1,c_2 (some people call this antilinearity, to distinguish it from linearity x\cdot (ay+bz)=ax\cdot y+bx\cdot z for real a,b).

Definition 1. For a matrix A with possibly complex entries we denote A'=\overline{A^T}. The matrix A' is called an adjoint or a conjugate of A.

Exercise 1. Prove that (Ax)\cdot y=x\cdot(A'y), for any x,y\in C^n.

Proof. For complex numbers we have \overline{\bar{c}}=c, \overline{c_1c_2}=\bar{c}_1\bar{c}_2. Therefore

(Ax)\cdot y=(Ax)^T\bar{y}=x^TA^T\bar{y}=x^T\overline{\overline{A^T}}\bar{y}=x^T\overline{\overline{A^T}y}=x\cdot(A'y).

Thus, when considering matrices in C^n, conjugation should be used instead of transposition. In particular, instead of symmetry A=A^T the equation A=A' should be used. Matrices satisfying the last equation are called self-adjoint. The theory of self-adjoint matrices in C^n is very similar to that of symmetric matrices in R^n. Keeping in mind two applications (time series analysis and optimization), we consider only square matrices with real entries. Even in this case one is forced to work with C^n from time to time because, in general, eigenvalues can be complex numbers.

General properties of symmetric matrices

When we extend A from R^n to C^n, Ax is defined by the same expression as before but x is allowed to be from C^n and the scalar product in R^n is replaced by the scalar product in C^n. The extension is denoted A_C.

Exercise 2. If A is symmetric, then all eigenvalues of A_C are real.

Proof. Suppose \lambda is an eigenvalue of A_C. Using Exercise 1 and the symmetry of A we have

\lambda x\cdot x=(Ax)\cdot x=x\cdot(Ax)=x\cdot(\lambda x)=\bar{\lambda}x\cdot x.

Since x\cdot x=\|x\|^2>0, we have \lambda =\bar{\lambda}. This shows that \lambda is real.

Exercise 3. If A is symmetric, then it has at least one real eigenvector.

Proof. We know that A_C has at least one complex eigenvalue \lambda. By Exercise 2, this eigenvalue must be real. Thus, we have Ax=\lambda x with some nonzero x\in C^n. Separating real and imaginary parts of x, we have x=u+iv, Au=\lambda u, Av=\lambda v with some u,v\in R^n. At least one of u,v is not zero. Thus a real eigenvector exists.

We need to generalize Exercise 3 to the case when A acts in a subspace. This is done in the next two exercises.

Definition 2. A subspace L is called an invariant subspace of A if AL\subseteq L.

Example 2. If x is an eigenvector of A, then the subspace L spanned by x is an invariant subspace of A. This is because x\in L implies Ax=\lambda x\in L.

Exercise 4. If A is symmetric and L is a non-trivial invariant subspace of A, then A has an eigenvector in L.

Proof. By the definition of an invariant subspace, the restriction of A to L defined by A_Lx=Ax, x\in L, acts from L to L. By Exercise 3, applied to A_L, it has an eigenvector in L, which is also an eigenvector of A.

Exercise 5. a) If \lambda is a (necessarily real) eigenvalue of A_R, then it is an eigenvalue of A_C. b) If \lambda is a real eigenvalue of A_C, then it is an eigenvalue of A_R. This is summarized as \sigma(A_R)=\sigma (A_C)\cap R, see the spectrum notation.

See if you can prove this yourself following the ideas used above.
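
Exercise 5 can be illustrated (not proved) numerically: np.linalg.eigvals returns the complex characteristic roots, i.e. \sigma(A_C), and keeping only the real ones gives \sigma(A_R). The two test matrices are my examples.

```python
import numpy as np

# a rotation-like matrix: no real eigenvalues, two complex characteristic roots
A = np.array([[0.0, -1.0], [1.0, 0.0]])
roots = np.linalg.eigvals(A)                           # sigma(A_C)
real_roots = roots[np.abs(roots.imag) < 1e-12].real    # sigma(A_R) = sigma(A_C) ∩ R
print(roots, real_roots)                               # [i, -i] and an empty set

# a symmetric matrix: all characteristic roots are real (Exercise 2)
S = np.array([[2.0, 1.0], [1.0, 2.0]])
print(np.linalg.eigvals(S))                            # real, up to rounding
```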

31 Aug 18

Eigenvalues and eigenvectors

Motivation: what matrix can be called simplest?

Of course, nothing can be simpler than the zero and identity matrices. What is the next level of simplicity? In the one-dimensional case, consider a linear mapping f. Any real number x can be written as x=x1. By homogeneity f(x)=xf(1). Denoting c=f(1), we see that f is just multiplication by a number: f(x)=cx.

Let's look at the 2-dimensional generalization. We know that a linear mapping is given by a matrix. Consider A=\left(  \begin{array}{cc}\lambda _1&0\\0&\lambda_2\end{array}\right). A is scaling by \lambda_1 along the x-axis:

(1) A\left(\begin{array}{c}x\\0\end{array}\right) =\left(\begin{array}{cc}\lambda_1&0\\0&\lambda_2\end{array}\right)\left(\begin{array}{c}x\\0\end{array}\right) =\left(\begin{array}{c}\lambda_1x\\  0\end{array}\right)=\lambda_1\left(\begin{array}{c}x \\0\end{array}\right).

Similarly, A is scaling by \lambda_2 along the y-axis. Matrices of type

\left(\begin{array}{ccc}\lambda_1&...&0\\...&...&...\\0&...&\lambda_{n}\end{array}\right)

are called diagonal and the short notation is diag[\lambda_1,...,\lambda_n]. Any simpler than that, and we'll be in the kindergarten.

It remains to name the things and link them to what we know

Definition 1. Based on (1), we say that \lambda is an eigenvalue of the matrix A if there exists a nonzero vector x such that

(2) Ax=\lambda x.

Such an x is called an eigenvector corresponding to \lambda .

With a zero x, (2) would be true for any \lambda, and the definition would be meaningless. Since A is a real matrix and x\in R^n is nonzero, (2) can hold only with a real \lambda; complex eigenvalues can arise only if we allow x\in C^n.

Definition 2. If one x satisfies (2), then the whole straight line passing through x satisfies (2). Moreover, if we have two eigenvectors x,y corresponding to \lambda, then for any linear combination z=ax+by we have Az=aAx+bAy=\lambda (ax+by)=\lambda z. Thus the set of all eigenvectors corresponding to one eigenvalue, completed with zero, is a subspace. We call it an eigenspace corresponding to \lambda. Its dimension is called a multiplicity of \lambda.

Exercise 1. a) The eigenspace corresponding to \lambda coincides with N(A-\lambda I) and the multiplicity of \lambda equals \dim N(A-\lambda I). b) The set of eigenvalues coincides with the set of roots of the equation

(3) \det (A-\lambda I)=0.

Proof. a) This is obvious because (2) is equivalently written as

(4) (A-\lambda I)x=0.

b) Existence of nonzero solutions of (4) is equivalent to N(A-\lambda I)\neq\{0\}, and we know that this condition is equivalent to (3).

Keep in mind that part b) of Exercise 1 implicitly assumes that both sets are of the same nature (are both subsets of either R or C).

Definition 3. (3) is called a characteristic equation and its roots - characteristic roots.

After finding characteristic roots we can plug them in (4) to find N(A-\lambda I).

Definition 4. The set of eigenvalues of A is called the spectrum of A and denoted \sigma(A).

Remark. When convenient, we use the following notation. A as a mapping from R^n to R^n will be denoted A_R and as a mapping from C^n to C^n will be denoted A_C. We have to remember that there are two versions of statement b) from Exercise 1: b') \sigma(A_R) coincides with the set of real characteristic roots. b'') \sigma(A_C) coincides with the set of complex characteristic roots.

Digression on determinants

As much as possible I try to avoid using explicit formulas for the determinant but here we have to use one. The simplest version is this:

\det A=\sum\pm a_{1,i_1}a_{2,i_2}...a_{n,i_n}.

Perhaps a verbal description is better than this formula:

1) the determinant is a sum of products of elements of the matrix.

2) Each product contains n factors and is obtained as follows: take one element a_{1,i_1} from the first row and cross out the column it belongs to; take one element a_{2,i_2} from the second row and not from the column you have crossed out, and cross out the column it belongs to; take one element a_{3,i_3} from the third row and not from the two columns you have crossed out, and cross out the column it belongs to. Continue like that. After each step, the number of crossed-out columns increases by 1. The first factor can be chosen in n ways, the second in (n-1) ways,..., and for the last factor a_{n,i_n} there will be only one choice.

3) The sign \pm depends on the choices made and doesn't matter at this point.

Let's see what this implies for finding eigenvalues.

Exercise 2. \det (A-\lambda I) is a polynomial of degree n in \lambda.

Proof. \det(A-\lambda I)=\det\left(\begin{array}{cccc}a_{11}-\lambda&a_{12}&...&a_{1n}\\a_{21}&a_{22}-\lambda&...&a_{2n}\\...&...&...&...\\a_{n1}&a_{n2}&...&a_{nn}-\lambda\end{array}\right).

The verbal description shows that \det(A-\lambda I) contains the product of diagonal elements (a_{11}-\lambda)(a_{22}-\lambda)...(a_{nn}-\lambda ), which, in turn, contains the power \lambda^n. Because of the crossing-out procedure, other products will contain powers of \lambda lower than n and there is no way the power \lambda^n can be cancelled out. Besides, \det(A-\lambda I) may contain only non-negative integer powers of \lambda not exceeding n. This proves the statement.

Exercise 3. In C^n any matrix A has at least one eigenvector.

Proof. Exercise 1 reduces the problem of finding eigenvalues to the problem of finding characteristic roots. By the fundamental theorem of algebra every non-constant single-variable polynomial has at least one complex root. Thus we have at least one eigenvalue and from (4) at least one eigenvector.

The reduction of the problem of finding eigenvalues to that of finding characteristic roots creates a difficulty: even when A has real entries, the characteristic roots may be complex. Watch later how this problem is handled.
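
To connect the definitions with computation, here is a small numpy example. For a 2\times 2 matrix the characteristic polynomial \lambda^2-\text{tr}(A)\lambda+\det A is written out by hand, and np.linalg.eig is used for comparison. The matrix is an arbitrary example.

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])

# for a 2x2 matrix, det(A - lambda I) = lambda^2 - tr(A) lambda + det(A)
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]
char_roots = np.roots(coeffs)             # characteristic roots: 3 and 2

eigvals, eigvecs = np.linalg.eig(A)       # eigenvalues and eigenvectors directly
print(np.sort(char_roots), np.sort(eigvals))

# each column of eigvecs lies in N(A - lambda I): (A - lambda I)x = 0
lam, x = eigvals[0], eigvecs[:, 0]
print(np.allclose((A - lam * np.eye(2)) @ x, 0))
```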

28 Aug 18

Orthogonal matrices

Definition 1. A square matrix A is called orthogonal if A^TA=I.

Exercise 1. Let A be orthogonal. a) A^{-1}=A^T, b) the transpose A^T is orthogonal, c) the inverse A^{-1} is orthogonal, d) |\det A|=1.

Proof. a) A^T is the left inverse of A. Hence, A is invertible and its inverse is A^T. b) AA^T=I from the inverse definition. Part c) follows from parts a) and b). d) Just apply \det to the definition to get (\det A)^{2}=1.

Exercise 2. An orthogonal matrix preserves scalar products, norms and angles.

Proof. For any vectors x,y scalar products are preserved: (Ax)\cdot(Ay)=(A^TAx)\cdot(y)=x\cdot y. Therefore vector lengths are preserved: \|Ax\| =\|x\|. Cosines of angles are preserved too, because \frac{(Ax)\cdot(Ay)}{\|Ax\|\|Ay\|}=\frac{x\cdot y}{\|x\|\| y\|}. Thus angles are preserved.

Since the origin is unchanged under any linear mapping, A0=0, Exercise 2 gives the following geometric interpretation of an orthogonal matrix: it is a rotation around the origin (angles and vector lengths are preserved, while the origin stays in place). Strictly speaking, in case \det A=1 we have a rotation and in case \det A=-1 - a reflection.

Another interpretation is suggested by the next exercise.

Exercise 3. If u_1,...,u_n is an orthonormal basis, then the matrix U=(u_1,...,u_n) is orthogonal.

Proof. Orthonormality means that u_i^Tu_j=1 if i=j and u_i^Tu_j=0, if i\neq j. This implies orthogonality of U:

(1) U^TU=\left(\begin{array}{c}u_1^T\\...\\u_n^T\end{array}\right)\left(u_1,...,u_n\right)=I.

Exercise 4. Let u_1,...,u_n and v_1,...,v_n be two orthonormal bases. Let A=V^{-1}U be the transition matrix from coordinates \xi in the basis u_1,...,u_n to coordinates \xi^\prime in the basis v_1,...,v_n. Then A is orthogonal.

Proof. By Exercise 3, both U and V are orthogonal. Hence, by Exercise 1 V^{-1} is orthogonal. It suffices to show that a product of two orthogonal matrices M,N is orthogonal: (MN)^TMN=N^TM^TMN=N^TN=I.

Remark. Equation (1) shows that columns of an orthogonal matrix constitute an orthonormal basis. The equation UU^T=I shows that the same is true for rows.
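
The exercises above are easy to verify numerically. In the sketch below an orthonormal basis is produced by the QR decomposition of a random matrix, which is a convenient shortcut of mine, not part of the post.

```python
import numpy as np

rng = np.random.default_rng(5)
U, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # columns u_1, u_2, u_3 are orthonormal

print(np.allclose(U.T @ U, np.eye(3)))         # Exercise 3: U is orthogonal
print(np.allclose(np.linalg.inv(U), U.T))      # Exercise 1a: the inverse is the transpose
print(abs(np.linalg.det(U)))                   # Exercise 1d: |det U| = 1

# Exercise 2: scalar products and norms are preserved
x, y = rng.normal(size=3), rng.normal(size=3)
print(np.allclose((U @ x) @ (U @ y), x @ y),
      np.allclose(np.linalg.norm(U @ x), np.linalg.norm(x)))
```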

27 Aug 18

Matrix similarity

Switching bases

The basis consisting of unit vectors is simple in that the coefficients in the representation x=\sum x_ie_i are exactly the components of the vector x. With other types of bases it is not like that: the dependence of coefficients in

(1) x=\sum\xi_iu_i

on x for a general basis u_1,...,u_n is not so simple.

Exercise 1. Put the basis vectors side by side, U=(u_1,...,u_n) and write the vector of coefficients \xi as a column vector. Then (1) becomes U\xi =x, so that \xi =U^{-1}x.

Proof. By the basis definition, \sum\xi_iu_i runs over R^n and therefore \text{Img}(U)=R^n. This implies \det U\neq 0. The rest is obvious.

The explicit formula from Exercise 1 shows, in particular, that the vector of coefficients is uniquely determined by and depends linearly on x. The coefficients \xi_i^\prime of x in another basis v_1,...,v_n

(2) x=\sum\xi_i^\prime v_i

may be different from those in (1). For future applications, we need to know how the coefficients in one basis are related to those in another. Put the basis vectors side by side, V=(v_1,...,v_n), and write \xi^\prime as a column vector.

Exercise 2. Let u_1,...,u_n and v_1,...,v_n be two bases in R^n. Then

(3) \xi^\prime=V^{-1}U\xi.

Proof. With our notation (1) and (2) become x=U\xi and x=V\xi^\prime. Thus, U\xi=V\xi^\prime and (3) follows.

Definition 1. The matrix in (3) is called a transition matrix from u_1,...,u_n to v_1,...,v_n.

Changing bases to analyze matrices

Suppose we want to analyze A. Fix a basis u_1,...,u_n and take any x\in R^n. We can decompose x as in (1). Then we have a vector of coefficients \xi. x can be considered an original and \xi - its reflection in a mirror or a clone in a parallel world. Instead of applying A to x we can apply it to its reflection \xi, to obtain A\xi. To get back to the original world, we can use A\xi as a vector of coefficients of a new vector \sum(A\xi )_iu_i and call this vector an image of x under a new mapping A_U:

(4) A_Ux\overset{def}{=}\sum(A\xi)_iu_i.

The transition x\rightarrow\xi is unique and the transition A\xi\rightarrow A_Ux is also unique, so definition (4) is correct.

Exercise 3. Show that

(5) A_U=UAU^{-1}.

Proof. By Exercise 1, \xi=U^{-1}x. (4) can be written as A_Ux=UA\xi. Combining these two equations we get A_Ux=UAU^{-1}x. Since this is true for all x, the statement follows.

The point of the transformation in (5) is that A_U may be in some ways simpler or better than A. Note that when we use the orthonormal basis of unit vectors, U=I and A_U=A.

Definition 2. The matrix A_U is called similar to A.
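
A short numpy sketch of Exercises 1-3 and definition (4): the bases U, V and the matrix A are random examples, and np.linalg.solve plays the role of multiplication by U^{-1} and V^{-1}.

```python
import numpy as np

rng = np.random.default_rng(6)
U = rng.normal(size=(3, 3))            # columns form a basis (invertible almost surely)
V = rng.normal(size=(3, 3))
A = rng.normal(size=(3, 3))
x = rng.normal(size=3)

xi = np.linalg.solve(U, x)             # Exercise 1: coefficients of x in the basis U
print(np.allclose(U @ xi, x))

xi_prime = np.linalg.solve(V, U @ xi)  # Exercise 2: xi' = V^{-1} U xi
print(np.allclose(V @ xi_prime, x))    # both decompositions represent the same x

A_U = U @ A @ np.linalg.inv(U)         # Exercise 3: A_U = U A U^{-1}
print(np.allclose(A_U @ x, U @ (A @ xi)))   # definition (4): A_U x = U(A xi)
```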