Sep 18

## Questions for repetition

1. Describe the steps leading to a full definition of the set of complex numbers (imaginary unit, complex numbers, operations with them, absolute value, conjugate number).
2. What are matrix analogs of a) real numbers, b) conjugation and c) an absolute value of a complex number (without proof)?
3. Write out and prove the properties of the scalar product in $C^n$.
4. Assuming the Euler formula known, derive the polar form of a complex number. What can you say about the angle $\theta$ in the polar form?
5. Prove that a quadratic form of a matrix is homogeneous of degree 2.
6. Prove that a quadratic form takes values in the set of real numbers.
7. Define positive and non-negative matrices. State Sylvester's criterion.
8. Illustrate the Sylvester criterion geometrically.
9. Show that $A^TA$ is symmetric and non-negative.
10. Give the definition of a basis and derive the formula for the coefficients $\xi$ in the decomposition $x=\sum\xi_iu_i$ for an arbitrary vector $x$.
11. How are the decomposition coefficients transformed when one passes from one basis to another?
12. Give the full picture behind the similarity definition.
13. Prove that for an orthogonal matrix, a) the inverse and transpose are the same, b) the transpose and inverse are orthogonal.
14. An orthogonal matrix preserves scalar products, norms and angles.
15. If you put elements of an orthonormal basis side by side, the resulting matrix will be orthogonal.
16. The transition matrix from one orthonormal basis to another is orthogonal.
17. Show that the product of two diagonal matrices is diagonal.
18. Define eigenvalues and eigenvectors. Why are we interested in them?
19. What is the link between eigenvalues and a characteristic equation of a matrix?
20. Prove that the characteristic equation is a polynomial of power $n$, if $A$ is of size $n\times n$.
21. Prove that in $C^n$ any matrix has at least one eigenvector.
22. A symmetric matrix in $C^n$ has only real eigenvalues.
23. If $A$ is symmetric, then it has at least one real eigenvector.
24. A symmetric matrix has at least one eigenvector in any nontrivial invariant subspace.
25. What is the relationship between the spectra of $A_C$ and $A_R$?
26. A matrix diagonalizable by an orthogonal matrix must be symmetric.
27. For a symmetric matrix, an orthogonal complement of an invariant subspace is invariant.
28. Main theorem. A symmetric matrix is diagonalizable by an orthogonal matrix.
Sep 18

## Applications of the diagonal representation IV

Principal component analysis is a general method based on diagonalization of the variance matrix. We consider it in a financial context. The variance matrix measures riskiness of the portfolio.  We want to see which stocks contribute most to the portfolio risk. The surprise is that the answer is given not in terms of the vector of returns but in terms of its linear transformation.

### 8. Principal component analysis (PCA)

Let $R$ be a column-vector of returns on $p$ stocks with the variance matrix $V(R)=E(R-ER)(R-ER)^{T}$. The idea is to find an orthogonal matrix $W$ such that $W^{-1}V(R)W=D$ is a diagonal matrix $D=diag[\lambda_1,...,\lambda_p]$ with $\lambda_1\geq...\geq\lambda_p.$

With such a matrix, instead of $R$ we can consider its transformation $Y=W^{-1}R$ for which

$V(Y)=W^{-1}V(R)(W^{-1})^T=W^{-1}V(R)W=D.$

We know that $V(Y)$ has variances $V(Y_1),...,V(Y_p)$ on the main diagonal. It follows that $V(Y_i)=\lambda_i$ for all $i.$ Variance is a measure of riskiness. Thus, the transformed variables $Y_1,...,Y_p$ are put in the order of declining risk. What follows is the realization of this idea using sample data.

In a sampling context, all population means should be replaced by their sample counterparts. Let $R^{(t)}$ be a $p\times 1$ vector of observations on $R$ at time $t.$ These observations are put side by side into a matrix $\mathbb{R}=(R^{(1)},...,R^{(n)})$ where $n$ is the number of moments in time. The population mean $ER$ is estimated by the sample mean

$\bar{\mathbb{R}}=\frac{1}{n}\sum_{t=1}^nR^{(t)}.$

The variance matrix $V(R)$ is estimated by

$\hat{V}=\frac{1}{n-1}(\mathbb{R}-\bar{\mathbb{R}}l)(\mathbb{R}-\bar{\mathbb{R}}l)^T$

where $l$ is a $1\times n$ vector of ones. It is this matrix that is diagonalized: $W^{-1}\hat{V}W=D.$

In general, the eigenvalues in $D$ are not ordered. Ordering them and correspondingly changing places of the rows of $W^{-1}$, we get a new orthogonal matrix $W_1$ (this requires a small proof) such that the eigenvalues in $W_1^{-1}\hat{V}W_1=D_1$ are ordered. There is a lot more to say about the method and its applications.
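The whole procedure can be sketched in a few lines of numpy (the returns below are simulated, purely hypothetical data; `eigh` is used because $\hat{V}$ is symmetric, and its ascending eigenvalues are reversed to obtain the declining order $\lambda_1\geq...\geq\lambda_p$):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 4, 500
R = rng.normal(size=(p, n))                 # p stocks, n periods: columns are R^(t)

Rbar = R.mean(axis=1, keepdims=True)        # sample mean, p x 1
V_hat = (R - Rbar) @ (R - Rbar).T / (n - 1) # sample variance matrix

lam, W = np.linalg.eigh(V_hat)              # ascending eigenvalues of a symmetric matrix
lam, W = lam[::-1], W[:, ::-1]              # reverse to declining order; W stays orthogonal

D = W.T @ V_hat @ W                         # W^{-1} V_hat W = D since W is orthogonal
```

The diagonal of `D` now lists the variances of the transformed variables $Y_i$ in declining order of risk.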

Sep 18

## Applications of the diagonal representation III

### 6. Absolute value of a matrix

Exercise 1. For a square matrix $A$ the matrix $A^TA$ is non-negative.

Proof. $x^TA^TAx=(Ax)^TAx=\|Ax\|^2\geq 0$ for any $x.$

For a complex number $c$ the absolute value is defined by $|c|=(\bar{c}c)^{1/2}$. Since transposition of matrices is similar to conjugation of complex numbers, this leads us to the following definition.

Definition 1. By Exercise 1 from the previous post and Exercise 1 above, the matrix $A^TA$ has non-negative eigenvalues. Hence we can define the absolute value of $A$ as the square root of $A^TA,$  $|A|=(A^TA)^{1/2}.$

### 7. Polar form

A complex nonzero number $c$ in polar form is $c=\rho e^{i\theta}$ where $\rho >0$ is the absolute value of $c$ and $\theta$ is a real angle, so that $|e^{i\theta}|=1.$ The matrix analog of this form obtains when the condition $\rho >0$ is replaced by $\det A\neq 0,$ the absolute value of $A$ from Definition 1 is used and an orthogonal matrix plays the role of $e^{i\theta}.$

Exercise 2. For a symmetric matrix $A,$ its determinant equals the product of its eigenvalues.

Proof. $\det A=\det(Udiag[\lambda_1,...,\lambda_n]U^{-1})=(\det U)(\det diag[\lambda_1,...,\lambda_n])(\det(U^{-1}))$

$=(\det U)^2\det diag[\lambda_1,...,\lambda_n]=\lambda_1...\lambda_n.$

Exercise 3. Let $\det A\neq 0.$ Put $U=A|A|^{-1}.$ Then $U$ is orthogonal and the polar form of $A$ is $A=U|A|.$

Proof. Let $\lambda_1,...,\lambda_n$ be the eigenvalues of $A^TA.$ They are real and non-negative. In fact they are all positive because by Exercise 2 their product equals $\det(A^TA)=(\det A)^2>0.$

Hence, $|A|^{-1}$ exists and it is symmetric. $U$ also exists and is orthogonal: $U^TU=(|A|^{-1})^TA^TA|A|^{-1}=|A|^{-1}|A|^2|A|^{-1}=I.$ Finally, from the definition of $U,$ $A=U|A|.$
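A minimal numpy illustration of Exercise 3 on a randomly generated (hence, generically, nonsingular) matrix $A$: $|A|$ is computed from the eigendecomposition of $A^TA,$ and $U=A|A|^{-1}$ is then checked to be orthogonal.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))             # generically det A != 0

lam, Q = np.linalg.eigh(A.T @ A)        # A^T A is symmetric, eigenvalues positive here
absA = Q @ np.diag(np.sqrt(lam)) @ Q.T  # |A| = (A^T A)^{1/2}
U = A @ np.linalg.inv(absA)             # U = A |A|^{-1}
```

By construction $A=U|A|,$ the matrix analog of $c=\rho e^{i\theta}.$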

Sep 18

## Applications of the diagonal representation II

### 4. Square root of a matrix

Definition 1. For a symmetric matrix with non-negative eigenvalues the square root is defined by

(1) $A^{1/2}=Udiag[\sqrt{\lambda_1},...,\sqrt{\lambda_n}]U^{-1}.$

Exercise 1. (1) is symmetric and satisfies $(A^{1/2})^2=A.$

Proof. By properties of orthogonal matrices

$(A^{1/2})^T=(U^{-1})^Tdiag[\sqrt{\lambda_1},...,\sqrt{\lambda_n}]U^T=A^{1/2},$

$(A^{1/2})^2=Udiag[\sqrt{\lambda_1},...,\sqrt{\lambda_n}]U^{-1}Udiag[\sqrt{\lambda_1},...,\sqrt{\lambda_n}]U^{-1}=Udiag[\lambda_1,...,\lambda_n]U^{-1}=A.$
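Definition 1 and Exercise 1 can be checked numerically. In this sketch $A=B^TB$ for a random $B,$ which guarantees non-negative eigenvalues (a small clip guards against rounding noise):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(4, 4))
A = B.T @ B                              # symmetric with non-negative eigenvalues

lam, U = np.linalg.eigh(A)
lam = np.clip(lam, 0.0, None)            # guard against tiny negative rounding errors
A_half = U @ np.diag(np.sqrt(lam)) @ U.T # (1): U diag[sqrt(lam_i)] U^{-1}
```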

### 5. Generalized least squares estimator

The error term $e$ in the multiple regression $y=X\beta +e$ under homoscedasticity and in absence of autocorrelation satisfies

(2) $V(e)=\sigma^2I,$ where $\sigma^2$ is some positive number.

The OLS estimator in this situation is given by

(3) $\hat{\beta}=(X^TX)^{-1}X^Ty.$

Now consider a more general case $V(e)=\Omega.$

Exercise 2. The variance matrix $V(e)=\Omega$ is always symmetric and non-negative.

Proof. $V(e)^T=[E(e-Ee)(e-Ee)^T]^T=V(e),$

$x^TV(e)x=Ex^T(e-Ee)(e-Ee)^Tx=E\|(e-Ee)^Tx\|^2\geq 0.$

Exercise 3. Let's assume that $\Omega$ is positive. Show that $\Omega^{-1/2}$ is symmetric and satisfies $(\Omega^{-1/2})^2=\Omega^{-1}.$

Proof. By Exercise 1 of Applications I the eigenvalues of $\Omega$ are positive. Hence its inverse $\Omega^{-1}$ exists and is given by $\Omega^{-1}=U\Omega_U^{-1}U^T$ where $\Omega_U^{-1}=diag[\lambda_1^{-1},...,\lambda_n^{-1}].$ It is symmetric as an inverse of a symmetric matrix. It remains to apply Exercise 1 to $A=\Omega^{-1}.$

Exercise 4. Find the variance of $u=\Omega^{-1/2}e$.

Solution. Using the definition of variance of a vector

$V(u)=E(u-Eu)(u-Eu)^T=\Omega^{-1/2}V(e)(\Omega^{-1/2})^T=\Omega^{-1/2}\Omega\Omega^{-1/2}=I.$

Exercise 4 suggests how to transform $y=X\beta +e$ to satisfy (2). In the equation $\Omega^{-1/2}y=\Omega^{-1/2}X\beta +\Omega^{-1/2}e$ the error $u=\Omega^{-1/2}e$ satisfies assumption (2), under which (3) is applicable. Let $\tilde{y}=\Omega^{-1/2}y,$ $\tilde{X}=\Omega^{-1/2}X.$ Then we have $\tilde{y}=\tilde{X}\beta +u$ and from (3) $\hat{\beta}=(\tilde{X}^T\tilde{X})^{-1}\tilde{X}^T\tilde{y}.$ Since $\tilde{X}^T=X^T\Omega^{-1/2},$ this can be written as

$\hat{\beta}=(X^T\Omega^{-1/2}\Omega^{-1/2}X)^{-1}X^T\Omega^{-1/2}\Omega^{-1/2}y=(X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1}y.$
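A small simulation (all data here are made up) confirming that the direct formula and the $\Omega^{-1/2}$ transformation give the same estimator; $\Omega$ is taken diagonal for simplicity, so its inverse and square root are easy to form.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one regressor
beta = np.array([1.0, 2.0])

omega = rng.uniform(0.5, 2.0, size=n)        # V(e) = Omega, diagonal with positive entries
e = np.sqrt(omega) * rng.normal(size=n)
y = X @ beta + e

# direct GLS formula
Omega_inv = np.diag(1.0 / omega)
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# the same estimator via the Omega^{-1/2} transformation and OLS
Om_half_inv = np.diag(1.0 / np.sqrt(omega))
Xt, yt = Om_half_inv @ X, Om_half_inv @ y
beta_tilde = np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)
```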
Sep 18

## Applications of the diagonal representation I

When $A$ is symmetric, we can use the representation

(1) $A=UA_UU^T$ where $A_U=diag[\lambda_1,...,\lambda_n].$

### 1. Matrix positivity in terms of eigenvalues

Exercise 1. Let $A$ be a symmetric matrix. It is positive (non-negative) if and only if its eigenvalues are positive (non-negative).

Proof. By definition, we have to consider the quadratic form $x^TAx=x^TUA_UU^Tx=(U^Tx)^TA_UU^Tx.$ Denoting $y=U^Tx,$ we know that an orthogonal matrix preserves the norm: $\|y\|=\|x\|.$ This allows us to obtain a lower bound for the quadratic form

$x^TAx=y^TA_Uy=\sum\lambda_iy_i^2\geq\min_i\lambda_i\sum y_i^2=\min_i\lambda_i\|x\|^2.$

If all $\lambda_i$ are positive (non-negative), this lower bound shows that the form is positive (non-negative). Conversely, plugging in an eigenvector $x=u_i$ gives $x^TAx=\lambda_i\|u_i\|^2,$ so positivity (non-negativity) of the form forces $\lambda_i>0$ ($\lambda_i\geq 0$).
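A quick numerical check of Exercise 1 on a $2\times 2$ example: the eigenvalues are positive, and the quadratic form respects the lower bound $\min_i\lambda_i\|x\|^2$ on random vectors.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # symmetric, eigenvalues 1 and 3
lam = np.linalg.eigvalsh(A)

rng = np.random.default_rng(4)
xs = rng.normal(size=(100, 2))
forms = np.einsum('ij,jk,ik->i', xs, A, xs)   # quadratic forms x^T A x, one per row
norms2 = (xs ** 2).sum(axis=1)                # squared norms ||x||^2
```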

### 2. Analytical functions of a matrix

Definition 1. For a square matrix $A$ all non-negative integer powers $A^m$ are defined. This allows us to define an analytical function of a matrix $f(A)=\sum_{m=0}^\infty\frac{f^{(m)}(0)}{m!}A^m$ whenever a function $f$ has the Taylor decomposition $f(t)=\sum_{m=0}^\infty\frac{f^{(m)}(0)}{m!}t^m,$ $t\in R.$

Example 1. The exponential function $f(t)=e^t$ has the decomposition $e^t=\sum_{m=0}^\infty\frac{1}{m!}t^m.$ Hence, $e^{At}=\sum_{m=0}^\infty\frac{1}{m!}A^mt^m.$ Differentiating this gives

$\frac{de^{At}}{dt}=\sum_{m=1}^\infty\frac{1}{m!}A^mmt^{m-1}$ ( the constant term disappears)

$=A\sum_{m=1}^\infty\frac{1}{(m-1)!}A^{m-1}t^{m-1}=Ae^{At}.$

This means that $e^{At}$ solves the differential equation $x^\prime(t)=Ax(t).$ To satisfy the initial condition $x(t_0)=x_0,$ instead of $e^{At}$ we can consider $x(t)=e^{A(t-t_0)}x_0.$ This matrix function solves the initial value problem

(2) $x^\prime(t)=Ax(t),$ $x(t_{0})=x_0.$

Calculating all powers of a matrix can be a time-consuming business. The process is facilitated by the knowledge of the diagonal representation.

Exercise 2. If $A$ is symmetric, then $A^m=UA_U^mU^{-1}$ for all non-negative integer $m.$ Hence, $f(A)=U\left(\sum_{m=0}^\infty\frac{f^{(m)}(0)}{m!}A_U^m\right)U^{-1}=Uf(A_U)U^{-1}.$

Proof. The equation

(3) $A^2=UA_UU^{-1}UA_UU^{-1}=UA_UA_UU^{-1}=UA_U^2U^{-1}=Udiag[\lambda_1^2,...,\lambda_n^2]U^{-1}$

shows that to square $A$ it is enough to square $A_{U}.$ In a similar fashion we can find all non-negative integer powers of $A.$

Example 2. $e^{At}=U\left(\sum_{m=0}^\infty\frac{1}{m!}diag[\lambda_{1}^m,...,\lambda_n^m]t^m\right)U^{-1}.$
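Example 2 can be verified against a truncated Taylor series. The matrix below is chosen symmetric so that the diagonal representation applies; for this particular $A$ one also knows $e^{At}=\cosh(t)I+\sinh(t)A$ because $A^2=I.$

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])               # symmetric, eigenvalues -1 and 1
lam, U = np.linalg.eigh(A)
t = 0.7
expAt = U @ np.diag(np.exp(lam * t)) @ U.T   # Example 2 with the series summed in closed form

# truncated Taylor series: sum over m of (At)^m / m!
S = np.eye(2)
term = np.eye(2)
for m in range(1, 30):
    term = term @ A * (t / m)            # term = A^m t^m / m!
    S = S + term
```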

### 3. Linear differential equation

Example 1 about the exponential matrix function involves a bit of guessing. With the diagonal representation at hand, we can obtain the same result in a more logical way.

One-dimensional case. $\frac{dx}{dt}=ax$ is equivalent to $\frac{dx}{x}=adt.$ Upon integration this gives

$\log x(t)-\log x(t_0)=\int_{t_0}^td(\log x)=\int_{t_0}^tadt=a(t-t_0)$

or

(4) $x(t)=e^{a(t-t_0)}x_0.$

General case. In case of a $2\times 2$ matrix the first equation of (2) is $\frac{dx_1}{dt}=a_{11}x_1+a_{12}x_2.$ The fact that the x's on the right are mixed up makes the direct solution difficult. The idea is to split the system and separate the x's.

Premultiplying (2) by $U^{-1}$ we have $\frac{d(U^{-1}x)}{dt}=U^{-1}Ax=U^{-1}AUU^{-1}x$ or, denoting $U^{-1}x=y$ and using (1), $\frac{dy}{dt}=A_Uy.$ The last system is a collection of one-dimensional equations $\frac{dy_i}{dt}=\lambda_iy_i,$ $i=1,...,n.$ Let $y_0=U^{-1}x_0$ be the initial vector. From (4) $y_i=e^{\lambda_i(t-t_0)}y_{0i}.$ In matrix form this amounts to $y=e^{A_U(t-t_0)}y_0.$ Hence, as in Exercise 2, $x=Uy=Ue^{A_U(t-t_0)}U^{-1}Uy_0=e^{A(t-t_0)}x_0.$

This is the same solution obtained above. The difference is that here we assume symmetry and, as a consequence, can use Example 2.
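The decoupling argument translates directly into code: with a symmetric $A,$ transform the initial vector, solve the scalar equations, and map back. The finite-difference check below is only a sanity test that $x'(t)=Ax(t)$ holds.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # symmetric
lam, U = np.linalg.eigh(A)
x0 = np.array([1.0, 0.0])
t0 = 0.0

def x(t):
    y0 = U.T @ x0                        # y_0 = U^{-1} x_0 (U orthogonal)
    y = np.exp(lam * (t - t0)) * y0      # decoupled one-dimensional solutions (4)
    return U @ y                         # back to the original coordinates
```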

Sep 18

## Diagonalization of symmetric matrices

We are looking for conditions under which a matrix in some orthonormal basis takes a diagonal form.

Definition 1. We say that $A$ is diagonalizable if there exists an orthonormal basis $U=(u_1,...,u_n)$ such that $A_U=U^{-1}AU$ is diagonal: $A_U=diag[\lambda_1,...,\lambda_n].$

The usual way of solving a mathematical problem is this. Look for necessary conditions. Then try to find weak sufficient conditions. If they coincide with the necessary ones, you have an exact solution to the problem.

Exercise 1 (necessary condition) Suppose $A$ is diagonalizable by an orthogonal matrix. Then it is symmetric.

Proof. A diagonal matrix is obviously symmetric. Hence, from $A=UA_UU^{-1}$ and using Exercise 1 on the orthogonal matrix inverse $A^T=(U^{-1})^TA_U^TU^T$ $=(U^T)^TA_UU^{-1}=A.$

Exercise 2. If $A$ is symmetric and a subspace $L\subset R^n$ is invariant with respect to $A$, then $L^\perp$ is also an invariant subspace of $A$.

Proof. Let $y\in L^\perp.$ We need to show that $Ay\in L^\perp.$ Take any $x\in L.$ Since $L$ is invariant, we have $Ax\in L.$ Hence, by Exercise 1 $0=(Ax)\cdot y=x\cdot(Ay).$ Since $x$ is arbitrary, it follows that $Ay\in L^\perp.$

Exercise 3 (sufficient condition) If $A$ is symmetric, then it is diagonalizable by an orthogonal matrix.

Proof. Let $A$ be symmetric. By Exercise 3 $A$ has at least one eigenvector $u_1\in R^n.$ Denote $L_1$ the set of vectors orthogonal to $u_1.$ It has dimension $n-1$ and it is invariant by Exercise 2. By Exercise 4 $A$ has an eigenvector $u_2$ in $L_1.$

Let $L_2$ be the set of vectors from $L_1$ which are orthogonal to $u_2.$ As above, $\dim L_2=n-2$, $L_2$ is invariant and in $L_2$ there is an eigenvector $u_3$ of $A$.

Continuing in this way, eventually we obtain a system $u_1,...,u_n.$ Since $u_1$ is orthogonal to $L_1$ and all of $u_2,u_3,...$ belong to $L_1,$ $u_1$ is orthogonal to all of $u_2,u_3,....$ Similarly, at each step $u_i$ is orthogonal to all of $u_{i+1},u_{i+2},...$ Thus, the system $u_1,...,u_n$ is orthogonal. Its elements can be scaled to satisfy $\|u_i\|=1$ for all $i.$

The resulting system $U=(u_1,...,u_n)$ will be an orthonormal basis. Its completeness follows from the fact that there are $n$ elements in this system and they are linearly independent.

Denoting $\lambda_i$ the eigenvalue corresponding to $u_i,$ we have $Au_i=\lambda_iu_i.$ This means that $A$ is of diagonal form in the basis $U.$ The transition matrix from the basis consisting of unit vectors $e_i$ to $u_i$ is orthogonal. If $x\in R^n,$ then the same $x$ is the vector of coefficients of $x$ in the basis $E=(e_1,...,e_n).$ By Exercise 3, $A_U=U^{-1}AU.$
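Numerically, the main theorem is exactly what `numpy.linalg.eigh` delivers for a symmetric matrix: an orthogonal $U$ of eigenvectors with $U^{-1}AU$ diagonal. A sketch on a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.normal(size=(4, 4))
A = (B + B.T) / 2                        # a symmetric matrix

lam, U = np.linalg.eigh(A)               # columns of U: orthonormal eigenvectors
A_U = U.T @ A @ U                        # U^{-1} A U, using U^{-1} = U^T
```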

Sep 18

## General properties of symmetric matrices

Here we consider properties of symmetric matrices that will be used to prove their diagonalizability.

### What are the differences between $R^n$ and $C^n$?

Vectors in both spaces have $n$ coordinates. In $R^n$ we can multiply vectors by real numbers and in $C^n$ - by complex numbers. This affects the notions of linear combinations, linear independence, dimension and scalar product. We indicate only the differences to watch for.

If in $C^n$ we multiply vectors only by real numbers, it becomes a space of dimension $2n.$ Let's take $n=1$ to see why.

Example 1. If we take $e=1,$ then any complex number $c=a+ib$ is a multiple of $e:$ $c=c\cdot 1=c\cdot e$ with the scaling coefficient $c.$ Thus, $C$ is a one-dimensional space in this sense. On the other hand, if only multiplication by real numbers is allowed, then we can take $e_1=1,$ $e_2=i$ as a basis and then $c=ae_1+be_2$ and $C$ is two-dimensional. To avoid confusion, just use scaling by the right numbers.

The scalar product in $R^n$ is given by $x\cdot y=\sum x_iy_i$ and in $C^n$ by $x\cdot y=\sum x_i\bar{y}_i.$ As a result, for the second scalar product we have $x\cdot (c_1y+c_2z)=\bar{c}_1x\cdot y+\bar{c}_2x\cdot z$ for complex $c_1,c_2$ (some people call this antilinearity, to distinguish it from linearity $x\cdot (ay+bz)=ax\cdot y+bx\cdot z$ for real $a,b$).

Definition 1. For a matrix $A$ with possibly complex entries we denote $A'=\overline{A^T}.$ The matrix $A'$ is called an adjoint or a conjugate of $A.$

Exercise 1. Prove that $(Ax)\cdot y=x\cdot(A'y)$ for any $x,y\in C^n.$

Proof. For complex numbers we have $\overline{\bar{c}}=c,$ $\overline{c_1c_2}=\bar{c}_1\bar{c}_2.$ Therefore

$(Ax)\cdot y=(Ax)^T\bar{y}=x^TA^T\bar{y}=x^T\overline{\overline{A^T}}\bar{y}=x^T\overline{\overline{A^T}y}=x\cdot(A'y).$

Thus, when considering matrices in $C^n,$ conjugation should be used instead of transposition. In particular, instead of symmetry $A=A^T$ the equation $A=A'$ should be used. Matrices satisfying the last equation are called self-adjoint. The theory of self-adjoint matrices in $C^n$ is very similar to that of symmetric matrices in $R^n.$ Keeping in mind two applications (time series analysis and optimization), we consider only square matrices with real entries. Even in this case one is forced to work with $C^n$ from time to time because, in general, eigenvalues can be complex numbers.
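Exercise 1 and the antilinearity property are easy to confirm numerically for random complex vectors and a random complex matrix (a minimal sketch; `dot` here implements the $C^n$ scalar product defined above):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
x = rng.normal(size=3) + 1j * rng.normal(size=3)
y = rng.normal(size=3) + 1j * rng.normal(size=3)

def dot(u, v):
    return np.sum(u * np.conj(v))        # scalar product in C^n: sum x_i conj(y_i)

A_adj = np.conj(A.T)                     # A' = conjugate transpose of A
```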

### General properties of symmetric matrices

When we extend $A$ from $R^n$ to $C^n,$ $Ax$ is defined by the same expression as before but $x$ is allowed to be from $C^n$ and the scalar product in $R^n$ is replaced by the scalar product in $C^n.$ The extension is denoted $A_C.$

Exercise 2. If $A$ is symmetric, then all eigenvalues of $A_C$ are real.

Proof. Suppose $\lambda$ is an eigenvalue of $A_C$ and $x\neq 0$ is a corresponding eigenvector. Using Exercise 1 and the symmetry of $A$ we have

$\lambda x\cdot x=(Ax)\cdot x=x\cdot(Ax)=x\cdot(\lambda x)=\bar{\lambda}x\cdot x.$

Since $x\cdot x=\|x\|^2>0,$ we have $\lambda =\bar{\lambda}.$ This shows that $\lambda$ is real.

Exercise 3. If $A$ is symmetric, then it has at least one real eigenvector.

Proof. We know that $A_C$ has at least one complex eigenvalue $\lambda$. By Exercise 2, this eigenvalue must be real. Thus, we have $Ax=\lambda x$ with some nonzero $x\in C^n.$ Separating real and imaginary parts of $x,$ we have $x=u+iv,$ $Au=\lambda u,$ $Av=\lambda v$ with some $u,v\in R^n.$ At least one of $u,v$ is not zero. Thus a real eigenvector exists.

We need to generalize Exercise 3 to the case when $A$ acts in a subspace. This is done in the next two exercises.

Definition 2. A subspace $L$ is called an invariant subspace of $A$ if $AL\subseteq L.$

Example 2. If $x$ is an eigenvector of $A,$ then the subspace $L$ spanned by $x$ is an invariant subspace of $A.$ This is because $x\in L$ implies $Ax=\lambda x\in L.$

Exercise 4. If $A$ is symmetric and $L$ is a non-trivial invariant subspace of $A$, then $A$ has an eigenvector in $L.$

Proof. By the definition of an invariant subspace, the restriction of $A$ to $L$ defined by $A_Lx=Ax,$ $x\in L,$ acts from $L$ to $L$. By Exercise 3, applied to $A_L,$ it has an eigenvector in $L$, which is also an eigenvector of $A.$

Exercise 5. a) If $\lambda$ is a (real) eigenvalue of $A_R,$ then it is an eigenvalue of $A_C.$ b) If $\lambda$ is a real eigenvalue of $A_C,$ then it is an eigenvalue of $A_R.$ This is summarized as $\sigma(A_R)=\sigma (A_C)\cap R,$ see the spectrum notation.

See if you can prove this yourself following the ideas used above.

Aug 18

## Eigenvalues and eigenvectors

### Motivation: what matrix can be called simplest?

Of course, nothing can be simpler than the zero and identity matrices. What is the next level of simplicity? In the one-dimensional case, consider a linear mapping $f.$ Any real number $x$ can be written as $x=x1.$ By homogeneity $f(x)=xf(1).$ Denoting $c=f(1),$ we see that $f$ is just multiplication by a number: $f(x)=cx.$

Let's look at the 2-dimensional generalization. We know that a linear mapping is given by a matrix. Consider $A=\left( \begin{array}{cc}\lambda _1&0\\0&\lambda_2\end{array}\right).$ $A$ is scaling by $\lambda_1$ along the $x$-axis:

(1) $A\left(\begin{array}{c}x\\0\end{array}\right) =\left(\begin{array}{cc}\lambda_1&0\\0&\lambda_2\end{array}\right)\left(\begin{array}{c}x\\0\end{array}\right) =\left(\begin{array}{c}\lambda_1x\\ 0\end{array}\right)=\lambda_1\left(\begin{array}{c}x \\0\end{array}\right).$

Similarly, $A$ is scaling by $\lambda_2$ along the $y$-axis. Matrices of type

$\left(\begin{array}{ccc}\lambda_1&...&0\\...&...&...\\0&...&\lambda_{n}\end{array}\right)$

are called diagonal and the short notation is $diag[\lambda_1,...,\lambda_n].$ Any simpler than that, and we'll be in the kindergarten.

### It remains to name the things and link them to what we know

Definition 1. Based on (1), we say that $\lambda$ is an eigenvalue of the matrix $A$ if there exists a nonzero vector $x$ such that

(2) $Ax=\lambda x.$

Such an $x$ is called an eigenvector corresponding to $\lambda .$

With a zero $x,$ (2) would be true for any $\lambda,$ and there would be no definition. Since $A$ is a real matrix, $\lambda$ in principle cannot be complex if $x\in R^n.$

Definition 2. If one $x$ satisfies (2), then the whole straight line passing through $x$ satisfies (2). Moreover, if we have two eigenvectors $x,y$ corresponding to $\lambda,$ then for any linear combination $z=ax+by$ we have $Az=aAx+bAy=\lambda (ax+by)=\lambda z.$ Thus the set of all eigenvectors corresponding to one eigenvalue, completed with zero, is a subspace. We call it an eigenspace corresponding to $\lambda.$ Its dimension is called a multiplicity of $\lambda.$

Exercise 1. a) The eigenspace corresponding to $\lambda$ coincides with $N(A-\lambda I)$ and the multiplicity of $\lambda$ equals $\dim N(A-\lambda I).$ b) The set of eigenvalues coincides with the set of roots of the equation

(3) $\det (A-\lambda I)=0.$

Proof. a) This is obvious because (2) is equivalently written as

(4) $(A-\lambda I)x=0.$

b) Existence of nonzero solutions of (4) is equivalent to $N(A-\lambda I)\neq\{0\},$ and we know that this condition is equivalent to (3).

Keep in mind that part b) of Exercise 1 implicitly assumes that both sets are of the same nature (are both subsets of either $R$ or $C$).

Definition 3. (3) is called a characteristic equation and its roots - characteristic roots.

After finding characteristic roots we can plug them in (4) to find $N(A-\lambda I).$
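For a concrete $2\times 2$ example, the characteristic roots come from $\lambda^2-(\mathrm{tr}\,A)\lambda+\det A=0,$ and plugging the largest root into (4) yields an eigenvector (found below as a null vector of $A-\lambda I$ via the SVD):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# for a 2x2 matrix: det(A - lambda I) = lambda^2 - (trace A) lambda + det A
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]
roots = np.roots(coeffs)                 # characteristic roots

lam = roots.real.max()                   # the largest root (roots are real here)
M = A - lam * np.eye(2)
x = np.linalg.svd(M)[2][-1]              # a nonzero vector in N(A - lam I)
```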

Definition 4. The set of eigenvalues of $A$ is called the spectrum of $A$ and denoted $\sigma(A).$

Remark. When convenient, we use the following notation. $A$ as a mapping from $R^n$ to $R^n$ will be denoted $A_R$ and as a mapping from $C^n$ to $C^n$ will be denoted $A_C.$ We have to remember that there are two versions of statement b) from Exercise 1: $b')$ $\sigma(A_R)$ coincides with the set of real characteristic roots. $b'')$ $\sigma(A_C)$ coincides with the set of complex characteristic roots.

### Digression on determinants

As much as possible I try to avoid using explicit formulas for the determinant but here we have to use one. The simplest version is this:

$\det A=\sum\pm a_{1,i_1}a_{2,i_2}...a_{n,i_n}.$

Perhaps a verbal description is better than this formula:

1) the determinant is a sum of products of elements of the matrix.

2) Each product contains $n$ factors and is obtained as follows: take one element $a_{1,i_1}$ from the first row and cross out the column it belongs to; take one element $a_{2,i_2}$ from the second row and not from the column you have crossed out, and cross out the column it belongs to; take one element $a_{3,i_3}$ from the third row and not from the two columns you have crossed out, and cross out the column it belongs to. Continue like that. After each step, the number of crossed-out columns increases by 1. The first factor can be chosen in $n$ ways, the second in $(n-1)$ ways,..., and for the last factor $a_{n,i_n}$ there will be only one choice.

3) The sign $\pm$ depends on the choices made and doesn't matter at this point.

Let's see what this implies for finding eigenvalues.

Exercise 2. $\det (A-\lambda I)$ is a polynomial of power $n$ in $\lambda .$

Proof. $\det(A-\lambda I)=\left|\begin{array}{cccc}a_{11}-\lambda& a_{12}&...&a_{1n}\\a_{21}&a_{22}-\lambda&...&a_{2n}\\...&...&...& ...\\a_{n1}&a_{n2}&...&a_{nn}-\lambda\end{array}\right|.$

The verbal description shows that $\det(A-\lambda I)$ contains the product of diagonal elements $(a_{11}-\lambda)(a_{22}-\lambda)...(a_{nn}-\lambda ),$ which, in turn, contains the power $\lambda^n.$ Because of the crossing-out procedure, other products will contain powers of $\lambda$ lower than $n$ and there is no way the power $\lambda^n$ can be cancelled out. Besides, $\det(A-\lambda I)$ may contain only non-negative integer powers of $\lambda$ not exceeding $n.$ This proves the statement.
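Exercise 2 can be illustrated with numpy: `np.poly` returns the coefficients of the characteristic polynomial $\det(\lambda I-A)$ (which differs from $\det(A-\lambda I)$ by the sign factor $(-1)^n$), and there are exactly $n+1$ of them, so the power is $n$ with leading coefficient $1$:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
A = rng.normal(size=(n, n))

coeffs = np.poly(A)        # coefficients of det(lambda I - A), highest power first
```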

Exercise 3. In $C^n$ any matrix $A$ has at least one eigenvector.

Proof. Exercise 1 reduces the problem of finding eigenvalues to the problem of finding characteristic roots. By the fundamental theorem of algebra every non-constant single-variable polynomial has at least one complex root. Thus we have at least one eigenvalue and from (4) at least one eigenvector.

The reduction of the problem of finding eigenvalues to the problem of finding characteristic roots poses a problem. Even when $A$ has real entries, the characteristic roots may be complex. Watch later how this problem is handled.

Aug 18

## Orthogonal matrices

Definition 1. A square matrix $A$ is called orthogonal if $A^TA=I.$

Exercise 1. Let $A$ be orthogonal. a) $A^{-1}=A^T,$ b) the transpose $A^T$ is orthogonal, c) the inverse $A^{-1}$ is orthogonal, d) $|\det A|=1.$

Proof. a) $A^T$ is the left inverse of $A.$ Hence, $A$ is invertible and its inverse is $A^T.$ b) $AA^T=I$ from the inverse definition. Part c) follows from parts a) and b). d) Just apply $\det$ to the definition to get $(\det A)^{2}=1.$

Exercise 2. An orthogonal matrix preserves scalar products, norms and angles.

Proof. For any vectors $x,y$ scalar products are preserved: $(Ax)\cdot(Ay)=(A^TAx)\cdot(y)=x\cdot y.$ Therefore vector lengths are preserved: $\|Ax\| =\|x\|.$ Cosines of angles are preserved too, because $\frac{(Ax)\cdot(Ay)}{\|Ax\|\|Ay\|}=\frac{x\cdot y}{\|x\|\| y\|}.$ Thus angles are preserved.

Since the origin is unchanged under any linear mapping, $A0=0,$ Exercise 2 gives the following geometric interpretation of an orthogonal matrix: it is rotation around the origin (angles and vector lengths are preserved, while the origin stays in place). Strictly speaking, in case $\det A=1$ we have rotation and in case $\det A=-1$ - reflection.
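A rotation matrix is the standard concrete example; the sketch below checks orthogonality and the preservation of scalar products and norms from Exercise 2 on random vectors:

```python
import numpy as np

theta = 0.3
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation by angle theta, det A = 1

rng = np.random.default_rng(8)
x, y = rng.normal(size=2), rng.normal(size=2)
```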

Another interpretation is suggested by the next exercise.

Exercise 3. If $u_1,...,u_n$ is an orthonormal basis, then the matrix $U=(u_1,...,u_n)$ is orthogonal.

Proof. Orthonormality means that $u_i^Tu_j=1$ if $i=j$ and $u_i^Tu_j=0,$ if $i\neq j.$ This implies orthogonality of $U:$

(1) $U^TU=\left(\begin{array}{c}u_1^T\\...\\u_n^T\end{array}\right)\left(u_1,...,u_n\right)=I.$

Exercise 4. Let $u_1,...,u_n$ and $v_1,...,v_n$ be two orthonormal bases. Let $A=V^{-1}U$ be the transition matrix from coordinates $\xi$ in the basis $u_1,...,u_n$ to coordinates $\xi^\prime$ in the basis $v_1,...,v_n$. Then $A$ is orthogonal.

Proof. By Exercise 3, both $U$ and $V$ are orthogonal. Hence, by Exercise 1 $V^{-1}$ is orthogonal. It suffices to show that a product of two orthogonal matrices $M,N$ is orthogonal: $(MN)^TMN=N^TM^TMN=N^TN=I.$

Remark. Equation (1) shows that columns of an orthogonal matrix constitute an orthonormal basis. The equation $UU^T=I$ shows that the same is true for rows.

Aug 18

## Matrix similarity

### Switching bases

The basis consisting of unit vectors is simple in that the coefficients in the representation $x=\sum x_ie_i$ are exactly the components of the vector $x.$ With other types of bases it is not like that: the dependence of coefficients in

(1) $x=\sum\xi_iu_i$

on $x$ for a general basis $u_1,...,u_n$ is not so simple.

Exercise 1. Put the basis vectors side by side, $U=(u_1,...,u_n)$ and write the vector of coefficients $\xi$ as a column vector. Then (1) becomes $U\xi =x,$ so that $\xi =U^{-1}x.$

Proof. By the basis definition, $\sum\xi_iu_i$ runs over $R^n$ and therefore $\text{Img}(U)=R^n.$ This implies $\det U\neq 0.$ The rest is obvious.

The explicit formula from Exercise 1 shows, in particular, that the vector of coefficients is uniquely determined by and depends linearly on $x.$ The coefficients $\xi_i^\prime$ of $x$ in another basis $v_1,...,v_n$

(2) $x=\sum\xi_i^\prime v_i$

may be different from those in (1). For future applications, we need to know how the coefficients in one basis are related to those in another. Put the basis vectors side by side, $V=(v_1,...,v_n),$ and write $\xi^\prime$ as a column vector.

Exercise 2. Let $u_1,...,u_n$ and $v_1,...,v_n$ be two bases in $R^n.$ Then

(3) $\xi^\prime=V^{-1}U\xi.$

Proof. With our notation (1) and (2) become $x=U\xi$ and $x=V\xi^\prime.$ Thus, $U\xi=V\xi^\prime$ and (3) follows.

Definition 1. The matrix in (3) is called a transition matrix from $u_1,...,u_n$ to $v_1,...,v_n$.
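A small numerical illustration with two hypothetical (non-orthonormal) bases in $R^2$: the coefficients are found as in Exercise 1 and converted as in (3), and both sets of coefficients reproduce the same $x$:

```python
import numpy as np

# basis vectors put side by side as columns (made-up invertible matrices)
U = np.array([[1.0, 1.0],
              [0.0, 1.0]])
V = np.array([[2.0, 0.0],
              [1.0, 1.0]])
x = np.array([3.0, 5.0])

xi = np.linalg.solve(U, x)              # xi = U^{-1} x, Exercise 1
xi_prime = np.linalg.solve(V, U @ xi)   # (3): xi' = V^{-1} U xi
```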

### Changing bases to analyze matrices

Suppose we want to analyze $A.$ Fix a basis $u_1,...,u_n$ and take any $x\in R^n.$ We can decompose $x$ as in (1). Then we have a vector of coefficients $\xi.$ $x$ can be considered an original and $\xi$ - its reflection in a mirror or a clone in a parallel world. Instead of applying $A$ to $x$ we can apply it to its reflection $\xi,$ to obtain $A\xi.$ To get back to the original world, we can use $A\xi$ as a vector of coefficients of a new vector $\sum(A\xi )_iu_i$ and call this vector an image of $x$ under a new mapping $A_U:$

(4) $A_Ux\overset{def}{=}\sum(A\xi)_iu_i.$

The transition $x\rightarrow\xi$ is unique and the transition $A\xi\rightarrow A_Ux$ is also unique, so definition (4) is correct.

Exercise 3. Show that

(5) $A_U=UAU^{-1}.$

Proof. By Exercise 1, $\xi=U^{-1}x.$ (4) can be written as $A_Ux=UA\xi.$ Combining these two equations we get $A_Ux=UAU^{-1}x.$ Since this is true for all $x,$ the statement follows.

The point of the transformation in (5) is that $A_U$ may be in some ways simpler or better than $A.$ Note that when we use the orthonormal basis of unit vectors, $U=I$ and $A_U=A.$
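Formula (5) is easy to check numerically: below, $A_Ux$ computed directly agrees with definition (4), i.e. applying $A$ to the coefficient vector $\xi$ and mapping back (the matrices are hypothetical examples):

```python
import numpy as np

U = np.array([[1.0, 1.0],
              [0.0, 1.0]])              # an invertible matrix of basis vectors
A = np.array([[0.0, 1.0],
              [2.0, 3.0]])

A_U = U @ A @ np.linalg.inv(U)          # (5)

x = np.array([1.0, 2.0])
xi = np.linalg.solve(U, x)              # coefficients of x in the basis U
lhs = A_U @ x                           # image under the similar matrix
rhs = U @ (A @ xi)                      # definition (4): apply A to xi, map back
```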

Definition 2. The matrix $A_U$ is called similar to $A.$