19 Feb 22

## Distribution of the estimator of the error variance

If you are reading the book by Dougherty: this post is about the distribution of the estimator  $s^2$ defined in Chapter 3.

Consider regression

(1) $y=X\beta +e$

where the deterministic matrix $X$ is of size $n\times k$ and satisfies $\det \left( X^{T}X\right) \neq 0$ (the regressors are not collinear), and the error $e$ satisfies

(2) $Ee=0,Var(e)=\sigma ^{2}I$

$\beta$ is estimated by $\hat{\beta}=(X^{T}X)^{-1}X^{T}y.$ Denote $P=X(X^{T}X)^{-1}X^{T},$ $Q=I-P.$ Using (1) we see that $\hat{\beta}=\beta +(X^{T}X)^{-1}X^{T}e$ and the residual $r\equiv y-X\hat{\beta}=Qe.$ $\sigma^{2}$ is estimated by

(3) $s^{2}=\left\Vert r\right\Vert ^{2}/\left( n-k\right) =\left\Vert Qe\right\Vert ^{2}/\left( n-k\right) .$

$Q$ is a projector; its properties are derived from those of $P$:

(4) $Q^{T}=Q,$ $Q^{2}=Q.$

If $\lambda$ is an eigenvalue of $Q,$ then multiplying $Qx=\lambda x$ by $Q$ and using the fact that $x\neq 0$ we get $\lambda ^{2}=\lambda .$ Hence eigenvalues of $Q$ can be only $0$ or $1.$ Since the trace equals the sum of the eigenvalues and $tr\left( Q\right) =tr\left( I\right) -tr\left( P\right) =n-tr\left( (X^{T}X)^{-1}X^{T}X\right) =n-k,$
the number of eigenvalues equal to 1 is $n-k$ and the remaining $k$ are zeros. Let $Q=U\Lambda U^{T}$ be the diagonal representation of $Q.$ Here $U$ is an orthogonal matrix,

(5) $U^{T}U=I,$

and $\Lambda$ is a diagonal matrix with eigenvalues of $Q$ on the main diagonal. We can assume that the first $n-k$ numbers on the diagonal of $\Lambda$ are ones and the others are zeros.

Theorem. Let $e$ be normal. 1) $s^{2}\left( n-k\right) /\sigma ^{2}$ is distributed as $\chi _{n-k}^{2}.$ 2) The estimators $\hat{\beta}$ and $s^{2}$ are independent.

Proof. 1) We have by (4)

(6) $\left\Vert Qe\right\Vert ^{2}=\left( Qe\right) ^{T}Qe=\left( Q^{T}Qe\right) ^{T}e=\left( Qe\right) ^{T}e=\left( U\Lambda U^{T}e\right) ^{T}e=\left( \Lambda U^{T}e\right) ^{T}U^{T}e.$

Denote $S=U^{T}e.$ From (2) and (5)

$ES=0,$ $Var\left( S\right) =EU^{T}ee^{T}U=\sigma ^{2}U^{T}U=\sigma ^{2}I$

and $S$ is normal as a linear transformation of a normal vector. It follows that $S=\sigma z$ where $z$ is a standard normal vector with independent standard normal coordinates $z_{1},...,z_{n}.$ Hence, (6) implies

(7) $\left\Vert Qe\right\Vert ^{2}=\sigma ^{2}\left( \Lambda z\right) ^{T}z=\sigma ^{2}\left( z_{1}^{2}+...+z_{n-k}^{2}\right) =\sigma ^{2}\chi _{n-k}^{2}.$

(3) and (7) prove the first statement.

2) First we note that the vectors $Pe,Qe$ are independent. Since they are jointly normal, their independence follows from zero covariance:

$cov(Pe,Qe)=E\left[ Pe\left( Qe\right) ^{T}\right] =PE\left[ ee^{T}\right] Q^{T}=\sigma ^{2}PQ=0$

(here we use $Q^{T}=Q$ and $PQ=P-P^{2}=0$).

It's easy to see that $X^{T}P=X^{T}.$ This allows us to show that $\hat{\beta}$ is a function of $Pe$:

$\hat{\beta}=\beta +(X^{T}X)^{-1}X^{T}e=\beta +(X^{T}X)^{-1}X^{T}Pe.$

Independence of $Pe,Qe$ leads to independence of their functions $\hat{\beta}$ and $s^{2}.$
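The algebra above is easy to verify numerically. Here is a minimal Python sketch in exact rational arithmetic (the small design matrix is an arbitrary illustration, not from the post) checking the projector properties (4) and $tr(Q)=n-k$:

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

# design matrix with n = 4 observations and k = 2 regressors (intercept and trend)
X = [[F(1), F(1)], [F(1), F(2)], [F(1), F(3)], [F(1), F(4)]]
n, k = 4, 2

G = matmul(transpose(X), X)                      # X^T X (2x2)
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
Ginv = [[G[1][1] / det, -G[0][1] / det],
        [-G[1][0] / det, G[0][0] / det]]         # (X^T X)^{-1} by the 2x2 inverse formula

P = matmul(matmul(X, Ginv), transpose(X))        # P = X (X^T X)^{-1} X^T
Q = [[(F(1) if i == j else F(0)) - P[i][j] for j in range(n)] for i in range(n)]

assert transpose(Q) == Q                         # Q^T = Q
assert matmul(Q, Q) == Q                         # Q^2 = Q
assert sum(Q[i][i] for i in range(n)) == n - k   # tr(Q) = n - k
```

Exact fractions make the identities hold exactly, avoiding the rounding tolerances a floating-point check would need.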

24 Aug 19

## Sylvester's criterion

Exercise 1. Suppose $A=CBC^{T},$ where $B=diag[d_{1},...,d_{n}]$ and $\det C\neq 0$. Then $A$ is positive if and only if all $d_{i}$ are positive.

Proof. For any $x\neq 0$ we have $x^{T}Ax=x^{T}CBC^{T}x=(C^{T}x)^{T}B(C^{T}x).$ Let $y=C^{T}x\neq 0.$ Then $x^{T}Ax=y^{T}By=\sum_{j}d_{j}y_{j}^{2}.$ This is positive for all $y\neq 0$ if and only if $\min_{j}d_{j}>0.$

Exercise 2 (modified Gaussian elimination). Suppose that $A$ is a real symmetric matrix with nonzero leading principal minors $d_{1},...,d_{n}$. Then $B=CAC^{T},$ where $B=diag[d_{1},d_{2}/d_{1},...,d_{n}/d_{n-1}]$ and $\det C=1$.

Proof. Recall the transformation used in the Gaussian elimination post to obtain a triangular form. There we eliminated the element $a_{21}$ below $a_{11}$ by premultiplying $A$ by the matrix $C=I-\frac{a_{21}}{a_{11}}e_{2}^{T}e_{1}.$ Now after this we can post-multiply $A$ by the matrix $C^{T}=I-\frac{a_{21}}{a_{11}}e_{1}^{T}e_{2}.$ Because of the assumed symmetry of $A,$ we have $C^{T}=I-\frac{a_{12}}{a_{11}}e_{1}^{T}e_{2},$ so this post-multiplication eliminates the element $a_{12}$ to the right of $a_{11}$ (see Exercise 2 on elementary transformations). Since in the first column $a_{21}$ is already $0,$ the diagonal element $a_{22}$ will not change.

We modify the elimination by removing $a_{1j}$ immediately after removing $a_{j1}.$ The right sequencing of transformations is necessary to be able to apply Exercise 1: the matrix used for post-multiplication should be the transpose of the matrix used for premultiplication. If $C=C_{m}...C_{1},$ then $C^{T}=C_{1}^{T}...C_{m}^{T},$ which means that premultiplication by $C_{i}$ should be followed by post-multiplication by $C_{i}^{T}.$ In this way we can annihilate all off-diagonal elements. The resulting matrix $B=diag[d_{1},d_{2}/d_{1},...,d_{n}/d_{n-1}]$ is related to $A$ through $B=CAC^{T}.$

Theorem (Sylvester) Suppose that $A$ is a real symmetric matrix. Then $A$ is positive if and only if all its leading principal minors are positive.

Proof. Let's assume that all leading principal minors are positive. By Exercise 2, we have $A=C^{-1}B(C^{-1})^{T}$ where $\det C=1.$ It remains to apply Exercise 1 above to see that $A$ is positive.

Now suppose that $A$ is positive, that is, $x^{T}Ax=\sum_{i,j=1}^{n}a_{ij}x_{i}x_{j}>0$ for any $x\neq 0.$ Consider the cut-off matrices $A_{k}=\left( a_{ij}\right) _{i,j=1}^{k}.$ The corresponding cut-off quadratic forms $x^{T}A_{k}x=\sum_{i,j=1}^{k}a_{ij}x_{i}x_{j},$ $k=1,...,n,$ are positive for nonzero $x\in R^{k}.$ It follows that the $A_{k}$ are non-singular, because a nonzero $x\in N(A_{k})$ would give $x^{T}A_{k}x=0.$ Hence their determinants $d_{k}=\det A_{k},$ $k=1,...,n,$ are nonzero. This allows us to apply the modified Gaussian elimination (Exercise 2) and then Exercise 1 with $B=diag[d_{1},d_{2}/d_{1},...,d_{n}/d_{n-1}].$ By Exercise 1 the diagonal entries of $B$ are positive, so consecutively $d_{1}>0,$ $d_{2}>0,...,$ $d_{n}>0.$
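The criterion is easy to check on a concrete case. Below is a Python sketch (the tridiagonal matrix is a hypothetical example of a positive definite matrix); it also shows the sign alternation of the minors of $-A$:

```python
from fractions import Fraction as F

def det(M):
    # cofactor expansion along the first row; fine for small matrices
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

# a symmetric positive definite matrix (hypothetical example)
A = [[F(2), F(1), F(0)],
     [F(1), F(2), F(1)],
     [F(0), F(1), F(2)]]

minors = [det([row[:k] for row in A[:k]]) for k in (1, 2, 3)]
assert minors == [2, 3, 4] and all(d > 0 for d in minors)   # all leading minors positive

negA = [[-a for a in row] for row in A]
neg_minors = [det([row[:k] for row in negA[:k]]) for k in (1, 2, 3)]
assert neg_minors == [-2, 3, -4]    # signs alternate, starting with minus
```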

Exercise 3. $A$ is negative if and only if its leading principal minors alternate in sign, starting with minus: $d_{1}<0,$ $d_{2}>0,$ $d_{3}<0,...$

Proof. By definition, $A$ is negative if $-A$ is positive. Because of homogeneity of determinants, when we pass from $A$ to $-A,$ the minor of order $k$ gets multiplied by $(-1)^{k}.$ Thus, by Sylvester's criterion $A$ is negative if and only if $(-1)^{k}d_{k}>0,$ as required.

19 Aug 19

## Gaussian elimination method

Consider the system

$a_{11}x+a_{12}y+a_{13}z=a_{1},\\ a_{21}x+a_{22}y+a_{23}z=a_{2},\\ a_{31}x+a_{32}y+a_{33}z=a_{3}.$

If the first equation is nontrivial, then one of the coefficients is different from zero. Suppose it is $a_{11}.$ Adding the first equation multiplied by $-a_{21}/a_{11}$ to the second one we eliminate $x$ from it. Similarly, adding the first equation multiplied by $-a_{31}/a_{11}$ to the third one we eliminate $x$ from it. The system becomes

$a_{11}x+a_{12}y+a_{13}z=a_{1},\\ \quad \ \ \ b_{22}y+b_{23}z=b_{2},\\ \ \ \ \ b_{32}y+b_{33}z=b_{3},$

where $b$ with indexes stands for new numbers.

If $b_{22}\neq 0,$ we can use the second equation to eliminate $y$ from the third equation and the result will be

$a_{11}x+a_{12}y+a_{13}z=a_{1},\\ b_{22}y+b_{23}z=b_{2},\\ c_{33}z=c_{3}$

with some new $c$'s. If $c_{33}\neq 0,$ we can solve the system backwards, finding first $z$ from the third equation, then $y$ from the second and, finally $x$ from the first.

Notice that for the method to work it does not matter what happens to the right-hand side of the system; it only matters what happens to $A.$ That's why we focus on transformations of $A.$
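The procedure just described is short to code. Here is a minimal Python sketch with forward elimination and back substitution (the particular $3\times 3$ system is a made-up example with nonzero pivots):

```python
from fractions import Fraction as F

# solve the 3x3 system A x = b by forward elimination + back substitution
A = [[F(2), F(1), F(-1)],
     [F(-3), F(-1), F(2)],
     [F(-2), F(1), F(2)]]
b = [F(8), F(-11), F(-3)]

n = 3
for i in range(n):                     # eliminate entries below the pivot a_ii
    for r in range(i + 1, n):
        m = A[r][i] / A[i][i]          # assumes the pivot a_ii is nonzero
        A[r] = [arj - m * aij for arj, aij in zip(A[r], A[i])]
        b[r] -= m * b[i]

x = [F(0)] * n
for i in range(n - 1, -1, -1):         # back substitution, last equation first
    x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]

assert x == [2, 3, -1]                 # the unique solution of this system
```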

## Theoretical treatment

Exercise 1. Denote

$d_{1}=a_{11},\ d_{2}=\det \left(\begin{array}{cc} a_{11} & a_{12} \\a_{21} & a_{22}\end{array}\right) ,..., \ d_{n}=\det A$

the leading principal minors of $A.$ If all of them are different from zero, then the Gaussian method reduces $A$ (by way of premultiplication of $A$ by elementary matrices) to the triangular matrix

$B=\left(\begin{array}{cccc}b_{11} & b_{12} & ... & b_{1n} \\ 0 & b_{22} & ... & b_{2n} \\... & ... & ... & ... \\0 & 0 & ... & b_{nn}\end{array}\right)$

where $b_{11}=d_{1},\ b_{22}=d_{2}/d_{1},...,\ b_{nn}=d_{n}/d_{n-1}.$

Proof by induction. Let $n=2.$ Premultiplying $A=\left(\begin{array}{cc} a_{11} & a_{12} \\a_{21} & a_{22}\end{array}\right)$ by $I-\frac{a_{21}}{a_{11}}e_{2}^{T}e_{1}$ we get the desired result:

$B=\left(\begin{array}{cc}a_{11} & a_{12} \\0 & a_{22}-\frac{a_{21}a_{12}}{a_{11}} \end{array}\right) =\left(\begin{array}{cc}d_{1} & a_{12} \\0 & \frac{d_{2}}{d_{1}} \end{array}\right) .$

Now let the statement hold for $n-1.$ By the induction assumption we can start with the matrix of form

$A=\left(\begin{array}{ccccc}d_{1} & b_{12} & ... & b_{1,n-1} & b_{1n} \\ 0 & d_{2}/d_{1} & ... & b_{2,n-1} & b_{2n} \\ ... & ... & ... & ... & ... \\ 0 & 0 & ... & d_{n-1}/d_{n-2} & b_{n-1,n} \\ a_{n1} & a_{n2} & ... & a_{n,n-1} & a_{nn}\end{array}\right) .$

Using the condition that all of $d_{1},...,d_{n-1}$ are different from zero, we can annihilate the elements $a_{n1},...,a_{n,n-1}$ in the last row. The result is:

$B=\left(\begin{array}{ccccc}d_{1} & b_{12} & ... & b_{1,n-1} & b_{1n} \\ 0 & d_{2}/d_{1} & ... & b_{2,n-1} & b_{2n} \\ ... & ... & ... & ... & ... \\ 0 & 0 & ... & d_{n-1}/d_{n-2} & b_{n-1,n} \\ 0 & 0 & ... & 0 & b_{nn}\end{array}\right) .$

The determinant of a triangular matrix equals the product of diagonal elements because of the cross-out rule:

(1) $\det B=d_{1}(d_{2}/d_{1})...(d_{n-1}/d_{n-2})b_{nn}=d_{n-1}b_{nn}$

(for example, if a term of the expansion contains $b_{1n},$ then the first row is crossed out, and the factor from the first column must be one of the zeros below $d_{1}$).

On the other hand, $B$ is a result of premultiplication by elementary matrices: $B=C_{1}...C_{n-1}A,$ where $\det C_{i}=1$ for all $i$ by Exercise 1d on elementary transformations. Hence,

(2) $\det B=\det A=d_{n}.$

Combining (1) and (2) we get $b_{nn}=d_{n}/d_{n-1},$ which concludes the inductive argument.
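Exercise 1 can be verified numerically: run the elimination without pivoting in exact arithmetic and compare the resulting diagonal with the ratios of leading principal minors. The matrix below is an arbitrary example with nonzero leading minors:

```python
from fractions import Fraction as F

def det(M):
    # cofactor expansion along the first row; fine for small matrices
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

A = [[F(2), F(1), F(0)],
     [F(1), F(2), F(1)],
     [F(0), F(1), F(2)]]                 # arbitrary example with nonzero leading minors

d = [det([row[:k] for row in A[:k]]) for k in (1, 2, 3)]   # leading principal minors

B = [row[:] for row in A]
n = 3
for i in range(n):                       # Gaussian elimination without pivoting
    for r in range(i + 1, n):
        m = B[r][i] / B[i][i]
        B[r] = [brj - m * bij for brj, bij in zip(B[r], B[i])]

assert d == [2, 3, 4]
assert [B[0][0], B[1][1], B[2][2]] == [d[0], d[1] / d[0], d[2] / d[1]]   # b_kk = d_k / d_{k-1}
```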

19 Aug 19

## Elementary transformations

Here we look at matrix representations of transformations called elementary.

Exercise 1. Let $e_{i}$ denote the $i$-th unit row vector and let $A$ be an arbitrary matrix. Then

a) premultiplication of $A$ by $e_{j}$ cuts out of $A$ the $j$-th row $A_{j}.$

b) Premultiplication of $A$ by $e_{i}^{T}e_{j}$ puts the row $A_{j}$ into the $i$-th row of the null matrix.

c) Premultiplication of $A$ by $I+ce_{i}^{T}e_{j}$ adds row $A_{j}$ multiplied by $c$ to row $A_{i},$ without changing the other rows of $A.$

d) The matrix $I+ce_{i}^{T}e_{j}$ has determinant 1.

Proof. a) It's easy to see that

$e_{j}A=\left( 0...0~1~0...0\right) \left(\begin{array}{ccc}a_{11} & ... & a_{1n} \\ ... & ... & ... \\a_{j1} & ... & a_{jn} \\... & ... & ... \\a_{n1} & ... & a_{nn}\end{array} \right) =\left(\begin{array}{ccc}a_{j1} & ... & a_{jn}\end{array}\right) =A_{j}.$

b) Obviously,

$e_{i}^{T}e_{j}A=\left(\begin{array}{c}0 \\... \\0 \\1 \\0 \\... \\0\end{array} \right) \left(\begin{array}{ccc}a_{j1} & ... & a_{jn}\end{array} \right) =\left(\begin{array}{ccc}0 & ... & 0 \\... & ... & ... \\a_{j1} & ... & a_{jn} \\... & ... & ... \\ 0 & ... & 0\end{array} \right) =\left(\begin{array}{c}\Theta \\A_{j} \\\Theta\end{array} \right)$

($A_{j}$ in the $i$-th row, $\Theta$ denotes null matrices of conformable dimensions)

c) $(I+ce_{i}^{T}e_{j})A=A+ce_{i}^{T}e_{j}A=\left(\begin{array}{c} A_{1} \\... \\A_{i} \\... \\A_{n}\end{array} \right) +\left(\begin{array}{c}\Theta \\... \\cA_{j} \\... \\\Theta\end{array} \right) =\left(\begin{array}{c}A_{1} \\... \\A_{i}+cA_{j} \\ ... \\A_{n}\end{array}\right) .$

d) The matrix $A=I+ce_{i}^{T}e_{j}$ (with $i\neq j$) has ones on the main diagonal and only one nonzero element $a_{ij}=c$ outside it. By the Leibniz formula its determinant is 1.

The reader can easily solve the next

Exercise 2. a) Postmultiplication of $A$ by $e_{j}^{T}$ cuts out of $A$ the $j$-th column $A^{(j)}.$

b) Postmultiplication of $A$ by $e_{j}^{T}e_{i}$ puts the column $A^{(j)}$ into the $i$-th column of the null matrix.

c) Postmultiplication of $A$ by $I+ce_{j}^{T}e_{i}$ adds column $A^{(j)}$ multiplied by $c$ to column $A^{(i)},$ without changing the other columns of $A.$

d) The matrix $I+ce_{j}^{T}e_{i}=(I+ce_{i}^{T}e_{j})^{T}$ has determinant 1.
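Exercises 1c and 2c are easy to confirm numerically. A Python sketch in exact arithmetic (the $3\times 3$ matrix and the choice of $i,j,c$ are arbitrary):

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

n, i, j, c = 3, 2, 0, F(5)     # 0-based indices: add c * (first row) to the third row
E = [[F(1) if r == s else F(0) for s in range(n)] for r in range(n)]
E[i][j] = c                    # E = I + c e_i^T e_j

A = [[F(1), F(2), F(3)],
     [F(4), F(5), F(6)],
     [F(7), F(8), F(9)]]       # arbitrary matrix

EA = matmul(E, A)              # row operation (Exercise 1c)
assert EA[0] == A[0] and EA[1] == A[1]                    # other rows unchanged
assert EA[2] == [a + c * b for a, b in zip(A[2], A[0])]   # row 3 gains c * row 1

Et = [list(r) for r in zip(*E)]
AE = matmul(A, Et)             # column operation (Exercise 2c): postmultiply by E^T
assert [row[i] for row in AE] == [row[i] + c * row[j] for row in A]
```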

Exercise 3. a) Premultiplication of $A$ by

(1) $\left(\begin{array}{c}e_{1} \\... \\e_{j} \\... \\e_{i} \\... \\e_{n}\end{array} \right)$

permutes rows $A_{i},A_{j}.$

b) Postmultiplication of $A$ by the transpose of (1) permutes columns $A^{(i)},A^{(j)}.$

This is a general property of permutation matrices. Recall also that their determinants can be only $\pm 1.$

Definition. 1) Adding some row multiplied by a constant to another row or 2) adding some column multiplied by a constant to another column or 3) permuting rows or columns is called an elementary operation. Accordingly, matrices that realize them are called elementary matrices.

2 Aug 19

## Main theorem: Jordan normal form

By Exercise 1 on chipping off root subspaces, it is sufficient to show that in each root subspace the matrix takes the Jordan form.

Step 1. Take a basis

(1) $x_{1,p},...,x_{k,p}\in N_{\lambda }^{(p)}$ independent relative to $N_{\lambda }^{(p-1)}.$

Consecutively define

(2) $x_{1,p-1}=(A-\lambda I)x_{1,p},\ ...,\ x_{k,p-1}=(A-\lambda I)x_{k,p}\in N_{\lambda }^{(p-1)},$

...

(3) $x_{1,1}=(A-\lambda I)x_{1,2},\ ...,\ x_{k,1}=(A-\lambda I)x_{k,2}\in N_{\lambda }^{(1)}.$

Exercise 1. The vectors in (2) are linearly independent relative to $N_{\lambda }^{(p-2)},...,$ the vectors in (3) are linearly independent.

Proof. Consider (2), for example. Suppose that $\sum_{j=1}^{k}a_{j}x_{j,p-1}\in N_{\lambda }^{(p-2)}$ with at least one $a_{j}\neq 0.$ Then

$0=(A-\lambda I)^{p-2}\sum_{j=1}^{k}a_{j}x_{j,p-1}=(A-\lambda I)^{p-1}\sum_{j=1}^{k}a_{j}x_{j,p}.$

The conclusion that $\sum_{j=1}^{k}a_{j}x_{j,p}\in N_{\lambda}^{(p-1)}$ contradicts assumption (1).

Exercise 2. The system of $kp$ vectors listed in (1)-(3) is linearly independent, so that its span $L_{x}$ is of dimension $kp.$

Proof. Suppose $\sum_{j=1}^{k}a_{j,p}x_{j,p}+...+\sum_{j=1}^{k}a_{j,1}x_{j,1}=0.$ Then by inclusion relations

$\sum_{j=1}^{k}a_{j,p}x_{j,p}=-\sum_{j=1}^{k}a_{j,p-1}x_{j,p-1}-...- \sum_{j=1}^{k}a_{j,1}x_{j,1}\in N_{\lambda }^{(p-1)}$

which implies $a_{j,p}=0$ for $j=1,...,k,$ by relative independence stated in (1). This process can be continued by Exercise 1 to show that all coefficients are zeros.

Next we show that in each of $N_{\lambda }^{(p)},...,N_{\lambda}^{(1)}$ we can find a basis relative to the lower indexed subspace $N_{\lambda }^{(p-1)},...,N_{\lambda }^{(0)}=\{0\}.$ According to (1), in $N_{\lambda }^{(p)}$ we already have such a basis. If the vectors in (2) constitute such a basis in $N_{\lambda }^{(p-1)}$, we consider $N_{\lambda }^{(p-2)}.$

Step 2. If not, by Exercise 3 we can find vectors

$y_{1,p-1},...,y_{l,p-1}\in N_{\lambda }^{(p-1)}$

such that

$x_{1,p-1},...,x_{k,p-1},y_{1,p-1},...,y_{l,p-1}$ represent a basis relative to $N_{\lambda }^{(p-2)}.$

Then we can define

$y_{1,p-2}=(A-\lambda I)y_{1,p-1},\ ...,\ y_{l,p-2}=(A-\lambda I)y_{l,p-1}\in N_{\lambda }^{(p-2)},$

...

$y_{1,1}=(A-\lambda I)y_{1,2},\ ...,\ y_{l,1}=(A-\lambda I)y_{l,2}\in N_{\lambda }^{(1)}.$

By Exercise 2, the $y$'s defined here are linearly independent. But we can show more:

Exercise 3. All $x$'s from Step 1 combined with the $y$'s from Step 2 are linearly independent.

The proof is analogous to that of Exercise 2.

Denote $L_{y}$ the span of the vectors introduced in Step 2. $L_{x}\cap L_{y}=\{0\}$ because the combined system of $x$'s and $y$'s is linearly independent (Exercise 3). Therefore we can consider the direct sum $L_{x}\dotplus L_{y}.$ Repeating Step 2 as many times as necessary, after the last step we obtain a subspace, say, $L_{z},$ such that $N_{\lambda }^{(p)}=L_{x}\dotplus L_{y}\dotplus ...\dotplus L_{z}.$ The restrictions of $A$ onto the subspaces on the right are described by Jordan cells with the same $\lambda$ and possibly different dimensions. We have proved the following theorem:

Theorem (Jordan form) For a matrix $A$ in $C^{n}$ one can find a basis in which $A$ can be written as a block-diagonal matrix

(4) $A=\left(\begin{array}{ccc}A_{1} & ... & ... \\... & ... & ... \\... & ... & A_{m}\end{array}\right) .$

Here $A_{i}$ are (square) Jordan cells, with possibly different lambdas on the main diagonal and of possibly different sizes, and all off-diagonal blocks are zero matrices of compatible dimensions.

31 Jul 19

## Playing with bases

Here is one last push before the main result. Exercise 1 is one of the basic facts about bases.

Exercise 1. In $C^{n}$ any system of $k<n$ linearly independent vectors $x_{1},...,x_{k}$ can be completed to form a basis.

Proof. Let $e_{1},...,e_{n}$ be a basis in $C^{n}.$ If each of $e_{1},...,e_{n}$ belonged to $span(x_{1},...,x_{k}),$ then by the lemma we would have $n\leq k,$ contradicting the assumption $k<n.$ Hence, among $e_{1},...,e_{n}$ there is at least one vector that does not belong to $span(x_{1},...,x_{k}).$ We add it to $x_{1},...,x_{k},$ denoting it $x_{k+1}.$

Suppose $\sum_{j=1}^{k+1}a_{j}x_{j}=0.$ Since $x_{k+1}$ does not belong to $span(x_{1},...,x_{k}),$ we have $a_{k+1}=0,$ and then by independence of $x_{1},...,x_{k}$ all other coefficients are zero. Thus, $x_{1},...,x_{k},x_{k+1}$ are linearly independent.

If $k+1<n,$ we can repeat the addition process until we obtain $n$ linearly independent vectors $x_{1},...,x_{n}.$ By construction, $e_{1},...,e_{n}$ belong to $span(x_{1},...,x_{n}).$ Since $e_{1},...,e_{n}$ span $C^{n},$ $x_{1},...,x_{n}$ do too and therefore form a basis.
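The completion procedure of Exercise 1 can be sketched in Python: greedily add those standard unit vectors that increase the rank (the starting vectors below are an arbitrary example with real entries, $k=2<n=3$):

```python
from fractions import Fraction as F

def rank(rows):
    # row reduction over the rationals
    M = [[F(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                m = M[i][c] / M[r][c]
                M[i] = [a - m * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def complete_to_basis(xs, n):
    # greedily add standard unit vectors e_j that keep the system independent
    basis = [list(x) for x in xs]
    for j in range(n):
        e = [1 if i == j else 0 for i in range(n)]
        if rank(basis + [e]) > rank(basis):
            basis.append(e)
    return basis

xs = [[1, 1, 0], [0, 1, 1]]            # k = 2 independent vectors in a 3-dimensional space
B = complete_to_basis(xs, 3)
assert len(B) == 3 and rank(B) == 3    # completed to a basis of the whole space
```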

Definition 1. Let $L_{1}\subset L_{2}$ be two subspaces. Vectors $x_{1},...,x_{m}\in L_{2}$ are called linearly independent relative to $L_{1}$ if no nontrivial linear combination $\sum a_{j}x_{j}$ belongs to $L_{1}.$ For the purposes of this definition, it is convenient to denote by $\Theta$ a generic element of $L_{1}.$ $\Theta$ plays the role of zero, and the definition looks similar to usual linear independence: $\sum a_{j}x_{j}\neq \Theta$ for any nonzero vector $a.$ Negating this definition, we say that $x_{1},...,x_{m}\in L_{2}$ are linearly dependent relative to $L_{1}$ if $\sum a_{j}x_{j}=\Theta$ for some nonzero vector $a.$

Definition 2. Let $L_{1}\subset L_{2}$ be two subspaces. Vectors $x_{1},...,x_{m}\in L_{2}$ are called a basis relative to $L_{1}$ if they are linearly independent and can be completed by some basis from $L_{1}$ to form a basis in $L_{2}.$

Exercise 2. Show existence of a relative basis in $L_{2}.$

Proof. Take any basis in $L_{1}$ (say, $x_{1},...,x_{k}$) and, using Exercise 1, complete it by some vectors (say, $x_{k+1},...,x_{n}\in L_{2}$) to get a basis in $L_{2}.$ Then, obviously, $x_{k+1},...,x_{n}$ form a basis in $L_{2}$ relative to $L_{1}.$ Besides, none of $x_{k+1},...,x_{n}$ belongs to $L_{1}.$

Exercise 3. Any system of vectors $x_{1},...,x_{k}\in L_{2}$ linearly independent relative to $L_{1}$ can be completed to form a relative basis in $L_{2}.$

Proof. Take a basis in $L_{1}$ (say, $x_{k+1},...,x_{l}$) and add it to $x_{1},...,x_{k}.$ The resulting system $x_{1},...,x_{k},x_{k+1},...,x_{l}$ is linearly independent. Indeed, if $\sum_{j=1}^{l}a_{j}x_{j}=0,$ then

$\sum_{j=1}^{k}a_{j}x_{j}=-\sum_{j=k+1}^{l}a_{j}x_{j}\in L_{1}.$

By assumption of relative linear independence $a_{1}=...=a_{k}=0$ but then the remaining coefficients are also zero.

By Exercise 1 we can find $x_{l+1},...,x_{n}$ such that $x_{1},...,x_{k},x_{k+1},...,x_{l},x_{l+1},...,x_{n}$ is a basis in $L_{2}.$ Now the system $x_{1},...,x_{k},x_{l+1},...,x_{n}$ is a relative basis because these vectors are linearly independent and together with $x_{k+1},...,x_{l}\in L_{1}$ form a basis in $L_{2}.$

31 Jul 19

## Chipping off root subspaces

The basis in which a matrix is diagonal consists of eigenvectors. Therefore if the number of linearly independent eigenvectors is less than $n,$ such a matrix cannot be diagonalized.

The general result we are heading to is that any matrix in $C^{n}$ in an appropriately chosen basis can be written as a block-diagonal matrix

(1) $A=\left(\begin{array}{ccc}A_{1} & ... & ... \\ ... & ... & ... \\... & ... & A_{m}\end{array}\right) .$

Here $A_{i}$ are Jordan cells, with possibly different lambdas on the main diagonal and of possibly different sizes, and all off-diagonal blocks are zero matrices of compatible dimensions. The next exercise is an intermediate step towards that result.

Exercise 1. Let $A$ have $k$ different eigenvalues $\lambda _{1},...,\lambda_{k}.$ Then $C^{n}$ can be represented as a direct sum of $k$ invariant subspaces

(2) $C^{n}=N_{\lambda _{1}}^{(p_{1})}\dotplus ...\dotplus N_{\lambda _{k}}^{(p_{k})}.$

The subspace $N_{\lambda _{i}}^{(p_{i})}$ consists of only root vectors belonging to the eigenvalue $\lambda _{i}.$

Proof. By Exercise 2 on properties of root subspaces we have $C^{n}=N_{\lambda _{1}}^{(p_{1})}\dotplus L_{1}$ where the subspace $L_{1}$ has two properties: it is invariant with respect to $A,$ and the restriction of $A$ to $L_{1}$ does not have $\lambda _{1}$ as an eigenvalue. $N_{\lambda _{1}}^{(p_{1})}$ consists of root vectors belonging to $\lambda _{1}.$ Applying Exercise 2 to $L_{1}$ we get $L_{1}=N_{\lambda _{2}}^{(p_{2})}\dotplus L_{2}$ where $N_{\lambda_{2}}^{(p_{2})},L_{2}$ have similar properties. Applying Exercise 2 $k$ times we finish the proof.

Note that the restriction of $A$ onto $N_{\lambda _{i}}^{(p_{i})}$ may not be described by a single Jordan cell. A matrix may have more than one Jordan cell with the same eigenvalue. To make this point clearer, note that the matrices

$\left(\begin{array}{cccc}\lambda & 1 & 0 & 0 \\ 0 & \lambda & 0 & 0 \\0 & 0 & \lambda & 1 \\0 & 0 & 0 & \lambda\end{array} \right)$   and   $\left(\begin{array}{cccc}\lambda & 1 & 0 & 0 \\ 0 & \lambda & 1 & 0 \\0 & 0 & \lambda & 1 \\0 & 0 & 0 & \lambda\end{array}\right)$

are not the same (the first matrix has two Jordan cells on the main diagonal and the second one is itself a Jordan cell). It will take some effort to get from (2) to (1).
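The difference between the two matrices is visible in the numbers $\dim N_{\lambda }^{(k)}=n-rank\left( (A-\lambda I)^{k}\right) .$ A Python sketch, working directly with the nilpotent parts $A-\lambda I$ of the two $4\times 4$ matrices above:

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def rank(rows):
    # row reduction over the rationals
    M = [[F(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                m = M[i][c] / M[r][c]
                M[i] = [a - m * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def null_dims(N):
    # dim N((A - lam I)^k) = 4 - rank((A - lam I)^k) for k = 1,...,4
    dims, Nk = [], [row[:] for row in N]
    for _ in range(4):
        dims.append(4 - rank(Nk))
        Nk = matmul(Nk, N)
    return dims

two_cells = [[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]  # A - lam*I, two 2x2 cells
one_cell = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]   # A - lam*I, one 4x4 cell

assert null_dims(two_cells) == [2, 4, 4, 4]   # two eigenvectors, height 2
assert null_dims(one_cell) == [1, 2, 3, 4]    # one eigenvector, height 4
```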

Exercise 4. Show that for the matrix from Exercise 3 $\det (A-\lambda _{1}I)=\left( \lambda -\lambda _{1}\right) ^{p}$ (for any number $\lambda _{1}$).

Exercise 5. In addition to Exercise 4, show that in $N_{\lambda }^{(p)}$ the matrix $A$ has only one eigenvector, up to a scaling factor. Hint: use the Jordan cell.

30 Jul 19

## Action of a matrix in its root subspace

The purpose of the following discussion is to reveal the matrix form of $A$ in $N_{\lambda }^{(p)}.$

Definition 1. Nonzero elements of $N_{\lambda }^{(p)}$ are called root vectors. This definition can be detailed as follows:

Elements of $N_{\lambda }^{(1)}\setminus \{0\}$ are eigenvectors.

Elements of $N_{\lambda }^{(2)}\setminus N_{\lambda }^{(1)}$ are called root vectors of 1st order.

...

Elements of $N_{\lambda }^{(p)}\setminus N_{\lambda }^{(p-1)}$ are called root vectors of order $p-1$.

Thus, root vectors belong to

$N_{\lambda }^{(p)}\setminus \{0\}=\left( N_{\lambda }^{(p)}\setminus N_{\lambda }^{(p-1)}\right) \cup ...\cup \left( N_{\lambda }^{(2)}\setminus N_{\lambda }^{(1)}\right) \cup \left( N_{\lambda }^{(1)}\setminus \{0\}\right)$

where the sets of root vectors of different orders do not intersect.

Exercise 1. $(A-\lambda I)\left( N_{\lambda }^{(k)}\setminus N_{\lambda }^{(k-1)}\right) \subset \left( N_{\lambda }^{(k-1)}\setminus N_{\lambda}^{(k-2)}\right) .$

Proof. Suppose $x\in N_{\lambda }^{(k)}\setminus N_{\lambda }^{(k-1)},$ that is, $(A-\lambda I)^{k}x=0$ and $(A-\lambda I)^{k-1}x\neq 0.$ Denoting $y=(A-\lambda I)x,$ we have $(A-\lambda I)^{k-1}y=0$ and $(A-\lambda I)^{k-2}y\neq 0,$ which means that $y\in N_{\lambda }^{(k-1)}\setminus N_{\lambda }^{(k-2)}$ and $A-\lambda I$ maps $N_{\lambda }^{(k)}\setminus N_{\lambda }^{(k-1)}$ into $N_{\lambda }^{(k-1)}\setminus N_{\lambda}^{(k-2)}.$

Now, starting from some $x_{p}\in N_{\lambda }^{(p)}\setminus N_{\lambda}^{(p-1)},$ we extend a chain of root vectors all the way to an eigenvector. By Exercise 1, the vector $x_{p-1}=(A-\lambda I)x_{p}$ belongs to $N_{\lambda }^{(p-1)}\setminus N_{\lambda }^{(p-2)}.$ From the definition of $x_{p-1}$ we see that

(1) $Ax_{p}=\lambda x_{p}+x_{p-1},$   $x_{p-1}\in N_{\lambda}^{(p-1)}\setminus N_{\lambda }^{(p-2)}$

($x_{p}$ is an "eigenvector" up to a root vector of lower order). Similarly, denoting $x_{p-2}=(A-\lambda I)x_{p-1},$ we have

(2) $Ax_{p-1}=\lambda x_{p-1}+x_{p-2},$   $x_{p-2}\in N_{\lambda}^{(p-2)}\setminus N_{\lambda }^{(p-3)}.$

...

Continuing in the same way, we get $x_{1}=(A-\lambda I)x_{2}\in N_{\lambda}^{(1)}\setminus \{0\},$

(3) $Ax_{2}=\lambda x_{2}+x_{1},$   $x_{1}\in N_{\lambda}^{(1)}\setminus \{0\},$

(4) $Ax_{1}=\lambda x_{1},$   $x_{1}\neq 0.$

Exercise 2. The vectors $x_{1},...,x_{p}$ defined above are linearly independent.

Proof. If $\sum_{j=1}^{p}a_{j}x_{j}=0,$ then $a_{p}x_{p}=-\sum_{j=1}^{p-1}a_{j}x_{j}.$ If $a_{p}\neq 0,$ the left side belongs to $N_{\lambda}^{(p)}\setminus N_{\lambda }^{(p-1)}$ while the right side belongs to $N_{\lambda }^{(p-1)}$ because of the inclusion relations, which is impossible. Hence, $a_{p}=0.$ Similarly, all other coefficients are zero.

By Exercise 2, the vectors $x_{1},...,x_{p}$ form a basis in $L=span(x_{1},...,x_{p}).$

Exercise 3. The transformation $A$ in $L$ is given by the matrix

(5) $A=\left(\begin{array}{ccccc}\lambda & 1 & 0 & ... & 0 \\0 & \lambda & 1 & ... & 0 \\ ... & ... & ... & ... & ... \\0 & 0 & 0 & ... & 1 \\0 & 0 & 0 & ... & \lambda\end{array} \right) =\lambda I+J$

where $J$ is the matrix with ones on the first superdiagonal and zeros everywhere else.

Proof. Since $x_{1},...,x_{p}$ is taken as the basis, $x_{i}$ can be identified with the unit column-vector $e_{i}.$ The equations (1)-(4) take the form

$Ae_{1}=\lambda e_{1},$ $Ae_{j}=\lambda e_{j}+e_{j-1},$ $j=2,...,p.$

Putting these equations side by side we get

$AI=A\left( e_{1},...,e_{p}\right) =\left( Ae_{1},...,Ae_{p}\right) =\left( \lambda e_{1},\lambda e_{2}+e_{1},...,\lambda e_{p}+e_{p-1}\right) =\left(\begin{array}{ccccc}\lambda & 1 & ... & 0 & 0 \\ 0 & \lambda & ... & 0 & 0 \\ ... & ... & ... & ... & ... \\ 0 & 0 & ... & \lambda & 1 \\ 0 & 0 & ... & 0 & \lambda \end{array}\right) .$

Definition 2. The matrix in (5) is called a Jordan cell.
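Here is a tiny Python sketch of the chain construction for a hypothetical $2\times 2$ matrix with a double eigenvalue $\lambda =3$ of height 2: starting from a root vector $x_{2},$ the vector $x_{1}=(A-\lambda I)x_{2}$ is an eigenvector, and relations (1)-(4) collapse to $Ax_{1}=\lambda x_{1},$ $Ax_{2}=\lambda x_{2}+x_{1}$:

```python
# A is a hypothetical 2x2 matrix with double eigenvalue lam = 3 and height p = 2
A = [[4, 1], [-1, 2]]
lam = 3

def apply(M, v):
    # matrix-vector product
    return [sum(m * x for m, x in zip(row, v)) for row in M]

B = [[A[i][j] - (lam if i == j else 0) for j in range(2)] for i in range(2)]  # A - lam*I
x2 = [1, 0]                   # a root vector of order 1: B x2 != 0, B (B x2) = 0
x1 = apply(B, x2)             # x1 = (A - lam I) x2

assert x1 == [1, -1]
assert apply(B, x1) == [0, 0]                                  # x1 is an eigenvector
assert apply(A, x2) == [lam * a + b for a, b in zip(x2, x1)]   # A x2 = lam x2 + x1
```

In the basis $x_{1},x_{2}$ the matrix of this transformation is the $2\times 2$ Jordan cell with $\lambda =3.$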

30 Jul 19

## Properties of root subspaces

Let $A$ be a square matrix and let $\lambda \in \sigma (A)$ be its eigenvalue. As we know, the nonzero elements of the null space $N(A-\lambda I)=\{x:(A-\lambda I)x=0\}$ are the corresponding eigenvectors. This definition is generalized as follows.

Definition 1. The subspaces $N_{\lambda }^{(k)}=N((A-\lambda I)^{k}),$ $k=1,2,...$ are called root subspaces of $A$ corresponding to $\lambda .$

Exercise 1. a) Root subspaces are increasing:

(1) $N_{\lambda }^{(k)}\subset N_{\lambda }^{(k+1)}$ for all $k\geq 1$

and b) there is a $p\leq n$ such that all inclusions in (1) are strict for $k<p$ and

(2) $N_{\lambda }^{(p)}=N_{\lambda }^{(p+1)}=...$

Proof. a) If $x\in N_{\lambda }^{(k)}$ for some $k,$ then $(A-\lambda I)^{k+1}x=(A-\lambda I)(A-\lambda I)^{k}x=0,$ which shows that $x\in N_{\lambda }^{(k+1)}.$

b) (1) implies $\dim N_{\lambda }^{(k)}\leq \dim N_{\lambda }^{(k+1)}.$ Since all root subspaces are contained in $C^{n},$ the dimensions cannot grow indefinitely, so there is a $k$ such that $N_{\lambda }^{(k)}=N_{\lambda }^{(k+1)}.$ Let $p$ be the smallest such $k.$ Then all inclusions in (1) are strict for $k<p.$

Suppose $N_{\lambda}^{(k+1)}\setminus N_{\lambda }^{(k)}\neq \varnothing$ for some $k\geq p.$ Then there exists $x\in N_{\lambda }^{(k+1)}$ such that $x\notin N_{\lambda}^{(k)}$, that is, $(A-\lambda I)^{k+1}x=0,$ $(A-\lambda I)^{k}x\neq 0.$ Put $y=(A-\lambda I)^{k-p}x.$ Then $(A-\lambda I)^{p+1}y=(A-\lambda I)^{k+1}x=0$ and $(A-\lambda I)^{p}y=(A-\lambda I)^{k}x\neq 0.$ This means that $y\in N_{\lambda }^{(p+1)}\setminus N_{\lambda }^{(p)},$ which contradicts the definition of $p.$

Definition 2. Property (2) can be called stabilization. The number $p$ from (2) is called the height of the eigenvalue $\lambda$.
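Stabilization can be observed numerically by computing $\dim N_{\lambda }^{(k)}=n-rank\left( (A-\lambda I)^{k}\right)$ for increasing $k.$ A Python sketch for a hypothetical $3\times 3$ matrix in which $\lambda =2$ has height $p=2$:

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def rank(rows):
    # row reduction over the rationals
    M = [[F(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                m = M[i][c] / M[r][c]
                M[i] = [a - m * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

A = [[2, 1, 0], [0, 2, 0], [0, 0, 5]]    # eigenvalue 2 of height 2, eigenvalue 5 simple
lam = 2
N1 = [[A[i][j] - (lam if i == j else 0) for j in range(3)] for i in range(3)]

dims, Nk = [], [row[:] for row in N1]
for _ in range(4):
    dims.append(3 - rank(Nk))            # dim N_lam^(k) for k = 1, 2, 3, 4
    Nk = matmul(Nk, N1)

assert dims == [1, 2, 2, 2]              # strictly increasing until p = 2, then constant
```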

Exercise 2. Let $\lambda \in \sigma (A)$ and let $p$ be the number from Exercise 1. Then

(3) $C^{n}=N_{\lambda }^{(p)}\dotplus \text{Img}[(A-\lambda I)^{p}].$

Proof. By the rank-nullity theorem applied to $(A-\lambda I)^{p}$ we have $n=\dim N_{\lambda }^{(p)}+\dim \text{Img}[(A-\lambda I)^{p}].$ By Exercise 3, to prove (3) it is sufficient to establish that $L\equiv N_{\lambda}^{(p)}\cap \text{Img}[(A-\lambda I)^{p}]=\{0\}.$ Let's assume that $L$ contains a nonzero vector $x.$ Then we have $x=(A-\lambda I)^{p}y$ for some $y.$ We obtain two facts:

$(A-\lambda I)^{p}y\neq 0$ $\Longrightarrow y\notin N_{\lambda }^{(p)},$

$(A-\lambda I)^{2p}y=(A-\lambda I)^{p}(A-\lambda I)^{p}y=(A-\lambda I)^{p}x=0\Longrightarrow y\in N_{\lambda }^{(2p)}.$

It follows that $y$ is a nonzero element of $N_{\lambda }^{(2p)}\setminus N_{\lambda }^{(p)}.$ This contradicts (2). Hence, the assumption $L\neq \{0\}$ is wrong, and (3) follows.

Exercise 3. Both subspaces at the right of (3) are invariant with respect to $A.$

Proof. If $x\in N_{\lambda }^{(p)},$ then by commutativity of $A$ and $A-\lambda I$ we have $(A-\lambda I)^{p}Ax=A(A-\lambda I)^{p}x=0,$ so $Ax\in N_{\lambda }^{(p)}.$

Suppose $x\in \text{Img}[(A-\lambda I)^{p}],$ so that $x=(A-\lambda I)^{p}y$ for some $y.$ Then $Ax=(A-\lambda I)^{p}Ay\in \text{Img}[(A-\lambda I)^{p}].$

Exercise 3 means that, for the purpose of further analyzing $A,$ we can consider its restrictions onto $N_{\lambda }^{(p)}$ and $\text{Img}[(A-\lambda I)^{p}].$

Exercise 4. The restriction of $A$ onto $N_{\lambda }^{(p)}$ does not have eigenvalues other than $\lambda .$

Proof. Suppose $Ax=\mu x,$ $x\neq 0,$ for some $\mu .$ Since $x\in N_{\lambda }^{(p)},$ we have $(A-\lambda I)^{p}x=0.$ Then $(A-\lambda I)x=(\mu -\lambda )x$ and $0=(A-\lambda I)^{p}x=(\mu -\lambda )^{p}x$. This implies $\mu =\lambda .$

Exercise 5. The restriction of $A$ onto $\text{Img}[(A-\lambda I)^{p}]$ does not have $\lambda$ as an eigenvalue (so that $A-\lambda I$ is invertible).

Proof. Suppose $x\in \text{Img}[(A-\lambda I)^{p}]$ and $Ax=\lambda x,$ $x\neq 0.$ Then $x=(A-\lambda I)^{p}y$ for some $y\neq 0$ and $0=(A-\lambda I)x=(A-\lambda I)^{p+1}y.$ By Exercise 1 $y\in N_{\lambda }^{(p+1)}=N_{\lambda }^{(p)}$ and $x=(A-\lambda I)^{p}y=0.$ This contradicts the choice of $x.$

30 Jul 19

## Direct sums of subspaces

The definition of an orthogonal sum $L=L_{1}\oplus L_{2}$ requires two things: 1) every element $l\in L$ can be decomposed as $l=l_{1}+l_{2}$ with $l_{i}\in L_{i},$ $i=1,2,$ and 2) every element of $L_{1}$ is orthogonal to every element of $L_{2}.$ Orthogonality of $L_{1}$ to $L_{2}$ implies $L_{1}\cap L_{2}=\{0\}$ which, in turn, guarantees uniqueness of the representation $l=l_{1}+l_{2}.$ If we drop the orthogonality requirement but retain 1) and $L_{1}\cap L_{2}=\{0\},$ we get the definition of a direct sum.

Definition. Let $L_{1},L_{2}$ be two subspaces such that $L_{1}\cap L_{2}=\{0\}.$ The set $L=\{l_{1}+l_{2}:$ $l_{i}\in L_{i},$ $i=1,2\}$ is called a direct sum of $L_{1},L_{2}$ and denoted $L=L_{1}\dotplus L_{2}$.  The condition $L_{1}\cap L_{2}=\{0\}$ provides uniqueness of the representation $l=l_{1}+l_{2}.$

Exercise 1. Let $L=L_{1}\dotplus L_{2}.$ If $l\in L$ is decomposed as $l=l_{1}+l_{2}$ with $l_{i}\in L_{i},$ $i=1,2,$ define $P_{1}l=l_{1}.$ Then $P_{1}$ is linear and satisfies $P_{1}^{2}=P_{1},$ $\text{Img}(P_{1})=L_{1},$ $N(P_{1})=L_{2}$.

Under conditions of Exercise 1, $P_{1}$ is an oblique projector of $L$ onto $L_{1}$ parallel to $L_{2}.$
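A toy Python sketch of an oblique projector in $R^{2},$ with $L_{1}=span\{(1,0)\}$ and $L_{2}=span\{(1,1)\}$ chosen for illustration: any $v=(x,y)$ decomposes uniquely as $(x-y)(1,0)+y(1,1),$ and $P_{1}$ keeps the $L_{1}$ part.

```python
def P1(v):
    # decompose v = (x - y)*(1, 0) + y*(1, 1) and keep the L1 component
    x, y = v
    return (x - y, 0)

assert P1(P1((5, 2))) == P1((5, 2))   # P1 is idempotent: P1^2 = P1
assert P1((1, 1)) == (0, 0)           # L2 = span{(1, 1)} is the null space
assert P1((7, 0)) == (7, 0)           # P1 acts as the identity on L1
```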

Exercise 2. Prove that dimension additivity extends to direct sums: if $L=L_{1}\dotplus L_{2},$ then $\dim L=\dim L_{1}+\dim L_{2}.$

Exercise 3. Let $L_{1},L_{2}$ be two subspaces of $C^{n}$ and suppose $n=\dim L_{1}+\dim L_{2}.$ Then for $C^{n}=L_{1}\dotplus L_{2}$ it is sufficient to check that $L_{1}\cap L_{2}=\{0\}.$

Proof. Denote $L=L_{1}\dotplus L_{2}.$ By Exercise 2, $\dim L=\dim L_{1}+\dim L_{2}=n.$ If $L\neq C^{n},$ then we can complete a basis in $L$ with a nonzero vector from $C^{n}\setminus L$ to see that $\dim C^{n}\geq n+1,$ which is impossible.