Feb 22

Distribution of the estimator of the error variance

Distribution of the estimator of the error variance

If you are reading the book by Dougherty: this post is about the distribution of the estimator  s^2 defined in Chapter 3.

Consider regression

(1) y=X\beta +e

where the deterministic matrix X is of size n\times k, satisfies \det  \left( X^{T}X\right) \neq 0 (regressors are not collinear) and the error e satisfies

(2) Ee=0,Var(e)=\sigma ^{2}I

\beta is estimated by \hat{\beta}=(X^{T}X)^{-1}X^{T}y. Denote P=X(X^{T}X)^{-1}X^{T}, Q=I-P. Using (1) we see that \hat{\beta}=\beta +(X^{T}X)^{-1}X^{T}e and the residual r\equiv y-X\hat{\beta}=Qe. \sigma^{2} is estimated by

(3) s^{2}=\left\Vert r\right\Vert ^{2}/\left( n-k\right) =\left\Vert  Qe\right\Vert ^{2}/\left( n-k\right) .

Q is a projector and has properties which are derived from those of P

(4) Q^{T}=Q, Q^{2}=Q.

If \lambda is an eigenvalue of Q, then multiplying Qx=\lambda x by Q and using the fact that x\neq 0 we get \lambda ^{2}=\lambda . Hence eigenvalues of Q can be only 0 or 1. The equation tr\left( Q\right) =n-k
tells us that the number of eigenvalues equal to 1 is n-k and the remaining k are zeros. Let Q=U\Lambda U^{T} be the diagonal representation of Q. Here U is an orthogonal matrix,

(5) U^{T}U=I,

and \Lambda is a diagonal matrix with eigenvalues of Q on the main diagonal. We can assume that the first n-k numbers on the diagonal of Q are ones and the others are zeros.

Theorem. Let e be normal. 1) s^{2}\left( n-k\right) /\sigma ^{2} is distributed as \chi _{n-k}^{2}. 2) The estimators \hat{\beta} and s^{2} are independent.

Proof. 1) We have by (4)

(6) \left\Vert Qe\right\Vert ^{2}=\left( Qe\right) ^{T}Qe=\left(  Q^{T}Qe\right) ^{T}e=\left( Qe\right) ^{T}e=\left( U\Lambda U^{T}e\right)  ^{T}e=\left( \Lambda U^{T}e\right) ^{T}U^{T}e.

Denote S=U^{T}e. From (2) and (5)

ES=0, Var\left( S\right) =EU^{T}ee^{T}U=\sigma ^{2}U^{T}U=\sigma ^{2}I

and S is normal as a linear transformation of a normal vector. It follows that S=\sigma z where z is a standard normal vector with independent standard normal coordinates z_{1},...,z_{n}. Hence, (6) implies

(7) \left\Vert Qe\right\Vert ^{2}=\sigma ^{2}\left( \Lambda z\right)  ^{T}z=\sigma ^{2}\left( z_{1}^{2}+...+z_{n-k}^{2}\right) =\sigma ^{2}\chi  _{n-k}^{2}.

(3) and (7) prove the first statement.

2) First we note that the vectors Pe,Qe are independent. Since they are normal, their independence follows from

cov(Pe,Qe)=EPee^{T}Q^{T}=\sigma ^{2}PQ=0.

It's easy to see that X^{T}P=X^{T}. This allows us to show that \hat{\beta} is a function of Pe:

\hat{\beta}=\beta +(X^{T}X)^{-1}X^{T}e=\beta +(X^{T}X)^{-1}X^{T}Pe.

Independence of Pe,Qe leads to independence of their functions \hat{\beta} and s^{2}.


Aug 19

Sylvester's criterion

Sylvester's criterion

Exercise 1. Suppose A=CBC^{T}, where B=diag[d_{1},...,d_{n}] and \det C\neq 0. Then A is positive if and only if all d_{i} are positive.

Proof. For any x\neq 0 we have x^{T}Ax=x^{T}CBC^{T}x=(C^{T}x)^{T}B(C^{T}x). Let y=C^{T}x\neq 0. Then x^{T}Ax=y^{T}By=\sum_{j}d_{j}y_{j}^{2}. This is positive for all y\neq 0 if and only if \min_{j}d_{j}>0.

Exercise 2 (modified Gaussian elimination). Suppose that A is a real symmetric matrix with nonzero leading principal minors d_{1},...,d_{n}. Then B=CAC^{T}, where B=diag[d_{1},d_{2}/d_{1},...,d_{n}/d_{n-1}] and \det C=1.

Proof. Review the transformation applied in Exercise 1 to obtain a triangular form. In that exercise, we eliminated element a_{21} below a_{11} by premultiplying A by the matrix C=I-\frac{a_{21}}{a_{11}} e_{2}^{T}e_{1}. Now after this we can post-multiply A by the matrix C^{T}=I-\frac{a_{21}}{a_{11}}e_{1}^{T}e_{2}. Because of the assumed symmetry of A, we have C^{T}=I-\frac{a_{12}}{a_{11}}e_{1}^{T}e_{2}, so this will eliminate element a_{12} to the right of a_{11}, see Exercise 2. Since in the first column a_{21} is already 0, the diagonal element a_{22} will not change.

We can modify Exercise 1 by eliminating a_{1j} immediately after eliminating a_{j1}. The right sequencing of transformations is necessary to be able to apply Exercise 1: the matrix used for post-multiplication should be the transpose of the matrix used for premultiplication. If C=C_{m}...C_{1}, then C^{T}=C_{1}^{T}...C_{m}^{T}, which means that premultiplication by C_{i} should be followed by post-multiplication by C_{i}^{T}. In this way we can make zero all off-diagonal elements. The resulting matrix B=diag[d_{1},d_{2}/d_{1}, ...,d_{n}/d_{n-1}] is related to A through B=CAC^{T}.

Theorem (Sylvester) Suppose that A is a real symmetric matrix. Then A is positive if and only if all its leading principal minors are positive.

Proof. Let's assume that all leading principal minors are positive. By Exercise 2, we have A=C^{-1}B(C^{-1})^{T} where \det C=1. It remains to apply Exercise 1 above to see that A is positive.

Now suppose that A is positive, that is x^{T}Ax=\sum_{i,j=1}^{n}a_{ij}x_{i}x_{j}>0 for any x\neq 0. Consider cut-off matrices A_{k}=\left( a_{ij}\right) _{i,j=1}^{k}. The corresponding cut-off quadratic forms x^{T}A_{k}x=\sum_{i,j=1}^{k}a_{ij}x_{i}x_{j}, k=1,...,n, are positive for nonzero x\in R^{k}. It follows that A_{k} are non-singular because if x\in N(A_{k}), then x^{T}A_{k}x=0. Hence their determinants d_{k}=\det A_{k}, k=1,...,n, are nonzero . This allows us to apply the modified Gaussian elimination (Exercise 2) and then Exercise 1 with B=diag[d_{1},...,d_{n}/d_{n-1}]. By Exercise 1 consecutively d_{1}>0, d_{2}>0,..., d_{n}>0.

Exercise 3. A is negative if and only if the leading principal minors change signs, starting with minus: d_{1}<0, d_{2}>0, d_{3}<0,...

Proof. By definition, A is negative if -A is positive. Because of homogeneity of determinants, when we pass from A to -A, the minor of order k gets multiplied by (-1)^{k}. Thus, by Sylvester's criterion A is negative if and only if (-1)^{k}d_{k}>0, as required.

Aug 19

Gaussian elimination method

Gaussian elimination method

Consider the system

a_{11}x+a_{12}y+a_{13}z=a_{1},\\    a_{21}x+a_{22}y+a_{23}z=a_{2},\\    a_{31}x+a_{32}y+a_{33}z=a_{3}.

If the first equation is nontrivial, then one of the coefficients is different from zero. Suppose it is a_{11}. Adding the first equation multiplied by -a_{21}/a_{11} to the second one we eliminate x from it. Similarly, adding the first equation multiplied by -a_{31}/a_{11} to the third one we eliminate x from it. The system becomes

a_{11}x+a_{12}y+a_{13}z=a_{1},\\    \quad \ \ \ b_{22}y+b_{23}z=b_{2},\\    \ \ \ \ b_{32}y+b_{33}z=b_{3},

where b with indexes stands for new numbers.

If b_{22}\neq 0, we can use the second equation to eliminate y from the third equation and the result will be

a_{11}x+a_{12}y+a_{13}z=a_{1},\\    b_{22}y+b_{23}z=b_{2},\\    c_{33}z=c_{3}

with some new c's. If c_{33}\neq 0, we can solve the system backwards, finding first z from the third equation, then y from the second and, finally x from the first.

Notice that for the method to work it does not matter what happens with the vector at the right of the system. It only matters what happens to A. That's why we focus on transformations of A.

Theoretical treatment

Exercise 1. Denote

d_{1}=a_{11},\ d_{2}=\det \left(\begin{array}{cc}  a_{11} & a_{12} \\a_{21} & a_{22}\end{array}\right) ,..., \ d_{n}=\det A

the leading principal minors of A. If all of them are different from zero, then the Gaussian method reduces A (by way of premultiplication of A by elementary matrices) to the triangular matrix

B=\left(\begin{array}{cccc}b_{11} & b_{12} & ... & b_{1n} \\  0 & b_{22} & ... & b_{2n} \\... & ... & ... & ... \\0 & 0 & ... & b_{nn}\end{array}\right)

where b_{11}=d_{1},\ b_{22}=d_{2}/d_{1},...,\ b_{nn}=d_{n}/d_{n-1}.

Proof by induction. Let n=2. Premultiplying A=\left(\begin{array}{cc}  a_{11} & a_{12} \\a_{21} & a_{22}\end{array}\right) by I-\frac{a_{21}}{a_{11}}e_{2}^{T}e_{1} we get the desired result:

B=\left(\begin{array}{cc}a_{11} & a_{12} \\0 & a_{22}-\frac{a_{21}a_{12}}{a_{11}}  \end{array}\right) =\left(\begin{array}{cc}d_{1} & a_{12} \\0 & \frac{d_{2}}{d_{1}}  \end{array}\right) .

Now let the statement hold for n-1. By the induction assumption we can start with the matrix of form

A=\left(\begin{array}{ccccc}d_{1} & b_{12} & ... & b_{1n} & b_{1n} \\  0 & d_{2}/d_{1} & ... & b_{2n} & b_{2n} \\... & ... & ... & ... & ... \\0 & 0 & ... & d_{n-1}/d_{n-2} & b_{2n} \\  a_{n1} & a_{n2}&... & a_{n-1,n} & a_{nn} \end{array}\right) .

Using the condition that all d_{1},...,d_{n-1} are different from zero, we can make zero the elements a_{n1}, ..., a_{n-1,n}. The result is:

B=\left(\begin{array}{ccccc}d_{1} & b_{12} & ... & b_{1n} & b_{1n} \\  0 & d_{2}/d_{1} & ... & b_{2n} & b_{2n} \\... & ... & ... & ... & ... \\0 & 0 & ... & d_{n-1}/d_{n-2} & b_{2n} \\  0 & 0 & ... & 0 & b_{nn}\end{array}\right) .

The determinant of a triangular matrix equals the product of diagonal elements because of the cross-out rule:

(1) \det B=d_{1}(d_{2}/d_{1})...(d_{n-1}/d_{n-2})b_{nn}=d_{n-1}b_{nn}

(for example, if a product contains b_{1n}, you should cross out the first row and then the product should contain one of the zeros below d_1).

On the other hand, B is a result of premultiplication by elementary matrices: B=C_{1}...C_{n-1}A where by Exercise 1 \det C_{i}=1 for all i. Hence,

(2) \det B=\det A=d_{n}.

Combining (1) and (2) we get b_{nn}=d_{n}/d_{n-1}, which concludes the inductive argument.

Aug 19

Elementary transformations

Elementary transformations

Here we look at matrix representations of transformations called elementary.

Exercise 1. Let e_{i} denote the i-th unit row vector and let A be an arbitrary matrix. Then

a) premultiplication of A by e_{j} cuts out of A the j-th row A_{j}.

b) Premultiplication of A by e_{i}^{T}e_{j} puts the row A_{j} into the i-th row of the null matrix.

c) Premultiplication of A by I+ce_{i}^{T}e_{j} adds row A_{j} multiplied by c to row A_{i}, without changing the other rows of A.

d) The matrix I+ce_{i}^{T}e_{j} has determinant 1.

Proof. a) It's easy to see that

e_{j}A=\left( 0...0~1~0...0\right) \left(\begin{array}{ccc}a_{11} & ... & a_{1n} \\  ... & ... & ... \\a_{j1} & ... & a_{jn} \\... & ... & ... \\a_{n1} & ... & a_{nn}\end{array}  \right) =\left(\begin{array}{ccc}a_{j1} & ... & a_{jn}\end{array}\right) =A_{j}.

b) Obviously,

e_{i}^{T}e_{j}A=\left(\begin{array}{c}0 \\... \\0 \\1 \\0 \\... \\0\end{array}  \right) \left(\begin{array}{ccc}a_{j1} & ... & a_{jn}\end{array}  \right) =\left(\begin{array}{ccc}0 & ... & 0 \\... & ... & ... \\a_{j1} & ... & a_{jn} \\... & ... & ... \\  0 & ... & 0\end{array}  \right) =\left(\begin{array}{c}\Theta \\A_{j} \\\Theta\end{array}  \right)

(A_{j} in the i-th row, \Theta denotes null matrices of conformable dimensions)

c) (I+ce_{i}^{T}e_{j})A=A+ce_{i}^{T}e_{j}A=\left(\begin{array}{c}  A_{1} \\... \\A_{i} \\... \\A_{n}\end{array}  \right) +\left(\begin{array}{c}\Theta \\... \\cA_{j} \\... \\\Theta\end{array}  \right) =\left(\begin{array}{c}A_{1} \\... \\A_{i}+cA_{j} \\  ... \\A_{n}\end{array}\right) .

d) The matrix A=I+ce_{i}^{T}e_{j} has ones on the main diagonal and only one nonzero element a_{ij}=c outside it. By the Leibniz formula it's determinant is 1.

The reader can easily solve the next

Exercise 2. a) Postmultiplication of A by e_{j}^{T} cuts out of A the j-th column A^{(j)}.

b) Postmultiplication of A by e_{j}^{T}e_{i} puts the column A^{(j)} into the i-th column of the null matrix.

c) Postmultiplication of A by I+ce_{j}^{T}e_{i} adds column A^{(j)} multiplied by c to column A^{(i)}, without changing the other columns of A.

d) The matrix I+ce_{j}^{T}e_{i}=(I+ce_{i}^{T}e_{j})^{T} has determinant 1.

Exercise 3. a) Premultiplication of A by

(1) \left(\begin{array}{c}e_{1} \\... \\e_{j} \\... \\e_{i} \\... \\e_{n}\end{array}  \right)

permutes rows A_{i},A_{j}.

b) Postmultiplication of A by the transpose of (1) permutes columns A^{(i)},A^{(j)}.

This is a general property of permutation matrices. Recall also that their determinants can be only \pm 1.

Definition. 1) Adding some row multiplied by a constant to another row or 2) adding some column multiplied by a constant to another column or 3) permuting rows or columns is called an elementary operation. Accordingly, matrices that realize them are called elementary matrices.

Aug 19

Main theorem: Jordan normal form

Main theorem: Jordan normal form

By Exercise 1, it is sufficient to show that in each root subspace the matrix takes the Jordan form.

Step 1. Take a basis

(1) x_{1,p},...,x_{k,p}\in N_{\lambda }^{(p)} independent relative to N_{\lambda }^{(p-1)}.

Consecutively define

(2) x_{1,p-1}=(A-\lambda I)x_{1,p},\ ...,\ x_{k,p-1}=(A-\lambda I)x_{k,p}\in N_{\lambda }^{(p-1)},


(3) x_{1,1}=(A-\lambda I)x_{1,2},\ ...,\ x_{k,1}=(A-\lambda I)x_{k,2}\in N_{\lambda }^{(1)}.

Exercise 1. The vectors in (2) are linearly independent relative to N_{\lambda }^{(p-2)},..., the vectors in (3) are linearly independent.

Proof. Consider (2), for example. Suppose that \sum_{j=1}^{k}a_{j}x_{j,p-1} \in N_{\lambda }^{(p-2)}. Then

0=(A-\lambda I)^{p-2}\sum_{j=1}^{k}a_{j}x_{j,p-1}=(A-\lambda I)^{p-1}\sum_{j=1}^{k}a_{j}x_{j,p}.

The conclusion that \sum_{j=1}^{k}a_{j}x_{j,p}\in N_{\lambda}^{(p-1)} contradicts assumption (1).

Exercise 2. The system of kp vectors listed in (1)-(3) is linearly independent, so that its span L_{x} is of dimension kp.

Proof. Suppose \sum_{j=1}^{k}a_{j,p}x_{j,p}+...+\sum_{j=1}^{k}a_{j,1}x_{j,1}=0. Then by inclusion relations

\sum_{j=1}^{k}a_{j,p}x_{j,p}=-\sum_{j=1}^{k}a_{j,p-1}x_{j,p-1}-...-  \sum_{j=1}^{k}a_{j,1}x_{j,1}\in N_{\lambda }^{(p-1)}

which implies a_{j,p}=0 for j=1,...,k, by relative independence stated in (1). This process can be continued by Exercise 1 to show that all coefficients are zeros.

Next we show that in each of N_{\lambda }^{(p)},...,N_{\lambda}^{(1)} we can find a basis relative to the lower indexed subspace N_{\lambda }^{(p-1)},...,N_{\lambda }^{(0)}=\{0\}. According to (1), in N_{\lambda }^{(p)} we already have such a basis. If the vectors in (2) constitute such a basis in N_{\lambda }^{(p-1)}, we consider N_{\lambda }^{(p-2)}.

Step 2. If not, by Exercise 3 we can find vectors

y_{1,p-1},...,y_{l,p-1}\in N_{\lambda }^{(p-1)}

such that

x_{1,p-1},...,x_{k,p-1},y_{1,p-1},...,y_{l,p-1} represent a basis relative to N_{\lambda }^{(p-2)}.

Then we can define

y_{1,p-2}=(A-\lambda I)y_{1,p-1},\ ...,\ y_{l,p-2}=(A-\lambda I)y_{l,p-1}\in N_{\lambda }^{(p-2)},


y_{1,1}=(A-\lambda I)y_{1,2},\ ...,\ y_{l,1}=(A-\lambda I)y_{l,2}\in N_{\lambda }^{(1)}.

By Exercise 2, the y's defined here are linearly independent. But we can show more:

Exercise 3. All x's from Step 1 combined with the y's from Step 2 are linearly independent.

The proof is analogous to that of Exercise 2.

Denote L_{y} the span of vectors introduced in Step 2. L_{x}\cap L_{y}=\{0\} because they have different bases. Therefore we can consider a direct sum L_{x}\dotplus L_{y}. Repeating Step 2 as many times as necessary, after the last step we obtain a subspace, say, L_{z}, such that N_{\lambda }^{(p)}=L_{x} \dotplus L_{y} \dotplus ... \dotplus L_{z}. The restrictions of A onto the subspaces on the right is described by Jordan cells with the same \lambda and of possibly different dimensions. We have proved the following theorem:

Theorem (Jordan form) For a matrix A in C^{n} one can find a basis in which A can be written as a block-diagonal matrix

(1) A=\left(\begin{array}{ccc}A_{1} & ... & ... \\... & ... & ... \\... & ... & A_{m}\end{array}\right) .

Here A_{i} are (square) Jordan cells, with possibly different lambdas on the main diagonal and of possibly different sizes, and all off-diagonal blocks are zero matrices of compatible dimensions.

Jul 19

Playing with bases

Playing with bases

Here is one last push before the main result. Exercise 1 is one of the basic facts about bases.

Exercise 1. In C^{n} any system of k<n linearly independent vectors x_{1},...,x_{k} can be completed to form a basis.

Proof. Let e_{1},...,e_{n} be a basis in C^{n}. If each of e_{1},...,e_{n} belongs to span(x_{1},...,x_{k}), then by the lemma we would have n\leq k, contradicting the assumption k<n. Hence, among e_{1},...,e_{n} there is at least on vector that does not belong to span(x_{1},...,x_{k}). We can add it to x_{1},...,x_{k} denoting it x_{k+1}.

Suppose \sum_{j=1}^{k+1}a_{j}x_{j}=0. Since x_{k+1} is independent of the other vectors, we have a_{k+1}=0 but then because of independence of x_{1},...,x_{k} all other coefficients are zero. Thus, x_{1},...,x_{k},x_{k+1} are linearly independent.

If k+1<n, we can repeat the addition process, until we obtain n linearly independent vectors x_{1},...,x_{n}. By construction, e_{1},...,e_{n} belong to span(x_{1},...,x_{n}). Since e_{1},...,e_{n} span C^{n}, x_{1},...,x_{n} do too and therefore form a basis.

Definition 1. Let L_{1}\subset L_{2} be two subspaces. Vectors x_{1},...,x_{m}\in L_{2} are called linearly independent relative to L_{1} if any nontrivial linear combination \sum a_{j}x_{j} does not belong to L_{1}. For the purposes of this definition, it is convenient to denote by \Theta a generic element of L_{1}. \Theta plays the role of zero and the definition looks similar to usual linear independence: \sum  a_{j}x_{j}\neq \Theta for any nonzero vector a. Rejecting this definition, we can say that x_{1},...,x_{m}\in L_{2} are called linearly dependent relative to L_{1} if \sum a_{j}x_{j}=\Theta for some nonzero vector a.

Definition 2. Let L_{1}\subset L_{2} be two subspaces. Vectors x_{1},...,x_{m}\in L_{2} are called a basis relative to L_{1} if they are linearly independent and can be completed by some basis from L_{1} to form a basis in L_{2}.

Exercise 2. Show existence of a relative basis in L_{2}.

Proof. Take any basis in L_{1} (say, x_{1},...,x_{k}) and, using Exercise 1, complete it by some vectors (say, x_{k+1},...,x_{n}\in L_{2}) to get a basis in L_{2}. Then, obviously, x_{k+1},...,x_{n} form a basis in L_{2} relative to L_{1}. Besides, none of x_{k+1},...,x_{n} belongs to L_{1}.

Exercise 3. Any system of vectors x_{1},...,x_{k}\in L_{2} linearly independent relative to L_{1} can be completed to form a relative basis in L_{2}.

Proof. Take a basis in L_{1} (say, x_{k+1},...,x_{l}) and add it to x_{1},...,x_{k}. The resulting system x_{1},...,x_{k},x_{k+1},...,x_{l} is linearly independent. Indeed, if \sum_{j=1}^{l}a_{j}x_{j}=0, then

\sum_{j=1}^{k}a_{j}x_{j}=-\sum_{j=k+1}^{l}a_{j}x_{j}\in L_{1}.

By assumption of relative linear independence a_{1}=...=a_{k}=0 but then the remaining coefficients are also zero.

By Exercise 1 we can find x_{l+1},...,x_{n} such that x_{1},...,x_{k},x_{k+1},...,x_{l},x_{l+1},...,x_{n} is a basis in L_{2}. Now the system x_{1},...,x_{k},x_{l+1},...,x_{n} is a relative basis because these vectors are linearly independent and together with x_{k+1},...,x_{l}\in L_{1} form a basis in L_{2}.

Jul 19

Chipping off root subspaces

Chipping off root subspaces

The basis in which a matrix is diagonal consists of eigenvectors. Therefore if the number of linearly independent eigenvectors is less than n, such a matrix cannot be diagonalized.

The general result we are heading to is that any matrix in C^{n} in an appropriately chosen basis can be written as a block-diagonal matrix

(1) A=\left(\begin{array}{ccc}A_{1} & ... & ... \\  ... & ... & ... \\... & ... & A_{m}\end{array}\right) .

Here A_{i} are Jordan cells, with possibly different lambdas on the main diagonal and of possibly different sizes, and all off-diagonal blocks are zero matrices of compatible dimensions. The next exercise is an intermediate step towards that result.

Exercise 1. Let A have k different eigenvalues \lambda _{1},...,\lambda_{k}. Then C^{n} can be represented as a direct sum of k invariant subspaces

(2) C^{n}=N_{\lambda _{1}}^{(p_{1})} \dotplus ... \dotplus N_{\lambda_{k}}^{(p_{k})}.

The subspace N_{\lambda _{i}}^{(p_{i})} consists of only root vectors belonging to the eigenvalue \lambda _{i}.

Proof. By Exercise 2 we have C^{n}=N_{\lambda _{1}}^{(p_{1})}\dotplus L_{1} where the subspace L_{1} has two properties: it is invariant with respect to A and the restriction of A to L_{1} does not have \lambda _{1} as an eigenvalue. N_{\lambda _{1}}^{(p_{1})} consists of root vectors belonging to \lambda _{1}. Applying Exercise 2 to L_{1} we get L_{1}=N_{\lambda _{2}}^{(p_{2})}\dotplus L_{2} where N_{\lambda_{2}}^{(p_{2})},L_{2} have similar properties. Applying Exercise 2 k
times we finish the proof.

Note that the restriction of A onto N_{\lambda _{i}}^{(p_{i})} may not be described by a single Jordan cell. A matrix may have more than one Jordan cell with the same eigenvalue. To make this point clearer, note that the matrices

\left(\begin{array}{cccc}\lambda & 1 & 0 & 0 \\  0 & \lambda & 0 & 0 \\0 & 0 & \lambda & 1 \\0 & 0 & 0 & \lambda\end{array}  \right)    and   \left(\begin{array}{cccc}\lambda & 1 & 0 & 0 \\  0 & \lambda & 1 & 0 \\0 & 0 & \lambda & 1 \\0 & 0 & 0 & \lambda\end{array}\right)

are not the same (the first matrix has two Jordan cells on the main diagonal and the second one is itself a Jordan cell). It will take some effort to get from (2) to (1).

Exercise 4. Show that for the matrix from Exercise 3 \det (A-\lambda _{1}I)=\left(\lambda -\lambda _{1}\right) ^p (\lambda _{1} - any number).

Exercise 5. In addition to Exercise 4, show that in N_{\lambda }^{(p)} the matrix A has only one eigenvector, up to a scaling factor. Hint: use the Jordan cell.

Jul 19

Action of a matrix in its root subspace

Action of a matrix in its root subspace

The purpose of the following discussion is to reveal the matrix form of A in N_{\lambda }^{(p)}.

Definition 1. Nonzero elements of N_{\lambda }^{(p)} are called root vectors. This definition can be detailed as follows:

Elements of N_{\lambda }^{(1)}\setminus \{0\} are eigenvalues.

Elements of N_{\lambda }^{(2)}\setminus N_{\lambda }^{(1)} are called root vectors of 1st order.


Elements of N_{\lambda }^{(p)}\setminus N_{\lambda }^{(p-1)} are called root vectors of order p-1.

Thus, root vectors belong to

N_\lambda^{(p)} \setminus\{0\}=\left( N_\lambda^{(p)} \setminus N_\lambda^{(p-1)} \right) \cup ... \cup \left( N_\lambda^{(2)} \setminus N_\lambda^{(1)}\right) \cup \left( N_\lambda^{(1)} \setminus \{0\} \right)

where the sets of root vectors of different orders do not intersect.

Exercise 1. (A-\lambda I)\left( N_{\lambda }^{(k)}\setminus N_{\lambda }^{(k-1)}\right) \subset \left( N_{\lambda }^{(k-1)}\setminus N_{\lambda}^{(k-2)}\right) .

Proof. Suppose x\in N_{\lambda }^{(k)}\setminus N_{\lambda }^{(k-1)}, that is, (A-\lambda I)^{k}x=0 and (A-\lambda I)^{k-1}x\neq 0. Denoting y=(A-\lambda I)x, we have (A-\lambda I)^{k-1}y=0 and (A-\lambda I)^{k-2}y\neq 0, which means that y\in N_{\lambda }^{(k-1)}\setminus N_{\lambda }^{(k-2)} and A-\lambda I maps N_{\lambda }^{(k)}\setminus N_{\lambda }^{(k-1)} into N_{\lambda }^{(k-1)}\setminus N_{\lambda}^{(k-2)}.

Now, starting from some x_{p}\in N_{\lambda }^{(p)}\setminus N_{\lambda}^{(p-1)}, we extend a chain of root vectors all the way to an eigenvector. By Exercise 1, the vector x_{p-1}=(A-\lambda I)x_{p} belongs to N_{\lambda }^{(p-1)}\setminus N_{\lambda }^{(p-2)}. From the definition of x_{p-1} we see that

(1) Ax_{p}=\lambda x_{p}+x_{p-1},   x_{p-1}\in N_{\lambda}^{(p-1)}\setminus N_{\lambda }^{(p-2)}

(x_{p} is an "eigenvector" up to a root vector of lower order). Similarly, denoting x_{p-2}=(A-\lambda I)x_{p-1}, we have

(2) Ax_{p-1}=\lambda x_{p-1}+x_{p-2},   x_{p-2}\in N_{\lambda}^{(p-2)}\setminus N_{\lambda }^{(p-3)}.


Continuing in the same way, we get x_{1}=(A-\lambda I)x_{2}\in N_{\lambda}^{(1)}\setminus \{0\},

(3) Ax_{2}=\lambda x_{2}+x_{1},   x_{1}\in N_{\lambda}^{(1)}\setminus \{0\},

(4) Ax_{1}=\lambda x_{1},   x_{1}\neq 0.

Exercise 2. The vectors x_{1},...,x_{p} defined above are linearly independent.

Proof. If \sum_{j=1}^{p}a_{j}x_{j}=0, then a_{p}x_{p}=-\sum_{j=1}^{p-1}a_{j}x_{j}. Here the left side belongs to N_{\lambda}^{(p)}\setminus N_{\lambda }^{(p-1)} and the right side belongs to N_{\lambda }^{(p-1)} because of inclusion relations. Hence, a_{p}=0. Similarly, all other coefficients are zero.

By Exercise 2, the vectors x_{1},...,x_{p} form a basis in L=span(x_{1},...,x_{p}).

Exercise 3. The transformation A in L is given by the matrix

(5) A=\left(\begin{array}{ccccc}\lambda & 1 & 0 & ... & 0 \\0 & \lambda & 1 & ... & 0 \\  ... & ... & ... & ... & ... \\0 & 0 & 0 & ... & 1 \\0 & 0 & 0 & ... & \lambda\end{array}  \right) =\lambda I+J

where J is a matrix with unities in the first superdiagonal and zeros everywhere else.

Proof. Since x_{1},...,x_{p} is taken as the basis, x_{i} can be identified with the unit column-vector e_{i}. The equations (1)-(4) take the form

Ae_{1}=\lambda e_{1}, Ae_{j}=\lambda e_{j}+e_{j-1}, j=2,...,p.

Putting these equations side by side we get

AI=A\left( e_{1},...,e_{p}\right) =\left( Ae_{1},...,Ae_{p}\right) =\left(\lambda e_{1},\lambda e_{2}+e_{1},...,\lambda e_{p}+e_{p-1}\right) =

=\left(\begin{array}{cccc}\lambda & 1 & ... & 0 \\0 & \lambda & ... & 0 \\  ... & ... & ... & ... \\0 & 0 & ... & 1 \\0 & 0 & ... & \lambda\end{array}\right) .

Definition 2. The matrix in (5) is called a Jordan cell.

Jul 19

Properties of root subspaces

Properties of root subspaces

Let A be a square matrix and let \lambda \in \sigma (A) be its eigenvalue. As we know, the nonzero elements of the null space N(A-\lambda I)=\{x:(A-\lambda I)x=0\} are the corresponding eigenvectors. This definition is generalized as follows.

Definition 1. The subspaces N_{\lambda }^{(k)}=N((A-\lambda I)^{k}), k=1,2,... are called root subspaces of A corresponding to \lambda .

Exercise 1. a) Root subspaces are increasing:

(1) N_{\lambda }^{(k)}\subset N_{\lambda }^{(k+1)} for all k\geq 1

and b) there is such p\leq n that all inclusions (1) are strict for k<p and

(2) N_{\lambda }^{(p)}=N_{\lambda }^{(p+1)}=...

Proof. a) If x\in N_{\lambda }^{(k)} for some k, then (A-\lambda I)^{k+1}x=(A-\lambda I)(A-\lambda I)^{k}x=0, which shows that x\in N_{\lambda }^{(k+1)}.

b) (1) implies \dim N_{\lambda }^{(k)}\leq \dim N_{\lambda }^{(k+1)}. Since all root subspaces are contained in C^{n}, there are k such that N_{\lambda }^{(k)}=N_{\lambda }^{(k+1)}. Let p be the smallest such k. Then all inclusions (1) are strict for k<p.

Suppose N_{\lambda}^{(k+1)}\setminus N_{\lambda }^{(k)}\neq \varnothing for some k\ge p. Then there exists x\in N_{\lambda }^{(k+1)} such that x\notin N_{\lambda}^{(k)}, that is, (A-\lambda I)^{k+1}x=0, (A-\lambda I)^{k}x\neq 0. Put y=(A-\lambda I)^{k-p}x. Then (A-\lambda I)^{p+1}y=(A-\lambda I)^{k+1}x=0, (A-\lambda I)^{p}y=(A-\lambda I)^{k}x\notin 0. This means
that y\in N_{\lambda }^{(p+1)}\setminus N_{\lambda }^{(p)} which contradicts the definition of p.

Definition 2. Property (2) can be called stabilization. The number p from (2) is called a height of the eigenvalue \lambda.

Exercise 2. Let \lambda \in \sigma (A) and let p be the number from Exercise 1. Then

(3) C^{n}=N_{\lambda }^{(p)}\dotplus \text{Img}[(A-\lambda I)^{p}].

Proof. By the rank-nullity theorem applied to (A-\lambda I)^{p} we have n=\dim N_{\lambda }^{(p)}+\dim \text{Img}[(A-\lambda I)^{p}]. By Exercise 3, to prove (3) it is sufficient to establish that L\equiv N_{\lambda}^{(p)}\cap \text{Img}[(A-\lambda I)^{p}]=\{0\}. Let's assume that L contains a nonzero vector x. Then we have x=(A-\lambda I)^{p}y for some y. We obtain two facts:

(A-\lambda I)^{p}y\neq 0 \Longrightarrow y\notin N_{\lambda }^{(p)},

(A-\lambda I)^{2p}y=(A-\lambda I)^{p}(A-\lambda I)^{p}y=(A-\lambda I)^{p}x=0\Longrightarrow y\in N_{\lambda }^{(2p)}.

It follows that y is a nonzero element of N_{\lambda }^{(2p)}\setminus N_{\lambda }^{(p)}. This contradicts (2). Hence, the assumption L\neq \{0\} is wrong, and (3) follows.

Exercise 3. Both subspaces at the right of (3) are invariant with respect to A.

Proof. If x\in N_{\lambda }^{(p)}, then by commutativity of A and A-\lambda I we have (A-\lambda I)^{p}Ax=A(A-\lambda I)^{p}x=0, so Ax\in N_{\lambda }^{(p)}.

Suppose x\in \text{Img}[(A-\lambda I)^{p}], so that x=(A-\lambda I)^{p}y for some y. Then Ax=(A-\lambda I)^{p}Ay\in \text{Img}[(A-\lambda I)^{p}].

Exercise 3 means that, for the purpose of further analyzing A, we can consider its restrictions onto N_{\lambda }^{(p)} and \text{Img}[(A-\lambda I)^{p}].

Exercise 4. The restriction of A onto N_{\lambda }^{(p)} does not have eigenvalues other than \lambda .

Proof. Suppose Ax=\mu x, x\neq 0, for some \mu . Since x\in N_{\lambda }^{(p)}, we have (A-\lambda I)^{p}x=0. Then (A-\lambda I)x=(\mu -\lambda )x and 0=(A-\lambda I)^{p}x=(\mu -\lambda )^{p}x. This implies \mu =\lambda .

Exercise 5. The restriction of A onto \text{Img}[(A-\lambda I)^{p}] does not have \lambda as an eigenvalue (so that A-\lambda I is invertible).

Proof. Suppose x\in \text{Img}[(A-\lambda I)^{p}] and Ax=\lambda x, x\neq 0. Then x=(A-\lambda I)^{p}y for some y\neq 0 and 0=(A-\lambda  I)x=(A-\lambda I)^{p+1}y. By Exercise 1 y\in N_{\lambda }^{(p+1)}=N_{\lambda }^{(p)} and x=(A-\lambda I)^{p}y=0. This contradicts the choice of x.

Jul 19

Direct sums of subspaces

Direct sums of subspaces

The definition of an orthogonal sum L=L_{1}\oplus L_{2} requires two things: 1) every element l\in L can be decomposed as l=l_{1}+l_{2} with l_{i}\in L_{i}, i=1,2, and 2) every element of L_{1} is orthogonal to every element of L_{2}. Orthogonality of L_{1} to L_{2} implies L_{1}\cap L_{2}=\{0\} which, in turn, guarantees uniqueness of the representation l=l_{1}+l_{2}. If we drop the orthogonality requirement but retain 1) and L_{1}\cap L_{2}=\{0\}, we get the definition of a direct sum.

Definition. Let L_{1},L_{2} be two subspaces such that L_{1}\cap L_{2}=\{0\}. The set L=\{l_{1}+l_{2}: l_{i}\in L_{i}, i=1,2\} is called a direct sum of L_{1},L_{2} and denoted L=L_{1}\dotplus L_{2}.  The condition L_{1}\cap L_{2}=\{0\} provides uniqueness of the representation l=l_{1}+l_{2}.

Exercise 1. Let L=L_{1}\dotplus L_{2}. If l\in L is decomposed as l=l_{1}+l_{2} with l_{i}\in L_{i}, i=1,2, define P_{1}l=l_{1}. Then P_{1} is linear and satisfies P_{1}^{2}=P_{1}, \text{Img}(P_{1})=L_{1}, N(P_{1})=L_{2}.

Under conditions of Exercise 1, P_{1} is an oblique projector of L onto L_{1} parallel to L_{2}.

Exercise 2. Prove that dimension additivity extends to direct sums: if L=L_{1}\dotplus L_{2}, then \dim L=\dim L_{1}+\dim L_{2}.

Exercise 3. Let L_{1},L_{2} be two subspaces of C^{n} and suppose n=\dim L_{1}+\dim L_{2}. Then to have C^{n}=L_{1}+L_{2} it is sufficient to check that L_{1}\cap L_{2}=\{0\}, in which case L=L_{1}\dotplus L_{2}

Proof. Denote L=L_{1}\dotplus L_{2}. By Exercise 2, \dim L=\dim L_{1}+\dim L_{2}=n. If C^{n}\setminus L is not empty, then we can complete a basis in L with a nonzero vector from C^{n}\setminus L to see that \dim C^{n}\geq n+1 which is impossible.