18 Feb 19

Determinant of a transpose

Exercise 1. \det (A^T)=\det A.

Proof. The proof is similar to the derivation of the Leibniz formula. Using the notation from that derivation, we decompose the rows of A as linear combinations of unit row-vectors: A_i=\sum_{j=1}^na_{ij}e_j. Hence A_i^T=\sum_{j=1}^na_{ij}e_j^T, and these transposed rows are exactly the columns of A^T. Therefore, by multilinearity in columns,

(1) \det (A^{T})=\sum_{j_{1},...,j_{n}=1}^{n}a_{1j_{1}}...a_{nj_{n}}\det  \left( e_{j_{1}}^{T},...,e_{j_{n}}^{T}\right)

=\sum_{j_{1},...,j_{n}:\det  P_{j_{1},...,j_{n}}^{T}\neq 0}a_{1j_{1}}...a_{nj_{n}}\det  P_{j_{1},...,j_{n}}^{T}.

Now we want to relate \det P_{j_1,...,j_n}^T to \det P_{j_1,...,j_n}.

(2) \det P_{j_1,...,j_n}^T=\det (P_{j_1,...,j_n}^{-1}) (the transpose of an orthogonal matrix equals its inverse)

=\frac{1}{\det (P_{j_1,...,j_n})} (the determinant of the inverse is the inverse of the determinant)

=\det (P_{j_1,...,j_n}) (because P_{j_1,...,j_n} is orthogonal and its determinant is either 1 or -1).

(1) and (2) prove the statement.
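For readers who like to check such identities numerically, here is a minimal sketch using numpy (the matrix is arbitrary; this is an illustration, not part of the proof):

import numpy as np

np.random.seed(0)
A = np.random.rand(4, 4)                     # an arbitrary 4x4 matrix

# det(A^T) and det(A) agree up to floating-point error
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))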

Apart from being interesting in its own right, Exercise 1 allows one to translate properties in terms of rows to properties in terms of columns, as in the next corollary.

Corollary 1. \det A=0 if the columns of A are linearly dependent.

Indeed, the columns of A are the rows of its transpose, so Exercise 1 and Property III yield the result.

17 Feb 19

Multilinearity in columns

The axioms and properties of determinants established so far are asymmetric in that they are stated in terms of rows, while similar statements hold for columns. Among the most important properties is the fact that the determinant of A is a multilinear function of its rows.  Here we intend to establish multilinearity in columns.

Exercise 1. The determinant of A is a multilinear function of its columns.

Proof. To prove linearity in each column, it suffices to prove additivity and homogeneity. Let A,B,C be three matrices that have the same columns, except for the lth column. The relationship between the lth columns is specified by A^{(l)}=B^{(l)}+C^{(l)}, or, in terms of elements, a_{il}=b_{il}+c_{il} for all i=1,...,n. Equivalently, in the Leibniz formula

(1) \det A=\sum_{j_{1},...,j_{n}:\det P_{j_{1},...,j_{n}}\neq  0}a_{1j_{1}}...a_{nj_{n}}\det P_{j_{1},...,j_{n}}

for all i such that j_i=l we have

(2) a_{ij_i}=b_{ij_i}+c_{ij_i}.

If \det P_{j_1,...,j_n}\neq 0, then for a given set j_1,...,j_n there exists exactly one i such that j_i=l. Therefore by (2) we can continue (1) as

\det A=\sum_{j_{1},...,j_{n}:\det P_{j_{1},...,j_{n}}\neq  0}a_{1j_{1}}...(b_{ij_{i}}+c_{ij_{i}})...a_{nj_{n}}\det  P_{j_{1},...,j_{n}}=\det B+\det C.

Homogeneity is proved similarly, using equations A^{(l)}=kB^{(l)} or in terms of elements a_{il}=kb_{il} for all i.
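Both additivity and homogeneity in a column are easy to verify numerically. A minimal sketch with numpy (the matrices and the column index are arbitrary):

import numpy as np

np.random.seed(1)
n, l = 4, 2                                  # size and column index (0-based here)
B = np.random.rand(n, n)
C = B.copy()
C[:, l] = np.random.rand(n)                  # B and C differ only in the l-th column

A = B.copy()
A[:, l] = B[:, l] + C[:, l]                  # A^{(l)} = B^{(l)} + C^{(l)}

# additivity: det A = det B + det C
assert np.isclose(np.linalg.det(A), np.linalg.det(B) + np.linalg.det(C))

# homogeneity: scaling the l-th column by k scales the determinant by k
k = 3.5
Ak = B.copy()
Ak[:, l] = k * B[:, l]
assert np.isclose(np.linalg.det(Ak), k * np.linalg.det(B))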

17 Feb 19

Determinant of a product

The derivation is very similar to that of the Leibniz formula.

Consider matrices A,B and let B_j denote the jth row of B. Row C_i of the product C=AB, obviously, can be written as C_i=\sum_{j=1}^na_{ij}B_j. Using Property V n times, we have

(1) \det C=\det \left(\begin{array}{c}\sum_{j_1=1}^na_{1j_1}B_{j_1}\\C_2\\...\\C_n\end{array}\right) =\sum_{j_1=1}^na_{1j_1}\det \left(\begin{array}{c}B_{j_1}\\C_2\\...\\C_n\end{array}\right)

=\sum_{j_1=1}^na_{1j_1}\det \left(  \begin{array}{c}B_{j_1}\\\sum_{j_2=1}^na_{2j_2}B_{j_2}\\...\\C_n\end{array}\right)=\sum_{j_1,j_2=1}^na_{1j_1}a_{2j_2}\det\left(\begin{array}{c}B_{j_1}\\B_{j_2}\\...\\  C_{n}\end{array}\right)

=...=\sum_{j_1,...,j_n=1}^na_{1j_1}...a_{nj_n}\det\left(\begin{array}{c}  B_{j_1}\\B_{j_2}\\...\\B_{j_n}\end{array}\right).

By analogy with permutation matrices denote

B_{j_1,...,j_n}=\left(\begin{array}{c}B_{j_1}\\B_{j_2}\\...\\B_{j_n}\end{array}\right).

Recall the link between B_{j_1,...,j_n} and B:

(2) B_{j_1,...,j_n}=P_{j_1,...,j_n}B.

If we had already proved the multiplication rule for determinants, applying it to (2) would give

(3) \det B_{j_1,...,j_n}=(\det P_{j_1,...,j_n})(\det B).

In the theory of permutations (3) is proved without relying on the multiplication rule. I am going to use (3) as a shortcut that explains the idea. Combining (1) and (3) we obtain by the Leibniz formula

\det C=\sum_{j_1,...,j_n:\det P_{j_1,...,j_n}\neq 0}a_{1j_1}...a_{nj_n}(\det P_{j_1,...,j_n})(\det B)=(\det A)(\det B).
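A quick numeric illustration of the multiplication rule (a sketch with numpy; the matrices are arbitrary):

import numpy as np

np.random.seed(2)
A = np.random.rand(4, 4)
B = np.random.rand(4, 4)

# multiplication rule: det(AB) = det(A) det(B)
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))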
15 Feb 19

Leibniz formula for determinants

Rule for calculating determinant

This is one of those cases when calculations explain the result. Let e_j denote the jth unit row-vector. Row A_i, obviously, can be decomposed as A_i=\sum_{j=1}^na_{ij}e_j. Recall that by Property V, the determinant of A is a multilinear function of its rows. Using Property V n times, we have

(1) \det A=\det \left(\begin{array}{c}\sum_{j_1=1}^na_{1j_1}e_{j_1}\\A_2\\...\\  A_n\end{array}\right)=\sum_{j_1=1}^na_{1j_1}\det\left(\begin{array}{c}e_{j_1}\\A_2\\...\\  A_{n}\end{array}\right)

=\sum_{j_1=1}^na_{1j_1}\det \left(\begin{array}{c}e_{j_1}\\\sum_{j_2=1}^na_{2j_2}e_{j_2}\\...\\A_n\end{array}\right)=\sum_{j_1,j_2=1}^na_{1j_1}a_{2j_2}\det\left(\begin{array}{c}e_{j_1}\\e_{j_2}\\...\\  A_{n}\end{array}\right)=...

=\sum_{j_1,...,j_n=1}^na_{1j_1}...a_{nj_n}\det\left(\begin{array}{c}  e_{j_1}\\e_{j_2}\\...\\e_{j_n}\end{array}\right).

Here by Property III

\det\left(\begin{array}{c}e_{j_1}\\e_{j_2}\\...\\e_{j_n}\end{array}\right)=0

if among rows there are equal vectors. The remaining matrices with nonzero determinants are permutation matrices, so

(2) \det A=\sum_{j_1,...,j_n:\det P_{j_1,...,j_n}\neq 0}a_{1j_1}...a_{nj_n}\det P_{j_1,...,j_n}.

Different-rows-different-columns rule. Take a good look at what the condition \det P_{j_1,...,j_n}\neq 0 implies about the location of the factors of the product a_{1j_1}...a_{nj_n}. The rows 1,...,n to which the factors belong are obviously different. The columns j_1,...,j_n are also different by the definition of a permutation matrix. Conversely, consider any combination a_{i_1j_1},...,a_{i_nj_n} of elements such that no two of them belong to the same row or column. Rearrange the first indices i_1,...,i_n in ascending order, from 1 to n. This leads to a renumbering \tilde{j}_{1},...,\tilde{j}_{n} of the second indices, and the product a_{i_1j_1}...a_{i_nj_n} becomes a_{1\tilde{j}_1}...a_{n\tilde{j}_n}. Since the original second indices were all different, the new ones are too. Hence, \det P_{\tilde{j}_1,...,\tilde{j}_n}\neq 0 and such a term must appear in (2).

Remark 1. (2) is the Leibniz formula. The full sum at the end of (1), taken over all j_1,...,j_n, is the Levi-Civita formula. The difference is that the Levi-Civita formula contains many more zero terms, while in (2) a term can be zero only if a_{1j_1}...a_{nj_n}=0.

Remark 2. Many textbooks write signatures of permutations in (2) instead of \det P_{j_1,...,j_n}. Using \det P_{j_1,...,j_n} is better because a) you save time by skipping the theory of permutations and b) you need a rule to calculate signatures of permutations anyway, and \det P_{j_1,...,j_n} provides such a rule (see an example).
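To see formula (2) in action, here is a minimal computational sketch (numpy plus itertools; the function name leibniz_det is mine, not standard). It takes \det P_{j_1,...,j_n} directly from the permutation matrix, in the spirit of Remark 2, and compares the result with a library determinant:

import numpy as np
from itertools import permutations

def leibniz_det(A):
    # determinant via (2): sum over index sets with det P_{j_1,...,j_n} != 0
    n = A.shape[0]
    total = 0.0
    for js in permutations(range(n)):        # j_1,...,j_n, all different
        P = np.zeros((n, n))
        P[np.arange(n), js] = 1.0            # rows of P are e_{j_1},...,e_{j_n}
        sign = round(np.linalg.det(P))       # det P_{j_1,...,j_n} = +1 or -1
        prod = 1.0
        for i, j in enumerate(js):
            prod *= A[i, j]                  # a_{1 j_1} ... a_{n j_n}
        total += sign * prod
    return total

np.random.seed(3)
A = np.random.rand(4, 4)
assert np.isclose(leibniz_det(A), np.linalg.det(A))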

11 Feb 19

Properties of permutation matrices

Shortcut for permutations

A permutation matrix permutes (changes the order of) the rows of a matrix. The precise meaning of this statement is given in equation (1) below.

Partitioning the matrix A into rows A_i we have

e_iA=(0,...,0,1,0,...0)\left(\begin{array}{c}A_1\\...\\A_i\\...\\A_n\end{array}\right)=A_i.

Stacking these equations we get

P_{j_1,...,j_n}A=\left(\begin{array}{c}e_{j_1}\\...\\e_{j_n}\end{array}\right) A=\left(\begin{array}{c}e_{j_1}A\\...\\e_{j_n}A\end{array}\right) =\left(\begin{array}{c}A_{j_1}\\...\\A_{j_n}\end{array}\right).

By analogy with P_{j_1,...,j_n} we denote the last matrix

A_{j_1,...,j_n}=\left(\begin{array}{c}A_{j_1}\\...\\A_{j_n}\end{array}\right).

Thus, pre-multiplication by P_{j_1,...,j_n} transforms A to A_{j_1,...,j_n}:

(1) P_{j_1,...,j_n}A=A_{j_1,...,j_n}.

If we had proven the multiplication rule for determinants, we could have concluded from (1) that

(2) \det (P_{j_1,...,j_n})\det A=\det A_{j_1,...,j_n}.

As we know, changing places of two rows multiplies \det A by -1. (2) tells us that permutation by P_{j_1,...,j_n} multiplies \det A by \det (P_{j_1,...,j_n}). In a rigorous algebra course (2) is proved using the theory of permutations, without employing the multiplication rule for determinants.

I am going to call (2) a shortcut for permutations and use it without a proof. In general, I prefer to use such shortcuts, to see what is going on and bypass tedious proofs.
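Here is how (1) and (2) look numerically (a sketch with numpy; the matrix and the permutation are arbitrary, and indices are 0-based in the code):

import numpy as np

np.random.seed(4)
n = 4
js = [1, 3, 0, 2]                            # j_1,...,j_n (this is P_{2,4,1,3} in 1-based notation)
P = np.zeros((n, n))
P[np.arange(n), js] = 1.0                    # rows of P are e_{j_1},...,e_{j_n}

A = np.random.rand(n, n)

# (1): pre-multiplication by P picks out rows j_1,...,j_n of A
assert np.allclose(P @ A, A[js, :])

# (2): det(P) det(A) = det(A_{j_1,...,j_n})
assert np.isclose(np.linalg.det(P) * np.linalg.det(A), np.linalg.det(A[js, :]))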

Other properties of permutation matrices

Exercise 1. Prove that Definition 1 is equivalent to the following: A permutation matrix P is defined by two conditions: a) all its columns are unit column-vectors and b) no two columns are equal.

Proof. Take the jth column. It contains one unity (the one that comes from the jth unit row-vector). It cannot contain more than one unity because all rows are different. Hence, the jth column is a unit column-vector. Different columns are different unit vectors because otherwise some row would contain at least two unities and would not be a unit vector.

Exercise 2. Prove that a permutation matrix is an orthogonal matrix.

Proof. By Exercise 1 we can write a permutation matrix as a matrix of unit column-vectors:

P=\left( u_{j_1},...,u_{j_n}\right). Then

P^TP=\left(\begin{array}{c}u_{j_1}^T\\...\\u_{j_n}^T\end{array}\right)\left(u_{j_1},...,u_{j_n}\right) =\left(\begin{array}{ccc}u_{j_1}^Tu_{j_1}&...&u_{j_1}^Tu_{j_n}\\...&...&...\\u_{j_n}^Tu_{j_1}&...&u_{j_n}^Tu_{j_n}  \end{array}\right)=I,

which proves orthogonality. It follows that \det P=\pm 1 (be careful with this equation, it follows from multiplicativity of determinants which we have not derived from our axioms).
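A short numeric confirmation of Exercise 2 (a sketch with numpy; the permutation is random):

import numpy as np

n = 5
js = np.random.permutation(n)                # an arbitrary permutation of 0,...,n-1
P = np.zeros((n, n))
P[np.arange(n), js] = 1.0                    # permutation matrix with rows e_{j_1},...,e_{j_n}

# orthogonality P^T P = I, and hence det P = +1 or -1
assert np.allclose(P.T @ P, np.eye(n))
assert np.isclose(abs(np.linalg.det(P)), 1.0)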

11 Feb 19

Permutation matrices

Definition and example

Let e_i=(0,...,0,1,0,...0) denote the ith unit row-vector (the ith component is 1 and the others are zeros).

Definition 1. A permutation matrix P is defined by two conditions: a) all of its rows are unit row-vectors and b) no two rows are equal. We use the notation

P_{j_1,...,j_n}=\left(\begin{array}{c}e_{j_1}\\...\\e_{j_n}\end{array}\right)

where all rows are different.

Example. In the following matrix we change places of rows in such a way as to obtain the identity in the end. Each transformation changes the sign of the determinant by Property VI.

P_{2,4,1,3}=\left(\begin{array}{c}e_2\\e_4\\e_1\\e_3\end{array}\right) =\left(\begin{array}{cccc}0&1&0&0\\0&0&0&1\\1&0&0&0\\0&0&1&0\end{array}\right)\overset{\text{changing 1st and 3rd rows}}{\rightarrow }\left(\begin{array}{cccc}1 &0&0&0\\0&0&0&1\\0&1&0&0\\0&0&1&0\end{array}\right)

\overset{\text{changing 3rd and 4th rows}}{\rightarrow }\left(\begin{array}{cccc}1&0&0&0\\0&0&0&1\\0&0&1&0\\0&1&0&0  \end{array}\right)\overset{\text{changing 2nd and 4th rows}}{\rightarrow }\left(\begin{array}{cccc}1&0&0&0\\  0&1&0&0\\0&0&1&0\\0&0&0&1\end{array}\right).

Since there are three changes of sign and the final matrix is the identity, Property VI and Axiom 3 give \det P_{2,4,1,3}=(-1)^{3}\det I=-1.

It is possible to arrive at the identity matrix in more than one way, but the result will be the same because the number \det P_{2,4,1,3} is fixed. We will not delve into the theory of permutations.
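The value found by hand can be confirmed with a one-line computation (a sketch with numpy, for illustration only):

import numpy as np

# P_{2,4,1,3}: rows are e_2, e_4, e_1, e_3
P = np.array([[0, 1, 0, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 0, 1, 0]], dtype=float)

print(round(np.linalg.det(P)))               # -1, as obtained by the three row swaps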

10 Feb 19

Properties IV-VI

Exercise 1. Let S be a linearly independent system of n-1 vectors in R^n. Then it can be completed with a vector B to form a basis in R^n.

Proof. One way to obtain B is this. Let P be a projector onto the span of S and let Q=I-P. Take as B any nonzero vector from the image of Q. It is orthogonal to any element of the image of P and, in particular, to elements of S. Therefore S completed with B gives a linearly independent system \tilde{S}. \tilde{S} is a basis because x=Px+Qx for any x\in R^n.
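A minimal numeric sketch of this construction, using the projector formula P=X(X^TX)^{-1}X^T onto the column span of X that appears in the questions for repetition below (the vectors in S are random and assumed linearly independent):

import numpy as np

np.random.seed(5)
n = 4
X = np.random.rand(n, n - 1)                 # columns of X play the role of S (assumed independent)

P = X @ np.linalg.inv(X.T @ X) @ X.T         # projector onto span(S)
Q = np.eye(n) - P                            # projector onto the orthogonal complement

# any nonzero vector in the image of Q works as B; take the largest column of Q
B = Q[:, np.argmax(np.linalg.norm(Q, axis=0))]

# S completed with B spans R^n, i.e. forms a basis
assert np.linalg.matrix_rank(np.column_stack([X, B])) == n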

Property IV. Additivity. Suppose the ith row of A is a sum of two vectors:

A=\left(\begin{array}{c}A_1\\...\\A_i^\prime+A_i^{\prime\prime }\\...\\A_n\end{array}  \right).

Denote

A^\prime=\left(\begin{array}{c}A_1\\...\\A_i^\prime\\...\\A_n\end{array}\right), A^{\prime\prime}=\left(\begin{array}{c}A_1\\...\\A_i^{\prime\prime}\\...\\A_n\end{array}\right)

(except for the ith row, all others are the same for all three matrices). Then \det A=\det A^\prime+\det A^{\prime\prime}.

Proof. Denote S the system of n-1 vectors A_1,...,A_{i-1},A_{i+1},...,A_n.

Case 1. If S is linearly dependent, then the system of all rows of A is also linearly dependent. By Property III the determinants of all three matrices A,\ A^\prime,\ A^{\prime\prime} are zero and the statement is true.

Case 2. Let S be linearly independent. Then by Exercise 1 it can be completed with a vector B to form a basis in R^n. A_i^\prime,\ A_i^{\prime\prime } can be represented as linear combinations of elements of \tilde{S}. We are interested only in the coefficients of B in those representations. So let A_i^\prime=C+kB, A_i^{\prime\prime}=D+lB, where C and D are linear combinations of elements of S. Hence, A_i^\prime+A_i^{\prime\prime}=C+D+(k+l)B.

We can use Property II to eliminate C, D and C+D from the ith rows of A^\prime, A^{\prime\prime} and A, respectively, without changing the determinants of those matrices. Let A^0 denote the matrix obtained by replacing the ith row of A with B. Then by Property II and Axiom 1

\det A=(k+l)\det A^0, \det A^\prime=k\det A^0, \det A^{\prime\prime}=l\det A^0,

which proves the statement.

Combining homogeneity and additivity, we get the following important property that some people use as a definition:

Property V. Multilinearity. The determinant of A is a multilinear function of its rows, that is, for each i, it is linear in row A_i, when the other rows are fixed.

Property VI. Antisymmetry. If the matrix A^0 is obtained from A by changing places of two rows, then \det A^0=-\det A.

Proof. Let

A=\left(\begin{array}{c}...\\A_i\\...\\A_j\\...\end{array}\right), A^0=\left(\begin{array}{c}...\\A_j\\...\\A_i\\...\end{array}\right)

(all other rows of these matrices are the same). Consider the next sequence of transformations:

\left(\begin{array}{c}...\\A_i\\...\\A_j\\...\end{array}\right)\rightarrow\left(\begin{array}{c}...\\A_i+A_j\\...\\A_j\\...\end{array}\right)\rightarrow\left(\begin{array}{c}...\\A_i+A_j\\...\\A_j-(A_i+A_j)\\...\end{array}\right)=\left(\begin{array}{c}...\\A_i+A_j\\...\\-A_i\\...\end{array}\right)\rightarrow\left(\begin{array}{c}...\\A_i+A_j+(-A_i)\\...\\-A_i\\...\end{array}\right)=\left(\begin{array}{c}...\\A_j\\...\\-A_i\\...\end{array}\right).

By Property II, each of these transformations preserves \det A. Recalling homogeneity, we finish the proof.
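The chain of transformations can be traced numerically (a sketch with numpy; the first three steps are instances of Property II, the last one uses homogeneity):

import numpy as np

np.random.seed(6)
A = np.random.rand(4, 4)
i, j = 1, 3                                  # the two rows to be swapped
d = np.linalg.det(A)

M = A.copy()
M[i] += M[j]                                 # row i becomes A_i + A_j   (Property II)
M[j] -= M[i]                                 # row j becomes -A_i        (Property II)
M[i] += M[j]                                 # row i becomes A_j         (Property II)
assert np.isclose(np.linalg.det(M), d)

M[j] *= -1                                   # row j becomes A_i; homogeneity flips the sign
assert np.isclose(np.linalg.det(M), -d)

# M is A with rows i and j interchanged
A0 = A.copy()
A0[[i, j]] = A0[[j, i]]
assert np.allclose(M, A0)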

9 Feb 19

Axioms 1-3 and Properties I-III

Axioms

Axiom 1. Homogeneity. If A^\prime denotes the matrix obtained from A by multiplying one of its rows by a number k, then \det A^\prime=k\det A.

In particular, for k\neq 0, \det A^\prime and \det A are zero or nonzero simultaneously, which is consistent with the fact that A^\prime and A are invertible simultaneously.

Axiom 2. If A^\prime denotes the matrix obtained from A by adding one of the rows of A to another, then \det A^\prime=\det A.

We remember that adding one of the rows of A to another corresponds to adding one equation of the system Ax=y to another and so this operation should not impact solvability of the system.

Axiom 3. \det I=1.

Adding this axiom on top of the previous two makes the determinant a unique function of a matrix.

Properties

Assuming that A is of size n\times n, it is convenient to partition it into rows:

A=\left(\begin{array}{c}A_1\\...\\A_n\end{array}\right).

Property I. If one of the rows of A is zero, then \det A=0.

Indeed, if A_i=0, then A_i=0\times A_i and by Axiom 1 \det A=0\times\det A=0.

Property II. If we add to one of the rows of A another row multiplied by some number k, the determinant does not change.

Proof. If k=0, there is nothing to prove. Suppose k\neq 0 and we want to add kA_j to A_i. This result can be achieved in three steps.

a) Multiply A_j by k. By Axiom 1, \det A gets multiplied by k.

b) In the new matrix, add row kA_j to A_i. By Axiom 2, this does not change the determinant.

c) In the resulting matrix, divide row numbered j by k. The determinant gets divided by k.

The determinant of the very first matrix will be the same as that of the very last matrix, while the ith row of the last matrix will be A_i+kA_j.
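The three steps are easy to trace numerically (a sketch with numpy; the matrix, the rows and k are arbitrary):

import numpy as np

np.random.seed(7)
A = np.random.rand(4, 4)
i, j, k = 0, 2, 2.5
d = np.linalg.det(A)

M = A.copy()
M[j] *= k                                    # a) determinant gets multiplied by k (Axiom 1)
M[i] += M[j]                                 # b) determinant unchanged            (Axiom 2)
M[j] /= k                                    # c) determinant gets divided by k    (Axiom 1)

# net effect: the i-th row is A_i + k*A_j and the determinant is unchanged
assert np.allclose(M[i], A[i] + k * A[j])
assert np.isclose(np.linalg.det(M), d)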

Property III. If rows of A are linearly dependent, then \det A=0.

Proof. Suppose rows of A are linearly dependent. Then one of the rows can be expressed as a linear combination of the others. Suppose, for instance, that A_1=k_2A_2+...+k_nA_n. Multiply the ith row by -k_i and add the result to the first row, for i=2,...,n. Thereby we make the first row equal to 0, while maintaining the determinant the same by Property II. Then by Property I \det A=0.

8 Feb 19

Determinants: starting simple

Previously we looked at a motivating example leading to the determinant \det A=ad-bc of a 2\times 2 matrix

A=\left(\begin{array}{cc}a&b\\c&d\end{array}\right).

Using this basic example, now we formulate properties that uniquely define determinants of matrices of higher orders. The discussion is based mainly on Kurosh, Course in linear algebra, 9th edition, Moscow, 1968 (in Russian).

Observation 1. Homogeneity. If one of the rows of A is multiplied by a number k, then \det A gets multiplied by k,

\det \left(\begin{array}{cc}ka&kb\\c&d\end{array}\right)    =kad-kbc=k\det A, \det \left(\begin{array}{cc}a&b\\kc&kd\end{array}\right)=akd-bkc=k\det A.

Observation 2. Adding one of the rows of A to another doesn't change the value of the determinant:

\det \left(\begin{array}{cc}a+c&b+d\\c&d\end{array}\right)=(a+c)d-(b+d)c=\det A, \det \left(\begin{array}{cc}a&b\\a+c&b+d\end{array}\right)=a(b+d)-b(a+c)=\det A.

To see the intuition behind these rules, recall that the purpose of the determinant is to verify whether the system Ax=y has solutions. Homogeneity means that if one of the equations of the system is multiplied by a nonzero constant, the solvability of the new system will be equivalent to the solvability of the original system. Similarly, if one of the equations of the system is added to another, solvability of the system as judged by the determinant will not change. This makes sense because multiplying a system equation by a nonzero constant or adding one equation to another does not change the information contained in the system.
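Both observations are easy to confirm numerically (a sketch with numpy; the entries and k are arbitrary):

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 5.0]])
k = 4.0

# Observation 1: multiplying a row by k multiplies the determinant by k
Ak = A.copy(); Ak[0] *= k
assert np.isclose(np.linalg.det(Ak), k * np.linalg.det(A))

# Observation 2: adding one row to another leaves the determinant unchanged
Aadd = A.copy(); Aadd[0] += Aadd[1]
assert np.isclose(np.linalg.det(Aadd), np.linalg.det(A))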

Keep in mind an emerging general idea: the determinant discards any transformations of the matrix that are not relevant to its invertibility.

Observation 3. Determinant of the identity: \det I=1.

Taking these properties as axioms for matrices of higher order, we will show that they uniquely define determinants and develop a couple of rules involving determinants. One of them is the Leibniz formula for determinants and the other is Cramer's rule.

31 Dec 18

Questions for repetition

Questions for repetition: projectors and applications

1. If a matrix is idempotent, then it can have only two eigenvalues: 0 and 1.

2. If a matrix is idempotent, then its image is the set of all vectors that are unchanged by this matrix (such vectors are called fixed points).

3. The null space and the image of a projector are orthogonal.

4. If P is a projector, then Q=I-P is too. What is the relationship between the image and null space of P and those of Q?

5. Using a basis in a subspace L, how do you construct a projector P onto L? Prove existence, idempotency and symmetry. Prove that \text{Img}(P)=L.

6. Prove that if P is the projector from the previous exercise, then Q=I-P projects onto L^\perp. Thus, for a given L, we have a way to find L^\perp. Use this fact to prove the theorem on the second orthocomplement.

7. Prove the generalized Pythagoras theorem.

8. Define the OLS estimator for the linear model and derive it. Why does the condition of linear independence of regressors imply that k\leq n?

9. What does the simplified derivation of the OLS estimator look like? Why is it not rigorous?

10. Find eigenvalues of a projector. In particular, prove that eigenvectors corresponding to \lambda=1 are orthogonal to eigenvectors corresponding to \lambda=0.

11. Prove the trace-commuting property.

12. What is the relationship between the trace and image dimension of a projector?

13. Using P=X(X^TX)^{-1}X^T, find the expressions for the fitted value and residual in the linear model theory.

14. Define the OLS estimator of \sigma^2 and prove its unbiasedness. Note the crucial role of the assumption Var(e)=\sigma^2I. Without it there is neither unbiasedness nor the Gauss-Markov theorem.

15. How do you define a standard normal vector? What are its mean and variance?

16. How do you define a general normal vector? Find its mean and variance.

17. Find the mean and variance of \chi_n^2. Prove that \chi_n^2/n converges in probability to 1.

18. Characterize the distribution of the OLS estimator of \sigma ^{2}. Use it to find its mean and variance.