18 Feb 19

## Determinant of a transpose

Exercise 1. $\det (A^T)=\det A.$

Proof. The proof is similar to the derivation of the Leibniz formula. Using the notation from that derivation, we decompose rows of $A$ into linear combinations of unit row-vectors $A_i=\sum_{j=1}^na_{ij}e_j$. Hence $A_i^T=\sum_{j=1}^na_{ij}e_j^T.$ Therefore by multilinearity in columns

(1) $\det (A^{T})=\sum_{j_{1},...,j_{n}=1}^{n}a_{1j_{1}}...a_{nj_{n}}\det \left( e_{j_{1}}^{T},...,e_{j_{n}}^{T}\right)$

$=\sum_{j_{1},...,j_{n}:\det P_{j_{1},...,j_{n}}^{T}\neq 0}a_{1j_{1}}...a_{nj_{n}}\det P_{j_{1},...,j_{n}}^{T}.$

Now we want to relate $\det P_{j_1,...,j_n}^T$ to $\det P_{j_1,...,j_n}$.

(2) $\det P_{j_1,...,j_n}^T=\det (P_{j_1,...,j_n}^{-1})$ (the transpose of an orthogonal matrix equals its inverse)

$=\frac{1}{\det (P_{j_1,...,j_n})}$          (the determinant of the inverse is the inverse of the determinant)

$=\det (P_{j_1,...,j_n})$      (because $P_{j_1,...,j_n}$ is orthogonal and its determinant is either 1 or -1 ).

(1) and (2) prove the statement.
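The equality $\det(A^T)=\det A$ is easy to test numerically. Below is a minimal sketch that evaluates the Leibniz formula directly; the helper names `perm_sign`, `det` and `transpose`, and the test matrix, are mine and purely illustrative.

```python
from itertools import permutations

def perm_sign(p):
    # det of the permutation matrix with rows e_{p[0]+1},...,e_{p[n-1]+1}:
    # +1 for an even number of inversions, -1 for an odd number
    inv = sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])
    return -1 if inv % 2 else 1

def det(A):
    # Leibniz formula: sum over all permutations (only these give det P != 0)
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = perm_sign(p)
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[2, -1, 3], [0, 4, 1], [5, 2, -2]]
assert det(transpose(A)) == det(A)
```

With exact integer arithmetic the two sums agree term by term after reindexing, which is precisely what the proof above formalizes.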

Apart from being interesting in its own right, Exercise 1 allows one to translate properties in terms of rows to properties in terms of columns, as in the next corollary.

Corollary 1. $\det A=0$ if the columns of $A$ are linearly dependent.

Indeed, columns of $A$ are rows of its transpose, so Exercise 1 and Property III yield the result.

17 Feb 19

## Multilinearity in columns

The axioms and properties of determinants established so far are asymmetric in that they are stated in terms of rows, while similar statements hold for columns. Among the most important properties is the fact that the determinant of $A$ is a multilinear function of its rows. Here we establish multilinearity in columns.

Exercise 1. The determinant of $A$ is a multilinear function of its columns.

Proof. To prove linearity in each column, it suffices to prove additivity and homogeneity. Let $A,B,C$ be three matrices that have the same columns, except for the $l$th column. The relationship between the $l$th columns is specified by $A^{(l)}=B^{(l)}+C^{(l)}$ or in terms of elements $a_{il}=b_{il}+c_{il}$ for all $i=1,...,n.$ Alternatively, in Leibniz' formula

(1) $\det A=\sum_{j_{1},...,j_{n}:\det P_{j_{1},...,j_{n}}\neq 0}a_{1j_{1}}...a_{nj_{n}}\det P_{j_{1},...,j_{n}}$

for all $i$ such that $j_i=l$ we have

(2) $a_{ij_i}=b_{ij_i}+c_{ij_i}.$

If $\det P_{j_1,...,j_n}\neq 0,$ then for a given tuple $j_1,...,j_n$ there exists exactly one $i$ such that $j_i=l.$ Therefore by (2) we can continue (1) as

$\det A=\sum_{j_{1},...,j_{n}:\det P_{j_{1},...,j_{n}}\neq 0}a_{1j_{1}}...(b_{ij_{i}}+c_{ij_{i}})...a_{nj_{n}}\det P_{j_{1},...,j_{n}}=\det B+\det C.$

Homogeneity is proved similarly, using equations $A^{(l)}=kB^{(l)}$ or in terms of elements $a_{il}=kb_{il}$ for all $i.$
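Both claims can be checked numerically. The sketch below splits the second column of an arbitrary integer matrix of my choosing; the helpers `perm_sign` and `det` are illustrative names, not notation from the text.

```python
from itertools import permutations

def perm_sign(p):
    # +1 for an even number of inversions, -1 for odd (this is det P_{j_1,...,j_n})
    inv = sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])
    return -1 if inv % 2 else 1

def det(A):
    # direct evaluation of the Leibniz formula
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = perm_sign(p)
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total

A = [[3, 7, 1], [2, 5, 4], [6, 0, 2]]
l = 1  # split the second column: a_il = (a_il - 1) + 1
B = [row[:] for row in A]
C = [row[:] for row in A]
for i in range(3):
    B[i][l] = A[i][l] - 1
    C[i][l] = 1
assert det(A) == det(B) + det(C)   # additivity in column l

K = [row[:] for row in A]
for i in range(3):
    K[i][l] = 5 * A[i][l]
assert det(K) == 5 * det(A)        # homogeneity in column l
```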

17 Feb 19

## Determinant of a product

The derivation is very similar to the derivation of the Leibniz formula.

Consider matrices $A,B$ and let $B_j$ denote the $j$th row of $B$. Row $C_i$ of the product $C=AB,$ obviously, can be written as $C_i=\sum_{j=1}^na_{ij}B_j.$ Using Property V $n$ times, we have

(1) $\det C=\det \left(\begin{array}{c}\sum_{j_1=1}^na_{1j_1}B_{j_1}\\C_2\\...\\C_n\end{array}\right) =\sum_{j_1=1}^na_{1j_1}\det \left(\begin{array}{c}B_{j_1}\\C_2\\...\\C_n\end{array}\right)$

$=\sum_{j_1=1}^na_{1j_1}\det \left( \begin{array}{c}B_{j_1}\\\sum_{j_2=1}^na_{2j_2}B_{j_2}\\...\\C_n\end{array}\right)=\sum_{j_1,j_2=1}^na_{1j_1}a_{2j_2}\det\left(\begin{array}{c}B_{j_1}\\B_{j_2}\\...\\ C_{n}\end{array}\right)$

$=...=\sum_{j_1,...,j_n=1}^na_{1j_1}...a_{nj_n}\det\left(\begin{array}{c} B_{j_1}\\B_{j_2}\\...\\B_{j_n}\end{array}\right).$

By analogy with permutation matrices denote

$B_{j_1,...,j_n}=\left(\begin{array}{c}B_{j_1}\\B_{j_2}\\...\\B_{j_n}\end{array}\right).$

Recall the link between $B_{j_1,...,j_n}$ and $B:$

(2) $B_{j_1,...,j_n}=P_{j_1,...,j_n}B.$

If we had already proved the multiplication rule for determinants, applying it to (2) would give

(3) $\det B_{j_1,...,j_n}=(\det P_{j_1,...,j_n})(\det B).$

In the theory of permutations (3) is proved without relying on the multiplication rule. I am going to use (3) as a shortcut that explains the idea. Combining (1) and (3) we obtain by the Leibniz formula

$\det C=\sum_{j_1,...,j_n:\det P_{j_1,...,j_n}\neq 0}a_{1j_1}...a_{nj_n}(\det P_{j_1,...,j_n})(\det B)=(\det A)(\det B).$
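The multiplication rule is easy to verify on concrete matrices. A minimal sketch, evaluating both sides with a Leibniz-style `det` helper (names and test matrices are mine):

```python
from itertools import permutations

def perm_sign(p):
    # +1 for an even number of inversions, -1 for odd
    inv = sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])
    return -1 if inv % 2 else 1

def det(A):
    # direct evaluation of the Leibniz formula
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = perm_sign(p)
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2, 0], [3, 1, 4], [0, 2, 5]]
B = [[2, 0, 1], [1, 3, 0], [4, 1, 2]]
assert det(matmul(A, B)) == det(A) * det(B)
```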
15 Feb 19

## Rule for calculating determinant

This is one of those cases when calculations explain the result. Let $e_j$ denote the $j$th unit row-vector. Row $A_i,$ obviously, can be decomposed as $A_i=\sum_{j=1}^na_{ij}e_j.$ Recall that by Property V, the determinant of $A$ is a multilinear function of its rows. Using Property V $n$ times, we have

(1) $\det A=\det \left(\begin{array}{c}\sum_{j_1=1}^na_{1j_1}e_{j_1}\\A_2\\...\\ A_n\end{array}\right)=\sum_{j_1=1}^na_{1j_1}\det\left(\begin{array}{c}e_{j_1}\\A_2\\...\\ A_{n}\end{array}\right)$

$=\sum_{j_1=1}^na_{1j_1}\det \left(\begin{array}{c}e_{j_1}\\\sum_{j_2=1}^na_{2j_2}e_{j_2}\\...\\A_n\end{array}\right)=\sum_{j_1,j_2=1}^na_{1j_1}a_{2j_2}\det\left(\begin{array}{c}e_{j_1}\\e_{j_2}\\...\\ A_{n}\end{array}\right)=...$

$=\sum_{j_1,...,j_n=1}^na_{1j_1}...a_{nj_n}\det\left(\begin{array}{c} e_{j_1}\\e_{j_2}\\...\\e_{j_n}\end{array}\right).$

Here by Property III

$\det\left(\begin{array}{c}e_{j_1}\\e_{j_2}\\...\\e_{j_n}\end{array}\right)=0$

if among rows there are equal vectors. The remaining matrices with nonzero determinants are permutation matrices, so

(2) $\det A=\sum_{j_1,...,j_n:\det P_{j_1,...,j_n}\neq 0}a_{1j_1}...a_{nj_n}\det P_{j_1,...,j_n}.$

Different-rows-different-columns rule. Take a good look at what the condition $\det P_{j_1,...,j_n}\neq 0$ implies about the location of the factors of the product $a_{1j_1}...a_{nj_n}.$ The rows $1,...,n$ to which the factors belong are obviously different. The columns $j_1,...,j_n$ are also different by the definition of a permutation matrix. Conversely, consider any combination $a_{i_1,j_1},...,a_{i_n,j_n}$ of elements such that no two elements belong to the same row or column. Rearrange the first indices $i_1,...,i_n$ in ascending order, from $1$ to $n.$ This leads to a renumbering $\tilde{j}_{1},...,\tilde{j}_{n}$ of the second indices. The product $a_{i_1,j_1}...a_{i_n,j_n}$ becomes $a_{1,\tilde{j}_1}...a_{n,\tilde{j}_n}.$ Since the original second indices were all different, the new ones will be too. Hence, $\det P_{\tilde{j}_1,...,\tilde{j}_n}\neq 0$ and such a term appears in (2).

Remark 1. (2) is the Leibniz formula. The formula at the right of (1) is the Levi-Civita formula. The difference is that the Levi-Civita formula contains many more zero terms, while in (2) a term can be zero only if $a_{1j_1}...a_{nj_n}=0$.

Remark 2. In (2), many textbooks write signatures of permutations instead of $\det P_{j_1,...,j_n}$. Using $\det P_{j_1,...,j_n}$ is better because a) you save time by skipping the theory of permutations and b) you need a rule to calculate signatures of permutations, and $\det P_{j_1,...,j_n}$ provides such a rule (see an example).
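For a $3\times 3$ matrix the Leibniz formula (2) has exactly six nonzero terms and must agree with the familiar Sarrus rule. A quick check on a matrix of my choosing (helper names are illustrative):

```python
from itertools import permutations

def perm_sign(p):
    # det P_{j_1,...,j_n}: +1 for an even number of inversions, -1 for odd
    inv = sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])
    return -1 if inv % 2 else 1

def det(A):
    # direct evaluation of the Leibniz formula (2)
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = perm_sign(p)
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total

def sarrus(M):
    # the six different-rows-different-columns products for n = 3
    (a, b, c), (d, e, f), (g, h, i) = M
    return a*e*i + b*f*g + c*d*h - c*e*g - b*d*i - a*f*h

M = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
assert det(M) == sarrus(M)
```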

11 Feb 19

## Properties of permutation matrices

### Shortcut for permutations

A permutation matrix permutes (changes the order of) the rows of a matrix. The precise meaning of this statement is given in equation (1) below.

Partitioning the matrix $A$ into rows $A_i$ we have

$e_iA=(0,...,0,1,0,...,0)\left(\begin{array}{c}A_1\\...\\A_i\\...\\A_n\end{array}\right)=A_i.$

Stacking these equations we get

$P_{j_1,...,j_n}A=\left(\begin{array}{c}e_{j_1}\\...\\e_{j_n}\end{array}\right) A=\left(\begin{array}{c}e_{j_1}A\\...\\e_{j_n}A\end{array}\right) =\left(\begin{array}{c}A_{j_1}\\...\\A_{j_n}\end{array}\right).$

By analogy with $P_{j_1,...,j_n}$ we denote the last matrix

$A_{j_1,...,j_n}=\left(\begin{array}{c}A_{j_1}\\...\\A_{j_n}\end{array}\right).$

Thus, pre-multiplication by $P_{j_1,...,j_n}$ transforms $A$ to $A_{j_1,...,j_n}:$

(1) $P_{j_1,...,j_n}A=A_{j_1,...,j_n}.$

If we had proven the multiplication rule for determinants, we could have concluded from (1) that

(2) $\det (P_{j_1,...,j_n})\det A=\det A_{j_1,...,j_n}.$

As we know, interchanging two rows multiplies $\det A$ by $-1.$ (2) tells us that permutation by $P_{j_1,...,j_n}$ multiplies $\det A$ by $\det (P_{j_1,...,j_n}).$ In a rigorous algebra course (2) is proved using the theory of permutations, without employing the multiplication rule for determinants.

I am going to call (2) a shortcut for permutations and use it without a proof. In general, I prefer to use such shortcuts, to see what is going on and bypass tedious proofs.
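Equation (1) itself is easy to verify by direct multiplication. A minimal sketch (the helpers `perm_matrix` and `matmul` and the sample matrix are mine):

```python
def perm_matrix(j):
    # P_{j_1,...,j_n}: row r is the unit row-vector e_{j_r} (1-based indices)
    n = len(j)
    return [[1 if c == j[r] - 1 else 0 for c in range(n)] for r in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4], [5, 6], [7, 8]]       # rows A_1, A_2, A_3, A_4
P = perm_matrix([2, 4, 1, 3])
# pre-multiplication stacks rows A_2, A_4, A_1, A_3, as in (1)
assert matmul(P, A) == [A[1], A[3], A[0], A[2]]
```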

### Other properties of permutation matrices

Exercise 1. Prove that Definition 1 is equivalent to the following: A permutation matrix $P$ is defined by two conditions: a) all its columns are unit column-vectors and b) no two columns are equal.

Proof. Take the $j$th column. Since the $n$ rows are distinct unit row-vectors and there are only $n$ such vectors, the row $e_j$ appears exactly once; that row contributes a unity to the $j$th column. The column cannot contain more than one unity because all rows are different. Hence, the $j$th column is a unit column-vector. Different columns are different unit vectors because otherwise some row would contain at least two unities and would not be a unit vector.

Exercise 2. Prove that a permutation matrix is an orthogonal matrix.

Proof. By Exercise 1 we can write a permutation matrix as a matrix of unit column-vectors:

$P=\left( u_{j_1},...,u_{j_n}\right).$ Then

$P^TP=\left(\begin{array}{c}u_{j_1}^T\\...\\u_{j_n}^T\end{array}\right)\left(u_{j_1},...,u_{j_n}\right) =\left(\begin{array}{ccc}u_{j_1}^Tu_{j_1}&...&u_{j_1}^Tu_{j_n}\\...&...&...\\u_{j_n}^Tu_{j_1}&...&u_{j_n}^Tu_{j_n} \end{array}\right)=I,$

which proves orthogonality. It follows that $\det P=\pm 1$ (be careful with this equation, it follows from multiplicativity of determinants which we have not derived from our axioms).
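The identity $P^TP=I$ can be confirmed directly for any permutation matrix. A sketch with illustrative helper names and an arbitrary permutation:

```python
def perm_matrix(j):
    # P_{j_1,...,j_n}: row r is the unit row-vector e_{j_r} (1-based indices)
    n = len(j)
    return [[1 if c == j[r] - 1 else 0 for c in range(n)] for r in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    return [list(col) for col in zip(*M)]

P = perm_matrix([3, 1, 4, 2])
I = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
# off-diagonal entries are products of distinct unit vectors (zero),
# diagonal entries are u_j^T u_j = 1
assert matmul(transpose(P), P) == I
```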

11 Feb 19

## Permutation matrices

### Definition and example

Let $e_i=(0,...,0,1,0,...,0)$ denote the $i$th unit row-vector (the $i$th component is 1 and the others are zeros).

Definition 1. A permutation matrix $P$ is defined by two conditions: a) all of its rows are unit row-vectors and b) no two rows are equal. We use the notation

$P_{j_1,...,j_n}=\left(\begin{array}{c}e_{j_1}\\...\\e_{j_n}\end{array}\right)$

where all rows are different.

Example. In the following matrix we change places of rows in such a way as to obtain the identity in the end. Each transformation changes the sign of the determinant by Property VI.

$P_{2,4,1,3}=\left(\begin{array}{c}e_2\\e_4\\e_1\\e_3\end{array}\right) =\left(\begin{array}{cccc}0&1&0&0\\0&0&0&1\\1&0&0&0\\0&0&1&0\end{array}\right)\overset{\text{changing 1st and 3rd rows}}{\rightarrow }\left(\begin{array}{cccc}1 &0&0&0\\0&0&0&1\\0&1&0&0\\0&0&1&0\end{array}\right)$

$\overset{\text{changing 3rd and 4th rows}}{\rightarrow }\left(\begin{array}{cccc}1&0&0&0\\0&0&0&1\\0&0&1&0\\0&1&0&0 \end{array}\right)\overset{\text{changing 2nd and 4th rows}}{\rightarrow }\left(\begin{array}{cccc}1&0&0&0\\ 0&1&0&0\\0&0&1&0\\0&0&0&1\end{array}\right).$

Since there are three interchanges and the final matrix is the identity, Property VI and Axiom 3 give $\det P_{2,4,1,3}=-1$.

It is possible to arrive at the identity matrix in more than one way, but the result will be the same because the number $\det P_{2,4,1,3}$ is fixed. We will not delve into the theory of permutations.
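The swap-counting argument of the example can be automated: bring the rows to the identity by interchanges, flipping the sign each time, and use $\det I=1.$ A sketch with a hypothetical helper `det_perm` (1-based indices, as in the text):

```python
def det_perm(j):
    # det P_{j_1,...,j_n}: sort the rows into the identity by interchanges;
    # each interchange multiplies the determinant by -1 (Property VI),
    # and det I = 1 (Axiom 3)
    p = [x - 1 for x in j]          # switch to 0-based indices
    sign = 1
    for i in range(len(p)):
        if p[i] != i:
            k = p.index(i)          # locate the row that belongs in position i
            p[i], p[k] = p[k], p[i]
            sign = -sign
    return sign

assert det_perm([2, 4, 1, 3]) == -1   # the example above
```

Any sequence of interchanges reaching the identity gives the same sign, which is why the procedure is well defined.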

10 Feb 19

## Properties IV-VI

Exercise 1. Let $S$ be a linearly independent system of $n-1$ vectors in $R^n$. Then it can be completed with a vector $B$ to form a basis in $R^n.$

Proof. One way to obtain $B$ is this. Let $P$ be a projector onto the span of $S$ and let $Q=I-P.$ Take as $B$ any nonzero vector from the image of $Q.$ It is orthogonal to any element of the image of $P$ and, in particular, to elements of $S.$ Therefore $S$ completed with $B$ gives a linearly independent system $\tilde{S}$. $\tilde{S}$ is a basis because $x=Px+Qx$ for any $x\in R^n.$

Property IV. Additivity. Suppose the $i$th row of $A$ is a sum of two vectors:

$A=\left(\begin{array}{c}A_1\\...\\A_i^\prime+A_i^{\prime\prime }\\...\\A_n\end{array} \right).$

Denote

$A^\prime=\left(\begin{array}{c}A_1\\...\\A_i^\prime\\...\\A_n\end{array}\right)$, $A^{\prime\prime}=\left(\begin{array}{c}A_1\\...\\A_i^{\prime\prime}\\...\\A_n\end{array}\right)$

(except for the $i$th row, all others are the same for all three matrices). Then $\det A=\det A^\prime+\det A^{\prime\prime}.$

Proof. Denote $S$ the system of $n-1$ vectors $A_1,...,A_{i-1},A_{i+1},...,A_n.$

Case 1. If $S$ is linearly dependent, then the system of all rows of $A$ is also linearly dependent. By Property III the determinants of all three matrices $A,\ A^\prime,\ A^{\prime\prime}$ are zero and the statement is true.

Case 2. Let $S$ be linearly independent. Then by Exercise 1 it can be completed with a vector $B$ to form a basis $\tilde{S}$ in $R^n.$ $A_i^\prime,\ A_i^{\prime\prime }$ can be represented as linear combinations of elements of $\tilde{S}.$ We are interested only in the coefficients of $B$ in those representations. So let $A_i^\prime=C+kB,$ $A_i^{\prime\prime}=D+lB,$ where $C$ and $D$ are linear combinations of elements of $S.$ Hence, $A_i^\prime+A_i^{\prime\prime}=C+D+(k+l)B.$

We can use Property II to eliminate $C,$ $D$ and $C+D$ from the $i$th rows of $A^\prime,$ $A^{\prime\prime}$ and $A,$ respectively, without changing the determinants of those matrices. Let $A^0$ denote the matrix obtained by replacing the $i$th row of $A$ with $B.$ Then by Property II and Axiom 1

$\det A=(k+l)\det A^0,$ $\det A^\prime=k\det A^0,$ $\det A^{\prime\prime}=l\det A^0,$

which proves the statement.

Combining homogeneity and additivity, we get the following important property that some people use as a definition:

Property V. Multilinearity. The determinant of $A$ is a multilinear function of its rows, that is, for each $i,$ it is linear in row $A_i,$ when the other rows are fixed.

Property VI. Antisymmetry. If the matrix $A^0$ is obtained from $A$ by changing places of two rows, then $\det A^0=-\det A.$

Proof. Let

$A=\left(\begin{array}{c}...\\A_i\\...\\A_j\\...\end{array}\right),$ $A^0=\left(\begin{array}{c}...\\A_j\\...\\A_i\\...\end{array}\right)$

(all other rows of these matrices are the same). Consider the next sequence of transformations:

$\left(\begin{array}{c}...\\A_i\\...\\A_j\\...\end{array}\right)\rightarrow\left(\begin{array}{c}...\\A_i+A_j\\...\\A_j\\...\end{array}\right)\rightarrow\left(\begin{array}{c}...\\A_i+A_j\\...\\A_j-(A_i+A_j)\\...\end{array}\right)=\left(\begin{array}{c}...\\A_i+A_j\\...\\-A_i\\...\end{array}\right)\rightarrow\left(\begin{array}{c}...\\A_i+A_j+(-A_i)\\...\\-A_i\\...\end{array}\right)=\left(\begin{array}{c}...\\A_j\\...\\-A_i\\...\end{array}\right).$

By Property II, each of these transformations preserves $\det A.$ The final matrix is $A^0$ with the row $A_i$ replaced by $-A_i,$ so by homogeneity its determinant equals $-\det A^0.$ Hence $\det A^0=-\det A.$
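Antisymmetry is straightforward to confirm numerically with a Leibniz-style `det` helper (names and the test matrix are mine):

```python
from itertools import permutations

def perm_sign(p):
    # +1 for an even number of inversions, -1 for odd
    inv = sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])
    return -1 if inv % 2 else 1

def det(A):
    # direct evaluation of the Leibniz formula
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = perm_sign(p)
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total

A = [[2, 1, 5], [3, 0, 4], [1, 6, 2]]
A0 = [A[2], A[1], A[0]]   # interchange rows 1 and 3
assert det(A0) == -det(A)
```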

9 Feb 19

## Axioms 1-3 and Properties I-III

### Axioms

Axiom 1. Homogeneity. If $A^\prime$ denotes the matrix obtained from $A$ by multiplying one of its rows by a number $k,$ then $\det A^\prime=k\det A.$

This implies that for $k\neq 0,$ $A^\prime$ and $A$ are invertible simultaneously.

Axiom 2. If $A^\prime$ denotes the matrix obtained from $A$ by adding one of the rows of $A$ to another, then $\det A^\prime=\det A$.

We remember that adding one of the rows of $A$ to another corresponds to adding one equation of the system $Ax=y$ to another and so this operation should not impact solvability of the system.

Axiom 3. $\det I=1.$

Adding this axiom on top of the previous two makes the determinant a unique function of a matrix.

### Properties

Assuming that $A$ is of size $n\times n,$ it is convenient to partition it into rows:

$A=\left(\begin{array}{c}A_1\\...\\A_n\end{array}\right).$

Property I. If one of the rows of $A$ is zero, then $\det A=0.$

Indeed, if $A_i=0,$ then $A_i=0\times A_i$ and by Axiom 1 $\det A=0\times\det A=0.$

Property II. If we add to one of the rows of $A$ another row multiplied by some number $k,$ the determinant does not change.

Proof. If $k=0,$ there is nothing to prove. Suppose $k\neq 0$ and we want to add $kA_j$ to $A_i.$ This result can be achieved in three steps.

a) Multiply $A_j$ by $k.$ By Axiom 1, $\det A$ gets multiplied by $k.$

b) In the new matrix, add row $kA_j$ to $A_i.$ By Axiom 2, this does not change the determinant.

c) In the resulting matrix, divide row numbered $j$ by $k.$ The determinant gets divided by $k.$

The determinant of the very first matrix will be the same as that of the very last matrix, while the $i$th row of the last matrix will be $A_i+kA_j.$

Property III. If rows of $A$ are linearly dependent, then $\det A=0.$

Proof. Suppose rows of $A$ are linearly dependent. Then one of the rows can be expressed as a linear combination of the others. Suppose, for instance, that $A_1=k_2A_2+...+k_nA_n.$ Multiply the $i$th row by $-k_i$ and add the result to the first row, for $i=2,...,n.$ Thereby we make the first row equal to $0,$ while keeping the determinant unchanged by Property II. Then by Property I $\det A=0.$
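Property III can be checked on a concrete dependent system. The sketch below builds a third row as $2A_1+3A_2$ (an arbitrary combination of mine) and evaluates the determinant with an illustrative Leibniz-style helper:

```python
from itertools import permutations

def perm_sign(p):
    # +1 for an even number of inversions, -1 for odd
    inv = sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])
    return -1 if inv % 2 else 1

def det(A):
    # direct evaluation of the Leibniz formula
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = perm_sign(p)
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total

A1 = [1, 2, 3]
A2 = [4, 5, 6]
A3 = [2 * a + 3 * b for a, b in zip(A1, A2)]   # A3 = 2*A1 + 3*A2
assert det([A1, A2, A3]) == 0
```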

8 Feb 19

## Determinants: starting simple

Previously we looked at a motivating example that leads to the determinant $\det A=ad-bc$ of a $2\times 2$ matrix

$A=\left(\begin{array}{cc}a&b\\c&d\end{array}\right).$

Using this basic example, now we formulate properties that uniquely define determinants of matrices of higher orders. The discussion is based mainly on Kurosh, Course in linear algebra, 9th edition, Moscow, 1968 (in Russian).

Observation 1. Homogeneity. If one of the rows of $A$ is multiplied by a number $k,$ then $\det A$ gets multiplied by $k,$

$\det \left(\begin{array}{cc}ka&kb\\c&d\end{array}\right) =kad-kbc=k\det A,$ $\det \left(\begin{array}{cc}a&b\\kc&kd\end{array}\right)=akd-bkc=k\det A.$

Observation 2. Adding one of the rows of $A$ to another doesn't change the value of the determinant:

$\det \left(\begin{array}{cc}a+c&b+d\\c&d\end{array}\right)=(a+c)d-(b+d)c=\det A,$ $\det \left(\begin{array}{cc}a&b\\a+c&b+d\end{array}\right)=a(b+d)-b(a+c)=\det A.$

To see the intuition behind these rules, recall that the purpose of the determinant is to verify whether the system $Ax=y$ has solutions. Homogeneity means that if one of the equations of the system is multiplied by a nonzero constant, the solvability of the new system will be equivalent to the solvability of the original system. Similarly, if one of the equations of the system is added to another, solvability of the system as judged by the determinant will not change. This makes sense because multiplying a system equation by a nonzero constant or adding one equation to another does not change the information contained in the system.

Keep in mind an emerging general idea: the determinant discards any transformations of the matrix that are not relevant to its invertibility.

Observation 3. Determinant of the identity: $\det I=1.$

Taking these properties as axioms for matrices of higher order, we will show that they uniquely define determinants and develop a couple of rules involving determinants. One of them is the Leibniz formula for determinants and the other is Cramer's rule.
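The three observations are one-line checks in code. A minimal sketch with a hypothetical helper `det2` and arbitrary integer entries:

```python
def det2(a, b, c, d):
    # determinant of the 2x2 matrix [[a, b], [c, d]]
    return a * d - b * c

a, b, c, d, k = 3, 5, 2, 7, 4
assert det2(k * a, k * b, c, d) == k * det2(a, b, c, d)   # Observation 1: homogeneity
assert det2(a + c, b + d, c, d) == det2(a, b, c, d)       # Observation 2: adding row 2 to row 1
assert det2(1, 0, 0, 1) == 1                              # Observation 3: det I = 1
```

Since the identities hold symbolically (expand $ad-bc$), any choice of integers passes.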

31 Dec 18

## Questions for repetition: projectors and applications

1. If a matrix is idempotent, then it can have only two eigenvalues: 0 and 1.

2. If a matrix is idempotent, then its image is the set of all vectors that are unchanged by this matrix (such vectors are called fixed points).

3. The null space and the image of a projector are orthogonal.

4. If $P$ is a projector, then $Q=I-P$ is too. What is the relationship between the image and null space of $P$ and those of $Q?$

5. Using a basis in a subspace $L,$ how do you construct a projector $P$ onto $L?$ Prove existence, idempotency and symmetry. Prove that $\text{Img}(P)=L$.

6. Prove that if $P$ is the projector from the previous exercise, then $Q=I-P$ projects onto $L^\perp.$ Thus, for a given $L,$ we have a way to find $L^\perp.$ Use this fact to prove the theorem on the second orthocomplement.

7. Prove the generalized Pythagoras theorem.

8. Define the OLS estimator for the linear model and derive it. Why does the condition of linear independence of regressors imply that $k\leq n?$

9. What does the simplified derivation of the OLS estimator look like? Why is it not rigorous?

10. Find eigenvalues of a projector. In particular, prove that eigenvectors corresponding to $\lambda=1$ are orthogonal to eigenvectors corresponding to $\lambda=0.$

11. Prove the trace-commuting property.

12. What is the relationship between the trace and image dimension of a projector?

13. Using $P=X(X^TX)^{-1}X^T$, find the expressions for the fitted value and residual in the linear model theory.

14. Define the OLS estimator of $\sigma^2$ and prove its unbiasedness. Note the crucial role of the assumption $Var(e)=\sigma^2I.$ Without it there is neither unbiasedness nor the Gauss-Markov theorem.

15. How do you define a standard normal vector? What are its mean and variance?

16. How do you define a general normal vector? Find its mean and variance.

17. Find the mean and variance of $\chi_n^2.$ Prove that $\chi_n^2/n$ converges in probability to 1.

18. Characterize the distribution of the OLS estimator of $\sigma ^{2}.$ Use it to find its mean and variance.