14 Jul 19

## Correctness of the space dimension definition

This proof is taken from I. M. Gelfand, Lectures on Linear Algebra, 4th edition, 1970 (in Russian).

Lemma. Let $f_1,...,f_k$ be some vectors. Suppose that vectors $g_1,...,g_l$ are linearly independent and belong to the span $\text{span}(f_{1},...,f_{k}).$ Then $l\leq k.$

Proof. We prove the lemma by induction on $k.$ Let $k=1.$ Then every $g_{i}$ is a multiple of $f_{1},$ and linearly independent vectors are nonzero. If $l\geq 2,$ then $g_{1}=a_{1}f_{1}$ and $g_{2}=a_{2}f_{1}$ with $a_{1},a_{2}\neq 0,$ so $a_{2}g_{1}-a_{1}g_{2}=0,$ contradicting linear independence. Hence $l=1$ and the inequality $l\leq k$ holds.

Suppose the statement holds for $k-1$ and let us prove it for $k.$ Since $g_{1},...,g_{l}\in \text{span}(f_{1},...,f_{k}),$ we have

(1) $g_{1}=a_{11}f_{1}+...+a_{1k}f_{k}\\...\\ g_{l}=a_{l1}f_{1}+...+a_{lk}f_{k}.$

If all of $a_{1k},...,a_{lk}$ are zero, we have $g_{1},...,g_{l}\in \text{span}(f_{1},...,f_{k-1})$ and then by the induction assumption $l\leq k-1$ and, trivially, $l\leq k.$

Thus, we can assume that not all of $a_{1k},...,a_{lk}$ are zero. Renumbering the vectors $g_{1},...,g_{l}$ if necessary, suppose $a_{lk}\neq 0.$ Then we can solve the last equation in (1) for $f_{k}:$

(2) $f_{k}=\frac{1}{a_{lk}}g_{l}-\frac{1}{a_{lk}}\sum_{j=1}^{k-1}a_{l,j}f_{j}.$

Plugging this equation in the first equation in (1) we get

$g_{1}=a_{11}f_{1}+...+a_{1,k-1}f_{k-1}+a_{1k}\left( \frac{1}{a_{lk}}g_{l}-\frac{1}{a_{lk}}\sum_{j=1}^{k-1}a_{l,j}f_{j}\right) .$

Move $g_{l}$ to the left side; the remaining expression on the right side is a linear combination of $f_{1},...,f_{k-1},$ and the exact expressions for its coefficients $b_{1,j}$ do not matter. The result is $g_{1}-\frac{a_{1k}}{a_{lk}}g_{l}=\sum_{j=1}^{k-1}b_{1,j}f_{j}.$ Doing the same with the first $l-1$ equations of (1), we get the system

$g_{1}-\frac{a_{1k}}{a_{lk}}g_{l}=\sum_{j=1}^{k-1}b_{1,j}f_{j},...,\ \ g_{l-1}-\frac{a_{l-1,k}}{a_{lk}}g_{l}=\sum_{j=1}^{k-1}b_{l-1,j}f_{j}.$

This shows that the vectors

$g_{1}^{\prime }=g_{1}-\frac{a_{1k}}{a_{lk}}g_{l},...,g_{l-1}^{\prime }=\ g_{l-1}-\frac{a_{l-1,k}}{a_{lk}}g_{l}$

belong to $\text{span}(f_{1},...,f_{k-1}).$ If they are linearly independent, we can use the induction assumption to conclude that $l-1\leq k-1,$ which will prove $l\leq k.$

Suppose that for some $c_{i}$

$\sum_{i=1}^{l-1}c_{i}g_{i}^{\prime}=\sum_{i=1}^{l-1}c_{i}g_{i}-\sum_{i=1}^{l-1}c_{i}\frac{a_{i,k}}{a_{lk}}g_{l}=0.$

This is a linear combination of $g_{1},...,g_{l}$ equal to zero, so the assumed linear independence of $g_{1},...,g_{l}$ implies $c_{1}=...=c_{l-1}=0,$ and the system $g_{1}^{\prime },...,g_{l-1}^{\prime }$ is linearly independent.

Theorem. The definition of the space dimension is correct.

Proof. We write $\dim (L)=n$ if a) $L$ contains $n$ linearly independent vectors $x_{1},...,x_{n}$ and b) $L=\text{span}(x_{1},...,x_{n}).$ We need to prove that any system with properties a) and b) has the same number of vectors. Suppose $y_{1},...,y_{m}$ is another such system. Since $y_{1},...,y_{m}$ belong to $\text{span}(x_{1},...,x_{n}),$ by the lemma $m\leq n.$ Similarly, $n\leq m.$ So $n=m.$
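
To see the lemma numerically, here is a minimal Python/numpy sketch (the sizes, seed, and names are my choices, not from the text): vectors drawn from the span of $k$ vectors form a matrix of rank at most $k,$ and the rank counts the largest linearly independent subsystem.

```python
import numpy as np

# Vectors g_i in span(f_1, ..., f_k) cannot contain more than k linearly
# independent members; we check this with matrix rank.
rng = np.random.default_rng(0)
k, n = 3, 5
F = rng.standard_normal((k, n))   # rows f_1, ..., f_k
C = rng.standard_normal((10, k))  # random coefficients
G = C @ F                         # ten vectors g_i in span(f_1, ..., f_k)

# rank(G) is the size of the largest linearly independent subsystem of the g_i
assert np.linalg.matrix_rank(G) <= k
```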

24 Feb 19

## Determinants: questions for repetition

All formulas for calculating determinants are complicated. There are two ways to study determinants: one is to start with the explicit expressions (the Leibniz formula or Laplace expansion) and the other is to start with simple functional properties (called Axioms 1-3 here), then develop more advanced ones (multilinearity) and, finally, derive the explicit formulas. I prefer to see ideas and follow the second way.

1. Formulate Axioms 1-3 and make sure that you understand the motivation: 1) multiplying one of the equations of the system $ax+by=e,$ $cx+dy=f$ by a nonzero number $k$ does not impact solvability of the system, 2) similarly, adding one equation of the system to another does not affect solvability and 3) the system $x=e,$ $y=f$ is trivially solvable.

2. Interpret Properties I-III in terms of system solvability and prove them.

3. Prove that the determinant is a multilinear antisymmetric function of rows.

4. Define a permutation matrix and give an example of calculating its determinant using Axioms 1-3 and Properties I-VI.

5. Show what pre-multiplication by a permutation matrix does to a matrix $A.$

6. Prove that a permutation matrix is an orthogonal matrix.

7. Derive the Leibniz formula. At this point, the way multilinearity (in rows) and permutation matrices are used should be absolutely obvious. If it is not, start over from Question 1.

8. Explain the different-rows-different-columns (cross-out) rule.

9. Prove that transposition does not change determinants.

10. Let $A^{(j)}$ be the $j$-th column of $A$ and define the row-vector $L_j$ by $\det A=L_jA^{(j)}.$ Why is this definition correct? Prove that $L_jA^{(k)}=0$ for any $k\neq j.$

11. Using Question 10, derive Cramer's rule for solving the system $Ax=y.$

12. Using Question 10, prove the invertibility criterion.

13. Derive the Laplace expansion.

14. Using Questions 10 and 13, derive the explicit formula for $A^{-1}.$

23 Feb 19

## Laplace expansion

The Laplace expansion is a formula for calculating determinants that is computationally more efficient than the Leibniz formula but less efficient than the decomposition into triangular matrices. See if you like the proof below or the one that heavily relies on permutations.

In fact, we have already dealt with the Laplace expansion when we said that $\det A$ is a linear function of column $A^{(j)}:$

(1) $\det A=\sum_{i=1}^na_{ij}l_{ji}.$

Our task is to analyze the coefficients $l_{ji}.$ They will be identified as determinants of certain submatrices of $A,$ up to the sign.

### Special case

For simplicity, let us start with $j=1:$

(2) $\det A=a_{11}l_{11}+...+a_{n1}l_{1n}.$

Definition. For any $a_{ij},$ denote by $A_{ij}$ the matrix resulting from $A$ after deleting row $i$ and column $j.$ $A_{ij}$ is of size $(n-1)\times (n-1).$ Its determinant $M_{ij}=\det A_{ij}$ is called the $(i,j)$-minor.

Step 1. We want to show that $l_{11}=M_{11}.$ By the cross-out rule $l_{11}$ depends only on the elements of $A_{11}.$ When studying $l_{11},$ we can assume without loss of generality that

$a_{11}=1,$ $a_{21}=...=a_{n1}=0.$

Then (2) becomes $\det A=l_{11}.$ This allows us to assert that $l_{11}$ satisfies Axioms 1-3:

1) $l_{11}$ is a homogeneous function of any row of $A_{11}$ because $\det A$ has this property,

2) Adding one of the rows of $A_{11}$ to another row does not change $l_{11}$ because $\det A$ stays the same, and

3) If $A_{11}=I_{(n-1)\times (n-1)},$ then $A=I_{n\times n}$ and $l_{11}=\det I=1.$

Since, again, $l_{11}$ is a function of elements of $A_{11}$ only, it follows that

(3) $l_{11}=\det A_{11}=M_{11}.$

Step 2. To analyze $l_{12},$ as above, we can assume that

$a_{21}=1,$ $a_{11}=a_{31}=...=a_{n1}=0.$

(2) becomes $\det A=l_{12}.$ Here $A=(u_2,A^{(2)},...,A^{(n)})$ where $u_2$ is the second unit column-vector and $A^{(2)},...,A^{(n)}$ are columns of $A.$ Let $\tilde{A}$ be the result of permuting the first and second rows of $A.$ This permutation changes the sign of $\det A$ and does not change $l_{12}$ (by the cross-out rule $l_{12}$ depends only on the elements of $A_{21},$ and the matrix $A_{21}$ for $A$ is the same as the matrix $A_{11}$ for $\tilde{A}$). Hence,

(4) $\det \tilde{A}=-\det A=-l_{12}.$

To $\tilde{A}$ we can apply Step 1: by (3), $\det \tilde{A}$ equals the $(1,1)$-minor of $\tilde{A},$ which is $M_{21}$ for $A.$ Together with (4) this implies $l_{12}=-M_{21}.$

Step $i.$ We can assume that

$a_{i1}=1,$ $a_{11}=...=a_{i-1,1}=a_{i+1,1}=...=a_{n1}=0.$

(2) becomes $\det A=l_{1i},$ where $A=(u_i,A^{(2)},...,A^{(n)})$ and $u_{i}$ is the $i$-th unit column-vector. As in Step 2, we can reduce this case to Step 1. Permute rows $i$ and $i-1,$ then permute rows $i-1$ and $i-2$ and so on. In total we need $i-1$ permutations leading to $i-1$ changes in sign. Instead of (4) we get

$\det \tilde{A}=(-1)^{i-1}\det A=(-1)^{i-1}l_{1i},$

where $\tilde{A}$ is the result of $i-1$ permutations. The element in the upper left corner of $\tilde{A}$ is 1, so by Step 1 $\det \tilde{A}$ equals the $(1,1)$-minor of $\tilde{A}.$ Deleting the first row and column of $\tilde{A}$ is the same as deleting row $i$ and column 1 of $A$ (the remaining rows keep their order), so this minor is $M_{i1}.$ The conclusion is that

(5) $l_{1i}=(-1)^{i-1}M_{i1}.$

### General case

To consider (1), we reduce the case of the $j$-th column to the case of the first column. For this, we permute columns $j$ and $j-1,$ then columns $j-1$ and $j-2$ and so on. In total we need $j-1$ permutations. Denoting the new matrix $\tilde{A},$ we have $\det \tilde{A}=(-1)^{j-1}\det A.$ To $\det \tilde{A}$ we can apply what we know about (2); note that the $(i,1)$-minor of $\tilde{A}$ is the $(i,j)$-minor $M_{ij}$ of $A$ (the remaining columns keep their order). This will lead to multiplying (5) by $(-1)^{j-1}.$ The result is

$l_{ji}=(-1)^{(i-1)+(j-1)}M_{ij}=(-1)^{i+j}M_{ij}.$

Thus we have derived the Laplace expansion:

Theorem. For $j=1,...,n$ one has an expansion by column

$\det A=\sum_{i=1}^na_{ij}(-1)^{i+j}M_{ij}.$

The meaning of this expansion is that the calculation of the determinant of $A$ is reduced to the calculation of determinants of matrices of smaller size. Instead of expansions by column, one can use expansions by row, whichever is convenient. The expression $(-1)^{i+j}M_{ij}$ is called the cofactor of the element $a_{ij}.$
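
As a sanity check, here is a short Python sketch of the expansion along the first column (the function name and the test matrix are mine); it agrees with numpy's built-in determinant.

```python
import numpy as np

def det_laplace(A: np.ndarray) -> float:
    """Determinant via Laplace expansion along the first column."""
    n = A.shape[0]
    if n == 1:
        return float(A[0, 0])
    total = 0.0
    for i in range(n):
        # A_{i1}: delete row i and column 1; its determinant is M_{i1}
        minor = np.delete(np.delete(A, i, axis=0), 0, axis=1)
        # the sign (-1)**(i+j) with j=1 becomes (-1)**i in 0-based indexing
        total += (-1) ** i * A[i, 0] * det_laplace(minor)
    return total

A = np.array([[2.0, 1, 3], [0, 4, 1], [5, 2, 2]])
print(det_laplace(A), np.linalg.det(A))  # both give -43
```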

22 Feb 19

## Cramer's rule and invertibility criterion

### Consequences of multilinearity

For a fixed $j,$ $\det A$ is a linear function of column $A^{(j)}.$ Such a linear function generates a row-vector $L_{j}$ by way of a formula (see Exercise 3)

(1) $\det A=L_jA^{(j)}.$

Exercise 1. In addition to (1), we have

(2) $L_jA^{(k)}=0$ for any $k\neq j.$

Proof. Here and in the future it is useful to introduce the coordinate representation for $L_j=(l_{j1},...,l_{jn})$ and put

(3) $L=\left(\begin{array}{c}L_1\\...\\L_n\end{array}\right).$

Then we can write (1) as $\det A=\sum_{i=1}^nl_{ji}a_{ij}.$ Here the element $l_{ji}$ does not involve $a_{ij}$ and therefore, by the different-rows-different-columns rule, it does not involve any element of the entire column $A^{(j)}.$ Hence, the vector $L_j$ does not involve elements of the column $A^{(j)}.$

Let $A^{\prime }$ denote the matrix obtained from $A$ by replacing column $A^{(j)}$ with column $A^{(k)}.$ The vector $L_j$ for the matrix $A^{\prime }$ is the same as for $A$ because both vectors depend on the elements from columns other than the column numbered $j.$ Since $A^\prime$ contains linearly dependent (actually two identical) columns, $\det A^\prime=0.$ Using in (1) $A^\prime$ instead of $A$ we get $0=\det A^\prime=L_jA^{(k)},$ as required.

After reading the next two sections, come back and read this statement again to appreciate its power and originality.

### Cramer's rule

Exercise 2. Suppose $\det A\neq 0.$ For any $y\in R^n$ denote by $B_j$ the matrix formed by replacing the $j$-th column of $A$ by the column vector $y.$ Then the solution of the system $Ax=y$ exists and its components are given by

$x_j=\frac{\det B_j}{\det A},\ j=1,...,n.$

Proof. Premultiply $Ax=y$ by $L_{j}:$

(4) $L_jy=L_jAx=\left(L_jA^{(1)},...,L_jA^{(n)}\right)x=(0,...,\det A,...,0)x.$

Here we applied (1) and (2) (the $j$-th component of the vector $(0,...,\det A,...,0)$ is $\det A$ and all others are zeros). From (4) it follows that $(\det A)x_{j}=L_{j}y.$ On the other hand, from (1) we have $\det B_j=L_jy$ (the vector $L_j$ for $B_j$ is the same as for $A,$ see the proof of Exercise 1). The last two equations prove the statement.
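
Here is a minimal numerical illustration of Cramer's rule in Python/numpy (the seed and sizes are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
y = rng.standard_normal(n)

x = np.empty(n)
for j in range(n):
    B_j = A.copy()
    B_j[:, j] = y  # replace the j-th column of A by y
    x[j] = np.linalg.det(B_j) / np.linalg.det(A)

assert np.allclose(A @ x, y)  # x solves the system
```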

### Invertibility criterion

Exercise 3. $A$ is invertible if and only if $\det A\neq 0.$

Proof. If $A$ is invertible, then $AA^{-1}=I.$ By multiplicativity of the determinant and Axiom 3 this implies $\det A\det (A^{-1})=1.$ Thus, $\det A\neq 0.$

Conversely, suppose $\det A\neq 0.$ (1), (2) and (3) imply

$LA=\left(\begin{array}{c}L_1\\...\\L_n\end{array}\right) (A^{(1)},...,A^{(n)})=\left(\begin{array}{ccc}L_1A^{(1)}&...&L_1A^{(n)}\\...&...&...\\L_nA^{(1)}&...&L_nA^{(n)}\end{array} \right)$

$=\left(\begin{array}{ccc}\det A&...&0\\...&...&...\\0&...&\det A\end{array} \right)=\det A\times I.$

This means that the matrix $\frac{1}{\det A}L$ is the inverse of $A.$ Recall that for square matrices the existence of a left inverse implies that of a right inverse, so $A$ is invertible.
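
A hedged Python sketch of this construction (the helper name is mine): the matrix $L$ assembled from cofactors satisfies $LA=(\det A)I,$ so dividing by $\det A$ gives the inverse.

```python
import numpy as np

def build_L(A: np.ndarray) -> np.ndarray:
    """L with entries l_{ji} = (-1)**(i+j) M_{ij}, the cofactor of a_{ij}."""
    n = A.shape[0]
    L = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            L[j, i] = (-1) ** (i + j) * np.linalg.det(minor)
    return L

A = np.array([[2.0, 1], [5, 3]])
L = build_L(A)
assert np.allclose(L @ A, np.linalg.det(A) * np.eye(2))
print(L / np.linalg.det(A))  # coincides with np.linalg.inv(A)
```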

18 Feb 19

## Determinant of a transpose

Exercise 1. $\det (A^T)=\det A.$

Proof. The proof is similar to the derivation of the Leibniz formula. Using the notation from that derivation, we decompose rows of $A$ into linear combinations of unit row-vectors $A_i=\sum_{j=1}^na_{ij}e_j$. Hence $A_i^T=\sum_{j=1}^na_{ij}e_j^T.$ Therefore by multilinearity in columns

(1) $\det (A^{T})=\sum_{j_{1},...,j_{n}=1}^{n}a_{1j_{1}}...a_{nj_{n}}\det \left( e_{j_{1}}^{T},...,e_{j_{n}}^{T}\right)$

$=\sum_{j_{1},...,j_{n}:\det P_{j_{1},...,j_{n}}^{T}\neq 0}a_{1j_{1}}...a_{nj_{n}}\det P_{j_{1},...,j_{n}}^{T}.$

Now we want to relate $\det P_{j_1,...,j_n}^T$ to $\det P_{j_1,...,j_n}$.

(2) $\det P_{j_1,...,j_n}^T=\det (P_{j_1,...,j_n}^{-1})$ (the transpose of an orthogonal matrix equals its inverse)

$=\frac{1}{\det (P_{j_1,...,j_n})}$ (the determinant of the inverse is the inverse of the determinant)

$=\det (P_{j_1,...,j_n})$ (because $\det P_{j_1,...,j_n}$ is either 1 or $-1,$ and such numbers equal their own reciprocals).

(1) and (2) prove the statement.

Apart from being interesting in its own right, Exercise 1 allows one to translate properties in terms of rows to properties in terms of columns, as in the next corollary.

Corollary 1. $\det A=0$ if two columns of $A$ are linearly dependent.

Indeed, columns of $A$ are rows of its transpose, so Exercise 1 and Property III yield the result.
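
A quick numerical confirmation of Exercise 1 and Corollary 1 (the setup is mine):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))  # Exercise 1

A[:, 3] = 2 * A[:, 1]  # make two columns linearly dependent
assert np.isclose(np.linalg.det(A), 0)  # Corollary 1
```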

17 Feb 19

## Multilinearity in columns

The axioms and properties of determinants established so far are asymmetric in that they are stated in terms of rows, while similar statements hold for columns. Among the most important properties is the fact that the determinant of $A$ is a multilinear function of its rows.  Here we intend to establish multilinearity in columns.

Exercise 1. The determinant of $A$ is a multilinear function of its columns.

Proof. To prove linearity in each column, it suffices to prove additivity and homogeneity. Let $A,B,C$ be three matrices that have the same columns, except for the $l$th column. The relationship between the $l$th columns is specified by $A^{(l)}=B^{(l)}+C^{(l)}$ or, in terms of elements, $a_{il}=b_{il}+c_{il}$ for all $i=1,...,n.$ Equivalently, in the Leibniz formula

(1) $\det A=\sum_{j_{1},...,j_{n}:\det P_{j_{1},...,j_{n}}\neq 0}a_{1j_{1}}...a_{nj_{n}}\det P_{j_{1},...,j_{n}}$

for all $i$ such that $j_i=l$ we have

(2) $a_{ij_i}=b_{ij_i}+c_{ij_i}.$

If $\det P_{j_1,...,j_n}\neq 0,$ then for a given set $j_1,...,j_n$ there exists exactly one $i$ such that $j_i=l.$ The remaining factors are the same for all three matrices ($a_{mj_m}=b_{mj_m}=c_{mj_m}$ for $m\neq i$), so by (2) we can continue (1) as

$\det A=\sum_{j_{1},...,j_{n}:\det P_{j_{1},...,j_{n}}\neq 0}a_{1j_{1}}...(b_{ij_{i}}+c_{ij_{i}})...a_{nj_{n}}\det P_{j_{1},...,j_{n}}=\det B+\det C.$

Homogeneity is proved similarly, using equations $A^{(l)}=kB^{(l)}$ or in terms of elements $a_{il}=kb_{il}$ for all $i.$
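
A short numerical check of additivity in the $l$th column (my setup; homogeneity can be checked the same way):

```python
import numpy as np

rng = np.random.default_rng(3)
n, l = 4, 2
B = rng.standard_normal((n, n))
C = B.copy()
C[:, l] = rng.standard_normal(n)  # B and C differ only in column l

A = B.copy()
A[:, l] = B[:, l] + C[:, l]       # A^(l) = B^(l) + C^(l)
assert np.isclose(np.linalg.det(A), np.linalg.det(B) + np.linalg.det(C))
```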

17 Feb 19

## Determinant of a product

The derivation is very similar to the proof of the Leibniz formula.

Consider matrices $A,B$ and let $B_j$ denote the $j$th row of $B$. Row $C_i$ of the product $C=AB,$ obviously, can be written as $C_i=\sum_{j=1}^na_{ij}B_j.$ Using Property V $n$ times, we have

(1) $\det C=\det \left(\begin{array}{c}\sum_{j_1=1}^na_{1j_1}B_{j_1}\\C_2\\...\\C_n\end{array}\right) =\sum_{j_1=1}^na_{1j_1}\det \left(\begin{array}{c}B_{j_1}\\C_2\\...\\C_n\end{array}\right)$

$=\sum_{j_1=1}^na_{1j_1}\det \left( \begin{array}{c}B_{j_1}\\\sum_{j_2=1}^na_{2j_2}B_{j_2}\\...\\C_n\end{array}\right)=\sum_{j_1,j_2=1}^na_{1j_1}a_{2j_2}\det\left(\begin{array}{c}B_{j_1}\\B_{j_2}\\...\\ C_{n}\end{array}\right)$

$=...=\sum_{j_1,...,j_n=1}^na_{1j_1}...a_{nj_n}\det\left(\begin{array}{c} B_{j_1}\\B_{j_2}\\...\\B_{j_n}\end{array}\right).$

By analogy with permutation matrices denote

$B_{j_1,...,j_n}=\left(\begin{array}{c}B_{j_1}\\B_{j_2}\\...\\B_{j_n}\end{array}\right).$

Recall the link between $B_{j_1,...,j_n}$ and $B:$

(2) $B_{j_1,...,j_n}=P_{j_1,...,j_n}B.$

If we had proved the multiplication rule for (2), we would have

(3) $\det B_{j_1,...,j_n}=(\det P_{j_1,...,j_n})(\det B).$

In the theory of permutations (3) is proved without relying on the multiplication rule. I am going to use (3) as a shortcut that explains the idea. Combining (1) and (3) we obtain by the Leibniz formula

$\det C=\sum_{j_1,...,j_n:\det P_{j_1,...,j_n}\neq 0}a_{1j_1}...a_{nj_n}(\det P_{j_1,...,j_n})(\det B)=(\det A)(\det B).$
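
A quick numerical check of the multiplication rule (the setup is mine):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
```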

15 Feb 19

## Rule for calculating determinant

This is one of those cases when calculations explain the result.

Let $e_j$ denote the $j$th unit row-vector. Row $A_i,$ obviously, can be decomposed as $A_i=\sum_{j=1}^na_{ij}e_j.$ Recall that by Property V, the determinant of $A$ is a multilinear function of its rows. Using Property V $n$ times, we have

(1) $\det A=\det \left(\begin{array}{c}\sum_{j_1=1}^na_{1j_1}e_{j_1}\\A_2\\...\\ A_n\end{array}\right)=\sum_{j_1=1}^na_{1j_1}\det\left(\begin{array}{c}e_{j_1}\\A_2\\...\\ A_{n}\end{array}\right)$

$=\sum_{j_1=1}^na_{1j_1}\det \left(\begin{array}{c}e_{j_1}\\\sum_{j_2=1}^na_{2j_2}e_{j_2}\\...\\A_n\end{array}\right)=\sum_{j_1,j_2=1}^na_{1j_1}a_{2j_2}\det\left(\begin{array}{c}e_{j_1}\\e_{j_2}\\...\\ A_{n}\end{array}\right)=...$

$=\sum_{j_1,...,j_n=1}^na_{1j_1}...a_{nj_n}\det\left(\begin{array}{c} e_{j_1}\\e_{j_2}\\...\\e_{j_n}\end{array}\right).$

Here by Property III

$\det\left(\begin{array}{c}e_{j_1}\\e_{j_2}\\...\\e_{j_n}\end{array}\right)=0$

if among rows there are equal vectors. The remaining matrices with nonzero determinants are permutation matrices, so

(2) $\det A=\sum_{j_1,...,j_n:\det P_{j_1,...,j_n}\neq 0}a_{1j_1}...a_{nj_n}\det P_{j_1,...,j_n}.$

Different-rows-different-columns rule. Take a good look at what the condition $\det P_{j_1,...,j_n}\neq 0$ implies about the location of the factors of the product $a_{1j_1}...a_{nj_n}.$ The rows $1,...,n$ to which the factors belong are obviously different. The columns $j_1,...,j_n$ are also different by the definition of a permutation matrix. Conversely, consider any combination $a_{i_1,j_1},...,a_{i_n,j_n}$ of elements such that no two elements belong to the same row or column. Rearrange the first indices $i_1,...,i_n$ in ascending order, from $1$ to $n.$ This leads to a renumbering $\tilde{j}_{1},...,\tilde{j}_{n}$ of the second indices. The product $a_{i_1,j_1}...a_{i_n,j_n}$ becomes $a_{1,\tilde{j}_1}...a_{n,\tilde{j}_n}.$ Since the original second indices were all different, the new ones will be too. Hence, $\det P_{\tilde{j}_1,...,\tilde{j}_n}\neq 0$ and such a term must be in (2).

This rule alternatively can be called a cross-out rule because in practice it is used like this: take, say, element $a_{11}$ and cross out the row and column it belongs to. The next factor is selected from the remaining matrix. In that matrix, again cross out the row and column the second factor sits in. Continue like this until you have $n$ factors. Multiply the resulting product by the determinant of an appropriate permutation matrix, and you have one term of the Leibniz formula.

Remark 1. (2) is the Leibniz formula. The formula on the right of (1) is the Levi-Civita formula. The difference is that the Levi-Civita formula contains many more zero terms, while in (2) a term can be zero only if $a_{1j_1}...a_{nj_n}=0$.

Remark 2. Many textbooks write in (2) signatures of permutations instead of $\det P_{j_1,...,j_n}.$ Using $\det P_{j_1,...,j_n}$ is better because a) you save time by skipping the theory of permutations and b) you need a rule to calculate signatures of permutations, and $\det P_{j_1,...,j_n}$ is such a rule (see the example in the post on permutation matrices below).
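
For illustration, here is a direct Python implementation of (2), with the sign of each term supplied by the determinant of the corresponding permutation matrix, as Remark 2 suggests (the function name is mine; the $O(n!\cdot n)$ cost makes this expository only):

```python
import numpy as np
from itertools import permutations

def det_leibniz(A: np.ndarray) -> float:
    n = A.shape[0]
    total = 0.0
    for js in permutations(range(n)):
        P = np.eye(n)[list(js)]  # rows e_{j_1}, ..., e_{j_n}
        term = np.prod([A[i, js[i]] for i in range(n)])
        total += term * round(np.linalg.det(P))  # det P = +1 or -1
    return total

A = np.array([[2.0, 1, 3], [0, 4, 1], [5, 2, 2]])
print(det_leibniz(A), np.linalg.det(A))  # the two values agree
```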

11 Feb 19

## Properties of permutation matrices

### Shortcut for permutations

A permutation matrix permutes (changes the order of) the rows of a matrix. The precise meaning of this statement is given in equation (1) below.

Partitioning the matrix $A$ into rows $A_i$ we have

$e_iA=(0,...,0,1,0,...0)\left(\begin{array}{c}A_1\\...\\A_i\\...\\A_n\end{array}\right)=A_i.$

Stacking these equations we get

$P_{j_1,...,j_n}A=\left(\begin{array}{c}e_{j_1}\\...\\e_{j_n}\end{array}\right) A=\left(\begin{array}{c}e_{j_1}A\\...\\e_{j_n}A\end{array}\right) =\left(\begin{array}{c}A_{j_1}\\...\\A_{j_n}\end{array}\right).$

By analogy with $P_{j_1,...,j_n}$ we denote the last matrix

$A_{j_1,...,j_n}=\left(\begin{array}{c}A_{j_1}\\...\\A_{j_n}\end{array}\right).$

Thus, pre-multiplication by $P_{j_1,...,j_n}$ transforms $A$ to $A_{j_1,...,j_n}:$

(1) $P_{j_1,...,j_n}A=A_{j_1,...,j_n}.$

If we had proven the multiplication rule for determinants, we could have concluded from (1) that

(2) $\det (P_{j_1,...,j_n})\det A=\det A_{j_1,...,j_n}.$

As we know, interchanging two rows multiplies $\det A$ by $-1.$ (2) tells us that permutation by $P_{j_1,...,j_n}$ multiplies $\det A$ by $\det (P_{j_1,...,j_n}).$ In a rigorous algebra course (2) is proved using the theory of permutations, without employing the multiplication rule for determinants.

I am going to call (2) a shortcut for permutations and use it without a proof. In general, I prefer to use such shortcuts, to see what is going on and bypass tedious proofs.
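
A short numpy illustration of (1) (0-based indexing; the permutation is my example):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))

js = [1, 3, 0, 2]          # j_1, ..., j_n in 0-based form
P = np.eye(4)[js]          # rows e_{j_1}, ..., e_{j_n}
assert np.allclose(P @ A, A[js])  # P A stacks the rows A_{j_1}, ..., A_{j_n}
```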

### Other properties of permutation matrices

Exercise 1. Prove that Definition 1 is equivalent to the following: A permutation matrix $P$ is defined by two conditions: a) all its columns are unit column-vectors and b) no two columns are equal.

Proof. Take the $j$th column. Since the $n$ rows are distinct unit row-vectors, each of $e_1,...,e_n$ occurs among them exactly once. The row equal to $e_j$ contributes the only 1 in the $j$th column; hence, the $j$th column is a unit column-vector. Different columns are different unit vectors because otherwise some row would contain at least two 1's and would not be a unit vector.

Exercise 2. Prove that a permutation matrix is an orthogonal matrix.

Proof. By Exercise 1 we can write a permutation matrix as a matrix of unit column-vectors:

$P=\left( u_{j_1},...,u_{j_n}\right).$ Then

$P^TP=\left(\begin{array}{c}u_{j_1}^T\\...\\u_{j_n}^T\end{array}\right)\left(u_{j_1},...,u_{j_n}\right) =\left(\begin{array}{ccc}u_{j_1}^Tu_{j_1}&...&u_{j_1}^Tu_{j_n}\\...&...&...\\u_{j_n}^Tu_{j_1}&...&u_{j_n}^Tu_{j_n} \end{array}\right)=I,$

which proves orthogonality. It follows that $\det P=\pm 1$ (be careful with this equation, it follows from multiplicativity of determinants which we have not derived from our axioms).
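
A quick numerical check of Exercise 2, using $P_{2,4,1,3}$ from the example below (written 0-based):

```python
import numpy as np

P = np.eye(4)[[1, 3, 0, 2]]             # P_{2,4,1,3}
assert np.allclose(P.T @ P, np.eye(4))  # orthogonality
print(np.linalg.det(P))                 # ≈ -1, consistent with det P = ±1
```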

11 Feb 19

## Permutation matrices

### Definition and example

Let $e_i=(0,...,0,1,0,...0)$ denote the $i$th unit row-vector (the $i$th component is 1 and the others are zeros).

Definition 1. A permutation matrix $P$ is defined by two conditions: a) all of its rows are unit row-vectors and b) no two rows are equal. We use the notation

$P_{j_1,...,j_n}=\left(\begin{array}{c}e_{j_1}\\...\\e_{j_n}\end{array}\right)$

where all rows are different.

Example. In the following matrix we interchange rows so as to obtain the identity matrix in the end. Each interchange changes the sign of the determinant by Property VI.

$P_{2,4,1,3}=\left(\begin{array}{c}e_2\\e_4\\e_1\\e_3\end{array}\right) =\left(\begin{array}{cccc}0&1&0&0\\0&0&0&1\\1&0&0&0\\0&0&1&0\end{array}\right)\overset{\text{changing 1st and 3rd rows}}{\rightarrow }\left(\begin{array}{cccc}1 &0&0&0\\0&0&0&1\\0&1&0&0\\0&0&1&0\end{array}\right)$

$\overset{\text{changing 3rd and 4th rows}}{\rightarrow }\left(\begin{array}{cccc}1&0&0&0\\0&0&0&1\\0&0&1&0\\0&1&0&0 \end{array}\right)\overset{\text{changing 2nd and 4th rows}}{\rightarrow }\left(\begin{array}{cccc}1&0&0&0\\ 0&1&0&0\\0&0&1&0\\0&0&0&1\end{array}\right).$

Since there are three interchanges and the final matrix is the identity, Axiom 3 gives $\det P_{2,4,1,3}=(-1)^3\det I=-1$.

It is possible to arrive at the identity matrix in more than one way, but the result will be the same because the number $\det P_{2,4,1,3}$ is fixed. Calculating determinants of permutation matrices in this way allows us to avoid studying the theory of permutations.
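
The swap-counting procedure of the example is easy to mechanize. Here is a hedged Python sketch (the function name is mine) that computes the determinant of a permutation matrix by counting row interchanges:

```python
def det_permutation(js: list) -> int:
    """det P_{j_1,...,j_n} for a 0-based permutation js, by counting swaps."""
    js, sign = list(js), 1
    for target in range(len(js)):      # bring each row to its place
        i = js.index(target)
        if i != target:
            js[i], js[target] = js[target], js[i]  # one row interchange
            sign = -sign                           # each swap flips the sign
    return sign

print(det_permutation([1, 3, 0, 2]))  # P_{2,4,1,3}: three swaps give -1
```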