14 Jun 18

From invertibility to determinants: argument is more important than result

Interestingly enough, determinants appeared before matrices.

Invertibility condition and expression for the inverse

Exercise 1. Let A be a 2\times 2 matrix. Using the condition AA^{-1}=I, find the invertibility condition and the inverse B=A^{-1}.

Solution. Good notation is half of the solution. Denote

A=\left(\begin{array}{cc}a&b\\c&d\end{array}\right), B=\left(\begin{array}{cc}x&u\\y&v\end{array}\right).

It should be true that

\left(\begin{array}{cc}a&b\\c&d\end{array}\right)\left(\begin{array}{cc}x&u\\y&v\end{array}  \right)=\left(\begin{array}{cc}1&0\\0&1\end{array}\right).

This gives us four equations ax+by=1,\ au+bv=0,\ cx+dy=0,\ cu+dv=1. The notation guides us to consider two systems:

\left\{\begin{array}{c}ax+by=1\\cx+dy=0\end{array}\right. , \left\{\begin{array}{c}au+bv=0\\cu+dv=1\end{array}\right.

From the first system we have

\left\{\begin{array}{c}adx+bdy=d\\bcx+bdy=0\end{array}\right. .

Subtracting the second equation from the first we get (ad-bc)x=d. Hence, imposing the condition

(1) ad-bc\neq 0

we have x=\frac{d}{ad-bc}.

Definition. The method for solving a system of linear equations applied here is called the elimination method: we multiply the two equations by suitable factors so that, after subtracting one equation from the other, one variable is eliminated. There is also a substitution method: you solve one equation for one variable and plug the resulting expression into the other equation. The elimination method is better here because it allows one to see the common structure of the resulting expressions.

Use this method to find the other variables:

y=\frac{-c}{ad-bc}, u=\frac{-b}{ad-bc}, v=\frac{a}{ad-bc}.
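For example, here is how y is obtained: multiply the first equation of the first system by c and the second by a to get

\left\{\begin{array}{c}acx+bcy=c\\acx+ady=0\end{array}\right. .

Subtracting the first equation from the second gives (ad-bc)y=-c, which under condition (1) yields the expression for y above. The calculations for u and v (from the second system) are similar.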

Thus (1) is the existence condition for the inverse and the inverse is

(2) A^{-1}=\frac{1}{ad-bc}\left(\begin{array}{cc}d&-b\\-c&a\end{array}\right).

Exercise 2. Check that (2) satisfies

(3) AA^{-1}=I.
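If you prefer to let a computer do the algebra in Exercise 2, here is a minimal sketch using Python's sympy library (any computer algebra system would do):

import sympy as sp

a, b, c, d = sp.symbols('a b c d')
A = sp.Matrix([[a, b], [c, d]])
A_inv = sp.Matrix([[d, -b], [-c, a]]) / (a*d - b*c)   # formula (2); the division assumes condition (1)
print((A * A_inv).applyfunc(sp.simplify))             # identity matrix, so (3) holds
print((A_inv * A).applyfunc(sp.simplify))             # (2) also works as a left inverse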

Determinant

The problem with determinants is that they are needed early in the course but their theory requires developed algebraic thinking. I decided to stay at the intuitive level for a while and delay the theory until Section 8.

Definition. The expression \det A=ad-bc is called the determinant of the matrix

A=\left(\begin{array}{cc}a&b\\c&d\end{array}\right).

The determinant of a general square matrix can be found using the Leibniz formula.
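For reference, for an n\times n matrix A=(a_{ij}) the Leibniz formula reads

\det A=\sum_{\sigma\in S_n}\mathrm{sgn}(\sigma)\,a_{1\sigma(1)}\cdots a_{n\sigma(n)},

where the sum runs over all permutations \sigma of \{1,\dots,n\}; for n=2 it reduces to ad-bc.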

Exercise 3. Check that \det(AB)=(\det A)(\det B) (multiplicativity). Hint. Find the left and right sides and compare them. Here is the proof in the general case.
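For the 2\times 2 case this is a routine expansion; here is the same check as a minimal sympy sketch (the symbols are arbitrary):

import sympy as sp

a, b, c, d, p, q, r, s = sp.symbols('a b c d p q r s')
A = sp.Matrix([[a, b], [c, d]])
B = sp.Matrix([[p, q], [r, s]])
print(sp.expand((A * B).det() - A.det() * B.det()))   # prints 0, so det(AB) = det(A)det(B)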

Exercise 4. How much is \det I?

Theorem. If the determinant of a square matrix A is different from zero, then that matrix is invertible.

The proof will be given later. Understanding functional properties of the inverse is more important than knowing the general expression for the inverse.

Exercise 5 (why do we need the determinant?) Prove that A is invertible if and only if \det A\neq 0.

Proof. Suppose \det A\neq 0. Then by the theorem above the inverse exists. Conversely, suppose the inverse exists. Then it satisfies (3). Apply \det to both sides of (3), using multiplicativity (Exercise 3) and \det I=1 (Exercise 4):

(4) (\det A)\det (A^{-1})=1.

This shows that \det A\neq 0.

Exercise 6 (determinant of an inverse) What is the relationship between \det A and \det A^{-1}?

Solution. From (4) we see that

(5) \det (A^{-1})=(\det A)^{-1}.

Exercise 7. For square matrices, existence of a right or left inverse implies existence of the other.

Proof. Suppose A,B are square and B is a right inverse of A:

(6) AB=I.

As in Exercise 5, this implies \det B\neq 0. By the theorem above B is then invertible, so we can use

(7) BB^{-1}=B^{-1}B=I.

By associativity (6) and (7) give BA=BA(BB^{-1})=B(AB)B^{-1}=BIB^{-1}=I.

The case of the left inverse is similar.
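A quick numerical sanity check of Exercise 7, as a sketch in Python with numpy (the random matrix is an arbitrary example and is almost surely invertible):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))         # a random 4x4 matrix
B = np.linalg.solve(A, np.eye(4))       # a right inverse: AB = I
print(np.allclose(A @ B, np.eye(4)))    # True
print(np.allclose(B @ A, np.eye(4)))    # True: the same B is also a left inverse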

16 Feb 16

OLS estimator for multiple regression - as simple as possible

Here I try to explain a couple of ideas to folks not familiar with (or afraid of?) matrix algebra.

A matrix is a rectangular table of numbers. Most of the time, operations with matrices are performed just as with numbers. For example, for numbers we know that a+b=b+a. For matrices this is also true, except that matrices are usually denoted by capital letters: A+B=B+A. It is easier to describe the differences than the similarities.

(1) One of the differences is that for matrices we can define a new operation called transposition: the columns of the original matrix A become the rows of a new matrix, called the transpose of A and denoted A^T. Visualize it like this: if A has more rows than columns, then its transpose has more columns than rows:

[Figure: transposed matrix]

(2) We know that the number 1 has the property that 1\times a=a. For matrices, the analog is I\times A=A, where I is a special matrix called the identity.

(3) The property \frac{1}{a}a=1, which holds for nonzero numbers, generalizes to matrices, except that instead of \frac{1}{A} we write A^{-1}. Thus, the inverse matrix has the property that A^{-1}A=I.

(4) You don't need to worry about how these operations are performed when you are given specific numerical matrices, because they can easily be done in Excel. All you have to do is make sure that the theoretical requirements are not violated. One of them is that, in general, matrices in a product cannot change places: AB\ne BA. All four points are illustrated numerically below.
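If you prefer something scriptable to Excel, here is a minimal illustration of points (1)-(4) in Python with numpy (the matrices are arbitrary examples):

import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.]])    # 3 rows, 2 columns
print(A.T)                                      # (1) the transpose has 2 rows, 3 columns

B = np.array([[2., 1.], [0., 3.]])
I = np.eye(2)
print(np.allclose(I @ B, B))                    # (2) True: IB = B
print(np.allclose(np.linalg.inv(B) @ B, I))     # (3) True: B^{-1}B = I

C = np.array([[0., 1.], [1., 0.]])
print(np.allclose(B @ C, C @ B))                # (4) False: in general AB != BA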

Here is an example that continues my previous post about simplified derivation of the OLS estimator. Consider multiple regression

(5) y=X\beta+u

where y is the dependent variable, X is the matrix of independent variables (regressors), \beta is the vector of parameters to estimate and u is the error. Multiplying equation (5) from the left by X^T we get X^Ty=X^TX\beta+X^Tu. As in my previous post, we get rid of the term containing the error by formally putting X^Tu=0. The resulting equation X^Ty=X^TX\beta we solve for \beta by multiplying by (X^TX)^{-1} from the left:

(X^TX)^{-1}X^Ty=(X^TX)^{-1}(X^TX)\beta=(using\ (3))=I\beta=(using\ (2))=\beta.

Putting the hat on \beta, we arrive at the OLS estimator for multiple regression: \hat{\beta}=(X^TX)^{-1}X^Ty. As in the previous post, the whole derivation takes just one paragraph!
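To see the formula at work, here is a minimal simulation sketch in Python with numpy (the data are made up; in practice X and y come from your sample):

import numpy as np

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])   # constant plus two regressors
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + 0.1 * rng.standard_normal(n)                      # equation (5) with a small error

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y     # the estimator derived above
print(beta_hat)                                 # close to the true beta
print(np.linalg.lstsq(X, y, rcond=None)[0])     # numpy's least-squares routine gives the same answer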

Caveat. See the rigorous derivation here. My objective is not rigor but to give you something easy to do and remember.