
Eigenvalues and eigenvectors

Motivation: what matrix can be called simplest?

Of course, nothing can be simpler than the zero and identity matrices. What is the next level of simplicity? In the one-dimensional case, consider a linear mapping f. Any real number x can be written as x=x\cdot 1. By homogeneity f(x)=xf(1). Denoting c=f(1), we see that f is just multiplication by a number: f(x)=cx.

Let's look at the 2-dimensional generalization. We know that a linear mapping is given by a matrix. Consider

A=\left(  \begin{array}{cc}\lambda _1&0\\0&\lambda_2\end{array}\right).

A is scaling by \lambda_1 along the x-axis:

(1) A\left(\begin{array}{c}x\\0\end{array}\right) =\left(\begin{array}{cc}\lambda_1&0\\0&\lambda_2\end{array}\right)\left(\begin{array}{c}x\\0\end{array}\right) =\left(\begin{array}{c}\lambda_1x\\  0\end{array}\right)=\lambda_1\left(\begin{array}{c}x \\0\end{array}\right).

Similarly, A is scaling by \lambda_2 along the y-axis. Matrices of type

\left(\begin{array}{ccc}\lambda_1&...&0 \\...&...&... \\0&...&\lambda_{n}\end{array}\right)

are called diagonal, and the short notation is diag[\lambda_1,...,\lambda_n]. Anything simpler than that, and we'd be in kindergarten.
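As a quick computational illustration of (1) (a minimal numpy sketch; the values of \lambda_1,\lambda_2 are arbitrary):

```python
import numpy as np

# diag[lambda_1, lambda_2] with lambda_1 = 3, lambda_2 = 0.5 (arbitrary values)
A = np.diag([3.0, 0.5])

e1 = np.array([1.0, 0.0])  # x-axis direction
e2 = np.array([0.0, 1.0])  # y-axis direction

print(A @ e1)  # [3. 0.]   -- scaling by lambda_1 along the x-axis, as in (1)
print(A @ e2)  # [0.  0.5] -- scaling by lambda_2 along the y-axis
```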

It remains to name these things and link them to what we know.

Definition 1. Based on (1), we say that \lambda is an eigenvalue of the matrix A if there exists a nonzero vector x such that

(2) Ax=\lambda x.

Such an x is called an eigenvector corresponding to \lambda .

With a zero x, (2) would be true for any \lambda, and there would be no definition. Note that since A is a real matrix and x\in R^n is nonzero, \lambda cannot be complex: the left side Ax is a real vector, so \lambda x must be real as well.
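To connect the definition with computation, here is a minimal numpy sketch (the matrix is an arbitrary symmetric example, not from the text above):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and the eigenvectors (as columns)
eigenvalues, eigenvectors = np.linalg.eig(A)

for lam, x in zip(eigenvalues, eigenvectors.T):
    # verify (2): A x = lambda x for a nonzero x
    print(lam, np.allclose(A @ x, lam * x))  # 3.0 True, then 1.0 True
```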

Definition 2. If one x satisfies (2), then every vector on the straight line through the origin and x satisfies (2). Moreover, if we have two eigenvectors x,y corresponding to \lambda, then for any linear combination z=ax+by we have Az=aAx+bAy=\lambda (ax+by)=\lambda z. Thus the set of all eigenvectors corresponding to one eigenvalue, completed with zero, is a subspace. We call it the eigenspace corresponding to \lambda. Its dimension is called the multiplicity of \lambda.

Exercise 1. a) The eigenspace corresponding to \lambda coincides with N(A-\lambda I) and the multiplicity of \lambda equals \dim N(A-\lambda I). b) The set of eigenvalues coincides with the set of roots of the equation

(3) \det (A-\lambda I)=0.

Proof. a) This is obvious because (2) is equivalently written as

(4) (A-\lambda I)x=0.

b) Existence of nonzero solutions of (4) is equivalent to N(A-\lambda I)\neq\{0\}, and we know that this condition is equivalent to (3).
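Part a) can also be checked numerically. A sketch using scipy's null_space (the matrix below is an illustrative choice with an eigenvalue of multiplicity 2):

```python
import numpy as np
from scipy.linalg import null_space

# lambda = 1 is an eigenvalue of multiplicity 2 for this matrix
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 5.0]])
lam = 1.0

# orthonormal basis of N(A - lambda*I), the eigenspace of lambda
basis = null_space(A - lam * np.eye(3))
print(basis.shape[1])  # 2 = dim N(A - lambda*I) = multiplicity of lambda
```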

Keep in mind that part b) of Exercise 1 implicitly assumes that both sets are of the same nature (are both subsets of either R or C).

Definition 3. (3) is called the characteristic equation and its roots are called characteristic roots. Obviously, \det (A-\lambda I)=(-1)^n\det (\lambda I-A). For convenience, people define \phi(\lambda)=\det (\lambda I-A) and use \phi(\lambda)=0 to find the characteristic roots. The function \phi(\lambda) is called the characteristic polynomial.

After finding characteristic roots we can plug them in (4) to find N(A-\lambda I).
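In numpy this two-step view can be sketched as follows (the matrix is an arbitrary illustration): np.poly returns the coefficients of \phi(\lambda)=\det(\lambda I-A), np.roots finds the characteristic roots, and these agree with the eigenvalues, as Exercise 1 b) states.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

coeffs = np.poly(A)   # coefficients of phi(lambda) = det(lambda*I - A), highest power first
roots = np.roots(coeffs)              # characteristic roots
print(np.sort(roots))                 # [2. 5.]
print(np.sort(np.linalg.eigvals(A)))  # [2. 5.] -- the same set, as Exercise 1 b) states
```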

Definition 4. The set of eigenvalues of A is called the spectrum of A and denoted \sigma(A).

Remark. When convenient, we use the following notation. A as a mapping from R^n to R^n will be denoted A_R and as a mapping from C^n to C^n will be denoted A_C. We have to remember that there are two versions of statement b) from Exercise 1: b') \sigma(A_R) coincides with the set of real characteristic roots. b'') \sigma(A_C) coincides with the set of complex characteristic roots.

Digression on determinants

As much as possible I try to avoid using explicit formulas for the determinant but here we have to use one. The simplest version is this:

\det A=\sum\pm a_{1,i_1}a_{2,i_2}...a_{n,i_n}.

Perhaps a verbal description is better than this formula:

1) the determinant is a sum of products of elements of the matrix.

2) Each product contains n factors and is obtained as follows: take one element a_{1,i_1} from the first row and cross out the column it belongs to; take one element a_{2,i_2} from the second row and not from the column you have crossed out, and cross out the column it belongs to; take one element a_{3,i_3} from the third row and not from the two columns you have crossed out, and cross out the column it belongs to. Continue like that. After each step, the number of crossed-out columns increases by 1. The first factor can be chosen in n ways, the second in (n-1) ways,..., and for the last factor a_{n,i_n} there will be only one choice.

3) The sign \pm depends on the choices made and doesn't matter at this point.
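This verbal description translates directly into code. A small Python sketch (the sign is computed by counting inversions, even though it doesn't matter at this point; practical only for small n since there are n! products):

```python
import itertools
import numpy as np

def det_by_products(A):
    """Determinant as a sum of signed products, one element from each row,
    no two from the same column -- the procedure described above.
    Practical only for small n: there are n! products."""
    n = A.shape[0]
    total = 0.0
    for cols in itertools.permutations(range(n)):
        # sign = (-1)^(number of inversions in the column choice)
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if cols[i] > cols[j])
        sign = -1.0 if inversions % 2 else 1.0
        prod = 1.0
        for row, col in enumerate(cols):
            prod *= A[row, col]
        total += sign * prod
    return total

A = np.array([[2.0, 1.0], [1.0, 3.0]])
print(det_by_products(A), np.linalg.det(A))  # both approximately 5.0
```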

Let's see what this implies for finding eigenvalues.

Exercise 2. \det (A-\lambda I) is a polynomial of degree n in \lambda .

Proof. \det(A-\lambda I)=\det\left(\begin{array}{cccc}a_{11}-\lambda& a_{12}&...&a_{1n} \\a_{21}&a_{22}-\lambda&...&a_{2n} \\...&...&...& ... \\a_{n1}&a_{n2}&...&a_{nn}-\lambda\end{array}\right).

The verbal description given above shows that \det(A-\lambda I) contains the product of diagonal elements (a_{11}-\lambda)(a_{22}-\lambda)...(a_{nn}-\lambda ), which, in turn, contains the power \lambda^n. Because of the crossing-out procedure, other products will contain powers of \lambda lower than n, and there is no way the power \lambda^n could be cancelled out. Besides, \det(A-\lambda I) may contain only non-negative integer powers of \lambda not exceeding n. This proves the statement.
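A small symbolic sketch of this statement (sympy, with an arbitrary 2x2 matrix):

```python
import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix([[1, 2],
               [3, 4]])

# det(A - lambda*I), expanded symbolically
phi = (A - lam * sp.eye(2)).det().expand()
print(phi)  # lambda**2 - 5*lambda - 2: a polynomial of degree n = 2 in lambda
```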

Exercise 3. In C^n any matrix A has at least one eigenvector.

Proof. Exercise 1 reduces the problem of finding eigenvalues to the problem of finding characteristic roots. By the fundamental theorem of algebra every non-constant single-variable polynomial has at least one complex root. Thus we have at least one eigenvalue and from (4) at least one eigenvector.

The reduction of the problem of finding eigenvalues to the problem of finding characteristic roots poses a problem. Even when A has real entries, the characteristic roots may be complex. Watch later how this problem is handled.
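The standard 2x2 illustration of this issue is a rotation matrix (a numerical sketch; the matrix is not from the text above):

```python
import numpy as np

# 90-degree rotation: a real matrix whose characteristic roots are not real
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])

print(np.roots(np.poly(R)))  # approximately [0.+1.j 0.-1.j]: the roots are +i and -i
print(np.linalg.eigvals(R))  # same values; sigma(R_C) = {i, -i} while sigma(R_R) is empty
```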


Autoregressive processes

Autoregressive processes: going from the particular to the general is the safest option. Simple observations are the foundation of any theory.

Intuition

Electricity demand in France and Britain

Figure 1. Electricity load in France and Great Britain for 2001 to 2006

If you have only one variable, what can you regress it on? Only on its own past values (future values are not available at any given moment). Figure 1 on electricity demand from a paper by J.W. Taylor illustrates this. A low value of electricity demand, say, in summer last year, will drive down its value in summer this year. Overall, we would expect the electricity demand now to depend on its values in the past 12 months. Another important observation from this example is that this time series is probably stationary.

AR(p) model

We want a definition of a class of stationary models. From this example we see that excluding the time trend increases chances of obtaining a stationary process. The idea to regress the process on its own past values is realized in

(1) y_t=\mu+\beta_1y_{t-1}+...+\beta_py_{t-p}+u_t.

Here p is some positive integer. However, both this example and the one about random walk show that some condition on the coefficients \mu,\beta_1,...,\beta_p will be required for (1) to be stationary. (1) is called an autoregressive process of order p and denoted AR(p).

Exercise 1. Repeat the calculations for the AR(1) process to see that in the case p=1 the stability condition |\beta_1|<1 is sufficient for stationarity of (1) (that is, the coefficient \mu has no impact on stationarity).
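A small simulation sketch of what is at stake in this exercise (the coefficient values are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(mu, beta1, n=500):
    """Simulate y_t = mu + beta1*y_{t-1} + u_t with standard normal errors."""
    y = np.zeros(n)
    u = rng.standard_normal(n)
    for t in range(1, n):
        y[t] = mu + beta1 * y[t - 1] + u[t]
    return y

stable = simulate_ar1(mu=1.0, beta1=0.5)      # |beta1| < 1: fluctuates around mu/(1 - beta1) = 2
explosive = simulate_ar1(mu=1.0, beta1=1.05)  # |beta1| > 1: grows without bound
print(stable[-5:].round(2))
print(explosive[-5:].round(2))
```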

Question. How does this stability condition generalize to AR(p)?

Characteristic polynomial

Denote L the lag operator defined by Ly_t=y_{t-1}. More generally, its powers are defined by L^ky_t=y_{t-k}. Then (1) can be rewritten as

y_t=\mu+\beta_1Ly_t+...+\beta_pL^py_t+u_t.

Whoever first did this wanted to solve the equation for y_t. Sending all terms containing y_t to the left we have

y_t-(\beta_1Ly_t+...+\beta_pL^py_t)=\mu+u_t.

The identity operator is defined by Iy_t=y_t, so y_t=Iy_t. Factoring out y_t we get

(2) (I-\beta_1L-...-\beta_pL^p)y_t=\mu+u_t.

Finally, formally solving for y_t we have

(3) y_t=(I-\beta_1L-...-\beta_pL^p)^{-1}(\mu+u_t).
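A quick numerical sanity check of the operator form (2) (a Python sketch with arbitrary AR(2) coefficients):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, beta1, beta2, n = 0.5, 0.4, 0.2, 200

# simulate an AR(2) process as in (1)
u = rng.standard_normal(n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = mu + beta1 * y[t - 1] + beta2 * y[t - 2] + u[t]

# apply the operator from (2): (I - beta1*L - beta2*L^2) y_t
lhs = y[2:] - beta1 * y[1:-1] - beta2 * y[:-2]
print(np.allclose(lhs, mu + u[2:]))  # True: the left side of (2) equals mu + u_t
```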

Definition 1.  In I-\beta_1L-...-\beta_pL^p replace the identity by 1 and powers of the lag operator by powers of a real number x to obtain the definition of the characteristic polynomial:

(4) p(x)=1-\beta_1x-...-\beta_px^p.

p(x) is a polynomial of degree p and by the fundamental theorem of algebra has p roots (counted with multiplicities; some of them may be complex).

Definition 2. We say that model (1) is stable if its characteristic polynomial (4) has all its roots outside the unit circle, that is, the roots are larger than 1 in absolute value.

Under this stability condition the passage from (2) to (3) can be justified. For the AR(1) process this has actually been done.
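To see what this justification looks like in the simplest case (a sketch for p=1, under |\beta_1|<1), the inverse operator in (3) expands as a geometric series in the lag operator:

(I-\beta_1L)^{-1}(\mu+u_t)=\sum_{k=0}^{\infty}\beta_1^kL^k(\mu+u_t)=\frac{\mu}{1-\beta_1}+\sum_{k=0}^{\infty}\beta_1^ku_{t-k},

and the series on the right converges precisely because |\beta_1|<1. This is the MA(\infty) representation obtained in the AR(1) analysis mentioned in Exercise 1.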

Example 1. In case of a first-order process, p(x)=1-\beta_1x has one root x=1/\beta_1 which lies outside the unit circle exactly when |\beta_1|<1.

Example 2. In case of a second-order process, p(x) has two roots. If both of them are larger than 1 in absolute value, then the process is stable. The formula for the roots of a quadratic equation is well-known but stating it here wouldn't add much to what we know. Most statistical packages, including Stata, have procedures for checking stability.
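Outside of such packages, the roots can also be checked directly, for example with numpy (a sketch; the coefficient values below are arbitrary illustrations, not output from any package):

```python
import numpy as np

def is_stable(betas):
    """Check stability of AR(p): all roots of 1 - beta_1*x - ... - beta_p*x^p
    must lie outside the unit circle."""
    # np.roots wants the coefficients from the highest power down: -beta_p, ..., -beta_1, 1
    coeffs = np.concatenate(([-b for b in betas[::-1]], [1.0]))
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stable([0.5, 0.3]))  # True:  beta_1 = 0.5, beta_2 = 0.3 gives a stable AR(2)
print(is_stable([0.9, 0.3]))  # False: beta_1 = 0.9, beta_2 = 0.3 has a root inside the unit circle
```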

Remark. Hamilton uses a different definition of the characteristic polynomial (linked to vector autoregressions), that's why in his definition the roots of the characteristic equation should lie inside the unit circle.