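Suppose we are observing two stocks and their respective returns are $x_t$ and $y_t$. To take into account their interdependence, we consider a vector autoregression
(1) $x_t = a_{11}x_{t-1} + a_{12}y_{t-1} + u_t$, $y_t = a_{21}x_{t-1} + a_{22}y_{t-1} + v_t$.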
Try to repeat for this system the analysis from Section 3.5 (Application to an AR(1) process) of the Guide by A. Patton and you will see that the difficulties are insurmountable. However, matrix algebra allows one to overcome them, with proper adjustment.
A) Write this system in a vector format
(2) $Y_t = AY_{t-1} + \varepsilon_t$.
What should $Y_t$, $A$ and $\varepsilon_t$ be in this representation?
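B) Assume that the error $\varepsilon_t$ in (2) satisfies
(3) $E_{t-1}\varepsilon_t = 0$, $E_{t-1}(\varepsilon_t\varepsilon_t^T) = \Sigma$ for $t = 1, 2, \ldots$ with some symmetric matrix $\Sigma = \begin{pmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{22}\end{pmatrix}$
(here $E_{t-1}$ denotes the expectation conditional on the information set $I_{t-1}$ at time $t-1$).
What does this assumption mean in terms of the components of $\varepsilon_t$ from (2)? What is $\Sigma$ if the errors in (1) satisfy
(4) $E_{t-1}u_t = E_{t-1}v_t = 0$, $E_{t-1}u_t^2 = E_{t-1}v_t^2 = \sigma^2$, $E_{t-1}(u_tv_t) = 0$ for all $t$?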
C) Suppose (1) is stationary. The stationarity condition is expressed in terms of eigenvalues of $A$, but we don't need it. However, we need its implication:
(5) $\det(I - A) \neq 0$.
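A) It takes some practice to see that with the notation
$Y_t = \begin{pmatrix}x_t\\ y_t\end{pmatrix}$, $A = \begin{pmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix}$, $\varepsilon_t = \begin{pmatrix}u_t\\ v_t\end{pmatrix}$
the system (1) becomes (2).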
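As a minimal numerical check of part A, one step of the scalar system (1) and one step of the matrix form $Y_t = AY_{t-1} + \varepsilon_t$ should produce the same pair of returns. The coefficient values, lagged returns and shocks below are hypothetical numbers of my own, used only for illustration.

```python
import numpy as np

# Hypothetical coefficients of system (1)
a11, a12, a21, a22 = 0.5, 0.1, 0.2, 0.3
A = np.array([[a11, a12],
              [a21, a22]])

# Hypothetical lagged returns and current shocks
x_prev, y_prev = 0.01, -0.02
u_t, v_t = 0.003, -0.001

# Scalar form (1)
x_t = a11 * x_prev + a12 * y_prev + u_t
y_t = a21 * x_prev + a22 * y_prev + v_t

# Matrix form (2): Y_t = A Y_{t-1} + eps_t
Y_prev = np.array([x_prev, y_prev])
eps_t = np.array([u_t, v_t])
Y_t = A @ Y_prev + eps_t

print(np.allclose(Y_t, [x_t, y_t]))   # True
```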
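B) The equations in (3) look like this:
$\begin{pmatrix}E_{t-1}u_t\\ E_{t-1}v_t\end{pmatrix} = \begin{pmatrix}0\\ 0\end{pmatrix}$, $\begin{pmatrix}E_{t-1}u_t^2 & E_{t-1}(u_tv_t)\\ E_{t-1}(u_tv_t) & E_{t-1}v_t^2\end{pmatrix} = \begin{pmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{22}\end{pmatrix}$.
Equalities of matrices are understood element-wise, so we get a series of scalar equations $E_{t-1}u_t = 0$, $E_{t-1}v_t = 0$, $E_{t-1}u_t^2 = \sigma_{11}$, $E_{t-1}(u_tv_t) = \sigma_{12}$, $E_{t-1}v_t^2 = \sigma_{22}$ for $t = 1, 2, \ldots$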
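Conversely, the scalar equations from (4) give $E_{t-1}\varepsilon_t = 0$ and $\Sigma = \begin{pmatrix}\sigma^2 & 0\\ 0 & \sigma^2\end{pmatrix} = \sigma^2 I$.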
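C) (2) implies $EY_t = AEY_{t-1} + E\varepsilon_t = AEY_{t-1}$ (by the LIE, $E\varepsilon_t = E(E_{t-1}\varepsilon_t) = 0$), or, by stationarity, $\mu = A\mu$ with $\mu = EY_t$, or $(I - A)\mu = 0$. Hence (5) implies $\mu = 0$.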
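D) From (2) we see that $Y_{t-1}$ depends only on $I_{t-1}$ (information set at time $t-1$). Therefore by the LIE $Cov(Y_{t-1}, \varepsilon_t) = E(Y_{t-1}\varepsilon_t^T) = E[Y_{t-1}E_{t-1}(\varepsilon_t^T)] = 0$.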
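The answers to C and D can be illustrated by simulation. This is a sketch under assumptions of my own: a hypothetical coefficient matrix $A$ whose eigenvalues lie inside the unit circle (so the process is stationary and (5) holds), and errors satisfying (4) with $\sigma = 1$. The sample mean of $Y_t$ and the sample covariance between $Y_{t-1}$ and $\varepsilon_t$ should both be close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stationary coefficient matrix (eigenvalues about 0.57 and 0.23)
A = np.array([[0.5, 0.1],
              [0.2, 0.3]])
sigma = 1.0   # errors as in (4): Sigma = sigma^2 I

# Stationarity and its implication (5): det(I - A) != 0
assert np.all(np.abs(np.linalg.eigvals(A)) < 1)
assert abs(np.linalg.det(np.eye(2) - A)) > 1e-12

T = 100_000
eps = sigma * rng.standard_normal((T, 2))   # row t is epsilon_t
Y = np.zeros((T, 2))                         # row t is Y_t, started at zero
for t in range(1, T):
    Y[t] = A @ Y[t - 1] + eps[t]             # vector form (2)

burn = 1_000                                 # drop the start-up period
mean_Y = Y[burn:].mean(axis=0)               # estimate of E Y_t (C: should be near 0)

# Sample Cov(Y_{t-1}, eps_t), a 2x2 matrix (D: should be near 0)
Y_lag = Y[burn - 1:-1]
eps_now = eps[burn:]
cov_lag_eps = ((Y_lag - Y_lag.mean(axis=0)).T
               @ (eps_now - eps_now.mean(axis=0)) / (len(eps_now) - 1))

print(mean_Y)
print(cov_lag_eps)
```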
Suppose we are observing two stocks and their respective returns are $x_t$ and $y_t$. A vector autoregression for the pair is one way to take into account their interdependence. This theory is undeservedly omitted from the Guide by A. Patton.
Matrix multiplication is a little more complex. Make sure to read Global idea 2 and the compatibility rule.
The general approach to studying matrices is to compare them to numbers. Here you see the first big No: matrices do not commute, that is, in general $AB \neq BA$.
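A small NumPy sketch makes the non-commutativity concrete; the two matrices are arbitrary examples, not taken from the text. In NumPy `@` is the matrix product, while `*` would multiply element by element.

```python
import numpy as np

A = np.array([[1, 2],
              [0, 1]])
B = np.array([[0, 1],
              [1, 0]])

AB = A @ B   # [[2 1]
             #  [1 0]]
BA = B @ A   # [[0 1]
             #  [1 2]]
print(np.array_equal(AB, BA))   # False
```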
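The idea behind matrix inversion is pretty simple: we want an analog of the property $aa^{-1} = a^{-1}a = 1$ that holds for nonzero numbers. For a square matrix $A$, the inverse $A^{-1}$ is defined by $AA^{-1} = A^{-1}A = I$.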
Some facts about determinants have very complicated proofs and it is best to stay away from them. But a couple of ideas should be clear from the very beginning. Determinants are defined only for square matrices. The relationship of determinants to matrix invertibility explains the role of determinants. If $A$ is square, it is invertible if and only if $\det A \neq 0$ (this is an equivalent of the condition $a \neq 0$ for numbers).
Here is an illustration of how determinants are used. Suppose we need to solve the equation $Ax = y$ for $x$, where $A$ and $y$ are known. Assuming that $\det A \neq 0$, we can premultiply the equation by $A^{-1}$ to obtain $A^{-1}Ax = A^{-1}y$. (Because of lack of commutativity, we need to keep the order of the factors.) Using intuitive properties $A^{-1}A = I$ and $Ix = x$ we obtain the solution: $x = A^{-1}y$. In particular, we see that if $\det A \neq 0$, then the equation has a unique solution $x = A^{-1}y$.
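The same recipe in NumPy, as a sketch with an arbitrary invertible $2\times 2$ matrix of my choosing. `np.linalg.inv(A) @ y` mirrors the formula $x = A^{-1}y$ literally; `np.linalg.solve` returns the same solution without forming the inverse, which is usually preferred numerically. Writing `y @ np.linalg.inv(A)` instead would compute a different product, which is the lack of commutativity at work.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
y = np.array([3.0, 5.0])

assert not np.isclose(np.linalg.det(A), 0.0)   # det A = 5, so A is invertible

x_formula = np.linalg.inv(A) @ y   # x = A^{-1} y: premultiply, keep the order
x = np.linalg.solve(A, y)          # same solution without forming A^{-1}

print(x)   # approximately [0.8 1.4]
```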
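Let $A$ be a square matrix and let $x, y$ be two vectors. $A, y$ are assumed to be known and $x$ is unknown. We want to check that $x = \sum_{j=0}^{\infty}A^jy$ (assuming the series converges) solves the equation $x = Ax + y$. (Note that for this equation the trick used to solve $Ax = y$ does not work.) Just plug:
$Ax + y = \sum_{j=0}^{\infty}A^{j+1}y + y = \sum_{j=1}^{\infty}A^jy + A^0y = \sum_{j=0}^{\infty}A^jy = x$
(write out a couple of first terms in the sums if summation signs frighten you).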
Transposition is a geometrically simple operation. We need only the property $(AB)^T = B^TA^T$.
Variance and covariance
Property 1. Variance of a random vector $X$ and covariance of two random vectors $X, Y$ are defined by
$V(X) = E(X - EX)(X - EX)^T$, $Cov(X, Y) = E(X - EX)(Y - EY)^T$.
Note that when $EX = 0$ variance becomes $V(X) = EXX^T$.
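Property 2. Let $X, Y$ be random vectors and suppose $A, B$ are constant matrices. We want an analog of $Cov(aX, bY) = ab\,Cov(X, Y)$. In the next calculation we have to remember that the multiplication order cannot be changed:
$Cov(AX, BY) = E[(AX - AEX)(BY - BEY)^T] = E[A(X - EX)(Y - EY)^TB^T] = A\,Cov(X, Y)B^T$.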
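The identity in Property 2 can be checked numerically. A sketch with made-up dimensions and random constant matrices $A$ ($4\times 3$) and $B$ ($2\times 2$); the helper `cov` is my own, written to follow the definition in Property 1 with expectations replaced by sample averages. Because the sample covariance is bilinear in the same way, the two sides agree up to rounding, not just approximately.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1_000                                  # number of draws

# Rows are draws of a 3-vector X and of a 2-vector Y correlated with X
X = rng.standard_normal((N, 3))
Y = X[:, :2] + rng.standard_normal((N, 2))
A = rng.standard_normal((4, 3))            # constant matrices
B = rng.standard_normal((2, 2))

def cov(U, V):
    """Sample analog of Cov(U, V) = E(U - EU)(V - EV)^T; rows of U, V are draws."""
    Uc = U - U.mean(axis=0)
    Vc = V - V.mean(axis=0)
    return Uc.T @ Vc / (len(U) - 1)

# Draws of AX and BY: each row x_i^T becomes (A x_i)^T = x_i^T A^T
lhs = cov(X @ A.T, Y @ B.T)
rhs = A @ cov(X, Y) @ B.T
print(np.allclose(lhs, rhs))   # True
```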
Distribution of the estimator of the error variance
If you are reading the book by Dougherty: this post is about the distribution of the estimator $s^2$ defined in Chapter 3.
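(1) $y = X\beta + e$,
where the deterministic matrix $X$ is of size $n\times k$, satisfies $\det(X^TX) \neq 0$ (regressors are not collinear) and the error $e$ satisfies
(2) $Ee = 0$, $Var(e) = \sigma^2I$.
$\beta$ is estimated by $\hat\beta = (X^TX)^{-1}X^Ty$. Denote $P = X(X^TX)^{-1}X^T$, $Q = I - P$. Using (1) we see that $\hat\beta = \beta + (X^TX)^{-1}X^Te$ and the residual $r = y - X\hat\beta = Qe$. $\sigma^2$ is estimated by
(3) $s^2 = \|r\|^2/(n - k) = \|Qe\|^2/(n - k)$.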
$Q$ is a projector and has properties which are derived from those of $P$:
(4) $Q^T = Q$, $Q^2 = Q$.
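If $\lambda$ is an eigenvalue of $Q$ with eigenvector $x \neq 0$, then multiplying $Qx = \lambda x$ by $Q$ and using the fact that $Q^2 = Q$ we get $\lambda x = Qx = Q^2x = \lambda Qx = \lambda^2x$, so $\lambda = \lambda^2$. Hence eigenvalues of $Q$ can be only $0$ or $1$. The equation
$\text{tr}(Q) = n - \text{tr}(P) = n - \text{tr}((X^TX)^{-1}X^TX) = n - k$
tells us that the number of eigenvalues equal to 1 is $n - k$ and the remaining $k$ are zeros. Let $Q = U\Lambda U^T$ be the diagonal representation of $Q$. Here $U$ is an orthogonal matrix,
(5) $U^TU = I$,
and $\Lambda$ is a diagonal matrix with eigenvalues of $Q$ on the main diagonal. We can assume that the first $n - k$ numbers on the diagonal of $\Lambda$ are ones and the others are zeros.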
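These properties of $Q$ are easy to confirm numerically. A sketch assuming a randomly generated design matrix $X$ of size $10\times 3$ (hypothetical; any $X$ with linearly independent columns would do): $Q$ should be symmetric and idempotent, have trace $n - k$, and its eigenvalues should be zeros and ones. `np.linalg.eigh` returns eigenvalues in ascending order, so here the zeros come first, the reverse of the ordering chosen in the text; this only permutes the columns of $U$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 10, 3
X = rng.standard_normal((n, k))        # hypothetical design; full column rank with probability 1

P = X @ np.linalg.inv(X.T @ X) @ X.T
Q = np.eye(n) - P

eigvals, U = np.linalg.eigh(Q)         # Q is symmetric; eigenvalues in ascending order
Lam = np.diag(eigvals)                 # diagonal representation Q = U Lam U^T

print(np.round(eigvals, 10))           # k zeros followed by n - k ones
print(np.isclose(np.trace(Q), n - k))  # True
```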
Theorem. Let $e$ be normal. 1) $s^2(n - k)/\sigma^2$ is distributed as $\chi^2_{n-k}$. 2) The estimators $\hat\beta$ and $s^2$ are independent.
Proof. 1) We have by (4)
(6) $\|Qe\|^2 = e^TQ^TQe = e^TQe = e^TU\Lambda U^Te = (U^Te)^T\Lambda(U^Te)$.
Denote $S = U^Te$. From (2) and (5)
$ES = 0$, $Var(S) = U^TVar(e)U = \sigma^2U^TU = \sigma^2I$,
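and $S$ is normal as a linear transformation of a normal vector. It follows that $S = \sigma z$, where $z$ is a standard normal vector with independent standard normal coordinates $z_1, \ldots, z_n$. Hence, (6) implies
(7) $\|Qe\|^2 = \sigma^2z^T\Lambda z = \sigma^2(z_1^2 + \ldots + z_{n-k}^2) = \sigma^2\chi^2_{n-k}$.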
(3) and (7) prove the first statement.
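2) First we note that the vectors $Pe$ and $Qe$ are independent. Since they are jointly normal (both are linear transformations of $e$), their independence follows from
$Cov(Pe, Qe) = E(Pee^TQ^T) = \sigma^2PQ = \sigma^2(P - P^2) = 0$.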
It's easy to see that $X^TP = X^T$. This allows us to show that $\hat\beta$ is a function of $Pe$:
$\hat\beta = \beta + (X^TX)^{-1}X^Te = \beta + (X^TX)^{-1}X^TPe$.
Independence of $Pe$ and $Qe$ leads to independence of their functions $\hat\beta$ and $s^2$.
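A Monte Carlo sketch of the theorem, under assumptions of my own: $n = 20$, $k = 3$, $\sigma = 2$, a hypothetical $\beta$, and a design matrix drawn once and then held fixed. The statistic $(n - k)s^2/\sigma^2$ should have mean close to $n - k = 17$ and variance close to $2(n - k) = 34$, as a $\chi^2_{17}$ variable does, and the correlation between a coordinate of $\hat\beta$ and $s^2$ should be near zero. Near-zero correlation is only a symptom consistent with independence, not a proof of it.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, sigma = 20, 3, 2.0
X = rng.standard_normal((n, k))             # drawn once, then held fixed (deterministic design)
beta = np.array([1.0, -0.5, 2.0])           # hypothetical true coefficients
XtX_inv = np.linalg.inv(X.T @ X)

reps = 50_000
E = sigma * rng.standard_normal((reps, n))  # each row is a normal error vector e
Ys = X @ beta + E                           # each row is y = X beta + e
B_hat = Ys @ X @ XtX_inv                    # each row is beta_hat = (X^T X)^{-1} X^T y
R = Ys - B_hat @ X.T                        # each row is the residual r = Qe
s2 = (R ** 2).sum(axis=1) / (n - k)

stat = (n - k) * s2 / sigma ** 2            # should behave like chi^2 with n - k = 17 d.f.
print(stat.mean(), stat.var())              # close to 17 and 34
print(np.corrcoef(B_hat[:, 0], s2)[0, 1])   # close to 0
```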
We have derived the density of the chi-squared variable with one degree of freedom, see also Example 3.52, J. Abdey, Guide ST2133.
For $\chi^2_n = z_1^2 + \ldots + z_n^2$ with independent standard normals $z_1, \ldots, z_n$ we can write $\chi^2_n = \chi^2_1 + \ldots + \chi^2_1$, where the chi-squared variables on the right are independent and all have one degree of freedom. This is because deterministic (here quadratic) functions of independent variables are independent.
The number of visits to my website has exceeded 206,000. This number depends on what counts as a visit. An external counter, visible to everyone, writes cookies to the reader's computer and counts many visits from one reader as one. The number of individual readers has reached 23,000. The external counter does not give any more statistics. I will give all the numbers from the internal counter, which is visible only to the site owner.
I have a high percentage of complex content. After reading one post, the reader finds that the answer he is looking for depends on the preliminary material. He starts digging into it and then has to go deeper and deeper. Hence the number 206,000: one reader visits the site on average 9 times on different days. Sometimes a visitor goes from one post to another by link on the same day. Hence another figure: 310,000 readings.
I originally wrote simple things about basic statistics. Then I began to write accompanying materials for each advanced course that I taught at Kazakh-British Technical University (KBTU). The shift in the number and level of readership shows that people need deep knowledge, not bait for one-day moths.
For example, my simple post on basic statistics was read 2,300 times. In comparison, the more complex post on the Cobb-Douglas function has been read 7,100 times. This function is widely used in economics to model consumer preferences (utility function) and producer capabilities (production function). In all textbooks it is taught using two-dimensional graphs, as P. Samuelson proposed 85 years ago. In fact, two-dimensional graphs are obtained by projection of a three-dimensional graph, which I show, making everything clear and obvious.
The answer to one of the University of London (UoL) exam problems attracted 14,300 readers. It is so complicated that I split the answer into two parts, and there are links to additional material. On the UoL exam, students have to solve this problem in 20-30 minutes, which even I would not be able to do.
Why my site is unique
My site is unique in several ways. Firstly, I tell the truth about the AP Statistics books. This is a basic statistics course for those who need to interpret tables, graphs and simple statistics. If you have a head on your shoulders, and not a Google search engine, all you need to do is read a small book and look at the solutions. I praise one such book in my reviews. You don't need to attend a two-semester course and read an 800-page book. Moreover, one doesn't need 140 high-quality color photographs that have nothing to do with science and double the price of a book.
Many AP Statistics consumers (that's right, consumers, not students) believe that learning should be fun. Such people are attracted by a book with anecdotes that have no relation to statistics or the life of scientists. In the West, everyone depends on each other, and therefore all the reviews are written in superlatives and smoothed over. Thank God, I do not depend on the Western labor market, and therefore I tell the truth. Part of my criticism, including of the statistics textbook selected for the program "100 Textbooks" of the Ministry of Education and Science of Kazakhstan (MES), is on Facebook.
Secondly, I have the world's only online, free, complete matrix algebra tutorial with all the proofs. Free courses on Udemy, Coursera and edX are not far from AP Statistics in terms of level. Courses at MIT and Khan Academy are also simpler than mine, but have the advantage of being given in video format.
The third distinctive feature is that I help UoL students. It is a huge organization spanning 17 universities and colleges in the UK and with many branches in other parts of the world. The Economics program was developed by the London School of Economics (LSE), one of the world's leading universities.
The problem with LSE courses is that they are very difficult. After the exams, LSE puts out short recommendations on the Internet for solving problems like: here you need to use such and such a theory and such and such an idea. Complete solutions are not given for two reasons: they do not want to help future examinees and sometimes their problems or solutions contain errors (who does not make errors?). But they also delete short recommendations after a year. My site is the only place in the world where there are complete solutions to the most difficult problems of the last few years. It is not for nothing that the solution to one problem noted above attracted 14,000 visits.
The average number of visits is about 100 per day. When it's time for students to take exams, it jumps to 1-2 thousand. The total amount of materials created in 5 years is equivalent to 5 textbooks. It takes from 2 hours to one day to create one post, depending on the level. After I published this analysis of the site traffic on Facebook, my colleague Nurlan Abiev decided to write posts for the site. I pay for the domain myself, $186 per year. It would be nice to make the site accessible to students and schoolchildren of Kazakhstan, but I don't have time to translate from English.
Once I was looking at the requirements of the MES for approval of electronic textbooks. They want several copies of printouts of all (!) materials and a solid payment for the examination of the site. As a result, all my efforts to create and maintain the site so far have been a personal initiative that does not have any support from the MES and its Committee on Science.
In the case of many dimensions, we follow the same idea. Before doing that, we state without proofs two useful facts about independence of random variables (real-valued, not vectors).
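Theorem 1. Suppose variables $X_1, \ldots, X_n$ have densities $p_1(x_1), \ldots, p_n(x_n)$. Then they are independent if and only if their joint density $p(x_1, \ldots, x_n)$ is a product of individual densities: $p(x_1, \ldots, x_n) = p_1(x_1)\cdots p_n(x_n)$.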
Theorem 2. If variables $X, Y$ are jointly normal, then they are independent if and only if they are uncorrelated: $Cov(X, Y) = 0$.
The necessity part (independence implies uncorrelatedness) is trivial.
Let $z_1, \ldots, z_n$ be independent standard normal variables. A standard normal variable is defined by its density $p(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$, so all of $z_1, \ldots, z_n$ have the same density. We achieve independence, according to Theorem 1, by defining their joint density to be a product of individual densities.
Definition 1. A standard normal vector of dimension $n$ is defined by $z = (z_1, \ldots, z_n)^T$.
Definition 2. For a matrix $A$ and vector $\mu$ of compatible dimensions a normal vector is defined by $X = Az + \mu$. Its mean and variance are
$EX = \mu$, $V(X) = AV(z)A^T = AA^T \geq 0$
(recall that variance of a vector is always nonnegative).
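Definition 2 can be illustrated by sampling. In this sketch $A$ (of size $3\times 2$) and $\mu$ are arbitrary values of my choosing; the sample mean and sample variance of the draws of $X = Az + \mu$ should approach $\mu$ and $AA^T$. Since $A$ has only two columns, $AA^T$ is nonnegative but singular, which shows why "nonnegative" rather than "positive" is the right word.

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.array([[1.0, 0.0],
              [0.5, 2.0],
              [1.0, -1.0]])          # 3x2, so X lives in R^3 but has rank-2 variance
mu = np.array([1.0, -2.0, 0.5])

N = 200_000
Z = rng.standard_normal((N, 2))      # rows are draws of a standard normal vector z
Xs = Z @ A.T + mu                    # rows are draws of X = A z + mu

V_theory = A @ A.T
V_sample = np.cov(Xs, rowvar=False)  # columns are the coordinates of X
print(np.round(V_sample, 2))
print(V_theory)
print(np.linalg.eigvalsh(V_theory))  # nonnegative; one is (numerically) zero here
```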
Distributions derived from normal variables
In the definitions of standard distributions (chi square, t distribution and F distribution) there is no reference to any sample data. Unlike statistics, which by definition are functions of sample data, these and other standard distributions are theoretical constructs. Statistics are developed in such a way as to have a distribution equal or asymptotically equal to one of standard distributions. This allows practitioners to use tables developed for standard distributions.
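Exercise 1. Prove that $\chi^2_n/n$ converges to 1 in probability.
Proof. For a standard normal $z$ we have $Ez^2 = 1$ and $V(z^2) = 2$ (both properties can be verified in Mathematica). Hence, $E(\chi^2_n/n) = 1$ and $V(\chi^2_n/n) = 2n/n^2 = 2/n$, and by Chebyshev's inequality $P(|\chi^2_n/n - 1| > \varepsilon) \leq 2/(n\varepsilon^2) \to 0$ for any $\varepsilon > 0$.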
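A simulation sketch of the exercise: for growing $n$, the frequency of $|\chi^2_n/n - 1| > \varepsilon$ should shrink, and it should stay below the Chebyshev bound $2/(n\varepsilon^2)$ (which is trivially satisfied when the bound exceeds 1). The choice $\varepsilon = 0.1$ and the number of replications are assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(5)
eps = 0.1       # hypothetical tolerance epsilon
reps = 2_000    # simulated copies of chi^2_n for each n

freq = {}
for n in (10, 100, 1_000):
    chi2 = (rng.standard_normal((reps, n)) ** 2).sum(axis=1)  # chi^2_n = z_1^2 + ... + z_n^2
    freq[n] = np.mean(np.abs(chi2 / n - 1) > eps)              # estimate of P(|chi^2_n/n - 1| > eps)
    print(n, freq[n], "Chebyshev bound:", 2 / (n * eps ** 2))
```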
(a) the regressors are assumed deterministic, (b) the number of regressors $k$ is smaller than the number of observations $n$, (c) the regressors are linearly independent, and (d) the errors are homoscedastic and uncorrelated, $Var(e) = \sigma^2I$.
Usually students remember that $\beta$ should be estimated and don't pay attention to estimation of $\sigma^2$. Partly this is because $\sigma^2$ does not appear in the regression and partly because the result on estimation of error variance is more complex than the result on the OLS estimator of $\beta$.
Definition 1. Let $\hat\beta = (X^TX)^{-1}X^Ty$ be the OLS estimator of $\beta$. $\hat y = X\hat\beta$ is called the fitted value and $r = y - \hat y$ is called the residual.
Exercise 1. Using the projectors $P = X(X^TX)^{-1}X^T$ and $Q = I - P$, show that $\hat y = Py$ and $r = Qe$.
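Proof. The first equation is obvious: $\hat y = X(X^TX)^{-1}X^Ty = Py$. From the model we have $r = y - Py = Qy = QX\beta + Qe$. Since $QX = X - X(X^TX)^{-1}X^TX = 0$, we have further $r = Qe$.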
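A numerical check of Exercise 1, as a sketch with hypothetical simulated data ($n = 12$, $k = 3$, a made-up $\beta$ and standard normal errors). It also confirms $QX = 0$, the fact used in the last step of the proof.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 12, 3
X = rng.standard_normal((n, k))                # hypothetical regressors
beta = np.array([0.5, 1.0, -1.0])              # hypothetical true beta
e = rng.standard_normal(n)
y = X @ beta + e

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimator
y_hat = X @ beta_hat                           # fitted value
r = y - y_hat                                  # residual

P = X @ np.linalg.solve(X.T @ X, X.T)          # X (X^T X)^{-1} X^T
Q = np.eye(n) - P
print(np.allclose(y_hat, P @ y), np.allclose(r, Q @ e))   # True True
```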
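Problem statement. A vector $y$ (the dependent vector) and vectors $x_1, \ldots, x_k$ (independent vectors or regressors) are given. The OLS estimator is defined as that vector $\beta$ which minimizes the total sum of squares $TSS(\beta) = \sum_{i=1}^n(y_i - \beta_1x_{1i} - \ldots - \beta_kx_{ki})^2$.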
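Denoting $X = (x_1, \ldots, x_k)$, we see that $TSS(\beta) = \|y - X\beta\|^2$ and that finding the OLS estimator means approximating $y$ with vectors from the image $\text{Img}(X)$. The vectors $x_1, \ldots, x_k$ should be linearly independent; otherwise the solution will not be unique.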
Assumption. $x_1, \ldots, x_k$ are linearly independent. This, in particular, implies that $k \leq n$.
Exercise 2. Show that the OLS estimator is
(2) $\hat\beta = (X^TX)^{-1}X^Ty$.
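Proof. By Exercise 1 we can use $P = X(X^TX)^{-1}X^T$ and $Q = I - P$. Since $X\beta$ belongs to the image of $P$, $P$ doesn't change it: $PX\beta = X\beta$. Denoting also $\hat y = Py$, we have
$\|y - X\beta\|^2 = \|Qy + (\hat y - X\beta)\|^2 = \|Qy\|^2 + \|\hat y - X\beta\|^2$
(by Exercise 1).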
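This shows that $\|Qy\|^2$ is a lower bound for $\|y - X\beta\|^2$. This lower bound is achieved when the second term is made zero. From
$\hat y - X\beta = X(X^TX)^{-1}X^Ty - X\beta = X[(X^TX)^{-1}X^Ty - \beta]$
we see that the second term is zero if $\beta$ satisfies (2).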
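The result of Exercise 2 can be compared with NumPy's general least-squares routine `np.linalg.lstsq`. In this sketch $X$ and $y$ are random, and $y$ is deliberately not generated from a model $y = X\beta + e$, since the derivation does not need one. The last line checks the decomposition used in the proof: moving from $\hat\beta$ to $\hat\beta + d$ increases the sum of squares by exactly $\|Xd\|^2$. The helper `tss` is my own name for that criterion.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 50, 4
X = rng.standard_normal((n, k))   # linearly independent columns with probability 1
y = rng.standard_normal(n)        # arbitrary y: no model y = X beta + e is assumed

def tss(b):
    """The criterion minimized in the problem statement: ||y - X b||^2."""
    return np.sum((y - X @ b) ** 2)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)         # formula (2)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # NumPy's least-squares solver

# Moving away from beta_hat by d adds exactly ||P(y - X(beta_hat + d))||^2 = ||X d||^2
d = rng.standard_normal(k)
print(np.isclose(tss(beta_hat + d), tss(beta_hat) + np.sum((X @ d) ** 2)))   # True
```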
Usually the above derivation is applied to the dependent vector of the form $y = X\beta + e$, where $e$ is a random vector with mean zero. But it holds without this assumption. See also simplified derivation of the OLS estimator.
Regarding the true state of nature we assume two mutually exclusive possibilities: the null hypothesis (like the suspect is guilty) and alternative hypothesis (the suspect is innocent). It's up to us what to call the null and what to call the alternative. However, the statistical procedures are not symmetric: it's easier to measure the probability of rejecting the null when it is true than other involved probabilities. This is why what is desirable to prove is usually designated as the alternative.
Usually in books you can see the following table.
| Decision \ State of nature | Null is true | Null is false |
|---|---|---|
| Reject null | Type I error | Correct decision |
| Fail to reject null | Correct decision | Type II error |
This table is not good enough because there is no link to probabilities. The next video does fill in the blanks.
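The probabilities behind the table can be estimated by simulation. A sketch under my own assumptions: a hypothetical two-sided z-test of $H_0$: mean $= 0$ with known unit variance, $n = 25$ observations, 5% significance level, and a true mean of $0.5$ for the case when the null is false. The rejection frequency under the null estimates the probability of a Type I error (about 0.05); the non-rejection frequency under the alternative estimates the probability of a Type II error (roughly 0.3 for these numbers).

```python
import numpy as np

rng = np.random.default_rng(8)
n, reps = 25, 20_000
z_crit = 1.96                      # two-sided 5% critical value of N(0, 1)

def reject(samples):
    """Hypothetical z-test of H0: mean = 0, known sigma = 1; True means 'reject null'."""
    z = samples.mean(axis=1) * np.sqrt(samples.shape[1])
    return np.abs(z) > z_crit

null_true = rng.standard_normal((reps, n))          # state of nature: null is true
null_false = 0.5 + rng.standard_normal((reps, n))   # state of nature: true mean is 0.5

p_type1 = np.mean(reject(null_true))     # P(reject | null true)
p_type2 = np.mean(~reject(null_false))   # P(fail to reject | null false)
print(p_type1, p_type2)
```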
This will be a simple post explaining the common observation that "in Economics, variability of many variables is proportional to those variables". Make sure to review the assumptions; they tend to slip from memory. We consider the simple regression $y_i = a + bx_i + e_i$.
One of the classical assumptions is
Homoscedasticity. All errors have the same variance: $Var(e_i) = \sigma^2$ for all $i$.
We discuss its opposite, which is
Heteroscedasticity. Not all errors have the same variance. It would be wrong to write it as $Var(e_i) \neq \sigma^2$ for all $i$ (which means that all errors have variance different from $\sigma^2$). You can write that not all $Var(e_i)$ are the same, but it's better to use the verbal definition.
Remark about Video 1. The dashed lines can represent mean consumption. Then the fact that variation of a variable grows with its level becomes more obvious.
Video 1. Case for heteroscedasticity
Figure 1. Illustration from Dougherty: as x increases, variance of the error term increases
Homoscedasticity was used in the derivation of the OLS estimator variance; under heteroscedasticity that expression is no longer valid. There are other implications, which will be discussed later.
Companies example. The Samsung Galaxy Note 7 battery fires and explosions that caused two recalls cost the smartphone maker at least $5 billion. There is no way a small company could have such losses.
GDP example. The error in measuring US GDP is on the order of $200 bln, which is comparable to the Kazakhstan GDP. However, the standard deviation of the ratio error/GDP seems to be about the same across countries, if the underground economy is not too big. Often the assumption that the standard deviation of the regression error is proportional to one of regressors is plausible.
To see if the regression error is heteroscedastic, you can look at the graph of the residuals or use statistical tests.
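A simulation can reproduce the pattern in Figure 1. As a hypothetical setup of my own, take $y_i = a + bx_i + e_i$ with $x_i$ uniform on $[1, 10]$ and the standard deviation of $e_i$ proportional to $x_i$, as in the GDP example. Comparing the spread of OLS residuals for small and large $x$ is a crude numerical version of looking at the residual graph; it is not one of the formal tests.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 2_000
a, b, c = 1.0, 2.0, 0.5                # hypothetical intercept, slope, scale
x = rng.uniform(1, 10, n)
e = c * x * rng.standard_normal(n)     # sd of e_i is c * x_i: heteroscedastic errors
y = a + b * x + e

Xmat = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(Xmat, y, rcond=None)   # OLS estimates of a and b
resid = y - Xmat @ coef

var_low = resid[x < 5.5].var()         # residual spread for small x
var_high = resid[x >= 5.5].var()       # residual spread for large x
print(var_low, var_high)               # the second is several times larger
```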