28
Dec 21

## Chi-squared distribution

This post is intended to close a gap in J. Abdey's guide ST2133, namely, the absence of distributions widely used in econometrics.

### Chi-squared with one degree of freedom

Let $X$ be a random variable and let $Y=X^{2}.$

Question 1. What is the link between the distribution functions of $Y$ and $X?$

Chart 1. Inverting a square function

The start is simple: just follow the definitions. $F_{Y}\left( y\right)=P\left( Y\leq y\right) =P\left( X^{2}\leq y\right) .$ Assuming that $y>0$, on Chart 1 we see that $\left\{ x:x^{2}\leq y\right\} =\left\{x: -\sqrt{y}\leq x\leq \sqrt{y}\right\} .$ Hence, using additivity of probability,

(1) $F_{Y}\left( y\right) =P\left( -\sqrt{y}\leq X\leq \sqrt{y}\right)=P\left( X\leq \sqrt{y}\right) -P\left( X<-\sqrt{y}\right)$

$=F_{X}\left( \sqrt{y}\right) -F_{X}\left( -\sqrt{y}\right) .$

The last transition is based on the assumption that $P\left( X<x\right) =P\left( X\leq x\right)$ for all $x$, which is maintained for continuous random variables throughout the guide by Abdey.

Question 2. What is the link between the densities of $X$ and $Y=X^{2}?$ By the Leibniz integral rule, (1) implies

(2) $f_{Y}\left( y\right) =f_{X}\left( \sqrt{y}\right) \frac{1}{2\sqrt{y}}+f_{X}\left( -\sqrt{y}\right) \frac{1}{2\sqrt{y}}.$

Exercise. Assuming that $g$ is an increasing differentiable function with the inverse $h$ and $Y=g(X)$, answer questions similar to Questions 1 and 2.

Recall that $\chi _{1}^{2}=z^{2}$ where $z$ is standard normal (see the definition of $\chi _{1}^{2}$). Applying (2) to $X=z$ and $Y=z^{2}=\chi _{1}^{2}$, we get

$f_{\chi _{1}^{2}}\left( y\right) =\frac{1}{\sqrt{2\pi }}e^{-y/2}\frac{1}{2\sqrt{y}}+\frac{1}{\sqrt{2\pi }}e^{-y/2}\frac{1}{2\sqrt{y}}=\frac{1}{\sqrt{2\pi }}y^{1/2-1}e^{-y/2},\ y>0.$

Since $\Gamma \left( 1/2\right) =\sqrt{\pi },$ the procedure for identifying the gamma distribution gives

$f_{\chi _{1}^{2}}\left( x\right) =\frac{1}{\Gamma \left( 1/2\right) }\left(1/2\right) ^{1/2}x^{1/2-1}e^{-x/2}=f_{1/2,1/2}\left( x\right) .$
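To check the last two displays numerically, here is a minimal Python sketch (standard library only) that evaluates the density of $\chi_1^2$ in two ways. The first uses formula (2) with the standard normal density. The second uses the gamma density, assuming the parametrization $f_{\alpha,\nu}(x)=\frac{\alpha^{\nu}}{\Gamma(\nu)}x^{\nu-1}e^{-\alpha x}$ that the formula above implies. The function names are hypothetical.

```python
import math

def std_normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def chi2_1_pdf_transform(y):
    """Density of Y = z**2 by formula (2): [f_X(sqrt y) + f_X(-sqrt y)] / (2 sqrt y)."""
    if y <= 0:
        return 0.0
    r = math.sqrt(y)
    return (std_normal_pdf(r) + std_normal_pdf(-r)) / (2 * r)

def gamma_pdf(x, alpha, nu):
    """Assumed gamma density f_{alpha,nu}(x) = alpha**nu / Gamma(nu) * x**(nu-1) * exp(-alpha*x)."""
    if x <= 0:
        return 0.0
    return alpha ** nu / math.gamma(nu) * x ** (nu - 1) * math.exp(-alpha * x)

for y in (0.1, 0.5, 1.0, 2.0, 5.0):
    # the two columns should agree up to rounding
    print(y, chi2_1_pdf_transform(y), gamma_pdf(y, 0.5, 0.5))
```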

We have derived the density of the chi-squared variable with one degree of freedom, see also Example 3.52, J. Abdey, Guide ST2133.

### General chi-squared

For $\chi _{n}^{2}=z_{1}^{2}+...+z_{n}^{2}$ with independent standard normals $z_{1},...,z_{n}$ we can write $\chi _{n}^{2}=\chi _{1}^{2}+...+\chi _{1}^{2}$ where the chi-squared variables on the right are independent and all have one degree of freedom. This is because deterministic (here quadratic) functions of independent variables are independent.

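Recall that the family of gamma densities is closed under convolution when the parameter $\alpha$ is the same: $f_{\alpha ,\nu _{1}}\ast f_{\alpha ,\nu _{2}}=f_{\alpha ,\nu _{1}+\nu _{2}}.$ Then by the convolution theorem we get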

$f_{\chi _{n}^{2}}=f_{\chi _{1}^{2}} \ast ... \ast f_{\chi _{1}^{2}}=f_{1/2,1/2} \ast ... \ast f_{1/2,1/2}$ $=f_{1/2,n/2}=\frac{1}{\Gamma \left( n/2\right) 2^{n/2}}x^{n/2-1}e^{-x/2}.$
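Here is a rough Monte Carlo check of the general formula. It is only a sketch under stated assumptions: it draws $z_1^2+...+z_n^2$ with Python's `random.gauss` and compares the empirical probability $P(\chi_n^2\leq x)$ with the integral of $f_{1/2,n/2}$ over $[0,x]$, computed by the midpoint rule. The choice $n=4$, $x=3$ is arbitrary. For $n=4$ the distribution function has the closed form $1-e^{-x/2}(1+x/2)$, which serves as a reference. For $n=1$ the integrable singularity at the origin would make the midpoint rule less accurate.

```python
import math
import random

def chi2_pdf(x, n):
    """f_{1/2,n/2}(x) = x**(n/2-1) * exp(-x/2) / (Gamma(n/2) * 2**(n/2)) for x > 0."""
    if x <= 0:
        return 0.0
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (math.gamma(n / 2) * 2 ** (n / 2))

def chi2_cdf_numeric(x, n, steps=20000):
    """P(chi2_n <= x) by the midpoint rule applied to chi2_pdf on [0, x]."""
    h = x / steps
    return h * sum(chi2_pdf((i + 0.5) * h, n) for i in range(steps))

def chi2_sample(n, rng):
    """One draw of z_1**2 + ... + z_n**2 with independent standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

rng = random.Random(12345)          # arbitrary seed
n, x, reps = 4, 3.0, 100_000
empirical = sum(chi2_sample(n, rng) <= x for _ in range(reps)) / reps
print(empirical, chi2_cdf_numeric(x, n), 1 - math.exp(-x / 2) * (1 + x / 2))
```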

29
Sep 16

## Definitions of chi-square, t statistic and F statistic

Definitions of the standard normal distribution and independence can be combined to produce definitions of chi-square, t statistic and F statistic. The similarity of the definitions makes them easier to study.

### Independence of continuous random variables

The definition of independent discrete random variables is easily modified for the continuous case. Let $X,Y$ be two continuous random variables with densities $p_X,\ p_Y$, respectively. We say that these variables are independent if the density $p_{X,Y}$ of the pair $(X,Y)$ is the product of the individual densities:

(1) $p_{X,Y}(s,t)=p_X(s)p_Y(t)$ for all $s,t.$

As in this post, equation (1) can be understood in two ways. If (1) holds, then $X,Y$ are independent. Conversely, if we want them to be independent, we can define the density of the pair by equation (1). This definition readily generalizes to the case of many variables. In particular, if we want variables $z_1,...,z_n$ to be standard normal and independent, we say that each of them has the standard normal density defined here and that the joint density $p_{z_1,...,z_n}$ is the product of the individual densities.

### Definition of chi-square variable

Figure 1. chi-square with 1 degree of freedom

Let $z_1,...,z_n$ be standard normal and independent. Then the variable $\chi^2_n=z_1^2+...+z_n^2$ is called a chi-square variable with $n$ degrees of freedom. Obviously, $\chi^2_n\ge 0$, which means that its density is zero to the left of the origin. For $n=1$ the density is not bounded near the origin, see Figure 1.

### Definition of t distribution

Figure 2. t distribution and standard normal compared

Let $z_0,z_1,...,z_n$ be standard normal and independent. Then the variable $t_n=\frac{z_0}{\sqrt{(z_1^2+...+z_n^2)/n}}$ is said to have the t distribution with $n$ degrees of freedom. Its density is bell-shaped and, for low $n$, has fatter tails than the standard normal density. As $n$ grows, it approaches the standard normal density, see Figure 2.
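The fatter tails show up in a small simulation built directly from the definition of $t_n$. This Python sketch uses the standard library only; the seed, the cutoff 2 and the sample sizes are arbitrary choices. It estimates $P(|t_n|>2)$ for $n=3$ and $n=30$ and compares both estimates with $P(|z|>2)$, computed from `statistics.NormalDist`.

```python
import math
import random
from statistics import NormalDist

def t_sample(n, rng):
    """One draw of t_n = z_0 / sqrt((z_1**2 + ... + z_n**2) / n)."""
    z0 = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
    return z0 / math.sqrt(chi2 / n)

rng = random.Random(2024)
reps = 50_000
normal_tail = 2 * (1 - NormalDist().cdf(2.0))    # P(|z| > 2), about 0.0455
tails = {}
for n in (3, 30):
    tails[n] = sum(abs(t_sample(n, rng)) > 2.0 for _ in range(reps)) / reps
print(normal_tail, tails)   # the tail shrinks toward the normal one as n grows
```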

### Definition of F distribution

Figure 3. F distribution with (1,m) degrees of freedom

Let $u_1,...,u_n,v_1,...,v_m$ be standard normal and independent. Then the variable $F_{n,m}=\frac{(u_1^2+...+u_n^2)/n}{(v_1^2+...+v_m^2)/m}$ is said to have the F distribution with $(n,m)$ degrees of freedom. It is nonnegative, so its density is zero to the left of the origin. When $n=1$, the density is not bounded in the neighborhood of zero, see Figure 3.

The Mathematica file and video illustrate the densities of these three variables better.

### Consequences

1. If $\chi^2_n$ and $\chi^2_m$ are independent, then $\chi^2_n+\chi^2_m$ is distributed as $\chi^2_{n+m}$ (addition rule). This rule is applied in the theory of ANOVA models.
2. $t_n^2=F_{1,n}$ (square the definition of $t_n$ and compare it with the definition of $F_{1,n}$). This gives an easy proof of equation (2.71) from Introduction to Econometrics, by Christopher Dougherty, published by Oxford University Press, UK, in 2016.
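Both consequences can be checked with a short Python sketch (standard library only, arbitrary seed). For the second one, $t_n^2$ and $F_{1,n}$ are built from the same normals, so they coincide up to rounding; this is exactly the algebra behind $t_n^2=F_{1,n}$. For the first one, the simulation only compares the mean and variance of $\chi^2_n+\chi^2_m$ with $n+m$ and $2(n+m)$, which are the mean and variance of $\chi^2_{n+m}$. Matching moments is a consistency check, not a proof.

```python
import math
import random

rng = random.Random(7)

def normals(k):
    """k independent standard normal draws."""
    return [rng.gauss(0.0, 1.0) for _ in range(k)]

# Consequence 2: t_n and F_{1,n} from the SAME normals z_0 and z_1, ..., z_n
n_t = 5
z0, zs = normals(1)[0], normals(n_t)
denom = sum(z * z for z in zs) / n_t
t = z0 / math.sqrt(denom)
F = (z0 ** 2 / 1) / denom
print(t ** 2, F)                       # equal up to rounding

# Consequence 1 (addition rule): moments of chi2_n + chi2_m versus chi2_{n+m}
n, m, reps = 3, 4, 100_000
sums = [sum(z * z for z in normals(n)) + sum(z * z for z in normals(m)) for _ in range(reps)]
mean = sum(sums) / reps
var = sum((s - mean) ** 2 for s in sums) / (reps - 1)
print(mean, n + m)                     # mean close to n + m
print(var, 2 * (n + m))                # variance close to 2(n + m)
```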

22
Dec 18

## Application: distribution of sigma squared estimator

For the formulation of multiple regression and classical conditions on its elements see Application: estimating sigma squared. There we proved unbiasedness of the OLS estimator of $\sigma^2.$ Here we do more: we characterize its distribution and obtain unbiasedness as a corollary.

### Preliminaries

We need a summary of what we know about the residual $r=y-\hat{y}$ and the projector $Q=I-P$ where $P=X(X^TX)^{-1}X^T:$

(1) $\Vert r\Vert^2=e^TQe.$

$P$ has $k$ unities and $n-k$ zeros on the diagonal of its diagonal representation, where $k$ is the number of regressors. With $Q$ it's the opposite: it has $n-k$ unities and $k$ zeros on the diagonal of its diagonal representation. We can always assume that the unities come first, so in the diagonal representation

(2) $Q=UDU^{-1}$

the matrix $U$ is orthogonal and $D$ can be written as

(3) $D=\left(\begin{array}{cc}I_{n-k}&0\\0&0\end{array}\right)$

where $I_{n-k}$ is an identity matrix and the zeros are zero matrices of compatible dimensions.

### Characterization of the distribution of $s^2$

Exercise 1. Suppose the error vector $e$ is normal: $e\sim N(0,\sigma^2I).$ Prove that the vector $\delta =U^{-1}e/\sigma$ is standard normal.

Proof. By the properties of orthogonal matrices

$Var(\delta)=E\delta\delta^T=U^{-1}Eee^TU/\sigma^2=U^{-1}U=I.$

This, together with the equation $E\delta =0$ and the fact that $\delta$ is normal as a linear transformation of the normal vector $e$, proves that $\delta$ is standard normal.

Exercise 2. Prove that $\Vert r\Vert^2/\sigma^2$ is distributed as $\chi _{n-k}^2.$

Proof. From (1) and (2) we have

$\Vert r\Vert^2/\sigma^2=e^TUDU^{-1}e/\sigma^2=(U^{-1}e)^TD(U^{-1}e)/\sigma^2=\delta^TD\delta.$

Now (3) shows that $\Vert r\Vert^2/\sigma^2=\sum_{i=1}^{n-k}\delta_i^2$ which is the definition of $\chi _{n-k}^2.$

Exercise 3. Find the mean and variance of $s^2=\Vert r\Vert^2/(n-k)=\sigma^2\chi _{n-k}^2/(n-k).$

Solution. From Exercise 2 we obtain, in a different way, the result proved earlier:

$Es^2=\sigma^2E\chi _{n-k}^2/(n-k)=\sigma^2.$

Further, using the independence of $\delta_1,...,\delta_{n-k}$ and the fact that $Var(\delta_i^2)=2$ for a standard normal $\delta_i$,

$Var(s^2)=\frac{\sigma^4}{(n-k)^2}\sum_{i=1}^{n-k}Var(\delta_i^2)=\frac{2\sigma^4}{n-k}.$
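Here is a simulation sketch of Exercise 3. It assumes a hypothetical design with an intercept and one regressor ($k=2$), so OLS can be computed with the familiar closed-form slope and intercept formulas instead of matrix inversion. The true coefficients, $\sigma$ and the seed are arbitrary. The average of the simulated $s^2$ should be near $\sigma^2$, and their variance should be near $2\sigma^4/(n-k)$.

```python
import random

def s2_simple_regression(x, y):
    """s^2 = ||r||^2 / (n - k) for OLS of y on an intercept and x (k = 2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = ybar - b * xbar                # intercept
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return rss / (n - 2)

rng = random.Random(11)
n, k, sigma = 12, 2, 1.5
x = [float(i) for i in range(n)]       # hypothetical deterministic regressor
beta0, beta1 = 1.0, 0.5                # hypothetical true coefficients
reps = 40_000
draws = []
for _ in range(reps):
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]   # normal errors
    draws.append(s2_simple_regression(x, y))
mean = sum(draws) / reps
var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
print(mean, sigma ** 2)                # E s^2 = sigma^2
print(var, 2 * sigma ** 4 / (n - k))   # Var s^2 = 2 sigma^4 / (n - k)
```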

18
Nov 18

## Application: Ordinary Least Squares estimator

### Generalized Pythagoras theorem

Exercise 1. Let $P$ be a projector and denote $Q=I-P.$ Then $\Vert x\Vert^2=\Vert Px\Vert^2+\Vert Qx\Vert^2.$

Proof. By the scalar product properties

$\Vert x\Vert^2=\Vert Px+Qx\Vert^2=\Vert Px\Vert^2+2(Px)\cdot (Qx)+\Vert Qx\Vert^2.$

$P$ is symmetric and idempotent, so

$(Px)\cdot (Qx)=(Px)\cdot[(I-P)x]=x\cdot[(P-P^2)x]=0.$

This proves the statement.
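Here is a tiny numerical illustration of Exercise 1. It assumes the simplest projector: the orthogonal projector $P=aa^T/(a^Ta)$ onto the line spanned by a vector $a$. The vectors are arbitrary and are chosen so that the arithmetic is exact.

```python
def dot(u, v):
    """Scalar product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def project_onto_line(a, x):
    """P x for the orthogonal projector P = a a^T / (a^T a) onto the line spanned by a."""
    c = dot(a, x) / dot(a, a)
    return [c * ai for ai in a]

a = [1.0, 2.0, 2.0]
x = [3.0, -1.0, 4.0]
px = project_onto_line(a, x)
qx = [xi - pi for xi, pi in zip(x, px)]       # Q x = (I - P) x
print(dot(x, x), dot(px, px) + dot(qx, qx), dot(px, qx))   # prints 26.0 26.0 0.0
```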

### Ordinary Least Squares (OLS) estimator derivation

Problem statement. A vector $y\in R^n$ (the dependent vector) and vectors $x^{(1)},...,x^{(k)}\in R^n$ (independent vectors or regressors) are given. The OLS estimator is defined as the vector $\beta \in R^k$ that minimizes the sum of squared residuals $RSS=\sum_{i=1}^n(y_i-x_i^{(1)}\beta_1-...-x_i^{(k)}\beta_k)^2.$

Denoting $X=(x^{(1)},...,x^{(k)}),$ we see that $RSS=\Vert y-X\beta\Vert^2$ and that finding the OLS estimator means approximating $y$ with vectors from the image $\text{Img}X.$ The vectors $x^{(1)},...,x^{(k)}$ should be linearly independent, otherwise the solution will not be unique.

Assumption. $x^{(1)},...,x^{(k)}$ are linearly independent. This, in particular, implies that $k\leq n.$

Exercise 2. Show that the OLS estimator is

(2) $\hat{\beta}=(X^TX)^{-1}X^Ty.$

Proof. The matrix $P=X(X^TX)^{-1}X^T$ is symmetric and idempotent, that is, a projector, so Exercise 1 applies to it. Since $X\beta$ belongs to the image of $P,$ $P$ doesn't change it: $X\beta=PX\beta.$ Denoting also $Q=I-P$ we have

$\Vert y-X\beta\Vert^2=\Vert y-Py+Py-X\beta\Vert^2$

$=\Vert Qy+P(y-X\beta)\Vert^2$

$=\Vert Qy\Vert^2+\Vert P(y-X\beta)\Vert^2$ (by Exercise 1 applied to $x=y-X\beta,$ for which $Qx=Qy$ because $QX\beta=0$).

This shows that $\Vert Qy\Vert^2$ is a lower bound for $\Vert y-X\beta\Vert^2.$ This lower bound is achieved when the second term is made zero. From

$P(y-X\beta)=Py-X\beta =X(X^TX)^{-1}X^Ty-X\beta=X[(X^TX)^{-1}X^Ty-\beta]$

we see that the second term is zero if $\beta$ satisfies (2).

Usually the above derivation is applied to the dependent vector of the form $y=X\beta+e$ where $e$ is a random vector with mean zero. But it holds without this assumption. See also simplified derivation of the OLS estimator.
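For $k=2$, formula (2) can be evaluated by hand. The following Python sketch (standard library only, with made-up data) solves the normal equations $X^TX\beta=X^Ty$ by Cramer's rule. It then checks that the residual $Qy=y-X\hat{\beta}$ is orthogonal to both regressors and that $\Vert y\Vert^2=\Vert Py\Vert^2+\Vert Qy\Vert^2$ with $Py=X\hat{\beta}$.

```python
def dot(u, v):
    """Scalar product of two vectors given as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def ols_two_regressors(x1, x2, y):
    """beta_hat = (X^T X)^{-1} X^T y for X = (x1, x2), solved by Cramer's rule."""
    a11, a12, a22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    b1, b2 = dot(x1, y), dot(x2, y)
    det = a11 * a22 - a12 * a12        # nonzero iff x1, x2 are linearly independent
    if det == 0:
        raise ValueError("regressors are linearly dependent")
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

x1 = [1.0, 1.0, 1.0, 1.0, 1.0]          # made-up intercept column
x2 = [0.0, 1.0, 2.0, 3.0, 4.0]          # made-up regressor
y = [1.0, 2.0, 2.0, 4.0, 5.0]           # made-up dependent vector
b1, b2 = ols_two_regressors(x1, x2, y)
fitted = [b1 * u + b2 * v for u, v in zip(x1, x2)]   # P y = X beta_hat
resid = [yi - fi for yi, fi in zip(y, fitted)]         # Q y
print(b1, b2)                                          # about 0.8 and 1.0
print(dot(x1, resid), dot(x2, resid))                  # both about 0
print(dot(y, y), dot(fitted, fitted) + dot(resid, resid))
```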

10
Dec 18

## Distributions derived from normal variables

In the one-dimensional case the economical way to define normal variables is this: define a standard normal variable and then a general normal variable as its linear transformation.

In case of many dimensions, we follow the same idea. Before doing that we state without proofs two useful facts about independence of random variables (real-valued, not vectors).

Theorem 1. Suppose variables $X_1,...,X_n$ have densities $p_1(x_1),...,p_n(x_n).$ Then they are independent if and only if their joint density $p(x_1,...,x_n)$ is a product of individual densities: $p(x_1,...,x_n)=p_1(x_1)...p_n(x_n).$

Theorem 2. If variables $X,Y$ are jointly normal, then they are independent if and only if they are uncorrelated: $cov(X,Y)=0.$

The necessity part (independence implies uncorrelatedness) is trivial.

### Normal vectors

Let $z_1,...,z_n$ be independent standard normal variables. A standard normal variable is defined by its density, so all of $z_i$ have the same density. We achieve independence, according to Theorem 1, by defining their joint density to be a product of individual densities.

Definition 1. A standard normal vector of dimension $n$ is defined by

$z=\left(\begin{array}{c}z_1 \\... \\z_n \\ \end{array}\right)$

Properties. $Ez=0$ because all of $z_i$ have mean zero. Further, $cov(z_i,z_j)=0$ for $i\neq j$ because independent variables are uncorrelated (the trivial part of Theorem 2), and the variance of a standard normal is 1. Therefore, from the expression for the variance of a vector we see that $Var(z)=I.$

Definition 2. For a matrix $A$ and vector $\mu$ of compatible dimensions a normal vector is defined by $X=Az+\mu.$

Properties. $EX=AEz+\mu=\mu$ and

$Var(X)=Var(Az)=E(Az)(Az)^T=AEzz^TA^T=AIA^T=AA^T$

(recall that the variance of a vector is always nonnegative definite; $AA^T$ indeed is, because $x^TAA^Tx=\Vert A^Tx\Vert^2\geq 0$).
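Here is a Monte Carlo sketch of $EX=\mu$ and $Var(X)=AA^T$. It uses a hypothetical $2\times 2$ matrix $A$ and vector $\mu$. The sample mean and sample covariance of simulated $X=Az+\mu$ should be close to $\mu$ and $AA^T=\left(\begin{array}{cc}4&2\\2&2\end{array}\right)$.

```python
import random

rng = random.Random(3)
A = [[2.0, 0.0],
     [1.0, 1.0]]            # hypothetical matrix
mu = [1.0, -1.0]            # hypothetical mean vector
reps = 100_000

samples = []
for _ in range(reps):
    z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]          # standard normal vector
    samples.append([A[i][0] * z[0] + A[i][1] * z[1] + mu[i] for i in range(2)])

means = [sum(s[i] for s in samples) / reps for i in range(2)]
cov = [[sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (reps - 1)
        for j in range(2)] for i in range(2)]
AAT = [[sum(A[i][l] * A[j][l] for l in range(2)) for j in range(2)] for i in range(2)]
print(means, mu)
print(cov, AAT)
```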

### Distributions derived from normal variables

In the definitions of standard distributions (chi square, t distribution and F distribution) there is no reference to any sample data. Unlike statistics, which by definition are functions of sample data, these and other standard distributions are theoretical constructs. Statistics are constructed so as to have a distribution equal or asymptotically equal to one of the standard distributions. This allows practitioners to use tables developed for standard distributions.

Exercise 1. Prove that $\chi_n^2/n$ converges to 1 in probability.

Proof. For a standard normal $z$ we have $Ez^2=1$ and $Var(z^2)=2$ (both properties can be verified in Mathematica). Hence, $E\chi_n^2/n=1$ and

$Var(\chi_n^2/n)=\sum_iVar(z_i^2)/n^2=2/n\rightarrow 0.$

Now the statement follows from the simple form of the law of large numbers.
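Exercise 1 can be watched in action. The Python sketch below (standard library only, arbitrary seed) simulates $\chi_n^2/n$ for growing $n$. For each $n$ it reports the sample mean, the sample variance next to the theoretical value $2/n$, and the share of draws farther than $0.1$ from 1. The share should shrink as $n$ grows, in line with convergence in probability.

```python
import random

rng = random.Random(5)

def chi2_over_n(n):
    """One draw of chi2_n / n = (z_1**2 + ... + z_n**2) / n."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n

reps = 2_000
results = {}
for n in (10, 100, 1000):
    draws = [chi2_over_n(n) for _ in range(reps)]
    mean = sum(draws) / reps
    var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
    share_far = sum(abs(d - 1) > 0.1 for d in draws) / reps
    results[n] = (mean, var, 2 / n, share_far)   # (mean, variance, 2/n, P(|.-1|>0.1))
    print(n, results[n])
```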

Exercise 1 implies that for large $n$ the t distribution is close to a standard normal.

14
Sep 23

## The magic of the distribution function

Let $X$ be a random variable. The function $F_{X}\left( x\right) =P\left( X\leq x\right) ,$ where $x$ runs over real numbers, is called a distribution function of $X.$ In statistics, many formulas are derived with the help of $F_{X}\left( x\right) .$ The motivation and properties are given here.

Oftentimes, working with the distribution function is an intermediate step to obtain a density $f_{X}$ using the link

$F_{X}\left( x\right) =\int_{-\infty }^{x}f_{X}\left( t\right) dt.$

The series of exercises below shows just how useful the distribution function is.

Exercise 1. Let $Y$ be a linear transformation of $X,$ that is, $Y=\sigma X+\mu ,$ where $\sigma >0$ and $\mu \in R.$ Find the link between $F_{X}$ and $F_{Y}.$ Find the link between $f_{X}$ and $f_{Y}.$

The solution is here.

The more general case of a nonlinear transformation can also be handled:

Exercise 2. Let $Y=g\left( X\right)$ where $g$ is a deterministic function. Suppose that $g$ is strictly monotone and differentiable. Then $g^{-1}$ exists. Find the link between $F_{X}$ and $F_{Y}.$ Find the link between $f_{X}$ and $f_{Y}.$

Solution. The result differs depending on whether $g$ is increasing or decreasing. Let's assume the latter, so that $x_{1}\leq x_{2}$ is equivalent to $g\left( x_{1}\right) \geq g\left( x_{2}\right) .$ Also for simplicity suppose that $P\left( X=c\right) =0$ for any $c\in R.$ Then

$F_{Y}\left( y\right) =P\left( g\left( X\right) \leq y\right) =P\left( X\geq g^{-1}\left( y\right) \right) =1-P\left( X\leq g^{-1}\left( y\right) \right)=1-F_{X}\left( g^{-1}\left( y\right) \right) .$

Differentiation of this equation produces

$f_{Y}\left( y\right)=-f_{X}\left( g^{-1}\left( y\right) \right) \left( g^{-1}\left( y\right) \right) ^{\prime }=f_{X}\left( g^{-1}\left( y\right) \right) \left\vert\left( g^{-1}\left( y\right) \right) ^{\prime }\right\vert$

(the derivative of $g^{-1}$ is negative).
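As an illustration of the decreasing case, take, as a hypothetical example, a standard exponential $X$ and $g(x)=e^{-x}$. Then $g^{-1}(y)=-\ln y$ on $(0,1)$, and the formula gives $f_Y(y)=y\cdot\frac{1}{y}=1$; that is, $Y$ is uniform on $(0,1)$. The sketch evaluates the density formula and checks the distribution function by simulation.

```python
import math
import random

def density_decreasing(f_x, g_inv, g_inv_prime, y):
    """f_Y(y) = f_X(g^{-1}(y)) * |(g^{-1})'(y)| for strictly decreasing differentiable g."""
    return f_x(g_inv(y)) * abs(g_inv_prime(y))

def f_x(x):
    """Standard exponential density (hypothetical choice of X)."""
    return math.exp(-x) if x >= 0 else 0.0

def g_inv(y):
    """Inverse of g(x) = exp(-x), used here for 0 < y < 1."""
    return -math.log(y)

def g_inv_prime(y):
    """Derivative of g^{-1}; negative because g is decreasing."""
    return -1.0 / y

densities = [density_decreasing(f_x, g_inv, g_inv_prime, y) for y in (0.1, 0.5, 0.9)]
print(densities)                          # each value is 1 up to rounding

# Monte Carlo check of F_Y(y) = 1 - F_X(g^{-1}(y)) at y = 0.3, with F_X(x) = 1 - exp(-x)
rng = random.Random(1)
reps = 100_000
ys = [math.exp(-rng.expovariate(1.0)) for _ in range(reps)]
empirical = sum(v <= 0.3 for v in ys) / reps
formula = 1 - (1 - math.exp(-g_inv(0.3)))
print(empirical, formula)
```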

For an example when $g$ is not invertible see the post about the chi-squared distribution.

Exercise 3. Suppose $T=X/Y$ where $X$ and $Y$ are independent, have densities $f_{X},f_{Y}$ and $Y>0.$ What are the distribution function and density of $T?$

Solution. By independence the joint density $f_{X,Y}$ equals $f_{X}f_{Y},$ so

$F_{T}\left( t\right) =P\left( T\leq t\right) =P\left( X\leq tY\right) = \underset{x\leq ty}{\int \int }f_{X}\left( x\right) f_{Y}\left( y\right) dxdy$

(converting a double integral to an iterated integral and remembering that $f_{Y}$ is zero on the left half-axis)

$=\int_{0}^{\infty }\left( \int_{-\infty }^{ty}f_{X}\left( x\right) dx\right) f_{Y}\left( y\right)dy=\int_{0}^{\infty }F_{X}\left( ty\right) f_{Y}\left( y\right) dy.$

Now by the Leibniz integral rule

(1) $f_{T}\left( t\right) =\int_{0}^{\infty }f_{X}\left( ty\right) yf_{Y}\left( y\right) dy.$

A different method is indicated in Activity 4.11, p.207 of J. Abdey, Guide ST2133.
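Formula (1) can be evaluated numerically. In this sketch, assumed for illustration, $X$ and $Y$ are independent standard exponentials. For them (1) gives $f_T(t)=\int_0^\infty e^{-ty}ye^{-y}dy=1/(1+t)^2$ for $t>0$. The integral is truncated at $y=50$ and approximated by the midpoint rule.

```python
import math

def ratio_density(t, f_x, f_y, upper=50.0, steps=100_000):
    """f_T(t) = integral_0^inf f_X(t y) * y * f_Y(y) dy, truncated to [0, upper], midpoint rule."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += f_x(t * y) * y * f_y(y)
    return total * h

def exp_pdf(x):
    """Standard exponential density (hypothetical choice for both X and Y)."""
    return math.exp(-x) if x >= 0 else 0.0

for t in (0.5, 1.0, 3.0):
    # numerical value of (1) next to the closed form 1/(1+t)^2
    print(t, ratio_density(t, exp_pdf, exp_pdf), 1 / (1 + t) ** 2)
```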

Exercise 4. Let $X,Y$ be two independent random variables with densities $f_{X},f_{Y}$. Find $F_{X+Y}$ and $f_{X+Y}.$

See this post.

Exercise 5. Let $X_{1},X_{2}$ be two independent random variables. Find $F_{\max \left\{ X_{1},X_{2}\right\} }$ and $F_{\min \left\{ X_{1},X_{2}\right\} }.$

Solution. The inequality $\max \left\{ X_{1},X_{2}\right\} \leq x$ holds if and only if both $X_{1}\leq x$ and $X_{2}\leq x$ hold. This means that the event $\left\{ \max \left\{ X_{1},X_{2}\right\} \leq x\right\}$ coincides with the event $\left\{ X_{1}\leq x\right\} \cap \left\{ X_{2}\leq x\right\}.$ It follows by independence that

(2) $F_{\max \left\{ X_{1},X_{2}\right\} }\left( x\right) =P\left( \max \left\{ X_{1},X_{2}\right\} \leq x\right) =P\left( \left\{ X_{1}\leq x\right\} \cap \left\{ X_{2}\leq x\right\} \right)$

$=P(X_{1}\leq x)P\left( X_{2}\leq x\right) =F_{X_{1}}\left( x\right) F_{X_{2}}\left( x\right) .$

For $\min \left\{ X_{1},X_{2}\right\}$ we need one more trick, namely, pass to the complementary event by writing

$F_{\min \left\{ X_{1},X_{2}\right\} }\left(x\right) =P\left( \min \left\{ X_{1},X_{2}\right\} \leq x\right) =1-P\left(\min \left\{ X_{1},X_{2}\right\} >x\right) .$

Now we can use the fact that the event $\left\{ \min \left\{ X_{1},X_{2}\right\} >x\right\}$ coincides with the event $\left\{ X_{1}>x\right\} \cap \left\{ X_{2}>x\right\} .$ Hence, by independence

(3) $F_{\min \left\{ X_{1},X_{2}\right\} }\left( x\right) =1-P\left( \left\{X_{1}>x\right\} \cap \left\{ X_{2}>x\right\} \right) =1-P\left(X_{1}>x\right) P\left( X_{2}>x\right)$

$=1-\left[ 1-P\left( X_{1}\leq x\right) \right] \left[ 1-P\left( X_{2}\leq x\right) \right] =1-\left( 1-F_{X_{1}}\left( x\right) \right) \left(1-F_{X_{2}}\left( x\right) \right) .$

Equations (2) and (3) can be differentiated to obtain the links in terms of densities.
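Here is a quick simulation check of (2) and (3). It assumes for illustration that $X_1,X_2$ are independent and uniform on $[0,1]$, so that $F_{X_i}(x)=x$ there. At $x=0.6$ the formulas give $0.36$ for the maximum and $0.84$ for the minimum.

```python
import random

rng = random.Random(9)
reps = 100_000
pairs = [(rng.random(), rng.random()) for _ in range(reps)]   # independent Uniform(0,1) X_1, X_2
x = 0.6
emp_max = sum(max(a, b) <= x for a, b in pairs) / reps
emp_min = sum(min(a, b) <= x for a, b in pairs) / reps
F = x                                    # uniform distribution function on [0, 1]
print(emp_max, F * F)                    # formula (2): F_{X1}(x) F_{X2}(x)
print(emp_min, 1 - (1 - F) * (1 - F))    # formula (3)
```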

19
Feb 22

## Distribution of the estimator of the error variance

If you are reading the book by Dougherty: this post is about the distribution of the estimator  $s^2$ defined in Chapter 3.

Consider regression

(1) $y=X\beta +e$

where the deterministic matrix $X$ is of size $n\times k,$ satisfies $\det \left( X^{T}X\right) \neq 0$ (regressors are not collinear) and the error $e$ satisfies

(2) $Ee=0,Var(e)=\sigma ^{2}I$

$\beta$ is estimated by $\hat{\beta}=(X^{T}X)^{-1}X^{T}y.$ Denote $P=X(X^{T}X)^{-1}X^{T},$ $Q=I-P.$ Using (1) we see that $\hat{\beta}=\beta +(X^{T}X)^{-1}X^{T}e$ and the residual $r\equiv y-X\hat{\beta}=Qe.$ $\sigma^{2}$ is estimated by

(3) $s^{2}=\left\Vert r\right\Vert ^{2}/\left( n-k\right) =\left\Vert Qe\right\Vert ^{2}/\left( n-k\right) .$

$Q$ is a projector and has the following properties, which are derived from those of $P$:

(4) $Q^{T}=Q,$ $Q^{2}=Q.$

If $\lambda$ is an eigenvalue of $Q$ with eigenvector $x,$ then multiplying $Qx=\lambda x$ by $Q$ gives $\lambda x=Qx=Q^{2}x=\lambda ^{2}x,$ and since $x\neq 0$ we get $\lambda ^{2}=\lambda .$ Hence eigenvalues of $Q$ can be only $0$ or $1.$ Since the trace equals the sum of eigenvalues, the equation $tr\left( Q\right) =n-k,$ which follows from $tr\left( P\right) =tr\left( (X^{T}X)^{-1}X^{T}X\right) =tr\left( I_{k}\right) =k,$ tells us that the number of eigenvalues equal to 1 is $n-k$ and the remaining $k$ are zeros. Let $Q=U\Lambda U^{T}$ be the diagonal representation of $Q.$ Here $U$ is an orthogonal matrix,

(5) $U^{T}U=I,$

and $\Lambda$ is a diagonal matrix with eigenvalues of $Q$ on the main diagonal. We can assume that the first $n-k$ numbers on the diagonal of $\Lambda$ are ones and the others are zeros.
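The properties of $Q$ used above can be checked on a small hypothetical design with $n=5$ and $k=2$ (an intercept and a trend). This plain-Python sketch builds $P=X(X^TX)^{-1}X^T$ and $Q=I-P$. It then reports how far $Q$ is from being symmetric and idempotent, the trace of $Q$ (which should be $n-k=3$), and the size of $PQ$ (which should be zero, as used in part 2 of the proof below).

```python
def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*A)]

def max_abs_diff(A, B):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Hypothetical design: n = 5 observations, k = 2 regressors (intercept and a trend)
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
n, k = len(X), len(X[0])

XtX = matmul(transpose(X), X)
det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]     # nonzero: regressors not collinear
XtX_inv = [[XtX[1][1] / det, -XtX[0][1] / det],
           [-XtX[1][0] / det, XtX[0][0] / det]]           # explicit 2x2 inverse
P = matmul(matmul(X, XtX_inv), transpose(X))              # P = X (X^T X)^{-1} X^T
Q = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(n)] for i in range(n)]
zero = [[0.0] * n for _ in range(n)]

print(max_abs_diff(Q, transpose(Q)))       # symmetry: about 0
print(max_abs_diff(matmul(Q, Q), Q))       # idempotency: about 0
print(sum(Q[i][i] for i in range(n)))      # trace: about n - k = 3
print(max_abs_diff(matmul(P, Q), zero))    # PQ = 0: about 0
```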

Theorem. Let $e$ be normal. 1) $s^{2}\left( n-k\right) /\sigma ^{2}$ is distributed as $\chi _{n-k}^{2}.$ 2) The estimators $\hat{\beta}$ and $s^{2}$ are independent.

Proof. 1) We have by (4)

(6) $\left\Vert Qe\right\Vert ^{2}=\left( Qe\right) ^{T}Qe=\left( Q^{T}Qe\right) ^{T}e=\left( Qe\right) ^{T}e=\left( U\Lambda U^{T}e\right) ^{T}e=\left( \Lambda U^{T}e\right) ^{T}U^{T}e.$

Denote $S=U^{T}e.$ From (2) and (5)

$ES=0,$ $Var\left( S\right) =EU^{T}ee^{T}U=\sigma ^{2}U^{T}U=\sigma ^{2}I$

and $S$ is normal as a linear transformation of a normal vector. It follows that $S=\sigma z$ where $z$ is a standard normal vector with independent standard normal coordinates $z_{1},...,z_{n}.$ Hence, (6) implies

(7) $\left\Vert Qe\right\Vert ^{2}=\sigma ^{2}\left( \Lambda z\right) ^{T}z=\sigma ^{2}\left( z_{1}^{2}+...+z_{n-k}^{2}\right) =\sigma ^{2}\chi _{n-k}^{2}.$

(3) and (7) prove the first statement.

2) First we note that the vectors $Pe,Qe$ are independent. Since they are normal, their independence follows from

$cov(Pe,Qe)=EPee^{T}Q^{T}=\sigma ^{2}PQ=0.$

It's easy to see that $X^{T}P=X^{T}.$ This allows us to show that $\hat{\beta}$ is a function of $Pe$:

$\hat{\beta}=\beta +(X^{T}X)^{-1}X^{T}e=\beta +(X^{T}X)^{-1}X^{T}Pe.$

Independence of $Pe,Qe$ leads to independence of their functions $\hat{\beta}$ and $s^{2}.$


12
Dec 21

Here I start a series of new posts to accompany Advanced Statistics ST2133 and ST2134 by J. Abdey.

Marginal probabilities and densities

Leibniz integral rule

Sum of random variables and convolution

Analysis of problems with conditioning

Gamma function

Gamma distribution

Chi-squared distribution

Sufficiency and minimal sufficiency

Estimation of parameters of a normal distribution

Midterm Spring 2022: analysis of the midterm I gave in Spring 2022

A problem to do once and never come back: joint, marginal, conditional densities

Simple tools for combinatorial problems: solution to problem 1(b) from exam ST2133 ZA, 2019

2
Mar 20

## Statistical calculator

In my book I explained how one can use Excel to do statistical simulations and replace statistical tables commonly used in statistics courses. Here I go one step further by providing a free statistical calculator that replaces the following tables from the book by Newbold et al.:

Table 1 Cumulative Distribution Function, F(z), of the Standard Normal Distribution Table

Table 2 Probability Function of the Binomial Distribution

Table 5 Individual Poisson Probabilities

Table 7a Upper Critical Values of Chi-Square Distribution with $\nu$ Degrees of Freedom

Table 8 Upper Critical Values of Student’s t Distribution with $\nu$ Degrees of Freedom

Tables 9a, 9b Upper Critical Values of the F Distribution

The calculator is just a Google sheet with statistical functions, see Picture 1:

Picture 1. Calculator using Google sheet

### How to use Calculator

1. Open an account at gmail.com, if you haven't already. Open Google Drive.
2. Find the sheet on my Google Drive and copy it to your Google Drive (File/Make a copy). An icon of my calculator will appear in your drive. That's not the file, it's just a link to my file. To the right of it there are three dots indicating options. One of them is "Make a copy", so use that one. The copy will be in your drive. After that you can delete the link to my file. You might want to rename "Copy of Calculator" as "Calculator".
3. When you click a cell, you can enter what you need either in the formula bar at the bottom or directly in the cell. You can also see the functions I embedded in the sheet.
4. In cell A1, for example, you can enter any legitimate formula with numbers, arithmetic signs, and Google Sheets functions. Be sure to start it with =, + or - and to press the checkmark on the right of the formula bar after you finish.
5. The cells below A1 replace the tables listed above. Beside each function there is a verbal description and, further to the right, a graphical illustration (which is not shown in Picture 1).
6. On the tab named Regression you can calculate the slope and intercept. The sample size must be 10.
7. Keep in mind that each table for a continuous distribution needs two functions. For example, in the case of the standard normal distribution, one function takes you from a probability (area of the left tail) to the cutting value on the horizontal axis. The other function goes from the cutting value on the horizontal axis to the probability.
8. Feel free to add new sheets or functions as you need. You will have to do this on a tablet or computer.
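The remark about the two functions needed for each continuous distribution can be illustrated outside the sheet. This is only a sketch: it uses Python's standard `statistics.NormalDist`, not the functions embedded in the calculator. Here `cdf` goes from a cutting value to the left-tail probability, and `inv_cdf` goes back.

```python
from statistics import NormalDist

z = NormalDist()                  # standard normal: mu = 0, sigma = 1
p = z.cdf(1.96)                   # cutting value -> probability (area of the left tail)
c = z.inv_cdf(0.975)              # probability -> cutting value
print(round(p, 4), round(c, 4))   # prints 0.975 1.96
```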

2
Aug 19

## Main theorem: Jordan normal form

By Exercise 1, it is sufficient to show that in each root subspace the matrix takes the Jordan form.

Step 1. Take a basis

(1) $x_{1,p},...,x_{k,p}\in N_{\lambda }^{(p)}$ independent relative to $N_{\lambda }^{(p-1)}.$

Consecutively define

(2) $x_{1,p-1}=(A-\lambda I)x_{1,p},\ ...,\ x_{k,p-1}=(A-\lambda I)x_{k,p}\in N_{\lambda }^{(p-1)},$

...

(3) $x_{1,1}=(A-\lambda I)x_{1,2},\ ...,\ x_{k,1}=(A-\lambda I)x_{k,2}\in N_{\lambda }^{(1)}.$

Exercise 1. The vectors in (2) are linearly independent relative to $N_{\lambda }^{(p-2)},...,$ the vectors in (3) are linearly independent.

Proof. Consider (2), for example. Suppose that $\sum_{j=1}^{k}a_{j}x_{j,p-1} \in N_{\lambda }^{(p-2)}.$ Then

$0=(A-\lambda I)^{p-2}\sum_{j=1}^{k}a_{j}x_{j,p-1}=(A-\lambda I)^{p-1}\sum_{j=1}^{k}a_{j}x_{j,p}.$

Hence $\sum_{j=1}^{k}a_{j}x_{j,p}\in N_{\lambda}^{(p-1)},$ and by the relative independence in (1) all $a_{j}$ are zero.

Exercise 2. The system of $kp$ vectors listed in (1)-(3) is linearly independent, so that its span $L_{x}$ is of dimension $kp.$

Proof. Suppose $\sum_{j=1}^{k}a_{j,p}x_{j,p}+...+\sum_{j=1}^{k}a_{j,1}x_{j,1}=0.$ Then by inclusion relations

$\sum_{j=1}^{k}a_{j,p}x_{j,p}=-\sum_{j=1}^{k}a_{j,p-1}x_{j,p-1}-...- \sum_{j=1}^{k}a_{j,1}x_{j,1}\in N_{\lambda }^{(p-1)}$

which implies $a_{j,p}=0$ for $j=1,...,k,$ by relative independence stated in (1). This process can be continued by Exercise 1 to show that all coefficients are zeros.
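Here is a small worked example of Step 1. It assumes, as the notation suggests, that $N_\lambda^{(j)}$ is the null space of $(A-\lambda I)^j$. The matrix below is a hypothetical $3\times 3$ Jordan cell with $\lambda=2$, so $p=3$ and $k=1$. Starting from $x_{1,3}=(0,0,1)^T$, which lies in $N_\lambda^{(3)}$ but not in $N_\lambda^{(2)}$, the sketch builds the chain (2)-(3). It checks that the chain is linearly independent and that $A$ acts on it as a Jordan cell: $Ax_{1,i}=\lambda x_{1,i}+x_{1,i-1}$.

```python
def mat_vec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def det3(M):
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

lam = 2
A = [[2, 1, 0],
     [0, 2, 1],
     [0, 0, 2]]                 # hypothetical example: p = 3, k = 1
N = [[A[i][j] - (lam if i == j else 0) for j in range(3)] for i in range(3)]   # A - lam I

x3 = [0, 0, 1]                  # x_{1,3}: in N^(3) but not in N^(2)
x2 = mat_vec(N, x3)             # x_{1,2} = (A - lam I) x_{1,3}
x1 = mat_vec(N, x2)             # x_{1,1} = (A - lam I) x_{1,2}
print(x1, x2, x3)               # [1, 0, 0] [0, 1, 0] [0, 0, 1]
print(mat_vec(N, x1))           # [0, 0, 0]: x_{1,1} is an eigenvector
print(det3([x1, x2, x3]))       # 1, nonzero: the chain is linearly independent
# Jordan cell structure in the chain basis: A x_{1,i} = lam x_{1,i} + x_{1,i-1}
print(mat_vec(A, x2) == [lam * b + a for a, b in zip(x1, x2)],
      mat_vec(A, x3) == [lam * c + b for b, c in zip(x2, x3)])   # True True
```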

Next we show that in each of $N_{\lambda }^{(p)},...,N_{\lambda}^{(1)}$ we can find a basis relative to the lower indexed subspace $N_{\lambda }^{(p-1)},...,N_{\lambda }^{(0)}=\{0\}.$ According to (1), in $N_{\lambda }^{(p)}$ we already have such a basis. If the vectors in (2) constitute such a basis in $N_{\lambda }^{(p-1)}$, we consider $N_{\lambda }^{(p-2)}.$

Step 2. If not, by Exercise 3 we can find vectors

$y_{1,p-1},...,y_{l,p-1}\in N_{\lambda }^{(p-1)}$

such that

$x_{1,p-1},...,x_{k,p-1},y_{1,p-1},...,y_{l,p-1}$ represent a basis relative to $N_{\lambda }^{(p-2)}.$

Then we can define

$y_{1,p-2}=(A-\lambda I)y_{1,p-1},\ ...,\ y_{l,p-2}=(A-\lambda I)y_{l,p-1}\in N_{\lambda }^{(p-2)},$

...

$y_{1,1}=(A-\lambda I)y_{1,2},\ ...,\ y_{l,1}=(A-\lambda I)y_{l,2}\in N_{\lambda }^{(1)}.$

By Exercise 2, the $y$'s defined here are linearly independent. But we can show more:

Exercise 3. All $x$'s from Step 1 combined with the $y$'s from Step 2 are linearly independent.

The proof is analogous to that of Exercise 2.

Denote by $L_{y}$ the span of the vectors introduced in Step 2. $L_{x}\cap L_{y}=\{0\}$ because, by Exercise 3, the union of their bases is linearly independent. Therefore we can consider the direct sum $L_{x}\dotplus L_{y}.$ Repeating Step 2 as many times as necessary, after the last step we obtain a subspace, say, $L_{z},$ such that $N_{\lambda }^{(p)}=L_{x} \dotplus L_{y} \dotplus ... \dotplus L_{z}.$ The restrictions of $A$ to the subspaces on the right are described by Jordan cells with the same $\lambda$ and of possibly different dimensions. We have proved the following theorem:

Theorem (Jordan form). For a matrix $A$ acting in $C^{n}$, one can find a basis in which $A$ can be written as a block-diagonal matrix

(1) $A=\left(\begin{array}{ccc}A_{1} & ... & ... \\... & ... & ... \\... & ... & A_{m}\end{array}\right) .$

Here $A_{i}$ are (square) Jordan cells, with possibly different lambdas on the main diagonal and of possibly different sizes, and all off-diagonal blocks are zero matrices of compatible dimensions.