28
Dec 21

## Chi-squared distribution

This post is intended to close a gap in J. Abdey's guides: the absence of distributions widely used in Econometrics.

### Chi-squared with one degree of freedom

Let $X$ be a random variable and let $Y=X^{2}.$

Question 1. What is the link between the distribution functions of $Y$ and $X?$

Chart 1. Inverting a square function

The start is simple: just follow the definitions. $F_{Y}\left( y\right)=P\left( Y\leq y\right) =P\left( X^{2}\leq y\right) .$ Assuming that $y>0$, on Chart 1 we see that $\left\{ x:x^{2}\leq y\right\} =\left\{x: -\sqrt{y}\leq x\leq \sqrt{y}\right\} .$ Hence, using additivity of probability,

(1) $F_{Y}\left( y\right) =P\left( -\sqrt{y}\leq X\leq \sqrt{y}\right) =P\left( X\leq \sqrt{y}\right) -P\left( X<-\sqrt{y}\right)$

$=F_{X}\left( \sqrt{y}\right) -F_{X}\left( -\sqrt{y}\right) .$

The last transition is based on the assumption that $P\left( X<x\right) =P\left( X\leq x\right)$ for all $x$, which is maintained for continuous random variables throughout the guide by Abdey.

Question 2. What is the link between the densities of $X$ and $Y=X^{2}?$ By the Leibniz integral rule (1) implies

(2) $f_{Y}\left( y\right) =f_{X}\left( \sqrt{y}\right) \frac{1}{2\sqrt{y}} +f_{X}\left( -\sqrt{y}\right) \frac{1}{2\sqrt{y}}.$

Exercise. Assuming that $g$ is an increasing differentiable function with the inverse $h$ and $Y=g(X)$ answer questions similar to 1 and 2.

See the definition of $\chi _{1}^{2}.$ Just applying (2) to $X=z$ and $Y=z^{2}=\chi _{1}^{2}$ we get

$f_{\chi _{1}^{2}}\left( y\right) =\frac{1}{\sqrt{2\pi }}e^{-y/2}\frac{1}{2 \sqrt{y}}+\frac{1}{\sqrt{2\pi }}e^{-y/2}\frac{1}{2\sqrt{y}}=\frac{1}{\sqrt{ 2\pi }}y^{1/2-1}e^{-y/2},\ y>0.$

Since $\Gamma \left( 1/2\right) =\sqrt{\pi },$ the procedure for identifying the gamma distribution gives

$f_{\chi _{1}^{2}}\left( x\right) =\frac{1}{\Gamma \left( 1/2\right) }\left( 1/2\right) ^{1/2}x^{1/2-1}e^{-x/2}=f_{1/2,1/2}\left( x\right) .$

We have derived the density of the chi-squared variable with one degree of freedom, see also Example 3.52, J. Abdey, Guide ST2133.
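
The derivation is easy to check numerically. The sketch below is my own illustration, not something from the guide, and all function names in it are made up for this post. It compares formula (2) applied to the standard normal density with the gamma density $f_{1/2,1/2}$ and also differentiates the distribution function (1) by a finite difference.

```python
import math

def normal_pdf(x):
    """Density of the standard normal z."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def normal_cdf(x):
    """Distribution function of z, expressed through the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def cdf_of_square(y):
    """Formula (1): F_Y(y) = F_X(sqrt(y)) - F_X(-sqrt(y)) for y > 0."""
    r = math.sqrt(y)
    return normal_cdf(r) - normal_cdf(-r)

def pdf_of_square(y):
    """Formula (2) with X = z."""
    r = math.sqrt(y)
    return (normal_pdf(r) + normal_pdf(-r)) / (2 * r)

def gamma_pdf(x, alpha, nu):
    """Gamma density f_{alpha,nu}(x) for x > 0, in the notation of this blog."""
    return alpha ** nu * x ** (nu - 1) * math.exp(-alpha * x) / math.gamma(nu)

for y in (0.3, 1.0, 2.5):
    h = 1e-5
    # Central difference of the distribution function approximates the density.
    slope = (cdf_of_square(y + h) - cdf_of_square(y - h)) / (2 * h)
    print(y, pdf_of_square(y), gamma_pdf(y, 0.5, 0.5), slope)
```

The three numbers printed for each $y$ should agree up to the error of the finite difference.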

### General chi-squared

For $\chi _{n}^{2}=z_{1}^{2}+...+z_{n}^{2}$ with independent standard normals $z_{1},...,z_{n}$ we can write $\chi _{n}^{2}=\chi _{1}^{2}+...+\chi _{1}^{2}$ where the chi-squared variables on the right are independent and all have one degree of freedom. This is because deterministic (here quadratic) functions of independent variables are independent.

Recall that the gamma density is closed under convolutions with the same $\alpha .$ Then by the convolution theorem we get

$f_{\chi _{n}^{2}}=f_{\chi _{1}^{2}}\ast ...\ast f_{\chi _{1}^{2}}=f_{1/2,1/2}\ast ...\ast f_{1/2,1/2}$ $=f_{1/2,n/2}=\frac{1}{\Gamma \left( n/2\right) 2^{n/2}}x^{n/2-1}e^{-x/2}.$
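
A quick simulation can serve as a sanity check of this formula. The code below is only a sketch under my own assumptions: the number of draws, the seed, the point $x=3$ and $n=4$ are arbitrary choices. It compares the share of simulated sums $z_{1}^{2}+...+z_{n}^{2}$ not exceeding $x$ with the integral of the density $f_{1/2,n/2}$ over $(0,x)$.

```python
import math
import random

def chi2_pdf(x, n):
    """Density f_{1/2,n/2}(x) of chi-squared with n degrees of freedom, x > 0."""
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (math.gamma(n / 2) * 2 ** (n / 2))

def chi2_cdf(x, n, steps=20000):
    """Midpoint-rule integral of the density over (0, x); crude for n = 1."""
    h = x / steps
    return h * sum(chi2_pdf((i + 0.5) * h, n) for i in range(steps))

def simulated_cdf(x, n, draws=50000, seed=0):
    """Share of simulated sums z_1^2 + ... + z_n^2 that do not exceed x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        total = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
        hits += total <= x
    return hits / draws

print(chi2_cdf(3.0, 4), simulated_cdf(3.0, 4))
```

The two numbers differ only by the simulation error, which shrinks as the number of draws grows.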
27
Dec 21

## Gamma distribution

Definition. The gamma distribution $Gamma\left( \alpha ,\nu \right)$ is a two-parameter family of densities. For $\alpha >0,\nu >0$ the density is defined by

$f_{\alpha ,\nu }\left( x\right) =\frac{1}{\Gamma \left( \nu \right) }\alpha ^{\nu }x^{\nu -1}e^{-\alpha x},\ x>0;$ $f_{\alpha ,\nu }\left( x\right) =0,\ x<0.$

Obviously, you need to know what the gamma function is. My notation for the parameters follows Feller, W., An Introduction to Probability Theory and its Applications, Volume II, 2nd edition (1971). It is different from the one used by J. Abdey.

### Property 1

It is really a density because

$\frac{1}{\Gamma \left( \nu \right) }\alpha ^{\nu }\int_{0}^{\infty }x^{\nu -1}e^{-\alpha x}dx=$ (replace $\alpha x=t$)

$=\frac{1}{\Gamma \left( \nu \right) }\alpha ^{\nu }\int_{0}^{\infty }t^{\nu -1}\alpha ^{1-\nu -1}e^{-t}dt=1.$

Suppose you see an expression $x^{a}e^{-bx}$ with $a>-1,$ $b>0$ and need to determine which gamma density this is. The coefficient of $-x$ in the exponent gives you $\alpha =b$ and the power of $x$ gives you $\nu =a+1.$ It follows that the normalizing constant should be $\frac{1}{\Gamma \left( a+1\right) }b^{a+1}$ and the density is $\frac{1}{\Gamma \left( a+1\right) }b^{a+1}x^{a}e^{-bx},$ $x>0.$
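
The identification recipe can be written as a few lines of code. This is a minimal sketch: `identify_gamma` and `integrate` are hypothetical helpers I made up, and the values $a=2,$ $b=1.5$ are arbitrary. The midpoint rule confirms that the normalized expression integrates to one.

```python
import math

def identify_gamma(a, b):
    """Return (alpha, nu, constant) for the kernel x**a * exp(-b*x), a > -1, b > 0."""
    alpha, nu = b, a + 1
    constant = alpha ** nu / math.gamma(nu)
    return alpha, nu, constant

def integrate(func, lower, upper, steps=200000):
    """Plain midpoint rule."""
    h = (upper - lower) / steps
    return h * sum(func(lower + (i + 0.5) * h) for i in range(steps))

a, b = 2.0, 1.5
alpha, nu, constant = identify_gamma(a, b)
# The upper limit 60 stands in for infinity; the tail beyond it is negligible here.
total = integrate(lambda x: constant * x ** a * math.exp(-b * x), 0.0, 60.0)
print(alpha, nu, constant, total)
```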

### Property 2

The most important property is that the family of gamma densities with the same $\alpha$ is closed under convolutions. Because of the associativity property $f_{X}\ast f_{Y}\ast f_{Z}=\left( f_{X}\ast f_{Y}\right) \ast f_{Z}$ it is enough to prove this for the case of two gamma densities.

First we want to prove

(1) $\left( f_{\alpha ,\mu }\ast f_{\alpha ,\nu }\right) \left( x\right) = \frac{\Gamma \left( \mu +\nu \right) }{\Gamma \left( \mu \right) \Gamma \left( \nu \right) }\int_{0}^{1}\left( 1-t\right) ^{\mu -1}t^{\nu -1}dt\times f_{\alpha ,\mu +\nu }(x).$

Start with the general definition of convolution and recall where the density vanishes:

$\left( f_{\alpha ,\mu }\ast f_{\alpha ,\nu }\right) \left( x\right) =\int_{-\infty }^{\infty }f_{\alpha ,\mu }\left( x-y\right) f_{\alpha ,\nu }\left( y\right) dy=\int_{0}^{x}f_{\alpha ,\mu }\left( x-y\right) f_{\alpha ,\nu }\left( y\right) dy$

(plug the densities and take out the constants)

$=\int_{0}^{x}\left[ \frac{1}{\Gamma \left( \mu \right) }\alpha ^{\mu }\left( x-y\right) ^{\mu -1}e^{-\alpha \left( x-y\right) }\right] \left[ \frac{1}{\Gamma \left( \nu \right) }\alpha ^{\nu }y^{\nu -1}e^{-\alpha y} \right] dy$ $=\frac{\alpha ^{\mu +\nu }e^{-\alpha x}}{\Gamma \left( \mu \right) \Gamma \left( \nu \right) }\int_{0}^{x}\left( x-y\right) ^{\mu -1}y^{\nu -1}dy$

(replace $y=xt$)

$=\frac{\Gamma \left( \mu +\nu \right) }{\Gamma \left( \mu \right) \Gamma \left( \nu \right) }\frac{\alpha ^{\mu +\nu }x^{\mu +\nu -1}e^{-\alpha x}}{ \Gamma \left( \mu +\nu \right) }\int_{0}^{1}\left( 1-t\right) ^{\mu -1}t^{\nu -1}dt$ $=\frac{\Gamma \left( \mu +\nu \right) }{\Gamma \left( \mu \right) \Gamma \left( \nu \right) }\int_{0}^{1}\left( 1-t\right) ^{\mu -1}t^{\nu -1}dt\times f_{\alpha ,\mu +\nu }\left( x\right).$

Thus (1) is true. Integrating it we have

$\int_{R}\left( f_{\alpha ,\mu }\ast f_{\alpha ,\nu }\right) \left( x\right) dx=\frac{\Gamma \left( \mu +\nu \right) }{\Gamma \left( \mu \right) \Gamma \left( \nu \right) }\int_{0}^{1}\left( 1-t\right) ^{\mu -1}t^{\nu -1}dt\times \int_{R}f_{\alpha ,\mu +\nu }\left( x\right) dx.$

We know that the convolution of two densities is a density. Therefore the last equation implies

$\frac{\Gamma \left( \mu +\nu \right) }{\Gamma \left( \mu \right) \Gamma \left( \nu \right) }\int_{0}^{1}\left( 1-t\right) ^{\mu -1}t^{\nu -1}dt=1$

and

$f_{\alpha ,\mu }\ast f_{\alpha ,\nu }=f_{\alpha ,\mu +\nu },\ \mu ,\nu >0.$

Alternative proof. The moment generating function of a sum of two independent gamma-distributed variables with the same $\alpha$ shows that this sum again has a gamma distribution with the same $\alpha$, see pp. 141, 209 in the guide.
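
Property 2 can also be checked numerically. The following is only a sketch with arbitrary parameters $\alpha =1.5,$ $\mu =2,$ $\nu =3$ (I take $\mu ,\nu \geq 1$ so that the integrands have no singularities, which the simple midpoint rule handles poorly). It computes the convolution integral directly, compares it with $f_{\alpha ,\mu +\nu },$ and evaluates the constant in front of $f_{\alpha ,\mu +\nu }$ in (1).

```python
import math

def gamma_pdf(x, alpha, nu):
    """Gamma density f_{alpha,nu}; it vanishes for x <= 0."""
    if x <= 0:
        return 0.0
    return alpha ** nu * x ** (nu - 1) * math.exp(-alpha * x) / math.gamma(nu)

def convolution(x, alpha, mu, nu, steps=4000):
    """Midpoint-rule value of the integral of f_{alpha,mu}(x-y) f_{alpha,nu}(y) over (0, x)."""
    h = x / steps
    return h * sum(
        gamma_pdf(x - (i + 0.5) * h, alpha, mu) * gamma_pdf((i + 0.5) * h, alpha, nu)
        for i in range(steps)
    )

def beta_factor(mu, nu, steps=4000):
    """Gamma(mu+nu)/(Gamma(mu)Gamma(nu)) times the integral of (1-t)^(mu-1) t^(nu-1) over (0, 1)."""
    h = 1.0 / steps
    integral = h * sum(
        (1 - (i + 0.5) * h) ** (mu - 1) * ((i + 0.5) * h) ** (nu - 1)
        for i in range(steps)
    )
    return math.gamma(mu + nu) / (math.gamma(mu) * math.gamma(nu)) * integral

alpha, mu, nu = 1.5, 2.0, 3.0
for x in (0.5, 2.0, 4.0):
    print(x, convolution(x, alpha, mu, nu), gamma_pdf(x, alpha, mu + nu))
print(beta_factor(mu, nu))
```

The last printed number is the constant that the proof shows to be equal to one.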

26
Dec 21

## Gamma function

The gamma function and gamma distribution are two different things. This post is about the former and is a preparatory step to study the latter.

Definition. The gamma function is defined by

$\Gamma \left( t\right) =\int_{0}^{\infty }x^{t-1}e^{-x}dx,\ t> 0.$

The integrand $f(x)=x^{t-1}e^{-x}$ is smooth on $\left( 0,\infty \right) ,$ so its integrability is determined by its behavior at $\infty$ and $0$. Because of the exponential factor, it is integrable in the neighborhood of $\infty .$ The singularity at $0$ is integrable if $t>0.$ In all calculations involving the gamma function one should remember that its argument should be positive.

## Properties

1) Factorial-like property. Integration by parts shows that

$\Gamma \left( t\right) =-\int_{0}^{\infty }x^{t-1}\left( e^{-x}\right) ^{\prime }dx=-x^{t-1}e^{-x}|_{0}^{\infty }+\left( t-1\right) \int_{0}^{\infty }x^{t-2}e^{-x}dx$

$=\left( t-1\right) \Gamma \left( t-1\right)$ if $t>1.$

2) $\Gamma \left( 1\right) =1$ because $\int_{0}^{\infty }e^{-x}dx=1.$

3) Combining the first two properties we see that for a natural $n$

$\Gamma \left( n+1\right) =n\Gamma ( n) =...=n\times \left( n-1\right) ...\times 1\times \Gamma \left( 1\right) =n!$

Thus the gamma function extends the factorial to non-integer $t>0.$

4) $\Gamma \left( 1/2\right) =\sqrt{\pi }.$

Indeed, using the density $f_{z}$ of the standard normal $z$ we see that

$\Gamma \left( 1/2\right) =\int_{0}^{\infty }x^{-1/2}e^{-x}dx=$

(replacing $x^{1/2}=u$)

$=\int_{0}^{\infty }\frac{1}{u}e^{-u^{2}}2udu=2\int_{0}^{\infty }e^{-u^{2}}du=\int_{-\infty }^{\infty }e^{-u^{2}}du=$

(replacing $u=z/\sqrt{2}$)

$=\frac{\sqrt{\pi }}{\sqrt{2\pi }}\int_{-\infty }^{\infty }e^{-z^{2}/2}dz= \sqrt{\pi }\int_{R}f_{z}\left( t\right) dt=\sqrt{\pi }.$

Many other properties are not required in this course.
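
The properties above are easy to verify on a computer. The sketch below relies on `math.gamma` from the Python standard library; `gamma_integral` is a helper I made up that approximates the defining integral by the midpoint rule, with the upper limit 60 standing in for infinity.

```python
import math

def gamma_integral(t, upper=60.0, steps=200000):
    """Midpoint-rule value of the integral of x**(t-1) * exp(-x) over (0, upper)."""
    h = upper / steps
    return h * sum(
        ((i + 0.5) * h) ** (t - 1) * math.exp(-(i + 0.5) * h) for i in range(steps)
    )

# Property 1: Gamma(t) = (t-1) Gamma(t-1) for t > 1.
t = 3.7
print(math.gamma(t), (t - 1) * math.gamma(t - 1))

# Property 3: Gamma(n+1) = n! for natural n.
print([(math.gamma(n + 1), math.factorial(n)) for n in range(1, 6)])

# Property 4: Gamma(1/2) = sqrt(pi).
print(math.gamma(0.5), math.sqrt(math.pi))

# The integral definition against the library value, for a non-integer argument.
print(gamma_integral(2.5), math.gamma(2.5))
```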

25
Dec 21

## Analysis of problems with conditioning

These problems are among the most difficult. It's important to work out a general approach to such problems. All references are to J. Abdey, Advanced statistics: distribution theory, ST2133, University of London, 2021.

### General scheme

Step 1. Conditioning is usually suggested by the problem statement: $Y$ is conditioned on $X$.

Your life will be easier if you follow the notation used in the guide: use $p$ for probability mass functions (discrete variables) and $f$ for (probability) density functions (continuous variables).

a) If $Y|X$ and $X$ both are discrete (Example 5.1, Example 5.13, Example 5.18):

$p_{Y}\left( y\right) =\sum_{Set}p_{Y\vert X}\left( y\vert x\right) p_{X}\left( x\right) .$

b) If $Y|X$ and $X$ both are continuous (Activity 5.6):

$f_{Y}\left( y\right) =\int_{Set}f_{Y\vert X}\left( y\vert x\right) f_{X}\left( x\right) dx.$

c) If $Y|X$ is discrete, $X$ is continuous (Example 5.2, Activity 5.5):

$p_{Y}\left( y\right) =\int_{Set}p_{Y\vert X}\left( y\vert x\right) f_{X}\left( x\right) dx$

d) If $Y|X$ is continuous, $X$ is discrete (Activity 5.12):

$f_{Y}\left( y\right) =\sum_{Set}f_{Y\vert X}\left( y\vert x\right) p_{X}\left( x\right) .$

In all cases you need to figure out $Set$ over which to sum or integrate.

Step 2. Write out the conditional densities/probabilities with the same arguments.

Step 3. Reduce the result to one of the known distributions using the completeness axiom (probabilities sum to one, densities integrate to one).
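
To see the scheme at work in a mixed case, here is a small sketch for case c). The example is hypothetical and is not taken from the guide: I assume $Y|X=x\sim Pois(x)$ and $X\sim Exp(\lambda ),$ so that $Set=(0,\infty ).$ The integral $\int_{0}^{\infty }e^{-x}\frac{x^{y}}{y!}\lambda e^{-\lambda x}dx$ reduces, by $\Gamma (y+1)=y!,$ to $\lambda /(1+\lambda )^{y+1},$ and the code compares this closed form with a numerical integral.

```python
import math

def poisson_pmf(y, x):
    """p_{Y|X}(y|x) for Y|X=x ~ Pois(x)."""
    return math.exp(-x) * x ** y / math.factorial(y)

def exponential_pdf(x, lam):
    """f_X(x) for X ~ Exp(lam), x > 0."""
    return lam * math.exp(-lam * x)

def marginal_pmf(y, lam, upper=40.0, steps=100000):
    """Case c): integrate p_{Y|X}(y|x) f_X(x) over Set = (0, infinity), truncated at `upper`."""
    h = upper / steps
    return h * sum(
        poisson_pmf(y, (i + 0.5) * h) * exponential_pdf((i + 0.5) * h, lam)
        for i in range(steps)
    )

lam = 2.0
for y in range(4):
    closed_form = lam / (1 + lam) ** (y + 1)  # follows from Gamma(y+1) = y!
    print(y, marginal_pmf(y, lam), closed_form)
```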

### Example 5.1

Let $X$ denote the number of hurricanes which form in a given year, and let $Y$ denote the number of these which make landfall. Suppose each hurricane has a probability $\pi$ of making landfall, independently of other hurricanes. Given the number of hurricanes $x$, $Y$ can be thought of as the number of successes in $x$ independent and identically distributed Bernoulli trials. We can write this as $Y|X=x\sim Bin(x,\pi )$. Suppose we also have that $X\sim Pois(\mu )$. Find the distribution of $Y$ (noting that $X\geq Y$ ).

### Solution

Step 1. The number of hurricanes $X$ takes values $0,1,2,...$ and is distributed as Poisson. The number of landfalls for a given $X=x$ is binomial with values $y=0,...,x$. It follows that $Set=\{x:x\ge y\}$.

Write the general formula (the law of total probability):

$p_{Y}\left( y\right) =\sum_{x=y}^{\infty }p_{Y\vert X}\left( y\vert x\right) p_{X}\left( x\right) .$

Step 2. Specifying the distributions:

$p_{X}\left( x\right) =e^{-\mu }\frac{\mu ^{x}}{x!},$ where $x=0,1,2,...,$

and

$P\left( Bin\left( x,\pi \right) =y\right) =p_{Y\vert X}\left( y\vert x\right) =C_{x}^{y}\pi ^{y}\left( 1-\pi \right) ^{x-y}$ where $y\leq x.$

Step 3. Reduce the result to one of known distributions:

$p_{Y}\left( y\right) =\sum_{x=y}^{\infty }C_{x}^{y}\pi ^{y}\left( 1-\pi \right) ^{x-y}e^{-\mu }\frac{\mu ^{x}}{x!}$

(pull out of the summation everything that does not depend on the summation variable $x$)

$=\frac{e^{-\mu }\mu ^{y}}{y!}\pi ^{y}\sum_{x=y}^{\infty }\frac{1}{\left( x-y\right) !}\left( \mu \left( 1-\pi \right) \right) ^{x-y}$

(replace $x-y=z$ to better see the structure)

$=\frac{e^{-\mu }\mu ^{y}}{y!}\pi ^{y}\sum_{z=0}^{\infty }\frac{1}{z!}\left( \mu \left( 1-\pi \right) \right) ^{z}$

(using the completeness axiom $\sum_{x=0}^{\infty }\frac{\mu ^{x}}{x!}=e^{\mu }$ for the Poisson variable)

$=\frac{e^{-\mu }}{y!}\left( \mu \pi \right) ^{y}e^{\mu \left( 1-\pi \right) }=\frac{e^{-\mu \pi }}{y!}\left( \mu \pi \right) ^{y}=p_{Pois(\mu \pi )}\left( y\right) .$
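
The result can be confirmed by summing the series from Step 3 numerically. The sketch below uses arbitrary values $\mu =4,$ $\pi =0.3$ and truncates the infinite sum after 200 terms; the function names are mine.

```python
import math

def poisson_pmf(k, mu):
    """P(Pois(mu) = k), computed through logarithms to avoid overflow."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def binomial_pmf(y, x, pi):
    """P(Bin(x, pi) = y) for 0 <= y <= x."""
    return math.comb(x, y) * pi ** y * (1 - pi) ** (x - y)

def landfall_pmf(y, mu, pi, terms=200):
    """Sum p_{Y|X}(y|x) p_X(x) over Set = {x: x >= y}, truncated after `terms` terms."""
    return sum(binomial_pmf(y, x, pi) * poisson_pmf(x, mu) for x in range(y, y + terms))

mu, pi = 4.0, 0.3
for y in range(6):
    # Left: the sum over x; right: the Poisson probability with parameter mu * pi.
    print(y, landfall_pmf(y, mu, pi), poisson_pmf(y, mu * pi))
```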

14
Dec 21

## Sum of random variables and convolution

### Link between double and iterated integrals

Why do we need this link? For simplicity consider the rectangle $A=\left\{ a\leq x\leq b,c\leq y\leq d\right\} .$ The integrals

$I_{1}=\underset{A}{\int \int }f(x,y)dydx$

and

$I_{2}=\int_{a}^{b}\left( \int_{c}^{d}f(x,y)dy\right) dx$

both are taken over the rectangle $A$ but they are not the same. $I_{1}$ is a double (two-dimensional) integral, meaning that its definition uses elementary areas, while $I_{2}$ is an iterated integral, where each of the one-dimensional integrals uses elementary segments. To make sense of this, you need to consult an advanced text in calculus. The difference notwithstanding, in good cases their values are the same. Putting aside the question of what a "good case" is, we concentrate on geometry: how a double integral can be expressed as an iterated integral.

It is enough to understand the idea in case of an oval $A$ on the plane. Let $y=l\left( x\right)$ be the function that describes the lower boundary of the oval and let $y=u\left( x\right)$ be the function that describes the upper part. Further, let $m$ and $M$ be the minimum and maximum values of $x$ in the oval, so that the oval lies between the vertical lines $x=m$ and $x=M$ (see Chart 1).

Chart 1. The boundary of the oval above the green line is described by u(x) and below - by l(x)

We can paint the oval with strokes along red lines from $y=l\left( x\right)$ to $y=u\left(x\right) .$ If we do this for all $x\in \left[ m,M\right] ,$ we'll have painted the whole oval. This corresponds to the representation of $A$ as the union of segments $\left\{ y:l\left( x\right) \leq y\leq u\left( x\right) \right\}$ with $x\in \left[ m,M\right] :$

$A=\bigcup\limits_{m\leq x\leq M}\left\{ y:l\left( x\right) \leq y\leq u\left( x\right) \right\}$

and to the equality of integrals

(double integral)$\underset{A}{\int \int }f(s,t)dsdt=\int_{m}^{M}\left( \int_{l\left( x\right) }^{u(x)}f(x,y)dy\right) dx$ (iterated integral)
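
The painting analogy translates directly into code. Below is a rough sketch in which the oval is the unit disk, so $l(x)=-\sqrt{1-x^{2}},$ $u(x)=\sqrt{1-x^{2}},$ $m=-1,$ $M=1$; this choice and the function `iterated_integral` are my own illustration. The outer loop runs over $x,$ and for each $x$ the inner sum makes one stroke from $l(x)$ to $u(x).$

```python
import math

def iterated_integral(f, lower, upper, m, M, steps=400):
    """Approximate the integral of f over {m <= x <= M, lower(x) <= y <= upper(x)}.

    The outer midpoint rule runs over x, the inner one over y.
    """
    hx = (M - m) / steps
    total = 0.0
    for i in range(steps):
        x = m + (i + 0.5) * hx
        lo, hi = lower(x), upper(x)
        hy = (hi - lo) / steps
        # One "stroke": the inner integral over y for this fixed x.
        stroke = hy * sum(f(x, lo + (j + 0.5) * hy) for j in range(steps))
        total += stroke * hx
    return total

def l(x):
    return -math.sqrt(1 - x * x)

def u(x):
    return math.sqrt(1 - x * x)

area = iterated_integral(lambda x, y: 1.0, l, u, -1.0, 1.0)
second_moment = iterated_integral(lambda x, y: x * x + y * y, l, u, -1.0, 1.0)
print(area, math.pi)               # area of the unit disk
print(second_moment, math.pi / 2)  # integral of x^2 + y^2, known from polar coordinates
```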

### Density of a sum of two variables

Assumption 1 Suppose the random vector $\left( X,Y\right)$ has a density $f_{X,Y}$ and define $Z=X+Y$ (unlike the convolution theorem below, here $X,Y$ don't have to be independent).

From the definitions of the distribution function $F_{Z}\left( z\right)=P\left( Z\leq z\right)$ and probability

$P\left( A\right) =\underset{A}{\int \int }f_{X,Y}(x,y)dxdy$

we have

$F_{Z}\left( z\right) =P\left( Z\leq z\right) =P\left( X+Y\leq z\right) = \underset{x+y\leq z}{\int \int }f_{X,Y}(x,y)dxdy.$

The integral on the right is a double integral. The painting analogy (see Chart 2)

Chart 2. Integration for sum of two variables

suggests that

$\left\{ (x,y)\in R^{2}:x+y\leq z\right\} =\bigcup\limits_{-\infty <x<\infty }\left\{ y:-\infty <y\leq z-x\right\} .$

Hence,

$\int_{-\infty }^{z}f_{Z}\left( t\right) dt=F_{Z}\left( z\right) =\int_{R}\left( \int_{-\infty }^{z-x}f_{X,Y}(x,y)dy\right) dx.$

Differentiating both sides with respect to $z$ we get

$f_{Z}\left( z\right) =\int_{R}f_{X,Y}(x,z-x)dx.$

If instead the inner integral is taken with respect to $x$ and the outer one with respect to $y,$ then similarly

$f_{Z}\left( z\right) =\int_{R}f_{X,Y}(z-y,y)dy.$

Exercise. Suppose the random vector $\left( X,Y\right)$ has a density $f_{X,Y}$ and define $Z=X-Y.$ Find $f_{Z}.$ Hint: review my post on Leibniz integral rule.

### Convolution theorem

In addition to Assumption 1, let $X,Y$ be independent. Then $f_{X,Y}(x,y)=f_{X}(x)f_{Y}\left( y\right)$ and the above formula gives

$f_{Z}\left( z\right) =\int_{R}f_{X}(x)f_{Y}\left( z-x\right) dx.$

This is denoted as $\left( f_{X}\ast f_{Y}\right) \left( z\right)$ and called a convolution.

The following may help to understand this formula. The function $g(x)=f_{Y}\left( -x\right)$ is a density (it is non-negative and integrates to 1). Its graph is a mirror image of that of $f_{Y}$ with respect to the vertical axis. The function $h_{z}(x)=f_{Y}\left( z-x\right)$ is a shift of $g$ by $z$ along the horizontal axis. For fixed $z,$ it is also a density. Thus in the definition of convolution we integrate the product of two densities $f_{X}(x)f_{Y}\left( z-x\right) .$ Further, to understand the asymptotic behavior of $\left( f_{X}\ast f_{Y}\right) \left( z\right)$ when $\left\vert z\right\vert \rightarrow \infty$ imagine two bell-shaped densities $f_{X}(x)$ and $f_{Y}\left( z-x\right) .$ When $z$ goes to, say, infinity, the humps of those densities are spread apart more and more. The hump of one of them gets multiplied by small values of the other. That's why $\left(f_{X}\ast f_{Y}\right) \left( z\right)$ goes to zero, in a certain sense.

The convolution of two densities is always a density because it is non-negative and integrates to one:

$\int_{R}f_{Z}\left( z\right) dz=\int_{R}\left( \int_{R}f_{X}(x)f_{Y}\left( z-x\right) dx\right) dz=\int_{R}f_{X}(x)\left( \int_{R}f_{Y}\left( z-x\right) dz\right) dx$

Replacing $z-x=y$ in the inner integral we see that this is

$\int_{R}f_{X}(x)dx\int_{R}f_{Y}\left( y\right) dy=1.$
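
Here is a small numerical illustration of the convolution formula with two bell-shaped densities. It uses the well-known fact, not derived in this post, that the sum of two independent standard normals is normal with mean zero and variance 2. The function `convolve` is a sketch of mine that truncates the real line to $(-12,12).$

```python
import math

def normal_pdf(x, variance=1.0):
    """Density of a normal variable with mean zero and the given variance."""
    return math.exp(-x * x / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def convolve(f_x, f_y, z, lower=-12.0, upper=12.0, steps=24000):
    """Midpoint-rule value of the integral of f_x(x) * f_y(z - x) over (lower, upper)."""
    h = (upper - lower) / steps
    return h * sum(
        f_x(lower + (i + 0.5) * h) * f_y(z - (lower + (i + 0.5) * h))
        for i in range(steps)
    )

for z in (0.0, 1.0, 3.0):
    # Left: numerical convolution; right: density of N(0, 2) at z.
    print(z, convolve(normal_pdf, normal_pdf, z), normal_pdf(z, variance=2.0))
```

Increasing $z$ in the loop shows how the convolution goes to zero as the humps move apart.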
12
Dec 21

## Leibniz integral rule

This rule is about differentiating an integral that has a parameter in three places: the lower and upper limits of integration and in the integrand.

Let $l(z),u(z)$ be the lower and upper limits of integration and

$\Phi \left( z\right) =\int_{l\left( z\right) }^{u\left( z\right) }f\left( z,t\right) dt.$

Then

$\Phi ^{\prime }\left( z\right) =f\left( z,u\left( z\right) \right) u^{\prime }\left( z\right) -f\left( z,l\left( z\right) \right) l^{\prime }\left( z\right) +\int_{l\left( z\right) }^{u\left( z\right) }\frac{\partial f\left( z,t\right) }{\partial z}dt.$

## Special cases

In fact, these special cases allow one to see how the Leibniz rule is obtained.

### Case 1

$l(z)=a,\ u\left( z\right) =z,\ f$ depends only on $t.$ Then

$\frac{d}{dz}\int_{a}^{z}f\left( t\right) dt=f\left( z\right)$

(the upper limit goes into the argument of $f$).

Chart 1. Slope of tangent (violet) is limit of slopes of secants (green). Source http://faculty.wlc.edu/buelow/CALC/nt2-6.html

Intuition. By definition, $f^{\prime }\left( z\right)$ is the limit of $\frac{f\left( z+h\right) -f\left( z\right) }{h}$ when $h\rightarrow 0.$ (From Chart 1, in which $a=z$, it is seen that this ratio is almost a slope at point $z$.) Hence,

$\frac{d}{dz}\int_{a}^{z}f\left( t\right) dt$

is the limit of

$\left[ \int_{a}^{z+h}f\left( t\right) dt-\int_{a}^{z}f\left( t\right) dt \right] /h=\int_{z}^{z+h}f\left( t\right) dt/h$ $\approx f\left( z\right) \int_{z}^{z+h}dt/h=f\left( z\right) \left[ z+h-z\right] /h=f\left( z\right) .$

When $h\rightarrow 0,$ this approximation becomes better and better and in the limit

$\frac{d}{dz}\int_{a}^{z}f\left( t\right) dt=f\left( z\right) .$

### Case 2

$l(z)=z,$ $u\left( z\right) =b,$ $f$ depends only on $t.$

Then

$\frac{d}{dz}\int_{z}^{b}f\left( t\right) dt=-f\left( z\right)$

(the sign is opposite to Case 1).

Intuition. One of the properties of integral is that $\int_{z}^{b}f\left( t\right) dt=-\int_{b}^{z}f\left( t\right) dt.$

To the last integral we can apply the intuition for Case 1.

### Case 3

$l\left( z\right) =a,\ u\left( z\right) =b,\ f$ depends on $(z,t)$.

Then

$\frac{d}{dz}\int_{a}^{b}f\left( z,t\right) dt=\int_{a}^{b}\frac{\partial f\left( z,t\right) }{\partial z}dt$

(only the integrand is differentiated).

Intuition.

$\left[ \int_{a}^{b}f\left( z+h,t\right) dt-\int_{a}^{b}f\left( z,t\right) dt \right] /h=\int_{a}^{b}\frac{f\left( z+h,t\right) -f\left( z,t\right) }{h}dt$

which leads to differentiation under the integral sign.

### Putting it all together

In the general case we can denote

$\Psi \left( a,b,z\right) =\int_{a}^{b}f\left( z,t\right) dt$

so that

$\Phi \left( z\right) =\int_{l\left( z\right) }^{u\left( z\right) }f\left( z,t\right) dt=\Psi \left( l\left( z\right) ,u\left( z\right) ,z\right) .$

By the chain rule

$\Phi ^{\prime }\left( z\right) =\frac{\partial \Psi \left( l\left( z\right) ,u\left( z\right) ,z\right) }{\partial a}l^{\prime }\left( z\right) +\frac{ \partial \Psi \left( l\left( z\right) ,u\left( z\right) ,z\right) }{\partial b}u^{\prime }\left( z\right) +\frac{\partial \Psi \left( l\left( z\right) ,u\left( z\right) ,z\right) }{\partial z}$

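where, by Cases 2, 1 and 3 respectively, $\frac{\partial \Psi }{\partial a}=-f\left( z,a\right) ,$ $\frac{\partial \Psi }{\partial b}=f\left( z,b\right)$ and $\frac{\partial \Psi }{\partial z}=\int_{a}^{b}\frac{\partial f\left( z,t\right) }{\partial z}dt.$ Substituting these with $a=l\left( z\right) ,$ $b=u\left( z\right)$ gives us the general Leibniz rule.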

Note that Case 1 implies

$f_{X}\left( x\right) =\frac{d}{dx}\int_{-\infty}^{x}f_{X}(t)dt=F_{X}^{\prime }\left( x\right)$

(the density is found from the distribution function).
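
The general rule can be tested on a concrete function. The example below is hypothetical: I take $f(z,t)=e^{-zt},$ $l(z)=z,$ $u(z)=z^{2}$ and compare the right-hand side of the Leibniz rule with a finite-difference approximation of $\Phi ^{\prime }(z)$ at $z=1.5.$ All integrals are approximated by the midpoint rule.

```python
import math

def integrate(func, lower, upper, steps=2000):
    """Plain midpoint rule."""
    h = (upper - lower) / steps
    return h * sum(func(lower + (i + 0.5) * h) for i in range(steps))

def f(z, t):
    return math.exp(-z * t)

def df_dz(z, t):
    """Partial derivative of f with respect to z."""
    return -t * math.exp(-z * t)

def l(z):
    return z

def u(z):
    return z * z

def Phi(z):
    """The integral with the parameter z in both limits and in the integrand."""
    return integrate(lambda t: f(z, t), l(z), u(z))

def leibniz(z):
    """Right-hand side of the Leibniz rule, with l'(z) = 1 and u'(z) = 2z."""
    boundary = f(z, u(z)) * 2 * z - f(z, l(z)) * 1.0
    inside = integrate(lambda t: df_dz(z, t), l(z), u(z))
    return boundary + inside

z, h = 1.5, 1e-4
finite_difference = (Phi(z + h) - Phi(z - h)) / (2 * h)
print(finite_difference, leibniz(z))
```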

20
Apr 21

This post parallels the one about the call debit spread. A combination of several options in one trade is called a strategy. Here we discuss a strategy called a put debit spread. The word "debit" in this name means that a trader has to pay for it. The rule of thumb is that if it is a debit (you pay for a strategy), then it is less risky than if it is a credit (you are paid). Let $p(K)$ denote the price of the put with the strike $K,$ suppressing all other variables that influence the put price.

Assumption. The market assigns a higher value to events of higher probability. This is true if investors are rational and the market correctly reconciles views of different investors.

We need the following property: if $K_{1}<K_{2}$ are two strike prices, then for the corresponding put prices (with the same expiration and underlying asset) one has $p(K_{1})<p(K_{2}).$

Proof. A put price is higher if the probability of it being in the money at expiration is higher. Let $S(T)$ be the stock price at expiration $T.$ Since $T$ is a moment in the future, $S(T)$ is a random variable. For a given strike $K,$ the put is said to be in the money at expiration if $S(T)<K.$ If $K_{1}<K_{2}$ and $S(T)<K_{1},$ then $S(T)<K_{2}.$ It follows that the set $\{ S(T)<K_{1}\}$ is a subset of the set $\{S(T)<K_{2}\} .$ Hence the probability of the event $\{S(T)<K_{2}\}$ is higher than that of the event $\{S(T)<K_{1}\}$ and $p(K_{2})>p(K_{1}).$

Put debit spread strategy. Select two strikes $K_{1}<K_{2},$ buy $p(K_{2})$ (take a long position) and sell $p(K_{1})$ (take a short position). You pay $p=p(K_{2})-p(K_{1})>0$ for this.

Our purpose is to derive the payoff for this strategy. We remember that if $S(T)\ge K,$ then the put $p(K)$ expires worthless.

Case $S(T)\ge K_{2}.$ In this case both options expire worthless and the payoff is minus the initial outlay: payoff $=-p.$

Case $K_{1}\leq S(T)<K_{2}.$ Exercising the put $p(K_{2})$, in comparison with selling the stock at the market price you gain $K_{2}-S(T).$ The second option expires worthless. The payoff is: payoff $=K_{2}-S(T)-p.$

Case $S(T)<K_{1}.$ Both options are exercised. The gain from $p(K_{2})$ is, as above, $K_{2}-S(T).$ The holder of the long put $p(K_{1})$ sells you stock at price $K_{1}.$ Since your position is short, you have nothing to do but comply. The alternative would be to buy at the market price $S(T)<K_{1},$ so your gain from this leg is $S(T)-K_{1}<0.$ The payoff is: payoff $=\left(K_{2}-S(T)\right) +\left( S(T)-K_{1}\right) -p=K_{2}-K_{1}-p.$

Summarizing, we get:

payoff $=\left\{\begin{array}{ll} -p, & K_{2}\leq S(T) \\ K_{2}-S(T)-p, & K_{1}\leq S(T)<K_{2} \\ K_{2}-K_{1}-p, & S(T)<K_{1} \end{array}\right.$

Normally, the strikes are chosen so that $K_{2}-K_{1}>p.$ From the payoff expression we see then that the maximum profit is $K_{2}-K_{1}-p>0,$ the maximum loss is $-p$ and the breakeven stock price is $S(T)=K_{2}-p.$ This is illustrated in Figure 1, where the stock price at expiration is on the horizontal axis.
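
The payoff is easy to tabulate. In the sketch below the strikes and the premium ($K_{1}=90,$ $K_{2}=100,$ $p=4$) are hypothetical numbers chosen so that $K_{2}-K_{1}>p.$ The function `put_debit_spread` builds the payoff from the two put legs, and `piecewise` repeats the three cases of the summary, so the two columns printed should coincide.

```python
def put_payoff(stock, strike):
    """Value of a long put at expiration."""
    return max(strike - stock, 0.0)

def put_debit_spread(stock, k1, k2, p):
    """Long put with strike k2, short put with strike k1 < k2, net premium p paid."""
    return put_payoff(stock, k2) - put_payoff(stock, k1) - p

def piecewise(stock, k1, k2, p):
    """The three cases of the payoff summary."""
    if stock >= k2:
        return -p
    if stock >= k1:
        return k2 - stock - p
    return k2 - k1 - p

k1, k2, p = 90.0, 100.0, 4.0  # hypothetical numbers with k2 - k1 > p
for stock in (80.0, 90.0, 95.0, 96.0, 100.0, 110.0):
    print(stock, put_debit_spread(stock, k1, k2, p), piecewise(stock, k1, k2, p))
```

With these numbers the payoff changes sign at the stock price $K_{2}-p=96.$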

Figure 1. Payoff from put debit spread. Source: https://www.optionsbro.com/

Conclusion. For the strategy to be profitable, the price at expiration should satisfy $S(T)< K_{2}-p.$ Buying a put debit spread is appropriate when the price is expected to stay in that range.

In comparison with the long put position $p(K_{2}),$ taking at the same time the short put position $-p(K_{1})$ allows one to reduce the initial outlay. This is especially important when the stock volatility is high, resulting in a high put price. In the difference $p(K_{2})-p(K_{1})$ that volatility component partially cancels out.

Remark. There is an important issue of choosing the strikes. Let $S$ denote the stock price now. The payoff expression allows us to rank the next choices in the order of increasing risk: 1) $S<K_{1}<K_{2}$ (both options are in the money, less risk), 2) $K_{1}<S<K_{2}$ and 3) $K_{1}<K_{2}<S$ (both options are out of the money, highest risk). Also remember that a put debit spread is less expensive than buying $p(K_{2})$ and selling $p(K_{1})$ in two separate transactions.

Exercise. Analyze a put credit spread, in which you sell $p(K_{2})$ and buy $p(K_{1})$.

21
Mar 21

A combination of several options in one trade is called a strategy. Here we discuss a strategy called a call debit spread. The word "debit" in this name means that a trader has to pay for it. The rule of thumb is that if it is a debit (you pay for a strategy), then it is less risky than if it is a credit (you are paid). Let $c(K)$ denote the call price with the strike $K,$ suppressing all other variables that influence the call price.

Assumption. The market assigns a higher value to events of higher probability. This is true if investors are rational and the market correctly reconciles views of different investors.

We need the following property: if $K_{1}<K_{2}$ are two strike prices, then for the corresponding call prices (with the same expiration and underlying asset) one has $c(K_{1})>c(K_{2}).$

Proof. A call price is higher if the probability of it being in the money at expiration is higher. Let $S(T)$ be the stock price at expiration $T.$ Since $T$ is a moment in the future, $S(T)$ is a random variable. For a given strike $K,$ the call is said to be in the money at expiration if $S(T)>K.$ If $K_{1}<K_{2}$ and $S(T)>K_{2},$ then $S(T)>K_{1}.$ It follows that the set $\{ S(T)>K_{2}\}$ is a subset of the set $\{S(T)>K_{1}\} .$ Hence the probability of the event $\{S(T)>K_{2}\}$ is lower than that of the event $\{S(T)>K_{1}\}$ and $c(K_{1})>c(K_{2}).$

Call debit spread strategy. Select two strikes $K_{1}<K_{2},$ buy $c(K_{1})$ (take a long position) and sell $c(K_{2})$ (take a short position). You pay $p=c(K_{1})-c(K_{2})>0$ for this.

Our purpose is to derive the payoff for this strategy. We remember that if $S(T)\leq K,$ then the call $c(K)$ expires worthless.

Case $S(T)\leq K_{1}.$ In this case both options expire worthless and the payoff is minus the initial outlay: payoff $=-p.$

Case $K_{1}<S(T)\leq K_{2}.$ Exercising the call $c(K_{1})$ and immediately selling the stock at the market price you gain $S(T)-K_{1}.$ The second option expires worthless. The payoff is: payoff $=S(T)-K_{1}-p.$ (In fact, you are assigned stock and selling it is up to you).

Case $K_{2}<S(T).$ Both options are exercised. The gain from $c(K_{1})$ is, as above, $S(T)-K_{1}.$ The holder of the long call $c(K_{2})$ buys from you at price $K_{2}.$ Since your position is short, you have nothing to do but comply. You buy at $S(T)$ and sell at $K_{2}.$ Thus the gain from $-c(K_{2})$ is $K_{2}-S(T)<0.$ The payoff is: payoff $=\left(S(T)-K_{1}\right) +\left( K_{2}-S(T)\right) -p=K_{2}-K_{1}-p.$

Summarizing, we get:

payoff $=\left\{\begin{array}{ll} -p, & S(T)\leq K_{1} \\ S(T)-K_{1}-p, & K_{1}<S(T)\leq K_{2} \\ K_{2}-K_{1}-p, & K_{2}<S(T) \end{array}\right.$

Normally, the strikes are chosen so that $K_{2}-K_{1}>p.$ From the payoff expression we see then that the maximum profit is $K_{2}-K_{1}-p>0,$ the maximum loss is $-p$ and the breakeven stock price is $S(T)=K_{1}+p.$ This is illustrated in Figure 1, where the stock price at expiration is on the horizontal axis.
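
These three numbers can be recovered by scanning a grid of stock prices. The sketch below uses hypothetical values $K_{1}=100,$ $K_{2}=110,$ $p=4$ and a grid from 60 to 140 with step 0.5; the breakeven is found as the first grid price at which the payoff is non-negative.

```python
def call_debit_spread(stock, k1, k2, p):
    """Long call with strike k1, short call with strike k2 > k1, net premium p paid."""
    long_call = max(stock - k1, 0.0)
    short_call = -max(stock - k2, 0.0)
    return long_call + short_call - p

k1, k2, p = 100.0, 110.0, 4.0  # hypothetical numbers with k2 - k1 > p
grid = [60.0 + 0.5 * i for i in range(161)]  # stock prices from 60 to 140
payoffs = [call_debit_spread(s, k1, k2, p) for s in grid]

max_profit = max(payoffs)
max_loss = min(payoffs)
breakeven = next(s for s, v in zip(grid, payoffs) if v >= 0)

print(max_profit, k2 - k1 - p)  # maximum profit against K2 - K1 - p
print(max_loss, -p)             # maximum loss against -p
print(breakeven, k1 + p)        # breakeven against K1 + p
```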

Figure 1. Payoff for call debit strategy. Source: https://www.optionsbro.com/

Conclusion. For the strategy to be profitable, the price at expiration should satisfy $S(T)>K_{1}+p.$ Buying a call debit spread is appropriate when the price is expected to stay in that range.

In comparison with the long call position $c(K_{1}),$ taking at the same time the short call position $-c(K_{2})$ allows one to reduce the initial outlay. This is especially important when the stock volatility is high, resulting in a high call price. In the difference $c(K_{1})-c(K_{2})$ that volatility component partially cancels out.

Remark. There is an important issue of choosing the strikes. Let $S$ denote the stock price now. The payoff expression allows us to rank the next choices in the order of increasing risk: 1) $K_{1}<K_{2}<S$ (both options are in the money, less risk), 2) $K_{1}<S<K_{2}$ and 3) $S<K_{1}<K_{2}$ (both options are out of the money, highest risk). Also remember that a call debit spread is less expensive than buying $c(K_{1})$ and selling $c(K_{2})$ in two separate transactions.

Exercise. Analyze a call credit spread, in which you sell $c(K_{1})$ and buy $c(K_{2})$.

27
Jan 21

## AP Stats and Business Stats

Its content, organization and level justify its adoption as a textbook for introductory statistics for Econometrics in most American or European universities. The book's table of contents is somewhat standard, the innovation comes in a presentation that is crisp, concise, precise and directly relevant to the Econometrics course that will follow. I think instructors and students will appreciate the absence of unnecessary verbiage that permeates many existing textbooks.

Having read Professor Mynbaev's previous books and research articles I was not surprised with his clear writing and precision. However, I was surprised with an informal and almost conversational one-on-one style of writing which should please most students. The informality belies a careful presentation where great care has been taken to present the material in a pedagogical manner.

Carlos Martins-Filho
Professor of Economics
Boulder, USA

18
Oct 20

## People need real knowledge

### Traffic analysis

The number of visits to my website has exceeded 206,000. This number depends on what counts as a visit. An external counter, visible to everyone, writes cookies to the reader's computer and counts many visits from one reader as one. The number of individual readers has reached 23,000. The external counter does not give any more statistics. I will give all the numbers from the internal counter, which is visible only to the site owner.

I have a high percentage of complex content. After reading one post, the reader finds that the answer he is looking for depends on the preliminary material. He starts digging into it and then has to go deeper and deeper. Hence the number 206,000, that is, one reader visits the site on average 9 times on different days. Sometimes a visitor from one post goes to another by link on the same day. Hence another figure: 310,000 readings.

I originally wrote simple things about basic statistics. Then I began to write accompanying materials for each advanced course that I taught at Kazakh-British Technical University (KBTU). The shift in the number and level of readership shows that people need deep knowledge, not bait for one-day moths.

For example, my simple post on basic statistics was read 2,300 times. In comparison, the more complex post on the Cobb-Douglas function has been read 7,100 times. This function is widely used in economics to model consumer preferences (utility function) and producer capabilities (production function). In all textbooks it is taught using two-dimensional graphs, as P. Samuelson proposed 85 years ago. In fact, two-dimensional graphs are obtained by projection of a three-dimensional graph, which I show, making everything clear and obvious.

The answer to one of the University of London (UoL) exam problems attracted 14,300 readers. It is so complicated that I split the answer into two parts, and there are links to additional material. On the UoL exam, students have to solve this problem in 20-30 minutes, which even I would not be able to do.

### Why my site is unique

My site is unique in several ways. Firstly, I tell the truth about the AP Statistics books. This is a basic statistics course for those who need to interpret tables, graphs and simple statistics. If you have a head on your shoulders, and not a Google search engine, all you need to do is read a small book and look at the solutions. I praise one such book in my reviews. You don't need to attend a two-semester course and read an 800-page book. Moreover, one doesn't need 140 high-quality color photographs that have nothing to do with science and double the price of a book.

Many AP Statistics consumers (that's right, consumers, not students) believe that learning should be fun. Such people are attracted by a book with anecdotes that have no relation to statistics or the life of scientists. In the West, everyone depends on each other, and therefore all the reviews are written in a superlative degree and streamlined. Thank God, I do not depend on the Western labor market, and therefore I tell the truth. Part of my criticism, including the statistics textbook selected for the program "100 Textbooks" of the Ministry of Education and Science of Kazakhstan (MES), is on Facebook.

Secondly, I have the world's only online, free, complete matrix algebra tutorial with all the proofs. Free courses on Udemy, Coursera and edX are not far from AP Statistics in terms of level. Courses at MIT and Khan Academy are also simpler than mine, but have the advantage of being given in video format.

The third distinctive feature is that I help UoL students. It is a huge organization spanning 17 universities and colleges in the UK and with many branches in other parts of the world. The Economics program was developed by the London School of Economics (LSE), one of the world's leading universities.

The problem with LSE courses is that they are very difficult. After the exams, LSE puts out short recommendations on the Internet for solving problems like: here you need to use such and such a theory and such and such an idea. Complete solutions are not given for two reasons: they do not want to help future examinees and sometimes their problems or solutions contain errors (who does not make errors?). But they also delete short recommendations after a year. My site is the only place in the world where there are complete solutions to the most difficult problems of the last few years. It is not for nothing that the solution to one problem noted above attracted 14,000 visits.

Fourthly, my site is unique in terms of the variety of material: statistics, econometrics, algebra, optimization, and finance.

The average number of visits is about 100 per day. When it's time for students to take exams, it jumps to 1-2 thousand. The total amount of materials created in 5 years is equivalent to 5 textbooks. It takes from 2 hours to one day to create one post, depending on the level. After I published this analysis of the site traffic on Facebook, my colleague Nurlan Abiev decided to write posts for the site. I pay for the domain myself, \$186 per year. It would be nice to make the site accessible to students and schoolchildren of Kazakhstan, but I don't have time to translate from English.

Once I was looking at the requirements of the MES for approval of electronic textbooks. They want several copies of printouts of all (!) materials and a solid payment for the examination of the site. As a result, all my efforts to create and maintain the site so far have been a personal initiative that does not have any support from the MES and its Committee on Science.