12 Nov 16

## Properties of standard deviation

Properties of standard deviation are divided into two parts. The definitions and consequences are given here. Both variance and standard deviation are used to measure the variability of the values of a random variable around its mean. Why, then, use both of them? The reason will be explained in another post.

### Properties of standard deviation: definitions and consequences

Definition. For a random variable $X$, the quantity $\sigma (X) = \sqrt {Var(X)}$ is called its standard deviation.
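To make the definition concrete, here is a minimal Python sketch for a discrete random variable with finitely many values. It assumes the distribution is given as two lists, `values` and `probs` (hypothetical names), with probabilities summing to 1, and computes $Var(X)=E(X-EX)^2$ followed by its arithmetic square root.

```python
import math

def variance(values, probs):
    """Var(X) = E[(X - EX)^2] for a discrete random variable."""
    mean = sum(v * p for v, p in zip(values, probs))
    return sum(p * (v - mean) ** 2 for v, p in zip(values, probs))

def std_dev(values, probs):
    """sigma(X): the arithmetic (nonnegative) square root of Var(X)."""
    return math.sqrt(variance(values, probs))

# X takes the values 0 and 1 with probability 1/2 each
print(variance([0, 1], [0.5, 0.5]))  # 0.25
print(std_dev([0, 1], [0.5, 0.5]))   # 0.5
```

Since `math.sqrt` always returns the nonnegative root, `std_dev` can never be negative.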

#### Digression about square roots and absolute values

In general, a positive number has two square roots, one positive and the other negative. The positive one is called the arithmetic square root. Here the arithmetic root is applied to $Var(X) \ge 0$ (see properties of variance), so the standard deviation is always nonnegative.
Definition. The absolute value of a real number $a$ is defined by
(1) $|a| =a$ if $a$ is nonnegative and $|a| =-a$ if $a$ is negative.
This two-part definition is a stumbling block for many students, so having them plug in a few numbers is a must. The absolute value measures the distance from the point $a$ to the origin. For example, $dist(3,0) = |3| = 3$ and $dist(-3,0) = |-3| = 3$. More generally, for any points $a,b$ on the real line the distance between them is given by $dist(a,b) = |a - b|$.
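Plugging in a few numbers is easy to automate. The sketch below codes the two-part definition (1) literally; the names `absolute_value` and `dist` are hypothetical, chosen for illustration, and the results can be compared with Python's built-in `abs`.

```python
def absolute_value(a):
    """Two-part definition (1): |a| = a if a >= 0, and |a| = -a if a < 0."""
    if a >= 0:
        return a
    return -a

def dist(a, b):
    """Distance between the points a and b on the real line: |a - b|."""
    return absolute_value(a - b)

for a in [3, -3, 0, 2.5, -0.7]:
    print(a, absolute_value(a), abs(a))  # the last two columns agree

print(dist(3, 0), dist(-3, 0))  # 3 3
print(dist(-2, 5))              # 7
```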

Squaring both sides of each part of (1), we obtain $|a|^2={a^2}$. Since $|a| \ge 0$, applying the arithmetic square root gives

(2) $|a|=\sqrt {a^2}.$

This is the equation we need right now.
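A quick numerical check of (2), as a sketch: Python's `math.sqrt` returns the arithmetic (nonnegative) root, so `math.sqrt(a * a)` recovers $|a|$ rather than $a$. The test values are arbitrary and were picked so that the squares and roots are exact in floating point.

```python
import math

values = [4.0, -4.0, 0.0, 1.5, -2.25]

# Equation (2): |a| = sqrt(a^2), with sqrt the arithmetic root
for a in values:
    print(a, abs(a), math.sqrt(a * a))

# For negative a, sqrt(a^2) is not a itself
print(math.sqrt((-4.0) ** 2) == -4.0)  # False
```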

### Back to standard deviation

Property 1. Standard deviation is homogeneous of degree 1. Indeed, using homogeneity of variance and equation (2), we have

$\sigma (aX) =\sqrt{Var(aX)}=\sqrt{{a^2}Var(X)}=\sqrt{a^2}\sqrt{Var(X)}=|a|\sigma(X).$

Unlike the homogeneity of expected values, $E(aX)=aE(X)$, here we get the absolute value of the scaling coefficient $a$.
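The role of $|a|$ can be seen numerically. This sketch assumes $X$ takes a few made-up, equally likely values, so $\sigma(X)$ is the population standard deviation, computed with `statistics.pstdev` from the Python standard library.

```python
import math
from statistics import pstdev  # population standard deviation

xs = [1.0, 2.0, 4.0, 7.0]  # illustrative, equally likely values of X
sigma_x = pstdev(xs)

for a in [2.0, -3.0, 0.0]:
    sigma_ax = pstdev([a * x for x in xs])
    # sigma(aX) should match |a| * sigma(X), never a * sigma(X) for a < 0
    print(a, sigma_ax, abs(a) * sigma_x, math.isclose(sigma_ax, abs(a) * sigma_x))
```

For $a=-3$ the product $a\,\sigma(X)$ is negative, which no standard deviation can be.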

Property 2. Cauchy-Schwarz inequality. (Part 1) For any random variables $X,Y$ one has

(3) $|Cov(X,Y)|\le\sigma(X)\sigma(Y)$.

(Part 2) If $X$ is non-constant and (3) holds as an equality, $|Cov(X,Y)|=\sigma (X)\sigma (Y)$, then $Y$ is a linear function of $X$: $Y = aX + b$ for some constants $a,b$.
Proof. (Part 1) If at least one of the variables is constant, both sides of the inequality are $0$ and there is nothing to prove. To exclude this trivial case, let $X,Y$ be non-constant, so that $Var(X)$ and $Var(Y)$ are positive. Consider the real-valued function of a real number $t$ defined by $f(t) = Var(tX + Y)$. By the formula for the variance of a linear combination,

$f(t)=t^2Var(X)+2tCov(X,Y)+Var(Y)$.
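The quadratic form of $f(t)$ can be checked on data. In this sketch the pair $(X,Y)$ takes four made-up, equally likely outcomes $(x_i,y_i)$; `cov` is a hypothetical helper for the population covariance, and `pvariance` is the standard library's population variance. The two ways of computing $f(t)$ should agree.

```python
import math
from statistics import pvariance

def cov(xs, ys):
    """Population covariance of paired, equally likely outcomes (x_i, y_i)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

xs = [1.0, 2.0, 4.0, 7.0]  # illustrative outcomes of X
ys = [3.0, 1.0, 5.0, 4.0]  # illustrative outcomes of Y, paired with xs

def f_direct(t):
    """f(t) = Var(tX + Y), computed from the combined outcomes."""
    return pvariance([t * x + y for x, y in zip(xs, ys)])

def f_quadratic(t):
    """f(t) = t^2 Var(X) + 2t Cov(X, Y) + Var(Y)."""
    return t * t * pvariance(xs) + 2 * t * cov(xs, ys) + pvariance(ys)

for t in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    print(t, f_direct(t), f_quadratic(t))
```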

We see that $f(t)$ is a parabola opening upward (because its leading coefficient $Var(X)$ is positive). By nonnegativity of variance, $f(t)\ge 0$, so the parabola lies on or above the horizontal axis in the $(t,f)$ plane. Hence, the quadratic equation $f(t) = 0$ has at most one real root. This means that the discriminant $D$ of the equation is non-positive. Since the coefficient of $t$ is $2Cov(X,Y)$, it is convenient to work with $D/4$:

$\frac{D}{4}=Cov(X,Y)^2-Var(X)Var(Y)\le 0.$

Applying square roots to both sides of $Cov(X,Y)^2\le Var(X)Var(Y)$ and using (2) on the left, we get $|Cov(X,Y)|\le\sigma(X)\sigma(Y)$, which finishes the proof of the first part.
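A numerical sanity check of Part 1, as a sketch under the same assumptions as before (finitely many equally likely outcomes, hypothetical helper names). It computes $D/4$ for one made-up data set and then tests the inequality on many randomly generated samples; the small tolerance `tol` is an assumption that absorbs floating-point rounding.

```python
import random
from statistics import pstdev, pvariance

def cov(xs, ys):
    """Population covariance of paired, equally likely outcomes (x_i, y_i)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def quarter_discriminant(xs, ys):
    """D/4 = Cov(X,Y)^2 - Var(X)Var(Y) for the parabola f(t) = Var(tX + Y)."""
    return cov(xs, ys) ** 2 - pvariance(xs) * pvariance(ys)

def satisfies_cauchy_schwarz(xs, ys, tol=1e-12):
    """Check |Cov(X,Y)| <= sigma(X) sigma(Y), allowing for float rounding."""
    return abs(cov(xs, ys)) <= pstdev(xs) * pstdev(ys) + tol

xs = [1.0, 2.0, 4.0, 7.0]
ys = [3.0, 1.0, 5.0, 4.0]
print(quarter_discriminant(xs, ys))      # negative for this data
print(satisfies_cauchy_schwarz(xs, ys))  # True

# Many random samples: the inequality should never fail
random.seed(0)
trials = [([random.gauss(0, 1) for _ in range(20)],
           [random.gauss(0, 1) for _ in range(20)]) for _ in range(1000)]
print(all(satisfies_cauchy_schwarz(x, y) for x, y in trials))  # True
```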

(Part 2) In case of equality, the discriminant is $0$, so the parabola touches the horizontal axis at a single point $t_0$, where $f(t_0)=Var(t_0X + Y)=0$. But a random variable with zero variance is constant, so $t_0X + Y = c$ for some constant $c$, which is just another way of writing $Y = aX + b$ with $a=-t_0$ and $b=c$.
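The equality case can be traced step by step. The double root of $f(t)=t^2Var(X)+2tCov(X,Y)+Var(Y)$ is the vertex $t_0=-Cov(X,Y)/Var(X)$, so the slope is $a=-t_0=Cov(X,Y)/Var(X)$. The sketch below takes made-up, equally likely values of $X$, sets $Y=2X+1$, and checks that $Var(t_0X+Y)$ vanishes and that $a$ and $b=EY-aEX$ are recovered; `cov` is the same hypothetical helper as above.

```python
from statistics import pstdev, pvariance

def cov(xs, ys):
    """Population covariance of paired, equally likely outcomes (x_i, y_i)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

xs = [1.0, 2.0, 4.0, 7.0]
ys = [2 * x + 1 for x in xs]  # Y = 2X + 1

c = cov(xs, ys)
print(abs(c), pstdev(xs) * pstdev(ys))  # equal up to rounding

t0 = -c / pvariance(xs)  # double root of f(t) = Var(tX + Y)
print(t0)                # expected: -2.0
print(pvariance([t0 * x + y for x, y in zip(xs, ys)]))  # zero variance

a = -t0
b = sum(ys) / len(ys) - a * sum(xs) / len(xs)  # b = E(Y) - a E(X)
print(a, b)  # expected: 2.0 1.0
```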

Comment. Inequality (3) implies one of the main properties of the correlation coefficient: for non-constant $X,Y$,

$-1\le\rho(X,Y)=\frac{Cov(X,Y)}{\sigma(X)\sigma(Y)}\le 1$.
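A minimal sketch of the correlation coefficient under the same assumptions (equally likely outcomes, hypothetical helper names). It refuses constant inputs, since then $\sigma(X)\sigma(Y)=0$ and $\rho$ is undefined, and it illustrates that an exact linear relation with a negative slope gives $\rho=-1$, in line with Part 2.

```python
import math
from statistics import pstdev

def cov(xs, ys):
    """Population covariance of paired, equally likely outcomes (x_i, y_i)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def corr(xs, ys):
    """rho(X, Y) = Cov(X, Y) / (sigma(X) sigma(Y)); X and Y must be non-constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when X or Y is constant")
    return cov(xs, ys) / (sx * sy)

xs = [1.0, 2.0, 4.0, 7.0]
print(corr(xs, [3.0, 1.0, 5.0, 4.0]))        # strictly between -1 and 1
print(corr(xs, [-2 * x + 5 for x in xs]))    # -1 up to rounding: Y = -2X + 5
```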