Properties of standard deviation are divided into two parts. The definitions and consequences are given here. Both variance and standard deviation are used to measure the variability of the values of a random variable around its mean. Then why use both of them? The reason will be explained in another post.
Properties of standard deviation: definitions and consequences
Definition. For a random variable $X$, the quantity $\sigma(X)=\sqrt{\operatorname{Var}(X)}$ is called its standard deviation.
Digression about square roots and absolute values
In general, there are two square roots of a positive number, one positive and the other negative. The positive one is called the arithmetic square root. The arithmetic root is applied here to $\operatorname{Var}(X)\ge 0$ (see properties of variance), so standard deviation is always nonnegative.
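To make the definition concrete, here is a minimal sketch that computes the variance and standard deviation of a discrete random variable. The helper names `variance` and `std_dev` and the fair-die example are assumptions chosen for illustration, not part of the original post.

```python
import math

def variance(values, probs):
    """Var(X) = E[(X - EX)^2] for a discrete random variable X."""
    mean = sum(v * p for v, p in zip(values, probs))
    return sum(p * (v - mean) ** 2 for v, p in zip(values, probs))

def std_dev(values, probs):
    """sigma(X): the arithmetic (nonnegative) square root of Var(X)."""
    return math.sqrt(variance(values, probs))

# Hypothetical example: a fair six-sided die.
die_values = [1, 2, 3, 4, 5, 6]
die_probs = [1 / 6] * 6

print("Var(X)   =", variance(die_values, die_probs))
print("sigma(X) =", std_dev(die_values, die_probs))
```

Since `math.sqrt` returns the nonnegative root, `std_dev` can never be negative, in line with the remark above.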
Definition. The absolute value of a real number $a$ is defined by
(1) $|a|=a$ if $a$ is nonnegative and $|a|=-a$ if $a$ is negative.
This two-part definition is a stumbling block for many students, so making them plug in a few numbers is a must. The absolute value is introduced to measure the distance from a point $a$ to the origin. For example, $\operatorname{dist}(3,0)=|3|=3$ and $\operatorname{dist}(-3,0)=|-3|=3$. More generally, for any points $a,b$ on the real line the distance between them is given by $\operatorname{dist}(a,b)=|a-b|$.
By squaring both sides in Eq. (1) we obtain $|a|^2=a^2$. Application of the arithmetic square root gives
(2) $|a|=\sqrt{a^2}$.
This is the equation we need right now.
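Plugging in a few numbers is easy to automate. The sketch below implements the two-part definition of the absolute value directly and compares it with the arithmetic square root of the square. The function names `abs_two_part` and `dist` are hypothetical.

```python
import math

def abs_two_part(a):
    """Absolute value by the two-part definition: a if a >= 0, else -a."""
    return a if a >= 0 else -a

def dist(a, b):
    """Distance between two points on the real line."""
    return abs_two_part(a - b)

for a in [3, -3, 0, -2.5]:
    # The arithmetic square root of a^2 should agree with |a|.
    root = math.sqrt(a ** 2)
    print(a, abs_two_part(a), root, math.isclose(abs_two_part(a), root))

print(dist(3, 0), dist(-3, 0), dist(2, 5))
```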
Back to standard deviation
Property 1. Standard deviation is homogeneous of degree 1. Indeed, using homogeneity of variance and equation (2), we have
$\sigma(aX)=\sqrt{\operatorname{Var}(aX)}=\sqrt{a^2\operatorname{Var}(X)}=|a|\sigma(X)$.
Unlike homogeneity of expected values, here we have an absolute value of the scaling coefficient $a$.
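Property 1 can be checked numerically. The sketch below uses `statistics.pstdev` (the population standard deviation from the Python standard library) on a small made-up data set. The data and the helper `scaled_std` are assumptions for illustration only.

```python
import math
import statistics

# Hypothetical data set with mean 5 and population standard deviation 2.
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

def scaled_std(a, data):
    """Standard deviation of the scaled values a * x."""
    return statistics.pstdev([a * x for x in data])

for a in [3.0, -3.0, 0.5, 0.0]:
    lhs = scaled_std(a, xs)                # sigma(aX)
    rhs = abs(a) * statistics.pstdev(xs)   # |a| * sigma(X)
    print(a, lhs, rhs, math.isclose(lhs, rhs, abs_tol=1e-12))
```

Note that scaling by a negative number such as $-3$ gives the same standard deviation as scaling by $3$, which is exactly why the absolute value appears.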
Property 2. Cauchy-Schwarz inequality. (Part 1) For any random variables $X,Y$ one has
(3) $|\operatorname{Cov}(X,Y)|\le\sigma(X)\sigma(Y)$.
(Part 2) If the inequality sign in (3) turns into equality, $|\operatorname{Cov}(X,Y)|=\sigma(X)\sigma(Y)$, then $Y$ is a linear function of $X$: $Y=aX+b$, with some constants $a,b$.
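Proof. (Part 1) If at least one of the variables is constant, both sides of the inequality are $0$ and there is nothing to prove. To exclude the trivial case, let $X,Y$ be non-constant, so that $\operatorname{Var}(X),\operatorname{Var}(Y)$ are positive. Consider a real-valued function of a real number $t$ defined by $f(t)=\operatorname{Var}(tX+Y)$. Here we have the variance of a linear combination:
$f(t)=t^2\operatorname{Var}(X)+2t\operatorname{Cov}(X,Y)+\operatorname{Var}(Y)$.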
We see that the graph of $f(t)$ is a parabola opening upward (because the leading coefficient $\operatorname{Var}(X)$ is positive). By nonnegativity of variance, $f(t)\ge 0$, so the parabola lies on or above the horizontal axis in the $(t,f)$ plane. Hence, the quadratic equation $f(t)=0$ may have at most one real root. This means that the discriminant of the equation is non-positive:
$D=4\operatorname{Cov}(X,Y)^2-4\operatorname{Var}(X)\operatorname{Var}(Y)\le 0$.
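Applying square roots to both sides of $\operatorname{Cov}(X,Y)^2\le\operatorname{Var}(X)\operatorname{Var}(Y)$ and using equation (2) on the left side, we finish the proof of the first part.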
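(Part 2) In case of the equality sign the discriminant is $0$. Therefore the parabola touches the horizontal axis at some point $t_0$ where $f(t_0)=\operatorname{Var}(t_0X+Y)=0$. But we know that zero variance implies $t_0X+Y=c$ for some constant $c$, which is just another way of writing $Y=aX+b$ with $a=-t_0$ and $b=c$.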
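The steps of the proof can be traced on simulated data. The sketch below is an illustration under stated assumptions, not part of the proof: the sample sizes, the seed, the coefficients $0.5$, $-2$, $3$ and the helper names `mean`, `cov`, `var`, `f` are all hypothetical, and sample moments stand in for the population ones.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def cov(x, y):
    """Sample covariance (dividing by n) of two equally long lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def var(x):
    return cov(x, x)

def f(t, x, y):
    """f(t) = Var(tX + Y), the function used in the proof."""
    return var([t * a + b for a, b in zip(x, y)])

random.seed(0)
x = [random.gauss(0, 1) for _ in range(1000)]
y = [0.5 * a + random.gauss(0, 1) for a in x]   # noisy dependence on x

# Part 1: f(t) is the quadratic from the proof, and (3) holds.
t = 1.7
quad = t ** 2 * var(x) + 2 * t * cov(x, y) + var(y)
gap = math.sqrt(var(x) * var(y)) - abs(cov(x, y))
print("f(t) vs quadratic:", f(t, x, y), quad)
print("sigma(X)sigma(Y) - |Cov(X,Y)| =", gap)

# Part 2: for an exact linear relation Y = aX + b the gap vanishes
# and f touches zero at t0 = -a.
y_lin = [-2.0 * a + 3.0 for a in x]
gap_lin = math.sqrt(var(x) * var(y_lin)) - abs(cov(x, y_lin))
print("gap for linear Y:", gap_lin)
print("f(t0) at t0 = 2:", f(2.0, x, y_lin))
```

Up to floating-point rounding, the gap is strictly positive for the noisy pair and zero for the exactly linear pair.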
Comment. Inequality (3) explains one of the main properties of the correlation:
$-1\le\rho(X,Y)=\frac{\operatorname{Cov}(X,Y)}{\sigma(X)\sigma(Y)}\le 1$.
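Dividing the covariance by the product of the standard deviations gives the correlation coefficient $\rho(X,Y)$, and the Cauchy-Schwarz inequality says it cannot leave the interval $[-1,1]$. Here is a minimal sketch of that bound. The function `correlation` and the three test data sets are assumptions for illustration.

```python
import math
import random

def correlation(x, y):
    """rho(X, Y) = Cov(X, Y) / (sigma(X) * sigma(Y)), sample version."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov_xy / (sx * sy)

random.seed(1)
x = [random.uniform(-1, 1) for _ in range(500)]

rho_up = correlation(x, [2 * a + 1 for a in x])       # increasing linear
rho_down = correlation(x, [-2 * a + 1 for a in x])    # decreasing linear
rho_noisy = correlation(x, [a + random.gauss(0, 1) for a in x])

print(rho_up, rho_down, rho_noisy)
```

The two exactly linear cases sit at the ends of the interval (up to rounding), while the noisy case falls strictly inside it.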