Here is the why: analogy with Euclidean geometry
Euclid described the space we live in axiomatically. What we have known about the geometry of this space since ancient times has never failed us. Therefore, statistical definitions modeled on Euclidean geometry are sure to work.
1. Analogy between scalar product and covariance
Geometry. See Table 2 here for operations with vectors. The scalar product of two vectors $X = (X_1, \dots, X_n)$ and $Y = (Y_1, \dots, Y_n)$ is defined by $X \cdot Y = \sum_{i=1}^{n} X_i Y_i$.
Statistical analog: Covariance of two random variables is defined by $\operatorname{Cov}(X, Y) = E[(X - EX)(Y - EY)]$.
Both the scalar product and covariance are linear in one argument when the other argument is fixed.
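The linearity claim can be checked numerically. The sketch below is only an illustration under an assumption that is not in the original text: each random variable takes finitely many equally likely values, so expectations become plain averages. The helper names `dot`, `mean` and `cov` and the sample numbers are made up for the example.

```python
def dot(x, y):
    """Scalar product of two vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))

def mean(x):
    """Expectation when all listed values are equally likely (assumption)."""
    return sum(x) / len(x)

def cov(x, y):
    """Covariance E[(X - EX)(Y - EY)] under the same assumption."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

# Arbitrary illustrative data and coefficients.
x = [1.0, 2.0, 4.0]
y = [0.0, 3.0, 3.0]
z = [2.0, -1.0, 5.0]
a, b = 2.0, -3.0

# The linear combination aX + bY, taken value by value.
combo = [a * xi + b * yi for xi, yi in zip(x, y)]

# Linearity in the first argument with the second argument z fixed.
dot_gap = abs(dot(combo, z) - (a * dot(x, z) + b * dot(y, z)))
cov_gap = abs(cov(combo, z) - (a * cov(x, z) + b * cov(y, z)))
print(dot_gap < 1e-9, cov_gap < 1e-9)
```

Both gaps should vanish up to rounding error; by symmetry the same holds for the second argument.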
2. Analogy between orthogonality and uncorrelatedness
Geometry. Two vectors are called orthogonal (or perpendicular) if their scalar product vanishes: (1) $X \cdot Y = 0$.
Exercise. How do you draw on the plane the vectors ? Check that they are orthogonal.
Statistical analog: Two random variables are called uncorrelated if $\operatorname{Cov}(X, Y) = 0$.
3. Measuring lengths
Geometry: the length of a vector $X = (X_1, \dots, X_n)$ is $\|X\| = \sqrt{X \cdot X} = \sqrt{\sum_{i=1}^{n} X_i^2}$, see Figure 1.
Statistical analog: the standard deviation of a random variable is $\sigma(X) = \sqrt{\operatorname{Var}(X)} = \sqrt{\operatorname{Cov}(X, X)}$.
This explains the square root in the definition of the standard deviation.
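To make the parallel concrete, here is a minimal sketch, again assuming a random variable with finitely many equally likely values (an assumption added for illustration). Under that assumption the standard deviation equals the length of the vector of deviations from the mean, divided by $\sqrt{n}$. The function names and data are hypothetical.

```python
import math

def mean(x):
    """Expectation when all listed values are equally likely (assumption)."""
    return sum(x) / len(x)

def length(x):
    """Euclidean length: square root of the scalar product of x with itself."""
    return math.sqrt(sum(a * a for a in x))

def std(x):
    """Standard deviation: square root of the variance E[(X - EX)^2]."""
    m = mean(x)
    return math.sqrt(sum((a - m) ** 2 for a in x) / len(x))

x = [2.0, 4.0, 4.0, 6.0]

# Vector of deviations from the mean, then its length rescaled by sqrt(n).
deviations = [a - mean(x) for a in x]
scaled_length = length(deviations) / math.sqrt(len(x))

print(length([3.0, 4.0]))      # the classic 3-4-5 right triangle
print(std(x), scaled_length)   # the two numbers should coincide
```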
4. Cauchy-Schwarz inequality
Statistical analog: $|\operatorname{Cov}(X, Y)| \le \sigma(X)\sigma(Y)$. See the proof here. The proof of its geometric counterpart, $|X \cdot Y| \le \|X\| \, \|Y\|$, is similar.
5. Triangle inequality
Geometry: $\|X + Y\| \le \|X\| + \|Y\|$, see Figure 2, where the length of X+Y does not exceed the sum of the lengths of X and Y.
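Statistical analog: using the Cauchy-Schwarz inequality we have $\sigma(X + Y) = \sqrt{\operatorname{Var}(X) + 2\operatorname{Cov}(X, Y) + \operatorname{Var}(Y)} \le \sqrt{\sigma^2(X) + 2\sigma(X)\sigma(Y) + \sigma^2(Y)} = \sigma(X) + \sigma(Y)$.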
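Both statistical inequalities can be spot-checked on numbers. The following sketch assumes, for illustration only, random variables with finitely many equally likely values; the data and helper names are invented, and a small tolerance `eps` guards against rounding error.

```python
import math

def mean(x):
    """Expectation when all listed values are equally likely (assumption)."""
    return sum(x) / len(x)

def cov(x, y):
    """Covariance E[(X - EX)(Y - EY)] under the same assumption."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def std(x):
    """Standard deviation as the square root of Cov(X, X)."""
    return math.sqrt(cov(x, x))

# Arbitrary illustrative data.
x = [1.0, 2.0, 4.0, 7.0]
y = [3.0, 1.0, 5.0, 2.0]
total = [a + b for a, b in zip(x, y)]  # the sum X + Y, value by value

eps = 1e-12
cauchy_schwarz_holds = abs(cov(x, y)) <= std(x) * std(y) + eps
triangle_holds = std(total) <= std(x) + std(y) + eps
print(cauchy_schwarz_holds, triangle_holds)
```

A check on one data set is not a proof, but changing `x` and `y` should never make either flag false.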
6. The Pythagorean theorem
Geometry: In a right triangle, the squared hypotenuse is equal to the sum of the squares of the two legs. The illustration is similar to Figure 2, except that the angle between X and Y should be right.
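Proof. Taking two orthogonal vectors $X, Y$ as legs, we have
Squared hypotenuse = $\|X + Y\|^2 = (X + Y) \cdot (X + Y)$
(squaring out and using orthogonality (1))
$= \|X\|^2 + 2 X \cdot Y + \|Y\|^2 = \|X\|^2 + \|Y\|^2$ = Sum of squared legs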
Statistical analog: If two random variables are uncorrelated, then the variance of their sum is the sum of their variances: $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$.
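A small numerical illustration of the statistical Pythagorean theorem follows. It is a sketch under the added assumption of equally likely values, with data chosen by hand so that the covariance is exactly zero; none of the names or numbers come from the original text.

```python
def mean(x):
    """Expectation when all listed values are equally likely (assumption)."""
    return sum(x) / len(x)

def cov(x, y):
    """Covariance E[(X - EX)(Y - EY)] under the same assumption."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def var(x):
    """Variance as the covariance of a variable with itself."""
    return cov(x, x)

# Hand-picked values: both means are zero and the products cancel out,
# so X and Y are uncorrelated.
x = [1.0, -1.0, 1.0, -1.0]
y = [1.0, 1.0, -1.0, -1.0]
total = [a + b for a, b in zip(x, y)]  # the sum X + Y, value by value

print(cov(x, y))                      # zero, so the "legs" are orthogonal
print(var(total), var(x) + var(y))    # the two numbers should coincide
```

If the covariance were not zero, the extra term $2\operatorname{Cov}(X, Y)$ would appear, just as $2 X \cdot Y$ does for a non-right angle.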
7. The most important analogy: measuring angles
Geometry: the cosine of the angle between two vectors is defined by
Cosine between X,Y = $\dfrac{X \cdot Y}{\|X\| \, \|Y\|}$
Statistical analog: the correlation coefficient between two random variables is defined by $\rho(X, Y) = \dfrac{\operatorname{Cov}(X, Y)}{\sigma(X)\sigma(Y)}$.
This intuitively explains why the correlation coefficient takes values between -1 and +1.
Remark. My colleague Alisher Aldashev noticed that the correlation coefficient is the cosine of the angle between the deviations $X - EX$ and $Y - EY$, and not between $X$ and $Y$ themselves.
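The remark can be verified on a small example. The sketch below assumes random variables with finitely many equally likely values (an assumption made here for illustration) and compares three numbers: the correlation coefficient, the cosine between the deviation vectors, and the cosine between the raw value vectors. All helper names and data are hypothetical.

```python
import math

def dot(x, y):
    """Scalar product of two vectors of equal length."""
    return sum(a * b for a, b in zip(x, y))

def cosine(x, y):
    """Cosine of the angle between two vectors."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))

def mean(x):
    """Expectation when all listed values are equally likely (assumption)."""
    return sum(x) / len(x)

def deviations(x):
    """Values of X - EX."""
    m = mean(x)
    return [a - m for a in x]

def corr(x, y):
    """Correlation coefficient Cov(X, Y) / (sigma(X) * sigma(Y))."""
    dx, dy = deviations(x), deviations(y)
    n = len(x)
    covariance = dot(dx, dy) / n
    return covariance / (math.sqrt(dot(dx, dx) / n) * math.sqrt(dot(dy, dy) / n))

# Y decreases exactly as X increases, so the correlation should be -1.
x = [1.0, 2.0, 3.0]
y = [3.0, 2.0, 1.0]

rho = corr(x, y)
cos_deviations = cosine(deviations(x), deviations(y))
cos_raw = cosine(x, y)
print(rho, cos_deviations, cos_raw)
```

The first two numbers should agree up to rounding, while the cosine between the raw vectors is positive here because all values are positive; the factor $1/n$ cancels in the ratio, which is why the correlation is a pure cosine.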