Variance, covariance, standard deviation and correlation: their definitions and properties are deeply rooted in Euclidean geometry.

## Here is the why: analogy with Euclidean geometry

Euclid described the space we live in axiomatically. What we have known about the geometry of this space since ancient times has never failed us. Therefore, statistical definitions based on Euclidean geometry are sure to work.

#### 1. Analogy between scalar product and covariance

**Geometry**. See Table 2 here for operations with vectors. The **scalar product** of two vectors $X=(x_1,\dots,x_n)$ and $Y=(y_1,\dots,y_n)$ is defined by

$$X\cdot Y=\sum_{i=1}^n x_iy_i.$$

**Statistical analog**: The **covariance** of two random variables $X$ and $Y$ is defined by

$$\mathrm{Cov}(X,Y)=E[(X-EX)(Y-EY)].$$

Both the scalar product and covariance are linear in one argument when the other argument is fixed.
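The linearity claim can be checked numerically. The following is a minimal Python sketch, not part of the original post. It assumes each random variable takes its listed values with equal probability, so that the expectation is a plain average; the helper names `dot`, `mean`, `cov` and `combo` and the sample values are illustrative.

```python
def dot(x, y):
    """Scalar product of two vectors given as equal-length lists."""
    return sum(a * b for a, b in zip(x, y))

def mean(x):
    """Expectation under the assumption of equally likely values."""
    return sum(x) / len(x)

def cov(x, y):
    """Covariance: the average product of deviations from the means."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def combo(a, x, b, y):
    """Linear combination a*x + b*y, taken element by element."""
    return [a * u + b * v for u, v in zip(x, y)]

# Illustrative data (an assumption, not taken from the post).
x = [1.0, 2.0, 4.0, 5.0]
y = [3.0, 0.0, -1.0, 2.0]
z = [2.0, 5.0, 1.0, 4.0]
a, b = 2.0, -3.0

# Linearity in the first argument with the second argument z fixed.
lhs_dot = dot(combo(a, x, b, y), z)
rhs_dot = a * dot(x, z) + b * dot(y, z)
lhs_cov = cov(combo(a, x, b, y), z)
rhs_cov = a * cov(x, z) + b * cov(y, z)

print(lhs_dot, rhs_dot)
print(lhs_cov, rhs_cov)
```

In each printed pair the two numbers should agree. Since both operations are symmetric, the same check works for the second argument.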

#### 2. Analogy between orthogonality and uncorrelatedness

**Geometry**. Two vectors $X$ and $Y$ are called **orthogonal** (or **perpendicular**) if

$$X\cdot Y=0. \qquad (1)$$

**Exercise**. How do you draw on the plane the vectors ? Check that they are orthogonal.

**Statistical analog**: Two random variables $X$ and $Y$ are called **uncorrelated** if $\mathrm{Cov}(X,Y)=0$.
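Both conditions are easy to test on concrete numbers. Here is a small Python sketch under the assumption that a random variable takes each listed value with equal probability. The vectors and samples are made-up illustrations (they are not the vectors from the exercise), and the helper names are hypothetical.

```python
def dot(x, y):
    """Scalar product of two vectors given as equal-length lists."""
    return sum(a * b for a, b in zip(x, y))

def mean(x):
    """Expectation under the assumption of equally likely values."""
    return sum(x) / len(x)

def cov(x, y):
    """Covariance: the average product of deviations from the means."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

# Geometry: two vectors in the plane with zero scalar product.
u = [1.0, 2.0]
v = [-2.0, 1.0]
orthogonal = abs(dot(u, v)) < 1e-12

# Statistics: two variables with zero covariance (here y equals x squared).
x = [-2.0, -1.0, 1.0, 2.0]
y = [4.0, 1.0, 1.0, 4.0]
uncorrelated = abs(cov(x, y)) < 1e-12

print(orthogonal, uncorrelated)
```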

#### 3. Measuring lengths

**Geometry**: the **length of a vector** $X=(x_1,\dots,x_n)$ is $\|X\|=\sqrt{X\cdot X}=\sqrt{\sum_{i=1}^n x_i^2}$, see Figure 1.

**Statistical analog**: the **standard deviation** of a random variable $X$ is

$$\sigma(X)=\sqrt{\mathrm{Var}(X)}=\sqrt{\mathrm{Cov}(X,X)}.$$

This explains the square root in the definition of the standard deviation.

#### 4. Cauchy-Schwarz inequality

**Geometry**: $|X\cdot Y|\le\|X\|\,\|Y\|$.

**Statistical analog**: $|\mathrm{Cov}(X,Y)|\le\sigma(X)\sigma(Y)$. See the proof here. The proof of its geometric counterpart is similar.

#### 5. Triangle inequality

**Geometry**: $\|X+Y\|\le\|X\|+\|Y\|$, see Figure 2, where the length of $X+Y$ does not exceed the sum of the lengths of $X$ and $Y$.

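**Statistical analog**: using the Cauchy-Schwarz inequality we have

$$\sigma(X+Y)=\sqrt{\mathrm{Var}(X)+2\mathrm{Cov}(X,Y)+\mathrm{Var}(Y)}\le\sqrt{\sigma^2(X)+2\sigma(X)\sigma(Y)+\sigma^2(Y)}=\sigma(X)+\sigma(Y).$$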
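The statistical versions of the Cauchy-Schwarz and triangle inequalities can be verified on a small data set. This is a hedged Python sketch: it assumes equally likely values, and the function names `mean`, `cov` and `sd` and the numbers are illustrative choices rather than anything from the original post.

```python
import math

def mean(x):
    """Expectation under the assumption of equally likely values."""
    return sum(x) / len(x)

def cov(x, y):
    """Covariance: the average product of deviations from the means."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def sd(x):
    """Standard deviation as the square root of Cov(X, X)."""
    return math.sqrt(cov(x, x))

# Illustrative data (an assumption, not taken from the post).
x = [1.0, 2.0, 4.0, 5.0]
y = [3.0, 0.0, -1.0, 2.0]
s = [a + b for a, b in zip(x, y)]  # values of X + Y

# Cauchy-Schwarz: |Cov(X, Y)| <= sd(X) * sd(Y)
cauchy_schwarz_holds = abs(cov(x, y)) <= sd(x) * sd(y)

# Triangle inequality: sd(X + Y) <= sd(X) + sd(Y)
triangle_holds = sd(s) <= sd(x) + sd(y)

print(cauchy_schwarz_holds, triangle_holds)
```

A check on one data set is only an illustration, of course; the inequalities hold in general by the proofs referred to above.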

#### 6. The Pythagorean theorem

**Geometry**: In a right triangle, the squared hypotenuse is equal to the sum of the squares of the two legs. The illustration is similar to Figure 2, except that the angle between X and Y should be right.

**Proof**. Taking two orthogonal vectors $X$ and $Y$ as legs, we have

Squared hypotenuse $=\|X+Y\|^2=(X+Y)\cdot(X+Y)$

(squaring out and using orthogonality (1))

$=\|X\|^2+2X\cdot Y+\|Y\|^2=\|X\|^2+\|Y\|^2$ = Sum of squared legs

**Statistical analog**: If two random variables are **uncorrelated**, then the variance of their sum is the sum of their variances: $\mathrm{Var}(X+Y)=\mathrm{Var}(X)+\mathrm{Var}(Y)$.
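To see the two versions of the Pythagorean theorem side by side, here is a short Python sketch. It assumes equally likely values for the random variables; the orthogonal vectors, the uncorrelated sample and the helper names `dot`, `mean` and `var` are illustrative assumptions.

```python
def dot(x, y):
    """Scalar product of two vectors given as equal-length lists."""
    return sum(a * b for a, b in zip(x, y))

def mean(x):
    """Expectation under the assumption of equally likely values."""
    return sum(x) / len(x)

def var(x):
    """Variance: the average squared deviation from the mean."""
    m = mean(x)
    return sum((a - m) ** 2 for a in x) / len(x)

# Geometry: orthogonal legs u, v and the hypotenuse u + v.
u = [1.0, 2.0]
v = [-2.0, 1.0]
hyp = [a + b for a, b in zip(u, v)]
geom_lhs = dot(hyp, hyp)            # squared hypotenuse
geom_rhs = dot(u, u) + dot(v, v)    # sum of squared legs

# Statistics: uncorrelated x, y (their covariance is zero) and the sum x + y.
x = [-2.0, -1.0, 1.0, 2.0]
y = [4.0, 1.0, 1.0, 4.0]
s = [a + b for a, b in zip(x, y)]
stat_lhs = var(s)                   # variance of the sum
stat_rhs = var(x) + var(y)          # sum of variances

print(geom_lhs, geom_rhs)
print(stat_lhs, stat_rhs)
```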

#### 7. The most important analogy: measuring angles

**Geometry**: the **cosine of the angle between two vectors** is defined by

Cosine between $X,Y$ $=\dfrac{X\cdot Y}{\|X\|\,\|Y\|}$

**Statistical analog**: the **correlation coefficient** between two random variables $X$ and $Y$ is defined by

$$\rho(X,Y)=\frac{\mathrm{Cov}(X,Y)}{\sigma(X)\sigma(Y)}.$$

This intuitively explains why the correlation coefficient takes values between -1 and +1.

**Remark**. My colleague Alisher Aldashev noticed that the correlation coefficient is the cosine of the angle between the deviations $X-EX$ and $Y-EY$, and not between $X$ and $Y$ themselves.
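The remark can be illustrated numerically. The sketch below is an assumption-laden example, not the author's code: it treats each listed value as equally likely, represents a random variable by the vector of its values, and uses hypothetical helper names `dot`, `cosine`, `deviations` and `corr`.

```python
import math

def dot(x, y):
    """Scalar product of two vectors given as equal-length lists."""
    return sum(a * b for a, b in zip(x, y))

def cosine(x, y):
    """Cosine of the angle between two vectors."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))

def deviations(x):
    """Vector of deviations from the mean, i.e. the values of X - EX."""
    m = sum(x) / len(x)
    return [a - m for a in x]

def corr(x, y):
    """Correlation coefficient Cov(X, Y) / (sd(X) * sd(Y))."""
    n = len(x)
    dx, dy = deviations(x), deviations(y)
    cov_xy = dot(dx, dy) / n
    sd_x = math.sqrt(dot(dx, dx) / n)
    sd_y = math.sqrt(dot(dy, dy) / n)
    return cov_xy / (sd_x * sd_y)

# Illustrative data (an assumption, not taken from the post).
x = [1.0, 2.0, 4.0, 5.0]
y = [3.0, 0.0, -1.0, 2.0]

rho = corr(x, y)
cos_dev = cosine(deviations(x), deviations(y))  # angle between deviations
cos_raw = cosine(x, y)                          # angle between raw vectors

print(rho, cos_dev, cos_raw)
```

The first two numbers should coincide up to rounding, because the factor $1/n$ cancels in the ratio, while the cosine between the raw vectors is generally different.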
