10 Jan 16

What is a z score: the scientific explanation

You know what a z score is once you know why people invented it.

As usual, we start with a theoretical motivation. There is a myriad of distributions. Even if we stay within the set of normal distributions, there are infinitely many of them, indexed by their means \mu(X)=EX and standard deviations \sigma(X)=\sqrt{Var(X)}. Before computers existed, people had to use statistical tables. It was impossible to produce statistical tables for an infinite number of distributions, so the problem was to reduce the case of general \mu(X) and \sigma(X) to that of \mu(X)=0 and \sigma(X)=1.

But we know that this can be achieved by centering and scaling. Combining these two transformations, we obtain the definition of the z score:

z=\frac{X-\mu(X)}{\sigma(X)}.

Using the properties of means and variances we see that

Ez=\frac{E(X-\mu(X))}{\sigma(X)}=0, Var(z)=\frac{Var(X-\mu(X))}{\sigma^2(X)}=\frac{Var(X)}{\sigma^2(X)}=1.

The transformation leading from X to its z score is sometimes called standardization.
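Standardization is easy to check numerically. Here is a minimal sketch using NumPy; the values \mu(X)=5 and \sigma(X)=2 are illustrative choices, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# A normal variable X with arbitrary mean and standard deviation
# (mu = 5, sigma = 2 are illustrative assumptions).
x = rng.normal(loc=5.0, scale=2.0, size=100_000)

# Standardization: center by the mean, then scale by the standard deviation
z = (x - x.mean()) / x.std()

print(x.mean(), x.std())   # close to 5 and 2
print(z.mean(), z.std())   # close to 0 and 1
```

The sample mean and standard deviation are used in place of the theoretical \mu(X) and \sigma(X), so z has mean 0 and standard deviation 1 up to floating-point error.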

This site promises to tell you the truth about undergraduate statistics. The truth about the z score is that:

(1) Standardization can be applied to any variable with finite variance, not only to normal variables. The z score is a standard normal variable only when the original variable X is normal, contrary to what some sites say.

(2) With modern computers, standardization is not necessary to find critical values for X, see Chapter 14 of my book.
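Point (2) can be illustrated with SciPy. The sketch below compares the table-era route (look up a standard normal critical value, then undo the standardization by hand) with asking software for the critical value of X directly; mu = 5 and sigma = 2 are again illustrative assumptions:

```python
from scipy import stats

# Illustrative values, not from the text: X ~ N(mu, sigma^2)
mu, sigma = 5.0, 2.0

# The old route: read the standard normal critical value from a table,
# then undo the standardization by hand.
z_crit = stats.norm.ppf(0.975)          # about 1.96
x_crit_manual = mu + sigma * z_crit

# The modern route: ask for the critical value of X directly.
x_crit_direct = stats.norm.ppf(0.975, loc=mu, scale=sigma)

print(x_crit_manual, x_crit_direct)     # the two routes agree
```

Both routes give the same number, which is exactly the point: with a quantile function available, the detour through the standard normal table is optional.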
