14
Mar 16

What is a p value?

What is a p value? The definition that is easier to apply

This blog discusses how tricky the notion of a p-value is. It states the technical definition of a p-value — the probability of getting results at least as extreme as the ones you observed, given that the null hypothesis is correct. Stated like this, I don't understand it either. The definition from Wikipedia is not much clearer: In frequentist statistics, the p-value is a function of the observed sample results (a test statistic) relative to a statistical model, which measures how extreme the observation is. The p-value is defined as the probability of obtaining a result equal to or "more extreme" than what was actually observed, assuming that the model is true.

Below I give the definition I prefer, hoping that it is easier to apply. The discussion requires knowledge of the null and alternative hypotheses and of the significance level. It also presumes that some test statistic is available (a t statistic, for simplicity).

Suppose we want to test the null hypothesis $H_0:\ \beta=0$ against the symmetric alternative $H_a:\ \beta\neq0$. Given a small number $\alpha\in(0,1)$ (called the significance level in this context) and a t statistic $t$ built from an estimator $\hat{\beta}$ of the parameter $\beta$, the test rejects the null when $|t|>c_\alpha$, where the critical value $c_\alpha$ is chosen so that $P(|t|>c_\alpha)=\alpha$ under the null. Note that when $\alpha$ decreases to 0, the critical value $c_\alpha$ increases, so rejecting the null becomes harder. In my book, I use the following definition: the p-value is the smallest significance level at which the null hypothesis can still be rejected. With this definition, it is easier to understand that (a) the null is rejected at any $\alpha$ greater than or equal to the p-value and (b) for any $\alpha$ below the p-value the decision is "Fail to reject the null".
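To make the "smallest significance level" definition concrete, here is a minimal Python sketch. It rests on assumptions that are mine, not the post's: the test statistic is treated as standard normal under the null (a large-sample stand-in for the t distribution, chosen so that only the standard library is needed), and the function names `critical_value`, `rejects`, `p_value` and `p_value_by_search` are hypothetical. The sketch computes the p-value twice: once from the usual tail-probability formula, and once by searching for the boundary significance level at which the decision flips.

```python
from statistics import NormalDist

# Assumption: under H_0 the test statistic is (approximately) standard normal.
STD_NORMAL = NormalDist()


def critical_value(alpha: float) -> float:
    """Two-sided critical value c with P(|Z| > c) = alpha under the null."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2)


def rejects(stat: float, alpha: float) -> bool:
    """Decision rule: reject H_0 when |stat| reaches the critical value."""
    return abs(stat) >= critical_value(alpha)


def p_value(stat: float) -> float:
    """Tail-probability form: P(|Z| >= |stat|) under the null."""
    return 2 * (1 - STD_NORMAL.cdf(abs(stat)))


def p_value_by_search(stat: float, tol: float = 1e-10) -> float:
    """Smallest alpha at which H_0 is still rejected, found by bisection.

    Invariant: the null is not rejected at `lo` and is rejected at `hi`
    (alpha = 0 never rejects, alpha = 1 always does).
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rejects(stat, mid):
            hi = mid  # still rejected, so the boundary is at or below mid
        else:
            lo = mid  # not rejected, so the boundary is above mid
    return hi


if __name__ == "__main__":
    observed = 2.1  # hypothetical observed value of the test statistic
    print("tail-probability p-value:", p_value(observed))
    print("boundary significance level:", p_value_by_search(observed))
    for alpha in (0.10, 0.05, 0.01):
        decision = "reject" if rejects(observed, alpha) else "fail to reject"
        print(f"alpha = {alpha}: {decision} the null")
```

For the hypothetical observed value 2.1 the two computations agree up to the search tolerance at roughly 0.036, so the null is rejected at the 10% and 5% levels but not at the 1% level, which is exactly statements (a) and (b).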

Remarks. Statistical procedures for hypothesis testing are not perfect. In particular, there is no symmetry between the null and alternative hypotheses, and the fact that their choice is up to the researcher makes the test subjective. The choice of the significance level is subjective too, as is the choice of the model and of the test statistic. Users with limited statistical knowledge tend to overestimate the power of statistics.