Oct 17

Significance level and power of test

In this post we discuss several interrelated concepts: null and alternative hypotheses, Type I and Type II errors, and their probabilities. Before reading, review the definitions of a sample space and elementary events and that of a conditional probability.

Type I and Type II errors

Regarding the true state of nature we assume two mutually exclusive possibilities: the null hypothesis (for example, the suspect is guilty) and the alternative hypothesis (the suspect is innocent). It's up to us what to call the null and what to call the alternative. However, the statistical procedures are not symmetric: it's easier to measure the probability of rejecting the null when it is true than the other probabilities involved. This is why what we want to prove is usually designated as the alternative.

Books usually present the following table.

                                Decision taken
State of nature     Fail to reject null     Reject null
Null is true        Correct decision        Type I error
Null is false       Type II error           Correct decision

This table is not good enough because it has no link to probabilities. The next video fills in the blanks.

Video. Significance level and power of test

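The conclusion from the video is that, with T denoting the event that the null is true, F the event that it is false, and R the event that the null is rejected,

\frac{P(T\cap R)}{P(T)}=P(R|T)=P(\text{Type I error})=\text{significance level}

\frac{P(F\cap R)}{P(F)}=P(R|F)=P(\text{correctly rejecting a false null})=\text{power}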
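
The two conditional probabilities can be approximated by simulation. The sketch below is only an illustration under assumptions that are not in the post: a two-sided z-test of a zero mean with known unit variance, samples of size 25, and a true mean of 0.5 when the null is false. The function names are made up for this example.

```python
import random
from statistics import NormalDist

def reject_null(sample, alpha, sigma=1.0):
    """Two-sided z-test of H0: mu = 0 with known sigma; True means 'reject'."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / n ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical

def rejection_rate(true_mu, alpha=0.05, n=25, trials=10000, seed=0):
    """Share of simulated samples in which the null is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, 1.0) for _ in range(n)]
        rejections += reject_null(sample, alpha)
    return rejections / trials

# Null is true (mu = 0): the rejection rate estimates P(R|T), the significance level.
type_i_rate = rejection_rate(true_mu=0.0)
# Null is false (mu = 0.5, an assumed value): the rate estimates P(R|F), the power.
power = rejection_rate(true_mu=0.5)

print(f"estimated P(R|T) = {type_i_rate:.3f}")
print(f"estimated P(R|F) = {power:.3f}")
```

The first estimate should land near the chosen level 0.05, while the second depends on how far the assumed true mean is from zero and on the sample size.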

Mar 16

What is a p value?

What is a p value? The definition that is easier to apply

This blog discusses how tricky the notion of a p-value is. It states the technical definition of a p-value: the probability of getting results at least as extreme as the ones you observed, given that the null hypothesis is correct. Stated like this, I don't understand it either. The definition from Wikipedia is not much clearer: "In frequentist statistics, the p-value is a function of the observed sample results (a test statistic) relative to a statistical model, which measures how extreme the observation is. The p-value is defined as the probability of obtaining a result equal to or 'more extreme' than what was actually observed, assuming that the model is true."

Below I give the definition I prefer, hoping that it's easier to apply. This discussion requires knowledge of the null and alternative hypotheses and of the significance level. It also presumes that some test statistic is available (a t statistic, for simplicity).

Suppose we want to test the null hypothesis H_0:\ \beta=0 against a symmetric alternative H_a:\ \beta\neq0. Given a small number \alpha\in(0,1) (which is called a significance level in this context) and an estimator \hat{\beta} of the parameter \beta, consider the probability f(c)=P(|\hat{\beta}|>c), computed under the null, as a function of the cutoff c>0. Note that when c decreases to 0, the value f(c) increases, so a larger significance level \alpha corresponds to a smaller critical value c_\alpha defined by f(c_\alpha)=\alpha, and the null is rejected when the observed |\hat{\beta}| exceeds c_\alpha. In my book, I use the following definition: the p-value is the smallest significance level at which the null hypothesis can still be rejected. With this definition, it is easier to understand that (a) the null is rejected at any \alpha\geq p-value and (b) for any \alpha<p-value the decision is "Fail to reject the null".
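
Statements (a) and (b) can be checked numerically. The following is a minimal sketch, assuming for simplicity that the test statistic is standard normal under the null (a stand-in for the t statistic, which the Python standard library does not provide). The observed value 2.1 and the function names are hypothetical.

```python
from statistics import NormalDist

def p_value_two_sided(z_obs):
    """P(|Z| > |z_obs|) for a standard normal Z, i.e. computed under the null."""
    return 2 * (1 - NormalDist().cdf(abs(z_obs)))

def decision(z_obs, alpha):
    """Classical rule: reject when |z_obs| exceeds the critical value for alpha."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    if abs(z_obs) > critical:
        return "Reject the null"
    return "Fail to reject the null"

z_obs = 2.1  # hypothetical observed value of the test statistic
p = p_value_two_sided(z_obs)
print(f"p-value = {p:.4f}")

# The critical-value decision should agree with comparing alpha to the p-value.
for alpha in (0.01, 0.03, 0.05, 0.10):
    print(alpha, decision(z_obs, alpha), "| alpha >= p-value:", alpha >= p)
```

For every level in the loop, the decision based on the critical value coincides with the comparison of \alpha with the p-value, which is the practical content of the definition.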

Remarks. Statistical procedures for hypotheses testing are not perfect. In particular, there is no symmetry between the null and alternative hypotheses. The fact that their choice is up to the researcher makes the test subjective. The choice of the significance level is subjective, as is the choice of the model and test statistic. Users with limited statistical knowledge tend to overestimate the power of statistics.