E-value even more flexible: significance level adjustable at a later stage

Paper by researcher Peter Grünwald published in Proceedings of the National Academy of Sciences.

Publication date
23 Sep 2024

Peter Grünwald

Recently, a paper by statistics researcher Peter Grünwald (Centrum Wiskunde & Informatica / Universiteit Leiden) entitled “Beyond Neyman-Pearson: E-values enable hypothesis testing with a data-driven alpha” was published in the prestigious scientific journal Proceedings of the National Academy of Sciences on 20 September 2024.

It was already known that e-values are more flexible than p-values: with e-values you can stop an experiment earlier than originally planned or, for example, add subjects afterwards. In this paper, Grünwald shows that e-values are also more flexible in another way: with e-values, it is possible to determine the significance level at a later time than usual.

Traditional statistical methods under fire

Whether experimental results are significant and not due to chance is traditionally determined using p-values and the significance level, usually denoted by “alpha”.

This methodology was largely developed in the 1930s by Neyman and Pearson, two founding fathers of modern statistics. But in recent years, p-values and related confidence intervals - error bars around graphs that were widely used, for example, by the RIVM during the COVID pandemic - have come under increasing scrutiny. P-values are extremely difficult to interpret, and lend themselves well to misuse, intentional or otherwise. This is one of the reasons for the “replication crisis”: there are far more false positive results in applied science than one would hope or expect.

The idea of Neyman-Pearson statistics with p-values is that you fix a significance level in advance, usually 0.05. You then observe data and calculate a p-value from them. If that p-value is smaller than your significance level, you conclude that you have presumably found something 'significant' (e.g. “the drug works”, “the phenomenon did not arise by chance”). If you proceed in this way, the probability of a false positive (you say “there is a connection / the drug works / it is not a coincidence” when this is not the case) is smaller than the significance level of 5%.
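As a concrete illustration (a minimal sketch, not taken from the paper; the test, data and function names are invented for illustration), the Neyman-Pearson recipe for a simple one-sample z-test looks like this in Python:

```python
import math

def z_test_p_value(sample_mean, n, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: mean == mu0, with known sigma (z-test)."""
    z = (sample_mean - mu0) * math.sqrt(n) / sigma
    # Two-sided tail probability of the standard normal via erfc
    return math.erfc(abs(z) / math.sqrt(2))

alpha = 0.05        # significance level, fixed BEFORE seeing the data
p = z_test_p_value(sample_mean=0.3, n=100)  # invented observed data
reject = p < alpha  # "significant" only relative to the pre-set alpha
```

The crucial point is that `alpha` is fixed before the data are seen; the comparison `p < alpha` is the only place the p-value enters the decision.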

The smaller the p-value, the stronger the evidence that something is really going on. But, oddly enough, you are not allowed to adjust the significance level at a later stage. If you see a p-value of 0.01 instead of 0.05, you are inclined to think: now the probability of error is only 1%! In practice, it does not work that way. The significance level cannot be adjusted retrospectively, so: once 5%, always 5%.

Letter e appears in large letter P.
Image: Papernerd.

Adjusting the significance level does not affect the reliability of the study

What an observation like 'it turned out that p < 0.01 but the significance level was 0.05' means exactly is almost impossible to explain in practice. Applied researchers (such as medics, biologists, and psychologists) tend to interpret a small p simply as a small probability of a false positive, and even professional statisticians unfortunately sometimes make similar mistakes.

In his recent article, Grünwald provides a mathematical proof showing that if you work with e-values instead of p-values, such an adjustment is indeed possible: you may change the significance level at a later stage of the research project, and the research result remains reliable.
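The property that makes this possible can be illustrated with a small simulation (a sketch under simplifying assumptions, not the construction from the paper): an e-value is a nonnegative statistic whose expected value is at most 1 when the null hypothesis is true, so by Markov's inequality the probability that it reaches 1/alpha is at most alpha, for every alpha at once. That is what leaves room to choose alpha after seeing the data:

```python
import math
import random

random.seed(0)

def lr_e_value(xs, mu1=0.5, mu0=0.0, sigma=1.0):
    """Likelihood ratio of N(mu1, sigma) vs N(mu0, sigma).

    Under H0 (data from N(mu0, sigma)) its expectation is exactly 1,
    so it is an e-value."""
    log_e = sum(((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2) for x in xs)
    return math.exp(log_e)

# Simulate under H0 and let alpha depend on the observed e-value itself.
trials, n = 20000, 10
false_pos = 0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    e = lr_e_value(xs)
    alpha = 0.01 if e > 50 else 0.05   # data-driven significance level
    if e >= 1 / alpha:                 # reject when E >= 1/alpha
        false_pos += 1

print(false_pos / trials)  # empirical false-positive rate, well below 0.05
```

In this sketch the data-driven alpha never exceeds 0.05, so the rejection region stays inside the event E ≥ 20 and Markov's inequality still applies; the general conditions under which a data-driven alpha remains valid are what the paper makes precise.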

Previously published research by Grünwald and colleagues had already made clear that with the e-value (in contrast to the p-value) you may adjust the number of participants in your study: you may stop when you want and add data as long as you want. It now also becomes clear that e-values are more flexible than p-values and confidence intervals in yet another way: the significance level may also be determined retrospectively.
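That earlier, optional-stopping flexibility can likewise be shown in a short simulation (again a sketch with invented parameters, not the paper's setup): multiplying likelihood-ratio e-values observation by observation yields a so-called test martingale under the null hypothesis, and by Ville's inequality the probability that this running product ever exceeds 1/alpha is at most alpha, no matter when you decide to stop:

```python
import math
import random

random.seed(1)

def ever_crosses(threshold, max_n=60, mu1=0.3):
    """Track a running product of likelihood ratios of N(mu1, 1) vs N(0, 1)
    on data that truly come from N(0, 1), i.e. under H0.

    Return True if the product ever exceeds `threshold` within max_n
    observations (the experimenter could stop and "reject" at that moment)."""
    log_e = 0.0
    for _ in range(max_n):
        x = random.gauss(0.0, 1.0)
        log_e += mu1 * x - mu1 ** 2 / 2  # one-observation log likelihood ratio
        if log_e >= math.log(threshold):
            return True
    return False

# Monitoring continuously and stopping as soon as E >= 20 still gives a
# false-positive rate of at most 1/20 = 0.05, by Ville's inequality.
trials = 4000
crossings = sum(ever_crosses(20) for _ in range(trials))
print(crossings / trials)
```

With a p-value-based test, by contrast, peeking after every observation and stopping at the first 'significant' result would inflate the false-positive rate far above 5%.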

The confusion between p-value and significance level is perhaps the main reason why p-values are so difficult to understand, and this is what makes Grünwald's discovery revolutionary. He shows that this problem is largely eliminated with the e-value.

Earlier this year, Peter Grünwald, senior researcher in CWI's Machine Learning research group, was awarded an ERC Advanced Grant to further research flexible statistical methods based on the e-value, a robust and flexible alternative to the p-value.