March 25, 2021

The p-value and Sir Ronald Fisher

There is no shortage of published clinical research. Every 20 seconds, a new medical paper is added to MedLine.

Unfortunately, far from all of them are relevant.

We discuss the difference between statistical significance and clinical significance (or clinical relevance) in Episode 10 of the Precision Evidence Podcast.

Precision Evidence - Episode 10

A p-value of 0.05 or lower is widely accepted as statistically significant.

Significant Results

The most common way to demonstrate the significance of a trial's findings is to report the p-value. The p-value is, in short, a calculated probability (hence the "p"): the probability of obtaining the presented findings, or something more extreme, if pure chance alone were at work, that is, if there were no real difference between the groups. It is not the probability that the findings themselves are due to chance. The p-value does not relate directly to the research question but to the data collected, or rather analyzed, and it measures neither the size or relevance of the difference between groups in a trial nor the study's power.
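To make the definition concrete, here is a minimal sketch of an exact two-sided p-value for a made-up example: 60 "successes" out of 100 when chance alone would give 50/50. The numbers and the function name binomial_two_sided_p are illustrative assumptions, not data or methods from any trial discussed here.

```python
from math import comb

def binomial_two_sided_p(successes: int, n: int, p_null: float = 0.5) -> float:
    """Exact two-sided p-value for a binomial outcome.

    Adds up the probability, assuming chance alone (the null hypothesis),
    of every outcome that is at most as likely as the one observed.
    """
    # Probability of each possible count 0..n if only chance is at work.
    pmf = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # Small tolerance so floating-point ties are counted as "equally extreme".
    total = sum(prob for prob in pmf if prob <= observed * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical example: 60 out of 100, where chance alone predicts about 50.
p_value = binomial_two_sided_p(60, 100)
print(f"p = {p_value:.4f}")
```

In this made-up case the p-value lands just above 0.05, so the same data would be called "not significant" even though it barely differs from a result that would be.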

The use of the p-value to define results as "statistically significant" goes back almost 100 years to 1925, when Sir Ronald A. Fisher presented the idea in his book "Statistical Methods for Research Workers," where he suggested p = 0.05 as the limit for judging whether a deviation should be considered significant or not.

In episode 10 of the Precision Evidence Podcast, we discuss this further and suggest how we can focus on getting relevant and helpful information from clinical research, rather than letting the publication of yet another paper be the end of the study.

Ronald A. Fisher and "Statistical Methods for Research Workers"

When Ronald A. Fisher, a UK-based statistician, wrote and published "Statistical Methods for Research Workers," he hardly had any idea of the enormous impact this book would have on clinical research almost 100 years later.

In the book, Fisher wrote:

["…it is convenient to take this point [p = 0.05] as a limit in judging whether a deviation is to be considered significant or not. Deviations exceeding twice the standard deviation [approximately equal to p = 0.05] are thus formally regarded as significant."]

Voilà: "are thus formally regarded as significant," and so it became and has remained ever since.
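The link between "twice the standard deviation" and p = 0.05 can be checked with a few lines of arithmetic. The sketch below assumes a normal distribution and a two-sided test; the function name two_sided_normal_p is our own.

```python
from math import erfc, sqrt

def two_sided_normal_p(z: float) -> float:
    """Probability that a normally distributed value deviates from the mean
    by at least |z| standard deviations, in either direction."""
    return erfc(abs(z) / sqrt(2))

# 1.96 standard deviations gives almost exactly 0.05;
# "twice the standard deviation" gives a slightly smaller value.
for z in (1.96, 2.0):
    print(f"z = {z}: p = {two_sided_normal_p(z):.4f}")
```

So the 0.05 limit is simply the round number that sits next to a round number of standard deviations.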

Undoubtedly, it was never Fisher's intention that this cut-off value of p = 0.05 should be used as a black-or-white way to distinguish between right and wrong. There is no scientific reason for this choice; it is purely arbitrary. In the book, Fisher also wrote:

["Whether this procedure should, or should not, be used must be decided, not by the mathematical attainments of the investigator, but by discovering whether it will or will not give a sufficiently accurate answer."]

So it was never Fisher's intention to use the p-value to dichotomize between true and false. The p-value is one, and only one, piece in the puzzle of evaluating the usefulness of research results.
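A small, entirely hypothetical calculation shows why the p-value alone is not enough. Assume two treatment groups whose mean blood pressure differs by 0.5 mmHg, a known standard deviation of 10 mmHg, and a smallest clinically relevant difference of 5 mmHg. All of these numbers are invented for illustration, and the simple z-test below is only a sketch of the idea.

```python
from math import erfc, sqrt

def z_test_p(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference between two group means,
    assuming equal group sizes and a known, common standard deviation."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    return erfc(abs(z) / sqrt(2))

# Hypothetical numbers, chosen only to illustrate the point.
MEAN_DIFF = 0.5          # observed difference between groups, mmHg
SD = 10.0                # assumed standard deviation, mmHg
RELEVANT_DIFF = 5.0      # assumed smallest clinically relevant difference, mmHg

p_small_trial = z_test_p(MEAN_DIFF, SD, n_per_group=50)
p_large_trial = z_test_p(MEAN_DIFF, SD, n_per_group=20_000)

for label, p in (("50 per group", p_small_trial), ("20,000 per group", p_large_trial)):
    verdict = "significant" if p <= 0.05 else "not significant"
    print(f"{label}: p = {p:.2g} ({verdict})")

print("Clinically relevant:", MEAN_DIFF >= RELEVANT_DIFF)
```

The difference between the groups is the same in both cases and falls far short of the assumed relevance threshold; only the sample size changes, and with it the verdict of "significant".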

In episode 10 of the podcast, we talk about how statistical and clinical significance can work together and why WE MUST ALWAYS ASK FOR A CLINICAL RELEVANCE EVALUATION OF THE RESULTS - and make it just as normal as the p-value.

Precision Evidence - Episode 10