Memorizing p-value interpretation through Coronavirus!
WHAT YOU’LL GET OUT OF IT
If you are anything like me, you grew up conducting inferential statistics using the following as a gospel rule:
if the p-value comes out to be less than 0.05, reject the null hypothesis; otherwise, fail to reject it.
Nothing wrong with that, except that for a mind like mine (and, I am sure, yours too), this logical jump is a bit too mechanical and deserves a better understanding.
So I am not going to bore you with the actual (read: pretentious) definitions of the p-value that exist out there, or even with how to calculate p-values. This short post is about how you can wrap your head around why you are, in fact, supposed to reject the null hypothesis when you encounter a very small p-value.
INTRODUCTION
With the Covid-19 virus gripping almost the entire planet, numerous research teams across the globe are working towards developing a medicine to combat the deadly virus. Let us assume a bunch of them create a novel medicine, but the WHO claims that the medicine is no good, i.e. useless. According to them, your health is not going to get any better than it was before, even if you take the medicine.
Now, since we as a species are skeptical and cannot take the WHO's word regarding the (non)effectiveness of the medicine for granted, we decide to conduct an inferential analysis to test whether the medicine is any good, i.e. whether a person's health can actually improve after taking it.
This test, however, must be done on a random sample from the population, since you cannot run it on the entire population (which in our case would be ALL Covid-positive patients in every country), administering the medicine to all of them and then waiting to hear back whether their health improved.
Now, upon running the test on a sample of the population, let's assume you obtain a p-value of 0.03. What this means is: if the medicine were truly useless, there would be only a 3% chance of observing an improvement in health at least as large as the one seen in the sample purely due to noise, i.e. random sampling error. (In other words, if the medicine were no good, improvements like these would show up, just because of the way the sample happened to be selected, only about 3 times in 100.) For our purposes, this can be loosely reworded as: chance alone is a very unconvincing explanation for what we saw, which leaves the true effect of the novel medicine as the far more plausible one. Since such a rare fluke speaks strongly against our initial assumption, the WHO's preconceived notions regarding the medicine appear wrong, and the novel medicine is, in fact, useful in fighting Coronavirus. (One caveat: the p-value is not literally the probability that the null hypothesis is true, so treating "97%" as the probability of a true effect is a memory aid, not a formal statement.)
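To make the scenario above concrete, here is a minimal sketch of how such a p-value could be computed. The recovery scores below are entirely made up for illustration, and the permutation test is just one simple, assumption-light way to get a p-value (the post itself does not prescribe a particular test):

```python
import random
import statistics

# Hypothetical recovery scores (higher = better health) for two small
# groups; these numbers are invented purely for illustration.
treated = [7.1, 6.8, 7.5, 7.9, 6.9, 7.4, 7.2, 7.8]
control = [6.2, 6.9, 6.5, 7.0, 6.4, 6.6, 6.1, 6.8]

observed_diff = statistics.mean(treated) - statistics.mean(control)

# Permutation test: if the medicine were useless (the null hypothesis),
# the group labels would be arbitrary, so we shuffle them many times and
# count how often a difference at least as large arises by chance alone.
random.seed(42)
pooled = treated + control
n_trials = 10_000
n_extreme = 0
for _ in range(n_trials):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:8]) - statistics.mean(pooled[8:])
    if diff >= observed_diff:
        n_extreme += 1

# The p-value is the fraction of "useless medicine" worlds that look at
# least as impressive as our actual sample.
p_value = n_extreme / n_trials
print(f"observed difference: {observed_diff:.2f}, p-value: {p_value:.4f}")
```

A tiny p-value here would mean the observed improvement almost never arises when the labels carry no information, which is exactly the intuition the story is building.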
SPEAKING IN TERMS OF THE NULL AND ALTERNATIVE HYPOTHESES...
(so you can apply the interpretation to your own examples as well)
Ho: no differences exist (e.g. taking the medicine does not improve health)
H1: some differences exist (e.g. taking the medicine improves health)
Interpretations of some p-value examples:
Example 1: p-value = 0.03. Assuming the null hypothesis (Ho) is true, differences at least as large as the ones observed would arise from noise (random sampling error) only 3% of the time. A fluke that rare makes our initial assumption hard to believe; loosely speaking, that is a whopping 97% case in favor of a true effect. Thus, our initial assumption is wrong, and we must reject Ho.
Example 2: p-value = 0.82. Assuming the null hypothesis (Ho) is true, differences like the ones observed would arise from chance alone 82% of the time, i.e. they are completely unremarkable under Ho. There is no real evidence of a true effect, so our initial assumption stands and we do not reject our null hypothesis Ho.
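The two examples boil down to a one-line decision rule. A minimal sketch, assuming the conventional 0.05 significance level (the function name `decide` is my own):

```python
# Conventional significance level from the "gospel rule" at the top.
ALPHA = 0.05

def decide(p_value: float) -> str:
    """Return the textbook decision for a given p-value."""
    if p_value < ALPHA:
        return "reject Ho"
    return "fail to reject Ho"

print(decide(0.03))  # Example 1: prints "reject Ho"
print(decide(0.82))  # Example 2: prints "fail to reject Ho"
```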
In general, the mantra to remember is —
the less likely it is that random noise alone produced the observed effect (i.e. the smaller the p-value), the more likely the effect is real. Since we always assume the null hypothesis to be true to begin with, a small p-value erodes that assumption, increases our confidence in the alternative hypothesis, and leads us to reject the null hypothesis.
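To see the "chance alone" reading of the mantra concretely, here is a small self-contained example of my own (a fair-coin experiment, not from the medicine story): under the null hypothesis that a coin is fair, the exact probability of a result at least as extreme as 60 heads in 100 flips works out to roughly 3%, much like Example 1 above.

```python
import math

# Ho: the coin is fair. We observe 60 heads in 100 flips.
# The one-sided p-value is the exact probability, under Ho, of seeing
# 60 or more heads: sum of binomial probabilities C(100, i) / 2**100.
n, k = 100, 60
p_value = sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n
print(f"{p_value:.4f}")  # about 0.0284
```

Nothing about this number says the coin is biased with 97% probability; it says only that a fair coin rarely looks this lopsided, which is why we lose confidence in Ho.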
CONCLUSION
Hopefully, this post gives you a better handle on interpreting p-values the next time you see one in your calculations during significance testing.