How much p-value?

Ok so this is my first post in a long while. Life gets in the way sometimes. If I ever intend to stop posting, I’ll at least do you the courtesy of letting you know.

We all need a break from time to time, but now that I am back with fresh eyes and ears, what do I see and hear everywhere? P-values, yes P-values. If someone asked me what the first major step is on the journey towards conquering statistics, I would say understanding p-values. This is because almost every typical scientific study uses a p-value to judge whether its data are consistent with the assumption that there is no real effect.

For example, of the individuals who develop a certain rash, suppose the mean recovery time for those who do not use any form of treatment is 30 days, with a standard deviation of 8 days. A pharmaceutical company manufacturing a certain cream wishes to determine whether the cream shortens, extends, or has no effect on the recovery time. The company chooses a random sample of 100 individuals who have used the cream (the test sample), and determines that the mean recovery time for these individuals was 28.5 days. Does the cream have any effect?

The p-value is the probability of seeing a difference at least as large as the one from 30 to 28.5 purely by random chance, if the cream actually had no effect. Intuitively we can see that the difference is small. In any case, how is this calculated? Well, typically we assume that sample means follow a “Normal” bell curve probability distribution, with the no-treatment mean of 30 at the midpoint, under the top of the bell. We can then plot where 28.5 sits on the curve.

To measure distance on the probability distribution we use z-scores. A z-score is the number of standard deviations a score is above or below the mean. Because we are comparing a sample mean rather than a single individual, the standard deviation we divide by is that of the sample mean (the standard error), which is the standard deviation divided by the square root of the sample size.

z-score = (sample mean – population mean) / (standard deviation / sqrt(n)), where n is the sample size

28.5 – 30 = -1.5

8/sqrt(100) = 8/10 = 0.8

-1.5 / 0.8 = -1.875
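The same arithmetic can be sketched in a few lines of Python. This is only a minimal illustration using the numbers from the rash example; the variable names are my own and not part of any library.

```python
import math

# Numbers taken from the rash example above.
population_mean = 30.0  # mean recovery time without treatment (days)
population_sd = 8.0     # standard deviation without treatment (days)
sample_mean = 28.5      # mean recovery time of the cream sample (days)
n = 100                 # sample size

# Standard deviation of the sample mean (the standard error).
standard_error = population_sd / math.sqrt(n)

# How many standard errors the sample mean sits from the population mean.
z_score = (sample_mean - population_mean) / standard_error

print(f"standard error = {standard_error:.3f}")
print(f"z-score = {z_score:.3f}")
```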

Our z-score of -1.875 is just inside 2 standard deviations to the left of the mean (which sits at 0 on the standardized scale). Quickly refreshing our memory: a standard deviation is a measure of spread, or how dispersed the data are around the mean. When looking at a bell curve, about 68% of the measures lie within one standard deviation of the mean, about 95% lie within two standard deviations, and a whopping 99.7% fall within three standard deviations. This is known as the Empirical or Three Sigma Rule. So at this point we can see that our -1.875 z-score tells us that our 28.5 statistic falls within the central 95% of the sample means we would expect from people who didn’t use the cream. In short, z-scores allow us to compare any one score to any other score in a distribution, because the z-score is a standardized metric based on the mean and standard deviation.
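If you want to check the 68 / 95 / 99.7 figures yourself, here is a small sketch that uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later). The `shares` dictionary is just a name I picked for the illustration.

```python
from statistics import NormalDist

# Standard Normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of the curve lying within k standard deviations of the mean.
shares = {}
for k in (1, 2, 3):
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {shares[k]:.1%}")
```

The printed shares come out at roughly 68.3%, 95.4% and 99.7%, which is where the rounded figures in the rule come from.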

Now back to p-values. The next step is to take our z-score and convert it into a probability, i.e. the likelihood that a sample mean at least as far from 30 as our 28.5 would occur due to random chance alone. Calculating these probabilities by hand is tedious, so a z-score table is used to look up the corresponding probability.

The left column in a z-table shows values to the tenths place, while the top row shows the hundredths place. We therefore need to round our z-score to the hundredths place, so -1.875 becomes -1.88, which looks up to

p-value 0.0301
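A z-table is really just a printed version of the Normal cumulative distribution function, so the lookup can be reproduced in code. This is a minimal sketch using `NormalDist` from the standard `statistics` module; `z_rounded` and `left_tail` are illustrative names of my own.

```python
from statistics import NormalDist

# z-score rounded to the hundredths place, as for the table lookup.
z_rounded = -1.88

# Area under the standard Normal curve to the left of z_rounded.
left_tail = NormalDist().cdf(z_rounded)

print(f"left-tail probability = {left_tail:.4f}")
```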

However, in this hypothesis test we are testing whether the average recovery time is either less than OR greater than the control group’s 30 days. Because of this we use a two-tailed test, which counts results this extreme on either side of the mean. The value we looked up only covers the left tail, so we have to multiply it by 2, and now we have the final p-value of

0.0602
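Putting the doubling step into code, here is a small sketch of a two-tailed p-value from a z-score. The function name `two_tailed_p_value` and the `alpha` variable are my own choices for illustration. Note that the software value differs slightly from the table value in the last decimal place, because the table works with rounded numbers.

```python
from statistics import NormalDist


def two_tailed_p_value(z):
    """Probability of a z-score at least this far from 0 in either direction."""
    return 2.0 * NormalDist().cdf(-abs(z))


alpha = 0.05  # conventional significance cut-off

p_from_table_z = two_tailed_p_value(-1.88)   # z rounded for the table lookup
p_from_exact_z = two_tailed_p_value(-1.875)  # unrounded z-score

print(f"p-value with z = -1.88:  {p_from_table_z:.4f}")
print(f"p-value with z = -1.875: {p_from_exact_z:.4f}")
print("below the cut-off?", p_from_exact_z < alpha)
```

Either way the p-value lands a little above 0.05.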

That is just greater than 0.05, so we fail to reject the null hypothesis that the cream has no effect. In other words, the result is not statistically “significant”. But why the 0.05 cut-off? Well, in 1926 Ronald Fisher wrote in his paper “The Arrangement of Field Experiments”:

“Personally, the writer prefers to set a low standard of significance at the 5 per cent point, and ignore entirely all results which fail to reach this level.”

And that’s the real reason: Fisher said he liked 0.05 and everybody else just ran with it. Fast forward almost 100 years, though, and it has become apparent that 0.05 has some shortcomings. I’ll finish this post by leaving you with this very interesting video on p-values, which also covers a concept called “p-hacking”.