Statistical significance is usually assessed by appeal to a p-value. This can be calculated by taking the difference between two groups, expressing it as a number of standard errors, and referring this statistic to the normal distribution (or sometimes the t distribution, particularly when the sample is small and the outcome is approximately normally distributed in the population).
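The calculation can be sketched in a few lines. This is a minimal illustration, not a substitute for a proper statistics library: the function names and the blood-pressure-style numbers are invented for the example, and the normal distribution is used throughout (a t distribution would be more appropriate for small samples).

```python
import math

def normal_two_sided_p(z):
    """Two-sided p-value for a z statistic under the standard normal.
    P(|Z| > z) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))

def z_test(mean1, sd1, n1, mean2, sd2, n2):
    """Difference in means divided by its standard error,
    referred to the normal distribution."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # standard error of the difference
    z = (mean1 - mean2) / se
    return z, normal_two_sided_p(z)

# Made-up trial data: means, standard deviations, and group sizes.
z, p = z_test(132.0, 10.0, 100, 128.0, 10.0, 100)
# z ≈ 2.83, p ≈ 0.005: the difference is about 2.8 standard errors,
# which would arise by chance in well under 1 in 100 null trials.
```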

The p-value is a probability. It can take any value between 0 (impossible) and 1 (certain). It represents the predicted proportion of results which would be as extreme as or more extreme than the one we have observed purely by chance, if there were no true underlying difference between the treatments.

An arbitrary threshold for statistical significance must be set, preferably in advance by the trialists, and is usually chosen as p<0.05 (5% or 1 in 20) or sometimes p<0.01 (1% or 1 in 100).

The p-value is often misinterpreted as the probability that there is no difference between the treatments being compared. It is not as simple as this, unfortunately. The p-value is a conditional probability, defined as the probability of observing a result as or more extreme purely by chance if there were in fact no difference between the treatments. It is a subtle distinction, but an important one. For example, if we were to do 1000 trials of distilled water vs distilled water, we would get roughly 50 results (5%, or 1 in 20) with a p-value of less than 0.05, but 100% of these would be false positives. If we were to do 1000 trials of jumping out of a high-altitude aeroplane without a parachute vs staying tucked up in bed, there would be no false positive results at all.
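The distilled-water thought experiment can be run as a simulation. The sketch below (assumed parameters: 1000 trials, 50 participants per arm, both arms drawn from the same distribution so there is never a true difference) counts how often a trial comes out "significant" at p < 0.05; every such result is by construction a false positive.

```python
import math
import random
import statistics

random.seed(1)

def two_sided_p(z):
    """Two-sided p-value for a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

trials = 1000
n = 50  # participants per arm (illustrative choice)
false_positives = 0

for _ in range(trials):
    # Both arms sampled from the same population: "water vs water",
    # so the null hypothesis is true in every trial.
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
    z = (statistics.mean(a) - statistics.mean(b)) / se
    if two_sided_p(z) < 0.05:
        false_positives += 1

# Roughly 5% of trials (about 50 of the 1000) cross the threshold
# purely by chance, and all of them are false positives.
print(false_positives)
```

The exact count varies from run to run, but it hovers around the 5% the threshold predicts, which is the point of the example: the p-value controls how often chance alone crosses the line, not how likely a given positive result is to be real.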
