Statistical Significance | Health Knowledge

In a well-conducted randomised controlled trial, the groups being compared should differ from each other only by chance and the treatment received. We can quantify chance, and so we can quantify the treatment effect in relation to it. If the difference between groups is large compared to the play of chance, we can conclude that the difference is probably due to the treatment; if it is small by comparison then we cannot exclude chance as the reason for the results.

The same statistical methods can be used for non-randomised research designs, but the interpretation is not as straightforward as for an RCT. This is because the measured effect may be influenced by other factors (such as confounding) that introduce bias and cannot be fully accounted for in the analysis.

The variability (the amount of "noise" due to chance) within the population(s) studied may be described (for a continuous outcome) by the standard deviation of the sample. This is calculated by squaring the difference between the group mean and the individual score for each participant, summing these squared differences, dividing by the number of participants minus one, and taking the square root. The standard error of the mean for each group can then be calculated by dividing this standard deviation by the square root of the number of participants.
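As a minimal sketch of this calculation, the Python below computes the sample standard deviation (square root of the summed squared deviations from the mean, divided by n - 1) and the standard error of the mean. The function names and the scores are hypothetical, chosen only for illustration.

```python
import math

def standard_deviation(scores):
    """Sample standard deviation (n - 1 in the denominator)."""
    n = len(scores)
    mean = sum(scores) / n
    # Square each deviation from the mean so positives and negatives do not cancel.
    sum_squared_diffs = sum((x - mean) ** 2 for x in scores)
    return math.sqrt(sum_squared_diffs / (n - 1))

def standard_error_of_mean(scores):
    """Standard error of the mean: SD divided by the square root of n."""
    return standard_deviation(scores) / math.sqrt(len(scores))

# Hypothetical outcome scores for one trial arm (not real data).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
sd = standard_deviation(scores)
sem = standard_error_of_mean(scores)
print(f"SD = {sd:.3f}, SE of mean = {sem:.3f}")
```

Note that the standard error shrinks as the sample grows, whereas the standard deviation does not: it simply settles on the variability present in the population.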

It is important to distinguish between these two measures of variability. The standard deviation describes the amount of variation found between individuals in the population, whereas the standard error predicts the variability of the mean for similarly sized samples drawn from the same population. The standard deviation is a descriptive tool, whereas the standard error is used to make predictions or assess statistical significance. There is no meaningful interpretation of the standard deviation for binary data, but the standard error of RR, OR and HR can be calculated from the events observed (the precise formulae are easy to look up and do not matter here).
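For readers who want to see what such a calculation looks like, here is a sketch using the commonly quoted large-sample formulae for the standard error of the log risk ratio and log odds ratio from a 2x2 table. The source does not state which formulae it has in mind, so treat these as an assumption; the event counts are hypothetical.

```python
import math

def log_rr_and_se(events_t, n_t, events_c, n_c):
    """ln(RR) and its approximate standard error from events and group sizes."""
    rr = (events_t / n_t) / (events_c / n_c)
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return math.log(rr), se

def log_or_and_se(events_t, n_t, events_c, n_c):
    """ln(OR) and its approximate standard error from events and group sizes."""
    a, b = events_t, n_t - events_t   # treated: events, non-events
    c, d = events_c, n_c - events_c   # control: events, non-events
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.log(odds_ratio), se

# Hypothetical trial: 15/100 events on treatment, 30/100 on control.
ln_rr, se_ln_rr = log_rr_and_se(15, 100, 30, 100)
ln_or, se_ln_or = log_or_and_se(15, 100, 30, 100)
print(f"ln(RR) = {ln_rr:.3f}, SE = {se_ln_rr:.3f}")
print(f"ln(OR) = {ln_or:.3f}, SE = {se_ln_or:.3f}")
```

Both standard errors depend only on the counts in the table, which is why no standard deviation is needed for binary outcomes. The formulae break down when any cell is zero.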

The normal distribution provides us with a particularly convenient method for assessing statistical significance because its shape is closely related to the variability in the data: around 95% of estimates will lie within ~2 standard errors of the true underlying effect, and around 99.7% lie within ~3 standard errors. When our summary statistic is normally distributed we can therefore simply compare the size of the treatment effect to the size of the standard error to establish whether it is large or small compared to the amount of variability in the data. Happily, we can use the normal distribution for all the major outcome types discussed in this tutorial. For a continuous outcome, the distribution of the mean for a reasonably large sample (>20) will be close to normal even if the characteristic is not normally distributed in the population (this is known as the Central Limit Theorem). RR, OR and HR are not normally distributed but their natural logarithms (lnRR, lnOR, lnHR) are approximately so, allowing us to use a simple transformation and the normal distribution to analyse these statistics.
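The comparison of an effect with its standard error can be sketched as follows: divide the effect by its standard error to get a z statistic, read the two-sided p-value from the standard normal distribution, and build a 95% confidence interval on the log scale before transforming back. The input values (a risk ratio of 0.5 with a standard error of about 0.28 for its logarithm) are hypothetical, and the function name is illustrative only.

```python
import math
from statistics import NormalDist

def z_test(effect, se):
    """z statistic, two-sided p-value and 95% CI for a normally distributed estimate."""
    z = effect / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    half_width = 1.96 * se  # ~2 standard errors either side covers about 95%
    return z, p_value, (effect - half_width, effect + half_width)

# Hypothetical estimate: RR = 0.5, analysed on the log scale.
ln_rr = math.log(0.5)
se_ln_rr = math.sqrt(0.08)

z, p_value, (low, high) = z_test(ln_rr, se_ln_rr)
# Back-transform the interval from ln(RR) to the RR scale.
rr_ci = (math.exp(low), math.exp(high))
print(f"z = {z:.2f}, p = {p_value:.4f}")
print(f"RR = 0.50, 95% CI {rr_ci[0]:.2f} to {rr_ci[1]:.2f}")
```

Here the effect is more than two standard errors from zero, so the interval for RR excludes 1 and the result would conventionally be called statistically significant at the 5% level.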

Other sorts of statistical test, for example rank tests or chi-squared, use the same principles but compare the results to different distributions. In all cases, the approach is to ask how extreme the observed result is compared to what we would expect to observe by chance.
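As one illustration of a test that uses a different reference distribution, the sketch below computes Pearson's chi-squared statistic for a 2x2 table (without continuity correction) and compares it to the chi-squared distribution with one degree of freedom. The counts are hypothetical, and the choice of the uncorrected statistic is an assumption made for simplicity.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) and p-value for a 2x2 table.

    a, b = events and non-events in group 1; c, d = the same for group 2.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, chi-squared is the square of a standard normal,
    # so the upper-tail probability can be written with the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical trial: 15 events / 85 non-events vs 30 events / 70 non-events.
chi2, p_value = chi_squared_2x2(15, 85, 30, 70)
print(f"chi-squared = {chi2:.2f}, p = {p_value:.4f}")
```

The logic is the same as before: the statistic measures how far the observed table is from what chance alone would produce, and the p-value says how often a result at least that extreme would arise by chance.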