When an outcome measure is continuous, that is, when the outcome for each participant is a measurement on a numerical scale, the results are usually summarised as means (the sum of all the measurements divided by the number of participants), and the effect size as the difference between the means of the groups being compared. You may sometimes see the median reported instead if the data are skewed (the distribution of results is asymmetrical). If the sample is small and skewed, the statistical tests used to compare the groups might rank the scores rather than use the means directly.
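The difference between these summaries can be seen on a small, made-up dataset in which one participant in the treatment group has an unusually large value. The sketch below computes the difference in means, the difference in medians and a Mann-Whitney U statistic (a common rank-based test statistic, chosen here as an illustration). The function names `rank` and `summarise` are hypothetical helpers, not part of any library.

```python
from statistics import fmean, median


def rank(values):
    """Assign ranks 1..n to values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def summarise(treatment, control):
    """Difference in means, difference in medians, and Mann-Whitney U for treatment."""
    diff_means = fmean(treatment) - fmean(control)
    diff_medians = median(treatment) - median(control)
    # Rank the pooled data, then sum the ranks belonging to the treatment group
    pooled_ranks = rank(treatment + control)
    r1 = sum(pooled_ranks[: len(treatment)])
    u = r1 - len(treatment) * (len(treatment) + 1) / 2
    return diff_means, diff_medians, u


# Hypothetical data: one extreme value (30) in the treatment group
print(summarise([4, 5, 6, 7, 30], [2, 3, 4, 5, 6]))
```

Here the single value of 30 pulls the mean difference up to 6.4, whereas the median difference is 2 and the U statistic depends only on the ordering of the values. In practice a library routine such as `scipy.stats.mannwhitneyu` would also supply a p-value.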
In some trials it is possible to measure the outcome of interest both before and after treatment. In these cases, there are three different ways in which the data could be analysed:
- Ignore the baseline data, compare the follow-up measurements directly, and trust that randomisation has balanced the groups at baseline (if it is an RCT)
- Analyse the mean change in measurement from baseline to follow-up to account for the influence of the baseline measurement
- Analyse the follow-up scores (or, equivalently, the change scores) whilst adjusting for the baseline scores in a regression model, an approach known as analysis of covariance (ANCOVA)
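The three approaches can be compared on a tiny, deliberately imbalanced dataset. This is a minimal sketch assuming one baseline and one follow-up measurement per participant; the data and the names `three_estimates` and `sum_centred_products` are hypothetical. For the regression approach it uses the standard closed form of ANCOVA with a common slope: the follow-up difference minus the pooled within-group slope times the baseline difference.

```python
from statistics import fmean


def sum_centred_products(xs, ys):
    """Sum of (x - mean(x)) * (y - mean(y)); with xs == ys this is a sum of squares."""
    mx, my = fmean(xs), fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def three_estimates(base_t, follow_t, base_c, follow_c):
    """Treatment effect (treatment minus control) under each of the three approaches."""
    # Method 1: ignore baseline and compare follow-up means
    est_followup = fmean(follow_t) - fmean(follow_c)

    # Method 2: compare the mean change from baseline to follow-up
    change_t = [f - b for b, f in zip(base_t, follow_t)]
    change_c = [f - b for b, f in zip(base_c, follow_c)]
    est_change = fmean(change_t) - fmean(change_c)

    # Method 3: ANCOVA, regressing follow-up on group and baseline with a common slope.
    # The slope is the pooled within-group regression of follow-up on baseline, and the
    # adjusted effect removes the part of the difference explained by baseline imbalance.
    slope = (
        (sum_centred_products(base_t, follow_t) + sum_centred_products(base_c, follow_c))
        / (sum_centred_products(base_t, base_t) + sum_centred_products(base_c, base_c))
    )
    est_ancova = est_followup - slope * (fmean(base_t) - fmean(base_c))
    return est_followup, est_change, est_ancova


# Hypothetical trial: the treatment group starts one point higher at baseline
print(three_estimates([2, 3, 4], [5, 7, 9], [1, 2, 3], [2, 4, 6]))  # (3.0, 2.0, 1.0)
```

The three answers differ only because the made-up groups differ at baseline. Ignoring baseline and analysing change scores are the same adjustment with the slope fixed at 0 and 1 respectively, whereas ANCOVA estimates the slope from the data. In a real RCT, baseline differences arise only by chance, and on average all three estimates agree.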
In a randomised trial, all of these methods are acceptable, in that none of them introduces bias into the estimate of the treatment effect. However, the third method is the most powerful statistically: by accounting for the baseline it removes some of the variability from the data, so the standard error of the estimated treatment effect is generally smaller than that obtained by the other methods.
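A standard approximate calculation shows why. Assume, for illustration, that the outcome has the same standard deviation $\sigma$ at baseline and follow-up, that the two measurements have correlation $\rho$, that each arm has $n$ participants, and write $\hat\tau$ for the estimated treatment effect. Ignoring the small extra uncertainty from estimating the regression slope:

```latex
\begin{aligned}
\operatorname{Var}(\hat\tau_{\text{follow-up}}) &= \frac{2\sigma^2}{n} \\
\operatorname{Var}(\hat\tau_{\text{change}})    &= \frac{2\sigma^2}{n}\cdot 2(1-\rho) \\
\operatorname{Var}(\hat\tau_{\text{ANCOVA}})    &\approx \frac{2\sigma^2}{n}\cdot (1-\rho^2)
\end{aligned}
```

Because $2(1-\rho) - (1-\rho^2) = (1-\rho)^2 \ge 0$ and $1 - (1-\rho^2) = \rho^2 \ge 0$, the baseline-adjusted analysis is never less precise than the other two under these assumptions. The change-score analysis beats ignoring baseline only when $\rho > 0.5$.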
Sometimes you will see the standardised mean difference (SMD) used, especially in meta-analyses where the outcome of interest was measured in different ways in the included trials, for example with different measures of depression, headache severity or daily exercise. To place all the trial results on the same scale, the difference in means for each trial is divided by the standard deviation of the outcome in that trial (usually the pooled standard deviation of the two groups), so that the results are effectively expressed as the number of standard deviations separating the groups. This is sometimes the only method available when combining such data, but the results are difficult to interpret. There are two main problems. First, some trials may appear to have a larger treatment effect simply because they recruited a much more tightly defined population: a more homogeneous population has a smaller standard deviation, which inflates the standardised difference even when the raw difference in means is the same. Second, the resulting numbers have no natural interpretation until they are translated back into the units of the various instruments concerned.
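The calculation, and the problem with tightly defined populations, can be sketched in a few lines. This assumes the common Cohen's d form of the SMD (difference in means divided by the pooled standard deviation); many meta-analysis tools report Hedges' g instead, which applies a small-sample correction not shown here. The trial data, the helper names and the reference standard deviation used for back-translation are all hypothetical.

```python
from statistics import fmean, stdev


def pooled_sd(a, b):
    """Pooled standard deviation of two groups, weighting each variance by its degrees of freedom."""
    na, nb = len(a), len(b)
    return (((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)) ** 0.5


def smd(treatment, control):
    """Standardised mean difference in the Cohen's d form."""
    return (fmean(treatment) - fmean(control)) / pooled_sd(treatment, control)


def to_scale_units(smd_value, reference_sd):
    """Re-express an SMD on a familiar instrument by multiplying by a typical SD for that instrument."""
    return smd_value * reference_sd


# Two hypothetical trials with the same 2-point raw difference in means
trial_a = smd([8, 10, 12], [6, 8, 10])  # broader population, pooled SD 2
trial_b = smd([9, 10, 11], [7, 8, 9])   # tighter population, pooled SD 1
print(trial_a, trial_b)  # 1.0 2.0
```

The tighter trial reports twice the standardised effect for the same raw benefit. Converting back with `to_scale_units` requires choosing a representative standard deviation for the target instrument, which is itself an assumption the analyst must justify.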