Statistics: Graphical Methods
This section covers:
- Dot-plots
- Histograms
- Box-whisker plots
- Scatter plots
References
- Campbell MJ, Machin D. Medical Statistics: A Commonsense Approach. Chichester: Wiley; 1999. Chapter 4.
- Simpson A. PhD thesis. Institute of Primary Care, University of Sheffield; 2004.
A picture is worth a thousand words, or numbers, and there is no better way of getting a 'feel' for the data than to display them in a figure or graph. The general principle should be to convey as much information as possible in the figure, with the constraint that the reader is not overwhelmed by too much detail.
Dot Plots
The simplest way of conveying as much information as possible is to show all of the data, and this can conveniently be done with a dot-plot.
Example
Data on birth weight and type of delivery are shown in Figure 1 as a dot-plot. This method of presentation retains the individual subject values and shows differences between the groups in a readily appreciated manner. An additional advantage is that any outliers will be evident in such a plot. However, such a presentation is not usually practical with large numbers of subjects in each group, because the dots will overlap and obscure the details of the distribution.
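A dot-plot can be produced even without plotting software. The Python sketch below draws a text version in which every observation appears as an 'o' on a common horizontal scale, with values that land in the same column stacked so that none is hidden. The function name `dot_plot`, the scale limits `lo` and `hi`, the 40-character width and the example values are illustrative assumptions, not part of the source; a figure such as Figure 1 would normally be drawn with a plotting package.

```python
from typing import Dict, List, Sequence


def dot_plot(groups: Dict[str, Sequence[float]], lo: float, hi: float,
             width: int = 40) -> List[str]:
    """Return a text dot-plot: one panel per group, every observation drawn as 'o'.

    All groups share one horizontal scale from lo to hi; observations falling in
    the same character column are stacked vertically, so no value is hidden.
    """
    if hi <= lo or width < 2:
        raise ValueError("need hi > lo and width >= 2")
    lines: List[str] = []
    for name, values in groups.items():
        counts = [0] * width
        for v in values:
            # Map the value to a column on the shared scale, clamping to the ends.
            col = round((v - lo) / (hi - lo) * (width - 1))
            counts[min(max(col, 0), width - 1)] += 1
        lines.append(f"{name}:")
        # Print the tallest stack level first so the dots build up from the baseline.
        for level in range(max(counts, default=0), 0, -1):
            lines.append("".join("o" if c >= level else " " for c in counts).rstrip())
        lines.append("-" * width)
    return lines


if __name__ == "__main__":
    # Hypothetical birth weights (g), invented for illustration; NOT the Simpson (2004) data.
    example = {"Normal": [3100, 3350, 3400, 3400, 3650], "Caesarean": [2800, 3000, 3500]}
    for line in dot_plot(example, lo=2500, hi=4000):
        print(line)
```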
Histograms
In a large data set of a continuous variable, patterns may be revealed by forming a histogram. This is constructed by first dividing the range of the variable into several non-overlapping intervals of equal width (classes or bins), then counting the number of observations in each. A histogram of all 98 birth weights in the Simpson (2004) data is shown in Figure 2. The area of each histogram block is proportional to the number of subjects in that birth-weight category, so the total area of the blocks represents the total number of subjects. Relative frequency histograms, in which each block represents the proportion rather than the number of subjects, allow comparison between histograms made up of different numbers of observations, which may be useful when studies are compared.
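The construction just described (equal-width, non-overlapping bins, then a count per bin) can be sketched in a few lines of Python. Because every bin has the same width, each block's height is proportional to its area, so plotting the counts directly preserves the area property; `relative=True` gives the heights of a relative frequency histogram instead. The function name and the convention that the largest value goes into the last bin are assumptions of this sketch.

```python
from typing import List, Sequence, Tuple


def histogram(values: Sequence[float], n_bins: int,
              relative: bool = False) -> List[Tuple[float, float, float]]:
    """Split the range of `values` into n_bins equal, non-overlapping intervals.

    Returns (lower, upper, height) for each bin, where height is the count, or
    the proportion of all observations when relative=True.
    """
    if not values or n_bins < 1:
        raise ValueError("need at least one value and at least one bin")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against all values being equal
    counts = [0] * n_bins
    for v in values:
        # Bins are [lower, upper); the maximum value is placed in the last bin.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    n = len(values)
    return [(lo + i * width, lo + (i + 1) * width, c / n if relative else c)
            for i, c in enumerate(counts)]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    for lower, upper, height in histogram([2.9, 3.1, 3.2, 3.4, 3.4, 3.6, 4.0], 4):
        print(f"{lower:.2f}-{upper:.2f}: {'#' * int(height)}")
```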
The choice of the number of intervals is important. Too few intervals and much important information may be smoothed out; too many intervals and the underlying shape will be obscured by a mass of confusing detail. It is usual to choose between 5 and 15 intervals, but the correct choice will be based partly on a subjective impression of the resulting histogram. Histograms with bins of unequal interval length can be constructed but they are usually best avoided.
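Sturges' rule, a common textbook starting point that is not mentioned in the source, suggests about ceil(log2 n) + 1 intervals for n observations; for the 98 birth weights this gives 8. The hypothetical helper below applies the rule and clamps the answer to the 5 to 15 range quoted above. It is only a first guess: as the text says, the final choice should still be checked by looking at the resulting histogram.

```python
import math


def suggested_bins(n: int, low: int = 5, high: int = 15) -> int:
    """Sturges' rule, k = ceil(log2 n) + 1, clamped to the range [low, high]."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    k = math.ceil(math.log2(n)) + 1
    return max(low, min(high, k))


if __name__ == "__main__":
    print(suggested_bins(98))
```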
Box-Whisker Plot
If the number of points is large, a dot-plot can be replaced by a box-whisker plot, which is more compact than the corresponding histogram. Figure 3 shows such a plot of birth weight by type of delivery from Simpson (2004).
The 'whiskers' in the diagram indicate the minimum and maximum values of the variable under consideration. The median is indicated by the central horizontal line, and the lower and upper quartiles by the lower and upper ends of the box. The box-whisker plot as used here therefore displays the median and two measures of spread, namely the range and the interquartile range.
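The five numbers drawn in a box-whisker plot of this kind, together with the two measures of spread, can be computed with Python's standard `statistics` module. Quartile definitions differ between textbooks and software packages; this sketch assumes the `"inclusive"` method of `statistics.quantiles` (Python 3.8 or later), so its quartiles may differ slightly from other programs on small samples. The function name is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def box_whisker_summary(values: Sequence[float]) -> Dict[str, float]:
    """Numbers shown by a box-whisker plot whose whiskers run to the min and max."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # Quartiles by the "inclusive" method (an assumed convention; others exist).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lo, hi = min(values), max(values)
    return {
        "min": lo, "q1": q1, "median": median, "q3": q3, "max": hi,
        "range": hi - lo,   # spread shown by the whiskers
        "iqr": q3 - q1,     # spread shown by the box
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(box_whisker_summary([2.9, 3.1, 3.2, 3.4, 3.4, 3.6, 4.0]))
```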
Scatterplots
When one wishes to show the relationship between two continuous variables, a scatterplot is the natural choice. Figure 4 shows a scatterplot of birth weight against maternal age.
A scatterplot shows at a glance whether two variables are associated, for example whether high values of one tend to occur with high values of the other. When neither variable can be regarded as depending on the other, it is immaterial which variable is plotted on which axis. However, if one variable, x, clearly causes the other, y, then it is usual to plot the x variable on the horizontal axis and the y variable on the vertical axis. Thus if a drug is given in various doses, the doses would be along the x-axis and the response measure on the y-axis.
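A rough text scatterplot that follows this convention, with the explanatory variable x across and the response y up, can be sketched as follows. The function `text_scatter`, its grid size and the example values are illustrative assumptions; points falling in the same character cell are drawn only once, so a plotting package is preferable for real data.

```python
from typing import List, Sequence


def text_scatter(x: Sequence[float], y: Sequence[float],
                 width: int = 40, height: int = 15) -> List[str]:
    """Text scatterplot: x (explanatory) on the horizontal axis, y (response) vertical."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of the same length")
    x_lo, x_hi = min(x), max(x)
    y_lo, y_hi = min(y), max(y)
    x_span = (x_hi - x_lo) or 1.0  # avoid dividing by zero when all x are equal
    y_span = (y_hi - y_lo) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        col = round((xi - x_lo) / x_span * (width - 1))
        # Row 0 is printed first (top of the plot), so large y gets a small row index.
        row = height - 1 - round((yi - y_lo) / y_span * (height - 1))
        grid[row][col] = "*"  # coinciding points overwrite each other
    return ["|" + "".join(r) for r in grid] + ["+" + "-" * width]


if __name__ == "__main__":
    # Hypothetical dose (x) and response (y) values, invented for illustration.
    for line in text_scatter([1, 2, 4, 8], [10, 14, 19, 27], width=20, height=8):
        print(line)
```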
© MJ Campbell 2006

