Epidemiological studies often generate a large volume of data. Summarising these can help draw out patterns and results.
Methods used to summarise epidemiological data are addressed in detail in section 1b, Statistical Methods. Data can be summarised numerically (see “Measures of location and dispersion and their appropriate uses”) or graphically (see “Graphical methods in statistics”). The type of data collected influences which methods of numerical and graphical summary can be used.
Types of Data
Data can be broadly distinguished as categorical or numeric. Categorical data may be nominal, ordinal or binary.
Nominal data are categorical data without an order. Examples include blood groups (O, A, B, AB), eye colour and marital status.
Ordinal data are also categorical, but in this case the categories have an order and can be ranked. Examples include stages of breast cancer. Importantly, the “distances” between the different groups can vary. For example, Likert responses may have the options “strongly agree”, “agree”, “neither agree nor disagree”, “disagree” and “strongly disagree”. These options can clearly be ordered, so they are an example of ordinal data, but the difference in agreement between “agree” and “strongly agree” may not be the same as that between “agree” and “neither agree nor disagree”.
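The ordering property of Likert responses can be sketched in a few lines of Python. This is a minimal illustration only: the responses are invented, and the names `LIKERT_ORDER`, `RANK` and `responses` are hypothetical. The integer ranks encode order, not distance, so the sketch reports a median category rather than a mean of the codes.

```python
from statistics import median_low

# Likert options in rank order, from lowest to highest agreement.
LIKERT_ORDER = [
    "strongly disagree",
    "disagree",
    "neither agree nor disagree",
    "agree",
    "strongly agree",
]

# Rank of each option. The integers encode order only, not distance.
RANK = {label: position for position, label in enumerate(LIKERT_ORDER)}

# Hypothetical responses, invented purely for illustration.
responses = [
    "agree",
    "strongly agree",
    "disagree",
    "agree",
    "neither agree nor disagree",
]

# Sorting by rank is meaningful because the categories are ordered.
ranked = sorted(responses, key=RANK.get)

# median_low always returns an observed rank, so it maps back to a category.
median_response = LIKERT_ORDER[median_low(RANK[r] for r in responses)]

print(ranked)
print(median_response)
```

Averaging the ranks instead would implicitly assume that the categories are equally spaced, which ordinal data do not guarantee.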
Binary, or dichotomous, data have only two possible outcomes. Common examples are Yes/No or True/False responses, but they could also include other common epidemiological outcomes, such as “survived” and “not survived”.
Numeric data can be discrete or continuous. Discrete data can take only certain distinct values. Examples include shoe size and number of people. Continuous data can take any value, frequently within a given range. Examples include weight and length (where the range would be from zero to, theoretically, infinity).
There are four data scales: nominal, ordinal, interval and ratio. Nominal and ordinal data have already been described.
Interval data are numerical data where the differences between two numbers can be interpreted, but the ratio between two numbers is meaningless. Additionally, interval data do not have a true zero. An example is temperature measured in degrees Celsius. The difference between 10°C and 20°C is the same as the difference between 30°C and 40°C, so the differences are meaningful. However, 20°C is not twice as hot as 10°C, so the ratios are not meaningful.
Ratio data are numerical. Ratio data have a true zero and both differences and ratios are meaningful. An example is weight. The difference between 1kg and 2kg is the same as the difference between 3kg and 4kg. In addition, 2kg is twice as much as 1kg, and 10kg is twice as much as 5kg – so ratios are meaningful.
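The contrast between interval and ratio scales can be checked with simple arithmetic. The sketch below is illustrative only. The Kelvin conversion is not part of the text above; it is brought in as an assumption-free physical fact (Kelvin has a true zero) to show that the apparent 2:1 ratio in Celsius depends on where zero happens to sit. The function and variable names are hypothetical.

```python
import math


def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius to Kelvin, a temperature scale with a true zero."""
    return celsius + 273.15


# Differences on an interval scale are meaningful: both gaps are 10 degrees.
diff_low = 20.0 - 10.0
diff_high = 40.0 - 30.0

# The Celsius "ratio" is an artefact of an arbitrary zero point.
naive_ratio = 20.0 / 10.0

# Re-expressed in Kelvin, the same two temperatures give a ratio near 1.035.
kelvin_ratio = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

# Weight is a ratio scale: the ratio survives a change of unit (kg to g).
weight_ratio_kg = 2.0 / 1.0
weight_ratio_g = 2000.0 / 1000.0

print(diff_low == diff_high)
print(naive_ratio, round(kelvin_ratio, 3))
print(math.isclose(weight_ratio_kg, weight_ratio_g))
```

The weight ratio is unchanged by the choice of unit, whereas the temperature ratio changes as soon as the zero point moves.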
Note that ratio and interval data may be either discrete or continuous.
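The properties that separate the four scales can be tabulated in code. This is a rough sketch of the descriptions above, not an established classification API; the `Scale` class, the `SCALES` table and the `ratios_meaningful` helper are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scale:
    """Properties of a data scale, as described in the text above."""

    name: str
    ordered: bool                 # can values be ranked?
    meaningful_differences: bool  # are gaps between values comparable?
    true_zero: bool               # does zero mean "none of the quantity"?


# Each scale adds one property to the scale before it.
SCALES = {
    "nominal": Scale("nominal", False, False, False),
    "ordinal": Scale("ordinal", True, False, False),
    "interval": Scale("interval", True, True, False),
    "ratio": Scale("ratio", True, True, True),
}


def ratios_meaningful(scale_name: str) -> bool:
    """Ratios can be interpreted only when the scale has a true zero."""
    return SCALES[scale_name].true_zero


for scale in SCALES.values():
    print(scale.name, scale.ordered, scale.meaningful_differences, scale.true_zero)
```

For example, `ratios_meaningful("interval")` is false while `ratios_meaningful("ratio")` is true, matching the temperature and weight examples.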
Figure 1 gives a summary of the different data scales.
Figure 1. Data scales.
Information on summarising data numerically, and using plots, can be found in Chapter 1B (“Measures of location and dispersion and their appropriate uses” and “Graphical methods in statistics”, respectively).
© Helen Barratt 2009, Saran Shantikumar 2018