Sources of variation, its measurement and control

A principal assumption in epidemiology is that we can draw an inference about the experience of the entire population from the evaluation of a representative sample of that population. However, errors in the measured data may affect the results of an epidemiological study. For example, the results may be influenced by the play of chance because of random variation from sample to sample.¹ This is also important when carrying out surveys, which are addressed elsewhere in this section (see “The design of documentation for recording survey data”).

Types of errors in measurement

Some measurement or classification errors are almost inevitable in epidemiological studies and they may affect the assessment of the exposure or the outcome, as well as potential confounders. There are several different types of measurement error, outlined below.

1. Measurement error (reliability and validity)

All epidemiological investigations involve the measurement of exposures, outcomes and other characteristics of interest (e.g. potential confounding factors).

Types of measures may include:

Responses to self-administered questionnaires
Responses to interview questions
Laboratory results
Physical measurements
Information recorded in medical records
Diagnosis codes from a database

All these measures may be subject to some degree of measurement error, and where that error is systematic it introduces bias into the study. The research instruments used to measure exposure, disease status and other variables of interest should therefore be both valid and reliable.

a) Validity

The degree to which an instrument is capable of accurately measuring what it intends to measure is referred to as its validity. Examples include how well a questionnaire measures the exposure or outcome in a prospective cohort study, or how accurately a diagnostic test identifies people with the disease.

Assessing validity requires that an error-free reference test or 'gold standard' is available to which the measure can be compared.
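When both the measure and the gold standard are binary (e.g. disease present or absent), validity is commonly summarised as sensitivity and specificity. The sketch below is a minimal illustration, assuming a hypothetical validation study in which a questionnaire is compared with an error-free reference; the function name and the counts are invented for illustration.

```python
def validity_vs_gold_standard(test_results, gold_standard):
    """Sensitivity and specificity of a binary measure against a gold standard.

    Both arguments are equal-length sequences of booleans (True = condition present).
    """
    if len(test_results) != len(gold_standard):
        raise ValueError("test and gold-standard results must be paired")
    pairs = list(zip(test_results, gold_standard))
    tp = sum(t and g for t, g in pairs)              # measure positive, truly positive
    fn = sum((not t) and g for t, g in pairs)        # measure negative, truly positive
    tn = sum((not t) and (not g) for t, g in pairs)  # measure negative, truly negative
    fp = sum(t and (not g) for t, g in pairs)        # measure positive, truly negative
    sensitivity = tp / (tp + fn)  # share of true positives the measure detects
    specificity = tn / (tn + fp)  # share of true negatives the measure detects
    return sensitivity, specificity


# Hypothetical data: 100 people with the condition and 200 without (per the gold standard)
questionnaire = [True] * 80 + [False] * 20 + [True] * 30 + [False] * 170
gold = [True] * 100 + [False] * 200
se, sp = validity_vs_gold_standard(questionnaire, gold)
print(f"sensitivity = {se:.2f}, specificity = {sp:.2f}")
```

With these invented counts the questionnaire detects 80 of the 100 true cases and correctly classifies 170 of the 200 non-cases.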

b) Reliability (reproducibility)

Reliability, also known as reproducibility, refers to the consistency of the performance of an instrument over time and among different observers. Repeatability (also known as test-retest reliability) refers to the consistency of repeated measurements made on a single subject, using the same instrument, under the same conditions.

More detail on validity and reliability can be found in the chapter “Validity, reliability and generalisability”.

2. Random error (chance)

Chance differences between the true and recorded values may result in an apparent association between an exposure and an outcome. Such variation may arise from unbiased measurement error (e.g. an individual's recorded weight can vary between measurements because of the limited precision of the scales) or from biological variation within an individual (e.g. blood pressure or body temperature, which are likely to vary between measurements). Random error may therefore produce an estimate that differs from the true underlying value, in either direction: it may result in an under- or an overestimation of the true value.
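The behaviour of unbiased random error can be illustrated with a small simulation. This is a sketch under stated assumptions: a hypothetical person whose true weight is 70 kg, weighed on scales whose error is assumed to be normally distributed with mean zero and SD 0.5 kg. Individual readings fall both above and below the true value, but because the error has no systematic direction, the average of many readings lies close to the truth.

```python
import random
import statistics

TRUE_WEIGHT_KG = 70.0  # hypothetical true weight of one individual
SCALE_SD_KG = 0.5      # assumed SD of the (zero-mean) random error of the scales


def weigh(rng: random.Random) -> float:
    """One reading: the true value plus zero-mean random error."""
    return TRUE_WEIGHT_KG + rng.gauss(0.0, SCALE_SD_KG)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
readings = [weigh(rng) for _ in range(1000)]

# Errors of single readings go both ways (under- and overestimation)...
errors = [r - TRUE_WEIGHT_KG for r in readings]
print(f"single-reading errors range from {min(errors):.2f} to {max(errors):.2f} kg")
# ...but they largely cancel out on average
print(f"mean of 1000 readings: {statistics.mean(readings):.2f} kg")
```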

3. Systematic error (bias)

This is a consistent difference between the recorded value and the true value in a series of observations, which results in some individuals being systematically misclassified. For example, if an individual's height is always measured while the person is wearing the same shoes, the measurements will be consistent but will systematically overestimate the person's true height.
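The shoe example can be contrasted with random error in a short simulation. As a hedged sketch, it assumes a hypothetical true height of 170 cm, shoes adding a constant 2.5 cm, and a small zero-mean random error (SD 0.3 cm) from the measuring device. The readings are tightly clustered (reliable), yet their mean stays about 2.5 cm too high however many readings are taken, because averaging does not remove a systematic error.

```python
import random
import statistics

TRUE_HEIGHT_CM = 170.0  # hypothetical true height
SHOE_HEIGHT_CM = 2.5    # assumed constant offset from always wearing the same shoes
RANDOM_SD_CM = 0.3      # assumed small random error of the measuring device

rng = random.Random(1)  # fixed seed for reproducibility
readings = [TRUE_HEIGHT_CM + SHOE_HEIGHT_CM + rng.gauss(0.0, RANDOM_SD_CM)
            for _ in range(1000)]

spread = statistics.stdev(readings)                # small: the measurement is consistent
bias = statistics.mean(readings) - TRUE_HEIGHT_CM  # close to 2.5 cm: consistently too high
print(f"SD of readings: {spread:.2f} cm, systematic bias: {bias:.2f} cm")
```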

4. Misclassification (Information bias)

Misclassification refers to the classification of an individual, a value or an attribute into a category other than that to which it should be assigned.¹ The misclassification of exposure or disease status can be considered as either differential or non-differential.

a) Non-differential (random) misclassification

This exists when misclassification of disease status or exposure occurs with equal probability in all study participants, regardless of the groups being compared. That is, the probability of exposure being misclassified is independent of disease status, and the probability of disease status being misclassified is independent of exposure status.

Non-differential misclassification increases the similarity between the exposed and non-exposed groups, and may result in an underestimate (dilution) of the true strength of an association between exposure and disease.
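The dilution effect can be shown with expected counts in a case-control 2x2 table. In this sketch the true counts are hypothetical, and exposure is assumed to be recorded with the same sensitivity (0.8) and specificity (0.9) in cases and controls. Some truly exposed people are recorded as unexposed and vice versa in both groups, which makes the groups look more alike and pulls the odds ratio (OR) towards 1 under these assumed values.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    return (a * d) / (b * c)


def misclassify_exposure(exposed, unexposed, se, sp):
    """Expected recorded counts when exposure is measured with sensitivity se
    and specificity sp (expected values, no random sampling)."""
    obs_exposed = se * exposed + (1 - sp) * unexposed
    obs_unexposed = (exposed + unexposed) - obs_exposed
    return obs_exposed, obs_unexposed


# Hypothetical true counts
cases_exp, cases_unexp = 300, 200
ctrl_exp, ctrl_unexp = 150, 350
true_or = odds_ratio(cases_exp, cases_unexp, ctrl_exp, ctrl_unexp)

# Non-differential: the same assumed Se and Sp apply to cases and controls
SE, SP = 0.8, 0.9
a, b = misclassify_exposure(cases_exp, cases_unexp, SE, SP)
c, d = misclassify_exposure(ctrl_exp, ctrl_unexp, SE, SP)
observed_or = odds_ratio(a, b, c, d)
print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

Here the true OR of 3.5 is recorded as roughly 2.4: the association is still present but underestimated.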

b) Differential (non-random) misclassification

This occurs when the proportion of subjects being misclassified differs between the study groups. That is, the probability of exposure being misclassified depends on disease status, or the probability of disease status being misclassified depends on exposure status. This type of error is considered a more serious problem because it may result in an under- or overestimation of the true association.² The direction of bias arising from differential misclassification may be unpredictable but, where the misclassification is known and quantifiable, it may be compensated for in the statistical analysis.
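The sketch below illustrates both points with expected counts. It assumes a hypothetical case-control study with no true association (OR = 1) and a form of recall bias in which cases report past exposure more completely (sensitivity 0.95) than controls (sensitivity 0.70), with specificity 0.95 in both groups. The recorded data then suggest a spurious positive association. If the group-specific sensitivity and specificity were known, the standard back-calculation (observed exposed minus expected false positives, divided by Se + Sp - 1) recovers the true counts. All numbers and function names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: exposed cases, unexposed cases, exposed controls, unexposed controls."""
    return (a * d) / (b * c)


def observed_exposed(exposed, unexposed, se, sp):
    """Expected number recorded as exposed, given sensitivity se and specificity sp."""
    return se * exposed + (1 - sp) * unexposed


def corrected_exposed(obs_exposed, total, se, sp):
    """Back-calculate the true number exposed from the recorded count (requires se + sp > 1)."""
    return (obs_exposed - (1 - sp) * total) / (se + sp - 1)


N_CASES = N_CTRLS = 500
true_cases_exp, true_ctrls_exp = 200, 200  # hypothetical: no true association

# Assumed recall bias: cases recall exposure better than controls
SE_CASES, SE_CTRLS, SP = 0.95, 0.70, 0.95

obs_cases_exp = observed_exposed(true_cases_exp, N_CASES - true_cases_exp, SE_CASES, SP)
obs_ctrls_exp = observed_exposed(true_ctrls_exp, N_CTRLS - true_ctrls_exp, SE_CTRLS, SP)
observed_or = odds_ratio(obs_cases_exp, N_CASES - obs_cases_exp,
                         obs_ctrls_exp, N_CTRLS - obs_ctrls_exp)

# Correction using the (assumed known) group-specific Se and Sp
cor_cases_exp = corrected_exposed(obs_cases_exp, N_CASES, SE_CASES, SP)
cor_ctrls_exp = corrected_exposed(obs_ctrls_exp, N_CTRLS, SE_CTRLS, SP)
corrected_or = odds_ratio(cor_cases_exp, N_CASES - cor_cases_exp,
                          cor_ctrls_exp, N_CTRLS - cor_ctrls_exp)
print(f"observed OR = {observed_or:.2f}, corrected OR = {corrected_or:.2f}")
```

In practice the misclassification probabilities are rarely known exactly, which is why the correction depends on them being "known and quantifiable".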

Differential misclassification may be introduced in a study as a result of:

Recall bias (differences between study groups in the accuracy of participants' recollections, e.g. cases recalling past exposures more completely than controls)
Observer/interviewer bias

More information on these, and other, biases can be found in Section 1B “Biases and Confounding”.

Sampling Error

Because of the play of chance, different samples drawn from the same population will produce different results, and this must be taken into account when using a sample to make inferences about a population.² The difference between a sample estimate and the true population value is referred to as the sampling error, and its variability is measured by the standard error.
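The link between sampling error and the standard error can be checked by simulation. As a sketch, assume a hypothetical population of systolic blood pressure values that is normally distributed with mean 120 mmHg and SD 15 mmHg. Drawing many random samples of size n and computing the SD of their means should give approximately the standard error of the mean, sigma / sqrt(n).

```python
import math
import random
import statistics

POP_MEAN, POP_SD = 120.0, 15.0  # hypothetical population, e.g. systolic BP in mmHg


def sd_of_sample_means(n, n_samples=5000, seed=7):
    """Draw many random samples of size n and return the SD of the sample means."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(n))
             for _ in range(n_samples)]
    return statistics.stdev(means)


results = {}
for n in (25, 100):
    empirical = sd_of_sample_means(n)
    theoretical = POP_SD / math.sqrt(n)  # standard error of the mean: sigma / sqrt(n)
    results[n] = (empirical, theoretical)
    print(f"n = {n:3d}: SD of sample means {empirical:.2f}, sigma/sqrt(n) {theoretical:.2f}")
```

Quadrupling the sample size from 25 to 100 halves the standard error (from 3.0 to 1.5 mmHg here), which is why sampling error falls only slowly as sample size grows.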

Sampling error may result in:

Type I error (α) - Rejecting the null hypothesis when it is true (a “false positive”)
Type II error (β) - Failing to reject the null hypothesis when it is false (a “false negative”)
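Both error types can be estimated by simulating many studies. This is a sketch under simplifying assumptions: two groups of 30, normally distributed outcomes with a known SD of 1 (so a two-sample z-test applies), and a 5% significance level. When the true difference is zero, the proportion of studies rejecting the null hypothesis estimates the Type I error rate (close to α). When a true difference of 0.5 SD exists, the proportion failing to reject estimates the Type II error rate (β). The function name and parameter values are illustrative.

```python
import math
import random


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(true_diff, n_per_group=30, alpha=0.05, n_trials=10_000, seed=2024):
    """Fraction of simulated studies that reject H0 (no difference in means)."""
    rng = random.Random(seed)
    sigma = 1.0  # assumed known SD, so a z-test is appropriate
    se_diff = sigma * math.sqrt(2 / n_per_group)
    rejections = 0
    for _ in range(n_trials):
        g1 = [rng.gauss(true_diff, sigma) for _ in range(n_per_group)]
        g2 = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        z = (sum(g1) / n_per_group - sum(g2) / n_per_group) / se_diff
        if two_sided_p(z) < alpha:
            rejections += 1
    return rejections / n_trials


type_i = rejection_rate(true_diff=0.0)       # H0 true: every rejection is a false positive
type_ii = 1 - rejection_rate(true_diff=0.5)  # H0 false: every non-rejection is a false negative
print(f"estimated Type I error rate: {type_i:.3f}")
print(f"estimated Type II error rate: {type_ii:.3f}")
```

With only 30 per group, a real difference of half an SD is missed about half the time under these assumptions, illustrating how a small sample inflates β.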

Sampling error cannot be eliminated but with an appropriate study design it can be reduced to an acceptable level. One of the major determinants of the degree to which chance can affect the findings of a study is the sample size.² In general, sampling error decreases as the sample size increases. Therefore, use of an appropriate sample size will reduce the degree to which chance variability may account for the results observed in a study. This is covered in more detail in the statistics section of the DFPH syllabus.
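As one simple illustration of choosing an appropriate sample size, the sketch below uses the standard normal-approximation formula for estimating a single proportion from a simple random sample to within a chosen margin of error: n = z² p(1 - p) / margin². The expected proportion (0.3) and the margins are hypothetical; other designs (e.g. comparing groups) need different formulae.

```python
import math


def n_for_proportion(p_expected, margin, z=1.96):
    """Sample size needed to estimate a proportion to within +/- margin
    (95% confidence by default), rounded up to a whole participant."""
    return math.ceil(z**2 * p_expected * (1 - p_expected) / margin**2)


n_5 = n_for_proportion(0.3, 0.05)     # +/- 5 percentage points
n_2_5 = n_for_proportion(0.3, 0.025)  # +/- 2.5 percentage points
print(n_5, n_2_5)
```

Halving the acceptable margin of error (from 5 to 2.5 percentage points) roughly quadruples the required sample, from 323 to 1291 participants.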

The role of chance can be assessed by performing appropriate statistical tests to produce a p-value and by calculation of confidence intervals. Confidence intervals are more informative than p-values because they provide a range of values that is likely to include the true population effect. They also indicate whether a non-significant result is, or is not, compatible with a true effect that was not detected because the sample size was too small.
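The sketch below computes both a p-value and a 95% confidence interval for a difference between two proportions, using the large-sample (Wald) normal approximation with the same standard error for both; this is one of several methods and is chosen here for simplicity. The data are hypothetical: 30 of 100 exposed and 20 of 100 unexposed people develop the outcome.

```python
import math

Z_95 = 1.959964  # 97.5th percentile of the standard normal distribution


def diff_in_proportions(x1, n1, x2, n2):
    """Difference p1 - p2 with a Wald 95% CI and a two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - Z_95 * se, diff + Z_95 * se)
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return diff, ci, p_value


diff, (lo, hi), p = diff_in_proportions(30, 100, 20, 100)
print(f"difference = {diff:.2f}, 95% CI {lo:.3f} to {hi:.3f}, p = {p:.2f}")
```

The p-value (about 0.10) is not significant at the 5% level, but the interval, running from about -0.02 to 0.22, shows the data are compatible with no difference and also with an increase in risk of over 20 percentage points. A larger study would be needed to distinguish these possibilities.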

NB: Statistical methods only assess the effect of sampling variation and cannot control for non-sampling errors such as confounding or bias in the design, conduct or analysis of a study.

Sources of variation in measurements³

The quality of measurement data is vital for the accurate classification of study participants according to their personal attributes, exposure and outcome. Unlike studies involving routine data, which have already been collected, investigators carrying out their own measurements have the advantage of being able to choose which observations they will make and to maximise the quality of their data. However, each measurement will usually be made only once, so it is vital that every effort is made to ensure that results are obtained consistently across participants.

a) Subject variation

Differences between measurements made on the same subject on different occasions may be due to several factors, including:

Physiological changes – e.g. blood pressure, pulse
Factors affecting response to a question – e.g. rapport with the interviewer
Changes because the participant is aware they are being studied – e.g. courtesy bias, giving the answer they believe the interviewer wants to hear

b) Observer variation

Variations in recording observations arise for several reasons including bias, errors, and lack of skill or training. There are two principal types:

Inconsistency in recording repeat results – intra-observer variation
Failure of different observers to record the same results – inter-observer variation
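Observer variation for categorical judgements is often quantified with Cohen's kappa, a statistic not named in the text above: the agreement observed between two sets of ratings, corrected for the agreement expected by chance alone. The same function applies to inter-observer variation (two observers rating the same subjects) and intra-observer variation (one observer rating the same subjects on two occasions). The data below are a hypothetical example of two observers classifying 100 X-rays.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two paired lists of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Proportion of subjects on which the two ratings agree
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical: both say abnormal 40, A abnormal/B normal 10, A normal/B abnormal 5, both normal 45
observer_a = ["abnormal"] * 50 + ["normal"] * 50
observer_b = ["abnormal"] * 40 + ["normal"] * 10 + ["abnormal"] * 5 + ["normal"] * 45
kappa = cohens_kappa(observer_a, observer_b)
print(f"kappa = {kappa:.2f}")
```

Here the observers agree on 85% of films, but half that agreement would be expected by chance given their marginal totals, giving kappa = 0.70.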

c) Technical limitations

Technical equipment may give incorrect results for several reasons, including:

The method is unreliable – e.g. peak flow rate in asthma
Faults in the test system – e.g. defective instruments, poor calibration
Absence of an accurate test

Avoiding variation in measurements

Prior to starting data collection, careful thought should be given to potential sources of error, bias and variation in measurements, and every effort made to minimise them. Principles of avoiding unnecessary variation include:

Using clearly defined diagnostic criteria
Observing participants under similar biological/environmental conditions
Training observers
Blinding observers and participants to the study hypothesis
Using calibrated, easy-to-use equipment
Employing standardised measurement methods
Piloting questionnaires to identify ambiguous questions

Furthermore, when data are processed, sensitivity analyses should be conducted and presented to test how robust the study findings are to variations in, for example, classifications or assumptions.
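One form of sensitivity analysis re-estimates the main result under a range of plausible assumptions about misclassification. The sketch below assumes a hypothetical observed case-control table and non-differential exposure misclassification, and back-calculates a bias-adjusted odds ratio for a grid of assumed sensitivity and specificity values. Combinations that imply impossible (zero or negative) counts are flagged as incompatible with the data. The function names and the grid are illustrative choices, not a prescribed method.

```python
from itertools import product


def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: exposed cases, unexposed cases, exposed controls, unexposed controls."""
    return (a * d) / (b * c)


def corrected_exposed(obs_exposed, total, se, sp):
    """Back-calculate the true number exposed from the recorded count (requires se + sp > 1)."""
    return (obs_exposed - (1 - sp) * total) / (se + sp - 1)


def bias_adjusted_or(obs, se, sp):
    """OR re-estimated under assumed non-differential exposure misclassification.

    obs = (exposed cases, unexposed cases, exposed controls, unexposed controls).
    Returns None if the assumed se/sp imply impossible counts.
    """
    a_obs, b_obs, c_obs, d_obs = obs
    n_cases, n_ctrls = a_obs + b_obs, c_obs + d_obs
    a = corrected_exposed(a_obs, n_cases, se, sp)
    c = corrected_exposed(c_obs, n_ctrls, se, sp)
    b, d = n_cases - a, n_ctrls - c
    if min(a, b, c, d) <= 0:
        return None
    return odds_ratio(a, b, c, d)


observed = (260, 240, 155, 345)  # hypothetical observed 2x2 table
crude_or = odds_ratio(*observed)
print(f"crude OR: {crude_or:.2f}")
for se, sp in product((0.7, 0.8, 0.9), (0.85, 0.9, 0.95)):
    adj = bias_adjusted_or(observed, se, sp)
    label = "incompatible with data" if adj is None else f"{adj:.2f}"
    print(f"Se = {se:.2f}, Sp = {sp:.2f}: adjusted OR {label}")
```

If the conclusions hold across the whole grid (here every adjusted OR is further from 1 than the crude estimate), the finding is robust to these classification assumptions.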

References

1. Hennekens CH, Buring JE. Epidemiology in Medicine. Lippincott Williams & Wilkins, 1987.
2. Kirkwood B, Sterne J. Essential Medical Statistics. Wiley-Blackwell, 2003.
3. Ben-Shlomo Y, et al. Lecture Notes: Epidemiology, Evidence-based Medicine and Public Health Medicine (6th ed.). Wiley-Blackwell, 2013.