# Sources of variation, its measurement and control

### Reducing Errors and Variation in Epidemiological Measurements

A principal assumption in epidemiology is that we can draw an inference about the experience of the entire population based on the evaluation of a sample of the population. However, a problem with drawing such an inference is that errors may affect the results of an epidemiological study. For example, the results may be influenced by the play of chance, because of random variation from sample to sample.^{1} This is also important when carrying out surveys, which are addressed separately in this section.

**Reducing errors in measurement**

Some measurement or classification errors are almost inevitable in epidemiological studies and they may affect the assessment of the exposure or the outcome, as well as potential confounders. There are several different types of measurement error, outlined below.

**1. Random error (chance)**

Random error is variation due to chance alone. It can appear to cause an association between an exposure and an outcome where none exists, and may produce an estimate that differs from the true underlying value, resulting in either an under- or overestimation of that value.

**2. Systematic error (bias)**

This is a consistent difference between the recorded value and the true value in a series of observations. For example, if the height of an individual is always measured when the person is wearing the same shoes, the measurement will be consistent, but have a systematic bias.
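The contrast between random and systematic error in the height example can be illustrated with a small simulation. This is only a sketch: the true height, the shoe offset and the size of the random reading error are all hypothetical values chosen for illustration.

```python
import random
import statistics

TRUE_HEIGHT_CM = 170.0  # hypothetical true height
SHOE_HEIGHT_CM = 3.0    # hypothetical constant offset added by shoes
RANDOM_SD_CM = 0.5      # hypothetical spread of random reading error

rng = random.Random(2009)  # fixed seed so the sketch is repeatable

# Random error only: readings scatter around the true value.
noise = [rng.gauss(0.0, RANDOM_SD_CM) for _ in range(1000)]
barefoot = [TRUE_HEIGHT_CM + e for e in noise]

# Random plus systematic error: every reading is shifted by the same amount.
with_shoes = [reading + SHOE_HEIGHT_CM for reading in barefoot]

mean_barefoot = statistics.mean(barefoot)
mean_with_shoes = statistics.mean(with_shoes)
sd_barefoot = statistics.stdev(barefoot)
sd_with_shoes = statistics.stdev(with_shoes)

print(f"mean barefoot:   {mean_barefoot:.2f} cm (SD {sd_barefoot:.2f})")
print(f"mean with shoes: {mean_with_shoes:.2f} cm (SD {sd_with_shoes:.2f})")
```

Averaging many readings brings the barefoot mean close to the true value, but no number of repeat readings removes the constant shoe offset: the spread is unchanged while the mean is shifted.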

**3. Measurement error (reliability and validity)**

All epidemiological investigations involve the measurement of exposures, outcomes and other characteristics of interest (e.g. potential confounding factors).

Types of measures may include:

- Responses to self-administered questionnaires
- Responses to interview questions
- Laboratory results
- Physical measurements
- Information recorded in medical records
- Diagnosis codes from a database

All these measures may be subject to some degree of measurement error and therefore result in the introduction of bias into the study. The research instruments used to measure exposure, disease status and other variables of interest should be both valid and reliable.

**a) Validity**

The degree to which an instrument is capable of accurately measuring what it purports to measure is referred to as its validity. Examples include how well a questionnaire measures exposure or outcome in a prospective cohort study, or the accuracy of a diagnostic test.

Assessing validity requires that an error-free reference test or 'gold standard' is available to which the measure can be compared.
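For a yes/no measure, comparison with a gold standard is commonly summarised as sensitivity and specificity. These terms are not used in the text above, and the function name and data below are hypothetical; the sketch simply counts agreement with the reference result.

```python
def validity_measures(test_results, gold_standard):
    """Compare binary test results with a gold standard (True = positive)."""
    if len(test_results) != len(gold_standard):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(test_results, gold_standard))
    true_pos = sum(1 for t, g in pairs if t and g)
    false_neg = sum(1 for t, g in pairs if not t and g)
    true_neg = sum(1 for t, g in pairs if not t and not g)
    false_pos = sum(1 for t, g in pairs if t and not g)
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one gold-standard positive and negative")
    return {
        # Proportion of true positives that the test detects.
        "sensitivity": true_pos / (true_pos + false_neg),
        # Proportion of true negatives that the test correctly rules out.
        "specificity": true_neg / (true_neg + false_pos),
    }


# Hypothetical data: 10 people with the condition, 10 without.
gold = [True] * 10 + [False] * 10
test = [True] * 8 + [False] * 2 + [True] * 1 + [False] * 9

measures = validity_measures(test, gold)
print(measures)
```

With these illustrative data the test detects 8 of the 10 true positives and correctly rules out 9 of the 10 true negatives.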

**b) Reliability (repeatability)**

Reliability, also known as reproducibility, refers to the consistency of the performance of an instrument over time and among different observers.

- Intra-observer reliability: repeated measurements are made by the same observer on the same subject and the results compared.
- Inter-observer reliability: measurements are carried out on the same subject by two or more observers and the results compared. Cohen's kappa statistic provides a way of calculating the degree of agreement between two observers beyond that expected by chance. Values range from -1.0 to 1.0: values close to 1.0 indicate better reliability, values near zero suggest agreement no better than chance, and negative values indicate agreement worse than chance.
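Cohen's kappa compares the observed proportion of agreement with the agreement expected by chance from each observer's own rating frequencies. The sketch below uses the standard formula, kappa = (observed - expected) / (1 - expected); the function name and the ratings of 50 subjects are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two observers rating the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of subjects given the same rating.
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Chance agreement: product of the two observers' marginal proportions,
    # summed over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings: both "yes" 20, A yes/B no 5, A no/B yes 10, both "no" 15.
observer_a = ["yes"] * 20 + ["yes"] * 5 + ["no"] * 10 + ["no"] * 15
observer_b = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 15

kappa = cohens_kappa(observer_a, observer_b)
print(f"kappa = {kappa:.2f}")
```

Here the observers agree on 70% of subjects, but 50% agreement would be expected by chance alone, so kappa is 0.4 rather than 0.7.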

**4. Misclassification (Information bias)**

Misclassification refers to the classification of an individual, a value or an attribute into a category other than that to which it should be assigned.^{1} The misclassification of exposure or disease status can be considered as either differential or non-differential.

**a) Non-differential (random) misclassification**

This occurs when misclassification of disease status or exposure occurs equally in all study participants, regardless of the groups being compared. That is, the probability of exposure being misclassified is independent of disease status and the probability of disease status being misclassified is independent of exposure status.

Non-differential misclassification increases the similarity between the exposed and non-exposed groups, and may result in an underestimate (dilution) of the true strength of an association between exposure and disease.
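The dilution effect can be shown with expected counts in a hypothetical cohort. In this sketch the cohort sizes, the true risks and the accuracy of exposure classification (80% of truly exposed and 90% of truly unexposed people classified correctly, regardless of disease status) are all assumptions made for illustration.

```python
def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def misclassify_exposure(cases_exp, n_exp, cases_unexp, n_unexp,
                         sensitivity, specificity):
    """Expected counts when exposure is misclassified equally in
    diseased and non-diseased participants (non-differential)."""
    obs_cases_exp = sensitivity * cases_exp + (1 - specificity) * cases_unexp
    obs_n_exp = sensitivity * n_exp + (1 - specificity) * n_unexp
    obs_cases_unexp = (1 - sensitivity) * cases_exp + specificity * cases_unexp
    obs_n_unexp = (1 - sensitivity) * n_exp + specificity * n_unexp
    return obs_cases_exp, obs_n_exp, obs_cases_unexp, obs_n_unexp


# Hypothetical truth: 200/1000 exposed and 100/1000 unexposed develop disease.
true_counts = (200, 1000, 100, 1000)
true_rr = risk_ratio(*true_counts)

observed_counts = misclassify_exposure(*true_counts,
                                       sensitivity=0.8, specificity=0.9)
observed_rr = risk_ratio(*observed_counts)

print(f"true risk ratio:     {true_rr:.2f}")
print(f"observed risk ratio: {observed_rr:.2f}")
```

The true risk ratio of 2.0 is observed as roughly 1.6: the two groups look more alike than they really are, so the association is underestimated.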

**b) Differential (non-random) misclassification**

This occurs when the proportion of subjects being misclassified differs between the study groups. That is, the probability of exposure being misclassified is dependent on disease status, or the probability of disease status being misclassified is dependent on exposure status. This type of error is considered a more serious problem because it may result in either an under- or overestimate of the true association.^{2} The direction of bias arising from differential misclassification is, however, often unpredictable.

Differential misclassification may be introduced in a study as a result of:

- Recall bias
- Observer/interviewer bias
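Recall bias in a case-control study gives a simple illustration of how differential misclassification can push an estimate in either direction. The sketch below uses hypothetical counts in which exposure is truly unrelated to disease, and assumes that only truly exposed people can be misclassified (they may fail to report their exposure).

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Odds of exposure in cases divided by odds of exposure in controls."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)


def reported_counts(exposed, unexposed, recall):
    """Expected reported counts when only a fraction `recall` of the
    truly exposed report their exposure."""
    reported_exposed = exposed * recall
    reported_unexposed = unexposed + (exposed - reported_exposed)
    return reported_exposed, reported_unexposed


# Hypothetical truth: 40 exposed and 60 unexposed in both cases and controls.
true_or = odds_ratio(40, 60, 40, 60)

# Cases recall exposure fully, controls recall only 75% of it.
cases = reported_counts(40, 60, recall=1.0)
controls = reported_counts(40, 60, recall=0.75)
or_controls_underreport = odds_ratio(*cases, *controls)

# The reverse pattern: cases under-report, controls recall fully.
cases = reported_counts(40, 60, recall=0.75)
controls = reported_counts(40, 60, recall=1.0)
or_cases_underreport = odds_ratio(*cases, *controls)

print(f"true odds ratio:             {true_or:.2f}")
print(f"controls under-report:       {or_controls_underreport:.2f}")
print(f"cases under-report:          {or_cases_underreport:.2f}")
```

Although the true odds ratio is 1.0, the first pattern of recall produces an apparent harmful association (about 1.56) and the second an apparent protective one (about 0.64).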

**5. Sampling Error**

Because of the play of chance, different samples will produce different results and therefore this must be taken into account when using a sample to make inferences about a population.^{2} This difference is referred to as the sampling error and its variability is measured by the standard error.

Sampling error may result in:

- Type I error (α) - Rejecting the null hypothesis when it is true
- Type II error (β) - Failing to reject the null hypothesis when it is false

Sampling error cannot be eliminated but with an appropriate study design it can be reduced to an acceptable level. One of the major determinants of the degree to which chance can affect the findings of a study is sample size.^{2} In general, sampling error decreases as the sample size increases. Therefore, use of an appropriate sample size will reduce the degree to which chance variability may account for the results observed in a study. This is covered in more detail in the statistics section of the MFPH Part A syllabus.
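The relationship between sample size and sampling error can be seen in the standard error of a proportion, sqrt(p(1 - p) / n). The prevalence of 20% and the sample sizes below are hypothetical values chosen only to illustrate the pattern.

```python
import math


def standard_error_of_proportion(p, n):
    """Standard error of a sample proportion p from a sample of size n."""
    return math.sqrt(p * (1 - p) / n)


prevalence = 0.2  # hypothetical true proportion
standard_errors = {
    n: standard_error_of_proportion(prevalence, n) for n in (100, 400, 1600)
}

for n, se in standard_errors.items():
    print(f"n = {n:5d}  standard error = {se:.3f}")
```

Because the standard error depends on the square root of the sample size, the sample must be quadrupled to halve the sampling error.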

The role of chance can be assessed by performing appropriate statistical tests to produce a p-value and by calculation of confidence intervals. Confidence intervals are more informative than p-values because they provide a range of values that is likely to include the true population effect. They also indicate whether a non-significant result is or is not compatible with a true effect that was not detected because the sample size was too small.
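As an illustration, the sketch below calculates an approximate 95% confidence interval for a difference in risk between two groups, using the normal approximation. The counts are hypothetical, and the function name is an assumption for this example; the same risks (30% versus 20%) are examined in a small and a large study.

```python
import math
from statistics import NormalDist


def risk_difference_ci(cases_1, n_1, cases_2, n_2, level=0.95):
    """Risk difference with a normal-approximation confidence interval."""
    p1 = cases_1 / n_1
    p2 = cases_2 / n_2
    diff = p1 - p2
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p1 * (1 - p1) / n_1 + p2 * (1 - p2) / n_2)
    # Two-sided critical value (about 1.96 for a 95% interval).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical studies with identical risks but different sample sizes.
small = risk_difference_ci(15, 50, 10, 50)
large = risk_difference_ci(150, 500, 100, 500)

for label, (diff, lower, upper) in (("small", small), ("large", large)):
    print(f"{label} study: difference {diff:.2f} "
          f"(95% CI {lower:.2f} to {upper:.2f})")
```

In the small study the interval runs from about -0.07 to 0.27: it includes zero, so the result is not statistically significant, yet it is also compatible with a substantial true difference. In the large study the interval is about 0.05 to 0.15 and excludes zero.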

NB: Statistical methods only assess the effect of sampling variation and cannot control for non-sampling errors such as confounding or bias in the design, conduct or analysis of a study.

**Reducing variation in measurements ^{3}**

The quality of measurement data is vital for the accurate classification of study participants according to their personal attributes, exposure and outcome. Unlike studies involving routine data, which has already been collected, investigators carrying out their own measurements have the advantage of being able to choose which observations they will make, and to maximise the quality of their data. However, each measurement will usually only be made once and it is vital that every effort is made to ensure consistent results are obtained between patients.

**a) Subject variation**

Differences between measurements made on the same subject on different occasions may be due to several factors, including:

- Physiological changes – e.g. blood pressure, pulse
- Factors affecting response to a question – e.g. rapport with the interviewer
- Changes because the participant is aware they are being studied – e.g. courtesy bias, giving the answer they believe the interviewer wants to hear

**b) Observer variation**

Variations in recording observations arise for several reasons including bias, errors, and lack of skill or training. There are two principal types:

- Inconsistency in recording repeat results – intra-observer variation
- Failure of different observers to record the same results – inter-observer variation

**c) Technical limitations**

Technical equipment may give incorrect results for several reasons, including:

- The method is unreliable – e.g. peak flow rate in asthma
- Faults in the test system – e.g. defective instruments, poor calibration
- Absence of an accurate test

**Avoiding variation**

Prior to starting on data collection, careful thought should be given to potential sources of error, bias and variation in measurements, and every effort made to minimise them. Principles include:

- Clearly defined diagnostic criteria
- Observing participants in similar biological conditions
- Training of observers
- Blinding observers and participants to the study hypothesis
- Simple equipment that is easy to use
- Standardised measurement methods
- Piloting questionnaires to identify ambiguous questions

**Past Papers**

The topics covered here are addressed by the following MFPH Part A past paper question:

- January 2005 – Paper IA, Question 1

**References**

1. Hennekens CH, Buring JE. Epidemiology in Medicine. Lippincott Williams & Wilkins, 1987.
2. Kirkwood B. Essentials of Medical Statistics. Blackwell Science, 2003.
3. Farmer R, Miller D, Lawrenson R. Epidemiology and Public Health Medicine (4th ed.). Blackwell Science, 1996.

© Helen Barratt, Maria Kirwan 2009