Epidemiology: Assessing the validity of observational techniques
The earlier section of these notes on reducing errors and variation in epidemiological measurements touches briefly on the challenges of using observational techniques and the importance of assessing validity.
As described elsewhere, the validity of an instrument is the extent to which it accurately measures what it is intended to measure. This matters if the results of a study are to be meaningful and relevant to the wider population.
Streiner and Norman describe validation as a process of hypothesis testing: ‘Someone who scores high on this measure will also do well in situation A, perform poorly on test B, and will differ from those who score low on the scale for traits C and D.’1
There are four main types of validity described in the literature:2
- Face validity – whether the measure appears, on the face of it, to assess the variable of interest. This is never adequate on its own, although the opinions of a panel of experts may be helpful.
- Content validity – the representativeness of the indicators chosen to measure the concept; for example, does it assess the full range of symptoms experienced by patients with a given condition?
- Criterion validity – the extent to which the chosen indicator correlates with another measure, such as an accepted gold standard.
- Construct validity – does the indicator perform in a way that the underlying theory suggests it should? This is used when there is no suitable gold standard for comparison.
Assessing the validity of observational techniques
Observational techniques involve measuring phenomena in their natural setting. In practice, there are two broad approaches for assessing validity:
- A test may be compared with the best available clinical assessment. For example, a self-administered psychiatric questionnaire may be compared with the majority opinion of an expert psychiatric panel.
- Alternatively, a test may be validated by its ability to predict some other relevant finding or event, such as the ability of glycosuria to predict an abnormal glucose tolerance test, or of a questionnaire to predict future illness.
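When the reference is the majority opinion of an expert panel, each subject's reference classification can be derived from the individual panel ratings. This is a minimal Python sketch assuming each panel member gives a simple case/non-case judgement recorded as a boolean; the function name `panel_majority` and the ratings are hypothetical, and treating an exact tie as a non-case is an assumption, not a rule taken from these notes.

```python
def panel_majority(votes):
    """Return True if more than half of the panel rated the subject as a case.

    votes: list of booleans, one per panel member (True = case).
    Assumption: an exact tie (half the panel) is treated as a non-case.
    """
    if not votes:
        raise ValueError("at least one panel vote is required")
    return sum(votes) > len(votes) / 2


# Hypothetical ratings from a three-member panel for four subjects
panel_ratings = [
    [True, True, False],
    [False, False, True],
    [True, True, True],
    [False, False, False],
]
reference = [panel_majority(v) for v in panel_ratings]
print(reference)  # [True, False, True, False]
```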
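Subjects are classified as positive or negative, first on the basis of the survey or new instrument and then according to the reference test. The findings can then be expressed in a 2x2 contingency table, with the new test forming the rows and the reference test the columns: a is the number of subjects positive on both tests, b the number positive on the new test but negative on the reference test (false positives), c the number negative on the new test but positive on the reference test (false negatives), and d the number negative on both.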
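The classification step amounts to a cross-tabulation of paired results. This is a minimal Python sketch assuming each subject's new-test and reference results are recorded as booleans (True = positive). The names `TwoByTwo` and `cross_tabulate` and the six subjects are hypothetical; the cell labels follow the sensitivity and specificity formulas used in these notes.

```python
from typing import NamedTuple, Sequence


class TwoByTwo(NamedTuple):
    a: int  # new test positive, reference positive (true positives)
    b: int  # new test positive, reference negative (false positives)
    c: int  # new test negative, reference positive (false negatives)
    d: int  # new test negative, reference negative (true negatives)


def cross_tabulate(new_test: Sequence[bool], reference: Sequence[bool]) -> TwoByTwo:
    """Count paired classifications into the four cells of a 2x2 table."""
    if len(new_test) != len(reference):
        raise ValueError("each subject needs both a new-test and a reference result")
    a = b = c = d = 0
    for test_pos, ref_pos in zip(new_test, reference):
        if test_pos and ref_pos:
            a += 1
        elif test_pos:
            b += 1
        elif ref_pos:
            c += 1
        else:
            d += 1
    return TwoByTwo(a, b, c, d)


# Hypothetical classifications for six subjects
new_test = [True, True, False, False, True, False]
reference = [True, False, True, False, True, False]
print(cross_tabulate(new_test, reference))  # TwoByTwo(a=2, b=1, c=1, d=2)
```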
From this table several important statistics can be derived.
- Sensitivity (a/(a+c)) - a sensitive test detects a high proportion of the true cases.
- Specificity (d/(b+d)) - a specific test has few false positives.
- Systematic error ((a+b)/(a+c)) - the ratio of the total number of positives on the new test to the total number on the reference test, regardless of whether individual classifications agree. A value of 1 means the new test gives the correct overall count of positives; values above or below 1 indicate over- or under-counting.
- Predictive value (a/(a+b)) - the proportion of test positives that are truly positive.
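As a worked illustration, the four statistics can be computed directly from the cell counts a, b, c and d. This is a minimal Python sketch; the function name `validity_statistics` and the counts are hypothetical, and it does not guard against empty margins (a zero denominator raises ZeroDivisionError).

```python
def validity_statistics(a: int, b: int, c: int, d: int) -> dict:
    """Validity statistics from a 2x2 table.

    Rows: new test; columns: reference test.
    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    return {
        "sensitivity": a / (a + c),             # share of true cases detected
        "specificity": d / (b + d),             # share of non-cases correctly negative
        "systematic_error": (a + b) / (a + c),  # test positives / reference positives
        "predictive_value": a / (a + b),        # share of test positives truly positive
    }


# Hypothetical counts (illustrative only, not from any study)
stats = validity_statistics(a=40, b=10, c=20, d=130)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
# sensitivity: 0.667
# specificity: 0.929
# systematic_error: 0.833
# predictive_value: 0.800
```

With these illustrative counts the systematic error is below 1 (50 positives on the new test against 60 on the reference), so the new test under-counts cases even though 80% of its positives are correct.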
These statistics are addressed in more detail elsewhere in these notes.
1. Streiner D, Norman G. Health measurement scales: a practical guide to their development and use (3rd ed). Oxford University Press, 2003.
2. Green J, Browne J (eds). Principles of social research. Open University Press, 2005.
3. Rose G, Barker D. Repeatability and validity. British Medical Journal 1978; 2:1070-1071.
© Helen Barratt 2009