Diagnostic tests
Introduction
Learning objectives: You will learn about diagnostic tests, sensitivity and specificity, and the predictive value of a test.
No diagnostic test perfectly identifies both those with and those without disease. The important parameters of a diagnostic test are the sensitivity and specificity, the false positive and false negative rates, and the likelihood ratio. This section defines these and illustrates their use.
Please now read the resource text below.
Resource text
To evaluate a diagnostic test we first have to establish whether there is a definitive way to decide if someone has a particular condition. For example, to diagnose cancer you could take a biopsy, to diagnose depression you could look for key symptoms in a psychiatric consultation, and to diagnose a walking problem you could video a patient and have the video reviewed by an expert. This reference test is sometimes called the 'gold standard', since currencies used to be valued against gold. Often the gold standard test is expensive and difficult to administer, and we may require a test which is cheaper and easier to use. Initially we will consider the simple binary situation in which both the gold standard and the novel diagnostic test have either a positive or a negative outcome (disease is present or absent). Evaluating a novel test against the accepted gold standard allows us to assess its performance.
The situation is best summarised by the following table. In writing this table, always put the gold standard on top and the results of the test on the side.
Standard table for a novel diagnostic test
                                 Gold Standard
                            Positive   Negative   Total
Novel diagnostic  Positive      a          b       a+b
test              Negative      c          d       c+d
Total                          a+c        b+d     n = a+b+c+d
The numbers 'a' and 'd' are the numbers of true positives (subjects with the condition who test positive) and true negatives (subjects without the condition who test negative) respectively. The number 'b' is the number of false positives: although the test is positive, these subjects do not have the disease. Similarly, 'c' is the number of false negatives.
Example
Consider the study by Kroenke et al. (2007), which surveyed 965 people attending primary care centres in the US. They were interested in whether a family practitioner can diagnose Generalized Anxiety Disorder (GAD). They asked two simple questions (the GAD2 questionnaire): "Over the last two weeks, how often have you been bothered by the following problems? 1) Feeling nervous, anxious or on edge. 2) Not able to stop or control worrying." The patients answered each question with 'not at all', 'several days', 'more than half the days' or 'nearly every day', scoring 0, 1, 2 and 3 respectively. The scores for the two questions were totalled, and a total score of 3 or more was considered positive. Two mental health professionals then held structured psychiatric interviews with each subject over the telephone to diagnose GAD. The professionals were unaware of the result of the GAD2 questionnaire.
The results from Kroenke et al.'s study are given in the following table.
Results from Kroenke et al. (2007)
                          Diagnosis by mental health worker
                            Positive   Negative   Total
GAD2              ≥3 (+ve)     63         152      215
                  <3 (-ve)     10         740      750
Total                          73         892      965
We now want to derive some summary statistics from these tables. These are the prevalence, the sensitivity and specificity of the test, and the positive predictive value.
The prevalence of the disease is the proportion of people diagnosed as positive by the gold standard, and is given by (a+c)/n. For the GAD example it is 73/965 = 0.076 = 7.6%.
Given a person has the disease, the sensitivity of the test is the proportion of people who have a positive result on the diagnostic test. This is given by a/(a+c) = 63/73 = 0.86.
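These calculations are easy to check in code. A minimal Python sketch, with the cell counts hard-coded from the GAD2 table above (the variable names a, b, c, d follow the standard table):

```python
# Counts from the Kroenke et al. (2007) GAD2 table:
# a = true positives, b = false positives,
# c = false negatives, d = true negatives
a, b, c, d = 63, 152, 10, 740
n = a + b + c + d           # total number of subjects, 965

prevalence = (a + c) / n    # proportion positive on the gold standard
sensitivity = a / (a + c)   # proportion of diseased who test positive

print(f"prevalence  = {prevalence:.3f}")   # 0.076
print(f"sensitivity = {sensitivity:.3f}")  # 0.863
```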
Suppose a test is 100% sensitive, then the number of false negatives is zero and we would expect the following table.
Results of a novel diagnostic test with 100% sensitivity.
                                 Gold Standard
                            Positive   Negative   Total
Novel diagnostic  Positive      a          b       a+b
test              Negative      0          d        d
Total                           a         b+d     n = a+b+d
Now suppose a patient has a negative test result. From the above table, we can see this means we can be certain that the patient does not have the disease. Sackett et al. (1997) refer to this as SnNout. That is, for a test with a high sensitivity (Sn), a Negative result rules out the disease.
Given a person does not have the disease, the specificity of the test is the proportion of people who have a negative result on the diagnostic test. This is given by d/(b+d). For the GAD example it is 740/892 = 83%.
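The specificity can be checked in the same way, again hard-coding the counts from the GAD2 table:

```python
# GAD2 counts: b = false positives, d = true negatives
b, d = 152, 740

specificity = d / (b + d)   # proportion of non-diseased who test negative
print(f"specificity = {specificity:.3f}")  # 0.830
```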
If we suppose a test is 100% specific, then the number of false positives is zero and we would have the following table.
Results of a novel diagnostic test with 100% specificity.
                                 Gold Standard
                            Positive   Negative   Total
Novel diagnostic  Positive      a          0        a
test              Negative      c          d       c+d
Total                          a+c         d      n = a+c+d
Now suppose a patient has a positive test result. From the above table, we can see this means we can be certain the patient has the disease. Sackett et al. (1997) refer to this as SpPin. That is, for a test with a high specificity (Sp), a Positive test rules in the disease.
A useful mnemonic is:
SeNsitivity = 1 - proportion of false Negatives (note the N in each)
SPecificity = 1 - proportion of false Positives (note the P in each)
What subjects really want to know, however, is: if I have a positive test, what are the chances that I have the disease? This is given by the positive predictive value (PPV), which is a/(a+b). For the GAD example this is 63/215 = 29%. Correspondingly, the probability of having the disease after a negative test is c/(c+d) = 10/750 = 1.3% (equivalently, the negative predictive value, d/(c+d), is 98.7%). One way of looking at the test is that before the test the chance of having GAD was 7.6%. After the test it is either 29% or 1.3% depending on the result, but note that even with a positive test the chance of having GAD is still less than one third.
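These predictive values can be sketched in Python from the same four cell counts (the variable names below are illustrative, not standard notation):

```python
# GAD2 counts from the table above
a, b, c, d = 63, 152, 10, 740

ppv = a / (a + b)                  # P(disease | positive test)
p_disease_given_neg = c / (c + d)  # P(disease | negative test)
npv = d / (c + d)                  # P(no disease | negative test)

print(f"PPV                = {ppv:.2f}")                  # 0.29
print(f"P(disease | -ve)   = {p_disease_given_neg:.3f}")  # 0.013
print(f"NPV                = {npv:.3f}")                  # 0.987
```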
Sensitivity and specificity are independent of prevalence, but positive predictive value is not.
Suppose that in a different population, the prevalence of the disease is double that of the current population (assume the prevalence is low, so that a and c are much smaller than b and d and so the results for those without the disease are much the same as the earlier table). The situation is given in the following table:
Standard situation but with a doubling of the prevalence.
                                 Gold Standard
                            Positive   Negative   Total
Novel diagnostic  Positive     2a          b       2a+b
test              Negative     2c          d       2c+d
Total                        2a+2c        b+d     2a+2c+b+d
We see that the sensitivity is now 2a/(2a+2c) = a/(a+c), as before. The specificity is likewise unchanged. However, the positive predictive value is now 2a/(2a+b), which is greater than the earlier value of a/(a+b). If a is very small relative to b, then 2a/(2a+b) is approximately 2a/b, roughly double the earlier value: for a rare disease, the positive predictive value increases roughly in proportion to the prevalence.
This highlights that sensitivity and specificity are characteristics of the test and will be valid for different populations with different prevalences. Thus, we could use them in populations with high prevalence such as elderly people as well as for low prevalence such as for young people. However, the PPV is a characteristic of the population and so will vary depending on the prevalence. In general, where the prevalence is low, even a positive result will mean it is unlikely that one has the disease.
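The dependence of the PPV on prevalence can be illustrated with a short sketch. The helper function below is not from the text; it simply applies PPV = sens x prev / (sens x prev + (1 - spec) x (1 - prev)), which follows from the table algebra above, using the GAD2 test characteristics:

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value for a given prevalence and test characteristics."""
    true_pos = sensitivity * prevalence            # P(diseased and test +ve)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(healthy and test +ve)
    return true_pos / (true_pos + false_pos)

# GAD2 test characteristics from the example above
sens, spec = 0.863, 0.830

# The same test applied in populations with different prevalences
for prev in (0.01, 0.076, 0.15, 0.30):
    print(f"prevalence {prev:5.3f} -> PPV {ppv(prev, sens, spec):.2f}")
```

At the study prevalence of 7.6% this reproduces the PPV of about 29%; at a prevalence of 1% the PPV falls to around 5%, illustrating that a positive result in a low-prevalence population still leaves the disease unlikely.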
Likelihood ratio
It is common to prefer a single summary measure of a diagnostic test, and the usual choice is the likelihood ratio for a positive test, LR(+). This is defined as:
LR(+) = (probability of a positive test given the disease) / (probability of a positive test given no disease) = sensitivity / (1 - specificity)
For the GAD example we find that LR(+) = 0.86/(1 - 0.83) = 5.06.
One reason why this is useful is that we can use it to calculate the odds of having the disease given a positive result (see the 'Summarising binary data' module).
Before the test is conducted, the probability of having the disease is just the prevalence, and the odds are {(a+c)/n}/{(b+d)/n} = (a+c)/(b+d). Thus for GAD the odds are 73/892 = 0.082, which is close to the prevalence of 0.076 because the prevalence is quite low.
A useful result, derived from what is known as Bayes' theorem, states:
Odds of disease after positive test = odds of disease before test x LR(+)
We can get the odds after a positive test directly from the PPV, since the odds of disease after a positive test are PPV/(1 - PPV). For the GAD example the odds are 0.29/(1 - 0.29) = 0.41.
We can also get this from Bayes' theorem since:
odds of disease before test x LR(+) = 0.082 x 5.06 = 0.41. Thus the LR(+) gives a simple way to estimate how likely someone is to have a disease after a positive test, if one knows the prevalence, without having to set up a 2x2 table.
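The whole Bayes' theorem calculation can be sketched in a few lines, using the rounded GAD2 figures from the text:

```python
# GAD2 example: sensitivity 0.86, specificity 0.83, pre-test odds (a+c)/(b+d)
sens, spec = 0.86, 0.83
pre_test_odds = 73 / 892                 # approx 0.082

lr_pos = sens / (1 - spec)               # likelihood ratio for a positive test
post_test_odds = pre_test_odds * lr_pos  # Bayes' theorem
post_test_prob = post_test_odds / (1 + post_test_odds)  # odds back to probability

print(f"LR(+)          = {lr_pos:.2f}")          # 5.06
print(f"post-test odds = {post_test_odds:.2f}")  # 0.41
print(f"post-test prob = {post_test_prob:.2f}")  # 0.29
```

Note that converting the post-test odds back to a probability recovers the positive predictive value of about 29% found earlier, as it should.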
It is important to remember that the sensitivity, specificity and LR(+) are all estimates. They have an associated uncertainty attached to them, and methods of estimating this will be considered in the section on confidence intervals.
Diagnosis and screening
There is an important distinction between diagnosing a disease and screening for it. In the former case there are usually some symptoms, and so we have some suspicion that the patient has something wrong with them. If a test is positive we will take some action. In the latter case there are usually no symptoms and so if the test is negative the person will have no further tests.
Recalling Sackett's mnemonics SpPin and SnNout, for diagnosis we want a positive test to rule people in, so we want a high specificity. For screening we want a negative test to rule people out so we want a high sensitivity. Thus mass mammography will have a fairly low threshold of suspicion, to ensure a high sensitivity and reduce the chances of missing someone with breast cancer. The subsequent biopsy of positive results will have a high specificity to ensure that if, say, mastectomy is to be considered, the doctor is pretty sure that the patient has breast cancer.
As a final point, it is worth mentioning that there are a number of conditions to be met before you would instigate a mass screening programme. One is that catching the disease early makes a difference to prognosis. Another is that there is a treatment available if we did diagnose a patient with a disease. There can be artefacts associated with screening. Thus, instituting a screening programme may initially apparently increase the incidence of the disease. For example, Kinmonth et al. (1998) found that GPs diagnosed more diabetes in an arm of a trial which gave education related to care of diabetic patients than in the control arm. Good screening may apparently improve survival from a disease. For example, diagnosing cancer early will mean a patient will live for longer from diagnosis, compared to a late diagnosis, irrespective of whether treating cancer early is beneficial.
Based on the paper by Kroenke et al. should GPs begin screening for anxiety? The answer in general is no. In the absence of trials showing improved patient benefit and lack of simple treatments for generalised anxiety, it would be difficult to justify screening an asymptomatic population.
References

Campbell, Machin and Walters. Medical Statistics: a Textbook for the Health Sciences. Chichester: Wiley, 2007.
Kinmonth AL, Woodcock A, Griffin S, Spiegal N and Campbell MJ. Randomised controlled trial of patient centred care of diabetes in general practice: impact on current wellbeing and future disease risk. BMJ 1998; 317: 1202-1208.
Kroenke K, Spitzer RL, Williams JB, Monahan PO and Löwe B. Anxiety disorders in primary care: prevalence, impairment, comorbidity and detection. Ann Intern Med 2007; 146: 317-25.
Sackett DL, Richardson WS, Rosenberg W and Haynes RB. Evidence Based Medicine. Churchill Livingstone, 1997.
Related links

More on sensitivity and specificity is given at the following sites:
http://cebm.net/index.aspx?o=1042
http://en.wikipedia.org/wiki/Diagnostic_test