Introduction to study designs - case-control studies


Learning objectives: You will learn the basics of case-control studies, including their analysis and the interpretation of results. Case-control studies are among the most frequently used study designs because they are relatively easy to carry out compared with other study designs. This section introduces the basic concepts, application and strengths of the case-control study. This section also covers:

1. Issues in the design of case-control studies
2. Common sources of bias in a case-control study
3. Analysis of case-control studies
4. Strengths and weaknesses of case-control studies
5. Nested case-control studies

Read the resource text below.

Resource text

Case-control studies start with the identification of a group of cases (individuals with a particular health outcome) in a given population and a group of controls (individuals without the health outcome) to be included in the study.

In a case-control study the prevalence of exposure to one or more potential risk factors is compared between cases and controls. If exposure is more common among cases than among controls, the exposure may be a risk factor for the outcome under investigation. A major characteristic of case-control studies is that data on potential risk factors are collected retrospectively, which may give rise to bias. This is a particular problem associated with case-control studies and therefore needs to be carefully considered during the design and conduct of the study.

1. Issues in the design of case-control studies

Formulation of a clearly defined hypothesis
As with all epidemiological investigations, a case-control study should begin with the formulation of a clearly defined hypothesis.

Case definition
It is essential that the case definition is clearly established at the outset of the investigation, so that all cases included in the study are based on the same diagnostic criteria. The source of cases also needs to be clearly defined.

Selection of cases
Case-control studies may use incident or prevalent cases.

Incident cases
Incident cases are cases newly diagnosed during a defined time period. The use of incident cases is generally preferred, as the recall of past exposure(s) may be more accurate among newly diagnosed cases. In addition, the temporal sequence of exposure and disease is easier to assess among incident cases.

Prevalent cases
Prevalent cases are individuals who have had the outcome under investigation for some time. The use of prevalent cases may give rise to recall bias, as prevalent cases may be less likely to report past exposure(s) accurately. The interpretation of results based on prevalent cases may also prove more problematic, as it is harder to ensure that reported events relate to a time before the development of disease rather than to a consequence of the disease process itself. For example, individuals may modify their exposure following the onset of disease. In addition, unless the effect of exposure on duration of illness is known, it will not be possible to determine the extent to which a particular characteristic is related to the prognosis of the disease once it develops rather than to its cause.

Source of cases
Cases may be recruited from a number of sources; for example, they may be recruited from a hospital, clinic or GP register, or they may be population based. Population-based case-control studies are generally more expensive and difficult to conduct.

Selection of controls
A particular problem inherent in case-control studies is the selection of a comparable control group. Controls are used to estimate the prevalence of exposure in the population that gave rise to the cases. Therefore, the ideal control group would comprise a random sample from the general population that gave rise to the cases. However, this is not always possible in practice. The goal is to select individuals in whom the distribution of exposure status would be the same as that of the cases in the absence of an exposure-disease association. That is, if there is no true association between exposure and disease, the cases and controls should have the same distribution of exposure. The source of controls depends on the source of cases. To minimize bias, controls should be a representative sample of the population that produced the cases. For example, if cases are selected from a defined population such as a GP register, then controls should comprise a sample from the same GP register.

In case-control studies where cases are hospital based, it is common to recruit controls from the hospital population. However, hospital controls should not include individuals with an outcome related to the exposure(s) being studied. For example, in a case-control study of the association between smoking and lung cancer, the inclusion of controls being treated for a condition related to smoking (e.g. chronic bronchitis) may result in an underestimate of the strength of the association between exposure (smoking) and outcome. Recruiting more than one control per case may improve the statistical power of the study, although recruiting more than four controls per case is generally considered to add little further gain.
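The diminishing return from extra controls can be sketched numerically. The snippet below relies on an assumption that is not stated in the text above: a commonly quoted approximation that, when the true association is weak, r controls per case give about r / (r + 1) of the efficiency of an unlimited number of controls. The function name is hypothetical and the figures are a rough guide only.

```python
def relative_efficiency(controls_per_case: int) -> float:
    """Approximate efficiency of r controls per case, relative to an
    unlimited number of controls, using the r / (r + 1) rule of thumb.

    Assumption: this approximation holds best when the true odds ratio
    is close to 1; it is an illustration, not an exact power calculation.
    """
    if controls_per_case < 1:
        raise ValueError("at least one control per case is required")
    return controls_per_case / (controls_per_case + 1)


# Show how little is gained by each additional control per case.
for r in range(1, 7):
    print(f"{r} control(s) per case: "
          f"{relative_efficiency(r):.0%} of maximum efficiency")
```

Under this approximation, moving from one to four controls per case raises relative efficiency from 50% to 80%, whereas each further control adds only a few percentage points.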

Measuring exposure status
Exposure status is measured to assess the presence or level of exposure for each individual for the period of time prior to the onset of the disease or condition under investigation when the exposure would have acted as a causal factor. Note that in case-control studies the measurement of exposure is established after the development of disease and as a result is prone to both recall and observer bias. Various methods can be used to ascertain exposure status. These include:

  • Standardized questionnaires
  • Biological samples
  • Interviews with the subject
  • Interviews with spouse or other family members
  • Medical records
  • Employment records
  • Pharmacy records

The procedures used for the collection of exposure data should be the same for cases and controls.
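To illustrate why exposure must be ascertained in the same way for cases and controls, the sketch below works through a purely hypothetical example in which there is no true association, but cases report their past exposure more completely than controls. All counts, the reporting proportions (0.9 and 0.6) and the function names are assumptions chosen for illustration.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)


def reported_counts(n_exposed, n_unexposed, sensitivity):
    """Counts as reported when only a fraction (`sensitivity`) of truly
    exposed people report their exposure. Unexposed people are assumed
    never to report exposure (perfect specificity)."""
    reported_exposed = n_exposed * sensitivity
    reported_unexposed = n_unexposed + n_exposed * (1 - sensitivity)
    return reported_exposed, reported_unexposed


# Hypothetical truth: half of the cases and half of the controls are exposed.
true_or = odds_ratio(50, 50, 200, 200)

# Differential recall: cases report 90% of true exposures, controls only 60%.
case_exp, case_unexp = reported_counts(50, 50, sensitivity=0.9)
ctrl_exp, ctrl_unexp = reported_counts(200, 200, sensitivity=0.6)
biased_or = odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp)

print(f"True OR: {true_or:.2f}")
print(f"OR with differential recall: {biased_or:.2f}")
```

In this hypothetical example the reported data suggest an association (an OR of about 1.9) even though none exists, purely because recall differs by disease status.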

2. Common sources of bias in case-control studies

Due to the retrospective nature of case-control studies, they are particularly susceptible to the effects of bias, which may be introduced as a result of poor study design or during the collection of exposure and outcome data. Because the disease and exposure have already occurred at the outset of a case-control study, there may be differential reporting of exposure information between cases and controls based on their disease status. For example, cases and controls may recall past exposure differently (recall bias). Similarly, the recording of exposure information may vary depending on the investigator's knowledge of an individual's disease status (interviewer/observer bias). Therefore, the design and conduct of the study must be carefully considered, as there are limited options for the control of bias during the analysis.

Selection bias in case-control studies
Selection bias is a particular problem inherent in case-control studies, where it gives rise to non-comparability between cases and controls. Selection bias in case-control studies may occur when: 'cases (or controls) are included in (or excluded from) a study because of some characteristic they exhibit which is related to exposure to the risk factor under evaluation' [1]. The aim of a case-control study is to select study controls who are representative of the population which produced the cases. Controls are used to provide an estimate of the exposure rate in the population. Therefore, selection bias may occur when those individuals selected as controls are unrepresentative of the population that produced the cases.

The potential for selection bias in case-control studies is a particular problem when cases and controls are recruited exclusively from hospitals or clinics. Hospital patients tend to have different characteristics from the general population; for example, they may have higher levels of alcohol consumption or cigarette smoking. If these characteristics are related to the exposures under investigation, then estimates of exposure among controls may differ from those in the reference population, which may result in a biased estimate of the association between exposure and disease. Berkson's bias (Berksonian bias) is a bias introduced in hospital-based case-control studies as a result of differing rates of hospital admission. As selection bias is likely to be less of a problem in population-based case-control studies, neighbourhood controls may be a preferable choice when using cases from a hospital or clinic setting. Alternatively, the potential for selection bias may be minimized by selecting controls from more than one source, such as by using both hospital and neighbourhood controls. Selection bias may also be introduced in case-control studies when exposed cases are more likely to be selected than unexposed cases.
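The effect of an unrepresentative control group can be sketched with simple arithmetic. The example below is hypothetical throughout: the case counts, the exposure prevalences (25% in the source population, 40% among hospital controls) and the function names are assumptions chosen only to show the direction of the bias.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)


def control_counts(n_controls, exposure_prevalence):
    """Expected numbers of exposed and unexposed controls for a given
    prevalence of exposure in the group the controls are drawn from."""
    exposed = n_controls * exposure_prevalence
    return exposed, n_controls - exposed


# Hypothetical cases: 60 exposed, 40 unexposed.
exp_cases, unexp_cases = 60, 40

# Controls sampled from the population that produced the cases (25% exposed).
pop_exp, pop_unexp = control_counts(400, 0.25)
or_population = odds_ratio(exp_cases, unexp_cases, pop_exp, pop_unexp)

# Hospital controls in whom the exposure is more common (40% exposed).
hosp_exp, hosp_unexp = control_counts(400, 0.40)
or_hospital = odds_ratio(exp_cases, unexp_cases, hosp_exp, hosp_unexp)

print(f"OR with population controls: {or_population:.2f}")
print(f"OR with hospital controls:   {or_hospital:.2f}")
```

Because the hospital controls overstate exposure in the source population, the estimated odds ratio in this sketch is halved, which is the kind of underestimate described above.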

3. Analysis of case-control studies

The odds ratio (OR) is used in case-control studies to estimate the strength of the association between exposure and outcome. Note that it is not possible to estimate the incidence of disease from a case-control study unless the study is population based and all cases in a defined population are obtained.

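The results of a case-control study can be presented in a 2x2 table as follows:

                 Cases    Controls
  Exposed          a         b
  Unexposed        c         d

The odds ratio compares the odds of exposure among cases (a/c) with the odds of exposure among controls (b/d). This is numerically equal to the odds of disease in the exposed compared to the odds of disease in the unexposed, and is calculated as:

OR = (a x d) / (b x c)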

Example: Calculation of the OR from a hypothetical case-control study of smoking and cancer of the pancreas among 100 cases and 400 controls.

Table 1. Hypothetical case-control study of smoking and cancer of the pancreas.

                 Cases    Controls
  Smokers          60       100
  Non-smokers      40       300
  Total           100       400

OR = (60 x 300) / (100 x 40)
OR = 4.5

The OR calculated from the hypothetical data in table 1 estimates that the odds of developing cancer of the pancreas among smokers are 4.5 times the odds among non-smokers.

NB: The odds ratio of smoking and cancer of the pancreas has been calculated without adjusting for potential confounders. Further analysis of the data would involve stratifying by levels of potential confounders such as age. The 2x2 table can then be extended to allow stratum-specific odds ratios to be calculated for each level of the confounding variable(s) and, where appropriate, an overall summary measure adjusted for the effects of confounding and a statistical test of significance. In addition, confidence intervals for the odds ratio would also be presented.
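The calculation above, together with the confidence interval and adjusted summary measure mentioned in the note, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the interval uses Woolf's logit method and the summary measure uses the Mantel-Haenszel estimator, neither of which is specified in the text, and the two age strata are invented solely so that they add up to the counts used in the worked example (60, 100, 40 and 300). Function names are hypothetical.

```python
import math


def odds_ratio(a, b, c, d):
    """a: exposed cases, b: exposed controls,
    c: unexposed cases, d: unexposed controls."""
    return (a * d) / (b * c)


def woolf_ci(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval for the OR (Woolf's logit method)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


def mantel_haenszel_or(strata):
    """Summary OR across strata; each stratum is a tuple (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Crude analysis using the counts from the worked example.
crude_or = odds_ratio(60, 100, 40, 300)
low, high = woolf_ci(60, 100, 40, 300)
print(f"Crude OR = {crude_or:.1f} (95% CI {low:.2f} to {high:.2f})")

# Invented age strata (assumption, not from the source); they sum to the
# crude table: younger (10, 40, 20, 240) and older (50, 60, 20, 60).
strata = [(10, 40, 20, 240), (50, 60, 20, 60)]
for label, stratum in zip(("younger", "older"), strata):
    print(f"OR in {label} stratum = {odds_ratio(*stratum):.2f}")

mh_or = mantel_haenszel_or(strata)
print(f"Mantel-Haenszel adjusted OR = {mh_or:.2f}")
```

In this invented stratification both stratum-specific odds ratios, and hence the adjusted estimate, are lower than the crude value of 4.5, which is the pattern expected when age confounds the association.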

4. Strengths and weaknesses of case-control studies


Strengths
  • Cost effective relative to other analytical studies such as cohort studies.
  • Case-control studies are retrospective, and cases are identified at the beginning of the study; therefore there is no long follow up period (as compared to cohort studies).
  • Efficient for the study of diseases with long latency periods.
  • Efficient for the study of rare diseases.
  • Good for examining multiple exposures.


Weaknesses
  • Particularly prone to bias, especially selection, recall and observer bias.
  • Case-control studies are limited to examining one outcome.
  • Unable to estimate incidence rates of disease (unless study is population based).
  • Poor choice for the study of rare exposures.
  • The temporal sequence between exposure and disease may be difficult to determine.

References

1. Hennekens CH, Buring JE. Epidemiology in Medicine, Lippincott Williams & Wilkins, 1987.