Epidemiology: Genetic Association Studies
The MFPH Part A syllabus includes a unit on genetics, and this website provides separate notes on that section. Readers unfamiliar with the terminology of genetics are advised to read those notes first.
The Part A syllabus requires that candidates have an ‘understanding of basic issues and terminology’ around genetic association studies. This page seeks to provide that. Readers seeking more detailed information are directed to the series of articles on genetic epidemiology published in the Lancet medical journal in 2005. (These can be freely downloaded at the bottom of this external page.) Much of the information below is drawn from these papers.
Genetic epidemiology and association analysis
Genetic epidemiology is closely allied to traditional epidemiology, focussing on familial, and in particular genetic, determinants of disease and the joint effects of genes and non-genetic determinants. It takes into account the biology that underlies the action of genes and the known mechanisms of inheritance to investigate the health consequences of genetic variants.
Extensive information about the human genome is now available for use in genetic epidemiology studies. Once it is known which two versions of a potentially causative gene an individual possesses, looking for an association between variants in that gene and the disease of interest is fundamentally no different from an exploration of a disease-exposure association in traditional epidemiology.1
Traditional epidemiology often seeks to show that, across a study population, environmental exposure X is consistently associated with observed disease Y. Association analysis in genetic epidemiology asks the same question of genetic exposures, and many of the analytical approaches used in epidemiology and medical statistics can be applied directly in genetic epidemiology.
Twin studies
Twin studies were among the earliest types of genetic study, first carried out in the 19th century by Francis Galton, considered by many to be the father of medical genetics. He investigated the extent to which the similarity of twins changes over the course of development.
Twin studies involve comparing both monozygotic (identical) twins and dizygotic (fraternal) twins to estimate the relative contributions of genes and environment to specific traits. Monozygotic twins share the same genetic material, whereas dizygotic twins share, on average, only 50% of their genes.
Monozygotic twins serve as excellent subjects for controlled experiments because they share prenatal environments, and those reared together also share common family, social, and cultural environments. However, twin studies have the potential to over- or underestimate the role of genetics, because of the challenges of quantifying environmental influences.
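As a rough illustration of how twin data can be turned into estimates of genetic and environmental contributions, the sketch below applies Falconer's formulas to the trait correlations observed in monozygotic and dizygotic pairs. This method is not described in the notes above and is only one classical approach; the function name and the correlations used are hypothetical, chosen purely for illustration.

```python
def falconer_estimates(r_mz, r_dz):
    """Split trait variance using twin correlations (Falconer's formulas).

    r_mz: trait correlation between monozygotic twin pairs
    r_dz: trait correlation between dizygotic twin pairs
    Assumes equal shared environments for both twin types and
    purely additive genetic effects (strong simplifications).
    """
    h2 = 2 * (r_mz - r_dz)  # heritability: variance attributed to genes
    c2 = 2 * r_dz - r_mz    # variance attributed to shared environment
    e2 = 1 - r_mz           # unique environment plus measurement error
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical correlations, not taken from any real study.
estimates = falconer_estimates(r_mz=0.80, r_dz=0.50)
print({name: round(value, 2) for name, value in estimates.items()})
```

If the equal-environments assumption fails, these figures can over- or underestimate the genetic contribution, which is the limitation noted above.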
Linkage analysis
Linkage analysis is often the first stage in the genetic investigation of a trait, since it can be used to identify broad genomic regions that might contain a disease gene, even in the absence of previous biologically driven hypotheses.
Genetic linkage analysis can be used to identify regions of the genome that contain genes that predispose to disease. It involves two key concepts:
- Linkage - two genetic loci are linked if they are transmitted together from parent to offspring more often than would be expected under independent inheritance, that is, if they are separated by recombination during meiosis less than 50% of the time
- Linkage disequilibrium – two genetic loci are in linkage disequilibrium if, across the population as a whole, they are found together on the same haplotype more often than expected.
In general, two loci in linkage disequilibrium will also be linked, but the reverse is not necessarily true. Every time recombination occurs between two loci in the population, the linkage disequilibrium between them is weakened, and is maintained only if the two loci are very close together.
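The relationship between recombination and linkage disequilibrium described above can be sketched numerically. The example below uses two standard measures that are not named in these notes, the coefficient D and the squared correlation r², and assumes the textbook model in which D shrinks by a factor of (1 − θ) per generation, where θ is the recombination fraction. All frequencies and function names are hypothetical.

```python
def ld_statistics(p_ab, p_a, p_b):
    """Return (D, r_squared) for allele A at locus 1 and allele B at locus 2.

    p_ab: population frequency of haplotypes carrying both A and B
    p_a, p_b: population frequencies of alleles A and B
    """
    # D is the excess of the AB haplotype over what independence predicts.
    d = p_ab - p_a * p_b
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


def ld_after_generations(d0, theta, generations):
    """Expected D after some generations of random mating (simplified model)."""
    return d0 * (1 - theta) ** generations


# Hypothetical haplotype and allele frequencies.
d0, r2 = ld_statistics(p_ab=0.40, p_a=0.50, p_b=0.50)

# Unlinked loci (theta = 0.5) lose their disequilibrium quickly;
# very close loci (theta = 0.001) retain almost all of it.
unlinked = ld_after_generations(d0, theta=0.5, generations=10)
close = ld_after_generations(d0, theta=0.001, generations=10)
print(round(d0, 3), round(r2, 2), round(unlinked, 5), round(close, 4))
```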
There are two major types of linkage analysis, described in more detail by Teare and colleagues:2
- Parametric linkage analysis - the analysis of how genetic loci cosegregate in pedigrees or family units. Loci that are close together on the same chromosome segregate together more often than loci on different chromosomes. The further apart two loci are on the same chromosome, the more likely it is that a recombination event at meiosis will break up the cosegregation. The main quantity of interest is the recombination fraction (the probability of recombination between two loci at meiosis). By genotyping genetic markers and studying their segregation through pedigrees, it is possible to infer their position relative to each other on the genome. This can then be used to map genetic markers or disease loci.
- Model-free (non-parametric) linkage analysis – this is used for multifactorial diseases, where several genes (and environmental factors) might contribute to disease risk and there is no disease model available. The rationale is that, between affected relatives, excess sharing of haplotypes that are identical by descent (IBD) in the region of a disease-causing gene would be expected, irrespective of the mode of inheritance. Various methods test whether IBD sharing at a locus is greater than expected under the null hypothesis of no linkage.
Linkage is usually reported using a LOD score (logarithm of the odds), which takes into account the recombination fraction and chromosomal positions. Large positive LOD scores are evidence for linkage and negative scores are evidence against.
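A minimal sketch of a LOD score calculation follows, for the simplest textbook case in which every meiosis can be scored unambiguously as recombinant or non-recombinant (phase-known data). The likelihood at a candidate recombination fraction θ is compared with the likelihood under no linkage (θ = 0.5). Real pedigree analysis is considerably more involved; the function name and counts below are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD score at recombination fraction theta for phase-known meioses.

    LOD(theta) = log10[ L(theta) / L(0.5) ], where
    L(theta) = theta**recombinants * (1 - theta)**(meioses - recombinants).
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in the interval (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical data: few recombinants supports linkage (positive score),
# whereas about half recombinants counts against it (negative score).
print(round(lod_score(recombinants=2, meioses=20, theta=0.1), 2))
print(round(lod_score(recombinants=10, meioses=20, theta=0.1), 2))
```

In practice the score is evaluated over a range of θ values and the maximum is reported.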
Genetic association studies3
Genetic association studies aim to detect associations between one or more genetic polymorphisms and a trait, for example a disease. Association differs from linkage in that the same allele (or alleles) is associated with the trait in a similar manner across the whole population, while linkage allows different alleles to be associated with the trait in different families.
Genetic associations only arise because humans share common ancestry, and it has been argued that association studies are really just a special form of linkage study in which the extended family is the wider population. However, this type of research has more in common with classical epidemiology than with the family studies described above, because it examines associations at a population level.
Cordell and colleagues outline three reasons why there might be an association between a polymorphism and a trait in a population:
- Direct association – the polymorphism has a causal role
- Indirect association – the polymorphism has no causal role but is associated with a nearby causal variant
- Confounded association – the association is due to some underlying stratification or admixture of the population, requiring further investigation
Familiar epidemiological study designs such as case-control or cohort designs are often used for genetic association studies and the data are analysed much the same way. Risk factors or exposures such as smoking are replaced by the presence or absence of a particular genetic polymorphism.
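To illustrate how a polymorphism can be treated like any other exposure, the sketch below computes an odds ratio with an approximate 95% confidence interval (Woolf's logit method) from a 2×2 case-control table of carriers and non-carriers. The counts and names are hypothetical, and this is only one of several ways such data might be analysed.

```python
import math


def odds_ratio(case_carriers, case_noncarriers,
               control_carriers, control_noncarriers):
    """Odds ratio and approximate 95% CI for carrying a variant.

    Uses the Woolf (logit) standard error; all four cells must be non-zero.
    """
    estimate = ((case_carriers * control_noncarriers)
                / (case_noncarriers * control_carriers))
    se_log_or = math.sqrt(1 / case_carriers + 1 / case_noncarriers
                          + 1 / control_carriers + 1 / control_noncarriers)
    lower = math.exp(math.log(estimate) - 1.96 * se_log_or)
    upper = math.exp(math.log(estimate) + 1.96 * se_log_or)
    return estimate, lower, upper


# Hypothetical study: 100 cases and 100 controls typed for one polymorphism.
estimate, lower, upper = odds_ratio(60, 40, 40, 60)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1 suggests an association, although it cannot distinguish between the direct, indirect and confounded explanations listed above.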
Appraising association studies
Hattersley and colleagues provide guidance on assessing the quality of association studies.4 They propose asking a series of questions, similar to those that would be addressed in the appraisal of a standard epidemiology paper:
- What are we hoping for from an association study? - Many association studies have only limited power to detect true susceptibility effects and even less power to exclude the involvement of a gene in causing a trait.
- How good a candidate is the gene in question? - There are up to 30,000 genes in the human genome, and it is unlikely that more than a few hundred make a meaningful contribution to any single trait. The probability that a gene selected at random will influence a given trait is therefore very low.
- How strong is the case for the variants that have been typed? – to detect all possible disease-associated gene variants it would be necessary to examine very large samples. This is often unrealistic for reasons of cost.
- How appropriate are the samples typed? – although prospective cohort studies are often regarded as the gold standard, they are usually not efficient for the initial stages of gene discovery. Unless the disease is very common, the study samples generated will have far fewer individuals with disease than without. Furthermore, the unselected nature of the cases could compromise power, especially when compared with samples that are deliberately enriched for genetic aetiology and disease homogeneity. The case-control study remains the mainstay of genetic association studies, and the most important issues relate to choice of the two study groups.
- Is the study size large enough? – sample size is a key determinant of quality in an association study
- How good is the genotyping? - most association studies assume implicitly that the genotypes are accurate. However, even with the best methods, some assays will be unreliable; and the accuracy of earlier genotyping methods (on which much of the current published work is based) will have been even worse.
- How appropriate is the analysis?
- How appropriate is the interpretation? – there is still much discussion about the level of evidence needed before a genetic association can be regarded as proven.
Papers by Frayling5 and Ioannidis6 both provide further guidance on interpreting association studies, for those seeking more information.
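The question of study size raised in the list above can be made concrete with a standard sample-size formula for comparing two proportions, here the frequency of carrying a variant in cases and in controls. This is a sketch using a normal approximation with equal group sizes, not a method taken from the papers cited; the frequencies, significance levels and function name are hypothetical.

```python
import math
from statistics import NormalDist


def cases_needed(p_controls, p_cases, alpha=0.05, power=0.80):
    """Approximate number of cases (with an equal number of controls)
    needed to detect a difference in carrier frequency, two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_controls + p_cases) / 2
    # Spread of the difference under the null and under the alternative.
    term_null = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    term_alt = z_beta * math.sqrt(p_controls * (1 - p_controls)
                                  + p_cases * (1 - p_cases))
    return math.ceil((term_null + term_alt) ** 2
                     / (p_cases - p_controls) ** 2)


# Hypothetical variant carried by 20% of controls and 25% of cases.
print(cases_needed(0.20, 0.25))
# A much stricter significance threshold demands a larger study.
print(cases_needed(0.20, 0.25, alpha=0.0001))
```

Small differences in carrier frequency, which are typical of common multifactorial disease, push the required numbers up quickly.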
Problems with genetic association studies
Genetic association studies are central to efforts to identify and characterise genomic variants underlying susceptibility to multifactorial disease. However, their role in the characterisation of genes contributing to common traits remains controversial. Bird and colleagues identify several potential pitfalls with studies of this kind:
- Accuracy of diagnostic criteria for the disorder to be studied. Investigators should provide evidence that all the subjects have the same disease.
- Selection of appropriate control subjects, especially regarding age, sex, and ethnic background.
- Choice of study strategy, for example using a population-based, case-control study versus a family approach.
- The problem of multiple comparisons leading to a high likelihood of false-positive results occurring by chance because of large numbers of comparisons in the study.
- Choice of statistical analysis and threshold for significance.
- The tendency of both investigators and journals to report only studies with positive rather than negative results. As a result, the literature becomes heavily weighted toward unconfirmed associations.
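The multiple-comparisons pitfall listed above can be illustrated with some simple arithmetic. The sketch below assumes that all tests are independent and that no true associations exist, and it uses the Bonferroni correction as one common (and conservative) way of adjusting the significance threshold. The correction is not prescribed by the source, and the number of tests is hypothetical.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of chance 'significant' results when no effects exist."""
    return alpha * n_tests


def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate near alpha."""
    return alpha / n_tests


# Hypothetical study testing 100 polymorphisms at the usual 5% level.
n_tests = 100
print(expected_false_positives(0.05, n_tests))
print(round(familywise_error_rate(0.05, n_tests), 3))
print(bonferroni_threshold(0.05, n_tests))
```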
References
1. Burton P, Tobin M, Hopper J. Key concepts in genetic epidemiology. Lancet 2005; 366: 941–51
2. Teare MD, Barrett J. Genetic linkage studies. Lancet 2005; 366: 1036–44
3. Cordell H, Clayton D. Genetic association studies. Lancet 2005; 366: 1121–31
4. Hattersley A, McCarthy M. What makes a good genetic association study? Lancet 2005; 366: 1315–23
5. Frayling T. Genetic association studies see light at the end of the tunnel. International Journal of Epidemiology 2008; 37: 133–135
6. Ioannidis J et al. Assessment of cumulative evidence on genetic associations: interim guidelines. International Journal of Epidemiology 2008; 37: 120–132
© Helen Barratt 2009