
Data linkage within and across datasets



Health care and associated interventions take place in a wide variety of locations, including GP practices, hospitals, health centres, social centres, schools, pharmacies and patients' homes, not all of which are principally concerned with health care. Related data may be held and processed where they are collected, or may be extracted for inclusion in external databases (e.g. cancer registries). To ensure a complete health record, it is clearly desirable that each separate data collection can be linked into a 'seamless' whole, which ideally should be available on demand wherever it is required to support and inform a clinical decision. The processes by which this can be achieved are known as data linkage.

Most health data include the patient's name, sex and date of birth, together with the date (and possibly the time) at which the intervention took place, and a link to the patient's address (including postcode). As a rule, computerised systems maintain data linkage within their databases in such a way that users may safely take the function for granted. Clearly, if these data items are complete and consistent, linkage is not a problem, whether the record system is electronic or mainly paper-based.

Linking across datasets becomes more difficult when the styles in which the data items are held vary. For example, whether a patient's title is held, whether the full given name or only initials are recorded, the consistency of spelling, and the currency of the address can all introduce potential confounders to reliable linkage. For that reason, a cornerstone of the NHS's Information Strategy is the 'NHS Number' (NN), a unique identifier for each individual in England and Wales. It is intended both to act as the single person-based identifier in all NHS data systems and to protect the privacy of the individual: when the NN is present, names and all fields of the address except postcode are deleted from the system. A separate, similar arrangement is in place for Scotland.
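The variations in recording style described above (titles, initials, inconsistent formatting) are usually reduced by normalising each data item before any comparison is attempted. The following is a minimal sketch only, assuming a small illustrative list of titles and the standard UK postcode layout with a three-character inward code; the function names `normalise_name` and `normalise_postcode` are hypothetical and do not come from any NHS system.

```python
import re

# Illustrative list of titles to discard; a real system would need a fuller list.
TITLES = {"MR", "MRS", "MS", "MISS", "DR", "PROF"}


def normalise_name(raw):
    """Upper-case a name, drop punctuation and remove any leading title."""
    # Keep letters, spaces, apostrophes and hyphens; everything else becomes a space.
    cleaned = re.sub(r"[^A-Za-z\s'-]", " ", raw).upper()
    tokens = [t for t in cleaned.split() if t not in TITLES]
    return " ".join(tokens)


def normalise_postcode(raw):
    """Upper-case a UK postcode and put one space before the 3-character inward code."""
    compact = re.sub(r"\s+", "", raw).upper()
    if len(compact) < 5:
        # Too short to be a full postcode; return it unchanged apart from case.
        return compact
    return compact[:-3] + " " + compact[-3:]


print(normalise_name("Dr. John  Smith"))
print(normalise_postcode("so166yd"))
```

Normalisation of this kind removes differences of presentation only; it cannot correct a misspelt name or an out-of-date address.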

The NN takes the form of a ten-digit numeric string, including a check digit. (A check digit is a means of protecting against wrongly entered data, as it flags any transposition of two figures, the most common data entry error.) The NN is issued by the NHS Central Register (NHSCR), which comprises records for all people who have registered with an NHS GP and all people born and registered in England and Wales since 1991. The NHSCR is part of the Office for National Statistics, which issues NNs for newly registered births. NNs are available to GPs and CCGs via the Exeter System, which records the registration of patients with NHS GP practices. Hospitals populate the NN field by sending an extract file from their Patient Administration System to the Strategic Tracing Service, which attempts to match key data items against the NHSCR.
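As an illustration of how a check digit works, the sketch below validates a ten-digit NN using the Modulus 11 scheme. The scheme is not described in this chapter and is stated here as background: the first nine digits are weighted 10 down to 2, and the tenth digit is derived from the weighted sum. The function names are hypothetical, and the number used is a commonly quoted test value, not a real patient's NN.

```python
def nhs_check_digit(first_nine):
    """Return the Modulus 11 check digit for nine digits, or None if none exists."""
    if len(first_nine) != 9 or any(c not in "0123456789" for c in first_nine):
        raise ValueError("expected exactly nine digits")
    # Weights run 10, 9, ..., 2 across the nine digits.
    total = sum(int(d) * w for d, w in zip(first_nine, range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        return 0
    if check == 10:
        # No single digit can represent 10, so such a number is never valid.
        return None
    return check


def is_valid_nhs_number(value):
    """Check a ten-digit string (spaces allowed) against its check digit."""
    compact = "".join(value.split())
    if len(compact) != 10 or any(c not in "0123456789" for c in compact):
        return False
    check = nhs_check_digit(compact[:9])
    return check is not None and check == int(compact[9])


print(is_valid_nhs_number("943 476 5919"))  # well-formed test number
print(is_valid_nhs_number("493 476 5919"))  # first two figures transposed
```

A check of this kind only shows that a number is well formed; it does not show that the number belongs to the patient in question, which is why tracing against the NHSCR is still needed.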

A pioneering example of real-time NHS clinical record linkage started as the ‘Hampshire Health Record’, initiated by Dr Hugh Sanderson. It has developed into the Care and Health Information Exchange (CHIE).

“The Care and Health Information Exchange (CHIE) is a secure system which shares health and social care information from GP surgeries, hospitals, community and mental health, social services and others. CHIE helps professionals across Hampshire, the Isle of Wight and surrounding areas provide safer and faster treatment for you and your family…” [accessed 20/08/2018]

For further information see the CHIE Browser access user guide [accessed 20/08/2018]

An American study found that the Social Security Number alone provided a good basis for linkage to cancer registers [1]. This may be a sensible way forward in countries that do not operate a national health service.

Linking with Historic Data

Prospective studies can now be designed to link on NN. Retrospective studies, using data predating the full implementation of the NN, or using records which were not part of central NHS systems, are more problematic. They have, in effect, to operate processes of tracing and matching similar to those of the Strategic Tracing Service, without the guarantee of consistency of format. Within the UK, outstanding work has been done on this by Michael Goldacre at Oxford [2] and by the Scottish Linkage Study [3]. The pioneering worker in the field was the Canadian Newcombe [4], who developed the main tools and methods over a career of more than 30 years.

Newcombe initially formulated the principles of probability matching, and was insistent that the characteristics and structure of the data sets in question required close empirical attention to the emergent qualities of each linkage. In his view, probability matching is at heart a simple and intuitive process and should not be turned into a highly specialised procedure isolated from the day-to-day concerns of the organisation in which it is carried out. Matching is done on name/Soundex (a system which allows names that may sound alike to be treated similarly), date of birth and postcode. An exact match on any item increases the overall probability that two records refer to the same person, while any non-match correspondingly reduces it. The principles are fairly simple; none the less, some quite sophisticated statistical concepts underpin his work. A full description of the Scottish Linkage Study and Newcombe's methods can be found at: [accessed 20/08/2018]
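The matching process described above can be sketched in a few lines. The example below is illustrative only and is not Newcombe's or the Scottish Linkage Study's actual implementation: it uses the standard American Soundex coding for surnames and scores a pair of records by summing log-ratio weights for agreement and disagreement on each item. The m and u probabilities (the chance that an item agrees for a true match and for a chance pairing, respectively) are invented for illustration, and the function names are hypothetical.

```python
import math

# Standard Soundex consonant groups; vowels, H, W and Y carry no code.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the four-character Soundex code of a name ('' if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        if ch not in "HW":
            # H and W do not separate repeated codes; vowels do.
            prev = code
    return "".join(out)[:4].ljust(4, "0")


# Hypothetical (m, u) probabilities per item; real values are estimated from the data.
FIELDS = {
    "surname_soundex": (0.95, 0.01),
    "date_of_birth": (0.97, 0.0003),
    "postcode": (0.80, 0.0001),
}


def comparison_key(surname, date_of_birth, postcode):
    """Build the items on which two records are compared."""
    return {
        "surname_soundex": soundex(surname),
        "date_of_birth": date_of_birth,
        "postcode": postcode.replace(" ", "").upper(),
    }


def match_weight(rec_a, rec_b):
    """Sum agreement (positive) and disagreement (negative) weights over all items."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


a = comparison_key("Smith", "1950-03-14", "SO16 6YD")
b = comparison_key("Smyth", "1950-03-14", "so166yd")
c = comparison_key("Jones", "1962-11-02", "OX3 7LF")
print(match_weight(a, b), match_weight(a, c))
```

In practice the total weight is compared with thresholds chosen by inspecting the data, with pairs in the uncertain middle band referred for clerical review; this reflects Newcombe's insistence on close empirical attention to each linkage.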

The Oxford Linkage Study was another pioneering record linkage development. It initially included anonymised statistical abstracts of all hospital admissions and deaths in the former Oxford Region from 1963 to 2005, and was commissioned to carry out similar work nationally for 1999-2005.

The early years’ description is well worth a read; how much progress has been made since then? [accessed 20/08/2018]



1  Simon, Michael; Mueller, Beth; Deapen, Dennis; Copeland, Glenn. Comparison of record linkage yield for health research using different variable sets. Breast Cancer Research and Treatment, Volume 89, Number 2, January 2005, pp.
2  Michael J Goldacre, Clare J Wotton, Valerie Seagroatt, David Yeates. J Epidemiol Community Health 2004;58:1032-1035. doi: 10.1136/jech.2003.018366
3  Heasman, M. A. and Clarke, J. A. Medical Record Linkage in Scotland. Health Bulletin (Edinburgh) 1979; 37: 97-103.
4  Newcombe, H. B. Handbook of Record Linkage. OUP: New York, 1988.



                                                                   © M Goodyear 2008, D Lawrence 2018