Data linkages

Introduction

Learning objectives: You will learn about the New NHS Number (NNN) and about methods that have been used to link historical datasets.

This section gives a brief overview of the methods in place to ensure that data across datasets and records can be linked to assist research and decision-making.

Read the resource text below.

Resource text

Health care and associated interventions take place in a wide variety of locations, including hospitals, GP practices and health centres, social centres, schools, pharmacies and patients' homes, not all of which are principally concerned with health care. Related data may be held and processed where they are collected or may be extracted for inclusion on external databases (e.g. cancer registries). To ensure a complete health record, it is clearly desirable that each separate data collection can be linked into a seamless whole. Ideally, this should be available on demand wherever it is required to support and inform a clinical decision. The processes by which this can be achieved are known as data linkage.

Most health data include the patient's name, sex and date of birth, together with the date and possibly the time at which the intervention took place, and a link to the patient's address (including postcode). As a rule, computerised systems maintain data linkage within their databases in such a way that users may safely take the function for granted. Clearly, if these data items are complete and consistent, linkage is not a problem, whether the record system is electronic or mainly paper-based.

Linking across datasets becomes more difficult when the data items are held in varying styles. For example, whether a patient's title is held, whether the full given name or only initials are recorded, inconsistent spelling, and out-of-date addresses can all undermine reliable linkage. For that reason, a cornerstone of the NHS Information Strategy is the 'New NHS Number' (NNN), a unique identifier for each individual in England and Wales. It is intended both to act as the single person-based identifier in all NHS data systems and to protect the privacy of the individual: when the NNN is present, names and all fields of the address except postcode are deleted from the system. A separate, similar arrangement is in place for Scotland.

The NNN takes the form of a ten-digit numeric string, including a check digit. A check digit is a means of protecting against wrongly entered data, as it flags any transposition of two digits, the most common data entry error. The NNN is issued by the NHS Central Register (NHSCR), which comprises records for all people who have registered with an NHS GP and all people born and registered in England and Wales since 1991. The NHSCR is part of the Office for National Statistics, which issues NNNs for newly registered births. NNNs are available to GPs and PCTs via the Exeter System, which records registration of patients with NHS GP practices. Hospitals populate the NNN field by sending an extract file from their Patient Administration System to the Strategic Tracing Service, which attempts to match key data items against the NHSCR.
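The text above does not spell out how the check digit is calculated. As a minimal sketch, the following assumes the published Modulus 11 scheme for NHS numbers: the first nine digits are weighted 10 down to 2, the products are summed, and the check digit is 11 minus the remainder on division by 11 (a result of 11 is recorded as 0, and a result of 10 means the number is never issued). The function names are illustrative only and are not taken from any NHS system.

```python
def nhs_check_digit(first_nine):
    """Return the Modulus 11 check digit for nine digits.

    Returns None when the calculation yields 10, which means no valid
    number can start with these nine digits.
    """
    if len(first_nine) != 9 or not (first_nine.isascii() and first_nine.isdigit()):
        raise ValueError("expected exactly nine digits")
    # Weights run 10, 9, ..., 2 across the first nine digits.
    total = sum(int(d) * w for d, w in zip(first_nine, range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        return 0
    if check == 10:
        return None
    return check


def is_valid_nhs_number(number):
    """Check a ten-digit number (spaces allowed) against its check digit."""
    digits = number.replace(" ", "")
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        return False
    return nhs_check_digit(digits[:9]) == int(digits[9])
```

Because every position carries a different weight, swapping two different digits always changes the weighted sum by an amount that is not a multiple of 11, so the transposed number fails the check.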

An American study found that the Social Security Number alone provided a good basis for linkage to cancer registers [1]. This may be a sensible way forward in countries that do not operate a national health service.

Linking with Historic Data

Prospective studies can now be designed to link on NNN. Retrospective studies, using data predating the full implementation of the NNN, or using records which were not part of central NHS systems, are more problematic. Where records carry a unique identifier, matching is deterministic (as long as data error is minimal); for historical data, which lack such an identifier, probabilistic matching is often required.
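Deterministic matching on a unique identifier amounts to an exact join. The sketch below is illustrative only: the record layout and the `nhs_number` field name are assumptions, not part of any NHS system. A record is linked only when exactly one record in the other dataset carries the same identifier; anything else is set aside, which is where historical data without an NNN would end up.

```python
def link_on_identifier(left, right, key="nhs_number"):
    """Deterministically link two lists of dict records on a shared identifier.

    Returns (linked_pairs, unlinked_left_records).
    """
    # Index the right-hand dataset by identifier, ignoring missing values.
    index = {}
    for rec in right:
        ident = rec.get(key)
        if ident:
            index.setdefault(ident, []).append(rec)

    linked, unlinked = [], []
    for rec in left:
        candidates = index.get(rec.get(key), [])
        if len(candidates) == 1:
            linked.append((rec, candidates[0]))
        else:
            # No identifier, no match, or an ambiguous match.
            unlinked.append(rec)
    return linked, unlinked


# Hypothetical example records.
admissions = [
    {"nhs_number": "9434765919", "event": "admission"},
    {"nhs_number": None, "event": "admission"},
]
registry = [{"nhs_number": "9434765919", "event": "registration"}]

linked, unlinked = link_on_identifier(admissions, registry)
```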

Such studies have, in effect, to operate processes of tracing and matching similar to those of the Strategic Tracing Service, without the guarantee of consistency of format. Within the UK, outstanding work has been done on this by Goldacre at Oxford and by the Scottish Linkage Study. The pioneering worker in the field was the Canadian H.B. Newcombe, who developed the main tools and methods over a career of more than 30 years. Newcombe initially formulated the principles of probability matching, and insisted that each linkage requires close empirical attention to the characteristics and structure of the data sets in question. In his view, probability matching is at heart a simple and intuitive process and should not be turned into a highly specialised procedure isolated from the day-to-day concerns of the organisation in which it is carried out.

Matching is done on name/Soundex (a system which allows names that may sound alike to be treated similarly), date of birth and postcode. Exact matches on each item increase the overall probability of a match, while any non-matches correspondingly reduce the likelihood. The principles are fairly simple. Nonetheless, some quite sophisticated statistical concepts underpin his work.
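The matching described above can be sketched in a few lines. This is a simplified illustration and not Newcombe's own procedure: it uses the standard American Soundex coding and the log-ratio agreement and disagreement weights commonly used in probabilistic record linkage. The m and u values (the chance that a field agrees for a true match and for a non-match respectively) and the record field names are hypothetical; in practice they would be estimated from the data sets being linked.

```python
import math

# American Soundex letter groups: letters in a group share one digit.
_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
           ("L", "4"), ("MN", "5"), ("R", "6"))
_CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(name):
    """Return the four-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")  # vowels and Y have no code
        if code and code != prev:
            out += code
        prev = code
    return (out + "000")[:4]


# Hypothetical (m, u) probabilities for each comparison field.
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "date_of_birth": (0.97, 0.003),
    "postcode": (0.80, 0.0005),
}


def comparison_key(record):
    """Standardise a record so that trivial format differences do not count."""
    return {
        "surname": soundex(record["surname"]),
        "date_of_birth": record["date_of_birth"],
        "postcode": record["postcode"].replace(" ", "").upper(),
    }


def match_weight(rec_a, rec_b):
    """Sum agreement (positive) and disagreement (negative) weights."""
    key_a, key_b = comparison_key(rec_a), comparison_key(rec_b)
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if key_a[field] == key_b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total
```

Pairs whose total weight exceeds an upper threshold would be accepted as links, those below a lower threshold rejected, and those in between sent for clerical review; the thresholds themselves have to be chosen empirically for each linkage.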

The Oxford Record Linkage Study, which initially included anonymised statistical abstracts of all hospital admissions and deaths in the former Oxford Region from 1963 to 2005, has been commissioned to carry out similar work nationally for 1999-2005, and the expectation is that this will continue prospectively.

References

    Simon, M., Mueller, B., Deapen, D. and Copeland, G. Comparison of record linkage yield for health research using different variable sets, Breast Cancer Research and Treatment 2005; 89(2): 107-110.

    Michael J Goldacre, Clare J Wotton, Valerie Seagroatt, David Yeates, J Epidemiol Community Health 2004;58:1032-1035. doi: 10.1136/jech.2003.018366

    Heasman, M. A. and Clarke, J. A. Medical Record Linkage in Scotland, Health Bulletin (Edinburgh) 1979; 37: 97-103.

    Newcombe, H. B. Handbook of Record Linkage, OUP: New York, 1988
