Collection of routine and ad hoc data


There are four main types of health information:

1) Demographic data
This covers factors such as age, sex, migration patterns, ethnicity and marital status in populations, and how they influence health.

  • 1.1 Census
  • 1.2 NHS Administrative data 

2) Health event data
This covers recording of health events affecting individuals or populations.

  • 2.1 Births
  • 2.2 Deaths
  • 2.3 Self-reported health
  • 2.4 Primary care interactions
  • 2.5 Secondary care interactions
  • 2.6 Health hazards

3) Circumstantial data
This covers aspects of individuals' and populations' circumstances that may affect the wider determinants of health, including socio-economic, lifestyle, and environmental data.

  • 3.1 Education data
  • 3.2 Employment data
  • 3.3 Housing data
  • 3.4 Environment data

4) National reference data
This covers data not purely issued for health purposes, but which is used in connection with health data to improve understanding of health issues. Examples are:

  • 4.1 Postcode look-up files that link postcodes to administrative and geographical units of which they are components, and include map reference data
  • 4.2 Deprivation data
  • 4.3 Clinical coding such as ICD 10, OPCS4, Clinical terms/Read/SNOMED CT coding systems for diagnoses and operative interventions

Clinical coding is the process whereby information from medical case notes for each patient is expressed as codes, covering operations/procedures and treatments, diagnoses and comorbidities.

Diagnoses are coded using the World Health Organisation’s International Statistical Classification of Diseases and Related Health Problems (ICD). The NHS currently uses version ten (ICD-10) but health registries in other countries and even the private medical sector of the UK still use ICD-9.
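
As a rough illustration of what a coded diagnosis looks like, the sketch below checks that a string has the general shape of an ICD-10 code (a letter, two digits and an optional decimal subdivision) and extracts its three-character category. The function name and the pattern are illustrative assumptions. The check only tests the format and does not confirm that a code exists in the classification.

```python
import re

# General shape of an ICD-10 code: a letter, two digits, optional ".d" subdivision.
# Assumption: format check only; it does not look the code up in ICD-10.
ICD10_SHAPE = re.compile(r"^([A-Z][0-9]{2})(?:\.[0-9])?$")

def icd10_category(code):
    """Return the three-character category of an ICD-10-shaped code, or None."""
    match = ICD10_SHAPE.match(code.strip().upper())
    return match.group(1) if match else None

print(icd10_category("I21.0"))  # code with a decimal subdivision
print(icd10_category("4109"))   # no leading letter, so not ICD-10 shaped
```

Grouping records by three-character category in this way is one simple route to counting diagnoses at a coarser level than the full code.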

OPCS Classification of Interventions and Procedures version 4 (OPCS-4) is the procedural classification used by the NHS to code operations, procedures and interventions.

Read codes are the standard clinical coding system used by general practices for recording clinical information (signs, symptoms, diagnoses or activities as well as demographic categories such as ethnicity, occupation and social circumstances).

SNOMED-CT (Systematized Nomenclature of Medicine – Clinical Terms) is another clinical coding system, and NHS England has committed to a strategic move from Read codes to SNOMED-CT by 2020.

Key issues and questions that should be considered for any data set that provides information on population health:

  • Accuracy - to what extent is the data that is present correct?
  • Precision - have appropriate measures of uncertainty been included (e.g. 95% confidence intervals)?
  • Completeness - how much of the data is missing?
  • Timeliness - what period does the data refer to, and how relevant is that to the current position?
  • Coverage - is the whole population of interest represented, and if not, what fraction makes up the sample?
  • Accessibility - who has access to the data, and is it controlled (e.g. via password-restricted access, public domain)?
  • Confidentiality/suppression/disclosure control - there are strict regulations preventing the publication of datasets that might, when used in combination with other available sources, enable individuals to be identified. Are these regulations followed?
  • Original purpose of collection/collation - under Data Protection legislation, personal data may only be used for the purposes for which it was collected. NHS data registrations generally include improvement of the health of the population and management of the NHS among their purposes, but non-NHS data may not. In addition, change of purpose may be a source of bias in the data.
  • Who undertook the collection/collation? - this may not be available.
  • How have the data been collected? - this may not be available.
  • Whether what is included in the data set is the actual requirement, or whether it will have to act as a proxy for the real item?
  • Is the data set comparative, what are the comparators, and are they appropriate?
  • If the dataset presents rates or ratios, have appropriate techniques been used to control for differing population structures? - for example, has direct or indirect standardisation been applied?
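
The last question in the list can be made concrete with a minimal sketch of direct standardisation: local age-specific rates are applied to a standard population so that areas with different age structures can be compared. The function name, age bands and all figures are hypothetical and are not taken from any real dataset.

```python
def directly_standardised_rate(events, population, standard_population, per=100_000):
    """Weight each age band's local rate by the standard population in that band."""
    total_standard = sum(standard_population)
    weighted = sum(
        (e / p) * s
        for e, p, s in zip(events, population, standard_population)
    )
    return per * weighted / total_standard

# Hypothetical figures for three age bands (0-39, 40-64, 65+); not real data.
deaths = [10, 50, 400]
local_population = [50_000, 30_000, 20_000]
standard = [55_000, 30_000, 15_000]

crude = 100_000 * sum(deaths) / sum(local_population)
dsr = directly_standardised_rate(deaths, local_population, standard)
print(f"crude: {crude:.1f} per 100,000; directly standardised: {dsr:.1f} per 100,000")
```

With these made-up figures the crude rate is 460 per 100,000 but the standardised rate is 361 per 100,000, because the local population is older than the standard population.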


Demographic data

1.1 Census data

The most important source of demographic data at the population level for the UK is the Census.

The Office for National Statistics (ONS) produces annual resident population estimates (called mid-year population estimates, as they are calculated for June of each year). These estimates are based on the census (which takes place every ten years) and are updated each year until the next census.

1.2 NHS Administrative data

Within England, a record is held by NHS Digital of every person in contact with the NHS, on the Personal Demographics Service (PDS).


The PDS stores information at individual patient level. It contains information on:

  • NHS Number
  • Name
  • Address
  • Postcode
  • Administrative gender
  • Date of birth
  • Additional birth details
  • Date of death
  • Other contact details (preferred method, language)
  • Agreement (or not) for data to be shared
  • GP Practice patient is registered with, and date of registration
  • Linkage data for interaction with NHAIS (the primary care administration system)

Information Governance

Strict controls are applied to protect patient information. These include:

  • authentication processes that allow systems to identify which actions have been taken by which healthcare professional
  • role-based access controls, linked to the identity of each authorised healthcare professional, which control precisely what they are able to see and do when logged on to the system
  • search controls that constrain how healthcare professionals are able to look up the details of individual patients
  • sensitive record controls that prevent local healthcare professionals from accessing PDS information when records are flagged as sensitive
  • tools for auditing who has looked at or amended PDS records, with local access to these for 'privacy officers' so as to check for appropriate use

Further details of these arrangements are available at [accessed 01/10/2018]

In general, this information is only shared between those directly responsible for a patient's care, so a potentially powerful source for the production of local public health statistics cannot be used for that purpose. Aggregate data is used to help understand population numbers.

Health event data: Births and death registration.

In the UK, registration of birth and death events is a legal requirement. Information on how registration occurs for England and Wales  can be found at [accessed 11/11/2015]. 

2.1 Births data

Details of births in England and Wales are available at:

Information collected includes:

  • Child's forenames
  • Sex
  • Date of birth
  • Place of birth
  • Mother's full name and maiden name
  • Father's full name and occupation if married to the mother
  • Name, address and relationship to child of the person who registered the birth
  • Information on marital status and living arrangements of parents
  • Parents' occupation
  • Postcode of mother's normal place of residence

In Scotland the following additional information is given:

  • Time of birth
  • Date and place of parents' marriage

Publicly accessible data on births can be found at:

Summary data includes information on stillbirths:


2.2 Mortality data

Mortality data help to measure the health status of a population or community. The importance of mortality statistics comes both from the significance of death in an individual’s life and from their potential to improve the public’s health when used to systematically assess and monitor the health status of a whole community.

Mortality data provides a snapshot of current health problems, suggests patterns of health risk in specific populations and can identify trends in specific causes of death over time.


In England and Wales, deaths need to be reported to the local registrar within 5 days of the death occurring. [accessed 07/01/2016].

Deaths data are passed to ONS on a weekly basis. For NHS purposes the HSCIC maintains the Primary Care Mortality Database (PCMD), which combines data from death certificates and NHS administrative data. Data, which can be extracted by residence (CCG or upper tier Local Authority) or GP practice registration, are made available to Directors of Public Health on a monthly basis. There are strict regulations about use of the data: it may only be used for public health statistical purposes.

Data on the certificate and the data files available to Public Health departments include:

  • Full name of deceased
  • Date of death
  • Address and postcode of normal place of residence
  • Place of death
  • Given age
  • Cause of death, underlying and contributory
  • Occupation (or name and occupation of husband if the deceased was a married or widowed woman)
  • Name, address and family relationship (if any) of the person who reported the death.                      

In Scotland the following additional information would be given:

  • Marital status
  • Spouse's name
  • Sex
  • Father's name and rank or profession
  • Mother's name and maiden name

Mortality statistics publications are routinely available from the ONS   [accessed 11/11/2015]. 

Among them are:

DH1 - Annual review of the Registrar General on deaths in England and Wales

DH2 - Deaths by cause

DH3 - Childhood, infant and perinatal mortality

DH4 - Injury and poisoning

Further information on data about deaths is available at:  [accessed 11/11/2015].

Data on stillbirths can be found on the Compendium website:  [accessed 11/11/2015].

Public Health Uses of Births and Deaths data

  • Health service planning
  • Epidemiology
  • Monitoring and evaluation
  • Audit
  • Screening programmes (breast and ovarian cancer, immunisation take up)
  • Confidential enquiries and register checking
  • Inequalities analysis (postcode enables precise geographical analysis of population and patients).
  • Assessing progress against targets (e.g. infant mortality, life expectancy).


Strengths

Both births and deaths data are very complete and accurate for the UK. Deaths data can provide very important information on the health of populations.

Both births and deaths can be applied to an accurate and specified calendar time period.

Ongoing data collection and availability.

Many natality and mortality indicators are derived from these sources.


Weaknesses

Ethnicity is not collected for either deaths or births (though where these take place in NHS hospitals it may be derived from Hospital Episode Statistics (HES) records, explained later).

Defining the socio-economic status from the occupation recorded on birth registration records is very difficult and possibly even more tenuous for single mothers. 

Deaths are not reliable as a picture of burden of morbidity of chronic illness; more people are living longer with illness than in the past; quality of recording cause of death varies considerably.

Numerous factors crucial to a patient’s death are not recorded in mortality data, e.g. the underlying condition of the patient, how sick the patient was when (if) they were admitted to hospital, what aspects of care were delivered to the patient prior to their death, etc.

Risk behaviour, pregnancy condition and neonatal outcome data may be incomplete and can vary across data collections.

The underlying cause of death may be coded inconsistently.

2.3 Self-reported health

Self-reported health is by its nature subjective and different people will have different thresholds for what is considered 'serious' or 'very painful'.  Definitions can change over time as well.  Conversely people can forget illnesses that they have had, or choose to withhold information.  However, only a proportion of people experiencing ill-health make contact with health services. Self-reported ill-health can therefore help to show more of the burden of ill health in the population that is otherwise 'under the line'.

Figure 1: The iceberg concept

Source: Donaldson and Donaldson, 'Essential Public Health', 2nd edition.


Sources of information on self-reported health

Census: questions on self-reported general health and limiting long-term illness.

Health Survey for England

Carried out annually since 1991, the Health Survey for England is a series of national surveys about the health and related behaviours of people in England, involving about 16,000 adults. It comprises a questionnaire together with some physical measurements and blood samples. Questions on specific topics, such as cardiovascular disease and accidents, are also asked at certain intervals. However, its sample size is currently not large enough to provide CCG/local authority level data. Appropriate organisations with a relevant interest may sponsor a boost of the Health Survey for England. The London Health Observatory did so in 2006 to provide an increased sample of London residents on specific aspects of health-related behaviours.

Integrated Household survey [accessed 07/01/2016]

The Integrated Household Survey (IHS) is the largest social survey collected by the Office for National Statistics (ONS), providing estimates from approximately 325,000 individual respondents − the biggest pool of UK social data after the census.

The Integrated Household Survey (IHS) was previously formed from "core" questions asked by a number of ONS household surveys. Currently the IHS is based solely upon the Annual Population Survey (APS), following the removal of IHS questions on the Living Costs and Food (LCF) survey in January 2014. Topics covered by the IHS include sexual identity, perceived general health, smoking prevalence, education, housing and employment.

2.4 Primary care interactions

In the UK, 90% of reported ill health is captured at GP practice level, so primary care data is very important, especially when the iceberg effect is considered (see figure 1). It has the potential to offer a huge amount of information on the health of the population, but it has historically been very difficult to access. Most GP practices are now computerised, and in some areas may allow public health staff direct access to the data. Changes in the NHS have resulted in improved primary care data collection, but also in stricter controls on sharing that data for public health and NHS management purposes.


The Quality and Outcomes Framework (QOF) is a voluntary annual reward and incentive programme for all GP surgeries in England, detailing practice achievement results. It is not about performance management but resourcing and then rewarding good practice.  [accessed 01/10/2018]

The QOF contains three main components, known as domains. The three domains are: Clinical; Public Health and Public Health – Additional Services. Each domain consists of a set of achievement measures, known as indicators, against which practices score points according to their level of achievement. The 2014/15 QOF measured achievement against 81 indicators; practices scored points on the basis of achievement against each indicator, up to a maximum of 559 points.

  • clinical: the domain consists of 69 indicators across 19 clinical areas (e.g. chronic kidney disease, heart failure, hypertension) worth up to a maximum of 435 points.
  • public health: the domain consists of seven indicators (worth up to 97 points) across four clinical areas – blood pressure, cardiovascular disease – primary prevention, obesity 16+ and smoking 15+.
  • public health – additional services: the domain consists of five indicators (worth up to 27 points) across two service areas – cervical screening and contraception.

    For accessibility purposes, all six conditions/measures within public health and public health additional services are to be found under the one heading ‘Public Health’.
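
The domain figures quoted above can be checked against the stated totals with a short sketch. The dictionary layout and the `achievement_percentage` helper are illustrative assumptions, not part of the QOF itself.

```python
# Indicator counts and maximum points per domain, as quoted above for the 2014/15 QOF.
qof_domains = {
    "clinical": {"indicators": 69, "max_points": 435},
    "public health": {"indicators": 7, "max_points": 97},
    "public health - additional services": {"indicators": 5, "max_points": 27},
}

total_indicators = sum(d["indicators"] for d in qof_domains.values())
total_points = sum(d["max_points"] for d in qof_domains.values())

def achievement_percentage(points_scored, max_points=total_points):
    """Share of the available points a practice achieved (hypothetical helper)."""
    return 100 * points_scored / max_points

print(total_indicators, total_points)
```

The three domains sum to the 81 indicators and 559 points given in the text.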


QOF data enables local prevalence to be estimated for the conditions it covers. These may be compared with other prevalence studies such as the Health Survey for England.
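
A minimal sketch of such an estimate follows, assuming that prevalence is taken as the size of a disease register divided by the practice list size, and attaching a 95% confidence interval (the Wilson score interval is used here as one common choice). The figures are hypothetical.

```python
import math

def prevalence_with_ci(register_size, list_size, z=1.96):
    """Return (prevalence, lower, upper) using the Wilson score interval."""
    p = register_size / list_size
    denom = 1 + z**2 / list_size
    centre = (p + z**2 / (2 * list_size)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / list_size + z**2 / (4 * list_size**2)
    ) / denom
    return p, centre - half_width, centre + half_width

# Hypothetical practice: 480 patients on a disease register, list size 8,000.
p, lower, upper = prevalence_with_ci(480, 8_000)
print(f"prevalence {100 * p:.1f}% (95% CI {100 * lower:.1f}% to {100 * upper:.1f}%)")
```

Because a register only counts diagnosed and recorded cases, an estimate of this kind is likely to understate true prevalence, which is one reason for comparing it with survey-based figures.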

QOF prevalence data is also used to calculate practice payments within each of the clinical indicator groups. For example, points are awarded to a practice for a specific clinical indicator group (e.g. asthma) if the practice can produce a register of patients with that condition or group of conditions.


Strengths

The new GP contract gives an incentive to GPs to improve completeness of data. It encourages the establishment of disease registers. There are incentives to identify more registered patients needing to be on disease registers and receive treatment.


Weaknesses

Raw data are not available to PCT or public health departments. Instead, some ready-analysed data are available. These are limited: there is no age/sex/ethnicity breakdown; comparative analyses may be inappropriate (due, for example, to different age structures in different areas); there is no information on co-morbidity; and there is significant under-recording of some indicators. Participation is voluntary; not all GP practices take part in the QOF.

Clinical Practice Research Datalink (CPRD) [accessed 01/10/2018]

The Clinical Practice Research Datalink (CPRD) is a governmental, not-for-profit research service, jointly funded by the NHS National Institute for Health Research (NIHR) and the Medicines and Healthcare products Regulatory Agency (MHRA), a part of the Department of Health.

It has been providing anonymised primary care records for public health research since 1987. Research using CPRD data has resulted in over 1,500 publications which have led to improvements in drug safety, best practice and clinical guidelines. Examples include confirming safety of MMR vaccine, informing NICE cancer guidance, safeguarding use of pertussis vaccine in pregnancy, influencing the management of hypertension in diabetics. CPRD is now also using primary care data in clinical trials. Examples include a real world diabetes study comparing a new therapy to standard of care, and randomised controlled trials on myocardial infarction and COPD patients.


Strengths

Quality continually assessed.

Available for research questions.

Standards for recording allow collation.


Weaknesses

Incomplete - only a small proportion of self-selecting practices across the country (450 or so).

2.5 Secondary care interactions

In the NHS, data on patients' interactions with the secondary (i.e. hospital) care services is recorded on statutorily defined datasets. These are recorded by providers and exchanged with commissioners of care via electronic clearing houses. The data collected varies according to the type of interaction - outpatient attendance, admitted care, waiting for elective admission, A&E. In each case the data to be collected is set out in the NHS Data Dictionary.  [accessed 01/10/2018].

The mechanism of exchange has varied over time: the former NHS Wide Clearing Service (NWCS) has been replaced by the Secondary Uses Service (SUS), though at present the data flows remain similar.  [accessed 07/01/2016]

Commissioner organisations (e.g. CCGs) access data from the SUS on a role-based access basis, controlled by a smart card.  Arrangements for commissioners accessing data from SUS were amended as of 1 November 2014, with the default position for all commissioner organisations now being to view data only in pseudonymised form. A relevant Data Sharing Agreement (DSA) is required to be in place with the HSCIC for any SUS data access.
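
To illustrate what viewing data "in pseudonymised form" means in general terms, the sketch below replaces an identifier with a keyed hash, so that records for the same person can still be linked without the identifier being readable. This is a generic technique shown under stated assumptions; it is not a description of how SUS data are actually pseudonymised, and the key, identifiers and field names are made up.

```python
import hashlib
import hmac

def pseudonymise(identifier, secret_key):
    """Replace an identifier with a keyed hash (HMAC-SHA256, hex encoded)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()

# Hypothetical key and made-up identifiers; a real key would be held securely.
key = b"example-key-not-for-real-use"
records = [
    {"patient_id": "ID-0001", "activity": "admission"},
    {"patient_id": "ID-0001", "activity": "outpatient"},
    {"patient_id": "ID-0002", "activity": "admission"},
]
pseudonymised = [
    {"pseudo_id": pseudonymise(r["patient_id"], key), "activity": r["activity"]}
    for r in records
]
print(len({r["pseudo_id"] for r in pseudonymised}), "distinct pseudonyms")
```

The first two records share a pseudonym, so activity can still be analysed per person without exposing who that person is.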


SUS data can help to identify health needs of the local population. It can be used to determine patient flows for treatment and contribute to the analysis of health outcomes. The most direct use of SUS data is for monitoring contracts between primary care trusts and hospital providers.  Under Payment by Results, hospitals are paid for the activity they undertake.  Payment by Results is underpinned by Healthcare Resource Group (HRG) codes.  HRGs are groupings of conditions and procedures that are clinically similar and use similar levels of resource.  A national tariff is applied to each HRG code and updated each year.  [accessed 01/10/2018]
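
The arithmetic behind paying for activity by tariff can be sketched as below. The HRG labels, tariff values and activity counts are all hypothetical, and the calculation ignores any adjustments that a real tariff may apply.

```python
# Hypothetical HRG labels, tariffs (in pounds) and activity counts; not real NHS values.
tariff = {"HRG_A": 1_200.00, "HRG_B": 450.00, "HRG_C": 3_000.00}
activity = {"HRG_A": 10, "HRG_B": 40, "HRG_C": 2}

def expected_payment(activity, tariff):
    """Sum of (activity count x tariff) over HRG codes."""
    return sum(count * tariff[hrg] for hrg, count in activity.items())

print(f"expected payment: {expected_payment(activity, tariff):,.2f} pounds")
```

Comparing a figure like this with what a provider has invoiced is the kind of contract monitoring that the activity data support.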

SUS data is cleaned and collated on a national basis to create HES data.


Strengths

Timely data; components of SUS data are made available to CCGs by hospital trusts on a monthly basis.

Useful in Public Health where secondary care is a significant aspect of care for a condition (e.g. stroke).

Has a lot of detail of medical conditions (ICD) and operations.

The system by which it is delivered allows questioning and challenging of the data before a certain date each month.


Weaknesses

Clinical coding may be of variable quality. Recording of ethnicity is sometimes a problem. Outpatient datasets do not contain diagnostic information. Exchange of A&E data, although mandatory, often does not happen as the defined dataset is not considered particularly useful, so only a partial picture of A&E activity is available.

Operationally, there can be severe difficulties for public health staff in accessing "personal confidential data" (PCD), although the relevant legislation permits this where it is related to risks to public health, and explicitly includes inequalities in health care provision and lifestyles among those risks.

Current (2018) arrangements for accessing data are set out in



Hospital Episode Statistics  [accessed 01/10/2018]

Hospital Episode Statistics (HES) processes over 125 million admitted patient, outpatient and accident and emergency records each year.

What is HES?

HES is a data warehouse containing details of all admissions, outpatient appointments and A&E attendances at NHS hospitals in England.

This data is collected during a patient's time at hospital and is submitted to allow hospitals to be paid for the care they deliver. HES data is designed to enable secondary use, that is use for non-clinical purposes, of this administrative data.

It is a records-based system that covers all NHS trusts in England, including acute hospitals, primary care trusts and mental health trusts. HES information is stored as a large collection of separate records - one for each period of care - in a secure data warehouse.

Each record is a subset of the record submitted by the provider to SUS, and so is defined in the NHS Data Dictionary. HES is generally issued on an annual basis, though provisional data is now issued quarterly. Each record represents a period of time under the responsibility of a specific consultant (a 'finished consultant episode'). A patient could have several episodes within one spell in hospital. There are hundreds of fields, including data on maternities and augmented care for patients whose care involves the use of intensive and high dependency care facilities. Data held for admitted patients include:

  • NHS Number
  • Full postcode
  • Date of birth
  • Sex
  • Ethnicity (more complete with each year)
  • Diagnosis fields
  • Procedures fields
  • Registered GP
  • PCT of residence
  • Date of admission
  • Date of discharge
  • Method of admission
  • Method of discharge
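
Because one spell in hospital can contain several finished consultant episodes, counting records overstates the number of admissions. The sketch below groups episodes into spells, assuming for illustration that a spell can be identified by patient and admission date. The field names and records are hypothetical and do not reproduce the HES record layout.

```python
from collections import defaultdict

# Hypothetical finished consultant episodes; field names are illustrative only.
episodes = [
    {"patient": "P1", "admission_date": "2015-03-01", "consultant": "C1"},
    {"patient": "P1", "admission_date": "2015-03-01", "consultant": "C2"},
    {"patient": "P1", "admission_date": "2015-06-10", "consultant": "C1"},
    {"patient": "P2", "admission_date": "2015-03-01", "consultant": "C3"},
]

def group_into_spells(episodes):
    """Group episodes that share a patient and admission date into one spell."""
    spells = defaultdict(list)
    for episode in episodes:
        spells[(episode["patient"], episode["admission_date"])].append(episode)
    return dict(spells)

spells = group_into_spells(episodes)
print(f"{len(episodes)} episodes in {len(spells)} spells")
```

In this made-up example four episodes correspond to three spells, since the first patient was under two consultants during one stay.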

Strict protocols are in place for how data can be disseminated including clear suppression rules to prevent disclosure. Various pre-analysed reports are available free online.
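
A suppression rule of this kind can be sketched as follows. The threshold of 5 and the `*` marker are assumptions chosen for illustration; the rules that apply to a given dataset should be taken from its own disclosure control guidance, which may also require further (secondary) suppression so that hidden values cannot be recovered from totals.

```python
def suppress_small_counts(counts, threshold=5, marker="*"):
    """Replace counts from 1 up to and including the threshold with a marker."""
    return {
        area: (marker if 0 < n <= threshold else n)
        for area, n in counts.items()
    }

# Hypothetical admission counts by ward.
admissions = {"Ward A": 0, "Ward B": 3, "Ward C": 5, "Ward D": 42}
suppressed = suppress_small_counts(admissions)
print(suppressed)
```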

Dr Foster intelligence [accessed 01/10/2018] has access to SUS and HES data and provides data via web-based tools on hospital and CCG activity.


At a national level, HES is used for health service planning, monitoring activity and assessing quality of care. At a local level (e.g. CCG), SUS data is more timely, and HES is mostly used to enable national comparisons and benchmarking. Comparisons can be made between areas, for example, by calculating standardised rates, and for monitoring inequalities across geographical areas, including ward level.
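
One way of making such comparisons is indirect standardisation, sketched below: reference age-specific rates are applied to the local population to give an expected number of events, and the observed number is expressed as a ratio to it. The function name, age bands, rates and counts are hypothetical.

```python
def standardised_ratio(observed, local_population, reference_rates):
    """Observed events as a percentage of those expected at reference rates."""
    expected = sum(n * r for n, r in zip(local_population, reference_rates))
    return 100 * observed / expected

# Hypothetical area: three age bands, reference admission rates per person per year.
local_population = [50_000, 30_000, 20_000]
reference_rates = [0.001, 0.005, 0.020]
observed_admissions = 660

ratio = standardised_ratio(observed_admissions, local_population, reference_rates)
print(f"standardised admission ratio: {ratio:.0f} (100 = same as reference)")
```

Here 600 admissions would be expected at reference rates, so 660 observed admissions give a ratio of 110, or 10% more than expected given the area's age structure.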

Extracts are used for research purposes.

Examples of outputs can be found at: [accessed 07/01/2016].


Strengths

Completeness of data increasing.

Standard codes (ICD10 and OPCS4) used for diagnoses and procedures.

For serious morbidity, can give prevalence of the condition across the country.

Recently becoming linked with ONS mortality data.


Weaknesses

HES and SUS data only tell us about those who have a disease and then use health care facilities. They do not give a full picture of morbidity. Specifically, they do not tell us about those people who have a disease or disability but do not seek care - see iceberg effect and figure 1. Some fields are very incomplete - for example, whilst completeness of the ethnicity field has been increasing with each year, a significant proportion (10 - 15%) is filled with 'not stated'.

Outpatient data became available via HES in 2007 but comes with limitations.  Diagnostic data are very poorly completed. 

Timeliness - there used to be quite a delay between NWCS outputs and HES.  This is less because provisional data are made available on a quarterly basis, before ministerial sign-off.

Cancer registries  [accessed 07/01/2016]

The National Cancer Registration Service is run by Public Health England and is responsible for cancer registration, which has been an integral part of the NHS for over 50 years.

The Service aims to collect data on all cases of cancer that occur in people living in England. The data is used to support public health, healthcare and research. The Service provides data to the Office for National Statistics on new cases of cancer and cancer survival, monitors new cases of cancer in the population, and looks at trends and geographical patterns so that risk factors and cancer clusters can be detected.


Uses

Patient follow up.

Auditing treatment, comparing with other treatment outcomes.

Evaluation of services.

Studies of causation.

Health service planning.


Strengths

Very rich, detailed source of information.


Weaknesses

Expensive to run - updating the registry is laborious.

Assessing completeness/under-coverage is not straightforward [Brenner et al]
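
The two-source capture-recapture approach named in the reference can be sketched with the simple Lincoln-Petersen estimator. This is a minimal illustration under strong assumptions (the two sources are independent and cases are matched without error); it is not a reproduction of the cited evaluation, and the counts are hypothetical.

```python
def capture_recapture(n_registry, n_second_source, n_both):
    """Lincoln-Petersen estimate of total cases and of registry completeness."""
    if n_both == 0:
        raise ValueError("no overlap between sources; estimate undefined")
    estimated_total = n_registry * n_second_source / n_both
    completeness = n_registry / estimated_total
    return estimated_total, completeness

# Hypothetical counts: cases in the registry, in a second source, and in both.
total, completeness = capture_recapture(
    n_registry=900, n_second_source=400, n_both=360
)
print(f"estimated cases: {total:.0f}; registry completeness: {100 * completeness:.0f}%")
```

With these made-up counts the estimated total is 1,000 cases and the registry is estimated to be 90% complete. If the two sources tend to capture the same kinds of case, the overlap is inflated and completeness is overestimated, which is part of why the assessment is not straightforward.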

Completeness of ethnicity could be improved: [accessed 07/01/2016].

Other examples of registries:

Congenital anomalies.

Industrial diseases.


Issues to consider when establishing a registry


  • Why (users and purposes)
  • What disease
  • Clear case definition
  • System for reporting new events
  • What is to be stored
  • What is to be reported
  • Who will handle the analyses
  • Who is taking enquiries
  • Who is producing reports
  • Financial implications
  • Confidentiality and ethical issues
  • Maintaining quality


Other examples of routine morbidity data

Notifications of communicable diseases.

Notifications of foetal anomalies.

Abortion statistics.

Significant datasets for Public health are:

  • Genitourinary Medicine (GUM) clinic activity
  • Immunisation data
  • Adult screening, cervical and breast
  • Mental Health and Learning Disabilities Data Set.

Values and limitations of routine data

Value of routine data

  • Readily available. As the data already exist, obtaining access should be straightforward.
  • Limited costs. Existing data is generally available at low cost to health professionals compared with the collection of primary data for a specific purpose, which can prove costly and time-consuming.
  • Up to date. Some large-scale surveys are repeated at regular intervals.
  • Useful for identifying hypotheses. Certain data sources may provide impartial and objective data, which can be used as a check on the validity of self-reported behaviour and help to identify hypotheses.
  • Useful for initial assessment due to the ease of obtaining the data.
  • Provides baseline data on expected levels of health/disease.
  • Useful for comparison with other sources where similar indices have been measured.
  • Time-series analysis enabled. Where design is kept constant, comparison of indicators over time can be made.

Limitations of routine data

  • Lack of completeness. Although ONS population data is not generally available for each ward, ward-level data can be calculated and are available from some local health authorities.
  • Potential for bias. There is the risk of not knowing whether desired outcomes came about as a result of the intervention or as a result of other influences.
  • Limited details of determinants such as income and ethnicity. Best for data required for administrative purposes.
  • Often poorly presented and analysed. Presenting and analysing the data well will usually fall to the user of the routine data.
  • Occasionally subject to political influences and manipulation.
  • Data quality. The standards of routine data collection are not always as high as those expected in research or for more rigorous evaluation.



The most valuable feature of routine data is their availability at little cost to the researcher. They may be especially helpful in establishing baseline characteristics regarding the health status of the community, in generating hypotheses as a result of sex, age, cohort or geographic variation, and in identifying potential areas requiring further research. However, no single data set can provide the whole picture of a population's health and its needs.

Ad hoc data

Local health surveys may be carried out by CCGs or LAPH departments when national surveys do not give sufficient information at local level.

Lifestyle data

3.1 Education data

Information on educational attainment data, where it can be sourced from and how it is of relevance to inequalities in health can be found in section 3 of [accessed 07/01/2016].

3.2 Employment data

Information on employment data, where it can be sourced from and how it is of relevance to inequalities in health can be found in section 1 of [accessed 07/01/2016].

3.3 Housing data

Information on housing and homelessness, where it can be sourced from and its relevance to inequalities in health can be found in section 2 of [accessed 07/01/2016].

3.4 Environment data

Information on air quality can be found at [accessed 28/11/2007]. Statistics on air quality can be found in section 5 of [accessed 07/01/2016].



  • Brenner H, Stegmaier C, Ziegler H. Estimating completeness of cancer registration: an empirical evaluation of the two source capture-recapture approach in Germany. J Epidemiol Community Health 1995; 49: 426-430.




© M Goodyear & N Malhotra, 2007, M Goodyear 2016 and S Seager 2018