Populations: Collection of routine and ad hoc data
There are four main types of health information
1) Demographic data
This covers factors such as age, sex, migration patterns, ethnicity and marital status in populations, and how they influence health.
- 1.1 Census
- 1.2 Exeter data
2) Health event data
This covers recording of health events affecting individuals or populations.
- 2.1 Births
- 2.2 Deaths
- 2.3 Self reported health
- 2.4 Primary care interactions
- 2.5 Secondary care interactions
- 2.6 Health hazards
3) Circumstantial data
This covers aspects of individuals' and populations' circumstances that may affect the wider determinants of health, including socio-economic, lifestyle, and environmental data.
- 3.1 Education data
- 3.2 Employment data
- 3.3 Housing data
- 3.4 Environment data
4) National reference data
This covers data not purely issued for health purposes, but which is used in connection with health data to improve understanding of health issues. Examples are
- 4.1 Postcode look-up files that link postcodes to administrative and geographical units of which they are components, and include map reference data
- 4.2 Deprivation data
- 4.3 ICD 10, OPCS4, Clinical terms/Read/SNOMED CT coding systems for diagnoses and operative interventions
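As a rough illustration of how a postcode look-up file (item 4.1) is used alongside health data, the sketch below links event postcodes to wards and counts events per ward. The postcodes, ward names and function names are all invented for the example; real look-up files carry many more fields, such as local authority and map reference.

```python
from collections import Counter

# Hypothetical look-up table: postcode -> ward. These rows are invented and
# do not come from any real look-up file.
POSTCODE_TO_WARD = {
    "ZZ1 1AA": "Ward A",
    "ZZ1 1AB": "Ward A",
    "ZZ2 2AA": "Ward B",
}


def normalise_postcode(postcode: str) -> str:
    """Upper-case and re-space a postcode so 'zz11aa' matches 'ZZ1 1AA'."""
    compact = "".join(postcode.split()).upper()
    # The inward part of a UK postcode is its last three characters.
    return f"{compact[:-3]} {compact[-3:]}"


def count_events_by_ward(postcodes):
    """Count health events per ward; unmatched postcodes are kept visible."""
    counts = Counter()
    for pc in postcodes:
        ward = POSTCODE_TO_WARD.get(normalise_postcode(pc), "Unmatched")
        counts[ward] += 1
    return dict(counts)


events = ["zz11aa", "ZZ1 1AB", "ZZ2 2AA", "ZZ9 9ZZ"]
ward_counts = count_events_by_ward(events)
print(ward_counts)
```

Counting unmatched postcodes separately, rather than dropping them, keeps the completeness of the linkage open to inspection.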
Key issues and questions that should be considered for any data set that provides information on population health:
- Accuracy - to what extent is the data that is present correct?
- Precision - have appropriate measures of uncertainty been included (e.g. 95% confidence intervals)?
- Completeness - how much of the data is missing?
- Timeliness - what period does the data refer to, and how relevant is that to the current position?
- Coverage - is the whole population of interest represented, and if not, what fraction makes up the sample?
- Accessibility - who has access to the data, and is it controlled (e.g. via password-restricted access, public domain)?
- Confidentiality/suppression/disclosure control - there are strict regulations preventing the publication of datasets that might, when used in combination with other available sources, enable individuals to be identified. Are these regulations followed?
- Original purpose of collection/collation - under Data Protection legislation, personal data may only be used for the purposes for which it was collected. NHS data registrations generally include improvement of the health of the population and management of the NHS among their purposes, but non-NHS data may not. In addition, change of purpose may be a source of bias in the data.
- Who undertook the collection/collation - this may not be available.
- How the data have been collected - this may not be available.
- Whether what is included in the data set is the actual requirement, or whether it will have to act as a proxy for the real item.
- Is the data set comparative, what are the comparators, and are they appropriate?
- If the dataset presents rates or ratios, have appropriate techniques been used to control for differing population structures? For example, has direct or indirect standardisation been applied?
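The last point can be made concrete with a small worked example. The sketch below computes a directly age-standardised rate and an indirectly standardised mortality ratio for three age bands. All figures, the function names and the choice of three bands are invented for illustration; real analyses use published standard populations and reference rates.

```python
def direct_standardised_rate(deaths, population, standard_population, per=100_000):
    """Directly age-standardised rate.

    Applies the study population's age-specific rates to the age structure
    of a standard population.
    """
    expected = sum(
        (d / p) * s for d, p, s in zip(deaths, population, standard_population)
    )
    return per * expected / sum(standard_population)


def standardised_mortality_ratio(observed_deaths, population, reference_rates):
    """Indirect standardisation: 100 x observed deaths / expected deaths.

    Expected deaths come from applying reference age-specific rates to the
    study population's own age structure.
    """
    expected = sum(p * r for p, r in zip(population, reference_rates))
    return 100 * observed_deaths / expected


# Invented figures for three age bands (young, middle, old).
deaths = [2, 10, 40]
population = [10_000, 5_000, 2_000]
standard_population = [50_000, 30_000, 20_000]
reference_rates = [0.0002, 0.002, 0.02]  # deaths per person per year

crude = 100_000 * sum(deaths) / sum(population)
dsr = direct_standardised_rate(deaths, population, standard_population)
smr = standardised_mortality_ratio(sum(deaths), population, reference_rates)
print(round(crude, 1), round(dsr, 1), round(smr, 1))
```

In this made-up case the crude rate and the standardised rate differ only because the study population is younger than the standard population, which is the effect standardisation is meant to remove.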
Demographic data
1.1 Census data
The most important source of demographic data at the population level for the UK is the Census.
1.2 Exeter data
Within England, another important source of demographic data is Exeter data, managed by the National Strategic Tracing Service http://www.connectingforhealth.nhs.uk/nsts [accessed 28/11/2007].
Description
The Exeter database stores information at individual patient level, on patient registration with general practitioners. It contains information on:
- NHS Number
- Name
- Address
- Postcode
- Sex
- Date of birth
- Place of birth
- GP and GP practice the patient is registered with
- PCT where the patient is registered
Uses
- The main purpose of the Exeter system was to pay GPs, on the basis of list capitation.
- For tracing people as they move and register with a new GP.
- For providing GPs with a register.
- Deprivation of registered patients at ward level is also factored in when calculating primary care resource allocation.
- For recording national adult cancer screening programmes data.
- For understanding local populations and to inform practice based commissioning.
Strengths
Crucial for practice profiling by practice clusters, PCTs and public health observatories.
Postcodes enable determination of local authority of residence. Local authorities do not have equivalent databases of their residents, and in collaborative work between NHS and LAs the picture of the population Exeter makes available can be enormously useful.
Weaknesses
GP lists are inflated on average by 5.7%, due to mobility among young adults and delays in removing list members on death or emigration (http://www.primary-care-db.org.uk/datasets_help.cfm [accessed 28/11/2007]).
Vulnerable populations such as homeless people, asylum seekers, and some migrant workers tend not to be registered with GPs, so are missing from the Exeter system.
Place of birth, which might be useful in ethnic analyses, is a free text field, and may vary from 'home' to country to detailed address.
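List inflation matters most when registered lists are used as denominators. The sketch below shows the effect on a rate, assuming "inflated by 5.7%" means the list is 1.057 times the true population and that this average applies uniformly. Both are simplifying assumptions, and the practice figures and function names are invented.

```python
LIST_INFLATION = 0.057  # average figure quoted above; local values will differ


def deflate_list_size(registered, inflation=LIST_INFLATION):
    """Estimate the population behind an inflated GP list."""
    return registered / (1 + inflation)


def rate_per_1000(events, denominator):
    """Events per 1,000 people in the denominator."""
    return 1000 * events / denominator


registered = 10_570  # hypothetical practice list size
events = 200         # hypothetical number of health events in a year

adjusted = deflate_list_size(registered)
crude_rate = rate_per_1000(events, registered)
adjusted_rate = rate_per_1000(events, adjusted)
print(round(adjusted), round(crude_rate, 2), round(adjusted_rate, 2))
```

An inflated denominator makes rates look lower than they are, so the adjusted rate is always the higher of the two.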
Health event data
Births and death registration.
In the UK, registration of birth and death events is a legal requirement. Information on how registration occurs can be found at http://www.gro.gov.uk/gro/content/ [accessed 28/11/2007].
2.1 Births data
Information on what is captured on the birth certificate can be found at http://www.familyrecords.gov.uk/topics/bmd_2.htm#birth [accessed 28/11/2007]. It includes:
- Child's forenames
- Sex
- Date of birth
- Place of birth
- Mother's full name and maiden name
- Father's full name and occupation if married to the mother
- Name, address and relationship to child of the person who registered the birth
- Information on marital status and living arrangements of parents
- Parents' occupation
- Postcode of mother's normal place of residence
In Scotland the following additional information would be given:
- Time of birth
- Date and place of parents' marriage
Publicly accessible data on births can be found at:
http://www.statistics.gov.uk/statbase/Product.asp?vlnk=5768 [accessed 28/11/2007].
Summary data includes information on stillbirths:
http://www.lho.org.uk/DATAANDMETHODS/Datasources_Description/Births_Data.aspx [accessed 28/11/2007].
2.2 Deaths data
Description
In England and Wales, deaths need to be reported to the local registrar within 5 days of the death occurring. http://www.gro.gov.uk/gro/content/deaths/ [accessed 28/11/2007].
Deaths data are passed to ONS on a weekly basis. ONS provide a monthly dataset on deaths to directors of public health, including cause of death and contributing factors. Information on what is captured on the death certificate can be found at http://www.familyrecords.gov.uk/topics/bmd_2.htm#death [accessed 28/11/2007].
Data on the certificate and in the data files available to public health departments include:
- Full name of deceased
- Date of death
- Address and postcode of normal place of residence
- Place of death
- Given age
- Cause of death, underlying and contributory
- Occupation (or name and occupation of husband if the deceased was a married or widowed woman)
- Name, address and family relationship (if any) of the person who reported the death
In Scotland the following additional information would be given:
- Marital status
- Spouse's name
- Sex
- Father's name and rank or profession
- Mother's name and maiden name
Mortality statistics publications are routinely available from the ONS http://www.statistics.gov.uk/onlineproducts/default.asp#health [accessed 28/11/2007]. Among them are:
- DH1 - Annual review of the Registrar General on deaths in England and Wales
- DH2 - Deaths by cause
- DH3 - Childhood, infant and perinatal mortality
- DH4 - Injury and poisoning
Further information on data about deaths is available at http://www.lho.org.uk/DATAANDMETHODS/Datasources_Description/Deaths_Data.aspx [accessed 28/11/2007].
Public Health Mortality Files contain information about deaths within different health authority boundaries. They also include details about people who died outside the health authority region in which they were normally resident. These files are only available to health authorities and at a cost.
National births and deaths data are made available to regional Public Health Observatories (PHOs). Mortality data have been linked to Hospital Episode Statistics and are in the process of being made available to PHOs. 2005/6 and historic years' extracts are available with date of death added. Discussions with ONS are ongoing to secure a regular quarterly feed of deaths data also including cause and place of death.
Data on stillbirths can be found on the Compendium website http://www.nchod.nhs.uk/ [accessed 28/11/2007].
Public Health Uses of Births and Deaths data
- Health service planning
- Epidemiology
- Monitoring and evaluation
- Audit
- Screening programmes (breast and ovarian cancer, immunisation take up)
- Confidential enquiries and register checking
- Inequalities analysis (postcode enables precise geographical analysis of population and patients).
- Assessing progress against targets (e.g. infant mortality, life expectancy).
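Several of these uses rest on simple rates built from registration counts. As a minimal sketch, the functions below compute an infant mortality rate (deaths under one year per 1,000 live births) and a stillbirth rate (stillbirths per 1,000 total births). The counts and function names are invented for illustration.

```python
def infant_mortality_rate(infant_deaths, live_births):
    """Deaths under one year of age per 1,000 live births."""
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    return 1000 * infant_deaths / live_births


def stillbirth_rate(stillbirths, live_births):
    """Stillbirths per 1,000 total births (live births plus stillbirths)."""
    total_births = live_births + stillbirths
    if total_births <= 0:
        raise ValueError("total births must be positive")
    return 1000 * stillbirths / total_births


# Invented annual counts for one area.
imr = infant_mortality_rate(infant_deaths=25, live_births=5_000)
sbr = stillbirth_rate(stillbirths=20, live_births=4_980)
print(imr, sbr)
```

Note that the two rates use different denominators, so they should not be added together or compared directly.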
Strengths
Both births and deaths data are very complete and accurate for the UK.
Deaths data can provide very important information on health of populations.
Weaknesses
Ethnicity is not collected for either deaths or births (though where these take place in NHS hospitals it may be derived from HES records).
Defining socio-economic status from the occupation recorded on birth registration records is very difficult, and possibly even more tenuous for single mothers.
Deaths data do not give a reliable picture of the burden of chronic illness: more people are living longer with illness than in the past, and the quality of recording of cause of death varies considerably.
2.3 Self reported health
Self reported health is by its nature subjective and different people will have different thresholds for what is considered 'serious' or 'very painful'. Definitions can change over time as well. Conversely people can forget illnesses that they have had, or choose to withhold information. However, only a proportion of people experiencing ill-health make contact with health services. Self reported ill-health can therefore help to show more of the burden of ill health in the population that is otherwise 'under the line'.
Figure 1: The iceberg concept
Source: Donaldson and Donaldson 'Essential Public Health' 2nd edition.
Sources of information on self-reported health
Census: question on self reported general health and long term limiting illness.
Health Survey for England
Carried out annually since 1991, the Health Survey for England is a series of national surveys of the health and health-related behaviours of people in England, involving about 16,000 adults. It combines a questionnaire with some physical measurements and blood samples. Questions on specific topics, such as cardiovascular disease and accidents, are also asked at certain intervals. However, its sample size is currently not large enough to provide PCT/local authority level data. Interested organisations may sponsor a boost to the Health Survey for England sample. The London Health Observatory did so in 2006 to provide an increased sample of London residents on specific aspects of health related behaviours. Collection of the data began in February 2006 and takes 12 months to complete. Data from the survey is expected in Summer - Autumn of 2007. More details can be found at http://www.lho.org.uk/OurWork/LondonHealthSurvey.aspx [accessed 28/11/2007].
General Household survey
http://www.statistics.gov.uk/ssd/surveys/general_household_survey.asp [accessed 28/11/2007].
The General Household survey is conducted every year and involves approximately 13,000 adults in Great Britain. It provides information on a wide range of topics for government departments. On health issues, questions include acute illness in the last 2 weeks; illness over the last year; presence of chronic illness; consultations with a doctor; visits to hospital; smoking and alcohol consumption. The General Household survey has indicated an increase in self-reported long standing illness.
2.4 Primary care interactions
In the UK, 90% of reported ill health is captured at GP practice level and so primary care data is very important, especially when the iceberg effect is considered (see figure 1). Primary care data could offer a huge amount of information on the health of the population, but has historically been very difficult to access. Most GP practices are now computerised, and in some areas may give public health staff direct access to the data. Changes in the NHS have resulted in improved primary care data collection.
QMAS:
http://www.connectingforhealth.nhs.uk/delivery/programmes/qmas [accessed 28/11/2007].
In operation since 2004, the Quality Management and Analysis System (QMAS) supports a new GP contract under which payment depends partly on quality of care as well as on numbers of patients registered. Almost all practices are submitting data. Its primary function is financial: it awards payment under the Quality and Outcomes Framework (QOF). The database is replicated for other purposes, including public health. The replica is managed by the Prescribing Support Unit, based at the Information Centre for Health and Social Care http://www.ic.nhs.uk/psu/services/QOF [accessed 28/11/2007].
Data are collected on a number of clinical domains, which vary over time. They currently include:
- CHD
- Stroke
- Hypothyroidism
- Diabetes
- Hypertension
- Mental health
- COPD
- Asthma
- Epilepsy
Uses
QOF data enable local prevalence to be estimated for the conditions covered. These estimates may be compared with other prevalence studies such as the Health Survey for England, as exemplified in the LHO report http://www.lho.org.uk/viewResource.aspx?id=10070 [accessed 28/11/2007].
Strengths
The new GP contract gives GPs an incentive to improve the completeness of data. It encourages the establishment of disease registers. There are incentives to identify more registered patients who need to be on disease registers and receive treatment.
Weaknesses
Raw data is not available to PCTs or public health departments. Instead, some pre-analysed data is available. This is limited: there is no age/sex/ethnicity breakdown; comparable analyses may be inappropriate; there is no information on co-morbidity; and there is significant under-recording of some indicators.
General Practice Research Database (GPRD)
http://www.gprd.com/home/ [accessed 28/11/2007]
This is a proprietary product, only accessible to public health departments on a fee-paying basis. It is a longitudinal anonymised database that claims to be the largest source of computerised information on morbidity and prescription activity in GP practices, holding data from 1987 to present. Participating practices agree guidelines on recording clinical data.
Strengths
Quality continually assessed.
Available for research questions.
Standards for recording allow collation.
Weaknesses
Incomplete - only a small proportion of practices across the country (450 or so) take part, and they are self-selecting.
2.5 Secondary care interactions
In the NHS, data on patients' interactions with the secondary (i.e. hospital) care services is recorded on statutorily defined datasets. These are recorded by providers and exchanged with commissioners of care via electronic clearing houses.
The data collected varies according to the type of interaction - outpatient attendance, admitted care, waiting for elective admission, A&E. In each case the data to be collected is set out in the NHS Data Dictionary http://www.connectingforhealth.nhs.uk/systemsandservices/data/datamodeldictionary/index_html [accessed 28/11/2007].
The mechanism of exchange has varied from time to time, the former NHS Wide Clearing Service (NWCS) being replaced by the Secondary User Service (SUS), though at present the data flows remain similar.
Primary Care Trusts receive data on a monthly basis. Mostly this is a measure of activity, but data on admitted patients also includes clinically coded diagnoses and operative procedures, and ethnicity, which enable determination of met need and assist in the analysis of health need and inequalities.
Uses
NWCS/SUS data can help to identify health needs of the local population. It can be used to determine patient flows for treatment and contribute to the analysis of health outcomes.
The most direct use of NWCS data is for monitoring contracts between primary care trusts and hospital providers. Under Payment by Results, hospitals are paid for the activity they undertake. Payment by Results is underpinned by Healthcare Resource Group (HRG) codes. HRGs are groupings of conditions and procedures that are clinically similar and consume similar levels of resource. A national tariff is applied to each HRG code and updated each year. http://www.kingsfund.org.uk/resources/briefings/payment_by.html [accessed 28/11/2007].
NWCS/SUS data is cleaned and collated on a national basis to create HES data.
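The Payment by Results arithmetic described above (a tariff per HRG code, applied to recorded activity) can be sketched as follows. The HRG codes, prices and function name are invented, and the real scheme includes adjustments that this simplified version ignores.

```python
# Hypothetical tariff table: HRG code -> price in pounds. Codes and prices
# are invented; the real national tariff is published and updated each year.
TARIFF = {"HRG_A": 1200.0, "HRG_B": 450.0}


def payment_due(activity, tariff=TARIFF):
    """Sum tariff x units of activity over every HRG code in `activity`."""
    missing = set(activity) - set(tariff)
    if missing:
        # Uncoded or unknown HRGs are flagged rather than silently skipped.
        raise KeyError(f"No tariff for HRG codes: {sorted(missing)}")
    return sum(tariff[hrg] * count for hrg, count in activity.items())


# Invented monthly activity for one provider: HRG code -> units of activity.
monthly_activity = {"HRG_A": 10, "HRG_B": 40}
total = payment_due(monthly_activity)
print(total)
```

Because payment follows coded activity, the quality of clinical coding directly affects the sums involved, which is one reason commissioners scrutinise these data each month.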
Strengths
Timely data; components of NWCS data are made available to PCTs by hospital trusts on a monthly basis.
The system by which it is delivered allows questioning and challenging of the data before a certain date each month.
Weaknesses
Clinical coding may be of variable quality. Recording of ethnicity is sometimes a problem.
Outpatient datasets do not contain diagnostic information.
Exchange of A&E data, although mandatory, often does not happen as the defined dataset is not considered particularly useful, so only a partial picture of A&E activity is available.
Hospital Episode Statistics (HES)
http://www.hesonline.nhs.uk/Ease/ [accessed 28/11/2007]
Description
HES is a nation-wide dataset of all hospital admissions, recorded using computerised Patient Administration Systems. Each record is a subset of the record submitted by the provider to NWCS/SUS, and so is defined in the NHS Data Dictionary. It is generally issued on an annual basis, though provisional data is now issued quarterly. Each record represents a period of time under the responsibility of a specific consultant (a 'finished consultant episode'). A patient could have several episodes within one spell in hospital. There are hundreds of fields, including data on maternities and on augmented care for patients whose care involves the use of Intensive and High Dependency care facilities. Data include:
- NHS Number
- Full postcode
- Date of birth
- Sex
- Ethnicity (more complete with each year)
- Diagnosis fields
- Procedures fields
- Registered GP
- PCT of residence
- Date of admission
- Date of discharge
- Method of admission
- Method of discharge
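Because one hospital stay can generate several episode records, counting records overstates the number of admissions. The sketch below groups episodes into spells, under the simplifying assumption that a spell is identified by patient and date of admission alone. The record structure and field names are invented and are not the HES field names; a real derivation would use more fields.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Episode:
    patient_id: str       # stand-in for a pseudonymised identifier
    admission_date: date  # date of admission for the hospital stay
    consultant: str       # consultant responsible for this episode


def group_into_spells(episodes):
    """Group episodes sharing a patient and admission date into one spell."""
    spells = defaultdict(list)
    for ep in episodes:
        spells[(ep.patient_id, ep.admission_date)].append(ep)
    return dict(spells)


# Invented records: patient P1 passes between two consultants in one stay.
episodes = [
    Episode("P1", date(2006, 4, 1), "Consultant A"),
    Episode("P1", date(2006, 4, 1), "Consultant B"),
    Episode("P2", date(2006, 4, 3), "Consultant A"),
]
spells = group_into_spells(episodes)
episode_count = len(episodes)
spell_count = len(spells)
print(episode_count, spell_count)
```

Whether an analysis should count episodes or spells depends on the question: episodes reflect consultant workload, while spells are closer to the number of admissions.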
Public Health Observatories are considered 'HES safe havens' and get access to sensitive fields on a national basis. Strict protocols are in place for how data can be disseminated, including clear suppression rules to prevent disclosure. Various pre-analysed reports are available free online.
Dr Foster Intelligence [accessed 28/11/2007] has access to NWCS and HES data and provides data via web-based tools on hospital and PCT activity.
Uses
At a national level, health service planning, monitoring activity, and assessing quality of care.
At a local level (e.g. PCT) NWCS/SUS data is more timely, and HES is mostly used to enable national comparisons and benchmarking. Comparisons can be made between areas, for example, by calculating standardised rates.
Monitoring inequalities across geographical areas, including ward level.
Examples of outputs can be found at http://www.lho.org.uk/DATAANDMETHODS/Local_Data/HES_Analyses.aspx [accessed 28/11/2007].
Strengths
Completeness of data increasing.
Standard codes (ICD10 and OPCS4) used for diagnoses and procedures.
For serious morbidity, can give prevalence of the condition across the country.
Recently becoming linked with ONS mortality data.
Weaknesses
HES and NWCS/SUS data only tell us about those who have a disease and then use health care facilities. They do not give a full picture of morbidity. Specifically, they do not tell us about people who have a disease or disability but do not seek care (see the iceberg concept in figure 1).
Some fields are very incomplete. For example, whilst completeness of the ethnicity field has been increasing each year, a significant proportion (10 - 15%) is filled with 'not stated'.
Outpatient data became available via HES in 2007 but comes with limitations. Diagnostic data are not completed. Other limitations can be found here:
http://www.hesonline.org.uk/Ease/servlet/ContentServer?siteID=1937&categoryID=805
Timeliness - there used to be quite a delay between NWCS outputs and HES. The delay is now shorter because provisional data are made available on a quarterly basis, before ministerial sign-off.
Cancer registries
http://www.thames-cancer-reg.org.uk/ [accessed 28/11/2007]
In the United Kingdom, there are 12 registries (geographically defined) and each contributes to the National Cancer Registry overseen by the Office for National Statistics. Cancer registries were set up to collate new cases of cancer and use this information to produce statistics about cancer incidence, prevalence, survival and mortality. In recent years the work of cancer registries has expanded from the monitoring of cancer occurrence to include the analysis of different aspects of cancer prevention, treatment outcomes and care.
Uses
Patient follow up.
Auditing treatment, comparing with other treatment outcomes.
Evaluation of services.
Studies of causation.
Health service planning.
Strengths
Very rich, detailed source of information - patient identifiable information which is longitudinal i.e. updated over time.
Weaknesses
Expensive to run - updating the registry is laborious.
Confidentiality issues
Assessing completeness/under-coverage is not straightforward [Brenner et al]
More recently, more rigorous methods have been developed to assess completeness: http://www.thames-cancer-reg.org.uk/research/pubs/research_activities_2004.pdf [accessed 28/11/2007].
Completeness of ethnicity could be improved: http://www.biomedcentral.com/1471-2458/6/281/abstract [accessed 28/11/2007].
Other examples of registries:
Congenital anomalies
Industrial diseases
Diabetes
Issues to consider when establishing a registry
- Why
- What disease
- Clear case definition
- System for reporting new events
- What is to be stored
- What is to be reported
- Who will handle the analyses
- Who is taking enquiries
- Who is producing reports
- Financial implications
- Confidentiality and ethical issues
- Maintaining quality
Other examples of routine morbidity data
Notifications of communicable diseases
Notifications of foetal anomalies
Abortion statistics
Korner data, or Central Returns, are available at aggregated level only and are the NHS's only source of community data. Significant datasets for public health are:
- KC60 - GUM clinic activity
- KC50 - immunisation data
- KC53, 63 - adult screening, cervical and breast
- Mental health minimum dataset
The Mental Health Needs Index (MINI) provides an estimate of the need for inpatient mental health services for adults (ages 16-59) by ward and borough. It is calculated using a number of population variables likely to indicate need for access to services, such as deprivation, the proportion of economically active adults who are unemployed, and the proportion of adults living in households that are not self-contained. The MINI provides both predicted admission rates and a ratio of need compared to the England average. The MINI was developed by the Centre for Public Mental Health, which has produced an online tool for accessing information at ward / borough level. http://www.dur.ac.uk/mental.health/index.php?l1=1&l2=27&s=27 [accessed 11/1/2008]
Values and limitations of routine data
Value of routine data
- Readily available
- Limited costs
- Up to date
- Useful for identifying hypotheses
- Useful for initial assessment
- Provides baseline data on expected levels of health/disease
Limitations of routine data
- Lack of completeness
- Potential for bias
- Limited details of determinants such as income and ethnicity
- Often poorly presented and analysed
- Occasionally subject to political influences and manipulation
The most valuable feature of routine data is their availability at little cost to the researcher. They may be especially helpful in establishing baseline characteristics of the health status of the community, in generating hypotheses from variation by sex, age, cohort or geography, and in identifying areas requiring further research. However, no single data set can provide the whole picture of a population's health and its needs.
Ad hoc data
Local health surveys may be carried out by PCTs when national surveys do not give sufficient information at local level.
- Lifestyle data
3.1 Education data
Information on educational attainment data, where it can be sourced from and how it is of relevance to inequalities in health can be found in section 3 of http://www.lho.org.uk/HEALTH_INEQUALITIES/ [accessed 28/11/2007].
3.2 Employment data
Information on employment data, where it can be sourced from and how it is of relevance to inequalities in health can be found in section 1 of http://www.lho.org.uk/HEALTH_INEQUALITIES/ [accessed 28/11/2007].
3.3 Housing data
Information on housing and homelessness, where it can be sourced from and its relevance to inequalities in health can be found in section 2 of http://www.lho.org.uk/HEALTH_INEQUALITIES/ [accessed 28/11/2007].
3.4 Environment data
Information on air quality can be found at http://www.lho.org.uk/HIL/ [accessed 28/11/2007]. Statistics on air quality can be found in section 5 of http://www.lho.org.uk/HEALTH_INEQUALITIES/ [accessed 28/11/2007].
References
© M Goodyear & N Malhotra 2007

