Working with the NHS to implement an efficient and flexible approach to data interoperability

Most large organisations grapple with how to store, share and reuse data efficiently and effectively. The challenge is striking a balance: governance that’s strict enough to be manageable, maintainable and effective, but not so burdensome that people seek workarounds.
The NHS faces this challenge at an enormous scale. With hundreds of systems generating millions of data points daily, and constant pressure to improve how data informs decision-making, building a better data landscape is a huge undertaking.
dxw has been working in the breast screening programme for the past 15 months, contributing to the future data landscape alongside many other teams. We’re at the forefront, a responsibility that calls for thoughtful, high-quality data practices.
We’re implementing this through a combination of consistency (aligning with established NHS schemas and patterns) and technical best practice in how data is constructed, captured and shared. Specifically, we’re:
- complying with the NHS’s 5-stage data processing pattern
- aligning with the schema for data metrics requiring organisational logic
- using shared functions and reference data where they exist and contributing to them when they don't
- following NHS-agreed linting plugins
- implementing unit tests for 100% of pipeline functions
- following best practice on data expectations
Compliance with the NHS’s 5-stage data processing pattern
To improve consistency and help people working with data onboard and move between projects, there is a drive for all teams to adopt the same pattern for structuring data processing pipelines.
Akin to the layered approach used in dbt, the NHS pattern is easy for engineers to pick up and adapt to. It comprises five stages: ingestion, cleaning, calculation, context preparation and final state.
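To make this concrete, here's a simplified sketch of how a pipeline following the pattern might be structured. The function and column names are illustrative only, not the NHS's own code:

```python
import pandas as pd

# Illustrative sketch of the 5-stage pattern; names and columns are hypothetical
def ingest(source_path: str) -> pd.DataFrame:
    """Ingestion: read the raw data exactly as the source system provides it."""
    return pd.read_csv(source_path)

def clean(raw: pd.DataFrame) -> pd.DataFrame:
    """Cleaning: remove duplicates and records we can't use."""
    return raw.drop_duplicates().dropna(subset=["nhs_number"])

def calculate(cleaned: pd.DataFrame) -> pd.DataFrame:
    """Calculation: derive the measures the programme needs."""
    cleaned["screened_in_last_36_months"] = cleaned["months_since_test"] <= 36
    return cleaned

def prepare_context(calculated: pd.DataFrame) -> pd.DataFrame:
    """Context preparation: apply the agreed naming and add reporting context."""
    return calculated.rename(columns={"org_code": "organisation_code"})

def finalise(prepared: pd.DataFrame, output_path: str) -> None:
    """Final state: publish the dataset in its agreed shape."""
    prepared.to_csv(output_path, index=False)

def run_pipeline(source_path: str, output_path: str) -> None:
    finalise(prepare_context(calculate(clean(ingest(source_path)))), output_path)
```

Keeping the stages separate like this makes it easy to see where any given transformation belongs, and to test each step in isolation.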
Aligning with the schema for data metrics requiring organisational logic
A big challenge in the NHS is making sure that key metrics are calculated consistently, in a way that is clear to everyone using them. This requires agreed calculation logic, but also enough metadata to understand and interpret the metric.
For example, there is a lot of data published on Breast Screening Coverage. From the definition alone, “The proportion of women eligible for screening who have had a test with a recorded result at least once in the previous 36 months”, you may not realise that this excludes women whose recall has ceased for clinical reasons (for example, due to a previous bilateral mastectomy).
Aligning with the NHS’s schema ensures we provide enough metadata for people to understand the complexities of the data we’re working with. It allows others in the NHS to reuse our calculations with confidence, rather than having to re-derive the logic themselves.
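We can’t reproduce the full schema here, but a simplified sketch of the kind of metadata recorded against a metric (the field names below are illustrative, not the actual schema) looks something like this:

```python
from dataclasses import dataclass, field

# Simplified illustration: these field names are not the NHS's actual schema
@dataclass
class MetricDefinition:
    name: str
    definition: str
    numerator: str
    denominator: str
    exclusions: list[str] = field(default_factory=list)
    owner: str = ""

breast_screening_coverage = MetricDefinition(
    name="Breast Screening Coverage",
    definition=(
        "The proportion of women eligible for screening who have had a test "
        "with a recorded result at least once in the previous 36 months"
    ),
    numerator="Eligible women with a recorded screening result in the last 36 months",
    denominator="All women eligible for screening",
    exclusions=["Recall ceased for clinical reasons, e.g. previous bilateral mastectomy"],
    owner="Breast screening programme",
)
```

Capturing the exclusions alongside the definition means anyone reusing the metric can see exactly what is, and isn’t, included.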
Using shared functions and reference data where they exist and contributing to them when they don’t
In the NHS’s context, reference data refers to data that is likely to be reused by teams across all parts of the organisation: for example, a dataset containing the locations of every GP practice in the country. Instead of teams setting up and maintaining these datasets individually, a centrally managed GP reference dataset saves time and can be kept up to date under dedicated ownership.
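As a rough sketch (the package and function names here are hypothetical), the difference is between every team maintaining its own copy of GP data and everyone calling a single shared loader that the owning team keeps up to date:

```python
import pandas as pd

# Hypothetical shared helper: in practice this would live in a centrally
# owned package rather than being copied into each team's pipeline
def load_gp_reference_data() -> pd.DataFrame:
    """Return the centrally maintained GP location reference dataset."""
    return pd.read_parquet("reference_data/gp_locations.parquet")

def add_gp_location(screening_events: pd.DataFrame) -> pd.DataFrame:
    """Enrich screening events with GP locations from the shared reference data."""
    gp_locations = load_gp_reference_data()
    return screening_events.merge(gp_locations, on="gp_practice_code", how="left")
```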
Following NHS-agreed linting plugins
Making code readable and consistent across projects is essential for enabling people to move quickly between teams. While individual teams retain autonomy over their pipeline implementation, adopting agreed linting and styling tools helps ensure better readability and maintainability. This consistency reduces friction when knowledge transfers between projects and makes it easier for new team members to understand existing codebases.
Implementing unit tests for 100% of pipeline functions
Robust data processing is fundamental to the NHS and a core principle when developing data pipelines. All functions responsible for processing data in our pipelines are covered by unit tests. These tests serve as a critical safety net, catching errors before they propagate through downstream systems and affect the quality of data used in decision-making. By testing comprehensively, we ensure data integrity at every stage of the pipeline and maintain confidence in the outputs that inform clinical and operational decisions.
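As a simple illustration, a unit test for the kind of cleaning function sketched earlier might look like this with pytest (the module and column names are hypothetical):

```python
import pandas as pd
from pipeline.cleaning import clean  # hypothetical module within our pipeline

def test_clean_removes_duplicate_records():
    raw = pd.DataFrame({
        "nhs_number": ["111", "111", "222"],
        "months_since_test": [12, 12, 40],
    })
    cleaned = clean(raw)
    # Duplicate rows should be removed, leaving one record per event
    assert len(cleaned) == 2

def test_clean_drops_records_without_an_nhs_number():
    raw = pd.DataFrame({
        "nhs_number": ["111", None],
        "months_since_test": [12, 24],
    })
    cleaned = clean(raw)
    assert cleaned["nhs_number"].notna().all()
```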
Following best practice on data expectations
Data expectations are built into the pipeline, acting as contracts with the systems we receive data from and pass data to. This ensures we’re receiving the data we expect and passing on the data we expect.
When data arrives outside expected parameters, the pipeline alerts us to investigate, preventing poor quality data from being used downstream. This approach provides clarity for both current and future team members about what the pipeline requires and produces, supporting maintainability and confidence in our data processes.
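Dedicated tools exist for this (Great Expectations is a common choice in Python), but the idea can be sketched with a plain function. The column names and allowed values below are illustrative, not our actual contract:

```python
import pandas as pd

def check_expectations(incoming: pd.DataFrame) -> list[str]:
    """Return a list of failed expectations; an empty list means the data is as expected."""
    required_columns = {"nhs_number", "screening_date", "result"}
    missing = required_columns - set(incoming.columns)
    if missing:
        # Don't attempt further checks if whole columns are absent
        return [f"missing columns: {sorted(missing)}"]
    failures = []
    if incoming["nhs_number"].isna().any():
        failures.append("nhs_number contains null values")
    if not incoming["result"].isin(["normal", "abnormal", "inadequate"]).all():
        failures.append("result contains unexpected values")
    return failures

# On each run, halt and alert if the incoming data breaks the contract
incoming_data = pd.DataFrame({
    "nhs_number": ["111", "222"],
    "screening_date": ["2024-01-10", "2024-02-05"],
    "result": ["normal", "abnormal"],
})
failures = check_expectations(incoming_data)
if failures:
    raise ValueError(f"Data arrived outside expected parameters: {failures}")
```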
Our focus on the long-term sustainability of data in the NHS
This data engineering work is part of dxw’s broader commitment to the NHS. By embedding these practices into our projects, we’re contributing to a cultural shift in how the organisation approaches data. Through cross-disciplinary teams focused on impact and quality outcomes, we’re helping to build a data landscape where quality, consistency and trust are foundations, not afterthoughts. This supports better, more confident decision-making across the organisation.