Accelerating Terraform adoption at the Department for Education

For the last 18 months or so, we’ve been working directly with the Department for Education (DfE) to build up their new Azure infrastructure for the services in the Regional Services Division (RSD).

Identifying areas of improvement

When we started out on this project, DfE was juggling infrastructure deployed on both the GDS Government Platform as a Service (GPaaS) and an ageing Azure tenancy that had been left undocumented and poorly configured. At the time, each service worked in its own silo, with its own deployment strategy and infrastructure configuration.

Looking at the existing infrastructure stacks, we started noticing similarities in architecture patterns. Most of the apps were written in .NET and launched either as Web App Services in Azure, or as containers in GPaaS. Some of them used Redis, and others depended on SQL Server.

If all the services were using the same patterns for architecture topology, we could save a lot of time by re-using the same Infrastructure-as-Code. It would also simplify the approach, meaning less documentation would be needed, and training could be provided to DfE engineers. 

Introducing Terraform

We started by writing Terraform that covered the basic resources that the services needed. Because some of the services were already running in containers, we could use the newly available Azure Container Apps product from Microsoft, a serverless, scalable platform that would let us deploy the services into Azure quickly and efficiently. Containerising the other .NET apps was fairly straightforward, and we were quickly able to come up with a template Dockerfile that the other services could use.

At the time, Terraform didn’t have much traction in DfE. Only a few engineers were using it; others were using Bicep, and most were not using any sort of Infrastructure-as-Code. At dxw, our Technical Operations team are experts with Terraform, so we deemed it the most suitable option, not only for us to develop from but also to promote or seed further adoption within DfE.

After some time, our initial Terraform could deploy a Container Registry, the Container Apps and, optionally, a SQL Server and Redis.
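As an illustration of how optional resources like these are commonly expressed in Terraform, here is a minimal sketch using a boolean variable with `count`. It is not taken from the DfE module: the variable name `enable_redis_cache`, the resource names, the location and the SKU values are all assumptions chosen for the example.

```hcl
# Illustrative sketch only. Names, location and SKUs are assumptions,
# not the configuration used by DfE or dxw.
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
  }
}

provider "azurerm" {
  features {}
}

# Hypothetical toggle: services that need Redis set this to true.
variable "enable_redis_cache" {
  description = "Whether to create a Redis cache for the service"
  type        = bool
  default     = false
}

resource "azurerm_resource_group" "service" {
  name     = "example-service-rg"
  location = "UK South"
}

# Always created: the registry that holds the service's container images.
# Registry names must be globally unique and alphanumeric.
resource "azurerm_container_registry" "service" {
  name                = "exampleserviceacr"
  resource_group_name = azurerm_resource_group.service.name
  location            = azurerm_resource_group.service.location
  sku                 = "Standard"
}

# Created only when the toggle is true; a count of 0 creates nothing.
resource "azurerm_redis_cache" "service" {
  count = var.enable_redis_cache ? 1 : 0

  name                = "example-service-redis"
  resource_group_name = azurerm_resource_group.service.name
  location            = azurerm_resource_group.service.location
  capacity            = 1
  family              = "C"
  sku_name            = "Standard"
}
```

With this pattern, a service that does not need Redis leaves the variable at its default, and the same code serves both cases. A SQL Server could be made optional in the same way.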

At this point we were happy that we had a strong foundation that we could re-use across all the other services within RSD. So we decided to convert our work into a reusable module and published it on GitHub for other engineers to use. 
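To show what consuming a published module can look like from a service's point of view, here is a hedged sketch of a module call. The source address, the version tag and every input name below are hypothetical placeholders for illustration; the real module's location and inputs should be taken from its own documentation.

```hcl
# Illustrative sketch only. The source URL, tag and input names are
# hypothetical placeholders, not the real module's interface.
module "service_hosting" {
  # Pinning to a tag keeps deployments reproducible until the ref is bumped.
  source = "github.com/example-org/terraform-azurerm-service-hosting?ref=v1.0.0"

  project_name = "example-service"
  environment  = "dev"
  location     = "UK South"

  # Optional components are switched on per service.
  enable_sql_server  = true
  enable_redis_cache = false
}
```

Because each service pins a version, an improvement made once in the module can be rolled out by updating the `ref` in each service's configuration and running `terraform init -upgrade` followed by a plan and apply.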

Iterations and improvements

Inspired by the work we were contributing to DfE, we reflected on the Terraform we had been using across the internal hosting platform at dxw. This prompted us to revisit some of our older code, rewriting and refactoring it into a series of modules to improve the interoperability of our Infrastructure-as-Code. We’ve also opted to publish a draft of our own Terraform Playbook and Terraform Module template, which we continue to develop.

We bundled monitoring, alerting, diagnostic and application logging to support more comprehensive visibility across the infrastructure out-of-the-box. We made sure to follow Microsoft recommended best practices where available, and relied on our own expert experience when we needed to. 

Aligning to a single set of standard deliverables meant that we could:

This standardisation of infrastructure brought a lot of value to DfE and was a strong start in normalising the approach to Azure across the department. Having the Terraform module meant that we could make iterative changes to the infrastructure configuration and propagate them quickly across all services.

A security-first approach

We had a strong focus on network security within Azure when building the module, meaning that any implementer would not need to take any further steps to make their infrastructure secure. Web Application Firewall (WAF), Azure Front Door CDN, Network Security Groups, Network Firewalls and Defender for Cloud were all great additions to include in the module. 

Defender for Cloud is a cloud security posture management (CSPM) tool that was already established within the Cloud Platform team at DfE, so having the ability to enrol from the module was a sensible choice.

Getting noticed

Other engineers within DfE soon became aware of the published Terraform module. Adoption of Terraform across other programmes grew, and dxw were proud to be the first to have established a ‘best-practice’ approach for others to follow. We set up an unofficial Terraform support channel in the DfE Slack organisation to help teams break out of their silos and to promote communication between engineering teams.

Over the next few months, we continued to develop the Terraform module. We added new features, tweaked configurations and tightened up security. We listened to the DfE community and worked with engineers from a number of other teams to add extra infrastructure to the module, such as PostgreSQL Server and custom sidecar containers (for example, ClamAV).

In recent months, the module has achieved a high level of maturity. We’ve received lots of positive feedback from other programmes about how much time and effort they have saved, with teams able to pick up the Terraform module and cut the time it takes to go live in half. We continue to be led by the existing Cloud Platform, Networking and Infrastructure Operations teams within DfE to ensure that our infrastructure patterns align with DfE’s wider governance.

Where are things now?

A year later, we’re proud to have supported DfE through this adoption phase and to have published a number of Terraform modules that have been adopted across DfE.

These modules cover a number of other use cases: 

And we’ve launched a Web Application Firewall for Application Gateway.

If you’re curious about what other Terraform modules we have been working on for our own hosting platform, you can check out our GitHub.