Maintaining a house of code – lessons learned at the end of a long project
Communication with non-developers is an essential part of a modern developer’s toolkit
As part of our agile ways of working, we have retrospectives at the end of each sprint. At the end of a long-running client project, it’s a good idea to have a project team retrospective as well. This time should be dedicated to reflecting on the project as a whole and how we worked as individuals and as a team, with each other and with the client teams.
Metropolitan Thames Valley Housing (MTVH) have been a dxw client for years. We started working with them on the development of MyTVH, an online customer service tool, when they were Thames Valley Housing. We’ve taken on various support and development work since then. Following the merger of Thames Valley Housing with Metropolitan Housing, MyTVH has been scaled up into MTVH Online, and the focus switched to capability building for the MTVH team to develop MTVH Online in-house.
Our project team retrospective focused on the last chunk of active development work, before the focus switched to capability building. This work included:
- implementing customer authentication for Stripe payments
- scaling up MyTVH to Metropolitan residents
- rebranding MyTVH to MTVH Online
- onboarding 2 junior developers, MTVH’s first Rails developers
The main themes that emerged from the retrospective were the need to recognise when the demand for capability building is increasing, to make user research a pillar of a multidisciplinary team, to create a shared understanding of the value of maintaining a service, and to articulate pain points clearly. In this post, I’m going to focus on the last 2 points.
Create a shared understanding of the value of maintaining a service
It’s easy to understand the need to implement new features. However, the benefit of spending developer time on maintaining existing features often needs to be explained, and it’s in everyone’s interest to explain it in terms that are familiar to the client.
It can be hard to explain how existing code can decay or break. On a housing estate, everyone will likely agree that maintaining existing houses is part of the service provided to the residents, and necessary as physical features such as windows, plumbing, or lifts degrade or break. But what happens to code?
Sometimes, the need to change working code arises from external requirements. This was the case with the European regulation PSD2 (Payment Services Directive), which requires card providers to implement SCA (Strong Customer Authentication).
There was nothing wrong with the payments workflow on MTVH Online. However, when banks and other card providers changed their payment processes to conform to the new regulations, the previously working code would have been broken by their changes.
To use the housing analogy, this was the equivalent of stronger building regulations requiring housing providers to adapt and enhance existing buildings.
The impact of not doing this kind of work is usually obvious and tangible. In this case, the impact of not making the payment code compatible with the enhanced authorisation and authentication workflow would be that some residents wouldn’t be able to complete online payments, and most residents wouldn’t be able to schedule payments to be processed at a future date.
Working code can also become “creaky” when the scale of the user base changes. Code that was working fine for a thousand users might not perform as well for eight thousand users.
The impact of not doing the work to optimise the code and the infrastructure might not be immediately evident. As the number of users grows, response times might creep up slowly. Errors caused by race conditions (two or more processes trying to modify the same resource at the same time) that were extremely rare might become more common, and require an entirely different approach to how we record the data input by residents.
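To make the race condition concrete, here is a minimal sketch in Python. It is an illustration only, not code from MTVH Online: the `Account` class, its methods and the amounts are all hypothetical. The first half replays an unlucky interleaving of two requests step by step, so the lost update happens every time; the second half shows one possible fix, holding a lock across the whole read-modify-write.

```python
import threading


class Account:
    """Hypothetical stand-in for a resident's account record."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def read(self):
        return self.balance

    def write(self, value):
        self.balance = value

    def pay_safely(self, amount):
        # Hold the lock for the whole read-modify-write, so no other
        # thread can read a stale balance in between.
        with self._lock:
            self.balance = self.balance - amount


# Unsafe: two requests interleave. The steps are replayed by hand so the
# outcome is deterministic rather than dependent on thread timing.
account = Account(100)
seen_by_a = account.read()      # request A reads 100
seen_by_b = account.read()      # request B reads 100 before A has written
account.write(seen_by_a - 30)   # A records 70
account.write(seen_by_b - 50)   # B overwrites with 50: A's payment is lost
lost_update_balance = account.balance

# Safe: the same two payments, each applied under the lock.
safe_account = Account(100)
threads = [
    threading.Thread(target=safe_account.pay_safely, args=(amount,))
    for amount in (30, 50)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
safe_balance = safe_account.balance

print(lost_update_balance, safe_balance)
```

With a handful of users the unlucky interleaving almost never happens; with thousands it becomes a matter of time. In a real web application the equivalent protection would more likely live in the database (a transaction or a row lock) than in an in-process lock like this one.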
As developers, we have to notice these trends and discuss their implications with the wider team, in a language that’s common to everyone.
Create a common language for everyone to understand
While working on the necessary changes to the payment code, our work was slowed down by code that was hard to understand and modify. Some sections of the code hadn’t been touched in years, which meant that the current team didn’t have the context for why some things were done the way they were. Other sections bore the marks of a previous change from one payment processor to another, with some code that was potentially redundant.
This is normal for a long-running project like this one. Many teams of developers have worked on this codebase over the years, and every developer has their own coding style. However, it’s also normal for a long-running codebase to need some maintenance work once in a while, so that the code remains maintainable by current and future teams.
What does “good code” look like?
This maintenance work can mean rewriting code that “basically works” so that it produces the same results but is clearer, more understandable, better structured, and more resilient to future changes. This is what software engineering jargon calls “refactoring”.
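As a small illustration of what this can look like, here is a hedged sketch in Python. Neither function comes from the MTVH codebase: the names, the `Charge` type and the leftover payment-processor check are invented for the example. Both versions return the same total, but the second says what it does.

```python
from typing import NamedTuple


def total_due_before(items, p):
    # "Basically works": cryptic names, index juggling, and a leftover
    # check on the payment processor whose two branches do the same thing.
    t = 0
    for i in range(len(items)):
        if items[i][1] == True:
            if p == "old" or p == "new":
                t = t + items[i][0]
            else:
                t = t + items[i][0]
    return t


class Charge(NamedTuple):
    """Hypothetical charge on a resident's account."""
    amount_pence: int
    outstanding: bool


def total_due_after(charges):
    """Sum the amounts of all outstanding charges, in pence."""
    return sum(charge.amount_pence for charge in charges if charge.outstanding)


charges = [Charge(5000, True), Charge(1250, False), Charge(300, True)]
before = total_due_before(charges, "new")
after = total_due_after(charges)
print(before, after)
```

The check that `before` and `after` are equal is the important part: a refactoring changes how the code reads, not what it returns, and automated tests are what let a team make that claim with confidence.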
There’s no one way to write “good code”: what “well-written” code looks like changes with the requirements of the application and with the practices and conventions of the community that develops the frameworks we work with. If we think of code as communication, we don’t want our codebase to become the equivalent of an ancient manuscript that only a few can decipher. Or, to reuse the housing analogy, a wooden hut with candle lighting.
Communication with non-developers is an essential part of a modern developer’s toolkit, and it falls on us to find a common language to convey our knowledge. This means that clients can decide whether to prioritise maintenance with the full picture of the potential consequences of their choices.
With the benefit of hindsight, we think framing this maintenance work as “refactoring” or “tech debt” – both terms very familiar to software developers but not necessarily to people outside our field – didn’t help us convey its importance for the long-term health of the service. It also wasn’t clear what impact not doing this work would have on future teams.
Articulate pain points clearly
“This code is messy and hard to work with” doesn’t necessarily explain why we think it’s valuable to spend a few hours making it less messy, and what “easier to work with” means. We want to quantify the impact that not doing this work now will have in the future.
Most of the time we think of impact in terms of time, and let the client translate that into financial cost. We can estimate the additional development time caused by technical debt, and show the value of spending a certain number of hours during each sprint now, versus the increased number of hours that each future team will have to spend every time they have to work with this code.
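The comparison can be reduced to simple arithmetic. The sketch below is a hypothetical illustration in Python: the function names and all the figures are assumptions chosen for the example, not estimates from this project.

```python
import math


def cumulative_extra_hours(extra_hours_per_change, changes):
    """Total extra time spent working around the messy code."""
    return extra_hours_per_change * changes


def break_even_changes(refactor_hours, extra_hours_per_change):
    """Smallest number of future changes after which the one-off
    refactoring effort has paid for itself."""
    return math.ceil(refactor_hours / extra_hours_per_change)


# Assumed figures: tidying the code takes 6 hours once, while leaving it
# as-is adds about 2 hours to every future change that touches it.
refactor_hours = 6
extra_hours_per_change = 2

break_even = break_even_changes(refactor_hours, extra_hours_per_change)
cost_of_waiting = cumulative_extra_hours(extra_hours_per_change, 10)
print(break_even, cost_of_waiting)
```

Under these assumed figures the refactoring pays for itself by the third change, and ten future changes without it would cost 20 extra hours. The numbers will always be rough estimates, but stating them gives the client something concrete to weigh against new features.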
However, there could be other things that are just as important, depending on the specifics of the project. For example, the impact could be:
- a missed opportunity to reduce the costs of running and hosting a service
- a risk of an increase in the number of customer support requests over time and the added pressure on the customer satisfaction team
- the erosion of trust in the service and the reputational damage that follows
Understanding the needs of the client and adapting the case we make has to be part of our approach to work.
The decision to prioritise maintenance work along with the development of new features is ultimately the client’s. What is within our control is to make the most compelling case we can, so that the client’s decision is well informed.