Democratising Cloud development standards with Architecture Decision Records (ADRs)
Introduction
For the better part of the last 18 months, I’ve been working as an platform engineer on a strategic cloud platform for a large Australian financial institution.
The goal of this initiative was to develop from the ground up a foundational AWS platform suitable for all of the groups cloud workloads from the most critical and regulated internet facing banking applications to backend HPC clusters.
This platform is built using a wide range of technologies that includes:
- Amazon Web Services
- HashiCorp Terraform & Packer
- Microsoft Powershell
- A variety of programming, configuration management and scripting languages such as
- Bash
- Python
- GoLang
- NodeJS
- Puppet
The initial project initially commenced with a core platform engineering team of approximately 10 co-located individuals to build it’s MVP foundations, however it rapidly grew as new features needed to be added and more customers were onboarded onto the platform to use it’s capabilities and fix it’s defects.
Over the coming months, additional AWS platform engineers were introduced to the team to implement things such as
- Platform wide centralised logging
- Windows and Linux Standard Operating Environments (SOEs)
- Automated Cyber Defence and forensic tooling
- Puppet Masters for long lived instance management
- CICD pipelines for common automation actions
- AWS Account vending
- An extensive library of Terraform modules that would be used by customers to deploy their applications within the platform
- Network deployment and integration
Before we knew it, the 10 person team had reached almost 100, working on a codebase that was hundreds of thousands of lines in size, and as is common in large organisations these individuals were dispersed all over the world, we had representation from Sydney, Melbourne, Auckland, Wellington, Bangalore and more
Cracks begin to appear
It was around the time that the team had hit the 25 person mark that we observed a drop in developer output and the technical teams started to fall short of their sprint objectives. To make things more interesting this also coincided with the global COVID pandemic which forced all team members to their homes, and the programme of work needing to be delivered remotely. What was once a medium sized team all in a common office was now a fully remote software project.
We spoke to the team members, and dug up some metrics and found that the core of the problems seemed to lie with the way code was being developed and introduced into the environment. Specifically;
- Code quality and development practices were inconsistent and reviews were becoming difficult
- Code merges and sequencing of releases were challenging
- Cloud resources and naming standards were not clearly defined or communicated
- Standards were spread out across a variety of knowledge bases and were often communicated via tribal knowledge
- It became difficult to train and onboard new developers on the platform, or uplift the capabilities of other teams who worked on the platform
- Language selection for things like Lambdas were not clear, resulting in sprawl
- Tagging standards were not well understood
In addition to the technical challenges, Individuals felt that they did not have a voice or forum to influence the way that the platform’s standards were defined and evolved, this lead to;
- Reduced developer output and increased rework on code prior to merging
- Team relationship strain, we were regularly encountering developer frustration at a team and individual level
Solving this problem with ADR’s
We needed a clear way to define and collaborate on our AWS development standards. It was at this stage that one of my customers brought to my attention the concept of “Architectural Decision Records”, or ADRs as a lightweight way for us to consolidate our standards.
What are ADR’s ?
Familiar to many who are experienced developers, but maybe not to those who have a more traditional infrastructure background such as myself, ADR’s are described as
- An Architectural Decision (AD) is a software design choice that addresses a functional or non-functional requirement that is architecturally significant.
- An Architectural Decision Record (ADR) captures a single AD, such as often done when writing personal notes or meeting minutes; the collection of ADRs created and maintained in a project constitute its decision log.
- Source: adr.github.io
How to they work ?
Lightweight in nature, each ADR consists of a small markdown document stored in a dedicated git repository that has a predefined set of headings, generally following the below format.
| Heading | Function |
|---|---|
| Title | The ID number and title of the ADR being introduced |
| Status | The current status of the ADR. Accepted, Rejected, Superseded being the most common |
| Context | The technical detail about the ADR, what it’s objectives are and the problems it’s looking to address, as well as any alternatives that were considered |
| Decision | The decision that is being made by ratifying the standards proposed in the ADR alongside any comments about discarding any other alternatives |
| Consequences | The conseqences of implementing the ADR standards in the environment (both positive and negative) |
Note: ADR’s are focused more on implementation standards, rather than being end to end architecture documents for a particular solution.
As ADRs are stored in git, your version control software such as Github can then be used by your development teams to discuss and revise the proposed standard using the same processes they are familiar with, and when ADR changes are merged to the master branch of the repository they are then considered active.
How did we implement these ?
Within our environment, we created the ADR git repository and opted to use the simple ADR Tools to automate the management of the ADR files in conjunction with Github to keep the process consistent.
We then worked with the development team to define a backlog of problematic and missing standards that were causing developer friction of which we started working through to assist the development team, and held a series of team workshops to bring attention and alignment to them as they were phased in.
Out initial ADR’s for the AWS platform included:
| ADR Name | Objective |
|---|---|
| Terraform data model | Define the approach for keeping business logic out of terraform modules to assist in ongoing maintenance and unit testing |
| Code Development Standards | Define git commit, branching and pull request standards and setting the expectation of commit messages that contain clear and concise explainations and references to tracking systems such as JIRA tickets |
| AWS Resource naming standards | Define the approach to creating AWS resources in a consistent way both at the platform and end user module level |
| Semantic Versioning for Terraform Modules | Formally communicate that Terraform modules are aligning to the semver versioning framework |
How did this help ?
As we phased in the ADR process and introduced more standards, we found that developer friction was greatly reduced and and through the clear definition of standards we were able to automate their validation and enforcement in CICD and external governance / compliance systems more effectively increasing developer output and reducing friction.
Reflecting on the project, I wish that I was more aware of the ADR concept and we had implemented them earlier and I’d advocate looking to start any new project of this nature by defining some of these early cloud and development standards even before development commences to pre-empt these occurring and being able to scale your team more effectively earlier.
I encourage you to check out some other ADR examples and reference documentation in the below references which can help you see how other teams implement these for their projects.