For the better part of the last 18 months, I’ve been working as an platform engineer on a strategic cloud platform for a large Australian financial institution.
The goal of this initiative was to develop from the ground up a foundational AWS platform suitable for all of the groups cloud workloads from the most critical and regulated internet facing banking applications to backend HPC clusters.
This platform is built using a wide range of technologies that includes:
The initial project initially commenced with a core platform engineering team of approximately 10 co-located individuals to build it’s MVP foundations, however it rapidly grew as new features needed to be added and more customers were onboarded onto the platform to use it’s capabilities and fix it’s defects.
Over the coming months, additional AWS platform engineers were introduced to the team to implement things such as
Before we knew it, the 10 person team had reached almost 100, working on a codebase that was hundreds of thousands of lines in size, and as is common in large organisations these individuals were dispersed all over the world, we had representation from Sydney, Melbourne, Auckland, Wellington, Bangalore and more
It was around the time that the team had hit the 25 person mark that we observed a drop in developer output and the technical teams started to fall short of their sprint objectives. To make things more interesting this also coincided with the global COVID pandemic which forced all team members to their homes, and the programme of work needing to be delivered remotely. What was once a medium sized team all in a common office was now a fully remote software project.
We spoke to the team members, and dug up some metrics and found that the core of the problems seemed to lie with the way code was being developed and introduced into the environment. Specifically;
In addition to the technical challenges, Individuals felt that they did not have a voice or forum to influence the way that the platform’s standards were defined and evolved, this lead to;
We needed a clear way to define and collaborate on our AWS development standards. It was at this stage that one of my customers brought to my attention the concept of “Architectural Decision Records”, or ADRs as a lightweight way for us to consolidate our standards.
Familiar to many who are experienced developers, but maybe not to those who have a more traditional infrastructure background such as myself, ADR’s are described as
Lightweight in nature, each ADR consists of a small markdown document stored in a dedicated git repository that has a predefined set of headings, generally following the below format.
|Title||The ID number and title of the ADR being introduced|
|Status||The current status of the ADR. Accepted, Rejected, Superseded being the most common|
|Context||The technical detail about the ADR, what it’s objectives are and the problems it’s looking to address, as well as any alternatives that were considered|
|Decision||The decision that is being made by ratifying the standards proposed in the ADR alongside any comments about discarding any other alternatives|
|Consequences||The conseqences of implementing the ADR standards in the environment (both positive and negative)|
Note: ADR’s are focused more on implementation standards, rather than being end to end architecture documents for a particular solution.
As ADRs are stored in git, your version control software such as Github can then be used by your development teams to discuss and revise the proposed standard using the same processes they are familiar with, and when ADR changes are merged to the master branch of the repository they are then considered active.
Within our environment, we created the ADR git repository and opted to use the simple ADR Tools to automate the management of the ADR files in conjunction with Github to keep the process consistent.
We then worked with the development team to define a backlog of problematic and missing standards that were causing developer friction of which we started working through to assist the development team, and held a series of team workshops to bring attention and alignment to them as they were phased in.
Out initial ADR’s for the AWS platform included:
|Terraform data model||Define the approach for keeping business logic out of terraform modules to assist in ongoing maintenance and unit testing|
|Code Development Standards||Define git commit, branching and pull request standards and setting the expectation of commit messages that contain clear and concise explainations and references to tracking systems such as JIRA tickets|
|AWS Resource naming standards||Define the approach to creating AWS resources in a consistent way both at the platform and end user module level|
|Semantic Versioning for Terraform Modules||Formally communicate that Terraform modules are aligning to the semver versioning framework|
As we phased in the ADR process and introduced more standards, we found that developer friction was greatly reduced and and through the clear definition of standards we were able to automate their validation and enforcement in CICD and external governance / compliance systems more effectively increasing developer output and reducing friction.
Reflecting on the project, I wish that I was more aware of the ADR concept and we had implemented them earlier and I’d advocate looking to start any new project of this nature by defining some of these early cloud and development standards even before development commences to pre-empt these occurring and being able to scale your team more effectively earlier.
I encourage you to check out some other ADR examples and reference documentation in the below references which can help you see how other teams implement these for their projects.