Platform Challenge 1: A Small Team Supporting a Big Platform
Over the last six years I have had the pleasure of working on a few amazing platform teams. In each case we had an outsized impact on the productivity and happiness of a much larger engineering team, and in each case we hovered around 5% to (at most) 10% of the total engineering headcount. From what I can tell, this is quite a common scale for a platform team.
Our responsibilities included, but were not limited to:
Underlying networking (e.g. certificates for secure transit and VPN)
Identity management and providing least-privilege permissions
Providing instances of existing infrastructure (e.g. another PostgreSQL database)
New infrastructure procurement (e.g. exploring options for a queuing system)
Management of long-running environments (e.g. upgrades to production K8s clusters)
Security (e.g. egress control, hosting penetration tests)
Given that each item in this list can itself be a team’s remit in larger organisations, there was always a new capability available to supercharge one of our application development teams. The problem was that we spent a lot of our time responding to inbound tickets asking for things that required our support. In each case, moving more and more of our solutions into self-serve capabilities freed us from this burden.
For example, each time a software engineer needed to introduce a new application secret or rotate an existing one, they had to ask the platform team to do it, since we were the only ones with admin access to manage secrets. This restriction reduced the risk around secret access, but as you can imagine, with more than a 10:1 ratio of software developers to platform engineers, it was a frequent, time-intensive exercise. To reduce our team's toil, we designed an internal CLI command that allowed engineers to introduce new secret versions securely and roll those versions out safely through a change to a Helm values file. This removed many requests to the platform team that were both time-consuming and error-prone due to miscommunication and manual processes. It also reduced the time for an application team to roll out a new secret from hours or even days to just minutes.
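To make the shape of that workflow concrete, here is a minimal Python sketch of what such a command might do behind the scenes. It is not the actual CLI described above: the `SecretStore` class is a hypothetical in-memory stand-in for a real secret manager, and the `secrets` key layout in the Helm values is an assumption made purely for illustration. The point it tries to show is that the engineer writes a new secret version, and only the version number (never the secret value) ends up in the values that get reviewed and deployed.

```python
import copy


class SecretStore:
    """Hypothetical in-memory stand-in for a real secret manager."""

    def __init__(self):
        self._versions = {}  # secret name -> list of stored values

    def add_version(self, name, value):
        """Store a new version of a secret and return its 1-based version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(value)
        return len(versions)


def pin_secret_version(values, app, name, version):
    """Return a copy of the Helm values with `name` pinned to `version`.

    The `<app>.secrets.<name>.version` layout is an assumed convention,
    not something Helm itself prescribes.
    """
    updated = copy.deepcopy(values)  # leave the caller's values untouched
    secrets = updated.setdefault(app, {}).setdefault("secrets", {})
    secrets[name] = {"version": version}
    return updated


def rotate_secret(store, values, app, name, new_value):
    """Write a new secret version, then pin that version in the Helm values."""
    version = store.add_version(name, new_value)
    return pin_secret_version(values, app, name, version)


if __name__ == "__main__":
    store = SecretStore()
    values = {"checkout": {"image": "checkout:1.4.2"}}
    updated = rotate_secret(store, values, "checkout", "DB_PASSWORD", "example-value")
    print(updated)
```

In a real setup the updated values would be written back to the values file and go through the usual review and deployment process, so the rollout itself stays an ordinary, auditable change rather than a ticket to the platform team.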
As a team we aimed to provide self-service options for anything where we were confident we could define a good or best practice (see the Cynefin framework for more detail). This freed our team to explore the more complex or even novel areas where the business did not yet have a known solution.
Managing self-service offerings did not come for free, and one of our biggest costs was that each offering had a different interface. If this is a problem you face, you may be interested in the ninth blog in this series, “App Teams Need to Learn Platform Tools”.
This blog kicks off our series, The 12 Platform Challenges of Christmas. Check back daily until January 5th, 2023 for new posts!