Fleet Management: What Platform Engineering Can Learn From Over-the-Air Car Updates

Abby Bangser
May 20

Platform engineering promises improved developer experience and organisational efficiency, and it delivers. But these wins aren’t one-time gains. Developer experience and efficiency are games of infinite progress: there is always more to improve.

Yet if you listen to conference talks or vendor demos, you’d think the most challenging part is getting started. That’s not where the real challenge lies. The real test is in the “day 2” and “day 2000” realities: managing security patching, rolling out new cloud capabilities, sunsetting deprecated vendor features, and maintaining momentum without breaking what’s already in place.

When a platform is successful, the challenges shift. It’s no longer just about onboarding users. Instead, the platform must continue to onboard while also maintaining trust, enabling continuous transformation, and scaling capabilities and support sub-linearly so that operational costs don’t balloon alongside adoption. Thankfully, with the right approach to fleet management, platform engineering can address these challenges too.

Fast start, fast decay: Templates + CI pipelines create a platform nightmare

In modern organisations, the number of deployed services is continually expanding. Microservices architectures, internal tooling, test environments, and third-party integrations all add up to a sprawling, interconnected ecosystem. With this expansion, and with "Vulnerable and Outdated Components" still in the OWASP Top 10, managing fleets of resources (infrastructure, applications, CI/CD pipelines, etc.) becomes not just useful, but critical.

Ironically, the speed at which platforms let teams deploy new resources can become a liability. If spinning up new cloud VMs or Kubernetes clusters is too easy, who is responsible for cleaning up the mess afterwards? Without proper lifecycle management, your “platform” starts to decay and become a graveyard of forgotten services and inconsistent configurations.

Fleet updates: Avoiding “pets wearing cattle costumes”

I used to believe we had solved the “pets vs. cattle” problem. At one of my past companies, we had infrastructure-as-code, templated repos, and a decent developer experience to help teams get started quickly. Everything was stamped out in a consistent, even cattle-like way. But when it came time to update those services, we were in for a real lesson.

At one point, we wanted to shift from using `kubectl apply` for deployments to GitOps with ArgoCD. In theory, it should’ve been straightforward since we already had a common template, right?

But when we looked closer, only about a quarter of our services could be updated automatically. The rest had drifted just enough, with custom CI/CD tweaks here and small manifest changes there, that the best we could do was apply batch changes group by group. More than a quarter of the repos needed bespoke manual intervention. It was frustrating. What we thought was a herd of cattle turned out to be a scattered collection of pets wearing cattle costumes.

That’s the real challenge: starting from a template doesn’t guarantee long-term consistency. Without continuous fleet tracking and drift management, we were barely better off than click ops.
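
One way to make drift visible is to compare each service's current settings against the template it was stamped from. The sketch below is illustrative only: the `TEMPLATE` keys, the service names, and the `detect_drift` helper are all hypothetical, and a real fleet would read these values from repositories or a cluster rather than from a hard-coded dictionary.

```python
from dataclasses import dataclass

# Hypothetical baseline that every templated service started from.
TEMPLATE = {
    "deploy_tool": "kubectl-apply",
    "ci_pipeline": "v3",
    "base_image": "python:3.12",
}


@dataclass
class DriftReport:
    service: str
    drifted_keys: list

    @property
    def in_sync(self) -> bool:
        # A service with no drifted keys can be updated automatically.
        return not self.drifted_keys


def detect_drift(service, actual, template=TEMPLATE):
    """Report template keys whose value is missing or different in `actual`."""
    drifted = sorted(k for k, v in template.items() if actual.get(k) != v)
    return DriftReport(service, drifted)


# Illustrative fleet: one untouched service and two that drifted.
fleet = {
    "checkout": dict(TEMPLATE),
    "search": {**TEMPLATE, "ci_pipeline": "v3-custom"},
    "billing": {"deploy_tool": "helm", "base_image": "python:3.12"},
}

reports = [detect_drift(name, config) for name, config in fleet.items()]
auto_updatable = [r.service for r in reports if r.in_sync]
needs_attention = {r.service: r.drifted_keys for r in reports if not r.in_sync}

print(auto_updatable)   # ['checkout']
print(needs_attention)  # {'search': ['ci_pipeline'], 'billing': ['ci_pipeline', 'deploy_tool']}
```

Even a report this simple separates the services that can take an automated change from those that need a conversation with their owners first.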

Don’t plan for perfection; plan for change

It’s understandable that teams don’t want to over-architect from day one. When you’re fighting fires, investing in future unknowns feels like a luxury. But with the right platform principles, you can enable flexibility without committing to overly rigid processes.

The key is to build in feedback loops and track drift. When platforms include a callback or reconciliation mechanism, the organisation can maintain both observability and control for the long term. These mechanisms are the points at which the platform regularly validates whether a resource still matches its expected state. This will feel very familiar to those who already work with Kubernetes, and it is one of the reasons why Kubernetes is such a natural fit as a base for building a platform.
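
The reconciliation idea can be sketched in a few lines. This is a simplified illustration of the compare-and-converge pattern, not Kubernetes or Kratix code: the `reconcile` and `apply_actions` functions and the resource names are assumptions made for the example, and the "observed" state is an in-memory dictionary standing in for a real system.

```python
def reconcile(desired, observed):
    """Return the actions that would move `observed` towards `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name))
        elif observed[name] != spec:
            actions.append(("update", name))
    for name in observed:
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply_actions(actions, desired, observed):
    """Apply actions to the in-memory state (a stand-in for real API calls)."""
    for verb, name in actions:
        if verb == "delete":
            del observed[name]
        else:
            observed[name] = dict(desired[name])


desired = {"db": {"version": "16"}, "cache": {"version": "7"}}
observed = {"db": {"version": "15"}, "queue": {"version": "3"}}

first_pass = reconcile(desired, observed)
apply_actions(first_pass, desired, observed)
second_pass = reconcile(desired, observed)

print(first_pass)   # [('update', 'db'), ('create', 'cache'), ('delete', 'queue')]
print(second_pass)  # []
```

Running the comparison repeatedly, rather than once at creation time, is what catches drift introduced after day one.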

Platforms built on Kubernetes don’t have to produce declarative outputs, but when they do, as with a GitOps approach, the combination is even more powerful. That is why Syntasso Kratix Enterprise (SKE) extends the Kubernetes API to enable bespoke platform APIs and gives each one built-in support for scheduling declarative GitOps workloads.

With that foundation, you can introduce new rules, updates, and enhancements over time without requiring a full rewrite or causing service interruptions. It’s not about predicting the future. It’s about being ready for it and enabling revolution through evolution.

Fleet management: Inspiration from the physical world

Most infrastructure innovation over the past decade has drawn inspiration from the software world: cloud APIs, CI/CD pipelines, and the rise of SRE principles. But when it comes to fleet management, we may need to balance that with lessons from the physical world. One very mature physical manufacturing sector is the automotive industry.

When car manufacturers ship a vehicle, they can’t predict how far the owner lives from a dealership or whether they’ll bother to schedule a service appointment. Yet if a defect is found, it must be addressed, often quite quickly, given the level of risk associated with high-speed vehicles. Historically, this meant recalls, in-person visits, delays, and customer frustration.

I’ve experienced this firsthand: I once needed a dent fixed on my car, but the dealership refused to work on it until they also performed a mandatory software update. That update wasn’t available for weeks since they had to book it in with a specific technician, and the hassle meant I just delayed everything. This meant I not only didn't get my cosmetic dent fixed, but I also didn't get that "mandatory" and "high priority" update for months.

Now, my newer vehicle handles updates over the air. While it’s parked in my driveway, it can download and install critical software updates overnight without any dealership visits, any downtime, or really any friction.

This is the inspiration for what platforms should offer: non-disruptive, centrally coordinated, reliable updates across an entire fleet.

Platform orchestration must provide over-the-air updates

With an API-based platform orchestrator, you should get the velocity of self-service and templates without the maintenance overhead. Here are a few scenarios that any orchestrator should be able to support:

Push a security patch to every environment immediately
Roll out that new version to non-production workloads first, and then gradually escalate to high-risk production workloads
Allow users to pull in updates until the final deprecation date when their upgrade is forced
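
The three scenarios above can be expressed as rollout modes in a single policy check. The sketch below is a hypothetical model, not the SKE or Kratix API: the `Resource` and `Update` types, the mode names, and the choice to force upgrades from the deadline date onwards are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Resource:
    name: str
    environment: str        # e.g. "dev", "staging", "production"
    version: str
    opted_in: bool = False  # owner has asked to pull the update early


@dataclass
class Update:
    version: str
    mode: str                        # "immediate", "staged", or "opt_in"
    open_environments: tuple = ()    # staged: environments in the current wave
    deadline: Optional[date] = None  # opt_in: upgrades are forced from this date


def due_for_update(resource, update, today):
    """Decide whether `resource` should receive `update` on `today`."""
    if resource.version == update.version:
        return False  # already up to date
    if update.mode == "immediate":
        return True
    if update.mode == "staged":
        return resource.environment in update.open_environments
    if update.mode == "opt_in":
        forced = update.deadline is not None and today >= update.deadline
        return resource.opted_in or forced
    raise ValueError(f"unknown rollout mode: {update.mode}")


fleet = [
    Resource("api-dev", "dev", "1.0"),
    Resource("api-prod", "production", "1.0"),
    Resource("web-prod", "production", "1.0", opted_in=True),
]


def targets(update, today):
    return [r.name for r in fleet if due_for_update(r, update, today)]


patch = Update("1.1", mode="immediate")
wave_one = Update("1.1", mode="staged", open_environments=("dev", "staging"))
optional = Update("1.1", mode="opt_in", deadline=date(2025, 6, 30))

print(targets(patch, date(2025, 6, 1)))      # ['api-dev', 'api-prod', 'web-prod']
print(targets(wave_one, date(2025, 6, 1)))   # ['api-dev']
print(targets(optional, date(2025, 6, 1)))   # ['web-prod']
print(targets(optional, date(2025, 6, 30)))  # ['api-dev', 'api-prod', 'web-prod']
```

Keeping the decision in one function means the same fleet inventory can serve an urgent patch, a cautious staged rollout, and a long deprecation window without separate tooling for each.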

Syntasso Kratix Enterprise (SKE) supports these scenarios and more. It embodies the principle of “over-the-air” updates for your platform.

With SKE, all platform resources are created on demand using predefined configurations. However, unlike the typical cookie-cutter templates and pipelines approach, SKE does not stamp out unmanaged copies. Instead, it creates a cohesive fleet where every resource can be tracked, updated, and governed centrally.

Because the update mechanism is part of the core open-source Kratix framework, the cost of updating a single resource is the same as updating a thousand. And since everything is auditable and policy-driven, risk becomes visible, manageable, and even boring.

Conclusion: Fleet management is the day 2 advantage

Great platforms don’t just help development teams start fast. They help teams and organisations evolve quickly without chaos.

Fleet management can’t be an afterthought. It must be a core capability for any mature platform engineering effort. By learning from physical-world systems and embracing flexible, feedback-driven architectures, platform teams can stay ahead of drift, reduce operational toil, and confidently evolve their services.

If your platform can’t keep pace with your organisation’s growth, it’s not really a platform. With solutions like Syntasso Kratix Enterprise, you don’t have to choose between velocity and governance: both are core capabilities.