Platform Engineering in the Age of AI: Why Operational Complexity Is the New Bottleneck

Daniel Bryant

Recently, I had the pleasure of presenting at an O’Reilly Infrastructure and Ops Superstream on a topic that’s becoming increasingly urgent for engineering leaders, platform teams, architects, and developers alike:

AI is accelerating software creation faster than organisations can safely operationalise it.

That single observation formed the backbone of the session, and it reflects what we’re consistently seeing at Syntasso when working with organisations building modern internal platforms.

AI coding assistants, autonomous agents, and increasingly capable developer tooling are dramatically reducing the friction involved in writing software. But while coding is becoming cheaper and faster, operational complexity hasn’t disappeared. In many organisations, it’s getting worse.

The result is that platforms are now more important than ever.

The recording of the talk is available to O’Reilly platform subscribers (a free trial is available, too), and you can find the slides on SpeakerDeck.

Quick interlude: If you’re struggling with operationalising AI on your platforms, check out Syntasso Kratix Agentic (SKA), which enables you to move your existing technology investments into the AI-enabled world without rebuilding your platforms or creating unmanaged agent sprawl.

The New Bottleneck Isn’t Coding

Over the last decade, we’ve made incredible progress in software delivery. Cloud platforms abstracted infrastructure, Kubernetes standardised orchestration, CI/CD automated deployment, DevOps improved collaboration, and platform engineering improved developer self-service. Now AI is transforming how software itself gets created.

But here’s the critical issue: writing code is no longer the bottleneck. Operationalising code is.

AI can generate application code with relative ease. We’re regularly bumping into traditional organisations that use Claude Code and Codex extensively. A few are even exploring the idea of building their own Gas Town to go all-in on the agentic software factory concept.

AI is also increasingly being used to generate Terraform modules, Kubernetes manifests, microservices, CI/CD pipelines, and infrastructure automation. But generating software isn’t the same as safely operating software at scale.

If your platform already struggles with inconsistent templates, manual approvals, fragmented ownership, upgrade pain, or operational drift, then AI will magnify those issues dramatically. As I said during the talk, AI amplifies both the strengths and weaknesses of your organisation.

Platform Architecture Matters as Much as Software Architecture

One of the core arguments I made during the presentation is that we need to take platform architecture far more seriously.

Software engineers have spent decades refining architectural patterns around layered systems, bounded contexts, APIs, lifecycle management, and service decomposition. Yet platform implementations are often still assembled as loosely connected tooling stacks with unclear ownership and weak abstractions.

That approach does not scale in an AI-driven world.

At Syntasso, we’ve increasingly found it useful to think about platforms as a three-layer architecture: application choreography, platform orchestration, and infrastructure composition. We shared this model over two years ago, and it has proven useful both technically and organisationally.

The application choreography layer is the developer-facing experience. It includes portals, CLIs, APIs, declarative application definitions, and the workflows developers (and increasingly AI agents) use to consume platform capabilities. This layer should focus on intent and outcomes rather than infrastructure implementation details.

At the bottom sits the infrastructure composition layer. This is the domain of infrastructure specialists, SREs, and platform engineers who manage compute, networking, storage, security, and observability. Critically, these teams must retain ownership of infrastructure lifecycle management. Without that ownership, upgrades, security patching, and operational consistency become increasingly difficult.

The middle layer, the platform orchestration layer, is where I think many organisations are currently missing an essential abstraction. Too often, teams connect developers directly to infrastructure tooling via raw Terraform configuration and modules, Kubernetes manifests, or cloud APIs. That creates tight coupling and operational fragility over time.

Instead, platforms need a dedicated orchestration layer responsible for lifecycle management, capability definitions, policy enforcement, abstraction boundaries, and API standardisation. AI agents make this orchestration layer even more critical.

The problem isn’t that AI agents make mistakes. Humans already do this (hands up who hasn’t broken production at least once!). The problem is that AI dramatically increases the rate at which infrastructure changes can be generated and propagated across systems.
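To make the role of the orchestration layer a little more concrete, here is a minimal sketch in Python. It is not how Kratix or any particular product works; the class names, the "postgres" capability, the version string, and the size policy are all hypothetical, chosen only to show the shape of the idea: requesters (human or agent) state intent, and the orchestration layer applies policy and pins the instance to a platform-owned definition.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CapabilityRequest:
    """What a developer or AI agent asks for: intent, not infrastructure."""
    capability: str       # e.g. "postgres" (hypothetical capability name)
    team: str
    environment: str      # "dev" or "prod" in this sketch
    size: str = "small"


@dataclass
class Orchestrator:
    """Hypothetical orchestration layer between requesters and infrastructure."""
    # Capability definitions are owned and versioned by the platform team.
    catalogue: dict = field(default_factory=lambda: {"postgres": "v2"})
    # Illustrative policy: which sizes each environment may request.
    allowed_sizes: dict = field(
        default_factory=lambda: {"dev": {"small"}, "prod": {"small", "large"}}
    )
    instances: list = field(default_factory=list)

    def check_policy(self, req: CapabilityRequest) -> list:
        """Return policy violations; an empty list means the request is allowed."""
        violations = []
        if req.capability not in self.catalogue:
            violations.append(f"unknown capability: {req.capability}")
        if req.size not in self.allowed_sizes.get(req.environment, set()):
            violations.append(f"size {req.size!r} not allowed in {req.environment!r}")
        return violations

    def submit(self, req: CapabilityRequest) -> dict:
        """Validate the request, then record an instance pinned to the current definition."""
        violations = self.check_policy(req)
        if violations:
            raise ValueError("; ".join(violations))
        instance = {
            "capability": req.capability,
            "team": req.team,
            "environment": req.environment,
            "size": req.size,
            "definition_version": self.catalogue[req.capability],
        }
        self.instances.append(instance)
        return instance
```

Because every request goes through `submit`, it does not matter whether ten humans or a thousand agents are generating changes: the same policy runs each time, and the platform team always knows which definition version each instance is running.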

AI Makes Platform Lifecycle Management Critical

One of the recurring themes during the presentation was that AI is excellent at generating variation. That’s incredibly powerful, but uncontrolled operational variation quickly becomes dangerous.

If every team, or every agent, forks infrastructure definitions and customises platform templates independently, upgrades become painful, security patching slows, operational drift explodes, and platform consistency disappears.

This is why lifecycle ownership matters so much.

Platform teams should expose capabilities as products and services rather than distributing endlessly customisable infrastructure fragments. The goal is to enable teams to move quickly without sacrificing operational safety or maintainability.
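The difference between a platform-owned capability and a forked template shows up the moment you need to upgrade. The following Python sketch illustrates that under simple assumptions: the `Instance` fields, the `forked` flag, and the version strings are hypothetical, and a real platform would reconcile actual infrastructure rather than flip a field.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    capability: str
    version: str
    forked: bool = False  # True if a team (or agent) copied and edited the template


def drift_report(instances, current):
    """Group instance names by whether they can be upgraded centrally."""
    report = {"up_to_date": [], "upgradeable": [], "manual": []}
    for inst in instances:
        if inst.forked:
            # Forks have diverged, so someone has to inspect each one by hand.
            report["manual"].append(inst.name)
        elif inst.version == current[inst.capability]:
            report["up_to_date"].append(inst.name)
        else:
            report["upgradeable"].append(inst.name)
    return report


def upgrade_fleet(instances, current):
    """Move every non-forked instance to the current definition; return how many changed."""
    changed = 0
    for inst in instances:
        if not inst.forked and inst.version != current[inst.capability]:
            inst.version = current[inst.capability]
            changed += 1
    return changed
```

In this model, the "manual" bucket is the cost of uncontrolled variation: every fork is an upgrade that cannot be rolled out centrally.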

The “Platinum Platform Metrics”

A common question I get from platform teams is how to actually measure platform success. Many organisations still struggle to define meaningful platform metrics beyond vague developer satisfaction scores.

During the talk, I proposed three “North Star” metrics, which I called the “Platinum Platform Metrics”.

The first is the time to provision a platform capability. How long does it take to provision a capability bespoke to your organisation’s needs, such as a database, a development environment, observability tooling, or a deployment environment? In many enterprises, the answer is still measured in weeks or months. Modern platforms should aim for minutes.

The second metric is time to upgrade all instances of a capability. How quickly can you patch a vulnerability, update a runtime, or roll out policy changes across your entire fleet? This is where lifecycle ownership becomes essential.

The third metric is time to offer a new platform capability. This is firmly focused on your platform’s producers. How quickly can your organisation introduce a new AI model, a new observability stack, or a new deployment pattern? Increasingly, this metric determines how adaptable your organisation can be in a rapidly changing technology landscape.
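If you already record timestamps for requests, releases, and rollouts, these three metrics are cheap to calculate. Here is a minimal Python sketch of one way to do it. The function names, the choice of the median for provisioning time, and the rule that an unfinished rollout has no upgrade time yet are my assumptions for illustration, not a standard definition.

```python
from datetime import datetime, timedelta
from statistics import median


def time_to_provision(requests):
    """Median time from request to ready, given (requested_at, ready_at) pairs."""
    seconds = [(ready - requested).total_seconds() for requested, ready in requests]
    return timedelta(seconds=median(seconds))


def time_to_upgrade_all(released_at, upgraded_at):
    """Time from releasing a new version until the last instance is upgraded.

    upgraded_at holds one timestamp per instance, or None if that instance
    has not been upgraded yet. Returns None while any instance is outstanding.
    """
    if not upgraded_at or any(ts is None for ts in upgraded_at):
        return None
    return max(upgraded_at) - released_at


def time_to_offer(proposed_at, available_at):
    """Time from deciding to offer a new capability until users can request it."""
    return available_at - proposed_at
```

Note that the second metric is deliberately driven by the slowest instance: a fleet is only patched when the last instance is patched.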

Team Topologies and Platform Engineering

Technology alone won’t solve these challenges. The organisational side of platform engineering matters just as much.

The principles from Team Topologies remain highly relevant in the AI era. Stream-aligned teams, platform teams, enabling teams, and clear interaction modes all become increasingly important as software delivery accelerates.

In particular, the concept of X-as-a-Service becomes critical. AI agents, just like humans, need well-defined APIs, clear ownership boundaries, stable abstractions, and self-service capabilities. The more predictable and consumable your platform becomes, the more effectively both humans and agents can operate.

The goal is not simply automation. The goal is to enable predictable, scalable, and safe delivery of value across the organisation.

What This Means for Platform Teams

If there was one key message from the talk, it was this: organisations that invest in platform architecture now will be significantly better positioned for the AI-native future.

Not because platforms are trendy, but because platforms are becoming the operational control plane for AI-enabled software delivery.

The organisations that succeed will optimise for the flow of value, embrace clear lifecycle ownership, invest in platform APIs, reduce cognitive load, and treat platforms as products.

AI is accelerating the pace of software creation. Platforms determine whether organisations can keep up safely.

These are exactly the kinds of operational and lifecycle management challenges we’ve been focusing on at Syntasso. You can learn more about how Syntasso Kratix Agentic (SKA) helps organisations like yours move existing technology investments into the AI-enabled world without rebuilding your platforms or creating unmanaged agent sprawl.

Thanks again to O’Reilly for hosting the session, and thanks to everyone who joined the discussion!