What Are Kubernetes Operators, and Do You Still Need Them in 2025?
- Adil Sameer Shaikh
- Jul 25
The Kubernetes market is booming and is expected to reach a $3.76 billion market size this year, reflecting its widespread adoption and innovation. At the same time, platform engineering is evolving rapidly with Kubernetes as its backbone.
When using Kubernetes, you will most likely use multiple Kubernetes Operators. Operators revolutionised how complex applications are managed on Kubernetes by automating operational tasks that traditionally required human intervention. But as you navigate the Kubernetes landscape in 2025, you might wonder if Operators still hold the key to efficient infrastructure management or if their limitations are prompting a shift.
Modern developer platforms prioritise self-service and streamlined workflows, empowering teams to deploy and manage applications with minimal friction. Yet, off-the-shelf Kubernetes Operators often struggle to keep pace, creating challenges like rigid configurations, limited customisation, and integration hurdles that hinder scalability and developer autonomy. Bespoke Operators, meanwhile, bring a high barrier to entry due to the complexity of writing and maintaining them.
This article explores the current state of Kubernetes Operators, the pain points you might face, and introduces Kratix—a promising alternative that abstracts operational logic into flexible, declarative workflows, potentially redefining how platform teams manage Kubernetes infrastructure.
What are Kubernetes Operators?
Kubernetes Operators emerged to solve a key challenge: how do you automate the ongoing, day-2 management of complex applications, such as databases or message queues, within Kubernetes?
While Kubernetes was originally designed with stateless workloads in mind, many real-world applications require nuanced operational knowledge to manage lifecycle events like backups, scaling, upgrades, and failovers. This need led to the creation of the Operator pattern by CoreOS in 2016, and as CoreOS CTO Brandon Philips explained at the time:
“We are introducing the concept of an ‘Operator.’ It’s a concept for taking a lot of the knowledge an engineer [or developer] has inside of a script or run book — domain-specific knowledge — and writing software that can do a lot of that automatically.”
This innovation allowed operational expertise to be encoded in software, reducing manual intervention and operational risk.
The operator pattern: CRDs and controllers
At the heart of the Operator pattern are two key concepts:
Custom Resource Definitions (CRDs): These let you introduce new types of resources into your Kubernetes cluster, such as postgresql or InnoDBCluster, tailored to specific applications.
Controllers: These are control loops that continuously monitor your custom resources and take action to reconcile the current state of your cluster with the desired state you’ve defined.
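To make the first concept concrete, here is a minimal CRD sketch; the `example.com` group and `Widget` kind are hypothetical, chosen purely for illustration. Once applied, `kubectl get widgets` works like any built-in resource, and a controller can watch and reconcile `Widget` objects:

```yaml
# Hypothetical CRD adding a new "Widget" resource type to the cluster.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: widgets
    singular: widget
    kind: Widget
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
```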

Distinguishing Operators from Controllers
While the terms are related and often used interchangeably, an Operator is not just a controller, and understanding the difference is important:
A controller is a Kubernetes concept representing a control loop that manages resources by reconciling desired and actual states. Controllers can be generic (like the Deployment controller) or custom-built for specific resources.
An Operator is a higher-level pattern that encapsulates domain-specific operational knowledge. It typically includes one or more controllers, plus the custom resource definitions (CRDs) and the operational logic needed to manage complex applications automatically.
In other words, a controller is a component within an Operator, responsible for the automation, while the Operator as a whole represents the full solution that extends Kubernetes to manage a particular application or service.
Real-world examples of Kubernetes Operators
Let’s look at some concrete examples to illustrate how Operators simplify complex deployments:
Zalando Postgres Operator
The Zalando Postgres Operator enables you to deploy a production-ready PostgreSQL cluster with a manifest like the one below:
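A minimal sketch, modelled on the minimal-cluster example from the Operator's documentation (team, user, database names, and sizes are illustrative):

```yaml
# Requests a two-instance PostgreSQL cluster; the Operator does the rest.
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
spec:
  teamId: "acid"
  numberOfInstances: 2
  volume:
    size: 1Gi
  users:
    zalando:          # database owner role
      - superuser
      - createdb
  databases:
    foo: zalando      # database name: owner
  postgresql:
    version: "16"
```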
With this, the operator handles StatefulSet creation, persistent storage, user setup, and service discovery by abstracting away the low-level Kubernetes resources.
Prometheus Operator
The Prometheus Operator simplifies the deployment and management of Prometheus monitoring stacks on Kubernetes. Instead of manually setting up configurations, you can define monitoring components using custom resources like ServiceMonitor and Prometheus.
Here’s an example of a Prometheus instance defined with a minimal YAML file:
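A minimal sketch, following the shape of the Prometheus Operator's getting-started example (the label selector and resource values are illustrative):

```yaml
# Declares a two-replica Prometheus server that scrapes any
# ServiceMonitor labelled team: frontend.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
spec:
  serviceAccountName: prometheus
  replicas: 2
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
```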
This Operator not only manages deployment and scaling but also streamlines service discovery, alerting configurations, and seamless integration with Grafana dashboards.
RabbitMQ Operator
The RabbitMQ Cluster Operator enables Kubernetes-native deployment and lifecycle management of RabbitMQ clusters. It simplifies not only provisioning but also configuration, scaling, TLS, and high availability.
Here’s an example of how you can define a RabbitMQ cluster using a minimal YAML file:
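A minimal sketch, modelled on the RabbitMQ Cluster Operator's hello-world example:

```yaml
# A three-node RabbitMQ cluster; the Operator provisions the
# StatefulSet, services, and credentials.
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: hello-world
spec:
  replicas: 3
```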
This Operator handles advanced operations, such as rolling upgrades, persistent storage management, and secure credential rotation, making it easier to run RabbitMQ reliably in production Kubernetes environments.
How are Kubernetes Operators created?
When people discuss Kubernetes Operators, a common misconception often arises: “Operators must be written in Go.” While Go is indeed the most popular and widely supported language for building Kubernetes controllers, especially due to its native integration with the Kubernetes API, it’s not the only option.
As highlighted in this KubeCon talk, platform teams shouldn't feel limited by Go. If you're more productive in Python, Java, or even Bash scripts for simple tasks, there's a path forward. The Kubernetes API is simply an HTTP interface, meaning that any language capable of speaking REST and monitoring events can be used to build a controller.

As the data above shows, Go-based Operators are by far the most popular on OperatorHub.io, reflecting their strong community and ecosystem support. However, Operators written in Ansible, Helm, Python, Java, and other languages are also represented, giving you flexibility based on your team's expertise and project needs.
The five Operator capability levels
Operators can be designed with varying degrees of automation and intelligence, which are categorised into five capability levels (as shown in the image below):

Basic install: Automates application installation and configuration.
Seamless upgrades: Supports smooth, automated upgrades of both the application and the Operator itself.
Full lifecycle: Manages backup, restore, failover, and advanced configuration flows.
Deep insights: Provides monitoring, custom metrics, and alerts for proactive management.
Autopilot: Enables self-healing, auto-scaling, and performance tuning with minimal human intervention.
The benefits that made Operators popular
Kubernetes Operators gained traction because they provided a consistent and automated way to manage complex applications, a space where vanilla Kubernetes primitives fall short. By extending Kubernetes' declarative and event-driven model, Operators empowered platform teams to codify operational knowledge into custom controllers.

1. Codifying operational knowledge
Operators transform the expertise of SREs and DevOps engineers into reliable, automated software. Instead of relying on manual runbooks or ad hoc scripts, Operators encapsulate routine yet complex tasks such as database upgrades, failover handling, and backup/restore operations within a controller loop. This loop continuously reconciles the desired state declared in a CRD with the actual state of the system.
Technical impact:
Ensures operational consistency across clusters
Reduces human error and manual intervention
Enforces organisational best practices at scale
2. Automating complex stateful services
Some Operators are purpose-built to manage stateful applications that require persistent storage, coordinated configuration, and robust recovery logic. Examples include the Postgres Operator for PostgreSQL, Prometheus Operator for monitoring, and Operators for Kafka, Redis, and more.
Technical impact:
Automates the entire lifecycle: provisioning, health checks, scaling, patching, and disaster recovery
Makes running stateful services as simple as applying a YAML manifest
Empowers teams to run production-grade data services natively on Kubernetes
3. Declarative, GitOps-friendly delivery
Because Operators are driven by declarative custom resources, application and infrastructure configuration can live in version control and flow through standard GitOps pipelines.
Technical impact:
Enables repeatable, auditable, and reliable deployments
Facilitates rollbacks, change tracking, and compliance
Bridges the gap between infrastructure-as-code and application delivery
4. Native integration with Kubernetes RBAC and CRD ecosystem
Operators extend Kubernetes with new resource types via CRDs and integrate seamlessly with native Role-Based Access Control (RBAC). This means that custom resources, such as KafkaCluster or ElasticsearchNode, are treated as first-class resources in the cluster.
Technical impact:
Provides granular access control and security for custom resources
Supports seamless observability and management through standard Kubernetes tooling (kubectl, events, metrics)
Ensures compatibility with existing Kubernetes policies, admission controllers, and ecosystem tools
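To make the access-control point concrete, here is a hypothetical Role (the `example.com` group and `kafkaclusters` resource are illustrative) granting a team read-only access to a custom resource, using exactly the same RBAC machinery as built-in resources:

```yaml
# Hypothetical Role: read-only access to the custom kafkaclusters
# resource in the streaming namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kafka-viewer
  namespace: streaming
rules:
  - apiGroups: ["example.com"]
    resources: ["kafkaclusters"]
    verbs: ["get", "list", "watch"]
```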
The growing pain points with Operators in 2025
As the Kubernetes ecosystem matures, platform teams are encountering significant challenges that limit the long-term scalability and flexibility of the Operator pattern. These issues are especially pronounced in large organisations and platform engineering teams striving for composability and rapid innovation.

1. Operator overload: When Operators start to pile up
Each new service, database, or infrastructure component often requires its own dedicated Operator, resulting in dozens, sometimes hundreds, of controllers running within a single cluster. This “Operator overload” leads to:
Increased resource consumption: Every Operator runs as a separate pod, consuming CPU and memory.
Complex dependency management: Operators may have overlapping responsibilities or conflicting resource requirements, making upgrades and troubleshooting more difficult.
Operational blind spots: With so many controllers, it becomes challenging to monitor, audit, and govern their collective behaviour across environments.
2. Lack of composability across teams
Operators are typically designed around a single application or domain, tightly coupling business logic and operational workflows. This creates technical silos:
Limited reuse: Teams cannot easily compose automation logic from multiple Operators to create higher-level workflows.
Duplication of effort: Similar operational patterns (e.g., backup, scaling, monitoring) must be reimplemented in each Operator, leading to code and maintenance bloat.
3. Upgrading and maintaining Operators is hard
Managing Kubernetes Operators is no small feat, especially as organisations scale their Kubernetes environments. The operational complexity of upgrades and maintenance, combined with a shortage of specialised expertise, creates significant hurdles:
Version drift: Keeping Operators in sync with Kubernetes API changes and security patches requires constant vigilance.
Complex upgrade paths: Updating an Operator may require downtime, custom migration scripts, or even breaking changes to CRDs.
Testing overhead: Each Operator upgrade must be validated against the specific workloads it manages, increasing QA and release complexity.
Engineering bottlenecks: Writing and maintaining custom Operators requires niche expertise, which is scarce in most organisations. Even teams that successfully build a custom Operator struggle to scale it across diverse use cases, as the limited pool of skilled engineers becomes a bottleneck.
4. Business logic locked inside compiled code
The core intelligence of most Operators, their reconciliation logic and automation workflows, is embedded deep within the source code:
Opaque automation: Platform teams cannot easily inspect, audit, or modify the logic without programming expertise and access to the source repository.
Slow iteration: Because the operational logic is hardcoded, making even small updates like tweaking a retry policy or changing a backup schedule requires developers to modify the code, recompile the Operator, run it through CI/CD pipelines, and redeploy it to the cluster.
Limited extensibility: Integrating new policies, compliance checks, or custom workflows often means forking or rewriting Operators, which is both risky and time-consuming.
5. Multi-Cluster management challenges
Operators are inherently designed for single-cluster environments, creating significant hurdles in today’s multi-cluster and multi-cloud world:
Synchronisation issues: Ensuring consistent Operator configurations and state across multiple clusters is complex and error-prone.
Scalability barriers: Deploying and updating Operators across tens or hundreds of clusters amplifies consistency, maintenance, and governance challenges.
Inconsistent behaviour: Variations in cluster configurations can lead to unpredictable Operator performance, undermining reliability.
Security risk: When an Operator needs to manage resources across multiple clusters, it requires elevated permissions to make direct calls to those clusters. This creates a security risk: the cluster running the Operator must be granted broad access credentials, potentially exposing multiple clusters if it is compromised. While cross-cluster operations are technically feasible with proper RBAC configuration, the expanded permission scope increases the attack surface and demands careful security consideration.
Are there better patterns emerging?
As the Operator pattern reveals its limitations at scale, is there a better way to extend Kubernetes with operational knowledge?
Yes, and it has been there all along: returning to the source of truth, Custom Resource Definitions (CRDs) and controllers.
Instead of embedding business logic directly into compiled controller binaries, modern teams are moving toward fully declarative definitions of both infrastructure and workflows. The desired state is expressed in YAML or domain-specific languages (DSLs), while execution logic is separated from resource definitions.
Technical advantages include:
Improved testability and visibility by decoupling what should happen from how it happens
Enhanced reusability of workflows across teams and projects
Seamless alignment with GitOps principles, where infrastructure, policies, and workflows are version-controlled and auditable
Now, how can you do this?
Shift toward declarative workflows with Kratix Promises
Kratix is an open-source framework designed to help platform engineering teams deliver self-service, composable platforms on Kubernetes. It enables teams to expose operational capabilities (like databases, queues, or third-party services) as easily consumable APIs for developers.
At the core of Kratix is the concept of a Promise—a powerful abstraction that defines:
What a developer can request (e.g., a PostgreSQL database, Redis cache, or a SaaS integration)
How that request is fulfilled, using declarative, GitOps-native workflows (built with Terraform, Helm charts, Operators, or custom scripts)
A Kratix Promise is a Kubernetes Custom Resource Definition (CRD) combined with a delivery pipeline, enabling platform teams to create reusable, composable workflows that are easy to maintain and evolve.
What’s inside a Kratix Promise?
A Promise consists of three main parts, as detailed in the Kratix documentation:

api: Defines the API schema (CRD) that developers use to request resources.
dependencies: Lists prerequisites that must be installed on target clusters before the Promise can fulfil requests.
workflows: Contain pipelines executed at various lifecycle stages, such as Promise installation or resource creation, encoding the business logic needed to deliver the requested capability as a service.
Example: Kratix app Promise
Below is a simplified example of a Promise for delivering “Apps as a Service”, bundling a Deployment, Service, and Ingress (with NGINX) into a single requestable resource. This example is adapted from the Kratix workshop:
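An abbreviated sketch of such a Promise follows; the API group and pipeline container images are illustrative placeholders, not the workshop's exact values:

```yaml
apiVersion: platform.kratix.io/v1alpha1
kind: Promise
metadata:
  name: app
spec:
  api:
    # The CRD developers use to request an App.
    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: apps.workshop.example.com   # placeholder group
    spec:
      group: workshop.example.com
      scope: Namespaced
      names:
        kind: App
        plural: apps
        singular: app
      versions:
        - name: v1alpha1
          served: true
          storage: true
          schema:
            openAPIV3Schema:
              type: object
              properties:
                spec:
                  type: object
                  properties:
                    image:
                      type: string
                    service:
                      type: object
                      properties:
                        port:
                          type: integer
  workflows:
    promise:
      configure:
        - apiVersion: platform.kratix.io/v1alpha1
          kind: Pipeline
          metadata:
            name: promise-configure
          spec:
            containers:
              # Installs Promise dependencies (e.g. NGINX);
              # image name is a placeholder.
              - name: install-dependencies
                image: registry.example.com/app-promise/install-deps:v0.1.0
    resource:
      configure:
        - apiVersion: platform.kratix.io/v1alpha1
          kind: Pipeline
          metadata:
            name: instance-configure
          spec:
            containers:
              # Renders the Deployment, Service, and Ingress for each
              # App request; image name is a placeholder.
              - name: create-resources
                image: registry.example.com/app-promise/configure:v0.1.0
```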
Below is a breakdown of the three main parts of the Promise above:
api: Defines the CRD for the App resource, specifying required fields like image and service.port.
workflows.promise.configure: Can be used to ensure that the Promise's dependencies are installed, in this case NGINX.
workflows.resource.configure: Specifies a pipeline that provisions the Deployment, Service, and Ingress for each app request.
For a detailed walkthrough example of Kratix in action, see Kratix for Database Management: A Step-by-Step Guide to Amazon RDS Deployment.
Kubernetes Operators vs declarative workflows: A comparison table
| Aspect | Kubernetes Operators | Declarative workflows (e.g., Kratix Promises) |
| --- | --- | --- |
| Definition | Custom controllers that extend Kubernetes APIs using CRDs to automate lifecycle management of complex, stateful applications. | Declarative resource definitions paired with flexible, version-controlled workflows that separate what is requested from how it's fulfilled. |
| Automation scope | Automates deployment, upgrades, scaling, backups, failover, and recovery for specific applications, encoding operational knowledge in code. | Automates provisioning, management, and business processes (e.g., compliance, approvals) via reusable, composable workflows that integrate tools like Terraform, Helm, and Operators declaratively. |
| Complexity | Requires development and maintenance of compiled controller code; high complexity in lifecycle management logic. | Uses declarative YAML and GitOps-native pipelines; separates orchestration logic from resource definitions, simplifying updates and testing. |
| Flexibility and composability | Tightly coupled to specific applications, limiting reuse and composability across teams and services. | Highly composable and modular; workflows and APIs can be reused and combined across platforms and teams. |
| Developer experience | Developers interact with custom resources but may face backend complexity and limited visibility into workflows. | Developers interact with simple platform APIs (Promises), abstracted from backend orchestration complexity, improving productivity. |
| GitOps compatibility | Supports declarative management, but business logic is embedded in code, making GitOps integration less flexible. | Fully GitOps-native; workflows and resource states are version-controlled, auditable, and easily rolled back. |
| Upgrade and maintenance | Upgrading Operators can be complex, requiring code changes, recompilation, and careful testing. | Workflows and APIs can be updated independently without redeploying controllers, enabling easier maintenance. |
| Security and RBAC | Integrates with Kubernetes RBAC and secrets management, enabling fine-grained access control on custom resources. | Also leverages Kubernetes RBAC; policies can be enforced declaratively across workflows and resources. |
| Use cases | Best suited for managing complex, stateful applications like databases, caches, and monitoring systems. | Ideal for building composable internal platforms, multi-cloud provisioning, and abstracting infrastructure complexity, while enabling business-driven processes like compliance workflows and team coordination. |
| Ecosystem and tooling | Mature ecosystem with many Operators available; requires Operator SDK or custom development. | Emerging ecosystem focused on platform engineering tools like Kratix, emphasising modularity and GitOps workflows. |
Should you still use Kubernetes Operators in 2025?
The short answer? It depends.
Operators still serve a purpose, especially for mature, complex stateful applications (e.g., databases, monitoring stacks) where community-maintained Operators like Postgres or Prometheus abstract away well-understood operational tasks. These "battle-tested" Operators benefit from broad adoption, reducing the need to reinvent the wheel.
However, the rise of platform engineering and declarative infrastructure has exposed limitations in the Operator pattern, particularly for custom use cases:
Tight coupling: Operators embed operational logic directly into the cluster, creating inflexible, hard-to-reuse components.
Maintenance overhead: Building and maintaining custom Operators demands significant Kubernetes expertise and ongoing investment.
Declarative gap: Operators often mix imperative logic with business requirements, straying from GitOps ideals.
Rethink custom Operators unless necessary
Before building a new Operator, ask:
Can this be declarative? Can the desired state of the infrastructure and application be expressed purely through declarative configuration, rather than imperative code within an Operator?
Is this composable? Can the desired functionality be broken down into smaller, reusable building blocks that can be combined in various ways, rather than a monolithic Operator?
Can this be separated from business logic? Is the operational concern truly infrastructure-related, or is it intertwined with specific business logic that would be better managed outside the infrastructure layer?
The best of both worlds: Operators and flexibility
For teams already invested in Operators, Kratix offers a bridge. By wrapping Operators (or their logic) into Kratix Promises, you retain their benefits while decoupling them from your platform’s core. Promises let you:
Abstract away Operator complexity behind declarative APIs.
Compose Operators with other workflows (e.g., provisioning external services).
Future-proof your platform by swapping underlying Operators without breaking user workflows.
Operators aren’t obsolete, but the industry is shifting toward modular, platform-native abstractions. Tools like Kratix enable you to evolve beyond hardcoded Operators while still leveraging their ecosystem, providing platform teams with control without lock-in.
A smarter way forward: Beyond the Operator pattern
Kubernetes Operators were a revolutionary step in automating operational knowledge, but as platform engineering matures, it’s time to think bigger. The future isn’t about embedding more logic into clusters; it’s about flexibility, reusability, and developer-first interfaces that scale with your team’s needs.
What modern platforms demand:
Composability: Glueing together Operators, APIs, and external services without tight coupling.
Declarative control: Defining what you need (not how it runs) and letting the platform handle the rest.
Portability: Avoiding lock-in to any single implementation (e.g., swapping Operators without refactoring apps).
With Kratix, you can expose capabilities, whether backed by Operators, external tools, or custom workflows, as self-service APIs (Promises). This shifts the focus from how things run to what outcomes teams need, thereby offering:
A unified interface: Developers request resources declaratively, while platform teams manage the underlying logic (even wrapping existing Operators).
True reuse: Promises compose across environments, teams, and vendors.
Future-proofing: Change infrastructure implementations without breaking user workflows.
Operators paved the way, but the next era of platforms demands abstraction, not accumulation. Explore how Kratix can help you build a modular, maintainable, and user-centric platform without starting from scratch.