Platform Challenge 11: Scaling the Platform Beyond a Single Kubernetes Cluster
So you've successfully migrated an application or team to a single Kubernetes cluster. Perhaps you're using a managed offering such as GKE, EKS or AKS. You have your applications running and are happy with how things are going.
You're so successful that other teams in your organisation now want to migrate to Kubernetes too. No problem! You start setting up your cluster for multi-tenancy. You create a namespace for each team, with strict RBAC rules to ensure proper isolation. As you do this, the stack running on your cluster quickly begins to explode. Team A needs Flux and Redis to get going, team B needs PostgreSQL and Jenkins, and team C uses Crossplane and ArgoCD to drive its infrastructure provisioning.
All of a sudden the software stack is becoming overwhelming. Software that isn't designed to run together on the same cluster, like Flux and ArgoCD, begins fighting, and the large number of Custom Resource Definitions from components like Crossplane begins to slow down your cluster. Different pieces of software require the same dependency, such as cert-manager, at different versions, even though that dependency isn't designed to be run more than once per cluster. Your worker count starts to grow into the hundreds, with various node types (ARM, GPU, etc.). Kubernetes upgrades become unthinkable. A single bug or issue on your single cluster could take everything down. Things are quickly spiralling out of control.
You're starting to wonder if the one-cluster approach is feasible. The alternative is clear: isolate your teams across different Kubernetes clusters, so that you can handle each cluster individually, reducing both the mental load on your application teams and the likelihood of software clashes. But this comes at an added cost. You now need a more robust mechanism for orchestrating the provisioning of your clusters and, more importantly, an easy way to orchestrate the deployment of large, complex application stacks across various clusters. Managed cloud Kubernetes offerings have made the former easier, but what software can you use to help with the latter?
Kratix is a framework for building out your platform as a product. It provides multi-cluster scheduling as a first-class concern. Kratix is built upon the concept of Promises. A Promise is an encapsulation of a service you want to provide to your consumers, the application teams. Promises are designed to work across multiple clusters and to scale organically as you introduce more clusters. A core concept of Promises is worker cluster resources, which provide a way of defining a set of resources that need to be deployed to a cluster to make a service available for request. Promises declare which clusters are eligible to provide the Promise, and therefore require the worker cluster resources, by using a Kubernetes-native labelling approach.
Let's take Team A from above. They need Flux and Redis to be available. Every time they need a new environment, they want Flux and the Redis Operator to be automatically installed. You can encapsulate those two requirements into a Promise and specify that you need these applications to be running on all clusters with the label team: A. Kratix will then take control and deliver on this Promise by orchestrating all of these applications to clusters with the label team: A. As you grow your clusters and start to provision more to serve team A's growing needs, all you have to ensure is that the new clusters are labelled correctly.
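The label matching described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Kratix's API or implementation: the Cluster and Promise classes, the cluster_selector field, the eligible_clusters function and the cluster names are all hypothetical, and the sketch assumes simple equality matching on every label in the selector.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cluster:
    # Hypothetical model of a registered worker cluster.
    name: str
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class Promise:
    # Hypothetical model of a Promise: a selector plus the
    # worker cluster resources it needs installed.
    name: str
    cluster_selector: Dict[str, str]
    worker_resources: List[str]


def eligible_clusters(promise: Promise, clusters: List[Cluster]) -> List[str]:
    """Return the names of clusters whose labels satisfy the selector.

    Assumption: a cluster matches when every key/value pair in the
    selector is present in the cluster's labels.
    """
    return [
        c.name
        for c in clusters
        if all(c.labels.get(k) == v for k, v in promise.cluster_selector.items())
    ]


clusters = [
    Cluster("dev-1", {"team": "A"}),
    Cluster("dev-2", {"team": "B"}),
    Cluster("prod-1", {"team": "A", "env": "prod"}),
]
team_a = Promise("team-a-stack", {"team": "A"}, ["flux", "redis-operator"])

print(eligible_clusters(team_a, clusters))

# Adding a correctly labelled cluster is all it takes to include it.
clusters.append(Cluster("dev-3", {"team": "A"}))
print(eligible_clusters(team_a, clusters))
```

In this sketch the Promise never names a cluster directly. Scaling out for team A means labelling the new cluster, not editing the Promise.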
You might then notice that teams' needs are starting to overlap on some core must-have apps, like Prometheus. Rather than including Prometheus within every individual Promise, you could write a Compound Promise that's set up to go to all clusters, regardless of labels. Promises make the process of shifting from single- to multi-cluster Kubernetes easy.
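To see how a shared Promise combines with the team-specific ones, here is a rough Python sketch of the stack each cluster would end up with. It is not Kratix code: the stack_for function and the tuple representation of a Promise are hypothetical, and it assumes that an empty selector matches every cluster, mirroring how an empty Kubernetes label selector behaves.

```python
from typing import Dict, List, Tuple

# Each hypothetical Promise is a (selector, resources) pair.
PromiseSpec = Tuple[Dict[str, str], List[str]]


def stack_for(cluster_labels: Dict[str, str], promises: List[PromiseSpec]) -> List[str]:
    """Collect the resources of every Promise whose selector matches.

    Assumption: an empty selector has no conditions to fail, so it
    matches all clusters regardless of their labels.
    """
    stack = set()
    for selector, resources in promises:
        if all(cluster_labels.get(k) == v for k, v in selector.items()):
            stack.update(resources)
    return sorted(stack)


promises = [
    ({"team": "A"}, ["flux", "redis-operator"]),
    ({"team": "B"}, ["postgresql", "jenkins"]),
    ({}, ["prometheus"]),  # shared across all clusters
]

print(stack_for({"team": "A"}, promises))  # ['flux', 'prometheus', 'redis-operator']
print(stack_for({"team": "B"}, promises))  # ['jenkins', 'postgresql', 'prometheus']
```

Each team's clusters carry only that team's stack plus the shared tooling, which is the isolation the single-cluster setup could not offer.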
This blog is the eleventh in our series, The 12 Platform Challenges of Christmas. Check back daily until January 5th, 2023 for new posts!