Platform Challenge 12: Managing a Multi-Cloud Platform
I was on a team in 2017. An edict came from somewhere: "We negotiated a rate with Google Cloud, so you need to move from Amazon to Google." 😳 My initial reaction: whoa, what? We had a number of accounts, services, and configurations associated with our AWS account. This was going to take time! Maybe not loads, but it would definitely take away from product delivery. We were one small team building part of a much larger product. There were more than 40 other global teams that also had infrastructure in AWS. How would this work without every team all of a sudden downing tools?
What happened from there? Well, teams did not down tools, that’s for sure. The organisation had made a vendor change, but like anything else, the work associated with that change needed to be reasoned about and prioritised. Nearly six years on, that small team still exists, and it still uses its AWS services. It now also has Google Cloud services. Some teams with much higher cloud consumption costs did immediately prioritise work to make a dent in billing, but for our team, the cost of change outweighed the savings on our bill, so it made sense to keep what we had and change things over time.
Maybe this story sounds familiar, but it’s certainly not the dream that public clouds often sell. Migrating between clouds is not, in fact, a breeze. Over time and with growth, organisations inevitably evaluate and change their infrastructure strategies. Whether the move is from one public cloud to another or from on-prem to public cloud, these decisions are not made lightly, and the work of making the transition in a way that suits the organisation’s context is not trivial. So it’s never a surprise to see a tech organisation that hosts its services in more than one place.
Platform teams are meant to help keep the costs of organisational change, like a new IaaS, as low as possible for application development teams. They want to provide teams with the services they need, like Redis or Nginx, but they don’t want teams to have to worry too much about all of the details and particulars of where that Redis lives.
Kratix, as a framework to build platforms, is designed to help keep that same cost of organisational change low for the platform team.
Kratix, by default, supports a plurality of IaaS providers because it is multi-cluster out of the box and because it uses GitOps as its synchronisation mechanism. The Kratix Platform cluster is isolated from its Worker Clusters so that workloads aren’t coupled to the infrastructure for the Platform itself. Additionally, all of the clusters in the Kratix system are registered to and reconciled from Repositories. The Kratix Platform knows how to define workloads that need to be deployed to Worker Clusters, and the Platform publishes workloads to the Repository when a request for a new workload comes in. The Repository is watched by all Worker Clusters, regardless of where each cluster is located, so the multi-cluster, multi-cloud nature of a typical organisation’s topology just works.
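To make that flow a little more concrete, here is a toy sketch in Python of the publish-and-reconcile pattern described above. It is not Kratix code and does not use any Kratix API: the `Repository`, `WorkerCluster`, and `Platform` classes, the cluster names, and the idea of one path per cluster are all assumptions made up for illustration, and an in-memory dictionary stands in for a real Git repository.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical in-memory stand-in for a Git repository."""
    documents: dict[str, list[str]] = field(default_factory=dict)

    def publish(self, cluster: str, workload: str) -> None:
        # Assumption: each cluster reads from its own path in the repository.
        self.documents.setdefault(cluster, []).append(workload)

    def read(self, cluster: str) -> list[str]:
        return list(self.documents.get(cluster, []))


@dataclass
class WorkerCluster:
    """Hypothetical worker; 'cloud' is only a label, the logic never uses it."""
    name: str
    cloud: str
    running: list[str] = field(default_factory=list)

    def reconcile(self, repo: Repository) -> None:
        # Pull desired state from the repository and start anything missing.
        for workload in repo.read(self.name):
            if workload not in self.running:
                self.running.append(workload)


@dataclass
class Platform:
    """Hypothetical platform that only ever writes to the repository."""
    repo: Repository
    workers: list[WorkerCluster]

    def request(self, service: str, cloud: str) -> str:
        # Choose a target cluster, then publish; no direct call to the worker.
        target = next(w for w in self.workers if w.cloud == cloud)
        self.repo.publish(target.name, service)
        return target.name


repo = Repository()
workers = [WorkerCluster("worker-aws", "aws"), WorkerCluster("worker-gcp", "gcp")]
platform = Platform(repo, workers)

platform.request("redis", "gcp")
for worker in workers:
    worker.reconcile(repo)
```

The point of the sketch is that the platform and the workers never talk to each other directly. They share only the repository, so under these assumptions adding a cluster in another cloud means registering one more reader rather than changing how requests are handled.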
If you’re part of a platform team that supports more than one IaaS, take a look at the Kratix example deployment topology in our docs and think about how your world would be structured with Kratix. How close is it to where you are now? What advantages would it bring? What are the biggest challenges to getting there?