
GitOps Health Checks: How Kratix Closes the Feedback Loop

According to the CNCF’s GitOps survey, observability is one of the biggest challenges GitOps adopters face. Platform teams often lack visibility into the post-deployment state of their workloads. In this post, we’ll examine how Kratix addresses this challenge by offering a flexible and extensible approach to tracking and surfacing service health across your platform.


In previous posts, we introduced Kratix’s GitOps scheduler, which enables platform teams to declaratively deliver workloads to various destinations, including Kubernetes clusters, cloud platforms, or on-premises environments, all via a StateStore. A StateStore in Kratix is a system that can store the declared desired state. 


At the time of writing, the supported types are S3 and Git, with support for OCI Registries on the roadmap. Once the desired state is stored in one of these systems, external tooling can converge on it, the traditional example being a Git repository synced by Flux or ArgoCD.


However, once you commit to the StateStore and Kratix hands the workload off to an external system, such as Flux, Terraform, or Ansible Tower, a familiar question arises: what happens next? Did it work? Is the app running? Is it healthy? 


Let’s explore the problem of GitOps observability and then dive into the solutions.


Why should I care about GitOps visibility?

GitOps is changing how platform teams manage infrastructure. By using a declarative store, such as an OCI registry or a Git repo, as the source of truth and driving deployments through version-controlled changes, teams benefit from consistency, traceability, and a clear separation of responsibilities. Adoption continues to rise across the industry. In fact, one recent survey found that 93% of development teams are adopting GitOps practices.


But while GitOps excels at defining and delivering the desired state, it often leaves a gap when it comes to confirming the actual state. Once a change is pushed and picked up by a tool like Flux or Argo CD, platform teams are left asking: did it work? Is the service running? Is it healthy? 


Without visibility into what happens after deployment, GitOps workflows can feel like a black box. That’s where Kratix comes in, extending GitOps with built-in observability so teams can confidently answer those post-deployment questions.


Observability for GitOps with Kratix

Kratix exposes a lightweight API for capturing health reports from any system, running anywhere. This API is implemented as a Kubernetes Custom Resource Definition (CRD) called a HealthRecord.


Anyone with the right Kubernetes RBAC permissions can create a HealthRecord, making it easy for external systems to report their health status back to the platform. This can be done directly via the Kubernetes API, or indirectly, as we’ll explore below.


Here’s a basic example of what a health check might look like for a Kubernetes deployment:


kind: HealthRecord
metadata:
  name: jake-test
  namespace: default
data:
  details:
    totalReplicas: 3
    readyReplicas: 3
    restartCount: 0
  lastRun: 1751553041
  promiseRef:
    name: deployment
  resourceRef:
    name: backend
    namespace: default
  state: healthy


This record gives a quick view of the system’s health, along with metadata about how that status was determined and which resource it applies to, in this case, a backend deployment.
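
As a rough illustration, a reporter could assemble a record of this shape from replica counts before submitting it. This is a minimal sketch, not Kratix's own tooling: the `build_health_record` helper, the rule that maps replica counts to a state, the `unhealthy` state value, and the `apiVersion` string are all assumptions made for the example and should be checked against the Kratix docs.

```python
import json
import time


def build_health_record(name, namespace, promise, resource,
                        total, ready, restarts, now=None):
    """Return a HealthRecord-shaped dict with the same fields as the example above."""
    # Assumption: the workload is healthy only when every replica is ready.
    state = "healthy" if total > 0 and ready == total else "unhealthy"
    return {
        # Assumed group/version; the example above does not show an apiVersion.
        "apiVersion": "platform.kratix.io/v1alpha1",
        "kind": "HealthRecord",
        "metadata": {"name": name, "namespace": namespace},
        "data": {
            "details": {
                "totalReplicas": total,
                "readyReplicas": ready,
                "restartCount": restarts,
            },
            # Unix timestamp of when the check ran.
            "lastRun": int(now if now is not None else time.time()),
            "promiseRef": {"name": promise},
            "resourceRef": {"name": resource, "namespace": namespace},
            "state": state,
        },
    }


record = build_health_record(
    "jake-test", "default", "deployment", "backend",
    total=3, ready=3, restarts=0, now=1751553041,
)
print(json.dumps(record, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed document could be piped to `kubectl apply -f -` by a caller with the right RBAC permissions.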


One of the most significant advantages of Kratix’s health model is its flexibility. Because Kratix can push workloads to various destinations, it also supports collecting health data from those destinations, regardless of where they run or how they operate. As long as the system can publish its health information somewhere Kratix can see, it works.


A Real-World GitOps Observability Example

Let’s say your production application consists of three parts:

  • A container running in Kubernetes

  • A cloud-managed database provisioned with Terraform

  • A cache running on-prem, managed with Ansible


All of these can be deployed via GitOps using a Kratix Promise. The Promise defines what needs to be deployed and where it should be deployed. Kratix handles scheduling:

  • Kubernetes manifests are written to a Git repo watched by a cluster

  • Terraform configs are written to a Git repo monitored by Terraform Enterprise

  • Ansible playbooks are pushed to a Git repo accessible to the on-prem setup


Each system picks up its config and converges:

  • Kubernetes deploys the app

  • Terraform provisions the database

  • Ansible configures the cache


Now the platform needs visibility. Did everything deploy correctly? Are these components still healthy?


This is where Kratix HealthRecords come in. Each system can push health status updates to Kratix using the method that best suits its environment.


If you’re already in Kubernetes, you can use the Syntasso Kratix Enterprise (SKE) health checks runner. It periodically posts health reports directly to the cluster. But what if you’re in an on-prem environment without access to the platform’s Kubernetes API?


StateStore as the Health Check Path

In cases like these, direct API access isn’t an option, but you already have a shared interface between the platform and the external system: the StateStore.


Instead of pushing the health check directly to the platform, the external system can push a health record to the same StateStore (e.g. Git repo) that Kratix wrote to earlier. Kratix is already watching that repo, so it can pull the health update back into the platform.


This creates a two-way GitOps workflow. The StateStore becomes both the delivery path and the feedback channel. It’s simple, secure, and works well across firewalls or air-gapped setups.
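
To make the feedback path concrete, here is a minimal sketch of what the on-prem side might do: serialise a health record into a file inside its local checkout of the StateStore repo. The directory layout (`health/<promise>/<namespace>-<resource>.json`), the `write_health_record` helper, and the sample record are hypothetical; the path and format Kratix actually expects depend on how it is configured to read from the StateStore.

```python
import json
import tempfile
from pathlib import Path


def write_health_record(checkout_dir, record):
    """Write a HealthRecord-shaped dict into a local checkout of the StateStore repo."""
    data = record["data"]
    resource = data["resourceRef"]
    # Hypothetical layout: health/<promise>/<namespace>-<resource>.json
    target = (
        Path(checkout_dir)
        / "health"
        / data["promiseRef"]["name"]
        / f'{resource["namespace"]}-{resource["name"]}.json'
    )
    target.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys keeps key order deterministic, so diffs show only changed fields.
    target.write_text(json.dumps(record, indent=2, sort_keys=True) + "\n")
    return target


# Sample record for the on-prem cache; all names here are illustrative.
record = {
    "kind": "HealthRecord",
    "metadata": {"name": "onprem-cache-health", "namespace": "default"},
    "data": {
        "lastRun": 1751553041,
        "promiseRef": {"name": "cache"},
        "resourceRef": {"name": "onprem-cache", "namespace": "default"},
        "state": "healthy",
    },
}

with tempfile.TemporaryDirectory() as checkout:  # stand-in for a real Git checkout
    path = write_health_record(checkout, record)
    relative_path = path.relative_to(checkout).as_posix()
    written = json.loads(path.read_text())
    print(relative_path)
```

In a real setup the reporter would follow this with the usual `git add`, `git commit`, and `git push`, so no inbound connection to the platform cluster is needed.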


Scaling StateStores beyond the traditional Git setup

The most widely used StateStore type today is Git, which is a well-tested solution; however, for systems that require frequent or high-volume health checks, constant commits aren’t practical. For that, Kratix also supports reading from S3-compatible buckets and has plans on its roadmap to support OCI Registries.
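
A quick back-of-the-envelope calculation shows why. The sketch below assumes each health report becomes one write to the StateStore (one commit, in Git's case); the fleet size and reporting intervals are made-up numbers chosen only to illustrate the scale.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def writes_per_day(resources, interval_seconds):
    """Health-record writes per day if every resource reports on a fixed interval."""
    return resources * (SECONDS_PER_DAY // interval_seconds)


# Hypothetical fleet of 50 resources.
every_30s = writes_per_day(50, 30)    # 50 * 2880 reports
hourly = writes_per_day(50, 3600)     # 50 * 24 reports
print(every_30s, hourly)              # prints "144000 1200"
```

Over a thousand commits a day is already a noisy Git history; well over a hundred thousand is a poor fit for a repo that people also review, which is the case for moving frequent health data to a different backend.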


Kratix's flexibility allows you to use one StateStore type, such as Git, for pushing data to external systems and another, such as S3 (or, once supported, an OCI registry), for pushing data back into Kratix. All you have to do is set up Kratix to read from this second StateStore. This hybrid approach leverages Git’s strengths in traceability and version control for human-driven changes, while using S3’s scalability and throughput to handle high-volume health reporting efficiently.


When choosing which StateStores to use, it’s helpful to consider the tradeoffs. 


Git is great for version control, auditing, and collaborating with others through pull requests. It’s easy to see what changed and why, but it’s not ideal when you’re dealing with high-frequency updates. 


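S3 provides speed and scale, making it a good fit for tasks such as automated health reporting. However, it doesn’t give you Git-style history or traceability out of the box (bucket versioning is opt-in, and it records object versions rather than who changed what and why), so you’ll need to implement that yourself. 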


OCI sits in the middle: you get immutable, signed artifacts with strong supply chain guarantees, but the ecosystem for using it to store documents (as opposed to Docker images) isn’t as mature yet, and it’s not the easiest thing to inspect or debug. In most cases, mixing and matching backends based on the workload gives you the best results.


Closing the Loop: GitOps with Built-In Feedback

With Kratix, GitOps doesn’t stop at delivery. It includes a built-in feedback loop. You commit the desired state, your system applies it, and health status is pushed back, whether through Git, S3, OCI, or direct API access. 


All health data is stored in Kubernetes as HealthRecord resources, making it easy to integrate with tools like Grafana, Prometheus, or Datadog. Since it’s just another Kubernetes resource, you can query it with kubectl or hook it into your existing dashboards and alerts without any special setup.
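
For example, a small script could roll the records up into a summary for a dashboard or alert. This is a sketch under assumptions: it expects the `items` list from something like `kubectl get healthrecords -A -o json` (the plural resource name `healthrecords` is assumed), and the helper names and sample data are illustrative.

```python
from collections import Counter


def summarise_states(health_records):
    """Count HealthRecord items by their data.state field."""
    return Counter(
        item.get("data", {}).get("state", "unknown") for item in health_records
    )


def resources_not_healthy(health_records):
    """Return (namespace, name) for every resource whose state is not 'healthy'."""
    flagged = []
    for item in health_records:
        data = item.get("data", {})
        if data.get("state") != "healthy":
            ref = data.get("resourceRef", {})
            flagged.append((ref.get("namespace"), ref.get("name")))
    return flagged


# Illustrative stand-in for the "items" array of a kubectl JSON listing.
items = [
    {"data": {"state": "healthy", "resourceRef": {"name": "backend", "namespace": "default"}}},
    {"data": {"state": "healthy", "resourceRef": {"name": "database", "namespace": "default"}}},
    {"data": {"state": "unhealthy", "resourceRef": {"name": "cache", "namespace": "default"}}},
]

summary = summarise_states(items)
flagged = resources_not_healthy(items)
print(dict(summary), flagged)
```

The same two functions could feed a metrics exporter or a chat alert; the point is only that HealthRecords are plain structured data once they are in the cluster.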


This gives platform teams the visibility they need to confidently scale GitOps across environments, while keeping everything managed in one place.


To get started with Kratix’s GitOps scheduler, check out the quick start guide. For more on health checks, see the docs here: https://docs.kratix.io/ske/guides/healthchecks.
