Declarative operations with Kubernetes and GitOps

02 Jun 2019

For the longest time, we were developing a monolithic application at my current employer, Actano. Deployment was nothing of concern for the common software developer. After a successful build, the application instances would update without downtime. Magic … or – more accurately – it were custom shell scripts which implemented a blue green deployment.

These scripts had a few disadvantages:

Confidence: Due do their imperative and non-idempotent nature, the development team was afraid to change anything. The risk of breaking something with no easy way back was too high.
Security: The CI Job would SSH into the production servers and execute commands remotely.
Ownership: The scripts were originally authored by the CTO and another developer. The team wouldn’t know how to handle potential failures. Most developers didn’t even bother looking at it.

Even with those problems, it did work out in the monolithic environment. However this drastically changed when the company decided to split the monolith into domain specific services.

Kubernetes

As I was interested in the topic, I got the opportunity to be one of the developers driving the change to Kubernetes – whilst learning about it myself.

Kubernetes is a container-orchestration platform. It handles application deployments and their scaling. Resources are defined as objects in the YAML format. This declarative nature is a big win compared to the usual scripting. Instead of saying “Add one more instance of my application” (imperative style) you say “I want three instances to be running” (declarative style). Kubernetes then translates those declarations into executable commands internally.

Because everything is declarative, it is idempotent by nature. Applying the same declaration twice is always the same as doing it once. If some change in the declaration doesn’t provide the desired outcome, we can always go back by applying the former declaration. To do so easily, it is obviously advised to store those in a version control system such as Git.

After having setup Kubernetes and a repository to store the application’s Kubernetes objects, we are still left with two problems:

How do we setup continuous deployment? For security reasons we generally try to avoid setups where the build server has credentials to access the production cluster (see point 2 above).
How do we ensure that the repository’s state is on par with the cluster’s state? Some kind of automation would be nice.

GitOps

Enter GitOps. The term was coined initially by Weaveworks and promises to solve the above problems. Its main component is a small application which follows the Kubernetes Operator pattern. The operator is deployed in the production cluster and watches for changes in a configured git repository. This config repository serves as the source of truth for all Kubernetes resources. If the desired cluster state diverges from the actual, the operator takes necessary actions to converge against it again. Weaveworks implementation of this operator is the open source software Flux.

To be able to do continuous deployment, the operator also watches specified Docker image registries. It detects new images and auto-upgrades your configuration files to reflect new image tags.

Deployment pipeline with GitOps

Image taken from the weave flux repository

This pull-based approach is arguably more secure than a push-based one. The CI cluster doesn’t have credentials for the production cluster (it does have write access to the image registry though). The interface between CI and CD is just the Docker image. This makes it really easy to reason about the boundaries of CI and CD.

Learnings

The GitOps approach enables our teams to propose service deployments and other cluster changes via pull requests. Errors can be spotted by more experienced DevOps engineers. Unfortunately, you can only know whether a change worked after it has been applied (in this case: the PR being merged). This is due to missing automated tests for our infrastructure. In contrast to tests for application code, it is quite hard to set those up. You’d essentially need a mirror of the production cluster for every single PR.

The clear cut between CI and CD is nice and easy on a conceptional level. However, with multiple services, some tests (e.g. end-to-end tests) require those services to be available and hence need to run in a shared environment with them. This is not easy with Flux right now because environments are static. Our current approach is to continuously deploy into a staging environment and manually promote this to production. On a side note, Jenkins X claims to solve this by dynamically creating environments and automate their promotion.

Conclusion

Operations work doesn’t need to be scary. Kubernetes provides the necessary means to deploy and manage all kinds of applications. It’s great to see it becoming the industry standard.

The GitOps approach of describing the cluster state in a git repository is quite useful. It enables operations by pull requests and provides an easy way for rollbacks. Moreover, knowing that the current live state is always visible in a git repository makes operations transparent on an organizational level. We saw good adoption among development teams which made it easier to build cross-functional teams.

Written by Hendrik Purmann

Berlin-based software engineer. Currently interested in cross functional teams, Microservices, GoLang, Kubernetes and the DevOps movement.