For the longest time, we were developing a monolithic application at my current employer, Actano. Deployment was nothing of concern for the common software developer. After a successful build, the application instances would update without downtime. Magic … or – more accurately – it were custom shell scripts which implemented a blue green deployment.
These scripts had a few disadvantages:
Even with those problems, it did work out in the monolithic environment. However this drastically changed when the company decided to split the monolith into domain specific services.
As I was interested in the topic, I got the opportunity to be one of the developers driving the change to Kubernetes – whilst learning about it myself.
Kubernetes is a container-orchestration platform. It handles application deployments and their scaling. Resources are defined as objects in the YAML format. This declarative nature is a big win compared to the usual scripting. Instead of saying “Add one more instance of my application” (imperative style) you say “I want three instances to be running” (declarative style). Kubernetes then translates those declarations into executable commands internally.
Because everything is declarative, it is idempotent by nature. Applying the same declaration twice is always the same as doing it once. If some change in the declaration doesn’t provide the desired outcome, we can always go back by applying the former declaration. To do so easily, it is obviously advised to store those in a version control system such as Git.
After having setup Kubernetes and a repository to store the application’s Kubernetes objects, we are still left with two problems:
Enter GitOps. The term was coined initially by Weaveworks and promises to solve the above problems. Its main component is a small application which follows the Kubernetes Operator pattern. The operator is deployed in the production cluster and watches for changes in a configured git repository. This config repository serves as the source of truth for all Kubernetes resources. If the desired cluster state diverges from the actual, the operator takes necessary actions to converge against it again. Weaveworks implementation of this operator is the open source software Flux.
To be able to do continuous deployment, the operator also watches specified Docker image registries. It detects new images and auto-upgrades your configuration files to reflect new image tags.
This pull-based approach is arguably more secure than a push-based one. The CI cluster doesn’t have credentials for the production cluster (it does have write access to the image registry though). The interface between CI and CD is just the Docker image. This makes it really easy to reason about the boundaries of CI and CD.
The GitOps approach enables our teams to propose service deployments and other cluster changes via pull requests. Errors can be spotted by more experienced DevOps engineers. Unfortunately, you can only know whether a change worked after it has been applied (in this case: the PR being merged). This is due to missing automated tests for our infrastructure. In contrast to tests for application code, it is quite hard to set those up. You’d essentially need a mirror of the production cluster for every single PR.
The clear cut between CI and CD is nice and easy on a conceptional level. However, with multiple services, some tests (e.g. end-to-end tests) require those services to be available and hence need to run in a shared environment with them. This is not easy with Flux right now because environments are static. Our current approach is to continuously deploy into a staging environment and manually promote this to production. On a side note, Jenkins X claims to solve this by dynamically creating environments and automate their promotion.
Operations work doesn’t need to be scary. Kubernetes provides the necessary means to deploy and manage all kinds of applications. It’s great to see it becoming the industry standard.
The GitOps approach of describing the cluster state in a git repository is quite useful. It enables operations by pull requests and provides an easy way for rollbacks. Moreover, knowing that the current live state is always visible in a git repository makes operations transparent on an organizational level. We saw good adoption among development teams which made it easier to build cross-functional teams.