Build-deploy the future: a behind-the-scenes look at Lagoon

Toby Bellwood - Sep 20 '22 - Dev Community

When we announced Lagoon 2.0, we discussed how we had reworked the architecture to better suit a distributed, global platform. This unveiled the "separation" of lagoon-core and lagoon-remote, and the gradual migration of the services needed to run sites on Lagoon over to lagoon-remote.

Background

Over the last few months, we've really pushed ahead with this.

  • The Lagoon <> Harbor integration has moved from a single core-affiliated Harbor to a model where each lagoon-remote can define and manage its own Harbor install, or use a shared one.
  • The lagoon-core auto-idler service has been deprecated in favour of running independent idlers in each lagoon-remote (using Aergia, which can also unidle).
  • A number of other features, such as backups, restores and logging configurations can now be set at the lagoon-remote level.

The primary way lagoon-core gives lagoon-remote its instructions, though, is a "Lagoon Build", sent as a message to the build-deploy controller in the lagoon-remote. This build contains most of the information the controller needs to progress a build. The rest, which is specific to that cluster, is stored against the build-deploy controller itself.
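To make that split of responsibilities concrete, here is a minimal sketch of a controller combining the per-build message from lagoon-core with the settings it holds about its own cluster. This is not Lagoon's actual code. Every type and field name here (BuildMessage, ClusterConfig, ProjectName, RegistryHost and so on) is hypothetical and chosen for illustration. The post does not describe the real message format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BuildMessage is a hypothetical, heavily simplified stand-in for the
// "Lagoon Build" that lagoon-core sends to a lagoon-remote.
type BuildMessage struct {
	ProjectName     string `json:"projectName"`
	EnvironmentName string `json:"environmentName"`
	GitRef          string `json:"gitRef"`
}

// ClusterConfig is a hypothetical set of cluster-specific settings that the
// build-deploy controller holds itself, rather than receiving with each build.
type ClusterConfig struct {
	RegistryHost string
}

// BuildSpec combines the message from core with the controller's own settings.
// Both structs are embedded, so their fields are promoted onto BuildSpec.
type BuildSpec struct {
	BuildMessage
	ClusterConfig
}

// newBuildSpec decodes a build message and merges in the local cluster settings.
func newBuildSpec(raw []byte, cluster ClusterConfig) (BuildSpec, error) {
	var msg BuildMessage
	if err := json.Unmarshal(raw, &msg); err != nil {
		return BuildSpec{}, fmt.Errorf("decoding build message: %w", err)
	}
	return BuildSpec{BuildMessage: msg, ClusterConfig: cluster}, nil
}

func main() {
	// Hypothetical example values.
	raw := []byte(`{"projectName":"example","environmentName":"main","gitRef":"refs/heads/main"}`)
	cluster := ClusterConfig{RegistryHost: "harbor.example.com"}

	spec, err := newBuildSpec(raw, cluster)
	if err != nil {
		panic(err)
	}
	fmt.Println(spec.ProjectName, spec.EnvironmentName, spec.RegistryHost)
}
```

In this sketch, the cluster-specific values are supplied only by the controller. lagoon-core therefore doesn't need to know about each remote's registry.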

The build itself is handled by a service image called "kubectl-build-deploy-dind" - catchy, huh? This service is essentially a collection of Bash scripts, thousands of lines long, that knows all the ins and outs of Lagoon: how to use the various variables, settings, etc. that Lagoon can control when deploying a site.

As Lagoon's capabilities have grown over the last couple of years, these files have become increasingly complex to maintain, and the logic is getting harder to follow. As a team, we've also looked more closely at the cloud-native landscape, and at which other projects, tools or specifications we could use. Lagoon has always prided itself on its developer focus, and the predictable relationship between production and local development has been key to keeping friction low when deploying workloads. The other critical consideration is that most of our fast-release work happens on this service, and curating a full Lagoon release is a lot of effort for a simple script change.

So what's happening?

With these considerations in mind, we've decided to take the following steps:

  1. The current "kubectl-build-deploy-dind" Bash scripts will be progressively rewritten, rearchitected and replaced by a series of Go modules, built into a new build-deploy-tool.
  2. Development of the build-deploy-tool (and the generation of the Docker image that drives the service) will move out of the main Lagoon repository into its own repository, so that it can have its own development lifecycle, allowing for targeted testing, faster releases and easier rollbacks.
  3. The parts of the tool that need to interact with the user repositories will utilize standards-compliant methods where possible. This means that the docker-compose.yml file will be read in by the compose-spec reference library compose-go.
  4. We will create a schema for the .lagoon.yml to allow easier parsing and error-detection.
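One way to picture what a schema buys is strict, schema-driven decoding. With strict decoding, an unknown or misspelled key becomes an error instead of being silently ignored. The sketch below is a stand-in and not the planned schema. Go's standard library has no YAML parser, so it decodes JSON (which YAML 1.2 accepts as a subset) using encoding/json with DisallowUnknownFields. The LagoonYML struct and parseLagoonYML function are hypothetical. The struct models only two real top-level keys, docker-compose-yaml and environments, and leaves the contents of each environment unvalidated.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// LagoonYML is a deliberately tiny, hypothetical slice of a .lagoon.yml
// schema. Only two top-level keys are modelled, and each environment's
// contents are kept as raw JSON rather than validated.
type LagoonYML struct {
	DockerComposeYAML string                     `json:"docker-compose-yaml"`
	Environments      map[string]json.RawMessage `json:"environments"`
}

// parseLagoonYML decodes strictly: any top-level key not in the struct is an error.
func parseLagoonYML(data []byte) (LagoonYML, error) {
	var cfg LagoonYML
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&cfg); err != nil {
		return LagoonYML{}, fmt.Errorf(".lagoon.yml: %w", err)
	}
	return cfg, nil
}

func main() {
	good := []byte(`{"docker-compose-yaml": "docker-compose.yml", "environments": {"main": {}}}`)
	// "enviroments" is misspelled; a lenient parser would silently drop it.
	typo := []byte(`{"docker-compose-yaml": "docker-compose.yml", "enviroments": {"main": {}}}`)

	for _, doc := range [][]byte{good, typo} {
		if _, err := parseLagoonYML(doc); err != nil {
			fmt.Println("invalid:", err)
		} else {
			fmt.Println("valid")
		}
	}
}
```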

What does this mean for users, administrators etc?

Good question! Hopefully nothing. However, since we started on this journey, migrating the first few components (see the full list here) across to Go and compose-go, we've observed a few things:

  • Not all "working" docker-compose.yml and .lagoon.yml files are actually valid. It's possible to make breaking errors in your file, and still have it deploy locally. Reading it in via a standards-compliant parser, however...
  • Not all users actually test their code locally before deploying to Lagoon - otherwise they would have known that their file was badly broken...
  • Even if those errors are in a section unrelated to the section being processed, the inability to parse the file will cause the build-deploy-tool to error out.
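The last point is easy to underestimate: a parser has to accept the whole document before any part of it is usable. Here is a small illustration, again using encoding/json as a stand-in for a YAML or compose-spec parser. The docker-compose-yaml value is perfectly fine, but a trailing comma in an unrelated environment entry means it can never be read. The function name readComposeFileName is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// readComposeFileName tries to extract a single value from a config file.
// Because the whole document must parse first, a syntax error in any other
// section still prevents this value from being read.
func readComposeFileName(data []byte) (string, error) {
	var doc struct {
		DockerComposeYAML string `json:"docker-compose-yaml"`
	}
	if err := json.Unmarshal(data, &doc); err != nil {
		return "", err
	}
	return doc.DockerComposeYAML, nil
}

func main() {
	// The key we want is valid; the trailing comma inside "environments" is not.
	broken := []byte(`{"docker-compose-yaml": "docker-compose.yml", "environments": {"main": {},}}`)
	if _, err := readComposeFileName(broken); err != nil {
		fmt.Println("whole file rejected:", err)
	}
}
```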

To counter this, in Lagoon release v2.8.4 we added a feature to the existing builds. When the build-deploy-tool encounters an error that would ordinarily cause the process to fail, the build creates a user-facing warning and writes a message to the build log.

We're also working very closely with the wider team at amazee.io (as the largest installed base of Lagoons) to trawl through thousands of build logs, capturing any of these errors so we can be sure we're not harming running builds. Where we encounter a fatal error that we could handle, we create a test case and a workaround for it. Some are easy (such as handling the presence of local-only env files that are not pushed to Git), some are harder (like handling anchor and alias edge cases in the YAML).

Next Steps

As of Lagoon v2.10.0, we will also switch to uselagoon/build-deploy-image as the main source of the builder. However - fear not, we will still re-publish it as the kubectl-build-deploy-dind image to match each Lagoon release. We'll build a fair amount of flexibility into Lagoon to support this, including allowing Lagoon admins to specify build images per remote in the API and providing access to more cutting-edge builds outside of the Lagoon release cycle.
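One way to think about the per-remote option is as a lookup with a fallback. If an admin has set an image for a remote, use it. Otherwise, use the image published for the current Lagoon release. The post does not describe the API fields involved, so the function, the map, the remote names and the image tags below are all hypothetical.

```go
package main

import "fmt"

// buildImageFor picks the build image for a remote: an admin-specified
// override if a non-empty one exists, otherwise the release default.
func buildImageFor(remote string, overrides map[string]string, releaseDefault string) string {
	if img, ok := overrides[remote]; ok && img != "" {
		return img
	}
	return releaseDefault
}

func main() {
	// Hypothetical image names and tags, for illustration only.
	releaseDefault := "uselagoon/kubectl-build-deploy-dind:v2.10.0"
	overrides := map[string]string{
		"remote-edge": "uselagoon/build-deploy-image:main",
	}

	for _, r := range []string{"remote-edge", "remote-stable"} {
		fmt.Println(r, "->", buildImageFor(r, overrides, releaseDefault))
	}
}
```

Treating an empty override as "unset" keeps a cleared API field from accidentally selecting a blank image.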

In the meantime, keep an eye out here and we'll keep you updated on what we're up to!
