I wrote a reusable GitHub Actions workflow for one of my pet projects, rickroller. The workflow itself is very small and uses mostly well-known Actions, but encompasses a lot of complex ideas and logic.
So let's walk through all the steps required to create a solid, reusable workflow that builds state-of-the-art, multi-architecture images with GitHub Actions.
As rickroller is a pretext to play with Google Cloud Run, GitHub Actions, and to try Open Source best practices, you can find a lot of details about GitHub Actions (and more) in the README. Feel free to have a look and leave a ⭐ !
NOTE: I assume you have a basic knowledge of GitHub Actions and Docker.
It being 2022, my Python project is deployed using a Docker image. This image is built (and pushed to a registry) in multiple contexts:
during a PR - so I can test my container (tag like pr-4),
when there is a new push to main - so people can preview new features (pulling the image latest or main-eeffaa2),
and from a release (with tag v1.0.0).
It thus makes sense to create one reusable workflow, that can be called in different CI contexts.
Having a Mac M1 myself, it is important that users on both AMD64 (Intel/AMD) and ARM64 processors can run rickroller. The image must be available for multiple architectures.
Speed and security are also important topics: the Dockerfile should be scanned for vulnerabilities, and the workflow should avoid building the same layer over and over when it can actually just build and cache it once.
Summing it up, I want:
a reusable workflow,
that builds a multi-arch image (at least AMD64 and ARM64),
with caching to speed up the process,
running some security scan,
able to push to GitHub Registry,
with meaningful image tags (latest, pr-xxxx, main-ef34221, vX.X.X) and annotations.
Let's get started !
The basics: a reusable workflow
GitHub introduced reusable workflows and composite actions about a year ago as an attempt to enhance reusability and DRY-ness (Don't Repeat Yourself).
In short, composite actions let you create actions that call other actions. They are still actions that need to be called from within a series of steps.
A reusable workflow - or callable workflow is instead a complete job (with checkout, etc), that runs in a specific runner. It is less flexible, as you cannot add a step before or after it (you can, however, define another job that generates its inputs or depends on its output).
The documentation is very well written, so I won't go too much into the details. In short, to create a reusable workflow, you simply set its triggers to on: workflow_call. You can then define inputs, secrets, etc. that can be later referenced using ${{ inputs.some_input }} or ${{ secrets.some_secret }}:
```yaml
name: Reusable Workflow Example

on:
  workflow_call:
    inputs:
      some_input:
        description: Just an example input
        type: boolean
        required: false
        default: false
    secrets:
      # ...

jobs:
  my_job:
    name: Do Something
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # ...
```
For my reusable workflow, I have 2 inputs:
publish (boolean): whether to publish the image (to GitHub Registry),
version (string): the version being released (empty if not called from a release workflow).
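Declared under workflow_call, those two inputs could look like the following sketch (the descriptions and defaults are my assumptions):

```yaml
on:
  workflow_call:
    inputs:
      publish:
        description: Whether to push the image to GitHub Registry
        type: boolean
        required: false
        default: false
      version:
        description: The version being released (empty if not called from a release)
        type: string
        required: false
        default: ""
```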
The steps
The GitHub community already provides all the ready-made Actions I need to perform the tedious tasks of setting up the environment, building, and pushing. The only difficulty is finding out which ones to use, and how to configure them.
Extracting Labels and Tags
No need to manually and painfully determine the proper labels and tags, as an Action is already available for this task: docker/metadata-action - v3 at the time of writing.
Labels
Docker image labels are a way to attach (any) key-value metadata to an image. These metadata are not available to the running container but are rather used to share information like where the source code for the image resides, who supports it, or what CI build/codebase version generated it.
It is common to use the OCI - Open Container Initiative - set of standardized labels, all prefixed with org.opencontainers.image.*. The most common being .source, .version, and .revision.
The docker/metadata-action automatically extracts OCI Labels based on the commit and the repository's metadata. Here is what it computes for my rickroller repository:
```json
{
  "org.opencontainers.image.created": "2022-04-15T06:13:02.269Z",
  "org.opencontainers.image.description": "A simple Python app to test GitHub CI",
  "org.opencontainers.image.licenses": "",
  "org.opencontainers.image.revision": "83592e8567ee6bfb01caccae5e791ab38b5ee7e0",
  "org.opencontainers.image.source": "https://github.com/derlin/rickroller",
  "org.opencontainers.image.title": "rickroller",
  "org.opencontainers.image.url": "https://github.com/derlin/rickroller",
  "org.opencontainers.image.version": "main-83592e8"
}
```
Tags
Before building, we need to determine the full name(s) of the image, for example, ghcr.io/derlin/rickroller:latest. There are three parts: ghcr.io, the name of the registry you push the image to; derlin/rickroller, the owner and name of your project (usually static); and finally the tag or version, in this case, latest.
Ideally, tags should be meaningful. Images from PRs should follow e.g. the format pr-{id}, images from the main branch something like main-{short sha}, and releases vX.X.X. The latest tag is known as a moving tag, as it points to a different image (usually from the latest successful build on the main branch) as time goes on.
That's a lot of combinations and ifs... Fortunately, docker/metadata-action can do everything, provided the right set of inputs.
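The step itself isn't reproduced in this extract; based on the breakdown that follows, a docker/metadata-action configuration along these lines would produce the described tags (the exact expressions, in particular the enable= condition, are my assumptions):

```yaml
- name: Extract Docker metadata
  id: meta
  uses: docker/metadata-action@v3
  with:
    # image name(s), without the tag
    images: ghcr.io/${{ github.repository }}
    # one entry per "condition", evaluated against the current context
    tags: |
      type=semver,pattern={{version}},value=${{ inputs.version }}
      type=semver,pattern={{major}}.{{minor}},value=${{ inputs.version }}
      type=semver,pattern={{major}},value=${{ inputs.version }}
      type=ref,event=pr
      type=ref,event=branch,suffix=-{{sha}}
      type=raw,value=latest,enable=${{ github.ref == 'refs/heads/main' || inputs.version != '' }}
    # disable the default "latest" flavor (handled by the raw entry above)
    flavor: |
      latest=false
```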
Let's break it down. First, I give an id to the step, so I can refer to its output later in the workflow.
Next, I provide a list of image names, without the tag. Since I only push to GitHub Registry (ghcr.io), I can use ghcr.io/${{ github.repository }}, which will be resolved as ghcr.io/derlin/rickroller.
The magic is in the tags input. The Action supports a list of "conditions" that will be evaluated to generate the right tags depending on things like branches, triggers, etc.
In this specific case, the first lines starting with type=semver are used when a version is passed to the workflow (e.g. from a release). The version input must have the form X.Y.Z - or more accurately {major}.{minor}.{patch} (semantic versioning). With those 3 lines, the Action returns the Docker tags X.Y.Z, X.Y, and X when the value= isn't evaluated to an empty string (i.e. when the version input is provided).
Next, there are the type=ref,event={event} lines to conditionally get tags based on the trigger of the workflow (event=). If this is from a pr, the Action generates the tag pr-{number}. If it is from a branch, it generates {branch_name}. Since I want to have the branch name suffixed with a unique id, I also specify suffix=-{{ sha }}. The {{ sha }} will be evaluated by the Action to the short SHA of the git commit (git rev-parse --short), and appended to the branch name (for example main-eeffaa3).
Finally, to have the latest tag set only for the main branch and a release, I disable the default "latest flavor" (which always adds the tag latest) and provide instead the expression:
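The original expression isn't shown in this extract; based on the description, it could look like this (the exact enable= condition is an assumption):

```yaml
type=raw,value=latest,enable=${{ github.ref == 'refs/heads/main' || inputs.version != '' }}
```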
This line says "use the raw value "latest", but only if the condition enable= evaluates to true".
As you can see, it is quite powerful!
Output
The metadata Action returns the metadata and tags in multiple formats, but what is important is that I can later reference them using:
```yaml
# meta is the id we gave to the metadata action
${{ steps.meta.outputs.tags }}
${{ steps.meta.outputs.labels }}
```
Scanning the Dockerfile for vulnerabilities
In production settings, the full Docker images should be scanned with state-of-the-art tools such as sysdig, Docker scan, snyk, or the like. Those tools scan the built image itself and are thus able to detect vulnerabilities not only in your Dockerfile, but also in the base image's layers, as well as in the software your Dockerfile brings in.
There are plenty to choose from, but none of them is completely free. You always need an account, a specific setup, and are often limited in the number of scans or repositories.
So in this workflow, I chose to limit myself to a "linter", checkov. Feel free to augment this example with a real scan!
Checkov supports the evaluation of policies on your Dockerfile files. [...] it will validate if the file is compliant with Docker best practices such as not using the root user, making sure a health check exists, and not exposing the SSH port.
The full list of Dockerfile policies it checks can be found here.
```yaml
- name: Lint Dockerfile using Checkov
  id: checkov
  uses: bridgecrewio/checkov-action@master
  with:
    directory: .
    framework: dockerfile # only ask for dockerfile scans
    quiet: true # show only failed checks
    container_user: 1000 # UID to run the container under, to prevent permission issues
```
Checkov will parse the Dockerfile, and report the results:
```
Passed checks: 9, Failed checks: 0, Skipped checks: 0
...
Check: CKV_DOCKER_3: "Ensure that a user for the container has been created"
    PASSED for resource: Dockerfile.USER
    File: Dockerfile:49-49
    Guide: https://docs.bridgecrew.io/docs/ensure-that-a-user-for-the-container-has-been-created
...
```
...
If you have Python installed, it is possible to run the same check locally:
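A sketch of the local invocation, assuming pip is available (the flags mirror the Action's inputs above):

```shell
# install checkov (requires Python 3)
pip install checkov

# scan only Dockerfiles, showing failed checks only
checkov --directory . --framework dockerfile --quiet
```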
Logging in to GitHub Registry

To be able to push to a registry, I need to log in first. By default, the workflow inherits the GITHUB_TOKEN secret, so logging in to ghcr.io is as easy as calling:
```yaml
- name: Login to Container Registry
  uses: docker/login-action@v2
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}
```
Setting up QEMU and buildx
QEMU lets you "run operating systems for any machine, on any supported architecture", while buildx is "a Docker CLI plugin for extended build capabilities with BuildKit". Both are necessary to build Docker images targeting another platform/architecture.
Setting them up is, again, just a matter of calling ready-made Actions:
```yaml
- name: Set up QEMU
  uses: docker/setup-qemu-action@v2

- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v2
```
Building and pushing the image

We finally have all the bricks in place to build and push a multi-arch Docker image using the docker/build-push-action:
```yaml
- name: Build and push Docker image
  uses: docker/build-push-action@v3
  with:
    # the Dockerfile is at the root of the workspace
    context: .
    # Build for AMD and ARM (requires buildx+qemu)
    platforms: linux/amd64,linux/arm64
    # only push when requested
    push: ${{ inputs.publish }}
    # pass the output of the metadata action
    tags: ${{ steps.meta.outputs.tags }}
    labels: ${{ steps.meta.outputs.labels }}
    # use layer caching
    # The mode=max is to also cache the builder image
    # (vs only the final image - mode: min)
    cache-from: type=gha
    cache-to: type=gha,mode=max
```
(for cache-from and cache-to, keep reading !)
Layer caching
The only thing I haven't talked about is Docker layer caching (DLC), which is a great feature when building Docker images as a regular part of the CI process.
The idea is to cache the individual layers of Docker images built in CI jobs, and then reuse unchanged image layers on subsequent runs, rather than rebuilding the entire image from scratch every time.
This caching mechanism is a given when building Docker images locally (see Docker's documentation - leverage build cache). However, in CI, a new runner is started each time, so the cache is always empty.
The build-push-action from Docker supports multiple types of caches. In this workflow (see the previous section), I use the GitHub Actions cache (gha). It is rather straightforward to turn on: simply set the cache-from and cache-to parameters:
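As seen in the build-push step earlier, both parameters point to the gha backend:

```yaml
cache-from: type=gha
cache-to: type=gha,mode=max
```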
One important detail is the mode=max, which instructs the Action to cache all layers, and not only the ones from the final image. It is very important if the Dockerfile is using multi-stage builds: without it, the layers from the builder image are ignored.
At the time of writing, the GitHub Actions cache is limited to 10 GB, and isn't shared between branches.
Quick reminder: if an early layer changes, all subsequent layers must be rebuilt as well, so devise your Dockerfiles wisely!
Calling the workflow
To call this reusable workflow from the same repository, I simply use:
```yaml
name: ...

jobs:
  # ... other jobs ? ...
  docker:
    uses: ./.github/workflows/reusable_docker-build-and-push.yaml
    with:
      publish: true
      # ... other inputs ...
```
It is also possible to call it from another repository, in which case the full repository path (and a ref) must be provided:
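The snippet isn't shown in this extract; a sketch of what such a cross-repository call looks like (the repository and file name are taken from the same-repo example above, and the @main ref is an assumption):

```yaml
jobs:
  docker:
    # {owner}/{repo}/.github/workflows/{file}@{ref} - a ref (branch, tag, or SHA) is mandatory
    uses: derlin/rickroller/.github/workflows/reusable_docker-build-and-push.yaml@main
    with:
      publish: true
```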
As a bonus, I wanted to be able to also push to Docker Hub, but only in specific situations (e.g. from a release, but not from a PR build) and with a different set of tags (no main-{sha}).
This conditionality was a bummer, as it is not supported by default. After playing around a bit, I designed a hack to implement this "if" without duplicating the whole workflow.