Touching a working Dockerfile can feel like playing with fire. We know that an innocent-looking change can have branching, hard-to-debug consequences. It's easy to get burned.
But change is inevitable, and while commits on Dockerfiles are easy to control, the impact of those changes on the resulting image is not. Fortunately, where there’s a need, there’s a tool.
Introducing container-diff
Available in macOS, Linux, and Windows, container-diff (like the name suggests) is diff for container images.
The project, developed by many of the same faces behind Container Structure Tests, does a lot more than just diffing: it can analyze container images, show installed packages, and reverse-engineer the commands used to generate them.
Testing containers
Container-diff has the following test modes:
- Size: shows the total filesystem size.
- Packages: shows a list of OS-installed packages (only for Debian-based distros), as well as those installed with pip and npm.
- Filesystem: shows all the files in the image and their size.
- Layer history: prints the commands that generated each of the layers in the image.
The command to analyze an image looks like this:
container-diff analyze [--type=TEST_TYPE] <IMAGE_NAME>
The tool pulls the image from the registry and unpacks the filesystem into $HOME/.container-diff/cache. Then, the contents are scanned, and a report is printed out.
So, for instance, we can analyze a PostgreSQL image with:
$ container-diff analyze postgres:14
-----Size-----
Analysis for postgres:14:
IMAGE DIGEST SIZE
postgres sha256:3ee027aeb3c8bc4a5870b21 ... 6e27685ac1eab6d4ada 352.9M
The default test is size. Change it to --type=apt to find out which OS-level packages are installed.
$ container-diff analyze --type=apt postgres:14
-----Apt-----
Packages found in postgres:14:
NAME VERSION SIZE
-adduser 3.118 849K
-apt 2.2.4 4.2M
-base-files 11.1+deb11u1 340K
-base-passwd 3.5.51 243K
-bash 5.1-2+b3 6.3M
-bsdutils 1:2.36.1-8 394K
-coreutils 8.32-4+b1 17.1M
...
-util-linux 2.36.1-8 4.5M
-xz-utils 5.2.5-2 612K
-zlib1g 1:1.2.11.dfsg-2 166K
Similarly, you can get a list of globally-installed packages for Node and Python with --type=node and --type=pip.
$ container-diff analyze --type=pip python:3.10-bullseye
-----Pip-----
Packages found in python:3.10-bullseye:
NAME VERSION SIZE INSTALLATION
-pip 21.2.4 5.1M /usr/local/lib/python3.10/site-packages
-setuptools 57.5.0 2.4M /usr/local/lib/python3.10/site-packages
-wheel 0.37.0 94.4K /usr/local/lib/python3.10/site-packages
You can see every file in the image, along with its size, with --type=file.
$ container-diff analyze --type=file postgres:14
-----File-----
Analysis for postgres:14:
FILE SIZE
/bin 5.1M
/bin/bash 1.2M
/bin/cat 42.9K
...
/var/spool 7B
/var/spool/mail 7B
/var/tmp 0
💡 Use --order to show files ordered by size instead of alphabetically.
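As a rough sketch, the tip above can be combined with head to keep only the top of the report. The largest_files helper and the CONTAINER_DIFF variable are hypothetical names introduced for this example, and the sketch assumes that --order lists the largest entries first.

```shell
# Hypothetical helper: show the top of the size-ordered file report.
# CONTAINER_DIFF lets you point to another binary; it defaults to container-diff.
CONTAINER_DIFF="${CONTAINER_DIFF:-container-diff}"

largest_files() {
  image="$1"
  count="${2:-20}"
  # The first lines of the report are headers, so slightly fewer
  # than $count file entries are shown.
  "$CONTAINER_DIFF" analyze --type=file --order "$image" | head -n "$count"
}

# Usage: largest_files postgres:14 15
```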
Finally, the history test shows the Docker layers, which roughly reflect the Dockerfile. The output of --type=history is hard to read, so we’ll format it with sed.
$ container-diff analyze --type=history postgres:14 | sed 's/  */ /g;s/;/\n\t/g'
-----History-----
Analysis for postgres:14:
-/bin/sh -c #(nop) ADD file:16dc2c6d1932194edec28d730b004fd6deca3d0f0e1a07bc5b8b6e8a1662f7af in /
-/bin/sh -c #(nop) CMD ["bash"]
-/bin/sh -c set -ex
if ! command -v gpg > /dev/null
then apt-get update
apt-get install -y --no-install-recommends gnupg dirmngr
rm -rf /var/lib/apt/lists/*
fi
-/bin/sh -c set -eux
groupadd -r postgres --gid=999
useradd -r -g postgres --uid=999 --home-dir=/var/lib/postgresql --shell=/bin/bash postgres
mkdir -p /var/lib/postgresql
chown -R postgres:postgres /var/lib/postgresql
...
Comparing containers
We’re only scratching the surface so far. Container-diff really shines when comparing images. The command for this is:
container-diff diff [--type=TEST_TYPE] <IMAGE1> <IMAGE2>
Let’s see some use cases for image comparison.
Use case 1: generating a changelog
Container-diff works great for generating changelogs. And, as we'll see in the next section, the output format can be customized using a template.
We can list what changed at the OS level:
$ container-diff diff --type=size --type=apt postgres:13 postgres:14
-----Apt-----
Packages found only in postgres:13:
NAME VERSION SIZE
-postgresql-13 13.5-1.pgdg110+1 46.9M
-postgresql-client-13 13.5-1.pgdg110+1 6.3M
Packages found only in postgres:14:
NAME VERSION SIZE
-postgresql-14 14.1-1.pgdg110+1 48.9M
-postgresql-client-14 14.1-1.pgdg110+1 7.1M
Version differences: None
-----Size-----
Image size difference between postgres:13 and postgres:14:
SIZE1 SIZE2
350.2M 352.9M
In the same vein, we can compare globally-installed Node packages:
$ container-diff diff --type=node node:16 node:17
-----Node-----
Packages found only in node:16: None
Packages found only in node:17: None
Version differences:
PACKAGE IMAGE1 (node:16) IMAGE2 (node:17)
-npm 8.1.0, 8M 8.1.2, 8M
Or changes in Python packages:
$ container-diff diff --type=pip python:3.6.15-buster python:3.10-bullseye
-----Pip-----
Packages found only in python:3.6.15-buster:
NAME VERSION SIZE
-argparse 1.2.1 87.1K
-mercurial 4.8.2 9.5M
-wsgiref 0.1.2 98.7K
Packages found only in python:3.10-bullseye: None
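The comparisons above could be collected into a single changelog file. The following is a minimal sketch, not part of container-diff itself: write_changelog, the CONTAINER_DIFF variable, and the default CHANGELOG.txt file name are all assumptions made for illustration.

```shell
# CONTAINER_DIFF lets you point to another binary; it defaults to container-diff.
CONTAINER_DIFF="${CONTAINER_DIFF:-container-diff}"

# Hypothetical helper: write a plain-text changelog between two image tags.
write_changelog() {
  old_image="$1"
  new_image="$2"
  outfile="${3:-CHANGELOG.txt}"
  {
    echo "Changes from $old_image to $new_image"
    # One run can combine several differs by repeating --type.
    "$CONTAINER_DIFF" diff --type=size --type=apt --type=pip --type=node \
      "$old_image" "$new_image"
  } > "$outfile"
}

# Usage: write_changelog postgres:13 postgres:14 postgres-changelog.txt
```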
Use case 2: troubleshooting containers
Debugging a failing container is easy when we have a healthy image to use as a reference. To see all the file changes, run container-diff with --type=file:
$ container-diff diff --type=file myapp/myservice:v1 myapp/myservice:v2
-----File-----
These entries have been added to myapp/myservice:v1:
FILE SIZE
/app/node_modules/fsevents 186.2K
/app/node_modules/fsevents/LICENSE 1.1K
/app/node_modules/fsevents/README.md 2.9K
These entries have been deleted from myapp/myservice:v1:
FILE SIZE
/app/.npm/_cacache/index-v5/ce/9f/58654f1 310B
/app/.npm/_cacache/index-v5/3d/b7/10f6556 309B
/app/.npm/_cacache/index-v5/7e/eb/c1538ff 308B
These entries have been changed between myapp/myservice:v1: and myapp/myservice:v2:
FILE SIZE1 SIZE2
/app/package-lock.json 554.6K 554.6K
/app/node_modules/.package-lock.json 297.7K 298.1K
/app/node_modules/clean-css/History.md 77.5K 77.8K
Once the problematic file is identified, you can compare the files in both containers to see what changed.
$ container-diff diff <IMAGE1> <IMAGE2> --type=file --filename=PATH/TO/FILE
Use case 3: test-driving new containers
You can run container-diff to preview the impact of your changes in a build, for instance, to quickly try out different base images or experiment with the Dockerfile. You can iterate until you’re sure you've got it right.
Container-diff is not limited to images in remote repositories. You can analyze any local image by prefixing its name with daemon://.
container-diff diff --type=TEST_TYPE daemon://IMAGE_NAME:TAG daemon://IMAGE_NAME:TAG
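For example, a freshly built local image could be compared against the one currently in use. This is only a sketch: diff_local_images, the CONTAINER_DIFF variable, and the myapp tags are hypothetical names, and both images are assumed to exist in the local Docker daemon.

```shell
# CONTAINER_DIFF lets you point to another binary; it defaults to container-diff.
CONTAINER_DIFF="${CONTAINER_DIFF:-container-diff}"

# Hypothetical helper: compare two images stored in the local Docker daemon.
diff_local_images() {
  "$CONTAINER_DIFF" diff --type=size --type=apt "daemon://$1" "daemon://$2"
}

# Usage, assuming both tags were built locally beforehand:
#   docker build -t myapp:candidate .
#   diff_local_images myapp:current myapp:candidate
```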
Imagine that you’re building a container for a Ruby app and want to try upgrading from Ruby 2.7 to 3.0. As a Ruby developer, you know what to expect from the language side, but can you say the same about the container?
To answer the question, let's compare the respective Ruby images:
$ container-diff diff --type=size --type=apt --type=history ruby:2.7.4-bullseye ruby:3.0.2-bullseye
-----Apt-----
Packages found only in ruby:2.7.4-bullseye: None
Packages found only in ruby:3.0.2-bullseye: None
Version differences: None
-----History-----
Docker history lines found only in ruby:2.7.4-bullseye:
-/bin/sh -c #(nop) ENV RUBY_MAJOR=2.7
-/bin/sh -c #(nop) ENV RUBY_VERSION=2.7.4
-/bin/sh -c #(nop) ENV RUBY_DOWNLOAD_SHA256=2a80824e0ad6100826b69b9890bf55cfc4cf2b61a1e1330fccbcb30c46cef8d7
Docker history lines found only in ruby:3.0.2-bullseye:
-/bin/sh -c #(nop) ENV RUBY_MAJOR=3.0
-/bin/sh -c #(nop) ENV RUBY_VERSION=3.0.2
-/bin/sh -c #(nop) ENV RUBY_DOWNLOAD_SHA256=570e7773100f625599575f363831166d91d49a1ab97d3ab6495af44774155c40
-----Size-----
Image size difference between ruby:2.7.4-bullseye and ruby:3.0.2-bullseye:
SIZE1 SIZE2
819.2M 835.8M
Compare that with changing the OS flavor in the Node image. What happens if you want to swap out Bullseye for Bullseye Slim?
$ container-diff diff --type=size --type=apt --type=node node:17-bullseye node:17-bullseye-slim
-----Apt-----
Packages found only in node:17-bullseye:
NAME VERSION SIZE
-autoconf 2.69-14 1.8M
-automake 1:1.16.3-2 1.8M
-autotools-dev 20180224.1+nmu1 157K
...
-----Node-----
Packages found only in node:17-bullseye: None
Packages found only in node:17-bullseye-slim: None
Version differences: None
-----Size-----
Image size difference between node:17-bullseye and node:17-bullseye-slim:
SIZE1 SIZE2
942.9M 230.7M
Comparing regular Bullseye vs. Slim shows that:
- Node stays the same.
- The Slim image is about 712 MB smaller.
- The smaller image has a long list of missing packages.
This information will help you decide which is the best version for you. It makes sense to pick Slim in order to reduce the attack surface if you don’t need the extra packages.
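As a quick sanity check, the savings can be worked out from the two sizes in the report. This is only a sketch; size_savings is a hypothetical helper name, and the figures are copied from the output above.

```shell
# Hypothetical helper: absolute and relative size difference between two images.
# Arguments: size of the larger image and size of the smaller one, both in MB.
size_savings() {
  awk -v full="$1" -v slim="$2" 'BEGIN {
    saved = full - slim
    printf "saved: %.1f MB (%.1f%% smaller)\n", saved, 100 * saved / full
  }'
}

size_savings 942.9 230.7   # prints: saved: 712.2 MB (75.5% smaller)
```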
Extending and customizing container-diff
When the default text output is not enough, we can write an output template. You can see the examples in the built-in template file.
The --format option lets us customize how information is printed out, giving us a way to export the data to other formats, such as CSV:
$ container-diff diff python:3.9-bullseye python:3.10-bullseye --type=pip --format='
package,{{.Image1}},{{.Image2}}
{{range .Diff.InfoDiff}}{{.Package}},{{range .Info1}}{{.Version}}{{end}},{{range .Info2}}{{.Version}}{{"\n"}}{{end}}{{end}}
'
package,python:3.9-bullseye,python:3.10-bullseye
pip,21.2.4,21.2.4
setuptools,57.5.0,57.5.0
wheel,0.37.0,0.37.0
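Once the data is in CSV, ordinary text tools can post-process it. The sketch below keeps only the rows where the two version columns differ; changed_versions is a hypothetical helper, and it assumes the three-column layout produced by the template above. For the sample output shown here it would print nothing, since all three versions match.

```shell
# Hypothetical helper: print only the CSV rows whose two versions differ.
changed_versions() {
  # Ignore blank lines and the "package,..." header; compare columns 2 and 3.
  awk -F, 'NF == 3 && $1 != "package" && $2 != $3'
}

# Usage: container-diff diff ... --format='...' | changed_versions
```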
When custom formats are not enough, container-diff can be extended by writing your own differ. You'll need solid knowledge of Go for that, though.
Automated container testing with CI/CD
How does container-diff help us deploy safely? Well, if you’re doing continuous integration, you’re probably deploying several times a day, which means each new container is only a little bit different from the previous one.
Following that logic, we can assume that if too many things change at once, it may be a signal that further analysis is needed before deployment. Maybe some unexpected file snuck into the build and the image size doubled, or the base image was updated in the registry and unexpectedly shipped with different libraries.
We have to strike the right balance between stability and mutability. Every team will have different thresholds but, as a starting point, let's say that we’ll reject images that:
- Grow more than 10% in size.
- Have different OS libraries.
- Have different globally-installed Node packages.
- Were built from a different Dockerfile.
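Expressed as settings, that starting policy might look like the sketch below. The variable names are the ones read by the comparison script later in this article; the values are just one reading of the list above and should be tuned per team.

```shell
# Thresholds for the policy above (adjust to taste).
export MAX_GROWTH_RATIO=10        # reject if the image grows more than 10%
export ALLOWED_APT_CHANGES=0      # reject if any OS package is added or removed
export ALLOWED_NPM_CHANGES=0      # reject if any global Node package is added or removed
export ALLOWED_HISTORY_CHANGES=0  # reject if any Dockerfile command differs
```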
Gauging change rate between images
We can evaluate the changes by running container-diff with --json and processing the output. The format is:
{
  "Image1": "foo",
  "Image2": "bar",
  "DiffType": "Test_Type",
  "Diff": {
    // Differences Object
  }
}
We can process the report with a combination of shell scripts and jq, the JSON Query CLI tool. First, run all the tests at once and save the output in a file:
$ container-diff diff --type=size --type=apt --type=node --type=history --json <IMAGE1> <IMAGE2> > diff.json
Then, pipe the output to jq. You can filter the results per test by selecting DiffType. Use the following command to see the APT changes:
$ jq '.[] | select(.DiffType=="Apt")' diff.json
You can get the total number of changed packages by appending .Diff.Packages1 + .Diff.Packages2 | length to the query.
$ jq '.[] | select(.DiffType=="Apt") | .Diff.Packages1 + .Diff.Packages2 | length' diff.json
💡 You can try jq online at jq play.
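The same pattern extends to the other sections of the report. Below is a small sketch with two hypothetical helpers, report_sections and history_changes. The .Diff.Adds and .Diff.Dels fields are taken from the comparison script in this article rather than from a formal schema, so treat them as an assumption about the JSON layout.

```shell
# Hypothetical helpers for exploring a report saved with --json.

# List the differ sections present in the report, one per line.
report_sections() {
  jq -r '.[].DiffType' "$1"
}

# Count the history lines found in only one of the two images.
history_changes() {
  jq '.[] | select(.DiffType=="History") | .Diff.Adds + .Diff.Dels | length' "$1"
}

# Usage:
#   report_sections diff.json
#   history_changes diff.json
```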
Once we have all the jq queries ready, we can write a script that runs the differ, filters the results, and fails if the changes exceed certain thresholds.
#!/bin/bash
# Compare two container images and stop the pipeline when changes exceed the control parameters
# Usage: container-diff-test.sh <IMAGE1> <IMAGE2>
# Parameters expected (environment variables):
# $ALLOWED_APT_CHANGES - max number of APT packages changed
# $ALLOWED_HISTORY_CHANGES - max number of Dockerfile commands changed
# $ALLOWED_NPM_CHANGES - max number of NPM packages changed
# $MAX_GROWTH_RATIO - percentage of size growth allowed (0 is no growth, 100 is double size)
set -ex

image1=$1
image2=$2
diffile=$(mktemp XXXXXX.json)

# Run all the differs at once and save the JSON report
container-diff diff \
    --type=history --type=node --type=size --type=apt --json \
    "$image1" \
    "$image2" \
    > "$diffile"

# Count the packages and history lines that appear in only one of the images
changes_apt=$(jq '.[] | select(.DiffType=="Apt") | .Diff.Packages1 + .Diff.Packages2 | length' "$diffile")
changes_history=$(jq '.[] | select(.DiffType=="History") | .Diff.Adds + .Diff.Dels | length' "$diffile")
changes_npm=$(jq '.[] | select(.DiffType=="Node") | .Diff.Packages1 + .Diff.Packages2 | length' "$diffile")

# When sizes are equal, jq prints null
size1=$(jq '.[] | select(.DiffType=="Size") | .Diff[0].Size1 ' "$diffile")
if [ "$size1" = "null" ]
then
    size_ratio=0
else
    size_ratio=$(jq '.[] | select(.DiffType=="Size") | 100 * .Diff[0].Size2 / .Diff[0].Size1 - 100 | floor' "$diffile")
fi

# Evaluate thresholds
if [ "$changes_apt" -gt "$ALLOWED_APT_CHANGES" ] \
    || [ "$changes_history" -gt "$ALLOWED_HISTORY_CHANGES" ] \
    || [ "$changes_npm" -gt "$ALLOWED_NPM_CHANGES" ] \
    || [ "$size_ratio" -gt "$MAX_GROWTH_RATIO" ]
then
    echo "Too many changes between $image1 and $image2" >&2
    exit 1
else
    echo OK
fi
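To make the size check concrete, here is the same growth formula in plain shell arithmetic. It is a sketch only: growth_percent is a hypothetical helper, and the byte counts in the examples are made up for illustration.

```shell
# Hypothetical helper: percentage growth from size1 to size2 (both in bytes).
# Mirrors the jq expression 100 * Size2 / Size1 - 100, using integer division.
growth_percent() {
  echo $(( 100 * $2 / $1 - 100 ))
}

growth_percent 1000 1150   # prints 15, which would exceed a MAX_GROWTH_RATIO of 10
growth_percent 1000 900    # prints -10, a shrinking image always passes
```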
Adding a change-control job to CI/CD
Where were we? Let's see, we have two images and a script to compare them. What we need now is a CI/CD pipeline that builds the image. Semaphore has the capabilities that we want for this task. If you’ve never used Semaphore before, I recommend checking out the getting started guide.
Open the workflow editor and add a block after the container image build step. Then, add the following commands in the job:
curl -LO https://storage.googleapis.com/container-diff/latest/container-diff-linux-amd64
sudo install container-diff-linux-amd64 /usr/local/bin/container-diff
echo "${DOCKER_PASSWORD}" | docker login -u "${DOCKER_USERNAME}" --password-stdin
checkout
chmod a+x container-diff-test.sh && ./container-diff-test.sh "${DOCKER_USERNAME}"/mycontainer:latest "${DOCKER_USERNAME}"/mycontainer:$SEMAPHORE_WORKFLOW_ID
This job installs container-diff on the CI machine, logs in to the Docker Hub registry (you'll need to activate a secret), clones the repository, and runs the comparison script. Change the parameters in container-diff-test.sh as needed. In this case, we're comparing the latest image against the one tagged with the unique ID $SEMAPHORE_WORKFLOW_ID.
That’s it! You can complete the pipeline with the deployment method of your choice.
If you need inspiration for setting up a deployment, check these resources to learn how you can deploy with Semaphore:
- How To Deploy a Go Web Application with Docker
- Continuous Integration and Delivery to AWS Kubernetes
- How to Release Faster with Continuous Delivery for Google Kubernetes
- A Step-by-Step Guide to Continuous Deployment on Kubernetes
Wrapping up
Container-diff is yet another quality tool to keep containers in check. Remember, when using containers, you’re responsible for the whole mini OS that comes with them, not just the code.
Increase your Docker-fu with these posts:
- Structure Testing for Docker Containers
- CI/CD with Docker and Kubernetes
- Beyond Docker with Earthly
- Kubernetes vs. Docker: Understanding Containers in 2021
Thank you for reading!