API's From Dev to Production - Part 2 - Image Size

Pete King - Feb 18 '21 - Dev Community

Series Introduction

Welcome to Part 2 of this blog series that will go from the most basic example of a .net 5 webapi in C#, and the journey from development to production with a shift-left mindset. We will use Azure, Docker, GitHub, GitHub Actions for CI/C-Deployment and Infrastructure as Code using Pulumi.

In this post we will be looking at:

  • Optimising the Docker image size

TL;DR

We managed to optimise the resulting Docker image from 210MB down to 109MB; this 109MB could be even smaller if you were happy not to use AOT (ahead of time compiling) with .net - you can get this down to ~69MB. We did this by using Alpine Linux - a small, lightweight Linux distribution, using the .net 5 SDK to restore, build, publish, and then we used the minimal .net runtime dependencies image (which is only 9MB!). We took advantage of self-contained, PublishReadyToRun and PublishTrimmed.

--self-contained=true \
-p:PublishReadyToRun=true \
-p:PublishTrimmed=true

GitHub Repository

peteking / Samples.WeatherForecast-Part-2

This repository is part of the blog post series, API's from Dev to Production - Part 2 on dev.to. Based on the standard .net Weather API sample.


Requirements

We will be picking up where we left off in Part 1, which means you’ll need the end result from GitHub Repo - Part 1 to start with.

We have the same requirements as last time, but to save you some time, I've put them below anyway :)

I’m using Windows 10; for other operating systems such as macOS and Linux, there will be small differences that are not covered here.

VS Code Extensions


Dockerfile optimisation

A Dockerfile is built up in distinct layers, and each of the following instructions creates a new layer.

The descriptions below relate to our previous Dockerfile, which is shown after them.

FROM creates a layer from the mcr.microsoft.com/dotnet/aspnet:5.0 Docker image.

COPY adds files from your source to destination directory.

RUN in our case will restore, build and publish using dotnet.

FROM mcr.microsoft.com/dotnet/aspnet:5.0 AS base
WORKDIR /app
EXPOSE 80

FROM mcr.microsoft.com/dotnet/sdk:5.0 AS build
WORKDIR /app
COPY . .

WORKDIR /app/src/Samples.WeatherForecast.Api

RUN dotnet restore "Samples.WeatherForecast.Api.csproj"

RUN dotnet build "Samples.WeatherForecast.Api.csproj" -c Release -o /app/build --no-restore

FROM build AS publish
RUN dotnet publish "Samples.WeatherForecast.Api.csproj" -c Release -o /app/publish

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "Samples.WeatherForecast.Api.dll"]

Docker build cache

When building an image, Docker steps through the instructions in your Dockerfile, executing each in the order specified. As each instruction is examined, Docker looks for an existing image in its cache that it can reuse, rather than creating a new (duplicate) image.

If you do not want to use the cache at all, you can use the --no-cache=true option on the docker build command. However, if you do let Docker use its cache, it is important to understand when it can, and cannot, find a matching image. The basic rules that Docker follows are outlined below:

  • Starting with a parent image that is already in the cache, the next instruction is compared against all child images derived from that base image to see if one of them was built using the exact same instruction. If not, the cache is invalidated.

  • In most cases, simply comparing the instruction in the Dockerfile with one of the child images is sufficient. However, certain instructions require more examination and explanation.

  • For the ADD and COPY instructions, the contents of the file(s) in the image are examined and a checksum is calculated for each file. The last-modified and last-accessed times of the file(s) are not considered in these checksums. During the cache lookup, the checksum is compared against the checksum in the existing images. If anything has changed in the file(s), such as the contents and metadata, then the cache is invalidated.

  • Aside from the ADD and COPY commands, cache checking does not look at the files in the container to determine a cache match. For example, when processing a RUN apt-get -y update command the files updated in the container are not examined to determine if a cache hit exists. In that case just the command string itself is used to find a match.

Once the cache is invalidated, all subsequent Dockerfile commands generate new images and the cache is not used.
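
The ADD/COPY rule above can be illustrated outside Docker. The following is a minimal sketch, assuming the POSIX cksum utility as a stand-in for Docker's internal checksum (Docker's actual algorithm is not shown here); the Program.cs file is a hypothetical example:

```shell
# cksum is only a stand-in for Docker's internal checksum (an assumption).
workdir=$(mktemp -d)
printf 'Console.WriteLine("v1");\n' > "$workdir/Program.cs"
sum_original=$(cksum < "$workdir/Program.cs")

# Change only the last-modified time: the content checksum stays the same,
# so by the rule above a COPY of this file could still be served from cache.
touch -t 200101010000 "$workdir/Program.cs"
sum_touched=$(cksum < "$workdir/Program.cs")

# Change the contents: the checksum differs, so the cache would be invalidated.
printf 'Console.WriteLine("v2");\n' > "$workdir/Program.cs"
sum_edited=$(cksum < "$workdir/Program.cs")

[ "$sum_original" = "$sum_touched" ] && echo "touch only: checksum unchanged"
[ "$sum_original" != "$sum_edited" ] && echo "edit: checksum changed"

rm -rf "$workdir"
```

This is why re-saving a file without changing it does not, by itself, force the layers after a COPY to be rebuilt.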

It’s preferable to use COPY instead of ADD.

This is simply because the term COPY is more transparent in terms of what it is doing, whereas ADD supports features like local-only tar extraction and remote URL support that are not immediately obvious.

See references [1]


How can we optimise?

For starters, we can use a different base image with a different OS (operating system). One of the smallest is Alpine Linux. Alpine is great: it’s small and is not bloated with anything extra; even curl isn’t installed!

With .net, the question becomes - How do I know what’s available?

Well, DockerHub is your best friend here: it lists the images that companies and individuals have published, along with everything you need to know about them.

Let’s take a look at .net 5 from Microsoft.

You can see there are over 1 billion downloads overall!

[Image: DockerHub - DotNet]


The build

What do we actually need to build? When you think about it, you need the SDK (Software Development Kit) in order to build your projects.

If you navigate to ‘dotnet/sdk’ you’ll see a page with all the relevant information you need about the SDK.

From there you can see the various versions (tags) that are available; we want Alpine. Several Alpine tags are listed, but they all point to the same Dockerfile and therefore the same image. For us, the tag ‘5.0-alpine’ is enough.

[Image: DockerHub - Linux Tags]

Always specify a version

It’s best practice to pin a version so that the image you build on is a known quantity. You don’t want to be caught out by the ‘latest’ tag: if you build later on and the version behind ‘latest’ has changed but you haven’t tested against it, you may run into issues.


Rewrite the Dockerfile

Let’s start by rewriting our Dockerfile, don’t worry, it will still only be about 20 lines of code.

ARG VERSION=5.0-alpine

FROM mcr.microsoft.com/dotnet/sdk:${VERSION} AS build
WORKDIR /app

Here you can see a couple of things: we use ARG to hold 5.0-alpine, and we reference that variable in the FROM statement using the dollar sign and curly braces, ${VERSION}.
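
Dockerfile variable references use the same ${NAME} notation as a POSIX shell, so the substitution can be previewed there. A minimal sketch that mirrors the ARG default from the Dockerfile; the sdk_image variable is a name made up for this illustration:

```shell
# Same value as the ARG default in the Dockerfile
VERSION=5.0-alpine

# The shell expands ${VERSION} the same way the FROM line does
sdk_image="mcr.microsoft.com/dotnet/sdk:${VERSION}"
echo "$sdk_image"   # prints: mcr.microsoft.com/dotnet/sdk:5.0-alpine
```

If you want to try a different tag without editing the file, docker build accepts --build-arg VERSION=<tag>, which overrides the ARG default.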

If you’re wondering what this FROM is and why we have more than one (even in our generated Dockerfile from Part 1), this is known as a multi-stage build - We essentially take advantage of different base images to do certain things, and use another final image to ensure it is the most optimal it can be. For more information about multi-stage builds, please see, Docker Docs - Multistage Build


DotNet restore

Next up is dotnet restore. We restore first because if that fails, there is no point in going any further with the build. We also restore for a specific runtime: because we are targeting x64 Linux, we can specify it with the -r or --runtime option. For more information about the options available for dotnet restore, please see Microsoft Docs - dotnet restore

Our Dockerfile now looks like:

# Copy and restore as distinct layers
COPY . .
WORKDIR /app/src/Samples.WeatherForecast.Api
RUN dotnet restore Samples.WeatherForecast.Api.csproj -r linux-musl-x64

Here you can see two more things: the COPY command and the WORKDIR command. COPY copies everything from the source to our destination, hence the period . for the source and the period . for the destination. WORKDIR then changes the current directory.
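
Because COPY . . copies the whole repository before the restore, the build cache rules described earlier mean that any source change also invalidates the restore layer. A common alternative, which is not what this post's repository does, is to copy only the project file first. The following is a minimal sketch under two assumptions: the API project has no project references outside its own folder, and bin/ and obj/ are excluded through a .dockerignore file. The file name Dockerfile.cache-sketch is hypothetical:

```shell
# Write a hypothetical variant of the build stage to a scratch file.
# The quoted 'EOF' keeps ${VERSION} literal so Docker, not the shell, expands it.
sketch_dir=$(mktemp -d)
sketch="$sketch_dir/Dockerfile.cache-sketch"

cat > "$sketch" <<'EOF'
ARG VERSION=5.0-alpine

FROM mcr.microsoft.com/dotnet/sdk:${VERSION} AS build
WORKDIR /app/src/Samples.WeatherForecast.Api

# 1. Copy only the project file: this layer changes only when that file does
COPY src/Samples.WeatherForecast.Api/Samples.WeatherForecast.Api.csproj .
RUN dotnet restore Samples.WeatherForecast.Api.csproj -r linux-musl-x64

# 2. Copy the remaining sources: edits here leave the restore layer cached
COPY src/Samples.WeatherForecast.Api/ .
EOF

# Show the order of the layer-creating instructions
grep -n -E '^(COPY|RUN)' "$sketch"
```

The publish and final stages would follow unchanged. Whether the extra complexity is worth it depends on how often you rebuild and how long the restore takes.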


DotNet build (publish)

Now we move on to the build, or rather the publish. We use publish because it also builds; what we don’t want it to do is restore again, so we pass the --no-restore option. It is down to personal preference: you may want to run dotnet build before dotnet publish and then additionally specify --no-build, but I’ll leave that up to you to decide. Here, I’m going to go straight to publish.

FROM build AS publish
RUN dotnet publish \
    -c Release \
    -o /out \
    -r linux-musl-x64 \
    --self-contained=true \
    --no-restore \
    -p:PublishReadyToRun=true \
    -p:PublishTrimmed=true

There are a few more dotnet publish options happening here.

-c defines the build configuration, in our case Release.

-o specifies the path for the output directory.

-r or --runtime specifies the target runtime, just like in dotnet restore.

Please see the RID catalogue for more information. https://docs.microsoft.com/en-us/dotnet/core/rid-catalog

--self-contained publishes the .net runtime with the application so the runtime doesn't need to be installed in the image.

--no-restore doesn't execute an implicit restore when running the command.

-p:PublishReadyToRun=true compiles the app's assemblies in ReadyToRun (R2R) format. R2R is a form of ahead-of-time (AOT) compilation.

This improves start-up time because less code has to be JIT-compiled when the app starts, but it comes at the cost of size.

-p:PublishTrimmed=true trims unused libraries to reduce the deployment size of an app when publishing a self-contained executable.

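Take great care when using this option: trimming can remove code that is only reached through reflection, so test the published app thoroughly.
For more information, please see Microsoft Docs - dotnet trim-self-contained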

What is linux-musl-x64?

It is the runtime identifier (RID) for x64 Linux distributions that use musl as their C library, such as the lightweight Alpine Linux.

See references [2]


The final stage

In the final stage we use FROM again, this time to select a different base image. This is the most important point to note about Docker multi-stage builds.

We don’t want the SDK in the final runtime image, for one reason, it’s too large, but most importantly we don’t actually even need it!

We have compiled ahead of time (AOT) and included the .net runtime as part of our app.

We go to Docker Hub again and find the runtime dependencies image. Its name is mcr.microsoft.com/dotnet/runtime-deps, and it has a set of tags like those we’ve seen previously, so we can attach ${VERSION} to it.

For more information, please see, Microsoft Docs - dotnet Runtime Dependencies

# Final stage/image
FROM mcr.microsoft.com/dotnet/runtime-deps:${VERSION}
WORKDIR /app
COPY --from=publish /out .

We set the WORKDIR again and COPY the output from the publish stage. The source directory, /out, is the one we specified with the -o option of the dotnet publish command.

Finally, we can expose the port and specify an entry point.

EXPOSE 8080
ENTRYPOINT ["./Samples.WeatherForecast.Api"]

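You may have clocked it already, but if you haven’t: our ENTRYPOINT looks a little different compared to what it was in Part 1. This is because we publish with --self-contained=true, which produces a native executable that starts the app directly. The runtime-deps image does not include the dotnet host, so there is no dotnet command to run the .dll with.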


Full Dockerfile

ARG VERSION=5.0-alpine

FROM mcr.microsoft.com/dotnet/sdk:${VERSION} AS build
WORKDIR /app

# Copy and restore as distinct layers
COPY . .
WORKDIR /app/src/Samples.WeatherForecast.Api
RUN dotnet restore Samples.WeatherForecast.Api.csproj -r linux-musl-x64

FROM build AS publish
RUN dotnet publish \
    -c Release \
    -o /out \
    -r linux-musl-x64 \
    --self-contained=true \
    --no-restore \
    -p:PublishReadyToRun=true \
    -p:PublishTrimmed=true

# Final stage/image
FROM mcr.microsoft.com/dotnet/runtime-deps:${VERSION}
WORKDIR /app
COPY --from=publish /out .

EXPOSE 8080
ENTRYPOINT ["./Samples.WeatherForecast.Api"]

Let’s build it

Navigate to your repo root directory and execute the docker build command.

docker build -t samples.weatherforecast:v2 .

We have changed the tag so we can compare the result with the image from Part 1.

Now, execute docker image ls to see your images.
[Image: Console - docker image ls]

Notice the size difference: we’ve been able to optimise the size from 210MB down to 109MB, and this includes AOT!

With AOT, you do pay a price in size. You need to decide whether that price is worth paying for the faster start-up, or whether you’d rather have an even smaller image and rely on JIT compilation.

If image size is more important to you, simply remove the -p:PublishReadyToRun=true option; the ENTRYPOINT only needs adjusting if you also stop publishing self-contained.

Your final image size will be about 69MB!
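
To put those numbers in proportion, here is a small arithmetic sketch using only the sizes quoted in this post. The percent_saved helper is a name made up for this illustration, and the percentages are rounded down:

```shell
# Image sizes in MB, as reported in this post
original=210
with_r2r=109
without_r2r=69

# Integer percentage saved relative to the original image (rounded down)
percent_saved() {
  echo $(( ($1 - $2) * 100 / $1 ))
}

saved_with_r2r=$(percent_saved "$original" "$with_r2r")
saved_without_r2r=$(percent_saved "$original" "$without_r2r")
r2r_cost=$(( with_r2r - without_r2r ))

echo "With ReadyToRun:    ${saved_with_r2r}% smaller"      # 48% smaller
echo "Without ReadyToRun: ${saved_without_r2r}% smaller"   # 67% smaller
echo "ReadyToRun adds about ${r2r_cost}MB"                 # about 40MB
```

In other words, ReadyToRun accounts for roughly 40MB of the 109MB image, which is the trade-off to weigh against start-up time.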

Image size is not everything, and a smaller image is not automatically a hardened one. However, it does help: the less bloat and tooling a base image includes, the fewer vulnerabilities there are for an attacker to take advantage of.

You’ll also notice the TAG column, and hopefully seeing it will really hit home what a tag is and why relying on latest can be misleading...

You can see latest is 25 hours old, whereas v2 is about 1 hour old. The tag can be anything; just because I’ve used the term latest, it doesn’t actually mean anything. In our case, the most recent version is really the image with the v2 tag.

It comes down to good practice around the image tag. By convention, latest should point to the latest production image; however, relying on latest is no longer considered good practice.
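
The point that latest is only a name can be made concrete. When no tag is given, Docker falls back to the tag latest; it does not go looking for the newest image. Here is a minimal sketch of that defaulting rule, where image_tag is a hypothetical helper written for this illustration and not a Docker command:

```shell
# Print the tag part of an image reference, or "latest" when none is given.
# Digest references (name@sha256:...) are deliberately not handled.
image_tag() {
  name=${1##*/}            # drop any registry/namespace prefix, e.g. localhost:5000/
  case $name in
    *:*) printf '%s\n' "${name##*:}" ;;
    *)   printf '%s\n' latest ;;
  esac
}

image_tag samples.weatherforecast:v2               # prints: v2
image_tag samples.weatherforecast                  # prints: latest
image_tag mcr.microsoft.com/dotnet/sdk:5.0-alpine  # prints: 5.0-alpine
```

So an untagged build or pull always lands on whatever image currently carries the latest label, regardless of how old it is.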


Let’s test it

We should test the image to make sure it serves requests as we expect.

docker run -it --rm -p 8080:80 samples.weatherforecast:v2

Don’t forget the tag change to v2.

Let’s use Postman again... It works!
[Image: Postman - GET request]


What have we learned?

We have learned about multi-stage builds in Docker, how the build cache works, and how to really optimise your final image. There are many different base images supporting different operating systems: Alpine, Debian (Buster), Ubuntu and more. This gives engineers the flexibility to choose what is right for their solution. For this example, we’ve opted for something very minimal and lightweight.

If you're new to Docker, I recommend that you spend further time to understand more docker commands that are available in the Dockerfile.

Up next

Part 3 in this series will be about:

  • GitHub Actions - Build the docker image

  • GitHub Actions - Publish the docker image to the GitHub Container Registry


More information


References

[1] - https://docs.docker.com/develop/develop-images/dockerfile_best-practices/

[2] - https://docs.microsoft.com/en-us/dotnet/core/tools/dotnet-publish
