As you might know, Kubernetes has deprecated Docker as a container runtime, and Docker support will be removed in an upcoming version (currently planned for the 1.22 release in late 2021).

If you are using a managed Kubernetes cluster (like GKE, EKS, or AKS), you shouldn't have much to handle and the change should be pretty straightforward for you. But if you manage a cluster yourself (with kubeadm, for example) and use Docker as the container runtime, you will have to handle that runtime switch sooner or later to keep receiving Kubernetes updates.

The aim of this post is not to dive deep into the reasons for this change, or into how container runtimes behave in a Kubernetes cluster, but to describe step by step how to switch your container runtime from Docker to any runtime that implements the Container Runtime Interface (CRI). If you need more details on the reasons that led to the Docker deprecation, you can read the Kubernetes blog post "Don't Panic: Kubernetes and Docker".

What to check in the first place

Apart from the changes linked to the Kubernetes installation itself, the impact on the workloads running in your cluster should be limited, if not non-existent. One of the only things you have to care about is whether you are using Docker-in-Docker in any of your container workloads by mounting the Docker socket /var/run/docker.sock. In that case, you will have to find an alternative (Kaniko, for example) before switching from Docker to your new container runtime.
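One possible way to spot such workloads is to list the hostPath volumes of every pod and keep the pods that reference the Docker socket. The sketch below is only an illustration: the function names are made up for this post, it assumes kubectl is configured against your cluster, and it only catches sockets mounted through hostPath volumes.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubectl): prints one line per pod as
# "<namespace>/<pod><TAB><hostPath paths separated by spaces>".
list_pod_host_paths() {
  kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.spec.volumes[*].hostPath.path}{"\n"}{end}'
}

# Hypothetical helper: reads the lines produced above on stdin and
# prints only the pods whose hostPath volumes mention docker.sock.
filter_docker_sock_pods() {
  awk -F '\t' '$2 ~ /docker\.sock/ { print $1 }'
}

# Usage (needs access to the cluster):
#   list_pod_host_paths | filter_docker_sock_pods
```

Any pod printed by this pipeline depends on the Docker daemon of its node and needs an alternative before the switch.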

It's also strongly advised to back up your data before proceeding with the container runtime switch!

Let's proceed with the changes!

OK, now that you are ready to apply the container runtime switch, let's proceed with the changes. I will use containerd as the container runtime in this post, but the steps below can be adapted to any other container runtime (like CRI-O).

We will start with the worker nodes and finish with the control plane.

Worker nodes

The steps below have to be applied on each worker node.

1. First, cordon and drain the node so that no new workload is scheduled or executed on it during the procedure.

kubectl cordon <node_name>
kubectl drain <node_name>

Remark: if DaemonSets are running on the node, use the --ignore-daemonsets flag so that the drain proceeds without evicting the pods managed by your DaemonSets (the drain command cannot evict them anyway). Don't worry, the kubelet will automatically restart these pods with the new container runtime at the end of the procedure. If you have critical workloads managed by DaemonSets and don't want them to run during the process, you can either specify a nodeSelector on your DaemonSet or uninstall them completely and reinstall them at the end of the process.
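For illustration, the drain with that flag could be wrapped as follows. This is just a sketch: the drain_node function and the KUBECTL variable are hypothetical names introduced here so that the command can be previewed before it is run for real.

```shell
#!/bin/sh
# Hypothetical wrapper around "kubectl drain".
# KUBECTL is an assumed override: set KUBECTL=echo to print the command
# instead of executing it.
drain_node() {
  node="$1"
  # --ignore-daemonsets lets the drain proceed even though
  # DaemonSet-managed pods stay on the node.
  "${KUBECTL:-kubectl}" drain "$node" --ignore-daemonsets
}

# Usage:
#   KUBECTL=echo drain_node <node_name>   # preview only
#   drain_node <node_name>                # actually drain the node
```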

2. Once the node is drained, stop the kubelet service:

sudo systemctl stop kubelet
sudo systemctl status kubelet

3. Uninstall Docker.
I will not detail the commands here, as they depend on your Linux distribution and on the way you installed Docker. Just be careful: if you want to completely clean up Docker artifacts, you might have to remove some files manually (for example /var/lib/docker).

You can check the Docker documentation for help with uninstalling the engine.

4. Install containerd (same here, I'll let you choose your favorite way to install it by following the containerd documentation).

5. Enable and start the containerd service:

sudo systemctl enable containerd
sudo systemctl start containerd
sudo systemctl status containerd

6. The kubelet communicates with containerd through containerd's CRI plugin. Make sure this plugin is not disabled in your containerd installation: open the config file /etc/containerd/config.toml and check that "cri" does not appear in the disabled_plugins list:

disabled_plugins = []

Then restart the containerd service if you changed the file:

sudo systemctl restart containerd
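The check from step 6 can also be scripted. The snippet below is a minimal sketch: cri_plugin_enabled is a hypothetical helper name, and it assumes disabled_plugins is written on a single line, so a list spread over several lines would not be detected.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when "cri" is NOT listed on the
# disabled_plugins line of the given containerd config file.
# A missing file is treated as "enabled", on the assumption that
# containerd then runs with its defaults.
cri_plugin_enabled() {
  config="${1:-/etc/containerd/config.toml}"
  # Keep only the disabled_plugins line, then search it for "cri".
  if grep -E '^[[:space:]]*disabled_plugins[[:space:]]*=' "$config" 2>/dev/null | grep -q '"cri"'; then
    return 1
  fi
  return 0
}

# Usage:
#   cri_plugin_enabled || echo 'remove "cri" from disabled_plugins'
```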

7. Edit the kubelet configuration file /var/lib/kubelet/kubeadm-flags.env to add the following flags to the KUBELET_KUBEADM_ARGS variable (adapt the container-runtime-endpoint path if needed):

--container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock
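If you have many nodes to update, the edit from step 7 can be automated. Here is a rough sketch that assumes the file contains a single KUBELET_KUBEADM_ARGS="..." line, as generated by kubeadm; add_containerd_flags is a hypothetical helper name, and a copy of the original file is kept with a .bak suffix.

```shell
#!/bin/sh
# Hypothetical helper: appends the two flags from step 7 to the
# KUBELET_KUBEADM_ARGS="..." line of the given env file.
add_containerd_flags() {
  env_file="${1:-/var/lib/kubelet/kubeadm-flags.env}"
  flags='--container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock'

  # Do nothing if an endpoint is already configured, so re-running is safe.
  if grep -q -- '--container-runtime-endpoint' "$env_file"; then
    return 0
  fi

  # Insert the flags just before the closing quote; the original file
  # is saved as <env_file>.bak.
  sed -i.bak "s|^KUBELET_KUBEADM_ARGS=\"\(.*\)\"\$|KUBELET_KUBEADM_ARGS=\"\1 ${flags}\"|" "$env_file"
}

# Usage (as root, on the node):
#   add_containerd_flags /var/lib/kubelet/kubeadm-flags.env
```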

8. Start kubelet

sudo systemctl start kubelet

9. Check that the new runtime has been correctly picked up on the node:

kubectl describe node <node_name>

You should see the container runtime version and name:

System Info:
  Machine ID:                 21a5dd31f86c4
  System UUID:                4227EF55-BA3BCCB57BCE
  Boot ID:                    77229747-9ea581ec6773
  Kernel Version:             3.10.0-1127.10.1.el7.x86_64
  OS Image:                   Red Hat Enterprise Linux Server 7.8 (Maipo)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.4.3
  Kubelet Version:            v1.20.2
  Kube-Proxy Version:         v1.20.2
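If you prefer a scripted check over reading the describe output, the same information is exposed in the node status. The following is a small sketch; node_runtime and is_containerd are hypothetical helper names, and the first one assumes kubectl can reach the cluster.

```shell
#!/bin/sh
# Prints the runtime reported by the node, e.g. containerd://1.4.3
node_runtime() {
  kubectl get node "$1" -o jsonpath='{.status.nodeInfo.containerRuntimeVersion}'
}

# Hypothetical helper: succeeds when the given runtime string
# starts with "containerd://".
is_containerd() {
  case "$1" in
    containerd://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (needs access to the cluster):
#   is_containerd "$(node_runtime <node_name>)" && echo "runtime switched"
```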

10. Uncordon the node to mark it as schedulable again, then check that your pods are running:

kubectl uncordon <node_name>

That's it! Once all your pods have been restarted, you can proceed with the next worker node.

Control Plane

The procedure to switch the container runtime on control plane (master) nodes is exactly the same as on the worker nodes. However, you have to be careful if you run a single master node configuration. While the new container runtime pulls the kube-apiserver, etcd and coredns images and creates the corresponding containers, the cluster will be unavailable, and you won't be able to run kubectl commands either.

Here are some tips to help you follow the new container runtime start and troubleshoot potential problems:

1. Use journalctl to follow the kubelet logs:

journalctl -u kubelet -f

2. Likewise, watch the containerd logs:

journalctl -u containerd -f

3. Use the crictl command to follow container deployments:

crictl --runtime-endpoint /run/containerd/containerd.sock ps

4. At the end of the upgrade, check that you are indeed using the new container runtime by running a describe command on your master nodes:

kubectl describe node <master_node_name>
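Regarding the crictl tip above: to avoid passing --runtime-endpoint on every call, the endpoint can be stored in the crictl configuration file, which crictl reads from /etc/crictl.yaml by default. The sketch below writes such a file; write_crictl_config is a hypothetical helper name, and the socket path is assumed to be the same as the one used earlier in this post, written here in its unix:// URL form.

```shell
#!/bin/sh
# Hypothetical helper: writes a crictl configuration file pointing at
# the containerd socket (adapt the path to your installation).
write_crictl_config() {
  target="${1:-/etc/crictl.yaml}"
  cat > "$target" <<'EOF'
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
EOF
}

# Usage (as root, on the node):
#   write_crictl_config
#   crictl ps
```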

Congratulations! You are now running a Kubernetes cluster without Docker and are now ready to receive future releases!
