What and Why

Node draining is the Kubernetes process for safely evicting pods from a node.

Kubernetes provides the drain command for safely evicting all your pods from a node before you perform maintenance on it (e.g. a kernel upgrade or hardware maintenance), or when you want to move your services from one node to another without introducing downtime or other disruption.

Using kubectl drain also gives pods the chance to terminate gracefully, and the evictions respect any PodDisruptionBudgets you have specified.

For more information about the drain command and the flags it accepts, see the kubectl drain reference documentation.

PodDisruptionBudget

A PodDisruptionBudget (PDB) is a Kubernetes resource that ensures a certain number or percentage of an application's pods stay available during voluntary disruptions, so the application is not hit by too many evictions at once.

You can create a PDB for your application to limit how many pods of a replicated application can be down simultaneously because of voluntary disruptions.

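Note: According to the Kubernetes docs, voluntary disruptions include both actions initiated by the application owner and those initiated by a cluster administrator, for example:

deleting the deployment that manages the pod

updating a deployment's pod template, causing a restart

directly deleting a pod

draining a node for repair or upgrade

Keep in mind that a PDB is only enforced for evictions made through the Eviction API, which is what kubectl drain uses; deleting a deployment or a pod directly bypasses it.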

An example PDB object looks like this:

apiVersion: policy/v1  # policy/v1 requires Kubernetes v1.21+; use policy/v1beta1 on older clusters (it was removed in v1.25)
kind: PodDisruptionBudget
metadata:
  name: fastify-budget
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: fastify_server

Here we name the PDB fastify-budget, and in the spec we set either maxUnavailable or minAvailable; a single PodDisruptionBudget can specify only one of the two. Their values can be integers or percentages (e.g. "50%"). With maxUnavailable: 1, at most one matching pod can be voluntarily disrupted at a time.
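To make the budget arithmetic concrete, here is a minimal Python sketch of how maxUnavailable or minAvailable turns into the number of evictions allowed at a given moment. It is a simplified model, not the disruption controller's code: the function names are hypothetical, it treats the deployment's replica count as the expected pod count, and it assumes percentages are rounded up, which is how the Kubernetes docs describe PDB rounding.

```python
def scaled_value(value, expected_pods):
    """Resolve an integer or a percentage string such as "50%" to a pod count.

    Percentages are rounded up (assumed to match Kubernetes' documented rounding).
    """
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        # Integer ceiling division avoids floating-point surprises
        return (percent * expected_pods + 99) // 100
    return int(value)


def disruptions_allowed(expected_pods, healthy_pods, max_unavailable=None, min_available=None):
    """Return how many pods may be voluntarily evicted right now (never negative)."""
    # A PDB may set only one of the two fields
    if (max_unavailable is None) == (min_available is None):
        raise ValueError("set exactly one of max_unavailable or min_available")
    if max_unavailable is not None:
        desired_healthy = expected_pods - scaled_value(max_unavailable, expected_pods)
    else:
        desired_healthy = scaled_value(min_available, expected_pods)
    return max(0, healthy_pods - desired_healthy)


# fastify-budget: 10 replicas, maxUnavailable: 1
print(disruptions_allowed(10, 10, max_unavailable=1))  # 1: one pod may be evicted
print(disruptions_allowed(10, 9, max_unavailable=1))   # 0: wait for a replacement
print(disruptions_allowed(7, 7, min_available="50%"))  # 3: 50% of 7 rounds up to 4
```

The second call shows why a drain pauses: once one pod is gone and its replacement is not yet healthy, the budget has no room left.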

Finally, the selector specifies the set of pods to which the PDB applies.
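The selector uses matchLabels, which, like other Kubernetes label selectors, matches a pod only when every listed key/value pair is present in the pod's labels; extra labels on the pod do not matter. The short Python sketch below illustrates that rule. The function name and the pod labels are illustrative assumptions, not output from the cluster in this post, and the sketch ignores matchExpressions as well as the fact that a PDB only considers pods in its own namespace.

```python
def selector_matches(match_labels, pod_labels):
    """True when every key/value pair in matchLabels appears in the pod's labels."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


# Selector from fastify-budget
match_labels = {"app": "fastify_server"}

# Hypothetical pod labels for illustration
pods = {
    "fastify-server-77476f7bc4-78zkv": {"app": "fastify_server", "pod-template-hash": "77476f7bc4"},
    "kube-proxy-54tlm": {"k8s-app": "kube-proxy"},
}

covered = [name for name, labels in pods.items() if selector_matches(match_labels, labels)]
print(covered)  # ['fastify-server-77476f7bc4-78zkv']
```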

Example

I am going to use a multi-node minikube cluster to show how to safely evict pods.

The same YAML files should work unchanged on managed Kubernetes services such as Amazon EKS or GKE.

In my example, I have a deployment of a simple Node.js server built with Fastify, running in 10 pods:

fastify-server-77476f7bc4-78zkv   1/1     Running   0          45m
fastify-server-77476f7bc4-9h6bf   1/1     Running   0          45m
fastify-server-77476f7bc4-cnh9r   1/1     Running   0          45m
fastify-server-77476f7bc4-cqs2z   1/1     Running   0          45m
fastify-server-77476f7bc4-fn5nn   1/1     Running   0          45m
fastify-server-77476f7bc4-nvnkl   1/1     Running   0          45m
fastify-server-77476f7bc4-pt5xz   1/1     Running   0          45m
fastify-server-77476f7bc4-r2btz   1/1     Running   0          45m
fastify-server-77476f7bc4-r92b7   1/1     Running   0          45m
fastify-server-77476f7bc4-xstrj   1/1     Running   0          45m

We can drain a node by running

kubectl drain <node-name>

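The drain command accepts many flags, such as --grace-period and --ignore-daemonsets, to customize the draining process. If the node runs DaemonSet pods (as the node in this example does), drain stops with an error unless you pass --ignore-daemonsets.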

This command does two things. First, the node is cordoned, i.e. marked as unschedulable, so no new pods are placed on it:

multinode-m02   Ready,SchedulingDisabled   <none>                 10h   v1.20.0
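Cordoning only flips the Node's spec.unschedulable field: pods already on the node keep running, the scheduler stops placing new pods there, and kubectl uncordon sets the field back. The Python sketch below models just that behaviour. The Node class, the helper functions and the first node's name ("multinode") are hypothetical, and the scheduling filter is a deliberate oversimplification of the real scheduler, which also honours taints, tolerations and resource requests.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    ready: bool = True
    unschedulable: bool = False  # mirrors the Node's spec.unschedulable field

    def status(self):
        # kubectl get nodes appends "SchedulingDisabled" for cordoned nodes
        base = "Ready" if self.ready else "NotReady"
        return base + (",SchedulingDisabled" if self.unschedulable else "")


def cordon(node):
    node.unschedulable = True


def uncordon(node):
    node.unschedulable = False


def schedulable_nodes(nodes):
    # Simplified filter: only Ready, non-cordoned nodes can receive new pods
    return [n.name for n in nodes if n.ready and not n.unschedulable]


nodes = [Node("multinode"), Node("multinode-m02")]
cordon(nodes[1])
print(nodes[1].status())         # Ready,SchedulingDisabled
print(schedulable_nodes(nodes))  # ['multinode']
uncordon(nodes[1])
print(schedulable_nodes(nodes))  # ['multinode', 'multinode-m02']
```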

Second, the eviction process starts, and you will notice messages like these in the terminal:

error when evicting pods/"fastify-server-77476f7bc4-t9rgs" -n "node-drain" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
error when evicting pods/"fastify-server-77476f7bc4-t4rsm" -n "node-drain" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
error when evicting pods/"fastify-server-77476f7bc4-qf7gq" -n "node-drain" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

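This means the eviction process is respecting the PDB we put in place for our deployment: kubectl keeps retrying until a replacement pod is ready on another node and the budget allows another eviction.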
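The retry messages come from kubectl repeatedly asking the API server to evict a pod and being refused while the budget is exhausted. The Python simulation below sketches that interplay under heavy simplifications: pods are evicted one at a time (kubectl actually handles pods in parallel), each eviction completes instantly, and a replacement pod becomes Ready a fixed, hypothetical STARTUP_SECONDS after it is created on another node. Only the 5-second retry interval is taken from the log output; the node names, pod names and everything else are an assumed model.

```python
from dataclasses import dataclass

RETRY_SECONDS = 5     # retry interval seen in the kubectl log lines above
STARTUP_SECONDS = 10  # hypothetical time for a replacement pod to become Ready


@dataclass
class Pod:
    name: str
    node: str
    ready_at: int  # simulated second at which the pod counts as healthy


def drain(pods, node, expected, max_unavailable, log):
    """Evict every pod on `node` while honouring a maxUnavailable PDB.

    Mutates `pods`, appends messages to `log`, and returns the simulated seconds elapsed.
    """
    now = 0
    replacements = 0
    to_evict = [p for p in pods if p.node == node]
    while to_evict:
        healthy = sum(1 for p in pods if p.ready_at <= now)
        allowed = healthy - (expected - max_unavailable)
        pod = to_evict[0]
        if allowed > 0:
            # Eviction accepted: the pod goes away and its ReplicaSet creates a replacement
            pods.remove(pod)
            to_evict.pop(0)
            replacements += 1
            pods.append(Pod(f"replacement-{replacements}", "other-node", now + STARTUP_SECONDS))
            log.append(f"evicted {pod.name}")
        else:
            # Eviction refused: the budget has no room until a replacement is Ready
            log.append(f"cannot evict {pod.name} (will retry after {RETRY_SECONDS}s)")
            now += RETRY_SECONDS
    return now


pods = [Pod(f"fastify-{i}", "multinode-m02" if i < 3 else "multinode", 0) for i in range(10)]
log = []
elapsed = drain(pods, "multinode-m02", expected=10, max_unavailable=1, log=log)
print(elapsed)                                                # 20
print(sum(line.startswith("cannot evict") for line in log))  # 4
```

With maxUnavailable: 1, each of the three pods on the drained node has to wait for the previous replacement to become Ready, which is why the drain takes longer than the evictions themselves.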

After a while (depending on your deployment and how long it takes to replace old pods with new ones), kubectl drain finishes, and you can verify that the node is empty, apart from any DaemonSet pods, by running:

kubectl describe node <node-name>

The output includes a Non-terminated Pods section, which shows that only system pods (kindnet and kube-proxy, both managed by DaemonSets) are still running:

Non-terminated Pods:          (2 in total)
  Namespace                   Name                CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                ------------  ----------  ---------------  -------------  ---
  kube-system                 kindnet-2gjjw       100m (2%)     100m (2%)   50Mi (1%)        50Mi (1%)      10h
  kube-system                 kube-proxy-54tlm    0 (0%)        0 (0%)      0 (0%)           0 (0%)         10h
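A DaemonSet runs one pod per eligible node, so its pods are expected to stay on the node after a drain. To check this programmatically, the Python sketch below scans pod objects shaped like the items returned by kubectl get pods --all-namespaces -o json and reports any pod on the node whose ownerReferences do not include a DaemonSet. The function name is hypothetical, and the sample data reuses the two pod names from the output above with assumed owner names.

```python
import json

# Two pods shaped like `kubectl get pods -o json` items (owner names assumed)
SAMPLE = json.loads("""
{"items": [
  {"metadata": {"name": "kindnet-2gjjw", "namespace": "kube-system",
                "ownerReferences": [{"kind": "DaemonSet", "name": "kindnet"}]},
   "spec": {"nodeName": "multinode-m02"}},
  {"metadata": {"name": "kube-proxy-54tlm", "namespace": "kube-system",
                "ownerReferences": [{"kind": "DaemonSet", "name": "kube-proxy"}]},
   "spec": {"nodeName": "multinode-m02"}}
]}
""")


def non_daemonset_pods(pod_list, node_name):
    """Return namespace/name of pods on node_name that no DaemonSet owns."""
    leftovers = []
    for pod in pod_list["items"]:
        if pod["spec"].get("nodeName") != node_name:
            continue
        owners = pod["metadata"].get("ownerReferences", [])
        if not any(ref.get("kind") == "DaemonSet" for ref in owners):
            leftovers.append(f'{pod["metadata"]["namespace"]}/{pod["metadata"]["name"]}')
    return leftovers


print(non_daemonset_pods(SAMPLE, "multinode-m02"))  # [] means the node is drained
```

Against a real cluster you could feed it the parsed JSON output of kubectl get pods --all-namespaces -o json instead of the hard-coded sample.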

Finally, you can perform maintenance on the cordoned node or replace it. To put the node back into use, mark it as schedulable again by running:

kubectl uncordon <node-name>

Kubernetes: Node Drain by example
