Taking a Backup of Your Kubernetes etcd Data: A Step-by-Step Guide

omkar kulkarni - Nov 1 '23 - Dev Community

This article was originally posted on Everything DevOps.

In the ever-evolving landscape of container orchestration, Kubernetes (K8s) has emerged as the gold standard for managing and scaling containerized applications. At the heart of every K8s cluster lies a critical component known as etcd. etcd is a distributed key-value store that stores and manages all of the cluster's configuration and state data, ensuring the system's reliability and consistency.

While K8s provides a robust platform for deploying and managing applications, the need to safeguard the etcd data cannot be overstated. This is where the importance of taking regular backups comes into play.

In this article, we'll look at why etcd backups are crucial for the stability and recoverability of your cluster, and then walk through taking and restoring one.

The Relationship between Kubernetes and etcd

At the core of Kubernetes sits etcd, an open-source distributed key-value store that acts as Kubernetes' primary database for configuration data and keeps the cluster consistent.
Etcd serves as the single source of truth, storing information about the cluster's state, configuration, and secrets. The API server is the only Kubernetes component that talks to etcd directly; the controller manager, the scheduler, and other components read and write cluster state through the API server, so they all depend on etcd to synchronize and manage containerized workloads across the cluster.

This tight integration makes etcd indispensable in maintaining the stability and reliability of a Kubernetes cluster, underlining the need for regular backups to safeguard this vital component.
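To see this single source of truth for yourself, you can list the keys the API server has stored in etcd. The following is a minimal sketch, assuming a kubeadm cluster, etcdctl on the control-plane node, and kubeadm's default certificate paths (the certificate flags are explained later in this guide). The helper name list_k8s_keys and the ETCD_PKI variable are hypothetical.

```shell
# Hypothetical helper: list the first N keys Kubernetes keeps in etcd.
# Assumes kubeadm's default certificate directory; override with ETCD_PKI.
ETCD_PKI=${ETCD_PKI:-/etc/kubernetes/pki/etcd}

list_k8s_keys() {
  limit=${1:-20}
  # Kubernetes stores its objects under the /registry/ prefix.
  # --keys-only output can contain blank separator lines, so drop them.
  ETCDCTL_API=3 etcdctl \
    --cacert "$ETCD_PKI/ca.crt" \
    --cert "$ETCD_PKI/server.crt" \
    --key "$ETCD_PKI/server.key" \
    get /registry/ --prefix --keys-only | grep -v '^$' | head -n "$limit"
}
```

For example, `list_k8s_keys 5` prints five keys. The full listing on a kubeadm cluster includes keys such as /registry/namespaces/default. Most values are stored in protobuf rather than plain text, which is why this sketch lists keys only.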

Why is it crucial to back up your Kubernetes etcd data?

Taking regular backups of etcd is crucial for the reliability and recoverability of your K8s cluster. Here are the key reasons:

  • Data Recovery: In the event of data loss or cluster-wide failures, etcd backups serve as a lifeline to restore your K8s cluster to a previously known state. This minimizes downtime and ensures business continuity.
  • Configuration History: etcd keeps only recent revisions of your cluster's state, because Kubernetes periodically compacts older ones. A series of regular backups gives you point-in-time records of the cluster, so you can trace configuration changes and troubleshoot issues over time.
  • Rollback: etcd backups let you roll back to a previous cluster state, for example to revert to a known-good configuration when an update or change causes problems.

Prerequisites

Before you learn how to take a backup of the etcd cluster, ensure you have the following prerequisites:

  • A Kubernetes cluster set up with kubeadm
  • Access to its etcd server (kubeadm runs etcd as a static pod on the control-plane node)

For demo purposes, I used the Killercoda Kubernetes playground.
To communicate with etcd, you'll need etcdctl, the command-line client for the etcd database. It is available on the Killercoda playground; kubeadm itself does not install it on the host, so on other clusters you may need to install it separately.
Older etcdctl releases (before etcd 3.4) default to version 2 of the etcd API, in which some operations are undefined or take different arguments. Since etcd 3.4, etcdctl defaults to the V3 API, but setting it explicitly does no harm on any version.
Next, you will tell etcdctl to use the V3 API, which is required for the snapshot functionality.

Setting ETCDCTL_API to version 3
To make etcdctl use the V3 API, you can either set the environment variable on each call, as in the following commands:

$ ETCDCTL_API=3 etcdctl snapshot save ...  
$ ETCDCTL_API=3 etcdctl snapshot restore ...

or export it once for the entire terminal session:

$ export ETCDCTL_API=3
$ etcdctl snapshot save ...
$ etcdctl snapshot restore ...
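If you are unsure which API version your etcdctl actually uses, you can ask it. The sketch below assumes an etcdctl 3.x binary whose `version` subcommand prints an `API version:` line when running in V3 mode; the helper name check_etcdctl_v3 is hypothetical.

```shell
# Hypothetical helper: succeed only if etcdctl reports a 3.x API version.
# It honours whatever ETCDCTL_API is currently set (or unset) in the shell.
check_etcdctl_v3() {
  # In V3 mode, "etcdctl version" prints a line such as "API version: 3.5".
  # Under the V2 API there is no matching 3.x line, so api ends up empty.
  api=$(etcdctl version 2>/dev/null | sed -n 's/^API version: *//p')
  case "$api" in
    3.*)
      echo "etcdctl is using API version $api"
      ;;
    *)
      echo "etcdctl is not using the v3 API; run: export ETCDCTL_API=3" >&2
      return 1
      ;;
  esac
}
```

Running `check_etcdctl_v3` after the `export` above should report a 3.x version; if it fails, the snapshot commands in the rest of this guide will not work as shown.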

How to Backup your Kubernetes etcd Data

To take a backup of the etcd database, you run the following command:

$ etcdctl snapshot save <backup-file>

To run this operation, you'll need to pass a few certificate flags. In a kubeadm cluster, the etcd server only serves clients that authenticate with a certificate signed by its CA, and etcdctl in turn verifies the server's certificate before trusting it with sensitive data. Because both sides present certificates, this authentication scheme is called mutual TLS (mTLS).

To learn more about the flags, run:

$ etcdctl snapshot save -h

The output of the above command should look like this:

You'll need four arguments to back up etcd successfully (the last one is optional):

  1. --cacert
  2. --cert
  3. --key
  4. --endpoints (Optional)

Let’s look into these arguments, what they are, and why you should pass them.
1. --cacert
This is the path to the Certificate Authority (CA) certificate. etcdctl uses it to verify the TLS certificate that the etcd server presents; the server's certificate must be signed by this CA. Creating the CA is one of the tasks you need to do when building a cluster, and kubeadm does it automatically.
2. --cert
This is the path to the TLS certificate that etcdctl sends to the etcd server. The etcd server verifies that this certificate is also signed by the same CA. The certificate contains etcdctl's public key; during the TLS handshake, the server uses it to check a signature that etcdctl makes with the matching private key, which proves that etcdctl really owns the certificate.
3. --key
This is the path to the private key that matches the certificate passed with --cert. etcdctl uses it during the TLS handshake to prove that it owns that certificate. The key is only used by the etcdctl process; it is never sent to the server.
4. --endpoints (optional)
The --endpoints argument tells etcdctl where to find the etcd server. If you run the command on the same host as the etcd service and there is only one etcd instance listening on the default client port, you do not need this argument, because it defaults to 127.0.0.1:2379.
If etcd listens on a different port, pass that port instead of 2379:
--endpoints https://127.0.0.1:<port>
If etcd runs on a remote host, pass that host's address:
--endpoints https://<host-ip>:<port>
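You can check the certificate relationships described above with openssl before running etcdctl: the certificate passed to --cert should be signed by the CA passed to --cacert, and the key passed to --key should belong to that certificate. This is a minimal sketch, assuming openssl is installed on the control-plane node; check_etcd_certs is a hypothetical helper name. It only inspects the files offline; the actual mutual authentication still happens in the TLS handshake.

```shell
# Hypothetical helper: check_etcd_certs CA_CERT CERT KEY
# Checks offline that CERT chains to CA_CERT and that KEY matches CERT.
check_etcd_certs() {
  ca=$1 cert=$2 key=$3

  # 1. Signature check: is CERT signed by the CA? (The other side of the
  #    TLS connection performs the same check during the handshake.)
  if ! openssl verify -CAfile "$ca" "$cert" >/dev/null 2>&1; then
    echo "check_etcd_certs: $cert is not signed by $ca" >&2
    return 1
  fi

  # 2. Key pair check: the public key inside CERT must be the one derived
  #    from KEY. Both commands print the public key in the same PEM format.
  cert_pub=$(openssl x509 -in "$cert" -noout -pubkey 2>/dev/null)
  key_pub=$(openssl pkey -in "$key" -pubout 2>/dev/null)
  if [ -z "$key_pub" ] || [ "$cert_pub" != "$key_pub" ]; then
    echo "check_etcd_certs: $key does not match $cert" >&2
    return 1
  fi

  echo "OK: $cert is signed by $ca and matches $key"
}
```

On a kubeadm control-plane node you would run it as root (the key file is readable only by root), for example `check_etcd_certs /etc/kubernetes/pki/etcd/ca.crt /etc/kubernetes/pki/etcd/server.crt /etc/kubernetes/pki/etcd/server.key`.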

Where to find the values of these arguments?
In a kubeadm cluster, etcd runs as a pod in the kube-system namespace, so you can describe that pod to see all of its arguments and their values. The pod name ends with the node name; on the playground, the control-plane node is called controlplane:

$ kubectl describe -n kube-system pod etcd-controlplane

Because this output contains a lot of information we don't need right now, you can use grep to extract only the lines that mention files:

$ kubectl describe -n kube-system pod etcd-controlplane | grep -i file 

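As you can see, all of these certificates are stored in /etc/kubernetes/pki/etcd, so you can also find them directly on the controlplane node. Each etcd flag gives the value for one etcdctl argument: --trusted-ca-file for --cacert, --cert-file for --cert, and --key-file for --key.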
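Instead of copying the paths by hand, you can extract them from the same output. The sketch below reads `kubectl describe` output (or the static pod manifest) on standard input, pairs etcd's certificate flags with etcdctl's arguments, and uses the first URL in etcd's --listen-client-urls flag as the endpoint. It assumes kubeadm's flag layout of one `--flag=value` per line; etcd_flag and etcd_cert_args are hypothetical helper names.

```shell
# Hypothetical helpers; they read kubeadm's etcd flags from standard input.

# etcd_flag NAME: print the value of --NAME=VALUE (first match only).
# Accepts "--name=value" (kubectl describe) and "- --name=value" (manifest).
# The anchored pattern keeps --cert-file from matching --peer-cert-file.
etcd_flag() {
  sed -n "s/^[[:space:]-]*--$1=//p" | head -n 1
}

# etcd_cert_args: print etcdctl arguments derived from etcd's own flags.
etcd_cert_args() {
  text=$(cat)
  cacert=$(printf '%s\n' "$text" | etcd_flag trusted-ca-file) # -> --cacert
  cert=$(printf '%s\n' "$text" | etcd_flag cert-file)         # -> --cert
  key=$(printf '%s\n' "$text" | etcd_flag key-file)           # -> --key
  # kubeadm typically lists the loopback URL first, e.g. https://127.0.0.1:2379
  endpoint=$(printf '%s\n' "$text" | etcd_flag listen-client-urls | cut -d, -f1)

  if [ -z "$cacert" ] || [ -z "$cert" ] || [ -z "$key" ]; then
    echo "etcd_cert_args: certificate flags not found in input" >&2
    return 1
  fi
  printf '%s %s %s %s %s %s' --cacert "$cacert" --cert "$cert" --key "$key"
  if [ -n "$endpoint" ]; then
    printf ' %s %s' --endpoints "$endpoint"
  fi
  printf '\n'
}
```

Usage: `kubectl describe -n kube-system pod etcd-controlplane | etcd_cert_args`, or `etcd_cert_args < /etc/kubernetes/manifests/etcd.yaml` on the control-plane node. Reading the flags from etcd itself keeps the backup command correct even if your cluster uses non-default certificate paths.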

The final backup command is:

$ ETCDCTL_API=3 etcdctl snapshot save \
      --cacert /etc/kubernetes/pki/etcd/ca.crt \
      --cert /etc/kubernetes/pki/etcd/server.crt \
      --key /etc/kubernetes/pki/etcd/server.key \
      /opt/etcd-backup.db

Here, /opt/etcd-backup.db is the file the snapshot is written to.
You should see output similar to this:
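To take regular backups, you can wrap the backup command in a function that writes a timestamped snapshot, fails loudly if the file is missing or empty, and deletes old snapshots so the directory does not grow forever. This is a minimal sketch, assuming the kubeadm certificate paths used in this guide; backup_etcd, prune_etcd_backups, the etcd-backup-<timestamp>.db naming, and keeping seven snapshots are hypothetical choices, not etcd defaults.

```shell
# Hypothetical helpers for scheduled etcd backups on a kubeadm control plane.
ETCD_PKI=${ETCD_PKI:-/etc/kubernetes/pki/etcd}

# backup_etcd [DIR]: save a timestamped snapshot into DIR (default /opt)
# and print the snapshot's path on stdout.
backup_etcd() {
  dir=${1:-/opt}
  file="$dir/etcd-backup-$(date +%Y%m%d-%H%M%S).db"
  # etcdctl's progress messages go to stderr so stdout carries only the path.
  if ! ETCDCTL_API=3 etcdctl snapshot save \
      --cacert "$ETCD_PKI/ca.crt" \
      --cert "$ETCD_PKI/server.crt" \
      --key "$ETCD_PKI/server.key" \
      "$file" >&2; then
    echo "backup_etcd: etcdctl snapshot save failed" >&2
    return 1
  fi
  # Treat a missing or zero-length file as a failed backup.
  if [ ! -s "$file" ]; then
    echo "backup_etcd: $file is missing or empty" >&2
    return 1
  fi
  printf '%s\n' "$file"
}

# prune_etcd_backups DIR [KEEP]: delete all but the KEEP newest
# etcd-backup-*.db files in DIR (default KEEP is 7).
# Assumes file names without spaces or newlines.
prune_etcd_backups() {
  dir=$1 keep=${2:-7}
  # ls -t lists newest first; tail skips the first KEEP entries.
  ls -1t "$dir"/etcd-backup-*.db 2>/dev/null |
    tail -n +"$((keep + 1))" |
    while read -r old; do
      rm -f -- "$old"
    done
}
```

For example, `backup_etcd /opt && prune_etcd_backups /opt 7` takes a snapshot and keeps only the seven newest ones; you could run that line from cron on the control-plane node. Copying the snapshots off the node as well protects them if the node itself is lost.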

Restoring from a backup

Normally you restore the snapshot into a new directory and then point the etcd service at that location. For restores, the certificate and endpoint arguments are not required, because etcdctl only creates files in a directory and does not talk to the etcd API. The only argument you need is --data-dir, which tells etcdctl where to put the restored files. To see all the restore flags, run:

$ etcdctl snapshot restore -h

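You can pass any path to --data-dir, but it should be a new directory: etcdctl refuses to restore into a data directory that already exists and is not empty. Note that etcd 3.5 deprecates etcdctl snapshot restore in favor of etcdutl snapshot restore, which accepts the same --data-dir argument.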

The final restore command is:

$ ETCDCTL_API=3 etcdctl snapshot restore \
      --data-dir /var/lib/etcd-from-backup \
      /opt/etcd-backup.db

The above command will output the following:
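After the restore, etcd is still running against its old data directory. On a kubeadm cluster, one common way to switch it is to edit the etcd static pod manifest so that the hostPath volume pointing at /var/lib/etcd points at the restored directory instead; the kubelet watches the manifests directory and recreates the etcd pod when the file changes. This is a minimal sketch, assuming the default manifest path /etc/kubernetes/manifests/etcd.yaml and the default data directory /var/lib/etcd; repoint_etcd_data is a hypothetical helper name.

```shell
# Hypothetical helper: repoint_etcd_data NEW_DIR [MANIFEST]
# Points the hostPath volume "path: /var/lib/etcd" of a kubeadm etcd
# static pod manifest at NEW_DIR.
repoint_etcd_data() {
  new_dir=$1
  manifest=${2:-/etc/kubernetes/manifests/etcd.yaml}
  if [ ! -d "$new_dir" ]; then
    echo "repoint_etcd_data: $new_dir does not exist; restore first" >&2
    return 1
  fi
  # Keep the original outside the manifests directory: the kubelet may try
  # to run other files it finds there as static pods.
  cp "$manifest" "${TMPDIR:-/tmp}/etcd.yaml.orig" || return 1
  tmp=$(mktemp) || return 1
  # Rewrite only the hostPath line. The container's mountPath and --data-dir
  # stay unchanged because the new host directory is mounted at
  # /var/lib/etcd inside the pod.
  if sed "s#^\([[:space:]]*\)path: /var/lib/etcd\$#\1path: $new_dir#" \
      "$manifest" > "$tmp"; then
    cat "$tmp" > "$manifest"
    rc=$?
  else
    rc=1
  fi
  rm -f "$tmp"
  return "$rc"
}
```

For example, after restoring into /var/lib/etcd-from-backup, run `repoint_etcd_data /var/lib/etcd-from-backup` as root and wait for the etcd and API server pods to come back; `crictl ps` on the node is handy here because kubectl may be unavailable for a short while.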

Conclusion

This article described how to take a backup of etcd in a Kubernetes cluster and restore it, so that you can recover from data loss and cluster-wide failures.
There is much more to learn about Kubernetes and etcd. Check out the following resources to explore more:
