In this step-by-step tutorial, learn how to run MySQL, PostgreSQL, MongoDB, and other stateful applications on Kubernetes.
Even though almost no one questions using Kubernetes (K8s) to manage container applications today, many engineers (including me) remain very skeptical about running databases on Kubernetes. Databases are typically stateful applications that require persistent storage and data consistency, whereas Kubernetes built its reputation on stateless applications. Therefore, to run databases on Kubernetes, you must make sure your cluster can provide persistent storage, backup and restore, and high availability and failover.
In this tutorial, I’ll use the example of creating and running a MySQL database on Kubernetes to demonstrate how to manage stateful applications in Kubernetes. I will dive into key concepts such as StatefulSets, PersistentVolumes (PVs), PersistentVolumeClaims (PVCs) and StorageClasses. I’ll assume that you already have an understanding of both databases and Kubernetes.
Before I begin, it is vital to understand the difference between stateless and stateful applications. Stateless applications do not keep data between requests; each request is processed independently, without relying on data from earlier requests. Stateful applications keep data between requests and across sessions. Workloads like databases need that data to be persistent.
Key Concepts for Running Databases on Kubernetes
Running databases such as MySQL, PostgreSQL, and MongoDB on Kubernetes requires careful planning around persistent storage, stable network identities, and scaling strategies. The following details need to be considered when running a database in Kubernetes.
Database Storage
Each database pod needs its own PV so that its data persists: even if the pod is deleted or restarted, the data remains intact. Each database pod is assigned a dedicated PVC, which is bound to its own PV.
Scaling Databases
When scaling databases, it is very important to ensure data consistency. StatefulSets make it possible to run a leader-follower (primary-secondary) architecture, such as a PostgreSQL or MySQL primary with read-only replicas, because each pod keeps a stable identity and its own storage. The primary database handles writes, while the secondaries replicate its changes, providing redundancy. Note that the StatefulSet only supplies the stable identities and storage; the replication itself must still be configured in the database.
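To make the primary-replica split more concrete, here is a minimal sketch of per-role MySQL settings, similar to the replicated MySQL example in the Kubernetes documentation. It assumes a ConfigMap (hypothetically named mysql) plus an init container, not shown, that copies primary.cnf into pod mysql-0 and replica.cnf into every other pod based on the pod's ordinal. Setting up the replication channel itself is a separate step.

```yaml
# Sketch only: per-role MySQL settings for a primary-replica topology.
# Assumes an init container (not shown) copies primary.cnf into mysql-0
# and replica.cnf into all other pods, chosen by the pod's ordinal.
# Each server also needs a unique server-id, e.g. derived from the ordinal.
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql          # hypothetical name
  labels:
    app: mysql
data:
  primary.cnf: |
    # Primary only: enable the binary log that replicas read from.
    [mysqld]
    log-bin
  replica.cnf: |
    # Replicas only: reject client writes, including from SUPER users.
    [mysqld]
    super-read-only
```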
Data Consistency and Backups
It is crucial to have a strategy to ensure data consistency across all database replicas and validate the integrity of the data. Regular backups and disaster recovery plans should be incorporated into your Kubernetes workflows. This must include routine (weekly or monthly) disaster recovery tests to validate the integrity of the database backup.
StatefulSets
A StatefulSet is a Kubernetes resource designed for managing stateful applications such as databases. It ensures that each pod has its own persistent storage and that data remains intact even when pods are restarted. Key features of StatefulSets include:
- Persistent storage: StatefulSets use PVs, provisioned per pod through volumeClaimTemplates, so that each pod has dedicated, stable storage that remains intact even after the pod restarts.
- Stable network identifiers: Every pod in a StatefulSet receives a unique, ordinal-based name that stays the same across restarts and rescheduling; for example: mypod-0, mypod-1, mypod-2.
Tutorial: Create a Database on Kubernetes
To create a StatefulSet application (such as a database) on Kubernetes, follow this step-by-step guide.
Step 1: Create a StorageClass (if You Don’t Have One)
A StorageClass in Kubernetes is similar in concept to a profile: it describes a class of storage that the cluster offers. It names the provisioner that creates the volumes and passes provisioner-specific parameters to it, such as the EBS volume type on AWS (for example, gp2 or gp3), along with settings like the reclaim policy for the PVs it creates. You can mark one StorageClass as the default; it is then used for dynamic volume provisioning of any PVC that does not specify a storage class.
Here is an example of a storage class created for Amazon EKS.
Create a new file called storage-class.yaml and copy this code into the file.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: ebs.csi.aws.com # Amazon EBS CSI driver (must be installed on EKS). Use the correct provisioner for your cloud provider (AWS, GCP, Azure, etc.)
parameters:
  type: gp3
reclaimPolicy: Retain
Create the storage class by running:
kubectl apply -f storage-class.yaml
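As mentioned above, you can mark one StorageClass as the cluster default so that PVCs without a storageClassName still get a volume. Here is a minimal sketch for EKS, assuming the Amazon EBS CSI driver is installed; the class name gp3-default is hypothetical. Setting volumeBindingMode to WaitForFirstConsumer delays provisioning until a pod using the claim is scheduled, so the EBS volume is created in the same availability zone as that pod. Make sure only one class in the cluster carries the default annotation.

```yaml
# Sketch: a default StorageClass for EKS (assumes the Amazon EBS CSI driver).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-default   # hypothetical name
  annotations:
    # Marks this class as the default for PVCs that omit storageClassName.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
# reclaimPolicy is omitted, so it defaults to Delete.
# Provision the volume only once a pod using the claim is scheduled,
# so the EBS volume lands in that pod's availability zone.
volumeBindingMode: WaitForFirstConsumer
# Allow PVCs of this class to be resized later.
allowVolumeExpansion: true
```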
Step 2: Create a PersistentVolume (PV)
A PV is a piece of storage in your Kubernetes cluster. If dynamic provisioning is enabled through a StorageClass, Kubernetes creates PVs automatically and you can skip this step. Otherwise, you can create one manually. Note that a hostPath volume stores data on a single node’s filesystem, so it is suitable only for single-node test clusters, not for production.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: standard
  hostPath:
    path: /mnt/data # Specify a path on the host for storage
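Step 3: Create a PersistentVolumeClaim (PVC)
A PVC serves as an interface between your application and the storage it requests: the application asks for a size and access mode, and Kubernetes binds the claim to a matching PV or provisions one dynamically. Note that the StatefulSet in Step 4 generates its own PVCs from its volumeClaimTemplates, so a standalone claim like this one is needed only by workloads that reference it by name.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard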
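A claim does nothing until a pod references it. As a quick check that mysql-pvc binds and is writable, a throwaway pod like the one below can mount it. The pod name pvc-check and the busybox image are illustrative choices, not part of the original setup. The probe file is removed afterwards, because MySQL refuses to initialize a data directory that already contains files.

```yaml
# Sketch: a short-lived pod that mounts mysql-pvc to confirm it binds and is writable.
apiVersion: v1
kind: Pod
metadata:
  name: pvc-check   # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: writer
      image: busybox:1.36
      # Write a file to the volume, read it back, then remove it
      # so the volume is left empty for a later MySQL data directory.
      command: ["sh", "-c", "echo ok > /data/probe.txt && cat /data/probe.txt && rm /data/probe.txt"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: mysql-pvc   # the claim created above
```

After applying it, kubectl logs pvc-check should print ok. Delete the pod once the check passes, because a ReadWriteOnce volume can be mounted read-write by only one node at a time.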
Step 4: Deploy a MySQL StatefulSet
This code snippet creates a StatefulSet for MySQL. Each MySQL pod (instance) gets its own unique identifier, stable network identity and persistent storage: the volumeClaimTemplates section creates a dedicated PVC for every pod (mysql-storage-mysql-0, mysql-storage-mysql-1, and so on). Note that this manifest starts three independent MySQL servers; it does not configure replication between them. Also, do not keep the root password in clear text as this example does; load it from a Kubernetes Secret or an external vault instead.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: "mysql"
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:5.7 # MySQL 5.7 is end-of-life; consider a currently supported release
          ports:
            - containerPort: 3306
              name: mysql
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: "your_password" # For illustration only; do not store passwords in clear text
          volumeMounts:
            - name: mysql-storage
              mountPath: /var/lib/mysql
  volumeClaimTemplates:
    - metadata:
        name: mysql-storage
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Gi
        storageClassName: standard
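As noted above, the root password should not live in the StatefulSet manifest. Here is a minimal sketch of the Secret-based approach; the Secret name mysql-secret and the key root-password are hypothetical. Keep in mind that Secret values are only base64-encoded by default, so do not commit this file to version control; an external vault remains the stronger option.

```yaml
# Sketch: keep the MySQL root password in a Secret instead of the StatefulSet.
apiVersion: v1
kind: Secret
metadata:
  name: mysql-secret   # hypothetical name
type: Opaque
stringData:
  # stringData accepts plain text; Kubernetes stores it base64-encoded.
  root-password: "change-me"   # placeholder value
# Then, in the StatefulSet container spec, replace the clear-text env entry with:
#   env:
#     - name: MYSQL_ROOT_PASSWORD
#       valueFrom:
#         secretKeyRef:
#           name: mysql-secret
#           key: root-password
```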
Step 5: Create a Headless Service for MySQL
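Create a headless service for the MySQL StatefulSet so that the pods can communicate with each other in the Kubernetes cluster. A Service is headless when clusterIP is set to None: instead of load-balancing through a single virtual IP, DNS returns the addresses of the individual pods. The headless service in the example below is named mysql, which matches the StatefulSet’s serviceName field. Each MySQL pod is then reachable within the cluster at a stable DNS name of the form <pod-name>.mysql (for example, mysql-0.mysql) from any pod in the same Kubernetes namespace.
# Headless service
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  clusterIP: None # Makes the Service headless
  ports:
    - name: mysql
      port: 3306
  selector:
    app: mysql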
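To confirm that the per-pod DNS names resolve, you can run a short-lived client pod in the same namespace. This sketch runs mysqladmin ping from the same mysql:5.7 image; the pod name mysql-dns-check is hypothetical. According to the MySQL documentation, mysqladmin ping exits with status 0 whenever the server is running, even if it rejects the connection with an access-denied error, so no password is needed for this check.

```yaml
# Sketch: a one-off pod that checks mysql-0 is reachable through the headless Service.
apiVersion: v1
kind: Pod
metadata:
  name: mysql-dns-check   # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: check
      image: mysql:5.7
      # mysql-0.mysql = <pod-name>.<headless-service-name>, resolvable within the namespace.
      command: ["mysqladmin", "ping", "-h", "mysql-0.mysql"]
```

If the pod ends in the Completed state, the name resolved and the server answered; kubectl logs mysql-dns-check shows the details.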
Step 6: Pipe MySQL Logs to Monitoring Tools
Monitoring MySQL is essential for identifying performance bottlenecks and errors and for keeping the database healthy. Logs and metrics from the MySQL StatefulSet can be routed to monitoring tools such as Datadog, Grafana, Prometheus and Elasticsearch (the ELK Stack) to get full visibility into the performance and health of the database.
You need to configure MySQL to write the logs you care about and ship them to your monitoring tools. Commonly monitored logs include:
- Slow query logs identify slow-running queries.
- Error logs track errors and warnings.
- General query logs track all MySQL queries.
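Here is a minimal sketch of enabling these logs through a ConfigMap. It assumes the ConfigMap (hypothetically named mysql-logging) is mounted as an extra .cnf file under /etc/mysql/conf.d, the directory the official MySQL image documentation uses for custom configuration, and that a writable volume at /var/log/mysql is shared with a log shipper, such as a sidecar or node-level agent, that forwards the files to your monitoring tool. The paths and the 2-second threshold are illustrative.

```yaml
# Sketch: MySQL logging settings delivered as an extra .cnf file.
# Assumptions (not shown): the file is mounted into the mysql container at
# /etc/mysql/conf.d/logging.cnf (for example with subPath), and a writable
# volume such as an emptyDir is mounted at /var/log/mysql and shared with
# a log-shipping sidecar.
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-logging   # hypothetical name
data:
  logging.cnf: |
    [mysqld]
    # Slow query log: statements that run longer than long_query_time seconds.
    slow_query_log = 1
    slow_query_log_file = /var/log/mysql/slow.log
    long_query_time = 2
    # General query log: every statement the server receives. It is very
    # verbose, so it stays off here; enable it only briefly for debugging.
    general_log = 0
    general_log_file = /var/log/mysql/general.log
    # The error log is left at the image default (typically the container's
    # stderr), so kubectl logs and node-level log agents can collect it.
```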
Step 7: Perform Regular Backups and Routine Restore
Perform regular backups to protect your Kubernetes workloads against data loss, and run routine restores to validate the integrity of the database.
Velero is an open-source tool for backing up and restoring Kubernetes cluster resources and PVs. It helps protect your applications and databases against data loss. Velero offers essential functionality such as cluster backup, restore, disaster recovery and scheduled backups. Keep in mind that a volume snapshot taken while MySQL is running is only crash-consistent; for application-consistent backups, quiesce or dump the database first, for example with Velero backup hooks. For more information, check out Velero’s documentation.
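Here is a minimal sketch of a scheduled Velero backup, assuming Velero is installed in the velero namespace and the MySQL StatefulSet runs in the default namespace. The schedule name, time and 30-day retention are illustrative. The label selector matches the app: mysql label used throughout this tutorial.

```yaml
# Sketch: a daily Velero backup of the MySQL pods and their volumes.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: mysql-daily   # hypothetical name
  namespace: velero   # assumes Velero is installed in the "velero" namespace
spec:
  # Cron syntax: every day at 02:00.
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
      - default       # assumes the StatefulSet runs in "default"
    labelSelector:
      matchLabels:
        app: mysql
    snapshotVolumes: true
    # Keep each backup for 30 days.
    ttl: 720h0m0s
```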
Step 8: Configure Database Alerts
In a Kubernetes environment running databases and other stateful applications, it is crucial to set up alert notifications so that you detect performance degradation, service disruption, downtime or data corruption before it escalates.
Monitoring tools such as Datadog, Nagios, Prometheus and Grafana can be used to monitor and check database health. They can be integrated with alert notification platforms such as Slack and PagerDuty, so an engineer will receive a notification (often a phone call) whenever there is a degradation in service or another issue with the database.
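For example, with Prometheus, alert rules for the MySQL pods could look like the sketch below. It assumes the Prometheus Operator (which provides the PrometheusRule resource), a mysqld_exporter sidecar exposing the mysql_up metric, and kubelet volume metrics being scraped. The rule names, thresholds and the release label your Prometheus uses to select rules are assumptions to adapt. Alertmanager would then route firing alerts to Slack or PagerDuty.

```yaml
# Sketch: Prometheus Operator alert rules for the MySQL StatefulSet.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mysql-alerts   # hypothetical name
  labels:
    release: prometheus   # assumption: must match your Prometheus ruleSelector
spec:
  groups:
    - name: mysql
      rules:
        # mysql_up comes from mysqld_exporter: 1 if it can reach MySQL, else 0.
        - alert: MySQLDown
          expr: mysql_up == 0
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "MySQL instance {{ $labels.instance }} is unreachable"
        # Less than 10% free space left on a MySQL data volume
        # (PVC names follow <template>-<statefulset>-<ordinal>).
        - alert: MySQLVolumeAlmostFull
          expr: |
            kubelet_volume_stats_available_bytes{persistentvolumeclaim=~"mysql-storage-mysql-.*"}
              / kubelet_volume_stats_capacity_bytes{persistentvolumeclaim=~"mysql-storage-mysql-.*"}
              < 0.10
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "PVC {{ $labels.persistentvolumeclaim }} is over 90% full"
```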
Conclusion
Running databases in Kubernetes creates unique challenges, including state management, persistent storage and stable network identity. By using Kubernetes resources such as StatefulSets, PersistentVolumes, PersistentVolumeClaims and StorageClasses, administrators can now comfortably manage database workloads in Kubernetes while preserving database integrity and availability.
As Kubernetes continues to evolve and its support for StatefulSets matures, running databases on Kubernetes is likely to become an even more powerful option for modern infrastructure.
This article was first published on https://thenewstack.io/how-to-run-databases-on-kubernetes-an-8-step-guide/