K8ssandra: easy Cassandra management on Kubernetes
📚 Introduction:
In this part of my data on Kubernetes series, I will look at how K8ssandra makes it easy to use Apache Cassandra on Kubernetes.
K8ssandra is a tool that helps you set up and manage Cassandra in a Kubernetes environment. It includes everything you need, like automated operations, monitoring, and backup solutions. This makes it simpler to handle Cassandra clusters.
Apache Cassandra overview:
Apache Cassandra is a highly scalable, distributed NoSQL database designed to handle large amounts of data across many servers without any single point of failure.
It was originally developed at Facebook to power their inbox search feature and later became an open-source project under the Apache Software Foundation.
Key features and benefits:
- Scalability: Cassandra is designed to scale horizontally by adding more nodes to the cluster. This allows it to handle more data and more requests without any downtime.
- High availability: With its peer-to-peer architecture, Cassandra ensures that there is no single point of failure. Data is replicated across multiple nodes, ensuring that it remains available even if some nodes fail.
- Fault tolerance: Cassandra's replication strategy ensures that data is copied to multiple nodes. If one node goes down, another node can take over, ensuring continuous availability.
- Flexible data model: Cassandra uses a wide-column store model, which allows for dynamic and flexible schema design. This is particularly useful for applications that require high write throughput.
- Tunable consistency: Cassandra offers tunable consistency levels, allowing you to balance between consistency and availability based on your application's needs.
- High performance: Designed for high write and read throughput, Cassandra can handle large volumes of data with low latency.
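To make the tunable consistency point above a bit more concrete, here is a small sketch of the arithmetic behind it. It assumes the usual Cassandra rule of thumb: a quorum is floor(RF / 2) + 1 replicas, and a read is guaranteed to see the latest write when the replicas contacted on read plus the replicas contacted on write exceed the replication factor (RF). The function names are mine, for illustration only.

```shell
# Quorum size for a given replication factor: floor(RF / 2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Succeeds when read replicas + write replicas > RF, i.e. every read
# overlaps with at least one replica that acknowledged the latest write.
# $1 = replicas contacted on read, $2 = replicas contacted on write, $3 = RF
is_strongly_consistent() {
  [ $(( $1 + $2 )) -gt "$3" ]
}

rf=3
q=$(quorum "$rf")
echo "RF=$rf QUORUM=$q"

if is_strongly_consistent "$q" "$q" "$rf"; then
  echo "QUORUM reads + QUORUM writes: strong consistency"
fi
if ! is_strongly_consistent 1 1 "$rf"; then
  echo "ONE reads + ONE writes: eventual consistency"
fi
```

With RF=3, QUORUM means 2 replicas, so QUORUM reads and writes always overlap, while ONE/ONE trades that guarantee for lower latency and higher availability.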
Use cases:
- Time-series data: Ideal for applications that need to store and query time-series data, such as IoT sensors and log data.
- Real-time analytics: Used in applications that require real-time data processing and analytics, such as recommendation engines and fraud detection.
- Messaging systems: Suitable for high-throughput messaging systems where low latency is crucial.
- E-commerce: Powers e-commerce platforms that need to handle large volumes of transactions and user data.
- Social media: Supports social media applications that require high availability and scalability to manage user interactions and content.
Components:
- Nodes: The basic unit of storage in Cassandra. Each node stores a part of the data.
- Clusters: A collection of nodes that work together. Data is distributed across the nodes in a cluster.
- Keyspaces: The top-level namespace in Cassandra, similar to a database in relational databases.
- Tables: Within keyspaces, tables store data in a structured format.
- Commit log: A log of all write operations, used for crash recovery.
- SSTables: Immutable data files that store data on disk.
- Cassandra Query Language (CQL): A SQL-like language used to interact with Cassandra.
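As a rough illustration of how keyspaces, tables, and CQL fit together, the sketch below writes a small schema file. The keyspace name (iot), the table, the datacenter name (dc1), and the file name are all hypothetical choices of mine, not something K8ssandra creates for you.

```shell
# Write an illustrative CQL schema to a file (all names are hypothetical).
cat > schema.cql <<'EOF'
-- Keyspace: top-level namespace; replication is configured per datacenter.
CREATE KEYSPACE IF NOT EXISTS iot
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

-- Table: one partition per sensor, rows sorted by time within the partition.
CREATE TABLE IF NOT EXISTS iot.sensor_readings (
  sensor_id uuid,
  reading_time timestamp,
  temperature double,
  PRIMARY KEY ((sensor_id), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);
EOF
```

Once a cluster is reachable, a file like this can be loaded with `cqlsh -f schema.cql`.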
Why Choose Cassandra?
- Scalability: Cassandra's ability to scale horizontally without downtime makes it ideal for growing applications.
- High availability: Its peer-to-peer architecture ensures no single point of failure, providing continuous availability.
- Performance: Optimized for high write and read throughput, making it suitable for applications with heavy data loads.
- Flexibility: The wide-column store model allows for flexible schema design, accommodating various data types and structures.
- Community and Support: As an open-source project, Cassandra has a large and active community, providing extensive resources and support.
Some alternatives to Apache Cassandra are ScyllaDB, a high-performance, low-latency NoSQL database designed as a drop-in replacement for Cassandra, and Amazon Keyspaces, a scalable, highly available, managed database service that is compatible with Apache Cassandra.
K8ssandra Overview:
K8ssandra is an open-source distribution of Apache Cassandra that is optimized for Kubernetes. It includes everything you need to run Cassandra in a Kubernetes environment, such as automated operations, monitoring, and backup solutions. K8ssandra simplifies the deployment and management of Cassandra clusters on Kubernetes, making it easier to achieve scalability and high availability.
Key features and benefits:
- Kubernetes native: K8ssandra is designed to run seamlessly on Kubernetes, leveraging Kubernetes' orchestration capabilities to manage Cassandra clusters.
- Automated operations: K8ssandra includes tools for automated deployment, scaling, and maintenance of Cassandra clusters, reducing the operational overhead.
- Monitoring and management: Integrated with tools like Prometheus and Grafana, K8ssandra provides robust monitoring and management capabilities, allowing you to keep an eye on your Cassandra clusters' health and performance.
- Backup and restore: K8ssandra includes backup and restore solutions, ensuring that your data is safe and can be recovered in case of failures.
- Helm charts: K8ssandra uses Helm charts for easy installation and configuration, making it simple to deploy Cassandra clusters on Kubernetes.
- Community support: As an open-source project, K8ssandra benefits from a vibrant community that contributes to its development and provides support.
Installing K8ssandra on EKS:
K8ssandra Operator may be deployed in one of two modes. Control-Plane mode is the default method of installation. A Control-Plane instance of K8ssandra Operator watches for the creation of, and changes to, K8ssandraCluster custom resources.
When Control-Plane mode is active, Cassandra resources may be created within the local Kubernetes cluster and/or remote Kubernetes clusters (in the case of multi-region deployments). When using K8ssandra Operator, you must have only one instance running in Control-Plane mode. Kubernetes clusters acting as remote regions for Cassandra deployments should run the operator in Data-Plane mode. In Data-Plane mode, K8ssandra Operator does not directly reconcile K8ssandraCluster resources.
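For a remote cluster that should run in Data-Plane mode, the operator's Helm chart can be given a values override. A minimal sketch, assuming the chart exposes the mode as a top-level `controlPlane` value (true by default); the file name is hypothetical.

```shell
# Helm values override for an operator instance running in Data-Plane mode.
# Assumption: the k8ssandra-operator chart reads a top-level "controlPlane" flag.
cat > data-plane-values.yaml <<'EOF'
controlPlane: false
EOF
```

On the remote cluster, this file would be passed to the operator installation with `-f data-plane-values.yaml`. The single-cluster setup in this post keeps the default Control-Plane mode and does not need it.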
Requirements:
Before you start, ensure you have the following:
- Amazon S3 Bucket - Backups for K8ssandra are stored within an Amazon Simple Storage Service (S3) Bucket.
- AWS Identity & Access Management (IAM) Role - This role is used by the EKS worker nodes to control access to the S3 bucket used for backups. See this IAM policy as an example. Note that your policy should limit requests to specific buckets and operations.
- kubectl installed and configured to access the cluster.
- Helm installed and configured to access the cluster.
- AWS CLI installed and configured.
- Cert Manager - Cert Manager provides a common API for management of Transport Layer Security (TLS) certificates. K8ssandra Operator uses this API for certificates used by the various K8ssandra components.
- eksctl installed.
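As an illustration of an IAM policy that limits requests to one bucket and a handful of operations, here is a minimal sketch. The bucket name and file name are hypothetical, and the action list is my assumption of a reasonable minimum; Medusa may need additional S3 actions in your setup.

```shell
# Hypothetical bucket name; replace with your own backup bucket.
BUCKET="my-k8ssandra-backups"

# Write a bucket-scoped policy document (the action list is an assumed minimum).
cat > medusa-s3-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::${BUCKET}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::${BUCKET}/*"
    }
  ]
}
EOF
```

The resulting file can be registered with `aws iam create-policy --policy-name <name> --policy-document file://medusa-s3-policy.json` and then attached to the role.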
Create the EKS cluster:
Create an EKS Cluster:
$ eksctl create cluster --name k8ssandra-cluster --version 1.31 \
  --region eu-west-1 --nodegroup-name standard-workers \
  --node-type t3.medium --nodes 3 --nodes-min 1 --nodes-max 4 --managed
This command creates an EKS cluster named k8ssandra-cluster in the eu-west-1 region with Kubernetes version 1.31. It sets up a managed node group of t3.medium instances that starts with 3 nodes and can be resized between 1 and 4 nodes.
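The same cluster can also be described declaratively, which is easier to keep in version control. A minimal sketch of an eksctl config file mirroring the flags above; the file name cluster.yaml is my choice.

```shell
# Declarative equivalent of the eksctl flags used above.
cat > cluster.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: k8ssandra-cluster
  region: eu-west-1
  version: "1.31"
managedNodeGroups:
  - name: standard-workers
    instanceType: t3.medium
    desiredCapacity: 3
    minSize: 1
    maxSize: 4
EOF
```

The cluster would then be created with `eksctl create cluster -f cluster.yaml`.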
Verify the Cluster:
$ eksctl get cluster --name k8ssandra-cluster --region eu-west-1
Ensure the cluster is up and running.
Prepare the EKS cluster for K8ssandra:
We can deploy K8ssandra Operator for namespace-scoped operations (the default), or cluster-scoped operations.
Deploying a namespace-scoped K8ssandra Operator means its operations, that is, watching for resources to deploy in Kubernetes, are specific only to the identified namespace within a cluster.
Deploying a cluster-scoped operator means its operations, again watching for resources to deploy in Kubernetes, are global to all namespaces in the cluster.
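This post sticks with the namespace-scoped default. If you want a cluster-scoped operator instead, a values override along these lines should do it; it assumes the chart's `global.clusterScoped` setting, and the file name is hypothetical.

```shell
# Helm values override that switches the operator to cluster scope.
# Assumption: the k8ssandra-operator chart reads global.clusterScoped (false by default).
cat > cluster-scoped-values.yaml <<'EOF'
global:
  clusterScoped: true
EOF
```

The file is passed to the operator installation with `-f cluster-scoped-values.yaml`.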
Add the K8ssandra Helm Repository:
$ helm repo add k8ssandra https://helm.k8ssandra.io/stable
$ helm repo update
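Cert Manager is listed in the requirements but needs to be present before the operator is installed. A minimal sketch of installing it with Helm, assuming the Jetstack chart repository and the `installCRDs` chart value (newer chart versions also accept `crds.enabled`). It is wrapped in a function, named by me, so that nothing touches the cluster until you call it.

```shell
# Install cert-manager from the Jetstack Helm repository.
# Defined as a function so that sourcing this snippet has no side effects.
install_cert_manager() {
  helm repo add jetstack https://charts.jetstack.io
  helm repo update
  # installCRDs=true asks the chart to install cert-manager's CRDs as well.
  helm install cert-manager jetstack/cert-manager \
    --namespace cert-manager --create-namespace \
    --set installCRDs=true
}
```

Run `install_cert_manager` once kubectl and Helm point at the new EKS cluster, and check the result with `kubectl get pods -n cert-manager`.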
Install K8ssandra:
Create a namespace for K8ssandra:
$ kubectl create namespace k8ssandra
Install K8ssandra:
$ helm install k8ssandra-operator k8ssandra/k8ssandra-operator -n k8ssandra
This command installs the K8ssandra Operator in the k8ssandra namespace.
Verify the Installation:
$ kubectl get pods -n k8ssandra
Ensure all pods are running and the installation is successful.
Deploying a K8ssandraCluster with Medusa and Reaper:
After installing the K8ssandra operator, we can now deploy a K8ssandraCluster with Medusa, Reaper, and metrics enabled.
The following YAML configuration includes the necessary components and annotations to ensure proper functionality and permissions.
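A minimal sketch of what such a k8sc.yaml might look like is shown here. Treat it as an illustration under assumptions rather than a tested manifest: the cluster name (demo), the Cassandra version, the gp2 storage class, the bucket name, and the sizes are all placeholders of mine, and the field names follow my understanding of the k8ssandra.io/v1alpha1 API.

```shell
# Write an illustrative K8ssandraCluster manifest (values are assumptions).
cat > k8sc.yaml <<'EOF'
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: demo                            # hypothetical cluster name
spec:
  cassandra:
    serverVersion: "4.0.1"              # assumed; pick a version your operator supports
    datacenters:
      - metadata:
          name: dc1
        size: 3
        storageConfig:
          cassandraDataVolumeClaimSpec:
            storageClassName: gp2       # assumed EKS storage class
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 5Gi
  medusa:
    storageProperties:
      storageProvider: s3
      bucketName: my-k8ssandra-backups  # hypothetical bucket
      region: eu-west-1
      prefix: demo
      credentialsType: role-based       # use the IAM role instead of a static secret
  reaper:
    autoScheduling:
      enabled: true
    deploymentMode: PER_DC
    heapSize: 512Mi
    keyspace: reaper_db
EOF
```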
where:
☑️ Medusa is a backup and restore tool for Apache Cassandra. It integrates with Kubernetes to provide automated backups to cloud storage (e.g., S3). When running in standalone mode, Medusa uses the default service account from the namespace. Ensure this service account has the necessary role annotation to write to the backup bucket.
This means:
- The service account should have permissions to write to the backup bucket.
- Every pod using the default service account will inherit the same permissions.
- Medusa should run with a specific Kubernetes service account annotated with the IAM Role ARN for IRSA.
- The IAM role should have permissions to write to the backup bucket and a trusted policy allowing it to assume a role with web identity, based on the namespace and Kubernetes service account name.
- Your Kubernetes cluster should have a correctly configured IAM OIDC provider.
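For the IRSA points above, the Kubernetes side boils down to a service account carrying the `eks.amazonaws.com/role-arn` annotation. A minimal sketch, where the account ID, role name, service account name, and file name are all hypothetical:

```shell
# Hypothetical IAM role ARN; replace with the role that can write to your bucket.
ROLE_ARN="arn:aws:iam::111122223333:role/k8ssandra-medusa"

# Service account annotated for IAM Roles for Service Accounts (IRSA).
cat > medusa-sa.yaml <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: medusa-backup        # hypothetical service account name
  namespace: k8ssandra
  annotations:
    eks.amazonaws.com/role-arn: ${ROLE_ARN}
EOF
```

The manifest is applied with `kubectl apply -f medusa-sa.yaml`. If the cluster does not yet have an IAM OIDC provider, `eksctl utils associate-iam-oidc-provider --cluster k8ssandra-cluster --region eu-west-1 --approve` sets one up.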
☑️ Reaper is a tool for managing and scheduling repairs in Apache Cassandra. It helps maintain the health of your Cassandra cluster by performing regular repairs to prevent data inconsistencies.
Key features include:
- Auto-scheduling: Automatically schedules repairs based on defined thresholds and intervals.
- Deployment mode: Can be deployed per datacenter (PER_DC) for better control.
- Heap size: Configurable heap size for managing memory usage.
- Keyspace: Uses a dedicated keyspace (reaper_db) for storing repair schedules and state.
☑️ Stargate is an open-source data gateway that provides a unified API layer for accessing Cassandra data. It supports multiple APIs, including REST, GraphQL, and CQL, making it easier to interact with Cassandra.
Key features include:
- API Gateway: Provides REST, GraphQL, and CQL APIs for accessing Cassandra data.
- Scalability: Can be scaled independently of the Cassandra nodes.
- Telemetry: Supports Prometheus for monitoring and metrics collection.
Create the K8ssandraCluster with kubectl apply:
$ kubectl apply -n k8ssandra -f k8sc.yaml
Access Reaper:
Reaper is an interface for managing K8ssandra cluster repairs. It is deployed alongside the cluster when Reaper is enabled in the K8ssandraCluster resource.
For details, start with the Reaper topic in the K8ssandra documentation, then read about the repair tasks you can perform with Reaper.
🔚 Conclusion:
K8ssandra simplifies the deployment and management of Apache Cassandra on Kubernetes, providing robust features like automated backups with Medusa, repairs with Reaper, and comprehensive monitoring with MCAC (Metric Collector for Apache Cassandra) and Vector.
Until next time 🎉
Thank you for reading!! 🙌🏻😁📃 See you in the next blog. 🤘
🚀 Thank you for sticking around until the end. If you have any questions or feedback regarding this blog, feel free to connect with me:
♻️ LinkedIn: https://www.linkedin.com/in/rajhi-saif/
♻️ Twitter: https://twitter.com/rajhisaifeddine
The end ✌🏻
🔰 Keep Learning !! Keep Sharing !!
References:
https://dok.community/blog/1000-node-cassandra-cluster-on-amazons-eks/
https://medium.com/rahasak/deploy-cassandra-cluster-on-kubernetes-with-k8ssandra-fd19c535376c
https://docs.k8ssandra.io/install/eks/
https://github.com/k8ssandra/k8ssandra