In the last few years, Prometheus has gained huge popularity as a tool for monitoring distributed systems. It has a simple yet powerful data model and query language, but high availability and long-term storage of historical metric data can be a challenge. Adding more Prometheus replicas improves availability, but Prometheus on its own does not offer continuous availability. For example, if one of the Prometheus replicas crashes, there will be a gap in the metric data for the time it takes to fail over to another Prometheus instance. Similarly, Prometheus’s local storage is limited in scalability and durability given its single-node architecture, so you have to rely on a remote storage system for long-term data retention. This is where the CNCF sandbox project Thanos comes in handy.
Thanos is a set of components that can be composed into a highly available metrics system with unlimited storage capacity on Google Cloud Storage, S3, or other supported object stores, and it runs seamlessly on top of existing Prometheus deployments. Thanos allows you to query multiple Prometheus instances at once and merges data for the same metric across those instances on the fly to produce a continuous stream of metric data. Even though Thanos is an early-stage project, it is already used in production by companies like Adobe and eBay.
Because YugabyteDB is a cloud native, distributed SQL database, it can easily interoperate with Thanos and many other CNCF projects like Longhorn, OpenEBS, Rook, and Falco.
What’s YugabyteDB?
It is an open source, high-performance distributed SQL database built on a scalable and fault-tolerant design inspired by Google Spanner. Yugabyte’s SQL API (YSQL) is PostgreSQL wire compatible.
In this blog post we’ll show you how to get up and running with Thanos so that it can be used to monitor a YugabyteDB cluster, all running on Google Kubernetes Engine (GKE).
Thanos Architecture
At a high level, Thanos has several key components, and it is worth understanding how they work together.
- First, a sidecar is deployed alongside the Prometheus container and interacts with Prometheus.
- Next, an additional service called Thanos Query is deployed. It is configured to be aware of all instances of the Thanos Sidecar. Instead of querying Prometheus directly, you query the Thanos Query component.
- Thanos Query communicates with the Thanos Sidecar via gRPC and de-duplicates metrics across all instances of Prometheus when executing a query. Thanos Query also delivers a graphical user interface for querying and administration, plus exposes the Prometheus API.
An illustration of the components is shown below. You can learn more about the Thanos architecture by checking out the documentation.
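To make the de-duplication step more concrete, here is a minimal shell sketch of the idea. It is a toy illustration and not Thanos's actual algorithm. It assumes a hypothetical text format of `timestamp replica value` lines for a single metric, and keeps one sample per timestamp so that a gap in one replica is filled by the other.

```shell
# Toy illustration only: merge samples of one metric scraped by two replicas.
# Input format (hypothetical): "<timestamp> <replica> <value>", one sample per line.
dedup_samples() {
  # Sort by timestamp (stable on ties), then keep the first sample seen per timestamp.
  sort -s -n -k1,1 | awk '!seen[$1]++ { print $1, $3 }'
}

# replica-0 misses timestamp 30, replica-1 misses timestamp 20.
printf '%s\n' \
  '10 replica-0 1.0' '20 replica-0 1.5' '40 replica-0 2.5' \
  '10 replica-1 1.0' '30 replica-1 2.0' '40 replica-1 2.5' | dedup_samples
# prints:
# 10 1.0
# 20 1.5
# 30 2.0
# 40 2.5
```

The merged output has no gap even though each replica is missing one scrape, which is the behavior you get from querying Thanos Query instead of a single Prometheus instance.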
Why Thanos and YugabyteDB
Because YugabyteDB already integrates with Prometheus, Thanos can be used as a resilient monitoring platform for YugabyteDB clusters that can also store the metric data long term. It ensures the continuous availability of YugabyteDB metric data by aggregating the data from multiple Prometheus instances into a single view.
Prerequisites
Here is the environment required for the setup:
- YugabyteDB – Version 2.1.6
- Prometheus Operator – Version 2.2.1
- Thanos – Version 0.12.2
- A Google Cloud Platform account
Setting Up a Kubernetes Cluster on Google Cloud Platform
Assuming you have a Google Cloud Platform account, the first step is to set up a Kubernetes cluster using GKE.
The usual defaults should be sufficient. For the purposes of this demo I chose Machine type: n1-standard-4 (4 vCPU, 15 GB memory).
Install YugabyteDB on GKE with Helm
Once your Kubernetes cluster is up and running, log into the shell and work through the following commands to get a YugabyteDB cluster deployed using Helm 3.
Create a namespace
$ kubectl create namespace yb-demo
Add the charts repository
$ helm repo add yugabytedb https://charts.yugabyte.com
Fetch updates from the repository
$ helm repo update
Install YugabyteDB
We are now ready to install YugabyteDB. In the command below we’ll be specifying values for a resource-constrained environment.
$ helm install yb-demo yugabytedb/yugabyte \
--set resource.master.requests.cpu=0.5,resource.master.requests.memory=0.5Gi,\
resource.tserver.requests.cpu=0.5,resource.tserver.requests.memory=0.5Gi --namespace yb-demo
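The `--set` value is a single comma-separated list, and a stray space inside it would split it into separate shell arguments. One way to avoid that in long lists is to build the value programmatically. The sketch below uses a hypothetical helper named `join_by_comma` and only prints the resulting command so it can be reviewed before running.

```shell
# Join all arguments with commas (no spaces), suitable for one helm --set value.
join_by_comma() {
  local IFS=','
  printf '%s\n' "$*"
}

SET_ARGS=$(join_by_comma \
  resource.master.requests.cpu=0.5 \
  resource.master.requests.memory=0.5Gi \
  resource.tserver.requests.cpu=0.5 \
  resource.tserver.requests.memory=0.5Gi)

# Print the command instead of executing it.
echo helm install yb-demo yugabytedb/yugabyte --set "$SET_ARGS" --namespace yb-demo
```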
To check the status of the YugabyteDB cluster, execute the command below:
$ helm status yb-demo -n yb-demo
From the screenshot above we can see that the external IP is 35.239.XX.XX and that the YSQL port is 5433. You can use this information to connect to YugabyteDB with your favorite database admin tool, such as DBeaver, pgAdmin, or TablePlus. For more information, check out the third-party tools documentation.
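Because YSQL is PostgreSQL wire compatible, most tools accept a standard PostgreSQL connection URI. The sketch below composes one from the external IP and port. The helper name `ysql_uri` is hypothetical, and the default user and database name `yugabyte` are assumptions that are not stated in this post.

```shell
# Build a PostgreSQL-style connection URI for YSQL.
# Arguments: <host> [port] [user] [database]; the defaults are assumptions.
ysql_uri() {
  printf 'postgresql://%s@%s:%s/%s\n' "${3:-yugabyte}" "$1" "${2:-5433}" "${4:-yugabyte}"
}

ysql_uri 35.239.XX.XX
# prints: postgresql://yugabyte@35.239.XX.XX:5433/yugabyte
```

The resulting string can be pasted into a client's connection dialog or passed to a command line client such as `psql`.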
Congrats! At this point you have a three-node YugabyteDB cluster running on GKE.
Setting Up the Prometheus Operator
For the purposes of this blog, we will be using the Prometheus Operator deployed via Helm 3 to get Prometheus up and running.
Create a values.yaml file
By default, the Helm chart installs multiple components that are not required to run Thanos with Prometheus. Also, since our cluster has limited resources, we need to override the default configuration by creating a new values.yaml file and passing it to Helm when we install the Prometheus Operator.
$ touch values.yaml
$ vim values.yaml
The file’s contents should look like this:
defaultRules:
  create: false
alertmanager:
  enabled: false
grafana:
  enabled: false
kubeApiServer:
  enabled: false
kubelet:
  enabled: false
kubeControllerManager:
  enabled: false
coreDns:
  enabled: false
kubeEtcd:
  enabled: false
kubeScheduler:
  enabled: false
kubeStateMetrics:
  enabled: false
nodeExporter:
  enabled: false
prometheus:
  enabled: false
Install Prometheus
Install the Prometheus Operator via Helm 3 as shown below.
$ kubectl create namespace prometheus
$ helm repo add stable https://kubernetes-charts.storage.googleapis.com
$ helm repo update
$ helm install prometheus-operator stable/prometheus-operator \
--namespace prometheus \
--values values.yaml
You can verify that the Prometheus Operator is installed using the following command:
$ kubectl get pods -n prometheus
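If you prefer a scripted check over reading the listing by eye, the sketch below counts pods in the Running state. The helper name `count_running_pods` is hypothetical. It reads the output of `kubectl get pods --no-headers` on standard input, so it can be tried without a cluster.

```shell
# Count pods whose STATUS column is "Running".
# Expects `kubectl get pods --no-headers` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE).
count_running_pods() {
  awk '$3 == "Running" { n++ } END { print n + 0 }'
}

# Intended use against a live cluster (not executed here):
#   kubectl get pods --no-headers -n prometheus | count_running_pods
```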
To avoid metrics being unavailable, either permanently or for a short period of time, we can run a second instance of Prometheus. Each instance runs independently of the other, but both share the same configuration as set by the Prometheus Operator. You can see this in the configuration below, where we specify replicas: 2.
Create a file called prometheus.yaml
$ touch prometheus.yaml
$ vim prometheus.yaml
Add the following configuration:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: prometheus
spec:
  baseImage: quay.io/prometheus/prometheus
  logLevel: info
  podMetadata:
    annotations:
      cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    labels:
      app: prometheus
  replicas: 2
  resources:
    limits:
      cpu: 100m
      memory: 2Gi
    requests:
      cpu: 100m
      memory: 2Gi
  retention: 12h
  serviceAccountName: prometheus-service-account
  serviceMonitorSelector:
    matchLabels:
      serviceMonitorSelector: prometheus
  storage:
    volumeClaimTemplate:
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: prometheus-pvc
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
  version: v2.10.0
  securityContext:
    fsGroup: 0
    runAsNonRoot: false
    runAsUser: 0
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: "prometheus-service-account"
  namespace: "prometheus"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: "prometheus-cluster-role"
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - "/metrics"
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: "prometheus-cluster-role-binding"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: "prometheus-cluster-role"
subjects:
- kind: ServiceAccount
  name: "prometheus-service-account"
  namespace: prometheus
Next, apply the prometheus.yaml file to the Kubernetes cluster using the following command:
$ kubectl apply -f prometheus.yaml
You can verify that the Prometheus pods have been created using the following command:
$ kubectl get pods -n prometheus
You should see output like that shown below with two Prometheus pods now running:
Configuring Prometheus PVC
The Prometheus persistent volume claim (PVC) is used to retain the state of Prometheus and the metrics it captures in the event that it is upgraded or restarted. To verify that the PVC has been created and bound to a persistent volume, run the following command:
$ kubectl get persistentvolumeclaim --namespace prometheus
You should see output like that shown below:
To access the Prometheus UI we need to first run the following command:
$ kubectl port-forward service/prometheus-operated 9090:9090 --namespace prometheus
Now, go to Web preview in the Google Console and select Change port > 9090. You should now see the Prometheus web UI, similar to the one shown below:
Configuring Prometheus to Monitor YugabyteDB
The next step is to configure Prometheus to scrape YugabyteDB metrics. Create a file named servicemonitor.yaml with the following content:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    serviceMonitorSelector: prometheus
  name: prometheus
  namespace: prometheus
spec:
  endpoints:
  - interval: 30s
    targetPort: 7000
    path: /prometheus-metrics
  namespaceSelector:
    matchNames:
    - yb-demo
  selector:
    matchLabels:
      app: "yb-master"
We can now apply the servicemonitor.yaml configuration by running the following command:
$ kubectl apply -f servicemonitor.yaml
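The ServiceMonitor in this post selects only services labeled app: "yb-master" on port 7000. If you also want to scrape the tablet servers, a parameterized manifest is one option. The sketch below is an assumption-heavy illustration: the helper name `render_servicemonitor` is hypothetical, and the `yb-tserver` label and port 9000 are not stated in this post, so check them against your deployment before applying anything.

```shell
# Render a ServiceMonitor manifest for a given app label and metrics port.
# Arguments: <name> <app-label> <target-port>
render_servicemonitor() {
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    serviceMonitorSelector: prometheus
  name: $1
  namespace: prometheus
spec:
  endpoints:
  - interval: 30s
    targetPort: $3
    path: /prometheus-metrics
  namespaceSelector:
    matchNames:
    - yb-demo
  selector:
    matchLabels:
      app: "$2"
EOF
}

# Assumed values: the yb-tserver label and port 9000 are not taken from this post.
render_servicemonitor yb-tserver yb-tserver 9000
# To apply instead of printing: render_servicemonitor ... | kubectl apply -f -
```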
Verify that the configuration has been applied by running the following command:
$ kubectl get servicemonitor --namespace prometheus
You should see output similar to the one shown below.
Now, return to the Prometheus UI to verify that the YugabyteDB metric endpoints are available to Prometheus by going to Status > Targets.
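The same information is available from the Prometheus HTTP API at /api/v1/targets, which can be handy for a scripted check. The sketch below counts targets reported as healthy. The helper name `count_up_targets` is hypothetical, and the plain-text match assumes the compact JSON that Prometheus returns, with no spaces around the colon.

```shell
# Count active targets reported as healthy in a /api/v1/targets response on stdin.
# A text match keeps the sketch free of a JSON tool dependency.
count_up_targets() {
  { grep -o '"health":"up"' || true; } | wc -l | tr -d ' '
}

# Intended use while the port-forward to 9090 is active (not executed here):
#   curl -s http://localhost:9090/api/v1/targets | count_up_targets
```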
Setting Up Thanos
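Add the following Thanos-specific configuration to the prometheus.yaml file under the spec section of the Prometheus resource. Compared with the earlier version, the additions are the thanos-store-api label, the thanos block, and externalLabels: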
spec:
  baseImage: quay.io/prometheus/prometheus
  logLevel: info
  podMetadata:
    annotations:
      cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    labels:
      app: prometheus
      thanos-store-api: "true"
  replicas: 2
  thanos:
    version: v0.4.0
    resources:
      limits:
        cpu: 10m
        memory: 50Mi
      requests:
        cpu: 10m
        memory: 50Mi
  resources:
    limits:
      cpu: 100m
      memory: 2Gi
    requests:
      cpu: 100m
      memory: 2Gi
  retention: 12h
  serviceAccountName: prometheus-service-account
  serviceMonitorSelector:
    matchLabels:
      serviceMonitorSelector: prometheus
  externalLabels:
    cluster_environment: workshop
  storage:
    volumeClaimTemplate:
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: prometheus-pvc
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
Finally, add the following Thanos deployment configuration to the end of the prometheus.yaml file:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-query
  namespace: prometheus
  labels:
    app: thanos-query
spec:
  replicas: 1
  selector:
    matchLabels:
      app: thanos-query
  template:
    metadata:
      labels:
        app: thanos-query
    spec:
      containers:
      - name: thanos-query
        image: improbable/thanos:v0.5.0
        resources:
          limits:
            cpu: 50m
            memory: 100Mi
          requests:
            cpu: 50m
            memory: 100Mi
        args:
        - "query"
        - "--log.level=debug"
        - "--query.replica-label=prometheus_replica"
        - "--store.sd-dns-resolver=miekgdns"
        - "--store=dnssrv+_grpc._tcp.thanos-store-api.prometheus.svc.cluster.local"
        ports:
        - name: http
          containerPort: 10902
        - name: grpc
          containerPort: 10901
        - name: cluster
          containerPort: 10900
---
apiVersion: v1
kind: Service
metadata:
  name: "thanos-store-api"
  namespace: prometheus
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  selector:
    thanos-store-api: "true"
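The --store flag and the headless Service are tied together by name: the SRV record that Thanos Query looks up is built from the Service's port name, the protocol, the Service name, and the namespace. The sketch below only shows how that address is composed. The helper name `store_sd_address` is hypothetical, and the cluster.local suffix is the cluster domain used in this post's configuration.

```shell
# Compose the dnssrv+ address used by the --store flag.
# Arguments: <port-name> <service> <namespace>
store_sd_address() {
  printf 'dnssrv+_%s._tcp.%s.%s.svc.cluster.local\n' "$1" "$2" "$3"
}

store_sd_address grpc thanos-store-api prometheus
# prints: dnssrv+_grpc._tcp.thanos-store-api.prometheus.svc.cluster.local
```

If you rename the Service, its port, or the namespace, the --store argument has to change with it, otherwise Thanos Query will not discover the sidecars.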
We are now ready to apply the configuration by running the following command:
$ kubectl apply -f prometheus.yaml
Verify that the pods are running by running the following command:
$ kubectl get pods --namespace prometheus
Notice that we now have Thanos running.
Connect to the Thanos UI
Connect to Thanos Query by using port forwarding. You can do this by running the following command, replacing the thanos-query pod name with your own:
$ kubectl port-forward pod/thanos-query-7f77667897-lfmlb 10902:10902 --namespace prometheus
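Since the pod name suffix is generated, you may prefer to look it up rather than copy it by hand. The sketch below picks the first pod whose name starts with a given prefix from `kubectl get pods --no-headers` output. The helper name `first_pod_with_prefix` is hypothetical.

```shell
# Print the name of the first pod whose name starts with the given prefix.
# Expects `kubectl get pods --no-headers` output on stdin; argument: <prefix>
first_pod_with_prefix() {
  awk -v p="$1" 'index($1, p) == 1 { print $1; exit }'
}

# Intended use against a live cluster (not executed here):
#   POD=$(kubectl get pods --no-headers -n prometheus | first_pod_with_prefix thanos-query-)
#   kubectl port-forward "pod/$POD" 10902:10902 --namespace prometheus
```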
We can now access the Thanos Web UI, using the web preview with port 10902.
Verify that Thanos is able to access both Prometheus replicas by clicking on Stores.
The YugabyteDB metric data is now available to Thanos through both Prometheus instances. A few examples are below:
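Because Thanos Query exposes the Prometheus API, the same queries can also be issued over HTTP. One way to see de-duplication at work is to compare the number of series returned with the `dedup` parameter switched off and on. The sketch below counts series in a query response. The helper name `count_series` is hypothetical, and the text match assumes compact JSON output.

```shell
# Count the series in a /api/v1/query response read from stdin.
count_series() {
  { grep -o '"metric":' || true; } | wc -l | tr -d ' '
}

# Intended use while the port-forward to 10902 is active (not executed here):
#   curl -sG http://localhost:10902/api/v1/query \
#     --data-urlencode 'query=up' --data-urlencode 'dedup=false' | count_series
#   curl -sG http://localhost:10902/api/v1/query \
#     --data-urlencode 'query=up' --data-urlencode 'dedup=true' | count_series
```

With two Prometheus replicas scraping the same targets, you would expect the first count to be roughly double the second, since the replicas differ only in the prometheus_replica label.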
Conclusion
That’s it! You now have a YugabyteDB cluster running on GKE that is being monitored by two Prometheus instances, which Thanos not only makes highly available but also presents as a single view. For more information, check out the documentation on YugabyteDB metrics and integration with Prometheus.