Monitoring and Alerting for your CockroachDB cluster in Minikube

Fabio Ghirardello - Nov 25 '20 - Dev Community

Overview

In my previous post we simulated a Multi-Region CockroachDB cluster on Minikube.

Today we add tools for monitoring and alerting, plus an S3-compatible object storage service that we will use for backups.

Create that cluster first before proceeding.

You can also read more about the stack in the post for the Docker deployment.

Setup

Apply the Kubernetes definition file to create the monitoring stack.

kubectl apply -f https://gist.githubusercontent.com/fabiog1901/fc09e6fd98d0419b4528ca1c9553d478/raw/monitoring.yaml

Check that all Pods and Services are up and running, then ask Minikube for each service's address and open it in your browser. Take note of the port numbers and see if you can locate them in the deployment YAML file.

$ minikube service minio --url
http://192.168.64.6:31900

$ minikube service prom --url
http://192.168.64.6:31990

$ minikube service alertmgr --url
http://192.168.64.6:31993

$ minikube service grafana --url
http://192.168.64.6:32000
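The checks above can be sketched as a small script. This is only an illustration: it assumes the stack was deployed in the default namespace and that each service exposes a single URL, and the helper `port_of` is a hypothetical name introduced here, not part of the deployment.

```shell
# Service names as used in the `minikube service` commands above.
SERVICES="minio prom alertmgr grafana"

# Extract the port from a URL such as http://192.168.64.6:31900
port_of() {
  url=${1%/}                    # drop a trailing slash, if any
  printf '%s\n' "${url##*:}"    # keep what follows the last colon
}

# Only talk to the cluster when both CLIs are installed.
if command -v kubectl >/dev/null 2>&1 && command -v minikube >/dev/null 2>&1; then
  # All Pods should report STATUS "Running" before you open the UIs.
  kubectl get pods,services
  for svc in $SERVICES; do
    url=$(minikube service "$svc" --url | head -n 1)
    echo "$svc -> $url (NodePort $(port_of "$url"))"
  done
fi
```

The printed NodePort values are the ones to look for in the deployment YAML file.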

Good job, the stack is ready! Let's review what we have deployed.

MinIO

MinIO is an S3-compatible object storage service that is very popular in private cloud deployments.

From the UI, log in using minioadmin as both username and password, then create a bucket named cockroach.
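If you prefer the command line, the bucket can also be created with the MinIO client, `mc`. The sketch below assumes `mc` is installed locally and that the first URL returned by Minikube is the MinIO API endpoint; the alias name `minikube-minio` is a hypothetical choice.

```shell
# Hypothetical alias under which mc will remember this MinIO server.
MINIO_ALIAS="minikube-minio"
BUCKET="cockroach"   # bucket name used throughout the article

if command -v mc >/dev/null 2>&1 && command -v minikube >/dev/null 2>&1; then
  MINIO_URL=$(minikube service minio --url | head -n 1)
  # Register the server with the default minioadmin credentials.
  mc alias set "$MINIO_ALIAS" "$MINIO_URL" minioadmin minioadmin
  # Create the bucket, then list buckets to confirm it exists.
  mc mb "$MINIO_ALIAS/$BUCKET"
  mc ls "$MINIO_ALIAS"
fi
```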

Load the MovR dataset, then connect to the database:

$ cockroach workload init movr "postgresql://root@`minikube ip`:31257/movr?sslmode=disable"
[...]
$ cockroach sql --url "postgresql://`minikube ip`:31257/movr?sslmode=disable"  

Execute a backup job pointing at the MinIO server. Notice the endpoint URL and the keys used:

BACKUP TO 's3://cockroach?AWS_ENDPOINT=http://minio:9000&AWS_ACCESS_KEY_ID=minioadmin&AWS_SECRET_ACCESS_KEY=minioadmin'
  AS OF SYSTEM TIME '-10s';
        job_id       |  status   | fraction_completed | rows | index_entries | bytes
---------------------+-----------+--------------------+------+---------------+--------
  610281440766132226 | succeeded |                  1 |    1 |             3 | 11524
(1 row)
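The backup location is a single URI made of the bucket, the endpoint and the two keys. As a sketch, it can be assembled in the shell and passed to a non-interactive `cockroach sql -e` call; the function name `backup_uri` is hypothetical, and the values are the ones used above.

```shell
# Build the S3-style URI used by BACKUP and SHOW BACKUP.
# Arguments: bucket, endpoint, access key, secret key.
backup_uri() {
  printf 's3://%s?AWS_ENDPOINT=%s&AWS_ACCESS_KEY_ID=%s&AWS_SECRET_ACCESS_KEY=%s\n' \
    "$1" "$2" "$3" "$4"
}

URI=$(backup_uri cockroach http://minio:9000 minioadmin minioadmin)

# Only run the statement when the CLIs are available.
if command -v cockroach >/dev/null 2>&1 && command -v minikube >/dev/null 2>&1; then
  cockroach sql \
    --url "postgresql://root@$(minikube ip):31257/movr?sslmode=disable" \
    -e "BACKUP TO '$URI' AS OF SYSTEM TIME '-10s';"
fi
```

The endpoint is the in-cluster name http://minio:9000 rather than the Minikube NodePort address, because the backup is written by the CockroachDB nodes running inside the cluster, not by your SQL client.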

Confirm that the backup went well. First check the MinIO UI:

[Image: minio-ui]

Then inspect the backup contents from the SQL shell:

SHOW BACKUP 's3://cockroach?AWS_ENDPOINT=http://minio:9000&AWS_ACCESS_KEY_ID=minioadmin&AWS_SECRET_ACCESS_KEY=minioadmin';
  database_name | parent_schema_name |        object_name         | object_type | start_time |             end_time             | size_bytes | rows | is_full_cluster
----------------+--------------------+----------------------------+-------------+------------+----------------------------------+------------+------+------------------
  NULL          | NULL               | system                     | database    | NULL       | 2020-11-25 14:05:27.012204+00:00 |       NULL | NULL |      true
  system        | public             | users                      | table       | NULL       | 2020-11-25 14:05:27.012204+00:00 |         99 |    2 |      true
  system        | public             | zones                      | table       | NULL       | 2020-11-25 14:05:27.012204+00:00 |        201 |    7 |      true
  system        | public             | settings                   | table       | NULL       | 2020-11-25 14:05:27.012204+00:00 |        374 |    5 |      true
  system        | public             | ui                         | table       | NULL       | 2020-11-25 14:05:27.012204+00:00 |        155 |    1 |      true
  system        | public             | jobs                       | table       | NULL       | 2020-11-25 14:05:27.012204+00:00 |      17994 |   21 |      true
  system        | public             | locations                  | table       | NULL       | 2020-11-25 14:05:27.012204+00:00 |        360 |    7 |      true
  system        | public             | role_members               | table       | NULL       | 2020-11-25 14:05:27.012204+00:00 |         94 |    1 |      true
  system        | public             | comments                   | table       | NULL       | 2020-11-25 14:05:27.012204+00:00 |          0 |    0 |      true
  system        | public             | role_options               | table       | NULL       | 2020-11-25 14:05:27.012204+00:00 |          0 |    0 |      true
  system        | public             | scheduled_jobs             | table       | NULL       | 2020-11-25 14:05:27.012204+00:00 |          0 |    0 |      true
  NULL          | NULL               | defaultdb                  | database    | NULL       | 2020-11-25 14:05:27.012204+00:00 |       NULL | NULL |      true
  NULL          | NULL               | postgres                   | database    | NULL       | 2020-11-25 14:05:27.012204+00:00 |       NULL | NULL |      true
  NULL          | NULL               | movr                       | database    | NULL       | 2020-11-25 14:05:27.012204+00:00 |       NULL | NULL |      true
  movr          | public             | users                      | table       | NULL       | 2020-11-25 14:05:27.012204+00:00 |       4911 |   50 |      true
  movr          | public             | vehicles                   | table       | NULL       | 2020-11-25 14:05:27.012204+00:00 |       3182 |   15 |      true
  movr          | public             | rides                      | table       | NULL       | 2020-11-25 14:05:27.012204+00:00 |     156387 |  500 |      true
  movr          | public             | vehicle_location_histories | table       | NULL       | 2020-11-25 14:05:27.012204+00:00 |      73918 | 1000 |      true
  movr          | public             | promo_codes                | table       | NULL       | 2020-11-25 14:05:27.012204+00:00 |     219973 | 1000 |      true
  movr          | public             | user_promo_codes           | table       | NULL       | 2020-11-25 14:05:27.012204+00:00 |          0 |    0 |      true

Very good, you can now use MinIO as your Backup & Restore solution!
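To exercise the restore side as well, one option is to restore a single table from the backup into a scratch database. This is a sketch under assumptions: the database name `movr_restored` and the helper `restore_sql` are hypothetical, and the statement follows the `RESTORE ... WITH into_db` form, which requires the target database to exist first.

```shell
URI='s3://cockroach?AWS_ENDPOINT=http://minio:9000&AWS_ACCESS_KEY_ID=minioadmin&AWS_SECRET_ACCESS_KEY=minioadmin'
TARGET_DB="movr_restored"   # hypothetical scratch database

# Print the statement that restores one table into another database.
# Arguments: qualified table name, target database.
restore_sql() {
  printf "RESTORE TABLE %s FROM '%s' WITH into_db = '%s';\n" "$1" "$URI" "$2"
}

if command -v cockroach >/dev/null 2>&1 && command -v minikube >/dev/null 2>&1; then
  cockroach sql \
    --url "postgresql://root@$(minikube ip):31257/defaultdb?sslmode=disable" \
    -e "CREATE DATABASE IF NOT EXISTS $TARGET_DB;" \
    -e "$(restore_sql movr.users "$TARGET_DB")" \
    -e "SELECT count(*) FROM $TARGET_DB.users;"
fi
```

Restoring into a separate database leaves the live movr data untouched, so the row count can be compared against the SHOW BACKUP output.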

Monitoring stack: Prometheus, Alertmanager, Grafana

Our Monitoring and Alerting stack is made up of 3 components: Prometheus, Alertmanager and Grafana.

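Prometheus is an open-source systems monitoring and alerting toolkit. You can use it to collect the same metrics that populate the CockroachDB Admin UI and feed them into your own, separate monitoring and alerting setup.

Alertmanager is also a product of the Prometheus project. It receives the alerts fired by Prometheus and takes care of grouping, deduplicating and routing them to the configured receivers.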

Grafana is a very popular visualization tool and can connect to Prometheus as a source for the metrics.
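A quick way to see whether Prometheus is actually scraping the cluster is to ask its HTTP API for the built-in `up` metric. The sketch below assumes `curl` is installed and that the `prom` service exposes a single URL; `prom_query_endpoint` is a hypothetical helper name.

```shell
# Build the instant-query endpoint of the Prometheus HTTP API.
# Argument: base URL of the Prometheus server.
prom_query_endpoint() {
  printf '%s/api/v1/query\n' "${1%/}"
}

if command -v curl >/dev/null 2>&1 && command -v minikube >/dev/null 2>&1; then
  PROM_URL=$(minikube service prom --url | head -n 1)
  # "up" is 1 for every scrape target Prometheus can currently reach.
  curl -fsS -G "$(prom_query_endpoint "$PROM_URL")" --data-urlencode 'query=up'
  echo
fi
```

The JSON response lists one series per scrape target, so a target whose value is 0 points to a node that Prometheus cannot reach.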

Start by downloading Cockroach Labs' pre-made Grafana dashboards:

wget https://raw.githubusercontent.com/cockroachdb/cockroach/master/monitoring/grafana-dashboards/runtime.json  
wget https://raw.githubusercontent.com/cockroachdb/cockroach/master/monitoring/grafana-dashboards/sql.json
wget https://raw.githubusercontent.com/cockroachdb/cockroach/master/monitoring/grafana-dashboards/replicas.json
wget https://raw.githubusercontent.com/cockroachdb/cockroach/master/monitoring/grafana-dashboards/storage.json

In Grafana, add Prometheus as a Data Source. The source URL for Prometheus is http://cockroachdb:9090.

Then, import each dashboard JSON file you downloaded.
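The data source can also be registered through Grafana's HTTP API instead of the UI. This is a minimal sketch: it assumes Grafana still uses its stock admin/admin login, which may not hold for this deployment, and `datasource_json` is a hypothetical helper. The Prometheus URL is the one stated above.

```shell
# JSON body for Grafana's "create data source" endpoint.
# Arguments: data source name, Prometheus URL as seen from Grafana.
datasource_json() {
  printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' \
    "$1" "$2"
}

if command -v curl >/dev/null 2>&1 && command -v minikube >/dev/null 2>&1; then
  GRAFANA_URL=$(minikube service grafana --url | head -n 1)
  # admin:admin is an assumption; replace it with your Grafana credentials.
  curl -fsS -u admin:admin -H 'Content-Type: application/json' \
    -X POST "${GRAFANA_URL%/}/api/datasources" \
    -d "$(datasource_json Prometheus http://cockroachdb:9090)"
  echo
fi
```

The dashboards themselves are still easiest to import from the UI, since the import dialog lets you pick the data source for each JSON file.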

As a test, run the YCSB workload:

# Initialize the YCSB dataset
cockroach workload init ycsb "postgresql://root@`minikube ip`:31257/ycsb?sslmode=disable"
# Run YCSB workload B, load balancing across all 9 nodes (3 services)
cockroach workload run ycsb "postgresql://root@`minikube ip`:31257/ycsb?sslmode=disable" "postgresql://root@`minikube ip`:31258/ycsb?sslmode=disable" "postgresql://root@`minikube ip`:31259/ycsb?sslmode=disable"
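The three connection URLs differ only in the port, so the run command can be generated from the port list. The sketch below also bounds the run with `--duration`; the five-minute value and the helper name `ycsb_urls` are arbitrary choices for illustration.

```shell
PORTS="31257 31258 31259"   # SQL NodePorts of the 3 services used above

# Print one connection URL per port for the given host IP.
ycsb_urls() {
  for port in $PORTS; do
    printf 'postgresql://root@%s:%s/ycsb?sslmode=disable\n' "$1" "$port"
  done
}

if command -v cockroach >/dev/null 2>&1 && command -v minikube >/dev/null 2>&1; then
  IP=$(minikube ip)
  set -f   # disable globbing: the URLs contain "?"
  # Word splitting is intended so that each URL becomes its own argument.
  cockroach workload run ycsb --workload=B --duration=5m $(ycsb_urls "$IP")
  set +f
fi
```

Without `--duration` the workload keeps running until you interrupt it, which is fine for watching the dashboards but easy to forget about.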

And this is what is displayed in Grafana:

[Image: grafana-dashboard]

Perfect! You're all set!

Clean up

Removing the stack is as easy as creating it. Please note that this will also remove the Persistent Volumes.

kubectl delete -f https://gist.githubusercontent.com/fabiog1901/fc09e6fd98d0419b4528ca1c9553d478/raw/monitoring.yaml

