Monitoring Your Spacelift Account via Prometheus

Adam Connelly - Aug 17 '22 - Dev Community

For a while now, we have received feedback from customers that they would like to be able to connect their Spacelift account to external monitoring systems. Today we are excited to announce the Prometheus Exporter for Spacelift! We also have plans to add support for other monitoring systems, so if you're using something other than Prometheus, we should have you covered soon.

What Can It Do?

The Prometheus exporter allows you to monitor various metrics about your Spacelift account over time. You can then use tools like Grafana to visualize those metrics and Alertmanager to take action based on them. You can find the complete list of available metrics here. Below are a few examples of the information the exporter currently provides:

  • The number of runs pending and currently executing in both public and private worker pools.
  • The number of workers in a pool.
  • Usage information, including the number of public and private worker minutes used during the current billing period.

Once you have that information, a number of possibilities open up, including visualizing your account via Grafana dashboards, alerting on conditions like a lack of private workers, and autoscaling worker pools via the Horizontal Pod Autoscaler.

To give you a taste, here’s an example Grafana dashboard showing some of the information available from the exporter:

[Screenshot: an example Grafana dashboard built from the exporter's metrics]

How Does It Work?

The Prometheus exporter is an adapter between Prometheus and the Spacelift GraphQL API. Whenever Prometheus asks for the current metrics, the exporter makes a GraphQL request and converts the response into the metrics format Prometheus expects.
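To give a feel for what Prometheus receives, here's a rough sketch of the exporter's output in the Prometheus exposition format (the metric names are the ones used later in this post; the HELP text, label values, and sample values are purely illustrative):

# HELP spacelift_worker_pool_runs_pending Number of runs waiting for a worker in the pool
# TYPE spacelift_worker_pool_runs_pending gauge
spacelift_worker_pool_runs_pending{worker_pool_id="01G8Y06VGHCT17453VEE9T4YBZ",worker_pool_name="my-pool"} 3
# HELP spacelift_worker_pool_workers Number of workers registered in the pool
# TYPE spacelift_worker_pool_workers gauge
spacelift_worker_pool_workers{worker_pool_id="01G8Y06VGHCT17453VEE9T4YBZ",worker_pool_name="my-pool"} 2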

The following diagram gives an overview of this process:

[Diagram: Prometheus scrapes the exporter, which queries the Spacelift GraphQL API and returns the results as metrics]

Getting Started

Ok, great, but how do I use the Prometheus Exporter?

To help with setup and deployment, we are going to guide you through the following:

  1. Deploy a Prometheus stack to a Kubernetes cluster.
  2. Deploy the exporter.
  3. Configure Prometheus to monitor it.

The Quick Start section in the exporter repo outlines several options for deploying the exporter. Review them and pick the option that makes the most sense for your setup and requirements.

If you already have a Prometheus stack set up and have plenty of experience, feel free to skip to the “Installing the Prometheus Exporter” section.

We also assume that you already have a Kubernetes cluster provisioned (hopefully via Spacelift!) and available to complete the following steps. If not, a local installation like Minikube will work fine for illustration purposes.
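For example, assuming you have Minikube installed, a single command is enough to spin up a local cluster for this walkthrough:

minikube start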

Installing kube-prometheus-stack

The first step is to install the kube-prometheus-stack Helm chart. You can do that with the following commands:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install --create-namespace -n monitoring kube-prometheus prometheus-community/kube-prometheus-stack

That will install a Prometheus, Grafana, and Alertmanager stack into a namespace called monitoring in your cluster. Run the kubectl get pods -n monitoring command to see the installed components:

$ kubectl get pods -n monitoring
NAME                                                     READY   STATUS    RESTARTS       AGE
alertmanager-kube-prometheus-kube-prome-alertmanager-0   2/2     Running   14 (58m ago)   20d
kube-prometheus-grafana-5cd6d47467-2rt88                 3/3     Running   21 (58m ago)   20d
kube-prometheus-kube-prome-operator-54b7488f58-7fmfv     1/1     Running   7 (58m ago)    20d
kube-prometheus-kube-state-metrics-8ccff67b4-zlfx9       1/1     Running   10 (58m ago)   20d
kube-prometheus-prometheus-node-exporter-zv8zn           1/1     Running   7 (58m ago)    20d
prometheus-kube-prometheus-kube-prome-prometheus-0       2/2     Running   14 (58m ago)   20d

Getting API Credentials

The Prometheus exporter authenticates to the Spacelift GraphQL API using an API key. Follow the guide to create the API key required by the exporter. After you create your key, note the API Key ID and API Key Secret – you’ll need both when configuring the exporter.

Installing the Prometheus Exporter

The exporter is available via a Docker image published to the public.ecr.aws/spacelift/promex container registry. To deploy the exporter to Kubernetes, we need to create the following resources:

  • A Deployment – to run the exporter container.
  • A Service – to allow Prometheus to scrape the exporter.
  • A ServiceMonitor – to let Prometheus know that it needs to scrape the exporter.

The following is an example Deployment definition for running the exporter. Make sure to replace the <account name>, <API Key> and <API Secret> placeholders with the correct values:

apiVersion: apps/v1
kind: Deployment
metadata:
 name: spacelift-promex
 labels:
   app: spacelift-promex
spec:
 replicas: 1
 selector:
   matchLabels:
     app: spacelift-promex
 template:
   metadata:
     labels:
       app: spacelift-promex
   spec:
     containers:
     - name: spacelift-promex
       image: public.ecr.aws/spacelift/promex:latest
       ports:
       - name: metrics
         containerPort: 9953
       readinessProbe:
         httpGet:
           path: /health
           port: metrics
         periodSeconds: 5
       env:
       - name: "SPACELIFT_PROMEX_API_ENDPOINT"
         value: "https://<account name>.app.spacelift.io"
       - name: "SPACELIFT_PROMEX_API_KEY_ID"
         value: "<API Key>"
       - name: "SPACELIFT_PROMEX_API_KEY_SECRET"
         value: "<API Secret>"
       - name: "SPACELIFT_PROMEX_LISTEN_ADDRESS"
         value: ":9953"

Note: The example above defines the API key and secret as plain environment variables. For anything other than testing purposes, we recommend storing them in Kubernetes Secrets instead.
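As a sketch of that approach (the spacelift-api-credentials Secret name and its key names are just examples, and I'm assuming the exporter is deployed to the monitoring namespace, as the ServiceMonitor below expects), you could first create a Secret:

kubectl create secret generic spacelift-api-credentials -n monitoring \
  --from-literal=apiKeyId='<API Key>' \
  --from-literal=apiKeySecret='<API Secret>'

And then swap the corresponding env entries in the Deployment for secretKeyRef references:

        - name: "SPACELIFT_PROMEX_API_KEY_ID"
          valueFrom:
            secretKeyRef:
              name: spacelift-api-credentials
              key: apiKeyId
        - name: "SPACELIFT_PROMEX_API_KEY_SECRET"
          valueFrom:
            secretKeyRef:
              name: spacelift-api-credentials
              key: apiKeySecret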

Next, create a Service to expose the exporter:

apiVersion: v1
kind: Service
metadata:
 name: spacelift-promex
 labels:
   app: spacelift-promex
spec:
 selector:
   app: spacelift-promex
 ports:
   - name: http-metrics
     protocol: TCP
     port: 80
     targetPort: metrics

And finally create your ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
 name: app-monitor
 labels:
   app: app-monitor
   release: kube-prometheus
spec:
 jobLabel: app-monitor
 selector:
   matchExpressions:
   - {key: app, operator: Exists}
 namespaceSelector:
   matchNames:
   - monitoring
 endpoints:
 - port: http-metrics
   interval: 15s
   path: "/metrics"

The ServiceMonitor definition above tells Prometheus (via the Prometheus Operator) to scrape any Service in the monitoring namespace that has an app label. The scrape interval can be increased or decreased depending on your requirements and organizational standards; the example above scrapes every 15 seconds.

Viewing your Metrics

Once you have your Prometheus stack up and running and have deployed the Spacelift exporter, you can use port-forwarding to access each component. First, let’s port-forward the exporter to port 8080 locally using the following command:

kubectl port-forward service/spacelift-promex -n monitoring 8080:80

Assuming all is well, you should be able to see the raw metrics output by accessing http://localhost:8080/metrics:

[Screenshot: the raw Prometheus metrics output from the exporter]

You can also port-forward to your Grafana instance to view the metrics in Grafana:

kubectl port-forward service/kube-prometheus-grafana -n monitoring 8081:80

You can then quickly discover the available metrics via the Grafana Explore view:

[Screenshot: browsing the exporter's metrics in Grafana's Explore view]

Alerting

One of the things that I think is amazing about the Prometheus stack is the ability to use PromQL queries not just for monitoring but for defining alerts. For example, if we want to trigger an alert whenever our worker pool has no available workers for a specific time period, we can use a query like this:

max(spacelift_worker_pool_workers) by (worker_pool_id, worker_pool_name) <= 0

Similarly, if we want to alert when the number of queued runs for a pool gets too high, we can use a query like this:

max(spacelift_worker_pool_runs_pending) by (worker_pool_id, worker_pool_name) >= 10
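With the kube-prometheus-stack, one way to turn a query like this into an actual alert is to wrap it in a PrometheusRule resource that the operator picks up. Here's a minimal sketch using the no-available-workers query from above, assuming everything lives in the monitoring namespace (the rule and alert names, the for duration, and the annotation text are illustrative; the release: kube-prometheus label matches the Helm release installed earlier):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: spacelift-promex-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus
spec:
  groups:
  - name: spacelift
    rules:
    - alert: SpaceliftWorkerPoolEmpty
      expr: max(spacelift_worker_pool_workers) by (worker_pool_id, worker_pool_name) <= 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Worker pool {{ $labels.worker_pool_name }} has no workers"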

Autoscaling

NOTE: This section is intended to outline some of the possibilities that exist when using the Prometheus exporter and should NOT be used as a guide for a production-ready autoscaling solution. For example, it does not consider properly draining workers before scaling them down to avoid in-progress runs being terminated.

Since we have metrics related to queued runs, we can use them to autoscale private workers. We can use the spacelift_worker_pool_runs_pending metric, which tells us how many runs for a given worker pool are waiting to be scheduled, to detect when we need to add more workers to our pool. Similarly, we can use the spacelift_worker_pool_workers and spacelift_worker_pool_workers_busy metrics to decide when to scale down.

We need to take both sets of metrics into account to avoid scaling down just because there are no queued runs. In this situation, we might still have runs in progress that need workers.

For this to work, we need the following components available:

  1. A working Prometheus stack with the Spacelift Prometheus Exporter running.
  2. A Spacelift worker pool.
  3. A prometheus-adapter installation.
  4. A Horizontal Pod Autoscaler resource to tell Kubernetes how to scale our worker pool.

For the sake of simplicity, I’m going to assume that you already have steps 1 and 2 covered, and that you’ve deployed the Spacelift worker pool chart with the default settings. I’ll also assume that you have a standard installation of the kube-prometheus-stack chart.

The first thing we need to do is install the prometheus-adapter. The adapter acts as a bridge between the Kubernetes metrics APIs and your Prometheus installation, allowing you to use Prometheus metrics to make autoscaling decisions within your cluster. We can visualize it something like this:

[Diagram: the Horizontal Pod Autoscaler reads from the Kubernetes metrics API, which prometheus-adapter backs with Prometheus queries]

The prometheus-adapter uses a configuration file to map Prometheus queries to Kubernetes metrics. In our case, we can use something like the following to generate the two metrics we need:

prometheus:
 # The URL points at the Kubernetes Service for Prometheus
 url: "http://kube-prometheus-kube-prome-prometheus"

rules:
 default: false

 external:
 # Define the spacelift_worker_pool_runs_pending metric
 - seriesQuery: '{__name__=~"^spacelift_worker_pool_runs_pending$"}'
   resources:
     template: <<.Resource>>
   name:
     as: "spacelift_worker_pool_runs_pending"
   metricsQuery: max(spacelift_worker_pool_runs_pending) by (worker_pool_id, worker_pool_name)
 # Define the spacelift_worker_pool_utilization metric
 - seriesQuery: '{__name__=~"^spacelift_worker_pool_workers$"}'
   resources:
     template: <<.Resource>>
   name:
     as: "spacelift_worker_pool_utilization"
   metricsQuery: |
     (max(spacelift_worker_pool_workers_busy) by (worker_pool_id, worker_pool_name)
       / max(spacelift_worker_pool_workers) by (worker_pool_id, worker_pool_name))
     or vector(0)
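One way to deploy the adapter with this configuration is via the prometheus-adapter chart from the same prometheus-community Helm repository we added earlier. A sketch, assuming the values above are saved as prometheus-adapter-values.yaml:

helm install prometheus-adapter prometheus-community/prometheus-adapter \
  -n monitoring -f prometheus-adapter-values.yaml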

After deploying the adapter, you can query your metrics via the Kubernetes API, using the following commands (assuming you’ve deployed everything to a namespace called monitoring):

kubectl get --raw /apis/external.metrics.k8s.io/v1beta1/namespaces/monitoring/spacelift_worker_pool_runs_pending
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1/namespaces/monitoring/spacelift_worker_pool_utilization
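If everything is wired up correctly, each command should return an ExternalMetricValueList, roughly like the following (the labels, timestamp, and value here are illustrative):

{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "spacelift_worker_pool_runs_pending",
      "metricLabels": {
        "worker_pool_id": "01G8Y06VGHCT17453VEE9T4YBZ",
        "worker_pool_name": "my-pool"
      },
      "timestamp": "2022-08-17T10:00:00Z",
      "value": "3"
    }
  ]
}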

Finally, we can create a Horizontal Pod Autoscaler definition to scale our worker pool Deployment based on those metrics automatically:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
 name: spacelift-worker-hpa
spec:
 scaleTargetRef:
   apiVersion: apps/v1
   kind: Deployment
   name: spacelift-worker
 minReplicas: 1
 maxReplicas: 15
 metrics:
 - type: External
   external:
     metric:
       name: spacelift_worker_pool_runs_pending
       selector:
         matchLabels:
           worker_pool_id: "01G8Y06VGHCT17453VEE9T4YBZ"
     target:
       type: AverageValue
       averageValue: 1
 - type: External
   external:
     metric:
       name: spacelift_worker_pool_utilization
       selector:
         matchLabels:
           worker_pool_id: "01G8Y06VGHCT17453VEE9T4YBZ"
     target:
       type: Value
       value: 0.8

Notice the metrics are grouped by the worker_pool_id; we can use this to target an individual worker pool when creating our HPA!
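As a rough worked example of how the scaling plays out (using the standard HPA formula desiredReplicas = ceil(currentReplicas × currentValue / targetValue), and assuming default HPA behaviour): if the pool has 5 workers and all of them are busy, spacelift_worker_pool_utilization is 1.0, so the 0.8 target yields ceil(5 × 1.0 / 0.8) = 7 replicas. The HPA then acts on the highest recommendation across all of its configured metrics.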

That’s it – bask in your autoscaling glory:

[Screenshot: the worker pool Deployment scaling up as runs queue]

We are excited to learn how our customers use the exporter and the problems they solve with it! Feel free to provide feedback and comments via Issues in the Prometheus Exporter GitHub repository.
