Log aggregation and analysis are core parts of modern monitoring and observability. Collecting logs from many sources in one place lets you see how your applications and infrastructure behave, spot issues, and troubleshoot them. Thanos is an open-source project that adds long-term storage and a global query layer to Prometheus. Its storage and query model is built for metrics rather than raw log lines, so the setup in this post pairs it with Loki: log lines are forwarded to Loki, and Thanos serves the Prometheus-style series that you query with PromQL. In this blog post, we will explore how to use Thanos in a log aggregation and analysis pipeline.

Setting up Thanos for Log Aggregation

To use Thanos for log aggregation, you need a Thanos cluster that includes a log receiver component. The log receiver collects logs from various sources and forwards them to a storage backend; in the configuration below, that backend is Loki's push API.

You can deploy such a cluster with the Thanos Helm chart. The following commands add the chart repository and install the chart into the thanos namespace with the log receiver enabled (the namespace must already exist, or you can add Helm's --create-namespace flag):

helm repo add thanos-charts https://storage.googleapis.com/thanos-charts
helm repo update
helm install thanos thanos-charts/thanos --namespace=thanos \
  --set thanos.global.tracing.enabled=true \
  --set thanos.global.log.level=debug \
  --set thanos.sidecar.image.repository=gcr.io/thanos-io/thanos \
  --set thanos.sidecar.image.tag=v0.23.0 \
  --set thanos.sidecar.image.pullPolicy=IfNotPresent \
  --set thanos.sidecar.log_receiver.enabled=true \
  --set thanos.sidecar.log_receiver.grpc_address=0.0.0.0:1052 \
  --set thanos.sidecar.log_receiver.grpc_tls_ca_cert_file=/etc/thanos/ca.pem \
  --set thanos.sidecar.log_receiver.grpc_tls_cert_file=/etc/thanos/tls.crt \
  --set thanos.sidecar.log_receiver.grpc_tls_key_file=/etc/thanos/tls.key \
  --set thanos.sidecar.log_receiver.grpc_tls_server_name=thanos-sidecar \
  --set thanos.sidecar.log_receiver.grpc_max_recv_msg_size=104857600 \
  --set thanos.sidecar.log_receiver.grpc_max_send_msg_size=104857600 \
  --set thanos.sidecar.log_receiver.grpc_max_concurrent_streams=1000 \
  --set thanos.sidecar.log_receiver.grpc_keepalive_time=30s \
  --set thanos.sidecar.log_receiver.grpc_keepalive_timeout=20s \
  --set thanos.sidecar.log_receiver.grpc_keepalive_permit_without_stream=true \
  --set thanos.sidecar.log_receiver.loki_push_api_url=http://loki-pushgateway.loki.svc.cluster.local:9095/api/prom/push \
  --set thanos.sidecar.log_receiver.loki_push_api_batch_wait=1s \
  --set thanos.sidecar.log_receiver.loki_push_api_batch_size=1000 \
  --set thanos.sidecar.log_receiver.loki_push_api_max_retries=3 \
  --set thanos.sidecar.log_receiver.loki_push_api_retry_backoff=1s \
  --set thanos.sidecar.log_receiver.loki_push_api_timeout=10s

In this example, we deploy a Thanos cluster whose log receiver component forwards logs to a Loki pushgateway. We also tune the log receiver: the maximum gRPC message size is 104857600 bytes (100 MiB) in each direction, at most 1000 concurrent streams are allowed, and pushes to Loki use a batch size of 1000 and a batch wait of 1s, are retried up to 3 times, and time out after 10s.

Collecting Logs with Thanos

Once you have set up a Thanos cluster with a log receiver component, you can start collecting logs from various sources. Thanos supports several log collection methods, including the following:

File-based log collection: Thanos can collect logs from files on disk using the thanos-sidecar component.
Agent-based log collection: Thanos can collect logs from agents running on your hosts using the thanos-agent component.
Remote log collection: Thanos can collect logs from remote sources using the thanos-rpc component.

Here is an example of collecting logs from files on disk using the thanos-sidecar component:

global:
  tracing:
    enabled: true
  log:
    level: debug

sidecar:
  image:
    repository: gcr.io/thanos-io/thanos
    tag: v0.23.0
    pullPolicy: IfNotPresent
  log_receiver:
    enabled: true
    grpc_address: 0.0.0.0:1052
    grpc_tls_ca_cert_file: /etc/thanos/ca.pem
    grpc_tls_cert_file: /etc/thanos/tls.crt
    grpc_tls_key_file: /etc/thanos/tls.key
    grpc_tls_server_name: thanos-sidecar
    grpc_max_recv_msg_size: 104857600
    grpc_max_send_msg_size: 104857600
    grpc_max_concurrent_streams: 1000
    grpc_keepalive_time: 30s
    grpc_keepalive_timeout: 20s
    grpc_keepalive_permit_without_stream: true
    loki_push_api_url: http://loki-pushgateway.loki.svc.cluster.local:9095/api/prom/push
    loki_push_api_batch_wait: 1s
    loki_push_api_batch_size: 1000
    loki_push_api_max_retries: 3
    loki_push_api_retry_backoff: 1s
    loki_push_api_timeout: 10s

file_sd_configs:
  - files:
      - "/etc/thanos/file_sd.yaml"

receivers:
  - name: thanos
    address: thanos-sidecar.thanos.svc.cluster.local:1052
    tls_config:
      insecure_skip_verify: true

processors:
  - name: prometheus
    actions:
      - action: remote_write
        remote_config:
          url: http://thanos-query.thanos.svc.cluster.local:9090/api/v1/write
        relabel_configs:
          - source_labels: [__name__]
            action: keep
            regex: ^job_name

exporters:
  - type: log
    log:
      level: debug

  - type: prometheus
    config:
      scrape_configs:
        - job_name: thanos
          static_configs:
            - targets:
                - "thanos-query.thanos.svc.cluster.local:9090"

In this example, the thanos-sidecar component reads its file list from /etc/thanos/file_sd.yaml through the file_sd_configs option, listens for gRPC traffic on port 1052, and forwards logs to the Loki pushgateway at the address given in loki_push_api_url.
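To make the forwarding step more concrete, here is a minimal Python sketch of the kind of JSON body a forwarder could send to Loki. It assumes Loki's current JSON push format (a POST to /loki/api/v1/push), which is not the /api/prom/push path used in the configuration above. The function name build_loki_push_payload, the label values, and the log path are hypothetical, and batching, retries, and the HTTP call itself are left out.

```python
import json
import time


def build_loki_push_payload(labels, lines, now_ns=None):
    """Build a JSON body in the shape of Loki's /loki/api/v1/push API.

    labels: dict of stream labels, e.g. {"job": "thanos"}
    lines:  list of log lines; all get the same timestamp in this sketch
    now_ns: timestamp in nanoseconds since the epoch (defaults to "now")
    """
    if now_ns is None:
        now_ns = time.time_ns()
    # Loki expects each entry as [<nanosecond timestamp as a string>, <line>].
    values = [[str(now_ns), line] for line in lines]
    return {"streams": [{"stream": dict(labels), "values": values}]}


# Hypothetical example: two lines from one file, labelled by job and filename.
payload = build_loki_push_payload(
    {"job": "thanos", "filename": "/var/log/app.log"},
    ["level=info msg=started", "level=error msg=failed"],
    now_ns=1700000000000000000,
)
print(json.dumps(payload, indent=2))
```

Grouping lines under one label set keeps them in a single stream, so a real forwarder would typically collect entries per label set until its batch size or batch wait is reached and then send them in one request.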

Analyzing Logs with Thanos

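Once you have collected logs with Thanos, you can analyze them using various tools. The Thanos query component exposes a Prometheus-compatible HTTP API, so you query it with PromQL, the same query language used by Prometheus. PromQL operates on time series, so these queries work on series derived from your logs (for example, per-job counters) rather than on the raw log text.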

Here is an example of querying logs using the Thanos query API:

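# Run the port-forward in its own terminal; it blocks until interrupted.
kubectl port-forward svc/thanos-query 9090:9090 -n thanos
# Single quotes keep the PromQL double quotes intact; curl URL-encodes the expression.
curl -G "http://localhost:9090/api/v1/query" --data-urlencode 'query=sum(rate({job=~".+"}[5m]))' --data-urlencode "timeout=10s"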

In this example, kubectl forwards port 9090 on our local machine to port 9090 on the Thanos query service running in the thanos namespace. curl then runs an instant query that takes the per-second rate of every series with a non-empty job label, computed over the past 5 minutes, and sums those rates into a single value.
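The same request can be sketched in Python, which also shows what comes back. This is a minimal illustration, assuming the query component returns the standard Prometheus HTTP API response shape for instant queries. The helper names build_query_url and extract_vector are hypothetical, the sample response is made up, and no network call is made.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, promql, timeout="10s"):
    """Return an instant-query URL with the PromQL expression URL-encoded."""
    params = urlencode({"query": promql, "timeout": timeout})
    return f"{base_url}/api/v1/query?{params}"


def extract_vector(response_text):
    """Parse a Prometheus-style instant-query response into (labels, value) pairs."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error', 'unknown error')}")
    data = body["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample is {"metric": {...}, "value": [<unix time>, "<value as string>"]}.
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


url = build_query_url("http://localhost:9090", 'sum(rate({job=~".+"}[5m]))')
print(url)

# Made-up response body, shaped like the Prometheus HTTP API's instant-query output.
sample = (
    '{"status":"success","data":{"resultType":"vector",'
    '"result":[{"metric":{},"value":[1700000000,"12.5"]}]}}'
)
print(extract_vector(sample))
```

Because sum() without a grouping clause collapses all series into one, the result carries an empty label set; adding "by (job)" to the expression would return one sample per job instead.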

Conclusion

Thanos provides a powerful solution for log aggregation and analysis. By setting up a Thanos cluster with a log receiver component, you can collect logs from various sources and forward them to a centralized location. You can then use the Thanos query API to analyze logs using PromQL. With Thanos, you can simplify your log aggregation and analysis infrastructure and gain deeper insights into your applications and infrastructure.

Using Thanos for Log Aggregation and Analysis