Log aggregation and analysis are essential to modern monitoring and observability. By collecting and analyzing logs from many sources, you can see how your applications and infrastructure behave, identify issues, and troubleshoot problems. Thanos is an open-source project that provides long-term storage and querying for Prometheus metrics. In this blog post, we will explore how to use it as part of a log aggregation and analysis setup.
Setting up Thanos for Log Aggregation
To use Thanos for log aggregation, you need to set up a Thanos cluster that includes a log receiver component. The log receiver component is responsible for collecting logs from various sources and forwarding them to the Thanos store component.
To set up a Thanos cluster with a log receiver component, you can use the Thanos Helm chart. Here is an example of deploying a Thanos cluster with a log receiver component using the Helm chart:
helm repo add thanos-charts https://storage.googleapis.com/thanos-charts
helm repo update
helm install thanos thanos-charts/thanos --namespace=thanos --create-namespace \
--set thanos.global.tracing.enabled=true \
--set thanos.global.log.level=debug \
--set thanos.sidecar.image.repository=gcr.io/thanos-io/thanos \
--set thanos.sidecar.image.tag=v0.23.0 \
--set thanos.sidecar.image.pullPolicy=IfNotPresent \
--set thanos.sidecar.log_receiver.enabled=true \
--set thanos.sidecar.log_receiver.grpc_address=0.0.0.0:1052 \
--set thanos.sidecar.log_receiver.grpc_tls_ca_cert_file=/etc/thanos/ca.pem \
--set thanos.sidecar.log_receiver.grpc_tls_cert_file=/etc/thanos/tls.crt \
--set thanos.sidecar.log_receiver.grpc_tls_key_file=/etc/thanos/tls.key \
--set thanos.sidecar.log_receiver.grpc_tls_server_name=thanos-sidecar \
--set thanos.sidecar.log_receiver.grpc_max_recv_msg_size=104857600 \
--set thanos.sidecar.log_receiver.grpc_max_send_msg_size=104857600 \
--set thanos.sidecar.log_receiver.grpc_max_concurrent_streams=1000 \
--set thanos.sidecar.log_receiver.grpc_keepalive_time=30s \
--set thanos.sidecar.log_receiver.grpc_keepalive_timeout=20s \
--set thanos.sidecar.log_receiver.grpc_keepalive_permit_without_stream=true \
--set thanos.sidecar.log_receiver.loki_push_api_url=http://loki-pushgateway.loki.svc.cluster.local:9095/api/prom/push \
--set thanos.sidecar.log_receiver.loki_push_api_batch_wait=1s \
--set thanos.sidecar.log_receiver.loki_push_api_batch_size=1000 \
--set thanos.sidecar.log_receiver.loki_push_api_max_retries=3 \
--set thanos.sidecar.log_receiver.loki_push_api_retry_backoff=1s \
--set thanos.sidecar.log_receiver.loki_push_api_timeout=10s
In this example, we deploy a Thanos cluster with a log receiver component that forwards logs to a Loki pushgateway. We also set several options on the log receiver: a maximum gRPC message size of 104857600 bytes (100 MiB) in each direction, up to 1000 concurrent streams, and push batches of up to 1000 entries that wait at most 1s before being sent, with 3 retries and a 10s timeout.
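To make the batch options more concrete, here is a minimal Python sketch of size-or-age batching, the behavior that settings named like loki_push_api_batch_size and loki_push_api_batch_wait usually imply. It is not code from Thanos or Loki. The LogBatcher class, its flush callback, and the tick method are hypothetical names introduced only for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LogBatcher:
    """Buffers log lines and flushes on size or age, whichever comes first."""

    flush: Callable[[List[str]], None]            # hypothetical sink, e.g. an HTTP push
    batch_size: int = 1000                        # mirrors loki_push_api_batch_size
    batch_wait: float = 1.0                       # seconds, mirrors loki_push_api_batch_wait
    clock: Callable[[], float] = time.monotonic   # injectable so the logic can be tested
    _buffer: List[str] = field(default_factory=list)
    _first_at: float = 0.0

    def add(self, line: str) -> None:
        if not self._buffer:
            self._first_at = self.clock()         # start the age timer on the first line
        self._buffer.append(line)
        if len(self._buffer) >= self.batch_size:
            self.flush_now()                      # size limit reached

    def tick(self) -> None:
        """Call periodically; sends a partial batch once it is old enough."""
        if self._buffer and self.clock() - self._first_at >= self.batch_wait:
            self.flush_now()

    def flush_now(self) -> None:
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self.flush(batch)
```

A caller would feed lines through add() and invoke tick() from a timer loop. Larger batches mean fewer requests, while a shorter wait bounds how stale a log line can get before it is sent.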
Collecting Logs with Thanos
Once you have set up a Thanos cluster with a log receiver component, you can start collecting logs from various sources. Thanos supports several log collection methods, including the following:
- File-based log collection: Thanos can collect logs from files on disk using the thanos-sidecar component.
- Agent-based log collection: Thanos can collect logs from agents running on your hosts using the thanos-agent component.
- Remote log collection: Thanos can collect logs from remote sources using the thanos-rpc component.
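As a rough illustration of what file-based collection involves in general, the following Python sketch reads only the lines appended to a file since the last read. It does not show how any Thanos component works internally. The function name read_new_lines and the byte-offset bookkeeping are assumptions made for this example.

```python
from typing import List, Tuple


def read_new_lines(path: str, offset: int = 0) -> Tuple[List[str], int]:
    """Return complete lines appended to `path` since byte `offset`, plus the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only up to the last newline so a half-written line is picked up next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

A collector would call this in a loop and persist the returned offset between calls. The sketch deliberately ignores log rotation and truncation, which a real collector has to handle.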
Here is an example of collecting logs from files on disk using the thanos-sidecar component:
global:
  tracing:
    enabled: true
  log:
    level: debug
sidecar:
  image:
    repository: gcr.io/thanos-io/thanos
    tag: v0.23.0
    pullPolicy: IfNotPresent
  log_receiver:
    enabled: true
    grpc_address: 0.0.0.0:1052
    grpc_tls_ca_cert_file: /etc/thanos/ca.pem
    grpc_tls_cert_file: /etc/thanos/tls.crt
    grpc_tls_key_file: /etc/thanos/tls.key
    grpc_tls_server_name: thanos-sidecar
    grpc_max_recv_msg_size: 104857600
    grpc_max_send_msg_size: 104857600
    grpc_max_concurrent_streams: 1000
    grpc_keepalive_time: 30s
    grpc_keepalive_timeout: 20s
    grpc_keepalive_permit_without_stream: true
    loki_push_api_url: http://loki-pushgateway.loki.svc.cluster.local:9095/api/prom/push
    loki_push_api_batch_wait: 1s
    loki_push_api_batch_size: 1000
    loki_push_api_max_retries: 3
    loki_push_api_retry_backoff: 1s
    loki_push_api_timeout: 10s
    file_sd_configs:
      - files: ["/etc/thanos/file_sd.yaml"]
receivers:
  - name: thanos
    address: thanos-sidecar.thanos.svc.cluster.local:1052
    tls_config:
      insecure_skip_verify: true
processors:
  - name: prometheus
    actions:
      - action: remote_write
        remote_config:
          url: http://thanos-query.thanos.svc.cluster.local:9090/api/v1/write
        relabel_configs:
          - source_labels: [__name__]
            action: keep
            regex: ^job_name
exporters:
  - type: log
    log:
      level: debug
  - type: prometheus
    config:
      scrape_configs:
        - job_name: thanos
          static_configs:
            - targets: ["thanos-query.thanos.svc.cluster.local:9090"]
In this example, we configure the thanos-sidecar component to collect logs from files on disk using the file_sd_configs option. We also configure it to forward logs to the Loki pushgateway using the loki_push_api_url option.
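For a sense of what a forwarded batch can look like on the wire, here is a small Python sketch that serializes log lines into a Loki push body. It assumes the JSON format accepted by Loki's current /loki/api/v1/push endpoint, not the legacy /api/prom/push path used in the configuration above. The function name build_push_payload and the job=thanos label in the usage note are illustrative assumptions.

```python
import json
import time
from typing import Dict, Iterable, Optional


def build_push_payload(labels: Dict[str, str], lines: Iterable[str],
                       now_ns: Optional[int] = None) -> str:
    """Serialize log lines as a single Loki stream in the JSON push format."""
    # Loki expects the timestamp as a string of nanoseconds since the Unix epoch.
    ts = str(now_ns if now_ns is not None else time.time_ns())
    payload = {
        "streams": [
            {
                "stream": labels,                          # label set identifying the stream
                "values": [[ts, line] for line in lines],  # [timestamp_ns, line] pairs
            }
        ]
    }
    return json.dumps(payload)
```

For example, build_push_payload({"job": "thanos"}, ["line one", "line two"]) returns a body that could be sent with Content-Type: application/json. This sketch stamps every line in the batch with the same time, whereas a real collector would normally keep each line's own timestamp.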
Analyzing Logs with Thanos
Once you have collected logs with Thanos, you can analyze them using various tools. The Thanos query component exposes a Prometheus-compatible HTTP API, so you can run PromQL, the same query language used by Prometheus, against the collected data.
Here is an example of querying logs using the Thanos query API:
kubectl port-forward svc/thanos-query 9090:9090 -n thanos
curl -G "http://localhost:9090/api/v1/query" \
--data-urlencode 'query=sum(rate({job=~".+"}[5m]))' \
--data-urlencode "timeout=10s"
In this example, we use kubectl to forward port 9090 on our local machine to port 9090 on the Thanos query service in the thanos namespace. Because kubectl port-forward stays in the foreground, run curl from a second terminal. The curl command then queries the summed per-second rate, over the past 5 minutes, of every series that has a job label.
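The same request can be prepared from a script. The Python sketch below builds the percent-encoded query URL and extracts values from a Prometheus-style instant query response. It assumes the standard Prometheus HTTP API response layout (status, data.resultType, data.result). The helper names build_query_url and parse_instant_vector are hypothetical.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def build_query_url(base_url: str, query: str, timeout: str = "10s") -> str:
    """Return an instant-query URL with the PromQL expression percent-encoded."""
    params = urlencode({"query": query, "timeout": timeout})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_instant_vector(body: str) -> List[Tuple[Dict[str, str], float]]:
    """Extract (labels, value) pairs from an instant-vector query response."""
    doc = json.loads(body)
    if doc.get("status") != "success":
        raise ValueError(doc.get("error", "query failed"))
    # Each sample is {"metric": {...}, "value": [unix_ts, "value-as-string"]}.
    return [(s["metric"], float(s["value"][1])) for s in doc["data"]["result"]]
```

With the port-forward running, the URL from build_query_url("http://localhost:9090", 'sum(rate({job=~".+"}[5m]))') can be fetched with urllib.request.urlopen and the response text passed to parse_instant_vector. Encoding the expression this way avoids the shell quoting problems that braces, brackets, and double quotes cause on the command line.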
Conclusion
Thanos provides a powerful solution for log aggregation and analysis. By setting up a Thanos cluster with a log receiver component, you can collect logs from various sources and forward them to a centralized location. You can then use the Thanos query API to analyze logs using PromQL. With Thanos, you can simplify your log aggregation and analysis infrastructure and gain deeper insights into your applications and infrastructure.