Distributed application troubleshooting can be a nightmare. Unless you have the budget for expensive proprietary monitoring SaaS solutions or the expertise to run and maintain a complex ELK stack, you might feel as if you are stuck in a cave without a flashlight.

Luckily, viable open-source alternatives like Quickwit have come to the rescue. Quickwit weaves together existing tooling for log and trace ingestion, pairs well with dashboard and visualisation tools such as Grafana and Jaeger, and sandwiches powerful indexing, storage and search capabilities in between. Even if the tool sounds new to you, it won’t be for long.

We recently integrated Quickwit with Glasskube, so it can now be easily deployed to your cluster. I spoke directly with Quickwit co-founder François Massot to get the insider scoop and to learn how the tool works. Let's dive in!

But what is Quickwit exactly? 🤷

Quickwit is a cloud-native search engine built with the goal of creating an open-source alternative to expensive monitoring software like Datadog and Splunk. Along the way, the Quickwit team has also developed and open-sourced several components, including ChitChat (cluster membership protocol), mrecordlog (write-ahead log), Whichlang (fast language detection), witty actors (actor framework), and bitpacking (SIMD algorithms for integer compression).

Thanks to its robust Elasticsearch-compatible API, Quickwit integrates well with tooling from the OSS ecosystem, such as Grafana, Jaeger, and OpenTelemetry. Users are successfully deploying Quickwit at scale, with hundreds of nodes and hundreds of terabytes of data ingested daily, all while enjoying significant cost reductions. And thanks to Glasskube, you can get up and running in no time.

Quickwit excels at handling logs, traces, security data, and append-only datasets, with plans to support metrics soon. A key feature is the use of object storage for the indexed data, which simplifies cluster management, cuts infrastructure costs, and enhances reliability. Multiple storage options are available, such as local disk, Amazon S3, Azure Blob Storage, or Garage, an OSS distributed object storage.
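
As a rough illustration of how these backends are addressed, the sketch below builds a default index root URI for three of them. The helper name index_root_uri and the bucket, container and directory names are hypothetical; S3-compatible stores such as Garage or MinIO reuse the s3:// scheme together with a custom endpoint in the storage settings.

```bash
# Hypothetical helper: build a Quickwit index root URI for a given storage backend.
# Usage: index_root_uri <backend> <bucket|container|directory> [prefix]
index_root_uri() {
  local backend="$1" location="$2" prefix="${3:-}"
  # Append "/<prefix>" only when a prefix was given.
  local path="${location}${prefix:+/$prefix}"
  case "$backend" in
    s3 | garage | minio) echo "s3://${path}" ;; # Amazon S3 and S3-compatible stores
    azure) echo "azure://${path}" ;;            # Azure Blob Storage (container first)
    local) echo "file://${path}" ;;             # local disk (absolute directory path)
    *)
      echo "unsupported backend: ${backend}" >&2
      return 1
      ;;
  esac
}
```

For instance, index_root_uri s3 quickwit-indexes prints s3://quickwit-indexes, the value used as defaultIndexRootUri later in this guide.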

Questions for the co-founder François Massot 🙋

What are the benefits of using external object storage as opposed to node-attached storage?

There are a lot of benefits! From the beginning, we chose to decouple compute and storage to make our search engine scalable, reliable, and very cost-efficient. If you want to remember one thing that distinguishes Quickwit from traditional search engines, it is this decoupling of storage and compute.

Firstly, it provides elasticity, allowing us to scale storage and compute resources independently, which is ideal for cloud environments. Secondly, it’s cost-efficient, as object storage like S3 is cheaper than traditional disk storage, especially for large volumes of log data. And you don’t need to replicate your index data yourself; the object storage layer takes care of that. Additionally, it ensures high durability and availability, reducing the risk of data loss. Last, but not least, it simplifies cluster management, as most of Quickwit’s components are stateless.

Performance Comparison: Is Quickwit Faster than Elasticsearch?

It depends!

On indexing, Quickwit is generally twice as fast as Elasticsearch and uses less CPU. Our users, like Binance, report an 80% reduction in CPU usage for indexing!

The story is different for querying: Elasticsearch keeps all its data on local disk, typically SSD, while Quickwit keeps its indexed data on much slower object storage. In this case, you can expect query latency to be higher. But Quickwit's main goal is to deliver sub-second queries, which is perfectly fine in the observability and security domains. By this measure, Quickwit is on par with Elasticsearch, and it is even faster for demanding analytics queries, even though the data is stored on object storage!

What's in store for Quickwit in the future?

We have a very ambitious roadmap! Here are the key features that will be added in the following 12 months:

Distributed ingest (July 2024): High-throughput indexing on tens of thousands of indexes.
OpenSearch Dashboard support (Q3 2024): This will enable OpenSearch users to migrate seamlessly to Quickwit with their existing dashboards.
Metrics support (Q4 2024): New storage engine optimized for time series data.
Distributed SQL engine (Q1 2025): Distributed SQL engine for analytics on top of Apache Arrow, DataFusion, and Ballista.
Pipe-based query language (Q2 2025): Introduction of a flexible and powerful query language similar to SPL (Splunk's Search Processing Language).

Use cases

Log management 🪵

Quickwit is built from the ground up to efficiently index unstructured data and search it effortlessly on cloud storage. Moreover, Quickwit supports the OpenTelemetry gRPC and HTTP (protobuf only) protocols out of the box and provides a REST API ready to ingest any JSON-formatted logs. This makes Quickwit a perfect fit for logs!
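
To give an idea of what ingesting JSON logs over the REST API looks like, here is a minimal sketch that posts a newline-delimited JSON (NDJSON) file to Quickwit's ingest endpoint with curl. It assumes a Quickwit node reachable at QW_URL (default http://localhost:7280, for example through a port-forward) and an existing index; the index id my-logs and the file name logs.ndjson used below are hypothetical. Setting DRY_RUN=1 prints the command instead of running it.

```bash
# Minimal sketch: send NDJSON logs (one JSON object per line) to Quickwit's ingest API.
# Assumption: Quickwit's REST API is reachable at $QW_URL.
QW_URL="${QW_URL:-http://localhost:7280}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# ingest_ndjson <index-id> <file.ndjson>
# commit=force makes the documents searchable right away, which is handy for a demo;
# forcing a commit on every request is expensive at scale, so prefer the default there.
ingest_ndjson() {
  local index_id="$1" file="$2"
  run curl -sS -X POST "${QW_URL}/api/v1/${index_id}/ingest?commit=force" \
    --data-binary "@${file}"
}
```

For example, ingest_ndjson my-logs logs.ndjson uploads the file, while DRY_RUN=1 ingest_ndjson my-logs logs.ndjson only prints the curl command.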

Distributed tracing 📊

Distributed tracing involves monitoring application requests as they traverse various services such as the frontend, backend, and databases. It's instrumental for understanding application behavior and diagnosing performance issues.

Quickwit integrates seamlessly with OpenTelemetry using the gRPC and HTTP protocols (protobuf only), as well as with Jaeger's gRPC API (SpanReader only). This means you can store traces in Quickwit and effortlessly query them using Jaeger's UI.
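
The sketch below shows one way to wire this up under stated assumptions: it sets the standard OpenTelemetry SDK environment variables so that instrumented applications send traces over OTLP/gRPC to Quickwit's gRPC port (7281 by default), and it starts Jaeger's query service with its remote gRPC storage settings pointed at the same port. The service name, container name and host values are placeholders, and the Jaeger settings assume a Jaeger 1.x jaeger-query image; check them against the Jaeger and Quickwit documentation for the versions you run.

```bash
# Sketch: send traces to Quickwit over OTLP/gRPC and browse them in Jaeger's UI.
# Assumptions: Quickwit's gRPC port (7281) is reachable and its OTLP and Jaeger
# endpoints are enabled; host names below are placeholders.
QW_GRPC_HOST="${QW_GRPC_HOST:-localhost}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Standard OpenTelemetry SDK variables, read by instrumented applications
# started from this shell.
configure_otel_exporter() {
  export OTEL_SERVICE_NAME="$1"
  export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
  export OTEL_EXPORTER_OTLP_ENDPOINT="http://${QW_GRPC_HOST}:7281"
}

# Jaeger query service reading spans from Quickwit (SpanReader) over gRPC.
# Inside a container, "localhost" is the container itself, hence the default
# host.docker.internal (available on Docker Desktop).
start_jaeger_ui() {
  local grpc_server="${1:-host.docker.internal:7281}"
  run docker run --rm -d --name jaeger-qw \
    -e SPAN_STORAGE_TYPE=grpc-plugin \
    -e "GRPC_STORAGE_SERVER=${grpc_server}" \
    -p 16686:16686 \
    jaegertracing/jaeger-query:latest
}
```

After configure_otel_exporter my-service, OpenTelemetry-instrumented applications launched from the same shell export their spans to Quickwit, and once start_jaeger_ui is running, the Jaeger UI is served on http://localhost:16686.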

Key features 🔑

Full-text search and aggregation queries
Elasticsearch query language support
Sub-second search on cloud storage (Amazon S3, Azure Blob Storage, …)
Decoupled compute and storage, stateless indexers & searchers
Schemaless or strict schema indexing
Schemaless analytics
Grafana data source
Jaeger-native
OTEL-native for logs and traces
Kubernetes ready via Glasskube
RESTful API
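
As an example of the Elasticsearch-compatible API, the sketch below sends a query_string search to Quickwit's _elastic endpoint with curl. It assumes QW_URL points at a Quickwit node and that the hypothetical index my-logs has a level field. In this simplified version the query text is inserted into the JSON body verbatim, so it must not contain double quotes or backslashes.

```bash
# Sketch: query Quickwit through its Elasticsearch-compatible search endpoint.
QW_URL="${QW_URL:-http://localhost:7280}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build an Elasticsearch-style request body (no JSON escaping is performed).
es_query_body() {
  local query="$1" size="${2:-10}"
  printf '{"query":{"query_string":{"query":"%s"}},"size":%d}' "$query" "$size"
}

# es_search <index-id> <query> [size]
es_search() {
  local index_id="$1" query="$2" size="${3:-10}"
  run curl -sS -X POST "${QW_URL}/api/v1/_elastic/${index_id}/_search" \
    -H "Content-Type: application/json" \
    -d "$(es_query_body "$query" "$size")"
}
```

Because the body follows Elasticsearch's query DSL, queries written for Elasticsearch can often be reused, within the subset of the DSL that Quickwit supports.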

Installation guide 🦮

Prerequisites

Access to a Kubernetes cluster (you can easily create a local cluster by using Minikube or Kind)
kubectl isn't strictly speaking a dependency for installing packages via Glasskube, but it is the recommended way to interact with the cluster, so we highly recommend installing it. Installation instructions are available for macOS, Linux and Windows.
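
Before going further, a small shell function can confirm that the command-line tools you need are installed. This is a sketch; the tool list in the usage example (kubectl, minikube, curl) is an assumption based on the commands used in this guide, so swap minikube for kind if that is what you use.

```bash
# Sketch: report every missing tool and return non-zero if any is absent.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: ${tool}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, require_tools kubectl minikube curl || exit 1 stops a setup script early and lists everything that still needs to be installed.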

Install Glasskube

If you have already installed Glasskube, you can skip this step.
If not, Glasskube can easily be installed by following the instructions specific to your operating system.

For this demo, I'll be using macOS:

```bash
brew install glasskube/tap/glasskube # install the glasskube CLI
minikube start # start a minikube Kubernetes cluster
glasskube bootstrap # install glasskube on the minikube cluster
```

Installation guides for other platforms are available in the Glasskube documentation.

Once Glasskube has been installed, access its UI with:

```bash
glasskube serve
```

The dashboard will open up on http://localhost:8580/.

Creating an S3-Compatible Bucket

Before installing Quickwit, you'll need to create an object storage bucket to hold your Quickwit indexes. You can use the S3-compatible storage of your choice, whether from a cloud provider such as Scaleway or AWS S3, or self-hosted like MinIO. Refer to the official Quickwit documentation for storage configuration details.

Here I will be creating an AWS S3 bucket to store the Quickwit indexes.

Steps:

Navigate to the AWS Management Console and create a new S3 bucket.
In IAM, generate an access key for a user with S3 permissions and save the 'Access Key ID' and 'Secret Access Key'; we will need them shortly.
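
If you prefer to keep that key's permissions narrow, the sketch below prints an IAM policy scoped to a single bucket. The list of actions (listing the bucket, reading, writing and deleting objects, and managing multipart uploads) is an assumption rather than an official Quickwit requirement, so compare it with the storage section of the Quickwit documentation before using it.

```bash
# Sketch: print a bucket-scoped IAM policy for an index store.
# The action list is an assumption; verify it against the Quickwit docs.
quickwit_s3_policy() {
  local bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload"
      ],
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}
```

For example, after quickwit_s3_policy quickwit-indexes > policy.json, the AWS CLI commands aws s3 mb s3://quickwit-indexes --region us-east-1 and aws iam create-policy --policy-name quickwit-s3 --policy-document file://policy.json create the bucket and the policy (the policy name is a placeholder); attach the policy to the IAM user that owns the access key.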

Deploy Quickwit

From the Glasskube dashboard, find the Quickwit package and add your custom configuration parameters:

defaultIndexRootUri: for this demo it's s3://quickwit-indexes.
metastoreUri: we won't use PostgreSQL, so let's pick the same value we used for defaultIndexRootUri.
s3AccessKeyId: the "Access Key ID" from AWS that we generated before.
s3Endpoint: custom endpoint for use with S3-compatible providers. Not needed for AWS S3.
s3Flavor: we keep the default empty value, which is the right choice for AWS S3 itself.
s3Region: us-east-1 in my case.
s3SecretAccessKey: the "Secret Access Key" from AWS that we generated before.

See the official Quickwit documentation for a description of every parameter.
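
Before submitting the form, a quick sanity check can catch typos in these values. The sketch below is hypothetical: it reads the values from shell variables named after the parameters above and applies assumed rules (an s3:// index root, an s3:// or PostgreSQL metastore URI, non-empty credentials and region).

```bash
# Hypothetical pre-flight check for the Quickwit package values.
# Each failed rule prints a message; the function returns 1 if any rule fails.
check_quickwit_values() {
  local status=0
  case "${defaultIndexRootUri:-}" in
    s3://*) ;;
    *)
      echo "defaultIndexRootUri should start with s3://" >&2
      status=1
      ;;
  esac
  case "${metastoreUri:-}" in
    s3://* | postgres://* | postgresql://*) ;;
    *)
      echo "metastoreUri should be an s3:// or postgres:// URI" >&2
      status=1
      ;;
  esac
  if [ -z "${s3AccessKeyId:-}" ] || [ -z "${s3SecretAccessKey:-}" ]; then
    echo "s3AccessKeyId and s3SecretAccessKey must both be set" >&2
    status=1
  fi
  if [ -z "${s3Region:-}" ]; then
    echo "s3Region is empty" >&2
    status=1
  fi
  return "$status"
}
```

Set the five variables in your shell and run check_quickwit_values; it prints one message per failed rule and stays silent when everything looks plausible.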

It's also possible to install and configure Quickwit using the Glasskube CLI by running:

```bash
glasskube install quickwit
```

Once it is installed, you can list the namespaces (for example with kubectl get namespaces) and see that a quickwit namespace has been created:

```
default
flux-system
glasskube-system
kube-node-lease
kube-public
kube-system
kubernetes-dashboard
quickwit
```

Now, check that the pods are running, for example with kubectl get pods -n quickwit:

```
NAME                                               READY   STATUS    RESTARTS      AGE
quickwit-quickwit-control-plane-86bd9955f7-bwm2r   1/1     Running   1 (27m ago)   29m
quickwit-quickwit-indexer-0                        1/1     Running   1 (27m ago)   29m
quickwit-quickwit-janitor-9479697ff-x4x2c          1/1     Running   1 (27m ago)   29m
quickwit-quickwit-metastore-56ff74df9f-k6d2g       1/1     Running   0             29m
quickwit-quickwit-searcher-0                       1/1     Running   1 (27m ago)   29m
quickwit-quickwit-searcher-1                       1/1     Running   0             27m
quickwit-quickwit-searcher-2                       1/1     Running   0             27m
```
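
Rather than re-running the pod listing until everything shows Running, you can let kubectl block until the pods report Ready. A minimal sketch, assuming the quickwit namespace used above; the 300-second timeout is arbitrary, and DRY_RUN=1 prints the command instead of running it.

```bash
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# wait_for_quickwit_pods [namespace] [timeout]
# Blocks until every pod in the namespace has the Ready condition,
# or fails once the timeout expires.
wait_for_quickwit_pods() {
  local namespace="${1:-quickwit}" timeout="${2:-300s}"
  run kubectl wait --for=condition=Ready pod --all -n "$namespace" --timeout="$timeout"
}
```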

To access the Quickwit UI, forward port 7280 of one of the searcher pods to your machine:

```bash
kubectl -n quickwit port-forward pod/quickwit-quickwit-searcher-0 7280
```

Head over to http://localhost:7280 and you should be ready to go!
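
If the page does not load right away, you can poll the node from a script. This sketch assumes the port-forward above is running and that Quickwit serves its readiness probe at /health/readyz on the REST port; the retry count and delay are arbitrary defaults.

```bash
# wait_for_quickwit [url] [retries] [delay-seconds]
# Polls the readiness endpoint until it answers with a success status.
wait_for_quickwit() {
  local url="${1:-http://localhost:7280}" retries="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$retries" ]; do
    # -f makes curl fail on HTTP error statuses, not only on connection errors.
    if curl -fsS "${url}/health/readyz" >/dev/null 2>&1; then
      echo "Quickwit is ready at ${url}"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Quickwit did not become ready at ${url}" >&2
  return 1
}
```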

Create your first index

Before adding documents to Quickwit, you need to create an index configured with a YAML config file. This config file notably lets you define how to map your input documents to your index fields and whether these fields should be stored and indexed. See the index config documentation.

Let's create an index configured to receive Stack Overflow posts (questions and answers).

```bash
# First, download the stackoverflow index config from the Quickwit repository.
curl -o stackoverflow-index-config.yaml https://raw.githubusercontent.com/quickwit-oss/quickwit/main/config/tutorials/stackoverflow/index-config.yaml
```

The index config defines three fields: title, body and creationDate. The title and body fields are indexed and tokenized, and they are also used as default search fields, which means they will be searched if you do not target a specific field in your query. creationDate serves as the timestamp for each record. There are no other explicit field definitions because the index uses the default dynamic mode: undeclared fields are still indexed, fast fields are enabled by default so they can be used in aggregation queries, and the raw tokenizer is used for text.

And here is the complete config:

```yaml
# Index config file for stackoverflow dataset.
#
version: 0.7

index_id: stackoverflow

doc_mapping:
  field_mappings:
    - name: title
      type: text
      tokenizer: default
      record: position
      stored: true
    - name: body
      type: text
      tokenizer: default
      record: position
      stored: true
    - name: creationDate
      type: datetime
      fast: true
      input_formats:
        - rfc3339
      fast_precision: seconds
  timestamp_field: creationDate

search_settings:
  default_search_fields: [title, body]

indexing_settings:
  commit_timeout_secs: 30
```

Now we can create the index with the command:

```bash
./quickwit index create --index-config ./stackoverflow-index-config.yaml
```

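Check that the index has been created. Because defaultIndexRootUri is s3://quickwit-indexes and the metastore lives in the same bucket, Quickwit writes the index files under s3://quickwit-indexes/stackoverflow, along with a metastore.json file that contains the index metadata (on a local single-node install, these files end up in ./qwdata/indexes/stackoverflow instead). You're now ready to fill the index.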
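
If you don't have the Quickwit binary on your machine, the index can also be created through the REST API of the node you port-forwarded earlier. A sketch, assuming QW_URL points at that node: the index config is posted as YAML with a YAML content type, and a GET on the same path lists the existing indexes so you can confirm the result. DRY_RUN=1 prints the commands instead of running them.

```bash
# Sketch: create and list indexes through Quickwit's REST API.
QW_URL="${QW_URL:-http://localhost:7280}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# create_index [config-file]: POST the index config as YAML.
create_index() {
  local config="${1:-./stackoverflow-index-config.yaml}"
  run curl -sS -X POST "${QW_URL}/api/v1/indexes" \
    -H "Content-Type: application/yaml" \
    --data-binary "@${config}"
}

# list_indexes: print the metadata of all indexes known to the metastore.
list_indexes() {
  run curl -sS "${QW_URL}/api/v1/indexes"
}
```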

Continue on to the Quickwit documentation to add your first documents and execute your first search queries.
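
As a preview of those steps, the sketch below ingests one hand-written post into the stackoverflow index and then runs a search through the REST API. It assumes QW_URL points at your Quickwit node; the sample document is made up to match the fields declared in the config above, and commit=force is used only so that the document becomes searchable immediately.

```bash
# Sketch: ingest one sample document, then search the stackoverflow index.
QW_URL="${QW_URL:-http://localhost:7280}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Ingest a single made-up post (NDJSON: one JSON object per line) read from stdin.
ingest_sample_post() {
  printf '%s\n' '{"title":"Deploying Quickwit on Kubernetes","body":"I installed the Glasskube package.","creationDate":"2024-06-01T12:00:00Z"}' |
    run curl -sS -X POST "${QW_URL}/api/v1/stackoverflow/ingest?commit=force" --data-binary @-
}

# search_index <index-id> <query> [max-hits]
# -G turns the URL-encoded parameters into a GET query string.
search_index() {
  local index_id="$1" query="$2" max_hits="${3:-10}"
  run curl -sS -G "${QW_URL}/api/v1/${index_id}/search" \
    --data-urlencode "query=${query}" \
    --data-urlencode "max_hits=${max_hits}"
}
```

For example, search_index stackoverflow 'title:kubernetes' should return the sample post once it has been committed.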

If you like our content and want to support us on this mission, we'd appreciate it if you could give us a star ⭐️ on GitHub.


Log and trace management made easy. Quickwit Integration via Glasskube