Introduction
Vector by Datadog is a high-performance, end-to-end (agent and aggregator) observability data pipeline that lets you collect, transform, and route your logs and metrics. It is also open source. Although Vector.dev can act as an aggregator on its own, it can be more effective when paired with a specialized data storage tool such as Manticore.
Let's look at how they can work together. As an example, we'll index dpkg.log, the standard log file of the Debian package manager. The log has a simple structure, as shown below:
2023-05-31 10:42:55 status triggers-awaited ca-certificates-java:all 20190405ubuntu1.1
2023-05-31 10:42:55 trigproc libc-bin:amd64 2.31-0ubuntu9.9 <none>
2023-05-31 10:42:55 status half-configured libc-bin:amd64 2.31-0ubuntu9.9
2023-05-31 10:42:55 status installed libc-bin:amd64 2.31-0ubuntu9.9
2023-05-31 10:42:55 trigproc systemd:amd64 245.4-4ubuntu3.21 <none>
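Each line starts with a date and a time, followed by an action keyword (such as status or trigproc) and its arguments. As a rough illustration of that layout, the Python sketch below splits one line into those parts. It is not part of the Vector.dev pipeline, which ships each line unparsed in the message field; the function name parse_dpkg_line and the keys logged_at, action, and args are hypothetical names chosen for this example.

```python
from datetime import datetime


def parse_dpkg_line(line: str) -> dict:
    """Split one dpkg.log line into a timestamp, an action, and its arguments."""
    # The first two whitespace-separated tokens are the date and the time,
    # the third is the action keyword, and everything after it is an argument.
    date, time, action, *args = line.split()
    return {
        "logged_at": datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S"),
        "action": action,
        "args": args,
    }


sample = "2023-05-31 10:42:55 status installed libc-bin:amd64 2.31-0ubuntu9.9"
parsed = parse_dpkg_line(sample)
print(parsed["action"], parsed["args"])
# prints: status ['installed', 'libc-bin:amd64', '2.31-0ubuntu9.9']
```

A line with fewer than three whitespace-separated tokens raises ValueError here, which is acceptable for a sketch but would need handling in real use.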
Configuration
Here is an example Vector.dev configuration file in TOML format:
# Read the dpkg log file
[sources.test_file]
type = "file"
include = [ "/var/log/dpkg.log" ]

# Rename the default timestamp field, which is a reserved word in Manticore
[transforms.modify_test_file]
type = "remap"
inputs = [ "test_file" ]
source = """
.vec_timestamp = del(.timestamp)"""

# Send the transformed events to Manticore's HTTP port
[sinks.manticore]
type = "elasticsearch"
inputs = [ "modify_test_file" ]
endpoints = ["http://127.0.0.1:9308"]
bulk.index = "dpkg_log"
Note that this example assumes Manticore is listening on its default HTTP port, 9308. If you use a custom HTTP port, change the endpoints value in your Vector.dev config accordingly. Also note the transforms section, which renames the default timestamp field because timestamp is a reserved word in Manticore.
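The remap transform above does one thing: it deletes the timestamp field and stores its value under vec_timestamp. The following Python sketch mimics that effect on a plain dictionary purely as an analogy. It is not Vector.dev code, the function name rename_timestamp is hypothetical, and treating a missing field as None is an assumption of this sketch.

```python
def rename_timestamp(event: dict) -> dict:
    """Mimic the effect of `.vec_timestamp = del(.timestamp)` on a plain dict."""
    renamed = dict(event)  # shallow copy so the caller's event is left untouched
    # pop() removes the old key and returns its value in one step;
    # None is used if the field was absent (an assumption of this sketch).
    renamed["vec_timestamp"] = renamed.pop("timestamp", None)
    return renamed


event = {
    "file": "/var/log/dpkg.log",
    "message": "2023-06-05 14:03:04 startup archives install",
    "source_type": "file",
    "timestamp": "2023-08-04T15:32:50.203091741Z",
}
print(sorted(rename_timestamp(event)))
# prints: ['file', 'message', 'source_type', 'vec_timestamp']
```

Doing the removal and the assignment in one step means the event never carries both field names at once.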
Results
Now just start Vector.dev with the config above, and the data from the dpkg log will be passed to Manticore and properly indexed.
Here is the resulting schema of the created table and an example of the inserted document:
mysql> DESCRIBE dpkg_log;
+-----------------+---------+--------------------+
| Field           | Type    | Properties         |
+-----------------+---------+--------------------+
| id              | bigint  |                    |
| file            | text    | indexed stored     |
| host            | text    | indexed stored     |
| message         | text    | indexed stored     |
| source_type     | text    | indexed stored     |
| vec_timestamp   | text    | indexed stored     |
+-----------------+---------+--------------------+
mysql> SELECT * FROM dpkg_log LIMIT 3\G
*************************** 1. row ***************************
id: 7856533729353672195
file: /var/log/dpkg.log
host: logstash-787f68f6f-nhdd2
message: 2023-06-05 14:03:04 startup archives install
source_type: file
vec_timestamp: 2023-08-04T15:32:50.203091741Z
*************************** 2. row ***************************
id: 7856533729353672196
file: /var/log/dpkg.log
host: logstash-787f68f6f-nhdd2
message: 2023-06-05 14:03:04 install base-passwd:amd64 <none> 3.5.47
source_type: file
vec_timestamp: 2023-08-04T15:32:50.203808861Z
*************************** 3. row ***************************
id: 7856533729353672197
file: /var/log/dpkg.log
host: logstash-787f68f6f-nhdd2
message: 2023-06-05 14:03:04 status half-installed base-passwd:amd64 3.5.47
source_type: file
vec_timestamp: 2023-08-04T15:32:50.203814031Z
Conclusion
With the integration outlined in this guide, you can index your log data in Manticore using Vector by Datadog. Beyond shipping logs, Vector.dev lets you transform and route events before they reach Manticore, so the same approach applies whether your logs have a simple or a complex structure.