Logging additional metadata

Nicolas Fränkel - Apr 19 '20 - - Dev Community

In February, before the lockdown, I presented my Fast logs talk at ConFoo. At the end, I had an interesting question I had to cut short because of the timing. This blog post aims to describe the relevant points of the talk, the question, as well as some possible answer.

One improvement to have faster logs

In this talk, I highlight different ways to make logs faster. Nowadays, most logs are aggregated into a single place for later usage. For example, a widespread architecture is based on the Elastic stack:

Naive sample architecture

While this design works, it has a lot of components. In particular, Logstash is here to parse the original log lines in JSON format, so that Elasticsearch can index each field separately.

2020-02-29 13:56:54.906  INFO 1 --- [  main] c.f.f.Main : Informative message
Enter fullscreen mode Exit fullscreen mode
{
 "date": [[ "20-02-29" ]],
 "MONTHDAY": [[ "29" ]],
 "MONTHNUM": [[ "02" ]],
 "YEAR": [[ "20" ]],
 "time": [[ "13:56:54.906" ]],
 "HOUR": [[ "13" ]],
 "MINUTE": [[ "56" ]],
 "SECOND": [[ "54.906" ]],
 "level": [[ "INFO" ]],
 "threadName": [[ "main" ]],
 "class": [[ "c.f.f.Main" ]],
 "message":
   [[ "Informative message" ]]
}
Enter fullscreen mode Exit fullscreen mode

One improvement is to configure the logging framework with a pattern so that it directly writes JSON-formatted one-liner logs. The following shows how to do that with Logback:

<configuration>
  <appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <file>/var/log/sample.log</file>
    <encoder>
        <pattern>{ "date": "%d{dd-MM-yy}", "time": "%d{HH:mm:ss.SSS}", "thread": "%thread", "level": "%-5level", "className": "%logger{36}", "message": "%msg" }%n</pattern>
    </encoder>
  </appender>
  <root level="debug">
    <appender-ref ref="FILE" />
  </root>
</configuration>
Enter fullscreen mode Exit fullscreen mode

The output is now the following:

{ "date": "29-02-20", "time": "23:13:53.640", "thread": "main", "level": "INFO", "className": "c.f.f.Main", "message": "Informative message" }
Enter fullscreen mode Exit fullscreen mode

That allows to send logs directly to Elasticsearch, and thus to simplify the architecture by completely removing Logstash:

Simplified sample architecture

Adding additional meta-data

Now, suppose the business requires additional fields in addition to the log message for analytics purpose. With simple string messages, this is as easy as getting a reference to the value, and adding it to the field:

logger.info("Client-Id -> {} | Informative message", client.getId());
Enter fullscreen mode Exit fullscreen mode

When executed under the correct log level, this will output the desired message:

2020-02-29 13:56:54.906  INFO 1 --- [ main] ch.frankel.fastlogs.Main : Client-Id -> XYZ | Informative message
Enter fullscreen mode Exit fullscreen mode

With Logstash, it’s possible to improve the groking pattern so that it successfully parses the Client-Id value and sends it as a dedicated field to Elasticsearch. Unfortunately, the JSON-formatted log option shown above doesn’t allow this:

{ "date": "29-02-20", "time": "23:13:53.640", "thread": "main", "level": "INFO", "className": "ch.frankel.fastlogs.Main", "message": "Client-Id -> XYZ | Informative message" }
Enter fullscreen mode Exit fullscreen mode

Notice the Client-Id is part of the message, along with separators. It’s possible to send this to Elasticsearch and parse the message with custom ingesters. Yet, this will take additional processing time on each message indexed. Is it back to square one and Logstash?

Mapped Diagnostics Context to the rescue

SLF4J - and similar frameworks - allow to add additional meta-data through the usage of MDC. In short, a MDC is a hashmap context holder bound to a specific thread.

MDC class diagram

MDC can be leveraged to solve the issue. Here’s a sample code snippet:

MDC.put("client-id", client.getId());
logger.info("Informative message");
MDC.remove("client-id");
Enter fullscreen mode Exit fullscreen mode

Note that we explicitly set it before, and remove it after, the logging statement. Depending on the context, it would be a benefit to use servlet filters to do that automatically in order to avoid mistakes. For example, Logback provides an out-of-the-box MDC filter to set request-related data.

Let’s slightly update the pattern to use MDC:

<pattern>{ "date": "%d{dd-MM-yy}", "time": "%d{HH:mm:ss.SSS}", "thread": "%thread", "level": "%-5level", "className": "%logger{36}", "message": "%msg", "Client-Id": "%X{client-id}" }%n</pattern>
Enter fullscreen mode Exit fullscreen mode

Notice the %X{key} syntax. With that, we can keep our simple Logstash-free architecture, and still get the desired JSON output:

{ "date": "29-02-20", "time": "23:13:53.640", "thread": "main", "level": "INFO", "className": "ch.frankel.fastlogs.Main", "message": "Informative message", "Client-Id": "XYZ" }
Enter fullscreen mode Exit fullscreen mode

Conclusion

Logs can be made faster by removing the need to parse them. To achieve that, one can directly output single-line JSON messages. When additional meta-data is required, just use a MDC and the logging pattern appropriate to the logging implementation.

To go further:

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .