The log file has existed since programmer time immemorial. The software world moves really fast, but it's not hard to imagine someone decades ago poring over a log file. This is, perhaps, as iconic as programming itself.
But sadly, for many shops, the approach hasn't really evolved in all of that time. "Dump everything to a file and sort it out later, if you ever need it" was the original idea. And that holds up to this day.
Put another way, we tend to view the log file as a dumping ground for any whim we have while writing code. When in doubt, log it. And, while that's a good impulse, it tends to lead to a "write once, read never" outcome. If you've ever started to troubleshoot some issue you have via OS logs and then given up due to being overwhelmed, you know what I mean.
But, even if you don't, you can picture it. A log file with gigabytes of cryptic, similar looking text page after page discourages meaningful troubleshooting. Reading it is so boring you start to find yourself hypnotized before you blink, thinking to yourself that this isn't going to help, and move on to other means of troubleshooting.
Rethinking the Log File
So, with that in mind, let's rethink the log file. People focus an awful lot on intuitive user experiences these days, and we can draw from that. Think of the log file not as some dusty dumping ground for every errant thought your production code has, but as something meant to be consumed.
Who will read your log file? And for what purpose? And, with the answers to those questions, how can we make it easier for them?
This is an important enough consideration that products have emerged to help facilitate consuming log files, making troubleshooting and data gathering easier. People are buying these products to make their lives easier, so that should tell you something. They desperately want to filter the signal from the noise and get valuable data out of their logs.
So let's take a look at how you can help with that when writing to log files. These tools have sophisticated parsing capabilities, but that doesn't mean you shouldn't do your part to help consumers of your log files. Here's how you can do that.
Keep Your Audience in Mind
I've just talked about the importance of understanding that people want to consume log files. But let's get even more specific. You have two main constituents to consider when writing to your logs:
- People that will search through the file and read relevant sections in, say, a text editor.
- Programs that will parse the entries in the file for various purposes.
So as you read through the rest of this post, bear this in mind. You want to write to your log file in such a way that humans can understand it while reading it, but also in such a way that parsing algorithms remains simple and easy to implement.
Include Timestamps and Identifiers
First things first. For the sanity of everyone consuming a log file, it needs some order.
Conceptually speaking, a log file involves recording a series of events to a file. Think of these events as data, the way you would when writing to a database. Now, thinking of them this way, you probably start to understand the necessity of recording certain pieces of data. Specifically, you want a means of uniquely identifying each event, and a recording of when the event took place.
Recording this data is crucial for helping you recreate an accurate timeline and sequence of what happened in your application. A lot of logging frameworks will handle this for you, but you need to make sure it's configured and enabled.
Add Context
I stressed the importance of context in a post about logging practices, but it bears repeating here. You need to provide context for the message that you record. Say you've added a timestamp and a unique identifier, but provided no context. You see an entry like this:
[10/04/16 14:29:27:122] [fbfa1260-8b32-4270-8a54-b09c8592640d] Error occurred.
Error occurred?! What error? In what module? Was it a big deal? Did the entire application crash, or was it just that some user entered "Joe Smith" in the zip code field?
Without context, your log entries are useless. So make sure that you add enough context for any human reading to have a good sense of what happened and where to look for more information.
Use Key-Value Pairs
Context will generally help your human readers, but let's think for a moment about our machine consumers. How would you parse a traditional log file, and how can we make that easier?
I've written code over the years to parse the sort of log entries that I showed above with the timestamp, ID, and useless error message. It typically involves a LOT of string matching and regex. It also involves parsing fixed width or space delimited text, both of which will make you question why you ever decided to become a programmer.
Spare your consumers this. Key value pairs make processing these files easier by orders of magnitude, and it's not conceptually hard to do. It can be as simple as timestamp={timestamp}, id={id}, etc. Or, if you're so inclined, you can structure your log entries as JSON, letting consumers avail themselves of ready-made parsing and document storage tools.
Tag and Categorize Entries
While we're on the subject of treating your log entries as data, you should also categorize and/or tag them. You probably already understand this intuitively. The most common scheme you'll see in log files is to give them "levels" like the following.
- DEBUG
- INFO
- WARN
- ERROR
But pull back a little and understand, conceptually, what that involves. You're categorizing all of your entries for the later purposes of sorting. Adding categories makes life easier for both humans and algorithms that consume the file. But you can extend this with a tagging scheme as well, letting you treat entries in a way that isn't mutually exclusive. For instance, you might tag them according to what application layer generated the event as well as what vertical slice generated them. So, for instance, an entry might have tags, "Data Access Layer" and "Update Customer Profile."
Avoid Spamming the Log File
Consider the following tiny snippet of application code.
public File GetFile() { while (!_fileFetcher.IsReady) _logger.Log("Waiting for file fetcher to be ready."); return _fileFetcher.FetchFile(); }
Here, you have a blocking loop that polls this _fileFetcher to see if it's ready. But, while it does that, it slaughters the log file with spurious entries announcing that, nope, the file fetcher is still not ready.
This is a pretty silly example, but the idea is to illustrate a point. Whenever you're writing to a log file from application code, make sure that you think through not only what the entry will look like, but what all it will put in there. Are you providing valuable information, or are you spamming it with something that you could express much more succinctly?
But Don't Forget Anything! More Information is Better
I'll close out with a recommendation that's both pragmatic and philosophical. It also assumes that you've taken the other points to heart. When it comes to logging, more information is better. So, err on the side of more information.
If you've really considered your consumers, included all of the necessary identifiers and context, made your entries parseable, tagged and categorized them, and checked for spamming, then you can feel confident that you're storing valuable information. So, with that checklist ticked off, if you have a doubt about whether or not to log things, log them!
All of your efforts have gone toward making it easy for humans and algorithms both to easily search, process, and understand the entry. So if you err on the side of over-inclusion, filtering them out becomes trivial, and you don't need to worry.
The log file has come a long way over the decades, but only if you avail yourself of modern logging practices and tooling around it. You should make sure that you're doing so.