Should you log the Express req object and external API responses?

Corey Cleary - Apr 15 '19 - Dev Community

Originally published at coreycleary.me. This is a cross-post from my content blog.

Logging enough information to troubleshoot, understand what happened during a session, and even support analytics is something all apps need to have in place before going to production.

You've likely got some logging in place - things like errors and successful transactions (if you're looking at logs for analytics/metrics). But now you're wondering what else you should be logging so that you have everything you need should you face some issues in production.

And you might be thinking the Express req object (request object), containing all the information sent to your API, would be a great thing to log. "Great, this will give me all the information about the session!"

Similarly, if you are calling an external service that you don't own and have no control over logging for (e.g. the Twitter API, or even an API your company owns but that isn't ever updated and has no logging), it might make sense to log the response object from axios/superagent/whatever you're using!

But should these things really be logged? Getting a clearer answer to this will help you on your way to having rock-solid logs: logs that you can easily use to troubleshoot and don't have to waste time poring over when you detect an issue in production.

Immediate downsides

First off, the whole req object comes with a lot of information - body, headers, params, cookies, query, app, client, url, etc.

Similarly, the response from the external API call will likely have a bunch of information you don't need.

That's too much information, and most of it won't be useful. It will be difficult to read through the logs (you'll likely have to JSON.stringify() the object, which may not even work due to circular references), and it will take up a lot more log space.
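To make the circular-reference point concrete, here is a minimal sketch. The fakeReq and fakeRes objects are made-up stand-ins for any pair of objects that point at each other; they are not the real Express objects.

```javascript
// Hypothetical stand-ins: two objects that reference each other
const fakeReq = { url: '/quotes' }
const fakeRes = { req: fakeReq }
fakeReq.res = fakeRes // fakeReq -> fakeRes -> fakeReq (a cycle)

let stringifyError = null
try {
  // JSON.stringify cannot serialize a structure that contains a cycle
  JSON.stringify(fakeReq)
} catch (err) {
  stringifyError = err
}

console.log(stringifyError instanceof TypeError) // true
```

So even before you get to log size or readability, naively stringifying a whole object like this can throw inside your logging code.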

What about just some things, like the body/headers/query/params?

On the surface of it, logging just, say, req.body makes more sense. It's a more limited set of data and we avoid the downsides discussed above.

We can more easily search through the logs, there's less noise, and using the request information to troubleshoot will come in really handy.

But there is a problem that can go unnoticed: personally identifiable information (PII).

PII

This is data that, in general, is confidential, de-anonymizes the user, and should only be accessible to certain members of your company, or not accessible at all. What counts as PII will vary depending on the type of application you are building and what kind of compliance requirements you have (whether defined by something like HIPAA, or even just rules that are set internally in your company by security professionals).

But in general, PII usually includes things like Social Security numbers, driver's license numbers, bank account info, and other data of that nature.

Say you have a web form for purchasing insurance that takes a user's first and last name, driver's license number, state, and other data about the user. That gets sent to your Express API and whatever other microservices you may have from there.

If the driver's license number accidentally gets logged, that's going to be a problem.

So this is another reason why you should generally not log the entire request and/or response, and why you should also be careful about which properties from those objects you log. Otherwise, PII could sneak in.

A solution

Continuing with the insurance form example, let's say we want to log the other information from the request. You could either destructure to get only the non-PII data you need for logging:

// inside an Express route handler, where req is the request object
const {state, purchasedPlan} = req.body

console.log({state, purchasedPlan})

Or you could have some generic utility function that checks each property from the req.body. This function could take two approaches.

Approach one:

// first approach, remove PII properties from the request
const safeLog = (data) => {
  const piiProps = ['ssn', 'driverLicense']
  const safeData = {}

  // assumes data is an object (like req.body)
  for (const prop in data) {
    const value = data[prop]
    if (!piiProps.includes(prop)) {
      safeData[prop] = value
    }
  }  

  const hasDataToLog = Object.entries(safeData).length > 0 

  if (hasDataToLog) console.log(safeData) 
}

// req.body is: {ssn: '123-45-6789', purchasedPlan: 'Silver'}
safeLog(req.body) // only logs {purchasedPlan: 'Silver'}

The downside of this approach is that you could either misspell the PII properties you want to kick out, or the property on the request itself might be misspelled. For example, the req.body could be {sn: '123-45-6789'}, which wouldn't be caught by the safeLog() function.
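Here is a small sketch of that failure mode. It uses the same filtering idea as safeLog() above, but returns the filtered object instead of logging it so the result is easy to inspect. The name removePiiProps is hypothetical.

```javascript
// Property names treated as PII (same list as in safeLog above)
const piiProps = ['ssn', 'driverLicense']

// Hypothetical helper: returns a copy of data without the PII properties
const removePiiProps = (data) => {
  const safeData = {}
  for (const [prop, value] of Object.entries(data)) {
    if (!piiProps.includes(prop)) safeData[prop] = value
  }
  return safeData
}

const correctlyNamed = removePiiProps({ ssn: '123-45-6789', purchasedPlan: 'Silver' })
const misspelled = removePiiProps({ sn: '123-45-6789', purchasedPlan: 'Silver' })

console.log(correctlyNamed) // { purchasedPlan: 'Silver' }
console.log(misspelled) // { sn: '123-45-6789', purchasedPlan: 'Silver' }
```

The second call shows the Social Security Number passing straight through, because the check only ever looks at the property name.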

Approach two:

// second approach, check by regex
const safeLog = (data) => {
  const socialSecurityRegex = /^\d{3}-\d{2}-\d{4}$/ // assumes the value is in 123-45-6789 format
  const safeData = {}

  // assumes data is an object (like req.body)
  for (const prop in data) {
    const value = data[prop]
    if (!socialSecurityRegex.test(value)) {
      safeData[prop] = value
    }
  }  

  const hasDataToLog = Object.entries(safeData).length > 0 

  if (hasDataToLog) console.log(safeData) 
}

// req.body is: {ssn: '123-45-6789', purchasedPlan: 'Silver'}
safeLog(req.body) // only logs {purchasedPlan: 'Silver'}

This gets us around the problems with the previous approach, but the downside here is that we might have other data that is not PII that matches the regex for some of the PII data.
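A quick sketch of how the regex check can misfire. The policy code value below is a made-up example of non-PII data that happens to share the Social Security Number shape. The second case is an extra observation rather than something covered above: because the pattern is anchored with ^ and $, a number embedded in a longer string is not matched.

```javascript
const socialSecurityRegex = /^\d{3}-\d{2}-\d{4}$/

// Hypothetical non-PII value (say, an internal policy code) with the same shape
const policyCodeMatches = socialSecurityRegex.test('555-12-0001')

// A real-looking SSN embedded in a longer string, e.g. a free-text notes field
const embeddedSsnMatches = socialSecurityRegex.test('SSN: 123-45-6789')

console.log(policyCodeMatches) // true: harmless data would be dropped from the log
console.log(embeddedSsnMatches) // false: the anchored pattern does not match
```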

Wrapping up

I tend to just take the approach of destructuring what I need to log. This means you and your team have to be careful and thoughtful about what you choose to log, to make sure it A) doesn't take up unnecessary log space or make the logs difficult to read when troubleshooting and B) doesn't violate PII data rules.

But this is something that should be caught in code review / pull requests anyways.
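The same destructuring idea carries over to external API responses. Below is a minimal sketch, assuming a response shaped like the object axios resolves with (status, headers, config, data). The URL, the field names inside data, and the summarizeResponse helper are all hypothetical.

```javascript
// Hypothetical response object, shaped like an axios response
const response = {
  status: 200,
  headers: { 'set-cookie': ['session=abc123'] },
  config: { url: 'https://api.example.com/quotes' },
  data: { quoteId: 'q-42', driverLicense: 'D1234567', purchasedPlan: 'Silver' }
}

// Pick out only the fields worth logging; headers and PII are never touched
const summarizeResponse = ({ status, config, data }) => {
  const { quoteId, purchasedPlan } = data
  return { status, url: config.url, quoteId, purchasedPlan }
}

const logEntry = summarizeResponse(response)
console.log(logEntry)
```

Because the helper names each field it wants, a new field added to the response later stays out of the logs until someone deliberately adds it, which is exactly the kind of change a code review can catch.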

