A declarative way to cache PostgreSQL queries using Node.js

Gajus Kuizinas - Nov 2 '19 - - Dev Community

There can be many reasons to make you want to cache SQL queries. Some of them are valid, e.g. reducing the number of round-trips (esp. when dealing with high-latency). Others might be micro-optimizations that are just not worth it. Regardless of your reasons for wanting to cache SQL queries, implementing them can be cumbersome.

Subject

I am going to use Slonik (PostgreSQL client for Node.js) and node-cache to demonstrate the usual way to implement cache and a declarative way to add cache to your existing codebase.

Let's assume a simple query-method to get country PK value using another unique identifier:

const getCountryIdByCodeAlpha2 = (
  connection: DatabaseConnectionType,
  countryCode: string
): Promise<DatabaseRecordIdType> => {
  return connection.maybeOneFirst(sql`
    SELECT id
    FROM country
    WHERE code_alpha_2 = ${countryCode}
  `);
};

Enter fullscreen mode Exit fullscreen mode

This type of query is particularly common when ingesting data from external inputs (e.g. user submitted input or data that has been collected using scraping).

Measuring the problem

In the particular case that prompted me to explore caching, this query was called 7k+ times/ minute. Aside from this query, there were a dozen of other similar queries that collectively were executed well over 50k+ times/ minute. None of them affect my database server performance (PostgreSQL is already good at caching), but they:

  • generate unnecessary logs
  • increase the overall time needed to complete the task

The time it takes for PostgreSQL to execute such a query is minuscule, e.g.

EXPLAIN ANALYZE
SELECT id
FROM country
WHERE code_alpha_2 = 'gb';

Index Only Scan using country_code_alpha_2_id_idx on country  (cost=0.14..1.16 rows=1 width=4) (actual time=0.425..0.426 rows=1 loops=1)
  Index Cond: (code_alpha_2 = 'gb'::citext)
  Heap Fetches: 0
Planning Time: 0.069 ms
Execution Time: 0.439 ms

Enter fullscreen mode Exit fullscreen mode

However, we have to also add the network time. In my case, the latency between the worker agent and the database is ~3ms.

ping ***.aivencloud.com                                                     17:31:54
PING ***.aivencloud.com (34.90.***.***): 56 data bytes
64 bytes from 34.90.***.***: icmp_seq=0 ttl=53 time=3.166 ms
64 bytes from 34.90.***.***: icmp_seq=1 ttl=53 time=2.627 ms
64 bytes from 34.90.***.***: icmp_seq=2 ttl=53 time=2.873 ms

Enter fullscreen mode Exit fullscreen mode

That means that executing a query and getting the result takes at least 7.5ms (0.5ms query execution time + 2 trips). Put it another way, every 60 seconds, we waste ~350 seconds of computing time (spread across many servers). Overtime, this adds up to a lot (70 hours over month).

Implementing cache

All you need to implement cache is some storage service with a mechanism to limit how long and how many items can be stored. node-cache is such an abstraction for synchronously storing/ retrieving objects in memory. Using node-cache, you use set method to store cache and get method to retrieve cache; node-cache handles invalidation and storage limits behind the scenes. This is how getCountryIdByCodeAlpha2 would look like if it used node-cache:

const cache = new NodeCache({
  checkperiod: 60,
  maxKeys: 10000,
  stdTTL: 60,
  useClones: false,
});

const getCountryIdByCodeAlpha2 = async (
  cache: NodeCache,
  connection: DatabaseConnectionType,
  countryCode: string
): Promise<DatabaseRecordIdType> => {
  const maybeCountryId = cache.get(countryCode);

  if (maybeCountryId) {
    return maybeCountryId;
  }

  const maybeResult = await connection.maybeOneFirst(sql`
    SELECT id
    FROM country
    WHERE code_alpha_2 = ${countryCode}
  `);

  cache.set(maybeResult, maybeResult);

  return maybeResult;
};

Enter fullscreen mode Exit fullscreen mode

However, this way of adding cache has a few disadvantages:

  • It introduces a lot of boilerplate around every query.
  • It introduces an additional dependency (NodeCache instance) that needs to be passed around throughout your codebase along with the database connection handle.

If you had to go this way, 9/10 I would say it is not worth it. Luckily, there is a better way.

Declarative cache

Slonik has a concept of interceptors (middlewares) that can be used to capture and modify SQL request and response. This makes them perfect for implementing cache. Such interceptor already exists: slonik-interceptor-query-cache.

slonik-interceptor-query-cache uses SQL comments to recognize which queries should be cached and for how long. Specifically, it searches for comment @cache-ttl. @cache-ttl comment indicates for how long the query should be cached. Queries without @cache-ttl are not cached at all, i.e. In order to cache the result of the earlier query for 60 seconds, the only change we need to make is to add a @cache-ttl comment to our query:

const getCountryIdByCodeAlpha2 = (
  connection: DatabaseConnectionType,
  countryCode: string
): Promise<DatabaseRecordIdType> => {
  return connection.maybeOneFirst(sql`
    -- @cache-ttl 60
    SELECT id
    FROM country
    WHERE code_alpha_2 = ${countryCode}
  `);
};

Enter fullscreen mode Exit fullscreen mode

Now this query will be cache result for each unique countryCode for 60 seconds.

slonik-interceptor-query-cache does not implement storage, though. You can use node-cache, lru-cache, Redis, or any other storage engine. To use them, you simply need to abstract their interface using get and set methods, and provide them to slonik-interceptor-query-cache. Continuing with the node-cache example, this is how you would initiate Slonik with the query cache interceptor using node-cache as a storage engine:

import NodeCache from 'node-cache';
import {
  createPool
} from 'slonik';
import {
  createQueryCacheInterceptor
} from 'slonik-interceptor-query-cache';

const nodeCache = new NodeCache({
  checkperiod: 60,
  stdTTL: 60,
  useClones: false,
});

const hashQuery = (query: QueryType): string => {
  return JSON.stringify(query);
};

const pool = createPool('postgres://', {
  interceptors: [
    createQueryCacheInterceptor({
      storage: {
        get: (query) => {
          return cache.get(hashQuery(query)) || null;
        },
        set: (query, cacheAttributes, queryResult) => {
          cache.set(hashQuery(query), queryResult, cacheAttributes.ttl);
        },
      },
    }),
  ]
});

Enter fullscreen mode Exit fullscreen mode

and that is it: with minimal code changes, now you can cache any query just by adding a comment to SQL. Among other benefits, this:

  • allows you to quickly test the impact of caching a specific query
  • allows you to quickly enable/ disable query caching (by simply adding/ removing the query cache interceptor)
  • does not affect how you write test cases
  • does not add boilerplate code to every query
  • does not require to passthrough an additional dependency to every query invocation
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .