Grayscale Release for Node.js Server in Practice

ayou - Jan 22 - - Dev Community

Preface

Grayscale release (in this article, we mean canary release specifically) means keeping two versions online at the same time. We refer to the new version as Canary and the old version as Stable. According to a specific strategy, we expose Canary to a portion of users and Stable to the rest.
Meanwhile, we adjust the traffic ratio based on how the two versions behave. In this way, we can:

  • Identify potential problems early without affecting many users
  • Easily compare the new and old versions during the same period

So it is worth building a grayscale release system.
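To make the idea of a traffic ratio concrete, here is a minimal sketch of one possible strategy: hash a user ID into a stable bucket and compare it with a configurable percentage. The names `bucketOf`, `isCanaryUser` and `canaryPercent` are illustrative assumptions, not part of any system described in this article.

```javascript
const crypto = require('crypto')

// Map a user ID to a stable bucket in [0, 100).
// The same ID always lands in the same bucket, so a user does not
// flip between Canary and Stable from one request to the next.
function bucketOf(userId) {
  const digest = crypto.createHash('sha256').update(String(userId)).digest()
  return digest.readUInt32BE(0) % 100
}

// A user sees Canary when their bucket is below the configured percentage.
// `canaryPercent` is a hypothetical setting between 0 and 100.
function isCanaryUser(userId, canaryPercent) {
  return bucketOf(userId) < canaryPercent
}
```

Raising `canaryPercent` only ever adds users to Canary, so users who already see Canary keep seeing it while the ratio grows.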

Solutions

Generally, a Node.js server runs under the following architecture:


A request reaches the gateway, and the gateway distributes the traffic to containers according to its load-balancing strategy. In each container, a Node.js server is deployed in cluster mode using PM2. The master process receives the connections and distributes them among the child processes. Refer to this article to learn more about Node.js cluster.

So we can implement grayscale release at different levels.

Containers Based


Pros:

  • No need to modify server code.
  • Complete isolation between the two versions.
  • Convenient to implement with container management tools such as K8s.

Cons:

  • Poor flexibility: as the gateway is generic, it may not support customization such as distributing traffic according to users' characteristics.

Processes Based


Pros:

  • Flexible, you can customize the traffic distribution strategy in master process.
  • Needs very few changes to the server code.
  • Isolation between versions is provided by separate processes, which is safe enough. One scenario that can still stop the grayscale release system from working properly is when resources are exhausted.

Cons:

  • You need to implement the master process and manage the processes yourself, and you cannot deploy with PM2 cluster mode. If something goes seriously wrong with the master process, the whole grayscale system goes down.

Modules Based


Taking Koa as an example, the modules based approach looks like this:

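// canary.js or stable.js
const Koa = require('koa')
const app = new Koa()
app.use(async function (ctx) {
  ctx.body = 'canary' // or 'stable'
})
module.exports = app

// index.js
const http = require('http')
const canaryApp = require('./canary')
const stableApp = require('./stable')

// Build each request handler once instead of on every request
const handleCanary = canaryApp.callback()
const handleStable = stableApp.callback()

// Placeholder strategy (an assumption for illustration): replace with your own rule
function isCanary(req) {
  return req.headers['x-canary'] === '1'
}

http.createServer((req, res) => {
  if (isCanary(req)) {
    handleCanary(req, res)
  } else {
    handleStable(req, res)
  }
}).listen(8080)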

Pros:

  • It is also very flexible.
  • Compared to the processes based approach, it doesn't need a custom master process or process management logic, and we can still deploy with PM2 cluster mode.

Cons:

  • Achieving good isolation between the two versions depends heavily on the developers' skill, because both versions run in the same context. For example, if the server code changes a global variable, the two versions will affect each other.
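The interference described above can be reproduced in a few lines. This is only a sketch: `featureFlags`, `stableHandler` and `canaryHandler` are hypothetical stand-ins for a shared global and for the two versions' request handling code.

```javascript
// Hypothetical global that both versions read, since they share one context.
globalThis.featureFlags = { newCheckout: false }

// Stand-in for the Stable version: it expects the flag to stay off.
function stableHandler() {
  return globalThis.featureFlags.newCheckout ? 'new checkout' : 'old checkout'
}

// Stand-in for the Canary version: it turns the flag on as a side effect.
function canaryHandler() {
  globalThis.featureFlags.newCheckout = true
  return 'new checkout'
}

const before = stableHandler() // Stable behaves as expected
canaryHandler()                // one Canary request mutates the global
const after = stableHandler()  // Stable now behaves like Canary

console.log(before, '->', after)
```

With separate processes, each version would have its own copy of the global, so the Canary side effect would not leak into Stable.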

Our first requirement is to support flexible, customizable traffic distribution, which rules out the containers based solution. And since our server does contain code that changes global variables, and refactoring is risky, we have to choose option two, the processes based solution.

Implement Grayscale Release Based on Processes


We need to implement a Traffic Distribution (TD) server. When it starts, it forks a number of canary and stable child processes. TD uses two process pools to manage these processes by group; a pool's responsibilities include load balancing, port management, scaling up and down, etc. TD acts as an HTTP proxy between users and the child processes, and it connects to a configuration server so that the strategy can be updated in real time.
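The process pool described above could be sketched roughly as follows. This is an assumption about one possible shape, not the actual implementation: the class name `ProcessPool`, its methods, and the caller-supplied `spawn` function are all hypothetical.

```javascript
// A minimal pool for one group (canary or stable).
// It hands out ports, tracks workers, and picks them in round-robin order.
class ProcessPool {
  constructor(name, ports) {
    this.name = name
    this.freePorts = [...ports] // ports not yet assigned to a worker
    this.workers = []           // entries of the form {port, child}
    this.cursor = 0             // index of the next worker to use
  }

  // Start one more worker on the next free port.
  // `spawn` is supplied by the caller and returns a child handle for that port.
  scaleUp(spawn) {
    const port = this.freePorts.shift()
    if (port === undefined) throw new Error(`${this.name}: no free port left`)
    const worker = { port, child: spawn(port) }
    this.workers.push(worker)
    return worker
  }

  // Remove the most recently added worker and release its port.
  // Stopping the child itself is left to the caller.
  scaleDown() {
    const worker = this.workers.pop()
    if (!worker) return undefined
    this.freePorts.push(worker.port)
    if (this.cursor >= this.workers.length) this.cursor = 0
    return worker
  }

  // Round-robin load balancing over the current workers.
  next() {
    if (this.workers.length === 0) return undefined
    const worker = this.workers[this.cursor]
    this.cursor = (this.cursor + 1) % this.workers.length
    return worker
  }
}
```

In a real TD server, `spawn` might wrap `child_process.fork` and pass the port to the child through an environment variable; here it is left abstract so the pool logic can be read on its own.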

This looks good, but a bit strange: it adds an extra HTTP call, and the ports must be managed carefully to avoid conflicts. Can we implement it the way PM2 does? The answer is no. As we all know, PM2 cluster mode is implemented by passing handles (refer to this article). A simple mock demo looks like this:

// master.js
const cp = require('child_process')
const child1 = cp.fork('child.js')
const child2 = cp.fork('child.js')

const tcpServer = require('net').createServer()

const processes = [child1, child2]

tcpServer.on('connection', function (socket) {
  // Round-robin: take a child from the end and put it back at the front
  const child = processes.pop()
  child.send('socket', socket)
  processes.unshift(child)
})

tcpServer.listen(8081)

// child.js
const http = require('http')

const httpServer = http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'})
  res.end('handled by child, pid is ' + process.pid + '\n')
})

process.on('message', function (m, socket) {
  if (m === 'socket') {
    httpServer.emit('connection', socket)
  }
})

From the code above, we can see that the master process cannot access any application-layer information. If the strategy depends only on the IP address, that is fine, because the socket carries this information:

console.log(socket.remoteAddress, socket.remotePort)

But if the strategy depends on HTTP content such as cookies, it cannot be implemented this way.
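For the IP-only case, the master could choose a group from the socket alone. The following is a minimal sketch under that assumption; the allow-list `canaryIps` and the helpers `normalizeIp` and `pickGroupBySocket` are hypothetical names introduced for illustration.

```javascript
// Hypothetical allow-list of client IPs that should see Canary.
const canaryIps = new Set(['203.0.113.7'])

// Node.js may report IPv4 clients as IPv4-mapped IPv6 addresses,
// e.g. '::ffff:203.0.113.7', so strip that prefix before comparing.
function normalizeIp(address = '') {
  return address.startsWith('::ffff:') ? address.slice(7) : address
}

// Decide the group using only what the TCP socket exposes.
function pickGroupBySocket(socket) {
  return canaryIps.has(normalizeIp(socket.remoteAddress)) ? 'canary' : 'stable'
}
```

Nothing here can look at headers or cookies, because at the time of the `connection` event no HTTP data has been parsed yet.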

Is it possible to pass data more effectively between master and child? Of course, for example by IPC. Here is a simple demo:

// master.js
const child = require('child_process').fork('./child.js')
const http = require('http')
const net = require('net')
const url = require('url')

child.on('message', (msg) => {
  if (msg.cmd === 'ipc_ready') {
    http
      .createServer((req, res) => {
        const {pathname} = url.parse(req.url)
        const reqNeedToSerialize = {
          host: 'localhost',
          port: 3001,
          path: pathname,
          method: req.method,
          headers: req.headers,
          url: pathname,
        }

        const socket = net.createConnection({path: msg.ipcPath})
        socket.write(JSON.stringify({req: reqNeedToSerialize}))
        socket.pipe(res)
      })
      .listen(8080)
  }
})

// child.js
const Koa = require('koa')
const Router = require('koa-router')
const net = require('net')
const crypto = require('crypto')

const app = new Koa()
const router = new Router()

router.get('/', (ctx) => {
  ctx.body = 'hello world'
})

app.use(router.routes())

app.on('error', console.log)

const ipcPrefix =
  (process.platform != 'win32' ? '/tmp/' : '\\\\.\\pipe\\') +
  crypto.randomBytes(8).toString('hex')

const ipcPath = `${ipcPrefix}${process.pid}`

net
  .createServer() // the IPC server
  .listen(ipcPath, () => process.send({cmd: 'ipc_ready', ipcPath}))
  .on('connection', (socket) => {
    socket.on('data', (chunk) => {
      const msg = JSON.parse(chunk.toString())
      if (msg.req) {
        socket.setHeader = () => {}
        app.callback()(msg.req, socket)
      }
    })
  })

You can see that we pass msg.req and socket to the Koa app in place of a native http.IncomingMessage and http.ServerResponse. But these two objects are simulated by us, and they lack many properties and functions, so the demo can only handle very simple requests for now. There is still a lot to do if we want to implement full functionality.

As you can see, for now we are left with the HTTP proxy as a last resort.
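A minimal sketch of the HTTP proxy approach might look like the following. It is not the production TD server: the cookie rule in `pickVersion`, the `ports` map, and the function names are assumptions made for illustration, and each group is reduced to a single port.

```javascript
const http = require('http')

// Parse a Cookie header into a plain object (minimal, no URL decoding).
function parseCookies(header = '') {
  const cookies = {}
  for (const part of header.split(';')) {
    const index = part.indexOf('=')
    if (index > 0) {
      cookies[part.slice(0, index).trim()] = part.slice(index + 1).trim()
    }
  }
  return cookies
}

// Hypothetical strategy: a `canary=1` cookie opts the request into Canary.
function pickVersion(req) {
  return parseCookies(req.headers.cookie).canary === '1' ? 'canary' : 'stable'
}

// `ports` is an assumed map such as {canary: 4001, stable: 3001},
// pointing at child processes that already listen on those ports.
function createTrafficDistributor(ports) {
  return http.createServer((req, res) => {
    const upstream = http.request(
      {
        host: '127.0.0.1',
        port: ports[pickVersion(req)],
        path: req.url,
        method: req.method,
        headers: req.headers,
      },
      (upstreamRes) => {
        // Relay status, headers and body from the child back to the user.
        res.writeHead(upstreamRes.statusCode, upstreamRes.headers)
        upstreamRes.pipe(res)
      }
    )
    upstream.on('error', () => {
      if (!res.headersSent) res.writeHead(502)
      res.end('Bad gateway')
    })
    // Stream the request body to the chosen child.
    req.pipe(upstream)
  })
}
```

Calling `createTrafficDistributor({canary: 4001, stable: 3001}).listen(8080)` would start the proxy. Because the proxy sees the full HTTP request, the strategy can use cookies, headers or the URL, which is what the handle-passing approach could not offer.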

Summary

This article introduced how to implement grayscale release at three levels: containers based, processes based, and modules based. It then focused on the processes based solution and explained why it needs to use an HTTP proxy.
