When Should You Scale a Node.js App?

Thomas Sentre - Jan 3 '23 - Dev Community

The theory around application scaling is a complex and interesting topic that continues to be refined and expanded. A comprehensive discussion would require several books, curated for different environments and needs. In this post, we will simply learn how to recognize when scaling an app up (or even down) is necessary, with a focus on Node.js apps.

Network Latency

When network response times exceed some threshold, such as each request taking several seconds, it is likely that the system has gone well past a stable state. While the easiest way to discover this problem is to wait for customer complaints about a slow website, it is better to run controlled stress tests against an equivalent application environment or server.

Apache Bench (ab) is a simple and straightforward tool for running blunt stress tests against a server. It can be configured in many ways, but the kind of test you would run to measure your server's network response times is generally straightforward.

For example, let’s test the response times for this simple Node server:

const http = require('http');

http.createServer(function(request, response) {
  response.writeHead(200, { "Content-Type": "text/plain" });
  response.write("Hello World");
  response.end();
}).listen(1337);

Here’s how one might test running 10,000 requests against that server, with a concurrency of 100 (the number of simultaneous requests):

ab -n 10000 -c 100 http://localhost:1337/

If all goes well, you will receive a report similar to this:

Concurrency Level:      100
Time taken for tests:   9.658 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      1120000 bytes
HTML transferred:       110000 bytes
Requests per second:    1035.42 [#/sec] (mean)
Time per request:       96.579 [ms] (mean)
Time per request:       0.966 [ms] (mean, across all concurrent requests)
Transfer rate:          113.25 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.4      0       6
Processing:    54   96  11.7     90     136
Waiting:       53   96  11.7     89     136
Total:         54   96  11.6     90     136

Percentage of the requests served within a certain time (ms)
  50%     90
  66%     98
      ...
  99%    133
 100%    136 (longest request)

There is a lot of useful information in this report. In particular, look for failed requests and the percentage of long-running requests.

Much more sophisticated testing systems exist, but ab gives a good quick-and-dirty snapshot of performance. Get into the habit of creating testing environments that mirror your production systems, and testing them.
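
If you want a quick latency check without installing anything, a short throwaway script using only Node's built-in http client works too. This is a minimal sketch, not a real load generator: it fires sequential requests at the toy server above (the URL and run count are illustrative) and reports the mean response time:

const http = require('http');

const TARGET = 'http://localhost:1337/'; // the toy server above
const RUNS = 100;                        // illustrative sample size

async function probe() {
  const times = [];
  for (let i = 0; i < RUNS; i++) {
    const start = process.hrtime.bigint();
    await new Promise((resolve, reject) => {
      http.get(TARGET, (res) => {
        res.resume();           // drain the body so the socket frees up
        res.on('end', resolve);
      }).on('error', reject);
    });
    times.push(Number(process.hrtime.bigint() - start) / 1e6);
  }
  const mean = times.reduce((a, b) => a + b, 0) / times.length;
  console.log(`mean latency over ${RUNS} requests: ${mean.toFixed(2)} ms`);
}

probe();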

Running ab on the same server as the Node process you are testing will, of course, skew the results. The test runner itself consumes a lot of server resources, so your measurements will be misleading.

Full documentation for ab can be found in the Apache HTTP Server documentation: https://httpd.apache.org/docs/current/programs/ab.html

Hot CPUs

When CPU usage begins to nudge its maximum, start to think about increasing the number of units processing client requests. Remember that while adding a second CPU to a single-CPU machine will bring immediate and enormous improvements, adding another CPU to a 32-core machine will not necessarily bring an equal improvement. Slowdowns are not always about slow calculations.

One simple but useful way to check the CPU and memory usage commanded by Node processes running on a server is the Unix ps (process status) command, for example, ps aux | grep node. A more robust solution is to install an interactive process manager, such as htop on Unix systems or Process Explorer on Windows.

htop visualizes the load on each core in real time, which makes it a great way to get a sense of what is happening. Additionally, it summarizes your server's load average in three values. This is a happy server:

Load average: 0.00 0.01 0.00

All three numbers measure CPU load, presenting averages taken over one-, five-, and fifteen-minute windows. Generally, the short-term load can be expected to run higher than the long-term load. If, on average, your server is not overly stressed over time, it is likely that clients are having a good experience.

On a single-core machine, the load average should remain between 0.00 and 1.00. Any request will take some time — the question is whether the request is taking more time than necessary — and whether there are delays due to excessive load.

If a CPU can be thought of as a pipe, a measurement of 0.00 means that there is no excessive friction, or delay, in pushing through a drop of water. A measurement of 1.00 indicates that our pipe is at its capacity; water is flowing smoothly, but any additional attempts to push water through will be faced with delays, or back pressure. This translates into latency on the network, with new requests joining an ever-growing queue.

A multicore machine simply multiplies the measurement boundary. A machine with four cores is at its capacity when load average reaches 4.00.

How you choose to react to load averages depends on the specifics of an application. It is not unusual for servers running mathematical models to see their CPU averages hit maximum capacity; in such cases, you want all available resources dedicated to performing calculations. A file server running at capacity, on the other hand, is likely worth investigating.

Generally, a load average above 0.60 should be investigated. Things are not urgent, but there may be a problem around the corner. A server that regularly reaches 1.00 after all known optimizations have been made is a clear candidate for scaling, as of course is any server exceeding that average.

Node also offers native process information via the os module:

const os = require('os');

// Returns an array containing the 1-, 5-, and 15-minute load averages
console.log(os.loadavg());

// Total and free memory, in bytes
console.log(os.totalmem());
console.log(os.freemem());

// Information about each logical CPU core, as an array
console.log(os.cpus());
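
Combining these calls gives a simple self-check. The sketch below is illustrative: it normalizes the one-minute load average by core count, so 1.0 means "at capacity" on any machine, and applies the 0.60 investigation threshold suggested above:

const os = require('os');

// Check the 1-minute load average once a minute, normalized per core.
setInterval(() => {
  const [oneMinute] = os.loadavg();
  const perCoreLoad = oneMinute / os.cpus().length;

  if (perCoreLoad > 0.6) {
    console.warn(`Per-core load is ${perCoreLoad.toFixed(2)} -- worth investigating`);
  }
}, 60 * 1000);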

Socket usage

When the number of persistent socket connections grows past the capacity of any single Node server, however optimized, it becomes necessary to think about spreading out the servers handling user sockets. Using socket.io, it is possible to check the number of connected clients at any time with the following call:

io.sockets.clients()
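
Note that clients() comes from older socket.io releases and has since been removed. If you are on a recent version (3.x or 4.x, assumed here), a rough equivalent is the engine-level connection counter:

const { Server } = require('socket.io');

const io = new Server(3000); // standalone server; the port is illustrative

setInterval(() => {
  // Number of open engine.io connections, across all namespaces
  console.log(`connected clients: ${io.engine.clientsCount}`);
}, 10 * 1000);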

In general, it is best to track WebSocket connection counts within the application itself, via some sort of tracking/logging system.

Many file descriptors

When the number of file descriptors opened in an OS hovers close to its limit, it is likely that an excessive number of Node processes are active, files are open, or other file descriptors (such as sockets or named pipes) are in play.

If these high numbers are not due to bugs or a bad design, it is time to add a new server.

Checking the number of open file descriptors of any kind can be accomplished with the lsof command on Linux:

# lsof | wc -l
1337
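
To watch a single Node process's descriptor usage from inside the application itself, one option on Linux (assumed here) is to count the entries in /proc/self/fd:

const fs = require('fs');

// Linux-only: /proc/self/fd contains one symlink per descriptor
// currently open in this process.
const openFds = fs.readdirSync('/proc/self/fd').length;
console.log(`open file descriptors in this process: ${openFds}`);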

Data Creep

When the amount of data being managed by a single database server begins to exceed many millions of rows or many gigabytes of memory, it is time to think about scaling. Here, you might choose to simply dedicate a single server to your database, begin to shard databases, or even move into a managed cloud storage solution sooner rather than later.

Recovering from a data layer failure is rarely a quick fix, and in general, it is dangerous to have a single point of failure for something as important as all of your data.

If you’re using Redis, the INFO command will provide most of the data you need to make these decisions. Consider the following example:

redis> info
# Clients
connected_clients:1
blocked_clients:0
# Memory
used_memory:17683488
used_memory_human:16.86M
used_memory_rss:165900288
used_memory_peak:226730192
used_memory_peak_human:216.23M
used_memory_lua:31744
mem_fragmentation_ratio:9.38
# CPU
used_cpu_sys:13998.77
used_cpu_user:21498.45
used_cpu_sys_children:1.60
used_cpu_user_children:7.19

More information on INFO can be found in the Redis documentation: https://redis.io/commands/info/
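
The same check can be automated from your application. As a minimal sketch, assuming the node-redis client (v4) and a Redis instance on the default localhost port, something like this would surface the memory figures shown above:

const { createClient } = require('redis');

(async () => {
  const client = createClient(); // defaults to localhost:6379
  await client.connect();

  // INFO MEMORY returns the "# Memory" section as a string of
  // newline-delimited key:value pairs, as in the output above.
  const info = await client.info('memory');
  const usedMemory = /used_memory:(\d+)/.exec(info)?.[1];
  console.log(`used_memory: ${usedMemory} bytes`);

  await client.quit();
})();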

For MongoDB, you might use the db.stats() command:

> db.stats(1024)
{
    "collections" : 3,
    "objects" : 5,
    "avgObjSize" : 39.2,
    "dataSize" : 0,
    "storageSize" : 12,
    "numExtents" : 3,
    "indexes" : 1,
    "indexSize" : 7,
    "fileSize" : 196608,
    "nsSizeMB" : 16,
    ...
    "ok" : 1
}

Passing the argument 1024 tells stats to display all values in kilobytes.

More information can be found in the MongoDB documentation: https://www.mongodb.com/docs/manual/reference/method/db.stats/
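
As with Redis, this can be polled from application code. A minimal sketch, assuming the official MongoDB Node driver and an illustrative database name, runs the underlying dbStats command directly:

const { MongoClient } = require('mongodb');

(async () => {
  const client = new MongoClient('mongodb://localhost:27017'); // illustrative URI
  await client.connect();

  // dbStats is the server command behind db.stats(); scale: 1024
  // reports sizes in kilobytes, matching the shell example above.
  const stats = await client.db('mydb').command({ dbStats: 1, scale: 1024 });
  console.log(stats.dataSize, stats.storageSize, stats.indexSize);

  await client.close();
})();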

Summary

In this post, we have learned how to recognize when scaling a Node.js app is necessary. But the big challenge is not knowing “when to scale?”; it is “how to scale?”. In the next post, we will answer this question and learn some good strategies for scaling Node servers, from analyzing CPU usage to communicating across processes.
