When it comes to scaling WebSockets, there’s no such thing as a one-size-fits-all solution. Supporting a high volume of connections while maintaining the lowest possible latency is the name of the game, but your specific scaling strategy likely depends on your project’s requirements.
On this page, we’ll outline the considerations you need to make and present a summary of the options for scaling WebSockets.
Based on our experience scaling Ably to support vast numbers of concurrent connections, we’re not only presenting the theory behind scaling WebSockets but also drawing your attention to some hidden scaling issues that crop up in the real world.
What are WebSockets?
In a nutshell, WebSocket is a realtime web technology that enables bidirectional, full-duplex communication between client and server over a persistent connection. The WebSocket connection is kept alive for as long as needed (in theory, it can last forever), allowing the server and the client to send data at will, with minimal overhead.
Are WebSockets scalable?
Yes, WebSockets are scalable. Companies like Slack, Netflix, and Uber use WebSockets to power realtime features in their apps for millions of end-users. For example, Slack uses WebSockets for instant messaging between chat users.
However, scaling WebSockets is non-trivial, and involves numerous engineering decisions and technical trade-offs. This blog post describes the challenges of scaling WebSockets dependably, so you can be better prepared to tackle them. We’ll go through the following topics:
- The disadvantages of vertical scaling for WebSockets
- The complexities of scaling WebSockets horizontally
- Load balancing WebSockets
- WebSocket fallback strategy
- Dealing with unpredictable loads
- Managing WebSocket connections at scale
- Best practices for using WebSockets at scale
Scaling the WebSocket server layer
There are two main paths you can take to scale your server layer: vertical scaling or horizontal scaling.
Horizontal scaling can be challenging but is an effective way to scale your WebSocket application.
What is vertical scaling?
Vertical scaling, or scale-up, adds more power (e.g., CPU cores and memory) to an existing server.
Disadvantages of vertical scaling for WebSockets
At first glance, vertical scaling seems attractive, as it’s pretty straightforward to implement. However, imagine you’ve developed an increasingly popular live user experience such as a live sports data app. It’s a success story, but you’re dealing with an ever-growing number of WebSocket connections as more people sign up and use the app. If you have just one WebSocket server, there’s a finite amount of resources you can add to it, which limits the number of concurrent connections your system can handle.
Some server technologies can’t take advantage of extra CPU cores on their own; a single Node.js process, for example, runs your JavaScript on one main thread. Running multiple server processes can help, but you’ll need an additional mechanism, such as a reverse proxy in front of them, to balance traffic between the processes.
With vertical scaling, you have a single point of failure. What happens if your WebSocket server fails, or you need to take it offline for maintenance? Vertical scaling isn’t a suitable approach for a production system, which is why the alternative, horizontal scaling, is recommended instead.
What is horizontal scaling?
Horizontal scaling, or scale-out, involves adding more servers to share the processing workload. If you compare horizontal and vertical scaling, horizontal scaling is a more available model in the long run. Even if a server crashes or needs to be upgraded, you are in a much better position to protect your overall availability because the workload is distributed to the other nodes in the network.
- TIP: Horizontal scaling is a superior alternative to vertical scaling because there is no single point of failure or absolute limit to which you can scale.
Challenges of horizontal scaling for WebSockets
To ensure horizontal scalability, you’ll need to tackle some engineering complexity.
A major element is that you need to ensure the servers share the compute burden evenly: you need a load balancer layer that can handle traffic at scale.
Load balancing distributes incoming network traffic across a group of backend servers. Load balancers also monitor the health of those servers: if a server goes down, the load balancer redirects its traffic to the remaining operational servers, and whenever you add a new WebSocket server, the load balancer automatically starts routing traffic to it.
- TIP: It’s much more complicated to balance load efficiently and evenly across machines with different configurations, so aim for homogeneity.
Load balancing WebSockets
Load balancing is an essential part of horizontal scalability. An effective load balancing strategy:
- Provides fault tolerance, high availability, and reliability.
- Ensures no single server is overworked, which would degrade performance.
- Minimizes server response time and maximizes throughput.
Load balancing also enables you to add or remove servers as demand dictates.
Load balancing algorithms
A load balancer will follow an algorithm to determine how to distribute requests across your server farm. There are many different ways to load balance traffic, and the algorithm you select will depend on your needs.
Round-robin: Each server gets an equal share of the traffic. For a simplified example, let’s assume we have two servers, A and B. The first connection goes to server A; the second goes to server B; the third goes to server A; the fourth goes to B, and so on.
Weighted round-robin: Each server gets a different share of the traffic based on capacity.
Least-connected: Each server gets a share of the traffic based on how many connections it currently has.
Least-loaded: Each server gets a share of the traffic based on how much load it currently has.
Least response time: Traffic is routed to the server that takes the least time to respond to a health monitoring request (the response speed indicates how loaded a server is). Some load balancers might also factor in each server’s number of active connections.
Hashing methods: Routing is decided by a hash of various data from the incoming connection, such as port number, domain name, and IP address.
Random two choices (also known as "power of two choices"): The load balancer randomly picks two servers and routes the new connection to whichever of the two has fewer active connections.
Custom load: The load balancer queries the load on individual servers using the Simple Network Management Protocol (SNMP) and assigns a new connection to the server with the best load metrics. You can define various metrics to look at, such as CPU usage, memory, and response time.
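To make a few of these algorithms concrete, here’s a minimal Python sketch of round-robin, weighted round-robin, least-connected, hashing, and random two choices. It’s a simplified illustration, not how any particular load balancer implements them: the `Server` class and its fields are hypothetical, health checks are reduced to a boolean, and the weighting is the naive "repeat the server" variant.

```python
import hashlib
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    name: str
    weight: int = 1         # relative capacity (used by weighted round-robin)
    connections: int = 0    # currently open WebSocket connections
    healthy: bool = True    # outcome of the load balancer's latest health check


def healthy_pool(servers: List[Server]) -> List[Server]:
    """Only servers that passed their last health check receive new connections."""
    return [s for s in servers if s.healthy]


class RoundRobin:
    """A, B, A, B, ... over the healthy servers."""

    def __init__(self) -> None:
        self._counter = 0

    def candidates(self, servers: List[Server]) -> List[Server]:
        return healthy_pool(servers)

    def pick(self, servers: List[Server]) -> Server:
        pool = self.candidates(servers)
        server = pool[self._counter % len(pool)]
        self._counter += 1
        return server


class WeightedRoundRobin(RoundRobin):
    """Naive weighting: a server with weight 2 appears twice per cycle."""

    def candidates(self, servers: List[Server]) -> List[Server]:
        return [s for s in healthy_pool(servers) for _ in range(s.weight)]


def least_connected(servers: List[Server]) -> Server:
    return min(healthy_pool(servers), key=lambda s: s.connections)


def hashed(servers: List[Server], client_ip: str, client_port: int) -> Server:
    """The same (IP, port) maps to the same server while the pool is unchanged."""
    pool = healthy_pool(servers)
    digest = hashlib.sha256(f"{client_ip}:{client_port}".encode()).digest()
    return pool[int.from_bytes(digest[:8], "big") % len(pool)]


def random_two_choices(servers: List[Server]) -> Server:
    """Sample two healthy servers at random and keep the less-connected one."""
    pool = healthy_pool(servers)
    if len(pool) == 1:
        return pool[0]
    first, second = random.sample(pool, 2)
    return first if first.connections <= second.connections else second


servers = [Server("A"), Server("B")]
rr = RoundRobin()
print([rr.pick(servers).name for _ in range(4)])  # ['A', 'B', 'A', 'B']
```

Every strategy picks only from `healthy_pool`, which mirrors how a load balancer stops sending traffic to a server that fails its health check.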
Load balancing depends on the use case
Your choice of algorithm will depend on your most common usage scenario. Consider, for example, a situation where you have the same number of messages sent to all connected clients, such as live score updates. You can use the round-robin approach if your servers have roughly identical computing capabilities and storage capacity.
By contrast, if some connections in your typical use case are more resource-intensive than others, a round-robin strategy might not distribute the load evenly, and an algorithm that accounts for actual usage, such as least bandwidth (which routes new connections to the server currently serving the least traffic), may be a better fit.
- TIP: Have a good understanding of your use case, in terms of typical usage patterns and bandwidth, before you choose a load-balancing algorithm.
Sticky sessions
A sticky session is a load-balancing strategy where each user is “sticky” to a specific server. For example, if a user connects to server A, they will always connect to server A, even if another server has less load.
Sticky sessions can be helpful in some situations, but they can also be fragile and hinder your ability to scale dynamically. For example, if your WebSocket server becomes overwhelmed and needs to shed connections to balance traffic, or if it fails, a sticky client will keep trying to reconnect to it. It’s hard to rebalance load when sessions are sticky. It’s generally better to use non-sticky sessions, combined with a mechanism that lets your WebSocket servers share connection state, so a client can resume its stream without having to reconnect to the same server.
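As a rough sketch of the non-sticky alternative, the Python below keeps each client’s connection state (its subscribed channels and the last message it received) in a store that every server can reach, so a client that reconnects to a different server picks up where it left off. The in-memory `SharedSessionStore` is a stand-in we’ve assumed for an external store such as Redis; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class SessionState:
    channels: Set[str] = field(default_factory=set)  # channels the client is subscribed to
    last_serial: int = 0                             # last message delivered to the client


class SharedSessionStore:
    """In-memory stand-in for an external store (such as Redis) shared by all servers."""

    def __init__(self) -> None:
        self._sessions: Dict[str, SessionState] = {}

    def load(self, session_id: str) -> Optional[SessionState]:
        return self._sessions.get(session_id)

    def save(self, session_id: str, state: SessionState) -> None:
        self._sessions[session_id] = state


class WebSocketServer:
    def __init__(self, name: str, store: SharedSessionStore) -> None:
        self.name = name
        self.store = store  # no per-process session state, so any server can serve any client

    def on_connect(self, session_id: str) -> SessionState:
        state = self.store.load(session_id) or SessionState()
        self.store.save(session_id, state)
        return state

    def subscribe(self, session_id: str, channel: str) -> None:
        state = self.on_connect(session_id)
        state.channels.add(channel)
        self.store.save(session_id, state)


store = SharedSessionStore()
server_a, server_b = WebSocketServer("A", store), WebSocketServer("B", store)
server_a.subscribe("client-1", "live-scores")
# Server A fails; the load balancer routes the client's reconnection to server B.
restored = server_b.on_connect("client-1")
print(restored.channels)  # {'live-scores'}
```

A real shared store would also need serialization, expiry for abandoned sessions, and a plan for the store’s own availability.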
WebSocket fallback strategy
While this article covers WebSockets in particular, there is rarely a one-size-fits-all protocol in large-scale systems; some protocols suit certain purposes better than others. Under some circumstances, you won’t be able to use WebSockets. For example:
- Some proxies don’t support the WebSocket protocol or terminate persistent connections.
- Some corporate firewalls, VPNs, and networks block non-standard ports or interfere with the WebSocket upgrade, even on port 443 (the standard HTTPS port, which secure WebSocket connections also use).
- Some older browsers don’t support WebSockets.
Your system needs a fallback strategy, and many WebSocket solutions offer such support. For example, Socket.IO transparently uses a WebSocket connection where possible and falls back to HTTP long polling otherwise.
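At its core, a fallback strategy is "try the preferred transport, then the next one". The sketch below assumes each transport is exposed as a function that either returns a connection or raises; `open_websocket` and `open_long_polling` are hypothetical placeholders, not Socket.IO’s API.

```python
from typing import Callable, Sequence, Tuple


class TransportError(Exception):
    """Raised when a transport cannot establish a connection."""


def connect_with_fallback(
    transports: Sequence[Tuple[str, Callable[[], object]]]
) -> Tuple[str, object]:
    """Try each (name, connect) pair in order; return the first that succeeds."""
    failures = {}
    for name, connect in transports:
        try:
            return name, connect()
        except TransportError as exc:
            failures[name] = str(exc)  # keep the reason for logging/metrics
    raise TransportError(f"all transports failed: {failures}")


# Hypothetical transports standing in for real client code.
def open_websocket() -> object:
    raise TransportError("proxy rejected the WebSocket upgrade")


def open_long_polling() -> object:
    return "long-polling session"


print(connect_with_fallback([("websocket", open_websocket),
                             ("long-polling", open_long_polling)]))
# ('long-polling', 'long-polling session')
```

Recording why each transport failed is worth the extra line: a sudden rise in WebSocket failures is exactly the kind of incident discussed next.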
In the context of scale, it’s essential to consider the impact that fallbacks may have on the availability of your system. Suppose you have many simultaneous users connected to your system, and an incident causes a significant proportion of the WebSocket connections to fall back to long polling. Your servers will experience greater demand because long polling is more resource-intensive: every poll is a separate HTTP request with its own headers and connection handling, which increases resource usage such as RAM.
To ensure your system’s availability and uptime, your server layer needs to be elastic and have enough capacity to deal with the increased load.
- TIP: Falling back to another protocol changes your scaling parameters, because stateful WebSockets are fundamentally different from stateless HTTP. A load balancing strategy that’s ideal for WebSockets might not suit long polling equally well, so you may want separate server farms for WebSocket and non-WebSocket traffic.
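To get a feel for the extra demand, here’s a back-of-the-envelope estimate in Python. It assumes each long poll returns as soon as one message is ready (or when the poll times out) and the client immediately sends the next poll; the client count, fallback fraction, message rate, and timeout are all hypothetical.

```python
def long_polling_requests_per_second(
    clients: int,
    fallback_fraction: float,
    messages_per_client_per_s: float,
    poll_timeout_s: float,
) -> float:
    """Rough estimate of HTTP requests/s once some clients fall back to long polling.

    Assumes each poll returns as soon as a message is available (or when the poll
    times out) and the client immediately sends its next poll.
    """
    polling_clients = clients * fallback_fraction
    polls_per_client_per_s = max(messages_per_client_per_s, 1 / poll_timeout_s)
    return polling_clients * polls_per_client_per_s


# Hypothetical incident: half of 200,000 clients fall back, each receiving ~1 message/s.
print(long_polling_requests_per_second(200_000, 0.5, 1.0, 30))  # 100000.0
```

Over WebSockets, those same messages would be frames on connections that are already open; after the fallback, every one of them becomes a full HTTP request and response.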
Handling unpredictable loads on WebSockets
In addition to horizontal scaling, you should also consider the elasticity of your WebSocket server layer so that it can cope with unpredictable numbers of end-user connections. Design your system in such a way that it’s able to handle an unknown and volatile number of simultaneous users.
There’s a moderate overhead in establishing a new WebSocket connection. Beyond setting up the underlying TCP (and usually TLS) connection, the client and server exchange a non-trivial request/response pair known as the opening handshake: an HTTP Upgrade request followed by a 101 Switching Protocols response. Imagine tens of thousands or millions of client devices trying to open WebSocket connections simultaneously. Such a scenario leads to a massive burst in traffic, and your system needs to be able to cope.
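To show part of what each handshake costs the server, the sketch below computes the `Sec-WebSocket-Accept` value that a server returns in its 101 response, as defined in RFC 6455: the SHA-1 hash of the client’s `Sec-WebSocket-Key` concatenated with a fixed GUID, encoded in Base64. It’s a minimal illustration (no request parsing or validation), and the example key comes from the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def sec_websocket_accept(client_key: str) -> str:
    """Derive the Sec-WebSocket-Accept header value from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(client_key: str) -> str:
    """Build the server's 101 response to a client's HTTP Upgrade request."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {sec_websocket_accept(client_key)}\r\n"
        "\r\n"
    )


# Example key from RFC 6455, section 1.3.
print(sec_websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

This work is cheap for one client, but during a reconnection storm it’s repeated for every client, on top of TCP and TLS setup.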
You should architect your system based on a pattern designed to scale sufficiently to handle unpredictability. One of the most popular and dependable choices is the publish and subscribe (Pub/Sub) pattern.
Pub/Sub provides a framework for message exchange between publishers (typically your server) and subscribers (often, end-user devices).
Publishers and subscribers are unaware of each other, as they are decoupled by a message broker, which usually groups messages into channels (or topics). Publishers send messages to channels, while subscribers receive messages by subscribing to relevant channels.
Decoupling can reduce engineering complexity. Because publishers don’t need to know who is subscribed, only the message broker has to scale with the number of connections. As long as the message broker can scale predictably and reliably, your system can deal with an unpredictable number of concurrent users connecting over WebSockets.
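Here’s a minimal in-memory Pub/Sub broker in Python to illustrate the decoupling: publishers and subscribers share nothing but a channel name. The `Broker` class and the `match:123` channel are hypothetical; in production, the broker would be a separate, horizontally scalable service, with each WebSocket server subscribing on behalf of its connected clients.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """Minimal in-memory Pub/Sub broker: publishers and subscribers only know channel names."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        """Deliver message to every subscriber of channel; return how many received it."""
        callbacks = list(self._subscribers.get(channel, ()))
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = Broker()
received = []
broker.subscribe("match:123", received.append)  # e.g. one callback per connected client
broker.subscribe("match:123", received.append)
broker.publish("match:123", "GOAL 1-0")         # the publisher never sees who is listening
print(received)  # ['GOAL 1-0', 'GOAL 1-0']
```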
Numerous projects are built with WebSockets and Pub/Sub; many open-source libraries and commercial solutions combine these elements. Examples include Socket.IO with the Redis Pub/Sub adapter, SocketCluster.io, and Django Channels.
Managing WebSocket connections
We’ll now go through a few considerations related to WebSocket connection management: load shedding, restoring connections, managing heartbeats, and handling backpressure.
Load shedding WebSocket connections
At a certain scale, you may have to deal with traffic congestion. Left unchecked, congestion can lead to cascading failures and even a total collapse of your system.
You need a load shedding strategy to detect congestion and fail gracefully when a server approaches overload by rejecting some or all of the incoming traffic.
Here are a few things to keep in mind when shedding connections:
- You should run tests to discover the maximum load your system can handle. Anything beyond this threshold is a candidate for shedding.
- You must consider a backoff mechanism to prevent rejected clients from attempting to reconnect immediately; otherwise, they’ll just put your system under more pressure.
- You might also consider dropping existing connections to reduce the load on your system; for example, idle ones (which, even though idle, still consume resources due to heartbeats).
- TIP: You need to have a load shedding strategy; failing gracefully is always better than a total collapse of your system.
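A minimal sketch of these ideas in Python: reject new connections beyond a limit with a retry-after hint, and close connections that have been idle for too long. The `LoadShedder` class, the limits, and how the retry-after hint reaches the client (for example, in an HTTP 503 response to the handshake) are assumptions; the real threshold should come from your own load tests.

```python
import time
from typing import Dict, List, Optional, Tuple


class LoadShedder:
    """Rejects connections above a tested limit and sheds idle connections."""

    def __init__(self, max_connections: int, idle_timeout_s: float, retry_after_s: int) -> None:
        self.max_connections = max_connections     # derive from your own load tests
        self.idle_timeout_s = idle_timeout_s
        self.retry_after_s = retry_after_s         # backoff hint sent to rejected clients
        self.last_activity: Dict[str, float] = {}  # connection id -> last data frame time

    def admit(self, conn_id: str) -> Tuple[bool, int]:
        """Return (accepted, retry_after_seconds) for a new connection attempt."""
        if len(self.last_activity) >= self.max_connections:
            return False, self.retry_after_s
        self.last_activity[conn_id] = time.monotonic()
        return True, 0

    def on_message(self, conn_id: str) -> None:
        self.last_activity[conn_id] = time.monotonic()

    def shed_idle(self, now: Optional[float] = None) -> List[str]:
        """Close connections with no data for idle_timeout_s seconds; return their ids."""
        now = time.monotonic() if now is None else now
        idle = [cid for cid, t in self.last_activity.items() if now - t > self.idle_timeout_s]
        for cid in idle:
            del self.last_activity[cid]
        return idle


shedder = LoadShedder(max_connections=2, idle_timeout_s=60, retry_after_s=30)
print([shedder.admit(cid) for cid in ("a", "b", "c")])
# [(True, 0), (True, 0), (False, 30)]
```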
Restoring connections
Connections inevitably drop, for example, as users lose connectivity or if one of your servers crashes or sheds connections. When scenarios like these occur, WebSocket connections need to be restored.
Reconnection strategies
You could implement a script to reconnect clients automatically. However, if reconnection attempts happen immediately after connections close and the server doesn’t have enough capacity, the flood of reconnections can put your system under more pressure and lead to cascading failures.
An improvement would be to increase the delay exponentially after each failed reconnection attempt, up to a maximum backoff time. Compared with a simple reconnection script, this gives you time to add more capacity to your system so that it can deal with all the WebSocket reconnections.
You can improve exponential backoff further by adding randomness (jitter) to each delay, so that not all clients reconnect at the same time.
- TIP: Use a random exponential backoff mechanism when handling reconnections to protect your server layer from being overwhelmed, prevent cascading failures, and allow time to add more capacity.
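Here’s a sketch of exponential backoff with "full jitter" in Python: each delay is drawn uniformly between zero and an exponentially growing ceiling, capped at a maximum. The 1-second base and 30-second cap are assumed values, and `connect` stands in for whatever function opens your WebSocket connection.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 30.0) -> float:
    """Full jitter: a random delay between 0 and min(cap_s, base_s * 2**attempt)."""
    ceiling = min(cap_s, base_s * 2 ** attempt)
    return random.uniform(0, ceiling)


def reconnect(
    connect: Callable[[], T],
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry connect() with jittered exponential backoff between failed attempts."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt))
    raise ValueError("max_attempts must be at least 1")
```

With these defaults, the ceilings for successive attempts are 1, 2, 4, 8, 16, 30, 30 seconds, and so on; drawing from the whole range spreads reconnections out instead of having every client retry at the same instant.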
Reconnections with continuity
Data integrity (guaranteed ordering and exactly-once delivery) is crucial for some use cases. Once a WebSocket connection is re-established, the data stream must resume precisely where it left off. Think, for example, of features like live chat, where missing messages due to a disconnection or receiving them out of order leads to a poor user experience and causes confusion and frustration.
If resuming a stream exactly where it left off after brief disconnections is essential to your use case, you’ll need to consider how to cache messages and whether to transfer data to persistent storage. You’ll also need to manage stream resumes when a WebSocket client reconnects and think about how to synchronize the connection state across your servers.
- TIP: Some connections will inevitably break at some point. Determine your strategy to ensure that after WebSocket connections are restored, you can resume the stream with guaranteed ordering and (preferably exactly once) delivery.
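One common way to support resumes is to tag every message with an increasing serial number and keep a bounded history, so a reconnecting client can ask for everything after the last serial it saw. The sketch below assumes that approach; `ResumableChannel` is hypothetical, keeps history in memory (a multi-server deployment would keep it in a shared store), and signals when messages have already been evicted and a full resync is needed.

```python
from collections import deque
from typing import Deque, List, Tuple


class ResumableChannel:
    """Keeps the last `history` messages, each tagged with an increasing serial number."""

    def __init__(self, history: int = 100) -> None:
        self._next_serial = 1
        self._buffer: Deque[Tuple[int, str]] = deque(maxlen=history)  # oldest evicted first

    def publish(self, message: str) -> int:
        serial = self._next_serial
        self._buffer.append((serial, message))
        self._next_serial += 1
        return serial

    def resume(self, last_seen_serial: int) -> List[Tuple[int, str]]:
        """Return messages after last_seen_serial, in order.

        Raises LookupError if some of them were already evicted; the client then
        needs a full resync (for example, from persistent storage).
        """
        if self._buffer and self._buffer[0][0] > last_seen_serial + 1:
            raise LookupError("gap in history; full resync required")
        return [(s, m) for s, m in self._buffer if s > last_seen_serial]


chat = ResumableChannel(history=3)
for text in ["hi", "how are you?", "fine", "bye"]:
    chat.publish(text)
print(chat.resume(2))  # [(3, 'fine'), (4, 'bye')]
```

On the client side, discarding any message whose serial is not greater than the last one processed turns redelivery after a reconnect into effectively exactly-once processing.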
Managing heartbeats
The WebSocket protocol natively supports control frames known as Ping and Pong. These control frames provide a protocol-level heartbeat mechanism for detecting whether a WebSocket connection is still alive. At scale, you should closely monitor the effect heartbeats have on your system: thousands or millions of concurrent connections with a high heartbeat rate add significant load to your WebSocket servers. If you examine the ratio of Ping/Pong frames to actual messages sent over WebSockets, you might find you’re sending more heartbeats than messages. If your use case allows, reduce the frequency of heartbeats to make it easier to scale.
- TIP: Keep track of idle connections and close them. Even if no messages (text or binary frames) are being sent, you are still sending ping/pong frames periodically, so even idle connections consume resources.
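Heartbeat load grows with connection count, not message volume: at a hypothetical 30-second interval, 1,000,000 connections mean roughly 33,000 Ping frames per second (and as many Pongs), even if no messages are sent. The Python sketch below shows one way to track liveness and idleness per connection; `HeartbeatMonitor` is hypothetical, and actually sending Ping frames and closing sockets is left to your WebSocket library.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TrackedConnection:
    awaiting_pong: bool = False
    last_data_at: float = field(default_factory=time.monotonic)  # last text/binary frame


class HeartbeatMonitor:
    """Call tick() once per heartbeat interval (for example, every 30 s)."""

    def __init__(self, idle_timeout_s: float) -> None:
        self.idle_timeout_s = idle_timeout_s
        self.connections: Dict[str, TrackedConnection] = {}

    def add(self, conn_id: str) -> None:
        self.connections[conn_id] = TrackedConnection()

    def on_pong(self, conn_id: str) -> None:
        self.connections[conn_id].awaiting_pong = False

    def on_data(self, conn_id: str) -> None:
        self.connections[conn_id].last_data_at = time.monotonic()

    def tick(self, now: Optional[float] = None) -> Tuple[List[str], List[str]]:
        """Return (ids to send a Ping to, ids to close)."""
        now = time.monotonic() if now is None else now
        to_ping, to_close = [], []
        for conn_id, conn in self.connections.items():
            if conn.awaiting_pong:
                to_close.append(conn_id)  # no Pong since the last Ping: presumed dead
            elif now - conn.last_data_at > self.idle_timeout_s:
                to_close.append(conn_id)  # alive but idle, yet still costing a Ping/Pong per tick
            else:
                conn.awaiting_pong = True
                to_ping.append(conn_id)
        for conn_id in to_close:
            del self.connections[conn_id]
        return to_ping, to_close
```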
Handling backpressure
Backpressure is one of the critical issues you will have to deal with when streaming data to client devices at scale over the internet. For example, let’s assume you are streaming 20 messages per second, but a client can only handle 15 messages per second. What do you do with the remaining five messages per second that the client cannot consume?
You need a way to monitor the buffers building up on the sockets used to stream data to clients and ensure a buffer never grows beyond what the downstream connection can sustain. Beyond client-side issues, if you don’t actively manage buffers, you’re risking exhausting the resources of your server layer — this can happen very fast when you have thousands of concurrent WebSocket connections.
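The sketch below caps each client’s outbound buffer and, when the cap is reached, drops the oldest message, using the 20-versus-15 messages-per-second scenario from above. `ClientSendQueue` is hypothetical and counts messages for simplicity; a real server would more likely track the bytes buffered on each socket.

```python
from collections import deque
from typing import Deque, List


class ClientSendQueue:
    """Per-client outbound buffer capped at max_messages; drops the oldest on overflow."""

    def __init__(self, max_messages: int) -> None:
        self.max_messages = max_messages
        self._queue: Deque[str] = deque()
        self.dropped = 0  # worth exporting as a metric to spot slow consumers

    def __len__(self) -> int:
        return len(self._queue)

    def enqueue(self, message: str) -> None:
        if len(self._queue) >= self.max_messages:
            self._queue.popleft()  # discard the stalest message instead of growing without bound
            self.dropped += 1
        self._queue.append(message)

    def drain(self, budget: int) -> List[str]:
        """Hand the socket at most `budget` messages, i.e. what the client can absorb now."""
        return [self._queue.popleft() for _ in range(min(budget, len(self._queue)))]


# The scenario from the text: 20 messages/s produced, 15 messages/s consumed.
queue = ClientSendQueue(max_messages=30)
delivered = 0
for second in range(10):
    for i in range(20):
        queue.enqueue(f"update {second}.{i}")
    delivered += len(queue.drain(15))
print(delivered, queue.dropped, len(queue))  # 150 35 15
```

Memory per client stays bounded no matter how far behind the client falls; the cost is that the slow client misses some messages, which the `dropped` counter makes visible.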
A common corrective action for backpressure is simply to drop messages the client can’t keep up with. To reduce bandwidth and latency, in addition to dropping messages, you should consider something like message delta compression, which generally uses a diff algorithm to send the consumer only the changes from the previous message rather than the entire message.
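As a deliberately simple illustration of delta compression, the Python below diffs two JSON-like messages at the top level only and sends just the changed and removed keys. It isn’t a general-purpose diff algorithm; standardized formats such as JSON Patch (RFC 6902) or VCDIFF (RFC 3284) are alternatives. The team names and fields are made up, and a newly connected client needs a full message before deltas make sense.

```python
import json
from typing import Any, Dict


def make_delta(previous: Dict[str, Any], current: Dict[str, Any]) -> Dict[str, Any]:
    """Describe how to turn previous into current (top-level keys only)."""
    changed = {k: v for k, v in current.items() if k not in previous or previous[k] != v}
    removed = [k for k in previous if k not in current]
    return {"set": changed, "unset": removed}


def apply_delta(previous: Dict[str, Any], delta: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild the current message on the client from its copy of the previous one."""
    result = {k: v for k, v in previous.items() if k not in delta["unset"]}
    result.update(delta["set"])
    return result


previous = {"home_team": "Rovers", "away_team": "United", "venue": "Park Lane",
            "home": 1, "away": 0, "minute": 63}
current = {**previous, "home": 2, "minute": 64}
delta = make_delta(previous, current)
print(delta)  # {'set': {'home': 2, 'minute': 64}, 'unset': []}
print(len(json.dumps(current)), len(json.dumps(delta)))  # full message vs. delta, in characters
```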
Best practices for using WebSockets at scale
Here are best practices for scaling your WebSocket infrastructure:
✅ Use horizontal scaling rather than vertical scaling. It’s more reliable, especially for use cases where you can’t afford your system to be unavailable under any circumstances.
✅ If possible, use smaller machines (servers) rather than large ones. They are easier and faster to spin up, and costs are more granular.
✅ Aim to have a homogeneous server farm. It’s much more complicated to balance load efficiently and evenly across machines with different configurations.
✅ Have a good understanding of your use case and relevant parameters (such as usage patterns and bandwidth) before choosing a load balancing algorithm.
✅ Ensure your server layer is dynamically elastic, so you can quickly scale out when you have traffic spikes. You should also operate with some capacity margin, and have backups for various system components, to ensure redundancy and remove single points of failure.
✅ There is rarely a one-size-fits-all protocol in large-scale systems; some protocols suit certain purposes better than others. You need to think about what other options your system needs to support in addition to WebSockets, and consider ways to ensure protocol interoperability.
✅ You most likely need to support fallback transports, such as Comet long polling, because WebSockets, although widely supported, are blocked by certain enterprise firewalls and networks. Note that falling back to another protocol changes your scaling parameters; after all, stateful WebSockets are fundamentally different from stateless HTTP, so you need a strategy to scale both.
✅ Run load and stress testing to understand how your system behaves under peak load, and enforce hard limits (for example, maximum number of concurrent WebSocket connections) to have some predictability.
✅ WebSocket connections and traffic over the public internet are unpredictable and rapidly shifting. You need a robust realtime monitoring and alerting stack so you can detect issues and quickly take remedial action when they occur.
✅ You need to have a load shedding strategy; failing gracefully is always better than a total collapse of your system.
✅ Use a tiered infrastructure to enable you to recover from faults and coordinate between servers.
✅ Some WebSocket connections will inevitably break at some point. You need a strategy for ensuring that after the WebSocket connections are restored, you can resume the stream with ordering and delivery (preferably exactly-once) guaranteed.
✅ Use a random exponential backoff mechanism when handling reconnections. This allows you to protect your server layer from being overwhelmed, prevents cascading failures, and gives you time to add more capacity to your system.
✅ Keep track of idle connections and close them. Even if no messages (text or binary frames) are being sent, you are still sending ping/pong frames periodically, so even idle connections consume resources.
Avoiding the challenge of scaling WebSockets
We have discussed the challenge of scaling WebSockets for production-quality apps and online services, which can involve significant engineering complexity and draw heavily upon resources and time. You could spend several months and a heap of money. So how do you avoid this complexity?
Before anything else, you should consider whether WebSockets are the best choice for your use case, or whether a WebSocket alternative is better suited. For example, if you only need to push text (string) data to browser clients, and you never expect to require bidirectional communication, then you could use something like Server-Sent Events (SSE). Compared to WebSockets, SSE is less complex and demanding, and easier to scale.
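Part of what makes SSE simpler is its wire format: events are plain text lines sent over an ordinary HTTP response with the `text/event-stream` content type. The helper below is a minimal sketch of serializing one event; the function name and the example payload are hypothetical.

```python
from typing import Optional


def format_sse_event(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one event in the text/event-stream format used by Server-Sent Events."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")  # browsers resend it as Last-Event-ID on reconnect
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload gets its own data: field; the browser rejoins them with "\n".
    lines.extend(f"data: {line}" for line in data.split("\n"))
    return "\n".join(lines) + "\n\n"  # a blank line ends the event


print(repr(format_sse_event("GOAL 1-0", event="score", event_id="42")))
# 'id: 42\nevent: score\ndata: GOAL 1-0\n\n'
```

Because the browser reconnects automatically and reports the last event ID it received, SSE gives you a basic resume mechanism without any custom protocol work.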
On the other hand, if you’re building a web app where bidirectional communication is needed (e.g., a chat app), then WebSockets are indeed the ideal choice. But this doesn’t mean you have to deal with the challenges and costs of scaling and maintaining WebSocket infrastructure in-house; you can offload this complexity to a managed third-party PaaS such as Ably and reduce your costs.
Explore our documentation to find out more and get started with a free Ably account.