One of the biggest reasons a team might consider moving to microservices is the ability to scale quickly. A microservice is designed, developed, and deployed as an independent service, so developers can scale individual parts of an application quickly and easily.

All you'd have to do is two things:

Spin up a new instance of the service (AKA - Horizontal Scaling).
Introduce a load balancer that distributes the load across the two instances of the service.

This approach works well for stateless microservices. However, scaling a stateful microservice is not as simple.

Understanding stateless and stateful microservices

Well, before we dive into the concepts of scaling, let's establish the differences between a stateless and a stateful microservice.

Stateless Microservices

Stateless microservices do not maintain any state or store session-specific data between requests. Each request a stateless microservice receives is processed independently, without relying on previous interactions.

They operate based on the concept of "share nothing." This allows them to be horizontally scaled across multiple instances without any impact on their functionality.

Stateful Microservices

Stateful microservices maintain and manage session-specific states throughout multiple requests. They store data and maintain context, which allows them to track and remember information between interactions.
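To make the difference concrete, here is a minimal Python sketch. The names stateless_total and StatefulCartService are hypothetical and only serve as an illustration; a real service would sit behind an HTTP framework. The stateless function depends only on its inputs, while the stateful class keeps per-session data in the memory of one instance.

```python
from dataclasses import dataclass, field


def stateless_total(prices: list[float], tax_rate: float) -> float:
    """Stateless: the result depends only on the request's own inputs."""
    return round(sum(prices) * (1 + tax_rate), 2)


@dataclass
class StatefulCartService:
    """Stateful: keeps per-session carts in this instance's memory."""
    carts: dict[str, list[float]] = field(default_factory=dict)

    def add_item(self, session_id: str, price: float) -> int:
        # The cart survives between requests, but only on this instance.
        cart = self.carts.setdefault(session_id, [])
        cart.append(price)
        return len(cart)


# Two instances of the stateful service (hypothetical scaled-out setup).
instance_a = StatefulCartService()
instance_b = StatefulCartService()
instance_a.add_item("session-1", 10.0)

# The second instance has no knowledge of the cart held by the first one.
items_seen_by_b = len(instance_b.carts.get("session-1", []))
```

Any instance can answer a stateless_total request, whereas a request for "session-1" only makes sense on the instance that holds that cart. That is the problem the stateful scaling patterns later in this article try to address.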

Scaling Stateless Microservices

Scaling a stateless microservice is straightforward, unlike a stateful microservice. There are a few recommended architectural patterns to scale a stateless microservice.

1. Horizontal Scaling

One of the most common ways of scaling a stateless microservice is through horizontal scaling, or "scaling out." Scaling out involves the addition of more nodes (or instances) of your microservice, which helps increase your service's overall capacity.

For example, consider the diagram below:

Figure: Scaling out a Lambda Function

Suppose you were building a microservice using AWS Lambda. As the user load grows, the service scales out by creating new instances of the function to handle the incoming requests.

It's important to understand that scaling out is done automatically in a serverless environment. Services like AWS Lambda will automatically scale out and create new function instances to meet the load, within the concurrency limits configured for your account.

However, suppose you're running on servers instead, through virtual machines or containers using Docker/Kubernetes. In that case, you must configure a scaling policy that spins up new microservice instances when a given threshold is crossed. Also, don't forget to scale down when the load subsides, or your CFO will not be happy about it.
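A threshold-based scaling policy like the one described above can be sketched in a few lines of Python. This is an illustration only: the function name desired_instances and the default thresholds (70% and 30% CPU, 1 to 10 instances) are assumptions, not values taken from any particular platform.

```python
def desired_instances(current: int, cpu_percent: float,
                      scale_out_above: float = 70.0,
                      scale_in_below: float = 30.0,
                      minimum: int = 1, maximum: int = 10) -> int:
    """Return how many instances should run after one policy evaluation."""
    if cpu_percent > scale_out_above:
        target = current + 1   # load is high: add one instance
    elif cpu_percent < scale_in_below:
        target = current - 1   # load subsided: remove one instance
    else:
        target = current       # inside the comfortable band: do nothing
    # Never go below the minimum or above the maximum fleet size.
    return max(minimum, min(maximum, target))
```

For instance, desired_instances(2, 85.0) returns 3, and desired_instances(2, 10.0) returns 1. The gap between the two thresholds is deliberate: without it, the fleet could flap between adding and removing instances on every evaluation.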

2. Load Balancing

Sometimes, spinning up new instances of a microservice is not enough. Your incoming application requests must be intelligently routed to each instance of the microservice based on the load exerted on each service.

This is where Load Balancers come into play.

A load balancer distributes incoming requests across multiple instances of a microservice. It spreads the load evenly so that no single instance is overloaded, which improves service availability. Load balancers can also detect non-responsive nodes and stop sending new requests to them.

One common application of this is to use the Application Load Balancer offered by AWS.

Figure: Using an Application Load Balancer (ALB)

As shown above, two microservice instances (Instances A and B) share a single entry point, the ALB, which is what the users interact with. The ALB routes each request to Instance A or B according to its routing algorithm (for example, round robin or least outstanding requests).

Because the service is stateless, it is easy to route requests to the two microservice instances: each instance can process any request without considering any previous state.
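The routing behavior described in this section can be modeled with a small Python sketch. It is a toy model under stated assumptions, not how the ALB is implemented: the Instance and LeastLoadBalancer classes are hypothetical, "load" is approximated by a counter of requests routed so far, and health is a flag that a real balancer would update through health checks.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    active_requests: int = 0
    healthy: bool = True


class LeastLoadBalancer:
    """Routes each request to the healthy instance with the fewest requests."""

    def __init__(self, instances: list[Instance]) -> None:
        self.instances = instances

    def route(self) -> Instance:
        # Skip instances that were marked as non-responsive.
        candidates = [i for i in self.instances if i.healthy]
        if not candidates:
            raise RuntimeError("no healthy instances available")
        # On a tie, min() returns the first candidate in list order.
        chosen = min(candidates, key=lambda i: i.active_requests)
        chosen.active_requests += 1
        return chosen


a, b = Instance("A"), Instance("B")
balancer = LeastLoadBalancer([a, b])
first = balancer.route().name    # "A": both idle, first in the list wins
second = balancer.route().name   # "B": A already has one request
b.healthy = False                # simulate a failed health check on B
third = balancer.route().name    # "A": B is skipped while unhealthy
```

Because the service is stateless, the balancer is free to pick any healthy instance for any request, which is what keeps this logic so short.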

3. Auto Scaling

Auto-scaling plays a massive part in scaling stateless microservices. For example, think about scaling a service whose workload varies from day to day. On most days, your system might serve a million users, but on a few rare occasions, it could have to handle up to ten million. Manually scaling your microservices and their databases at this level is nearly impossible. Well, this is precisely where auto-scaling comes into the picture.

With auto-scaling, you configure your infrastructure to automatically add or remove instances based on predefined metrics such as CPU utilization, memory usage, or network traffic. This lets your microservice adapt dynamically to changing demand, ensuring optimal performance and cost efficiency.

Auto-scaling applies to databases as well as service instances. It's generally recommended to use a separate database per microservice so that each database can be scaled on demand, independently of the others.

Figure: Scaling a database for your microservice

For example, imagine a scenario where you'd have to scale a database for your microservice. Since your service has unpredictable workloads, you can set up an autoscaling policy that automatically increases and decreases the database throughput to help meet the demands.
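One common way to express such a policy is target tracking: provision enough throughput that current consumption sits at a chosen utilization level. The Python sketch below illustrates the arithmetic only. The function name target_capacity, the 70% target, and the 5 to 1000 unit bounds are assumptions for the example, not settings of any specific database service.

```python
def target_capacity(consumed_units: int, target_percent: int = 70,
                    minimum: int = 5, maximum: int = 1000) -> int:
    """Throughput to provision so that consumption is ~target_percent of it."""
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be between 1 and 100")
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = -(-consumed_units * 100 // target_percent)
    # Clamp to the configured bounds so cost and capacity stay predictable.
    return max(minimum, min(maximum, needed))
```

With the defaults, consuming 70 units leads to a provisioned capacity of 100, and consuming nothing falls back to the minimum of 5. The same calculation scales the throughput down again when consumption drops.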

Additionally, you can apply auto-scaling policies to your Kubernetes cluster or your VM cluster to automatically spin up or remove instances of your microservice based on the load.

4. Caching

Storing frequently accessed data in an in-memory database with fast data access speeds is a great way to scale a microservice to help improve its performance. For example, consider the following diagram:

Figure: Using a cache

The diagram above showcases a simple microservice fetching data from an image-to-text converter.

It's important to note that the "image to text converter" is highly resource-intensive and takes a long time to convert a single image to text.

Therefore, once an image has been converted to text, its reference should be stored in a cache along with the text. Then, when a user requests the text representation of the same image, it can be returned from the cache, avoiding long waits and improving the scalability of your stateless microservice.

Note that you can share the cache across multiple scaled-out instances of your microservice. If you do, be careful about data integrity and conflicting writes; using a distributed read-write lock is recommended in such cases.
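The caching flow above can be sketched as follows. This is a simplified, single-process illustration: TextCache and slow_image_to_text are hypothetical names, the dictionary stands in for an in-memory cache service, and threading.Lock stands in for the distributed lock you would need once several instances share the cache.

```python
import threading
from typing import Callable


class TextCache:
    """Returns cached text for an image, converting it only on a miss."""

    def __init__(self, convert: Callable[[str], str]) -> None:
        self._convert = convert
        self._store: dict[str, str] = {}
        # In-process stand-in for a distributed lock (assumption).
        self._lock = threading.Lock()

    def get_text(self, image_ref: str) -> str:
        # Holding the lock during conversion prevents duplicate work and
        # conflicting writes, at the cost of serializing all conversions.
        with self._lock:
            if image_ref not in self._store:
                self._store[image_ref] = self._convert(image_ref)
            return self._store[image_ref]


calls: list[str] = []


def slow_image_to_text(image_ref: str) -> str:
    """Placeholder for the expensive converter; records each invocation."""
    calls.append(image_ref)
    return f"text extracted from {image_ref}"


cache = TextCache(slow_image_to_text)
first = cache.get_text("cat.png")    # miss: runs the converter
second = cache.get_text("cat.png")   # hit: served from the cache
```

After both calls, the converter has run only once. A production setup would likely lock per image reference rather than globally, so that unrelated conversions can proceed in parallel.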

Scaling Stateful Microservices

On the other hand, scaling a stateful microservice is not as simple as scaling a stateless microservice. Apart from scaling the service, you must consider maintaining the consistency of your data while you scale. This is where things get challenging.

But here are a few recommended architectural patterns you can adopt to help better scale your stateful microservices.

1. Vertical Scaling

Vertical scaling is sometimes known as "scaling up." Scaling up means upgrading the configuration of a single instance of a microservice to improve its performance. With a vertical scaling approach, you avoid creating new instances of your service, so your data remains consistent within the single stateful service.

Figure: Scaling a microservice up

As shown above, when you scale up, you ultimately increase the capacity of your existing instance. For example, if you initially created the instance with 16 GB of RAM, you can improve it by increasing its memory to 64 GB.

However, it's important to understand that there is a limit to vertical scaling, because the hardware in a server has upper bounds of its own. For example, a given system might support at most 32 GB of memory, which means that no matter what, you cannot increase its memory beyond 32 GB.

In such cases, creating a new instance with better configurations (as the base setting) and decommissioning the low-spec service is recommended.

2. Stateful Service Discovery and Load Balancing

For stateful microservices, it's encouraged to use a service discovery tool that supports them. Doing so allows you to use load balancers built for stateful applications, which intelligently route requests to particular instances while respecting session affinity.

This ensures that requests belonging to a specific session are consistently routed to the same service instance, letting you scale the stateful microservice without having to synchronize session data between instances.
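A simple way to picture session affinity is hash-based routing: derive a stable number from the session ID and use it to pick an instance. The Python sketch below is one possible approach under stated assumptions, not the mechanism of any specific load balancer; pick_instance and the instance names are hypothetical.

```python
import hashlib


def pick_instance(session_id: str, instances: list[str]) -> str:
    """Map a session to one instance, the same one on every call."""
    if not instances:
        raise ValueError("at least one instance is required")
    # A cryptographic hash gives a stable, evenly spread value. Python's
    # built-in hash() is randomized per process, so it is avoided here.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


instances = ["instance-a", "instance-b", "instance-c"]
first_route = pick_instance("session-42", instances)
second_route = pick_instance("session-42", instances)  # same as first_route
```

One caveat with plain modulo hashing is that adding or removing an instance remaps most sessions. Consistent hashing is a common refinement that limits how many sessions move when the fleet changes.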

3. Data Replication

Data replication plays a crucial part in scaling stateful microservices. This technique ensures high availability, durability, and recoverability of data in the event of a service instance failure or disaster.

Development teams responsible for data replication can adopt an Active-Active or an Active-Passive strategy and employ different types of Primary and Replica DB strategies. Doing so enables a stateful microservice to:

Improve read scalability: Data replication lets you create multiple replicas of the primary database of your microservice. By distributing read operations across these replicas, you can significantly improve read scalability and speed, because each replica handles read requests independently. However, this performance improvement comes at the cost of consistency: replicas are eventually consistent with the primary rather than strongly consistent, so a read may briefly return stale data.
Improve availability: Replicating data across multiple instances improves data redundancy. If a node becomes unavailable due to a failure, the other replicas can continue to serve read operations and maintain system availability by adopting an automatic failover.
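The read-scalability tradeoff above can be demonstrated with a small in-memory model. This Python sketch is illustrative only: ReplicatedStore is a hypothetical class, dictionaries stand in for databases, and replication is triggered by an explicit replicate() call, whereas real systems replicate asynchronously in the background.

```python
from itertools import cycle
from typing import Optional


class ReplicatedStore:
    """Writes go to the primary; reads rotate across the replicas."""

    def __init__(self, replica_count: int) -> None:
        if replica_count < 1:
            raise ValueError("at least one replica is required")
        self.primary: dict[str, str] = {}
        self.replicas: list[dict[str, str]] = [{} for _ in range(replica_count)]
        self._next_replica = cycle(range(replica_count))

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value

    def replicate(self) -> None:
        # Copy the primary's data to every replica (simplified: no deletes).
        for replica in self.replicas:
            replica.update(self.primary)

    def read(self, key: str) -> Optional[str]:
        # Round-robin over replicas spreads the read load.
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)


store = ReplicatedStore(replica_count=2)
store.write("order-1", "paid")
stale = store.read("order-1")   # None: the write has not replicated yet
store.replicate()
fresh = store.read("order-1")   # "paid": the replica has caught up
```

The first read misses the write because replication had not run yet, which is the eventual consistency window in miniature. Reads that must see the latest data can be sent to the primary instead, at the cost of extra load on it.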

Wrapping Up

Scaling stateful microservices is a challenging task that requires a well-thought-out approach and an understanding of data consistency tradeoffs.

While stateless microservices can be scaled with relative ease using horizontal scaling and load balancing, stateful microservices demand more careful planning and consideration to achieve efficient and reliable scalability. If you're looking to build highly scalable stateless or stateful microservices, consider using tools like Amplication to seamlessly bootstrap and deploy microservices with ease.
