Introduction
So many articles have already been written and talks given by people much smarter than me that it feels difficult to start a series about AWS Serverless scalability in 2024. I also had my take on this topic and presented the talk "Making sense of service quotas of AWS Serverless services and how to deal with them" at AWS Community Day DACH 2023 in Munich, which I then updated for the Serverless Architecture Conference London 2024. I initially focused more on understanding the Serverless service quotas, which is only one part of the Serverless scalability and operations topic. I recently rebranded this talk under the new name "Making sense of AWS Serverless operations" and constantly adjust its content. I'll present this talk at the Serverless Architecture Conference in Berlin in October this year.
The importance of constant learning
In this first part of the series I'd like to outline the importance of constant learning in the area of Serverless operations and give you an overview of some new services and features AWS has released, and of Serverless quotas AWS has raised, since my first take on this topic in 2023. As you'll see, many cool things have indeed happened in the last 12 months or so.
AWS Lambda functions now scale 12 times faster when handling high-volume requests.
Each synchronously invoked Lambda function now scales by 1,000 concurrent executions every 10 seconds until the aggregate concurrency across all functions reaches the account's concurrency limit. In addition, each function within an account now scales independently of the others, no matter how the functions are invoked. This is a huge improvement compared to Lambda's various old invoke throttling limits.

Introducing the Data API for Amazon Aurora Serverless v2 and Amazon Aurora provisioned clusters.
The Data API for Amazon Aurora Serverless v2 eliminates the use of database drivers and improves application scalability by automatically pooling and sharing database connections (connection pooling) rather than requiring you to manage connections yourself. You can call the Data API via an AWS SDK or the AWS Command Line Interface (AWS CLI). One of the biggest improvements compared to Aurora Serverless v1 (which will no longer be supported after December 31, 2024) is that AWS removed the hard quota of 1,000 requests per second (which I also ran into). The only factor that limits requests per second with the Data API for Aurora Serverless v2 and Aurora provisioned clusters is the size of the database instance and therefore the available resources. I dedicated the whole series Data API for Amazon Aurora Serverless v2 with AWS SDK for Java to this topic, so please check it out!

Amazon API Gateway integration timeout limit increase beyond 29 seconds.
Amazon API Gateway now enables customers to increase their integration timeout beyond the prior limit of 29 seconds. This setting represents the maximum amount of time API Gateway waits for the integration to return a response. You can raise the integration timeout to greater than 29 seconds for Regional REST APIs and private REST APIs, but this might require a reduction in your account-level throttle quota limit. With this launch, customers with workloads requiring longer timeouts, such as Generative AI use cases with Large Language Models (LLMs) or third-party APIs with slow response times, can leverage this API Gateway feature.
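As a sketch, raising the timeout on an existing integration could look like this with the AWS CLI (the REST API ID, resource ID, and method below are placeholders for your own API):

```shell
# Raise the integration timeout of a Regional REST API method to 2 minutes.
# a1b2c3d4e5 (API ID), abc123 (resource ID) and POST are placeholders.
aws apigateway update-integration \
  --rest-api-id a1b2c3d4e5 \
  --resource-id abc123 \
  --http-method POST \
  --patch-operations "op=replace,path=/timeoutInMillis,value=120000"
```

Remember that a deployment of the API stage is still required for the change to take effect, and that values above 29,000 ms may require lowering your account-level throttle quota.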
Amazon DynamoDB introduces configurable maximum throughput for On-demand tables.
Amazon DynamoDB on-demand is a serverless, pay-per-request billing option that can serve thousands of requests per second without capacity planning. Previously, the on-demand request rate was limited only by the default throughput quota (40K read request units and 40K write request units), which applied uniformly to all tables within the account and could not be customized or tailored for diverse workloads and differing requirements. Since on-demand mode scales instantly to accommodate varying traffic patterns, a piece of hastily written or unoptimized code could rapidly scale up and consume resources, making it difficult to keep costs and usage bounded.

With this feature you can optionally configure a maximum read or write (or both) throughput for individual on-demand tables and associated secondary indexes, making it easy to balance costs and performance. Throughput requests in excess of the maximum table throughput are automatically throttled, but you can easily modify the table-specific maximum throughput at any time based on your application requirements. Customers can use this feature for predictable cost management, protection against an accidental surge in consumed resources and excessive use, and safeguarding downstream services with fixed capacities from potential overloading and performance bottlenecks.
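A minimal sketch of capping an existing on-demand table with the AWS CLI (the table name and both limits are placeholder values for your own workload):

```shell
# Cap an on-demand table at 1,000 read request units and
# 500 write request units per second; requests beyond these
# maxima are throttled instead of scaling (and billing) further.
aws dynamodb update-table \
  --table-name Orders \
  --on-demand-throughput "MaxReadRequestUnits=1000,MaxWriteRequestUnits=500"
```

The same structure can be passed at table creation time, and the limits can be raised or lowered later without downtime.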
Announcing throughput increase and dead letter queue redrive support for Amazon SQS FIFO queues.
With Amazon Simple Queue Service (Amazon SQS), you can send, store, and receive messages between software components at any volume. Amazon SQS has introduced two new capabilities for first-in, first-out (FIFO) queues:
- Maximum throughput has been increased up to 70,000 transactions per second (TPS) per API action in selected AWS Regions, supporting sending or receiving up to 700,000 messages per second with batching.
- Dead letter queue (DLQ) redrive support to handle messages that are not consumed after a specific number of retries, similar to what was already available for standard queues.

Amazon SNS increases default FIFO topic throughput by 10x to 3,000 messages per second.
Amazon Simple Notification Service (Amazon SNS) First-In-First-Out (FIFO) topics now support 3,000 messages per second, per topic. All existing and future FIFO topics have this new quota by default, with no configuration change required. To benefit from the maximum throughput in FIFO topics, where message order is strictly maintained within each message group, distribute your messages evenly over a large number of message group IDs, as messages from different message groups are delivered in parallel.

AWS Lambda supports faster polling scale-up rate for Amazon SQS as an event source.
AWS Lambda now supports an up to 5x faster polling scale-up rate (adding up to 300 concurrent executions per minute) for spiky Lambda workloads configured with Amazon Simple Queue Service (Amazon SQS) as an event source, using a Lambda event source mapping or Amazon EventBridge Pipes. This enables customers building event-driven applications with Lambda and SQS queues (standard or first-in, first-out) to achieve more responsive scaling during a sudden burst of messages in their queues, and reduces the need to duplicate Lambda functions or SQS queues to achieve faster message processing.

AWS Lambda improves responsiveness for configuring stream and queue-based event sources.
AWS Lambda improves the responsiveness of configuring Event Source Mappings (ESMs) and Amazon EventBridge Pipes with event sources such as self-managed Apache Kafka, Amazon Managed Streaming for Apache Kafka (MSK), Amazon DocumentDB, and Amazon MQ. This enhancement allows changes, such as updating, disabling, or deleting ESMs or Pipes, to take effect within 90 seconds, an improvement over the previous time frame of up to 15 minutes.

Amazon DynamoDB Import from S3 now supports up to 50,000 Amazon S3 objects in a single bulk import.
With the increased default service quota for Import from S3, customers who need to bulk import a large number of Amazon S3 objects can now run a single import to ingest up to 50,000 S3 objects, removing the need to consolidate S3 objects prior to running a bulk import.
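As a hedged sketch, such a bulk import could be started with the AWS CLI roughly as follows (bucket, prefix, and table definition are placeholders; note that Import from S3 always creates a new table):

```shell
# Import all DynamoDB-JSON objects under a common S3 prefix into a
# new on-demand table. Bucket, prefix and table name are placeholders.
aws dynamodb import-table \
  --s3-bucket-source "S3Bucket=my-export-bucket,S3KeyPrefix=exports/orders/" \
  --input-format DYNAMODB_JSON \
  --table-creation-parameters '{
    "TableName": "OrdersImported",
    "AttributeDefinitions": [{"AttributeName": "pk", "AttributeType": "S"}],
    "KeySchema": [{"AttributeName": "pk", "KeyType": "HASH"}],
    "BillingMode": "PAY_PER_REQUEST"
  }'
```

The command returns an import ARN that you can poll with `aws dynamodb describe-import` to track progress.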
AWS AppSync increases existing service quota and adds subscription service quotas.
AWS AppSync is a fully managed service that allows customers to connect applications to data and enable real-time experiences with GraphQL APIs. AppSync now supports a higher default value for the rate of request tokens service quota. The default value is increased from 2,000 to 5,000 in all AWS Regions that AppSync supports, except for the following Regions, where (at the time of writing) the default value is increased to 10,000: Asia Pacific (Tokyo, Seoul, Mumbai, Singapore, Sydney), Europe (Frankfurt, Ireland, London), US East (N. Virginia, Ohio), and US West (Oregon).

AppSync also introduces three new adjustable service quotas, in all Regions that AppSync supports, that control the behavior of AppSync's real-time capabilities:
- rate of inbound messages per API per second (default value: 10,000): controls the maximum number of subscription field invocations.
- rate of outbound messages per API per second (default value: 1,000,000): controls the maximum number of messages (per 5 KB payload) delivered to WebSocket clients.
- rate of connection requests per API per second (default value: 2,000): controls the maximum number of WebSocket connection requests.
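Since these quotas are adjustable, an increase can be requested through the Service Quotas service rather than via a support ticket. A sketch with the AWS CLI (the quota code below is a deliberate placeholder; list the real codes for your Region first):

```shell
# List the AppSync quotas with their codes and current values.
aws service-quotas list-service-quotas --service-code appsync \
  --query "Quotas[].{Name:QuotaName,Code:QuotaCode,Value:Value}"

# Request an increase for one of the adjustable quotas.
# L-XXXXXXXX is a placeholder - use a code from the listing above.
aws service-quotas request-service-quota-increase \
  --service-code appsync \
  --quota-code L-XXXXXXXX \
  --desired-value 20000
```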
Scaling improvements when processing Apache Kafka with AWS Lambda.
AWS Lambda is improving the automatic scaling behavior when processing data from Apache Kafka event sources. Lambda is increasing the default number of initial consumers, improving how quickly consumers scale up, and helping to ensure that consumers don't scale down too quickly.
Conclusion
In this part of the series I outlined the importance of constant learning in the area of AWS Serverless scalability and gave you an overview of the variety of new services and features AWS has released, and of the Serverless quotas AWS has raised, in the last 12 months or so. Depending on the AWS services you use, some of the improvements may be less relevant for you. But other improvements may enable you to implement completely new use cases with less code on your side, or to say goodbye to existing workarounds that you were forced to build (and maintain) as long as there was no adequate functionality or feature in the AWS managed services in use.
So, constantly educate yourself and apply your knowledge!