As serverless developers, we often find ourselves stringing multiple services together, be it for resiliency needs or to create asynchronous communication channels. Thankfully, insightful people from the AWS serverless community came up with various patterns one can tweak to fit their architecture needs.
One of the most common integration patterns I see in the wild is the Amazon SQS to AWS Lambda pattern. There is a surprising amount of minutiae that developers must keep in mind while using it in production – like error handling.
In this blog post, you will learn about the Amazon SQS to AWS Lambda integration itself, how to handle errors while reading from Amazon SQS queues, and what the latest pre-re:Invent (2021) announcement has to do with it.
The integration
The following is a diagram depicting Amazon SQS to AWS Lambda integration.
The diagram exposes a bit of the integration's internals – at least the ones I could infer from the AWS documentation.
The event source mappings component plays a crucial role in this architecture – it is responsible for polling the events from the Amazon SQS queue, batching them, and handling the AWS Lambda responses. If you want to learn more about event source mappings, I have written an article on this topic.
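If you prefer infrastructure code over diagrams, the following is a minimal AWS CDK (v2) sketch of such a setup. The stack, queue, and function names, the asset path, and the chosen values are my own illustrative assumptions, not something prescribed by the integration itself.

import { Duration, Stack, StackProps } from "aws-cdk-lib";
import { Construct } from "constructs";
import { Queue } from "aws-cdk-lib/aws-sqs";
import { Code, Function as LambdaFunction, Runtime } from "aws-cdk-lib/aws-lambda";
import { SqsEventSource } from "aws-cdk-lib/aws-lambda-event-sources";

export class SqsToLambdaStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // The "Data Queue" from the diagram.
    const dataQueue = new Queue(this, "DataQueue", {
      // A visibility timeout comfortably larger than the function timeout.
      visibilityTimeout: Duration.seconds(120),
    });

    // The "Reader" AWS Lambda function.
    const readerFunction = new LambdaFunction(this, "ReaderFunction", {
      runtime: Runtime.NODEJS_14_X,
      handler: "handler.handler",
      code: Code.fromAsset("dist"),
    });

    // The event source mapping: it polls the queue, batches messages,
    // and invokes the function with those batches.
    readerFunction.addEventSource(new SqsEventSource(dataQueue, { batchSize: 10 }));
  }
}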
Whether the Data Queue backlog gets smaller (what you most likely want) depends on how our AWS Lambda function behaves – whether or not it throws an error and, as of recently, whether it returns a specific set of data.
Before showing you what I would do to ensure my AWS Lambda plays nicely with the event source mappings, let us explore what, in my humble opinion, you should not be doing.
How to get yourself into trouble
In my opinion, in most scenarios, the worst thing that could happen is for the Reader AWS Lambda to throw an error.
This behavior in itself is not inherently wrong. The problem lies in the fact that many people do not realize the consequences of throwing a runtime error in the Reader AWS Lambda.
So what are the consequences, you might ask? According to the event source mappings AWS Documentation page:
By default, if your function returns an error, the entire batch is reprocessed until the function succeeds, or until the items in the batch expire. To ensure in-order processing, Lambda pauses processing for the affected shard until the error is resolved.
Now, imagine processing a batch of 100s of Amazon SQS events. If one of them fails (for example, due to a poison-pill message), the event source mappings component will retry the whole batch! If your system is not idempotent, you might be in trouble.
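To make that failure mode concrete, here is a hypothetical Reader AWS Lambda written in the "naive" way – a single rejected doWork call makes Promise.all reject, the handler throws, and the event source mappings component puts the entire batch back on the queue.

import { SQSHandler, SQSRecord } from "aws-lambda";

// Anti-pattern: one poison-pill record rejects Promise.all, the handler throws,
// and every message in the batch – including the successfully processed ones – is retried.
export const handler: SQSHandler = async event => {
  await Promise.all(event.Records.map(record => doWork(record)));
};

async function doWork(record: SQSRecord) {
  /** Your implementation */
}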
What can you do to ensure the situation where the entire batch is retried never happens?
How to handle errors gracefully
We have two options at our disposal. The first uses the newly announced feature of returning partial batch responses from the Reader AWS Lambda. The second is the method AWS serverless developers used before that feature was available.
Let us talk about the "new way" of signaling errors to the Amazon SQS event source mappings first.
Partial batch responses
The Reader AWS Lambda code keeps track of failed messages. Those message identifiers are sent back to the event source mappings.
The following is an example TypeScript code snippet of how one might do just that.
import { SQSEvent, SQSRecord } from "aws-lambda";

// The shape of a partial batch response – only the failed message identifiers are reported.
type Response = { batchItemFailures: { itemIdentifier: string }[] };

export const handler = async (event: SQSEvent): Promise<Response> => {
  const records = event.Records;
  const response: Response = { batchItemFailures: [] };

  const promises = records.map(async record => {
    try {
      await doWork(record);
    } catch (e) {
      // Collect the identifier of the failed message instead of re-throwing.
      response.batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  });

  await Promise.all(promises);

  // Only the messages listed here will be retried by the event source mappings.
  return response;
};

async function doWork(record: SQSRecord) {
  /** Your implementation */
}
Since I'm using the Promise.all interface, the record processing happens in parallel. This way of handling errors is, in my opinion, much easier than the one I'm about to show you next – let us explore why.
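One caveat worth calling out: the event source mapping only honors the batchItemFailures response when the ReportBatchItemFailures function response type is enabled on it. Extending the earlier CDK sketch, and assuming the reportBatchItemFailures property of the SqsEventSource construct (my reading of the CDK API, so treat it as an assumption), that could look like this:

import { IFunction } from "aws-cdk-lib/aws-lambda";
import { SqsEventSource } from "aws-cdk-lib/aws-lambda-event-sources";
import { IQueue } from "aws-cdk-lib/aws-sqs";

// Wires the queue to the function and opts the event source mapping into
// partial batch responses (FunctionResponseTypes: ReportBatchItemFailures under the hood).
export const addReaderEventSource = (readerFunction: IFunction, dataQueue: IQueue) => {
  readerFunction.addEventSource(
    new SqsEventSource(dataQueue, {
      batchSize: 10,
      // Without this flag, the returned batchItemFailures are ignored
      // and a thrown error still fails the whole batch.
      reportBatchItemFailures: true,
    })
  );
};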
Before the partial batch responses feature
Before AWS introduced the partial batch response feature, the error handling part had to be more involved – we were required to manually delete the successfully processed messages. Once a message is deleted, it is gone, even if we then throw an error in the AWS Lambda function.
The following is an example TypeScript code snippet of how one might write such logic.
import { SQSHandler, SQSRecord } from "aws-lambda";

export const handler: SQSHandler = async event => {
  const records = event.Records;

  const promises = records.map(async record => {
    await doWork(record);
    // Delete the message as soon as it is successfully processed,
    // so a later handler error does not put it back on the queue.
    await deleteRecordViaSDK(record);
  });

  const response = await Promise.allSettled(promises);
  const hasFailedRecords = response.find(
    record => record.status === "rejected"
  );

  if (hasFailedRecords) {
    // Throwing here only re-queues the messages that were NOT deleted above.
    throw new Error("Failed to process one or more records");
  }
};

async function doWork(record: SQSRecord) {
  /** Your implementation */
}

async function deleteRecordViaSDK(record: SQSRecord) {
  /** Use aws-sdk to delete the message */
}
Like in the previous example, the records are processed in parallel, this time by using Promise.allSettled (here is the difference between Promise.allSettled and Promise.all).
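For completeness, here is one way the deleteRecordViaSDK placeholder could be implemented with the AWS SDK for JavaScript v3. The QUEUE_URL environment variable is an assumption on my part – you could just as well derive the URL from the record's eventSourceARN.

import { SQSClient, DeleteMessageCommand } from "@aws-sdk/client-sqs";
import { SQSRecord } from "aws-lambda";

const sqsClient = new SQSClient({});

// Deletes a single, successfully processed message so that a later handler error
// does not cause the event source mapping to re-deliver it.
async function deleteRecordViaSDK(record: SQSRecord) {
  await sqsClient.send(
    new DeleteMessageCommand({
      // Assumption: the queue URL is passed to the function via an environment variable.
      QueueUrl: process.env.QUEUE_URL,
      ReceiptHandle: record.receiptHandle,
    })
  );
}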
After skimming through the code, you might be asking yourself – "What if the deleteRecordViaSDK fails?". Great question!
Sadly, I do not have a one-size-fits-all solution for you. The message that we did not successfully delete will be re-queued and pushed to your AWS Lambda function again by the event source mappings.
One piece of advice I could give here is to use the messageId attribute as a unique identifier to determine whether you have already processed the message. I encourage you to lean on existing resources as much as possible. For example, this AWS blog post is a good start.
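If you are looking for a concrete starting point, a common approach is a conditional write to an Amazon DynamoDB table keyed on the messageId. The table and attribute names below are purely illustrative assumptions.

import {
  DynamoDBClient,
  PutItemCommand,
  ConditionalCheckFailedException,
} from "@aws-sdk/client-dynamodb";
import { SQSRecord } from "aws-lambda";

const dynamoDbClient = new DynamoDBClient({});

// Returns true if this messageId has not been seen before and was claimed by this invocation.
// The conditional expression makes the "claim" atomic, so re-deliveries of the same message are skipped.
async function claimMessage(record: SQSRecord): Promise<boolean> {
  try {
    await dynamoDbClient.send(
      new PutItemCommand({
        // Assumption: an "IdempotencyTable" with a string partition key named "pk" exists.
        TableName: "IdempotencyTable",
        Item: { pk: { S: record.messageId } },
        ConditionExpression: "attribute_not_exists(pk)",
      })
    );
    return true;
  } catch (error) {
    if (error instanceof ConditionalCheckFailedException) {
      // The message was already processed (or is being processed) – skip it.
      return false;
    }
    throw error;
  }
}

You would call claimMessage at the top of doWork and skip any record that was already claimed.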
Closing words
With every service integration (not only AWS ones) come various nuances developers have to account for while building features. The Amazon SQS to AWS Lambda integration is no different.
I believe that the best thing one can do in such situations is to thoroughly read the documentation. You will save yourself a lot of time in the long run.
For more AWS serverless-related content, follow me on Twitter – @wm_matuszewski.
Thank you for your time.