Python error handling in AWS Lambda

Taavi Rehemägi - Nov 1 '21 - - Dev Community

Serverless is slowly becoming the new norm, and with that much traction around this all-new way to run applications, it is only normal that developers from all over have been jumping at the chance to test it out. Python, used in around 53% of all AWS Lambda functions, is the most popular language for doing serverless.

It doesn't matter if you are fluent in Python or just dipping your toes in the scripting language; sooner or later, you will encounter an error. Python error handling might seem complicated to most newbies, but once you get used to what you need to look at, you'll be fine.

In this article, you'll get an overview of the need-to-knows for Python error handling in AWS Lambda.

Python Errors

Let's look at general Python error types first. They aren't directly related to AWS Lambda and can happen even in more classical backend environments.

Syntax Errors

Syntax errors, also known as parsing errors, are perhaps the most common kind of failure.

They happen when, while typing code, you misplace a comma or forget to add a colon, as in the while statement below. It seems simple, yet that dreaded comma has plagued developers since the beginning of time. Syntax errors are thrown when the Python interpreter parses the code file, before any statement is executed.

Here's what one looks like:

>>> while True print('Hello world')
File "<stdin>", line 1, in ?
while True print('Hello world')
^
SyntaxError: invalid syntax
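
Because the whole file is parsed before anything runs, a syntax error can be observed without executing a single statement. Here is a minimal sketch that feeds the faulty line above to the built-in compile() function; the variable names and the "<example>" filename are only illustrative.

```python
# Parsing happens before execution, so compile() alone triggers the error.
source = "while True print('Hello world')"

try:
    compile(source, "<example>", "exec")
    syntax_ok = True
except SyntaxError as err:
    syntax_ok = False
    # The exact wording of err.msg differs between Python versions.
    print(f"SyntaxError on line {err.lineno}: {err.msg}")
```

Nothing in the source string is ever executed here; the failure comes from the parsing step alone.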

Exceptions

Exceptions occur if a statement or expression is syntactically correct but an error is caused when executing it. As a developer, you can handle exceptions and make them non-fatal to your program.

One such example is KeyError, which appears at execution time if a mapping key is not found among the existing keys of a dictionary. Another is MemoryError, which is raised when an operation runs out of memory while running a Python script.

Exceptions that aren't handled result in an error message like this:

>>> 10 * (1/0)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ZeroDivisionError: division by zero

>>> 4 + spam*3
Traceback (most recent call last):
File "<stdin>", line 1, in ?
NameError: name 'spam' is not defined

>>> '2' + 2
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: Can't convert 'int' object to str implicitly
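
Handling such exceptions so they become non-fatal could look roughly like the sketch below. The helper names safe_divide and lookup are hypothetical and exist only for this illustration.

```python
def safe_divide(a, b):
    """Return a / b, or None when b is zero."""
    try:
        return a / b
    except ZeroDivisionError:
        # The error is handled here, so the program keeps running.
        return None


def lookup(mapping, key, default=None):
    """Return mapping[key], or a default when the key is missing."""
    try:
        return mapping[key]
    except KeyError:
        return default


print(safe_divide(10, 0))
print(lookup({"a": 1}, "b", default=0))
```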

Failed to import module

Worth noting separately is the import module exception. In essence, this is an exception like any other, yet it requires some special attention in Lambdas. It is raised before execution reaches the function handler, so it happens outside any code wrapped by the handler. This usually prevents this type of failure from being reported by error alerting agents.

The error would look like this in CloudWatch Logs:

START RequestId: db1e9421-724a-11e7-a121-63fe49a029e8 Version: $LATEST

Unable to import module 'lambda_funxction': No module named 'lambda_funxction'

REPORT RequestId: db1e9421-724a-11e7-a121-63fe49a029e8 Duration: 15.11 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 18 MB
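
For imports that your own code performs, one option is to guard them and log the traceback so the failure at least shows up in the logs. This is only a sketch: try_import is a hypothetical helper, and it cannot help when the handler module itself is misnamed (as in the log above), because that import is done by the Lambda runtime before any of your code runs.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def try_import(module_name):
    """Import a module by name; log the traceback and return None on failure."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        logger.exception("Failed to import module %r", module_name)
        return None


# 'lambda_funxction' mirrors the misspelled module name from the log above.
missing = try_import("lambda_funxction")
present = try_import("json")
```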

For background: ModuleNotFoundError made its appearance in Python 3.6. Sometimes exceptions are redefined like this for better understanding. Before Python 3.5, the user would receive a plain RuntimeError whenever the recursion depth limit was exceeded, which was vague at best. Since version 3.5, that case raises RecursionError. With more specific exceptions, a programmer doesn't have to dig through Python error handling manuals and compare them to the written code to determine where the code segment causing the error could be.
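
A short sketch of the recursion case, assuming Python 3.5 or later; the function name recurse_forever is made up for the example.

```python
def recurse_forever(n):
    # No base case, so this always exceeds the recursion depth limit.
    return recurse_forever(n + 1)


try:
    recurse_forever(0)
    caught = None
except RecursionError as err:
    caught = err

# RecursionError is a subclass of RuntimeError, so older code written
# with `except RuntimeError` still catches it.
print(type(caught).__name__, isinstance(caught, RuntimeError))
```

In the same way, ModuleNotFoundError is a subclass of ImportError, so existing `except ImportError` blocks keep working.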

Read more about how to handle exceptions in Python.

User-Defined Errors

Some errors are user-defined. Say -- for example -- you are using a set of open-source GitHub scripts for AWS memory monitoring. You need them to monitor disk inode usage, memory buffers, or the load of each CPU. The scripts require Python 2.6 or later, or 3.3 or later.

In such cases, the Python scripts you use might output user-defined errors, which can be hard to interpret because they are custom classes defined by the script's author. class Error(Exception) is one such example; class TransitionError(Error) is another. They are followed by a message that the author has defined, and how much sense it makes depends on how clearly it was written. If the author was lazy, the output could be as simple as "error."

Most developers, however, annotate their code or specify clear instructions to make Python error handling as easy as possible. A class InputError(Error) exception could -- for example -- clearly establish that the input the user typed is faulty and output a list of available attribute options for the command.
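
As a sketch of what such a class hierarchy might look like: the Error and InputError names come from the text above, while ALLOWED_METRICS and select_metric are hypothetical and only serve the illustration.

```python
class Error(Exception):
    """Base class for the exceptions defined by this script."""


class InputError(Error):
    """Raised when the user passes an unsupported option."""

    def __init__(self, value, allowed):
        self.value = value
        self.allowed = allowed
        # A clear message that lists the available options.
        super().__init__(
            f"Invalid input {value!r}; available options: {', '.join(allowed)}"
        )


ALLOWED_METRICS = ("inode", "memory", "load")  # hypothetical option names


def select_metric(name):
    """Return the metric name if it is supported, otherwise raise InputError."""
    if name not in ALLOWED_METRICS:
        raise InputError(name, ALLOWED_METRICS)
    return name
```

Callers can catch the base class Error to handle everything the script defines, or InputError alone when only bad input should be treated specially.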

Exception("Invalid character: " + char)

It's easy to write a simple calculator in Python. You define the operations as lambda expressions (taking x and y) and specify the permitted operations (+ - / *). In this case, Python error handling could be done with the help of raise Exception() whenever a character that was not previously defined is used as input. If an undefined character is entered in the input field, error handling goes into effect, and the calculator script displays 'Invalid character:' followed by the character, thanks to the raise Exception('Invalid character: ' + char) line.
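
The calculator described here might be sketched as follows. The OPERATIONS table and the calculate function are assumptions about how such a script could be laid out, not the original script.

```python
# Permitted operations, each defined as a lambda expression of x and y.
OPERATIONS = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "/": lambda x, y: x / y,
}


def calculate(x, char, y):
    """Apply the operation named by char to x and y."""
    if char not in OPERATIONS:
        # Any character that was not defined above is rejected.
        raise Exception("Invalid character: " + char)
    return OPERATIONS[char](x, y)


print(calculate(6, "/", 3))
```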

Raise Error

As a Python developer, you can also force errors to appear via the raise statement. For example, raise NameError('My error occurred') inserted in the code will output NameError: My error occurred.
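
A minimal sketch of the raise statement in action; the function name fail is hypothetical.

```python
def fail():
    # Force a specific exception to occur.
    raise NameError("My error occurred")


try:
    fail()
except NameError as err:
    # Rebuild the last line of the traceback: exception type, then message.
    message = f"{type(err).__name__}: {err}"

print(message)
```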

The raise statement forces a chosen exception to occur, which can be helpful when you want to reject any input other than the predefined options. Furthermore, to better clarify things for the user, you could add a simple print('You typed it wrong') or print('This error occurred because you did this'). This clears up the confusion caused by user-defined Python errors and better informs users about what they did wrong.

You can learn more about how to handle exceptions in Python.

AWS Lambda errors

Next, we'll look at AWS Lambda-related Python errors. These errors might be new for seasoned Python developers who are just starting with serverless development.

Resource constraint: TIMEOUT

Lambda stops any execution that runs longer than the function's configured timeout. The default timeout is 6 seconds when using the Serverless Framework, but you can configure it for up to 15 minutes.

Here's how a timeout error looks in CloudWatch Logs:

REPORT RequestId: 41a10717-e9af-11e7-892c-5bb1a4054ed6 Duration: 300085.71 ms Billed Duration: 300000 ms Memory Size: 128 MB Max Memory Used: 92 MB
2017-12-25T20:12:38.950Z 41a10717-e9af-11e7-892c-5bb1a4054ed6 Task timed out after 300.09 seconds
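
One way to avoid being cut off mid-task is to check how much time is left and stop early. The sketch below relies on the get_remaining_time_in_millis() method of the Lambda context object; the event shape ("items"), the process placeholder, and the one-second safety margin are assumptions made for the example.

```python
SAFETY_MARGIN_MS = 1000  # assumed buffer to leave before the timeout hits


def process(item):
    """Placeholder for the real work done per item."""
    return item * 2


def lambda_handler(event, context):
    items = event.get("items", [])
    results = []
    for index, item in enumerate(items):
        # Stop cleanly when the remaining time drops below the margin.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            return {"done": False, "processed": results, "remaining": items[index:]}
        results.append(process(item))
    return {"done": True, "processed": results, "remaining": []}
```

Returning the unprocessed items lets the caller decide whether to invoke the function again with the remainder.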

Resource constraint: OUT OF MEMORY

Lambda executions can run into memory limits. You can recognize the failure when both the Max Memory Used and Memory Size values in the REPORT line are identical.

Example:

START RequestId: b86c93c6-e1d0-11e7-955b-539d8b965ff9 Version: $LATEST

REPORT RequestId: b86c93c6-e1d0-11e7-955b-539d8b965ff9 Duration: 122204.28 ms Billed Duration: 122300 ms Memory Size: 256 MB Max Memory Used: 256 MB

RequestId: b86c93c6-e1d0-11e7-955b-539d8b965ff9 Process exited before completing request
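
The check described above can be automated against REPORT lines. This is a rough sketch: the regular expression assumes the REPORT format shown in this article, and hit_memory_limit is a hypothetical helper name.

```python
import re

# Matches the memory fields of a REPORT line as shown in the logs above.
REPORT_PATTERN = re.compile(
    r"Memory Size: (?P<size>\d+) MB\s+Max Memory Used: (?P<used>\d+) MB"
)


def hit_memory_limit(report_line):
    """Return True when Max Memory Used reached the configured Memory Size."""
    match = REPORT_PATTERN.search(report_line)
    if match is None:
        return False
    return int(match["used"]) >= int(match["size"])


line = (
    "REPORT RequestId: b86c93c6-e1d0-11e7-955b-539d8b965ff9 "
    "Duration: 122204.28 ms Billed Duration: 122300 ms "
    "Memory Size: 256 MB Max Memory Used: 256 MB"
)
print(hit_memory_limit(line))
```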

Configuration failures

In this case, the handler function referenced in the Lambda configuration does not exist in the target Python code file.

START RequestId: db1e9421-724a-11e7-a121-63fe49a029e8 Version: $LATEST

Handler 'lambda_handlerx' missing on module

REPORT RequestId: db1e9421-724a-11e7-a121-63fe49a029e8 Duration: 15.11 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 18 MB
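
A check like the one below, run in a test before deploying, could catch this kind of misconfiguration early. It is a sketch, not Lambda's own loading code: resolve_handler is a hypothetical helper that takes a handler setting in the "module.function" form.

```python
import importlib


def resolve_handler(handler_setting):
    """Resolve a 'module.function' handler setting to a callable."""
    module_name, _, function_name = handler_setting.rpartition(".")
    module = importlib.import_module(module_name)
    handler = getattr(module, function_name, None)
    if not callable(handler):
        raise AttributeError(
            f"Handler {function_name!r} missing on module {module_name!r}"
        )
    return handler


# The standard library's json.dumps stands in for a real handler here.
handler = resolve_handler("json.dumps")
```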

Handling Failures

Okay, so now we know what can go wrong. Fortunately, Lambda has a few tricks up its sleeve that we can use to remedy the situation.

Retry behavior in AWS

Synchronous invocations: (API Gateway, Amazon Alexa, etc.)

In this case, Lambda returns the error to the invoking application, which is responsible for retries (a throttled invocation, for example, comes back as a 429 error). Some synchronous event sources might have retry logic built in, so be sure to check the Supported Event Sources documentation from AWS.

Asynchronous invocations: (AWS SNS, AWS SES, AWS CloudWatch, etc.)
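
On the caller's side, a retry with exponential backoff might be sketched like this. invoke_with_retries is a hypothetical helper, and invoke stands for whatever callable performs the synchronous invocation; the attempt count and delays are assumed values.

```python
import time


def invoke_with_retries(invoke, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call invoke(), retrying on failure with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return invoke()
        except Exception:
            if attempt == max_attempts:
                # Out of attempts: let the last error propagate to the caller.
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

The sleep parameter is injectable so the helper can be exercised in tests without actually waiting. Retrying blindly is only safe when the invoked function tolerates repeated calls.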

These events are queued before they are invoked and if the execution fails, they are retried twice with delays between invocations. Optionally, you can specify a Dead Letter Queue for your function and have the failed events go to AWS SQS or SNS. However, if you do not specify a DLQ, the event is discarded after two retries.

Stream-based event sources (Amazon Kinesis Data Streams and DynamoDB streams):

In this case, Lambda polls your stream and invokes a Lambda function. If the invocation fails, Lambda will try to process the batch again until the data expires. To ensure that stream events are processed in order, the exception is blocking and the function will not read any new records until the failed batch is either successfully processed or expired.

Idempotent functions

Depending on the flow of your system, retries can be harmful. For instance, let's imagine a function that is responsible for adding a user row to the database and sending a welcome email. If the function fails after creating the user and gets retried, you will have a duplicate row in the database.

A good way to overcome this is to design your functions to be idempotent.

Idempotent functions are functions with a single task, which either succeeds or can be retried without any damage to the system. You could redesign the aforementioned function using AWS Step Functions: the first step is the function responsible for adding the user to the database, and in the second step, another function sends the email. Read more about step functions here.
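
Another common approach is to record which requests have already been handled and skip repeats. The sketch below uses in-memory collections purely as stand-ins; in a real Lambda function, module-level state is not shared across execution environments, so a durable store would be needed instead. The request_id and email fields are assumed event keys.

```python
processed_ids = set()  # stand-in for a durable store of handled requests
created_users = []     # stand-in for the users table


def create_user(event):
    """Create the user once per request_id; repeated calls are no-ops."""
    request_id = event["request_id"]  # assumed to be unique per request
    if request_id in processed_ids:
        return "skipped"
    created_users.append(event["email"])
    processed_ids.add(request_id)
    return "created"


event = {"request_id": "req-1", "email": "user@example.com"}
first = create_user(event)
second = create_user(event)  # simulates a retry of the same event
```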

Improve logging

For easier debugging later, I recommend logging useful information like the event object (taking care not to log passwords, etc.), suspicious DB and network requests, and other possible points of failure. Also, if you handle a critical exception, make sure to log the traceback. This makes it possible for log-based monitoring solutions like Dashbird to catch and process it.
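
Put together, this advice might look like the following sketch. The SENSITIVE_KEYS set, the redact helper, and the do_work placeholder are assumptions for the example, and the redaction only covers top-level keys of the event.

```python
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

SENSITIVE_KEYS = {"password", "token"}  # assumed names of fields to hide


def redact(event):
    """Return a copy of the event with sensitive top-level values masked."""
    return {k: ("***" if k in SENSITIVE_KEYS else v) for k, v in event.items()}


def do_work(event):
    """Placeholder for the real logic; raises KeyError when 'user' is missing."""
    return {"user": event["user"]}


def lambda_handler(event, context):
    logger.info("Event: %s", json.dumps(redact(event), default=str))
    try:
        return do_work(event)
    except Exception:
        # logger.exception writes the full traceback to the log.
        logger.exception("Unhandled error while processing event")
        raise
```

Re-raising after logging keeps the invocation marked as failed, so Lambda's retry behavior still applies.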

Log-based monitoring & alerting

It's important to note here that most of these errors don't get reported by default. In the best-case scenario, you will notice them in the CloudWatch metrics dashboard if you happen to have it open. Also, failures outside the program execution are difficult or impossible for agents to pick up, since the execution is halted before it reaches the handler or is stopped from a level above it (as with timeouts and out-of-memory failures).

An excellent solution to that problem is detecting these problems from CloudWatch logs. Using Dashbird -- an easy-to-set-up serverless monitoring tool -- on top of that makes it super easy and fast to detect errors and troubleshoot them in one place.

With Dashbird, you'll be able to track your Python errors while getting an overall view of the status of your application. Once you have finished setting up your account, you'll immediately be able to see every function invocation, live tailing, error reports, cost breakdown, and much much more.

The good thing about Dashbird is that it has zero effect on your Lambda performance or AWS cost. It also integrates with your Slack, PagerDuty (via webhooks), or email account, which brings alerting right to your development chat.

Conclusion

This covers much of what you need to know about Python error handling in AWS Lambdas. Learn more about AWS Lambda errors and how to solve them in our Events Library.


