Comparison of top observability and debugging tools to help you monitor Python in AWS Lambda.

Serverless architectures, comprised of lambda functions, are an extension of the principles of the service-oriented architectures, which popularized the approach where services communicate using messages or events. If implemented correctly, serverless architectures can reduce code complexity and easier manage the application.

AWS Lambda is a service that runs your code deployed to a container with pre-allocated CPU, disk, and memory. Together, your code and its associated configuration are generically called Lambda functions. It abstracts the actual infrastructure, allowing developers to focus solely on the code. The functions are called in response to external events or triggers. Lambda functions are stateless, with no affinity to the underlying infrastructure, and your code must consider that local resources and processes will not survive the current session.

FaaS solves many problems that the previous architectural models had to cope with. From a developer's point of view, the most critical aspect is that you just run your code without bothering with server administration, scalability, and availability. On the other side, some factors have to be dealt with differently than before.

One of these aspects is logging.

The old tools used for logging (daemons) are no longer an option since there is no "server" in the traditional sense. Also, since the actual infrastructure is abstracted, you don't have access to system-level metrics like RAM, CPU, or disk usage.

However, the requirements for a sound logging system are pretty much the same:

information should be granular
data should be available in the shortest amount of time
log collection should not impact application performance.

Monitoring AWS Lambda Functions

Maybe you are asking yourself why you should monitor your Lambda Functions. Well... there are two equally important reasons to do it. First, if you are reading this, you probably know AWS is not a free service, and every cent counts, especially when you are trying to start your new business. Second, customers are demanding and expect nothing but the best in terms of responsiveness. If an application fails to deliver on this, it will not make any difference in the design or how much marketing it gets; customers will go with the competition.

Monitoring Lambda functions allow developers to gain meaningful insight into what happens during each execution step. You will be able to see the errors and also measure resource consumption for each invocation. Simply put, there is no better way to optimize the costs and performance of your applications than using a monitoring tool.

A Few Words on CloudWatch

Let's start with what's already in the box.

The built-in tool for Lambda, CloudWatch, organizes logs based on function, version, and containers while Lambda adds metadata for each invocation. In addition, runtime and container errors are included in the logs. CloudWatch provides two methods to write a log entry:

print statements:

print('You will see this in CloudWatch')

logger functions available through the logging module:

logger.info('You will also see this in CloudWatch')

Amazon advises developers to use the logger function to get the additional info available like log levels, timestamps, and so on. When you start building your first FaaS application, chances are you will begin using CloudWatch. Logging will likely be the most used feature. CloudWatch will let you track issues, and for some time, you will rely on it. However, as your applications become more complex and eventually make it into production, you will become aware of their drawbacks. Fortunately, several tools can make your life easier. In this article, we'll have an in-depth look at the top 3.

Dashbird

Dashbird excels in providing error alerting and monitoring support. It works by collecting and analyzing CloudWatch logs and has zero effect on your Lambda performance or AWS cost. It also integrates with your Slack account, which brings alerting right to your #development chat. There are no third-party agents or wrappers, and all the information is available on a single dashboard which includes:

an overview of all invocations
top active functions
recent errors
system health

The dashboard allows you to drill down to invocation level data and analyze each function individually. It is straightforward to use, and it provides all the information you can ask for intuitively.

Dashbird provides detailed views for:

python tracking and error monitoring
performance tracking
troubleshooting
optimization and error handling

Suppose your application logic is distributed over large amounts of functions. In that case, it makes more sense to collect info from the logs rather than sending telemetry at the execution time, which is precisely where Dashbird shines.

Code Optimization and Error Handling

Dashbird includes a time-series view to provide developers with information regarding latency and memory usage. There are metrics available for invocation volumes, memory utilization, duration, and cost. Error handling is supported, and lots of details like stack traces, a list of affected invocations, logs for each invocation are collected and analyzed.

Failure Recognition and Debugging

A nice feature baked in Dashbird is failure recognition in logs. Dashbird detects all types of failures for Python, Node.js, and Java. This includes crashes, early exits, timeouts, and configuration errors. Dashbird detects errors in your Lambda functions and alerts you via Slack or email. Each error contains detailed stack traces and logs linked to similar errors for more accessible debugging efforts.

New Relic

New Relic works by adding the New Relic Lambda extension to your Python functions. Once added, your functions will be instrumented and send their monitoring data to the New Relic service.

Reporting Exceptions

The New Relic Lambda layer will automatically catch, trace and raise any uncaught exceptions in your function.

Custom Events

New Relic provides support for reporting custom events:

@newrelic.agent.lambda_handler()

def handler(event, context):

newrelic.agent.record_custom_event(

"my_metric",

custom_event_object

)

New Relic is easy to start with; it's simple, and... it could be fun!

DataDog

Datadog makes use of a helper function that lets the developer send custom metrics as pre-formatted log messages:

There is no overhead or latency experienced by end-user since data collection works in the background, and it allows correlation between Lambda metrics and other operational data, providing context.

There are several metrics available out-of-the-box for function's execution:

execution times
invocations
invocation errors
throttled functions

Datadog defines the following metric types that can be used:

gauge, used for instantaneous values,
count, a long-running counter that can be incremented/decremented,
histogram, aggregate metrics with one-second granularity,
service check, integer value representing the state of the service

Customization

Additionally, Datadog allows developers to define their custom metrics by printing out the Lambda functions following its predefined format. Alerting Once metrics are collected from the Lambda functions, Datadog can send alerts using the most popular communication tools like Slack or OpsGenie. An exciting feature of Datadog is their machine learning algorithms which can provide signals for events like an outlier or anomaly detection.

Conclusion

You have to pay a small price to fully enjoy the benefits of Lambda functions: abstraction, parallel execution, and high scalability. You must learn and use the new approach to monitoring serverless architectures. FaaS doesn't provide metrics in the traditional sense but, with a proper understanding of the AWS infrastructure, you can achieve a great level of observability for your lambda functions.

Dashbird offers the easiest way to get started with Lambda monitoring and the most bang for the buck.

Further reading:

Python error handling in AWS Lambda

8 Must-Know tricks to use S3 more effectively in Python

Explaining boto3: how to use any AWS service with Python