The world of serverless is constantly growing, both in feature set and in number of adopters. While this might seem great, rapid growth brings its own problems, like losing sight of what serverless really means.
As a community, we need to bring it back to the basics. We've learned on our own and established patterns we're comfortable with, but we didn't share. Not enough, anyway. If you take two serverless devs and ask them how to accomplish a task, you'll get two very different answers.
2023 is going to be a year of convergence. One where we focus on best practices, normalize architectural patterns, and design for growth. Think of it as a grassroots year.
We need to familiarize ourselves with the established best practices, architecture norms, and design principles from reputable sources like Serverless Land and gradually refactor our applications to adhere to them.
One of the most important components we must build into our apps is also the focus of the year: observability.
Why Observability?
Many of us get drawn into building applications quickly with serverless and forget about the fundamentals of supportability. I am no exception. I have built many reference architectures that don't include any form of observability in the template.
I know I'm not the only one.
I see reference architectures all the time that omit observability. As a result, developers who use these reference projects as a basis for their work don't include it either. Then those projects are used as a foundation for follow-on projects and they don't include observability tools. So on and so forth.
Before you know it, we're two weeks away from a production go-live, walking through an Operational Readiness Review, and we realize we have no way to monitor the app!
By going back to basics and building an observability mindset, we can avoid situations like this and build strong, maintainable applications from the beginning.
What Kind Of Observability?
The initial build of an application is only a tiny portion of its life. When the build is done, it enters the maintenance life cycle. This is where support and maintenance teams come in to triage issues and make enhancements.
For many applications, the maintenance life cycle accounts for more than 90% of their time in the field, which means tooling for supportability needs to be on point.
The best way to enhance the supportability of a serverless application is to build monitoring capabilities that let you trace workflows through your system. You must be able to see the payload that comes in with the initial request and track the path it takes through your infrastructure.
Seeing how data traverses through the system and how it transforms between services is critical to identifying and resolving issues. Data moves around event-driven architectures quickly. Relying on logs from a single Lambda function often isn't good enough to isolate a problem.
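One common way to follow a single request across services is to attach a correlation ID to it at the entry point and carry that ID through every log line and downstream event. Here is a minimal sketch of the idea, not a definitive implementation. The field names (correlation_id, payload), the service names, and the helper functions are all hypothetical and only there for illustration.

```python
import json
import uuid
from typing import Tuple


def get_correlation_id(event: dict) -> str:
    """Reuse the caller's correlation ID if present, otherwise start a new one."""
    return event.get("correlation_id") or str(uuid.uuid4())


def log_record(correlation_id: str, service: str, message: str, payload: dict) -> str:
    """Build one structured (JSON) log line that can be searched by correlation ID."""
    return json.dumps(
        {
            "correlation_id": correlation_id,
            "service": service,
            "message": message,
            "payload": payload,
        },
        sort_keys=True,
    )


def handle(event: dict, service: str) -> Tuple[dict, str]:
    """Hypothetical handler: log the incoming payload and pass the ID downstream."""
    correlation_id = get_correlation_id(event)
    payload = event.get("payload", {})
    line = log_record(correlation_id, service, "received event", payload)
    # The outgoing event carries the same ID so the next service can keep the trail.
    outgoing = {"correlation_id": correlation_id, "payload": payload}
    return outgoing, line


if __name__ == "__main__":
    first_event, first_line = handle({"payload": {"order_id": 42}}, "orders-api")
    second_event, second_line = handle(first_event, "billing")
    print(first_line)
    print(second_line)
```

Both log lines share the same correlation ID, so a search on that one value returns the whole path the request took, along with the payload each service saw.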
But that's not the only thing we need.
To provide the best experience to our users, we want as few service degradations and outages as possible. From an observability point of view, this means we need proactive monitoring that finds problems before users do.
With serverless, proactive monitoring is typically achieved with alarms. These alarms could watch dead letter queues, detect metric anomalies, track latencies, monitor 5XX response codes, and so on. The right alarms to build into your application are the ones that make sure you're hitting your SLA and KPIs.
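To make the idea concrete, here is a small sketch of alarms described as data and evaluated against recent metric datapoints. It is plain Python rather than any monitoring vendor's API, and every alarm name, metric name, and threshold in it is an assumption chosen for illustration. Your own values should come from your SLA and KPIs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Alarm:
    name: str
    metric: str
    threshold: float
    datapoints_to_alarm: int  # how many breaching datapoints trigger the alarm


def is_breaching(alarm: Alarm, datapoints: List[float]) -> bool:
    """True when enough datapoints are above the alarm's threshold."""
    breaches = sum(1 for value in datapoints if value > alarm.threshold)
    return breaches >= alarm.datapoints_to_alarm


# Hypothetical alarms; names and thresholds are placeholders.
ALARMS = [
    Alarm("dlq-not-empty", "dlq_visible_messages", threshold=0, datapoints_to_alarm=1),
    Alarm("api-5xx-errors", "api_5xx_count", threshold=5, datapoints_to_alarm=2),
    Alarm("slow-responses", "p99_latency_ms", threshold=1000, datapoints_to_alarm=3),
]


def evaluate(metrics: Dict[str, List[float]]) -> List[str]:
    """Return the names of alarms that should fire for the given datapoints."""
    return [a.name for a in ALARMS if is_breaching(a, metrics.get(a.metric, []))]


if __name__ == "__main__":
    recent = {
        "dlq_visible_messages": [0, 0, 1],
        "api_5xx_count": [6, 1, 2],
        "p99_latency_ms": [1200, 1500, 1100],
    }
    print(evaluate(recent))
```

Requiring more than one breaching datapoint for noisy metrics like 5XX counts is a design choice that trades a little detection speed for fewer false alarms. A dead letter queue is the opposite case: a single message is usually worth knowing about.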
How To Get Started
It would be easy to jump right in and instrument your applications with one of the many Application Performance Monitoring (APM) vendors available today. But I'd encourage you to not do that immediately.
Before you write any code, do some analysis.
What are your KPIs? What do your workloads look like? Are you retrying failures or sending them to DLQs? Do you know how data flows through your system?
Understanding your system is the first step to proper observability.
When I first got into serverless development, I was gung-ho about pushing logs and trace data to a third-party APM tool. We added it early on and data started flowing into the tool. After patting myself on the back, I was left wondering, "now what?"
I didn't understand the workload or have established KPIs, so I didn't know how to build meaningful monitors or dashboards. I didn't know what metrics to create alarms from or what values made sense to display for SREs.
Since workloads vary so greatly from app to app, it's hard to say what a great "serverless dashboard" is. However, one thing will always ring true: the best dashboard is the one that helps you maintain your SLA.
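As one example of an SLA-focused value worth putting on a dashboard, the sketch below computes availability for a window of requests and how many more failures that window can absorb before a target is missed. The 99.9% target and the request counts are made-up numbers, and the function names are hypothetical.

```python
def availability(total_requests: int, failed_requests: int) -> float:
    """Fraction of requests in the window that succeeded."""
    if total_requests == 0:
        return 1.0  # no traffic means nothing failed
    return (total_requests - failed_requests) / total_requests


def error_budget_remaining(total_requests: int, failed_requests: int, sla_target: float) -> int:
    """Failures the window can still absorb before the SLA target is missed."""
    # round() guards against floating point noise in (1 - sla_target).
    allowed_failures = round(total_requests * (1 - sla_target))
    return allowed_failures - failed_requests


if __name__ == "__main__":
    total, failed, target = 1_000_000, 400, 0.999  # assumed example values
    print(f"availability: {availability(total, failed):.4%}")
    print(f"failures left in budget: {error_budget_remaining(total, failed, target)}")
```

A negative budget means the target has already been missed for that window, which is exactly the kind of signal a dashboard should make obvious at a glance.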
Summary
We learned so much in 2022. We probably moved a little faster than we should have. This year is about revisiting our work, analyzing what we've done, and using the right tooling to help us build highly maintainable software.
We're going back to grassroots. Take the lessons we've learned and build best practices from them. Converge on patterns and proven architectures. Be consistent.
Observability is often an afterthought. It shouldn't be. It is a critical part of a serverless architecture.
Following workloads through end-to-end processes gives you the visibility you need to support your application. Alarming on infrastructure metrics that fall outside your KPIs lets you catch problems early and protect availability.
Together, supportability and availability give our end users the best experience possible, which is really what it's all about.
Let's spend the time to consider observability first, understand what we're building, and deliver the best software we possibly can.
Happy coding!