As we bring 2024 to a close, and after an invigorating week at AWS re:Invent, many will be writing their year-in-review summaries. I've decided to dedicate those column inches to the state of serverless sustainability today. The observant among us are quite aware of how the artificial intelligence (AI) craze has wormed its way into every product, industry, and conversation over the past year. It's been making headlines as shuttered power plants like Three Mile Island are slated to reopen. And it's on a collision course with the world's climate change goals. (AWS also announced in October that they signed agreements for 3 small modular nuclear reactor projects for their own data centers.)
Running servers in data centers is an energy-intensive process. Every search for a restaurant or game score requires a response from a server. An always-on web server keeps drawing power whether it is busy or only gets 5 hits per day. Serverless technologies like AWS Lambda, S3, and DynamoDB can be used to reduce that constant power usage when there is no demand.
Today we'll discuss the current state of affairs with serverless sustainability, and perhaps a little about sustainable computing in general.
What is Sustainability
In the context of technology and computing in particular, sustainability equates primarily to power usage and carbon emissions. The computing industry has been on an upward trend in energy use since, well, the beginning of the computer era. During that time, increases in computational capabilities (faster processors) have driven higher demands for electricity. The next (current?) era of computing is going to require rethinking these needs by reducing the power requirements of computers and data centers, recycling hardware, and becoming better climate citizens. (AWS discusses some of their sustainability improvements related to the circular economy in their re:Invent talk "Advancing sustainable AWS infrastructure to power AI solutions".)
AWS defines sustainability as one of the 6 pillars of their Serverless Applications Lens for the AWS Well-Architected Framework. Their Well-Architected Framework documentation covering Sustainability indicates that this pillar focuses on environmental sustainability. It's important that they make that distinction, and even more telling that they are making a large commitment to improving energy efficiency and reducing carbon emissions in their own operations. This document (and others they've released [PDF]) lays out their goals and identifies how AWS cloud customers can optimize their operations to become more sustainable.
Sustainable Workloads
As is common when working with AWS, they ask customers to approach cloud sustainability through the shared responsibility model. This is akin to their approach to security in the cloud, so it makes sense that they want us as users to take our part seriously.
Region Selection
One way to make your application more sustainable is to use the least carbon-intensive AWS region available, taking into account other business requirements such as latency. If you are a small company with customers in only one country or region, you may not be able to take advantage of this, but if your business requirements don't rule out a more remote location, you can use this technique. Some AWS regions are in locations where a higher percentage of renewable energy enters the power grid. The decision to use one of these regions should of course account for latency too, but carbon intensity should be part of it.
How, then, do we choose a region for sustainability? A good reference is this AWS Architecture Blog post: How to select a Region for your workload based on sustainability goals. They suggest using the site electricitymaps.com to identify the carbon intensity and renewable energy percentage for each regional electricity grid.
On the Electricity Maps 24-hour climate impact map, hovering over the PJM region (which covers Virginia, home to the us-east-1 AWS region) shows a carbon intensity (as of 12/21/2024) of 409g CO2/kWh and a renewable energy mix of 8%. The California power grid (home to the AWS us-west-1 region) shows 139g CO2/kWh and a renewable energy mix of 67%.
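To make the trade-off concrete, here is a minimal Python sketch of picking the lowest-carbon region that still fits a latency budget. The carbon figures are the 12/21/2024 snapshot quoted above; the latency numbers, the RegionOption type, and the pick_region helper are made up for illustration and are not part of any AWS tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegionOption:
    name: str
    carbon_g_per_kwh: float  # grid carbon intensity in g CO2/kWh
    latency_ms: float        # hypothetical measured latency to your users


def pick_region(options: list[RegionOption], max_latency_ms: float) -> Optional[RegionOption]:
    """Return the lowest-carbon region that still meets the latency budget."""
    eligible = [o for o in options if o.latency_ms <= max_latency_ms]
    if not eligible:
        return None
    return min(eligible, key=lambda o: o.carbon_g_per_kwh)


# Carbon intensity is the snapshot from the text; latencies are assumptions.
candidates = [
    RegionOption("us-east-1", carbon_g_per_kwh=409, latency_ms=20),
    RegionOption("us-west-1", carbon_g_per_kwh=139, latency_ms=75),
]

choice = pick_region(candidates, max_latency_ms=100)
print(choice.name if choice else "no region meets the latency budget")
```

Grid carbon intensity changes hour to hour, so in practice you would want an average over a longer period rather than a single reading.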
Over-provisioned Capacity and Time Shifting
Earlier this year I ran across a podcast from the Green Software Foundation called Environment Variables. In one episode the guest, Kate Goldenring, talked about the sustainability merits of serverless. She pointed out a statistic from the Sysdig 2023 Cloud Native Security and Usage Report showing that 69% of requested CPU resources go unused. That means as an industry we're over-provisioning by more than two thirds. According to Kate, the goal should be higher hardware utilization.
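A quick back-of-the-envelope calculation shows what that statistic implies. The only input is the 69% figure quoted above; the variable names are mine.

```python
unused_fraction = 0.69  # share of requested CPU that goes unused (Sysdig 2023 figure)
used_fraction = 1 - unused_fraction

# How many CPUs are requested for every CPU that actually does work.
overprovision_factor = 1 / used_fraction

print(f"utilization: {used_fraction:.0%}")
print(f"requested per CPU used: {overprovision_factor:.2f}x")
```

In other words, roughly three CPUs are being requested for every one that is actually used.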
This supports what we've been saying about how running applications on servers (such as EC2) or containers is less efficient than serverless. Provisioning for peak capacity is what makes these "always on" deployments wasteful, because traffic is rarely at its expected peak. Serverless technologies that utilize multi-tenancy are inherently more sustainable than traditional deployment methods because they enable higher hardware utilization.
It's idealistic to assume we can perfectly utilize hardware via multi-tenancy, but Meta has begun using a technique called "time shifting" in XFaaS, their internal private cloud functions product (similar to AWS Lambda). Instead of running all workloads immediately, they delay some operations that are not time sensitive. Scheduling these delay-tolerant functions during off-peak times lowers their peaks and raises their troughs, which equates to a higher average utilization of their hardware.
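The effect of lowering peaks and raising troughs can be shown with a toy simulation. This is only a sketch under simple assumptions, not how XFaaS schedules work: the load numbers are invented, capacity is assumed to be provisioned for the peak, and deferrable work is allowed to land in any slot (think of a repeating daily cycle).

```python
def utilization(load: list[int]) -> float:
    """Average load divided by peak load (capacity is provisioned for the peak)."""
    return sum(load) / (len(load) * max(load))


def time_shift(urgent: list[int], deferrable: list[int]) -> list[int]:
    """Place each unit of deferrable work into the currently least-loaded slot."""
    shifted = list(urgent)
    for _ in range(sum(deferrable)):
        quietest = min(range(len(shifted)), key=lambda i: shifted[i])
        shifted[quietest] += 1
    return shifted


# Hypothetical load per time slot, in arbitrary work units.
urgent = [2, 2, 6, 8, 6, 2]
deferrable = [0, 0, 3, 4, 3, 0]

immediate = [u + d for u, d in zip(urgent, deferrable)]
shifted = time_shift(urgent, deferrable)

print(f"run immediately: peak={max(immediate)}, utilization={utilization(immediate):.0%}")
print(f"time shifted:    peak={max(shifted)}, utilization={utilization(shifted):.0%}")
```

With these made-up numbers the same total work gets done, but the peak drops from 12 to 8 units and utilization rises from 50% to 75%.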
This introduces an alternative approach to managing sustainability, one seldom discussed in the serverless realm as far as I've seen. In many event-driven systems, events don't require immediate processing and are delay tolerant by design. This characteristic lends itself well to delaying processing until utilization in your system is below a certain threshold (e.g., during off hours). Of course this is only possible for data that isn't expected to be immediately consistent. This type of processing emerged in the mainframe computing era, where it was called "batch processing". It's still in use today. Did you ever wonder why your bank statement can take up to 24 hours to reflect purchases? They're processing payments either overnight or with some other batch processing methodology.
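Here is a minimal sketch of that idea in Python: delay-tolerant events are held in a backlog while the system is busy and processed once utilization falls below a threshold. The DeferredQueue class, the "urgent" flag, and the 0.5 threshold are all hypothetical names and values chosen for illustration; how you measure utilization is left to the caller.

```python
from collections import deque
from typing import Callable


class DeferredQueue:
    """Hold delay-tolerant events until system utilization drops below a threshold."""

    def __init__(self, handler: Callable[[dict], None], threshold: float = 0.5):
        self.handler = handler
        self.threshold = threshold
        self.pending = deque()

    def submit(self, event: dict, utilization: float) -> None:
        # Urgent events run right away; delay-tolerant ones wait while the system is busy.
        if event.get("urgent") or utilization < self.threshold:
            self.handler(event)
        else:
            self.pending.append(event)

    def drain(self, utilization: float) -> int:
        """Process the backlog if the system is quiet; return how many events ran."""
        if utilization >= self.threshold:
            return 0
        count = 0
        while self.pending:
            self.handler(self.pending.popleft())
            count += 1
        return count


processed = []
queue = DeferredQueue(handler=processed.append, threshold=0.5)
queue.submit({"id": 1, "urgent": True}, utilization=0.9)  # runs immediately
queue.submit({"id": 2}, utilization=0.9)                  # deferred, system is busy
queue.drain(utilization=0.2)                              # off hours: backlog runs
print([event["id"] for event in processed])
```

A real system would also want a maximum delay per event so that nothing waits forever when the system never gets quiet.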
Time Shifting on Kubernetes
Although I don't consider Kubernetes to be serverless by definition, I recently happened across a Kubernetes admission controller that accepts delay-tolerant workloads. This paradigm is atypical for Kubernetes (which usually processes a request as soon as it's received), but the pattern hints at the potential future of all workload processing systems.
You may have a request sent to your backend system that doesn't require immediate processing, but rather can run whenever it is most efficient to do so. The research paper describing the admission controller, called Cucumber, is available, along with a reference implementation (2 years old and unmodified since). While Cucumber is geared towards running workloads when excess solar power is being generated, the concept could be extended to many other signals, such as how heavily loaded your servers are.
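The general shape of a deadline-aware admission check can be sketched in a few lines of Python. To be clear, this is not the algorithm from the Cucumber paper or its reference implementation; it is a simplified illustration in which a job is accepted only if the forecast spare capacity before its deadline covers the work. The Job type, the admit function, and the forecast numbers are all hypothetical, and "spare capacity" could stand for excess solar power or idle servers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    work_units: int     # total work the job needs
    deadline_slot: int  # the job must finish before this slot index


def admit(job: Job, free_capacity_forecast: list[int]) -> bool:
    """Accept the job only if forecast spare capacity before its deadline covers it.

    On acceptance, the capacity is reserved by mutating the forecast in place.
    """
    window = free_capacity_forecast[: job.deadline_slot]
    if sum(window) < job.work_units:
        return False
    remaining = job.work_units
    for slot in range(len(window)):
        used = min(remaining, free_capacity_forecast[slot])
        free_capacity_forecast[slot] -= used
        remaining -= used
        if remaining == 0:
            break
    return True


# Hypothetical forecast of spare capacity per time slot.
forecast = [0, 1, 4, 5, 2, 0]

print(admit(Job("report", work_units=6, deadline_slot=4), forecast))   # enough spare capacity
print(admit(Job("backup", work_units=8, deadline_slot=4), forecast))   # not enough before deadline
print(forecast)
```

Rejected jobs would then fall back to running immediately or being resubmitted with a later deadline, depending on what the caller can tolerate.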
Summary
You've just read a few of the trends that have been developing in the sustainable serverless computing arena. Hope this has provided some new information for you to go delve into on your own. Thanks and have a great 2025!
Cover photo by Samuel Faber from Pixabay