Lambda Code Execution Freeze/Thaw

Omid Eidivandi - Nov 5 - - Dev Community

AWS Lambda, a serverless computing service, enables code execution on demand while providing an isolated environment for each individual request. This design inherently ensures reliability, as any failure in one request does not affect others.

When working with Lambda, it's important to explore the intricacies of its execution environment and how it is managed. In this article, I will share my insights and observations, though some aspects may be open to debate.

Execution Environment

An execution environment is an isolated container (Micro-VM) that is launched on demand to handle incoming requests. Each environment processes one request at a time but can handle subsequent requests once the previous one is complete. If a new request arrives before the ongoing one finishes, an additional execution environment is created to manage it. The diagram below illustrates the lifecycle of an execution environment.

An execution environment goes through three phases: Initialization, Invocation, and Shutdown.

Init phase

The Init phase downloads the code package, initializes any configured extensions, initializes the runtime, and then runs the function's static (top-level) code. These steps happen sequentially, in that order.

Invocation Phase

During the Invocation phase, the Lambda service invokes the runtime, the extensions, and the function handler.

Shutdown Phase

The Shutdown phase shuts down the runtime and sends a shutdown signal to all extensions, letting them clean up and finish any remaining work.

Reuse of Environment

An execution environment is reused for subsequent requests; it enters the Shutdown phase only if it receives no requests for a period of time. The window between the end of one invocation and the start of the Shutdown phase is the idle duration, during which the Lambda service can route new requests to that available environment.

AWS Lambda has a lot of interesting technical details here. Lambda does not leave the execution environment running while there is no demand: the environment is frozen, and it is thawed when a new request arrives.
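A quick way to see this reuse in action is a module-scope counter (a minimal sketch; the handler shape is simplified and not tied to a real event type): the variable survives freeze/thaw between warm invocations and resets only on a cold start.

```typescript
// Module scope runs once per execution environment (cold start) and
// survives freeze/thaw between warm invocations of the same environment.
let invocationCount = 0;

export const handler = async (): Promise<{ coldStart: boolean; invocationCount: number }> => {
  invocationCount += 1;
  // Only the very first invocation in this environment is a cold start.
  return { coldStart: invocationCount === 1, invocationCount };
};
```

Calling the same function twice in quick succession typically lands both requests on the same environment, so the second response reports `coldStart: false`.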

If the environment is interrupted unintentionally, Lambda runs initialization again as part of the next invocation, though this appears to be a lighter init.

Code Execution

Let's first see how the code is executed for a request. When the first request is received, Lambda initializes the environment by running the top-level code. How the Init phase behaves depends on the programming language and how the code is written. With Node.js, what happens during initialization is roughly equivalent to running:

> node index.js

The Lambda service then executes the function handler; when execution finishes, the execution environment is frozen. Thawing the execution environment is the tricky part. When Lambda freezes the execution environment, all background processes are frozen with it and resume only when the environment is thawed. But what if the environment never receives another request? It is shut down, and everything in it is lost.

Freeze

Under the hood, a frozen execution environment enters something like a hibernate state: all resources go to sleep, much like a PC going into hibernate mode. The address space of all running processes is preserved, allowing the same state to be reconstructed later.

Thaw

Thawing happens as part of the runtime invoke when a container is reused during the Invocation phase: the Lambda service invokes the runtime, but in theory this must come after the frozen background processes have been reconstructed.

I could not find a definitive answer on what happens when the execution environment wakes up, but waking must apply to the container itself, not just the runtime.

Try it out

The following example creates a Lambda-based API with two endpoints, one for awaited tasks and one for non-awaited tasks, to observe how Lambda behaves in real time.

Both functions share the same code, except that the first awaits its tasks and the second does not.

import { LambdaFunctionURLEvent, LambdaFunctionURLResult } from "aws-lambda";

const delay = async (ms: number) => {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
};

const Task = async (req: string, name: string, sleep: number) => {
  await delay(sleep);
  console.log(`${name} : `, req);
  return { name: `${name}` };
};

export const handler = async (_event: LambdaFunctionURLEvent): Promise<LambdaFunctionURLResult> => {
  const resultA = await Task(_event.requestContext.requestId, "TaskA", 1000);
  const resultB = await Task(_event.requestContext.requestId, "TaskB", 2000);
  const resultC = await Task(_event.requestContext.requestId, "TaskC", 3000);

  const result = {
    resultA,
    resultB,
    resultC,
  };

  return {
    statusCode: 200,
    body: JSON.stringify(result, null, 2),
    headers: {
      "Content-Type": "application/json",
    },
  };
};

Running the awaited function gives a response time of 6.XX seconds, accumulating the 1000, 2000, and 3000 millisecond delays.

The non-awaited function has the same code but calls TaskB and TaskC without awaiting them.

const resultA = await Task(_event.requestContext.requestId, "TaskA", 1000);
const resultB = Task(_event.requestContext.requestId, "TaskB", 2000);
const resultC = Task(_event.requestContext.requestId, "TaskC", 3000);

Running the first request gives the following logs, showing that only TaskA completed and produced a log entry.

Running a second request produces the logs below: the remaining tasks from the previous execution are executed as part of the subsequent invocation.

The interesting part is how long they took to log. TaskB and TaskC are executed at the same time and finish almost instantly. The following image shows the response time as 1105 ms, which is normal for the new invocation's TaskA.

Looking at the non-awaited code above, TaskB and TaskC should take around 5000 ms combined, but that is not the case. Looking at the billed duration, it corresponds only to TaskA's execution.

Real scenario

To test the hypothesis more rigorously (I did not fully trust it) and validate that there are no tricky side effects, we will send a message to an SQS queue to prove the idea behind these observations is real.

Running the first request shows nothing fancy, just a new message in the queue for TaskA.

Now, let's run another request and see what happens. Here, TaskB and TaskC get included in the execution, and the new messages appear in the queue. The top message in the following screenshot is from the first invocation; the three others are TaskB and TaskC of the first invocation plus TaskA of the new invocation.

In the examples above, the ingested logs were written only after each treatment completed. I was curious whether I could observe how those function calls actually happen, so I kept it simple by adding logs for the start and end of each treatment.

Here is how the Task method looks:

export const Task = async (req: string, name: string, sleep: number) => {
  console.log(`Starting ${name} : `, req);
  await delay(sleep);
  console.log(`Waking up ${name} : `, req);
  await client.send(new SendMessageCommand({
    QueueUrl: process.env.QUEUE_URL,
    MessageBody: JSON.stringify({ req, name }),
  }));
  console.log(`End ${name} : `, req);
  return { name: `${name}` };
};

The Task method logs at three points: Starting, Waking up, and End. In the following logs, the first excerpt shows an execution where the frozen tasks ran before the current execution, and in the second, TaskB woke up at the same time the current invocation started (CloudWatch orders logs by time by default). Based on these observations, frozen tasks are apparently executed not at the function handler invoke but at the runtime invoke.

These observations suggest that frozen processes run as soon as the execution environment is thawed, which looks like a kind of decoupling (this is only a hypothesis). To probe whether it really is decoupled, injecting failures into TaskB and TaskC will better illustrate the internal state.

Simulating Failures

The Task method is changed to fail explicitly per task name, as in the following snippet.

export const Task = async (
  req: string,
  name: string,
  sleep: number,
  extendedProcess?: Function
) => {
  console.log(`Starting ${name} : `, req);
  await delay(sleep);
  console.log(`Waking up ${name} : `, req);
  if (extendedProcess) {
    extendedProcess();
  }
  await client.send(new SendMessageCommand({
    QueueUrl: process.env.QUEUE_URL,
    MessageBody: JSON.stringify({ req, name }),
  }));
  console.log(`End ${name} : `, req);
  return { name: `${name}` };
};

The extendedProcess callback is passed in the TaskB call as below.

const extendedProcess = (name: string, req: string) => {
  console.log(`Failing ${name} : `, req);
  throw new Error('TaskB Failed');
};

export const handler = async (_event: LambdaFunctionURLEvent): Promise<LambdaFunctionURLResult> => {
  ...
  const resultB = Task(
    _event.requestContext.requestId,
    "TaskB",
    2000,
    // Pass a closure so the failure runs inside Task, not at call time.
    () => extendedProcess("TaskB", _event.requestContext.requestId)
  );
  ...
};

The first request behaves as before, but during the second execution the previously frozen TaskB interrupts the current execution by throwing an unhandled promise rejection. This shows that freezing and thawing incomplete tasks can be dangerous and breaks the AWS Lambda design principle of isolated, event-based processing; it confirms a level of coupling that can become dangerous without careful implementation.
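One hedge against this coupling (a sketch of mine, not the article's code, with illustrative names): attach a rejection handler to every fire-and-forget promise at creation time, before the environment can be frozen, so that if the promise is thawed and rejects during a later invocation, it is swallowed and logged instead of surfacing as an unhandled rejection that kills that invocation.

```typescript
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Fire-and-forget wrapper: the .catch is attached immediately, so a late
// rejection, even one that settles after a thaw, is always handled.
const safeBackgroundTask = (name: string, sleep: number): void => {
  (async () => {
    await delay(sleep);
    if (name === "TaskB") throw new Error(`${name} failed`); // simulated failure
    console.log(`End ${name}`);
  })().catch((err) => console.error(`Swallowed failure of ${name}:`, err));
};
```

With this wrapper, a failing TaskB from a previous invocation logs its error when thawed but cannot interrupt the request currently in flight.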

Tracking state

💡

This is just playing around, not a way to implement production-ready solutions.

This experiment pushed me to think about how a Lambda can behave like a stateful container and act based on that state. To experiment with state tracking, a use case could be a task that keeps some state in the execution environment, which can be done with a variable outside the handler. This time, though, I want to try not only keeping state but also deferring actions, and see if that is possible.

How the idea behaves can be demonstrated by the following sequence diagram

The idea is that the un-awaited task will check the execution environment state and modify it. What was achieved in this test:

  • Accumulating state in a dictionary outside the handler

  • Pushing accumulated items to an SQS queue when the count reaches 10 items

But what about undesired situations? While experimenting, I occasionally hit some sort of timeout, and on deeper inspection I discovered the dictionary state was empty after a timeout, as shown in the following CloudWatch Logs screenshot.

Conclusion

While it is fun to try and fail, this article was just a game around how the Lambda execution environment behaves, and the behavior looks like hibernation: when awakened again, the processes come back and resume. Have you ever hibernated a PC while copying a huge folder from one drive to another? This is the same behavior.

The final note: a controlled, imperative programming model is often far more efficient than this behavioral style. For this use case, a better approach would be using a Lambda layer to push the logs, but this was fun, and I thought it was worth sharing with the community.

Enjoy reading
