During the last cohort of my Production-Ready Serverless workshop, a student asked:
“If I have to query an ERP system and wait for its response, and it sometimes takes more than 15 minutes to respond, is there a serverless way to do this?”
This is a surprisingly hard question to answer because:
A. It’s a query, not a fire-and-forget request, so they do have to wait for the ERP system to respond.
B. It’s a third-party system that they have no control over, so they can’t just add a callback mechanism to notify them when the query result is ready.
C. None of the common serverless solutions let you hang onto the web connection for more than 15 minutes. Lambda’s max timeout is 15 mins. EventBridge’s API Destination is fire-and-forget and has a max timeout of 5s anyway. Step Functions’ API integration only works with API Gateway, and API Gateway has a max integration timeout of 30s.
The “obvious” answers are:
- Switch to another ERP system! It’s probably not a feasible solution and likely not an engineering decision either.
- Run a container. A Fargate service is going to charge you for uptime but you can make concurrent requests and make better use of the idle CPU cycles. If the calls to the ERP system are infrequent then you can also run Fargate tasks on-demand instead so you don’t pay for the idle time between calls.
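As a rough sketch of the on-demand Fargate option, the snippet below builds an `ecs.run_task` request with boto3 and starts one task per ERP query. The cluster, task definition, subnet IDs, container name and the `ERP_QUERY` environment variable are all hypothetical placeholders, and it assumes a task definition whose container makes the slow HTTP call already exists.

```python
from typing import Any


def build_run_task_request(cluster: str, task_definition: str,
                           subnets: list[str], container_name: str,
                           erp_query: str) -> dict[str, Any]:
    """Build the keyword arguments for ecs.run_task (no AWS call is made here)."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                # Assumption: the task runs in a public subnet and needs a
                # public IP to reach the ERP system.
                "assignPublicIp": "ENABLED",
            }
        },
        "overrides": {
            "containerOverrides": [
                {
                    "name": container_name,
                    # Hypothetical variable the container reads to know what to ask.
                    "environment": [{"name": "ERP_QUERY", "value": erp_query}],
                }
            ]
        },
    }


def run_erp_query_task(request: dict[str, Any]) -> str:
    """Start the task and return its ARN. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the request builder works without it

    ecs = boto3.client("ecs")
    response = ecs.run_task(**request)
    if not response["tasks"]:
        raise RuntimeError(f"task was not started: {response['failures']}")
    return response["tasks"][0]["taskArn"]
```

With this shape you only pay for the task while it is waiting on the ERP system, at the cost of a cold start for every query.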
In the future, I suspect Step Functions will be able to call any public HTTP endpoint, not just API Gateway endpoints. While Express Workflows are limited to 5 minutes, Standard Workflows can run for up to a year and are very good at waiting.
But I suspect AWS will impose a max timeout there as well. Ultimately, someone has to pay for the idle time while we wait for the ERP system to respond. Because Standard Workflows are charged by the number of state transitions rather than by duration, AWS would be the one paying for this 15-minute idle time.
In any case, Step Functions is not a viable option today.
One outside-the-box solution is to abuse the Lambda internals and “skip” the Lambda timeout (you still pay for execution time!), as described in this post. But this is a dangerous approach and one that I don’t recommend.
So that led me back to “you have to run a container”, but is there a more serverless way to do this?
I asked around on Twitter and received some really interesting suggestions! Most suggestions were ways to run a container service or task with low operational overhead, including:
- Use CodeBuild (aka Corey Quinn’s favourite container service) to run ephemeral containers. It takes less work to set up compared to Fargate. SST has a handy Job construct that lets you run a container job with CodeBuild.
- App Runner. As mentioned above, if the calls to the ERP system are frequent then it would be more cost-efficient to run a long-running service. App Runner is yet another low-effort way to run a containerised service.
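To make the CodeBuild suggestion a little more concrete, here is a minimal sketch that starts a build with boto3 and passes the ERP endpoint in as an environment variable. It assumes a CodeBuild project already exists whose buildspec makes the slow HTTP call; the project name and the `ERP_URL` variable are hypothetical.

```python
from typing import Any


def build_start_build_request(project_name: str, erp_url: str,
                              timeout_minutes: int = 60) -> dict[str, Any]:
    """Build the keyword arguments for codebuild.start_build (no AWS call here)."""
    return {
        "projectName": project_name,
        # Give the build more headroom than the slowest ERP response we expect.
        "timeoutInMinutesOverride": timeout_minutes,
        "environmentVariablesOverride": [
            # Hypothetical variable the buildspec reads to know which URL to call.
            {"name": "ERP_URL", "value": erp_url, "type": "PLAINTEXT"},
        ],
    }


def start_erp_query_build(request: dict[str, Any]) -> str:
    """Start the build and return its ID. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the request builder works without it

    codebuild = boto3.client("codebuild")
    response = codebuild.start_build(**request)
    return response["build"]["id"]
```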
My favourite suggestion is to run a Python shell job with AWS Glue. There are no container images to configure and maintain. Just set up an IAM role and point Glue to a Python script in S3 and you’re good to go.
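The script you point Glue at can be very small. Below is a sketch of what it might look like, using only the standard library. The `--erp_url` and `--timeout_seconds` argument names are made up for illustration, and I’m assuming job arguments reach the script on the command line as `--name value` pairs alongside Glue’s own arguments.

```python
import argparse
import sys
import urllib.request


def parse_job_args(argv: list[str]) -> argparse.Namespace:
    """Read our (hypothetical) job arguments and ignore everything else."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--erp_url", default=None)
    parser.add_argument("--timeout_seconds", type=float, default=3600.0)
    # Glue adds arguments of its own, so unknown ones are tolerated.
    args, _unknown = parser.parse_known_args(argv)
    return args


def query_erp(url: str, timeout_seconds: float) -> bytes:
    """Make the slow request and block until the response body arrives.

    The timeout applies to each blocking socket operation (connecting,
    waiting for data), so it should exceed the longest wait we expect.
    """
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return response.read()


def main(argv: list[str]) -> None:
    args = parse_job_args(argv)
    if args.erp_url is None:
        print("no --erp_url argument given, nothing to do")
        return
    body = query_erp(args.erp_url, args.timeout_seconds)
    # What happens to the result (S3, a queue, a database) is up to you.
    print(f"ERP responded with {len(body)} bytes")


if __name__ == "__main__":
    main(sys.argv[1:])
```

A job run would then be started with something like boto3’s `glue.start_job_run(JobName=..., Arguments={"--erp_url": "..."})`, and the job’s own timeout needs to be longer than the slowest response you expect.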
Glue bills Python shell jobs at $0.44 per DPU-hour, so a job running on a full DPU costs $0.44 per hour of waiting (a Python shell job can also be configured to run on 1/16 of a DPU). If the calls to the ERP system are frequent then this will be an expensive solution. But if the calls are infrequent then it can be a viable solution. And it’s the most serverless way to wait for a slow (> 15 mins) HTTP request.
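To get a feel for the numbers, here is a back-of-the-envelope calculation. It assumes the $0.44 figure is a per-DPU-hour rate (which varies by region), that the job runs on one DPU unless told otherwise, and that billing is per second with a one-minute minimum. Treat all three as assumptions to check against the current Glue pricing page.

```python
GLUE_PRICE_PER_DPU_HOUR = 0.44  # USD, the figure quoted above (assumed per DPU-hour)
MINIMUM_BILLED_SECONDS = 60     # assumption: one-minute minimum per job run


def glue_wait_cost(wait_seconds: float, dpus: float = 1.0,
                   price_per_dpu_hour: float = GLUE_PRICE_PER_DPU_HOUR) -> float:
    """Estimate the cost in USD of one job run that waits for wait_seconds."""
    billed_seconds = max(wait_seconds, MINIMUM_BILLED_SECONDS)
    return billed_seconds / 3600 * dpus * price_per_dpu_hour


def monthly_cost(calls_per_day: int, wait_seconds: float, dpus: float = 1.0) -> float:
    """Estimate the cost of a 30-day month with a fixed number of calls per day."""
    return 30 * calls_per_day * glue_wait_cost(wait_seconds, dpus)


if __name__ == "__main__":
    per_call = glue_wait_cost(20 * 60)
    print(f"one 20-minute wait on 1 DPU: ${per_call:.4f}")
    print(f"ten such calls a day for a month: ${monthly_cost(10, 20 * 60):.2f}")
```

Under these assumptions a single 20-minute wait on one DPU comes to roughly $0.15, which is negligible for a handful of calls a day but adds up quickly if the ERP system is queried constantly.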
I hope you’ve enjoyed this article. If you want to level up your serverless game, why not check out the Production-Ready Serverless workshop? I will teach you everything I know about building serverless applications. From structuring projects, testing, deployment and security, to monitoring and troubleshooting in production.
Oh, and when you sign up for one of the upcoming workshops right now, you will also get access to my “Testing Serverless Architectures” course at no extra charge.
Hope to see you there :-)
The post What’s the most serverless way to wait for a slow HTTP response? appeared first on theburningmonk.com.