Testing Step Functions: how to skip time when testing Timeout and Wait states

Yan Cui - Jun 19 '23 - Dev Community

When I previously wrote about testing Step Functions, I gave you a general strategy that consists of:

  • Component tests that target the Lambda functions (specifically, the custom code you wrote in those functions).
  • End-to-end tests that execute the state machine in the cloud.
  • Local tests using Step Functions Local where you can use mocks to help you test those hard-to-reach execution paths.

However, there’s one common problem that Step Functions Local won’t help you with: dealing with time. For example, you may need to test an execution path that sits behind a long Wait state, or an error path that is only reached when a long Timeout expires.

Because Step Functions Local doesn’t support skipping forward in time, I find the best solution is to rewrite the state machine definition in the test setup.

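Let’s say you have a state machine for processing food orders. It saves the order to DynamoDB and then notifies the restaurant through SNS, and its “Notify restaurant” state has a timeout of 300 seconds. We want to test the error path that runs when the restaurant doesn’t respond before that timeout expires.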
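For reference, a heavily trimmed definition that matches this behaviour might look like the one below. It is hypothetical: only the “Notify restaurant” state name and its 300-second timeout come from this post, while the other state names, the resource ARNs, the placeholder topic and the Catch wiring are assumptions. It is written as a JavaScript object and serialised with JSON.stringify, which is the string form a definition-rewriting callback works on.

```javascript
// Hypothetical, trimmed-down order-processing definition (Amazon States Language).
// Only "Notify restaurant" and its 300-second timeout come from the post;
// everything else is an illustrative assumption.
const orderDefinition = {
  StartAt: 'Add order',
  States: {
    'Add order': {
      Type: 'Task',
      Resource: 'arn:aws:states:::dynamodb:putItem',
      Parameters: { /* table name and item omitted */ },
      Next: 'Notify restaurant'
    },
    'Notify restaurant': {
      Type: 'Task',
      // Publish to SNS, then pause until the task token is returned via
      // SendTaskSuccess/SendTaskFailure, or until TimeoutSeconds elapses
      Resource: 'arn:aws:states:::sns:publish.waitForTaskToken',
      Parameters: {
        TopicArn: 'arn:aws:sns:us-east-1:123456789012:restaurant-notifications', // placeholder
        Message: {
          'TaskToken.$': '$$.Task.Token',
          'orderId.$': '$.orderId'
        }
      },
      TimeoutSeconds: 300,
      // When the timeout expires, the task fails with the States.Timeout error
      Catch: [{ ErrorEquals: ['States.Timeout'], Next: 'Mark order as no response' }],
      Next: 'Order accepted'
    },
    'Mark order as no response': {
      Type: 'Task',
      Resource: 'arn:aws:states:::dynamodb:updateItem',
      Parameters: { /* sets status to "NO_RESPONSE"; details omitted */ },
      End: true
    },
    'Order accepted': { Type: 'Succeed' }
  }
}

// The string form that a definition-rewriting callback receives
const orderDefinitionJson = JSON.stringify(orderDefinition)
```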

I would write a test case like this:

const given = require('../../steps/given')
const when = require('../../steps/when')
const then = require('../../steps/then')
const chance = require('chance').Chance()
const retry = require('async-retry')

describe("Test case: restaurant doesn't respond to the order in time", () => {
  const orderId = chance.guid()

  describe('Given a local instance of the state machine', () => {
    let stateMachineArn

    beforeAll(async () => {
      stateMachineArn = await given.a_local_statemachine_instance(
        process.env.StateMachineArn,
        chance.guid(),
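        // Rewrite the definition before the state machine is created in
        // Step Functions Local: cut the 300-second timeout down to 1 second
        (definitionJson) => {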
          const definition = JSON.parse(definitionJson)
          definition.States['Notify restaurant'].TimeoutSeconds = 1
          return JSON.stringify(definition)
        }
      )
    })

    describe('When we start a local execution', () => {
      let executionArn

      beforeAll(async () => {
        executionArn = await when.we_start_local_execution(
          stateMachineArn,
          { orderId }
        )
      })

      it('Should add the order to the database', async () => {
        await then.an_order_exists_in_dynamodb(orderId)
      })

      it('Should send an SNS notification to the restaurant topic', async () => {
        const restaurantNotification = await then.a_restaurant_notification_is_received(orderId)
        expect(restaurantNotification.TaskToken).toBeTruthy()
      })

      it('Should update the order status to "NO_RESPONSE"', async () => {
        await retry(async () => {
          const order = await then.an_order_exists_in_dynamodb(orderId)
          expect(order.status).toEqual("NO_RESPONSE")
        }, {
          retries: 3,
          maxTimeout: 1000
        })
      })
    })
  })
})

The given.a_local_statemachine_instance helper function creates the state machine in Step Functions Local. Importantly, its third argument is a callback that lets me rewrite the state machine definition, which I use to change the TimeoutSeconds setting on the “Notify restaurant” state to 1:

(definitionJson) => {
  const definition = JSON.parse(definitionJson)
  definition.States['Notify restaurant'].TimeoutSeconds = 1
  return JSON.stringify(definition)
}
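The inline callback only reaches top-level states. If the state you need to speed up lives inside a Parallel branch or a Map state, or you want to change several settings at once, a small factory along these lines could help. It is a hypothetical helper (overrideStates is not one of the author’s test helpers). It applies each override with Object.assign, so it works equally for TimeoutSeconds, HeartbeatSeconds or a Wait state’s Seconds, and it throws if a state name isn’t found, so a typo can’t silently leave the long timeout in place.

```javascript
// Hypothetical helper: builds a callback with the same shape as the one passed to
// given.a_local_statemachine_instance, i.e. (definitionJson) => definitionJson.
// `overrides` maps a state name to the fields to overwrite on that state.
function overrideStates (overrides) {
  return (definitionJson) => {
    const definition = JSON.parse(definitionJson)
    const found = new Set()

    // Walk every States map: the top level, Parallel branches and Map sub-workflows
    const visit = (states) => {
      for (const [name, state] of Object.entries(states)) {
        if (Object.prototype.hasOwnProperty.call(overrides, name)) {
          Object.assign(state, overrides[name])
          found.add(name)
        }
        for (const branch of state.Branches || []) visit(branch.States)
        // Map states nest their workflow under ItemProcessor (newer) or Iterator (older)
        const processor = state.ItemProcessor || state.Iterator
        if (processor) visit(processor.States)
      }
    }
    visit(definition.States)

    // Fail loudly on a typo instead of silently keeping the long timeout
    const missing = Object.keys(overrides).filter((name) => !found.has(name))
    if (missing.length > 0) {
      throw new Error(`States not found in definition: ${missing.join(', ')}`)
    }
    return JSON.stringify(definition)
  }
}
```

In the test above, the third argument to given.a_local_statemachine_instance would then become overrideStates({ 'Notify restaurant': { TimeoutSeconds: 1 } }).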

This way, the test only has to wait one second (instead of 300!) before it can verify that the order’s status has been changed to NO_RESPONSE. Because that update happens asynchronously once the timeout fires, the assertion is wrapped in a retry:

it('Should update the order status to "NO_RESPONSE"', async () => {
  await retry(async () => {
    const order = await then.an_order_exists_in_dynamodb(orderId)
    expect(order.status).toEqual("NO_RESPONSE")
  }, {
    retries: 3,
    maxTimeout: 1000
  })
})
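Why is retries: 3 with maxTimeout: 1000 enough? async-retry builds on the retry package, which waits min(maxTimeout, minTimeout × factor^n) milliseconds before retry n (counting from 0). Assuming the documented defaults of minTimeout = 1000 and factor = 2, which this test doesn’t override, every wait is capped at 1000 ms. Randomization, if enabled, only multiplies the delay by a factor between 1 and 2 before the cap is applied, so it doesn’t change the result here. The assertion therefore gets four attempts (the first call plus three retries) spread over roughly three seconds, comfortably longer than the one-second timeout. The sketch below only reproduces that arithmetic; backoffSchedule is a hypothetical function, not part of async-retry’s API.

```javascript
// Reproduces the delay arithmetic of the `retry` package that async-retry builds on
// (ignoring randomization): before retry n (0-based) it waits
// min(maxTimeout, minTimeout * factor ** n) milliseconds.
function backoffSchedule ({ retries, minTimeout = 1000, factor = 2, maxTimeout = Infinity }) {
  const delays = []
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.min(maxTimeout, minTimeout * factor ** attempt))
  }
  return delays
}

// Options from the test: minTimeout and factor fall back to their defaults
const delays = backoffSchedule({ retries: 3, maxTimeout: 1000 })
const totalWaitMs = delays.reduce((sum, ms) => sum + ms, 0)
console.log(delays, totalWaitMs) // [ 1000, 1000, 1000 ] 3000
```

Without the maxTimeout cap the same three retries would wait 1, 2 and 4 seconds, which is why capping it keeps the test quick.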

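As you can see, this approach is quite simple: by rewriting the definition in the test setup, you can skip time when testing both Timeout clauses and Wait states. For a Wait state, you would shorten its Seconds field in the same way, rather than TimeoutSeconds.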
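One wrinkle with Wait states is that they specify their delay with exactly one of Seconds, SecondsPath, Timestamp or TimestampPath, so a wait driven by a timestamp in the execution input can’t be shortened just by changing Seconds. The hypothetical helper below (the name shortenWait and the 1-second default are assumptions) removes whichever of the other three fields is present and sets a fixed Seconds value instead. Like the original callback, it only handles a top-level state.

```javascript
// Hypothetical helper: returns a definition-rewriting callback that makes the
// named top-level Wait state pause for `seconds` instead of its real delay.
function shortenWait (stateName, seconds = 1) {
  return (definitionJson) => {
    const definition = JSON.parse(definitionJson)
    const state = definition.States[stateName]
    if (!state || state.Type !== 'Wait') {
      throw new Error(`"${stateName}" is not a top-level Wait state`)
    }
    // A Wait state must use exactly one delay field, so drop the other three
    // before setting (or overwriting) a fixed number of seconds
    delete state.SecondsPath
    delete state.Timestamp
    delete state.TimestampPath
    state.Seconds = seconds
    return JSON.stringify(definition)
  }
}
```

Passed as the third argument to given.a_local_statemachine_instance, shortenWait('Some wait state') would take the place of the inline callback in the test above.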

If you want to learn more about testing serverless architectures and see the full example in action, check out my latest course, “Testing Serverless Architectures”. It gives you practical advice on how to test different types of serverless architectures, including API Gateway, AppSync, Step Functions and event-driven architectures, and how to deal with the specific challenges that come with each of them.

The post Testing Step Functions: how to skip time when testing Timeout and Wait states appeared first on theburningmonk.com.
