Creating resilient API synthetic canary tests using CloudWatch Synthetics

Wojciech Matuszewski - Aug 22 '21 - Dev Community

Developers test their software in multiple ways. Unit, integration, and end-to-end tests are, arguably, the most common ways to do so.

In addition to the three pillars of testing mentioned above, one might make a case for synthetic tests. These tests run on a schedule and continuously exercise the application to ensure it is working as expected.

This blog post will show you how to build such synthetic tests using CloudWatch Synthetics. We will cover how to deploy them and how to keep them resilient to any intermittent network issues that might arise.

Let us dive in.

All the code examples in this blog post are written in TypeScript. I will be using AWS CDK as my IaC tool of choice.

The Synthetics package

Before writing any test code, we must familiarize ourselves with the tool that CloudWatch Synthetics exposes - namely, the Synthetics package.

First off, the Synthetics package above is not available on npm, which means that we cannot test our canary code locally before deploying it to AWS.

[Figure: My development workflow while working with CloudWatch Synthetics]

Secondly, no public TypeScript definitions exist for the Synthetics package. We will get over this hurdle later on by creating them manually.

Thirdly, the Synthetics package does not support retrying network requests out of the box. I would argue that any test dealing with network communication or UIs should retry the assertion if it did not pass the first time around. We would not want to get woken up in the middle of the night because of intermittent network communication issues.
We will also look at how one might add that capability later on.

Deploying a simple canary

With the previous chapter behind us, we are ready to deploy a simplistic CloudWatch Synthetics canary test that we will improve upon as we go. The AWS CDK makes the deployment part hassle-free and painless.

Here is our starting point in terms of a canary test.



// canary.ts

// We will switch to `import` whenever we add types for this module.
const synthetics = require("Synthetics");
import http from "http";

export const handler = async () => {
  const requestOptions: http.RequestOptions = {
    hostname: "jsonplaceholder.typicode.com",
    method: "GET",
    port: 443,
    protocol: "https:",
    path: "/todos/1"
  };

  await synthetics.executeHttpStep(
    "ping",
    requestOptions,
    async (res: http.ServerResponse) => {
      return new Promise((resolve, reject) => {
        res.on("error", error => {
          reject(error);
        });

        res.on("end", () => {
          resolve(undefined);
        });
      });
    }
  );
};



One significant thing to note here - the executeHttpStep does not propagate data passed to the resolve callback.

To retrieve the data returned by the HTTP response, we need to create a "global" variable outside of the callback and append to it as the data streams in, chunk by chunk.

Here is how one might read the data returned from the jsonplaceholder API and use it outside of the executeHttpStep callback:



// canary.ts

let rawResponse = "";
await synthetics.executeHttpStep("ping", requestOptions, async res => {
  return new Promise(resolve => {
    res.on("data", chunk => {
      rawResponse += chunk;
    });

    res.on("close", () => {
      resolve(undefined); // Using `undefined` to avoid TypeScript errors.
    });
  });
});

const response = JSON.parse(rawResponse);



This API is very unfortunate and will be a major thorn in our side when we implement retries.
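The chunk-accumulation part of this pattern can be pulled into a small helper. The sketch below is only an illustration: `collectBody` is a hypothetical name, not something the Synthetics package provides, and it works on any Node.js readable stream.

```typescript
import { Readable } from "stream";

// Hypothetical helper (not part of the Synthetics package): gathers every
// chunk of a readable stream and resolves with the full body as a string.
export function collectBody(res: Readable): Promise<string> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];

    res.on("data", (chunk: Buffer | string) => {
      // Chunks may arrive as strings or Buffers, so normalize to Buffers.
      chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
    });
    res.on("error", reject);
    res.on("end", () => resolve(Buffer.concat(chunks).toString("utf-8")));
  });
}
```

Since executeHttpStep does not hand back whatever the callback resolves with, the outer variable is still needed. Inside the callback one would write something like `rawResponse = await collectBody(res);` and parse `rawResponse` after the step finishes.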

Code is much more sparse on the AWS CDK side of things.
To deploy the canary, we can leverage the synthetics construct.



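import * as cdk from "@aws-cdk/core";
import * as synthetics from "@aws-cdk/aws-synthetics";
import { buildSync } from "esbuild";
import { join } from "path";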

// Class definition and so on.

const buildResult = buildSync({
  external: ["Synthetics"],
  minify: true,
  platform: "node",
  bundle: true,
  entryPoints: [join(__dirname, "./canary.ts")],
  write: false
});
const canaryCode = Buffer.from(buildResult.outputFiles[0].contents).toString(
  "utf-8"
);

new synthetics.Canary(this, "MyCanary", {
  schedule: synthetics.Schedule.rate(cdk.Duration.minutes(1)),
  runtime: synthetics.Runtime.SYNTHETICS_NODEJS_PUPPETEER_3_1,
  successRetentionPeriod: cdk.Duration.days(1),
  failureRetentionPeriod: cdk.Duration.days(20),
  test: synthetics.Test.custom({
    code: synthetics.Code.fromInline(canaryCode),
    handler: "index.handler"
  })
});



Since the Canary construct does not support canaries written in TypeScript out of the box, I'm leveraging esbuild for bundling and transpilation.

After deploying the stack, navigate to the CloudWatch dashboard and select the Synthetics Canaries tab. Our canary should be green with one step marked as "Passed".

Adding TypeScript type definitions

As I alluded to earlier, there are no publicly available TypeScript typings for the Synthetics package, meaning that the synthetics variable defined in the canary.ts file is untyped - TypeScript evaluates the type of that variable as any.

Thankfully, TypeScript exposes a way to declare those typings manually through ambient modules.

Here is a very bare-bones Synthetics ambient module declaration.



// synthetics.d.ts

declare module "Synthetics" {
  import type { ServerResponse, RequestOptions } from "http";

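  interface SyntheticsModule {
    executeHttpStep(
      stepName: string,
      options: RequestOptions,
      validationFunction?: (res: ServerResponse) => Promise<unknown>
    ): Promise<void>;
  }

  // `declare` is implied inside an ambient module, and `export =` must
  // reference an identifier rather than an object literal.
  const synthetics: SyntheticsModule;
  export = synthetics;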
}



With the type declaration in place, we can now retire the CJS require in favor of an ES6 import within the canary.ts file. Doing so lets TypeScript pick up the typings for the Synthetics package from the ambient module we have declared. Note that the default import relies on the esModuleInterop compiler option being enabled.



// canary.ts

- const synthetics = require("Synthetics")
+ import synthetics from "Synthetics";

-   await synthetics.executeHttpStep("ping", requestOptions, async (res: http.ServerResponse) => {
+   await synthetics.executeHttpStep("ping", requestOptions, async (res) => {



Validating the response status

Let us begin with asserting the status of the response. If the response returns a status outside of the 200-299 range, we should fail the ping step, thus failing the whole canary.



// canary.ts

await synthetics.executeHttpStep("ping", requestOptions, async res => {
  return new Promise((resolve, reject) => {
    // Asserting the response `statusCode`
    if (res.statusCode < 200 || res.statusCode > 299) {
      reject(`${res.statusCode}: ${res.statusMessage}`);
    }

    // Rest of the code from the previous section
  });
});



One important thing to note here - a rejection inside the executeHttpStep callback does not reject the promise returned by executeHttpStep itself.
In fact, the rejection is swallowed. Depending on the provided executeHttpStep settings, the test might or might not continue.

Adding retries

Now it's time to start thinking about the resiliency of our tests.

Retrying on 5xx status codes

I would argue that retrying once or twice whenever the returned response has a 5xx status code makes sense. In the era of microservices, it's not uncommon for intermittent network errors to occur. If the same 5xx response status persists, though, we should be pretty confident that something is not working.
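The policy described above can be written down as a pure function, independent of any HTTP client. This is a minimal sketch: `decideOutcome` and the `Outcome` type are hypothetical names introduced only for illustration, and the attempt counter is assumed to start at 1.

```typescript
type Outcome = "pass" | "retry" | "fail";

// Hypothetical helper: maps a status code and the current attempt number
// (1-based) to the action the canary should take.
export function decideOutcome(
  statusCode: number,
  attempt: number,
  maxAttempts: number
): Outcome {
  // 2xx responses pass.
  if (statusCode >= 200 && statusCode <= 299) return "pass";
  // 5xx responses are retried while attempts remain.
  if (statusCode >= 500 && attempt < maxAttempts) return "retry";
  // Everything else, including a 5xx on the last attempt, fails.
  return "fail";
}
```

Keeping the decision separate from the request makes the policy easy to unit test locally, even though the Synthetics package itself cannot be.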

Since the Synthetics package does not support retrying requests out of the box, we must implement that logic ourselves.

I will be using the p-retry module to carry out the retrying. I like the p-retry interface and value its ease of use.



// canary.ts

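import pRetry from "p-retry";
// `SyntheticsLogger` ships with the canary runtime. Like `Synthetics`, it has
// to be listed as `external` when bundling with esbuild.
const log = require("SyntheticsLogger");

const numOfRetries = 2;
// p-retry counts attempts from 1, so the last attempt is `numOfRetries + 1`.
const shouldFailStep = (attemptCount: number) =>
  attemptCount === numOfRetries + 1;
// Inside the handler: await the retries so the canary does not finish early.
await pRetry(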
  async attemptCount => {
    let shouldRetryStep = false;
    let stepFailed = true;

    await synthetics.executeHttpStep("ping", requestOptions, async res => {
      return new Promise((resolve, reject) => {
        log.warn(`response status code: ${res.statusCode}`);

        // (1)
        if (res.statusCode >= 500) {
          if (shouldFailStep(attemptCount)) {
            return reject("Retries exhausted");
          }

          // (2)
          shouldRetryStep = true;
          return resolve(undefined);
        }

        if (res.statusCode < 200 || res.statusCode > 299) {
          return reject(`${res.statusCode}: ${res.statusMessage}`);
        }

        res.on("close", () => {
          stepFailed = false;

          resolve(undefined);
        });
      });
    });

    if (shouldRetryStep) {
      throw new Error("Retrying step");
    }

    // (3)
    if (stepFailed) {
      throw new pRetry.AbortError("Step failed");
    }

    // You might be interested in reacting to the `res.on('data')` event and returning the result here.
  },
  {
    retries: numOfRetries,
    maxTimeout: 2_000,
    minTimeout: 500
  }
);



There is a lot to unpack here, so let us move through the code step by step.

  1. As I alluded to earlier, retrying whenever a request returns a 5xx status code makes sense. This logic checks whether we have exhausted our retries. If so, it calls the reject callback, failing the step and, in turn, the canary.

  2. The executeHttpStep API design gives us no alternative to communicating with the world outside of the callback other than through a "global" variable. Here I'm signaling that the request should be retried, but I'm using resolve to ensure the step is not marked as "failed".

  3. I have to manually propagate the reject call because the executeHttpStep will never reject. Again, using a "global" variable due to API constraints.
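The three points above can be exercised locally with a stand-in for executeHttpStep. The sketch below rests on stated assumptions: `makeFakeStep` is a hypothetical fake that only mimics the "swallow the rejection" behavior described in this post, not the real implementation, and a plain loop stands in for p-retry, so there is no backoff between attempts.

```typescript
type StepCallback = (statusCode: number) => Promise<void>;
type Step = (callback: StepCallback) => Promise<void>;

// Stand-in for `executeHttpStep` (an assumption, not the real thing): it feeds
// the callback a canned status code and swallows any rejection.
export function makeFakeStep(statusCodes: number[]): Step {
  let call = 0;
  return async callback => {
    const statusCode = statusCodes[Math.min(call++, statusCodes.length - 1)];
    try {
      await callback(statusCode);
    } catch {
      // Swallowed: the caller never sees this rejection.
    }
  };
}

// Resolves with the attempt number that passed, or throws once the step fails.
export async function runWithRetries(step: Step, maxAttempts: number): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Outer flags: the only channel out of the callback.
    let shouldRetry = false;
    let stepFailed = true;

    await step(async statusCode => {
      if (statusCode >= 500) {
        // (1) Out of retries: fail the step.
        if (attempt === maxAttempts) throw new Error("Retries exhausted");
        // (2) Signal a retry, but return normally so the step is not failed.
        shouldRetry = true;
        return;
      }
      if (statusCode < 200 || statusCode > 299) {
        throw new Error(`Unexpected status: ${statusCode}`);
      }
      stepFailed = false;
    });

    if (shouldRetry) continue;
    // (3) Propagate the failure manually, since the step itself never rejects.
    if (stepFailed) throw new Error(`Step failed on attempt ${attempt}`);
    return attempt;
  }
  throw new Error("maxAttempts must be at least 1");
}
```

For example, `runWithRetries(makeFakeStep([503, 503, 200]), 3)` retries twice and passes on the third attempt, while a 404 fails on the first attempt without any retry.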

Retrying on timeouts

Sadly, I could not trigger any timeout errors, despite specifying an aggressive one-millisecond timeout or agent properties on the requestOptions object. Given this experience, my conclusion is that executeHttpStep does not support timeouts and ignores them.

If this is indeed true, it makes the Synthetics package much less viable in the context of API canary tests.
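One possible workaround is to enforce the deadline from the outside by racing the step against a timer. The sketch below is generic and makes no claims about Synthetics internals: `withTimeout` is a hypothetical helper, and it only stops waiting for the promise. It does not cancel the underlying request.

```typescript
// Hypothetical helper: rejects if `work` does not settle within `ms`.
// It only stops waiting; the underlying operation keeps running.
export function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });

  // Whichever settles first wins. The timer is cleared either way so it does
  // not keep the process alive.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

Wrapping the executeHttpStep call in such a helper inside the p-retry callback would turn a slow request into a thrown error, which p-retry treats as a reason to retry. How the Synthetics console reports a step that is abandoned this way is an open question.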

Alternative approach

Let us discuss the alternative AWS setup one might use to deploy API canary tests, effectively avoiding the Synthetics package altogether.

While the requests made with the executeHttpStep function produce nice visuals in the AWS CloudWatch Synthetics console, if all we require are logs of the process that runs the test, we might want to look for other solutions.

In this case, I would recommend looking into EventBridge and the capability of invoking a target, for example, an AWS Lambda function, on a fixed schedule. Here is an excellent resource on how to implement this infrastructure.
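To give a rough idea of what the Lambda side of that setup could look like, here is a minimal sketch that uses only Node.js built-in modules. The `ping` helper, the three-second default timeout, and the reuse of the jsonplaceholder URL are assumptions made for illustration. The EventBridge rule that invokes the handler is not shown.

```typescript
import * as http from "http";
import * as https from "https";

// Hypothetical helper: resolves with the response status code, or rejects on
// network errors and timeouts.
export function ping(url: string, timeoutMs = 3000): Promise<number> {
  return new Promise((resolve, reject) => {
    const onResponse = (res: http.IncomingMessage) => {
      res.resume(); // Drain the body so that the "end" event fires.
      res.on("end", () => resolve(res.statusCode ?? 0));
    };
    const options = { timeout: timeoutMs };
    const req = url.startsWith("https:")
      ? https.get(url, options, onResponse)
      : http.get(url, options, onResponse);

    // The "timeout" event only notifies; the request must be destroyed by hand.
    req.on("timeout", () => {
      req.destroy(new Error(`Timed out after ${timeoutMs} ms`));
    });
    req.on("error", reject);
  });
}

// Lambda entry point, assumed to be invoked on a schedule by EventBridge.
export const handler = async (): Promise<void> => {
  const status = await ping("https://jsonplaceholder.typicode.com/todos/1");
  if (status < 200 || status > 299) {
    // Throwing marks the invocation as failed.
    throw new Error(`Unexpected status: ${status}`);
  }
};
```

A failed invocation shows up in the function's Errors metric in CloudWatch, so an alarm on that metric can take over the role of the canary status. Retries could be layered on top with p-retry, without the "global" variable workarounds.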

Summary

I hope that you find this exploration of the Synthetics package helpful.

While there might be better options for a platform to deploy API canary tests on, the Synthetics package can also monitor web applications. I do not have that much experience in that area, but it is an avenue worth exploring if that is your use case.

As always, if you have noticed some facts that are incorrect or misleading, please let me know!

You can find me on Twitter - @wm_matuszewski

Thank you for your time.
