Test Observability for AWS Lambda with Grafana Tempo and OpenTelemetry Layers

Oscar Reyes - Jun 26 - Dev Community

I got great feedback from my Pulitzer award-winning blog post, "Testing AWS Lambda & Serverless with OpenTelemetry". The community wanted a guide on using the official OpenTelemetry Lambda layers instead of a custom TypeScript wrapper. 😄

I decided to write this follow-up but to spice it up a little 🥵. Today I'm using Grafana Cloud, which has become one of my favorite tools! We use it extensively at Tracetest for our internal tracing, metrics, profiling, and overall observability.

See the full code for the example app you'll build in the GitHub repo, here.

OpenTelemetry Lambda Layers

With a decade of development experience, one thing I've learned is that no-code solutions help save time and delegate maintenance and implementation to a third party. It becomes even better when it's free 🤑 and from the OpenTelemetry community!

There are two different layers we will use today:

  1. The Node.js auto-instrumentation for AWS Lambda enables tracing for your functions without writing a single line of code, as described in the official OpenTelemetry docs, here and on GitHub, here.
  2. The OpenTelemetry collector AWS Lambda layer enables the setup to be 100% serverless without any need to maintain infrastructure yourself. You still need to pay for it though 👀.

Grafana Cloud

Grafana Cloud has become a staple tool to store everything related to observability under one umbrella. It allows integration with different tools like Prometheus for metrics or Loki for logs.

In this case, I'll use Tempo, a well-known tracing backend, to store the OpenTelemetry spans generated by the Lambda functions.

Trace-based testing everywhere and for everyone!

Trace-based testing involves running validations against the telemetry data generated by the distributed system's instrumented services.

Tracetest, as an observability-enabled testing tool for Cloud Native architectures, leverages these distributed traces as part of testing, providing better visibility and testability to run trace-based tests.


The Service under Test

Who said Pokemon? We truly love them at Tracetest, so today we have a new way of playing with the PokeAPI!

Using the Serverless Framework, I'll guide you through implementing a Lambda function that sends a request to the PokeAPI to fetch Pokemon data by ID and then stores it in a DynamoDB table.

[Diagram: Serverless X Tracetest architecture]

Nothing fancy, but this will be enough to demonstrate how powerful instrumenting your Serverless functions and adding trace-based testing on top can be! 💥
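The actual handler lives in the example repo. To make the flow concrete, here is a minimal sketch of what an importPokemon handler could look like, assuming an API Gateway HTTP API event whose JSON body carries the Pokemon id. The PokemonDeps interface and the getPokemon and savePokemon names are hypothetical: injecting them keeps the sketch testable without AWS credentials, while the deployed function would call the PokeAPI over HTTP and write to DynamoDB through the AWS SDK.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the repo's code.
interface HttpApiEvent {
  body?: string;
}

interface HttpApiResult {
  statusCode: number;
  body: string;
}

interface Pokemon {
  id: number;
  name: string;
}

// Injected dependencies (hypothetical): in the deployed function these would
// call https://pokeapi.co and the DynamoDB client from the AWS SDK.
interface PokemonDeps {
  getPokemon(id: string): Promise<Pokemon>;
  savePokemon(pokemon: Pokemon): Promise<void>;
}

const makeImportPokemon =
  (deps: PokemonDeps) =>
  async (event: HttpApiEvent): Promise<HttpApiResult> => {
    // Parse the { "id": "<POKEMON_ID>" } payload sent by the test trigger.
    let id: string | undefined;
    try {
      id = JSON.parse(event.body ?? '{}').id;
    } catch {
      id = undefined;
    }
    if (!id) {
      return { statusCode: 400, body: JSON.stringify({ error: 'id is required' }) };
    }

    // Step 1: fetch the Pokemon data from the PokeAPI.
    const pokemon = await deps.getPokemon(String(id));
    // Step 2: store it in DynamoDB.
    await deps.savePokemon(pokemon);

    return { statusCode: 200, body: JSON.stringify(pokemon) };
  };
```

Because the OpenTelemetry layer auto-instruments outgoing HTTP calls and the AWS SDK, a handler shaped like this should produce HTTP and database spans without containing any tracing code itself.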

Requirements

Tracetest Account

  • Sign up at app.tracetest.io or follow the get started docs.
  • Create an environment.
  • Select Application is publicly accessible to get access to the environment's Tracetest Cloud Agent endpoint.
  • Select Tempo as the tracing backend.
  • Fill in the details of your Grafana Cloud Tempo instance by using the HTTP integration. Check out the tracing backend resource definition, here.
  • Test the connection and save it to finish the process.

AWS

  • Have access to an AWS Account.
  • Install and configure the AWS CLI.
  • Use a role that is allowed to provision the required resources.

What are the steps to run it myself?

If you want to jump straight ahead and run this example yourself, here's how ⭐️.

First, clone the Tracetest repo.

git clone https://github.com/kubeshop/tracetest.git
cd tracetest/examples/quick-start-serverless-layers

Then, follow the instructions to run the deployment and the trace-based tests:

  1. Copy the .env.template file to .env.
  2. Fill in the TRACETEST_API_TOKEN value with the one generated for your Tracetest environment.
  3. Set the Tracetest tracing backend to Tempo. Fill in the details of your Grafana Cloud Tempo instance using the HTTP integration, including an authorization header that looks like authorization: Basic <base 64 encoded>. The value is the base64 encoding of username:token. Follow this guide to learn how. Then, check out this tracing backend resource definition, which you can apply with the Tracetest CLI: tracetest apply datastore -f ./tracetest-tracing-backend.yaml.
  4. Fill in the authorization header in the collector.yaml file with the credentials from your Grafana Tempo setup, using the same base64-encoded username:token value. Follow this guide to learn how.
  5. Run npm i.
  6. Run the Serverless Framework deployment with npm run deploy. Use the API Gateway endpoint from the output in your test below.
  7. Run the trace-based tests with npm test https://<api-gateway-id>.execute-api.us-east-1.amazonaws.com.
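Steps 3 and 4 both need the same Basic credential. Here is a minimal sketch of building it in TypeScript with Node's Buffer, assuming you have the username and access token for your Grafana Cloud Tempo instance; basicAuthHeader is a hypothetical helper name, not part of the example repo.

```typescript
// Builds the authorization header value used in collector.yaml and in the
// Tracetest tracing backend: "Basic " + base64("username:token").
function basicAuthHeader(username: string, token: string): string {
  const encoded = Buffer.from(`${username}:${token}`, 'utf8').toString('base64');
  return `Basic ${encoded}`;
}
```

Paste the returned string after authorization: in both files, and keep the token itself out of version control.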

Now, let's dive into the nitty-gritty details. 🤓

The Observability Setup

Instrumenting a Lambda function is easier than ever: add the ARNs of the OpenTelemetry Collector and Node.js tracer layers that match your AWS region.

# serverless.yaml
functions:
  api:
    # Handler and events definition
    handler: src/handler.importPokemon
    events:
      - httpApi:
          path: /import
          method: post

    # ARN of the layers
    layers:
      - arn:aws:lambda:us-east-1:184161586896:layer:opentelemetry-nodejs-0_6_0:1
      - arn:aws:lambda:us-east-1:184161586896:layer:opentelemetry-collector-amd64-0_6_0:1
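A Lambda function can only use layers published in its own region, so the us-east-1 ARNs above need adjusting elsewhere. This small sketch builds both ARNs for another region, assuming the layers are published under the same account ID (184161586896), the same layer names, and the same version (:1) in that region; otelLayerArns is a hypothetical helper, and you should confirm the ARNs against the OpenTelemetry Lambda releases before deploying.

```typescript
// Assumption: identical account, layer names, and version in every region.
const OTEL_LAYER_ACCOUNT = '184161586896';
const OTEL_LAYERS = ['opentelemetry-nodejs-0_6_0', 'opentelemetry-collector-amd64-0_6_0'];

function otelLayerArns(region: string, version = 1): string[] {
  // Region names look like "us-east-1" or "ap-southeast-2".
  if (!/^[a-z]{2}(-[a-z]+)+-\d$/.test(region)) {
    throw new Error(`Unexpected AWS region: ${region}`);
  }
  return OTEL_LAYERS.map(
    (name) => `arn:aws:lambda:${region}:${OTEL_LAYER_ACCOUNT}:layer:${name}:${version}`
  );
}
```

The returned list can go straight into the layers section of serverless.yaml.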

Next, add two environment variables: AWS_LAMBDA_EXEC_WRAPPER, which starts the handler through the OpenTelemetry Node.js wrapper, and OPENTELEMETRY_COLLECTOR_CONFIG_FILE, which points the OpenTelemetry collector to its configuration file.

# serverless.yaml
provider:
  environment:
    OPENTELEMETRY_COLLECTOR_CONFIG_FILE: /var/task/collector.yaml
    AWS_LAMBDA_EXEC_WRAPPER: /opt/otel-handler

The opentelemetry-nodejs layer spins up the Node.js tracer, configures the supported auto-instrumentation libraries, and sets up the context propagators.
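Lambda's wrapper-script mechanism is what makes this work without code changes: when AWS_LAMBDA_EXEC_WRAPPER is set, the runtime launches the given script and passes it the original runtime command as arguments. This sketch models in TypeScript what a Node.js wrapper of this kind typically does, assuming it preloads a tracer bootstrap module through NODE_OPTIONS --require and then runs the original command unchanged. The wrapCommand helper and the /opt/wrapper.js path are illustrative assumptions, not the layer's actual source.

```typescript
interface LaunchSpec {
  command: string;
  args: string[];
  env: Record<string, string | undefined>;
}

// Hypothetical model of an exec wrapper: keep the runtime's original command,
// but make Node preload a bootstrap module that registers the tracer.
function wrapCommand(
  originalArgv: string[],
  env: Record<string, string | undefined>,
  bootstrap = '/opt/wrapper.js' // assumed path, for illustration only
): LaunchSpec {
  if (originalArgv.length === 0) {
    throw new Error('Expected the original runtime command as arguments');
  }
  const [command, ...args] = originalArgv;
  // --require runs before the handler module loads, so instrumentation can
  // patch http and the AWS SDK before your code imports them.
  const nodeOptions = [`--require ${bootstrap}`, env.NODE_OPTIONS].filter(Boolean).join(' ');
  return { command, args, env: { ...env, NODE_OPTIONS: nodeOptions } };
}
```

Loading the tracer this early is the design choice that lets auto-instrumentation work without touching the handler's imports.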

Meanwhile, the opentelemetry-collector layer runs an instance of the collector as a Lambda extension in the same execution environment as the function, configured by the collector.yaml file.

# collector.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"

exporters:
  otlp:
    endpoint: tempo-us-central1.grafana.net:443
    headers:
      authorization: Basic <your base64 encoded token>

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
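In the collector configuration, pipelines refer to receivers and exporters by name, and the collector refuses to start if a pipeline points at a component that is not defined. A small sketch that catches such typos before deploying, assuming collector.yaml has already been parsed into a plain object with a YAML parser of your choice; the CollectorConfig type and findMissingComponents helper are hypothetical.

```typescript
// Minimal shape of the parts of collector.yaml this check reads.
interface CollectorConfig {
  receivers: Record<string, unknown>;
  exporters: Record<string, unknown>;
  service: {
    pipelines: Record<string, { receivers: string[]; exporters: string[] }>;
  };
}

// Returns "pipeline: kind/name" for every reference without a matching definition.
function findMissingComponents(config: CollectorConfig): string[] {
  const missing: string[] = [];
  for (const pipeline of Object.keys(config.service.pipelines)) {
    const { receivers, exporters } = config.service.pipelines[pipeline];
    for (const r of receivers) {
      if (!(r in config.receivers)) missing.push(`${pipeline}: receivers/${r}`);
    }
    for (const e of exporters) {
      if (!(e in config.exporters)) missing.push(`${pipeline}: exporters/${e}`);
    }
  }
  return missing;
}

// The collector.yaml above, as a parsed object.
const collectorConfig: CollectorConfig = {
  receivers: { otlp: { protocols: { grpc: {}, http: {} } } },
  exporters: { otlp: { endpoint: 'tempo-us-central1.grafana.net:443' } },
  service: { pipelines: { traces: { receivers: ['otlp'], exporters: ['otlp'] } } },
};
```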

Easy peasy lemon squeezy 🍋, right? Well, this is everything you need to start your observability journey!

For every trace, there should be a test!

With the observability setup in place, it's time to go to the next level and leverage it by running some trace-based tests. This is our test case:

  • Execute an HTTP request against the import Pokemon service.
  • This is a two-step process that includes a request to the PokeAPI to grab the Pokemon data.
  • Then, it executes the required database operations to store the Pokemon data in DynamoDB.

What are the key parts we want to validate?

  1. Validate that the external service (the PokeAPI) is called from the Lambda function with the proper POKEMON_ID and returns 200.
  2. Validate that the duration of the DB operations is less than 100ms.
  3. Validate that the response from the initial API Gateway request is 200.
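Tracetest evaluates these checks against the trace it fetches from Tempo. To make the three validations concrete, here is a simplified local sketch over an in-memory list of spans. The SpanLike shape, the attribute names (mirroring those used in the test specs), and the checkImportTrace helper are assumptions for illustration; this is not how Tracetest's selector engine works.

```typescript
// Simplified span: Tracetest-computed values such as tracetest.span.type and
// tracetest.span.duration are modeled here as plain fields.
interface SpanLike {
  name: string;
  type: 'http' | 'database' | 'general';
  durationMs: number;
  attributes: Record<string, string | number>;
}

interface CheckResult {
  name: string;
  passed: boolean;
}

function checkImportTrace(spans: SpanLike[], pokemonId: string): CheckResult[] {
  const http = spans.filter((s) => s.type === 'http');
  const db = spans.filter((s) => s.type === 'database');
  const pokeApiCalls = http.filter((s) => s.attributes['http.method'] === 'GET');
  const expectedUrl = `https://pokeapi.co/api/v2/pokemon/${pokemonId}`;

  return [
    {
      // 1. The PokeAPI is called with the right id and returns 200.
      name: 'PokeAPI request matches the Pokemon id',
      passed:
        pokeApiCalls.length > 0 &&
        pokeApiCalls.every(
          (s) => s.attributes['http.url'] === expectedUrl && s.attributes['http.status_code'] === 200
        ),
    },
    {
      // 2. Every database span takes less than 100 ms (vacuously true with no DB spans here).
      name: 'All database spans take less than 100ms',
      passed: db.every((s) => s.durationMs < 100),
    },
    {
      // 3. Every HTTP span, including the API Gateway request, returns 200.
      name: 'All HTTP spans return 200',
      passed: http.every((s) => s.attributes['http.status_code'] === 200),
    },
  ];
}
```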

Running the Trace-Based Tests

To run the tests, we are using the @tracetest/client NPM package. It allows teams to enhance existing validation pipelines written in JavaScript or TypeScript by including trace-based tests in their toolset.

The code can be found in the tracetest.ts file.

import Tracetest from '@tracetest/client';
import { TestResource } from '@tracetest/client/dist/modules/openapi-client';
import { config } from 'dotenv';

config();

const { TRACETEST_API_TOKEN = '' } = process.env;

const [raw = ''] = process.argv.slice(2);

let url = '';

try {
  url = new URL(raw).origin;
} catch (error) {
  console.error(
    'The API Gateway URL is required as an argument. i.e: `npm test https://75yj353nn7.execute-api.us-east-1.amazonaws.com`'
  );
  process.exit(1);
}

const definition: TestResource = {
  type: 'Test',
  spec: {
    id: 'ZV1G3v2IR',
    name: 'Serverless: Import Pokemon',
    trigger: {
      type: 'http',
      httpRequest: {
        method: 'POST',
        url: '${var:ENDPOINT}/import',
        body: '{"id": "${var:POKEMON_ID}"}\n',
        headers: [
          {
            key: 'Content-Type',
            value: 'application/json',
          },
        ],
      },
    },
    specs: [
      {
        selector: 'span[tracetest.span.type="database"]',
        name: 'All Database Spans: Processing time is less than 100ms',
        assertions: ['attr:tracetest.span.duration < 100ms'],
      },
      {
        selector: 'span[tracetest.span.type="http"]',
        name: 'All HTTP Spans: Status code is 200',
        assertions: ['attr:http.status_code = 200'],
      },
      {
        selector:
          'span[name="tracetest-serverless-dev-api"] span[tracetest.span.type="http" name="GET" http.method="GET"]',
        name: 'The request matches the pokemon Id',
        assertions: ['attr:http.url  =  "https://pokeapi.co/api/v2/pokemon/${var:POKEMON_ID}"'],
      },
    ],
  },
};

const main = async () => {
  const tracetest = await Tracetest(TRACETEST_API_TOKEN);

  const test = await tracetest.newTest(definition);
  await tracetest.runTest(test, {
    variables: [
      {
        key: 'ENDPOINT',
        value: url.trim(),
      },
      {
        key: 'POKEMON_ID',
        value: `${Math.floor(Math.random() * 100) + 1}`,
      },
    ],
  });
  console.log(await tracetest.getSummary());
};

main().catch((error) => {
  console.error(error);
  process.exit(1);
});
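The test definition references ${var:ENDPOINT} and ${var:POKEMON_ID}, which Tracetest resolves from the variables passed to runTest. This rough sketch of that substitution can help you preview the request the trigger will send; resolveVars and randomPokemonId are hypothetical helpers, and Tracetest's own resolution may treat edge cases such as missing variables differently.

```typescript
interface Variable {
  key: string;
  value: string;
}

// Replaces every ${var:NAME} placeholder with its value; unknown names are left as-is.
function resolveVars(template: string, variables: Variable[]): string {
  const values: Record<string, string> = {};
  for (const v of variables) values[v.key] = v.value;
  return template.replace(/\$\{var:([A-Za-z0-9_]+)\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match
  );
}

// Same range as tracetest.ts: an integer id from 1 to 100.
function randomPokemonId(): string {
  return `${Math.floor(Math.random() * 100) + 1}`;
}
```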

Get True Test Observability

Make sure to apply the Tempo tracing backend in Tracetest. Create your Basic auth token and use the tracetest-tracing-backend.yaml resource file for reference. View it on GitHub, here.

type: DataStore
spec:
  id: tempo-cloud
  name: Tempo
  type: tempo
  tempo:
    type: http
    http:
      url: https://tempo-us-central1.grafana.net/tempo
      headers:
        authorization: Basic <base 64 encoded>
      tls: {}

Apply the resource with the Tracetest CLI.

tracetest configure -t <TRACETEST_API_TOKEN>
tracetest apply datastore -f ./tracetest-tracing-backend.yaml

Or, add it manually in the Tracetest Web UI.

[Image: Tracetest infrastructure graph]

With everything set up, run the trace-based tests against the deployed Lambda function with the command below to view the complete results.

npm test https://<api-gateway-id>.execute-api.us-east-1.amazonaws.com

Follow the links provided in the npm test command output to find the full results, which include the generated trace and the test specs validation results.

[Output]

> tracetest-serverless@1.0.0 test
> ENDPOINT="$(sls info --verbose | grep HttpApiUrl | sed s/HttpApiUrl\:\ //g)" ts-node tracetest.ts https://<api-gateway-id>.execute-api.us-east-1.amazonaws.com/import

Run Group: #618f9cda-a87e-4e35-a9f4-10cfbc6f570f (https://app.tracetest.io/organizations/ttorg_ced62e34638d965e/environments/ttenv_a613d93805243f83/run/618f9cda-a87e-4e35-a9f4-10cfbc6f570f)
Failed: 0
Succeed: 1
Pending: 0

Runs:

✔ Serverless: Import Pokemon (https://app.tracetest.io/organizations/ttorg_ced62e34638d965e/environments/ttenv_a613d93805243f83/test/ZV1G3v2IR/run/22) - trace id: d111b18ca75fb6dbf170b66d963363f9

Find the trace in Grafana Cloud Tempo

The full list of spans generated by the AWS Lambda function can be found in your Tempo instance. These are the same spans the Tracetest App displays after fetching them from Tempo.
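If you prefer to fetch a trace programmatically, Tempo's HTTP API exposes a query-by-ID endpoint, GET /api/traces/<traceID>. This minimal sketch builds that request for the Grafana Cloud URL used in the tracing backend resource earlier, assuming the same Basic credentials and that the /tempo prefix from that URL applies to API calls; buildTempoTraceRequest is a hypothetical helper, and sending the request is left to fetch or your HTTP client of choice.

```typescript
interface TraceRequest {
  url: string;
  headers: Record<string, string>;
}

// OpenTelemetry trace IDs are 16 bytes, rendered as 32 hex characters.
const TRACE_ID_PATTERN = /^[0-9a-f]{32}$/i;

function buildTempoTraceRequest(baseUrl: string, traceId: string, username: string, token: string): TraceRequest {
  if (!TRACE_ID_PATTERN.test(traceId)) {
    throw new Error(`Not a 32-character hex trace id: ${traceId}`);
  }
  const base = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  const credentials = Buffer.from(`${username}:${token}`, 'utf8').toString('base64');
  return {
    url: `${base}/api/traces/${traceId.toLowerCase()}`,
    headers: { authorization: `Basic ${credentials}`, accept: 'application/json' },
  };
}
```

For example, the trace id printed in the output above can be passed as traceId together with https://tempo-us-central1.grafana.net/tempo as baseUrl.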

[Image: Tempo tracing backend integration in Tracetest]

👉 Join the demo organization where you can start playing around with the Serverless example with no setup! 👈

From the Tracetest test run view, you can view the list of spans generated by the Lambda function, their attributes, and the test spec results, which validate the key points.

[Image: the trace in Grafana Cloud Tempo]

Key Takeaways

Simplified Observability with OpenTelemetry Lambda Layers

In this post I've highlighted how using OpenTelemetry Lambda layers allows for automatic tracing without additional code, making it easier than ever to set up observability for your Serverless applications.

Powerful Integration with Grafana Cloud

Grafana Cloud has become an essential tool in our observability toolkit. By leveraging Grafana Tempo for tracing, we can store and analyze OpenTelemetry spans effectively, showcasing the seamless integration and its benefits.

Enhanced Trace-Based Testing with Tracetest

Tracetest is a game-changer for trace-based testing. By validating telemetry data from our instrumented services, it provides unparalleled visibility and testability, empowering us to ensure our distributed systems perform as expected.

Would you like to learn more about Tracetest and what it brings to the table? Check the docs and try it out today by signing up for free!

Also, please feel free to join our Slack community, give Tracetest a star on GitHub, or schedule a time to chat 1:1.
