Trace-based Testing AWS Lambda with Tracetest, ECS Fargate, and Terraform

Oscar Reyes - Jul 6 '23 - - Dev Community

Since the initial release of AWS Lambda back in 2014, developers have seen the great potential of having a FaaS (Function as a Service) distributed system. It avoided the pain of maintaining EC2 instances, load balancers, target groups and more. Soon enough, we saw the birth of frameworks like Serverless or SAM that make managing these serverless functions easier.

We started building REST APIs, creating workers and cron jobs using AWS CloudWatch and listening to different AWS services events like S3. However, we did not realize that at some point it would get complex to understand how all of these services interact with each other. It would get even harder to ensure full visibility of every step that is taken to complete several tasks that move through the distributed system.

Today, standard protocols like OpenTelemetry exist to help teams implement observability as part of the development process and provide a clear understanding of what their serverless application is doing.

After reading this blog post, you will learn how to instrument a basic Node.js Lambda function using OpenTelemetry to provide better insights into what the code is doing, store the tracing data in Jaeger where you can visualize and query the information and run trace-based testing with Tracetest to include automated validations based on the generated telemetry data.

You can also find a recipe for this use case in the Tracetest official docs, here.

Let’s begin!

Basic Concepts

First, let’s explain a few basic concepts to give you context and clarity about what we are going to do once we begin the setup.

What is AWS Lambda?

AWS Lambda is a computing service that lets you run code without provisioning or managing servers. Lambda runs your code on a high-availability compute infrastructure and performs all of the administration of the compute resources, including server and operating system maintenance, capacity provisioning and automatic scaling and logging. With Lambda, you can run code for virtually any type of application or back-end service. All you need to do is supply your code in one of the languages that Lambda supports.

Currently, there are several frameworks to manage Lambda functions by enabling a streamlined way for you to create APIs, workers, cron jobs, etc. So your team can focus on the product requirements and business logic. Some of them are:

What is AWS ECS Fargate?

AWS Fargate is a serverless, pay-as-you-go compute engine that lets you focus on building applications without managing servers. AWS Fargate is compatible with both Amazon Elastic Container Service (ECS) and Amazon Elastic Kubernetes Service (EKS).

We are going to be using ECS Fargate as a streamlined way of deploying Tracetest to the AWS cloud.

What is Terraform?

HashiCorp Terraform is an infrastructure as code (IaC) tool that lets you define both cloud and on-prem resources in human-readable configuration files that you can version, reuse and share. You can then use a consistent workflow to provision and manage all of your infrastructure throughout its lifecycle. Terraform can manage low-level components like computing, storage and networking resources, as well as high-level components like DNS entries and SaaS features.

Terraform will be the base framework we’ll use to manage all AWS services, as well as package and deploy the Lambda function

What is Tracetest?

Tracetest is an open-source project, part of the CNCF landscape. It allows you to quickly build integration and end-to-end tests, powered by your OpenTelemetry traces.

Tracetest uses your existing OpenTelemetry traces to power trace-based testing with assertions against your trace data at every point of the request transaction. You only need to point Tracetest to your existing trace data source, or send traces to Tracetest directly!

Tracetest makes it possible to:

  • Define tests and assertions against every single microservice that a transaction goes through.
  • Work with your existing distributed tracing solution, allowing you to build tests based on your already instrumented system.
  • Define multiple transaction triggers, such as a GET against an API endpoint, a GRPC request, etc.
  • Define assertions against both the response and trace data, ensuring both your response and the underlying processes worked correctly, quickly and without errors.
  • Save and run the tests manually or via CI build jobs with the Tracetest CLI.

Setting up the AWS Infrastructure

We are going to use Terraform to provision all of the required AWS infrastructure. It includes:

  1. *******************Networking.******************* VPCs, Security Groups, Internal, and Public Load Balancers.
  2. The REST API. Lambda, API Gateway, S3 bucket.
  3. ***************Telemetry Side Cart.*************** ECS Cluster, Fargate Task definitions, Jaeger, Tracetest.

Cloud Infrastructure Diagram

https://res.cloudinary.com/djwdcmwdz/image/upload/v1681227224/Blogposts/tracetest_aws_terraform/terraform-serverless-diagram_uoso3s.png

What do I need to run the Use Case?

Here are the requirements to run the use case:

  1. Terraform
  2. AWS Access
  3. Tracetest CLI

Initial Terraform Definitions

To encourage a better understanding of the Terraform code definitions, we’ll be using external and local variables defined in the [variables.tf](http://variables.tf) file, which also, includes the initial setup for the required providers such as AWS.

// variables.tf

// defining the dependencies and packages
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "4.55.0"
    }
    tls = {
      source  = "hashicorp/tls"
      version = "4.0.4"
    }
  }
}

provider "aws" {
  region = local.region
}

data "aws_caller_identity" "current" {}
data "aws_availability_zones" "available" {}

// external variables 
variable "aws_region" {
  type    = string
  default = "us-west-2"
}

variable "tracetest_version" {
  type    = string
  default = "latest"
}

variable "environment" {
  type    = string
  default = "dev"
}

// prefixes, naming convention and static variables
locals {
  name            = "tracetest"
  region          = var.aws_region
  tracetest_image = "kubeshop/tracetest:${var.tracetest_version}"
  environment     = var.environment

  db_name     = "postgres"
  db_username = "postgres"

  vpc_cidr = "192.168.0.0/16"
  azs      = slice(data.aws_availability_zones.available.names, 0, 3)


  // used to provision the Tracetest server
  provisioning = <<EOF
---
type: PollingProfile
spec:
  name: default
  strategy: periodic
  default: true
  periodic:
    retryDelay: 5s
    timeout: 10m

---
type: DataStore
spec:
  name: jaeger
  type: jaeger
  jaeger:
    endpoint: ${aws_lb.internal_tracetest_alb.dns_name}:16685
    tls:
      insecure_skip_verify: true
  EOF

  tags = {
    Name    = local.name
    Example = local.name
  }
}
Enter fullscreen mode Exit fullscreen mode

In the following Terraform definition files, you’ll find references to the local variables defined in this file.

Network Definitions

First, we need to set up the base definitions that will allow communication between the different services we will be generating. Part of this section is to provide public access to certain services like the Serverless API, Jaeger UI and Tracetest as well as ensure VPC protection for others like the RDS Postgres instance.

// network.tf
// creating the main VPC resources, including private and public subnets
module "network" {
  source                                      = "cn-terraform/networking/aws"
  name_prefix                                 = local.name
  vpc_cidr_block                              = local.vpc_cidr
  availability_zones                          = ["${local.region}a", "${local.region}b", "${local.region}c", "${local.region}d"]
  public_subnets_cidrs_per_availability_zone  = ["192.168.0.0/19", "192.168.32.0/19", "192.168.64.0/19", "192.168.96.0/19"]
  private_subnets_cidrs_per_availability_zone = ["192.168.128.0/19", "192.168.160.0/19", "192.168.192.0/19", "192.168.224.0/19"]
}

// configuration pieces for the public ALB
module "tracetest_alb_security_group" {
  source  = "terraform-aws-modules/security-group/aws"
  version = "~> 4.0"

  name        = local.name
  description = "Load balancer security group"
  vpc_id      = module.network.vpc_id

  ingress_with_cidr_blocks = [
    {
      from_port   = 11633
      to_port     = 11633
      protocol    = "tcp"
      description = "HTTP access for Tracetest"
      cidr_blocks = "0.0.0.0/0"
      }, {
      from_port   = 16686
      to_port     = 16686
      protocol    = "tcp"
      description = "HTTP access for Jaeger UI"
      cidr_blocks = "0.0.0.0/0"
      }, {
      from_port   = 16685
      to_port     = 16685
      protocol    = "tcp"
      description = "HTTP access for Jaeger API"
      cidr_blocks = "0.0.0.0/0"
  }]

  egress_with_cidr_blocks = [
    {
      from_port   = 0
      to_port     = 65535
      protocol    = "-1"
      description = "HTTP access to anywhere"
      cidr_blocks = "0.0.0.0/0"
  }]
}

resource "aws_lb" "tracetest-alb" {
  name               = "tracetest-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [module.tracetest_alb_security_group.security_group_id]
  subnets            = module.network.public_subnets_ids

  enable_deletion_protection = false
  tags                       = local.tags
}

// creation of the internal ALB
module "internal_tracetest_alb_security_group" {
  source  = "terraform-aws-modules/security-group/aws"
  version = "~> 4.0"

  name        = local.name
  description = "Internal Load balancer security group"
  vpc_id      = module.network.vpc_id

  ingress_with_cidr_blocks = [
    {
      from_port   = 16685
      to_port     = 16685
      protocol    = "tcp"
      description = "HTTP access for Jaeger API"
      cidr_blocks = local.vpc_cidr
      }, {
      from_port   = 4318
      to_port     = 4318
      protocol    = "tcp"
      description = "HTTP access for Jaeger Collector"
      cidr_blocks = local.vpc_cidr
  }]

  egress_with_cidr_blocks = [
    {
      from_port   = 0
      to_port     = 65535
      protocol    = "-1"
      description = "HTTP access to Anywhere"
      cidr_blocks = "0.0.0.0/0"
  }]
}

resource "aws_lb" "internal_tracetest_alb" {
  name               = "tracetest-internal-alb"
  internal           = true
  load_balancer_type = "application"
  security_groups    = [module.internal_tracetest_alb_security_group.security_group_id]
  subnets            = module.network.private_subnets_ids

  enable_deletion_protection = false
  tags                       = local.tags
}

// security groups for services
module "lambda_security_group" {
  source  = "terraform-aws-modules/security-group/aws"
  version = "~> 4.0"

  name        = "${local.name}_lambda_security_group"
  description = "Lambda security group"
  vpc_id      = module.network.vpc_id

  ingress_with_cidr_blocks = [
    {
      from_port   = 0
      to_port     = 65535
      protocol    = "-1"
      description = "HTTP access from anywhere"
      cidr_blocks = "0.0.0.0/0"
  }]

  egress_with_cidr_blocks = [
    {
      from_port   = 0
      to_port     = 65535
      protocol    = "-1"
      description = "HTTP access to anywhere"
      cidr_blocks = "0.0.0.0/0"
  }]
}

module "tracetest_ecs_service_security_group" {
  source  = "terraform-aws-modules/security-group/aws"
  version = "~> 4.0"

  name        = "tracetest_ecs_service_security_group"
  description = "ECS Service security group"
  vpc_id      = module.network.vpc_id

  ingress_with_cidr_blocks = [
    {
      from_port   = 0
      to_port     = 65535
      protocol    = "tcp"
      description = "HTTP access from VPC"
      cidr_blocks = local.vpc_cidr
  }]

  egress_with_cidr_blocks = [
    {
      from_port   = 0
      to_port     = 65535
      protocol    = "-1"
      description = "HTTP access to anywhere"
      cidr_blocks = "0.0.0.0/0"
  }]
}
Enter fullscreen mode Exit fullscreen mode

REST API Definitions

The REST API is composed of two major sections. The first section contains the Node.js handler code for the Lambda functions. The second covers the creation of AWS resources.

Let’s start with the Node.js handler code and the instrumentation code to enable distributed tracing.

You can find the code described in this section in the Tracetest GitHub repo under the examples folder. There you can find the src and api.tf files.

The handler code for this use case is included in the src folder and is composed of the package.json file, the main handler function called hello.js and the tracing.js file which includes the OpenTelemetry instrumentation.

The Node.js Handler Code for the Lambda Function

In this case, we’ll be building a pretty basic Javascript function that returns a “Hello, World!” message on each request.

// src/hello.js
const handler = async (event) => {
  console.log("Event: ", event);
  let responseMessage = "Hello, World!";

  return {
    statusCode: 200,
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      message: responseMessage,
    }),
  };
};

module.exports.handler = handler;
Enter fullscreen mode Exit fullscreen mode

Instrumenting the Handler Function with OpenTelemetry

Next, let’s add a second Javascript file that will include all of the setup and configuration to enable automated instrumentation for distributed tracing with OpenTelemetry.

// src/tracing.js
const { Resource } = require("@opentelemetry/resources");
const api = require("@opentelemetry/api");
const { BatchSpanProcessor } = require("@opentelemetry/sdk-trace-base");
const { OTLPTraceExporter } = require("@opentelemetry/exporter-trace-otlp-http");
const { NodeTracerProvider } = require("@opentelemetry/sdk-trace-node");
const { registerInstrumentations } = require("@opentelemetry/instrumentation");
const { getNodeAutoInstrumentations } = require("@opentelemetry/auto-instrumentations-node");
const { SemanticResourceAttributes } = require("@opentelemetry/semantic-conventions");

const provider = new NodeTracerProvider({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: "tracetest",
  }),
});
const spanProcessor = new BatchSpanProcessor(new OTLPTraceExporter());

provider.addSpanProcessor(spanProcessor);
provider.register();

registerInstrumentations({
  instrumentations: [
    getNodeAutoInstrumentations({
      "@opentelemetry/instrumentation-aws-lambda": {
        disableAwsContextPropagation: true,
        eventContextExtractor: (event) => {
          // Tracetest sends the traceparent header as Traceparent
          // passive aggressive author comment 🥴 The propagators doesn't care about changing the casing before matching
          event.headers.traceparent = event.headers.Traceparent;
          const eventContext = api.propagation.extract(api.context.active(), event.headers);

          return eventContext;
        },
      },
    }),
  ],
});
Enter fullscreen mode Exit fullscreen mode

Installing the Dependencies

Now that we have the base JavaScript code ready, its time to include the required modules to run the app, this is the package.json we are going to be using:

# src/package.json
{
  "name": "tracetest-aws",
  "version": "1.0.0",
  "description": "",
  "main": "hello.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "@opentelemetry/auto-instrumentations-node": "^0.36.3",
    "@opentelemetry/instrumentation": "^0.35.1",
    "@opentelemetry/resources": "^1.9.1",
    "@opentelemetry/sdk-node": "^0.35.1",
    "@opentelemetry/sdk-trace-node": "^1.9.1",
    "@opentelemetry/semantic-conventions": "^1.9.1"
  }
}
Enter fullscreen mode Exit fullscreen mode

To match the Lambda Node.js versions, switch any 12.x version to install the packages

Then, the last step is to run the npm install command to have all of the packages installed.

Provision a Serverless API with Terraform

To finalize the API section, we will define the required services using Terraform.

// api.tf
resource "aws_lambda_function" "hello_world" {
  function_name = "HelloWorld"

  s3_bucket = aws_s3_bucket.lambda_bucket.id
  s3_key    = aws_s3_object.lambda_hello_world.key

  runtime = "nodejs12.x"
  handler = "hello.handler"
  timeout = 30

  // This allows the tracing.js file to be loaded before the handler hello.js function
/ // to ensure the tracing logic is the first piece of code to be executed
  environment {
    variables = {
      NODE_OPTIONS                = "--require tracing.js"
      OTEL_EXPORTER_OTLP_ENDPOINT = "http://${aws_lb.internal_tracetest_alb.dns_name}:4318"
    }
  }

  source_code_hash = data.archive_file.lambda_hello_world.output_base64sha256

  vpc_config {
    subnet_ids         = module.network.public_subnets_ids
    security_group_ids = [module.lambda_security_group.security_group_id]
  }

  role = aws_iam_role.lambda_exec.arn
}

resource "aws_cloudwatch_log_group" "hello_world" {
  name = "/aws/lambda/${aws_lambda_function.hello_world.function_name}"

  retention_in_days = 30
}

resource "aws_iam_role" "lambda_exec" {
  name = "${local.name}_serverless_lambda"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Sid    = ""
      Principal = {
        Service = "lambda.amazonaws.com"
      }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "lambda_policy" {
  role       = aws_iam_role.lambda_exec.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

resource "aws_iam_role_policy_attachment" "lambda_network_policy_attachment" {
  role       = aws_iam_role.lambda_exec.name
  policy_arn = aws_iam_policy.lambda_network_policy.arn
}

resource "aws_iam_policy" "lambda_network_policy" {
  name   = "${local.name}_network_policy"
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeNetworkInterfaces",
        "ec2:CreateNetworkInterface",
        "ec2:DeleteNetworkInterface",
        "ec2:DescribeInstances",
        "ec2:AttachNetworkInterface"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

resource "aws_apigatewayv2_api" "lambda" {
  name          = "${local.name}_lambda_gw"
  protocol_type = "HTTP"
}

resource "aws_apigatewayv2_stage" "lambda" {
  api_id = aws_apigatewayv2_api.lambda.id

  name        = local.environment
  auto_deploy = true

  access_log_settings {
    destination_arn = aws_cloudwatch_log_group.api_gw.arn

    format = jsonencode({
      requestId               = "$context.requestId"
      sourceIp                = "$context.identity.sourceIp"
      requestTime             = "$context.requestTime"
      protocol                = "$context.protocol"
      httpMethod              = "$context.httpMethod"
      resourcePath            = "$context.resourcePath"
      routeKey                = "$context.routeKey"
      status                  = "$context.status"
      responseLength          = "$context.responseLength"
      integrationErrorMessage = "$context.integrationErrorMessage"
      }
    )
  }
}

resource "aws_apigatewayv2_integration" "hello_world" {
  api_id = aws_apigatewayv2_api.lambda.id

  integration_uri    = aws_lambda_function.hello_world.invoke_arn
  integration_type   = "AWS_PROXY"
  integration_method = "POST"
}

resource "aws_apigatewayv2_route" "hello_world" {
  api_id = aws_apigatewayv2_api.lambda.id

  route_key = "GET /hello"
  target    = "integrations/${aws_apigatewayv2_integration.hello_world.id}"
}

resource "aws_cloudwatch_log_group" "api_gw" {
  name = "/aws/api_gw/${aws_apigatewayv2_api.lambda.name}"

  retention_in_days = 30
}

resource "aws_lambda_permission" "api_gw" {
  statement_id  = "AllowExecutionFromAPIGateway"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.hello_world.function_name
  principal     = "apigateway.amazonaws.com"

  source_arn = "${aws_apigatewayv2_api.lambda.execution_arn}/*/*"
}

resource "random_id" "server" {
  byte_length = 8
}

resource "aws_s3_bucket" "lambda_bucket" {
  bucket = "${local.name}-${random_id.server.hex}"
}

resource "aws_s3_bucket_acl" "bucket_acl" {
  bucket = aws_s3_bucket.lambda_bucket.id
  acl    = "private"
}

data "archive_file" "lambda_hello_world" {
  type = "zip"

  source_dir  = "${path.module}/src"
  output_path = "${path.module}/src.zip"
}

resource "aws_s3_object" "lambda_hello_world" {
  bucket = aws_s3_bucket.lambda_bucket.id

  key    = "src.zip"
  source = data.archive_file.lambda_hello_world.output_path

  etag = filemd5(data.archive_file.lambda_hello_world.output_path)
}
Enter fullscreen mode Exit fullscreen mode

This file includes packaging the JavaScript source code, creating the AWS S3 bucket, uploading the package to S3, creating the Lambda functions, and the AWS API Gateway requirements to expose the functionality to the public using REST.

Telemetry Side Cart Definitions

In this section, we’ll focus on creating the infrastructure to handle telemetry data on AWS. We’ll be following a serverless containerized approach using ECS Fargate to run tasks.

The ECS Cluster

The first thing we need to do is create the cluster to include all the tasks and services we need for the side cart.

# cluster.tf
// creating the ecs cluster
resource "aws_ecs_cluster" "tracetest-cluster" {
  name = "tracetest"
  tags = local.tags
}

// generic task execution role for both jaeger and tracetest
resource "aws_iam_role" "tracetest_task_execution_role" {
  name = "tracetest_task_execution_role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Sid    = ""
        Principal = {
          Service = ["ecs-tasks.amazonaws.com", "ecs.amazonaws.com"]
        }
      },
    ]
  })
  managed_policy_arns = ["arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceRole"]

  tags = local.tags
}

resource "aws_iam_role_policy" "tracetest_task_execution_role_policy" {
  name = "tracetest_task_execution_role_policy"
  role = aws_iam_role.tracetest_task_execution_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "logs:PutLogEvents",
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:DescribeLogStreams",
          "logs:DescribeLogGroups",
        ]
        Effect   = "Allow"
        Resource = "*"
      },
    ]
  })
}
Enter fullscreen mode Exit fullscreen mode

Creating the Jaeger Service and Task

Once the cluster is defined, we can focus on adding all services and tasks we need for our telemetry data. In this case, we’ll use the [jaeger-all-in-one Docker image](https://www.jaegertracing.io/docs/1.43/getting-started/#all-in-one) as it provides everything we need for a basic demo.

// jaeger.tf

// creating the task definition
resource "aws_ecs_task_definition" "jaeger" {
  family                   = "${local.name}_jaeger"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 1024
  memory                   = 2048
  execution_role_arn       = aws_iam_role.tracetest_task_execution_role.arn
  container_definitions = jsonencode([
    {
      "name" : "jaeger",
      # using the all in one docker image
      "image" : "jaegertracing/all-in-one:1.42",
      "cpu" : 1024,
      "memory" : 2048,
      "essential" : true,
      // enabling the OTLP listeners
      "environment" : [{
        "name" : "COLLECTOR_OTLP_ENABLED",
        "value" : "true"
      }],
      // exposing the required ports
      "portMappings" : [
        {
          "hostPort" : 14269,
          "protocol" : "tcp",
          "containerPort" : 14269
        },
        {
          "hostPort" : 14268,
          "protocol" : "tcp",
          "containerPort" : 14268
        },
        {
          "hostPort" : 6832,
          "protocol" : "udp",
          "containerPort" : 6832
        },
        {
          "hostPort" : 6831,
          "protocol" : "udp",
          "containerPort" : 6831
        },
        {
          "hostPort" : 5775,
          "protocol" : "udp",
          "containerPort" : 5775
        },
        {
          "hostPort" : 14250,
          "protocol" : "tcp",
          "containerPort" : 14250
        },
        {
          "hostPort" : 16685,
          "protocol" : "tcp",
          "containerPort" : 16685
        },
        {
          "hostPort" : 5778,
          "protocol" : "tcp",
          "containerPort" : 5778
        },
        {
          "hostPort" : 16686,
          "protocol" : "tcp",
          "containerPort" : 16686
        },
        {
          "hostPort" : 9411,
          "protocol" : "tcp",
          "containerPort" : 9411
        },
        {
          "hostPort" : 4318,
          "protocol" : "tcp",
          "containerPort" : 4318
        },
        {
          "hostPort" : 4317,
          "protocol" : "tcp",
          "containerPort" : 4317
        }
      ],
      "logConfiguration" : {
        "logDriver" : "awslogs",
        "options" : {
          "awslogs-create-group" : "true",
          "awslogs-group" : "/ecs/jaeger",
          "awslogs-region" : "us-west-2",
          "awslogs-stream-prefix" : "ecs"
        }
      },
    }
  ])
}

// creating the jaeger service
resource "aws_ecs_service" "jaeger_service" {
  name            = "jaeger-service"
  cluster         = aws_ecs_cluster.tracetest-cluster.id
  task_definition = aws_ecs_task_definition.jaeger.arn
  desired_count   = 1
  launch_type     = "FARGATE"

// defining three entry points to the Jaeger task using albs
  load_balancer {
    target_group_arn = aws_lb_target_group.tracetest-jaeger-tg.arn
    container_name   = "jaeger"
    container_port   = 16686
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.tracetest-jaeger-api-tg.arn
    container_name   = "jaeger"
    container_port   = 16685
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.tracetest-jaeger-collector-tg.arn
    container_name   = "jaeger"
    container_port   = 4318
  }

  network_configuration {
    subnets          = module.network.private_subnets_ids
    security_groups  = [module.tracetest_ecs_service_security_group.security_group_id]
    assign_public_ip = false
  }
}

resource "aws_lb_target_group" "tracetest-jaeger-api-tg" {
  name             = "tracetest-jaeger-api-tg"
  port             = 16685
  protocol         = "HTTP"
  protocol_version = "GRPC"
  vpc_id           = module.network.vpc_id
  target_type      = "ip"
}

resource "aws_lb_target_group" "tracetest-jaeger-collector-tg" {
  name        = "tracetest-jaeger-collector-tg"
  port        = 4318
  protocol    = "HTTP"
  vpc_id      = module.network.vpc_id
  target_type = "ip"

  health_check {
    path              = "/"
    port              = "4318"
    protocol          = "HTTP"
    healthy_threshold = 2
    matcher           = "200-499"
  }
}

resource "aws_lb_target_group" "tracetest-jaeger-tg" {
  name        = "tracetest-jaeger-tg"
  port        = 16686
  protocol    = "HTTP"
  vpc_id      = module.network.vpc_id
  target_type = "ip"
}

// alb listeners for the public and internal ALBS, depending on port
resource "aws_lb_listener" "tracetest-jaeger-alb-listener" {
  load_balancer_arn = aws_lb.tracetest-alb.arn
  port              = "16686"
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.tracetest-jaeger-tg.arn
  }
}

resource "aws_lb_listener" "tracetest-jaeger-collector-alb-listener" {
  load_balancer_arn = aws_lb.internal_tracetest_alb.arn
  port              = "4318"
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.tracetest-jaeger-collector-tg.arn
  }
}

resource "aws_lb_listener" "tracetest-jaeger-api-alb-listener" {
  load_balancer_arn = aws_lb.internal_tracetest_alb.arn
  port              = "16685"
  protocol          = "HTTPS"
  certificate_arn   = aws_acm_certificate.cert.arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.tracetest-jaeger-api-tg.arn
  }
}

resource "tls_private_key" "tracetest_private_key" {
  algorithm = "RSA"
}

resource "tls_self_signed_cert" "tracetest_self_signed_cert" {
  private_key_pem = tls_private_key.tracetest_private_key.private_key_pem

  subject {
    common_name  = "tracetest.com"
    organization = "Tracetest"
  }

  validity_period_hours = 720

  allowed_uses = [
    "key_encipherment",
    "digital_signature",
    "server_auth",
  ]
}

resource "aws_acm_certificate" "cert" {
  private_key      = tls_private_key.tracetest_private_key.private_key_pem
  certificate_body = tls_self_signed_cert.tracetest_self_signed_cert.cert_pem
}
Enter fullscreen mode Exit fullscreen mode

To enable Jaeger’s gRPC OTLP endpoint and expose it through an Application Load Balancer (ALB), you have to provide a valid certificate, as ALBs require HTTPS as protocol and a valid cert to work with GRPC.

Deploying Tracetest to ECS Fargate

One of Tracetest’s biggest goals is to provide numerous easy ways to deploy a Tracetest instance anywhere. The goal is to fit into any user’s current setup with minimal edits, or none at all. The case of ECS Fargate is no different. It’s a user request that was turned into a use case!

When deploying Tracetest in a container-based infrastructure you can configure the Docker image using environment variables. This makes the process easier, because you do not need to provide any configuration files.

// tracetest.tf

// creating the Tracetest task definition
resource "aws_ecs_task_definition" "tracetest" {
  family                   = "tracetest"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 1024
  memory                   = 2048
  execution_role_arn       = aws_iam_role.tracetest_task_execution_role.arn
  container_definitions = jsonencode([
    {
      "name" : "${local.name}",
      "image" : "${local.tracetest_image}",
      "cpu" : 1024,
      "memory" : 2048,
      "essential" : true,
      "portMappings" : [
        {
          "containerPort" : 11633,
          "hostPort" : 11633,
          "protocol" : "tcp"
        }
      ],
      // configuring and provisioning Tracetest using env variables
      "environment" : [
        {
          "name" : "TRACETEST_POSTGRES_HOST",
          "value" : "${module.db.db_instance_address}"
        },
        {
          "name" : "TRACETEST_POSTGRES_PORT",
          "value" : "${tostring(module.db.db_instance_port)}"
        },
        {
          "name" : "TRACETEST_POSTGRES_DBNAME",
          "value" : "${module.db.db_instance_name}"
        },
        {
          "name" : "TRACETEST_POSTGRES_USER",
          "value" : "${module.db.db_instance_username}"
        },
        {
          "name" : "TRACETEST_POSTGRES_PASSWORD",
          "value" : "${module.db.db_instance_password}"
        },
        {
          "name" : "TRACETEST_PROVISIONING",
          "value" : base64encode(local.provisioning),
        }
      ],
      "logConfiguration" : {
        "logDriver" : "awslogs",
        "options" : {
          "awslogs-create-group" : "true",
          "awslogs-group" : "/ecs/tracetest",
          "awslogs-region" : "us-west-2",
          "awslogs-stream-prefix" : "ecs"
        }
      },
    }
  ])
}

// tracetest service
resource "aws_ecs_service" "tracetest-service" {
  name            = "${local.name}-service"
  cluster         = aws_ecs_cluster.tracetest-cluster.id
  task_definition = aws_ecs_task_definition.tracetest.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  // exposing the UI and API listeners using ALB
  load_balancer {
    target_group_arn = aws_lb_target_group.tracetest-tg.arn
    container_name   = "tracetest"
    container_port   = 11633
  }

  network_configuration {
    subnets          = module.network.private_subnets_ids
    security_groups  = [module.tracetest_ecs_service_security_group.security_group_id]
    assign_public_ip = false
  }
}

// Postgres RDS instance used as main store
module "db" {
  source = "terraform-aws-modules/rds/aws"

  identifier = local.name

  engine               = "postgres"
  engine_version       = "14"
  family               = "postgres14"
  major_engine_version = "14"
  instance_class       = "db.t4g.micro"

  allocated_storage     = 20
  max_allocated_storage = 100

  db_name  = local.db_name
  username = local.db_username
  port     = 5432

  create_db_subnet_group = true
  subnet_ids             = module.network.private_subnets_ids

  vpc_security_group_ids = [module.db_security_group.security_group_id]
  deletion_protection    = false

  tags = local.tags
}

module "db_security_group" {
  source  = "terraform-aws-modules/security-group/aws"
  version = "~> 4.0"

  name        = local.name
  description = "PostgreSQL security group"
  vpc_id      = module.network.vpc_id

  ingress_with_cidr_blocks = [
    {
      from_port   = 5432
      to_port     = 5432
      protocol    = "tcp"
      description = "PostgreSQL access from within VPC"
      cidr_blocks = local.vpc_cidr
    },
  ]

  tags = local.tags
}

resource "aws_lb_target_group" "tracetest-tg" {
  name        = "tracetest-tg"
  port        = 11633
  protocol    = "HTTP"
  vpc_id      = module.network.vpc_id
  target_type = "ip"
}

// allowing public access to Tracetest
resource "aws_lb_listener" "tracetest-alb-listener" {
  load_balancer_arn = aws_lb.tracetest-alb.arn
  port              = "11633"
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.tracetest-tg.arn
  }
}
Enter fullscreen mode Exit fullscreen mode

Take a moment to look at this part of the file:

{
  "name" : "TRACETEST_PROVISIONING",
  "value" : base64encode(local.provisioning),
}
Enter fullscreen mode Exit fullscreen mode

The provisioning configuration for Tracetest can be found in the variables.tf file under the locals configuration which uses the internal load balancer DNS endpoint to build the data store entry.

provisioning = <<EOF
---
type: PollingProfile
spec:
  name: default
  strategy: periodic
  default: true
  periodic:
    retryDelay: 5s
    timeout: 10m

---
type: DataStore
spec:
  name: jaeger
  type: jaeger
  jaeger:
    endpoint: ${aws_lb.internal_tracetest_alb.dns_name}:16685
    tls:
      insecure_skip_verify: true
  EOF

  tags = {
    Name    = local.name
    Example = local.name
  }
}

  EOF
Enter fullscreen mode Exit fullscreen mode

This file is base64 encoded and passed into the TRACETEST_PROVISIONING environment variable. To learn more about provisioning a Tracetest service, check out the official docs.

Defining Terraform Outputs

The final setup step is to generate outputs to provide links that we can use to reach the generated services.

// outputs.tf
output "api_endpoint" {
  value       = "${aws_apigatewayv2_stage.lambda.invoke_url}/hello"
  description = "The API endpoint"
}

output "tracetest_url" {
  value       = "http://${aws_lb.tracetest-alb.dns_name}:11633"
  description = "Tracetest public URL"
}

output "jaeger_ui_url" {
  value       = "http://${aws_lb.tracetest-alb.dns_name}:16686"
  description = "Jaeger public URL"
}

output "internal_jaeger_api_url" {
  value       = "${aws_lb.internal_tracetest_alb.dns_name}:16685"
  description = "Jaeger internal API URL"
}
Enter fullscreen mode Exit fullscreen mode

Provisioning the AWS Infra

Finally! We have written down every required definition piece we need to generate the AWS resources, now it's time to take it to the next level by actually creating them using the Terraform CLI.

The first step is to initialize the Terraform project and install dependencies.

terraform init

[Output]
Initializing modules...

Initializing the backend...

Initializing provider plugins...
- Reusing previous version of hashicorp/tls from the dependency lock file
- Reusing previous version of hashicorp/random from the dependency lock file
- Reusing previous version of hashicorp/archive from the dependency lock file
- Reusing previous version of hashicorp/aws from the dependency lock file
- Using previously-installed hashicorp/aws v4.55.0
- Using previously-installed hashicorp/tls v4.0.4
- Using previously-installed hashicorp/random v3.4.3
- Using previously-installed hashicorp/archive v2.3.0

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Enter fullscreen mode Exit fullscreen mode

The second step is to run the apply command.

terraform apply
Enter fullscreen mode Exit fullscreen mode

This is going to prompt you with the question to continue or not, as well as show a list of changes it will execute.

Plan: 100 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + tracetest_url = (known after apply)
  + jaeger_ui_url = (known after apply)
  + internal_jaeger_api_url = (known after apply)
  + api_endpoint = (known after apply)

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value:
Enter fullscreen mode Exit fullscreen mode

By explicitly typing yes and hitting enter, the provisioning process will start.

This is where you can go grab lunch, walk your dog or do any pending chores… or just stare at the Terminal until it’s done 😅.

You will see something similar to this when the provisioning is done.

https://res.cloudinary.com/djwdcmwdz/image/upload/v1681237854/Blogposts/tracetest_aws_terraform/terraform_output_y9xbh3.png

You’ll see four URLs as outputs, the same ones we defined in the Terraform files.

We can go one by one to validate that everything is working as expected. First, let’s try following the jaeger_ui_url. You should be able to see the angry Gopher and the Jaeger search page.

https://res.cloudinary.com/djwdcmwdz/image/upload/v1681226943/Blogposts/tracetest_aws_terraform/ready_jaeger_cb053c.png

After that, we can continue with the tracetest_url, where you should see the Tracetest dashboard page with an empty test list.

https://res.cloudinary.com/djwdcmwdz/image/upload/v1681226941/Blogposts/tracetest_aws_terraform/ready_tracetest_q2ftbq.png

From the /settings page we can validate that Tracetest has access to the Jaeger GRPC endpoint by clicking the Test Connection button.

https://res.cloudinary.com/djwdcmwdz/image/upload/v1681226943/Blogposts/tracetest_aws_terraform/ready_test_connection_pmdky7.png

Lastly, we can use the following CURL command to validate the api_endpoint.

curl <api_endpoint>

[Output]
{"message": "Hello, World!"}
Enter fullscreen mode Exit fullscreen mode

Good job 🎉! With this set of steps, we have validated that the AWS Infrastructure is ready. In the following section, we’ll move on to validating the tracing data.

Trace-based Testing with Tracetest

You made it! You’ve crossed the desert of creating the Terraform definitions and conquered the fires of waiting an eternity to provision AWS services. Now it’s time to have some fun!

Let’s add some validations based on the Node.js Lambda telemetry. To do so, we’ll create a test based on the following checks:

  1. There should be one span of type FaaS (Function as a Service) called HelloWorld.
  2. The HelloWorld span should take less than 1 second.
# tests/test.yaml

type: Test
spec:
  id: 4iiL0j1Vg
  name: Hello World
  trigger:
    type: http
    httpRequest:
      url: <your_api_endpoint>
      method: GET
      headers:
        - key: Content-Type
          value: application/json
  specs:
    - selector: span[tracetest.span.type="faas" name="HelloWorld"]
      assertions:
        - attr:tracetest.selected_spans.count = 1
        - attr:tracetest.span.duration < 1s
Enter fullscreen mode Exit fullscreen mode

Save this file in your tests folder as test.yaml.

Remember to update the <your_api_endpoint section of the test with the api_endpoint Terraform output.

Running the Trace-based Test

First, configure the Tracetest CLI to point to the proper public ALB and Tracetest port, using the following:

# Replace the <tracetest_url> placeholder for the Terraform output
tracetest configure --endpoint <tracetest_url> --analytics 
Enter fullscreen mode Exit fullscreen mode

Then, create and run the test with this command:

tracetest test run -d tests/test.yaml

[Output]
✔ Hello World (<tracetest_url>/test/4iiL0j1Vg/run/1/test)
Enter fullscreen mode Exit fullscreen mode

Finally, follow the output response URL from the run command to find the test run alongside the test specs we just created.

https://res.cloudinary.com/djwdcmwdz/image/upload/v1681226942/Blogposts/tracetest_aws_terraform/ready_trace_mmajdx.png

Tracetest allows you to create assertions against any span from the trace, using the selector query language to pick exactly the spans you want to validate to then add checks based on attributes from the matched spans.

Not only that, but you can create, run and validate your trace-based tests entirely from the CLI, enabling your team to have them be part of your CI/CD pipelines.

Conclusion

At Tracetest we share a mix of different technical backgrounds, we have team members like the awesome Matheus with distributed Mobile Game services experience, or Jorge with experience in huge enterprise front-end applications.

Every single one of us has been in a position where we are moments away from throwing our computer in the air to test if airplane mode actually transforms it into an actual plane! It always boils down to wishing we could have better ways of understanding what’s happening in the code.

With that in mind, I wanted to display the power of trace-based testing and how it can help teams deliver quality software and alleviate those nightmare scenarios we have all been through.

In summary, you’ve learned how to:

  1. Provision an Observability side cart with Tracetest and Jaeger on AWS Fargate using Terraform.
  2. Write, provision and instrument an AWS Lambda function with Node.js using OpenTelemetry and Terraform.
  3. Use trace-based testing with Tracetest to validate the generated telemetry data.

Last, but not least - if would you like to learn more about Tracetest and what it brings to the table? Check the docs and try it out by downloading it today!

Also, please feel free to join our Discord community, give Tracetest a star on GitHub, or schedule a time to chat 1:1.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .