Distributed Tracing and OpenTelemetry Guide

Daniele Iasella · Sep 29 '23 · Dev Community

Microservices have become popular for modern web applications since they provide many benefits over traditional monolithic architectures. However, microservices are not a silver bullet; they come with their fair share of challenges. For example, debugging and troubleshooting errors in a microservices system can be hard because tracking a request's flow across multiple services is difficult.

That's where distributed tracing and OpenTelemetry come in. OpenTelemetry is an Observability framework designed to create and manage telemetry data like traces, metrics, and logs from distributed systems. So, in this article, I will take you through the steps of using OpenTelemetry within a Node.js ecosystem to trace your microservices applications effectively.

What is distributed tracing?

The complexity of microservices makes it difficult to track a request's path through multiple services. Distributed tracing is an observability technique for tracking these requests as they travel across microservices. Think of it as a flashlight that illuminates the request flow across your system.

Distributed tracing is beneficial for developers in many scenarios. For example, a single microservice with a slow response time can slow down the whole application. Tracing data lets you pinpoint the exact origin of the slowdown and troubleshoot the issue quickly.

Benefits of using distributed tracing:

  • Identifies performance bottlenecks.
  • Provides a comprehensive view of the system.
  • Provides insights into the dependencies between different services.
  • Helps identify potential security vulnerabilities.
  • Supports both synchronous (gRPC, REST, GraphQL) and asynchronous (event sourcing, pub/sub) application architectures.

Components of distributed tracing

A typical distributed tracing system is built from the following components:

  • Trace: End-to-end path of a single user request as it moves through various services.
  • Span: A single operation or unit of work within a distributed system. It captures information like start time, end time, and metadata or annotations that help you understand what happened during the operation.
  • Context Propagation: Passing contextual information between different services within a distributed system. It is essential for connecting spans to construct a complete trace of a request.

Since you now have a brief idea of what distributed tracing is, let's see how to implement distributed tracing with Node.js.

Instrumenting Node.js app with OpenTelemetry

In this example, I will create 3 Node.js services (shipping, notification, and courier) using Amplication, add traces to all services, and show how to analyze trace data using Jaeger.

Step 1: Generating services using Amplication

As the first step, you must create the Node.js services with Amplication. In this example, I will be using three already created Prisma schemas. You can find those schemas in this GitHub repository.

Once you are ready with schemas, go to the Amplication dashboard and create a new Project.

Then, select the project from the dashboard and connect the GitHub repository with Prisma schemas to that project.

Now, you can start creating services. For that, return to the Amplication dashboard and click the Add Resources button.

Then, enter the necessary information to create the service. In this case, I have named the first service "courier gateway service" and used the settings below:

  • Git Repository: I've used the GitHub repo, which I connected earlier.

  • REST or GraphQL: I've enabled both options to show the file structure generated by Amplication.

  • Repo type: Monorepo.

  • Database: PostgreSQL

  • Authentication: Included

It will take a few seconds to generate the service.

After that, you need to modify a few database settings to avoid collision between databases when sharing the same Docker service. For that, navigate to the Plugins tab, select the PostgreSQL DB plugin, and click the Settings button.

There, you will see a JSON file like below, and you need to update the dbName property. Here, I have renamed it to courier.

Then, go back to the Entities tab and import the courier Prisma schema to generate the entities related to the courier service.

Once the schema is imported, you will see 2 new entities named Parcel and Quote in the Entities tab.

Now, perform the same steps again for the other two services.

Step 2: Adding a Kafka integration

In this example, I will use a Message Broker to communicate between these services. You can easily generate a Message Broker through Amplication by clicking the Add Resource button and selecting the Message Broker option.

Then, go back to the shipping service and install the Kafka plugin to allow the shipping service to use the Message Broker.

Then, go to the Connections tab and select the Message pattern as Send to allow the shipping service to send messages.

Similarly, go to the notification service and select Message pattern as Receive to subscribe to the Message Broker.
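For the trace to survive the hop through the broker, the producer has to carry trace context inside the Kafka message itself. Here is a minimal sketch of that idea in plain JavaScript, using the W3C `traceparent` header format for illustration (the function and field names are mine, not Amplication's; the OpenTelemetry Kafka instrumentation handles this for you):

```javascript
// Sketch: propagating trace context through Kafka message headers.
// The producer injects a W3C-style traceparent header; the consumer extracts
// it and continues the same trace.
function inject(span, headers) {
  // traceparent format: version-traceId-parentSpanId-flags
  headers.traceparent = `00-${span.traceId}-${span.spanId}-01`;
  return headers;
}

function extract(headers) {
  const [, traceId, parentId] = headers.traceparent.split("-");
  return { traceId, parentId };
}

// Shipping service (Send side) attaches context to the outgoing message.
const span = { traceId: "a".repeat(32), spanId: "b".repeat(16) };
const message = { key: "shipment-1", value: "{}", headers: inject(span, {}) };

// Notification service (Receive side) picks the context back up.
const ctx = extract(message.headers);
console.log(ctx.traceId === span.traceId); // true: same trace on both sides
```

Without this step, the broker would break the trace in two: the consumer's spans would start a brand-new trace with no link back to the producer.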

Step 3: Building the application

Click the Commit change & build button to finalize the changes. It will start the build process, generate the new files in the Git repo, and create a pull request.

Note: Make sure to merge the pull request to the main branch to get the latest updates. 

Step 4: Configuring Docker compose

Each service generated by Amplication contains a separate Docker Compose file. However, in this example, I want all services to share the same database, so I created a new Docker Compose file by copying the content of the docker-compose files generated by Amplication.



version: "3"
name: otel-workshop
services:
  # Shared DB for all services
  db:
    image: postgres:12
    ports:
      - 5432:5432
    environment:
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: admin
    volumes:
      - postgres:/var/lib/postgresql/data

  # Jaeger
  jaeger-all-in-one:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"
      - "14268"
      - "14250"
  # Collector
  collector-gateway:
    image: otel/opentelemetry-collector:latest
    volumes:
      - ./collector-gateway.yml:/etc/collector-gateway.yaml
    command: ["--config=/etc/collector-gateway.yaml"]
    ports:
      - "1888:1888" # pprof extension
      - "13133:13133" # health_check extension
      - "4317:4317" # OTLP gRPC receiver
      - "4318:4318" # OTLP HTTP receiver
      - "55670:55679" # zpages extension
    depends_on:
      - jaeger-all-in-one

  kafka-ui:
    container_name: kafka-ui
    image: provectuslabs/kafka-ui:latest
    ports:
      - "8080:8080"
    depends_on:
      - zookeeper
      - kafka
    environment:
      KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:29092
      KAFKA_CLUSTERS_0_ZOOKEEPER: zookeeper:2181
      KAFKA_CLUSTERS_0_JMXPORT: 9997

  zookeeper:
    image: confluentinc/cp-zookeeper:7.3.1
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - "2181:2181"

  kafka:
    image: confluentinc/cp-kafka:7.3.1
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
      - "9997:9997"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_MESSAGE_MAX_BYTES: 10485760
      JMX_PORT: 9997
      KAFKA_JMX_OPTS: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=kafka -Dcom.sun.management.jmxremote.rmi.port=9997
    healthcheck:
      test: nc -z localhost 9092 || exit -1
      start_period: 15s
      interval: 30s
      timeout: 10s
      retries: 10

volumes:
  postgres: ~



I don't strictly need a collector gateway in the above configuration since I'm using jaeger-all-in-one, which can receive traces directly. However, I have included a collector gateway to highlight the receiver components and ports.
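The compose file mounts a collector-gateway.yml that isn't shown above. For reference, a minimal collector configuration matching the ports exposed in the compose file could look like the following (a sketch; depending on your collector version, the jaeger exporter may need to be replaced with an otlp exporter pointed at Jaeger's OTLP port):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  jaeger:
    endpoint: jaeger-all-in-one:14250
    tls:
      insecure: true

extensions:
  health_check:
  pprof:
  zpages:

service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger]
```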

Run the `docker-compose up --detach` command to start the containers.

Step 5: Configuring services for local development

Now, you need to set up all 3 services for local development. For that, you just need to follow the instructions given in the README.md file.

Note: You don't need to run the `npm run docker:dev` command since Docker is already running.

Once all the databases are initialized and dependencies are installed, start each service using the `npm run start:watch` command and the courier-gateway-service-admin using the `npm run start` command.

Note: You need to update the ports of each service in the `.env` files to avoid clashes between the services.
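For example, assuming the generated `.env` exposes the port through a `PORT` variable, the files could end up like this (the notification service's port is illustrative; only the courier gateway on 3002 and the shipping service on 3004 are referenced later in this article):

```
# courier-gateway-service/.env
PORT=3002

# notification-service/.env (illustrative port)
PORT=3003

# shipping-service/.env
PORT=3004
```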

Step 6: Creating a Parcel through admin view

You can easily create a new Parcel by logging into the admin view.




Step 7: Connecting the services

First, navigate to the shipping service and install axios using the `npm install axios` command. Then, add the below code to the shipping-service/src/shipment/shipment.service.ts file to get parcel details.



import { Injectable } from "@nestjs/common";
import { PrismaService } from "../prisma/prisma.service";
import { ShipmentServiceBase } from "./base/shipment.service.base";
import { Prisma, Shipment } from "@prisma/client";
import axios from "axios";
import { KafkaProducerService } from "../kafka/kafka.producer.service";
import { ShippingEvent } from "./shipping.event";
import { MyMessageBrokerTopics } from "../kafka/topics";

@Injectable()
export class ShipmentService extends ShipmentServiceBase {
  constructor(
    protected readonly prisma: PrismaService,
    private readonly kafkaProducerService: KafkaProducerService
  ) {
    super(prisma);
  }

  async create<T extends Prisma.ShipmentCreateArgs>(
    args: Prisma.SelectSubset<T, Prisma.ShipmentCreateArgs>
  ): Promise<Shipment> {
    const {
      data: { accessToken },
    } = await axios.post("http://localhost:3002/api/login", {
      username: "admin",
      password: "admin",
    });

    const { data: parcels } = await axios.get(
      "http://localhost:3002/api/parcels",
      {
        params: {},
        headers: {
          Authorization: `Bearer ${accessToken}`,
        },
      }
    );

    const randomParcel = Math.floor(Math.random() * parcels.length);

    const shipment = await super.create<T>({
      ...args,
      data: {
        ...args.data,
        price: parcels[randomParcel].price,
      },
    });

    const event: ShippingEvent = {
      Message: `Shipment id: ${shipment.id}`,
      CustomerId: "1b2c",
    };

    await this.kafkaProducerService.emitMessage(
      MyMessageBrokerTopics.ShipmentCreateV1,
      {
        key: shipment.id,
        value: event,
      }
    );

    return shipment;
  }
}




Step 8: Creating a client app

Before we start instrumenting, let's create a client application to fetch shipment data. This can be a simple Node.js project with a main.js file containing the following code:



// main.js

"use strict";
const axios = require("axios");

const url = "http://localhost:3004/api/shipments";
const numberOfRequests = 5;

const makeRequest = async (requestId) => {
  const result = await axios.post(url);
  return result;
};

const main = async () => {
  for (let i = 0; i < numberOfRequests; i++) {
    const res = await makeRequest(i);
    console.log("Response", res.data);
  }
};

main();




Step 9: Adding tracing

Create a new file named tracing.js in the same directory as the main.js file. Then, install all the OpenTelemetry dependencies using the below command:



npm install @opentelemetry/sdk-node \
  @opentelemetry/api \
  @opentelemetry/resources \
  @opentelemetry/semantic-conventions \
  @opentelemetry/instrumentation-http \
  @opentelemetry/sdk-trace-base \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/propagator-b3




Add the below code to the tracing.js file.



const { SimpleSpanProcessor } = require("@opentelemetry/sdk-trace-base");
const { Resource } = require("@opentelemetry/resources");
const {
  SemanticResourceAttributes,
} = require("@opentelemetry/semantic-conventions");
const { trace } = require("@opentelemetry/api");
const {
  OTLPTraceExporter,
} = require("@opentelemetry/exporter-trace-otlp-http");
const { NodeSDK } = require("@opentelemetry/sdk-node");
const { HttpInstrumentation } = require("@opentelemetry/instrumentation-http");
const { B3Propagator } = require("@opentelemetry/propagator-b3");

const exporter = new OTLPTraceExporter({});

const getTracer = () => {
  return trace.getTracer("default");
};

const sdk = new NodeSDK({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: "fake-client-app",
    [SemanticResourceAttributes.SERVICE_VERSION]: "0.1.0",
  }),
  spanProcessor: new SimpleSpanProcessor(exporter),
  traceExporter: exporter,
  instrumentations: [new HttpInstrumentation()],
  textMapPropagator: new B3Propagator(),
});

sdk.start();

module.exports = { getTracer };




To start tracing, you need to import the above file into main.js and make some modifications. The updated main.js file will look like below:



"use strict";
const { getTracer } = require("./tracing");
const axios = require("axios");
const { trace } = require("@opentelemetry/api");

const tracer = getTracer("fake-client");

const url = "http://localhost:3004/api/shipments";
const numberOfRequests = 1;

const makeRequest = async (requestId) => {
  return tracer.startActiveSpan("makeRequests", async (span) => {
    span.updateName(`makeRequests-${requestId}`);
    const result = await axios.post(url);
    span.end();
    return result;
  });
};

tracer.startActiveSpan("main", async (span) => {
  for (let i = 0; i < numberOfRequests; i++) {
    const res = await makeRequest(i);
    console.log("Response", res.data);
  }
  span.end();
});



Now, you can run the client application with the `node main.js` command and monitor the trace data with Jaeger.




That's it. You successfully created a Node.js-based microservices application using Amplication, configured tracing, and monitored trace data through Jaeger. You can find the complete code example on GitHub and watch the video below to understand the code used for tracing.

Step 10: Adding tracing to the generated services

As Amplication now supports OpenTelemetry through a plugin, we will leverage the plugin to integrate all the services without much effort.

Go to each service starting from the shipping service and install the OpenTelemetry plugin.

Click the Commit change & build button to finalize the changes. It will start the build process again, generate the new files and update existing ones in the Git repo, and create/update a pull request.

Now try to perform new requests as before and observe the tracing data in Jaeger!

Watch Webinar

I ran a live workshop a few weeks ago on Distributed Tracing and OpenTelemetry. You can watch it here: https://www.youtube.com/watch?v=Pu-HiD2QksI

Best practices to follow

  • Prioritize critical paths and high-impact services.
  • Use consistent and meaningful naming conventions for spans and services.
  • Ensure that trace context is propagated across service boundaries. This typically involves adding trace headers to HTTP requests or message headers.
  • Use tags and annotations to add additional metadata to spans.
  • Implement adaptive sampling strategies that adjust the sampling rate based on the service's load and error rates.
  • Automatically capture and log errors.
  • Retain trace data for an appropriate period.
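On the sampling point above, ratio-based sampling is typically made deterministic on the trace ID so that every service makes the same keep-or-drop decision for a given trace. A toy sketch of the idea in plain JavaScript (illustrative, not the OpenTelemetry `TraceIdRatioBasedSampler` implementation):

```javascript
// Sketch: deterministic ratio-based sampling keyed on the trace ID.
// Every service computes the same decision for the same trace, so traces
// are either kept whole or dropped whole.
function shouldSample(traceId, ratio) {
  // Map the first 8 hex chars of the trace ID into [0, 1] and compare.
  const bucket = parseInt(traceId.slice(0, 8), 16) / 0xffffffff;
  return bucket < ratio;
}

const lowId = "00000001" + "0".repeat(24);  // maps near 0 -> sampled at 10%
const highId = "ffffffff" + "0".repeat(24); // maps near 1 -> dropped at 10%
console.log(shouldSample(lowId, 0.1));  // true
console.log(shouldSample(highId, 0.1)); // false
```

Because the decision depends only on the trace ID, you never end up with a trace whose spans are half-sampled across services.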

Conclusion

This guide provided an overview of implementing tracing for Node.js-based microservices applications. As you can see, enabling tracing for your application requires little effort. But it can save you a whole lot of troubleshooting and debugging time. Thank you for reading.
