Lambda Fleet Monitoring with OpenSearch: Real-Time Insights at Scale

Yaar Naumenko · Feb 17 · Dev Community

Do you manage multiple AWS accounts, each with many Lambda functions, and feel overwhelmed by the work of monitoring them all?
Look no further. The Lambda Fleet Monitoring Solution is a fully automated, cross-account approach that collects near-real-time metrics (invocations, errors, duration, and even cold starts) on a schedule and sends them to an OpenSearch cluster for analysis and visualization.
This article walks through the solution's architecture, features, and setup. For the code and additional details, see the opensearch-monitoring GitHub repository (https://github.com/cloudon-one/opensearch-monitoring).

Why This Matters

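As serverless adoption grows, monitoring Lambda functions becomes harder, especially across multiple AWS accounts: CloudWatch keeps metrics separately in each account and region, so by default there is no single place to see the whole fleet.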

With the Lambda Fleet Monitoring Solution, you gain:
• Visibility into every function’s performance and execution patterns.
• Centralized dashboards for easier troubleshooting.
• Scalability that covers as many AWS accounts as you need.

High-Level Architecture


Key Components:

  1. Amazon EventBridge: Schedules the monitoring Lambda to run on a configurable interval.
  2. Monitoring Lambda: Assumes roles in other AWS accounts to gather CloudWatch metrics and push them to OpenSearch.
  3. OpenSearch Domain: Serves as the data store for all metrics.
  4. OpenSearch Dashboards: Provides out-of-the-box (and customizable) visualization tools.

Core Features

• Cross-Account Monitoring: Uses IAM roles to gather data from multiple AWS accounts.
• Real-Time Metrics: Tracks invocation rates, error counts, memory usage, duration statistics, cold starts, and more.
• Custom Dashboards: Visualize performance trends and spot anomalies quickly.
• Automated Setup: Terraform creates the resources, so little manual configuration is required.
• Customizable Alerts: Integrate with AWS services or third-party tools to alert on critical thresholds.
• Memory & Timeout Insights: Tune Lambda memory and timeout settings, and the resulting costs, based on observed usage.
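To make the component list concrete, here is a minimal sketch of one scheduled collection pass: an EventBridge rate expression for the interval, then a loop that assumes a role in each monitored account and tags the collected documents with the account ID. This is not the repository's code. The role name `LambdaFleetMonitoringRole` is hypothetical, and `assume_role` / `fetch_metrics` are injected callables so the loop runs without AWS; in a real Lambda, `assume_role` might wrap boto3's `sts.assume_role(RoleArn=..., RoleSessionName=...)` and return its `Credentials`.

```python
# Sketch of a scheduled, cross-account collection pass (hypothetical names).
from dataclasses import dataclass, field
from typing import Callable, Dict, List

MONITOR_ROLE_NAME = "LambdaFleetMonitoringRole"  # hypothetical role name


def rate_expression(minutes: int) -> str:
    """Build an EventBridge rate() schedule; the unit must be singular for 1."""
    if minutes < 1:
        raise ValueError("interval must be at least 1 minute")
    unit = "minute" if minutes == 1 else "minutes"
    return f"rate({minutes} {unit})"


def role_arn(account_id: str, role_name: str = MONITOR_ROLE_NAME) -> str:
    """ARN of the role the monitoring Lambda assumes in a monitored account."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


@dataclass
class CollectionResult:
    documents: List[dict] = field(default_factory=list)
    failures: Dict[str, str] = field(default_factory=dict)


def collect_fleet(
    accounts: List[str],
    assume_role: Callable[[str], dict],
    fetch_metrics: Callable[[dict], List[dict]],
) -> CollectionResult:
    """Assume a role in each account, gather metric docs, and tag them by account."""
    result = CollectionResult()
    for account_id in accounts:
        try:
            credentials = assume_role(role_arn(account_id))
            for doc in fetch_metrics(credentials):
                result.documents.append({**doc, "account_id": account_id})
        except Exception as exc:  # record the failure and continue with other accounts
            result.failures[account_id] = str(exc)
    return result
```

Recording per-account failures instead of raising means one misconfigured trust policy does not hide metrics from every other account.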

Metrics You’ll See

  1. Invocation Count
  2. Error Rates
  3. Duration Statistics
  4. Memory Utilization
  5. Cold Start Frequency
  6. Timeout Proximity
  7. Runtime Distribution
  8. Cost Metrics
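CloudWatch's standard `AWS/Lambda` metrics cover invocations, errors, and duration, but not cold starts or memory used. One common way to derive those is from the `REPORT` line Lambda writes to CloudWatch Logs after each invocation (`Duration`, `Billed Duration`, `Memory Size`, `Max Memory Used`, and `Init Duration` on a cold start). The sketch below shows that approach; whether the repository derives its metrics this way is an assumption, and the function names are hypothetical. Cost is estimated as requests times a per-request price plus GB-seconds times a per-GB-second price, with both prices passed in.

```python
import re
from typing import Dict, Iterable

# The lookbehinds stop "Duration" from matching "Billed Duration" / "Init Duration".
_DURATION = re.compile(r"(?<!Billed )(?<!Init )Duration: ([\d.]+) ms")
_BILLED = re.compile(r"Billed Duration: ([\d.]+) ms")
_MEMORY_SIZE = re.compile(r"Memory Size: (\d+) MB")
_MAX_MEMORY = re.compile(r"Max Memory Used: (\d+) MB")


def _field(pattern: re.Pattern, line: str) -> float:
    match = pattern.search(line)
    if match is None:
        raise ValueError(f"field {pattern.pattern!r} missing in: {line!r}")
    return float(match.group(1))


def parse_report(line: str) -> Dict[str, float]:
    """Extract numeric fields from one REPORT log line."""
    return {
        "duration_ms": _field(_DURATION, line),
        "billed_ms": _field(_BILLED, line),
        "memory_mb": _field(_MEMORY_SIZE, line),
        "max_memory_mb": _field(_MAX_MEMORY, line),
        # "Init Duration" appears only when a new execution environment was initialized.
        "cold_start": 1.0 if "Init Duration:" in line else 0.0,
    }


def summarize_reports(
    lines: Iterable[str],
    timeout_s: float,
    price_per_request: float,
    price_per_gb_second: float,
) -> Dict[str, float]:
    """Aggregate REPORT lines into fleet-style metrics for one function."""
    reports = [parse_report(l) for l in lines if l.startswith("REPORT ")]
    if not reports:
        return {"invocations": 0}
    n = len(reports)
    gb_seconds = sum(r["billed_ms"] / 1000 * r["memory_mb"] / 1024 for r in reports)
    return {
        "invocations": n,
        "cold_start_rate": sum(r["cold_start"] for r in reports) / n,
        "max_memory_utilization": max(r["max_memory_mb"] / r["memory_mb"] for r in reports),
        # 1.0 would mean the slowest invocation hit the configured timeout.
        "timeout_proximity": max(r["duration_ms"] for r in reports) / (timeout_s * 1000),
        "estimated_cost_usd": n * price_per_request + gb_seconds * price_per_gb_second,
    }


def error_rate(errors: float, invocations: float) -> float:
    """Error rate from CloudWatch Errors and Invocations sums."""
    return errors / invocations if invocations else 0.0
```

Prices differ by region and architecture, so pass current values from the AWS Lambda pricing page rather than hard-coding them.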

Prerequisites
To get started, ensure you have:
• AWS CLI configured with credentials that have the permissions listed below.
• Terraform v1.5.0+ installed.
• Python 3.9+ installed.
• Cross-account IAM roles set up in each AWS account you wish to monitor.
• Permission to create:
  • Lambda functions
  • OpenSearch domains
  • IAM roles and policies
  • EventBridge (CloudWatch Events) rules
  • S3 buckets
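For the cross-account IAM roles prerequisite, each monitored account needs a role the monitoring Lambda can assume. The sketch below builds the two policy documents such a role typically carries: a trust policy naming the monitoring Lambda's execution role as principal (with an optional `sts:ExternalId` condition), and a read-only permissions policy. The role names and the exact action list are assumptions; check the repository's Terraform for what it actually creates.

```python
import json
from typing import Iterable, Optional

# Hypothetical read-only action set for the role in each monitored account.
READ_ONLY_ACTIONS = (
    "cloudwatch:GetMetricData",
    "cloudwatch:ListMetrics",
    "lambda:ListFunctions",
    "lambda:GetFunctionConfiguration",
    "logs:FilterLogEvents",  # only needed if you parse REPORT log lines
)


def trust_policy(monitoring_account_id: str, monitoring_role_name: str,
                 external_id: Optional[str] = None) -> dict:
    """Allow only the central monitoring role to assume this role."""
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{monitoring_account_id}:role/{monitoring_role_name}"},
        "Action": "sts:AssumeRole",
    }
    if external_id:
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


def permissions_policy(actions: Iterable[str] = READ_ONLY_ACTIONS) -> dict:
    """Read-only permissions. Resource "*" keeps the sketch short; GetMetricData,
    ListMetrics and ListFunctions cannot be scoped to resources, while the Lambda
    and Logs read actions could be narrowed to specific ARNs."""
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": sorted(actions), "Resource": "*"}],
    }


def render(policy: dict) -> str:
    """Serialize a policy document as JSON, e.g. to paste into Terraform."""
    return json.dumps(policy, indent=2)
```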

QuickStart Installation

  1. Clone the Repository

git clone https://github.com/cloudon-one/opensearch-monitoring.git
cd opensearch-monitoring/lambda/terraform
  2. Configure Variables: In a terraform.tfvars file, define your settings:
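aws_region                      = "us-west-1"
monitored_accounts              = ["123456789012", "098765432109"]
# OpenSearch master passwords need 8+ characters with upper, lower, digit, and special character.
# Avoid committing real secrets; Terraform also reads TF_VAR_opensearch_master_user_password.
opensearch_master_user_password = "your-secure-password"
opensearch_instance_type        = "t3.small.search"
opensearch_instance_count       = 1
opensearch_volume_size          = 10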
  3. Initialize Terraform:
terraform init
  4. Plan & Apply:
terraform plan
terraform apply

This will provision the OpenSearch domain, monitoring Lambda, IAM roles, and other necessary resources.
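Once the stack is running, the monitoring Lambda writes metric documents into the OpenSearch domain. As a minimal sketch (not the repository's code), the OpenSearch `_bulk` API accepts newline-delimited JSON: an action line such as `{"index": {"_index": ...}}` followed by the document, with a newline after the last line. The daily index name `lambda-metrics-YYYY.MM.DD` and the `@timestamp` field are hypothetical choices; daily indices make it easy to expire old data with Index State Management.

```python
import json
from datetime import datetime, timezone
from typing import Iterable

INDEX_PREFIX = "lambda-metrics"  # hypothetical index naming scheme


def daily_index(ts: datetime, prefix: str = INDEX_PREFIX) -> str:
    """Index name for the UTC day of a timezone-aware timestamp."""
    return f"{prefix}-{ts.astimezone(timezone.utc):%Y.%m.%d}"


def bulk_body(docs: Iterable[dict], ts: datetime) -> str:
    """Build an NDJSON body for the _bulk API: one action line per document."""
    index = daily_index(ts)
    stamp = ts.astimezone(timezone.utc).isoformat()
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps({**doc, "@timestamp": stamp}))
    # _bulk requires the body to end with a newline.
    return ("\n".join(lines) + "\n") if lines else ""
```

You would POST the returned string to `https://<domain-endpoint>/_bulk` with `Content-Type: application/x-ndjson`, authenticating with SigV4 or the master user, and then check the response's `errors` flag, since `_bulk` can return HTTP 200 even when individual documents fail.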

Securing Your Setup

  1. Regular Rotation: Rotate access keys and review roles periodically.
  2. Access Logging: Enable CloudTrail logging for all AWS API activities.
  3. Least Privilege: Minimize permissions where possible and remove unused policies.
  4. Organization Controls: Use AWS Organizations Service Control Policies (SCPs) for additional governance.
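For the least-privilege item, a cheap safeguard is to scan policy JSON (for example, the documents your Terraform renders) for wildcard actions before deploying. The helper below is a hypothetical sketch: it inspects only `Allow` statements and their `Action` field, ignoring `NotAction` and conditions, so treat it as a lint rather than a substitute for IAM Access Analyzer's policy validation.

```python
from typing import List, Union


def _as_list(value: Union[str, List[str]]) -> List[str]:
    """IAM allows Action to be a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def wildcard_actions(policy: dict) -> List[str]:
    """Return actions in Allow statements that contain a '*' wildcard."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be an object
        statements = [statements]
    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action", [])):
            if "*" in action:
                flagged.append(action)
    return flagged
```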

Wrapping Up
The Lambda Fleet Monitoring Solution offers a robust, scalable way to track and analyze performance for all your AWS Lambda functions, regardless of how many accounts you manage. By combining near-real-time CloudWatch metrics with the visualization power of OpenSearch, it helps you stay on top of function behaviour, performance trends, and potential cost optimizations.
For a deeper dive, including best practices, troubleshooting tips, and advanced configuration options, head to the opensearch-monitoring GitHub repository and explore the documentation.

Feel free to fork, submit issues, or contribute enhancements!
Have thoughts or questions?

Comment below or open an issue on GitHub to share your ideas.
Happy monitoring!
