Monitor EBS Volume Health

Todor Todorov - Jan 3 - - Dev Community

This blog post is inspired by an interesting question that was asked during the Ask The Experts panel at AWS Community Day Bulgaria 2024.
It’s a fascinating topic that I wanted to dive into and write about. It comes a bit late, but I have finally found the time to put it together, and I hope you will find it useful.

In this tutorial, we’ll create and deploy an AWS Lambda function using the AWS Serverless Application Model (SAM) that:

  1. Fetches health status metrics for EBS volumes.
  2. Pushes the metrics to CloudWatch for monitoring and historical data.

This solution lets you create alarms (if needed) and view historical health data for specific EBS volumes, which was the main use case behind the original question.

Prerequisites

  • AWS CLI installed and configured
  • AWS SAM CLI installed
  • Python 3.x
  • Basic knowledge of AWS Lambda and CloudWatch

Step 1: Create the Project

Start by creating a new SAM project:

sam init

Choose the following options:

  • Template: AWS Quick Start Templates
  • Runtime: python3.12 (or the latest version)
  • Application: Hello World Example
  • Project Name: ebs-monitor
  • For the other prompts, accept the defaults

This generates a basic folder structure. Navigate to the project directory (it is named after the project name you set during sam init):

cd <your-project-directory>

Step 2: Update the Lambda Code

Replace the contents of the app.py file (the Hello World template places it in the hello_world/ directory) with the following code:

import logging
from datetime import datetime, timezone

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Clients are created once per execution environment and reused on warm starts.
ec2_client = boto3.client('ec2')
cloudwatch_client = boto3.client('cloudwatch')

def fetch_volume_statuses():
    """Yield the status record of every EBS volume, handling pagination."""
    # describe_volume_status returns the health check result (ok, impaired,
    # insufficient-data). describe_volumes only returns the lifecycle state
    # (available, in-use, ...), which never equals 'ok'.
    paginator = ec2_client.get_paginator('describe_volume_status')
    for page in paginator.paginate():
        yield from page['VolumeStatuses']

def publish_metrics(volume_id, status):
    """Publish the volume's health status to CloudWatch."""
    try:
        cloudwatch_client.put_metric_data(
            Namespace='EBS/Health',
            MetricData=[
                {
                    'MetricName': 'VolumeHealthStatus',
                    'Dimensions': [
                        {
                            'Name': 'VolumeId',
                            'Value': volume_id
                        }
                    ],
                    # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated).
                    'Timestamp': datetime.now(timezone.utc),
                    # 1 means healthy, 0 covers impaired and insufficient-data.
                    'Value': 1 if status == 'ok' else 0,
                    'Unit': 'Count'
                }
            ]
        )
        logger.info(f"Published health status for volume {volume_id}: {status}")
    except ClientError as e:
        logger.error(f"Failed to publish metrics for volume {volume_id}: {e}")

def lambda_handler(event, context):
    """Main Lambda function handler."""
    logger.info("Fetching EBS volume health status")
    volume_count = 0

    try:
        for volume in fetch_volume_statuses():
            volume_id = volume['VolumeId']
            # The health status is nested under the 'VolumeStatus' key.
            status = volume.get('VolumeStatus', {}).get('Status', 'unknown')
            publish_metrics(volume_id, status)
            volume_count += 1
        logger.info(f"Successfully processed health metrics for {volume_count} volumes.")
    except ClientError as e:
        logger.error(f"Error fetching EBS volumes: {e}")
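
The handler above sends one put_metric_data request per volume, which can become slow in accounts with many volumes. A minimal sketch of batching several data points into each request follows, assuming the same namespace and metric name as above. The helper names (build_datum, chunked, publish_in_batches) and the batch size of 500 are illustrative choices, not part of the original code. CloudWatch caps how many metrics a single request may carry, so check the current quota before raising the batch size.

```python
from datetime import datetime, timezone
from itertools import islice


def build_datum(volume_id, status, timestamp):
    """Build one MetricData entry: 1 when the volume is healthy, 0 otherwise."""
    return {
        'MetricName': 'VolumeHealthStatus',
        'Dimensions': [{'Name': 'VolumeId', 'Value': volume_id}],
        'Timestamp': timestamp,
        'Value': 1 if status == 'ok' else 0,
        'Unit': 'Count',
    }


def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def publish_in_batches(cloudwatch_client, statuses, batch_size=500):
    """Send (volume_id, status) pairs to CloudWatch, one request per batch.

    batch_size=500 is an assumed, conservative value for illustration.
    Returns the number of requests that were sent.
    """
    now = datetime.now(timezone.utc)
    data = (build_datum(volume_id, status, now) for volume_id, status in statuses)
    requests_sent = 0
    for batch in chunked(data, batch_size):
        cloudwatch_client.put_metric_data(Namespace='EBS/Health', MetricData=batch)
        requests_sent += 1
    return requests_sent
```

Inside the handler, this could be called with the existing cloudwatch_client and a generator of (volume_id, status) pairs in place of the per-volume publish_metrics call.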

Step 3: Update the Template

Edit the template.yaml file to define the Lambda function and required permissions:

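AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  EBSTrackerFunction:
    Type: AWS::Serverless::Function
    Properties:
      # Directory that contains app.py in the Hello World template
      CodeUri: hello_world/
      Handler: app.lambda_handler
      Runtime: python3.12
      Timeout: 60
      Policies:
        # AWS managed policy that grants the ec2:Describe* read permissions
        - AmazonEC2ReadOnlyAccess
        # SAM policy template; it takes no parameters, hence the empty map
        - CloudWatchPutMetricPolicy: {}
      Events:
        ScheduledEvent:
          Type: Schedule
          Properties:
            Schedule: rate(5 minutes)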

Step 4: Build and Deploy

Build the application:

sam build

Deploy the application:

sam deploy --guided

Follow the prompts to provide a stack name and configure deployment parameters.

Step 5: Verify and Monitor

Once deployed, the Lambda function runs every 5 minutes. The health status of each EBS volume is available in CloudWatch under the namespace EBS/Health as the VolumeHealthStatus metric, with a value of 1 when the status is ok and 0 otherwise.
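
To check that data points are arriving, the metric can also be read back programmatically. The sketch below builds the arguments for CloudWatch's get_metric_statistics call and returns the per-period minimum for the last hour. The helper names (build_stats_query, recent_minimums) and the one-hour window are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def build_stats_query(volume_id, end_time, hours=1, period_seconds=300):
    """Build keyword arguments for cloudwatch.get_metric_statistics."""
    return {
        'Namespace': 'EBS/Health',
        'MetricName': 'VolumeHealthStatus',
        'Dimensions': [{'Name': 'VolumeId', 'Value': volume_id}],
        'StartTime': end_time - timedelta(hours=hours),
        'EndTime': end_time,
        'Period': period_seconds,   # matches the 5-minute schedule
        'Statistics': ['Minimum'],  # a minimum of 0 means "not ok" in that period
    }


def recent_minimums(cloudwatch_client, volume_id, end_time=None):
    """Return (timestamp, minimum) pairs for the queried window, oldest first."""
    end_time = end_time or datetime.now(timezone.utc)
    query = build_stats_query(volume_id, end_time)
    response = cloudwatch_client.get_metric_statistics(**query)
    # CloudWatch does not guarantee the order of data points, so sort them.
    points = sorted(response['Datapoints'], key=lambda p: p['Timestamp'])
    return [(p['Timestamp'], p['Minimum']) for p in points]
```

With valid credentials, passing boto3.client('cloudwatch') and one of your own volume IDs to recent_minimums should list one value per 5-minute period once the function has run a few times.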

Create Alarms

You can create CloudWatch alarms on the VolumeHealthStatus metric for specific EBS volumes, for example an alarm that fires when the value drops below 1, to notify you of any issues.
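
As one possible starting point, an alarm for a single volume could be created with CloudWatch's put_metric_alarm call. This is a sketch under assumptions: the helper names, the alarm name pattern, the two evaluation periods, and the optional SNS topic ARN are all hypothetical choices, and the right values depend on how quickly you want to be notified.

```python
def build_alarm_request(volume_id, sns_topic_arn=None):
    """Build keyword arguments for cloudwatch.put_metric_alarm."""
    request = {
        'AlarmName': f'ebs-health-{volume_id}',  # hypothetical naming scheme
        'AlarmDescription': f'EBS volume {volume_id} is not reporting an ok status',
        'Namespace': 'EBS/Health',
        'MetricName': 'VolumeHealthStatus',
        'Dimensions': [{'Name': 'VolumeId', 'Value': volume_id}],
        'Statistic': 'Minimum',
        'Period': 300,              # one data point per 5-minute run
        'EvaluationPeriods': 2,     # assumed: two consecutive bad periods
        'Threshold': 1,
        'ComparisonOperator': 'LessThanThreshold',  # fires when the value is 0
        'TreatMissingData': 'missing',  # no data leaves the alarm in INSUFFICIENT_DATA
    }
    if sns_topic_arn:
        # Optional notification target, for example an SNS topic you own.
        request['AlarmActions'] = [sns_topic_arn]
    return request


def create_health_alarm(cloudwatch_client, volume_id, sns_topic_arn=None):
    """Create or update the alarm for one volume."""
    cloudwatch_client.put_metric_alarm(**build_alarm_request(volume_id, sns_topic_arn))
```

Because put_metric_alarm overwrites an existing alarm with the same name, calling create_health_alarm again with the same volume ID updates the alarm rather than creating a duplicate.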

Summary

In this tutorial, we:

  • Used AWS SAM to create and deploy a Lambda function.
  • Collected the health status of every EBS volume on a 5-minute schedule.
  • Published health metrics to CloudWatch for historical tracking and/or monitoring.

This solution provides a scalable way to monitor your EBS volumes and maintain visibility into their health over time.
