Introduction
Maintaining up-to-date patches on Amazon EC2 instances is critical for security and compliance. However, patching Auto Scaling groups (ASGs) can be challenging, especially for scheduled ASGs that are scaled down during maintenance windows. Traditional patching jobs can only target running instances, so nothing gets patched while the group is scaled down.
In this post, we address this issue by exploring how to automate the patching process for scheduled ASGs. We’ll leverage AWS Systems Manager (SSM) Maintenance Windows, CloudFormation, and EventBridge to create a solution that ensures patches are applied even when no instances are running at the time of the maintenance job.
Problem Statement
Organizations often use scheduled ASGs to optimize costs by scaling down during non-peak hours or maintenance windows. However, this introduces several challenges when it comes to patching:
- No Running Instances: Since the ASG scales down to zero, there are no instances for the patching job to target.
- Delayed Compliance: Patching jobs remain pending until instances are scaled up manually, leading to security and compliance gaps.
- Increased Manual Intervention: Administrators may need to manually scale up instances to execute patches, adding operational overhead.
Without a tailored solution, these gaps can leave critical systems exposed to vulnerabilities.
Reference to Existing Solutions
In a previous post, Patching Your Auto Scaling Group on AWS, I discussed how to patch standard ASGs effectively. That solution focused on streamlining patching for running instances in dynamically scaled environments. Scheduled ASGs, however, introduce unique challenges because they are scaled down during maintenance windows.
This post builds on that foundation with a targeted solution for scheduled ASGs: it automates scaling up, applying patches, and scaling back down, so compliance is maintained without manual intervention.
AWS Automation for Scheduled Auto Scaling Groups
Patching scheduled auto-scaling groups requires an automation strategy that accounts for their scaled-down state during maintenance windows. AWS provides several tools that make this possible:
- AWS Systems Manager (SSM) Maintenance Windows: Automates patching during predefined schedules.
- Amazon EventBridge: Coordinates events and triggers necessary actions to manage scaling and patching processes.
- IAM Roles and Policies: Grants permissions required for automation tasks like scaling, patching, and updating ASG configurations.
By combining these services, you can automate scaling up instances for patching, applying patches, and scaling back down—all without manual intervention.
Implementation Strategy
Step 1: Identify Scheduled Auto Scaling Groups
Tagging plays a critical role in identifying ASGs that require patching. Use a tag such as ep:asg:patch=true to specify the groups to be included in the automation process.
Step 2: Schedule and Automate the Patching Process
Leverage SSM Maintenance Windows to define patching schedules using cron expressions. For instance, you can create a window that runs on the second Wednesday of every month at 4:00 AM (UTC unless a schedule time zone is configured on the window):
Schedule: cron(0 4 ? * WED#2 *)
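To sanity-check which dates WED#2 selects, the sketch below computes the second Wednesday of a given month with the Python standard library. It only illustrates the meaning of the schedule and is not part of the template. The helper names nth_weekday_of_month and window_start are made up for this example, and the 04:00 start is treated as UTC on the assumption that the window does not set its own time zone.

```python
import datetime


def nth_weekday_of_month(year, month, weekday, n):
    """Return the date of the n-th given weekday (Monday=0 ... Sunday=6) in a month."""
    first = datetime.date(year, month, 1)
    # Days from the 1st of the month to the first occurrence of the weekday.
    offset = (weekday - first.weekday()) % 7
    # datetime.date raises ValueError if the month has no n-th occurrence.
    return datetime.date(year, month, 1 + offset + 7 * (n - 1))


def window_start(year, month):
    """Start of the cron(0 4 ? * WED#2 *) window for one month, assuming UTC."""
    day = nth_weekday_of_month(year, month, weekday=2, n=2)  # 2 = Wednesday
    return datetime.datetime.combine(
        day, datetime.time(4, 0), tzinfo=datetime.timezone.utc
    )


if __name__ == "__main__":
    print(window_start(2025, 1).isoformat())
```

Running the same check for the months you care about is a cheap way to confirm the schedule before deploying it.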
Step 3: Scale Up Instances During Maintenance
The automation logic temporarily scales up the ASG to ensure there are running instances for patching. This scaling is coordinated through EventBridge and Lambda functions.
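Scaling up is asynchronous, so the automation has to wait until an instance is actually running before patching can start. The following is a minimal polling sketch, not the code from the template. The function name wait_for_in_service, the timeout, and the poll interval are assumptions chosen for illustration; asg_client is expected to be a boto3 Auto Scaling client.

```python
import time


def wait_for_in_service(asg_client, asg_name, timeout_s=600, poll_s=15, sleep=time.sleep):
    """Poll the ASG until at least one instance is InService; return the instance IDs."""
    waited = 0
    while True:
        response = asg_client.describe_auto_scaling_groups(
            AutoScalingGroupNames=[asg_name]
        )
        groups = response["AutoScalingGroups"]
        if not groups:
            raise ValueError(f"Auto Scaling group not found: {asg_name}")
        ready = [
            inst["InstanceId"]
            for inst in groups[0].get("Instances", [])
            if inst["LifecycleState"] == "InService"
        ]
        if ready:
            return ready
        if waited >= timeout_s:
            raise TimeoutError(f"No InService instance in {asg_name} after {timeout_s}s")
        sleep(poll_s)  # injectable so the loop can be tested without real delays
        waited += poll_s
```

An instance that is InService has not necessarily registered with Systems Manager yet, so a real workflow may need an additional readiness check before sending patch commands. Inside Lambda, the timeout also has to fit within the function's configured timeout.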
Step 4: Apply Patches and Scale Down
Once patches are applied, the automation script scales the ASG back to its original size, maintaining cost-efficiency while ensuring compliance.
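Before scaling down, the automation needs to know that patching has actually finished. In this post the maintenance window drives the patch task; the sketch below is only an assumption about how the same operation could be started and checked directly through Run Command with the AWS-RunPatchBaseline document. The function names and the TERMINAL_STATES set are hypothetical; ssm_client is expected to be a boto3 SSM client.

```python
# Command invocation statuses after which no further progress is expected.
TERMINAL_STATES = {"Success", "Failed", "Cancelled", "TimedOut"}


def start_patch_install(ssm_client, instance_ids):
    """Send AWS-RunPatchBaseline (Install) to the instances; return the command ID."""
    response = ssm_client.send_command(
        InstanceIds=list(instance_ids),
        DocumentName="AWS-RunPatchBaseline",
        Parameters={"Operation": ["Install"]},
        Comment="Patch scheduled ASG instances",
    )
    return response["Command"]["CommandId"]


def patch_finished(ssm_client, command_id, instance_id):
    """Return (done, succeeded) for one instance's patch command."""
    status = ssm_client.get_command_invocation(
        CommandId=command_id, InstanceId=instance_id
    )["Status"]
    return status in TERMINAL_STATES, status == "Success"
```

A caller would poll patch_finished until done is true and scale down only afterwards. The invocation may not be queryable immediately after send_command, so a short retry around the first status check is a reasonable precaution.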
Code Walkthrough
The provided CloudFormation (CFN) template is designed to automate this entire process. Below are some key snippets to demonstrate how the solution works:
Tagging ASGs for Patching
The template uses tags to identify ASGs that require patching. The following parameters define the tag key and value:
Parameters:
  AsgTagKey:
    Type: String
    Default: ep:asg:patch
  AsgTagValue:
    Type: String
    Default: "true"
This ensures that only tagged ASGs are included in the patching process.
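For illustration, here is one way the automation could resolve that tag to a list of group names. This is a sketch under assumptions rather than the template's actual lookup: the function name find_tagged_asgs is hypothetical, the defaults mirror the parameter defaults above, and the filtering is done client-side over the paginated describe_auto_scaling_groups results.

```python
def find_tagged_asgs(asg_client, tag_key="ep:asg:patch", tag_value="true"):
    """Return the names of all ASGs carrying the given tag key/value pair."""
    names = []
    paginator = asg_client.get_paginator("describe_auto_scaling_groups")
    for page in paginator.paginate():
        for group in page["AutoScalingGroups"]:
            # Flatten the tag list into a dict for a simple lookup.
            tags = {t["Key"]: t["Value"] for t in group.get("Tags", [])}
            if tags.get(tag_key) == tag_value:
                names.append(group["AutoScalingGroupName"])
    return names
```

With boto3 installed, this would be called as find_tagged_asgs(boto3.client("autoscaling")). Comparing the value exactly means a group tagged "True" or "yes" is skipped, which is one reason consistent tagging matters.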
Scheduling Maintenance Windows
The template creates SSM Maintenance Windows based on environment and month-specific schedules:
Resources:
  MaintenanceWindow:
    Type: 'AWS::SSM::MaintenanceWindow'
    Properties:
      AllowUnassociatedTargets: false
      Cutoff: 0
      Duration: 1
      Name: !Sub "Maintenance_Window-${AsgTagValue}"
      Schedule: cron(0 4 ? * WED#2 *)
      Description: !Sub "Maintenance window for patching ${AsgTagValue} ASGs"
Scaling Logic
The automation script checks the ASG's desired capacity and scales up instances if the ASG is scaled down:
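def scaleUpASG(asg_client, asg_name):
    # Raise the minimum and the target to one instance so patching has a target.
    # Assumes the group's MaxSize is already at least 1; otherwise this call fails.
    asg_client.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        MinSize=1,
        DesiredCapacity=1
    )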
This function ensures there are running instances available for patching.
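The capacity check described above could look like the following sketch. It is an assumption about the check, not the script's actual code: the name scale_up_if_empty is hypothetical, and raising MaxSize alongside MinSize is added here because a group scheduled down to zero may also have its maximum set to zero.

```python
def scale_up_if_empty(asg_client, asg_name):
    """Scale the group to one instance only if it is currently at zero.

    Returns True when a scale-up request was sent, False otherwise.
    """
    groups = asg_client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )["AutoScalingGroups"]
    if not groups:
        raise ValueError(f"Auto Scaling group not found: {asg_name}")
    group = groups[0]
    if group["DesiredCapacity"] > 0:
        return False  # instances are already requested; nothing to do
    asg_client.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        MinSize=1,
        # Keep the existing maximum unless it would block a single instance.
        MaxSize=max(group["MaxSize"], 1),
        DesiredCapacity=1,
    )
    return True
```

Returning a flag lets the caller decide whether a later scale-down is needed at all.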
Patching and Creating a New AMI
The automation script applies patches to instances and creates a new AMI for the ASG:
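def createAMI(ec2_client, instance_id, new_ami_name):
    # NoReboot=True images the instance while it is running, so file system
    # consistency of the resulting AMI is not guaranteed.
    response = ec2_client.create_image(
        InstanceId=instance_id,
        Name=new_ami_name,
        Description="Patched AMI created for ASG",
        NoReboot=True
    )
    # Return the new AMI ID so later steps can reference it.
    return response["ImageId"]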
This ensures that patched AMIs are used for subsequent instance launches, maintaining compliance.
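Two practical details around AMI creation are the name and the wait. AMI names accept only a limited character set (colons, for example, are not allowed), and a new AMI cannot be launched until it reaches the available state. The sketch below covers both under stated assumptions: build_ami_name and wait_until_available are hypothetical helper names, the "-patched-" naming scheme is invented for this example, and ec2_client is expected to be a boto3 EC2 client.

```python
import datetime
import re


def build_ami_name(asg_name, now=None):
    """Build a timestamped AMI name using only characters EC2 accepts."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    # Timestamp without colons, since they are not valid in AMI names.
    suffix = "-patched-" + now.strftime("%Y%m%d-%H%M%S")
    # Replace anything outside the allowed character set with a dash.
    safe = re.sub(r"[^A-Za-z0-9()\[\] ./\-'@_]", "-", asg_name)
    # AMI names are limited to 128 characters; trim the prefix, keep the suffix.
    return safe[: 128 - len(suffix)] + suffix


def wait_until_available(ec2_client, image_id):
    """Block until the AMI reaches the 'available' state."""
    ec2_client.get_waiter("image_available").wait(ImageIds=[image_id])
```

Including the timestamp keeps names unique across monthly runs, which matters because an account cannot hold two AMIs with the same name in a Region.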
Updating the Auto Scaling Group
Once the patched AMI is created, the ASG is updated to use the new AMI:
def updateASG(asg_client, asg_name, launch_template_id, new_version):
    asg_client.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        LaunchTemplate={
            'LaunchTemplateId': launch_template_id,
            'Version': str(new_version)
        }
    )
The new launch template version ensures that all future instances in the ASG are launched with the patched AMI.
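The version number passed to updateASG has to come from a launch template version that references the patched AMI. A minimal sketch of that step follows. It is an assumption about how the version could be created, not the template's own code: the function name is hypothetical, and basing the new version on "$Latest" is a choice made for this example.

```python
def create_patched_template_version(ec2_client, launch_template_id, ami_id,
                                    source_version="$Latest"):
    """Create a launch template version that overrides only the AMI.

    All other launch parameters are inherited from source_version.
    """
    response = ec2_client.create_launch_template_version(
        LaunchTemplateId=launch_template_id,
        SourceVersion=source_version,
        VersionDescription=f"Patched AMI {ami_id}",
        LaunchTemplateData={"ImageId": ami_id},
    )
    return response["LaunchTemplateVersion"]["VersionNumber"]
```

Pinning the ASG to the returned number, rather than to "$Latest", makes it explicit which AMI a given group launches and keeps rollbacks simple.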
Scaling Down After Patching
Finally, the ASG is scaled back to its original size:
def scaleDownASG(asg_client, asg_name, original_min, original_desired):
    asg_client.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        MinSize=original_min,
        DesiredCapacity=original_desired
    )
This step restores the ASG to its cost-efficient state while ensuring patches have been applied.
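If patching fails halfway, the group should still return to its original size, or the cost savings are lost. One way to guarantee that is to wrap the whole cycle in a context manager that records the original sizes and restores them in a finally block. This is a design sketch under assumptions, not the post's script; the name temporarily_scaled_up is hypothetical.

```python
import contextlib


@contextlib.contextmanager
def temporarily_scaled_up(asg_client, asg_name):
    """Scale an ASG up to at least one instance, then restore its original sizes."""
    group = asg_client.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )["AutoScalingGroups"][0]
    original = {k: group[k] for k in ("MinSize", "MaxSize", "DesiredCapacity")}
    asg_client.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        MinSize=max(original["MinSize"], 1),
        MaxSize=max(original["MaxSize"], 1),
        DesiredCapacity=max(original["DesiredCapacity"], 1),
    )
    try:
        yield original
    finally:
        # Runs on success and on failure, so the group never stays scaled up.
        asg_client.update_auto_scaling_group(
            AutoScalingGroupName=asg_name, **original
        )
```

The patching, AMI creation, and launch template update would then run inside a "with temporarily_scaled_up(client, name):" block.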
Best Practices
1. Use Consistent Tagging
Ensure that all ASGs requiring patching are tagged consistently. This simplifies the automation process and minimizes the risk of missing critical groups.
2. Test Automation in Non-Production Environments
Before deploying automation scripts in production, test them in non-production environments to validate cron schedules, scaling logic, and patch application processes.
3. Monitor Maintenance Windows
Use Amazon SNS notifications to report the status of maintenance windows, so administrators can track successes, failures, and potential issues.
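As a rough illustration, the automation could publish a short status message at the end of each run. The function name notify_patch_result, the message layout, and the topic are assumptions for this sketch; sns_client is expected to be a boto3 SNS client and topic_arn an existing topic that administrators subscribe to.

```python
def notify_patch_result(sns_client, topic_arn, asg_name, succeeded, detail=""):
    """Publish a short patching status message to an SNS topic."""
    outcome = "SUCCEEDED" if succeeded else "FAILED"
    # SNS requires subjects shorter than 100 characters.
    subject = f"Patching {outcome}: {asg_name}"[:99]
    message = f"Auto Scaling group: {asg_name}\nResult: {outcome}\n{detail}".rstrip()
    response = sns_client.publish(
        TopicArn=topic_arn, Subject=subject, Message=message
    )
    return response["MessageId"]
```

Sending a message on success as well as on failure makes a silent, skipped run easier to notice.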
4. Audit and Review Launch Templates
Regularly review and update ASG launch templates to ensure they reference the latest AMIs with applied patches.
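Part of that review can be automated by checking how old the AMI behind a launch template is. The sketch below is one possible check, with assumptions: the function name is hypothetical, it looks only at the "$Latest" version, and it expects the template to contain a literal AMI ID rather than a parameter reference.

```python
import datetime


def latest_template_ami_age_days(ec2_client, launch_template_id, now=None):
    """Return the age in days of the AMI referenced by the template's $Latest version."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    versions = ec2_client.describe_launch_template_versions(
        LaunchTemplateId=launch_template_id, Versions=["$Latest"]
    )["LaunchTemplateVersions"]
    image_id = versions[0]["LaunchTemplateData"]["ImageId"]
    image = ec2_client.describe_images(ImageIds=[image_id])["Images"][0]
    # CreationDate is an ISO 8601 string such as "2025-01-08T04:30:00.000Z".
    created = datetime.datetime.strptime(
        image["CreationDate"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=datetime.timezone.utc)
    return (now - created).days
```

With a monthly patch schedule, an age well beyond one cycle suggests that a run was skipped or that the launch template was not updated afterwards.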
5. Plan for Compliance
Align patching strategies with organizational and regulatory compliance requirements to avoid penalties and enhance security.
Conclusion
Patching scheduled Auto Scaling groups is complicated by their scaled-down state during maintenance windows. This post showed how AWS Systems Manager, CloudFormation, and EventBridge can automate the entire process, from scaling up instances to applying patches and scaling back down.
This solution addresses security and compliance gaps without increasing operational overhead, ensuring that your ASGs remain secure and cost-efficient. If you’ve faced similar challenges, consider implementing this automation strategy to streamline your patching workflows.
Feel free to share your thoughts or questions in the comments below!