Recently I found myself designing a system that had AWS Lambda functions inside a private VPC. I needed to pass the output of the Lambda function as a payload to an AWS service that is only reachable through a publicly routable endpoint (specifically SQS). I found there are really only three options to solve this situation:
The Options:
1) NAT Instance (Good)
This solution involves operating a compute instance that acts as a network address translation (NAT) resource. When resources inside the private subnet need to reach a public endpoint, the traffic is routed through the NAT instance. This has the obvious disadvantage of needing to run a compute instance and being limited by the hardware behind it. It adds management and cost overhead that I really did not want to deal with. While the AWS Marketplace has pre-configured images available, I still did not want to manage additional hardware for one Lambda that is invoked sporadically.
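For comparison with the later options, here is a minimal sketch of the routing a NAT instance needs. It is not part of the example project: `var.nat_ami_id` and `var.public_subnet_id` are hypothetical inputs, and the public subnet is assumed to already have a route to an Internet Gateway.

```hcl
# Hypothetical sketch: send the private subnet's internet-bound traffic
# through a self-managed NAT instance.
resource aws_instance nat {
  ami           = var.nat_ami_id       # assumption: a NAT-capable AMI of your choice
  instance_type = "t3.micro"           # assumption: sized for light, sporadic traffic
  subnet_id     = var.public_subnet_id # assumption: a subnet routed to an Internet Gateway

  # A NAT instance forwards traffic that is not addressed to itself,
  # so the source/destination check must be turned off.
  source_dest_check = false
}

resource aws_route private_via_nat_instance {
  route_table_id         = aws_route_table.private_0.id
  destination_cidr_block = "0.0.0.0/0"
  network_interface_id   = aws_instance.nat.primary_network_interface_id
}
```

Even in this small form the instance has to be patched, sized, and monitored, which is the overhead described above.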
Here is what a NAT Instance network configuration looks like:
2) NAT Gateway (Better)
The better option is to leverage the NAT Gateway service. Imagine if you let AWS operate a NAT instance super-cluster for you, with the additional benefits of lower operating cost, easier setup, and higher network throughput. The downside is that you cannot attach Security Groups to it. Per AWS, this is the recommended solution for NAT requirements going forward.
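A minimal sketch of the NAT Gateway equivalent might look like the following. Again, this is not from the example project: `var.public_subnet_id` is a hypothetical input for a subnet that already routes to an Internet Gateway.

```hcl
# Hypothetical sketch: a managed NAT Gateway for the private subnet.
resource aws_eip nat {
  # On AWS provider versions before 5.0, use `vpc = true` instead.
  domain = "vpc"
}

resource aws_nat_gateway this {
  allocation_id = aws_eip.nat.id
  subnet_id     = var.public_subnet_id # assumption: a public subnet, not the private one
}

resource aws_route private_via_nat_gateway {
  route_table_id         = aws_route_table.private_0.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.this.id
}
```

There is no instance to manage here, but the VPC still needs a public subnet, an Internet Gateway, and an Elastic IP.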
Here is what a NAT Gateway network configuration looks like:
3) Service Endpoint (Best)
The new kid on the block, Service Endpoints let you reach supported services from within a private subnet, with major benefits over NAT implementations. Imagine if you connected a network cable from your private subnet directly to the publicly routed resource. AWS does this by attaching an Elastic Network Interface (ENI) to the private subnet. The ENI even takes up an IP address in the CIDR range of the private subnet.
This solution has three big benefits:
- Traffic stays on the AWS network, never traversing the public internet, which makes it faster, cheaper, and more secure.
- Similar to a NAT instance, Service Endpoints can have Security Groups applied to them.
- The infrastructure needed to operate and manage a Service Endpoint is minimal, which saves time, money, and operational effort.
This is the solution I wanted! Service Endpoints check all the requirement boxes I had.
Side Note: Interface-type Service Endpoints are AWS's implementation of the PrivateLink feature. Gateway-type Service Endpoints are only available for S3 and DynamoDB. The Terraform configuration is minimally different between the two.
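To make the difference between the two types concrete, here is a sketch of a Gateway-type endpoint for S3. It is an illustration only and is not used in this project. A Gateway endpoint is attached to route tables through `route_table_ids`, whereas the Interface type is placed in subnets and takes Security Groups.

```hcl
# Hypothetical sketch: a Gateway-type endpoint for S3.
resource aws_vpc_endpoint s3 {
  service_name      = join(".", ["com.amazonaws", var.region, "s3"])
  vpc_endpoint_type = "Gateway"
  vpc_id            = aws_vpc.this.id

  # Gateway types get this instead of subnet_ids and security_group_ids.
  route_table_ids = [
    aws_route_table.private_0.id
  ]
}
```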
Here is what a Service Endpoint network configuration looks like:
Let's Terraform This Bad Boy!
VPC
Leveraging Terraform (0.12.24 at time of writing), I configured a basic VPC, a single AZ with a private subnet, and a Security Group that is wide open to TCP traffic within the VPC's CIDR range. Very basic networking here; nothing special, just the core building blocks of any VPC. Note that the VPC has neither NAT resources nor an Internet Gateway.
# Networking
## VPC
resource aws_vpc this {
  assign_generated_ipv6_cidr_block = false
  cidr_block                       = var.vpc_private_cidr
  enable_dns_hostnames             = true
  enable_dns_support               = true

  tags = merge(
    {
      Name = join(var.delimiter, [var.name, var.stage, "vpc", random_string.this.result])
      Tech = "VPC"
      Srv  = "VPC"
    },
    var.tags
  )
}

## Route Table <-> Subnet associations
resource aws_route_table_association private_0 {
  subnet_id      = aws_subnet.private_0.id
  route_table_id = aws_route_table.private_0.id
}

## Route Tables
resource aws_route_table private_0 {
  vpc_id = aws_vpc.this.id

  depends_on = [
    aws_vpc.this
  ]

  tags = merge(
    {
      Name = join(var.delimiter, [var.name, var.stage, "private-route", random_string.this.result])
      Tech = "Route"
      Srv  = "VPC"
    },
    var.tags
  )
}

## Subnets
resource aws_subnet private_0 {
  availability_zone               = var.availability_zone[0]
  vpc_id                          = aws_vpc.this.id
  cidr_block                      = var.vpc_private_cidr
  assign_ipv6_address_on_creation = false

  depends_on = [
    aws_vpc.this
  ]

  tags = merge(
    {
      Name = join(var.delimiter, [var.name, var.stage, "subnet-a", random_string.this.result])
      Tech = "Subnet"
      Srv  = "VPC"
      Note = "Private"
    },
    var.tags
  )
}

## Security Groups
resource aws_security_group private_lambda_0 {
  description = "Private Lambda SG"
  name        = join(var.delimiter, [var.name, var.stage, "private-subnet-lambda-0", random_string.this.id])
  vpc_id      = aws_vpc.this.id

  ingress {
    from_port = 0
    to_port   = 65535
    protocol  = "tcp"
    cidr_blocks = [
      var.vpc_private_cidr
    ]
  }

  egress {
    from_port = 0
    to_port   = 65535
    protocol  = "tcp"
    cidr_blocks = [
      var.vpc_private_cidr
    ]
  }

  tags = merge(
    {
      Name = join(var.delimiter, [var.name, var.stage, "private-subnet-lambda-0", random_string.this.id])
      Tech = "Security Group"
      Srv  = "EC2"
    },
    var.tags
  )
}
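The listing above refers to several variables and a `random_string` resource that are not shown in this post. A minimal sketch of what they might look like follows. The types match how the values are used above, but every default is a placeholder of mine, not a value taken from the example project.

```hcl
# Assumed declarations for the inputs used above. All defaults are placeholders.
variable name {
  type    = string
  default = "private-lambda-demo"
}

variable stage {
  type    = string
  default = "dev"
}

variable delimiter {
  type    = string
  default = "-"
}

variable region {
  type    = string
  default = "us-east-1"
}

variable availability_zone {
  type    = list(string)
  default = ["us-east-1a"]
}

variable vpc_private_cidr {
  type    = string
  default = "10.0.0.0/24"
}

variable tags {
  type    = map(string)
  default = {}
}

# Short random suffix so resource names do not collide between deployments.
resource random_string this {
  length  = 6
  special = false
  upper   = false
}
```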
SQS Queue
The first resource I needed to create after the base VPC resources was the SQS queue. Like many other services offered by AWS, queues are reached through a publicly routable FQDN. Leverage proper security and IAM configuration! Apply the principle of least privilege to secure _all_ your resources. Remember: security first.
resource aws_sqs_queue dead_letter_queue {
  name = join(var.delimiter, [var.name, var.stage, "sqs-dead-letter", var.random_string.id])

  tags = merge(
    {
      Name = join(var.delimiter, [var.name, var.stage, "sqs-dead-letter", var.random_string.id])
      Tech = "SQS"
      Srv  = "SQS"
    },
    var.tags
  )
}

resource aws_sqs_queue this {
  name = join(var.delimiter, [var.name, var.stage, "sqs", var.random_string.id])

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.dead_letter_queue.arn
    maxReceiveCount     = 4
  })

  tags = merge(
    {
      Name = join(var.delimiter, [var.name, var.stage, "sqs", var.random_string.id])
      Tech = "SQS"
      Srv  = "SQS"
    },
    var.tags
  )
}
Private Lambda
Next I created a Lambda function, assigning it to the private subnet and the security group inside the VPC. The Lambda code is Python, so I used Boto3 to make the HTTPS request that places the message in the queue. This will not work yet, since the Service Endpoint has not been created.
## data
data archive_file this {
  type        = "zip"
  source_dir  = "${path.module}/src"
  output_path = "${path.module}/file.zip"
}

## resources
resource aws_lambda_function this {
  filename         = data.archive_file.this.output_path
  function_name    = join("-", [var.stage, var.name, "private-lambda", var.random_string.id])
  handler          = "index.lambda_handler"
  role             = aws_iam_role.this.arn
  runtime          = "python3.7"
  source_code_hash = data.archive_file.this.output_base64sha256

  # NOTE Need to pass the REGION and QUEUE_ARN to enable Boto3 to find the correct queue
  environment {
    variables = {
      AWS_ACCT_ID = var.aws_acct_id
      QUEUE_ARN   = var.aws_sqs_queue.arn
      REGION      = var.region
    }
  }

  # NOTE This places the Lambda inside a VPC into the subnet of choice
  vpc_config {
    security_group_ids = var.security_group_ids
    subnet_ids         = var.subnet_ids
  }

  tags = merge(
    {
      Name = join(var.delimiter, [var.name, var.stage, "private-lambda", var.random_string.id])
      Tech = "Python_3_7"
      Srv  = "Lambda"
    },
    var.tags
  )
}

## IAM role, policies, and attachments
resource aws_iam_policy this {
  name   = join(var.delimiter, [var.name, var.stage, "private-lambda-policy", var.random_string.id])
  path   = "/"
  policy = file("${path.module}/iam/policy.json")
}

resource aws_iam_role this {
  assume_role_policy = file("${path.module}/iam/role.json")
  name               = join(var.delimiter, [var.name, var.stage, "private-lambda-role", var.random_string.id])
}

resource aws_iam_role_policy_attachment this {
  role       = aws_iam_role.this.name
  policy_arn = aws_iam_policy.this.arn
}
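The contents of `iam/role.json` and `iam/policy.json` are not shown in this post. As a sketch of what least privilege could look like for the private Lambda, the same documents can be expressed with `aws_iam_policy_document` data sources. The data source names and statement IDs are mine, and the real project's JSON files may differ.

```hcl
# Trust policy: lets the Lambda service assume the role.
data aws_iam_policy_document assume_role {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["lambda.amazonaws.com"]
    }
  }
}

# Permissions: send to the one queue, manage the VPC ENIs, write logs.
data aws_iam_policy_document private_lambda {
  statement {
    sid       = "SendToOneQueue"
    actions   = ["sqs:SendMessage"]
    resources = [var.aws_sqs_queue.arn]
  }

  # A Lambda attached to a VPC needs these to create and remove its ENIs.
  statement {
    sid = "ManageVpcNetworkInterfaces"
    actions = [
      "ec2:CreateNetworkInterface",
      "ec2:DescribeNetworkInterfaces",
      "ec2:DeleteNetworkInterface",
    ]
    resources = ["*"]
  }

  statement {
    sid = "WriteLogs"
    actions = [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:PutLogEvents",
    ]
    resources = ["arn:aws:logs:*:*:*"]
  }
}
```

With these in place, `policy = data.aws_iam_policy_document.private_lambda.json` and `assume_role_policy = data.aws_iam_policy_document.assume_role.json` could stand in for the two `file(...)` calls. If the handler looks the queue URL up by name, it would also need `sqs:GetQueueUrl`.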
Public Lambda
The second Lambda I made will consume the SQS queue. Notice that the configuration does not include a VPC or subnet configuration. This means the Lambda runs outside my VPC, on the public side of my account.
## data
data archive_file this {
  type        = "zip"
  source_dir  = "${path.module}/src"
  output_path = "${path.module}/file.zip"
}

## resources
resource aws_lambda_function this {
  filename         = data.archive_file.this.output_path
  function_name    = join("-", [var.stage, var.name, "public-lambda", var.random_string.id])
  handler          = "index.lambda_handler"
  role             = aws_iam_role.this.arn
  runtime          = "python3.7"
  source_code_hash = data.archive_file.this.output_base64sha256

  tags = merge(
    {
      Name = join(var.delimiter, [var.name, var.stage, "public-lambda", var.random_string.id])
      Tech = "Python_3_7"
      Srv  = "Lambda"
    },
    var.tags
  )
}

## IAM role, policies, and attachments
resource aws_iam_policy this {
  name   = join(var.delimiter, [var.name, var.stage, "public-lambda-policy", var.random_string.id])
  path   = "/"
  policy = file("${path.module}/iam/policy.json")
}

resource aws_iam_role this {
  assume_role_policy = file("${path.module}/iam/role.json")
  name               = join(var.delimiter, [var.name, var.stage, "public-lambda-role", var.random_string.id])
}

resource aws_iam_role_policy_attachment this {
  role       = aws_iam_role.this.name
  policy_arn = aws_iam_policy.this.arn
}

## Subscription to SQS queue
resource "aws_lambda_event_source_mapping" "example" {
  event_source_arn = var.aws_sqs_queue.arn
  function_name    = aws_lambda_function.this.arn
}
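For the event source mapping to poll the queue, the public Lambda's role needs read access to it. The project's `iam/policy.json` is not shown here, so the following is only a sketch of a least-privilege version. The data source name and statement IDs are mine.

```hcl
# Permissions the SQS event source mapping relies on, plus logging.
data aws_iam_policy_document public_lambda {
  statement {
    sid = "ConsumeOneQueue"
    actions = [
      "sqs:ReceiveMessage",
      "sqs:DeleteMessage",
      "sqs:GetQueueAttributes",
    ]
    resources = [var.aws_sqs_queue.arn]
  }

  statement {
    sid = "WriteLogs"
    actions = [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:PutLogEvents",
    ]
    resources = ["arn:aws:logs:*:*:*"]
  }
}
```

Because this Lambda is not attached to a VPC, it does not need the `ec2:*NetworkInterface*` permissions that a VPC-attached Lambda requires.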
Service Endpoint
Here's the magic sauce! This Terraform resource connects the SQS service to my VPC's private subnet via an ENI. Now the VPC can route the private Lambda's outbound HTTPS request to the SQS service, even though the private Lambda has no apparent route to public services.
resource aws_vpc_endpoint sqs {
  private_dns_enabled = true
  service_name        = join(".", ["com.amazonaws", var.region, "sqs"])
  vpc_endpoint_type   = "Interface"
  vpc_id              = aws_vpc.this.id

  security_group_ids = [
    aws_security_group.private_lambda_0.id
  ]

  # Interface types get this. It connects the Endpoint to a subnet
  subnet_ids = [
    aws_subnet.private_0.id
  ]

  tags = merge(
    {
      Name = join(var.delimiter, [var.name, var.stage, "service-endpoint-for-sqs", random_string.this.id])
      Tech = "Service Endpoint"
      Srv  = "VPC"
    },
    var.tags
  )
}

# NOTE subnet_ids above already associates the Endpoint with the subnet, so a
# separate aws_vpc_endpoint_subnet_association for the same subnet is left out.
# Declaring both for one subnet makes the two resources conflict.
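The endpoint above reuses the Lambda's own Security Group, which allows all TCP traffic within the VPC CIDR. As a tighter alternative, here is a sketch of a dedicated Security Group for the endpoint that only accepts HTTPS from the private Lambda's group. The resource name and the `name` value are hypothetical and not part of the example project.

```hcl
# Hypothetical sketch: only the private Lambda may reach the endpoint, on 443.
resource aws_security_group sqs_endpoint {
  description = "SQS Service Endpoint SG"
  name        = join(var.delimiter, [var.name, var.stage, "sqs-endpoint", random_string.this.id])
  vpc_id      = aws_vpc.this.id

  ingress {
    description     = "HTTPS from the private Lambda"
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [aws_security_group.private_lambda_0.id]
  }
}
```

To use it, `security_group_ids` on the endpoint would point at `aws_security_group.sqs_endpoint.id` instead of the Lambda's group.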
Demo / Proof
I executed the private Lambda with a test payload. Watching the logs, I can see that the private Lambda executes successfully. Checking the public Lambda, I also see the payload from the private Lambda. It works!
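As an optional extra check that is not part of the original project, a couple of outputs can show what the endpoint actually created inside the VPC. The output names are mine.

```hcl
# DNS names AWS created for the endpoint.
output sqs_endpoint_dns_names {
  value = aws_vpc_endpoint.sqs.dns_entry[*].dns_name
}

# The ENI(s) the endpoint placed in the private subnet.
output sqs_endpoint_network_interface_ids {
  value = aws_vpc_endpoint.sqs.network_interface_ids
}
```

After `terraform apply`, running `terraform output` prints both values, which makes it easy to find the ENI and confirm its IP address falls inside the private subnet's CIDR range.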
Conclusion
While it may seem a little weird at first, Service Endpoints are a great way to attach supported AWS services to a VPC's private subnet(s). They are secure, fast, cheap, and best of all easy to manage.
Have you used Service Endpoints before? Do you have questions? Let's talk in the comments below.
Resources
- Here is the example Terraform project on GitHub.