How to: AWS Service Endpoints via Terraform for fun and profit

David J Eddy - Apr 1 '20 - - Dev Community

Originally posted on my blog.

Recently I found myself designing a system that had AWS Lambda functions inside a private VPC. But I needed to pass a payload from the output of the Lambda function to an AWS service that had to be publicly routable (specifically to SQS). I found there are really only three options to solve this situation:

The Options:

1) NAT Instance (Good)

This solution involves operating a compute instance to act as a network address translator (NAT) resource. When resources inside the private subnet needs to access a public DNS the traffic is routed through the NAT instance. This has the obvious disadvantage of needing to run a compute instance and being limited to the hardware related to it. This adds management and cost overhead that I really did not want to deal with. While the AMI Marketplace has pre-configured images available I still did not want to manage additional hardware for one Lambda that is invoked sporadically.

Here is what a NAT Instance network configuration looks like:

<br>
        NAT instance setup<br>
Credit: AWS

2) NAT Gateway

The better option is to leverage the NAT Gateway service. Imagine if you let AWS operate a NAT instance super-cluster with the additional benefit of lower cost to operate, easier setup, and higher network through put. The down side is the inability to use Security Groups with it This is the recommended solution for current NAT requirements going forward per AWS.

Here is what a NAT Gateway network configuration looks like:

<br>
          A VPC with public and private subnets and a NAT gateway<br>
Credit: AWS

3) Service Endpoint (Best)

The new kid on the block, Service Endpoints enable the ability to access supported services from within a private subnet with major benefits over NAT implementations. Imagine if you connected a network cable from your private subnet directly to the publicly routed resource. AWS does this via the an Elastic Network Interface (ENI) resource to the private subnet. The ENI even takes up an IP address in the CIDR range of the private subnet.

This solution have three big benefits:

  1. Traffic stays inside your VPC, never traversing the public internet. Thus faster, cheaper, and more secure.
  2. Similar to a NAT instance, Service Endpoints can have Security Groups applied to them.
  3. The infrastructure to operate and manage a Service Endpoint is incredibly minimal. Saving time, money, and operational effort.

This is the solution I wanted! Service Endpoints checks all the requirement boxes I had.

*Side Note: Service Endpoint Interfaces are an AWS service implementations of the Private Link feature. Service Endpoint Gateways are only available for S3 and DynamoDB. The Terraform configuration is minimally different between the two.

Here is what a Service Endpoint network configuration looks like:

Amazon VPC Endpoints for Amazon S3 - AWS Glue
Credit: AWS

Lets Terraform This Bad Boy!

VPC

Leveraging Terraform (0.12.24 at time of writing) I configured a basic VPC, a single AZ with a private subnet, and a wide open Security Group. Very basic networking here; nothing special, the core building blocks of any VPC. Note the VPC does not have any NAT resources nor an Internet Gateway.

# Networking

## VPC

resource aws_vpc this {
  assign_generated_ipv6_cidr_block = false
  cidr_block                       = var.vpc_private_cidr
  enable_dns_hostnames             = true
  enable_dns_support               = true

  tags = merge(
    {
      Name = join(var.delimiter, [var.name, var.stage, "vpc", random_string.this.result])
      Tech = "VPC"
      Srv  = "VPC"
    },
    var.tags
  )
}

## Route Table <-> Subnet associations

resource aws_route_table_association private_0 {
  subnet_id      = aws_subnet.private_0.id
  route_table_id = aws_route_table.private_0.id
}

## Route Tables

resource aws_route_table private_0 {
  vpc_id = aws_vpc.this.id

  depends_on = [
    aws_vpc.this
  ]

  tags = merge(
    {
      Name = join(var.delimiter, [var.name, var.stage, "private-route", random_string.this.result])
      Tech = "Route"
      Srv  = "VPC"
    },
    var.tags
  )
}

## Subnets

resource aws_subnet private_0 {
  availability_zone               = var.availability_zone[0]
  vpc_id                          = aws_vpc.this.id
  cidr_block                      = var.vpc_private_cidr
  assign_ipv6_address_on_creation = false

  depends_on = [
    aws_vpc.this
  ]

  tags = merge(
    {
      Name = join(var.delimiter, [var.name, var.stage, "subnet-a", random_string.this.result])
      Tech = "Subnet"
      Srv  = "VPC"
      Note = "Private"
    },
    var.tags
  )
}

 
resource aws_security_group private_lambda_0 {
  description = "Private Lambda SG"
  name        = join(var.delimiter, [var.name, var.stage, "private-subnet-lambda-0", random_string.this.id])
  vpc_id      = aws_vpc.this.id

  ingress {
    from_port   = 0
    to_port     = 65535
    protocol    = "tcp"
    cidr_blocks = [
      var.vpc_private_cidr
    ]
  }

  egress {
    from_port   = 0
    to_port     = 65535
    protocol    = "tcp"
    cidr_blocks = [
      var.vpc_private_cidr
    ]
  }

  tags = merge(
    {
      Name = join(var.delimiter, [var.name, var.stage, "private-subnet-lambda-0", random_string.this.id])
      Tech = "Security Group"
      Srv  = "EC2"
    },
    var.tags
  )
}

SQS Queue

The first resource after the base VPC resources I needed to create was the SQS queue. Like many other services offered by AWS the queues has a routable FQDNs. Leverage proper security and IAM configuration! Principle of least privilege to secure _all_ your resources. Remember: security first.

resource aws_sqs_queue dead_letter_queue {
  name = join(var.delimiter, [var.name, var.stage, "sqs-dead-letter", var.random_string.id])

  tags = merge(
    {
      Name = join(var.delimiter, [var.name, var.stage, "sqs-dead-letter", var.random_string.id])
      Tech = "SQS"
      Srv  = "SQS"
    },
    var.tags
  )
}

resource aws_sqs_queue this {
  name                      = join(var.delimiter, [var.name, var.stage, "sqs", var.random_string.id])

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.dead_letter_queue.arn
    maxReceiveCount     = 4
  })

  tags = merge(
    {
      Name = join(var.delimiter, [var.name, var.stage, "sqs", var.random_string.id])
      Tech = "SQS"
      Srv  = "SQS"
    },
    var.tags
  )
}

Private Lambda

Next I created a Lambda function; assigning it to the private subnet and the security group that are contained inside the VPC. The Lambda code is Python based and as such I used Boto3 to handle creating the HTTPS request that will place the message in the queue. This will not work initially since we have not created the Service Endpoint.

## data

data archive_file this {
  type        = "zip"
  source_dir  = "${path.module}/src"
  output_path = "${path.module}/file.zip"
}

## resources

resource aws_lambda_function this {
  filename         = data.archive_file.this.output_path
  function_name    = join("-", [var.stage, var.name, "private-lambda", var.random_string.id])
  handler          = "index.lambda_handler"
  role             = aws_iam_role.this.arn
  runtime          = "python3.7"
  source_code_hash = data.archive_file.this.output_base64sha256

  # NOTE Need to pass the REGION and QUEUE_ARN to enable Boto3 to find the correct queue
  environment {
    variables = {
      AWS_ACCT_ID = var.aws_acct_id
      QUEUE_ARN   = var.aws_sqs_queue.arn
      REGION      = var.region
    }
  }

  # NOTE This places the Lambda inside a VPC into the subnet of choice
  vpc_config {
    security_group_ids = var.security_group_ids
    subnet_ids         = var.subnet_ids
  }

  tags = merge(
    {
      Name = join(var.delimiter, [var.name, var.stage, "private-lambda", var.random_string.id])
      Tech = "Python_3_7"
      Srv  = "Lambda"
    },
    var.tags
  )
}

## IAM role, policies, and attachments

resource aws_iam_policy this {
  name   = join(var.delimiter, [var.name, var.stage, "private-lambda-policy", var.random_string.id])
  path   = "/"
  policy = file("${path.module}/iam/policy.json")
}

resource aws_iam_role this {
  assume_role_policy = file("${path.module}/iam/role.json")
  name               = join(var.delimiter, [var.name, var.stage, "private-lambda-role", var.random_string.id])
}

resource aws_iam_role_policy_attachment this {
  role       = aws_iam_role.this.name
  policy_arn = aws_iam_policy.this.arn
}

Public Lambda

The second Lambda I made will consume the SQS queue. Notice the configuration does not include a VPC or Subnet configuration?This means the Lambda will be public within my account.

## data

data archive_file this {
  type        = "zip"
  source_dir  = "${path.module}/src"
  output_path = "${path.module}/file.zip"
}

## resources

resource aws_lambda_function this {
  filename         = data.archive_file.this.output_path
  function_name    = join("-", [var.stage, var.name, "public-lambda", var.random_string.id])
  handler          = "index.lambda_handler"
  role             = aws_iam_role.this.arn
  runtime          = "python3.7"
  source_code_hash = data.archive_file.this.output_base64sha256

  tags = merge(
    {
      Name = join(var.delimiter, [var.name, var.stage, "public-lambda", var.random_string.id])
      Tech = "Python_3_7"
      Srv  = "Lambda"
    },
    var.tags
  )
}

## IAM role, policies, and attachments

resource aws_iam_policy this {
  name   = join(var.delimiter, [var.name, var.stage, "public-lambda-policy", var.random_string.id])
  path   = "/"
  policy = file("${path.module}/iam/policy.json")
}

resource aws_iam_role this {
  assume_role_policy = file("${path.module}/iam/role.json")
  name               = join(var.delimiter, [var.name, var.stage, "public-lambda-role", var.random_string.id])
}

resource aws_iam_role_policy_attachment this {
  role       = aws_iam_role.this.name
  policy_arn = aws_iam_policy.this.arn
}

## Subscription to SQS queue

resource "aws_lambda_event_source_mapping" "example" {
  event_source_arn = var.aws_sqs_queue.arn
  function_name    = aws_lambda_function.this.arn
}

Service Endpoint

Here's the magic sauce! This Terraform resources connects a SQS Queue via an ENI into my VPC's private subnet. Now the VPC will be able to route the private Lambda's outbound HTTPS request to the SQS service. Even though the private Lambda has no apparent defined route to the public services.

resource aws_vpc_endpoint sqs {
  private_dns_enabled = true
  service_name        = join(".", ["com.amazonaws", var.region, "sqs"])
  vpc_endpoint_type   = "Interface"
  vpc_id              = aws_vpc.this.id

  security_group_ids = [
    aws_security_group.private_lambda_0.id
  ]

  # Interface types get this. It connects the Endpoint to a subnet
  subnet_ids = [
    aws_subnet.private_0.id
  ] 

  tags = merge(
    {
      Name = join(var.delimiter, [var.name, var.stage, "service-endpoint-for-sqs", random_string.this.id])
      Tech = "Service Endpoint"
      Srv  = "VPC"
    },
    var.tags
  )
}

resource aws_vpc_endpoint_subnet_association sqs_assoc {
  subnet_id       = aws_subnet.private_0.id
  vpc_endpoint_id = aws_vpc_endpoint.sqs.id
}

Demo / Proof

Executing the Private Lambda with a test payload. Watching the logs I can see the private Lambda executes successfully. Checking the public Lambda I also see the payload from the private Lambda. It works!

CloudWatch logs of the private Lambda invocation output. Notice the sqs.us-west-2.amazonaws.com:443 FQDN.

CloudWatch logs output after the public Lambda process the SQS queue message.

Conclusion

While it may seem a little weird at first Service Endpoints are a great way to attach supported AWS services into a VPC's private subnet(s). It's secure, fast, cheap, and best of all easy to manage.

Have you used Service Endpoints before? Do you have questions? Lets talk in the comments below.

Resources

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .