Deploying a Kubernetes Cluster (AWS EKS) & an API Gateway secured by mTLS, with Terraform, External-DNS & Traefik - Part 1

Aurélie Vache - Apr 25 '20 - Dev Community

Once upon a time, there was a team who wanted a dedicated Kubernetes cluster for their services. The services they deploy need to be exposed through an API Gateway and protected via mTLS.

The aim of this story is simple. There are several ways to achieve it, and we will see one of them: not the perfect one, but a solution, a way to deploy a managed cluster with containers/services that are both accessible and protected.

Are you ready?

Technical solutions

Like all real projects, ours has constraints.

We need to deploy a Kubernetes cluster, that's a fact.
But where? Can we operate our own Kubernetes cluster? Do we have a dedicated team for that?
No… so we need to choose a managed Kubernetes cluster.
Can we choose our Cloud provider? No… the constraint is to install it in AWS…
OK, so let's start with EKS, the managed Kubernetes service from AWS! :-)

Several solutions exist to deploy an AWS EKS cluster: you can download and use the eksctl command line, or you can use Terraform.

Our team always manages its infrastructure and managed resources with Terraform, an Infrastructure as Code (IaC) tool.

A Terraform module already exists to deploy an AWS EKS cluster, but:

  • it's an old module, not up to date and not maintained
  • it's not Terraform 0.12 compliant

So we will define our own resources.

Several solutions also exist to access the services deployed in containers. We need a reverse proxy, deployable as a Kubernetes Ingress controller, easy to deploy and compatible with mTLS: we will choose Traefik.

We don't want to create new DNS records manually, so we will use external-dns, linked to our services, which will create the AWS Route53 records for us.

For mTLS we will create our own PKI and generate our own root CA, server and client certificates.


And for tests, we will create a test collection and use Postman.

Kubernetes cluster deployment through IaC


We know we want to deploy an AWS EKS cluster with Terraform. Concretely, that means defining a lot of resources for this EKS cluster, a lot of resources… 😅

Here is the list of AWS resources to define:

  • EKS
  • IAM roles
  • IAM roles policies
  • IAM policies attachment
  • IAM OpenID Connect Provider
  • ASG
  • IG
  • SG
  • Subnets
  • Route53

First of all, we need to initialize our code organization.

Prerequisites: install the Terraform CLI on your machine.

Create a terraform folder:



$ cd my_git_repository
$ mkdir terraform
$ cd terraform/



Now, as usual, we will create a backend.tf file:

In this file we define an S3 backend. A good practice is to store your Terraform state remotely, so we will store the state in an AWS S3 bucket:



# Backend configuration is loaded early so we can't use variables
terraform {
  required_version = ">= 0.12"

  backend "s3" {
    region  = "eu-central-1"
    bucket  = "my-tf-state-bucket"
    key     = "eks-cluster.tfstate"
    encrypt = true
  }
}


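Note that the S3 bucket must already exist before you run terraform init. If needed, you can create it beforehand, for example with the AWS CLI (a sketch; adapt the bucket name and region to your own backend configuration):


$ aws s3api create-bucket \
    --bucket my-tf-state-bucket \
    --region eu-central-1 \
    --create-bucket-configuration LocationConstraint=eu-central-1
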

Next, we need to define a provider.tf file in which we declare the AWS provider we use and, if needed, the IAM role we want to assume:



############### PROVIDERS DEFINITION ###############

provider "aws" {
  region = var.aws_region
}


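If you do need to assume a dedicated deployment role, the provider block can carry an assume_role block. Here is a minimal sketch (the role ARN below is a placeholder, not a value from this project):


provider "aws" {
  region = var.aws_region

  # Assume a dedicated deployment role (placeholder ARN, adapt to your account)
  assume_role {
    role_arn = "arn:aws:iam::123456789012:role/terraform-deployer"
  }
}
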

Ok, so now we can define our EKS cluster and an EKS cluster node group in an eks.tf file:



resource "aws_eks_cluster" "eks-scraly-cluster" {
  name     = local.cluster_name
  role_arn = aws_iam_role.eks-scraly-cluster-ServiceRole.arn

  vpc_config {
    security_group_ids = [ aws_security_group.eks-scraly-cluster-sg.id ]
    subnet_ids = concat(aws_subnet.eks-subnet-private[*].id, aws_subnet.eks-subnet-public[*].id)
    endpoint_private_access = true
    public_access_cidrs = var.eks_public_access_cidrs
  }

  # Ensure that IAM Role permissions are created before and deleted after EKS Cluster handling.
  # Otherwise, EKS will not be able to properly delete EKS managed EC2 infrastructure such as Security Groups.
  depends_on = [
    aws_iam_role_policy_attachment.eks-scraly-cluster-AmazonEKSClusterPolicy,
    aws_iam_role_policy_attachment.eks-scraly-cluster-AmazonEKSServicePolicy,
  ]

  version = var.eks_version

  tags = local.tags
}

resource "aws_eks_node_group" "eks-scraly-cluster-node-group" {
  cluster_name    = aws_eks_cluster.eks-scraly-cluster.name
  node_group_name = "eks-scraly-cluster-node-group"
  node_role_arn   = aws_iam_role.eks-scraly-cluster-worker-role.arn
  subnet_ids      = aws_subnet.eks-subnet-private[*].id

  scaling_config {
    desired_size = 1
    max_size     = 1
    min_size     = 1
  }

  # Ensure that IAM Role permissions are created before and deleted after EKS Node Group handling.
  # Otherwise, EKS will not be able to properly delete EC2 Instances and Elastic Network Interfaces.
  depends_on = [
    aws_iam_role_policy_attachment.eks-scraly-cluster-AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.eks-scraly-cluster-AmazonEKS_CNI_Policy,
    aws_iam_role_policy_attachment.eks-scraly-cluster-AmazonEC2ContainerRegistryReadOnly,
  ]
}



An EKS cluster needs a lot of IAM roles, policies and policy attachments, so let's create them in an iam.tf file:



resource "aws_iam_role" "eks-scraly-cluster-ServiceRole" {
    name               = "${var.eks_resource_prefix}-ServiceRole"
    path               = "/"
    assume_role_policy = jsonencode({
        Version = "2012-10-17"
        Statement = [{
            Action = "sts:AssumeRole"
            Effect = "Allow"
            Principal = {
                Service = [
                    "eks.amazonaws.com",
                    "eks-fargate-pods.amazonaws.com"
                ]
            }
        }]        
    })
    tags = local.tags
}


resource "aws_iam_role_policy_attachment" "eks-scraly-cluster-AmazonEKSClusterPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks-scraly-cluster-ServiceRole.name
}

resource "aws_iam_role_policy_attachment" "eks-scraly-cluster-AmazonEKSServicePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
  role       = aws_iam_role.eks-scraly-cluster-ServiceRole.name
}

resource "aws_iam_role_policy_attachment" "eks-scraly-cluster-NLBPolicy" {
  policy_arn  = aws_iam_policy.nlb_iam_policy.arn
  role        = aws_iam_role.eks-scraly-cluster-ServiceRole.name
}

resource "aws_iam_role_policy_attachment" "eks-scraly-cluster-ExternalDnsRoute53" {
  policy_arn  = aws_iam_policy.externalDNS_iam_policy.arn
  role        = aws_iam_role.eks-scraly-cluster-ServiceRole.name
}

resource "aws_iam_role" "eks-scraly-cluster-worker-role" {
  name = "${var.eks_resource_prefix}-worker-role"

  assume_role_policy = jsonencode({
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com",
      }
    }]
    Version = "2012-10-17"
  })
  tags = local.tags
}

resource "aws_iam_role_policy_attachment" "eks-scraly-cluster-AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.eks-scraly-cluster-worker-role.name
}

resource "aws_iam_role_policy_attachment" "eks-scraly-cluster-AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.eks-scraly-cluster-worker-role.name
}

resource "aws_iam_role_policy_attachment" "eks-scraly-cluster-AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.eks-scraly-cluster-worker-role.name
}

resource "aws_iam_role_policy_attachment" "eks-scraly-cluster-CertManagerRoute53" {
  policy_arn = aws_iam_policy.certmanager_route53_iam_policy.arn
  role       = aws_iam_role.eks-scraly-cluster-worker-role.name
}

### External script using the kubergrunt CLI to get the OIDC issuer thumbprint
data "external" "thumb" {
  program = [ "get_thumbprint.sh", var.aws_region ]
}

# Enabling IAM Roles for Service Accounts
resource "aws_iam_openid_connect_provider" "eks-scraly-cluster-oidc" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.external.thumb.result.thumbprint]
  url             = aws_eks_cluster.eks-scraly-cluster.identity.0.oidc.0.issuer
}
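
# Note: as an alternative to the external script, the thumbprint can also be
# read directly from the OIDC issuer certificate (a sketch, assuming the
# hashicorp/tls provider is available in your configuration):
#
#   data "tls_certificate" "eks_oidc" {
#     url = aws_eks_cluster.eks-scraly-cluster.identity.0.oidc.0.issuer
#   }
#
# and then use data.tls_certificate.eks_oidc.certificates.0.sha1_fingerprint
# in thumbprint_list instead of the script output.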

data "aws_iam_policy_document" "eks-scraly-cluster-assume-role-policy" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    effect  = "Allow"

    condition {
      test     = "StringEquals"
      variable = "${replace(aws_iam_openid_connect_provider.eks-scraly-cluster-oidc.url, "https://", "")}:aud"
      values   = ["sts.amazonaws.com"]
    }

    principals {
      identifiers = [aws_iam_openid_connect_provider.eks-scraly-cluster-oidc.arn]
      type        = "Federated"
    }
  }
}

resource "aws_iam_role" "eks-scraly-cluster-externalDns-role" {
  description = "Role used by External-DNS to manage route 53 inside the cluster"
  assume_role_policy = data.aws_iam_policy_document.eks-scraly-cluster-assume-role-policy.json
  name               = "${var.eks_resource_prefix}-ServiceRole-ExternalDns"
  tags = local.tags
}

resource "aws_iam_role_policy_attachment" "eks-scraly-cluster-externalDNS-attachment" {  
  policy_arn  = aws_iam_policy.externalDNS_iam_policy.arn
  role        = aws_iam_role.eks-scraly-cluster-externalDns-role.name
}

resource "aws_iam_policy" "certmanager_route53_iam_policy" {
  name        = "${var.eks_resource_prefix}-CertManagerRoute53Policy"
  path        = "/"
  description = "Route53 policy IAM Policy for eks"

  policy = jsonencode(
    {
      Version: "2012-10-17"
      Statement: [
        {
            Effect: "Allow"
            Action: "route53:GetChange"
            Resource: "arn:aws:route53:::change/*"
        },
        {
            Effect: "Allow"
            Action: "route53:ChangeResourceRecordSets"
            Resource: "arn:aws:route53:::hostedzone/*"
        },
        {
            Effect: "Allow"
            Action: "route53:ListHostedZonesByName"
            Resource: "*"
        }
      ]
    }
  )
}

resource "aws_iam_policy" "externalDNS_iam_policy" {
  name        = "${var.eks_resource_prefix}-externalDNSPolicy"
  path        = "/"
  description = "ExternalDNS IAM Policy for eks"

  policy = jsonencode(
    {
      Version: "2012-10-17"
      Statement: [
        {
          Effect: "Allow"
          Action: [
            "route53:ChangeResourceRecordSets"
          ]
          Resource: [
            "arn:aws:route53:::hostedzone/*"
          ]
        },
        {
          Effect: "Allow",
          Action: [
            "route53:ListHostedZones",
            "route53:ListResourceRecordSets"
          ],
          Resource: [
            "*"
          ]
        }
      ]
   }
  )
}

resource "aws_iam_policy" "nlb_iam_policy" {
  name        = "${var.eks_resource_prefix}-NLBPolicy"
  path        = "/"
  description = "NLB IAM Policy for eks"

  policy = jsonencode(
    {
      Version: "2012-10-17"
      Statement: [
          {
              Action: [
                  "elasticloadbalancing:*",
                  "ec2:CreateSecurityGroup",
                  "ec2:Describe*"
              ],
              Resource: "*",
              Effect: "Allow"
          }
      ]
    }
  )
}



We need network resources in which to deploy the EKS cluster, so let's define them in a network.tf file:



data "aws_vpc" "eks-vpc" {
  filter {
    name = "tag:Name"
    values = [ var.eks_vpc_name ]
  }
}

data "aws_nat_gateway" "nat-gateway" {
  count = length(var.nat_gateway_ids)
  id = var.nat_gateway_ids[count.index]
}

data "aws_route_table" "eks-route-table-public" {
  # There is a single route table for all public subnets
  filter {
    name = "tag:Name"
    values = [ var.route_table_name_public ]
  }
}

data "aws_route_table" "eks-route-table-private" {
  count = length(var.route_table_name_private)
  filter {
    name = "tag:Name"
    values = [ var.route_table_name_private[count.index] ]
  }
}

data "aws_internet_gateway" "internet-gateway" {  
  filter {
    name = "tag:Name"
    values = [ var.internet_gateway_name ]
  }
}


resource "aws_route_table_association" "rt-association-private" {
  count           = length(aws_subnet.eks-subnet-private)
  subnet_id       = aws_subnet.eks-subnet-private[count.index].id
  route_table_id  = data.aws_route_table.eks-route-table-private[count.index].id
}

resource "aws_route_table_association" "rt-association-public" {
  count           = length(aws_subnet.eks-subnet-public)
  subnet_id       = aws_subnet.eks-subnet-public[count.index].id
  route_table_id  = data.aws_route_table.eks-route-table-public.id
}

resource "aws_security_group" "eks-scraly-cluster-sg" {
  name        = "eks-scraly-cluster-sg"
  description = "Cluster communication with worker nodes"
  vpc_id      = data.aws_vpc.eks-vpc.id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = local.tags  
}

################
# Public subnet
################
resource "aws_subnet" "eks-subnet-public" {  
  count = length(var.subnet_public_cidr)
  vpc_id                          = data.aws_vpc.eks-vpc.id
  cidr_block                      = var.subnet_public_cidr[count.index]
  availability_zone               = length(regexall("^[a-z]{2}-", element(local.azs, count.index))) > 0 ? element(local.azs, count.index) : null
  availability_zone_id            = length(regexall("^[a-z]{2}-", element(local.azs, count.index))) == 0 ? element(local.azs, count.index) : null
  map_public_ip_on_launch         = true

  tags = merge(
    {
      "Name" = format(
        "%s-public-%s",
        var.eks_subnet_prefix,
        element(local.azs, count.index),
      ),
      "kubernetes.io/cluster/${local.cluster_name}" = "shared"
    },
    local.tags,
    local.public_subnet_tags
  )
}

#################
# Private subnet
#################
resource "aws_subnet" "eks-subnet-private" {  
  count = length(var.subnet_private_cidr)
  vpc_id                          = data.aws_vpc.eks-vpc.id
  cidr_block                      = var.subnet_private_cidr[count.index]

  availability_zone               = length(regexall("^[a-z]{2}-", element(local.azs, count.index))) > 0 ? element(local.azs, count.index) : null
  availability_zone_id            = length(regexall("^[a-z]{2}-", element(local.azs, count.index))) == 0 ? element(local.azs, count.index) : null

  map_public_ip_on_launch         = false

  tags = merge(
    {
      "Name" = format(
        "%s-private-%s",
        var.eks_subnet_prefix,
        element(local.azs, count.index),
      ),
      "kubernetes.io/cluster/${local.cluster_name}" = "shared"
    },
    local.tags,
    local.private_subnet_tags
  )
}



Now we will create a locals.tf file, in which we use a little trick to generate the future kubeconfig file. Yes, the file we need in order to access our future Kubernetes cluster :-).



# Get the Availability Zones to automatically set default subnets number (1 in each AZ)
data "aws_availability_zones" "az_list" {}

locals {
    azs = data.aws_availability_zones.az_list.names

    tags = {
        Application = var.tag_application
        Contact     = var.tag_contact
        Tool        = "Terraform"
    }

    cluster_name = var.eks_cluster_name  

    public_subnet_tags = {
        Description = "Public subnet for ${var.tag_application} environment"
        Usage       = "Public"
        Type        = "Public"
        "kubernetes.io/role/elb" = "1"
    }

    private_subnet_tags = {
        Description = "Private subnet for ${var.tag_application} environment"
        Usage       = "Private"
        Type        = "Private"        
    }

    kubeconfig = <<KUBECONFIG


apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: ${aws_eks_cluster.eks-scraly-cluster.certificate_authority.0.data}
    server: ${aws_eks_cluster.eks-scraly-cluster.endpoint}
  name: ${var.eks_cluster_name}
contexts:
- context:
    cluster: ${var.eks_cluster_name}
    namespace: kube-system
    user: ${var.eks_cluster_name}
  name: ${var.eks_cluster_name}
current-context: ${var.eks_cluster_name}
kind: Config
preferences: {}
users:
- name: ${var.eks_cluster_name}
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      args:
      - token
      - -i
      - eks-scraly-cluster
      command: aws-iam-authenticator
      env: null
    KUBECONFIG
}



We now need to output some useful information (like the kubeconfig file content), so let's define it in an outputs.tf file:



output "endpoint" {
    value = aws_eks_cluster.eks-scraly-cluster.endpoint
}

output "kubeconfig" {
  value = local.kubeconfig
}

output "arn_external_dns_role" {
  value = aws_iam_role.eks-scraly-cluster-externalDns-role.arn
}



If you want to create a new Route53 hosted zone, you can create a route53.tf file, but personally I will use an existing one.
The cluster's services will be accessible through *.scraly.com Route53 records.
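
For reference, looking up the existing hosted zone can be done with a data source like this (a minimal sketch; the zone name is an example, and the data source name matches the one you will see in the plan output below):


data "aws_route53_zone" "scraly-hosted-zone" {
  name         = "scraly.com."
  private_zone = false
}
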

We use some input variables in our resources, so let's create a variables.tf file:



variable "region" {
  default = "eu-central-1"
}

variable "vpc_cidr" {
  description = "CIDR of the VPC to create. Example : 10.1.0.0/22"
  default = "10.xx.x.x/16"
}

variable "subnet_public_cidr" {
  description = "For public ips"
  type        = list(string)
  default = ["10.xx.x.x/24","10.xx.x.x/24","10.xx.x.x/24"]
}

variable "subnet_private_cidr" {
  description = "For private ips"
  type        = list(string)
  default = ["10.xx.x.x/24","10.xx.x.x/24","10.xx.x.x/24"]
}

variable "eks_vpc_id" {
  default = "vpc-123456"
}

variable "eks_vpc_name" {
  default = "our_existing_vpc"
}

variable "eks_subnet_prefix" {
  default = "eks-scraly"
}

variable "nat_gateway_ids" {
  description = "The NAT gateway to be used by the EKS worker to reach the internet"
  default = [
    "nat-lalala",
    "nat-scraly"
    ]
}

variable "internet_gateway_name" {
  default = "ig_scraly
}

variable "route_table_name_public" {
  default = "scraly
}

variable "route_table_name_private" {
  default = [
    "scraly-private-eu-central-1a",
    "scraly-private-eu-central-1b",
    "scraly-private-eu-central-1c"
  ]
}

# -------------------------------------------
# EKS
# -------------------------------------------

variable "eks_cluster_name" {
  default = "eks-scraly-cluster"
}

variable "eks_version" {
  default = "1.15" # Kubernetes v1.15
}

variable "eks_resource_prefix" {
  default = "eks-scraly-cluster"
}

variable "eks_public_access_cidrs" {
  description = "List of CIDR blocks which can access the EKS public API server endpoint when enabled"
  default = [
    "xx.xx.xx.xx/32",
    "xx.xx.xx.xx/32"
    ]
}

# -------------------------------------------
# Tags
# -------------------------------------------

variable "tag_application" {
  default = "eks-scraly-cluster"
}

variable "tag_contact" {
  default = "scraly@mail.com"
}


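Rather than editing the defaults directly, you can also override these variables in a terraform.tfvars file. A minimal sketch with example values (adapt them to your own account and network):


aws_region              = "eu-central-1"
eks_cluster_name        = "eks-scraly-cluster"
eks_vpc_name            = "our_existing_vpc"
eks_version             = "1.15"
eks_public_access_cidrs = ["203.0.113.10/32"]
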

Our resources are defined, so now the only thing left to do is to deploy our infrastructure with the following commands:



$ cd "terraform/"

$ terraform init -reconfigure

# Apply
$ terraform apply -input=false -auto-approve

data.external.thumb: Refreshing state...
data.aws_caller_identity.current: Refreshing state...
data.aws_route_table.eks-route-table-public: Refreshing state...
data.aws_availability_zones.az_list: Refreshing state...
data.aws_internet_gateway.internet-gateway: Refreshing state...
data.aws_route_table.eks-route-table-private[1]: Refreshing state...
data.aws_route_table.eks-route-table-private[2]: Refreshing state...
data.aws_route_table.eks-route-table-private[0]: Refreshing state...
data.aws_route53_zone.scraly-hosted-zone: Refreshing state...
data.aws_nat_gateway.nat-gateway[0]: Refreshing state...
data.aws_nat_gateway.nat-gateway[2]: Refreshing state...
data.aws_nat_gateway.nat-gateway[1]: Refreshing state...
data.aws_vpc.eks-vpc: Refreshing state...
aws_iam_policy.certmanager_route53_iam_policy: Creating...
...
aws_eks_node_group.eks-scraly-cluster-node-group: Still creating... [5m40s elapsed]
aws_eks_node_group.eks-scraly-cluster-node-group: Creation complete after 5m50s [id=eks-scraly-cluster:eks-scraly-cluster-node-group]

Apply complete! Resources: 33 added, 0 changed, 0 destroyed.

Outputs:

arn_external_dns_role = arn:aws:iam::<aws_account_id>:role/eks-scraly-cluster-ServiceRole-ExternalDns
endpoint = https://1234567891234.ab1.eu-central-1.eks.amazonaws.com
kubeconfig =
 ...



Cool, we have an EKS Kubernetes cluster deployed in our AWS account!

Let's verify it in the AWS console (or with the AWS CLI).
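
For example, a quick check could look like this (a sketch, assuming the AWS CLI, kubectl and aws-iam-authenticator are installed; the kubeconfig comes from the Terraform output):


$ aws eks list-clusters --region eu-central-1

$ terraform output kubeconfig > kubeconfig.yaml
$ export KUBECONFIG=$PWD/kubeconfig.yaml
$ kubectl get nodes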

Phew… it was not as easy as with eksctl, but with Terraform we can now deploy multiple EKS clusters with different variables, in different environments, with different Kubernetes versions, and in different VPCs/IGs/subnets…

Let's go to part 2 of this article for the EKS configuration and the deployment of Traefik, external-dns and HTTPBin secured through mTLS.

And if you are interested in Terraform, I created a cheat sheet:

https://github.com/scraly/terraform-cheat-sheet/blob/master/terraform-cheat-sheet.pdf
