Once upon a time, there was a team that wanted a dedicated Kubernetes cluster for its services. The services deployed on it need to be exposed through an API Gateway and protected with mTLS.
The goal of this story is simple. There are several ways to achieve it, and we will walk through one of them. It is not the perfect solution, but it is a working one: a way to deploy a managed cluster whose containers and services are both accessible and protected.
Are you ready?
Technical solutions
Like any real project, ours has constraints.
We need to deploy a Kubernetes cluster, that’s a fact.
But where? Can we operate our own Kubernetes cluster? Do we have a dedicated team for that?
No… so we need a managed Kubernetes cluster.
Can we choose our cloud provider? No… the constraint is to run it on AWS…
OK, so let’s start with EKS, the managed Kubernetes service from AWS! :-)
There are several ways to deploy an AWS EKS cluster: you can download and use the eksctl command-line tool, or you can use Terraform.
Our team already manages its infrastructure and managed resources with Terraform, an Infrastructure as Code (IaC) tool.
A Terraform module for deploying AWS EKS already exists, but:
- it’s an old module, neither up to date nor maintained
- it’s not Terraform 0.12 compliant
So we will define our own resources.
There are also several ways to reach the services running in our containers. We need a reverse proxy that can be deployed as a Kubernetes Ingress, is easy to deploy and is compatible with mTLS, so we will choose Traefik.
We don’t want to create new DNS names manually, so we will use external-dns, which is linked to our services and creates the AWS Route53 records for us.
For mTLS we will create our own PKI and generate our own root CA, server and client certificates.
And for the tests, we will create a test collection and run it with Postman.
Kubernetes cluster deployment through IaC
We know we want to deploy an AWS EKS cluster with Terraform. Concretely, that means deploying a lot of resources for this EKS cluster, a lot of resources… 😅
Here is the list of AWS resources to define:
- EKS
- IAM roles
- IAM roles policies
- IAM policies attachment
- IAM OpenID Connect Provider
- ASG (Auto Scaling Group)
- IG (Internet Gateway)
- SG (Security Group)
- Subnets
- Route53
First of all, we need to set up our code organization.
Prerequisite: install the Terraform CLI on your machine.
Create a terraform folder:
$ cd my_git_repository
$ mkdir terraform
$ cd terraform/
Now, as usual, we will create a backend.tf file.
In this file we define an S3 backend. It is good practice to store your Terraform state remotely, so we will store it in an AWS S3 bucket:
# Backend configuration is loaded early so we can't use variables
terraform {
  required_version = ">= 0.12"

  backend "s3" {
    region  = "eu-central-1"
    bucket  = "my-tf-state-bucket"
    key     = "eks-cluster.tfstate"
    encrypt = true
  }
}
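Because the backend block cannot read variables, one way to reuse the same code with another bucket or state key is Terraform’s partial backend configuration. This is only a sketch of how it could be organised, not something this project does: the region, bucket and key settings are removed from the backend "s3" block and moved to a separate file, here a hypothetical backend.hcl, which is passed at init time with terraform init -backend-config=backend.hcl.

```hcl
# backend.hcl (hypothetical file name, not part of the original repository)
# Values that differ per environment live here instead of in backend.tf.
region = "eu-central-1"
bucket = "my-tf-state-bucket"
key    = "eks-cluster.tfstate"
```

With this layout, a second environment would only need its own .hcl file with a different key or bucket.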
Next, we need a provider.tf file in which we define the AWS provider we use and, if needed, the IAM role we have to assume:
############### PROVIDERS DEFINITION ###############
provider "aws" {
  region = var.aws_region
}
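The provider above relies on the credentials of whoever runs Terraform. If you do need to assume an IAM role, a variant of provider.tf could look like the sketch below. The assume_role_arn variable and the session name are assumptions made for the illustration; they do not exist in the rest of this code.

```hcl
# Hypothetical variable: ARN of the IAM role Terraform should assume.
variable "assume_role_arn" {
  description = "ARN of the IAM role to assume when deploying (illustrative, not in the original code)"
  type        = string
}

provider "aws" {
  region = var.aws_region

  # Assume a dedicated deployment role instead of using the caller's credentials directly
  assume_role {
    role_arn     = var.assume_role_arn
    session_name = "terraform-eks" # arbitrary name shown in CloudTrail
  }
}
```

This block replaces the simpler provider definition; the two should not be declared side by side.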
OK, so now we can define our EKS cluster and an EKS node group in an eks.tf file:
resource "aws_eks_cluster" "eks-scraly-cluster" {
  name     = local.cluster_name
  role_arn = aws_iam_role.eks-scraly-cluster-ServiceRole.arn

  vpc_config {
    security_group_ids      = [aws_security_group.eks-scraly-cluster-sg.id]
    subnet_ids              = concat(aws_subnet.eks-subnet-private[*].id, aws_subnet.eks-subnet-public[*].id)
    endpoint_private_access = true
    public_access_cidrs     = var.eks_public_access_cidrs
  }

  # Ensure that IAM Role permissions are created before and deleted after EKS Cluster handling.
  # Otherwise, EKS will not be able to properly delete EKS managed EC2 infrastructure such as Security Groups.
  depends_on = [
    aws_iam_role_policy_attachment.eks-scraly-cluster-AmazonEKSClusterPolicy,
    aws_iam_role_policy_attachment.eks-scraly-cluster-AmazonEKSServicePolicy,
  ]

  version = var.eks_version
  tags    = local.tags
}

resource "aws_eks_node_group" "eks-scraly-cluster-node-group" {
  cluster_name    = aws_eks_cluster.eks-scraly-cluster.name
  node_group_name = "eks-scraly-cluster-node-group"
  node_role_arn   = aws_iam_role.eks-scraly-cluster-worker-role.arn
  subnet_ids      = aws_subnet.eks-subnet-private[*].id

  scaling_config {
    desired_size = 1
    max_size     = 1
    min_size     = 1
  }

  # Ensure that IAM Role permissions are created before and deleted after EKS Node Group handling.
  # Otherwise, EKS will not be able to properly delete EC2 Instances and Elastic Network Interfaces.
  depends_on = [
    aws_iam_role_policy_attachment.eks-scraly-cluster-AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.eks-scraly-cluster-AmazonEKS_CNI_Policy,
    aws_iam_role_policy_attachment.eks-scraly-cluster-AmazonEC2ContainerRegistryReadOnly,
  ]
}
An EKS cluster needs a lot of IAM roles, policies and policy attachments, so let’s create them in an iam.tf file:
resource "aws_iam_role" "eks-scraly-cluster-ServiceRole" {
  name = "${var.eks_resource_prefix}-ServiceRole"
  path = "/"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = [
          "eks.amazonaws.com",
          "eks-fargate-pods.amazonaws.com"
        ]
      }
    }]
  })

  tags = local.tags
}

resource "aws_iam_role_policy_attachment" "eks-scraly-cluster-AmazonEKSClusterPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks-scraly-cluster-ServiceRole.name
}

resource "aws_iam_role_policy_attachment" "eks-scraly-cluster-AmazonEKSServicePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
  role       = aws_iam_role.eks-scraly-cluster-ServiceRole.name
}

resource "aws_iam_role_policy_attachment" "eks-scraly-cluster-NLBPolicy" {
  policy_arn = aws_iam_policy.nlb_iam_policy.arn
  role       = aws_iam_role.eks-scraly-cluster-ServiceRole.name
}

resource "aws_iam_role_policy_attachment" "eks-scraly-cluster-ExternalDnsRoute53" {
  policy_arn = aws_iam_policy.externalDNS_iam_policy.arn
  role       = aws_iam_role.eks-scraly-cluster-ServiceRole.name
}

resource "aws_iam_role" "eks-scraly-cluster-worker-role" {
  name = "${var.eks_resource_prefix}-worker-role"

  assume_role_policy = jsonencode({
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com",
      }
    }]
    Version = "2012-10-17"
  })

  tags = local.tags
}

resource "aws_iam_role_policy_attachment" "eks-scraly-cluster-AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.eks-scraly-cluster-worker-role.name
}

resource "aws_iam_role_policy_attachment" "eks-scraly-cluster-AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.eks-scraly-cluster-worker-role.name
}

resource "aws_iam_role_policy_attachment" "eks-scraly-cluster-AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.eks-scraly-cluster-worker-role.name
}

resource "aws_iam_role_policy_attachment" "eks-scraly-cluster-CertManagerRoute53" {
  policy_arn = aws_iam_policy.certmanager_route53_iam_policy.arn
  role       = aws_iam_role.eks-scraly-cluster-worker-role.name
}

### External cli kubergrunt
data "external" "thumb" {
  program = ["get_thumbprint.sh", var.aws_region]
}

# Enabling IAM Roles for Service Accounts
resource "aws_iam_openid_connect_provider" "eks-scraly-cluster-oidc" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.external.thumb.result.thumbprint]
  url             = aws_eks_cluster.eks-scraly-cluster.identity[0].oidc[0].issuer
}

data "aws_iam_policy_document" "eks-scraly-cluster-assume-role-policy" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]
    effect  = "Allow"

    condition {
      test     = "StringEquals"
      variable = "${replace(aws_iam_openid_connect_provider.eks-scraly-cluster-oidc.url, "https://", "")}:aud"
      values   = ["sts.amazonaws.com"]
    }

    principals {
      identifiers = [aws_iam_openid_connect_provider.eks-scraly-cluster-oidc.arn]
      type        = "Federated"
    }
  }
}

resource "aws_iam_role" "eks-scraly-cluster-externalDns-role" {
  description        = "Role used by External-DNS to manage route 53 inside the cluster"
  assume_role_policy = data.aws_iam_policy_document.eks-scraly-cluster-assume-role-policy.json
  name               = "${var.eks_resource_prefix}-ServiceRole-ExternalDns"
  tags               = local.tags
}

resource "aws_iam_role_policy_attachment" "eks-scraly-cluster-externalDNS-attachment" {
  policy_arn = aws_iam_policy.externalDNS_iam_policy.arn
  role       = aws_iam_role.eks-scraly-cluster-externalDns-role.name
}

resource "aws_iam_policy" "certmanager_route53_iam_policy" {
  name        = "${var.eks_resource_prefix}-CertManagerRoute53Policy"
  path        = "/"
  description = "Route53 policy IAM Policy for eks"

  policy = jsonencode(
    {
      Version: "2012-10-17"
      Statement: [
        {
          Effect: "Allow"
          Action: "route53:GetChange"
          Resource: "arn:aws:route53:::change/*"
        },
        {
          Effect: "Allow"
          Action: "route53:ChangeResourceRecordSets"
          Resource: "arn:aws:route53:::hostedzone/*"
        },
        {
          Effect: "Allow"
          Action: "route53:ListHostedZonesByName"
          Resource: "*"
        }
      ]
    }
  )
}

resource "aws_iam_policy" "externalDNS_iam_policy" {
  name        = "${var.eks_resource_prefix}-externalDNSPolicy"
  path        = "/"
  description = "ExternalDNS IAM Policy for eks"

  policy = jsonencode(
    {
      Version: "2012-10-17"
      Statement: [
        {
          Effect: "Allow"
          Action: [
            "route53:ChangeResourceRecordSets"
          ]
          Resource: [
            "arn:aws:route53:::hostedzone/*"
          ]
        },
        {
          Effect: "Allow",
          Action: [
            "route53:ListHostedZones",
            "route53:ListResourceRecordSets"
          ],
          Resource: [
            "*"
          ]
        }
      ]
    }
  )
}

resource "aws_iam_policy" "nlb_iam_policy" {
  name        = "${var.eks_resource_prefix}-NLBPolicy"
  path        = "/"
  description = "NLB IAM Policy for eks"

  policy = jsonencode(
    {
      Version: "2012-10-17"
      Statement: [
        {
          Action: [
            "elasticloadbalancing:*",
            "ec2:CreateSecurityGroup",
            "ec2:Describe*"
          ],
          Resource: "*",
          Effect: "Allow"
        }
      ]
    }
  )
}
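A side note on the OIDC thumbprint: iam.tf gets it from an external get_thumbprint.sh script. If you would rather not depend on a script, one possible alternative is the tls_certificate data source from the hashicorp/tls provider. This is a sketch under the assumption that a tls provider version shipping this data source is available for your Terraform version; it is not what this project uses, and the name eks-oidc is made up for the example.

```hcl
# Assumption: the hashicorp/tls provider is installed in a version that
# provides the tls_certificate data source.
data "tls_certificate" "eks-oidc" {
  # Fetch the certificate chain served by the cluster's OIDC issuer
  url = aws_eks_cluster.eks-scraly-cluster.identity[0].oidc[0].issuer
}

# Same resource as in iam.tf, with the thumbprint read from the data source
resource "aws_iam_openid_connect_provider" "eks-scraly-cluster-oidc" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks-oidc.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.eks-scraly-cluster.identity[0].oidc[0].issuer
}
```

This variant replaces both the data "external" "thumb" block and the existing aws_iam_openid_connect_provider resource; keeping both definitions would produce a duplicate resource error.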
The EKS cluster has to live inside network resources, so let’s create them in a network.tf file:
data "aws_vpc" "eks-vpc" {
  filter {
    name   = "tag:Name"
    values = [var.eks_vpc_name]
  }
}

data "aws_nat_gateway" "nat-gateway" {
  count = length(var.nat_gateway_ids)
  id    = var.nat_gateway_ids[count.index]
}

data "aws_route_table" "eks-route-table-public" {
  # There is a single route table for all public subnets
  filter {
    name   = "tag:Name"
    values = [var.route_table_name_public]
  }
}

data "aws_route_table" "eks-route-table-private" {
  count = length(var.route_table_name_private)

  filter {
    name   = "tag:Name"
    values = [var.route_table_name_private[count.index]]
  }
}

data "aws_internet_gateway" "internet-gateway" {
  filter {
    name   = "tag:Name"
    values = [var.internet_gateway_name]
  }
}

resource "aws_route_table_association" "rt-association-private" {
  count          = length(aws_subnet.eks-subnet-private)
  subnet_id      = aws_subnet.eks-subnet-private[count.index].id
  route_table_id = data.aws_route_table.eks-route-table-private[count.index].id
}

resource "aws_route_table_association" "rt-association-public" {
  count          = length(aws_subnet.eks-subnet-public)
  subnet_id      = aws_subnet.eks-subnet-public[count.index].id
  route_table_id = data.aws_route_table.eks-route-table-public.id
}

resource "aws_security_group" "eks-scraly-cluster-sg" {
  name        = "eks-scraly-cluster-sg"
  description = "Cluster communication with worker nodes"
  vpc_id      = data.aws_vpc.eks-vpc.id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = local.tags
}

################
# Public subnet
################
resource "aws_subnet" "eks-subnet-public" {
  count      = length(var.subnet_public_cidr)
  vpc_id     = data.aws_vpc.eks-vpc.id
  cidr_block = var.subnet_public_cidr[count.index]

  # local.azs may hold AZ names (eu-central-1a) or AZ IDs; set whichever attribute matches
  availability_zone       = length(regexall("^[a-z]{2}-", element(local.azs, count.index))) > 0 ? element(local.azs, count.index) : null
  availability_zone_id    = length(regexall("^[a-z]{2}-", element(local.azs, count.index))) == 0 ? element(local.azs, count.index) : null
  map_public_ip_on_launch = true

  tags = merge(
    {
      "Name" = format(
        "%s-public-%s",
        var.eks_subnet_prefix,
        element(local.azs, count.index),
      ),
      "kubernetes.io/cluster/${local.cluster_name}" = "shared"
    },
    local.tags,
    local.public_subnet_tags
  )
}

#################
# Private subnet
#################
resource "aws_subnet" "eks-subnet-private" {
  # One private subnet per private CIDR block
  count      = length(var.subnet_private_cidr)
  vpc_id     = data.aws_vpc.eks-vpc.id
  cidr_block = var.subnet_private_cidr[count.index]

  availability_zone       = length(regexall("^[a-z]{2}-", element(local.azs, count.index))) > 0 ? element(local.azs, count.index) : null
  availability_zone_id    = length(regexall("^[a-z]{2}-", element(local.azs, count.index))) == 0 ? element(local.azs, count.index) : null
  map_public_ip_on_launch = false

  tags = merge(
    {
      "Name" = format(
        "%s-private-%s",
        var.eks_subnet_prefix,
        element(local.azs, count.index),
      ),
      "kubernetes.io/cluster/${local.cluster_name}" = "shared"
    },
    local.tags,
    local.private_subnet_tags
  )
}
Now we will create a locals.tf file, which contains a trick to generate the future kubeconfig file. Yes, the file we need in order to access our future Kubernetes cluster :-).
# Get the Availability Zones to automatically set default subnets number (1 in each AZ)
data "aws_availability_zones" "az_list" {}

locals {
  azs = data.aws_availability_zones.az_list.names

  tags = {
    Application = var.tag_application
    Contact     = var.tag_contact
    Tool        = "Terraform"
  }

  cluster_name = var.eks_cluster_name

  public_subnet_tags = {
    Description              = "Public subnet for ${var.tag_application} environment"
    Usage                    = "Public"
    Type                     = "Public"
    "kubernetes.io/role/elb" = "1"
  }

  private_subnet_tags = {
    Description = "Private subnet for ${var.tag_application} environment"
    Usage       = "Private"
    Type        = "Private"
  }

  kubeconfig = <<KUBECONFIG
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: ${aws_eks_cluster.eks-scraly-cluster.certificate_authority[0].data}
    server: ${aws_eks_cluster.eks-scraly-cluster.endpoint}
  name: ${var.eks_cluster_name}
contexts:
- context:
    cluster: ${var.eks_cluster_name}
    namespace: kube-system
    user: ${var.eks_cluster_name}
  name: ${var.eks_cluster_name}
current-context: ${var.eks_cluster_name}
kind: Config
preferences: {}
users:
- name: ${var.eks_cluster_name}
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      args:
      - token
      - -i
      - eks-scraly-cluster
      command: aws-iam-authenticator
      env: null
KUBECONFIG
}
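If you prefer to get the kubeconfig as a file rather than copy it from the Terraform output, one option is to let Terraform write it to disk. The following is a minimal sketch, assuming the hashicorp/local provider is acceptable in your setup; the resource name and the file name are arbitrary choices for the example.

```hcl
# Assumption: the hashicorp/local provider is downloaded by terraform init.
resource "local_file" "kubeconfig" {
  # Render the kubeconfig template defined in locals.tf
  content = local.kubeconfig

  # Written next to the Terraform code; keep this file out of Git
  filename = "${path.module}/kubeconfig_${var.eks_cluster_name}"
}
```

After an apply, the file can be passed to kubectl with the --kubeconfig flag, provided aws-iam-authenticator is installed on your machine since the generated configuration calls it.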
We now need to output some useful information (such as the kubeconfig file content), so let’s define it in an outputs.tf file:
output "endpoint" {
  value = aws_eks_cluster.eks-scraly-cluster.endpoint
}

output "kubeconfig" {
  value = local.kubeconfig
}

output "arn_external_dns_role" {
  value = aws_iam_role.eks-scraly-cluster-externalDns-role.arn
}
If you want to create a new Route53 hosted zone, you can define it in a route53.tf file, but personally I will use an existing one.
The cluster will be accessible through *.scraly.com in Route53.
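Reusing an existing hosted zone is typically done with a data source rather than a resource. The sketch below shows what such a lookup could look like; the data source name matches the scraly-hosted-zone entry visible in the apply log, but the zone name, the private_zone value and the extra output are assumptions based on the *.scraly.com domain.

```hcl
# Look up the existing hosted zone instead of creating a new one.
# Assumption: the zone is public and named scraly.com.
data "aws_route53_zone" "scraly-hosted-zone" {
  name         = "scraly.com."
  private_zone = false
}

# Hypothetical output exposing the zone ID for later steps
output "route53_zone_id" {
  value = data.aws_route53_zone.scraly-hosted-zone.zone_id
}
```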
Our resources use some input variables, so let’s create a variables.tf file:
# Referenced as var.aws_region in provider.tf and iam.tf
variable "aws_region" {
  default = "eu-central-1"
}

variable "vpc_cidr" {
  description = "CIDR of the VPC to create. Example : 10.1.0.0/22"
  default     = "10.xx.x.x/16"
}

variable "subnet_public_cidr" {
  description = "For public ips"
  type        = list(string)
  default     = ["10.xx.x.x/24", "10.xx.x.x/24", "10.xx.x.x/24"]
}

variable "subnet_private_cidr" {
  description = "For private ips"
  type        = list(string)
  default     = ["10.xx.x.x/24", "10.xx.x.x/24", "10.xx.x.x/24"]
}

variable "eks_vpc_id" {
  default = "vpc-123456"
}

variable "eks_vpc_name" {
  default = "our_existing_vpc"
}

variable "eks_subnet_prefix" {
  default = "eks-scraly"
}

variable "nat_gateway_ids" {
  description = "The NAT gateway to be used by the EKS worker to reach the internet"
  default = [
    "nat-lalala",
    "nat-scraly"
  ]
}

variable "internet_gateway_name" {
  default = "ig_scraly"
}

variable "route_table_name_public" {
  default = "scraly"
}

variable "route_table_name_private" {
  default = [
    "scraly-private-eu-central-1a",
    "scraly-private-eu-central-1b",
    "scraly-private-eu-central-1c"
  ]
}

# -------------------------------------------
# EKS
# -------------------------------------------
variable "eks_cluster_name" {
  default = "eks-scraly-cluster"
}

variable "eks_version" {
  default = "1.15" # Kubernetes v1.15
}

variable "eks_resource_prefix" {
  default = "eks-scraly-cluster"
}

variable "eks_public_access_cidrs" {
  description = "List of CIDR blocks which can access the EKS public API server endpoint when enabled"
  default = [
    "xx.xx.xx.xx/32",
    "xx.xx.xx.xx/32"
  ]
}

# -------------------------------------------
# Tags
# -------------------------------------------
variable "tag_application" {
  default = "eks-scraly-cluster"
}

variable "tag_contact" {
  default = "scraly@mail.com"
}
Our resources are now defined, so the only thing left to do is deploy our infrastructure with the following commands:
$ cd "terraform/"
$ terraform init -reconfigure
# Apply
$ terraform apply -input=false -auto-approve
data.external.thumb: Refreshing state...
data.aws_caller_identity.current: Refreshing state...
data.aws_route_table.eks-route-table-public: Refreshing state...
data.aws_availability_zones.az_list: Refreshing state...
data.aws_internet_gateway.internet-gateway: Refreshing state...
data.aws_route_table.eks-route-table-private[1]: Refreshing state...
data.aws_route_table.eks-route-table-private[2]: Refreshing state...
data.aws_route_table.eks-route-table-private[0]: Refreshing state...
data.aws_route53_zone.scraly-hosted-zone: Refreshing state...
data.aws_nat_gateway.nat-gateway[0]: Refreshing state...
data.aws_nat_gateway.nat-gateway[2]: Refreshing state...
data.aws_nat_gateway.nat-gateway[1]: Refreshing state...
data.aws_vpc.eks-vpc: Refreshing state...
aws_iam_policy.certmanager_route53_iam_policy: Creating...
...
aws_eks_node_group.eks-scraly-cluster-node-group: Still creating... [5m40s elapsed]
aws_eks_node_group.eks-scraly-cluster-node-group: Creation complete after 5m50s [id=eks-scraly-cluster:eks-scraly-cluster-node-group]
Apply complete! Resources: 33 added, 0 changed, 0 destroyed.
Outputs:
arn_external_dns_role = arn:aws:iam::<aws_account_id>:role/eks-scraly-cluster-ServiceRole-ExternalDns
endpoint = https://1234567891234.ab1.eu-central-1.eks.amazonaws.com
kubeconfig =
...
Cool, we have an EKS Kubernetes cluster deployed in our AWS account!
Let’s verify it in the AWS console (or with the AWS CLI).
Phew… it was not easier than with eksctl, but with Terraform we can now deploy multiple EKS clusters with different variables, in different environments, with different Kubernetes versions, in different network VPCs/IGs/subnets…
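To make that concrete, a second cluster could be described with a variable file that overrides the defaults of variables.tf. The file name and all values below are made up for the illustration; only the variable names come from the code above.

```hcl
# staging.tfvars (hypothetical file): overrides for a second cluster
eks_cluster_name    = "eks-scraly-staging"
eks_resource_prefix = "eks-scraly-staging"
eks_subnet_prefix   = "eks-scraly-staging"
eks_version         = "1.15"
tag_application     = "eks-scraly-staging"

# Placeholder CIDRs, to be replaced with ranges that are free in the target VPC
subnet_public_cidr  = ["10.xx.x.x/24", "10.xx.x.x/24", "10.xx.x.x/24"]
subnet_private_cidr = ["10.xx.x.x/24", "10.xx.x.x/24", "10.xx.x.x/24"]
```

Such a file is passed with terraform apply -var-file=staging.tfvars. Keep in mind that each cluster needs its own state (a different backend key or a separate workspace), and that names hard-coded in the resources, such as the security group name and the node group name, would also have to be derived from a variable before two clusters can share a VPC.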
Let’s move on to part 2 of this article for the EKS configuration and the deployment of Traefik, external-dns and HTTPBin secured through mTLS.
And if you are interested in Terraform, I created a cheat sheet:
https://github.com/scraly/terraform-cheat-sheet/blob/master/terraform-cheat-sheet.pdf