Deploying a highly available Vault cluster on Amazon EKS using Terraform

Chabane R. - Apr 15 '21 - Dev Community

Many companies moving to the cloud want to continue working with legacy tools to:

  • avoid vendor lock-in,
  • reuse existing skills and processes,
  • take advantage of a multi-cloud strategy,
  • and so on.

Among companies that have used Vault in their on-premises environment, many continue to use it after their migration to the cloud.

Vault is a tool for securely accessing secrets. A secret is anything that you want to tightly control access to, such as API keys, passwords, or certificates. Vault provides a unified interface to any secret, while providing tight access control and recording a detailed audit log. [1]

In this post, we will deploy a Vault cluster on Amazon Elastic Kubernetes Service (Amazon EKS), step by step.

Using Terraform, we will deploy:

  • A highly available architecture that spans three Availability Zones.
  • A virtual private cloud (VPC) configured with public and private subnets according to AWS best practices.
  • In the public subnets:
    • Managed network address translation (NAT) gateways to allow outbound internet access for resources in the private subnets.
  • In the private subnets:
    • A group of Kubernetes nodes.
    • An Amazon EKS cluster, which provides the Kubernetes control plane.

To deploy the Vault cluster, we create in AWS:

  • An Elastic Load Balancer for the Vault UI.
  • An AWS Certificate Manager (ACM) certificate for the Vault UI.
  • A boot-vault IAM role to bootstrap the Vault servers.
  • A vault-server IAM role for Vault to access AWS Key Management Service (AWS KMS) for auto unseal.
  • An AWS Secrets Manager secret to store the Vault root token and recovery keys.
  • An AWS KMS key for auto unseal.

In Kubernetes:

  • A dedicated node group for Vault on Amazon EKS.
  • A dedicated namespace for Vault on Amazon EKS.
  • An internal Vault TLS certificate and certificate authority for securing communications.
  • For the Vault service:
    • Vault server pods.
    • A Vault UI.

If you prefer to use AWS CloudFormation instead of Terraform, the equivalent workshop can be found in aws-quickstart.

Prerequisites

Network

In this section, we create a VPC, three private and three public subnets, three NAT gateways, and an internet gateway.

plan/vpc.tf

resource "aws_vpc" "security" {
  cidr_block           = var.vpc_cidr_block
  instance_tenancy     = "default"
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Environment = "core"
    Name        = "security"
  }

  lifecycle {
    ignore_changes = [tags]
  }
}

resource "aws_default_security_group" "default" {
    vpc_id = aws_vpc.security.id
}

plan/subnet.tf

resource "aws_subnet" "private" {
  for_each = {
    for subnet in local.private_nested_config : "${subnet.name}" => subnet
  }

  vpc_id                  = aws_vpc.security.id
  cidr_block              = each.value.cidr_block
  availability_zone       = var.az[index(local.private_nested_config, each.value)]
  map_public_ip_on_launch = false

  tags = {
    Environment                       = "security"
    Name                              = each.value.name
    "kubernetes.io/role/internal-elb" = 1
  }

  lifecycle {
    ignore_changes = [tags]
  }
}

resource "aws_subnet" "public" {
  for_each = {
    for subnet in local.public_nested_config : "${subnet.name}" => subnet
  }

  vpc_id                  = aws_vpc.security.id
  cidr_block              = each.value.cidr_block
  availability_zone       = var.az[index(local.public_nested_config, each.value)]
  map_public_ip_on_launch = true

  tags = {
    Environment              = "security"
    Name                     = each.value.name
    "kubernetes.io/role/elb" = 1
  }

  lifecycle {
    ignore_changes = [tags]
  }
}

plan/igw.tf

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.security.id

  tags = {
    Environment = "core"
    Name        = "igw-security"
  }
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.security.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }

  tags = {
    Environment = "core"
    Name        = "rt-public-security"
  }
}

resource "aws_route_table_association" "public" {
  for_each = {
    for subnet in local.public_nested_config : "${subnet.name}" => subnet
  }

  subnet_id      = aws_subnet.public[each.value.name].id
  route_table_id = aws_route_table.public.id
}

plan/nat.tf

resource "aws_eip" "nat" {
  for_each = {
    for subnet in local.public_nested_config : "${subnet.name}" => subnet
  }

  vpc = true

  tags = {
    Environment = "core"
    Name        = "eip-${each.value.name}"
  }
}

resource "aws_nat_gateway" "nat-gw" {
  for_each = {
    for subnet in local.public_nested_config : "${subnet.name}" => subnet
  }

  allocation_id = aws_eip.nat[each.value.name].id
  subnet_id     = aws_subnet.public[each.value.name].id

  tags = {
    Environment = "core"
    Name        = "nat-${each.value.name}"
  }
}

resource "aws_route_table" "private" {
  for_each = {
    for subnet in local.public_nested_config : "${subnet.name}" => subnet
  }

  vpc_id = aws_vpc.security.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat-gw[each.value.name].id
  }

  tags = {
    Environment = "core"
    Name        = "rt-${each.value.name}"
  }
}

resource "aws_route_table_association" "private" {

  for_each = {
    for subnet in local.private_nested_config : "${subnet.name}" => subnet
  }

  subnet_id      = aws_subnet.private[each.value.name].id
  route_table_id = aws_route_table.private[each.value.associated_public_subnet].id
}

Amazon EKS

In this section we create our Kubernetes cluster with the following settings:

  • restrict access to the public API endpoint to a specific IP range (for example, your office IP range) and to the NAT gateway IPs (in case you want to access Vault from a CI/CD tool hosted in this VPC)
  • enable all logs
  • enable IAM roles for service accounts
  • security groups for the cluster

plan/eks-cluster.tf

resource "aws_eks_cluster" "security" {
  name     = var.eks_cluster_name
  role_arn = aws_iam_role.eks.arn

  version = "1.17"

  vpc_config {
    security_group_ids      = [aws_security_group.eks_cluster.id]
    endpoint_private_access = true
    endpoint_public_access  = true
    public_access_cidrs     = concat([var.authorized_source_ranges], [for n in aws_eip.nat : "${n.public_ip}/32"])
    subnet_ids              = concat([for s in aws_subnet.private : s.id], [for s in aws_subnet.public : s.id])
  }

  enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]

  depends_on = [
    aws_iam_role_policy_attachment.eks-AmazonEKSClusterPolicy,
    aws_iam_role_policy_attachment.eks-AmazonEKSVPCResourceController,
    aws_iam_role_policy_attachment.eks-AmazonEKSServicePolicy
  ]

  tags = {
    Environment = "core"
  }
}

resource "aws_iam_role" "eks" {
  name = var.eks_cluster_name

  assume_role_policy = <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
        "Effect": "Allow",
        "Principal": {
            "Service": "eks.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
        }
    ]
}
  EOF
}

data "tls_certificate" "cert" {
  url = aws_eks_cluster.security.identity[0].oidc[0].issuer
}

resource "aws_iam_openid_connect_provider" "openid" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.cert.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.security.identity[0].oidc[0].issuer
}

resource "aws_iam_role_policy_attachment" "eks-AmazonEKSClusterPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks.name
}

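resource "aws_iam_role_policy_attachment" "eks-AmazonEKSServicePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
  role       = aws_iam_role.eks.name
}

# Referenced by depends_on in aws_eks_cluster.security, so it must be declared.
resource "aws_iam_role_policy_attachment" "eks-AmazonEKSVPCResourceController" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController"
  role       = aws_iam_role.eks.name
}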

resource "aws_security_group" "eks_cluster" {
  name        = "${var.eks_cluster_name}/ControlPlaneSecurityGroup"
  description = "Communication between the control plane and worker nodegroups"
  vpc_id      = aws_vpc.security.id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name        = "${var.eks_cluster_name}/ControlPlaneSecurityGroup"
  }
}

resource "aws_security_group_rule" "cluster_inbound" {
  description              = "Allow unmanaged nodes to communicate with control plane (all ports)"
  from_port                = 0
  protocol                 = "-1"
  security_group_id        = aws_eks_cluster.security.vpc_config[0].cluster_security_group_id
  source_security_group_id = aws_security_group.eks_nodes.id
  to_port                  = 0
  type                     = "ingress"
}

Here we create two nodegroups, one private and one public.

plan/eks-nodegroup.tf

resource "aws_eks_node_group" "private" {
  cluster_name    = aws_eks_cluster.security.name
  node_group_name = "private-node-group-security"
  node_role_arn   = aws_iam_role.node-group.arn
  subnet_ids      = [for s in aws_subnet.private : s.id]

  labels          = {
    "type" = "private"
  }

  instance_types = ["t3.small"]

  scaling_config {
    desired_size = 3
    max_size     = 5
    min_size     = 3
  }

  depends_on = [
    aws_iam_role_policy_attachment.node-group-AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.node-group-AmazonEKS_CNI_Policy,
    aws_iam_role_policy_attachment.node-group-AmazonEC2ContainerRegistryReadOnly
  ]

  tags = {
    Environment = "core"
  }
}

resource "aws_eks_node_group" "public" {
  cluster_name    = aws_eks_cluster.security.name
  node_group_name = "public-node-group-security"
  node_role_arn   = aws_iam_role.node-group.arn
  subnet_ids      = [for s in aws_subnet.public : s.id]

  labels          = {
    "type" = "public"
  }

  instance_types = ["t3.small"]

  scaling_config {
    desired_size = 1
    max_size     = 3
    min_size     = 1
  }

  depends_on = [
    aws_iam_role_policy_attachment.node-group-AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.node-group-AmazonEKS_CNI_Policy,
    aws_iam_role_policy_attachment.node-group-AmazonEC2ContainerRegistryReadOnly,
  ]

  tags = {
    Environment = "core"
  }
}

resource "aws_iam_role" "node-group" {
  name = "eks-node-group-role-security"

  assume_role_policy = jsonencode({
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
    Version = "2012-10-17"
  })
}

resource "aws_iam_role_policy_attachment" "node-group-AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.node-group.name
}

resource "aws_iam_role_policy_attachment" "node-group-AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.node-group.name
}

resource "aws_iam_role_policy_attachment" "node-group-AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.node-group.name
}

resource "aws_iam_role_policy" "node-group-ClusterAutoscalerPolicy" {
  name = "eks-cluster-auto-scaler"
  role = aws_iam_role.node-group.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
            "autoscaling:DescribeAutoScalingGroups",
            "autoscaling:DescribeAutoScalingInstances",
            "autoscaling:DescribeLaunchConfigurations",
            "autoscaling:DescribeTags",
            "autoscaling:SetDesiredCapacity",
            "autoscaling:TerminateInstanceInAutoScalingGroup"
        ]
        Effect   = "Allow"
        Resource = "*"
      },
    ]
  })
}

resource "aws_security_group" "eks_nodes" {
  name        = "${var.eks_cluster_name}/ClusterSharedNodeSecurityGroup"
  description = "Communication between all nodes in the cluster"
  vpc_id      = aws_vpc.security.id

  ingress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    self        = true
  }

  ingress {
    from_port       = 0
    to_port         = 0
    protocol        = "-1"
    security_groups = [aws_eks_cluster.security.vpc_config[0].cluster_security_group_id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name        = "${var.eks_cluster_name}/ClusterSharedNodeSecurityGroup"
    Environment = "core"
  }
}

Vault

In this section, we create the AWS resources that allow the Vault cluster to access Secrets Manager, CloudWatch Logs, and the KMS key. We also create a Route 53 record to reach the Vault UI, and we upload the necessary scripts to an S3 bucket.

plan/vault.tf

resource "aws_iam_role" "vault-unseal" {
  name = "vault-unseal"

  assume_role_policy = jsonencode({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": aws_iam_openid_connect_provider.openid.arn
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        "${replace(aws_iam_openid_connect_provider.openid.url, "https://", "")}:sub": "system:serviceaccount:vault-server:vault"
                    }
                }
            }
        ]
    })

  tags = {
    Environment = "core"
  }
}

resource "aws_iam_role_policy" "vault-unseal" {
  name = "vault-unseal"
  role = aws_iam_role.vault-unseal.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "iam:GetRole",
        ]
        Effect   = "Allow"
        Resource = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault-unseal"
      },
      {
        Action = [
          "kms:*",
        ]
        Effect   = "Allow"
        Resource = "*"
      }
    ]
  })
}

resource "aws_iam_role" "vault" {
  name = "vault"

  assume_role_policy = jsonencode({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": aws_iam_openid_connect_provider.openid.arn
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        "${replace(aws_iam_openid_connect_provider.openid.url, "https://", "")}:sub": "system:serviceaccount:vault-server:boot-vault"
                    }
                }
            }
        ]
    })

  tags = {
    Environment = "core"
  }
}

resource "aws_iam_role_policy" "vault" {
  name   = "vault"
  role   = aws_iam_role.vault.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action   = [
          "logs:CreateLogStream",
          "logs:DescribeLogStreams"
        ]
        Effect   = "Allow"
        Resource = "arn:aws:logs:${var.region}:${data.aws_caller_identity.current.account_id}:log-group:vault-audit-logs"
      },
      {
        Action   = [
          "logs:PutLogEvents",
        ]
        Effect   = "Allow"
        Resource = "arn:aws:logs:${var.region}:${data.aws_caller_identity.current.account_id}:log-group:vault-audit-logs:log-stream:*"
      },
      {
        Action   = [
          "ec2:DescribeInstances",
        ]
        Effect   = "Allow"
        Resource = "*"
      },
      {
        Action   = [
          "s3:*",
        ]
        Effect   = "Allow"
        Resource = "*"
      },
      {
        Action   = [
          "secretsmanager:UpdateSecretVersionStage",
          "secretsmanager:UpdateSecret",
          "secretsmanager:PutSecretValue",
          "secretsmanager:GetSecretValue"
        ]
        Effect   = "Allow"
        Resource = aws_secretsmanager_secret.vault-secret.arn
      },
      {
        Action   = [
          "iam:GetRole"
        ]
        Effect   = "Allow"
        Resource = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault"
      }
    ]
  })
}

resource "aws_kms_key" "vault-kms" {
  description             = "Vault Seal/Unseal key"
  deletion_window_in_days = 7

  policy = <<EOT
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Enable IAM User Permissions",
      "Action": [
        "kms:*"
      ],
      "Principal": {
        "AWS": "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
      },
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Sid": "Allow administration of the key",
      "Action": [
        "kms:Create*",
        "kms:Describe*",
        "kms:Enable*",
        "kms:List*",
        "kms:Put*",
        "kms:Update*",
        "kms:Revoke*",
        "kms:Disable*",
        "kms:Get*",
        "kms:Delete*",
        "kms:ScheduleKeyDeletion",
        "kms:CancelKeyDeletion"
      ],
      "Effect": "Allow",
      "Resource": "*",
      "Principal": {
        "AWS": [
            "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault",
            "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault-unseal"
        ]
       }
    },
    {
      "Sid": "Allow use of the key",
      "Action": [
        "kms:DescribeKey",
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey",
        "kms:GenerateDataKeyWithoutPlaintext"
      ],
      "Principal": {
        "AWS": [
            "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault",
            "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault-unseal"
        ]
      },
      "Effect": "Allow",
      "Resource": "*"
    }
  ]

}
EOT
}

resource "random_string" "vault-secret-suffix" {
  length  = 5
  special = false
  upper   = false
}

resource "aws_secretsmanager_secret" "vault-secret" {
  name        = "vault-secret-${random_string.vault-secret-suffix.result}"
  kms_key_id  = aws_kms_key.vault-kms.key_id
  description = "Vault Root/Recovery key"
}

resource "aws_route53_record" "vault" {
  zone_id    = data.aws_route53_zone.public.zone_id
  name       = "vault.${var.public_dns_name}"
  type       = "CNAME"
  ttl        = "300"
  records    = [data.kubernetes_service.vault-ui.status.0.load_balancer.0.ingress.0.hostname]

  depends_on = [
    kubernetes_job.vault-initialization,
    helm_release.vault,
    data.kubernetes_service.vault-ui
  ]
}

resource "aws_s3_bucket" "vault-scripts" {
  bucket = "bucket-${data.aws_caller_identity.current.account_id}-${var.region}-vault-scripts"
  acl    = "private"

  tags = {
    Name        = "Vault Scripts"
    Environment = "core"
  }
}

resource "aws_s3_bucket_object" "vault-script-bootstrap" {
  bucket = aws_s3_bucket.vault-scripts.id
  key    = "scripts/bootstrap.sh"
  source = "scripts/bootstrap.sh"
  etag = filemd5("scripts/bootstrap.sh")
}

resource "aws_s3_bucket_object" "vault-script-certificates" {
  bucket = aws_s3_bucket.vault-scripts.id
  key    = "scripts/certificates.sh"
  source = "scripts/certificates.sh"
  etag = filemd5("scripts/certificates.sh")
}
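
The two trust policies above bind each IAM role to exactly one Kubernetes service account through the `sub` claim of the cluster's OIDC issuer: the vault-unseal role to the vault service account, and the vault role to the boot-vault service account. As a rough sketch of how that condition key and value are put together (the issuer URL is a made-up placeholder and the function names are illustrative, not part of the Terraform code):

```shell
#!/bin/bash
# Sketch: rebuild the trust-policy condition used by the vault and
# vault-unseal roles. Function names and the sample issuer are hypothetical.

# Strip the scheme from the issuer URL and append ":sub", mirroring
# replace(aws_iam_openid_connect_provider.openid.url, "https://", "") in vault.tf.
oidc_sub_key() {
  local issuer_url="$1"
  printf '%s:sub\n' "${issuer_url#https://}"
}

# Subject claim carried by a Kubernetes service account token.
service_account_subject() {
  local namespace="$1" name="$2"
  printf 'system:serviceaccount:%s:%s\n' "${namespace}" "${name}"
}

# Placeholder issuer; a real one would come from the eks_issuer_url output.
ISSUER="https://oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE"

echo "vault-unseal role: $(oidc_sub_key "${ISSUER}") = $(service_account_subject vault-server vault)"
echo "vault role:        $(oidc_sub_key "${ISSUER}") = $(service_account_subject vault-server boot-vault)"
```

If the namespace or a service account name changes in the Kubernetes resources, the matching condition value has to change too, otherwise the pod cannot assume its role.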

Here we create our Kubernetes resources to initialize and deploy the Vault cluster.

plan/k8s.tf

resource "kubernetes_namespace" "vault-server" {
  metadata {
    name = "vault-server"
  }
}

data "template_file" "vault-values" {
  template = <<EOF
        global:
          tlsDisable: false
        ui:
          enabled: true
          externalPort: 443
          serviceType: "LoadBalancer"
          loadBalancerSourceRanges:
          - ${var.authorized_source_ranges}
          - ${aws_eip.nat["public-security-1"].public_ip}/32
          - ${aws_eip.nat["public-security-2"].public_ip}/32
          - ${aws_eip.nat["public-security-3"].public_ip}/32
          annotations: |
            service.beta.kubernetes.io/aws-load-balancer-ssl-cert: ${var.acm_vault_arn}
            service.beta.kubernetes.io/aws-load-balancer-backend-protocol: https
            service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443,8200"
            service.beta.kubernetes.io/do-loadbalancer-healthcheck-path: "/ui/"
            service.beta.kubernetes.io/aws-load-balancer-internal: "false"
            external-dns.alpha.kubernetes.io/hostname: "vault.${var.public_dns_name}"
            external-dns.alpha.kubernetes.io/ttl: "30"
        server:
          nodeSelector: |
            eks.amazonaws.com/nodegroup: private-node-group-security
          extraEnvironmentVars:
            VAULT_CACERT: /vault/userconfig/vault-server-tls/vault.ca
            AWS_ROLE_SESSION_NAME: some_name
          extraVolumes:
          - type: secret
            name: vault-server-tls
          image:
            repository: "vault"
            tag: "1.6.0"
          logLevel: "debug"
          serviceAccount:
            annotations: |
              eks.amazonaws.com/role-arn: "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault-unseal"
          ha:
            enabled: true
            nodes: 3
            raft:
              enabled: true
              setNodeId: true
              config: |
                ui = true

                listener "tcp" {
                  tls_disable = 0
                  tls_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
                  tls_key_file  = "/vault/userconfig/vault-server-tls/vault.key"
                  tls_client_ca_file = "/vault/userconfig/vault-server-tls/vault.ca"
                  address = "[::]:8200"
                  cluster_address = "[::]:8201"
                }

                storage "raft" {
                  path    = "/vault/data"
                }

                service_registration "kubernetes" {}

                seal "awskms" {
                  region     = "${var.region}"
                  kms_key_id = "${aws_kms_key.vault-kms.key_id}"
                }
   EOF
}

resource "helm_release" "vault" {
  name       = "vault"

  chart      = "hashicorp/vault"
  values     = [data.template_file.vault-values.rendered]

  namespace  = "vault-server"

  depends_on = [kubernetes_job.vault-certificate]
}

resource "kubernetes_cluster_role" "boot-vault" {
  metadata {
    name = "boot-vault"
  }

  rule {
    api_groups = [""]
    resources  = ["pods/exec", "pods", "pods/log", "secrets", "tmp/secrets"]
    verbs      = ["get", "list", "create"]
  }

  rule {
    api_groups = ["certificates.k8s.io"]
    resources  = ["certificatesigningrequests", "certificatesigningrequests/approval"]
    verbs      = ["get", "list", "create", "update"]
  }
}

resource "kubernetes_service_account" "boot-vault" {
  metadata {
    name = "boot-vault"
    namespace = "vault-server"
    labels = {
      "app.kubernetes.io/name" = "boot-vault"
    }
    annotations = {
      "eks.amazonaws.com/role-arn" = aws_iam_role.vault.arn
    }
  }
}

resource "kubernetes_job" "vault-initialization" {
  metadata {
    name = "boot-vault"
    namespace = "vault-server"
  }
  spec {
    template {
      metadata {}
      spec {
        container {
          name    = "boot-vault"
          image   = "amazonlinux"
          command = ["/bin/bash","-c"]
          args    = ["sleep 15; yum install -y awscli 2>&1 > /dev/null; export AWS_REGION=${var.region}; aws sts get-caller-identity; aws s3 cp $(S3_SCRIPT_URL) ./script.sh; chmod +x ./script.sh; ./script.sh"]
          env {
            name  = "S3_SCRIPT_URL"
            value = "s3://${aws_s3_bucket.vault-scripts.id}/scripts/bootstrap.sh"
          }
          env {
            name  = "VAULT_SECRET"
            value = aws_secretsmanager_secret.vault-secret.arn
          }
        }
        service_account_name = "boot-vault"
        restart_policy = "Never"
      }
    }
    backoff_limit = 0
  }

  depends_on = [
    kubernetes_job.vault-certificate,
    helm_release.vault,
    aws_s3_bucket_object.vault-script-bootstrap
  ]
}

resource "kubernetes_job" "vault-certificate" {
  metadata {
    name      = "certificate-vault"
    namespace = "vault-server"
  }
  spec {
    template {
      metadata {}
      spec {
        container {
          name    = "certificate-vault"
          image   = "amazonlinux"
          command = ["/bin/bash","-c"]
          args    = ["sleep 15; yum install -y awscli 2>&1 > /dev/null; export AWS_REGION=${var.region}; export NAMESPACE='vault-server'; aws sts get-caller-identity; aws s3 cp $(S3_SCRIPT_URL) ./script.sh; chmod +x ./script.sh; ./script.sh"]
          env {
            name  = "S3_SCRIPT_URL"
            value = "s3://${aws_s3_bucket.vault-scripts.id}/scripts/certificates.sh"
          }
        }
        service_account_name = "boot-vault"
        restart_policy       = "Never"
      }
    }
    backoff_limit = 0
  }

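  # The job runs as the boot-vault service account, so the namespace,
  # the service account and its role binding must exist first.
  depends_on = [
    aws_eks_node_group.private,
    aws_s3_bucket_object.vault-script-certificates,
    kubernetes_namespace.vault-server,
    kubernetes_service_account.boot-vault,
    kubernetes_cluster_role_binding.boot-vault
  ]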
}

resource "kubernetes_cluster_role_binding" "boot-vault" {
  metadata {
    name = "boot-vault"
    labels = {
      "app.kubernetes.io/name" = "boot-vault"
    }
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "boot-vault"
  }
  subject {
    kind      = "ServiceAccount"
    name      = "boot-vault"
    namespace = "vault-server"
  }
}

data "kubernetes_service" "vault-ui" {
  metadata {
    name      = "vault-ui"
    namespace = "vault-server"
  }
  depends_on = [
    kubernetes_job.vault-initialization,
    helm_release.vault
  ]
}
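
Both jobs run `yum install -y awscli 2>&1 > /dev/null`. In bash, redirections are applied left to right, so this form discards stdout but still prints stderr. A small sketch of the difference (the `noisy` function is a hypothetical stand-in for any command that writes to both streams):

```shell
#!/bin/bash
# Sketch: order of redirections in bash. "noisy" is a hypothetical stand-in.

noisy() {
  echo "to stdout"
  echo "to stderr" >&2
}

# Form used in the job args: stderr is duplicated onto the current stdout
# first, then stdout alone is sent to /dev/null. stderr stays visible.
stderr_kept=$( { noisy 2>&1 > /dev/null; } )

# Reversed order: stdout goes to /dev/null first, then stderr follows it.
all_silenced=$( { noisy > /dev/null 2>&1; } )

echo "2>&1 > /dev/null prints: '${stderr_kept}'"
echo "> /dev/null 2>&1 prints: '${all_silenced}'"
```

Keeping stderr visible can be useful in a bootstrap job, since installation errors still show up in the pod logs.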

The following script is used to create the vault-server-tls certificate.

plan/scripts/certificates.sh

#!/bin/bash -e

# SERVICE is the name of the Vault service in Kubernetes.
# It does not have to match the actual running service, though it may help for consistency.
SERVICE=vault
SECRET_NAME=vault-server-tls
# TMPDIR is a temporary working directory.
TMPDIR=/tmp
# Sleep timer
SLEEP_TIME=15
# Name of the CSR
echo "Name the CSR: vault-csr"
export CSR_NAME=vault-csr

# Install OpenSSL
echo "Install openssl"
yum install -y openssl 2>&1

# Install Kubernetes cli
echo "Install Kubernetes cli"
curl -o kubectl https://amazon-eks.s3.us-west-2.amazonaws.com/1.16.8/2020-04-16/bin/linux/amd64/kubectl
chmod +x ./kubectl
mkdir -p $HOME/bin && cp ./kubectl $HOME/bin/kubectl && export PATH=$PATH:$HOME/bin
kubectl version --short --client

# Create a private key
echo "Generate certificate Private key"
openssl genrsa -out ${TMPDIR}/vault.key 2048

# Create the OpenSSL configuration for the CSR
echo "Create CSR configuration file"
cat <<EOF >${TMPDIR}/csr.conf
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name
[req_distinguished_name]
[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names
[alt_names]
DNS.1 = ${SERVICE}
DNS.2 = ${SERVICE}.${NAMESPACE}
DNS.3 = ${SERVICE}.${NAMESPACE}.svc
DNS.4 = ${SERVICE}.${NAMESPACE}.svc.cluster.local
DNS.5 = vault-0.vault-internal
DNS.6 = vault-1.vault-internal
DNS.7 = vault-2.vault-internal
IP.1 = 127.0.0.1
EOF

# Generate the CSR
echo "Generate the CSR"
openssl req -new -key ${TMPDIR}/vault.key -subj "/CN=${SERVICE}.${NAMESPACE}.svc" -out ${TMPDIR}/server.csr -config ${TMPDIR}/csr.conf

echo "Create a CSR Manifest file"
cat <<EOF >${TMPDIR}/csr.yaml
apiVersion: certificates.k8s.io/v1beta1
kind: CertificateSigningRequest
metadata:
  name: ${CSR_NAME}
spec:
  groups:
  - system:authenticated
  request: $(cat ${TMPDIR}/server.csr | base64 | tr -d '\n')
  usages:
  - digital signature
  - key encipherment
  - server auth
EOF

echo "Create CSR from manifest file"
kubectl create -f ${TMPDIR}/csr.yaml

sleep ${SLEEP_TIME}
echo "Fetch the CSR from kubernetes"
kubectl get csr ${CSR_NAME}

# Approve Cert
echo "Approve the Certificate"
kubectl certificate approve ${CSR_NAME}

serverCert=$(kubectl get csr ${CSR_NAME} -o jsonpath='{.status.certificate}')
echo "${serverCert}" | openssl base64 -d -A -out ${TMPDIR}/vault.crt

echo "Fetch Kubernetes CA Certificate"
kubectl get secret -o jsonpath="{.items[?(@.type==\"kubernetes.io/service-account-token\")].data['ca\.crt']}" | base64 --decode > ${TMPDIR}/vault.ca 2>/dev/null || true

echo "Create secret containing the TLS Certificates and key"
echo kubectl create secret generic ${SECRET_NAME} \
        --namespace ${NAMESPACE} \
        --from-file=vault.key=${TMPDIR}/vault.key \
        --from-file=vault.crt=${TMPDIR}/vault.crt \
        --from-file=vault.ca=${TMPDIR}/vault.ca

kubectl create secret generic ${SECRET_NAME} \
        --namespace ${NAMESPACE} \
        --from-file=vault.key=${TMPDIR}/vault.key \
        --from-file=vault.crt=${TMPDIR}/vault.crt \
        --from-file=vault.ca=${TMPDIR}/vault.ca
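
The `[alt_names]` block in this script hard-codes three pod names, which matches `nodes: 3` in the Helm values. If the replica count changes, the list has to change with it. One possible way to generate it, as a sketch (the `vault_alt_names` function and its arguments are hypothetical and not part of the original script):

```shell
#!/bin/bash
# Sketch: print the [alt_names] entries for a given service, namespace and
# replica count. vault_alt_names is a hypothetical helper.

vault_alt_names() {
  local service="$1" namespace="$2" replicas="$3"
  local n=1 i=0 name

  # Service-level names, in the same order as certificates.sh.
  for name in \
    "${service}" \
    "${service}.${namespace}" \
    "${service}.${namespace}.svc" \
    "${service}.${namespace}.svc.cluster.local"; do
    echo "DNS.${n} = ${name}"
    n=$((n + 1))
  done

  # One entry per Vault pod: <service>-0.<service>-internal, and so on.
  while [ "${i}" -lt "${replicas}" ]; do
    echo "DNS.${n} = ${service}-${i}.${service}-internal"
    n=$((n + 1))
    i=$((i + 1))
  done

  echo "IP.1 = 127.0.0.1"
}

vault_alt_names vault vault-server 3
```

With the arguments shown, the output reproduces the eight entries written by hand in the script.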

The following script initializes Vault.

plan/scripts/bootstrap.sh

#!/bin/bash
VAULT_NUMBER_OF_KEYS_FOR_UNSEAL=3
VAULT_NUMBER_OF_KEYS=5

SLEEP_SECONDS=15
PROTOCOL=https
VAULT_PORT=8200
VAULT_0=vault-0.vault-internal

get_secret () {
    local value=$(aws secretsmanager --region ${AWS_REGION} get-secret-value --secret-id "$1" | jq --raw-output .SecretString)
    echo $value
}

# Install JQ as we use it later on
yum install -y jq 2>&1 >/dev/null

# Give the Helm chart a chance to get started
echo "Sleeping for ${SLEEP_SECONDS} seconds"
sleep ${SLEEP_SECONDS} # Allow helm chart some time 

# Install Kubernetes cli
curl -o kubectl https://amazon-eks.s3.us-west-2.amazonaws.com/1.16.8/2020-04-16/bin/linux/amd64/kubectl
chmod +x ./kubectl
mkdir -p $HOME/bin && cp ./kubectl $HOME/bin/kubectl && export PATH=$PATH:$HOME/bin
kubectl version --short --client

until curl -k -fs -o /dev/null ${PROTOCOL}://${VAULT_0}:8200/v1/sys/init; do
    echo "Waiting for Vault to start..."
    sleep 1
done

# See if vault is initialized
init=$(curl -fs -k ${PROTOCOL}://${VAULT_0}:8200/v1/sys/init | jq -r .initialized)

echo "Is vault initialized: '${init}'"

if [ "$init" == "false" ]; then
    echo "Initializing Vault"
    SECRET_VALUE=$(kubectl exec vault-0 -- "/bin/sh" "-c" "export VAULT_SKIP_VERIFY=true && vault operator init -recovery-shares=${VAULT_NUMBER_OF_KEYS} -recovery-threshold=${VAULT_NUMBER_OF_KEYS_FOR_UNSEAL}")
    echo "storing vault init values in secrets manager"
    aws secretsmanager put-secret-value --region ${AWS_REGION} --secret-id ${VAULT_SECRET} --secret-string "${SECRET_VALUE}"
else
    echo "Vault is already initialized"
fi

sealed=$(curl -fs -k ${PROTOCOL}://${VAULT_0}:8200/v1/sys/seal-status | jq -r .sealed)

# Should Auto unseal using KMS but this is for demonstration for manual unseal
if [ "$sealed" == "true" ]; then
    VAULT_SECRET_VALUE=$(get_secret ${VAULT_SECRET})
    root_token=$(echo ${VAULT_SECRET_VALUE} | awk '{ if (match($0,/Initial Root Token: (.*)/,m)) print m[1] }' | cut -d " " -f 1)
    # {1..${N}} is not expanded when N is a variable, so use a C-style loop
    unseal_key=()
    for ((UNSEAL_KEY_INDEX = 1; UNSEAL_KEY_INDEX <= VAULT_NUMBER_OF_KEYS_FOR_UNSEAL; UNSEAL_KEY_INDEX++))
    do
            unseal_key+=($(echo ${VAULT_SECRET_VALUE} | awk '{ if (match($0,/Recovery Key '${UNSEAL_KEY_INDEX}': (.*)/,m)) print m[1] }'| cut -d " " -f 1))
    done

    echo "Unsealing Vault"
    # Handle variable number of unseal keys
    for key in "${unseal_key[@]}"
    do
        kubectl exec vault-0 -- vault operator unseal "${key}"
    done
else
    echo "Vault is already unsealed"
fi

VAULT_SECRET_VALUE=$(get_secret ${VAULT_SECRET})
root_token=$(echo ${VAULT_SECRET_VALUE} | awk '{ if (match($0,/Initial Root Token: (.*)/,m)) print m[1] }' | cut -d " " -f 1)

# Log in with the root token
kubectl exec vault-0 -- "/bin/sh" "-c" "export VAULT_SKIP_VERIFY=true && vault login token=$root_token > /dev/null 2>&1"  # Hide this output from the console

# Join other pods to the raft cluster
kubectl exec -t vault-1 -- "/bin/sh" "-c" "vault operator raft join -tls-skip-verify -leader-ca-cert=\"$(cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt)\" ${PROTOCOL}://${VAULT_0}:${VAULT_PORT}"
kubectl exec -t vault-2 -- "/bin/sh" "-c" "vault operator raft join -tls-skip-verify -leader-ca-cert=\"$(cat /var/run/secrets/kubernetes.io/serviceaccount/ca.crt)\" ${PROTOCOL}://${VAULT_0}:${VAULT_PORT}"

# Show who we have joined
kubectl exec -t vault-0 -- "/bin/sh" "-c" "export VAULT_SKIP_VERIFY=true && vault operator raft list-peers"
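The unseal loops in the script iterate from 1 up to a count held in a variable. In Bash, brace expansion is performed before parameter expansion, so a range such as `{1..${n}}` is not expanded into numbers. The following is a minimal sketch of the difference; the function names and the `threshold` value are hypothetical and only stand in for `VAULT_NUMBER_OF_KEYS_FOR_UNSEAL`.

```shell
#!/usr/bin/env bash

# Count loop iterations using a brace range with a variable upper bound.
# The braces are processed before the variable is substituted, so the loop
# body runs once with the literal word "{1..N}".
count_with_braces() {
    local n=$1 count=0 i
    for i in {1..${n}}; do
        count=$((count + 1))
    done
    echo "$count"
}

# Count loop iterations using seq, which builds the range at run time.
count_with_seq() {
    local n=$1 count=0 i
    for i in $(seq 1 "$n"); do
        count=$((count + 1))
    done
    echo "$count"
}

# Hypothetical value standing in for VAULT_NUMBER_OF_KEYS_FOR_UNSEAL.
threshold=3
echo "brace range iterations: $(count_with_braces "$threshold")"
echo "seq iterations:         $(count_with_seq "$threshold")"
```

A related detail: Bash arrays filled with `+=` start at index 0, so the key collected for loop index `i` is stored at position `i - 1`.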

Deployment

We've finished creating our Terraform files. Let's get ready for deployment!

plan/main.tf

data "aws_caller_identity" "current" {}

data "aws_route53_zone" "public" {
  name = "${var.public_dns_name}."
}

plan/output.tf

output "eks-endpoint" {
    value = aws_eks_cluster.security.endpoint
}

output "kubeconfig-certificate-authority-data" {
    value = aws_eks_cluster.security.certificate_authority[0].data
}

output "eks_issuer_url" {
    value = aws_iam_openid_connect_provider.openid.url
}

output "vault_secret_name" {
    value = "vault-secret-${random_string.vault-secret-suffix.result}"
}

output "nat1_ip" {
    value = aws_eip.nat["public-security-1"].public_ip
}

output "nat2_ip" {
    value = aws_eip.nat["public-security-2"].public_ip
}

output "nat3_ip" {
    value = aws_eip.nat["public-security-3"].public_ip
}

plan/variables.tf

variable "region" {
  type = string
}

variable "az" {
  type    = list(string)
  default = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
}

variable "vpc_cidr_block" {
  type = string
}

variable "eks_cluster_name" {
  type = string
  default = "security"
}

variable "acm_vault_arn" {
  type = string
}

variable "private_network_config" {
  type = map(object({
      cidr_block               = string
      associated_public_subnet = string
  }))

  default = {
    "private-security-1" = {
        cidr_block               = "10.0.0.0/23"
        associated_public_subnet = "public-security-1"
    },
    "private-security-2" = {
        cidr_block               = "10.0.2.0/23"
        associated_public_subnet = "public-security-2"
    },
    "private-security-3" = {
        cidr_block               = "10.0.4.0/23"
        associated_public_subnet = "public-security-3"
    }
  }
}

locals {
    private_nested_config = flatten([
        for name, config in var.private_network_config : [
            {
                name                     = name
                cidr_block               = config.cidr_block
                associated_public_subnet = config.associated_public_subnet
            }
        ]
    ])
}

variable "public_network_config" {
  type = map(object({
      cidr_block              = string
  }))

  default = {
    "public-security-1" = {
        cidr_block = "10.0.8.0/23"
    },
    "public-security-2" = {
        cidr_block = "10.0.10.0/23"
    },
    "public-security-3" = {
        cidr_block = "10.0.12.0/23"
    }
  }
}

locals {
    public_nested_config = flatten([
        for name, config in var.public_network_config : [
            {
                name                    = name
                cidr_block              = config.cidr_block
            }
        ]
    ])
}

variable "public_dns_name" {
  type    = string
}

variable "authorized_source_ranges" {
  type        = string
  description = "Addresses or CIDR blocks which are allowed to connect to the Vault IP address. The default behavior is to allow anyone (0.0.0.0/0) access. You should restrict access to external IPs that need to access the Vault cluster."
  default     = "0.0.0.0/0"
}

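The default subnets are /23 blocks carved out of the VPC range, which plan/terraform.tfvars sets to 10.0.0.0/16. A /23 block holds 2^(32-23) = 512 addresses, of which AWS reserves five per subnet. As a rough sanity check, the arithmetic can be sketched in shell; the helper names `ip_to_int`, `cidr_size` and `cidr_within` are hypothetical and not part of the project.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<EOF
$1
EOF
    echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Number of addresses in a block with the given prefix length.
cidr_size() {
    echo $(( 1 << (32 - $1) ))
}

# Succeed if CIDR block $1 lies entirely inside CIDR block $2.
cidr_within() {
    local inner_ip=${1%/*} inner_len=${1#*/}
    local outer_ip=${2%/*} outer_len=${2#*/}
    local inner_start outer_start inner_end outer_end
    inner_start=$(ip_to_int "$inner_ip")
    outer_start=$(ip_to_int "$outer_ip")
    inner_end=$(( inner_start + $(cidr_size "$inner_len") ))
    outer_end=$(( outer_start + $(cidr_size "$outer_len") ))
    [ "$inner_len" -ge "$outer_len" ] &&
        [ "$inner_start" -ge "$outer_start" ] &&
        [ "$inner_end" -le "$outer_end" ]
}

echo "addresses per /23: $(cidr_size 23)"
for block in 10.0.0.0/23 10.0.4.0/23 10.0.12.0/23; do
    if cidr_within "$block" 10.0.0.0/16; then
        echo "$block fits in 10.0.0.0/16"
    fi
done
```

If you change `vpc_cidr_block`, a check of this kind helps confirm that the subnet defaults still fall inside the new range.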

plan/backend.tf

terraform {
  backend "s3" {
  }
}

plan/versions.tf

terraform {
  required_version = ">= 0.12"
}

plan/provider.tf

provider "aws" {
  region = var.region
}

provider "kubernetes" {
  host                   = aws_eks_cluster.security.endpoint

  cluster_ca_certificate = base64decode(
    aws_eks_cluster.security.certificate_authority[0].data
  )

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    args        = ["eks", "get-token", "--cluster-name", var.eks_cluster_name]
    command     = "aws"
  }
}

provider "helm" {
  kubernetes {
    host                   = aws_eks_cluster.security.endpoint
    cluster_ca_certificate = base64decode(
        aws_eks_cluster.security.certificate_authority[0].data
    )
    exec {
        api_version = "client.authentication.k8s.io/v1alpha1"
        args        = ["eks", "get-token", "--cluster-name", var.eks_cluster_name]
        command     = "aws"
    }
  }
}

plan/terraform.tfvars

az                       = ["<AWS_REGION>a", "<AWS_REGION>b", "<AWS_REGION>c"]
region                   = "<AWS_REGION>"
acm_vault_arn            = "<ACM_VAULT_ARN>"
vpc_cidr_block           = "10.0.0.0/16"
public_dns_name          = "<PUBLIC_DNS_NAME>"
authorized_source_ranges = "<LOCAL_IP_RANGES>"

Initialize Terraform for the AWS security infrastructure. The state will be stored in an S3 bucket.

terraform init \
    -backend-config="bucket=$TERRAFORM_BUCKET_NAME" \
    -backend-config="key=security/terraform-state" \
    -backend-config="region=$AWS_REGION"
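The init command reads `TERRAFORM_BUCKET_NAME` and `AWS_REGION` from the environment, and an unset variable silently becomes an empty backend setting. A small preflight check can catch that early. This is only a sketch; `require_env` is a hypothetical helper, not something the project provides.

```shell
# Return non-zero and name the first variable in the argument list that is
# unset or empty. eval is used for the indirect lookup so the function also
# works in plain POSIX sh; only pass trusted, hard-coded names to it.
require_env() {
    local name value
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required environment variable: $name" >&2
            return 1
        fi
    done
}

if require_env TERRAFORM_BUCKET_NAME AWS_REGION; then
    echo "backend settings present, safe to run terraform init"
fi
```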

Fill in the placeholders in plan/terraform.tfvars and apply the plan:

sed -i "s/<LOCAL_IP_RANGES>/$(curl -s http://checkip.amazonaws.com/)\/32/g; s/<PUBLIC_DNS_NAME>/${PUBLIC_DNS_NAME}/g; s/<AWS_ACCOUNT_ID>/${AWS_ACCOUNT_ID}/g; s/<AWS_REGION>/${AWS_REGION}/g; s/<EKS_CLUSTER_NAME>/${EKS_CLUSTER_NAME}/g; s,<ACM_VAULT_ARN>,${ACM_VAULT_ARN},g;" terraform.tfvars
terraform apply
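Note that the sed expression uses `/` as the delimiter for most placeholders but `,` for `<ACM_VAULT_ARN>`. A certificate ARN contains a `/`, which would otherwise be read as the end of the pattern. The sketch below reproduces the substitution on a throwaway file; the region and ARN values are made up for illustration.

```shell
# Work on a throwaway copy so the real terraform.tfvars is untouched.
workdir=$(mktemp -d)
cat > "$workdir/terraform.tfvars" <<'EOF'
region        = "<AWS_REGION>"
acm_vault_arn = "<ACM_VAULT_ARN>"
EOF

# Made-up example values; note the "/" inside the ARN.
AWS_REGION="eu-west-1"
ACM_VAULT_ARN="arn:aws:acm:eu-west-1:111122223333:certificate/example-id"

# "/" is fine as the delimiter for the region, while "," is used for the
# ARN so that the "/" in its value is treated as ordinary text.
sed "s/<AWS_REGION>/${AWS_REGION}/g; s,<ACM_VAULT_ARN>,${ACM_VAULT_ARN},g" \
    "$workdir/terraform.tfvars" > "$workdir/rendered.tfvars"

cat "$workdir/rendered.tfvars"
```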

Configure kubectl to access the EKS cluster:

aws eks --region $AWS_REGION update-kubeconfig --name $EKS_CLUSTER_NAME
kubectl config set-context --current --namespace=vault-server

Set Vault's address and the initial root token:

cd plan

export VAULT_ADDR="https://vault.${PUBLIC_DNS_NAME}"
export VAULT_TOKEN="$(aws secretsmanager get-secret-value --secret-id $(terraform output vault_secret_name) --version-stage AWSCURRENT --query SecretString --output text | grep "Initial Root Token: " | awk -F ': ' '{print $2}')"
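The `VAULT_TOKEN` line pipes the stored secret through `grep` and `awk` to pick out the root token. To see what that pipeline does without touching Secrets Manager, here is a small sketch that runs it on sample text. The sample is made up and assumes only the `Initial Root Token: ` label that the command itself relies on; `extract_root_token` is a hypothetical helper name.

```shell
# Print the value that follows "Initial Root Token: " in the text on stdin.
extract_root_token() {
    grep "Initial Root Token: " | awk -F ': ' '{print $2}'
}

# Made-up text shaped like the stored output of `vault operator init`.
sample_output='Recovery Key 1: EXAMPLE-RECOVERY-KEY-1
Recovery Key 2: EXAMPLE-RECOVERY-KEY-2

Initial Root Token: s.EXAMPLEROOTTOKEN

Success! Vault is initialized'

printf '%s\n' "$sample_output" | extract_root_token
```

The root token grants full access to Vault, so avoid echoing the real value in shared terminals or CI logs.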

Check that the jobs have completed and that all pods are running:

$ kubectl get jobs

NAME                COMPLETIONS   DURATION   AGE
boot-vault          1/1           54s        28m
certificate-vault   1/1           55s        39m

$ kubectl get pods

NAME                                    READY   STATUS      RESTARTS   AGE
boot-vault-4j76p                        0/1     Completed   0          6m17s
certificate-vault-znwfb                 0/1     Completed   0          17m
vault-0                                 1/1     Running     0          6m42s
vault-1                                 1/1     Running     0          6m42s
vault-2                                 1/1     Running     0          6m41s
vault-agent-injector-7d65f7875f-k8zgv   1/1     Running     0          6m42s

$ kubectl get svc

NAME                       TYPE           CLUSTER-IP       EXTERNAL-IP                                                              PORT(S)             AGE
vault                      ClusterIP      172.20.116.147   <none>                                                                   8200/TCP,8201/TCP   7m39s
vault-active               ClusterIP      172.20.213.40    <none>                                                                   8200/TCP,8201/TCP   7m39s
vault-agent-injector-svc   ClusterIP      172.20.182.101   <none>                                                                   443/TCP             7m39s
vault-internal             ClusterIP      None             <none>                                                                   8200/TCP,8201/TCP   7m39s
vault-standby              ClusterIP      172.20.167.47    <none>                                                                   8200/TCP,8201/TCP   7m39s
vault-ui                   LoadBalancer   172.20.22.192    a7442caffb7f74b1ea2eb40bd5f432ef-694516578.eu-west-1.elb.amazonaws.com   443:32363/TCP       7m39s

$ kubectl get secrets

NAME                               TYPE                                  DATA   AGE
boot-vault-token-nq8qm             kubernetes.io/service-account-token   3      45m
default-token-6qjw8                kubernetes.io/service-account-token   3      45m
sh.helm.release.v1.vault.v1        helm.sh/release.v1                    1      27m
vault-agent-injector-token-p6ktz   kubernetes.io/service-account-token   3      27m
vault-server-tls                   Opaque                                3      36m
vault-token-p9gqj                  kubernetes.io/service-account-token   3      27m

$ kubectl get sa

NAME                   SECRETS   AGE
boot-vault             1         47m
default                1         47m
vault                  1         29m
vault-agent-injector   1         29m

$ kubectl get role
NAME                   AGE
vault-discovery-role   30m

$ kubectl get rolebinding
NAME                          AGE
vault-discovery-rolebinding   30m

$ kubectl get certificatesigningrequests

NAME        AGE   REQUESTOR                                               CONDITION
csr-5vqrf   43m   system:node:ip-10-0-0-59.eu-west-1.compute.internal     Approved,Issued
csr-6klsj   43m   system:node:ip-10-0-5-29.eu-west-1.compute.internal     Approved,Issued
csr-chh42   43m   system:node:ip-10-0-10-214.eu-west-1.compute.internal   Approved,Issued
csr-pm5jd   43m   system:node:ip-10-0-2-39.eu-west-1.compute.internal     Approved,Issued
vault-csr   37m   system:serviceaccount:vault-server:boot-vault           Approved,Issued

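The same check can be scripted instead of read by eye. The sketch below reads `kubectl get pods` style text from stdin and succeeds only when vault-0, vault-1 and vault-2 all report `Running`. The function name `vault_pods_running` is hypothetical, and the sample rows are copied from the listing above.

```shell
# Succeed when vault-0, vault-1 and vault-2 all appear with STATUS Running.
# Expects the column layout NAME READY STATUS RESTARTS AGE on stdin.
vault_pods_running() {
    local running
    running=$(awk '$1 ~ /^vault-[0-2]$/ && $3 == "Running" { n++ } END { print n + 0 }')
    [ "$running" -eq 3 ]
}

# Sample rows taken from the output shown in this article.
pods='NAME                                    READY   STATUS      RESTARTS   AGE
boot-vault-4j76p                        0/1     Completed   0          6m17s
vault-0                                 1/1     Running     0          6m42s
vault-1                                 1/1     Running     0          6m42s
vault-2                                 1/1     Running     0          6m41s
vault-agent-injector-7d65f7875f-k8zgv   1/1     Running     0          6m42s'

if printf '%s\n' "$pods" | vault_pods_running; then
    echo "all three Vault server pods are Running"
fi
```

Against a live cluster, the input would come from `kubectl get pods --no-headers | vault_pods_running`.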

Let's create credentials:

ACCESS_KEY=ACCESS_KEY
SECRET_KEY=SECRET_KEY
PROJECT_NAME=web

$ vault secrets enable -path=company/projects/${PROJECT_NAME} -version=2 kv

Success! Enabled the kv secrets engine at: company/projects/web/

$ vault kv put company/projects/${PROJECT_NAME}/credentials/access key="$ACCESS_KEY"

Key              Value
---              -----
created_time     2021-04-15T12:43:48.024422363Z
deletion_time    n/a
destroyed        false
version          1

$ vault kv put company/projects/${PROJECT_NAME}/credentials/secret key="$SECRET_KEY"

Key              Value
---              -----
created_time     2021-04-15T12:44:01.270353488Z
deletion_time    n/a
destroyed        false
version          1

Create a policy named my-policy, reading its contents from stdin:

$ vault policy write my-policy - <<EOF
# Read-only permissions

path "company/projects/${PROJECT_NAME}/*" {
  capabilities = [ "read" ]
}

EOF

Success! Uploaded policy: my-policy

Create a token with the my-policy policy attached:

VAULT_TOKEN=$(vault token create -policy=my-policy -field=token)

Now we can retrieve our credentials with the new token:

$ vault kv get -field=key company/projects/${PROJECT_NAME}/credentials/access

ACCESS_KEY

$ vault kv get -field=key company/projects/${PROJECT_NAME}/credentials/secret

SECRET_KEY

That's it!

The source code is available on GitLab.

Conclusion

In this article, we saw how to create a highly available Vault cluster and deploy it to Amazon EKS using Terraform.

I hope you enjoyed reading this blog post.

If you have any questions or feedback, please feel free to leave a comment.

Thanks for reading!

Documentation

[1] https://www.vaultproject.io/docs/what-is-vault
