EKS Auto Mode Unlocked for Existing Clusters with Terraform

Jatin Mehrotra - Dec 13 - Dev Community

In the previous blog, I explained that EKS Auto Mode is now supported by terraform-eks-module and showed how to create a new cluster with EKS Auto Mode.

In this blog, we’ll learn how to enable EKS Auto Mode on existing clusters and migrate workloads from EKS Managed Node Groups to EKS Auto Mode nodes with ZERO DOWNTIME, keeping the application available throughout, using my Terraform code.

I have also added a BONUS section that explains how to control whether pods are deployed on EKS Auto Mode nodes or on other compute types.

Motivation

[Image: Terraform AWS provider 5.81]

[Image: GitHub issue for bug fix]

  • terraform-aws-eks released a new version, v20.31.1, which allows custom NodeClass/NodePools to be used when EKS Auto Mode is enabled without built-in NodePools.

[Image: Terraform EKS module 20.31.1]

I want this blog to be short, crisp and efficient, so let's jump into the actual steps!

Deploy Terraform cluster without EKS Auto Mode

  • We want to reproduce the use case of an existing cluster WITHOUT EKS Auto Mode, running on an EKS Managed Node Group (MNG).

  • Use this repository code to deploy an EKS cluster with a managed node group.

Note: I am attaching policies to the node IAM role for EKS MNG - this is too permissive, better to use EKS Pod Identity (or IRSA, but EKS Pod Identity is preferred). Feel free to send a PR to the repo :)

Deploy workload or pods

  • We will automate this as well: the workload YAML is deployed through Terraform's kubectl_manifest resource.

Note: During cluster creation, the test workload (pods) is not deployed because the kubectl context is not yet set locally. Once the cluster is created, run the following command to set the kubectl context and then run terraform apply again.

aws eks --region us-east-1 update-kubeconfig --name eks-existing-cluster-tf-test --profile <your-profile-name> ; terraform apply
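
For readers who have not opened the repository, the kubectl_manifest approach described above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the provider source (gavinbunney/kubectl), the kubeconfig path, the resource name test_deployment, the nginx image and the replica count are all assumptions. The environment: test label is chosen so that it matches the selector of the PodDisruptionBudget used later in this post.

```hcl
terraform {
  required_providers {
    kubectl = {
      source  = "gavinbunney/kubectl" # assumed provider source
      version = ">= 1.14.0"
    }
  }
}

# Reads the local kubeconfig, which is why the kubectl context must be set
# (via aws eks update-kubeconfig) before terraform apply can create the workload.
provider "kubectl" {
  config_path = "~/.kube/config" # assumed default location
}

# Hypothetical test workload; name, image and replica count are illustrative.
resource "kubectl_manifest" "test_deployment" {
  yaml_body = <<YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app
  labels:
    environment: test
spec:
  replicas: 2
  selector:
    matchLabels:
      environment: test
  template:
    metadata:
      labels:
        environment: test
    spec:
      containers:
        - name: nginx
          image: nginx:stable
YAML
}
```

Running two replicas means one pod can be evicted during a drain while the other keeps serving.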


Current state of EKS cluster before EKS Auto Mode

  • Let's verify the current state of the EKS cluster while EKS Auto Mode is not enabled.

  • EKS Auto mode is disabled.

[Image: EKS Auto Mode disabled]

  • The EKS Managed Node Group I created is running.

[Image: EKS managed node group]

  • Pods are running on EKS managed node group

[Images: pods; pod and node status]

Enable EKS Auto Mode on Existing Cluster

  • Uncomment the following code in eks.tf and run terraform apply to enable EKS Auto Mode
bootstrap_self_managed_addons = true

cluster_compute_config = {
   enabled = true
}
  • bootstrap_self_managed_addons = true is very important; without it you will hit an error where Terraform tries to recreate the cluster. I literally cried over this.
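
To show where those two arguments sit, here is a rough sketch of a module block with them uncommented. It is not a copy of the repository's eks.tf: the input variables, the Kubernetes version, and the node group name and sizing are assumptions made for illustration. The cluster name is the one used in the update-kubeconfig command earlier, and the module version is the release mentioned in the Motivation section.

```hcl
# Hypothetical inputs; the repository wires these up from its own network code.
variable "vpc_id" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "20.31.1"

  cluster_name    = "eks-existing-cluster-tf-test"
  cluster_version = "1.31" # assumed version

  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids

  # The existing managed node group stays in place while workloads migrate.
  eks_managed_node_groups = {
    example = { # assumed name and sizing
      instance_types = ["t3.medium"]
      min_size       = 1
      max_size       = 2
      desired_size   = 1
    }
  }

  # Set explicitly on an existing cluster so Terraform does not plan a cluster replacement.
  bootstrap_self_managed_addons = true

  # Turns on EKS Auto Mode.
  cluster_compute_config = {
    enabled = true
  }
}
```

Before applying, it is worth checking in the terraform plan output that the cluster resource is updated in place rather than replaced.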

Current state of EKS cluster after EKS Auto Mode

[Images: EKS Auto Mode enabled on the existing cluster; empty built-in NodePools]

  • As expected, the built-in NodePools are empty.

Migrate workload(pods) from EKS MNG to EKS Auto Node

  • There are a couple of ways to smoothly migrate existing workloads from the MNG to EKS Auto Mode with minimal disruption while keeping the application available throughout the migration.

Note: Copy the EKS MNG node group name.

Using eksctl tool

  • The following command cordons every node in the node group and evicts its pods. The evicted pods are then rescheduled onto nodes managed by EKS Auto Mode.
eksctl drain nodegroup --cluster=<clusterName> --name=<copiedNodegroupName>  --region us-east-1 --profile=<profile>

  • In my testing, eksctl evicted pods one at a time, so application availability was maintained.

  • If you still want to be 100% sure, follow the best practice of adding a PodDisruptionBudget. We will automate this with Terraform as well, so run terraform apply.

resource "kubectl_manifest" "test_pdb" {
  yaml_body = <<YAML
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: test-pdb
  labels:
    environment: test
spec:
  minAvailable: 1
  selector:
    matchLabels:
      environment: test
YAML
}


[Images: node during cordon; pod migrated to EKS Auto Mode node; pod events]

  • After migrating, if you want to allow pods to be scheduled on the EKS MNG again, uncordon its nodes with the command below. Otherwise, you can delete the node group.
eksctl drain nodegroup --cluster=<clusterName> --name=<copiedNodegroupName>  --region us-east-1 --profile=<profile> --undo

[Image: uncordoned nodes]

Using kubectl

  • We can also drain the nodes one by one with kubectl:
kubectl drain --ignore-daemonsets <node name>
  • Once it returns without an error, you can delete the node. If you keep the node instead, tell Kubernetes that it can resume scheduling new pods onto it:
kubectl uncordon <node name>

[ BONUS ] How to schedule Pods always on EKS Auto Nodes?

  • There are 2 options to achieve this:
  1. Delete the node group and let EKS Auto Mode schedule everything on EKS Auto Mode nodes.

  2. Use labels and nodeAffinity.

Control if a workload is deployed on EKS Auto Mode nodes

  • There is a concept called a mixed-mode cluster, where you run both EKS Auto Mode and other compute types, such as self-managed Karpenter provisioners or EKS Managed Node Groups.

  • In mixed-mode clusters, a deployment lands on EKS MNG nodes by default and not on EKS Auto Mode nodes.

  • In such cases we can use labels and nodeAffinity.

Using NodeSelector label

  • Use the label eks.amazonaws.com/compute-type: auto when you want a workload to be deployed to an EKS Auto Mode node.
  • This nodeSelector value is only relevant if you are running a cluster in mixed mode, that is, with node types not managed by EKS Auto Mode.
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      nodeSelector:
        eks.amazonaws.com/compute-type: auto
  • I have added the above configuration in the sample_app_on_eks_auto_nodes.tf file. We are automating this with Terraform, so uncomment it and run terraform apply.

[Images: workload on EKS Auto Mode nodes; nodeSelector labels]

Using nodeAffinity

  • You can add this nodeAffinity to Deployments or other workloads to require that Kubernetes does not schedule them onto EKS Auto Mode nodes.
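
Since the affinity configuration itself is only shown as an image, here is a hedged sketch of what such a rule can look like when deployed the same way as the other workloads in this post. The resource name, Deployment name, app label and nginx image are illustrative assumptions. The affinity rule excludes nodes whose eks.amazonaws.com/compute-type label is auto, using the same label key as the nodeSelector example above.

```hcl
# Hypothetical workload that must stay off EKS Auto Mode nodes.
resource "kubectl_manifest" "test_app_not_on_auto" {
  yaml_body = <<YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app-mng
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-app-mng
  template:
    metadata:
      labels:
        app: test-app-mng
    spec:
      affinity:
        nodeAffinity:
          # Hard requirement: only nodes NOT labelled compute-type=auto qualify.
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: eks.amazonaws.com/compute-type
                    operator: NotIn
                    values:
                      - auto
      containers:
        - name: nginx
          image: nginx:stable
YAML
}
```

Because the rule is a hard requirement, these pods stay Pending if no non-Auto Mode node has capacity, so keep the managed node group (or another compute type) available for them.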

[Images: node affinity config; workload not on EKS Auto Mode node; node affinity]

From DevOps, IaC Perspective

  • We saw how to enable EKS Auto Mode for existing clusters with built-in NodePools using terraform-eks-module.

  • We also saw how to migrate an existing workload from EKS Managed Node Groups to EKS Auto Mode nodes without any downtime, as EKS Auto Mode nodes respect PodDisruptionBudgets.

  • We also saw how to use nodeSelector labels and nodeAffinity to control where workloads are deployed in mixed-mode EKS clusters.

Currently EKS Auto Mode deploys EC2 instances of type c6a.large, which can also be customized using NodeClass and NodePool, as we will see in the next blog. Follow me on LinkedIn or on dev.to to get timely updates on what I share.

Feel free to reach out to me on LinkedIn or X if you face any error while migrating your existing workloads to EKS Auto Mode nodes using Terraform.
