In this article, we will integrate Cluster Autoscaler so that our cluster can adjust the number of nodes based on demand. This will require a few adjustments to our EKS module.
Prerequisites
Ensure you have:
- An active AWS account
- Terraform installed and configured
- Helm installed
- kubectl installed (see the quick check below)
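A quick way to confirm everything is in place before proceeding (exact output varies by version):
# Quick check that the required tools are installed and on the PATH
terraform version
helm version
kubectl version --client
aws --version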
Setup Overview
Prepare Your EKS Cluster
Ensure your EKS cluster is up and running. If needed, refer back to our guide on setting up an EKS cluster using Terraform.
Modifications
To enhance flexibility and customization, we've introduced new variables that streamline the addition of role and user mappings to the Kubernetes aws-auth ConfigMap. Additionally, to support Cluster Autoscaler's auto-discovery feature, it is essential to include specific tags on the launch template. We have also added a variable to manage the Kubernetes version for the EKS cluster, allowing for more controlled upgrades and maintenance.
variable "additional_role_mappings" {
description = "Additional role mappings for aws-auth ConfigMap"
type = list(object({
rolearn = string
username = string
groups = list(string)
}))
default = []
}
variable "additional_user_mappings" {
description = "Additional user mappings for aws-auth ConfigMap"
type = list(object({
userarn = string
username = string
groups = list(string)
}))
default = []
}
variable "eks_cluster_version" {
description = "The desired Kubernetes version for the EKS cluster"
type = string
default = "1.29"
}
variable "additional_launch_template_tags" {
description = "Additional tags to apply to the launch template"
type = map(string)
default = {}
}
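How these variables get consumed depends on how your module manages the aws-auth ConfigMap. As a rough sketch (assuming the module uses the Kubernetes provider and that the node group role is exposed as aws_iam_role.node_group_role, a placeholder name), the mappings could be merged like this:
# Sketch only: merge the default node role mapping with any additional mappings.
# aws_iam_role.node_group_role is a placeholder for your module's node group role.
locals {
  role_mappings = concat(
    [{
      rolearn  = aws_iam_role.node_group_role.arn
      username = "system:node:{{EC2PrivateDNSName}}"
      groups   = ["system:bootstrappers", "system:nodes"]
    }],
    var.additional_role_mappings
  )
}

resource "kubernetes_config_map_v1_data" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = yamlencode(local.role_mappings)
    mapUsers = yamlencode(var.additional_user_mappings)
  }

  force = true
}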
Create IAM Role for Cluster Autoscaler
The autoscaler needs specific permissions to interact with the EC2 and Auto Scaling APIs. We will use the recommended permissions from the Cluster Autoscaler documentation.
############################################################################################################
### AUTOSCALING
############################################################################################################
data "aws_iam_policy_document" "cluster_autoscaler_assume_role_policy" {
statement {
actions = ["sts:AssumeRoleWithWebIdentity"]
effect = "Allow"
condition {
test = "StringEquals"
variable = "${replace(module.eks.oidc_provider_url, "https://", "")}:sub"
values = ["system:serviceaccount:kube-system:cluster-autoscaler"]
}
principals {
identifiers = ["${module.eks.oidc_provider_arn}"]
type = "Federated"
}
}
}
# IAM Role for Cluster Autoscaler
resource "aws_iam_role" "cluster_autoscaler_role" {
name = "${var.cluster_name}-cluster-autoscaler"
assume_role_policy = data.aws_iam_policy_document.cluster_autoscaler_assume_role_policy.json
}
# Custom policy for Cluster Autoscaler
data "aws_iam_policy_document" "cluster_autoscaler_policy" {
statement {
effect = "Allow"
actions = [
"autoscaling:DescribeAutoScalingGroups",
"autoscaling:DescribeAutoScalingInstances",
"autoscaling:DescribeLaunchConfigurations",
"autoscaling:DescribeScalingActivities",
"ec2:DescribeImages",
"ec2:DescribeInstanceTypes",
"ec2:DescribeLaunchTemplateVersions",
"ec2:GetInstanceTypesFromInstanceRequirements",
"eks:DescribeNodegroup"
]
resources = ["*"]
}
statement {
effect = "Allow"
actions = [
"autoscaling:SetDesiredCapacity",
"autoscaling:TerminateInstanceInAutoScalingGroup"
]
resources = ["*"]
}
}
resource "aws_iam_role_policy" "cluster_autoscaler_policy" {
name = "${var.cluster_name}-cluster-autoscaler-policy"
role = aws_iam_role.cluster_autoscaler_role.id
policy = data.aws_iam_policy_document.cluster_autoscaler_policy.json
}
We can then pass the role mapping to the aws-auth ConfigMap via the module, along with the tags needed for auto-discovery.
module "eks" {
source = "./modules/aws/eks/v1"
region = var.region
cluster_name = var.cluster_name
private_subnets = module.vpc.private_subnets
public_subnets = module.vpc.public_subnets
vpc_id = module.vpc.vpc_id
managed_node_groups = {
demo_group = {
name = "demo-node-group"
desired_size = 2
min_size = 1
max_size = 3
instance_types = ["t3a.small"]
}
}
additional_role_mappings = [
{
rolearn = aws_iam_role.cluster_autoscaler_role.arn
username = "system:serviceaccount:kube-system:cluster-autoscaler"
groups = ["system:masters"]
}
]
additional_launch_template_tags = {
"k8s.io/cluster-autoscaler/${var.cluster_name}" = "owned"
"k8s.io/cluster-autoscaler/enabled" = "true"
}
}
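For auto-discovery to work, these tags ultimately need to be visible on the node group's Auto Scaling group. In this setup they are passed through the launch template, but if your module exposes the managed node group directly, another option is to tag the ASG itself with aws_autoscaling_group_tag. A sketch only; aws_eks_node_group.demo_group is a placeholder name:
# Sketch only: tag the node group's ASG directly so Cluster Autoscaler can discover it.
resource "aws_autoscaling_group_tag" "cluster_autoscaler" {
  for_each = var.additional_launch_template_tags

  autoscaling_group_name = aws_eks_node_group.demo_group.resources[0].autoscaling_groups[0].name

  tag {
    key                 = each.key
    value               = each.value
    propagate_at_launch = false
  }
}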
We can then apply our changes with Terraform.
terraform apply
We can confirm our cluster is up and running in the AWS console.
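If you prefer the terminal, the same check can be done with the AWS CLI (replace <CLUSTER_NAME> with your cluster name):
# Should print ACTIVE once the cluster is ready
aws eks describe-cluster --name <CLUSTER_NAME> --query "cluster.status" --output text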
Install Cluster Autoscaler using Helm
We have created the role mapping that the cluster-autoscaler service account will use, but we still need to create the service account itself. The manifest for that looks like this
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::<ACCOUNT_ID>:role/<IAM_ROLE_NAME>"
For simplicity (or just because I'm lazy), I have created four scripts to help with the rest of the article. They make use of Terraform outputs so I don't have to fill in placeholders or keep track of which command to run first.
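For the scripts to work, the root configuration needs to expose the values they read with terraform output. Something along these lines (adjust if your outputs are named differently):
output "aws_region" {
  value = var.region
}

output "cluster_name" {
  value = var.cluster_name
}

output "cluster_autoscaler_role_arn" {
  value = aws_iam_role.cluster_autoscaler_role.arn
}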
In short, our kubectl context is updated to point at the cluster, the service account for Cluster Autoscaler is created, and Cluster Autoscaler is deployed using Helm.
#!/bin/bash
set -euxo pipefail

# Retrieve Terraform outputs
REGION=$(terraform output -raw aws_region)
CLUSTER_NAME=$(terraform output -raw cluster_name)
CLUSTER_AUTOSCALER_ROLE_ARN=$(terraform output -raw cluster_autoscaler_role_arn)

# Update kubeconfig
aws eks update-kubeconfig --region $REGION --name $CLUSTER_NAME

# Create the service account for cluster autoscaler
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: $CLUSTER_AUTOSCALER_ROLE_ARN
EOF

# Deploy the cluster autoscaler Helm chart
helm repo add autoscaler https://kubernetes.github.io/autoscaler || true
helm repo update
helm upgrade --install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=$CLUSTER_NAME \
  --set rbac.serviceAccount.create=false \
  --set rbac.serviceAccount.name=cluster-autoscaler \
  --set awsRegion=$REGION \
  --set extraArgs.balance-similar-node-groups=true \
  --set extraArgs.skip-nodes-with-system-pods=false \
  --set extraArgs.expander=least-waste || true
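Before moving on, it's worth checking that the autoscaler pod actually came up. The pod name depends on the chart version, so filtering by the Helm release label (assuming the chart applies the standard app.kubernetes.io labels) is the easiest approach:
# Confirm the Cluster Autoscaler pod is running
kubectl get pods -n kube-system | grep -i autoscaler

# Tail its logs via the Helm release label (label names may vary by chart version)
kubectl logs -n kube-system -l app.kubernetes.io/instance=cluster-autoscaler --tail=20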
I also deployed the Kubernetes Dashboard, just so we have an application with a user interface running on the cluster that we can interact with, and to confirm what we are seeing in the AWS console regarding our cluster. This is where the kube proxy add-on comes in handy.
# Deploy the Kubernetes Dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.5.1/aio/deploy/recommended.yaml || true

# Create the admin-user service account and cluster role binding
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
EOF

# Get the token for the admin-user
kubectl -n kubernetes-dashboard create token admin-user || true

# Print the Kubernetes Dashboard URL
echo "Kubernetes Dashboard URL: http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/"

# Start a kubectl proxy to access the Kubernetes Dashboard
kubectl proxy --port=8001
We can see the token and a link to view the dashboard via kubectl proxy.
You can follow the link in the output and enter the token to access the dashboard and keep track of events. You will notice that we only have two nodes at the moment.
This matches what we see in the AWS console.
Deploy Applications To Trigger Scale Up Operation
Next, we will deploy Prometheus and Grafana to the cluster, but because of how little CPU and memory the EC2 instance type we selected has, the deployments will fail to reach a healthy state. Cluster Autoscaler will come to the rescue and add an additional node to make our cluster healthy again.
I am also using a script to perform the deployments with Helm.
#!/bin/bash
set -euxo pipefail

# Deploy the metrics server Helm chart
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server || true
helm repo update
helm upgrade --install metrics-server metrics-server/metrics-server \
  --namespace kube-system \
  --set args[0]="--kubelet-insecure-tls" \
  --set args[1]="--kubelet-preferred-address-types=InternalIP" || true

# Deploy Prometheus and Grafana using the kube-prometheus-stack Helm chart
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts || true
helm repo add grafana https://grafana.github.io/helm-charts || true
helm repo update
helm upgrade --install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.enabled=true \
  --set grafana.adminPassword='admin' \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
  --set defaultRules.create=true \
  --set alertmanager.enabled=false || true

# Ensure Prometheus is scraping metrics from the cluster autoscaler and metrics server
cat <<EOF | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: cluster-autoscaler
  endpoints:
  - port: http
    interval: 30s
EOF

cat <<EOF | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  endpoints:
  - port: https
    interval: 30s
    tlsConfig:
      insecureSkipVerify: true
EOF
You will notice that Cluster Autoscaler did its job and added an extra node to our cluster; the deployments were able to succeed after their initial failure.
If you check the events of your cluster, you will be able to find the scale up events.
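You can also follow the scale up from the terminal; Cluster Autoscaler records TriggeredScaleUp events against the pods it could not schedule (event reasons may vary slightly between versions):
# Watch new nodes join the cluster
kubectl get nodes -w

# Pending pods are what trigger the scale-up in the first place
kubectl get pods -n monitoring --field-selector=status.phase=Pending

# Look for the scale-up events recorded by Cluster Autoscaler
kubectl get events -A | grep -i TriggeredScaleUp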
Scale Down Operation
Now we will delete the Prometheus and Grafana applications and watch to see whether Cluster Autoscaler scales the cluster back down.
Once again, I am using a script for the deletions.
#!/bin/bash
set -euxo pipefail
# Delete the metrics server Helm release
helm uninstall metrics-server --namespace kube-system || true
# Delete Prometheus and Grafana Helm release
helm uninstall prometheus --namespace monitoring || true
# Delete the ServiceMonitor for cluster autoscaler
kubectl delete servicemonitor cluster-autoscaler --namespace kube-system --ignore-not-found=true
# Delete the ServiceMonitor for metrics server
kubectl delete servicemonitor metrics-server --namespace kube-system --ignore-not-found=true
# Delete the monitoring namespace
kubectl delete namespace monitoring --ignore-not-found=true
Cluster Autoscaler might scale the cluster straight back down to two nodes, or, because of how small the EC2 instance types we selected are, it might keep the node group's desired size at three nodes for a while, which is what happened in this case.
After a few minutes with no new activity, Cluster Autoscaler scales the cluster back down to two nodes.
It also changes the desired size of our node group back to two.
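How quickly this happens is tunable. If the defaults feel too slow or too aggressive, the scale-down behaviour can be adjusted with additional flags on the Helm release; the flags below are standard Cluster Autoscaler options, but double-check them against the chart version you are running:
helm upgrade --install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --reuse-values \
  --set extraArgs.scale-down-unneeded-time=5m \
  --set extraArgs.scale-down-delay-after-add=5m \
  --set extraArgs.scale-down-utilization-threshold=0.5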
The complete code used in this article can be found on this branch of the repository.
Conclusion
Your EKS cluster now has a functioning Cluster Autoscaler that scales node resources in response to workload changes.
Don't forget to clean up any resources you are not using; they can get quite expensive.
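A minimal cleanup, assuming everything from this article was created through the Helm releases and Terraform configuration above:
# Remove the Helm release, then tear down the infrastructure
helm uninstall cluster-autoscaler --namespace kube-system || true
terraform destroy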
- For advanced configurations and best practices, refer to the Cluster Autoscaler documentation on AWS.