Hands-On: Automatic scaling with EKS and Cluster Autoscaler using Terraform and Helm

Rodrigo Fernandes - Jun 20 - Dev Community

Introduction

Automatic cluster scaling is an essential capability in cloud computing environments, especially when it comes to managing resources efficiently and economically.

In this context, the Cluster Autoscaler (CA) is a vital tool for dynamically adjusting the number of node instances in a Kubernetes cluster, ensuring that workloads have enough resources while minimizing costs.

This technical article walks through the process of configuring and using Amazon EKS and the Cluster Autoscaler with Terraform and Helm to implement automatic scaling.


General information

The configurations below are intended for test environments, workshops, and demos. Do not use them in production environments.

If you already know the Cluster Autoscaler and just want to run tests, click this link and use the complete repository.

If you prefer the step-by-step to understand the details, follow the instructions below.


Cluster setup

For the cluster setup, we will use a Terraform repository with the code for a basic cluster already in place.

Access the repository by clicking here; the README has the step-by-step instructions for the complete cluster setup.

After running the steps, wait until they finish and check the Terraform output.

Done, the cluster setup is complete. Let's access the cluster and run some initial tests to check its health.


Accessing the cluster

To access the cluster we will use AWS Cloud9; for the configuration, follow the article Boosting AWS Cloud9 to Simplify Amazon EKS Administration by clicking here.

After following the article's steps, you will have Cloud9 and the Kubernetes tooling script configured.

Copy the command below, change the region and the cluster name, and run it to access the EKS cluster.

aws eks --region <your-region> update-kubeconfig --name <cluster-name>
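
To confirm that the kubeconfig entry was created, you can check the active context (an optional sanity check, not part of the original setup):

kubectl config current-context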

Let's run some initial tests to verify the cluster's health.

Collecting some information:

kubectl cluster-info


Checking the worker nodes:

kubectl get nodes -o wide


Listing all the created resources:

kubectl get all -A


With this, we can conclude that our cluster is working correctly.



With the cluster configured, let's move on to the Cluster Autoscaler.

What is the Cluster Autoscaler

The Cluster Autoscaler is a tool for automatic resource management in Kubernetes clusters.

It automatically adjusts the size of a Kubernetes cluster, increasing or decreasing the number of worker nodes according to the needs of the running workloads.

The Cluster Autoscaler makes its decisions based on the number of running pods and their respective resource requirements.

To learn more about the Cluster Autoscaler, see the official documentation by clicking here.


Installing the Cluster Autoscaler

We will split the installation and configuration files for the Cluster Autoscaler into 3 parts:

  • cluster_autoscaler_iam.tf
  • cluster_autoscaler_chart.tf
  • cluster_autoscaler_values.yaml

Let's start by configuring the permissions.

First, we need the AWS account ID and the ID of the OIDC provider created by the EKS cluster.

To get the OIDC provider ID, run the command below, replacing the cluster_name variable:

aws eks describe-cluster --name <cluster_name> --query "cluster.identity.oidc.issuer" --output text
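
If you also need to look up the AWS account ID, one convenient way is via the STS API (shown here as a convenience; any method works):

aws sts get-caller-identity --query Account --output text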

With the AWS account ID and the OIDC provider ID in hand, create the file cluster_autoscaler_iam.tf and paste the code snippet below.

Remember to change the <aws-account-id> and <oidc-id> placeholders.

# Create the IAM policy for the Cluster Autoscaler
resource "aws_iam_policy" "cluster_autoscaler_policy" {
  name        = "ClusterAutoscalerPolicy"
  description = "Policy for Kubernetes Cluster Autoscaler"
  policy      = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Action = [
          "autoscaling:DescribeAutoScalingGroups",
          "autoscaling:DescribeAutoScalingInstances",
          "autoscaling:DescribeLaunchConfigurations",
          "autoscaling:DescribeTags",
          "autoscaling:SetDesiredCapacity",
          "autoscaling:TerminateInstanceInAutoScalingGroup",
          "ec2:DescribeInstances",
          "ec2:DescribeLaunchTemplateVersions",
          "ec2:DescribeTags"
        ],
        Resource = "*"
      }
    ]
  })
}

# Create the IAM role
resource "aws_iam_role" "cluster_autoscaler" {
  name = "eks-cluster-autoscaler-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Principal = {
          Federated = "arn:aws:iam::<aws-account-id>:oidc-provider/oidc.eks.${var.region}.amazonaws.com/id/<oidc-id>"
        },
        Action = "sts:AssumeRoleWithWebIdentity",
        Condition = {
          StringEquals = {
            "oidc.eks.${var.region}.amazonaws.com/id/<iodc>:aud" = "sts.amazonaws.com"
            "oidc.eks.${var.region}.amazonaws.com/id/<iodc>:sub" = "system:serviceaccount:kube-system:cluster-autoscaler"
          }
        }
      },
    ],
  })
}

# Create the Kubernetes service account annotated with the role ARN
resource "kubernetes_service_account" "cluster_autoscaler" {
  metadata {
    name      = "cluster-autoscaler"
    namespace = "kube-system"
    annotations = {
      "eks.amazonaws.com/role-arn" = aws_iam_role.cluster_autoscaler.arn
    }
  }
}

# Attach the policy to the role
resource "aws_iam_role_policy_attachment" "cluster_autoscaler_policy_attachment" {
  policy_arn = aws_iam_policy.cluster_autoscaler_policy.arn
  role       = aws_iam_role.cluster_autoscaler.name
}

# (Optional) If you run the Cluster Autoscaler on an EC2 instance, create an instance profile for it
resource "aws_iam_instance_profile" "cluster_autoscaler_instance_profile" {
  name = "ClusterAutoscalerInstanceProfile"
  role = aws_iam_role.cluster_autoscaler.name
}

We created an IAM policy named ClusterAutoscalerPolicy with the permissions the Cluster Autoscaler needs in order to work.

We created an IAM role that can be assumed through the cluster's OIDC provider.
We created a service account and attached the role to it through the eks.amazonaws.com/role-arn annotation.

Optionally, if you are running the Cluster Autoscaler on an EC2 instance, create an instance profile.
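
To double-check that the OIDC provider referenced in the trust policy is registered in IAM, you can list the providers in the account (an optional sanity check):

aws iam list-open-id-connect-providers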

Now let's configure the Helm chart. Create a file named cluster_autoscaler_chart.tf and paste the code snippet below:

resource "helm_release" "cluster_autoscaler" {
  name       = "cluster-autoscaler"
  repository = "https://kubernetes.github.io/autoscaler"
  chart      = "cluster-autoscaler"
  namespace  = "kube-system"
  timeout    = 300
  version = "9.34.1"

  values = [
    "${file("cluster_autoscaler_values.yaml")}"
  ]

  set {
    name  = "autoDiscovery.clusterName"
    value = data.aws_eks_cluster.cluster.name
  }

  set {
    name  = "awsRegion"
    value = var.region
  }

  set {
    name  = "rbac.serviceAccount.create"
    value = "false"
  }

  set {
    name  = "rbac.serviceAccount.name"
    value = "cluster-autoscaler"
  }

}
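
For reference, this helm_release resource is roughly equivalent to the following Helm CLI invocation (a sketch for readers more familiar with the CLI; it assumes the repository was added with helm repo add autoscaler https://kubernetes.github.io/autoscaler):

helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --version 9.34.1 \
  -f cluster_autoscaler_values.yaml \
  --set autoDiscovery.clusterName=<cluster-name> \
  --set awsRegion=<your-region> \
  --set rbac.serviceAccount.create=false \
  --set rbac.serviceAccount.name=cluster-autoscaler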

To configure the Cluster Autoscaler with the Helm chart's advanced options, you can adjust several parameters that control the autoscaler's behavior.

The values.yaml file lets you configure options such as the minimum and maximum number of worker nodes, tolerations, metrics, check intervals, and much more.

Now create the file cluster_autoscaler_values.yaml and paste the snippet below.

We have to adjust a few parameters:

  • clusterName - set the name of the EKS cluster
  • awsRegion - set the AWS region

## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
# affinity -- Affinity for pod assignment
affinity: {}

# additionalLabels -- Labels to add to each object of the chart.
additionalLabels: {}

autoDiscovery:
  # cloudProviders `aws`, `gce`, `azure`, `magnum`, `clusterapi` and `oci` are supported by auto-discovery at this time
  # AWS: Set tags as described in https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#auto-discovery-setup

  # autoDiscovery.clusterName -- Enable autodiscovery for `cloudProvider=aws`, for groups matching `autoDiscovery.tags`.
  # autoDiscovery.clusterName -- Enable autodiscovery for `cloudProvider=azure`, using tags defined in https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/README.md#auto-discovery-setup.
  # Enable autodiscovery for `cloudProvider=clusterapi`, for groups matching `autoDiscovery.labels`.
  # Enable autodiscovery for `cloudProvider=gce`, but no MIG tagging required.
  # Enable autodiscovery for `cloudProvider=magnum`, for groups matching `autoDiscovery.roles`.
  clusterName: cluster-workshop

  # autoDiscovery.namespace -- Enable autodiscovery via cluster namespace for `cloudProvider=clusterapi`
  namespace:  # default

  # autoDiscovery.tags -- ASG tags to match, run through `tpl`.
  tags:
    - k8s.io/cluster-autoscaler/enabled
    - k8s.io/cluster-autoscaler/{{ .Values.autoDiscovery.clusterName }}
  # - kubernetes.io/cluster/{{ .Values.autoDiscovery.clusterName }}

  # autoDiscovery.roles -- Magnum node group roles to match.
  roles:
    - worker

  # autoDiscovery.labels -- Cluster-API labels to match  https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md#configuring-node-group-auto-discovery
  labels: []
    # - color: green
    # - shape: circle
# autoscalingGroups -- For AWS, Azure AKS or Magnum. At least one element is required if not using `autoDiscovery`. For example:
# <pre>
# - name: asg1<br />
#   maxSize: 2<br />
#   minSize: 1
# </pre>
# For Hetzner Cloud, the `instanceType` and `region` keys are also required.
# <pre>
# - name: mypool<br />
#   maxSize: 2<br />
#   minSize: 1<br />
#   instanceType: CPX21<br />
#   region: FSN1
# </pre>
autoscalingGroups: []
# - name: asg1
#   maxSize: 2
#   minSize: 1
# - name: asg2
#   maxSize: 2
#   minSize: 1

# autoscalingGroupsnamePrefix -- For GCE. At least one element is required if not using `autoDiscovery`. For example:
# <pre>
# - name: ig01<br />
#   maxSize: 10<br />
#   minSize: 0
# </pre>
autoscalingGroupsnamePrefix: []
# - name: ig01
#   maxSize: 10
#   minSize: 0
# - name: ig02
#   maxSize: 10
#   minSize: 0

# awsAccessKeyID -- AWS access key ID ([if AWS user keys used](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#using-aws-credentials))
awsAccessKeyID: ""

# awsRegion -- AWS region (required if `cloudProvider=aws`)
awsRegion: us-east-1

# awsSecretAccessKey -- AWS access secret key ([if AWS user keys used](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#using-aws-credentials))
awsSecretAccessKey: ""

# azureClientID -- Service Principal ClientID with contributor permission to Cluster and Node ResourceGroup.
# Required if `cloudProvider=azure`
azureClientID: ""

# azureClientSecret -- Service Principal ClientSecret with contributor permission to Cluster and Node ResourceGroup.
# Required if `cloudProvider=azure`
azureClientSecret: ""

# azureResourceGroup -- Azure resource group that the cluster is located.
# Required if `cloudProvider=azure`
azureResourceGroup: ""

# azureSubscriptionID -- Azure subscription where the resources are located.
# Required if `cloudProvider=azure`
azureSubscriptionID: ""

# azureTenantID -- Azure tenant where the resources are located.
# Required if `cloudProvider=azure`
azureTenantID: ""

# azureUseManagedIdentityExtension -- Whether to use Azure's managed identity extension for credentials. If using MSI, ensure subscription ID, resource group, and azure AKS cluster name are set. You can only use one authentication method at a time, either azureUseWorkloadIdentityExtension or azureUseManagedIdentityExtension should be set.
azureUseManagedIdentityExtension: false

# azureUseWorkloadIdentityExtension -- Whether to use Azure's workload identity extension for credentials. See the project here: https://github.com/Azure/azure-workload-identity for more details. You can only use one authentication method at a time, either azureUseWorkloadIdentityExtension or azureUseManagedIdentityExtension should be set.
azureUseWorkloadIdentityExtension: false

# azureVMType -- Azure VM type.
azureVMType: "vmss"

# azureEnableForceDelete -- Whether to force delete VMs or VMSS instances when scaling down.
azureEnableForceDelete: false

# cloudConfigPath -- Configuration file for cloud provider.
cloudConfigPath: ""

# cloudProvider -- The cloud provider where the autoscaler runs.
# Currently only `gce`, `aws`, `azure`, `magnum` and `clusterapi` are supported.
# `aws` supported for AWS. `gce` for GCE. `azure` for Azure AKS.
# `magnum` for OpenStack Magnum, `clusterapi` for Cluster API.
cloudProvider: aws

# clusterAPICloudConfigPath -- Path to kubeconfig for connecting to Cluster API Management Cluster, only used if `clusterAPIMode=kubeconfig-kubeconfig or incluster-kubeconfig`
clusterAPICloudConfigPath: /etc/kubernetes/mgmt-kubeconfig

# clusterAPIConfigMapsNamespace -- Namespace on the workload cluster to store Leader election and status configmaps
clusterAPIConfigMapsNamespace: ""

# clusterAPIKubeconfigSecret -- Secret containing kubeconfig for connecting to Cluster API managed workloadcluster
# Required if `cloudProvider=clusterapi` and `clusterAPIMode=kubeconfig-kubeconfig,kubeconfig-incluster or incluster-kubeconfig`
clusterAPIKubeconfigSecret: ""

# clusterAPIMode --  Cluster API mode, see https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/README.md#connecting-cluster-autoscaler-to-cluster-api-management-and-workload-clusters
# Syntax: workloadClusterMode-ManagementClusterMode
# for `kubeconfig-kubeconfig`, `incluster-kubeconfig` and `single-kubeconfig` you always must mount the external kubeconfig using either `extraVolumeSecrets` or `extraMounts` and `extraVolumes`
# if you dont set `clusterAPIKubeconfigSecret`and thus use an in-cluster config or want to use a non capi generated kubeconfig you must do so for the workload kubeconfig as well
clusterAPIMode: incluster-incluster  # incluster-incluster, incluster-kubeconfig, kubeconfig-incluster, kubeconfig-kubeconfig, single-kubeconfig

# clusterAPIWorkloadKubeconfigPath -- Path to kubeconfig for connecting to Cluster API managed workloadcluster, only used if `clusterAPIMode=kubeconfig-kubeconfig or kubeconfig-incluster`
clusterAPIWorkloadKubeconfigPath: /etc/kubernetes/value

# containerSecurityContext -- [Security context for container](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/)
containerSecurityContext: {}
  # capabilities:
  #   drop:
  #   - ALL

deployment:
  # deployment.annotations -- Annotations to add to the Deployment object.
  annotations: {}

# dnsPolicy -- Defaults to `ClusterFirst`. Valid values are:
# `ClusterFirstWithHostNet`, `ClusterFirst`, `Default` or `None`.
# If autoscaler does not depend on cluster DNS, recommended to set this to `Default`.
dnsPolicy: ClusterFirst

# envFromConfigMap -- ConfigMap name to use as envFrom.
envFromConfigMap: ""

# envFromSecret -- Secret name to use as envFrom.
envFromSecret: ""

## Priorities Expander
# expanderPriorities -- The expanderPriorities is used if `extraArgs.expander` contains `priority` and expanderPriorities is also set with the priorities.
# If `extraArgs.expander` contains `priority`, then expanderPriorities is used to define cluster-autoscaler-priority-expander priorities.
# See: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/expander/priority/readme.md
expanderPriorities: {}

# extraArgs -- Additional container arguments.
# Refer to https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-the-parameters-to-ca for the full list of cluster autoscaler
# parameters and their default values.
# Everything after the first _ will be ignored allowing the use of multi-string arguments.
extraArgs:
  logtostderr: true
  stderrthreshold: info
  v: 4
  # write-status-configmap: true
  # status-config-map-name: cluster-autoscaler-status
  # leader-elect: true
  # leader-elect-resource-lock: endpoints
  # skip-nodes-with-local-storage: true
  # expander: random
  # scale-down-enabled: true
  # balance-similar-node-groups: true
  # min-replica-count: 0
  # scale-down-utilization-threshold: 0.5
  # scale-down-non-empty-candidates-count: 30
  # max-node-provision-time: 15m0s
  # scan-interval: 10s
  # scale-down-delay-after-add: 10m
  # scale-down-delay-after-delete: 0s
  # scale-down-delay-after-failure: 3m
  # scale-down-unneeded-time: 10m
  # skip-nodes-with-system-pods: true
  # balancing-ignore-label_1: first-label-to-ignore
  # balancing-ignore-label_2: second-label-to-ignore

# extraEnv -- Additional container environment variables.
extraEnv: {}

# extraEnvConfigMaps -- Additional container environment variables from ConfigMaps.
extraEnvConfigMaps: {}

# extraEnvSecrets -- Additional container environment variables from Secrets.
extraEnvSecrets: {}

# extraVolumeMounts -- Additional volumes to mount.
extraVolumeMounts: []
  # - name: ssl-certs
  #   mountPath: /etc/ssl/certs/ca-certificates.crt
  #   readOnly: true

# extraVolumes -- Additional volumes.
extraVolumes: []
  # - name: ssl-certs
  #   hostPath:
  #     path: /etc/ssl/certs/ca-bundle.crt

# extraVolumeSecrets -- Additional volumes to mount from Secrets.
extraVolumeSecrets: {}
  # autoscaler-vol:
  #   mountPath: /data/autoscaler/
  # custom-vol:
  #   name: custom-secret
  #   mountPath: /data/custom/
  #   items:
  #     - key: subkey
  #       path: mypath

# fullnameOverride -- String to fully override `cluster-autoscaler.fullname` template.
fullnameOverride: ""

# hostNetwork -- Whether to expose network interfaces of the host machine to pods.
hostNetwork: false

image:
  # image.repository -- Image repository
  repository: registry.k8s.io/autoscaling/cluster-autoscaler
  # image.tag -- Image tag
  tag: v1.30.0
  # image.pullPolicy -- Image pull policy
  pullPolicy: IfNotPresent
  ## Optionally specify an array of imagePullSecrets.
  ## Secrets must be manually created in the namespace.
  ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
  ##
  # image.pullSecrets -- Image pull secrets
  pullSecrets: []
  # - myRegistrKeySecretName

# kubeTargetVersionOverride -- Allow overriding the `.Capabilities.KubeVersion.GitVersion` check. Useful for `helm template` commands.
kubeTargetVersionOverride: ""

# kwokConfigMapName -- configmap for configuring kwok provider
kwokConfigMapName: "kwok-provider-config"

# magnumCABundlePath -- Path to the host's CA bundle, from `ca-file` in the cloud-config file.
magnumCABundlePath: "/etc/kubernetes/ca-bundle.crt"

# magnumClusterName -- Cluster name or ID in Magnum.
# Required if `cloudProvider=magnum` and not setting `autoDiscovery.clusterName`.
magnumClusterName: ""

# nameOverride -- String to partially override `cluster-autoscaler.fullname` template (will maintain the release name)
nameOverride: ""

# nodeSelector -- Node labels for pod assignment. Ref: https://kubernetes.io/docs/user-guide/node-selection/.
nodeSelector: {}

# podAnnotations -- Annotations to add to each pod.
podAnnotations:
  cluster-autoscaler.kubernetes.io/safe-to-evict: "false"

# podDisruptionBudget -- Pod disruption budget.
podDisruptionBudget:
  maxUnavailable: 1
  # minAvailable: 2

# podLabels -- Labels to add to each pod.
podLabels: {}

# priorityClassName -- priorityClassName
priorityClassName: "system-cluster-critical"

# priorityConfigMapAnnotations -- Annotations to add to `cluster-autoscaler-priority-expander` ConfigMap.
priorityConfigMapAnnotations: {}
  # key1: "value1"
  # key2: "value2"

## Custom PrometheusRule to be defined
## The value is evaluated as a template, so, for example, the value can depend on .Release or .Chart
## ref: https://github.com/coreos/prometheus-operator#customresourcedefinitions
prometheusRule:
  # prometheusRule.enabled -- If true, creates a Prometheus Operator PrometheusRule.
  enabled: false
  # prometheusRule.additionalLabels -- Additional labels to be set in metadata.
  additionalLabels: {}
  # prometheusRule.namespace -- Namespace which Prometheus is running in.
  namespace: monitoring
  # prometheusRule.interval -- How often rules in the group are evaluated (falls back to `global.evaluation_interval` if not set).
  interval: null
  # prometheusRule.rules -- Rules spec template (see https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#rule).
  rules: []

rbac:
  # rbac.create -- If `true`, create and use RBAC resources.
  create: true
  # rbac.pspEnabled -- If `true`, creates and uses RBAC resources required in the cluster with [Pod Security Policies](https://kubernetes.io/docs/concepts/policy/pod-security-policy/) enabled.
  # Must be used with `rbac.create` set to `true`.
  pspEnabled: false
  # rbac.clusterScoped -- if set to false will only provision RBAC to alter resources in the current namespace. Most useful for Cluster-API
  clusterScoped: true
  serviceAccount:
    # rbac.serviceAccount.annotations -- Additional Service Account annotations.
    annotations: {}
    # rbac.serviceAccount.create -- If `true` and `rbac.create` is also true, a Service Account will be created.
    create: true
    # rbac.serviceAccount.name -- The name of the ServiceAccount to use. If not set and create is `true`, a name is generated using the fullname template.
    name: ""
    # rbac.serviceAccount.automountServiceAccountToken -- Automount API credentials for a Service Account.
    automountServiceAccountToken: true

# replicaCount -- Desired number of pods
replicaCount: 1

# resources -- Pod resource requests and limits.
resources: {}
  # limits:
  #   cpu: 100m
  #   memory: 300Mi
  # requests:
  #   cpu: 100m
  #   memory: 300Mi

# revisionHistoryLimit -- The number of revisions to keep.
revisionHistoryLimit: 10

# securityContext -- [Security context for pod](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/)
securityContext: {}
  # runAsNonRoot: true
  # runAsUser: 1001
  # runAsGroup: 1001

service:
  # service.create -- If `true`, a Service will be created.
  create: true
  # service.annotations -- Annotations to add to service
  annotations: {}
  # service.labels -- Labels to add to service
  labels: {}
  # service.externalIPs -- List of IP addresses at which the service is available. Ref: https://kubernetes.io/docs/user-guide/services/#external-ips.
  externalIPs: []

  # service.loadBalancerIP -- IP address to assign to load balancer (if supported).
  loadBalancerIP: ""
  # service.loadBalancerSourceRanges -- List of IP CIDRs allowed access to load balancer (if supported).
  loadBalancerSourceRanges: []
  # service.servicePort -- Service port to expose.
  servicePort: 8085
  # service.portName -- Name for service port.
  portName: http
  # service.type -- Type of service to create.
  type: ClusterIP

## Are you using Prometheus Operator?
serviceMonitor:
  # serviceMonitor.enabled -- If true, creates a Prometheus Operator ServiceMonitor.
  enabled: false
  # serviceMonitor.interval -- Interval that Prometheus scrapes Cluster Autoscaler metrics.
  interval: 10s
  # serviceMonitor.namespace -- Namespace which Prometheus is running in.
  namespace: monitoring
  ## [Prometheus Selector Label](https://github.com/helm/charts/tree/master/stable/prometheus-operator#prometheus-operator-1)
  ## [Kube Prometheus Selector Label](https://github.com/helm/charts/tree/master/stable/prometheus-operator#exporters)
  # serviceMonitor.selector -- Default to kube-prometheus install (CoreOS recommended), but should be set according to Prometheus install.
  selector:
    release: prometheus-operator
  # serviceMonitor.path -- The path to scrape for metrics; autoscaler exposes `/metrics` (this is standard)
  path: /metrics
  # serviceMonitor.annotations -- Annotations to add to service monitor
  annotations: {}
  ## [RelabelConfig](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#monitoring.coreos.com/v1.RelabelConfig)
  # serviceMonitor.metricRelabelings -- MetricRelabelConfigs to apply to samples before ingestion.
  metricRelabelings: {}

# tolerations -- List of node taints to tolerate (requires Kubernetes >= 1.6).
tolerations: []

# topologySpreadConstraints -- You can use topology spread constraints to control how Pods are spread across your cluster among failure-domains such as regions, zones, nodes, and other user-defined topology domains. (requires Kubernetes >= 1.19).
topologySpreadConstraints: []
  # - maxSkew: 1
  #   topologyKey: topology.kubernetes.io/zone
  #   whenUnsatisfiable: DoNotSchedule
  #   labelSelector:
  #     matchLabels:
  #       app.kubernetes.io/instance: cluster-autoscaler

# updateStrategy -- [Deployment update strategy](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy)
updateStrategy: {}
  # rollingUpdate:
  #   maxSurge: 1
  #   maxUnavailable: 0
  # type: RollingUpdate

# vpa -- Configure a VerticalPodAutoscaler for the cluster-autoscaler Deployment.
vpa:
  # vpa.enabled -- If true, creates a VerticalPodAutoscaler.
  enabled: false
  # vpa.updateMode -- [UpdateMode](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler/v0.13.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L124)
  updateMode: "Auto"
  # vpa.containerPolicy -- [ContainerResourcePolicy](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler/v0.13.0/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L159). The containerName is always set to the deployment's container name. This value is required if VPA is enabled.
  containerPolicy: {}

# secretKeyRefNameOverride -- Overrides the name of the Secret to use when loading the secretKeyRef for AWS and Azure env variables
secretKeyRefNameOverride: ""

Some settings that can be customized in the values file:

  • autoDiscovery: sets the cluster name for auto-discovery of Auto Scaling groups.
  • extraArgs: defines additional arguments for the Cluster Autoscaler, such as scaling policies and thresholds.
  • rbac: configures the service account and the RBAC permissions.
  • image: sets the Cluster Autoscaler image version.
  • resources: specifies resource requests and limits for the Cluster Autoscaler pod.
  • nodeSelector, tolerations, affinity: control where the Cluster Autoscaler pods can be scheduled.
  • replicaCount: sets the number of Cluster Autoscaler replicas.
  • podAnnotations: adds annotations to the Cluster Autoscaler pod.

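After the release is installed, you can inspect the values Helm actually applied, which helps confirm the overrides from the set blocks (an optional check):

helm get values cluster-autoscaler -n kube-system
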
After creating all the files above, apply them with Terraform by running the command below:

terraform apply --auto-approve

Follow the Cluster Autoscaler logs to check whether the deployment succeeded.

kubectl -n kube-system logs -f deployment/cluster-autoscaler-aws-cluster-autoscaler

If everything is in order, the Cluster Autoscaler is operational and ready for scaling tests.
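
You can also confirm that the deployment is available (using the same deployment name as in the logs command above):

kubectl -n kube-system get deployment cluster-autoscaler-aws-cluster-autoscaler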


Testing automatic scaling

Let's start the automatic scaling tests. To do so, we will gather some information, create some resources, and follow the results.

Check the current number of worker nodes with the command below:

kubectl get nodes


Note that at this point we have only 1 worker node available.

Let's create a deployment for the stress tests.

Create a file named cpu-stress-deployment.yaml and paste the code below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-stress
spec:
  replicas: 5
  selector:
    matchLabels:
      app: cpu-stress
  template:
    metadata:
      labels:
        app: cpu-stress
    spec:
      containers:
      - name: cpu-stress
        image: vish/stress
        resources:
          requests:
            cpu: "1"
        args:
        - -cpus
        - "1"

Apply the deployment with the command:

kubectl apply -f cpu-stress-deployment.yaml

Watch the Cluster Autoscaler's behavior: it should increase the number of worker nodes to accommodate the additional workload.
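
To watch the scale-up as it happens, it helps to follow the pods and the nodes in a second terminal (optional; the -w flag streams updates, and the label matches the deployment above):

kubectl get pods -l app=cpu-stress -w
kubectl get nodes -w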

Follow the Cluster Autoscaler logs.

kubectl -n kube-system logs -f deployment/cluster-autoscaler-aws-cluster-autoscaler

Watch the worker nodes scaling out; note that several worker nodes were provisioned to accommodate the new workload.

kubectl get nodes

Let's simulate the reduction of the excess workload, returning the environment to normal.

Scale the deployment down to zero pods and watch the worker nodes being deprovisioned from the infrastructure, returning to their original state.
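
One way to zero out the deployment is with kubectl scale, using the deployment name created above:

kubectl scale deployment cpu-stress --replicas=0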


Conclusion

Using Terraform and Helm to set up an EKS cluster with the Cluster Autoscaler provides a robust, automated solution for managing the scalability of Kubernetes clusters.
This article detailed the steps required to implement and manage automatic scaling, ensuring that resources are used efficiently and economically.
With these tools, you can optimize costs and improve the performance of your applications in an AWS-managed Kubernetes environment.
