Overview
Welcome back to the Back2Basics series! In this part, we'll explore how Karpenter, a just-in-time node provisioner, automatically manages nodes based on your workload needs. We'll also walk you through deploying a voting application to showcase this functionality in action.
If you haven't read the first part, you can check it out here: Back2Basics: Setting Up an Amazon EKS Cluster by Romar Cablao for AWS Community Builders (Jun 12).
Infrastructure Setup
In the previous post, we covered the fundamentals of cluster provisioning using OpenTofu and a simple workload deployment. Now we will enable additional addons, including Karpenter for automatic node provisioning based on workload needs.

First, we need to uncomment these lines in `03_eks.tf` to create taints on the nodes managed by the initial node group:
```hcl
# Uncomment this if you will use Karpenter
# taints = {
#   init = {
#     key    = "node"
#     value  = "initial"
#     effect = "NO_SCHEDULE"
#   }
# }
```
Taints ensure that only pods configured to tolerate them can be scheduled on those nodes. This allows us to reserve the initial nodes for specific purposes while Karpenter provisions additional nodes for other workloads.

We also need to uncomment the code in `04_karpenter` and `05_addons` to activate Karpenter and provision the other addons.
Once updated, run `tofu init`, `tofu plan`, and `tofu apply`. When prompted to confirm, type `yes` to proceed with provisioning the additional resources.
Karpenter
Karpenter is an open-source project that automates node provisioning in Kubernetes clusters. By integrating with EKS, Karpenter dynamically scales the cluster by adding new nodes when workloads require additional resources and removing idle nodes to optimize costs. The Karpenter configuration defines different node classes and pools for specific workload types, ensuring efficient resource allocation. Read more: https://karpenter.sh/docs/
The template `04_karpenter` defines several node classes and pools categorized by workload type. These include:

- `critical-workloads`: for running essential cluster addons
- `monitoring`: dedicated to Grafana and other monitoring tools
- `vote-app`: for the voting application we'll be deploying
Workload Setup
The voting application consists of several components: `vote`, `result`, `worker`, `redis`, and `postgresql`. While we'll deploy everything on Kubernetes for simplicity, in a production environment you can leverage managed services like Amazon ElastiCache for Redis and Amazon RDS.
| Component | Description |
|---|---|
| Vote | Handles receiving and processing votes. |
| Result | Provides real-time visualizations of the current voting results. |
| Worker | Synchronizes votes between Redis and PostgreSQL. |
| Redis | Stores votes temporarily, easing the load on PostgreSQL. |
| PostgreSQL | Stores all votes permanently for secure and reliable data access. |
Here's the Voting App UI for both voting and results.
Deployment Using Kubernetes Manifest
If you explore the `workloads/manifest` directory, you'll find separate YAML files for each workload. Let's take a closer look at the components used for stateful applications like `postgres` and `redis`:
```yaml
apiVersion: v1
kind: Secret
...
---
apiVersion: v1
kind: PersistentVolumeClaim
...
---
apiVersion: apps/v1
kind: StatefulSet
...
---
apiVersion: v1
kind: Service
...
```
As you may see, a `Secret`, `PersistentVolumeClaim`, `StatefulSet`, and `Service` were used for `postgres` and `redis`. Let's take a quick review of the API objects used:

- `Secret` - used to store and manage sensitive information such as passwords, tokens, and keys.
- `PersistentVolumeClaim` - a request for storage, used to provision persistent storage dynamically.
- `StatefulSet` - manages stateful applications with guarantees about the ordering and uniqueness of `pods`.
- `Service` - used for exposing an application that is running as one or more `pods` in the cluster.
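To see how these objects fit together, here's a trimmed, hypothetical sketch of how a `postgres` StatefulSet might wire in the Secret and the PersistentVolumeClaim (the actual manifests in `workloads/manifest` may differ; the `postgres-secret` and `postgres-pvc` names are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres              # the Service providing stable DNS
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15
          envFrom:
            - secretRef:
                name: postgres-secret    # credentials come from the Secret
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: postgres-pvc      # storage comes from the PVC
```

The StatefulSet ties the pieces together: the Secret supplies credentials as environment variables, and the PVC provides the persistent storage that outlives pod restarts.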
Now, let's view `vote-app.yaml`, `results-app.yaml`, and `worker.yaml`:
```yaml
apiVersion: v1
kind: ConfigMap
...
---
apiVersion: apps/v1
kind: Deployment
...
---
apiVersion: v1
kind: Service
...
```
Similar to `postgres` and `redis`, we use a `Service` for the stateless workloads. We also introduce the use of `ConfigMap` and `Deployment`:

- `ConfigMap` - stores non-confidential configuration data in key-value pairs, decoupling configuration from code.
- `Deployment` - used to provide declarative updates for `pods` and `replicasets`, typically used for stateless workloads.
And lastly, the `ingress.yaml`. To make our services accessible from outside the cluster, we'll use an `Ingress`. This API object manages external access to the services in a cluster, typically over HTTP/S.
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
...
```
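The elided Ingress might look roughly like this sketch, assuming the AWS Load Balancer Controller is installed and the `vote-app` Service is exposed as a `NodePort` (resource and service names are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: vote-app                     # hypothetical name
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: instance
spec:
  ingressClassName: alb              # handled by the AWS Load Balancer Controller
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: vote-app       # hypothetical Service name
                port:
                  number: 80
```

With the `alb` ingress class, the controller provisions an internet-facing Application Load Balancer and routes traffic to the service's node ports.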
Now that we've examined the manifest files, let's deploy them to the cluster. You can use the following command to apply all YAML files within the `workloads/manifest/` directory:

```shell
kubectl apply -f workloads/manifest/
```

For more granular control, you can apply each YAML file individually. To clean up the deployment later, simply run `kubectl delete -f workloads/manifest/`.
While manifest files are a common approach, there are alternative tools for deployment management:

- `Kustomize`: allows customizing raw YAML files for various purposes without modifying the original files.
- `Helm`: a popular package manager for Kubernetes applications. Helm charts provide a structured way to define, install, and upgrade even complex applications within the cluster.
Deployment Using Kustomize
Let's check Kustomize first. If you haven't installed its binary, you can refer to the Kustomize Installation Docs. This example utilizes an overlay file to make specific changes to the default configuration. To apply the built kustomization, run:

```shell
kustomize build .\workloads\kustomize\overlays\dev\ | kubectl apply -f -
```
Here's what we've modified:

- Added an annotation: `note: "Back2Basics: A Series"`
- Set the replicas for both the `vote` and `result` deployments to `3`
To verify, you can refer to the commands below:
```shell
D:\> kubectl get pod -o custom-columns=NAME:.metadata.name,ANNOTATIONS:.metadata.annotations
NAME                           ANNOTATIONS
postgres-0                     map[note:Back2Basics: A Series]
redis-0                        map[note:Back2Basics: A Series]
result-app-6c9dd6d458-8hxkf    map[note:Back2Basics: A Series]
result-app-6c9dd6d458-l4hp9    map[note:Back2Basics: A Series]
result-app-6c9dd6d458-r5srd    map[note:Back2Basics: A Series]
vote-app-cfd5fc88-lsbzx        map[note:Back2Basics: A Series]
vote-app-cfd5fc88-mdblb        map[note:Back2Basics: A Series]
vote-app-cfd5fc88-wz5ch        map[note:Back2Basics: A Series]
worker-bf57ddcb8-kkk79         map[note:Back2Basics: A Series]

D:\> kubectl get deploy
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
result-app   3/3     3            3           5m
vote-app     3/3     3            3           5m
worker       1/1     1            1           5m
```
To remove all the resources we created, run the following command:

```shell
kustomize build .\workloads\kustomize\overlays\dev\ | kubectl delete -f -
```
Deployment Using Helm Chart
Next to check is Helm. If you haven't installed the helm binary, you can refer to the Helm Installation Docs. Once installed, let's add a repository and update:

```shell
helm repo add thecloudspark https://thecloudspark.github.io/helm-charts
helm repo update
```
Next, create a `values.yaml` and add some overrides to the default configuration. You can also use the existing config in `workloads/helm/values.yaml`. Here's how it looks:
```yaml
ingress:
  enabled: true
  className: alb
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: instance

# Vote Handler Config
vote:
  tolerations:
    - key: app
      operator: Equal
      value: vote-app
      effect: NoSchedule
  nodeSelector:
    app: vote-app
  service:
    type: NodePort

# Results Handler Config
result:
  tolerations:
    - key: app
      operator: Equal
      value: vote-app
      effect: NoSchedule
  nodeSelector:
    app: vote-app
  service:
    type: NodePort

# Worker Handler Config
worker:
  tolerations:
    - key: app
      operator: Equal
      value: vote-app
      effect: NoSchedule
  nodeSelector:
    app: vote-app
```
As you may see, we added `nodeSelector` and `tolerations` to make sure the `pods` will be scheduled on the dedicated nodes where we want them to run. This Helm chart offers various configuration options, and you can explore them in more detail on ArtifactHub: Vote App.
Now install the chart and apply the overrides from `values.yaml`:

```shell
# Install
helm install app -f workloads/helm/values.yaml thecloudspark/vote-app

# Upgrade
helm upgrade app -f workloads/helm/values.yaml thecloudspark/vote-app
```
Wait for the pods to be up and running, then access the UI using the provisioned application load balancer.
To uninstall, just run the command below:

```shell
helm uninstall app
```
Going back to Karpenter
Under the hood, Karpenter provisioned the nodes used by the voting app we've deployed. The sample logs below provide insight into its activities:
```json
{"level":"INFO","time":"2024-06-16T10:15:38.739Z","logger":"controller.provisioner","message":"found provisionable pod(s)","commit":"fb4d75f","pods":"default/result-app-6c9dd6d458-l4hp9, default/worker-bf57ddcb8-kkk79, default/vote-app-cfd5fc88-lsbzx","duration":"153.662007ms"}
{"level":"INFO","time":"2024-06-16T10:15:38.739Z","logger":"controller.provisioner","message":"computed new nodeclaim(s) to fit pod(s)","commit":"fb4d75f","nodeclaims":1,"pods":3}
{"level":"INFO","time":"2024-06-16T10:15:38.753Z","logger":"controller.provisioner","message":"created nodeclaim","commit":"fb4d75f","nodepool":"vote-app","nodeclaim":"vote-app-r9z7s","requests":{"cpu":"510m","memory":"420Mi","pods":"8"},"instance-types":"m5.2xlarge, m5.4xlarge, m5.large, m5.xlarge, m5a.2xlarge and 55 other(s)"}
{"level":"INFO","time":"2024-06-16T10:15:41.894Z","logger":"controller.nodeclaim.lifecycle","message":"launched nodeclaim","commit":"fb4d75f","nodeclaim":"vote-app-r9z7s","provider-id":"aws:///ap-southeast-1b/i-028457815289a8470","instance-type":"t3.small","zone":"ap-southeast-1b","capacity-type":"spot","allocatable":{"cpu":"1700m","ephemeral-storage":"14Gi","memory":"1594Mi","pods":"11"}}
{"level":"INFO","time":"2024-06-16T10:16:08.946Z","logger":"controller.nodeclaim.lifecycle","message":"registered nodeclaim","commit":"fb4d75f","nodeclaim":"vote-app-r9z7s","provider-id":"aws:///ap-southeast-1b/i-028457815289a8470","node":"ip-10-0-206-99.ap-southeast-1.compute.internal"}
{"level":"INFO","time":"2024-06-16T10:16:23.631Z","logger":"controller.nodeclaim.lifecycle","message":"initialized nodeclaim","commit":"fb4d75f","nodeclaim":"vote-app-r9z7s","provider-id":"aws:///ap-southeast-1b/i-028457815289a8470","node":"ip-10-0-206-99.ap-southeast-1.compute.internal","allocatable":{"cpu":"1700m","ephemeral-storage":"15021042452","hugepages-1Gi":"0","hugepages-2Mi":"0","memory":"1663292Ki","pods":"11"}}
```
As shown in the logs, when Karpenter found pods that needed to be scheduled, a new node claim was created, launched, and initialized. Whenever there is a need for additional resources, this component is responsible for fulfilling it.
Additionally, Karpenter automatically labels the nodes it provisions with `karpenter.sh/initialized=true`. Let's use `kubectl` to see these nodes:

```shell
kubectl get nodes -l karpenter.sh/initialized=true
```
This command lists all nodes that carry this label. As you can see in the output below, three nodes have been provisioned by Karpenter:
```shell
NAME                                              STATUS   ROLES    AGE   VERSION
ip-10-0-208-50.ap-southeast-1.compute.internal    Ready    <none>   10m   v1.30.0-eks-036c24b
ip-10-0-220-238.ap-southeast-1.compute.internal   Ready    <none>   10m   v1.30.0-eks-036c24b
ip-10-0-206-99.ap-southeast-1.compute.internal    Ready    <none>   1m    v1.30.0-eks-036c24b
```
Lastly, let's check the related logs for node termination, the process of removing nodes from the cluster. Decommissioning typically involves tainting the node first to prevent further `pod` scheduling, followed by node deletion.
```json
{"level":"INFO","time":"2024-06-16T10:35:39.165Z","logger":"controller.disruption","message":"disrupting via consolidation delete, terminating 1 nodes (0 pods) ip-10-0-206-99.ap-southeast-1.compute.internal/t3.small/spot","commit":"fb4d75f","command-id":"5e5489a6-a99d-4b8d-912c-df314a4b5cfa"}
{"level":"INFO","time":"2024-06-16T10:35:39.483Z","logger":"controller.disruption.queue","message":"command succeeded","commit":"fb4d75f","command-id":"5e5489a6-a99d-4b8d-912c-df314a4b5cfa"}
{"level":"INFO","time":"2024-06-16T10:35:39.511Z","logger":"controller.node.termination","message":"tainted node","commit":"fb4d75f","node":"ip-10-0-206-99.ap-southeast-1.compute.internal"}
{"level":"INFO","time":"2024-06-16T10:35:39.530Z","logger":"controller.node.termination","message":"deleted node","commit":"fb4d75f","node":"ip-10-0-206-99.ap-southeast-1.compute.internal"}
{"level":"INFO","time":"2024-06-16T10:35:39.989Z","logger":"controller.nodeclaim.termination","message":"deleted nodeclaim","commit":"fb4d75f","nodeclaim":"vote-app-r9z7s","node":"ip-10-0-206-99.ap-southeast-1.compute.internal","provider-id":"aws:///ap-southeast-1b/i-028457815289a8470"}
```
What's Next?
We've successfully deployed our voting application! And thanks to Karpenter, new nodes are added automatically when needed and terminated when not, making our setup more robust and cost-effective. In the final part of this series, we'll delve into monitoring the voting application we've deployed with Grafana and Prometheus, giving us visibility into resource utilization and application health.