Source: Horizontal Pod Autoscaler (HPA) in Kubernetes

1. Concept

Horizontal Pod Autoscaler (HPA) is a Kubernetes feature that automatically adjusts the number of pod replicas in a workload such as a Deployment, based on observed metrics such as CPU utilization. Scaling out under load helps prevent performance degradation during peak usage periods, and scaling in when demand drops saves cost.
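
As a rough illustration of how a replica count is derived from a metric, the Kubernetes documentation gives the core rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The shell sketch below applies that rule and clamps the result to a minimum and maximum. The function name desired_replicas and its argument order are invented for this example, and the sketch deliberately ignores the controller's tolerance band and stabilization behavior.

```shell
# Illustrative helper only; not part of Kubernetes or kubectl.
# Usage: desired_replicas CURRENT_REPLICAS CURRENT_UTIL TARGET_UTIL MIN MAX
desired_replicas() {
  current_replicas=$1
  current_util=$2    # observed average utilization, in percent
  target_util=$3     # target average utilization, in percent
  min_replicas=$4
  max_replicas=$5

  # Integer ceiling of current_replicas * current_util / target_util
  desired=$(( (current_replicas * current_util + target_util - 1) / target_util ))

  # Clamp the result to the configured bounds
  if [ "$desired" -lt "$min_replicas" ]; then desired=$min_replicas; fi
  if [ "$desired" -gt "$max_replicas" ]; then desired=$max_replicas; fi

  echo "$desired"
}

# 4 pods averaging 90% CPU against a 50% target: ceil(4 * 90 / 50) = ceil(7.2)
desired_replicas 4 90 50 1 10   # prints 8
```

The same call with an average of 300% across 5 pods would compute 30 replicas, which the clamp reduces to the maximum of 10.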

2. Essential parts

Metrics Server: For resource metrics such as CPU and memory, an HPA relies on the Metrics Server, which collects pod resource usage and exposes it through the Kubernetes metrics API.

Deployment/ReplicaSet: An HPA operates on these objects to scale the number of pods up or down.

3. How to Use HPA

3.1 Installing Metrics Server

Before using an HPA with CPU or memory metrics, make sure the Metrics Server is installed and running in your Kubernetes cluster. The Metrics Server collects resource usage from the cluster's nodes and pods, which is the data the HPA uses to make scaling decisions. It can be installed from the project's release manifest:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
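
Once the manifest is applied, it is worth confirming that metrics are actually being served before creating an HPA. The helper below is a minimal sketch: the function name verify_metrics_server is hypothetical, and it assumes the manifest created a Deployment named metrics-server in the kube-system namespace, as the upstream components.yaml does.

```shell
# Hypothetical helper; the namespace and Deployment name are assumptions
# based on the upstream metrics-server manifest.
verify_metrics_server() {
  # Wait until the metrics-server Deployment has finished rolling out
  kubectl -n kube-system rollout status deployment/metrics-server --timeout=120s || return 1
  # If metrics are being served, this prints CPU and memory usage per node
  kubectl top nodes
}
```

Run it by calling verify_metrics_server in the same shell. The kubectl top command may report that metrics are not yet available for a short time after the rollout completes, so a retry after a minute or so is reasonable.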

3.2 Creating an HPA

Here's an example of how to configure an HPA for a deployment. Let's say you have a deployment named "my-app" and you want to automatically scale the number of pods based on CPU utilization.

Create the Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app-container
        image: my-app-image
        resources:
          requests:
            cpu: "100m"  # the HPA measures CPU utilization as a percentage of this request
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"

Create HPA configuration:

# autoscaling/v2 has been stable since Kubernetes 1.23; v2beta2 was removed in 1.26
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
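
To try the two manifests above, they can be applied with kubectl and the autoscaler's state inspected afterwards. This is a sketch under assumptions: the file names deployment.yaml and hpa.yaml and the function name apply_and_inspect_hpa are hypothetical, while the resource name my-app-hpa comes from the HPA manifest.

```shell
# Hypothetical helper; deployment.yaml and hpa.yaml are assumed file names
# for the two manifests shown above.
apply_and_inspect_hpa() {
  # Create or update the Deployment and the HPA
  kubectl apply -f deployment.yaml -f hpa.yaml || return 1
  # Show current vs. target utilization and the current replica count
  kubectl get hpa my-app-hpa || return 1
  # Show conditions and recent scaling events
  kubectl describe hpa my-app-hpa
}
```

Until the first metrics arrive, the targets column of kubectl get hpa usually reports the current value as unknown; it should fill in once the Metrics Server has scraped the new pods.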

Explanation of the HPA configuration: