1. Lifecycle of a Pod
Kubernetes Pods go through various states from creation to termination. Here are the primary states and the conditions under which they occur:
Pending: After a Pod is created, it remains in the Pending state until it has been scheduled onto a Node and all required resources (e.g., volumes, container images) are available.
Running: A Pod transitions to the Running state when the containers within it are successfully started on a Node.
Succeeded: If all containers in a Pod complete their work and terminate successfully, the Pod moves to the Succeeded state.
Failed: If one or more containers in a Pod terminate with an error and will not be restarted, the Pod ends up in the Failed state.
Unknown: When the state of a Pod cannot be determined due to a loss of connection with the Node, the Pod is marked as Unknown.
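You can check which phase a Pod is currently in with kubectl; a quick sketch, where <pod-name> is a placeholder:

kubectl get pod <pod-name> -o jsonpath='{.status.phase}'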
2. State transitions
2.1 Yes (1)
Upon deployment, a Pod is initially in the Pending state. Before it can transition to the Running state, any init containers must run to completion; no primary container starts until they finish. An example Pod with an init container is sketched below.
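As an illustration, here is a minimal Pod sketch with one init container; the names, image, and command are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: <pod-name>
spec:
  initContainers:
  - name: init-setup            # runs to completion before main-app starts
    image: busybox
    command: ["sh", "-c", "echo preparing environment"]
  containers:
  - name: main-app
    image: <app-image>

The Pod stays Pending (shown as a status such as Init:0/1) until init-setup exits successfully.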
2.2 No (1)
Here are the common error cases when a Pod is stuck in the Pending state:
+ Insufficient CPU or memory: No Node has enough free CPU or memory to schedule the Pod.
+ Pod's volume is not declared or unavailable: The storage volume the Pod needs is either not defined or cannot be accessed.
+ Kubernetes cannot pull container images: The cluster is unable to download the container images the Pod needs.
+ Init container cannot start or complete: An init container, responsible for setting up the Pod environment, fails to start or finish successfully. If an init container fails and restartPolicy is set to Never, the Pod transitions directly to the Failed state without any further attempts to start.
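To pinpoint which of these cases you are hitting, describe the Pod and read the Events section at the bottom of the output; <pod-name> is a placeholder:

kubectl describe pod <pod-name>

The events typically name the cause directly, for example FailedScheduling when no Node has enough resources, or an image pull failure.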
2.3 Yes (2)
Once the init container completes its task (fetching the secret), container-01 and container-02 are started and the Pod transitions to the Running phase. The Pod remains in the Running phase until it encounters an unrecoverable failure, at which point its containers are terminated and the Pod enters the Failed phase.
2.4 No (2)
In Kubernetes, there are several reasons why a Pod might transition from a "Running" state to a "Failed" state. Here are some common causes:
+ Application Errors: If the application running inside the Pod encounters a critical error and cannot continue handling requests, the Pod may fail. Examples include code bugs, unhandled exceptions, or connection failures to external services.
+ Resource Exhaustion: If a container uses more memory than its configured limit, Kubernetes kills it (OOMKilled) and the Pod can end up failed. Exceeding the CPU limit, by contrast, leads to throttling rather than termination.
+ Startup Failures: If a container repeatedly fails to start, the Pod ends up failed (depending on its restartPolicy) or cycles in CrashLoopBackOff. This is often due to incorrect configuration or missing parameters required to start the container.
+ Liveness Probe Failures: If the liveness probe fails repeatedly, Kubernetes keeps restarting the container; if the restarts never restore health, the Pod can end up failed.
+ Readiness Probe Failures: Readiness probes do not directly lead to a Failed state, but a Pod that never becomes ready is removed from Service endpoints and cannot serve traffic, which degrades the services that depend on it.
+ Network Configuration Issues: If a Pod cannot reach the network or external services due to incorrect network configuration, the application inside may stop functioning correctly, leading to a failed state.
+ Node or Infrastructure Issues: When Kubernetes detects severe problems affecting a Pod (such as hardware failures or the inability to reach required services), the Pod may be terminated and marked as failed.
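When a container was killed, the reason is recorded in the Pod status and can help distinguish these cases; a quick sketch (the container index and Pod name are placeholders):

kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

A value such as OOMKilled points to resource exhaustion, i.e. the container exceeded its memory limit.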
3. Pod management
3.1 Scaling
Scaling is critical to maintaining the performance and availability of an application as load changes. Kubernetes provides two primary tools to accomplish this:
Horizontal Pod Autoscaler (HPA) is a Kubernetes feature that automatically adjusts the number of replica Pods in a Deployment or ReplicaSet based on observed CPU utilization or memory usage. This ensures that your application always has the right amount of resources to handle the current workload. If the demand for your application increases, HPA will automatically scale up the number of Pods to handle the additional load. Conversely, if the demand decreases, HPA will scale down the number of Pods to reduce costs.
kubectl autoscale deployment <deployment-name> --cpu-percent=50 --min=1 --max=10
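The same policy can also be written declaratively. A minimal sketch using the autoscaling/v2 API; the resource names are placeholders:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: <hpa-name>
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <deployment-name>
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # target 50% average CPU utilization across Pods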
Vertical Pod Autoscaler (VPA): VPA automatically adjusts the resource (CPU and memory) requests and limits of a Pod. It can scale these resources up or down based on the Pod's observed utilization, ensuring that Pods are neither over-provisioned nor under-provisioned. This optimization improves resource efficiency and overall cluster performance under varying workloads.
apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
name: <vpa-name>
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: <deployment-name>
updatePolicy:
updateMode: "Auto"
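Note that VPA is not part of core Kubernetes; it is installed separately (it lives in the kubernetes/autoscaler project), and in "Auto" mode it applies new resource values by evicting and recreating Pods.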
3.2 Rolling Updates
Kubernetes provides Rolling Updates to deploy new versions of an application with zero downtime.
When you change the container image or configuration of a Deployment, Kubernetes will automatically perform a Rolling Update. It replaces old Pods with new Pods gradually, ensuring that some Pods are always available to serve requests.
kubectl set image deployment/<deployment-name> <container-name>=<new-image>
Kubernetes will create new Pods with the new image and stop the old Pods step by step, ensuring that the service is always up.
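The pace of a rolling update can be tuned in the Deployment spec. A minimal sketch; the values shown are illustrative choices, not defaults:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # create at most one extra Pod above the desired count
      maxUnavailable: 0  # never take a serving Pod down before its replacement is ready

You can watch the progress with kubectl rollout status deployment/<deployment-name>, and roll back with kubectl rollout undo deployment/<deployment-name> if the new version misbehaves.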
3.3 Liveness and Readiness Probes
Liveness and Readiness Probes are mechanisms that guarantee the health and readiness of Pods and their containers to handle incoming traffic.
You can see my previous post: Readiness and Liveness checks in Kubernetes.
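For quick reference, here is a minimal sketch of both probes on a container; the /healthz and /ready endpoints and port 8080 are assumptions about the application, not Kubernetes defaults:

containers:
- name: <container-name>
  image: <app-image>
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 10   # give the app time to boot before the first check
    periodSeconds: 5
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    periodSeconds: 5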
3.4 Pod Eviction
Kubernetes has a built-in mechanism to automatically remove, or "evict", Pods from a Node under certain conditions. This is typically done to prevent system instability when resources on a Node become scarce. For instance, if a Node is experiencing high resource utilization, Kubernetes may evict less critical Pods to free up resources for more important workloads, such as those serving critical business functions. Kubernetes can also evict Pods based on predefined priority policies, ensuring that Pods with higher priority are not disrupted. Which Pods are evicted first depends in part on the resource requests and limits you configure:
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
3.5 Affinity and Anti-Affinity
Affinity and anti-affinity allow you to control how Pods are scheduled onto Nodes, helping to optimize the performance and reliability of your application.
Affinity: Allows Pods to select suitable Nodes for deployment based on specific labels.
Example configuration:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype
          operator: In
          values:
          - ssd
In this example, Pods will only be scheduled on Nodes labeled disktype=ssd.
Anti-Affinity: Prevents Pods from being scheduled on the same Node or a group of Nodes to mitigate risks associated with Node failures.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: "app"
          operator: "In"
          values:
          - "web"
      topologyKey: "kubernetes.io/hostname"
"Anti-Affinity ensures that Pods with specific labels are not deployed on the same Node.
4. Conclusion
Understanding Pod states and management in Kubernetes requires a deep knowledge of automation mechanisms and supporting tools. By leveraging features such as Scaling, Rolling Updates, Liveness and Readiness Probes, Pod Eviction, and Affinity & Anti-Affinity, you can optimize the performance and reliability of your application. Ensure that you configure these elements correctly to maintain the smooth and efficient operation of your Kubernetes system.
Thank you for reading!