Debugging Kubernetes applications can feel like navigating a labyrinth. With its distributed nature and myriad components, identifying and resolving issues in Kubernetes requires robust tools and techniques.

This article will explore various techniques and tools for troubleshooting and debugging Kubernetes. Whether you’re an experienced Kubernetes user or just getting started, this guide will provide valuable insights into efficient debugging practices.

Analyzing Pod Lifecycle Events

Understanding a pod's lifecycle is crucial for debugging and maintaining applications running in Kubernetes. Each pod goes through several phases, from creation to termination, and analyzing these events can help you identify and resolve issues.

Pod Lifecycle Phases

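A pod in Kubernetes goes through the following phases:

Pending: The pod has been accepted by the cluster, but one or more of its containers is not yet running. This includes time spent waiting to be scheduled and pulling images.
Running: The pod is bound to a node and at least one container is running, starting, or restarting.
Succeeded: All containers terminated successfully and will not be restarted.
Failed: All containers have terminated, and at least one terminated in failure.
Unknown: The pod's state could not be obtained, typically because of a communication error with its node.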

Using kubectl get and kubectl describe

To analyze the lifecycle events of a pod, you can use the kubectl get and kubectl describe commands.

The kubectl get command provides a high-level overview of the status of pods:

kubectl get pods

Output example:

NAME              READY   STATUS      RESTARTS   AGE
web-server-pod    1/1     Running     0          5m
db-server-pod     0/1     Pending     0          2m
cache-server-pod  0/1     Completed   1          10m

This output shows each pod's current status, which can help you identify pods that require further investigation.
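
When a namespace holds many pods, the same check can be scripted. The following is a minimal Python sketch, assuming the default table layout printed by kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE). The function name and the set of statuses treated as healthy are illustrative choices, not part of kubectl.

```python
from __future__ import annotations

# Statuses that usually need no follow-up; anything else is flagged.
# This set is an assumption for the sketch; adjust it to your workloads.
HEALTHY_STATUSES = {"Running", "Completed"}


def pods_needing_attention(get_pods_output: str) -> list[tuple[str, str]]:
    """Return (pod name, status) pairs for pods outside HEALTHY_STATUSES."""
    flagged = []
    for line in get_pods_output.strip().splitlines()[1:]:  # skip the header row
        columns = line.split()
        if len(columns) < 3:
            continue  # ignore blank or malformed lines
        name, status = columns[0], columns[2]
        if status not in HEALTHY_STATUSES:
            flagged.append((name, status))
    return flagged


sample = """\
NAME              READY   STATUS      RESTARTS   AGE
web-server-pod    1/1     Running     0          5m
db-server-pod     0/1     Pending     0          2m
cache-server-pod  0/1     Completed   1          10m
"""
print(pods_needing_attention(sample))  # [('db-server-pod', 'Pending')]
```

Pods returned by this helper are the ones worth passing to kubectl describe next.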

The kubectl describe command provides detailed information about a pod, including its lifecycle events:

kubectl describe pod <pod-name>

Output snippet:

Name:           web-server-pod
Namespace:      default
Node:           node-1/
Start Time:     Wed, 01 Jan 2025 10:00:00 GMT
Labels:         app=web-server
Status:         Running
Containers:
  web-container:
    Container ID:   docker://abcdef123456
    Image:          nginx:latest
    State:          Running
      Started:      Wed, 01 Jan 2025 10:01:00 GMT
    Ready:          True
    Restart Count:  0
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  10m   default-scheduler  Successfully assigned default/web-server-pod to node-1
  Normal  Pulled     9m    kubelet, node-1    Container image "nginx:latest" already present on machine
  Normal  Created    9m    kubelet, node-1    Created container web-container
  Normal  Started    9m    kubelet, node-1    Started container web-container

Analyzing Pod Events

The Events section in the kubectl describe output provides a chronological log of significant events for the pod. These events can help you understand the lifecycle transitions and identify issues such as:

Scheduling Delays: Delays in scheduling the pod can indicate resource constraints or issues with the scheduler.
Image Pull Errors: Failures in pulling container images can indicate network issues or problems with the container registry.
Container Crashes: Repeated container crashes can be diagnosed by examining the events leading up to the crash.

Kubernetes Events and Audit Logs

Kubernetes records Event resources (kind: Event) across the cluster, which give a quick overview of what is happening.

Audit logs, which are configured through an audit policy (kind: Policy), serve a different purpose: they help ensure compliance and security on the cluster. They can show login attempts, pod privilege escalation, and more.

Kubernetes Events

Kubernetes events provide a timeline of significant occurrences within your cluster, such as pod scheduling, container restarts, and errors. They help understand state transitions and identify the root causes of issues.

Viewing Events

To view events in your cluster, use the kubectl get events command:

kubectl get events

Output example:

LAST SEEN   TYPE      REASON             OBJECT                                   MESSAGE
12s         Normal    Scheduled          pod/web-server-pod                       Successfully assigned default/web-server-pod to node-1
10s         Normal    Pulling            pod/web-server-pod                       Pulling image "nginx:latest"
8s          Normal    Created            pod/web-server-pod                       Created container web-container
7s          Normal    Started            pod/web-server-pod                       Started container web-container
5s          Warning   BackOff            pod/db-server-pod                        Back-off restarting failed container

Filtering Events

You can filter events to focus on specific namespaces, resource types, or time periods. For example, to view events related to a particular pod:

kubectl get events --field-selector involvedObject.name=<pod-name>
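
The same filtering can be done client-side on JSON output. Below is a small Python sketch, assuming the events were saved with kubectl get events -o json. It relies on the involvedObject field that each Event carries; the function name and the sample data are made up for illustration.

```python
from __future__ import annotations

import json


def events_for_object(events_json: str, name: str, kind: str = "Pod") -> list[str]:
    """Return 'TYPE REASON: message' strings for events about one object."""
    items = json.loads(events_json).get("items", [])
    lines = []
    for event in items:
        involved = event.get("involvedObject", {})
        if involved.get("kind") == kind and involved.get("name") == name:
            lines.append(
                f"{event.get('type')} {event.get('reason')}: {event.get('message')}"
            )
    return lines


# Hand-written sample containing only the fields the function reads.
sample = json.dumps({"items": [
    {"type": "Normal", "reason": "Started",
     "message": "Started container web-container",
     "involvedObject": {"kind": "Pod", "name": "web-server-pod"}},
    {"type": "Warning", "reason": "BackOff",
     "message": "Back-off restarting failed container",
     "involvedObject": {"kind": "Pod", "name": "db-server-pod"}},
]})
print(events_for_object(sample, "db-server-pod"))
```

Filtering on the client is handy when you want to combine several conditions, such as kind, name, and event type, in one pass.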

Describing Resources

The kubectl describe command includes events in its output, providing detailed information about a specific resource along with its event history:

kubectl describe pod web-server-pod

Output snippet:

Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  10m   default-scheduler  Successfully assigned default/web-server-pod to node-1
  Normal  Pulled     9m    kubelet, node-1    Container image "nginx:latest" already present on machine
  Normal  Created    9m    kubelet, node-1    Created container web-container
  Normal  Started    9m    kubelet, node-1    Started container web-container

Kubernetes Audit Logs

Audit logs provide a detailed record of all API requests made to the Kubernetes API server, including the user, the action performed, and the outcome. They are essential for security auditing and compliance.

Enabling Audit Logging

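Configure the API server with the appropriate flags and an audit policy to enable audit logging. Here’s an example of an audit policy configuration that records metadata for pod requests and full request and response bodies when the admin user updates or patches ConfigMaps:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Metadata
  resources:
  - group: ""
    resources: ["pods"]
- level: RequestResponse
  users: ["admin"]
  verbs: ["update", "patch"]
  resources:
  - group: ""
    resources: ["configmaps"]

Configuring the API Server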

Specify the audit policy file and log file location when starting the API server:

kube-apiserver --audit-policy-file=/etc/kubernetes/audit-policy.yaml --audit-log-path=/var/log/kubernetes/audit.log
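
With the default JSON log format, the file named by --audit-log-path holds one audit event per line, so it can be filtered with a few lines of code. The following Python sketch is one way to do that; the function name and sample lines are hypothetical, and only fields from the audit Event schema (verb, user, objectRef, responseStatus) are read.

```python
from __future__ import annotations

import json
from typing import Iterable


def matching_audit_events(log_lines: Iterable[str], verb: str, resource: str) -> list[dict]:
    """Return audit events with the given verb on the given resource type."""
    matches = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or corrupt lines
        ref = event.get("objectRef") or {}
        if event.get("verb") == verb and ref.get("resource") == resource:
            matches.append(event)
    return matches


# Hand-written sample lines; a real log carries many more fields per event.
sample_lines = [
    '{"verb": "create", "user": {"username": "admin"}, "objectRef": '
    '{"resource": "pods", "namespace": "default", "name": "web-server-pod"}, '
    '"responseStatus": {"code": 201}}',
    '{"verb": "get", "user": {"username": "dev"}, "objectRef": '
    '{"resource": "configmaps", "namespace": "default", "name": "app-config"}, '
    '"responseStatus": {"code": 200}}',
    "not json",
]
for event in matching_audit_events(sample_lines, "create", "pods"):
    print(event["user"]["username"], event["objectRef"]["name"],
          event["responseStatus"]["code"])
```

To run it against a real log, pass an open file object, for example open("/var/log/kubernetes/audit.log"), in place of sample_lines.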

Viewing Audit Logs

Audit logs are typically written to a file. You can use standard log analysis tools to view and filter the logs. Here’s an example of an audit log entry:

{
    "kind": "Event",
    "apiVersion": "audit.k8s.io/v1",
    "level": "Metadata",
    "auditID": "12345",
    "stage": "ResponseComplete",
    "requestURI": "/api/v1/namespaces/default/pods",
    "verb": "create",
    "user": {
        "username": "admin",
        "groups": ["system:masters"]
    },
    "sourceIPs": [""],
    "objectRef": {
        "resource": "pods",
        "namespace": "default",
        "name": "web-server-pod"
    },
    "responseStatus": {
        "metadata": {},
        "code": 201
    },
    "requestReceivedTimestamp": "2025-01-01T12:00:00Z",
    "stageTimestamp": "2025-01-01T12:00:01Z"
}

Kubernetes Dashboard

The Kubernetes Dashboard is a web-based UI that provides an easy way to manage and troubleshoot your Kubernetes cluster. It allows you to visualize cluster resources, deploy applications, and perform various administrative tasks.

Installing the Kubernetes Dashboard

Please take a look at the Kubernetes documentation for details on installing and accessing the dashboard.

Using the Dashboard

The Dashboard provides various features to help manage and troubleshoot your Kubernetes cluster:

Cluster Overview: View the overall status of your cluster, including nodes, namespaces, and resource usage.
Workloads: Monitor and manage workloads, such as Deployments, ReplicaSets, StatefulSets, and DaemonSets.
Services and Ingress: Manage services and ingress resources to control network traffic.
Config and Storage: Manage ConfigMaps, Secrets, PersistentVolumeClaims, and other storage resources.
Logs and Events: View logs and events for troubleshooting and auditing purposes.

Monitoring Resource Usage

Monitoring resource usage helps you understand how your applications consume resources and identify opportunities for optimization.

Tools for Monitoring

kubectl top: Provides real-time resource usage metrics.
Prometheus: Collects and stores metrics for detailed analysis.
Grafana: Visualizes metrics and provides dashboards for monitoring.

Using kubectl top

The kubectl top command shows the current CPU and memory usage of pods and nodes. It relies on the Metrics Server being installed in the cluster.

kubectl top pods
kubectl top nodes

Example output:

NAME        CPU(cores)   MEMORY(bytes)
my-app-pod  100m         120Mi

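
The values in this output use Kubernetes quantity suffixes: 100m means 100 millicores (0.1 of a core) and 120Mi means 120 mebibytes. As a rough illustration, the Python sketch below converts them to plain numbers so they can be compared against thresholds. It handles only the m, Ki, Mi and Gi suffixes, and all function names are hypothetical.

```python
from __future__ import annotations

# Binary suffixes used by Kubernetes memory quantities, in bytes.
MEMORY_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def cpu_to_cores(quantity: str) -> float:
    """Convert a CPU quantity such as '100m' or '2' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '120Mi' to bytes."""
    for suffix, factor in MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already a byte count


def parse_top_pods(output: str) -> dict[str, tuple[float, int]]:
    """Map pod name to (cores, bytes) from `kubectl top pods` text output."""
    usage = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        columns = line.split()
        if len(columns) < 3:
            continue
        usage[columns[0]] = (cpu_to_cores(columns[1]), memory_to_bytes(columns[2]))
    return usage


sample = """\
NAME        CPU(cores)   MEMORY(bytes)
my-app-pod  100m         120Mi
"""
print(parse_top_pods(sample))  # {'my-app-pod': (0.1, 125829120)}
```
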
Using kubectl logs

kubectl logs is one of the most essential tools for debugging Kubernetes applications. This command retrieves logs from a specific container in a pod, allowing you to diagnose and troubleshoot issues effectively.

Basic Usage

The simplest way to retrieve logs from a pod is by using the kubectl logs command followed by the pod name and namespace. Here’s a basic example for a pod running in a default namespace:

kubectl logs <pod-name>

This command fetches the logs from the first container in the specified pod. If your pod has multiple containers, you need to specify the container name as well:

kubectl logs <pod-name> -c <container-name>

Real-time Logs with the -f Flag

To stream logs in real-time, similar to tail -f in Linux, use the -f flag:

kubectl logs -f <pod-name>

This is particularly useful for monitoring logs as your application runs and observing the output of live processes.

Some projects enhance the log tailing with additional capabilities, such as stern.

Retrieving Previous Logs

If a pod has restarted, you can view the logs from the previous instance using the --previous flag:

kubectl logs <pod-name> --previous

Examining the logs before the failure helps us understand what caused the pod to restart.

Filtering Logs with Labels

You can also filter logs from pods that match specific labels using kubectl along with jq for advanced filtering:

kubectl get pods -l <label-selector> -o json | jq -r '.items[].metadata.name' | xargs -I {} kubectl logs {}

Replace <label-selector> with your specific labels, such as app=myapp.

Combining with Other Tools

You can combine kubectl logs with other Linux commands to enhance your debugging process. For example, to search for a specific error message in the logs, you can use grep:

kubectl logs web-server-pod | grep "Error"

For a continuous search in real-time logs:

kubectl logs -f web-server-pod | grep --line-buffered "Error"

Practical Tips

Log Rotation and Retention: Please ensure your application handles log rotation to prevent the logs from consuming excessive disk space.

Structured Logging: Structured logging (e.g., JSON format) can make it easier to parse and analyze logs using tools like jq.

Centralized Logging: Consider setting up a centralized logging system (e.g., Elasticsearch, Fluentd, and Kibana — EFK stack) to aggregate and search logs from all your Kubernetes pods.
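
To illustrate the structured logging tip, here is a short Python sketch that pulls error messages out of mixed log output. It assumes each structured line is a JSON object with level and msg fields. Those field names are hypothetical and depend on your logging library.

```python
from __future__ import annotations

import json


def error_messages(log_text: str, level: str = "error") -> list[str]:
    """Return the msg field of every JSON log line at the given level."""
    messages = []
    for line in log_text.splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # plain-text line, not structured
        if isinstance(record, dict) and str(record.get("level", "")).lower() == level:
            messages.append(record.get("msg", ""))
    return messages


sample = "\n".join([
    '{"level": "info", "msg": "server started"}',
    "plain text startup banner",
    '{"level": "error", "msg": "database connection refused"}',
])
print(error_messages(sample))  # ['database connection refused']
```

The text passed in would typically be the saved output of kubectl logs for one pod.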

Using kubectl exec for Interactive Troubleshooting

kubectl exec allows us to execute commands directly inside a running container. This is particularly useful for interactive troubleshooting, enabling the inspection of the container’s environment, running diagnostic commands, and performing real-time fixes.

Basic Usage

The basic syntax of kubectl exec is as follows:

kubectl exec <pod-name> -- <command>

Use the -c flag to execute a command in a specific container within a pod. The command runs once, and kubectl exec returns as soon as it finishes.

kubectl exec <pod-name> -c <container-name> -- <command>

Running an Interactive Shell

One of the most common uses of kubectl exec is to open an interactive shell session within a container. This allows you to run multiple commands interactively. Here’s how to do it:

kubectl exec -it <pod-name> -- /bin/bash

For containers using sh instead of bash:

kubectl exec -it <pod-name> -- /bin/sh

Example: Inspecting Environment Variables

To check the environment variables inside a container, you can use the env command:

kubectl exec <pod-name> -- env

If you need to check environment variables in a specific container:

kubectl exec <pod-name> -c <container-name> -- env

Example: Checking Configuration Files

Suppose you need to inspect a configuration file inside the container. You can use cat or any text editor available inside the container:

kubectl exec <pod-name> -- cat /path/to/config/file

For a specific container:

kubectl exec <pod-name> -c <container-name> -- cat /path/to/config/file

Copying Files to and from Containers

If a binary or file you need is missing inside a container, you can copy files to and from containers using kubectl cp. For example, to copy a file from your local machine to a container:

kubectl cp /local/path/to/file <pod-name>:/container/path/to/file

To copy a file from a container to your local machine:

kubectl cp <pod-name>:/container/path/to/file /local/path/to/file

Practical Tips

Use the -i and -t Flags: The -i flag keeps stdin open so the session is interactive, and the -t flag allocates a pseudo-TTY. Together, they enable a fully interactive session.

Run as a Specific User: kubectl exec has no option for choosing the user inside the container. Commands run as the user the container was started with (the image's USER or the pod's securityContext.runAsUser). The global --user flag only selects which kubeconfig user authenticates to the cluster.

Security Considerations: Be cautious when running kubectl exec with elevated privileges. Ensure you have appropriate RBAC (Role-Based Access Control) policies in place to prevent unauthorized access.

Node-Level Debugging with kubectl debug

Most debugging techniques focus on the application level; however, the kubectl debug node command can also be used to debug a specific Kubernetes node.

Node-level debugging is crucial for diagnosing issues affecting the Kubernetes nodes, such as resource exhaustion, misconfigurations, or hardware failures.

The command creates a debugging Pod on the target node. That Pod can access the node's root filesystem, mounted at /host in the Pod.

Create a Debugging Session:

Use the kubectl debug command to start a debugging session on a node. This command creates a pod running a debug container on the specified node.

kubectl debug node/<node-name> -it --image=busybox

Replace <node-name> with the name of the node you want to debug. The -it flag opens an interactive terminal, and --image=busybox specifies the image for the debug container.

For more details, refer to the official Kubernetes documentation on node-level debugging.

Application-Level Debugging with Debug Containers

For more complex issues, consider using a debug container with pre-installed tools. There are many good Docker images with tooling and scripts for debugging; one that stands out to me is nicolaka/netshoot. It can quickly be created using:

kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot

Example: Using the debug container as a sidecar

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-netshoot
  labels:
    app: nginx-netshoot
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-netshoot
  template:
    metadata:
      labels:
        app: nginx-netshoot
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
      - name: netshoot
        image: nicolaka/netshoot
        command: ["/bin/bash"]
        args: ["-c", "while true; do ping localhost; sleep 60;done"]

Apply the configuration:

kubectl apply -f debug-pod.yaml

Practical Tips

Set Restart Policies: Ensure that your pod specifications have appropriate restart policies to handle different failure scenarios.

Automated Monitoring: Set up automated monitoring and alerting for critical issues such as CrashLoopBackOff using Prometheus and Alertmanager.
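
Before a full alerting pipeline is in place, a quick script can surface the same condition. The Python sketch below reads the output of kubectl get pods -o json and lists containers whose waiting reason is CrashLoopBackOff. The function name and sample data are illustrative, and this is a spot check rather than a replacement for Prometheus and Alertmanager.

```python
from __future__ import annotations

import json


def crash_looping_containers(pods_json: str) -> list[tuple[str, str, int]]:
    """Return (pod, container, restart count) for containers in CrashLoopBackOff."""
    found = []
    for pod in json.loads(pods_json).get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for status in statuses:
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                found.append(
                    (pod["metadata"]["name"], status["name"],
                     status.get("restartCount", 0))
                )
    return found


# Hand-written sample containing only the fields the function reads.
sample = json.dumps({"items": [
    {"metadata": {"name": "web-server-pod"},
     "status": {"containerStatuses": [
         {"name": "web-container", "restartCount": 0,
          "state": {"running": {}}}]}},
    {"metadata": {"name": "db-server-pod"},
     "status": {"containerStatuses": [
         {"name": "db-container", "restartCount": 7,
          "state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}},
]})
print(crash_looping_containers(sample))  # [('db-server-pod', 'db-container', 7)]
```
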

Ephemeral Containers for Debugging

Ephemeral containers are temporary containers created explicitly for debugging. They are helpful for running diagnostic tools and commands inside an existing pod without restarting the application. This section explores how to create and use them for interactive troubleshooting in Kubernetes.

Why Use Ephemeral Pods?

Isolation: Debugging in an isolated environment prevents accidental changes to running applications.
Tool Availability: Allows the use of specialized tools that may not be present in the application container.
Temporary Nature: These pods can be easily created and destroyed as needed without leaving a residual impact on the cluster.

Creating Ephemeral Containers

The standard way to add an ephemeral container to a running pod is the kubectl debug command.

Example: Creating an Ephemeral Container

Using kubectl debug:

kubectl debug mypod -it --image=nicolaka/netshoot

This command adds an ephemeral debug container running the Netshoot image to the pod mypod and opens an interactive shell in it.

Practical Tips for Using Ephemeral Pods

Tool Availability: Ensure the debug container image includes all necessary tools for troubleshooting, such as curl, netcat, nslookup, df, top, and others.

Security Considerations: When creating ephemeral pods, consider security. Ensure they have limited access and are used by authorized personnel only.

Example: Advanced Debugging with Custom Debug Container

Let’s walk through an example of using a custom debug container for advanced debugging tasks.

Create an Ephemeral Pod with Custom Debug Container:

kubectl debug -it redis5 --image=nicolaka/netshoot

Defaulting debug container name to debugger-v4hfv.
If you don't see a command prompt, try pressing enter.

88d888b. .d8888b. d8888P .d8888b. 88d888b. .d8888b. .d8888b. d8888P
88'  `88 88ooood8   88   Y8ooooo. 88'  `88 88'  `88 88'  `88   88
88    88 88.  ...   88         88 88    88 88.  .88 88.  .88   88
dP    dP `88888P'   dP   `88888P' dP    dP `88888P' `88888P'   dP

Welcome to Netshoot! (
Version: 0.13

Run Diagnostic Commands:

Inside the debug container we can run various commands.

Check DNS resolution

nslookup kubernetes.default.svc.cluster.local


Name:   kubernetes.default.svc.cluster.local

Test network connectivity

curl http://my-service:8080/health

By using ephemeral pods, you can effectively debug and troubleshoot Kubernetes applications in an isolated and controlled environment, minimizing the risk of impacting production workloads.

Handling DNS and Network Issues

We will go through two common troubleshooting scenarios: DNS issues and debugging stateful pods. Let’s put what we have learned into action.

Common Network Issues

DNS Resolution Failures: Issues resolving service names to IP addresses.
Service Unreachable: Services are not accessible within the cluster.
Pod Communication Issues: Pods cannot communicate with each other.
Network Policy Misconfigurations: Incorrect network policies blocking traffic.

Tools and Commands for Troubleshooting

kubectl exec: Run commands in a container to diagnose network issues.
nslookup: Check DNS resolution.
ping: Test connectivity between pods and services.
curl: Verify HTTP connectivity and responses.
traceroute: Trace the path packets take to reach a destination.

Example: Diagnosing a DNS Resolution Issue

Let’s walk through an example of diagnosing a DNS resolution issue for a pod named my-app-pod trying to reach a service my-db-service.

Check DNS Resolution:

kubectl exec -it my-app-pod -- nslookup my-db-service

Alternatively, we can run the same check from a debug pod or an ephemeral container.

Output indicating a problem:

** server can't find my-db-service: NXDOMAIN

Check CoreDNS Logs:

Inspect the logs of CoreDNS pods to identify any DNS resolution issues.

kubectl logs -l k8s-app=kube-dns -n kube-system

Look for errors or warnings indicating DNS resolution failures.

Verify Service and Endpoints:

Ensure that the service and endpoints exist and are correctly configured.

kubectl get svc my-db-service
kubectl get endpoints my-db-service

Output example:

NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
my-db-service   ClusterIP   <cluster-ip>   <none>        5432/TCP   1h

NAME            ENDPOINTS         AGE
my-db-service   10.244.0.5:5432   1h

Restart CoreDNS Pods:

Restart CoreDNS pods to resolve potential transient issues.

kubectl rollout restart deployment coredns -n kube-system

Verify DNS Resolution Again:

After resolving the issue, verify DNS resolution again:

kubectl exec -it my-app-pod -- nslookup my-db-service

Expected output:

Name:   my-db-service.default.svc.cluster.local

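
The fully qualified name in this output follows the pattern service.namespace.svc.cluster-domain. The Python sketch below builds that name and lists the candidates a pod's resolver typically tries for a short name. It assumes the common default cluster domain cluster.local and the usual three-entry search path; the namespace web is a made-up example.

```python
from __future__ import annotations


def service_fqdn(service: str, namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """Build the fully qualified DNS name of a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def search_candidates(name: str, pod_namespace: str,
                      cluster_domain: str = "cluster.local") -> list[str]:
    """List the names tried for a short name, in search-path order."""
    search_path = [
        f"{pod_namespace}.svc.{cluster_domain}",  # the pod's own namespace first
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    return [f"{name}.{suffix}" for suffix in search_path]


print(service_fqdn("my-db-service"))
print(search_candidates("my-db-service", pod_namespace="web"))
```

For a pod in the hypothetical namespace web, none of the candidates matches a service that lives in default, which is one common cause of NXDOMAIN. In that case, query my-db-service.default or the full name instead.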
Practical Tips

Use Network Debug Containers: Use network debug containers like nicolaka/netshoot for comprehensive network troubleshooting.

kubectl run netshoot --rm -it --image nicolaka/netshoot -- /bin/bash

Monitor Network Metrics: Use Prometheus and Grafana to monitor network metrics and set up network-issue alerts.

Implement Redundancy: Configure redundant DNS servers and failover mechanisms to enhance network reliability.

Debugging Stateful Applications

Stateful applications in Kubernetes require special debugging considerations due to their reliance on persistent storage and consistent state across restarts. This section will explore techniques for handling and debugging issues specific to stateful applications.

What are Stateful Applications?

Stateful applications maintain state information across sessions and restarts, often using persistent storage. Examples include databases, message queues, and other applications that require data persistence.

Common Issues in Stateful Applications

Persistent Storage Issues: Problems with PVCs or PVs can lead to data loss or unavailability.
Pod Start-up Failures: Errors during pod initialization due to state dependencies.
Network Partitioning: Network issues affecting communication between stateful pods.
Data Consistency Problems: Inconsistent data across replicas or restarts.

Example: Debugging a MySQL StatefulSet

Let’s walk through an example of debugging a MySQL StatefulSet named my-mysql.

Inspect the StatefulSet:

kubectl describe statefulset my-mysql

Output snippet:

Name:           my-mysql
Namespace:      default
Selector:       app=my-mysql
Replicas:       3 desired | 3 total
Events:
  Type    Reason            Age   From                    Message
  ----    ------            ----  ----                    -------
  Normal  SuccessfulCreate  1m    statefulset-controller  create Pod my-mysql-0 in StatefulSet my-mysql successful
  Normal  SuccessfulCreate  1m    statefulset-controller  create Pod my-mysql-1 in StatefulSet my-mysql successful
  Normal  SuccessfulCreate  1m    statefulset-controller  create Pod my-mysql-2 in StatefulSet my-mysql successful

Check Persistent Volume Claims:

kubectl get pvc
kubectl describe pvc data-my-mysql-0

Output snippet:

Name:          data-my-mysql-0
Namespace:     default
Status:        Bound
Volume:        pvc-1234abcd-56ef-78gh-90ij-klmnopqrstuv

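
With several replicas, checking each claim by hand gets tedious. As a sketch, the Python function below scans the output of kubectl get pvc -o json and reports every claim whose phase is not Bound. The function name and the sample claims are hypothetical.

```python
from __future__ import annotations

import json


def unbound_claims(pvc_json: str) -> list[tuple[str, str]]:
    """Return (claim name, phase) for every PVC that is not Bound."""
    result = []
    for claim in json.loads(pvc_json).get("items", []):
        phase = claim.get("status", {}).get("phase", "Unknown")
        if phase != "Bound":
            result.append((claim["metadata"]["name"], phase))
    return result


# Hand-written sample containing only the fields the function reads.
sample = json.dumps({"items": [
    {"metadata": {"name": "data-my-mysql-0"}, "status": {"phase": "Bound"}},
    {"metadata": {"name": "data-my-mysql-1"}, "status": {"phase": "Pending"}},
]})
print(unbound_claims(sample))  # [('data-my-mysql-1', 'Pending')]
```

A claim stuck in Pending usually explains why the matching StatefulSet pod never leaves the Pending phase.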
Check Pod Logs:

kubectl logs my-mysql-0

Output snippet:

2025-01-01T00:00:00.000000Z 0 [Note] mysqld (mysqld 8.0.23) starting as process 1 ...
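2025-01-01T00:00:00.000000Z 1 [ERROR] InnoDB: Unable to lock ./ibdata1 error: 11

Error 11 here usually means another mysqld process still holds a lock on the data files, for example when two pods end up using the same volume.

Execute Commands in Pods: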

kubectl exec -it my-mysql-0 -- /bin/sh

Inside the pod:

Check mounted volumes:

df -h

Verify MySQL data directory:

ls -l /var/lib/mysql

Check MySQL status:

mysqladmin -u root -p status

Check Network Connectivity:

kubectl exec -it my-mysql-0 -- ping

Output snippet:

PING ( 56 data bytes
64 bytes from icmp_seq=0 ttl=64 time=0.047 ms

Advanced Debugging Techniques

Advanced debugging techniques in Kubernetes involve using specialized tools and strategies to diagnose and resolve complex issues. This chapter will cover tracing instrumentation and remote debugging.

Tracing with Jaeger

Jaeger is an open-source, end-to-end distributed tracing tool that helps monitor and troubleshoot transactions in complex distributed systems. Tracing with Jaeger can provide insights into the performance of your microservices and help identify latency issues.

You can install Jaeger in your Kubernetes cluster using the Jaeger Operator or Helm.

helm repo add jaegertracing
helm repo update
helm install jaeger jaegertracing/jaeger 

Instrument Your Application:

Ensure your application is instrumented to send tracing data to Jaeger. This typically involves adding Jaeger client libraries to your application code and configuring them to report to the Jaeger backend.

Example in a Go application:

import ( // packages assumed: opentracing-go and the Jaeger Go client
    "io"
    opentracing "github.com/opentracing/opentracing-go"
    "github.com/uber/jaeger-client-go/config"
)

func initJaeger(service string) (opentracing.Tracer, io.Closer) {
    cfg := config.Configuration{
        ServiceName: service,
        Sampler: &config.SamplerConfig{
            Type:  "const",
            Param: 1,
        },
        Reporter: &config.ReporterConfig{
            LogSpans:           true,
            LocalAgentHostPort: "jaeger-agent.default.svc.cluster.local:6831",
        },
    }
    tracer, closer, err := cfg.NewTracer()
    if err != nil {
        panic(err) // fail fast if the tracer cannot be created
    }
    return tracer, closer
}

Access the Jaeger UI to view and analyze traces.

kubectl port-forward svc/jaeger-query 16686:16686

Open http://localhost:16686 in your browser.

Remote Debugging with mirrord

Mirrord is an open-source tool that enables remote debugging of Kubernetes services by running local processes in the context of your Kubernetes cluster and remote infrastructure.

Setting Up mirrord

curl -fsSL | bash

Connect to Your Cluster:

Start a mirrord session to connect your local environment to your Kubernetes cluster.

mirrord connect

Swap Deployment:

Use mirrord to swap a deployment in your cluster with your local service.

mirrord exec --target-namespace devops-team --target deployment/foo-app-deployment nodemon server.js

This command redirects traffic, environment variables, and file operations from your Kubernetes cluster to your local machine, allowing you to debug the service as if running locally.

Once the mirrord session is set up, you can debug the service on your local machine using your favourite debugging tools and IDEs.

Set Breakpoints: Use your IDE to set breakpoints and step through the code.
Inspect Variables: Inspect variables and application state to identify issues.
Make Changes: Make code changes and immediately see the effects without redeploying to the cluster.

Additional Tools

In addition to the core Kubernetes commands and open-source tools, several other tools can enhance your troubleshooting capabilities across various categories. Here are a few noteworthy tools:

Closing Thoughts

Debugging Kubernetes applications can be complex and challenging, but it becomes much more manageable with the right tools and techniques.

Remember, effective debugging is not just about resolving issues as they arise but also about proactive monitoring, efficient resource management, and a deep understanding of your application’s architecture and dependencies.

By implementing the strategies and best practices outlined in this guide, you can build a robust debugging framework that empowers you to quickly identify, diagnose, and resolve issues, ensuring the smooth operation of your Kubernetes deployments.

