Kubernetes Pod Failure States

Interactive guide to common pod problems, how to read events and status, and targeted fixes—no cluster required to learn the patterns.

Overview

A Pod moves through phases shown below. Failures often appear as Pending, repeated restarts, or terminal Failed—each with distinct events and fixes.

Pod lifecycle (simplified)

Pending → ContainerCreating → Running → Succeeded / Failed

Init containers and multiple restarts add detail; this view highlights where scheduling, image pull, config, runtime, and OOM issues usually surface.

Failure states at a glance

State / symptom                 | One-line description                                                        | Severity
--------------------------------|-----------------------------------------------------------------------------|---------
Pending                         | Scheduler or kubelet prerequisites not met; pod not placed or not starting. | Warn
ImagePullBackOff / ErrImagePull | Registry, auth, tag, or network prevents pulling the container image.       | High
CrashLoopBackOff                | Container exits repeatedly; kubelet backs off restarts.                     | High
OOMKilled                       | Process exceeded its cgroup memory limit and was killed.                    | High
CreateContainerConfigError      | Env or volume references point to missing ConfigMaps, Secrets, or keys.     | Warn
RunContainerError               | Runtime could not start the container (security, mounts, binary path).      | High

Which error am I seeing?

Use kubectl describe pod <name> and match Events or State to jump to the right tab:

Pending

Pending means the pod is accepted but not running—often scheduling, resource, or volume binding.

Common causes

- No schedulable node (cordoned or NotReady)
- Insufficient CPU or memory for the pod's requests
- Taints without matching tolerations
- nodeSelector or affinity rules no node satisfies
- PersistentVolumeClaim not bound
- Node pod-count limits reached

Diagnosis steps

1. Inspect pod events:

kubectl describe pod <pod-name> -n <namespace>

Scroll to Events; look for FailedScheduling with reasons.

2. Filter cluster events for the pod:

kubectl get events --field-selector involvedObject.name=<pod-name> -n <namespace> --sort-by='.lastTimestamp'

3. Compare node capacity and allocation:

kubectl describe nodes

Check Allocatable vs. running pod requests; note taints and conditions.

Sample events (FailedScheduling)

Events:
  Type     Reason            Message
  ----     ------            -------
  Warning  FailedScheduling  0/3 nodes are available: 1 Insufficient cpu, 2 node(s) had taint {key: value}, that the pod didn't tolerate.

Fixes (by cause)

Cause: No schedulable node
Fix: Check and uncordon nodes; fix NotReady nodes; adjust workloads.
  kubectl get nodes
  kubectl uncordon <node>

Cause: Insufficient CPU/memory
Fix: Lower requests/limits, scale the cluster, or remove noisy neighbors.
  kubectl edit deployment <name> -n <ns>

Cause: Taints / tolerations
Fix: Add tolerations to the pod spec or remove the taint:
  kubectl taint nodes <node> key=value:NoSchedule-

Cause: Affinity mismatch
Fix: Relax nodeSelector or affinity rules, or label nodes correctly.

Cause: PVC not bound
Fix: Fix the StorageClass, provisioner, or capacity.
  kubectl get pvc -n <ns>
  kubectl describe pvc <pvc-name> -n <ns>

Cause: Too many pods
Fix: Spread across nodes, raise the kubelet max-pods setting (ops), or reduce replicas.
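Removing a taint is cluster-wide; the per-workload alternative is a toleration in the pod spec. A sketch, where the key/value pair is illustrative and must match your node's actual taint:

```yaml
# Pod spec fragment: tolerate the taint instead of removing it from the node.
spec:
  tolerations:
    - key: "key"          # illustrative; use your taint's key
      operator: "Equal"
      value: "value"      # illustrative; use your taint's value
      effect: "NoSchedule"
```

With `operator: Exists` (and no `value`) the toleration matches any value of the key.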

ImagePullBackOff / ErrImagePull

The kubelet cannot pull the image; Kubernetes retries with exponential backoff (ImagePullBackOff).

Common causes

- Wrong image name, tag, or digest
- Missing or invalid registry credentials
- Registry rate limiting
- Network or DNS problems reaching the registry

Diagnosis

kubectl describe pod <pod-name> -n <namespace>

Under Events, find Failed to pull image with the underlying error (404, 401, timeout, etc.).

Sample output

State:          Waiting
  Reason:       ImagePullBackOff
Events:
  Warning  Failed     Error: ErrImagePull
  Warning  Failed     Failed to pull image "myregistry/app:badtag": rpc error: code = NotFound

Fixes

Correct the image — fix deployment/pod image: field to a valid name:tag or digest.
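A corrected image: field can reference a tag or an immutable digest; a sketch (registry, name, and digest placeholder are illustrative):

```yaml
# Container spec fragment: prefer a specific tag or a digest over :latest.
containers:
  - name: app
    image: myregistry.example.com/myapp:1.2.3
    # or pin by digest (substitute the real digest):
    # image: myregistry.example.com/myapp@sha256:<digest>
```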

Registry credentials — create a pull secret and reference it:

kubectl create secret docker-registry regcred \
  --docker-server=<registry> \
  --docker-username=<user> \
  --docker-password=<token> \
  --docker-email=<email> \
  -n <namespace>

DNS and connectivity — on a node, test: crictl pull or nerdctl pull (depending on runtime), and verify DNS resolves the registry host.

YAML: imagePullSecrets

apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - name: app
      image: myregistry.example.com/myapp:1.2.3

CrashLoopBackOff

The container starts, exits non-zero (or is killed), and kubelet restarts it—backing off after repeated failures.

Common causes

- Application error on startup (bad config, unreachable dependency)
- Wrong container command or arguments
- Failing liveness probe repeatedly killing the container
- OOMKilled (see the OOMKilled tab)

Diagnosis

kubectl logs <pod-name> -n <namespace> --previous
kubectl describe pod <pod-name> -n <namespace>

Check restartCount, Last State, and Reason (e.g. Error, OOMKilled).

kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].lastState}' | jq .

Sample log patterns

panic: runtime error: invalid memory address
exit status 2
Error: connect ECONNREFUSED 10.0.0.5:5432

Fixes

- Fix the application error surfaced by logs --previous (bad config, unreachable dependency).
- Correct the container command/args if the entrypoint is wrong.
- Tune liveness/readiness probes so a slow-starting app is not killed prematurely.
- If Last State shows OOMKilled, switch to the OOMKilled tab.
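A frequent CrashLoop trigger is a liveness probe that fires before the app is ready. A sketch of more forgiving probe settings; the endpoint, port, and timings are illustrative:

```yaml
# Container spec fragment: give the app time to boot before probing.
containers:
  - name: app
    image: myapp:1.0
    livenessProbe:
      httpGet:
        path: /healthz          # illustrative health endpoint
        port: 8080
      initialDelaySeconds: 30   # wait before the first probe
      periodSeconds: 10
      failureThreshold: 3       # restart only after 3 consecutive failures
```

For apps with long, variable startup, a startupProbe is the cleaner fix: liveness checks are suspended until it succeeds.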

OOMKilled

The Linux OOM killer terminated the container process when cgroup memory exceeded the container limit.

Common causes

- Memory limit set too low for the workload
- Memory leak in the application
- JVM (or similar runtime) heap not sized to the container limit

Diagnosis

kubectl describe pod <pod-name> -n <namespace>

Under container status, lastState.terminated.reason: OOMKilled.

kubectl top pod -n <namespace>

On the node (if permitted):

dmesg | grep -i oom

Fixes

- Raise the memory limit to fit observed usage (kubectl top pod).
- Profile the application for leaks before simply raising limits.
- For JVM apps, set the heap (-Xmx) below the container limit.

YAML: memory limits

containers:
  - name: app
    image: myapp:1.0
    resources:
      requests:
        memory: "256Mi"
      limits:
        memory: "512Mi"
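For JVM workloads, one approach is to cap the heap below the container limit via the environment, leaving headroom for non-heap memory (metaspace, threads, native buffers). Values here are illustrative:

```yaml
# Container spec fragment: heap capped below the 512Mi container limit.
containers:
  - name: app
    image: myapp:1.0
    env:
      - name: JAVA_TOOL_OPTIONS     # picked up by the JVM at startup
        value: "-Xmx384m"           # illustrative; leave ~25% headroom
    resources:
      limits:
        memory: "512Mi"
```

Modern JVMs can also size the heap from the cgroup limit themselves (e.g. -XX:MaxRAMPercentage), which avoids keeping two numbers in sync.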

CreateContainerConfigError

The kubelet cannot build the container environment; usually a referenced ConfigMap, Secret, or key is missing.

Common causes

- Referenced ConfigMap or Secret does not exist (or lives in a different namespace)
- A referenced key is missing from an existing ConfigMap or Secret

Diagnosis

kubectl describe pod <pod-name> -n <namespace>

Events often show: Error: configmap "app-config" not found or similar for secrets/keys.

Fixes

- Create the missing ConfigMap or Secret in the pod's namespace.
- Fix the object name or key referenced in the pod spec.
- Mark genuinely optional references as optional so the pod can start without them.

YAML: envFrom and volumes

envFrom:
  - configMapRef:
      name: app-config
  - secretRef:
      name: app-secret
volumeMounts:
  - name: cfg
    mountPath: /config
volumes:
  - name: cfg
    configMap:
      name: app-config
      items:
        - key: app.properties
          path: app.properties
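If a ConfigMap or Secret is genuinely optional, the reference can say so, and the container starts even when the object is absent. A sketch:

```yaml
# envFrom fragment: optional references do not block container creation.
envFrom:
  - configMapRef:
      name: app-config
      optional: true   # no CreateContainerConfigError if app-config is missing
  - secretRef:
      name: app-secret
      optional: true
```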

RunContainerError

Container runtime failed to start the container—distinct from in-app crashes (CrashLoop) or image pull failures.

Common causes

- securityContext conflicts with cluster policy (user, privileges)
- Volume mount failures
- Command or entrypoint binary not found in the image

Diagnosis

kubectl describe pod <pod-name> -n <namespace>

Container state Waiting, reason RunContainerError; message often names permission, mount, or executable issues.

Fixes

- Adjust securityContext to satisfy cluster policy (run as non-root, disallow privilege escalation).
- Fix volume definitions and mount paths.
- Correct the command/entrypoint so it points at a binary that exists in the image.

Example security context adjustment (illustrative—match your policy):

securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  allowPrivilegeEscalation: false
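When the runtime message names a missing executable, overriding the entrypoint explicitly helps confirm the path; binary and argument paths below are illustrative:

```yaml
# Container spec fragment: explicit command overrides the image ENTRYPOINT.
containers:
  - name: app
    image: myapp:1.0
    command: ["/usr/local/bin/app"]              # must exist in the image
    args: ["--config", "/config/app.properties"] # passed to the command
```

`command` replaces the image's ENTRYPOINT and `args` replaces its CMD, so a wrong path fails here the same way it does in the original image.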

Quick Reference

Master troubleshooting table

Status / hint              | First command            | Common cause                       | Fix direction
---------------------------|--------------------------|------------------------------------|-----------------------------------------------
Pending + FailedScheduling | kubectl describe pod     | Resources, taints, affinity, PVC   | Adjust requests/tolerations/affinity; fix PVC
ImagePullBackOff           | kubectl describe pod     | Bad tag, auth, rate limit, network | Fix image; add imagePullSecret; check node DNS
CrashLoopBackOff           | kubectl logs --previous  | App error, bad cmd, probes         | Fix app/config; tune probes
OOMKilled                  | kubectl describe pod     | Low limit, leak, JVM heap          | Raise limit; profile; set -Xmx
CreateContainerConfigError | kubectl describe pod     | Missing CM/Secret/key              | Create objects; fix references
RunContainerError          | kubectl describe pod     | Security, mounts, binary           | Adjust securityContext; fix volumes/cmd

Decision tree: “My pod isn’t running”

kubectl get pod <name> — check PHASE / STATUS
  |
  v
Pending? ----yes----> describe pod → FailedScheduling?
  |                     +-- resources / taints / PVC → Pending tab
  | no
  v
ContainerWaiting? ----yes----> describe → ImagePull / Config / Run error?
  |                     +-- ErrImagePull → ImagePull tab
  |                     +-- CreateContainerConfigError → Config tab
  |                     +-- RunContainerError → RunContainer tab
  v
Running but restarts? ----yes----> logs --previous
  |                     +-- OOMKilled in lastState → OOM tab
  |                     +-- app exit / probe → CrashLoop tab
  v
Succeeded / Failed (Job) — check job logs and backoffLimit
