Kubernetes Pod Failure States

Interactive guide to common pod problems, how to read events and status, and targeted fixes—no cluster required to learn the patterns.

Overview

A Pod moves through phases shown below. Failures often appear as Pending, repeated restarts, or terminal Failed—each with distinct events and fixes.

Pod lifecycle (simplified)

Pending → ContainerCreating → Running → Succeeded / Failed

Init containers and multiple restarts add detail; this view highlights where scheduling, image pull, config, runtime, and OOM issues usually surface.

Failure states at a glance

State / symptom                 | One-line description                                                        | Severity
--------------------------------|-----------------------------------------------------------------------------|---------
Pending                         | Scheduler or kubelet prerequisites not met; pod not placed or not starting. | Warn
ImagePullBackOff / ErrImagePull | Registry, auth, tag, or network prevents pulling the container image.       | High
CrashLoopBackOff                | Container exits repeatedly; kubelet backs off restarts.                     | High
OOMKilled                       | Process exceeded its cgroup memory limit and was killed.                    | High
CreateContainerConfigError      | Env or volume references point to missing ConfigMaps, Secrets, or keys.     | Warn
RunContainerError               | Runtime could not start the container (security, mounts, binary path).      | High

Which error am I seeing?

Use kubectl describe pod <name> and match Events or State to jump to the right tab:

Pending

Pending means the pod is accepted but not running—often scheduling, resource, or volume binding.

Common causes

- No schedulable node (cordoned or NotReady)
- Insufficient CPU or memory for the pod's requests
- Taints without matching tolerations
- nodeSelector or affinity rules no node satisfies
- PersistentVolumeClaim not bound
- Node pod-count limits reached

Diagnosis steps

1. Inspect pod events:

kubectl describe pod <pod-name> -n <namespace>

Scroll to Events; look for FailedScheduling with reasons.

2. Filter cluster events for the pod:

kubectl get events --field-selector involvedObject.name=<pod-name> -n <namespace> --sort-by='.lastTimestamp'

3. Compare node capacity and allocation:

kubectl describe nodes

Check Allocatable vs. running pod requests; note taints and conditions.

Sample events (FailedScheduling)

Events:
  Type     Reason            Message
  ----     ------            -------
  Warning  FailedScheduling  0/3 nodes are available: 1 Insufficient cpu, 2 node(s) had taint {key: value}, that the pod didn't tolerate.

Fixes (by cause)

Cause: No schedulable node
Fix: Check and uncordon nodes; fix NotReady nodes; adjust workloads.
  kubectl get nodes
  kubectl uncordon <node>

Cause: Insufficient CPU/memory
Fix: Lower requests/limits, scale the cluster, or remove noisy neighbors.
  kubectl edit deployment <name> -n <ns>

Cause: Taints / tolerations
Fix: Add tolerations to the pod spec or remove the taint:
  kubectl taint nodes <node> key=value:NoSchedule-

Cause: Affinity mismatch
Fix: Relax nodeSelector or affinity rules, or label nodes correctly.

Cause: PVC not bound
Fix: Fix the StorageClass, provisioner, or capacity.
  kubectl get pvc -n <ns>
  kubectl describe pvc <pvc-name> -n <ns>

Cause: Too many pods
Fix: Spread across nodes, raise the kubelet max-pods setting (ops), or reduce replicas.
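Removing a taint is cluster-wide; the per-workload alternative is a toleration in the pod spec. A sketch, where the key/value pair is illustrative and must match your node's actual taint:

```yaml
# Pod spec fragment: tolerate the taint instead of removing it from the node.
spec:
  tolerations:
    - key: "key"          # illustrative; use your taint's key
      operator: "Equal"
      value: "value"      # illustrative; use your taint's value
      effect: "NoSchedule"
```

With `operator: Exists` (and no `value`) the toleration matches any value of the key.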

ImagePullBackOff / ErrImagePull

The kubelet cannot pull the image; Kubernetes retries with exponential backoff (ImagePullBackOff).

Common causes

- Wrong image name, tag, or digest
- Missing or invalid registry credentials
- Registry rate limiting
- Network or DNS problems reaching the registry

Diagnosis

kubectl describe pod <pod-name> -n <namespace>

Under Events, find Failed to pull image with the underlying error (404, 401, timeout, etc.).

Sample output

State:          Waiting
  Reason:       ImagePullBackOff
Events:
  Warning  Failed     Error: ErrImagePull
  Warning  Failed     Failed to pull image "myregistry/app:badtag": rpc error: code = NotFound

Fixes

Correct the image — fix deployment/pod image: field to a valid name:tag or digest.
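A corrected image: field can reference a tag or an immutable digest; a sketch (registry, name, and digest placeholder are illustrative):

```yaml
# Container spec fragment: prefer a specific tag or a digest over :latest.
containers:
  - name: app
    image: myregistry.example.com/myapp:1.2.3
    # or pin by digest (substitute the real digest):
    # image: myregistry.example.com/myapp@sha256:<digest>
```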

Registry credentials — create a pull secret and reference it:

kubectl create secret docker-registry regcred \
  --docker-server=<registry> \
  --docker-username=<user> \
  --docker-password=<token> \
  --docker-email=<email> \
  -n <namespace>

DNS and connectivity — on a node, test: crictl pull or nerdctl pull (depending on runtime), and verify DNS resolves the registry host.

YAML: imagePullSecrets

apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - name: app
      image: myregistry.example.com/myapp:1.2.3

CrashLoopBackOff

The container starts, exits non-zero (or is killed), and kubelet restarts it—backing off after repeated failures.

Common causes

- Application error on startup (bad config, unreachable dependency)
- Wrong container command or arguments
- Failing liveness probe repeatedly killing the container
- OOMKilled (see the OOMKilled tab)

Diagnosis

kubectl logs <pod-name> -n <namespace> --previous
kubectl describe pod <pod-name> -n <namespace>

Check restartCount, Last State, and Reason (e.g. Error, OOMKilled).

kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].lastState}' | jq .

Sample log patterns

panic: runtime error: invalid memory address
exit status 2
Error: connect ECONNREFUSED 10.0.0.5:5432

Fixes

- Fix the application error surfaced by logs --previous (bad config, unreachable dependency).
- Correct the container command/args if the entrypoint is wrong.
- Tune liveness/readiness probes so a slow-starting app is not killed prematurely.
- If Last State shows OOMKilled, switch to the OOMKilled tab.
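A frequent CrashLoop trigger is a liveness probe that fires before the app is ready. A sketch of more forgiving probe settings; the endpoint, port, and timings are illustrative:

```yaml
# Container spec fragment: give the app time to boot before probing.
containers:
  - name: app
    image: myapp:1.0
    livenessProbe:
      httpGet:
        path: /healthz          # illustrative health endpoint
        port: 8080
      initialDelaySeconds: 30   # wait before the first probe
      periodSeconds: 10
      failureThreshold: 3       # restart only after 3 consecutive failures
```

For apps with long, variable startup, a startupProbe is the cleaner fix: liveness checks are suspended until it succeeds.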

OOMKilled

The Linux OOM killer terminated the container process when cgroup memory exceeded the container limit.

Common causes

- Memory limit set too low for the workload
- Memory leak in the application
- JVM (or similar runtime) heap not sized to the container limit

Diagnosis

kubectl describe pod <pod-name> -n <namespace>

Under container status, lastState.terminated.reason: OOMKilled.

kubectl top pod -n <namespace>

On the node (if permitted):

dmesg | grep -i oom

Fixes

- Raise the memory limit to fit observed usage (kubectl top pod).
- Profile the application for leaks before simply raising limits.
- For JVM apps, set the heap (-Xmx) below the container limit.

YAML: memory limits

containers:
  - name: app
    image: myapp:1.0
    resources:
      requests:
        memory: "256Mi"
      limits:
        memory: "512Mi"
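For JVM workloads, one approach is to cap the heap below the container limit via the environment, leaving headroom for non-heap memory (metaspace, threads, native buffers). Values here are illustrative:

```yaml
# Container spec fragment: heap capped below the 512Mi container limit.
containers:
  - name: app
    image: myapp:1.0
    env:
      - name: JAVA_TOOL_OPTIONS     # picked up by the JVM at startup
        value: "-Xmx384m"           # illustrative; leave ~25% headroom
    resources:
      limits:
        memory: "512Mi"
```

Modern JVMs can also size the heap from the cgroup limit themselves (e.g. -XX:MaxRAMPercentage), which avoids keeping two numbers in sync.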

CreateContainerConfigError

The kubelet cannot build the container environment; usually a referenced ConfigMap, Secret, or key is missing.

Common causes

- Referenced ConfigMap or Secret does not exist (or lives in a different namespace)
- A referenced key is missing from an existing ConfigMap or Secret

Diagnosis

kubectl describe pod <pod-name> -n <namespace>

Events often show: Error: configmap "app-config" not found or similar for secrets/keys.

Fixes

- Create the missing ConfigMap or Secret in the pod's namespace.
- Fix the object name or key referenced in the pod spec.
- Mark genuinely optional references as optional so the pod can start without them.

YAML: envFrom and volumes

envFrom:
  - configMapRef:
      name: app-config
  - secretRef:
      name: app-secret
volumeMounts:
  - name: cfg
    mountPath: /config
volumes:
  - name: cfg
    configMap:
      name: app-config
      items:
        - key: app.properties
          path: app.properties
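If a ConfigMap or Secret is genuinely optional, the reference can say so, and the container starts even when the object is absent. A sketch:

```yaml
# envFrom fragment: optional references do not block container creation.
envFrom:
  - configMapRef:
      name: app-config
      optional: true   # no CreateContainerConfigError if app-config is missing
  - secretRef:
      name: app-secret
      optional: true
```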

RunContainerError

Container runtime failed to start the container—distinct from in-app crashes (CrashLoop) or image pull failures.

Common causes

- securityContext conflicts with cluster policy (user, privileges)
- Volume mount failures
- Command or entrypoint binary not found in the image

Diagnosis

kubectl describe pod <pod-name> -n <namespace>

Container state Waiting, reason RunContainerError; message often names permission, mount, or executable issues.

Fixes

- Adjust securityContext to satisfy cluster policy (run as non-root, disallow privilege escalation).
- Fix volume definitions and mount paths.
- Correct the command/entrypoint so it points at a binary that exists in the image.

Example security context adjustment (illustrative—match your policy):

securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  allowPrivilegeEscalation: false
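When the runtime message names a missing executable, overriding the entrypoint explicitly helps confirm the path; binary and argument paths below are illustrative:

```yaml
# Container spec fragment: explicit command overrides the image ENTRYPOINT.
containers:
  - name: app
    image: myapp:1.0
    command: ["/usr/local/bin/app"]              # must exist in the image
    args: ["--config", "/config/app.properties"] # passed to the command
```

`command` replaces the image's ENTRYPOINT and `args` replaces its CMD, so a wrong path fails here the same way it does in the original image.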

Quick Reference

Master troubleshooting table

Status / hint              | First command            | Common cause                       | Fix direction
---------------------------|--------------------------|------------------------------------|-----------------------------------------------
Pending + FailedScheduling | kubectl describe pod     | Resources, taints, affinity, PVC   | Adjust requests/tolerations/affinity; fix PVC
ImagePullBackOff           | kubectl describe pod     | Bad tag, auth, rate limit, network | Fix image; add imagePullSecret; check node DNS
CrashLoopBackOff           | kubectl logs --previous  | App error, bad cmd, probes         | Fix app/config; tune probes
OOMKilled                  | kubectl describe pod     | Low limit, leak, JVM heap          | Raise limit; profile; set -Xmx
CreateContainerConfigError | kubectl describe pod     | Missing CM/Secret/key              | Create objects; fix references
RunContainerError          | kubectl describe pod     | Security, mounts, binary           | Adjust securityContext; fix volumes/cmd

Decision tree: “My pod isn’t running”

kubectl get pod <name> — check PHASE / STATUS
  |
  v
Pending? ----yes----> describe pod → FailedScheduling?
  |                     +-- resources / taints / PVC → Pending tab
  | no
  v
ContainerWaiting? ----yes----> describe → ImagePull / Config / Run error?
  |                     +-- ErrImagePull → ImagePull tab
  |                     +-- CreateContainerConfigError → Config tab
  |                     +-- RunContainerError → RunContainer tab
  v
Running but restarts? ----yes----> logs --previous
  |                     +-- OOMKilled in lastState → OOM tab
  |                     +-- app exit / probe → CrashLoop tab
  v
Succeeded / Failed (Job) — check job logs and backoffLimit
