
Kubernetes Network Troubleshooting

Interactive guide: services, DNS, pod connectivity, NetworkPolicy, Ingress, and a quick symptom-to-fix reference.

Overview

Networking model

Every Pod gets its own IP address on a flat, cluster-wide Pod network. Pods can reach each other directly by IP, without NAT, even across nodes; that behavior is provided by your CNI plugin.

Three layers to remember

  • Pod networking (CNI) — assigns routes/interfaces so Pod IPs work across nodes.
  • Service networking — stable virtual IP/DNS; kube-proxy (or similar) programs iptables/IPVS (or eBPF) to load-balance to Endpoints.
  • External access — NodePort, LoadBalancer, or Ingress (HTTP/S) in front of Services.

“Can’t reach my app” decision tree

Walk through in order; the first failure you find is usually where the fix belongs.

1. Is the Pod running and ready? kubectl get pods — check STATUS and READY.
2. Does the Service selector match Pod labels? Mismatch → no Endpoints.
3. Are Endpoints populated? kubectl get endpoints SERVICE — empty list means traffic has nowhere to go.
4. Is cluster DNS resolving Service names? Test from a client Pod with nslookup or getent.
5. Is a NetworkPolicy blocking ingress/egress (including DNS on port 53)?
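Steps 4 and 5 are easiest to run from a throwaway client Pod in the affected namespace. A minimal sketch (Pod name, image, and namespace are placeholders):

```yaml
# Disposable debug Pod for DNS and connectivity tests (names are illustrative).
apiVersion: v1
kind: Pod
metadata:
  name: net-debug
  namespace: my-ns        # test from the same namespace as the failing client
spec:
  restartPolicy: Never
  containers:
    - name: tools
      image: busybox:1.36  # busybox includes nslookup and wget
      command: ["sleep", "3600"]
```

Then run e.g. kubectl exec -it net-debug -n my-ns -- nslookup my-service and delete the Pod when done.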

Service Issues

Service has no Endpoints

The Service spec.selector must match the labels in the Pod template (spec.template.metadata.labels). If labels drift after a deploy, the Endpoints disappear.

Wrong port / targetPort

port is what clients use on the Service. targetPort is the container port. A typo here yields connection refused or timeouts while Endpoints still look “healthy.”
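Named ports make the port-to-targetPort link explicit and survive container-port renumbering. A sketch, with illustrative names:

```yaml
# Service references the container port by name instead of number.
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
    - port: 80            # what clients use on the Service
      targetPort: http    # resolved against the container's named port
---
# The Pod template's container must declare the matching name:
# ports:
#   - name: http
#     containerPort: 8080
```

If the container port later moves to 9090, only the Pod template changes; the Service keeps working.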

Service type confusion

  • ClusterIP — reachable only inside the cluster (default).
  • NodePort — exposes a high port on every node; still need firewall/route to nodes.
  • LoadBalancer — cloud LB in front; requires controller support and correct cloud integration.
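The three types differ mainly in spec.type (plus, for NodePort, an optional fixed port). A hypothetical NodePort Service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: NodePort          # ClusterIP is the default; LoadBalancer needs cloud support
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 30080     # optional; must fall in the node-port range (30000-32767 by default)
```

Omitting nodePort lets the control plane pick a free port in the range.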

Diagnosis commands

kubectl get endpoints my-service
kubectl describe svc my-service
# Check Selector, Ports, and Endpoints sections

kubectl get pods -l app=my-app --show-labels

Fix: align selector and labels

Example mismatch (broken): Service selects app: api but Deployment labels Pods app: backend.

# Service (corrected)
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: backend
    tier: api
  ports:
    - port: 80
      targetPort: 8080

# Deployment pod template must include the same labels:
# metadata.labels: { app: backend, tier: api }
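The matching Deployment side might look like this (image and names are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-backend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backend
      tier: api
  template:
    metadata:
      labels:             # these labels are what the Service selector matches
        app: backend
        tier: api
    spec:
      containers:
        - name: api
          image: my-registry/backend:1.0   # placeholder image
          ports:
            - containerPort: 8080
```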

DNS Failures

CoreDNS not running

kubectl get pods -n kube-system -l k8s-app=kube-dns

If the pods are in CrashLoopBackOff or missing, Service name resolution breaks cluster-wide. Check events, resource limits, and the CoreDNS logs.

Resolution test from a Pod

kubectl exec -it my-client-pod -n my-ns -- nslookup my-service.my-ns.svc.cluster.local

Short name my-service works inside the same namespace; FQDN is safest for debugging.

ndots and search list

Kubernetes injects a resolv.conf with ndots:5 and multiple search domains into Pods. Names with fewer than five dots are tried against the search list first, causing odd delays or wrong answers. Compare behavior with the FQDN.
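When search-list expansion is the culprit, a Pod can lower ndots through dnsConfig. A sketch (Pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: low-ndots
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "1"        # any name containing a dot is tried as absolute first
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
```

Trade-off: short cross-namespace names like my-service.other-ns now rely on the search list less, so prefer FQDNs in the application config.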

CoreDNS ConfigMap

kubectl get cm coredns -n kube-system -o yaml

Misconfigured forwarders, hosts plugin mistakes, or broken Corefile syntax can cause partial or total DNS failure after a change.
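For comparison while auditing, a typical default Corefile looks roughly like this; details vary by distribution and version, so treat it as a reference shape rather than a drop-in config:

```yaml
# Excerpt of the coredns ConfigMap; the Corefile is plain text inside `data`.
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        forward . /etc/resolv.conf   # upstream resolvers; a common misconfig point
        cache 30
        loop
        reload
    }
```

A syntax error anywhere in the Corefile can take down all plugins in that server block, not just the one you edited.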

Common errors

NXDOMAIN — the name does not exist: wrong name, wrong namespace, or the Service is missing. Verify the Service exists and that the FQDN is correct.

timeout — packets to UDP/TCP 53 are not reaching CoreDNS: NetworkPolicy, node firewall, or CoreDNS not listening. Check policies and the kube-proxy/CNI path to the kube-dns ClusterIP.

Pod Connectivity

Pod-to-Pod across nodes

If same-node works but cross-node fails, suspect CNI routing, overlay, or underlying network (security groups, VLANs).

kubectl get pods -n kube-system
# Look for calico-node, canal, flannel, cilium, weave, etc.

Basic reachability

kubectl exec -it pod-a -- ping -c 3 POD_B_IP

Some minimal images omit ping; use a debug image or kubectl debug with a toolbox image.
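One option is a dedicated toolbox Pod; the image here is a common community choice, used purely as an example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: netshoot
spec:
  restartPolicy: Never
  containers:
    - name: tools
      image: nicolaka/netshoot:latest   # bundles ping, dig, curl, tcpdump, etc.
      command: ["sleep", "3600"]
```

Exec into it with kubectl exec -it netshoot -- bash and run the reachability tests from there.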

Plugin-specific checks

  • Calico — node readiness, BGP/session status, IP pools.
  • Flannel — flannel.1 / VXLAN interfaces, etcd/kube-apiserver lease data.
  • Cilium — cilium status on nodes (where installed).

iptables NAT path

On a node (privileged):

iptables -t nat -L KUBE-SERVICES -n

Confirms kube-proxy (iptables mode) programmed chains; empty or stale rules can break Service VIP forwarding.

kube-proxy logs

kubectl get pods -n kube-system -o wide | grep kube-proxy
kubectl logs -n kube-system POD_NAME_OF_KUBE_PROXY

NetworkPolicy

Repository YAML files

Pre-built NetworkPolicy manifests in k8s/labs/security/ — apply directly or use as templates:

  • deny-all-ingress.yaml — deny all ingress to pods labeled app: k8slearning
  • allow-ingress.yaml — allow ingress only from pods with matching app: k8slearning label
  • deny-from-other-namespaces.yaml — deny cross-namespace ingress in the prod namespace

Policy blocking legitimate traffic

NetworkPolicies are additive allow-lists: once any policy selects a Pod, traffic in the covered direction is denied unless some policy allows it. A deny-by-default namespace or an overly tight policy can block traffic that used to work.

Default deny surprise

Symptom: “Everything worked until I added a NetworkPolicy.” Often only a narrow ingress is allowed and egress (or DNS) was forgotten.
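A default-deny policy that produces this symptom looks like the following: the empty podSelector matches every Pod in the namespace, and with no rules listed, nothing is allowed.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-ns
spec:
  podSelector: {}          # empty selector = all Pods in the namespace
  policyTypes:
    - Ingress
    - Egress               # with no egress rules, even DNS lookups are blocked
```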

Diagnosis

kubectl get networkpolicy -A
kubectl describe networkpolicy my-policy -n my-ns

Test from a throwaway Pod

kubectl run tmp --rm -it --image=busybox:1.36 --restart=Never -n my-ns -- wget -qO- http://my-service:80

Common mistakes

Frequent pitfalls: forgetting DNS egress (UDP/TCP 53), omitting Egress from policyTypes, and selecting the wrong Pods with podSelector. The illustrative policy below allows frontend ingress plus DNS and backend egress:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-and-dns
  namespace: my-ns
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: frontend
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    - to:
        - podSelector:
            matchLabels:
              app: backend
      ports:
        - protocol: TCP
          port: 80

Ingress & NodePort

Ingress not routing

  • IngressClass not set or wrong — no controller claims the object.
  • Ingress controller not running (nginx, traefik, etc.).
  • Backend Service name/port wrong — controller logs show upstream errors.
  • TLS — Secret missing, wrong namespace, or cert mismatch causes 404 at edge or TLS handshake errors.
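A minimal Ingress with the class set explicitly; host, class, and Service names are placeholders:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ing
  namespace: my-ns
spec:
  ingressClassName: nginx        # must match an installed IngressClass
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service # must exist in the same namespace as the Ingress
                port:
                  number: 80
```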

Ingress inspection

kubectl get ingress -A
kubectl describe ingress my-ing -n my-ns

NodePort not accessible from outside

  • Cloud or host firewall blocking 30000–32767 (default NodePort range).
  • kube-proxy down on the node you are hitting — traffic never forwarded to Pods.
  • Wrong node IP — use an address that actually reaches the kubelet/node (not only an internal SDN IP).
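Also check spec.externalTrafficPolicy: with Local, only nodes that host a ready Pod answer on the NodePort, so hitting the wrong node times out. A sketch:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: NodePort
  externalTrafficPolicy: Local   # preserves client source IP, but nodes without a local Pod drop the traffic
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```

The default, Cluster, forwards from any node to any Pod at the cost of an extra hop and SNAT.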

Quick Reference

Symptom → first check → likely cause → fix command

| Symptom | First check | Likely cause | Fix / command |
| --- | --- | --- | --- |
| Service unreachable | kubectl get endpoints | Selector/labels mismatch | kubectl describe svc + align labels |
| Wrong backend port | describe Service ports | targetPort typo | Edit Service targetPort / named port |
| DNS name fails | CoreDNS pods + test FQDN | CoreDNS down or policy blocks 53 | kubectl get pods -n kube-system -l k8s-app=kube-dns |
| Only some Pods broken | NetworkPolicy list | Policy egress/ingress rules | kubectl get netpol -A |
| External URL 502/503 | Ingress describe + controller logs | Bad Service ref or no Endpoints | kubectl describe ingress |
| NodePort times out | From node: ss -lntp | Firewall / kube-proxy | Open 30000–32767; check kube-proxy |

Command cheat sheet

kubectl get svc,endpoints,pods -n NAMESPACE -o wide
kubectl describe svc SERVICE -n NAMESPACE
kubectl get pods --show-labels -n NAMESPACE
kubectl exec -it POD -n NAMESPACE -- nslookup SVC.NAMESPACE.svc.cluster.local
kubectl get networkpolicy -A
kubectl get ingress -A
kubectl logs -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system DS_OR_POD_FOR_KUBE_PROXY