Interactive reference for node health, kubelet failures, the container runtime, resource pressure, and join issues — with practical commands and diagnosis flows.
A Kubernetes node runs user workloads. The control plane schedules Pods onto nodes; on each node, several daemons keep that node healthy and able to run containers.
On each node, the kubelet uses its kubeconfig (`/etc/kubernetes/kubelet.conf`) to authenticate to the API server (typically port 6443 on the control plane).

| Condition | True means | Typical checks |
|---|---|---|
| Ready | Node is healthy enough to accept new Pods (kubelet is up, runtime OK, networking OK). | kubectl describe node → Conditions; kubelet & runtime service status on the node. |
| MemoryPressure | Node is low on memory; kubelet may evict Pods. | free -m, cgroup/memory, kubelet eviction logs. |
| DiskPressure | Disk space or inodes are tight (often image/container layers or logs). | df -h, df -i, image cleanup with crictl. |
| PIDPressure | Too many processes / PIDs; kubelet may throttle or evict. | Process count, sysctl kernel.pid_max, workload churn. |
| NetworkUnavailable | Network plugin has not configured the node network (varies by CNI). | CNI pods, CNI logs, node routes and interfaces. |
```shell
kubectl get nodes
kubectl get nodes -o wide
kubectl describe node <node-name>
```
Use `kubectl get nodes` for Ready/NotReady, then `kubectl describe node` for Conditions, capacity, taints, and recent events. When a node shows NotReady, the scheduler should stop placing new Pods there (unless tolerations override). Existing Pods may keep running, but the control plane cannot rely on kubelet heartbeats.
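That first check can be scripted. A minimal sketch that flags NotReady nodes by parsing `kubectl get nodes` output with awk; the canned sample below stands in for a live cluster:

```shell
# Print the names of nodes whose STATUS column is not exactly "Ready".
# Intended usage on a live cluster: kubectl get nodes | not_ready_nodes
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Illustration with canned output instead of a live cluster:
sample='NAME     STATUS     ROLES    AGE   VERSION
node-a   Ready      <none>   10d   v1.29.0
node-b   NotReady   <none>   10d   v1.29.0'
printf '%s\n' "$sample" | not_ready_nodes
```

Note this also flags transitional states such as `Ready,SchedulingDisabled`, which is usually what you want during triage.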
Start with `kubectl get nodes` and `kubectl describe node <name>`. Read Conditions (Ready, MemoryPressure, …) and Events at the bottom. On the node, check `systemctl status kubelet` and `systemctl status containerd` (or `crio` if using CRI-O), then review recent kubelet logs:

```shell
journalctl -u kubelet --since "10 min ago" --no-pager
```
Example `kubectl describe node` output:

```
Conditions:
  Type            Status  LastHeartbeatTime                Reason                      Message
  ----            ------  -----------------                ------                      -------
  Ready           False   Mon, 05 Apr 2026 10:22:11 +0000  KubeletNotReady             container runtime is down...
  MemoryPressure  False   Mon, 05 Apr 2026 10:21:50 +0000  KubeletHasSufficientMemory  kubelet has sufficient memory
  DiskPressure    False   Mon, 05 Apr 2026 10:21:50 +0000  KubeletHasNoDiskPressure    kubelet has no disk pressure
  ...
Events:
  Warning  NodeNotReady  kubelet  Node is not ready
```
| Cause | Remediation (examples) |
|---|---|
| kubelet failed / wedged | `sudo systemctl restart kubelet`; read `journalctl -u kubelet` for the underlying error. |
| containerd (or CRI) not running | `sudo systemctl start containerd` (or `crio`); confirm the CRI socket responds with `sudo crictl info`. |
| Certificates expired (kubeadm clusters) | `kubeadm certs check-expiration` on a control-plane node; renew expired certs and restart the affected components. |
| Network to API server | Verify /etc/kubernetes/kubelet.conf server URL, DNS, firewall, and routes; from node: curl -k https://<apiserver>:6443/version (with correct certs if needed). |
Key kubelet configuration inputs:

- `/var/lib/kubelet/config.yaml` (or the path set by your unit file).
- `/etc/kubernetes/kubelet.conf` (server URL, credentials, CA).
- systemd drop-in overrides (see `systemctl cat kubelet`).

```shell
sudo systemctl status kubelet -l
sudo journalctl -u kubelet -b --no-pager | tail -80
```
| Message pattern | What to check |
|---|---|
| `unable to load bootstrap kubeconfig` | TLS bootstrap: path to bootstrap kubeconfig, token, and API server reachability; file permissions. |
| `failed to run kubelet` | Often invalid flags, bad config.yaml, or cgroup/runtime mismatch; read the next lines of the log for the underlying error. |
| `node ... not found` | Node object missing in API (name mismatch, cluster reset, or RBAC/bootstrap timing). Confirm `--hostname-override` / cloud provider node name alignment. |
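The message patterns above can be folded into a rough triage helper. A sketch using shell `case` globbing; the patterns and hints mirror the table, and the function name is an illustrative choice:

```shell
# Map a kubelet log line to a triage hint based on known message patterns.
triage_kubelet_line() {
  case "$1" in
    *"unable to load bootstrap kubeconfig"*)
      echo "check TLS bootstrap: kubeconfig path, token, API reachability" ;;
    *"failed to run kubelet"*)
      echo "check flags, config.yaml, cgroup/runtime mismatch" ;;
    *"not found"*)
      echo "check node name: --hostname-override / cloud provider alignment" ;;
    *)
      echo "no known pattern" ;;
  esac
}

# Example usage on a node:
#   journalctl -u kubelet -b --no-pager | tail -20 | \
#     while read -r line; do triage_kubelet_line "$line"; done
```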
During bootstrap, kubelet uses a bootstrap kubeconfig and token to request a signed client cert. Failures usually mean wrong token, wrong API address, clock skew, or CSR approval not happening.
```shell
# Follow logs during bootstrap
sudo journalctl -u kubelet -f
```
Also check:

- `systemctl cat kubelet` for EnvironmentFile and drop-in overrides.
- `kubelet --version` matches your cluster's supported skew policy.
- `/var/log/kubelet.log` (some distros log to the journal only).

```shell
kubelet --version
sudo journalctl -u kubelet -f
```
Prefer `config.yaml` and systemd drop-ins over ad-hoc CLI flags; kubeadm and most installers manage flags via config.

kubelet speaks CRI to the runtime. If the runtime socket is wrong or the daemon is down, Pods cannot start and the node may go NotReady.
```shell
sudo systemctl status containerd
sudo systemctl restart containerd
sudo crictl info
```
crictl info should show CRI version and a healthy response when the socket is correct.
```shell
sudo crictl pull docker.io/library/nginx:alpine
sudo crictl images
```
Image pull errors in `kubectl describe pod` often mirror what you will see when pulling the same reference with `crictl pull` on the node.

The default CRI socket is often `/run/containerd/containerd.sock`. kubelet must be configured to use the same socket your runtime exposes (see kubelet or `cri` plugin config).
```shell
ls -l /run/containerd/containerd.sock
```
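That existence check can be wrapped in a small guard that also verifies the path is actually a socket, not a stale file. A sketch; the default path is an assumption, so pass your own if it differs:

```shell
# Verify the CRI endpoint exists as a unix socket before pointing kubelet at it.
check_cri_socket() {
  sock="${1:-/run/containerd/containerd.sock}"
  if [ -S "$sock" ]; then
    echo "ok: $sock"
  else
    echo "missing or not a socket: $sock"
    return 1
  fi
}
```

For CRI-O you would call it as `check_cri_socket /var/run/crio/crio.sock`.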
```shell
sudo systemctl status crio
sudo crictl --runtime-endpoint unix:///var/run/crio/crio.sock version
```
Socket path may be /var/run/crio/crio.sock depending on version and OS packaging.
```shell
sudo crictl ps -a
sudo crictl pods
sudo crictl logs <container-id>
sudo crictl inspect <container-id>
```
Use crictl inspect for JSON detail (mounts, labels, sandbox) when kubelet or the CNI reports sandbox errors.
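When many containers are cycling, it helps to pull only the non-running ones out of `crictl ps -a`. A sketch that parses the tabular output; the column layout (STATE in column 4) is an assumption about your crictl version, and the canned sample stands in for a node:

```shell
# Print "<id> <state>" for containers not in Running state,
# from `crictl ps -a` style output on stdin.
not_running() {
  # STATE is column 4 in typical `crictl ps -a` output (assumption).
  awk 'NR > 1 && $4 != "Running" { print $1, $4 }'
}

# Illustration with canned output (on a node: sudo crictl ps -a | not_running):
sample='CONTAINER  IMAGE  CREATED  STATE    NAME  ATTEMPT  POD ID
abc123     nginx  2d       Exited   web   3        pod1
def456     redis  2d       Running  db    0        pod2'
printf '%s\n' "$sample" | not_running
```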
kubelet monitors node resources and sets pressure conditions. When thresholds are crossed, it can evict Pods to protect the node.
Per-condition checks:

- DiskPressure: `df -h` and `df -i`; reclaim space with `sudo crictl rmi --prune`.
- MemoryPressure: `free -m` and host / cgroup metrics.
- PIDPressure: `ps aux | wc -l` (interpretation varies by OS); review `kernel.pid_max` and workload patterns; reduce noisy DaemonSets or buggy loops.

| Signal | Soft threshold | Hard threshold |
|---|---|---|
| memory.available | Eviction after grace period if pressure persists; Pods terminated if node still starved. | Immediate eviction pressure when available memory falls below hard limit. |
| nodefs.available / imagefs.available | Soft: throttle + eventual eviction with grace period. | Hard: stronger eviction / refusal to accept new Pods depending on state. |
| pid.available | Soft: similar grace-period behavior for PID scarcity. | Hard: more aggressive reaction to low PIDs. |
Exact percentages and behavior depend on kubelet version and your KubeletConfiguration.
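The soft/hard split can be illustrated with a toy calculation comparing available memory against the two thresholds. A sketch; the values are in Mi and the thresholds are illustrative, not kubelet defaults:

```shell
# Classify memory pressure the way soft/hard eviction thresholds split it:
#   below hard  -> immediate eviction pressure
#   below soft  -> eviction after the grace period
#   otherwise   -> ok
eviction_state() {
  avail=$1 soft=$2 hard=$3
  if [ "$avail" -lt "$hard" ]; then
    echo "hard: evict now"
  elif [ "$avail" -lt "$soft" ]; then
    echo "soft: evict after grace period"
  else
    echo "ok"
  fi
}

eviction_state 150 500 200   # below the hard threshold
eviction_state 350 500 200   # between hard and soft
```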
Edit /var/lib/kubelet/config.yaml (or the file referenced by your kubelet service) under evictionHard, evictionSoft, evictionSoftGracePeriod, and evictionMinimumReclaim. Example pattern:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "200Mi"
  nodefs.available: "10%"
  imagefs.available: "15%"
evictionSoft:
  memory.available: "500Mi"
evictionSoftGracePeriod:
  memory.available: "1m30s"
```
Worker nodes run `kubeadm join` to obtain cluster credentials and start kubelet against the control plane. Failures usually fall into token, TLS trust, network, or “already joined” categories.
Common causes:

- An expired or wrong bootstrap token.
- The `discovery-token-ca-cert-hash` does not match the cluster CA.
- Stale `/etc/kubernetes` state from a previous join attempt.

```shell
# On control plane: list tokens (requires appropriate access)
sudo kubeadm token list

# From the joining node: test API reachability
nc -zv <control-plane-host> 6443
# or
curl -vk https://<control-plane-host>:6443/version
```
If TCP fails, fix networking before re-running join. If TLS fails, verify CA hash and server certificate.
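The TCP half of that check can also be done without `nc`, using bash's `/dev/tcp`. A sketch; the function name and 3-second timeout are illustrative choices:

```shell
# Test raw TCP reachability to the API server (no TLS involved).
check_apiserver_tcp() {
  host=$1 port=${2:-6443}
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "tcp ok: $host:$port"
  else
    echo "tcp failed: $host:$port"
    return 1
  fi
}

# Example: check_apiserver_tcp 10.0.0.10 6443
```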
On a failed worker, clean local kubeadm state and join again with a fresh token:
```shell
sudo kubeadm reset -f
sudo systemctl restart kubelet

# On control plane: create a new token (example)
sudo kubeadm token create --print-join-command
```
Copy the printed kubeadm join ... line and run it on the worker.
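If you suspect the `discovery-token-ca-cert-hash` is wrong, you can recompute it from the cluster CA: it is the sha256 over the DER-encoded public key of the CA certificate. A sketch; on a real control plane the input would be `/etc/kubernetes/pki/ca.crt`, and here a throwaway cert is generated so the snippet is self-contained:

```shell
# Recompute a discovery-token-ca-cert-hash from a CA certificate.
ca_cert_hash() {
  openssl x509 -pubkey -in "$1" \
    | openssl pkey -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex \
    | awk '{ print "sha256:" $NF }'
}

# Demo with a throwaway self-signed cert (real input: /etc/kubernetes/pki/ca.crt)
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=demo-ca" \
  -keyout "$tmp/ca.key" -out "$tmp/ca.crt" 2>/dev/null
ca_cert_hash "$tmp/ca.crt"
```

Compare the printed value against the `--discovery-token-ca-cert-hash` in your join command.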
If TLS verification fails, confirm the `ca.crt` you trust matches the cluster.

| Symptom | Direction |
|---|---|
| Port 10250 / 10248 / swap / br_netfilter | kubeadm preflights enforce host prerequisites — enable required kernel modules, disable swap (or configure kubelet allow), open firewall ports per docs. |
| Container runtime not running | Start containerd or crio and confirm CRI socket before join. |
| Hostname / MAC conflicts | Ensure unique node name and stable network identity for the cloud provider or kubeadm. |
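A couple of the host prerequisites from that table can be self-checked before running join. A sketch for Linux; the paths under `/proc` and `/sys` are standard, but the exact set of checks your installer enforces may differ:

```shell
# Spot-check two common kubeadm preflight requirements on Linux.
preflight_report() {
  # kubelet refuses to start with swap on (unless configured otherwise)
  if [ "$(tail -n +2 /proc/swaps | wc -l)" -eq 0 ]; then
    echo "swap: off"
  else
    echo "swap: ON - disable it (swapoff -a) or configure kubelet"
  fi
  # br_netfilter lets bridged pod traffic traverse iptables
  if [ -d /sys/module/br_netfilter ]; then
    echo "br_netfilter: loaded"
  else
    echo "br_netfilter: missing - modprobe br_netfilter"
  fi
}

preflight_report
```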
Quick flow: `kubeadm token list` → `nc -zv` → `journalctl -u kubelet` → `kubeadm reset` + new join command.