Essential guide for protecting your cluster state
# Extract the etcd client endpoint from the static pod manifest
export ETCD_ENDPOINTS=$(grep -oP \
  '(?<=--advertise-client-urls=)\S+' \
  /etc/kubernetes/manifests/etcd.yaml)

# Verify
echo "$ETCD_ENDPOINTS"
# Output: https://172.31.28.251:2379
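The extraction above depends on GNU grep's -P option, which is not available everywhere. A more portable sketch using sed follows. It assumes a kubeadm-style manifest in which each flag sits on its own line as "- --flag=value". The helper name etcd_flag is hypothetical.

```shell
# etcd_flag MANIFEST FLAG
# Print the value of "--FLAG=value" from a static pod manifest.
# Hypothetical helper; assumes one flag per line, as kubeadm writes them.
etcd_flag() {
  local manifest=$1 flag=$2
  # Match only list items of the form "- --FLAG=value"; print the first hit.
  sed -n "s/^[[:space:]]*-[[:space:]]*--${flag}=\(.*\)$/\1/p" "$manifest" | head -n 1
}
```

For example, ETCD_ENDPOINTS=$(etcd_flag /etc/kubernetes/manifests/etcd.yaml advertise-client-urls) would set the same variable as the grep command. The same helper can read other flags, such as data-dir.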
# Save a snapshot
sudo ETCDCTL_API=3 /usr/bin/etcdctl \
  --endpoints="$ETCD_ENDPOINTS" \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /tmp/etcd_backup.db
# Check the snapshot
sudo ETCDCTL_API=3 /usr/bin/etcdctl \
  snapshot status /tmp/etcd_backup.db \
  --write-out=table

# Example output (illustrative; exact columns depend on the etcd version):
# +----------+---------+--------+-------------+
# | HASH     | VERSION | SIZE   | TOTAL KEYS  |
# +----------+---------+--------+-------------+
# | 123abcd  | 3.5.0   | 5.1 MB | 3500        |
# +----------+---------+--------+-------------+
# Copy to backup location
sudo cp /tmp/etcd_backup.db \
  /backup/etcd-$(date +%Y%m%d-%H%M%S).db

# Or upload to cloud storage
# aws s3 cp /tmp/etcd_backup.db \
#   s3://backups/etcd/
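A copied snapshot is only useful if it arrives intact. As a minimal sketch, the copy step can record a checksum alongside the file. The function name archive_snapshot is hypothetical, and sha256sum from GNU coreutils is assumed to be available.

```shell
# archive_snapshot SRC DEST_DIR
# Copy SRC into DEST_DIR under a timestamped name, write a SHA-256
# checksum file next to it, and print the path of the new copy.
# Hypothetical helper; assumes GNU coreutils (sha256sum).
archive_snapshot() {
  local src=$1 dest_dir=$2 dest name
  name="etcd-$(date +%Y%m%d-%H%M%S).db"
  dest="$dest_dir/$name"
  mkdir -p "$dest_dir" || return 1
  cp "$src" "$dest" || return 1
  # Record the bare file name so the pair can be moved together.
  (cd "$dest_dir" && sha256sum "$name" > "$name.sha256") || return 1
  printf '%s\n' "$dest"
}
```

Running sha256sum -c on the .sha256 file from inside the backup directory later confirms the copy has not changed. Writing to a root-owned directory such as /backup would require running the function as root.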
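# Move the kube-apiserver manifest so the kubelet stops the static pod
sudo mv \
  /etc/kubernetes/manifests/kube-apiserver.yaml \
  /tmp/kube-apiserver.yaml

# Wait for the apiserver container to stop.
# kubectl cannot reach the API once it is down, so ask the
# container runtime instead (assumes crictl is installed):
sudo crictl ps | \
  grep kube-apiserver
# Repeat until the command prints nothing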
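Waiting for a component to stop or start is easier to script with a polling loop than by re-running a check by hand. A minimal sketch of such a helper follows; the name wait_until is hypothetical.

```shell
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-run COMMAND once per second until it succeeds.
# Returns 0 on success, 1 if TIMEOUT_SECONDS elapse first.
wait_until() {
  local timeout=$1 waited=0
  shift
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

One possible use, assuming crictl is present on the node: wait_until 60 sh -c '[ -z "$(sudo crictl ps --name kube-apiserver -q)" ]' returns once no running kube-apiserver container is listed, or fails after about 60 seconds.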
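# Restore the snapshot into a new data directory
sudo ETCDCTL_API=3 /usr/bin/etcdctl \
  snapshot restore /tmp/etcd_backup.db \
  --data-dir=/var/lib/etcd-new \
  --name=master-restored \
  --initial-cluster=master-restored=https://172.31.28.251:2380 \
  --initial-advertise-peer-urls=https://172.31.28.251:2380

# Note: No certs needed for restore (local operation)
# --initial-cluster must list the --name given here and should
# match the value set in etcd.yaml
# On etcd 3.5+, etcdutl snapshot restore is the preferred equivalent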
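# Edit the manifest:
# /etc/kubernetes/manifests/etcd.yaml
# Update these lines:
- --data-dir=/var/lib/etcd-new
- --name=master-restored
- --initial-cluster=master-restored=https://172.31.28.251:2380
- --initial-cluster-state=new
# Also point the etcd-data volume (hostPath path and the matching
# volumeMounts mountPath) at /var/lib/etcd-new, so the container
# can see the restored directory
# Save and exit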
# etcd pod restarts automatically

# Move manifest back
sudo mv /tmp/kube-apiserver.yaml \
  /etc/kubernetes/manifests/kube-apiserver.yaml

# Wait for pods to come up
kubectl get pods -n kube-system

# Verify cluster state
kubectl get nodes
kubectl get pods -A
• Always test restore procedures in non-production environments first
• Back up regularly: before upgrades, before major changes, and on a daily schedule
• Store backups in secure, off-cluster locations with proper retention policies
• Document your cluster's specific certificate paths and etcd endpoint
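A retention policy can be as simple as keeping the newest few snapshots. The sketch below assumes backups follow the etcd-YYYYMMDD-HHMMSS.db naming used earlier, so that sorting by name is the same as sorting by time. The function name prune_backups is hypothetical.

```shell
# prune_backups DIR KEEP
# Delete all but the KEEP newest etcd-*.db files in DIR, together with
# any matching .sha256 files. Hypothetical helper; relies on the
# timestamp in the file name sorting chronologically.
prune_backups() {
  local dir=$1 keep=$2 f
  # Reverse lexical sort is newest-first; skip the first KEEP entries.
  printf '%s\n' "$dir"/etcd-*.db | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r f; do
      # The guard skips the unexpanded pattern when no backups exist.
      if [ -f "$f" ]; then
        rm -f -- "$f" "$f.sha256"
      fi
    done
}
```

For instance, prune_backups /backup 7 after each daily backup would leave the seven most recent snapshots in place. The directory and the count are illustrative; choose them to fit your own retention requirements.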