Horizontal Pod Autoscaling

Interactive guide to metrics-driven scaling, target utilization, and how HPA changes replica counts automatically.

Horizontal Pod Autoscaling is a control loop that reacts to observed metrics. It depends on a metrics pipeline and on realistic workload resource settings.

Core Model

Understand the Concept First

Repository YAML Files:
  • k8s/labs/workloads/hpa.yaml — HorizontalPodAutoscaler targeting a workload with CPU-based scale rules.
  • k8s/labs/workloads/app-hpa.yaml — Sample app Deployment paired with HPA-friendly resource requests for labs.
Automatic scaling

HPA adjusts replica counts in response to changing resource usage or other supported metrics.

Metrics-dependent

Built-in autoscaling typically depends on Metrics Server for CPU and memory metrics.
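A quick way to confirm the metrics pipeline is working before creating an HPA (assuming Metrics Server is installed under its usual APIService name):

```shell
# Check that the resource metrics API is registered and Available
kubectl get apiservice v1beta1.metrics.k8s.io

# If metrics are flowing, this prints per-node CPU and memory usage
kubectl top nodes
```

If `kubectl top` returns errors, the HPA will report unknown metrics and will not scale.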

Request-aware

CPU utilization targets are meaningful only when resource requests are set sensibly.
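Utilization is computed against the container's CPU request, so the target workload must declare requests. A minimal sketch of the relevant Deployment fragment (the container name, image, and values are placeholders):

```yaml
# Pod template fragment: the HPA computes utilization as
# (actual CPU usage) / (cpu request), so requests must be set.
containers:
  - name: app            # placeholder name
    image: example/app   # placeholder image
    resources:
      requests:
        cpu: 200m        # utilization % is measured against this value
        memory: 128Mi
```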

Lifecycle Flow

Autoscaling Control Loop

Scale-Up Scenario (target 50% CPU, observed 80%): a workload is running 3 Pods at 80% CPU, above the 50% target. The HPA computes ceil(3 × 80/50) = 5 replicas and scales up by 2 Pods. With 5 Pods the same load spreads out to roughly 48% CPU per Pod, back under the target.
HPA is not a one-time action. It continuously re-evaluates the workload against the current metrics picture.
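The scale-up arithmetic above can be sketched in a few lines. This is a simplified model of the controller's core formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), including the default 10% tolerance band that suppresses scaling on small deviations; the function and parameter names are illustrative, not part of any Kubernetes API:

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     tolerance: float = 0.1) -> int:
    """Simplified sketch of the HPA replica calculation."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band, leave the replica count unchanged.
    if abs(1.0 - ratio) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

# Scale-up scenario from the diagram: 3 Pods at 80% CPU, 50% target.
print(desired_replicas(3, 80, 50))   # 5
# After scaling: 5 Pods at ~48% CPU is within tolerance, so no change.
print(desired_replicas(5, 48, 50))   # 5
```

Note the second call: because 48/50 is within 10% of 1.0, the controller holds at 5 replicas rather than oscillating, which is why the loop settles after a scale-up.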
YAML and Commands

Examples You Can Recognize Quickly

Basic HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
Observe Scaling
kubectl get hpa
kubectl top pods
kubectl describe hpa php-apache
Decision Guide

Horizontal vs Vertical Scaling

Feature          | Horizontal Scaling   | Vertical Scaling
Method           | Add or remove Pods   | Change CPU or memory size
Downtime profile | Usually none         | May require a Pod restart, depending on approach
Best fit         | Stateless workloads  | Right-sizing resource envelopes
Typical tool     | HPA                  | VPA
Horizontal scaling changes the number of workload copies; vertical scaling changes the size of each copy.
Use It Well

Practice and Real-World Thinking

Traffic spikes

Scale web or API workloads up during demand peaks and down later.

Resource efficiency

Avoid keeping excess replicas running during low traffic periods.

Autoscaling labs

Use generated load to observe how replica counts respond over time.
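One common way to generate that load, following the pattern from the Kubernetes HPA walkthrough (the service name php-apache is assumed to match the example above):

```shell
# Run a temporary load generator that hammers the service,
# then watch the HPA react from a second terminal.
kubectl run load-generator --rm -it --image=busybox --restart=Never -- \
  /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

# In another terminal:
kubectl get hpa php-apache --watch
```

Expect a delay of a minute or more before replicas change: the controller averages metrics over a window and applies stabilization before scaling.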