Horizontal Pod Autoscaling

Interactive guide to metrics-driven scaling, target utilization, and how HPA changes replica counts automatically.

Horizontal Pod Autoscaling is a control loop that reacts to observed metrics. It depends on a metrics pipeline and on realistic workload resource settings.

Core Model

Understand the Concept First

Repository YAML Files:
  • k8s/labs/workloads/hpa.yaml — HorizontalPodAutoscaler targeting a workload with CPU-based scale rules.
  • k8s/labs/workloads/app-hpa.yaml — Sample app Deployment paired with HPA-friendly resource requests for labs.
Automatic scaling

HPA adjusts replica counts in response to changing resource usage or other supported metrics.

Metrics-dependent

Built-in autoscaling typically depends on Metrics Server for CPU and memory metrics.
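A quick way to confirm the metrics pipeline is working before creating an HPA (assuming Metrics Server is installed under its usual APIService name):

```shell
# Check that the resource metrics API is registered and Available
kubectl get apiservice v1beta1.metrics.k8s.io

# If metrics are flowing, this prints per-node CPU and memory usage
kubectl top nodes
```

If `kubectl top` returns errors, the HPA will report unknown metrics and will not scale.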

Request-aware

CPU utilization targets are meaningful only when resource requests are set sensibly.
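Utilization is computed against the container's CPU request, so the target workload must declare requests. A minimal sketch of the relevant Deployment fragment (the container name, image, and values are placeholders):

```yaml
# Pod template fragment: the HPA computes utilization as
# (actual CPU usage) / (cpu request), so requests must be set.
containers:
  - name: app            # placeholder name
    image: example/app   # placeholder image
    resources:
      requests:
        cpu: 200m        # utilization % is measured against this value
        memory: 128Mi
```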

Lifecycle Flow

Autoscaling Control Loop

Scale-Up Scenario (target 50% CPU, observed 80%): a workload is running 3 Pods at 80% CPU, above the 50% target. The HPA computes ceil(3 × 80/50) = 5 replicas and scales up by 2 Pods. With 5 Pods the same load spreads out to roughly 48% CPU per Pod, back under the target.
HPA is not a one-time action. It continuously re-evaluates the workload against the current metrics picture.
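The scale-up arithmetic above can be sketched in a few lines. This is a simplified model of the controller's core formula, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), including the default 10% tolerance band that suppresses scaling on small deviations; the function and parameter names are illustrative, not part of any Kubernetes API:

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     tolerance: float = 0.1) -> int:
    """Simplified sketch of the HPA replica calculation."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band, leave the replica count unchanged.
    if abs(1.0 - ratio) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

# Scale-up scenario from the diagram: 3 Pods at 80% CPU, 50% target.
print(desired_replicas(3, 80, 50))   # 5
# After scaling: 5 Pods at ~48% CPU is within tolerance, so no change.
print(desired_replicas(5, 48, 50))   # 5
```

Note the second call: because 48/50 is within 10% of 1.0, the controller holds at 5 replicas rather than oscillating, which is why the loop settles after a scale-up.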
YAML and Commands

Examples You Can Recognize Quickly

Basic HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
Observe Scaling
kubectl get hpa
kubectl top pods
kubectl describe hpa php-apache
Decision Guide

Horizontal vs Vertical Scaling

Feature          | Horizontal Scaling   | Vertical Scaling
Method           | Add or remove Pods   | Change CPU or memory size
Downtime profile | Usually none         | May require a Pod restart, depending on approach
Best fit         | Stateless workloads  | Right-sizing resource envelopes
Typical tool     | HPA                  | VPA
Horizontal scaling changes the number of workload copies; vertical scaling changes the size of each copy.
Use It Well

Practice and Real-World Thinking

Traffic spikes

Scale web or API workloads up during demand peaks and down later.

Resource efficiency

Avoid keeping excess replicas running during low traffic periods.

Autoscaling labs

Use generated load to observe how replica counts respond over time.
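One common way to generate that load, following the pattern from the Kubernetes HPA walkthrough (the service name php-apache is assumed to match the example above):

```shell
# Run a temporary load generator that hammers the service,
# then watch the HPA react from a second terminal.
kubectl run load-generator --rm -it --image=busybox --restart=Never -- \
  /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

# In another terminal:
kubectl get hpa php-apache --watch
```

Expect a delay of a minute or more before replicas change: the controller averages metrics over a window and applies stabilization before scaling.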