🚀 Auto-Scale Based on Load
Running a fixed 3 replicas? When CPU spikes, users wait. When traffic drops, you waste money. The HPA (Horizontal Pod Autoscaler) adds and removes pods automatically based on observed metrics.
How HPA Works
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2               # Initial replicas; the HPA takes over from here
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: app
          image: myapp:latest
          resources:
            requests:
              cpu: 100m     # REQUIRED for HPA utilization metrics
              memory: 128Mi
            limits:
              cpu: 200m
              memory: 256Mi
```
```yaml
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # Scale out when avg CPU > 70% of requests
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # Scale out when avg memory > 80% of requests
```
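Under the hood, the HPA controller compares the observed metric against the target and sizes the workload proportionally. A minimal sketch of the documented scaling formula (the input values here are hypothetical):

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """Kubernetes HPA scaling rule:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
    """
    return math.ceil(current_replicas * (current_metric / target_metric))

# 2 pods averaging 90% CPU against a 70% target -> scale up to 3
print(desired_replicas(2, 90, 70))   # 3
# 8 pods averaging 20% CPU against a 70% target -> scale down to 3
print(desired_replicas(8, 20, 70))   # 3
```

Because the formula is ratio-based, scaling is proportional to how far usage is from the target, not a fixed step size.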
⚙️ Apply HPA
```bash
# Apply the deployment
kubectl apply -f deployment.yaml

# Create the HPA
kubectl apply -f hpa.yaml

# Or create it via the command line
kubectl autoscale deployment web-app \
  --cpu-percent=70 \
  --min=2 \
  --max=10

# Check HPA status
kubectl get hpa
# NAME          REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS
# web-app-hpa   Deployment/web-app   45%/70%   2         10        2

# Watch in real-time
kubectl get hpa -w
```
Load Test: Watch It Scale
🧪 Generate Load
```bash
# Install Apache Bench
sudo apt-get install apache2-utils

# Generate load (1000 requests, 100 concurrent)
ab -n 1000 -c 100 http://your-service-url/

# Watch the HPA scale up
kubectl get hpa -w
# TARGETS changes: 45%/70% → 85%/70% → scaling up!
# REPLICAS: 2 → 4 → 6 → 8

# Load stops
# TARGETS: 85%/70% → 20%/70% → scaling down
# REPLICAS: 8 → 6 → 4 → 2

# Automatic! No manual intervention
```
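The TARGETS column packs the current and target utilization into one string. If you script around `kubectl get hpa`, a small helper (hypothetical, for illustration) shows how to read it:

```python
def will_scale_up(targets: str) -> bool:
    """Parse an HPA TARGETS cell like '85%/70%' and report whether
    current average utilization exceeds the target."""
    current, target = (int(part.rstrip("%")) for part in targets.split("/"))
    return current > target

print(will_scale_up("85%/70%"))  # True  -> HPA will add replicas
print(will_scale_up("45%/70%"))  # False -> steady state
```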
Custom Metrics (Advanced)
```yaml
# Scale based on requests per second (requires a custom metrics
# adapter such as prometheus-adapter)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"   # Scale out when > 1000 req/s per pod
    # With multiple metrics, the HPA computes a replica count for each
    # metric and uses the highest one
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
✅ Best Practices
- Set resource requests: HPA requires CPU/memory requests defined
- Don’t set minReplicas=1: Always have >= 2 for availability
- Add buffer: Set target at 70-80%, not 100%
- Cool-down period: Default 5 min scale-down prevents flapping
- Monitor metrics: Use Prometheus + Grafana to visualize
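The cool-down period can be tuned per HPA via the `behavior` field of the `autoscaling/v2` API. A sketch with illustrative values (not defaults, except where noted):

```yaml
# Added under the HPA spec
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # the 5-minute default, made explicit
    policies:
      - type: Pods
        value: 2                      # remove at most 2 pods per minute
        periodSeconds: 60
  scaleUp:
    stabilizationWindowSeconds: 0     # scale up immediately
```

A longer scale-down window trades slower cost savings for fewer flapping restarts.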
⚠️ Important Notes
- Metrics Server required: Install if not present:
  ```bash
  kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
  ```
- Scale-down delay: Waits 5 minutes before removing pods (configurable)
- Cluster capacity: Can’t scale if no nodes available (use Cluster Autoscaler)
- Stateful apps: HPA works best with stateless apps
💡 Cheat Sheet
| Command | Description |
|---|---|
| `kubectl get hpa` | List all HPAs |
| `kubectl describe hpa [name]` | Detailed HPA info |
| `kubectl get hpa -w` | Watch HPA changes live |
| `kubectl delete hpa [name]` | Remove autoscaling |
| `kubectl top pods` | See current CPU/memory usage |
“Black Friday traffic spike: 100x normal load. HPA scaled from 5 pods to 50 in minutes. Site stayed up, no manual intervention. After traffic died, scaled back down. Saved money + ensured uptime.”
