🚀 Auto-Scale Based on Load
Running a fixed 3 replicas? When CPU spikes, users wait. When traffic drops, you waste money. The HPA (Horizontal Pod Autoscaler) adds and removes pods automatically based on observed metrics.
How HPA Works
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2               # Initial replicas; the HPA takes over from here
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: app
          image: myapp:latest
          resources:
            requests:
              cpu: 100m     # REQUIRED for HPA utilization metrics
              memory: 128Mi
            limits:
              cpu: 200m
              memory: 256Mi
```
```yaml
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # Scale out when avg CPU > 70% of requests
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # Scale out when avg memory > 80% of requests
```
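Under the hood, the HPA controller compares the observed metric against the target and sizes the workload proportionally. A minimal sketch of the documented scaling formula (the input values here are hypothetical):

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """Kubernetes HPA scaling rule:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
    """
    return math.ceil(current_replicas * (current_metric / target_metric))

# 2 pods averaging 90% CPU against a 70% target -> scale up to 3
print(desired_replicas(2, 90, 70))   # 3
# 8 pods averaging 20% CPU against a 70% target -> scale down to 3
print(desired_replicas(8, 20, 70))   # 3
```

Because the formula is ratio-based, scaling is proportional to how far usage is from the target, not a fixed step size.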
⚙️ Apply HPA
```bash
# Apply the deployment
kubectl apply -f deployment.yaml

# Create the HPA
kubectl apply -f hpa.yaml

# Or create it via the command line
kubectl autoscale deployment web-app \
  --cpu-percent=70 \
  --min=2 \
  --max=10

# Check HPA status
kubectl get hpa
# NAME          REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS
# web-app-hpa   Deployment/web-app   45%/70%   2         10        2

# Watch in real-time
kubectl get hpa -w
```
Load Test: Watch It Scale
🧪 Generate Load
```bash
# Install Apache Bench
sudo apt-get install apache2-utils

# Generate load (1000 requests, 100 concurrent)
ab -n 1000 -c 100 http://your-service-url/

# Watch the HPA scale up
kubectl get hpa -w
# TARGETS changes: 45%/70% → 85%/70% → scaling up!
# REPLICAS: 2 → 4 → 6 → 8

# Load stops
# TARGETS: 85%/70% → 20%/70% → scaling down
# REPLICAS: 8 → 6 → 4 → 2

# Automatic! No manual intervention
```
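The TARGETS column packs the current and target utilization into one string. If you script around `kubectl get hpa`, a small helper (hypothetical, for illustration) shows how to read it:

```python
def will_scale_up(targets: str) -> bool:
    """Parse an HPA TARGETS cell like '85%/70%' and report whether
    current average utilization exceeds the target."""
    current, target = (int(part.rstrip("%")) for part in targets.split("/"))
    return current > target

print(will_scale_up("85%/70%"))  # True  -> HPA will add replicas
print(will_scale_up("45%/70%"))  # False -> steady state
```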
Custom Metrics (Advanced)
```yaml
# Scale based on requests per second (requires a custom metrics
# adapter such as prometheus-adapter)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"   # Scale out when > 1000 req/s per pod
    # With multiple metrics, the HPA computes a replica count for each
    # metric and uses the highest one
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
✅ Best Practices
- Set resource requests: HPA requires CPU/memory requests defined
- Don’t set minReplicas=1: Always have >= 2 for availability
- Add buffer: Set target at 70-80%, not 100%
- Cool-down period: Default 5 min scale-down prevents flapping
- Monitor metrics: Use Prometheus + Grafana to visualize
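The cool-down period can be tuned per HPA via the `behavior` field of the `autoscaling/v2` API. A sketch with illustrative values (not defaults, except where noted):

```yaml
# Added under the HPA spec
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # the 5-minute default, made explicit
    policies:
      - type: Pods
        value: 2                      # remove at most 2 pods per minute
        periodSeconds: 60
  scaleUp:
    stabilizationWindowSeconds: 0     # scale up immediately
```

A longer scale-down window trades slower cost savings for fewer flapping restarts.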
⚠️ Important Notes
- Metrics Server required: Install if not present:
  ```bash
  kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
  ```
- Scale-down delay: Waits 5 minutes before removing pods (configurable)
- Cluster capacity: Can’t scale if no nodes available (use Cluster Autoscaler)
- Stateful apps: HPA works best with stateless apps
💡 Cheat Sheet
| Command | Description |
|---|---|
| `kubectl get hpa` | List all HPAs |
| `kubectl describe hpa [name]` | Detailed HPA info |
| `kubectl get hpa -w` | Watch HPA changes live |
| `kubectl delete hpa [name]` | Remove autoscaling |
| `kubectl top pods` | See current CPU/memory usage |
“Black Friday traffic spike: 100x normal load. HPA scaled from 5 pods to 50 in minutes. Site stayed up, no manual intervention. After traffic died, scaled back down. Saved money + ensured uptime.”
