Everything green… pods still restart.

Hidden killer:
Liveness probes too aggressive
Short timeouts during GC pauses
Cold starts under load

Fix:
Separate readiness & liveness
Increase initialDelaySeconds
Avoid HTTP probes on heavy endpoints

    livenessProbe:
      initialDelaySeconds: 30
      timeoutSeconds: 5
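
A rough sketch of what a less aggressive liveness probe could look like in full. The /health/live path, port 80, and the periodSeconds and failureThreshold values are assumptions for illustration, not numbers from a real service:

```yaml
livenessProbe:
  httpGet:
    path: /health/live      # hypothetical cheap endpoint: no DB calls, no heavy work
    port: 80                # assumed port
  initialDelaySeconds: 30   # give cold starts time before the first check
  periodSeconds: 10         # assumed: probe every 10 s
  timeoutSeconds: 5         # tolerate a slow reply, e.g. during a GC pause
  failureThreshold: 3       # assumed: restart only after 3 consecutive failures
```

With values like these, one slow response does not kill the pod. The container is restarted only after three failed checks in a row, roughly 30 seconds of trouble.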
Tag: k8s reliability
Why Readiness Probes Matter More Than Liveness
Most outages happen because traffic hits half-ready pods.

    readinessProbe:
      httpGet:
        path: /health/ready
        port: 80

Key idea:
Liveness = “restart me”
Readiness = “send traffic or not”

If you only use liveness → Kubernetes will happily route traffic to chaos.
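
To make the split concrete, here is a minimal sketch of one container carrying both probes. The pod name, image, /health/live path, and readiness timing values are placeholders assumed for the example; only /health/ready on port 80 and the liveness timings come from the snippets above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app                 # hypothetical name
spec:
  containers:
    - name: app
      image: example/app:1.0     # placeholder image
      ports:
        - containerPort: 80
      # Readiness: failing this takes the pod out of Service endpoints.
      # The container is NOT restarted.
      readinessProbe:
        httpGet:
          path: /health/ready
          port: 80
        periodSeconds: 5         # assumed value
        failureThreshold: 3      # assumed value
      # Liveness: failing this makes the kubelet restart the container.
      livenessProbe:
        httpGet:
          path: /health/live     # hypothetical lightweight endpoint
          port: 80
        initialDelaySeconds: 30
        timeoutSeconds: 5
```

In this setup a pod that is still warming up fails readiness and simply gets no traffic, while liveness stays reserved for the case where a restart is the only way out.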
