Sre

Writing up a production outage report today, I was making the case that workloads should fail their readiness probes when they can’t reach their dependencies — databases, caches, anything required to do useful work. The collaborating Claude session put it better than I had: A probe that doesn’t fail when its workload can’t reach its database isn’t a probe — it’s wallpaper. That’s the whole thing. A readiness probe answers one question: is this pod ready to serve traffic? If the answer depends on a database connection and you’re not checking for that, you’re not answering the question — you’re decorating the pod spec.

A Readiness Probe That Can't Fail Is Just Wallpaper

Nobody cares that your Kubernetes cluster is healthy (and what to measure instead)