When you shut down an application instance, don't stop the listener immediately — that's how you end up with failed requests during every application rollout. 😢
🛑 The Common Mistake:
I've seen many shutdown implementations that stop the listener as soon as the shutdown signal is received.
The assumption is usually:
“Stopping the listener will fail readiness probes, and traffic will be redirected.”
That's half right…
It will trigger traffic redirection, but not immediately.
⏱️ Probe Intervals Matter:
Readiness probes (Kubernetes), load balancer health checks, and service mesh probes all run at fixed intervals.
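In Kubernetes, the default probe period is 10 seconds.
That means it can take up to 10 seconds for the platform to see even a single failed probe.
Longer if the failure threshold is greater than 1, and the Kubernetes default is 3, so with default settings expect closer to 30 seconds before traffic shifts.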
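To put a number on that window, here is a back-of-the-envelope sketch in Python. The function name and the simplified model are illustrative assumptions, not a Kubernetes API; the default arguments mirror Kubernetes' `periodSeconds` (10) and `failureThreshold` (3).

```python
def worst_case_detection_seconds(period_seconds: float = 10.0,
                                 failure_threshold: int = 3) -> float:
    """Rough upper bound on how long a dead listener keeps receiving traffic.

    Simplified model (an assumption): the listener stops right after a
    successful probe, every later probe fails instantly, and the platform
    reroutes traffic as soon as the failure threshold is reached.
    """
    return period_seconds * failure_threshold


print(worst_case_detection_seconds(10, 1))  # 10
print(worst_case_detection_seconds())       # 30.0
```

Real platforms add their own propagation delays on top, so treat the result as a lower bound on how long you should keep serving, not an exact figure.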
💥 What Happens During Those 10 Seconds?
New traffic still goes to the unhealthy instance.
And because you stopped the listener, every request routed to that instance fails for the whole detection window: 10 seconds or more.
Some clients retry and land on another instance.
Some will not.
Either way, every rollout will result in failed requests that could have been avoided.
✅ What You Should Do Instead
When shutting down an instance:
1️⃣ Keep the listener running: don't slam the door shut.
2️⃣ Fail readiness probes: report failure from the readiness endpoint, but keep serving requests on every other endpoint.
3️⃣ Wait for traffic to drain: let in-flight requests finish, and give the platform time to stop routing new requests.
4️⃣ Then stop the listener: only when it's safe.
This is a graceful shutdown.
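The four steps above can be sketched with nothing but Python's standard library. Treat this as a minimal illustration under stated assumptions, not a production implementation: the `/ready` path, the `DrainingServer` and `graceful_shutdown` names, the port, and the 30-second drain delay are all hypothetical choices made for the example.

```python
import signal
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set once the shutdown signal arrives; the readiness endpoint checks it.
shutting_down = threading.Event()


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/ready":  # hypothetical readiness endpoint
            if shutting_down.is_set():
                status, body = 503, b"shutting down"
            else:
                status, body = 200, b"ready"
        else:
            # Every other endpoint keeps working during the drain.
            status, body = 200, b"hello"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


class DrainingServer(ThreadingHTTPServer):
    # Non-daemon handler threads let server_close() wait for in-flight requests.
    daemon_threads = False


def graceful_shutdown(server: DrainingServer, drain_seconds: float) -> None:
    shutting_down.set()        # steps 1 + 2: listener stays up, readiness fails
    time.sleep(drain_seconds)  # step 3: give the platform time to reroute
    server.shutdown()          # step 4: stop the serve_forever() loop
    server.server_close()      # close the socket, join in-flight handlers


def main(port: int = 8080, drain_seconds: float = 30.0) -> None:
    # drain_seconds is an assumption: tune it to your probe period and threshold.
    server = DrainingServer(("", port), Handler)
    stop_requested = threading.Event()
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, lambda signum, frame: stop_requested.set())
    # serve_forever() runs in its own thread because shutdown() must be
    # called from a different thread than the one running the loop.
    threading.Thread(target=server.serve_forever).start()
    stop_requested.wait()
    graceful_shutdown(server, drain_seconds)
```

Calling `main()` starts the server; sending it SIGTERM flips `/ready` to 503 while other paths keep answering, and the listener only closes after the drain delay. The fixed sleep is the simplest possible drain strategy; it should be at least as long as your platform's detection window and shorter than whatever hard kill timeout the platform enforces.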
🧠 Final Thoughts
Resiliency isn't only about surviving failures; it's also about preventing them.
Handle shutdown properly, and you can roll out new code without ever failing a request.