Benjamin Cane
Portrait of Benjamin Cane
Benjamin Cane
September 5, 2025

A core capability for building low-latency platforms is quickly detecting and reacting to issues.

It sounds odd, how does resiliency factor into low latency?

Stay with me.

Automatic retries are common in modern platforms. I've shared my cautions about blind retries in financial systems, but in many cases, retries are a solid resiliency practice.

The problem?

Retries often become the only resiliency mechanism.

The overreliance on retries leads to shortcuts around key fundamentals.

✅ Readiness Probes

✅ Health Checks

✅ Graceful Shutdown

When faults occur, requests fail, get retried, and eventually succeed. New requests repeat this process until someone fixes the fault.

But every retry adds a delay.

This is especially true when multiple retries are needed. Every retry adds extra transit and error-handling time.

With multiple failures across a microservices-based platform, retries can add up.

Retries are great, but the full collection of resiliency practices powers reliable and fast platforms.

Back to the feed

Next Post

  • September 12, 2025 You’ve heard of feature flags, but what about operational flags? ⏯️

Previous Posts

  • August 22, 2025 Sometimes when I tell people that logging can impact a microservices response time, I get strange looks. 🤨
  • August 15, 2025 How many times have you seen analytics on an operational database create issues? I’ve seen it far too often.
  • August 8, 2025 I can't count how often I've seen issues made worse by minor oversights—like not setting a timeout value. ⏱️

Made with Eleventy and a dash of #Bengineering energy.