Reliability
Practical patterns for building systems that keep working when things go wrong — retries, timeouts, circuit breakers, graceful degradation, compensating transactions, and the operational habits that separate stable platforms from fragile ones.
12 posts
- May 21, 2026 Health-check the listener your gRPC traffic actually uses reliability
- May 7, 2026 YOLO Is a Terrible Strategy for Validating Production Changes reliability
- April 16, 2026 Are you using traffic mirroring in production? If not, try it out. reliability
- March 19, 2026 When You Go to Production with gRPC, Make Sure You’ve Solved Load Distribution First reliability
- March 12, 2026 You may be building for availability, but are you building for resiliency? reliability
- December 19, 2025 Canary deployments are an operational superpower, but the complexity they bring isn’t for everyone. reliability
- November 28, 2025 Does resource usage within your application or database suddenly spike periodically? Does it cause system slowdown? reliability
- November 21, 2025 When you shut down an application instance, don't stop the listener immediately — that's how you end up with failed requests during every application rollout. 😢 reliability
- November 14, 2025 A common issue I see when teams first adopt gRPC is managing persistent connections, especially during failovers. reliability
- October 27, 2025 Have you heard of Store and Forward? It’s a resiliency design prevalent in card & bank payments, telecommunications, and other industries. reliability
- September 5, 2025 A core capability for building low-latency platforms is quickly detecting and reacting to issues. reliability
- August 8, 2025 I can't count how often I've seen issues made worse by minor oversights—like not setting a timeout value. ⏱️ reliability