You may be building for availability, but are you building for resiliency?

Benjamin Cane

March 12, 2026

Many teams design for availability. Far fewer design for resiliency.

A concept that took me a while to really grasp is that building highly available systems and building highly resilient systems are not the same thing.

The difference is how the system reacts to failure.

🚄 High Availability

When you build for high availability, the goal is simple: ensure there is always another path.

If something fails, traffic can be redirected somewhere else.

For example, a service might run across multiple availability zones or regions. If one fails, traffic is routed to another.

Detecting failures and redirecting traffic are core elements of building for high availability.

Availability is about rerouting traffic when something fails.
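The detect-and-redirect idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the zone hostnames, `call_with_failover`, and `fake_send` are hypothetical names invented for the example, and real systems usually do this in a load balancer or DNS layer rather than in application code.

```python
from typing import Callable, List, Sequence


class AllEndpointsFailed(Exception):
    """Raised when no endpoint could serve the request."""


def call_with_failover(endpoints: Sequence[str], send: Callable[[str], str]) -> str:
    """Try each endpoint in order and return the first successful response."""
    errors: List[str] = []
    for endpoint in endpoints:
        try:
            return send(endpoint)
        except ConnectionError as exc:
            # Failure detected: record it and redirect to the next path.
            errors.append(f"{endpoint}: {exc}")
    raise AllEndpointsFailed("; ".join(errors))


def fake_send(endpoint: str) -> str:
    # Hypothetical transport for the example: pretend zone A is down.
    if endpoint == "zone-a.example.internal":
        raise ConnectionError("zone unreachable")
    return f"served by {endpoint}"


result = call_with_failover(
    ["zone-a.example.internal", "zone-b.example.internal"], fake_send
)
print(result)  # served by zone-b.example.internal
```

Note that nothing here decides what to do when every path is down. The function simply raises, which is exactly where availability stops and resiliency has to take over.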

Building for resiliency is different.

The solution to failure isn’t another path; it’s how the system handles the error.

When a dependency fails, the decision becomes:

Do we retry?
Do we continue without that dependency?
Do we degrade functionality?
Do we stop processing altogether?

Resiliency is about defining what happens when things go wrong.

Sometimes you can continue processing. Sometimes you can defer work and fix it later.

Resiliency is about absorbing failure instead of avoiding it.
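As a rough sketch of what defining that behavior might look like, the Python below retries a non-critical dependency a few times and then degrades rather than failing the whole request. The recommendations scenario, the function names, and the retry and backoff parameters are all assumptions made for illustration. The right policy depends on the dependency, and for a critical one, stopping may be the correct answer.

```python
import time
from typing import Callable, Dict, List


def fetch_recommendations(
    fetch: Callable[[], List[str]],
    attempts: int = 3,
    base_delay: float = 0.1,
) -> Dict[str, object]:
    """Retry a non-critical dependency, then degrade instead of failing."""
    for attempt in range(attempts):
        try:
            return {"items": fetch(), "degraded": False}
        except TimeoutError:
            # Retry with exponential backoff, except after the final attempt.
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))
    # Retries exhausted: continue without the dependency (degraded mode).
    return {"items": [], "degraded": True}


calls = {"n": 0}


def flaky() -> List[str]:
    # Hypothetical dependency that fails twice, then recovers.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("slow dependency")
    return ["a", "b"]


def down() -> List[str]:
    # Hypothetical dependency that never recovers.
    raise TimeoutError("dependency down")


recovered = fetch_recommendations(flaky, base_delay=0)
degraded = fetch_recommendations(down, base_delay=0)
print(recovered)  # {'items': ['a', 'b'], 'degraded': False}
print(degraded)   # {'items': [], 'degraded': True}
```

The caller always gets a usable answer, and the `degraded` flag lets it decide whether to hide a feature, log the failure, or queue the work for later.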

When you design systems with resiliency in mind, you tend to treat dependencies differently.

A simple example is configuration.

Many systems use distributed configuration services so that runtime behavior can change without redeployment.

But that configuration service then becomes a dependency. To avoid turning it into a hard dependency, many systems cache the configuration in memory.

When updates occur, the system fetches the new configuration and switches only after it’s fully loaded into memory.

If configuration refresh fails, the system continues operating with the last known configuration. Transient failures don’t bring the system down.

That’s resiliency.
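A minimal sketch of that last-known-good pattern in Python follows. The `CachedConfig` class and the `loader` callable are hypothetical stand-ins for whatever client your configuration service provides, and the sketch assumes the very first load must succeed before the system starts.

```python
import threading
from typing import Callable, Dict, List, Union


class CachedConfig:
    """Keeps the last known good configuration in memory."""

    def __init__(self, loader: Callable[[], Dict[str, str]]):
        self._loader = loader
        self._lock = threading.Lock()
        # Assumption: the initial load must succeed before startup.
        self._current: Dict[str, str] = dict(loader())

    def refresh(self) -> bool:
        """Fetch a new config and switch only after it is fully loaded."""
        try:
            new_config = dict(self._loader())
        except Exception:
            # Refresh failed: keep serving the last known configuration.
            return False
        with self._lock:
            self._current = new_config  # swap in one step
        return True

    def get(self, key: str, default: str = "") -> str:
        with self._lock:
            return self._current.get(key, default)


# Hypothetical config service: succeeds, fails once, then succeeds again.
responses: List[Union[Dict[str, str], Exception]] = [
    {"mode": "fast"},
    ConnectionError("config service down"),
    {"mode": "safe"},
]


def loader() -> Dict[str, str]:
    item = responses.pop(0)
    if isinstance(item, Exception):
        raise item
    return item


config = CachedConfig(loader)
refresh_failed = config.refresh()   # the service is down for this call
after_failure = config.get("mode")  # still the old value
refresh_ok = config.refresh()       # the service is back
after_success = config.get("mode")
print(refresh_failed, after_failure, refresh_ok, after_success)
# False fast True safe
```

Because the new configuration is built completely before the swap, readers never see a half-loaded state, and a failed refresh leaves the running system untouched.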

When I talk about non-functional requirements, you’ll hear me say:

“Highly available and resilient systems”

I separate them intentionally because the approaches are different.

Availability ensures there is always another path. Resiliency ensures the system can continue operating when failures occur.

Availability routes around failure. Resiliency survives failure. You need both.