Benjamin Cane
January 15, 2026

Many teams think performance testing means throwing traffic at a system until it breaks. That approach has its place, but it misses how systems are actually stressed in the real world.

The approach I’ve found most effective is to split performance testing into two distinct categories:

  • 🏋️‍♀️ Benchmark testing
  • 🚣‍♀️ Endurance testing

Both stress the system, but they answer different questions.

🏋️‍♀️ Benchmark Testing:

Benchmark tests are where most teams start: increasing load until the system fails.

Failure might mean:

  • ⏱️ Latency SLAs are exceeded
  • ⚠️ Error rates cross acceptable thresholds

Sometimes failure is measured by when the system stops responding entirely. This is known as breakpoint testing.

Even when SLAs are the target, I recommend continuing the test past those thresholds until the system actually breaks.

Knowing how the system breaks under load is useful when dealing with the uncertainties of production.
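
To make this concrete, here is a minimal stepped-ramp sketch in Python. The endpoint, step sizes, and SLA numbers are hypothetical placeholders (a real run would use a dedicated load tool such as k6 or Locust), but it shows the shape: ramp concurrency in steps, check p95 latency and error rate at each step, and keep ramping past the SLA breach to find the breakpoint.

```python
# Minimal stepped-ramp benchmark sketch. TARGET_URL, the step sizes, and
# the SLA numbers are hypothetical placeholders, not recommendations.
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET_URL = "http://localhost:8080/health"  # hypothetical endpoint
LATENCY_SLA_S = 0.250                        # p95 must stay under 250 ms
ERROR_RATE_SLA = 0.01                        # at most 1% errors

def one_request():
    """Issue a single request; return (latency_seconds, ok)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(TARGET_URL, timeout=5) as resp:
            ok = resp.status < 500
    except OSError:  # covers URLError, timeouts, connection resets
        ok = False
    return time.monotonic() - start, ok

def run_step(concurrency, requests_per_worker=20):
    """Run one load step; report p95 latency and error rate."""
    total = concurrency * requests_per_worker
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: one_request(), range(total)))
    latencies = sorted(lat for lat, _ in results)
    p95 = latencies[int(len(latencies) * 0.95)]
    errors = sum(1 for _, ok in results if not ok) / len(results)
    return p95, errors

# Ramp in steps, and keep going past the SLA breach to find the breakpoint.
for concurrency in (5, 10, 20, 40, 80, 160):
    p95, errors = run_step(concurrency)
    breached = p95 > LATENCY_SLA_S or errors > ERROR_RATE_SLA
    print(f"{concurrency=} p95={p95:.3f}s errors={errors:.1%}"
          f" {'SLA BREACHED' if breached else 'ok'}")
```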

🚣‍♀️ Endurance Testing:

Endurance tests answer a different question:

Can the system sustain high load over time?

Running at high but realistic levels (often near production max) over extended periods exposes different problems:

  • 🪣 Queues, file systems, and databases slowly fill
  • 🧹 Garbage collection and thread pools behave differently
  • 🧵 Memory or thread leaks become visible

These issues rarely show up in short spikes of traffic. If you only run benchmarks, you’ll discover them for the first time in production.
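
A soak test can be sketched from the same pieces: hold a steady, production-like rate for a long window and log rolling stats. The duration, rate, and reporting interval below are hypothetical; what matters is comparing the windows over time, because a slow upward drift in p95 or error rate is the signature of leaks and fill-ups.

```python
# Minimal endurance (soak) test sketch: hold a steady request rate for a
# long window and log windowed stats. DURATION_S, RATE_PER_S, and the
# reporting interval are hypothetical placeholders.
import time

DURATION_S = 24 * 3600   # 24-hour soak
RATE_PER_S = 50          # steady, production-like request rate
REPORT_EVERY_S = 300     # emit a stats line every 5 minutes

def soak(send_request):
    """Drive send_request() at a fixed rate; print windowed p95/error stats."""
    start = time.monotonic()
    window = []  # (latency, ok) tuples for the current reporting window
    next_report = start + REPORT_EVERY_S
    while time.monotonic() - start < DURATION_S:
        tick = time.monotonic()
        window.append(send_request())  # e.g. one_request() from the ramp sketch
        now = time.monotonic()
        if now >= next_report:
            latencies = sorted(lat for lat, _ in window)
            p95 = latencies[int(len(latencies) * 0.95)]
            errors = sum(1 for _, ok in window if not ok) / len(window)
            hours = (now - start) / 3600
            print(f"t={hours:5.2f}h p95={p95:.3f}s errors={errors:.1%}")
            window.clear()  # compare windows over time to spot drift
            next_report = now + REPORT_EVERY_S
        # Crude rate limiting: sleep out the remainder of this tick.
        time.sleep(max(0.0, (1 / RATE_PER_S) - (time.monotonic() - tick)))

# Usage: soak(one_request), reusing the request helper from the ramp sketch.
```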

⌛️ Testing Thoroughly vs Deployment Speed:

Benchmarks run fast; endurance tests take time.

A 24-hour endurance test can slow down releases, especially when you want to release the same service multiple times a day.

It's a trade-off between the system's criticality and the need for rapid deployments.

How tolerant is the system of minor performance regressions?

If performance truly matters, slowing releases down to run endurance tests might be the right call.

🧠 Final Thoughts:

Effective performance testing isn’t just about surviving spikes.

Spikes matter, but so does answering:

  • 📈 Can the system withstand peak load for extended periods?
  • 🔎 If not, how does it fail, and why?

All too often, I see a system's capacity limits become the breaking point during unexpected traffic patterns.

While an application might handle spikes, the overall platform often can't sustain them. That's where endurance tests deliver their real value.
