Benjamin Cane — #Bengineering

Don't be afraid to build a tool. Just don't become too attached to it.

Benjamin Cane — Thu, 01 Jan 2026 24:00:00 GMT

Don't be afraid to build a tool. Just don't become too attached to it.

When managing infrastructure or building and testing software, you'll inevitably find repetitive tasks that feel like a tool should exist to make them easier or faster.

Maybe one exists, perhaps it doesn't. Either way, your team doesn’t know.

So inevitably, the tasks stay manual, inefficient, and frustrating.

Everyone wishes there was a tool—so why not build one?

😱 Why People are Afraid to Build Tools:

1️⃣ “What if something already exists?”

You build something, then someone says, “Why didn't you just use X?”

That moment sucks, especially if you didn't know X even existed.

Don't defend it, own it, and say: “I didn't know X existed, let me take a look at it.”

No one expects you to know every tool out there.

Building something and later discovering a better tool is not a failure.

It's a learning experience; what you learn while building the tool is priceless.

2️⃣ “I don't want to maintain another system.”

This is a genuine concern.

But maintaining a small set of tools that save hours of repetitive work or enable something that couldn't be done otherwise is often worth it.

The key is to reduce overhead early on:

Prefer CLI tools vs services
Minimize dependencies

The less complexity, the easier it is to manage.

3️⃣ “We are too busy building features.”

This one is hard, not because it's a technical challenge. But because it's a mindset change.

Teams get so focused on what they’re building that they forget to look at how they are building it.

If a process is slow, manual, or error-prone, that's a signal that it could be improved.

The secret is taking a step back and re-evaluating how.

👨‍🔬 How I Approach Building Tools:

1️⃣ Look for existing tools

I prefer open source, so that's where I start. There is a lot out there, and you have to search for it.

2️⃣ Look for “close enough” projects

You might not find a project that does exactly what you want. But maybe it's 80-90% of the way there.

If it's close, extend it, contribute upstream. Or fork it (license permitting).

3️⃣ If nothing fits, build new

If nothing meets your needs, then build it.

Start small, use it yourself, share it with your team and solicit feedback.

😍 Don't Become Too Attached:

When someone says, “Why not use X?” Evaluate X objectively.

Is it the best tool now? If so, use it.

Using a well-known, widely adopted tool is often more efficient.

🧠 Final Thoughts:

Custom tools can be lifesavers.

They only become a problem when someone gets too attached and refuses to replace them with more standard solutions.

Build tools when needed, replace them when something better appears.

A Kenjutsu Instructor (Japanese Swordsmanship) I knew would always warn us. He’d say Americans always want to treat the sword like it’s a precious gem. It's a tool; treat it like you would a hammer.

I think that applies here as well.

One of the toughest engineering skills to develop is accepting a decision you disagree with. 😖

Benjamin Cane — Fri, 26 Dec 2025 24:00:00 GMT

One of the toughest engineering skills to develop is accepting a decision you disagree with. 😖

When you treat engineering as a craft, it’s easy to get passionate about solutions. Strong opinions are a good thing — many great engineers have them.

But you also need to know when to challenge a decision and when to accept it.

🎯 The Inflection Point

Every architecture review eventually narrows down to a few viable options. Maybe it’s captured in an ADR, maybe through discussion, maybe through a decision-maker.

If your preferred option isn’t chosen, you have two paths:

Keep challenging the decision
Accept it and support it fully

Knowing which path to take is a critical engineering skill.

🔥 When to Keep Challenging

My rule: Will this decision cause me to lose sleep — figuratively or literally?

If the decision risks:

Breaking production
Waking you up at 2 a.m.
Introducing significant operational or security risks

It’s worth continuing the conversation.

And the best way to challenge is respectfully — usually in a 1:1 with the decision-maker(s). This gives space for deeper context, trade-offs, and clearer alignment.

🤝 When to Support a Decision You Disagree With

If the decision isn’t dangerous — just not your preferred option— it’s time to commit.  Many architectural choices have multiple valid options; one may be your preference.

In these cases, being a good engineer means supporting the direction chosen.

You can still improve the solution by suggesting micro-adjustments that reduce risk or enhance reliability without reopening the whole debate.

Sometimes, you will find that the chosen path was actually right. Don’t worry, no one cares if you were right or wrong in the debate if you supported the implementation.

🧠 Final Thoughts

Sometimes decisions are mistakes. That’s normal.

What matters is catching them early and being willing to revisit them once real-world data reveals new information. Implementation often teaches us things the whiteboard never could.

Just be careful not to over-index on finding every minor issue as a fundamental flaw in the solution.

Good architecture isn’t about being right all the time. It’s about making informed decisions, supporting the team, and knowing when to push and when to commit.

Canary deployments are an operational superpower, but the complexity they bring isn’t for everyone.

Benjamin Cane — Fri, 19 Dec 2025 24:00:00 GMT

Canary deployments are an operational superpower, but the complexity they bring isn’t for everyone. So why not just use Blue/Green deployments instead? 🦸‍♂️

Let’s break it down.

🎞️ A Quick Recap

Both Blue/Green and Canary start the same way:

Take two instances (or clusters) of a service & deploy the new code version to the idle one.

Where they differ is how traffic shifts.

🐤 Canary

Canary deployments gradually shift traffic from old to new.

Both versions serve live traffic during the transition.

🔵|🟢 Blue/Green

A Blue/Green traffic shift is an all-or-nothing shift.

Only one instance is serving traffic; there is no gradual ramp-up.

⚙️ Why Canary Is More Complex

Running two versions at the same time (with both taking traffic) introduces challenges:

Backward compatibility
Shared (or replicated) databases
Sticky sessions
Context-aware routing
Event ordering across versions
Consistency of state

Blue/Green avoids most of this. You still need a rollback plan, but you don’t have to worry about parallel operations.

So if Canary is so complicated… why use it?

🏅 Why Canary Is Worth It (Sometimes)

Canary shines when:

The system is highly critical
It must run 24/7 with no interruption
You cannot accept even a brief outage
You want to reduce the blast radius of regressions
You release often and need tight control/quick fallback

Canary lets you validate a new version with a small percentage of traffic before gradually increasing it further. If something breaks, roll traffic back instantly.

More importantly, when it breaks, only a portion of traffic is impacted.

For high-risk and mission-critical systems, the complexity is worth it.

🧠 Final Thoughts

Blue/Green is a great default deployment strategy, and in many cases, the optimal one.

A perfect example is file-based batch workloads. Batch systems usually have flexibility in timing. You can:

Pause traffic
Cut over to the new version
Resume processing
And if it fails… reprocess the files

Yes, easier said than done, but still far simpler than Canary.

Both approaches have their place. The key is matching the deployment strategy to the system’s criticality and level of acceptable risk.

Everyone has bias, yes, even you. 🫵

Benjamin Cane — Fri, 12 Dec 2025 24:00:00 GMT

Everyone has bias, yes, even you. 🫵

Ever been in a technical debate where the other side seems way too attached to their solution?

Ever notice others feel the same way about you?

Sometimes your solution is the right solution. But sometimes… It’s just bias.

🧠 Understanding Bias

Bias gets a bad rep. Bias doesn’t always come from a negative space.

In the context of technical solutions, bias usually forms from experience.

Throughout our careers, we see countless architectures, patterns, outages, and wins.

We remember what worked.

We remember what didn’t.

Over time, we build a gut sense of solutions that are safe based on our experiences.

That gut sense is bias, and it’s often well intentioned.

💪 Working Through Bias (Without Ignoring It)

Accept that everyone has bias — including you.

This is the hardest part.

You need to assume that other people's biases stem from good intentions and real experience, just like yours do.

With this assumption, you can have more objective conversations and begin hearing other perspectives.

Ask yourself: Is this solution based on reality or comfort?

Why do you have a preference for your solution?

Are you pushing a strategy?

Are you avoiding something unfamiliar?

Are you sticking to what has worked in the past?

Understanding why you hold bias is key to making the case for your solution.

Use data to guide the decision, but make sure it’s objective.

Data makes decisions easier, but be careful. Bias can influence what data you choose to look at.

Sometimes we subconsciously cherry-pick data that supports our views.

It’s essential to take an objective look at the data, even if it challenges your case.

Bring in a trusted third party — but present data carefully.

An impartial opinion can help, but only if you give the whole picture.

When bringing in a third party, it’s crucial to present solutions and data objectively; that way, you get their honest opinion, not your opinion echoed back to you.

🧩 Final Thoughts

The most important part of technical decision-making is accepting the possibility that you might be wrong.

On many occasions, I've had to step back, evaluate my own bias, review the data objectively, and listen to opposing views.

Bias isn’t something you can eliminate; it’s something you recognize and manage.

Do you use Architecture Decision Records? I’m a big fan, and I think they’re a best practice every engineering org should adopt.

Benjamin Cane — Fri, 05 Dec 2025 24:00:00 GMT

Do you use Architecture Decision Records? I’m a big fan, and I think they’re a best practice every engineering org should adopt. 📐

🙋 What is an ADR?

An Architecture Decision Record (ADR) is a lightweight document that captures architectural decisions.

A good ADR typically consists of:

The context behind the problem
The options considered
The decision made, including the why

Different companies/teams will add their own spin, but these are the core elements.

🤔 Why ADRs Matter

The ADR itself is helpful; it gives product, architecture, and engineering teams a shared reference point. Clear documentation reduces ambiguity, enabling teams to align and build effectively.

But the real value is the process.

Writing an ADR forces you to explore alternatives, consider trade-offs, and debate options objectively. If done well, ADRs capture everyone’s input and clearly document why a path was chosen.

This keeps architectural decisions grounded in logic rather than bias or preferences.

🧠 Final Thoughts

The documentation and process are valuable, but they only work with the right culture.

Teams need a culture where:

Everyone is free to contribute to architectural decisions
Diverse options are encouraged
Decisions are made objectively
ADRs are accessible and visible to everyone

Without the culture, the process becomes a formality and a burden of red tape.

With the right culture, ADRs become a powerful tool for making well-balanced & transparent decisions.

Of course, that culture and the process need to be embraced by all levels of the team. ADRs are only as useful as the effort you put into them.

Does resource usage within your application or database suddenly spike periodically? Does it cause system slowdown?

Benjamin Cane — Fri, 28 Nov 2025 24:00:00 GMT

Does resource usage within your application or database suddenly spike periodically? Does it cause system slowdown? 🐢

A simple answer to your problem might be to add a bit of jitter (random timing delay).

When you schedule recurring tasks or loops, adding a bit of jitter can significantly improve how your application behaves.

📖 What is Jitter?

In simple terms, jitter is a small bit of randomness added to the time between two events.

While there are many types of jitter in computing, for this post, we will keep the scope to randomness, adding delay between events.

⚙️ Why it Matters:

When tasks don’t implement jitter, they can accidentally synchronize, running at the same time.

Which leads to:

CPU and Memory spikes
Thread contention
Request storms to downstream systems

🧩 A Simple Example

Imagine an API Gateway that caches responses. You decide to invalidate responses every 30 minutes with a scheduled thread.

No problem for a handful of APIs, but scale this to 1,000 APIs. Suddenly, every 30 minutes, 1,000 threads fire up at once.

The periodic spikes could cause performance issues or even crash the gateway.

Now add random Jitter: instead of running every 30 minutes, add or subtract a few random seconds for each task.

You’ve just spread out the load, making utilization smoother and more predictable.

⚠️ Caveats

Jitter isn’t perfect; this approach spreads out the load, but with random jitter, small spikes could still occur.

Still, for many scenarios, it’s a simple approach.

🧠 Final Thoughts

If you find your application slowing down periodically with spikes in resource utilization. You might be dealing with synchronized tasks, and adding random jitter might be a good solution.

It’s simple, it’s easy, and it usually works well.

When you shut down an application instance, don't stop the listener immediately — that's how you end up with failed requests during every application rollout. 😢

Benjamin Cane — Fri, 21 Nov 2025 24:00:00 GMT

When you shut down an application instance, don't stop the listener immediately — that's how you end up with failed requests during every application rollout. 😢

🛑 The Common Mistake:

I've seen many shutdown implementations that stop the listener as soon as the shutdown signal is received.

The assumption is usually:

“Stopping the listener will fail readiness probes, and traffic will be redirected.”

That's half right…

It will trigger traffic redirection, but not immediately.

⏱️ Probe Intervals Matter:

Readiness probes (Kubernetes), Load Balancer health checks, & service mesh probes all run at fixed intervals.

In Kubernetes, the default is 10 seconds.

That means it can take up to 10 seconds for the platform to detect an unhealthy status and adjust traffic.

Longer if the failure threshold is greater than 1.

💥 What Happens During Those 10 Seconds?

New traffic still goes to the unhealthy instance.

And because you stopped the listener, every request to that instance fails for 10 seconds.

Some clients retry and land on another instance.

Some will not.

Either way, every rollout will result in failed requests that could have been avoided.

✅ What You Should Do Instead

When shutting down an instance:

1️⃣ Keep the listener running; Don’t slam the door shut.

2️⃣ Fail readiness probes; Report failures from the readiness endpoint, but allow new requests to other endpoints.

3️⃣ Wait for traffic to drain; Let in-flight requests finish, and let the platform stop routing new requests.

4️⃣ Then stop the listener; Only when it's safe.

This is a graceful shutdown.

🧠 Final Thoughts

Resiliency isn't only about surviving failures, it's also about preventing them.

Handle shutdown properly, and you can roll out new code without ever failing a request.

A common issue I see when teams first adopt gRPC is managing persistent connections, especially during failovers.

Benjamin Cane — Fri, 14 Nov 2025 24:00:00 GMT

A common issue I see when teams first adopt gRPC is managing persistent connections, especially during failovers.

🤔 The Problem:

gRPC is fast thanks to protobuf and how it handles connections, mainly:

Persistent connections that avoid repeated TCP handshakes
Sending multiple requests over a single HTTP/2 connection.

However, these performance optimizations are also a source of failover challenges.

😫 Challenges with Failover:

Let’s say you’ve just implemented gRPC and want to trigger a manual failover for your service.

For many, failover typically happens at the load-balancer level, which works fine for HTTP/1.

When you take an instance down, new requests go to another instance.

However, with gRPC over HTTP/2, connections stay open and are reused, which means existing connections continue to send requests to the old instances even during failover.

Unless your load-balancer understands HTTP/2 and gRPC, failover will not work as it used to.

🛠️ Failover with gRPC

For proper failover, you’ve got two main options:

Use a load balancer that understands HTTP/2 and gRPC, such as an AWS Application Load Balancer vs. a Network Load Balancer, Envoy vs. HAProxy.
Cycle connections periodically—force clients to reconnect and redistribute the load.

Both options get the job done, but the first is overall cleaner.

💡 Final Thoughts:

There is a lot to love about gRPC: strong contracts, outstanding performance, and client-server simplicity.

But it takes work to operationalize it. Nobody tells you that upfront, though.

A dangerous mindset I’ve seen—and been guilty of—is assuming code doesn't change.

Benjamin Cane — Fri, 07 Nov 2025 24:00:00 GMT

A dangerous mindset I’ve seen—and been guilty of—is assuming code doesn't change.

Or when it changes, the next person will understand the original context.

Reality check, they wont.

The next person (future you) has no idea what you were thinking.

🔎 A Simple Example:

You are building a service that receives JSON requests.

You write a method that takes in an array from the request and accesses index 2.

The request handler has already validated the array length and content.

So you don't need to recheck it before accessing index two, right?

It's just less efficient to check it twice, right?

Wrong.

Your original implementation might work fine, but fast forward to years later, when someone else (perhaps yourself) uses that method.

Will they always ensure the array has the right length? 🤷‍♂️

If they don't, your method is a ticking time bomb.

🧠 Fix the Mindset:

Embrace defensive programming, where you expect that your methods will be misused.

Recheck the array's length before you use it, even if something else has previously checked.

Expect bad inputs, expect errors to occur, and have a path to do something about it.

💡 Expect the Unexpected

If you assume the following person:

Won't read your docs or code comments
Will reuse your code in a different context
Will misuse your code for things you've never designed it for

You will write safer, more resilient code.

⚡️Does saving 1 millisecond really matter? Answer: more than you’d think.

Benjamin Cane — Fri, 31 Oct 2025 24:00:00 GMT

⚡️Does saving 1 millisecond really matter? Answer: more than you’d think.

🧩 Context:

I recently shared performance tuning results where we reduced Microservice-to-Microservice latency from 1.3 ms to 0.3 ms in a new platform.

That’s a huge performance win, but it doesn’t sound like much.

In card payments, where every millisecond counts, it’s easy to see the value. But for an average backend system, does 1 ms matter?

A honeybee can flap its wings in 5 ms, so who is going to notice 1 ms?

🧘‍♂️ Perspective:

It’s not just 1 ms.

Modern distributed systems are built from many microservices and layers. A single customer journey typically touches dozens of components.

If you shave off 1 ms from every call, the gains compound quickly.

End-to-end, that can add up to tens or even hundreds of milliseconds for every incoming request.

💡Final Thoughts

Does saving 100 ms even matter?

Kind of.

Even if your platform isn’t latency-sensitive, throughput and latency are closely related.

Faster requests mean more available capacity.

That 100 ms may allow you to scale better or reduce infrastructure costs.

A 1 ms improvement doesn’t sound like much on the surface, but the compounding effect is massive, even for systems that “don’t care” about latency.

Have you heard of Store and Forward? It’s a resiliency design prevalent in card & bank payments, telecommunications, and other industries.

Benjamin Cane — Mon, 27 Oct 2025 24:00:00 GMT

Have you heard of Store and Forward? It’s a resiliency design prevalent in card & bank payments, telecommunications, and other industries.

The concept is that rather than failing a request when a dependency is down, store it, and send the request when it is back up.

🤔 How it works:

We have two services, Service A, which is highly dependent on Service B to process requests.

Traditionally, when Service B is down, Service A would have no choice but to reject requests with a failure.

With the Store and Forward design, when Service B is unavailable, Service A will reply to the request with a “degraded processing” (rejecting it, accepting it, or saying, “We’ll let you know later”).

But before replying to the request, it is “stored” somewhere that can be accessed quickly, such as a cache, a queue, a database, etc.

When Service B is back up, Service A will “forward” the stored requests to Service B.

🥹 What I like about this design pattern:

It accepts that failures are going to occur because everything fails.

Rather than creating “retry storms,” it adds more intelligence to the process, only sending requests to Service B when it’s back online.

It ensures that no request is lost, even in significant outages.

But this design pattern isn’t without complexity.

Blind retries are easy; you keep retrying.

But with Store and Forward, you need to:

🛑 Know when Service B is unavailable

🧠 Add logic around degraded processing

✅ Detect when Service B has recovered

🤹 Figure out the best way to dequeue the stored requests

While more complex than blind retries, store and forward is a great resiliency design for when every request matters.

In payments, where reliability and fast responses are critical, as is accuracy, the complexity of store-and-forward designs is a worthwhile trade-off.

When Building Low-Latency, High-Scale Systems, Push as Much Processing as Possible to Later

Benjamin Cane — Fri, 24 Oct 2025 24:00:00 GMT

When building low-latency, high-scale systems, a key strategy of mine is simple:

“Push as much processing as possible to later.”

Why It Matters? 🤔

In many systems—checkout, login, trade execution—latency matters because someone (or something) is waiting:

A customer at a point of sale
A user at a login screen
A system waiting on a transaction confirmation

Platforms that support these scenarios must respond in milliseconds. If not, requests will fail, and user experiences will suffer.

My Approach 🧠

I typically divide these platforms into two sub-platforms to optimize for speed and scale.

🏎️ Real-Time Platform: Optimized for scale and speed, only performing what is essential before responding to the request.

📥 Event-Driven Platform (sometimes Batch): Handles processing deferred from the real-time platform. It is still built for scale, but seconds, not milliseconds.

Deciding What Belongs Where 🗃

I try to break down processing into steps, and for each step I ask:

“Does this step need to happen before we respond or after?”

✅ If it MUST be performed before the response, use a real-time path.

⏭ If it can wait until after, event-driven path.

Things that tend to follow the event-driven path are:

Audit logging
Downstream asynchronous notifications
Enrichment and Transformations
Checks that trigger out-of-band tasks

These are not slow but don’t need to be “blocking.”

Final Thoughts ✍️

The key message is that the more you do on the real-time path, the slower it is.

This pattern is a good way to reduce the real-time workload.

But the trick is to find a reliable and fast way to move work from a real-time to an event-driven system.

Pub/Sub and gRPC streams are two of my go-to options.

Coding is a small part of software engineering.

Benjamin Cane — Fri, 10 Oct 2025 24:00:00 GMT

Coding is a small part of software engineering. 🤯

With AI Coding Assistants and Autonomous Agents being all the rage lately, I feel like this is something folks—especially those early in their careers—need to hear.

Coding is essential, but only a portion of what is required to build production systems.

Writing software can take a lot of effort, but just as much is spent before anyone starts to code.

Let's think about some of those tasks:

Defining API contracts
Designing a database schema
Choosing the right database
Configuring build & deployment pipelines
Selecting a runtime environment
Packaging software (Dockerfiles?)
Integrating observability tools
Writing runbooks and service manuals

And that's all before you even consider architectural trade-offs, system design, or cross-team alignment.

A lot of software engineering involves understanding what needs to be done and why—and then figuring out how to do it well.

Even with how fast AI is evolving, we are still a long way from AI completely taking over software engineering, but it's revolutionizing how we attack the process.

Are you embracing our robot overlords or resisting change? 🤖

Should I be an individual contributor or a people leader?

Benjamin Cane — Fri, 03 Oct 2025 24:00:00 GMT

Should I be an individual contributor or a people leader? 🤔 It's a question I get often.

My honest answer:

Which motivates you more?

It sounds simple, but it's an essential question for many.

🧰 If you enjoy building systems more than building people, then the IC track is probably right for you.

👔 If you enjoy growing and leading others, then people leadership might be the right path for you.

The good news is that you can switch later.

A lot of people change between IC and people leadership roles. Some love both equally and switch multiple times throughout their career.

Just keep those leadership and technical skills sharp.

A Word of Warning ⚠️

The higher you go, the fewer IC roles you’ll find—especially at companies without a strong IC path.

Some companies top out early, and some treat the IC track as a first-class path.

So, choosing the IC track might limit your options, but it's worth it if that's what you genuinely want to do.

Have you chosen your path?

For me, the IC path was the clear winner, but there were several times I considered people leadership opportunities.

Before I became a Staff Engineer, I applied for a Director of Engineering role. I didn't get it, but the process helped me figure out what I wanted.

Improve performance and reduce chances of request failures with this one simple trick! Avoid cross-region calls.

Benjamin Cane — Fri, 26 Sep 2025 24:00:00 GMT

Improve performance and reduce chances of request failures with this one simple trick! Avoid cross-region calls. 🫠

While the idea is simple, designing a system around this concept is anything but.

🤔 Why it’s Effective:

The core idea is straightforward:

Keeping traffic local is better for performance and resilience.

🚄 Performance:

Performance is easy to understand:

In-region traffic (including cross availability zone) usually sees single-digit millisecond latency (or less)
Cross-region traffic introduces latency, double-digit milliseconds or more, depending on the region

Latency adds up fast when you cross-regions multiple times in a microservices architecture.

🚀 Resilience:

Resilience is a bit more nuanced.

Every cross-region call passes through more network hops, such as firewalls, routers, switches, load balancers, etc.

More hops == More failure points

Keeping traffic local means fewer packet loss chances and less impact when things break.

🧙‍♂️ Complexities:

Designing for regional isolation (core concept of cell-based architecture) means:

1️⃣ Having active instances of critical services in each region (active-passive doesn't work with this approach)

2️⃣ Figuring out data replication and consistency across regions

3️⃣ Building robust routing and failover capabilities

4️⃣ Establishing management processes and capabilities that let you manage each region independently

Yes, the design is much more complex, and the operational overhead is much higher, but the blast radius of failure is smaller.

A failure with a critical service in one region only impacts that region.

🧠 Final Thoughts:

Perfect isolation isn't always possible; you might need to cross-region for data consistency or as a fallback.

When you need to cross-region:

✅ Reduce the number of cross-region hops as much as possible

✅ Do it up front, ideally before the request lands in your system.

The more cross-region routing you perform at the edge, the more you can avoid regional isolation complexities in the underlying systems.

Did you know Kube-proxy doesn’t perform load-balancing itself? It’s iptables (by default).

Benjamin Cane — Fri, 19 Sep 2025 24:00:00 GMT

Did you know Kube-proxy doesn’t perform load-balancing itself? It’s iptables (by default).

If you’ve run applications in Kubernetes, you’ve probably heard of Kube-proxy, the service responsible for routing traffic to Services.

But the interesting twist is that Kube-proxy doesn’t perform the routing, and iptables does (or IPVS, or nftables).

⚙️ How it works:

When you define a Service, Kubernetes will assign it an IP address.

Kube-proxy watches for these events and creates iptables rules that handle routing.

The iptables rules will:

Forward new connections with a destination of the Service IP to a Pod IP
Use the statistics module to select which Pod IP to forward the connection to

I like to think of it as follows: Kube-proxy identifies the need for routing, and iptables does the work.

🤔 Why it’s important:

If you plan to use gRPC, this is critical to understand.

gRPC uses HTTP/2 as its underlying protocol, which sends multiple requests down a single connection.

Since iptables forwards traffic at a connection level (layer 4), multiple requests down a single connection will all land on the same pod, even if more are available.

You might assume traffic will be balanced across pods, and be surprised to find it is not.

You're fine if you use HTTP/1.1 (without connection reuse). But anything that keeps long-lived connections open or sends multiple requests down a single connection, Kube-proxy won’t cut it.

🔭 What’s Next:

Scaling has been a challenge for iptables, as having lots of rules and connection tracking are known bottlenecks.

IPVS and nftables (iptables successor) have been introduced as new options for routing and load-balancing.

Both are still layer 4.

If you need layer 7 (request level routing), that’s where Istio comes in.

🧠 Final Thoughts:

Understanding Kube-proxy, iptables, and gRPC & HTTP/2 work is essential for anyone building fast, scalable backend systems on Kubernetes.

You can’t optimize what you don’t understand.

🔗 References:

Here are some reference links for those looking for a deeper dive.

You’ve heard of feature flags, but what about operational flags? ⏯️

Benjamin Cane — Fri, 12 Sep 2025 24:00:00 GMT

You’ve heard of feature flags, but what about operational flags? ⏯️

🎞️ Feature Flag Recap:

Feature flags are used to toggle features on/off while being developed and deployed.

They are usually temporary; once a feature is stable, the flag is removed from the code.

🧠 Operational Flags:

Operational flags are long-lived flags that turn system behavior on/off at runtime.

Not for releases, but for resiliency and manageability.

Well-placed Operational Flags can:

⛔ Disable an API call when a downstream service is in a brownout (e.g., reboot loop).
📨 Disable event publishing when consumers are backlogged.
🔌 Turn off non-essential dependencies during unexpected traffic spikes.

👨‍🏫 Real World Example:

I once had a service that published an event for every incoming request.

At one point, the downstream consumer started experiencing performance issues with their database.

This load was exaggerated by the incoming events being written to the database.

We couldn't just stop the downstream service because of the other functions it provided.

Luckily, we built an operational flag that could turn off the publishing of events.

We turned publishing off, giving the downstream consumer more time to address their performance issues.

We didn't build this flag with the consumer in mind; it was built in case we needed it.

⚠️ Not Plan A

Operational flags are not plan A; they are emergency switches that are useful when needed.

I try to build operational flags for any dependency or non-essential functionality in mission-critical systems. Just in case…

A core capability for building low-latency platforms is quickly detecting and reacting to issues.

Benjamin Cane — Fri, 05 Sep 2025 24:00:00 GMT

A core capability for building low-latency platforms is quickly detecting and reacting to issues.

It sounds odd, how does resiliency factor into low latency?

Stay with me.

Automatic retries are common in modern platforms. I've shared my cautions about blind retries in financial systems, but in many cases, retries are a solid resiliency practice.

The problem?

Retries often become the only resiliency mechanism.

The overreliance on retries leads to shortcuts around key fundamentals.

✅ Readiness Probes

✅ Health Checks

✅ Graceful Shutdown

When faults occur, requests fail, get retried, and eventually succeed. New requests repeat this process until someone fixes the fault.

But every retry adds a delay.

This is especially true when multiple retries are needed. Every retry adds extra transit and error-handling time.

With multiple failures across a microservices-based platform, retries can add up.

Retries are great, but the full collection of resiliency practices powers reliable and fast platforms.

Sometimes when I tell people that logging can impact a microservices response time, I get strange looks. 🤨

Benjamin Cane — Fri, 22 Aug 2025 24:00:00 GMT

Sometimes when I tell people that logging can impact a microservices response time, I get strange looks. 🤨

But I get it. Logging doesn’t feel significant enough to create issues, yet I’ve seen it repeatedly make a considerable difference.

Recently, the team was benchmarking a new microservice that performed well until the request volume passed a few thousand transactions per second.

Once the request volume increased, the p99 latency skyrocketed.

The fix? Two major actions.

1️⃣ Changed a log message in the hot path of every request from Info (logged always) to Debug (only logged when enabled).

2️⃣ Enabling Asynchronous logging puts log messages into a buffer where they can be written to stdout in a different goroutine instead of blocking requests.

These two changes improved the p99 latency by nearly 200%.

This post isn't about not logging.

It's about being intentional:

Are you logging in the hot path for every request?
Is the log level correct?
Do you need every request logged at this point in the process?

Because logging does matter.

How many times have you seen analytics on an operational database create issues? I’ve seen it far too often.

Benjamin Cane — Fri, 15 Aug 2025 24:00:00 GMT

How many times have you seen analytics on an operational database create issues? I’ve seen it far too often.

The story usually goes like this:

You've got a service and database that handle real-time requests, often mission-critical.

But now, you must create reports, data feeds, or some other batch process.

The most straightforward approach is to run those batch jobs against the same production database cluster.

It works fine, until it doesn't...

As the database grows and the batch jobs change, what once worked fine slowly (sometimes abruptly) increases the load on your production database.

Eventually, the load hits your mission-critical system, often taking it down completely.

So, how do you avoid this scenario?

💡 The Ideal Solution is to separate real-time and analytics systems. The real-time system can publish events consumed by a platform purpose-built for analytics and batch processing.

👨‍🔧 The Pragmatic Solution is to run the batch jobs on a read-only replica database instance/cluster dedicated for this purpose. Only costs a few extra database instances.

Both options will require more infrastructure, but it's a small price to pay to avoid a call at 3 a.m.

I can't count how often I've seen issues made worse by minor oversights—like not setting a timeout value. ⏱️

Benjamin Cane — Fri, 08 Aug 2025 24:00:00 GMT

I can't count how often I've seen issues made worse by minor oversights—like not setting a timeout value. ⏱️

Timeouts, max lifetimes, retries, idle connection limits, pool sizes—whether you are making HTTP calls or talking to a Database, there are settings that determine the behavior of clients and servers.

These settings are not usually the cause of issues, but they can amplify them when left untouched.

A real example I've seen:

Service A calls Service B. If the request fails, Service A has logic to handle the error gracefully and respond to users.

The problem?

The HTTP request timeout value was left default. When Service B was experiencing issues, requests failed, but not for 5 minutes (the default)

By then, users have already timed out and retried, amplifying the problem by overwhelming Service A with a backlog of stalled HTTP requests to Service B.

A lack of timeout did not create the issue but worsened it, even preventing Service A from processing requests that didn't need to call Service B.

The fix? 🛠

Awareness of and tuning these settings is important, but let's be honest—they are easy to overlook—I've done it a million times.

Introducing a culture of testing failures is more effective:

🧪 Create failure scenarios with unit tests

⛓️‍💥 Embrace chaos testing with fault injection in functional tests

People forget things; tests catch what we forget.