<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Benjamin Cane — #Bengineering</title>
    <link>https://bencane.com</link>
    <description>Short-form distributed-systems tradeoffs, reliability patterns, lessons learned, and leadership notes — shared weekly.</description>
    <language>en</language>
    
      <image>
        <url>https://bencane.com/assets/images/bengineering-hero.png</url>
        <title>Benjamin Cane — #Bengineering</title>
        <link>https://bencane.com</link>
      </image>
    
    
      <lastBuildDate>Thu, 05 Mar 2026 00:00:00 GMT</lastBuildDate>
    
    
      
      
      
      
        
      
      <item>
        <title>When your coding agent doesn’t understand your project, you’ll get junk</title>
        <link>https://bencane.com/posts/2026-03-05/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2026-03-05/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>When your coding agent doesn’t understand your project, you’ll get junk.</p>
<p>Junk in, junk out.</p>
<p>One of the best ways to get more from agentic coding tools is to give the agent context.</p>
<p>The more an agent understands your project, the better its work will be.</p>
<p>If you ask an agent to add a method to a class, it will.
It might read the file.
It might infer some structure.
But it won’t understand the project's intent.</p>
<p>If you asked a human engineer to make the same change, they would have questions.</p>
<p>What is the purpose of this project?
How is it used?
What constraints exist?</p>
<p>If they skipped that step, you’d get exactly what you asked for, even if it was wrong.</p>
<p>That’s the same challenge many face with coding agents.
A lack of context means it only does what it’s told — which isn’t always what you actually need.</p>
<p>But when it understands a project, it operates with far more clarity.</p>
<h2>🧙‍♂️ My “Old School” Method</h2>
<p>Before I start serious work with an agent, I have it learn the project.</p>
<p>Read the docs 📚
Review the codebase ⚙️
Understand the architecture 🏙️
Learn how to build, test, and run the project locally 👩‍🔧</p>
<p>I even ask the agent to summarize its understanding back to me.</p>
<p>This started as a saved prompt, turned into a slash command, and is now a skill.</p>
<p>This step is a huge productivity boost.</p>
<h2>🤖 Agents Files (<code>AGENTS.md</code>)</h2>
<p>Over the past year, an open standard for providing agents with structured context has emerged.</p>
<p>Instead of prompting the agent to rediscover your project every time, document that context once — and the agent will reference it going forward.</p>
<p>Most modern agents support an <code>AGENTS.md</code> file and reference it during each interaction.</p>
<h2>💽 What Goes in an Agents File?</h2>
<p>Think of the Agents file as onboarding documentation, but for an agent.</p>
<p>Project context:</p>
<ul>
<li>Purpose</li>
<li>Architecture</li>
<li>Layout</li>
<li>CI/CD instructions</li>
</ul>
<p>Team context:</p>
<ul>
<li>Code style preferences</li>
<li>Testing philosophy (TDD or YOLO)</li>
<li>Tech stack constraints</li>
</ul>
<p>Any tribal knowledge you’d expect a new team member to learn belongs in an Agents file.</p>
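<p>As a sketch, an Agents file for a hypothetical project might look like the following (every project detail here is invented for illustration):</p>

```markdown
# AGENTS.md

## Purpose
A payments API that settles card transactions (example project).

## Architecture
Go service; Postgres for persistence; deployed via GitHub Actions.

## Build, Test, Run
- Build: `make build`
- Test: `make tests` (table tests preferred)
- Run locally: `make run`

## Team Conventions
- Code style: gofmt, no exceptions
- Testing philosophy: TDD; every bug fix ships with a regression test
- Constraints: no new external dependencies without review
```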
<h2>👨‍💻 Personal Agent Files</h2>
<p>Many tools also support a personal Agents file in your home directory.</p>
<p>That’s where your workflow preferences live. Are you a two-space tabs person? Do you want your agent to prefer table tests?</p>
<p>If you have preferences you want to apply to every project, but are unique to you, they go in the personal Agents file.</p>
<h2>🧠 Final Thoughts</h2>
<p>Using an Agents file dramatically improves agent quality.</p>
<p>Even then, I still use my “learn-this” slash command — sometimes that extra context makes a difference.</p>
<p>If you wouldn’t drop a new engineer into a project without context, don’t do it to your agents.</p>
]]></description>
        <pubDate>Thu, 05 Mar 2026 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>You can have 100% Code Coverage and still have ticking time bombs in your code. 💣</title>
        <link>https://bencane.com/posts/2026-02-26/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2026-02-26/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>You can have 100% Code Coverage and still have ticking time bombs in your code. 💣</p>
<p>I was listening to a team recently, and an engineer was discussing how a coding agent added additional tests to a project that already had 100% code coverage.</p>
<p>The conversation reminded me that coverage is directional and often mistaken for quality.
Just because your coverage shows 100% doesn’t mean your software is fully tested.</p>
<h2>👨‍🏫 Understanding How Coverage Is Measured</h2>
<p>Code Coverage measures the percentage of executable lines that run during your tests.
Executed doesn’t mean well-tested.</p>
<p>Just because every function runs doesn’t mean it’s free of logic errors or safe.</p>
<h2>😃 Happy Path Testing</h2>
<p>A common challenge teams face with testing is focusing too much on the happy path.</p>
<p>Suppose you have a function that accepts an array.
In your tests, you always pass 5 elements — because that’s the expected usage.
Coverage shows all branches executed. You’re good, right?</p>
<p>What happens if you pass 4 elements? Or 0?</p>
<p>If you never test fewer than 5, how do you know?
You may say: “But wait, it’s only ever called with 5 elements.”
That may be true, for now.</p>
<h2>⚠️ Protecting Against Your Future Self</h2>
<p>Code is rarely static; someone will come along and change things.
That might be you, it might be someone else.</p>
<p>Eventually someone changes that function.
Will they add tests for new edge cases? Maybe.
Assume they won’t.</p>
<p>When you write tests, don’t just focus on how you know a function is going to be used; also include tests that misuse the function.</p>
<p>Rather than sending an array with 5 elements, send one with 4, 0, and send a nil value.</p>
<p>Rather than sending strings that match an expected pattern, send junk that doesn’t.</p>
<p>Does the function still behave correctly? Should it?</p>
<p>The more you test outside the happy path, the more resilient your code becomes — and the less likely it is to break later.</p>
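<p>As a sketch of that idea in Go, here’s a hypothetical <code>average</code> function with a table test that exercises the misuse cases alongside the happy path (the function and case names are invented for illustration):</p>

```go
package main

import (
	"errors"
	"fmt"
)

// average returns the mean of nums, guarding against the
// empty (or nil) slice instead of dividing by zero.
func average(nums []int) (float64, error) {
	if len(nums) == 0 {
		return 0, errors.New("average: empty input")
	}
	sum := 0
	for _, n := range nums {
		sum += n
	}
	return float64(sum) / float64(len(nums)), nil
}

func main() {
	// Table of cases: the happy path plus deliberate misuse.
	cases := []struct {
		name    string
		in      []int
		wantErr bool
	}{
		{"happy path (5 elements)", []int{1, 2, 3, 4, 5}, false},
		{"four elements", []int{1, 2, 3, 4}, false},
		{"zero elements", []int{}, true},
		{"nil", nil, true},
	}
	for _, c := range cases {
		_, err := average(c.in)
		if (err != nil) != c.wantErr {
			fmt.Println("FAIL:", c.name)
			continue
		}
		fmt.Println("PASS:", c.name)
	}
}
```

Every line of <code>average</code> executes with the happy-path case alone, so coverage reads 100% either way; only the extra rows prove the function survives misuse.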
<h2>🧠 Final Thoughts</h2>
<p>Code coverage is a guide; don’t let it give you false confidence.
Test the happy path, and the unexpected ones.
Validate function outputs against the inputs you provide.</p>
<p>100% Coverage is easy.
Writing reliable code is not.</p>
]]></description>
        <pubDate>Thu, 26 Feb 2026 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>Getting More Out of Agentic Coding Tools</title>
        <link>https://bencane.com/posts/2026-02-19/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2026-02-19/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>Are you getting the most out of Agentic Coding Tools?</p>
<p>Software engineering is changing fast.</p>
<p>Agentic coding tools became widely available last year, and if you’re not using them today, you’re already behind.
But many still struggle to move beyond the “fancy chat” experience.</p>
<p>Just like any tool in our engineering tool belts, knowing how to use it effectively matters.</p>
<h2>🤖 Agents Are More Than A Better Chat</h2>
<p>Last year, most of us were using tab-complete alongside a useful chat interface where you could ask questions, get suggestions, and maybe copy/paste into your code.</p>
<p>But agents can do much more than make suggestions — they can understand your codebase and act.</p>
<p>Instead of asking an agent:</p>
<blockquote>
<p>“Can you suggest additional tests?”</p>
</blockquote>
<p>Tell your agent:</p>
<blockquote>
<p>“Create additional test cases, then run <code>make tests</code> and validate they pass.”</p>
</blockquote>
<p>An agent can create tests, run them, inspect failures, adjust the implementation, and re-run the suite until it passes.</p>
<p>This isn’t about suggestions anymore; agents have more autonomy.</p>
<p>I think of coding agents as assistants working toward a shared goal.
They do some work, you do some, and you iterate together.</p>
<h2>🏆 Moving from Direction to Outcomes</h2>
<p>A big mental shift is moving away from simple directions to defining an outcome with guidance &amp; guardrails.</p>
<p>Agents don’t just perform a single task; they can execute multiple steps (and even parallelize them).
You don’t need to spoon-feed each directive one by one.</p>
<p>Instead, define the outcome you want, along with guidance and guardrails.</p>
<p>The clearer you are on the outcomes, constraints, and context around what you are trying to do, the better the agent will perform.</p>
<h2>📋 Examples: Real-world tasks I’ve asked Agents to handle</h2>
<blockquote>
<p>“Using the existing DB Driver X as a reference, create a set of table tests for driver Y. The tests should be structured similarly to the existing driver, surface any logic issues or concurrency issues, and act as clear assurance that the defined interface is honored.”</p>
</blockquote>
<blockquote>
<p>“Update CI workflows to Go 1.26.0, find and update any references to 1.25.6, then run tests to ensure everything still builds and passes”</p>
</blockquote>
<p>I also use agents for mundane work like git commits and opening pull requests.
They consistently produce better commit messages and PR descriptions than I would.</p>
<p>Agents don’t always get it exactly right, but with a bit of feedback and occasional adjustment, you can get a lot done quickly.</p>
<p>Avoid going down the rabbit hole of endless refinement; sometimes it’s better to reset with a clearer prompt.</p>
<h2>👨‍🏫 Context is Key</h2>
<p>If you want the best results from agents, you need to give them context.</p>
<p>Before I do serious work on a project, I have the agent:</p>
<ul>
<li>Read the Docs 📚</li>
<li>Review the Architecture 🏙️</li>
<li>Understand the Project Structure 📐</li>
<li>Understand how to build, test, and run the application locally 👩‍🔧</li>
</ul>
<p>The same steps that a human would take.
Agents are no different.</p>
<p>(I’ll dive deeper into Agent files, skills, and effective ways to provide more context in a future post)</p>
<h2>🧠 Final Thoughts</h2>
<p>Engineers are doing amazing things with agents, and new capabilities are being added daily.
But you don’t need to be at the bleeding edge to get more out of them (I certainly am not).</p>
<p>Don’t worry about the hype.
Understand what these tools can do; small adjustments in how you use them can drastically change what you get back.</p>
]]></description>
        <pubDate>Thu, 19 Feb 2026 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>Why is Infrastructure-as-Code so important? Hint: It&#39;s correctness</title>
        <link>https://bencane.com/posts/2026-02-12/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2026-02-12/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>Why is Infrastructure-as-Code so important?
Hint: It's correctness.</p>
<p>I’ve worked on many systems in my career, and one thing that I’ve noticed is that those that leverage infrastructure-as-code tend to be more stable than those that don’t.</p>
<h2>🤔 But wait, isn’t everyone using IaC these days?</h2>
<p>You may be thinking, &quot;Why are we talking about IaC in 2026?
Isn’t this just the de facto standard at this point?&quot;</p>
<p>My hope is yes, everyone does this, but I’m sure many don’t invest the time into it.</p>
<p>I’m not here to tell you to use IaC; I’m here to tell you why it’s important, and it’s not necessarily about the speed of deployment.</p>
<h2>🏎️ Fast is great, but it’s not the biggest benefit</h2>
<p>A very clear and correct reason people leverage IaC is the speed of infrastructure provisioning.</p>
<p>Provisioning infrastructure with IaC takes far less time, enabling you to scale faster, and it lets you do cool things like ephemeral environments.</p>
<p>But the biggest benefit of IaC, in my mind, is correctness.</p>
<h2>⚠️ IaC reduces human error</h2>
<p>Humans make mistakes.
When you ask humans to click the same buttons in the same sequence every time, you’ll get mixed results.</p>
<p>Steps get missed — especially when time passes or people rely on memory instead of process.</p>
<p>Documentation helps, but there are those of us who think, “I’ve done this a million times, I don’t need instructions.”</p>
<p>This attitude is the same reason one of my kids’ desks wobbles and the other doesn’t…</p>
<p>IaC is a contract.
Once defined, every environment is created from the same source of truth.</p>
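<p>As a small illustration of that contract, a declarative definition like this (a Terraform-style sketch with invented module names and inputs) becomes the single source of truth every environment is stamped from:</p>

```hcl
# Hypothetical example: every environment instantiates the same module,
# so test and production differ only in declared inputs, never in
# forgotten manual steps.
module "api_service" {
  source        = "./modules/api_service" # invented module path
  environment   = var.environment         # "test" or "prod"
  instance_type = var.instance_type       # sized per environment
  memory_mb     = 2048                    # identical everywhere
}
```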
<h2>✅ Consistency is essential to production stability</h2>
<p>The consistency of IaC is what brings production stability.</p>
<p>When your performance testing environment matches production, your tests become more accurate.</p>
<p>If one service has a larger memory footprint in testing than it does in production, you might find yourself surprised by out-of-memory errors, especially if heap sizes are configured based on your test environment and not your production environment (because, of course, they would be the same, right?).</p>
<p>When I come across platforms that use IaC, I see fewer mistakes and fewer incorrect assumptions.
And production tends to be more stable, at least with respect to infrastructure and capacity-related issues.</p>
<h2>🧠 Final Thoughts</h2>
<p>So, to answer the question, why is IaC so important?
It’s not the speed of provisioning; it’s the correctness of the environments.</p>
<p>In production systems, correctness beats speed every time.</p>
]]></description>
        <pubDate>Thu, 12 Feb 2026 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>Optimizing the team’s workflow can be more impactful than building business features</title>
        <link>https://bencane.com/posts/2026-02-05/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2026-02-05/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>Optimizing the team’s workflow can be more impactful than building business features.
It defies logic, but it’s true.</p>
<p>I work with and talk to a lot of engineers, and to explain my point, I’ll describe two engineers on the same team.</p>
<h2>💪 Engineer 1</h2>
<p>The first engineer churns out a lot of code and user stories.
They’re focused, consistently finishing on time, and often doing more than they’re assigned.</p>
<p>When it comes to shipping business features, this person does a great job.</p>
<p>But this person is also more than happy to let their build run for 3 hours.</p>
<h2>🦾 Engineer 2</h2>
<p>The second engineer completes their assigned user stories, but when they encounter inefficiencies, they spend time fixing them.
Sometimes it’s improving the build pipeline, fixing flaky tests, making code more maintainable, etc.</p>
<p>While this engineer may finish fewer user stories because they are distracted by these “side quests,” they make a bigger impact.</p>
<h2>🏋️ Enabling Others</h2>
<p>While avoiding the 10x engineer trope, Engineer 2 has a bigger impact by resolving issues affecting the whole team.</p>
<p>A slow pipeline slows everyone’s work.</p>
<p>Open a single change, then wait 3 hours.
A test fails—wait another 3 hours.
Feedback comes in—wait 3 more.</p>
<p>Broken workflows turn simple changes into long, inefficient endeavors.</p>
<p>By fixing these issues not just for themselves but for everyone, Engineer 2 helps the whole team ship code faster.</p>
<h2>📈 Invest in Workflows</h2>
<p>Investing time in optimizing your workflow and the team’s workflow usually pays dividends.</p>
<p>Sometimes it’s hard to quantify, but the smallest optimizations can be huge.</p>
<p>Someone on the team who gets frustrated with inefficiencies and decides to fix them is incredibly valuable.</p>
<h2>👩‍🔧 Do you take ownership of your codebase?</h2>
<p>If you want to make a greater impact, look at how you work.</p>
<p>When you fix a bug, do you search the codebase for the same bug elsewhere?</p>
<p>When your build pipeline is slow, or you have flaky tests, do you fix them or live with them, complaining while nothing changes?</p>
]]></description>
        <pubDate>Thu, 05 Feb 2026 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>I follow an architecture principle I call The Law of Collective Amnesia</title>
        <link>https://bencane.com/posts/2026-01-29/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2026-01-29/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>I follow an architecture principle I call The Law of Collective Amnesia.</p>
<p>Over time, everyone (including yourself) forgets the original intention of the system's design as new requirements emerge.</p>
<p>This law applies at all levels, from system design to <em>microservices</em>, or even libraries.</p>
<h2>🧬 Systems Evolve (and Intent Fades)</h2>
<p>When building new platforms/services/whatever, we create a system design that follows a structure.</p>
<p>Different components have distinct responsibilities; they interact clearly with the rest of the system, and there is a plan.</p>
<p>But as time progresses, new people may not understand the original intentions of the design.</p>
<p>As new requirements come in, the pressure to deliver may push you or others down a path that doesn't align with the original plan.</p>
<p>When the architecture’s intent is understood, additions can be beneficial.
When it’s forgotten, they start to feel duct-taped on.</p>
<p>Duct-taped solutions turn into technical debt or operational/management complexity that starts to weigh the system down.</p>
<h2>📠 How Good Systems Become Legacy Nightmares</h2>
<p>We've all seen the legacy platform that feels brittle, does too much, and is daunting to refactor.</p>
<p>It didn't start that way.</p>
<p>At the time, it was probably a great design, but over time, new features and capabilities turned it into Frankenstein's monster.</p>
<h2>👮 How to Defend Architecture from Collective Amnesia</h2>
<p>While it may not be possible to prevent the system from devolving forever, you can reduce the need for duct-tape solutions by designing for change.</p>
<h3>📜 Roles and Responsibilities</h3>
<p>An important—but not always effective—step is to document and define the roles and responsibilities of components within the system.</p>
<p>When a system is broken down into components with distinct roles and responsibilities, it becomes easier for people to make informed decisions about where new capabilities should reside.</p>
<p>The documentation “should” influence how change is implemented.</p>
<p>But it relies on people following that documentation, which is the fundamental flaw.</p>
<h3>🚧 Architectural Guardrails: Make the Right Path the Easy Path</h3>
<p>When I say &quot;architectural guardrails,&quot; you probably think of review boards and ADRs.
These processes are essential, but they don't always work as a prevention.</p>
<p><em>Instead, I mean designing the system so that the correct placement of functionality is the path of least resistance.</em></p>
<h3>🔏 Contracts as Constraints, Not Convenience</h3>
<p>In general, I feel like back-end APIs should provide as much data as possible, and it should be up to the clients to use what's relevant.</p>
<p>But sometimes contracts can be used to enforce design behaviors.</p>
<p>Systems can't act unless they receive the data required to act.</p>
<h3>🚪 Control Ingress and Egress to Control Evolution</h3>
<p>Ensuring that only specific systems serve as entry and exit points helps direct future design decisions.</p>
<p>It's often easier to add a new endpoint than to add a new platform that serves as an entry point.</p>
<p>Knowing this can allow you to put in place processing at those entry and exit points that ensure future capabilities follow specific patterns.</p>
<h2>🧩 Design for Change, Not Today’s Requirements</h2>
<p>When you are first building a system, it's easy to want to make it quickly based on the requirements in front of you.</p>
<p>But when you know a platform will evolve, it's beneficial to take time and implement interfaces that make the system more modular.</p>
<p>Within a <em>microservice</em>, this can be how you structure the application, how you create packages that can be extended even though you don't need them day one.</p>
<p>At a platform level, it could be the decision between <em>monolith</em> and <em>microservices</em>.
If you know there will be a rapid change, it may make sense to leverage <em>microservices</em>.
If you know there won't be a fast change, start with a <em>monolith</em>.</p>
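<p>A minimal Go sketch of that kind of in-service modularity (the <code>OrderStore</code> and related names are invented for illustration): the service depends on a small interface, so a future maintainer adds a new backend instead of duct-taping the old one:</p>

```go
package main

import "fmt"

// OrderStore is the seam designed for change: the service only
// knows this interface, never a concrete backend.
type OrderStore interface {
	Save(id string, total int) error
}

// MemStore is the day-one implementation: in-memory.
type MemStore struct{ orders map[string]int }

func NewMemStore() *MemStore { return &MemStore{orders: map[string]int{}} }

func (m *MemStore) Save(id string, total int) error {
	m.orders[id] = total
	return nil
}

// OrderService never changes when a SQL-backed store (or any other
// OrderStore implementation) is swapped in later.
type OrderService struct{ store OrderStore }

func (s *OrderService) Place(id string, total int) error {
	return s.store.Save(id, total)
}

func main() {
	svc := &OrderService{store: NewMemStore()}
	if err := svc.Place("ord-1", 42); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("order placed")
}
```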
<h2>🧠 Final Thoughts: Assume Intent Will Be Forgotten</h2>
<p>The above examples are just a subset of the ways you can enforce a design that aligns with your intentions.</p>
<p><strong>The key lesson:</strong> don't build a plan that relies on people to follow your intentions.
They won't.</p>
<p>You have to assume the next person won't design systems the way you do, won't understand the reasons behind your design, and will be under pressure to deliver.</p>
]]></description>
        <pubDate>Thu, 29 Jan 2026 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>Performance testing without a target is like running a race with no finish line</title>
        <link>https://bencane.com/posts/2026-01-22/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2026-01-22/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>Performance testing without a target is like running a race with no finish line.</p>
<p><em>Did you win or did you stop early?</em></p>
<p>I previously shared my thoughts on benchmark and endurance tests, but before ever running a test, a target must be defined.</p>
<h2>🎯 Why Set Targets?</h2>
<p><em>Without a target, how do you know what good looks like?</em></p>
<p>I've often come across teams that have incorporated performance testing into their releases (which is excellent).
But they had no targets defined.</p>
<p>No production baseline.</p>
<p>No service-level objectives from the business.</p>
<p><em>How did they know whether the system was meeting expectations?</em>
They didn't.</p>
<p>In some cases, after targets were defined, the system was performing as needed.</p>
<p>In others, it clearly wasn't, and the team had no idea until targets were defined and compared with production.</p>
<h2>🏆 Defining Targets</h2>
<p>It's easier to define targets for existing systems (and modernization projects) than for a brand-new system.</p>
<p>Existing platforms have production numbers you can reference, user expectations, and service-level objectives that can be translated into performance targets.</p>
<p>New systems rarely have much to baseline from.</p>
<p>For a brand-new system, I like to work with the product/business team and understand their goals.</p>
<p><em>- 📈 What is the expected growth? Slow and steady, or fast and unpredictable?</em></p>
<p><em>- 🚨 What is the criticality of the platform? If it fails to respond, is it a problem or an inconvenience?</em></p>
<p><em>- 🌟 What unique constraints or features of the platform might influence performance requirements?</em></p>
<p>Once defined, targets should not be treated as static.</p>
<p>As traffic starts, you can adjust targets accordingly.
Maybe they’re higher, perhaps lower.</p>
<h2>🪫 Leave Some Buffer</h2>
<p>Once a target is agreed upon, I like to add a bit of buffer.</p>
<p>If the requirement is 100ms, I’ll target closer to 75ms, or lower, depending on the system and its purpose.</p>
<p><em>Why?</em>
Adding capacity or tuning the system takes time.</p>
<p>Things change, sometimes in unexpected ways.</p>
<p>Sometimes unexpected changes can be handled by automatic/manual scaling, but not always.</p>
<p>It's important to give yourself a bit of buffer to respond to those changes.</p>
<h2>🧠 Final Thoughts</h2>
<p>I've talked a lot about setting targets and their importance.
But one of the most important aspects of having targets is monitoring and measuring production.</p>
<p>Having visibility in production helps validate that your targets are realistic.</p>
<p>Maybe they are too high, and you have reserved infrastructure going to waste.</p>
<p>Perhaps they are too low, and you won't be able to survive the next traffic spike.</p>
<p>Traffic changes over time, and application performance naturally drifts as new capabilities are added.</p>
<p>Clear visibility into traffic and latency patterns is essential for anyone operating mission-critical, large-scale systems.</p>
<p>But it’s also a foundational practice for most platforms.</p>
<p><em>Do you have performance targets for your platform?</em>
<em>Are they grounded in production measurements?</em>
<em>Should you?</em></p>
]]></description>
        <pubDate>Thu, 22 Jan 2026 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>Many teams think performance testing means throwing traffic at a system until it breaks. That approach is fine, but it misses how systems are actually stressed in the real world.</title>
        <link>https://bencane.com/posts/2026-01-15/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2026-01-15/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>Many teams think performance testing means throwing traffic at a system until it breaks.
That approach is fine, but it misses how systems are actually stressed in the real world.</p>
<p>The approach I’ve found most effective is to split performance testing into two distinct categories:</p>
<ul>
<li>🏋️‍♀️ <strong>Benchmark testing</strong></li>
<li>🚣‍♀️ <strong>Endurance testing</strong></li>
</ul>
<p>Both stress the system, but they answer <em>different questions</em>.</p>
<h2>🏋️‍♀️ Benchmark Testing:</h2>
<p>Benchmark tests are where most teams start: increasing load until the system fails.</p>
<p>Failure might mean:</p>
<ul>
<li>⏱️ Latency SLAs are exceeded</li>
<li>⚠️ Error rates cross acceptable thresholds</li>
</ul>
<p>Sometimes failure is measured by when the system stops responding entirely.
This is known as <em>breakpoint testing</em>.</p>
<p>Even when SLAs are the target, I recommend running breakpoint tests after thresholds are exceeded.</p>
<p>Knowing how the system breaks under load is useful when dealing with the uncertainties of production.</p>
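<p>A toy sketch of a breakpoint loop in Go (the simulated system and thresholds are invented for illustration): raise concurrency step by step until the average latency exceeds the SLA:</p>

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// callSystem is a stand-in for a real request; in this toy model
// response time simply degrades linearly with concurrency.
func callSystem(concurrency int) time.Duration {
	return time.Duration(concurrency) * time.Millisecond
}

// findBreakpoint returns the first concurrency level whose average
// simulated latency exceeds slaMillis, or -1 if none does.
func findBreakpoint(slaMillis int) int {
	for c := 1; c <= 1000; c++ {
		var (
			total time.Duration
			wg    sync.WaitGroup
			mu    sync.Mutex
		)
		// Fire a small batch of concurrent "requests" at level c.
		for i := 0; i < 5; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				d := callSystem(c)
				mu.Lock()
				total += d
				mu.Unlock()
			}()
		}
		wg.Wait()
		if avg := total / 5; avg > time.Duration(slaMillis)*time.Millisecond {
			return c // SLA exceeded: this is the breakpoint
		}
	}
	return -1
}

func main() {
	fmt.Println("breakpoint at concurrency", findBreakpoint(100))
}
```

A real benchmark harness would drive an actual endpoint and measure percentiles, but the shape is the same: step up load, watch the SLA, and record where it breaks.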
<h2>🚣‍♀️ Endurance Testing:</h2>
<p>Endurance tests answer a <em>different question</em>:</p>
<blockquote>
<p>Can the system sustain high load over time?</p>
</blockquote>
<p>Running at high but realistic levels (often <em>near production max</em>) over extended periods exposes different problems:</p>
<ul>
<li>🪣 Queues, file systems, and databases slowly fill</li>
<li>🧹 Garbage collection and thread pools behave differently</li>
<li>🧵 Memory or thread leaks become visible</li>
</ul>
<p>These issues <em>rarely</em> show up in short spikes of traffic.
If you only run benchmarks, you’ll discover them for the first time in production.</p>
<h2>⌛️ Testing Thoroughly vs Deployment Speed:</h2>
<p>Benchmarks run fast; endurance tests take time.</p>
<p>A 24-hour endurance test can slow down releases, especially when you want to release the same service multiple times a day.</p>
<p>It's a <strong>trade-off</strong> between the system's criticality and the need for rapid deployments.</p>
<p>How tolerant is the system to minor performance regressions?</p>
<p>If performance truly matters, slowing releases down to run endurance tests might be the right call.</p>
<h2>🧠 Final Thoughts:</h2>
<p>Effective performance testing isn’t just about surviving spikes.</p>
<p>Spikes matter, but so does answering:</p>
<ul>
<li>📈 Can the system withstand peak load for extended periods?</li>
<li>🔎 If not, how does it fail, and why?</li>
</ul>
<p>All too often, I see the system's capacity become the breaking point during unexpected traffic patterns.</p>
<p>While an application might handle spikes, the overall platform often can't sustain them.
That's where endurance tests deliver their <strong>real value</strong>.</p>
]]></description>
        <pubDate>Thu, 15 Jan 2026 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>Pre-populating caches is a “bolt-on” cache-optimization I&#39;ve used successfully in many systems. It works, but it adds complexity</title>
        <link>https://bencane.com/posts/2026-01-08/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2026-01-08/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>Pre-populating caches is a “<em>bolt-on</em>” cache-optimization I've used successfully in many systems.</p>
<p>It works, but it <strong>adds complexity</strong>, which is why most teams avoid it.</p>
<h2>📖 Context</h2>
<p>For context, in this post I’m talking about scenarios where one system requires data from another system, i.e., the <em>source of record (SOR)</em>.
The data is needed frequently, and the decision to cache has already been made.</p>
<p>A good traditional approach is the <em>cache-aside pattern</em>, which maintains a local cache of data.</p>
<p>That cache is populated organically by checking for records as needed, finding that the data is not cached, fetching it from the SOR, and storing the result.</p>
<p>A pro of this approach is that the cache is <strong>transient</strong>.
If it's dropped, it's ok because you can always go back to the SOR, albeit with a performance penalty.</p>
<p><strong>But slow is better than broken.</strong></p>
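<p>A minimal cache-aside sketch in Go (the <code>fetchFromSOR</code> stand-in and all names here are invented for illustration):</p>

```go
package main

import (
	"fmt"
	"sync"
)

// Cache implements the cache-aside pattern: check locally first,
// and on a miss fall back to the source of record (SOR).
type Cache struct {
	mu    sync.Mutex
	data  map[string]string
	fetch func(key string) (string, error) // the SOR lookup
}

func NewCache(fetch func(string) (string, error)) *Cache {
	return &Cache{data: map[string]string{}, fetch: fetch}
}

func (c *Cache) Get(key string) (string, error) {
	c.mu.Lock()
	if v, ok := c.data[key]; ok { // cache hit: fast path
		c.mu.Unlock()
		return v, nil
	}
	c.mu.Unlock()

	v, err := c.fetch(key) // cache miss: slow, but always available
	if err != nil {
		return "", err
	}
	c.mu.Lock()
	c.data[key] = v // populate organically for next time
	c.mu.Unlock()
	return v, nil
}

func main() {
	// fetchFromSOR pretends to be the round-trip to the SOR.
	fetchFromSOR := func(key string) (string, error) {
		return "value-for-" + key, nil
	}
	c := NewCache(fetchFromSOR)
	v, _ := c.Get("k1") // miss: goes to the SOR
	fmt.Println(v)
	v, _ = c.Get("k1") // hit: served locally
	fmt.Println(v)
}
```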
<h2>🤔 Why?</h2>
<p>Calls to the SOR are problematic for low-latency or random-access workloads.</p>
<p>When 9 out of 10 requests all want the same data, you’ll have infrequent cache misses.
But when 9 out of 10 requests all require different data, you’ll have more cache misses, which reduces the effectiveness of caching.</p>
<p>Pre-populating caches is a way to avoid those cache misses by trading off latency for complexity.</p>
<h2>⚙️ How?</h2>
<p><strong>Caveat:</strong> I use pre-population purely as a <em>bolt-on</em> optimization, not a core dependency.</p>
<p>Typically, I keep the cache-aside path as the <em>primary mechanism</em>.
If anything goes wrong (and it will), there is always the option to go to the SOR for data (<code>slow &gt; broken</code>).</p>
<p><strong>A key decision</strong> is whether to pull the data or listen for it.</p>
<p>I prefer that the SOR publish updates as they occur, but platform constraints or circumstances may require you to pull the data.</p>
<p><code>Pub/sub</code> works great when the SOR publishes, but other options exist as well (webhooks, files) with their own trade-offs.</p>
<p><em>Use whatever makes sense for your environment.</em></p>
<h2>⚠️ Why Not?</h2>
<p>Pre-populating a cache can be <em>easier said than done</em>, as a lot can go wrong.</p>
<p><em>What happens if you lose a message or two?</em></p>
<p><em>What happens when you’re rebuilding the cache (errors or new instances)?</em>
<em>How do you repopulate?</em></p>
<p>The cache-aside path will cover any dropped messages, but implementing <strong>republish mechanisms is complicated</strong>.</p>
<p>You can’t rely solely on deltas; at some point, you'll need to <em>republish the entire dataset</em>.</p>
<p>Building all of these systems is complicated; there's more to monitor, patch, and manage.</p>
<p>If the latency hit and traffic volume to the SOR are not a concern, then that complexity is <em>not worth it</em>.</p>
<h2>🧠 Final Thoughts</h2>
<p>Pre-populating caches can be a <strong>significant performance win</strong>, but it can also be an <strong>operational overhead</strong>.</p>
<p>If your data is primarily static (<em>changing infrequently</em>), the overhead can be worthwhile.</p>
<p>If your data changes frequently, stick with <em>cache-aside</em> (and aggressive <code>TTLs</code>), or no cache at all.</p>
]]></description>
        <pubDate>Thu, 08 Jan 2026 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>Don&#39;t be afraid to build a tool. Just don&#39;t become too attached to it.</title>
        <link>https://bencane.com/posts/2026-01-01/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2026-01-01/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>Don't be afraid to build a tool.
Just don't become too attached to it.</p>
<p>When managing infrastructure or building and testing software, you'll inevitably find repetitive tasks that feel like a tool should exist to make them easier or faster.</p>
<p>Maybe one exists, perhaps it doesn't.
Either way, your team doesn’t know.</p>
<p>So inevitably, the tasks stay manual, inefficient, and frustrating.</p>
<p><em>Everyone wishes there was a tool—so why not build one?</em></p>
<h2>😱 Why People are Afraid to Build Tools:</h2>
<h4>1️⃣ “What if something already exists?”</h4>
<p>You build something, then someone says, “Why didn't you just use X?”</p>
<p>That moment sucks, especially if you didn't know X even existed.</p>
<p>Don't defend it. Own it and say: “I didn't know X existed, let me take a look at it.”</p>
<p>No one expects you to know every tool out there.</p>
<p>Building something and later discovering a better tool is not a failure.</p>
<p>It's a learning experience; what you learn while building the tool is priceless.</p>
<h4>2️⃣ “I don't want to maintain another system.”</h4>
<p>This is a genuine concern.</p>
<p>But maintaining a small set of tools that save hours of repetitive work or enable something that couldn't be done otherwise is often worth it.</p>
<p>The key is to reduce overhead early on:</p>
<ul>
<li>
<p>Prefer CLI tools over services</p>
</li>
<li>
<p>Minimize dependencies</p>
</li>
</ul>
<p>The less complexity, the easier it is to manage.</p>
<h4>3️⃣ “We are too busy building features.”</h4>
<p>This one is hard, not because it's a technical challenge, but because it requires a mindset change.</p>
<p>Teams get so focused on what they’re building that they forget to look at how they are building it.</p>
<p>If a process is slow, manual, or error-prone, that's a signal that it could be improved.</p>
<p>The secret is taking a step back and re-evaluating how.</p>
<h2>👨‍🔬 How I Approach Building Tools:</h2>
<h4>1️⃣ Look for existing tools</h4>
<p>I prefer open source, so that's where I start.
There is a lot out there, but you have to search for it.</p>
<h4>2️⃣ Look for “close enough” projects</h4>
<p>You might not find a project that does exactly what you want.
But maybe it's 80-90% of the way there.</p>
<p>If it's close, extend it, contribute upstream.
Or fork it (license permitting).</p>
<h4>3️⃣ If nothing fits, build new</h4>
<p>If nothing meets your needs, then build it.</p>
<p>Start small, use it yourself, share it with your team and solicit feedback.</p>
<h2>😍 Don't Become Too Attached:</h2>
<p>When someone says, “Why not use X?”, evaluate X objectively.</p>
<p><em>Is it the best tool now?</em>
If so, use it.</p>
<p>Using a well-known, widely adopted tool is often more efficient.</p>
<h2>🧠 Final Thoughts:</h2>
<p>Custom tools can be lifesavers.</p>
<p>They only become a problem when someone gets too attached and refuses to replace them with more standard solutions.</p>
<p>Build tools when needed, replace them when something better appears.</p>
<p>A kenjutsu instructor (Japanese swordsmanship) I knew would always warn us.
He’d say Americans want to treat the sword like it’s a precious gem.
It's a tool; treat it like you would a hammer.</p>
<p>I think that applies here as well.</p>
]]></description>
        <pubDate>Thu, 01 Jan 2026 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>One of the toughest engineering skills to develop is accepting a decision you disagree with. 😖</title>
        <link>https://bencane.com/posts/2025-12-26/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2025-12-26/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>One of the toughest engineering skills to develop is accepting a decision you disagree with. 😖</p>
<p>When you treat engineering as a craft, it’s easy to get passionate about solutions.
Strong opinions are a good thing — many great engineers have them.</p>
<p>But you also need to know when to challenge a decision and when to accept it.</p>
<h2>🎯 The Inflection Point</h2>
<p>Every architecture review eventually narrows down to a few viable options.
Maybe it’s captured in an ADR, maybe through discussion, maybe through a decision-maker.</p>
<p>If your preferred option isn’t chosen, you have two paths:</p>
<ol>
<li>Keep challenging the decision</li>
<li>Accept it and support it fully</li>
</ol>
<p>Knowing which path to take is a critical engineering skill.</p>
<h2>🔥 When to Keep Challenging</h2>
<p><em>My rule: Will this decision cause me to lose sleep — figuratively or literally?</em></p>
<p>If the decision risks:</p>
<ul>
<li>Breaking production</li>
<li>Waking you up at 2 a.m.</li>
<li>Introducing significant operational or security risks</li>
</ul>
<p>It’s worth continuing the conversation.</p>
<p>And the best way to challenge is respectfully — usually in a 1:1 with the decision-maker(s).
This gives space for deeper context, trade-offs, and clearer alignment.</p>
<h2>🤝 When to Support a Decision You Disagree With</h2>
<p>If the decision isn’t dangerous — just not your preferred option — it’s time to commit.
Many architectural choices have multiple valid options; one may be your preference.</p>
<p>In these cases, being a good engineer means supporting the direction chosen.</p>
<p>You can still improve the solution by suggesting micro-adjustments that reduce risk or enhance reliability without reopening the whole debate.</p>
<p>Sometimes, you will find that the chosen path was actually right.
Don’t worry, no one cares if you were right or wrong in the debate if you supported the implementation.</p>
<h2>🧠 Final Thoughts</h2>
<p>Sometimes decisions are mistakes.
That’s normal.</p>
<p>What matters is catching them early and being willing to revisit them once real-world data reveals new information.
Implementation often teaches us things the whiteboard never could.</p>
<p>Just be careful not to treat every minor issue as a fundamental flaw in the solution.</p>
<p>Good architecture isn’t about being right all the time.
It’s about making informed decisions, supporting the team, and knowing when to push and when to commit.</p>
]]></description>
        <pubDate>Fri, 26 Dec 2025 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>Canary deployments are an operational superpower, but the complexity they bring isn’t for everyone.</title>
        <link>https://bencane.com/posts/2025-12-19/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2025-12-19/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>Canary deployments are an operational superpower, but the complexity they bring isn’t for everyone.
So why not just use Blue/Green deployments instead? 🦸‍♂️</p>
<p><em>Let’s break it down.</em></p>
<h2>🎞️ A Quick Recap</h2>
<p>Both Blue/Green and Canary start the same way:</p>
<p>Take two instances (or clusters) of a service &amp; deploy the new code version to the idle one.</p>
<p>Where they differ is how traffic shifts.</p>
<h4>🐤 Canary</h4>
<p>Canary deployments gradually shift traffic from old to new.</p>
<p>Both versions serve live traffic during the transition.</p>
<h4>🔵|🟢 Blue/Green</h4>
<p>A Blue/Green traffic shift is all-or-nothing.</p>
<p>Only one instance is serving traffic; there is no gradual ramp-up.</p>
<h2>⚙️ Why Canary Is More Complex</h2>
<p>Running two versions at the same time (with both taking traffic) introduces challenges:</p>
<ul>
<li>Backward compatibility</li>
<li>Shared (or replicated) databases</li>
<li>Sticky sessions</li>
<li>Context-aware routing</li>
<li>Event ordering across versions</li>
<li>Consistency of state</li>
</ul>
<p>Blue/Green avoids most of this.
You still need a rollback plan, but you don’t have to worry about parallel operations.</p>
<p><em>So if Canary is so complicated… why use it?</em></p>
<h2>🏅 Why Canary Is Worth It (Sometimes)</h2>
<p>Canary shines when:</p>
<ul>
<li>The system is highly critical</li>
<li>It must run 24/7 with no interruption</li>
<li>You cannot accept even a brief outage</li>
<li>You want to reduce the blast radius of regressions</li>
<li>You release often and need tight control/quick fallback</li>
</ul>
<p>Canary lets you validate a new version with a small percentage of traffic before gradually increasing it further.
If something breaks, roll traffic back instantly.</p>
<p>More importantly, when it breaks, only a portion of traffic is impacted.</p>
<p>For high-risk and mission-critical systems, the complexity is worth it.</p>
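<p>The gradual shift can be sketched as a weight schedule plus a per-request routing decision (the step values and function names below are illustrative, not a real controller):</p>

```python
import random

# Hypothetical ramp: percentage of traffic sent to the new (canary) version.
RAMP_STEPS = [1, 5, 25, 50, 100]

def route_to_canary(canary_weight, rand=random.random):
    """Decide a single request: True -> canary, False -> stable."""
    return rand() * 100 < canary_weight

def next_weight(current_weight, healthy):
    """Advance the ramp while the canary looks healthy; on any sign of
    trouble, roll traffic back to the stable version instantly."""
    if not healthy:
        return 0
    for step in RAMP_STEPS:
        if step > current_weight:
            return step
    return current_weight  # already at 100%
```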
<h2>🧠 Final Thoughts</h2>
<p>Blue/Green is a great default deployment strategy, and in many cases, the optimal one.</p>
<p>A perfect example is a file-based batch workload.
Batch systems usually have flexibility in timing.
You can:</p>
<ul>
<li>Pause traffic</li>
<li>Cut over to the new version</li>
<li>Resume processing</li>
<li>And if it fails… reprocess the files</li>
</ul>
<p>Yes, easier said than done, but still far simpler than Canary.</p>
<p>Both approaches have their place.
The key is matching the deployment strategy to the system’s criticality and level of acceptable risk.</p>
]]></description>
        <pubDate>Fri, 19 Dec 2025 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>Everyone has bias, yes, even you. 🫵</title>
        <link>https://bencane.com/posts/2025-12-12/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2025-12-12/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>Everyone has bias, yes, even you. 🫵</p>
<p><em>Ever been in a technical debate where the other side seems way too attached to their solution?</em></p>
<p><em>Ever notice others feel the same way about you?</em></p>
<p>Sometimes your solution is the right solution.
But sometimes… It’s just bias.</p>
<h2>🧠 Understanding Bias</h2>
<p>Bias gets a bad rap.
But it doesn’t always come from a negative place.</p>
<p>In the context of technical solutions, bias usually forms from experience.</p>
<p>Throughout our careers, we see countless architectures, patterns, outages, and wins.</p>
<p>We remember what worked.</p>
<p>We remember what didn’t.</p>
<p>Over time, we build a gut sense of solutions that are safe based on our experiences.</p>
<p>That gut sense is bias, and it’s often well intentioned.</p>
<h2>💪 Working Through Bias (Without Ignoring It)</h2>
<h4>Accept that everyone has bias — including you.</h4>
<p>This is the hardest part.</p>
<p>You need to assume that other people's biases stem from good intentions and real experience, just like yours do.</p>
<p>With this assumption, you can have more objective conversations and begin hearing other perspectives.</p>
<h4>Ask yourself: Is this solution based on reality or comfort?</h4>
<p><em>Why do you have a preference for your solution?</em></p>
<p><em>Are you pushing a strategy?</em></p>
<p><em>Are you avoiding something unfamiliar?</em></p>
<p><em>Are you sticking to what has worked in the past?</em></p>
<p>Understanding why you hold bias is key to making the case for your solution.</p>
<h4>Use data to guide the decision, but make sure it’s objective.</h4>
<p>Data makes decisions easier, but be careful.
Bias can influence what data you choose to look at.</p>
<p>Sometimes we subconsciously cherry-pick data that supports our views.</p>
<p>It’s essential to take an objective look at the data, even if it challenges your case.</p>
<h4>Bring in a trusted third party — but present data carefully.</h4>
<p>An impartial opinion can help, but only if you give the whole picture.</p>
<p>When bringing in a third party, it’s crucial to present solutions and data objectively; that way, you get their honest opinion, not your opinion echoed back to you.</p>
<h2>🧩 Final Thoughts</h2>
<p>The most important part of technical decision-making is accepting the possibility that you might be wrong.</p>
<p>On many occasions, I've had to step back, evaluate my own bias, review the data objectively, and listen to opposing views.</p>
<p>Bias isn’t something you can eliminate; it’s something you recognize and manage.</p>
]]></description>
        <pubDate>Fri, 12 Dec 2025 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>Do you use Architecture Decision Records? I’m a big fan, and I think they’re a best practice every engineering org should adopt.</title>
        <link>https://bencane.com/posts/2025-12-05/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2025-12-05/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p><em>Do you use Architecture Decision Records?</em>
I’m a big fan, and I think they’re a best practice every engineering org should adopt. 📐</p>
<h2>🙋 What is an ADR?</h2>
<p>An Architecture Decision Record (ADR) is a lightweight document that captures architectural decisions.</p>
<p>A good ADR typically consists of:</p>
<ul>
<li>The context behind the problem</li>
<li>The options considered</li>
<li>The decision made, including the why</li>
</ul>
<p>Different companies/teams will add their own spin, but these are the core elements.</p>
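<p>As a sketch, here is what those core elements often look like on the page (the headings follow the widely used Nygard-style layout; the record number and content are hypothetical):</p>

```markdown
# ADR-0042: Use a message queue between ingest and processing

## Status
Accepted

## Context
Ingest traffic is bursty, and synchronous calls overload the processing tier
during peaks.

## Options Considered
1. Synchronous calls with retries
2. A message queue between the two services
3. Batch file hand-off

## Decision
Option 2. A queue absorbs bursts and decouples deployments. We accept the
operational cost of running the broker.

## Consequences
- Processing becomes eventually consistent
- New infrastructure (the broker) to monitor and patch
```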
<h2>🤔 Why ADRs Matter</h2>
<p>The ADR itself is helpful; it gives product, architecture, and engineering teams a shared reference point.
Clear documentation reduces ambiguity, enabling teams to align and build effectively.</p>
<p>But the real value is the process.</p>
<p>Writing an ADR forces you to explore alternatives, consider trade-offs, and debate options objectively.
If done well, ADRs capture everyone’s input and clearly document why a path was chosen.</p>
<p>This keeps architectural decisions grounded in logic rather than bias or preferences.</p>
<h2>🧠 Final Thoughts</h2>
<p>The documentation and process are valuable, but they only work with the right culture.</p>
<p>Teams need a culture where:</p>
<ul>
<li>Everyone is free to contribute to architectural decisions</li>
<li>Diverse options are encouraged</li>
<li>Decisions are made objectively</li>
<li>ADRs are accessible and visible to everyone</li>
</ul>
<p>Without the culture, the process becomes a formality and a burden of red tape.</p>
<p>With the right culture, ADRs become a powerful tool for making well-balanced &amp; transparent decisions.</p>
<p>Of course, that culture and the process need to be embraced by all levels of the team.
ADRs are only as useful as the effort you put into them.</p>
]]></description>
        <pubDate>Fri, 05 Dec 2025 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>Does resource usage within your application or database suddenly spike periodically? Does it cause system slowdown?</title>
        <link>https://bencane.com/posts/2025-11-28/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2025-11-28/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p><em>Does resource usage within your application or database suddenly spike periodically?</em>
Does it cause system slowdown? 🐢</p>
<p>A simple answer to your problem might be to add a bit of jitter (random timing delay).</p>
<p>When you schedule recurring tasks or loops, adding a bit of jitter can significantly improve how your application behaves.</p>
<h2>📖 What is Jitter?</h2>
<p>In simple terms, jitter is a small bit of randomness added to the time between two events.</p>
<p>While there are many types of jitter in computing, for this post we will keep the scope to adding a random delay between events.</p>
<h2>⚙️ Why it Matters:</h2>
<p>When tasks don’t implement jitter, they can accidentally synchronize, running at the same time.</p>
<p>Which leads to:</p>
<ul>
<li>CPU and memory spikes</li>
<li>Thread contention</li>
<li>Request storms to downstream systems</li>
</ul>
<h2>🧩 A Simple Example</h2>
<p>Imagine an API Gateway that caches responses.
You decide to invalidate responses every 30 minutes with a scheduled thread.</p>
<p>No problem for a handful of APIs, but scale this to 1,000 APIs.
Suddenly, every 30 minutes, 1,000 threads fire up at once.</p>
<p>The periodic spikes could cause performance issues or even crash the gateway.</p>
<p>Now add random jitter: instead of running exactly every 30 minutes, add or subtract a few random seconds from each task’s interval.</p>
<p>You’ve just spread out the load, making utilization smoother and more predictable.</p>
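<p>In code, the change is one line in the scheduler (a sketch; the interval and jitter values mirror the gateway example above and are otherwise arbitrary):</p>

```python
import random
import threading

def jittered_interval(base_seconds, max_jitter_seconds):
    """Base interval plus or minus a random jitter, so tasks sharing the
    same schedule drift apart instead of firing in lockstep."""
    return base_seconds + random.uniform(-max_jitter_seconds, max_jitter_seconds)

def schedule(task, base_seconds=30 * 60, max_jitter_seconds=30):
    """Run task now, then re-arm a timer with a freshly jittered delay
    (illustrative defaults: every ~30 minutes, +/- up to 30 seconds)."""
    def run():
        task()
        t = threading.Timer(
            jittered_interval(base_seconds, max_jitter_seconds), run)
        t.daemon = True  # don't keep the process alive for the timer
        t.start()
    run()
```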
<h2>⚠️ Caveats</h2>
<p>Jitter isn’t perfect; this approach spreads out the load, but with random jitter, small spikes could still occur.</p>
<p>Still, for many scenarios, it’s a simple approach.</p>
<h2>🧠 Final Thoughts</h2>
<p>If you find your application slowing down periodically with spikes in resource utilization, you might be dealing with synchronized tasks, and adding random jitter might be a good solution.</p>
<p>It’s simple, it’s easy, and it usually works well.</p>
]]></description>
        <pubDate>Fri, 28 Nov 2025 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>When you shut down an application instance, don&#39;t stop the listener immediately — that&#39;s how you end up with failed requests during every application rollout. 😢</title>
        <link>https://bencane.com/posts/2025-11-21/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2025-11-21/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>When you shut down an application instance, don't stop the listener immediately — that's how you end up with failed requests during every application rollout. 😢</p>
<h2>🛑 The Common Mistake:</h2>
<p>I've seen many shutdown implementations that stop the listener as soon as the shutdown signal is received.</p>
<p>The assumption is usually:</p>
<blockquote>
<p>“Stopping the listener will fail readiness probes, and traffic will be redirected.”</p>
</blockquote>
<p>That's half right…</p>
<p>It will trigger traffic redirection, but not immediately.</p>
<h2>⏱️ Probe Intervals Matter:</h2>
<p>Readiness probes (Kubernetes), load-balancer health checks, &amp; service-mesh probes all run at fixed intervals.</p>
<p>In Kubernetes, the default is 10 seconds.</p>
<p>That means it can take up to 10 seconds for the platform to detect an unhealthy status and adjust traffic.</p>
<p>Longer if the failure threshold is greater than 1.</p>
<h2>💥 What Happens During Those 10 Seconds?</h2>
<p>New traffic still goes to the unhealthy instance.</p>
<p>And because you stopped the listener, every request sent to that instance during that window fails.</p>
<p>Some clients retry and land on another instance.</p>
<p>Some will not.</p>
<p>Either way, every rollout will result in failed requests that could have been avoided.</p>
<h2>✅ What You Should Do Instead</h2>
<p>When shutting down an instance:</p>
<p>1️⃣ Keep the listener running: don’t slam the door shut.</p>
<p>2️⃣ Fail readiness probes: report failures from the readiness endpoint, but keep serving requests on other endpoints.</p>
<p>3️⃣ Wait for traffic to drain: let in-flight requests finish, and let the platform stop routing new requests.</p>
<p>4️⃣ Then stop the listener: only once it’s safe.</p>
<p>This is a graceful shutdown.</p>
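<p>Here is the sequence as a sketch (the class and method names are illustrative; in a real service the probe and handler would be HTTP endpoints):</p>

```python
import threading
import time

class GracefulServer:
    """Sketch of the four-step graceful shutdown."""

    def __init__(self, drain_seconds=15.0):
        self.ready = True              # what the readiness endpoint reports
        self.in_flight = 0             # requests currently being processed
        self.drain_seconds = drain_seconds
        self._lock = threading.Lock()

    def readiness_probe(self):
        # Step 2: fail readiness while every other endpoint keeps working.
        return 200 if self.ready else 503

    def handle_request(self, work):
        # Step 1: the listener stays up, so requests are still accepted.
        with self._lock:
            self.in_flight += 1
        try:
            return work()
        finally:
            with self._lock:
                self.in_flight -= 1

    def shutdown(self, stop_listener):
        self.ready = False             # probes begin failing; traffic shifts
        deadline = time.monotonic() + self.drain_seconds
        # Step 3: let in-flight requests finish and give the platform time
        # to notice the failing probe and stop routing new traffic.
        while time.monotonic() < deadline and self.in_flight > 0:
            time.sleep(0.05)
        stop_listener()                # step 4: only now close the door
```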
<h2>🧠 Final Thoughts</h2>
<p>Resiliency isn't only about surviving failures, it's also about preventing them.</p>
<p>Handle shutdown properly, and you can roll out new code without ever failing a request.</p>
]]></description>
        <pubDate>Fri, 21 Nov 2025 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>A common issue I see when teams first adopt `gRPC` is managing persistent connections, especially during failovers.</title>
        <link>https://bencane.com/posts/2025-11-14/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2025-11-14/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>A common issue I see when teams first adopt <code>gRPC</code> is managing persistent connections, especially during failovers.</p>
<h2>🤔 The Problem:</h2>
<p><code>gRPC</code> is fast thanks to protobuf and how it handles connections, mainly:</p>
<ul>
<li>Persistent connections that avoid repeated TCP handshakes</li>
<li>Sending multiple requests over a single <code>HTTP/2</code> connection.</li>
</ul>
<p>However, these performance optimizations are also a source of failover challenges.</p>
<h2>😫 Challenges with Failover:</h2>
<p><em>Let’s say you’ve just implemented <code>gRPC</code> and want to trigger a manual failover for your service.</em></p>
<p>For many, failover typically happens at the load-balancer level, which works fine for <code>HTTP/1</code>.</p>
<p>When you take an instance down, new requests go to another instance.</p>
<p>However, with <code>gRPC</code> over <code>HTTP/2</code>, connections stay open and are reused, which means existing connections continue to send requests to the old instances even during failover.</p>
<p>Unless your load-balancer understands <code>HTTP/2</code> and <code>gRPC</code>, failover will not work as it used to.</p>
<h2>🛠️ Failover with gRPC</h2>
<p>For proper failover, you’ve got two main options:</p>
<ol>
<li>Use a load balancer that understands <code>HTTP/2</code> and <code>gRPC</code>, such as an AWS Application Load Balancer rather than a Network Load Balancer, or Envoy rather than HAProxy.</li>
<li>Cycle connections periodically—force clients to reconnect and redistribute the load.</li>
</ol>
<p>Both options get the job done, but the first is cleaner overall.</p>
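<p>Option 2 can be sketched client-side as a channel wrapper that recreates the connection after a maximum age (<code>make_channel</code> is a stand-in for your real connection factory, e.g. a function that calls <code>grpc.insecure_channel</code>; the age threshold is illustrative):</p>

```python
import time

class CyclingChannel:
    """Recreate the client connection after max_age_seconds so that new
    connections get re-balanced across healthy instances."""

    def __init__(self, make_channel, max_age_seconds=300, clock=time.monotonic):
        self._make = make_channel      # factory for a fresh connection
        self._max_age = max_age_seconds
        self._clock = clock
        self._channel = None
        self._born = 0.0

    def get(self):
        now = self._clock()
        if self._channel is None or now - self._born >= self._max_age:
            # Dropping the old channel forces a reconnect, which the
            # load balancer can route to any healthy instance.
            self._channel = self._make()
            self._born = now
        return self._channel
```

<p>Servers can push the same behavior from their side; gRPC exposes max-connection-age settings for this, though the exact knob varies by implementation.</p>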
<h2>💡 Final Thoughts:</h2>
<p>There is a lot to love about <code>gRPC</code>: strong contracts, outstanding performance, and client-server simplicity.</p>
<p>But it takes work to operationalize it.
Nobody tells you that upfront, though.</p>
]]></description>
        <pubDate>Fri, 14 Nov 2025 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>A dangerous mindset I’ve seen—and been guilty of—is assuming code doesn&#39;t change.</title>
        <link>https://bencane.com/posts/2025-11-7/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2025-11-7/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>A dangerous mindset I’ve seen—and been guilty of—is assuming code doesn't change.</p>
<p>Or that, when it changes, the next person will understand the original context.</p>
<p>Reality check: they won't.</p>
<p>The next person (future you) has no idea what you were thinking.</p>
<h2>🔎 A Simple Example:</h2>
<p>You are building a service that receives JSON requests.</p>
<p>You write a method that takes in an array from the request and accesses index 2.</p>
<p>The request handler has already validated the array length and content.</p>
<p><em>So you don't need to recheck it before accessing index 2, right?</em></p>
<p><em>It's just less efficient to check it twice, right?</em></p>
<p>Wrong.</p>
<p>Your original implementation might work fine, but fast forward to years later, when someone else (perhaps yourself) uses that method.</p>
<p>Will they always ensure the array has the right length? 🤷‍♂️</p>
<p>If they don't, your method is a ticking time bomb.</p>
<h2>🧠 Fix the Mindset:</h2>
<p>Embrace defensive programming, where you expect that your methods will be misused.</p>
<p>Recheck the array's length before you use it, even if something else has previously checked.</p>
<p>Expect bad inputs, expect errors to occur, and have a path to do something about it.</p>
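<p>Applied to the example above, the defensive version re-validates at the point of use (function and message are illustrative):</p>

```python
def third_element(items):
    """Defensively access index 2, even though the request handler
    'already validated' the array upstream."""
    if not isinstance(items, list) or len(items) < 3:
        # A clear error beats an IndexError from deep inside the call stack.
        raise ValueError(f"expected a list with at least 3 items, got {items!r}")
    return items[2]
```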
<h2>💡 Expect the Unexpected</h2>
<p>If you assume the next person:</p>
<ul>
<li>Won't read your docs or code comments</li>
<li>Will reuse your code in a different context</li>
<li>Will misuse your code for things you've never designed it for</li>
</ul>
<p>You will write safer, more resilient code.</p>
]]></description>
        <pubDate>Fri, 07 Nov 2025 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>⚡️Does saving 1 millisecond really matter? Answer: more than you’d think.</title>
        <link>https://bencane.com/posts/2025-10-31/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2025-10-31/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p><em>⚡️Does saving 1 millisecond really matter?</em>
Answer: more than you’d think.</p>
<h2>🧩 Context:</h2>
<p>I recently shared performance tuning results where we reduced Microservice-to-Microservice latency from 1.3 ms to 0.3 ms in a new platform.</p>
<p>That’s a huge performance win, but it doesn’t sound like much.</p>
<p>In card payments, where every millisecond counts, it’s easy to see the value.
<em>But for an average backend system, does 1 ms matter?</em></p>
<p><em>A honeybee can flap its wings in 5 ms, so who is going to notice 1 ms?</em></p>
<h2>🧘‍♂️ Perspective:</h2>
<p>It’s not just 1 ms.</p>
<p>Modern distributed systems are built from many microservices and layers.
A single customer journey typically touches dozens of components.</p>
<p>If you shave off 1 ms from every call, the gains compound quickly.</p>
<p>End-to-end, that can add up to tens or even hundreds of milliseconds for every incoming request.</p>
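<p>A quick back-of-the-envelope makes the compounding concrete (the hop counts are made up for illustration):</p>

```python
def end_to_end_savings_ms(calls_per_journey, saving_per_call_ms=1.0):
    """Savings compound linearly with the number of internal calls
    a single customer journey makes."""
    return calls_per_journey * saving_per_call_ms

# An illustrative journey touching 30 services, each making ~3 internal calls:
print(end_to_end_savings_ms(30 * 3))  # prints 90.0
```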
<h2>💡Final Thoughts</h2>
<p><em>Does saving 100 ms even matter?</em></p>
<p>Kind of.</p>
<p>Even if your platform isn’t latency-sensitive, throughput and latency are closely related.</p>
<p>Faster requests mean more available capacity.</p>
<p>That 100 ms may allow you to scale better or reduce infrastructure costs.</p>
<p>A 1 ms improvement doesn’t sound like much on the surface, but the compounding effect is massive, even for systems that “don’t care” about latency.</p>
]]></description>
        <pubDate>Fri, 31 Oct 2025 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>Have you heard of Store and Forward? It’s a resiliency design prevalent in card &amp; bank payments, telecommunications, and other industries.</title>
        <link>https://bencane.com/posts/2025-10-27/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2025-10-27/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p><em>Have you heard of Store and Forward?</em>
It’s a resiliency design prevalent in card &amp; bank payments, telecommunications, and other industries.</p>
<p>The concept is that rather than failing a request when a dependency is down, store it, and send the request when it is back up.</p>
<h2>🤔 How it works:</h2>
<p>We have two services: Service A and Service B, where Service A depends heavily on Service B to process requests.</p>
<p>Traditionally, when Service B is down, Service A would have no choice but to reject requests with a failure.</p>
<p>With the Store and Forward design, when Service B is unavailable, Service A replies to the request with a “degraded processing” response (rejecting it, accepting it, or saying, “We’ll let you know later”).</p>
<p>But before replying to the request, it is “stored” somewhere that can be accessed quickly, such as a cache, a queue, a database, etc.</p>
<p>When Service B is back up, Service A will “forward” the stored requests to Service B.</p>
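<p>The flow above can be sketched as follows (<code>send</code> and <code>is_up</code> are stand-ins for your real transport and health check, and the in-memory deque stands in for a durable store):</p>

```python
from collections import deque

class StoreAndForward:
    """Service A stores requests while Service B is down and forwards
    them once Service B recovers."""

    def __init__(self, send, is_up):
        self._send = send        # delivers a request to Service B
        self._is_up = is_up      # health check for Service B
        self._stored = deque()   # the "store": cache, queue, or database

    def submit(self, request):
        if self._is_up():
            self._send(request)
            return "processed"
        self._stored.append(request)    # store instead of failing
        return "accepted-degraded"      # degraded-processing reply

    def forward_stored(self):
        """Call when Service B is detected healthy again."""
        while self._stored and self._is_up():
            self._send(self._stored.popleft())
```

<p>A real implementation would use durable storage and decide whether stored requests are forwarded before, or interleaved with, new traffic.</p>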
<h2>🥹 What I like about this design pattern:</h2>
<p>It accepts that failures are going to occur because everything fails.</p>
<p>Rather than creating “retry storms,” it adds more intelligence to the process, only sending requests to Service B when it’s back online.</p>
<p>It ensures that no request is lost, even in significant outages.</p>
<p>But this design pattern isn’t without complexity.</p>
<p>Blind retries are easy; you keep retrying.</p>
<p>But with Store and Forward, you need to:</p>
<p>🛑 Know when Service B is unavailable</p>
<p>🧠 Add logic around degraded processing</p>
<p>✅ Detect when Service B has recovered</p>
<p>🤹 Figure out the best way to dequeue the stored requests</p>
<p>While more complex than blind retries, Store and Forward is a great resiliency design for when every request matters.</p>
<p>In payments, where reliability, fast responses, and accuracy are all critical, the complexity of store-and-forward designs is a worthwhile trade-off.</p>
]]></description>
        <pubDate>Mon, 27 Oct 2025 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>When Building Low-Latency, High-Scale Systems, Push as Much Processing as Possible to Later</title>
        <link>https://bencane.com/posts/2025-10-24/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2025-10-24/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>When building low-latency, high-scale systems, a key strategy of mine is simple:</p>
<blockquote>
<p>“Push as much processing as possible to later.”</p>
</blockquote>
<h2>Why It Matters 🤔</h2>
<p>In many systems—checkout, login, trade execution—latency matters because someone (or something) is waiting:</p>
<ul>
<li>
<p>A customer at a point of sale</p>
</li>
<li>
<p>A user at a login screen</p>
</li>
<li>
<p>A system waiting on a transaction confirmation</p>
</li>
</ul>
<p>Platforms that support these scenarios must respond in milliseconds.
If not, requests will fail, and user experiences will suffer.</p>
<h2>My Approach 🧠</h2>
<p>I typically divide these platforms into two sub-platforms to optimize for speed and scale.</p>
<p>🏎️ Real-Time Platform: Optimized for scale and speed, only performing what is essential before responding to the request.</p>
<p>📥 Event-Driven Platform (sometimes Batch): Handles processing deferred from the real-time platform.
It is still built for scale, but operates in seconds, not milliseconds.</p>
<h2>Deciding What Belongs Where 🗃</h2>
<p>I try to break down processing into steps, and for each step I ask:</p>
<blockquote>
<p>“Does this step need to happen before we respond or after?”</p>
</blockquote>
<p>✅ If it MUST be performed before the response, use a real-time path.</p>
<p>⏭ If it can wait until after, use the event-driven path.</p>
<p>Things that tend to follow the event-driven path are:</p>
<ul>
<li>
<p>Audit logging</p>
</li>
<li>
<p>Downstream asynchronous notifications</p>
</li>
<li>
<p>Enrichment and Transformations</p>
</li>
<li>
<p>Checks that trigger out-of-band tasks</p>
</li>
</ul>
<p>These are not slow but don’t need to be “blocking.”</p>
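<p>The split above can be sketched in a few lines of Python (a <code>queue.Queue</code> stands in for the Pub/Sub topic or <code>gRPC</code> stream; the checkout handler and work items are illustrative):</p>

```python
import queue

deferred = queue.Queue()   # stand-in for a Pub/Sub topic or gRPC stream

def handle_checkout(request):
    """Real-time path: do only what must happen before we respond."""
    result = {"status": "approved", "id": request["id"]}   # essential work only
    # Defer the non-blocking steps: audit logging, notifications, enrichment.
    deferred.put(("audit", request))
    deferred.put(("notify", request))
    return result   # respond in milliseconds

def drain_deferred():
    """Event-driven path: process deferred work at its own pace."""
    done = []
    while not deferred.empty():
        done.append(deferred.get())
    return done
```

<p>The response goes out as soon as the essential work is complete; everything else rides the queue and gets processed seconds later without the caller waiting.</p>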
<h2>Final Thoughts ✍️</h2>
<p>The key message is that the more you do on the real-time path, the slower your responses become.</p>
<p>This pattern is a good way to reduce the real-time workload.</p>
<p>But the trick is to find a reliable and fast way to move work from a real-time to an event-driven system.</p>
<p>Pub/Sub and <code>gRPC</code> streams are two of my go-to options.</p>
]]></description>
        <pubDate>Fri, 24 Oct 2025 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>Coding is a small part of software engineering.</title>
        <link>https://bencane.com/posts/2025-10-10/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2025-10-10/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>Coding is a small part of software engineering. 🤯</p>
<p>With AI Coding Assistants and Autonomous Agents being all the rage lately, I feel like this is something folks—especially those early in their careers—need to hear.</p>
<p>Coding is essential, but only a portion of what is required to build production systems.</p>
<p>Writing software can take a lot of effort, but just as much is spent before anyone starts to code.</p>
<p>Let's think about some of those tasks:</p>
<ul>
<li>Defining API contracts</li>
<li>Designing a database schema</li>
<li>Choosing the right database</li>
<li>Configuring build &amp; deployment pipelines</li>
<li>Selecting a runtime environment</li>
<li>Packaging software (Dockerfiles?)</li>
<li>Integrating observability tools</li>
<li>Writing runbooks and service manuals</li>
</ul>
<p>And that's all before you even consider architectural trade-offs, system design, or cross-team alignment.</p>
<p>A lot of software engineering involves understanding what needs to be done and why—and then figuring out how to do it well.</p>
<p>Even with how fast AI is evolving, we are still a long way from AI completely taking over software engineering, but it's revolutionizing how we attack the process.</p>
<p>Are you embracing our robot overlords or resisting change? 🤖</p>
]]></description>
        <pubDate>Fri, 10 Oct 2025 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>Should I be an individual contributor or a people leader?</title>
        <link>https://bencane.com/posts/2025-10-3/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2025-10-3/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>Should I be an individual contributor or a people leader? 🤔 It's a question I get often.</p>
<h2>My honest answer:</h2>
<blockquote>
<p><em>Which motivates you more?</em></p>
</blockquote>
<p>It sounds simple, but it's an essential question for many.</p>
<p>🧰 If you enjoy building systems more than building people, then the IC track is probably right for you.</p>
<p>👔 If you enjoy growing and leading others, then people leadership might be the right path for you.</p>
<h2>The good news is that you can switch later.</h2>
<p>A lot of people change between IC and people leadership roles.
Some love both equally and switch multiple times throughout their career.</p>
<p>Just keep those leadership and technical skills sharp.</p>
<h2>A Word of Warning ⚠️</h2>
<p>The higher you go, the fewer IC roles you’ll find—especially at companies without a strong IC path.</p>
<p>Some companies top out early, and some treat the IC track as a first-class path.</p>
<p>So, choosing the IC track might limit your options, but it's worth it if that's what you genuinely want to do.</p>
<h2>Have you chosen your path?</h2>
<p>For me, the IC path was the clear winner, but there were several times I considered people leadership opportunities.</p>
<p>Before I became a Staff Engineer, I applied for a Director of Engineering role.
I didn't get it, but the process helped me figure out what I wanted.</p>
]]></description>
        <pubDate>Fri, 03 Oct 2025 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>Improve performance and reduce chances of request failures with this one simple trick! Avoid cross-region calls.</title>
        <link>https://bencane.com/posts/2025-9-26/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2025-9-26/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p>Improve performance and reduce chances of request failures with this one simple trick!
Avoid cross-region calls. 🫠</p>
<p>While the idea is simple, designing a system around this concept is anything but.</p>
<h2>🤔 Why it’s Effective:</h2>
<p>The core idea is straightforward:</p>
<p>Keeping traffic local is better for performance and resilience.</p>
<h4>🚄 Performance:</h4>
<p>Performance is easy to understand:</p>
<ul>
<li>
<p>In-region traffic (including cross availability zone) usually sees single-digit millisecond latency (or less)</p>
</li>
<li>
<p>Cross-region traffic introduces latency, double-digit milliseconds or more, depending on the region</p>
</li>
</ul>
<p>Latency adds up fast when you cross regions multiple times in a microservices architecture.</p>
<h4>🚀 Resilience:</h4>
<p>Resilience is a bit more nuanced.</p>
<p>Every cross-region call passes through more network hops, such as firewalls, routers, switches, load balancers, etc.</p>
<p>More hops == More failure points</p>
<p>Keeping traffic local means fewer chances of packet loss and less impact when things break.</p>
<h4>🧙‍♂️ Complexities:</h4>
<p>Designing for regional isolation (a core concept of cell-based architecture) means:</p>
<p>1️⃣ Having active instances of critical services in each region (active-passive doesn't work with this approach)</p>
<p>2️⃣ Figuring out data replication and consistency across regions</p>
<p>3️⃣ Building robust routing and failover capabilities</p>
<p>4️⃣ Establishing management processes and capabilities that let you manage each region independently</p>
<p>Yes, the design is much more complex, and the operational overhead is much higher, but the blast radius of failure is smaller.</p>
<p>A failure with a critical service in one region only impacts that region.</p>
<h2>🧠 Final Thoughts:</h2>
<p>Perfect isolation isn't always possible; you might need to cross regions for data consistency or as a fallback.</p>
<p>When you must cross regions:</p>
<p>✅ Reduce the number of cross-region hops as much as possible</p>
<p>✅ Do it up front, ideally before the request lands in your system.</p>
<p>The more cross-region routing you perform at the edge, the more you can avoid regional isolation complexities in the underlying systems.</p>
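<p>A toy sketch of routing at the edge (the account-to-region map and names are hypothetical; real edges do this with DNS, anycast, or layer-7 load balancers):</p>

```python
# Hypothetical mapping of accounts to their home region.
HOME_REGIONS = {"acct-1001": "us-east", "acct-2002": "eu-west"}

def route_at_edge(account_id, default="us-east"):
    """Pin the request to its home region before it enters the system,
    so every hop after this stays in-region and the underlying services
    never need cross-region fallbacks of their own."""
    return HOME_REGIONS.get(account_id, default)
```

<p>The one cross-region decision happens up front; everything downstream can assume local traffic.</p>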
]]></description>
        <pubDate>Fri, 26 Sep 2025 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
      
      
      
      
        
      
      <item>
        <title>Did you know Kube-proxy doesn’t perform load-balancing itself? It’s iptables (by default).</title>
        <link>https://bencane.com/posts/2025-9-19/</link>
        <guid isPermaLink="true">https://bencane.com/posts/2025-9-19/</guid>
        <dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Benjamin Cane</dc:creator>
        <description><![CDATA[<p><em>Did you know Kube-proxy doesn’t perform load-balancing itself?</em>
It’s iptables (by default).</p>
<p>If you’ve run applications in Kubernetes, you’ve probably heard of Kube-proxy, the service responsible for routing traffic to Services.</p>
<p>But the interesting twist is that Kube-proxy doesn’t perform the routing itself; iptables does (or IPVS, or nftables).</p>
<h2>⚙️ How it works:</h2>
<p>When you define a Service, Kubernetes will assign it an IP address.</p>
<p>Kube-proxy watches for these events and creates iptables rules that handle routing.</p>
<p>The iptables rules will:</p>
<ul>
<li>Forward new connections with a destination of the Service IP to a Pod IP</li>
<li>Use the statistics module to select which Pod IP to forward the connection to</li>
</ul>
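<p>For illustration, the rules kube-proxy programs look roughly like this (chain names, IPs, and ports are simplified; real chains carry hashed suffixes like <code>KUBE-SVC-XGLOHA7QRQ3V22RZ</code>):</p>

```shell
# A Service chain picks one of three Pod endpoints with the statistic module:
iptables -t nat -A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.3333 -j KUBE-SEP-POD1
iptables -t nat -A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.5000 -j KUBE-SEP-POD2
iptables -t nat -A KUBE-SVC-EXAMPLE -j KUBE-SEP-POD3
# Each endpoint chain then DNATs the connection to that Pod's IP:
iptables -t nat -A KUBE-SEP-POD1 -p tcp -j DNAT --to-destination 10.0.1.5:8080
```

<p>Note the probabilities: 1/3 for the first rule, 1/2 of the remaining traffic for the second, and the rest falls through to the third, giving each Pod an equal share of new connections.</p>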
<p>I like to think of it as follows: Kube-proxy identifies the need for routing, and iptables does the work.</p>
<h2>🤔 Why it’s important:</h2>
<p>If you plan to use <code>gRPC</code>, this is critical to understand.</p>
<p><code>gRPC</code> uses <code>HTTP/2</code> as its underlying protocol, which sends multiple requests down a single connection.</p>
<p>Since iptables forwards traffic at a connection level (layer 4), multiple requests down a single connection will all land on the same pod, even if more are available.</p>
<p>You might assume traffic will be balanced across pods, and be surprised to find it is not.</p>
<p>You're fine if you use ``HTTP/1<code>.1</code> (without connection reuse).
But anything that keeps long-lived connections open or sends multiple requests down a single connection, Kube-proxy won’t cut it.</p>
<h2>🔭 What’s Next:</h2>
<p>Scaling has been a challenge for iptables; large rule sets and connection tracking are known bottlenecks.</p>
<p>IPVS and nftables (the successor to iptables) have been introduced as alternative options for routing and load-balancing.</p>
<p>Both are still layer 4.</p>
<p>If you need layer 7 (request level routing), that’s where Istio comes in.</p>
<h2>🧠 Final Thoughts:</h2>
<p>Understanding how Kube-proxy, iptables, <code>gRPC</code>, and <code>HTTP/2</code> work is essential for anyone building fast, scalable backend systems on Kubernetes.</p>
<p>You can’t optimize what you don’t understand.</p>
<h2>🔗 References:</h2>
<p>Here are some reference links for those looking for a deeper dive.</p>
<ul>
<li>
<p><a href="https://kubernetes.io/docs/reference/networking/virtual-ips/">https://kubernetes.io/docs/reference/networking/virtual-ips/</a></p>
</li>
<li>
<p><a href="https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/">https://kubernetes.io/docs/reference/command-line-tools-reference/kube-proxy/</a></p>
</li>
<li>
<p><a href="https://en.m.wikipedia.org/wiki/IP_Virtual_Server">https://en.m.wikipedia.org/wiki/IP_Virtual_Server</a></p>
</li>
</ul>
]]></description>
        <pubDate>Fri, 19 Sep 2025 00:00:00 GMT</pubDate>
        
          <media:content url="https://bencane.com/assets/images/bengineering-hero.png" medium="image" />
        
      </item>
    
  </channel>
</rss>
