YOLO is a terrible strategy for validating production changes.
How many times have you seen it?
Your platform is running smoothly. No alerts, no issues. Then suddenly, something breaks.
After digging in, you discover the cause: another system you depend on made a change, and that change broke your platform.
They didn’t notice it broke. You did, much too late…
How many times have you been the cause of another platform breaking?
🥶 Cold Reality
I wish the above scenario were rare, but it happens constantly across the technology industry.
It happens between internal teams, across third-party integrations, and on shared infrastructure.
These scenarios make you wonder, “How was that change validated?”
Maybe they tested it and their validation had gaps. Maybe they barely validated it at all.
Either way, the result is the same: they validated their change with 100% of production traffic. Bad plan.
💡 Better Ways to Validate Changes
There are many ways teams can reduce production risk when rolling out changes, and the best teams combine the following approaches.
Canary Releases 🐤
I talk about canary deployments often.
Instead of moving 100% of traffic at once, move small percentages gradually and observe behavior closely.
That “observe closely” part matters. Look at error rates, latency changes (beyond normal platform warmup), resource spikes, and unexpected retries. All of these can signal customer impact.
Canary deployments are one of the best ways to reduce the blast radius of changes, identify problems quickly, and self-correct.
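Here’s a minimal sketch of that observe-and-promote loop in Python. The three hooks are hypothetical stand-ins for whatever your load balancer and metrics backend actually expose:

```python
import time
from typing import Callable

def run_canary(
    set_canary_weight: Callable[[int], None],  # hypothetical: route N% of traffic to the canary
    error_rate: Callable[[], float],           # hypothetical: canary error rate over the last window
    p99_latency_ms: Callable[[], float],       # hypothetical: canary p99 latency over the last window
    steps=(1, 5, 25, 50, 100),
    soak_seconds=300,
    max_error_rate=0.01,
    max_p99_ms=500.0,
) -> bool:
    """Shift traffic to the canary in small steps, observing at each one."""
    for weight in steps:
        set_canary_weight(weight)
        time.sleep(soak_seconds)  # let metrics accumulate at this step
        if error_rate() > max_error_rate or p99_latency_ms() > max_p99_ms:
            set_canary_weight(0)  # self-correct: send all traffic back to the stable version
            return False
    return True  # the canary took 100% of traffic without a regression
```

The exact steps and thresholds are yours to tune; the point is that promotion is conditional on observed behavior, not on a timer.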
Shadow Traffic 🪞
Traffic mirroring sends a copy of production traffic to the new version before routing any live traffic there.
The mirrored responses are discarded, but you observe the new version’s behavior and monitor the same signals you would with a canary release, without sacrificing a single customer request.
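A toy version of the idea in Python, where `PRIMARY_URL` and `SHADOW_URL` are placeholders for your current and candidate deployments (in practice a proxy or service mesh usually does the mirroring for you):

```python
import concurrent.futures
import time
import urllib.request

PRIMARY_URL = "http://primary.internal"  # placeholder: current deployment
SHADOW_URL = "http://shadow.internal"    # placeholder: candidate deployment
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)

def _mirror(path: str) -> None:
    """Send the copied request to the shadow; discard the body,
    keep only the signals (status, latency) for comparison."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(SHADOW_URL + path, timeout=5) as resp:
            status = resp.status
    except Exception:
        status = -1  # record the failure; never let the shadow hurt the caller
    print(f"shadow {path}: status={status} latency={time.monotonic() - start:.3f}s")

def handle(path: str) -> bytes:
    """Serve from the primary as usual, mirroring the request in the background."""
    _pool.submit(_mirror, path)  # fire-and-forget copy to the shadow
    with urllib.request.urlopen(PRIMARY_URL + path, timeout=5) as resp:
        return resp.read()  # only the primary's response reaches the customer
```

Note that the shadow call is fire-and-forget and swallows every failure: the candidate must never be able to affect a real customer request.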
Synthetic Traffic 🤖
Synthetic traffic simulates user behavior continuously. It’s great for monitoring customer experience, and it’s also an effective way to validate new deployments.
Route synthetic traffic to upgraded instances first and verify behavior before moving real traffic. If it fails with synthetic traffic, it likely won’t survive real traffic.
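A bare-bones probe might look like this. The target URL and the “user journey” paths are invented for illustration; substitute the flows your real customers actually exercise:

```python
import time
import urllib.request

TARGET = "http://upgraded-instance.internal"  # placeholder: route probes here first
JOURNEY = ["/health", "/login", "/search?q=test", "/checkout/preview"]  # illustrative paths

def probe_once() -> bool:
    """Walk a representative user journey; fail fast on any bad response."""
    for path in JOURNEY:
        try:
            with urllib.request.urlopen(TARGET + path, timeout=5) as resp:
                if resp.status >= 400:
                    return False
        except Exception:
            return False
    return True

if __name__ == "__main__":
    while True:
        ok = probe_once()
        print("synthetic probe:", "PASS" if ok else "FAIL")
        time.sleep(30)  # run continuously, the way real users would arrive
```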
Smoke Tests 😶‍🌫️
The classic approach. After deployment, run a small set of fast tests to confirm the platform is fundamentally working.
Smoke tests don’t need to be fancy; they can be shell scripts, API calls, read-only requests, a test file, or full end-to-end validation.
Their purpose is simple: to quickly catch obvious breakage.
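Here’s a Python take on the same idea, with placeholder `BASE_URL` and endpoints; the only contract is a non-zero exit code on failure so the pipeline can halt the rollout:

```python
import sys
import urllib.request

BASE_URL = "http://localhost:8080"  # placeholder: the freshly deployed instance
CHECKS = [
    ("/healthz", 200),            # the process is up and answering
    ("/api/items?limit=1", 200),  # a read-only request exercises the datastore path
]

def main() -> int:
    for path, expected in CHECKS:
        try:
            with urllib.request.urlopen(BASE_URL + path, timeout=5) as resp:
                status = resp.status
        except Exception as exc:
            print(f"FAIL {path}: {exc}")
            return 1
        if status != expected:
            print(f"FAIL {path}: got {status}, expected {expected}")
            return 1
        print(f"OK   {path}")
    return 0

if __name__ == "__main__":
    sys.exit(main())  # non-zero exit fails the deploy pipeline
```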
🧠 Final Thoughts
Don’t think of the above methods as mutually exclusive choices. Combine them.
Some platforms I work on combine canary releases, shadow traffic, and synthetic traffic. Others use smoke tests plus canary releases.
The more layers of validation you have, the more likely you are to catch issues before your customers do. Because having your customers validate changes for you is a poor strategy.