When building low-latency, high-scale systems, a key strategy of mine is simple:
“Push as much processing as possible to later.”
Why It Matters 🤔
In many systems—checkout, login, trade execution—latency matters because someone (or something) is waiting:
- A customer at a point of sale
- A user at a login screen
- A system waiting on a transaction confirmation
Platforms that support these scenarios must respond in milliseconds. If they don't, requests time out and the user experience suffers.
My Approach 🧠
I typically divide these platforms into two sub-platforms to optimize for speed and scale.
🏎️ Real-Time Platform: Optimized for scale and speed, only performing what is essential before responding to the request.
📥 Event-Driven Platform (sometimes Batch): Handles the processing deferred from the real-time platform. It is still built for scale, but operates on the order of seconds rather than milliseconds.
Deciding What Belongs Where 🗃
I try to break down processing into steps, and for each step I ask:
“Does this step need to happen before we respond or after?”
✅ If it MUST happen before the response, it goes on the real-time path.
⏭ If it can wait until after, it goes on the event-driven path.
Things that tend to follow the event-driven path are:
- Audit logging
- Downstream asynchronous notifications
- Enrichment and transformations
- Checks that trigger out-of-band tasks
These steps are not necessarily slow, but they don't need to block the response.
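To make the split concrete, here is a minimal sketch of a request handler that keeps only essential work on the real-time path and hands the rest to a background consumer. All names (`handle_checkout`, the event shapes) are illustrative, and an in-process `queue.Queue` stands in for whatever event-driven platform you actually use:

```python
import queue
import threading

# Stand-in for the event-driven platform (e.g. a Pub/Sub topic).
events: "queue.Queue[dict]" = queue.Queue()

def handle_checkout(order: dict) -> dict:
    """Real-time path: only what MUST happen before the response."""
    if order["amount"] <= 0:                  # essential validation
        return {"status": "rejected"}
    payment_id = f"pay-{order['id']}"         # essential step (stubbed here)

    # Deferred work: hand off to the event-driven path and return.
    events.put({"type": "audit", "order_id": order["id"]})
    events.put({"type": "notify", "order_id": order["id"]})
    return {"status": "accepted", "payment_id": payment_id}

def worker() -> None:
    """Event-driven path: drains deferred work off the hot path."""
    while True:
        event = events.get()
        if event is None:                     # shutdown sentinel
            break
        # ...write audit record, send notification, enrich data...
        events.task_done()

threading.Thread(target=worker, daemon=True).start()
```

The response now depends only on the essential steps; audit logging and notifications happen seconds later without the caller waiting on them.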
Final Thoughts ✍️
The key message is simple: the more work you do on the real-time path, the slower your response.
This pattern is a good way to reduce the real-time workload.
But the trick is to find a reliable and fast way to move work from a real-time to an event-driven system.
Pub/Sub and gRPC streams are two of my go-to options.
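One common way to make that handoff both fast and reliable is an outbox: the real-time path appends the event to durable local storage before responding, and a separate relay forwards it to Pub/Sub or a gRPC stream. The sketch below is an assumption-laden illustration (a JSON-lines file as the durable log, made-up event shapes), not any product's API:

```python
import json
import tempfile
from pathlib import Path

# Illustrative outbox file; a real system would use its database
# or a local write-ahead log instead.
OUTBOX = Path(tempfile.gettempdir()) / "outbox.jsonl"

def enqueue(event: dict) -> None:
    """Hot path: one durable append + flush, then respond."""
    with OUTBOX.open("a") as f:
        f.write(json.dumps(event) + "\n")
        f.flush()

def drain() -> list:
    """Relay side: read pending events, then publish downstream."""
    if not OUTBOX.exists():
        return []
    pending = [json.loads(line) for line in OUTBOX.read_text().splitlines()]
    OUTBOX.unlink()  # in reality: remove only after a confirmed publish
    return pending
```

The hot path pays for a single local append; the relay absorbs the cost (and retries) of actually delivering events to the event-driven platform.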