Distributed Transactions - The Saga Pattern
1. The Background: The Monolithic Safety Net
In a traditional, single-server backend (a Monolith), transactions are easy. You rely on the database's ACID properties. If a user buys a laptop, you wrap the whole process in an SQL Transaction:
BEGIN TRANSACTION
- Create the Order.
- Deduct $1,000 from the user's account.
- Deduct 1 Laptop from Inventory.
COMMIT (Save permanently).
If the Inventory check fails (out of stock!), the transaction ends with a ROLLBACK instead of a COMMIT. The $1,000 deduction is magically erased as if it never happened. It is "All or Nothing."
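As a minimal sketch of that "All or Nothing" behaviour, the snippet below uses Python's built-in sqlite3 module with an in-memory database. The table names, the `buy_laptop` and `make_db` helpers, and the `CHECK (stock >= 0)` constraint are assumptions made up for illustration; they are not taken from any particular system.

```python
import sqlite3

def make_db(stock: int) -> sqlite3.Connection:
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT);
        CREATE TABLE accounts (user_id INTEGER PRIMARY KEY, balance INTEGER);
        CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER CHECK (stock >= 0));
        INSERT INTO accounts VALUES (1, 5000);
    """)
    conn.execute("INSERT INTO inventory VALUES ('laptop', ?)", (stock,))
    conn.commit()
    return conn

def buy_laptop(conn: sqlite3.Connection, user_id: int, price: int) -> bool:
    """Run the whole purchase as one local transaction; True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO orders (user_id, status) VALUES (?, 'CREATED')",
                (user_id,),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE user_id = ?",
                (price, user_id),
            )
            # Out of stock: the CHECK constraint raises IntegrityError here.
            conn.execute("UPDATE inventory SET stock = stock - 1 WHERE item = 'laptop'")
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling `buy_laptop(make_db(stock=0), 1, 1000)` returns False, and both the new order row and the balance deduction are undone together with the failed inventory update.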
2. The Problem: The Microservice Nightmare
Many large tech companies (Netflix, Uber, Amazon) have moved away from Monoliths. They use Microservices.
The Order Service, Payment Service, and Inventory Service are three completely separate codebases running on different physical servers, each with their own private database.
- The Disaster: A user clicks "Buy."
- The Order Service creates an order. (Success)
- The Payment Service deducts $1,000 from the credit card. (Success)
- The Inventory Service checks for laptops... but they are out of stock. (Failure!)
You cannot run a single SQL ROLLBACK across three different databases over the network. The Payment database has no idea the Inventory database even exists. You have successfully taken $1,000 from a customer for a laptop that does not exist, and there is no automatic way to undo it.
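The failure can be reproduced in miniature. The sketch below assumes three separate in-memory SQLite databases standing in for the three services' private databases; the schemas and the `naive_checkout` function are hypothetical names for illustration only.

```python
import sqlite3

def new_service_db(schema: str) -> sqlite3.Connection:
    """Each service gets its own private database (in-memory stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema)
    return conn

orders_db = new_service_db("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);")
payments_db = new_service_db("CREATE TABLE charges (order_id INTEGER, amount INTEGER);")
inventory_db = new_service_db(
    "CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER CHECK (qty >= 0));"
    "INSERT INTO stock VALUES ('laptop', 0);"
)

def naive_checkout() -> None:
    with orders_db:     # commits in the Order Service's database
        orders_db.execute("INSERT INTO orders (id, status) VALUES (1, 'CREATED')")
    with payments_db:   # commits in the Payment Service's database
        payments_db.execute("INSERT INTO charges VALUES (1, 1000)")
    with inventory_db:  # fails, and rolls back this database only
        inventory_db.execute("UPDATE stock SET qty = qty - 1 WHERE item = 'laptop'")

try:
    naive_checkout()
except sqlite3.IntegrityError:
    pass  # the inventory rollback cannot reach the other two databases

charged = payments_db.execute("SELECT SUM(amount) FROM charges").fetchone()[0]
order_count = orders_db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

After the failed checkout, `charged` is still 1000 and the order row still exists: each `with` block committed on its own, so the last rollback had nothing to undo outside the inventory database.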
3. The Solutions: 2PC vs. The Saga
Engineers created two main ways to solve this. One is the "old school" way, and the other is the modern industry standard.
Solution A: Two-Phase Commit / 2PC (The "Hold Your Breath" Strategy)
This relies on a central "Coordinator" server.
- Phase 1 (Prepare): The Coordinator calls all three services: "Are you ready? Lock your rows!" Each database does its work tentatively, locks the affected rows, and replies "Ready."
- Phase 2 (Commit): The Coordinator says, "Okay, everyone execute!"
- The Fatal Flaw: This is slow. While the rows are locked waiting for the Coordinator, no other transaction can touch them, so other users trying to buy the same laptop have to wait. Worse, if the Coordinator crashes after the services have replied "Ready" but before it tells them to commit, they are stuck holding their locks until the Coordinator comes back. In CAP theorem terms, 2PC buys Consistency at the expense of Availability.
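The two phases can be sketched with plain in-memory objects. Everything here (`Participant`, `two_phase_commit`, the `can_prepare` flag) is a hypothetical stand-in rather than a real database API, and a production coordinator would also need to log its decision durably so it can recover after a crash.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    """Hypothetical in-memory stand-in for one service's database."""
    name: str
    can_prepare: bool = True
    locked: bool = False
    committed: bool = False

    def prepare(self) -> bool:
        # Phase 1: do the work tentatively, take locks, and vote.
        if not self.can_prepare:
            return False
        self.locked = True
        return True

    def commit(self) -> None:
        # Phase 2 (success): make the work permanent and release locks.
        self.committed = True
        self.locked = False

    def abort(self) -> None:
        # Phase 2 (failure): discard the tentative work and release locks.
        self.locked = False

def two_phase_commit(participants: list[Participant]) -> bool:
    prepared: list[Participant] = []
    for p in participants:
        if p.prepare():
            prepared.append(p)
        else:
            # One "no" vote aborts everyone who already voted "yes".
            for q in prepared:
                q.abort()
            return False
    # Every participant has been holding its locks since its vote.
    for p in participants:
        p.commit()
    return True
```

The gap between the last `prepare()` and the first `commit()` is where the blocking problem lives: if the coordinator stops there, every participant is left with `locked = True` and no instruction to commit or abort.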
Solution B: The Saga Pattern (The Modern Standard)
Instead of one massive, locked transaction, a Saga is a sequence of completely independent, local database transactions.
There is no magical ROLLBACK. Instead, if step 3 fails, the system must automatically execute Compensating Transactions. A Compensating Transaction is a brand-new, explicitly written piece of code designed to act as an "Apology."
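To make the "Apology" idea concrete, here is a minimal sketch of a payment ledger in which a refund is a new entry rather than the deletion of the original charge. The `LedgerEntry` type and the `charge`, `refund`, and `net` functions are hypothetical names chosen for illustration. The refund is written to be safe to run twice, on the assumption that a failure message might be delivered more than once.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LedgerEntry:
    order_id: int
    kind: str    # "CHARGE" or "REFUND"
    amount: int  # signed: negative takes money, positive gives it back

ledger: list[LedgerEntry] = []

def net(order_id: int) -> int:
    """Net effect of all entries for one order on the customer's money."""
    return sum(e.amount for e in ledger if e.order_id == order_id)

def charge(order_id: int, amount: int) -> None:
    """The original local transaction."""
    ledger.append(LedgerEntry(order_id, "CHARGE", -amount))

def refund(order_id: int) -> None:
    """The compensating transaction: a new entry that cancels the charge."""
    owed = -net(order_id)
    if owed > 0:  # nothing left to refund if this already ran
        ledger.append(LedgerEntry(order_id, "REFUND", owed))

charge(1, 1000)
refund(1)
refund(1)  # a repeated apology changes nothing
```

The ledger ends up with two entries, a CHARGE and a REFUND, and a net of 0. Unlike a ROLLBACK, the charge is still visible in the history; it has been undone by a second transaction, not erased.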
How a Saga handles the failure:
- Order Service runs a local transaction: Status = PENDING.
- Payment Service runs a local transaction: Charges $1,000.
- Inventory Service tries to run its local transaction, but fails (Out of stock!).
- The Rollback (The Apologies): The Inventory Service fires a "Failure Event" to a message broker (like Kafka).
- The Payment Service hears the failure and runs its Compensating Transaction: Issues a $1,000 Refund.
- The Order Service hears the failure and runs its Compensating Transaction: Changes Order Status to CANCELLED.
The data is eventually consistent, no locks were ever held across services, and the system stayed fast: each service only ran short, local transactions.
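The event-driven flow above can be sketched end to end. This is an illustration under loud assumptions: `EventBus` is a synchronous, in-memory stand-in for a broker such as Kafka (it does not use any Kafka API), the three dictionaries stand in for each service's private database, and the topic names and the CONFIRMED status on the success path are invented for the example.

```python
from collections import defaultdict
from typing import Callable

class EventBus:
    """Hypothetical synchronous, in-memory stand-in for a message broker."""
    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._handlers[topic]:
            handler(event)

bus = EventBus()
orders: dict[int, str] = {}   # Order Service's private state
balances = {"alice": 5000}    # Payment Service's private state
stock = {"laptop": 0}         # Inventory Service's private state

def place_order(order_id: int, user: str, item: str, price: int) -> None:
    orders[order_id] = "PENDING"  # Order Service local transaction
    bus.publish("OrderCreated",
                {"order_id": order_id, "user": user, "item": item, "price": price})

def charge_payment(event: dict) -> None:
    balances[event["user"]] -= event["price"]  # Payment Service local transaction
    bus.publish("PaymentCharged", event)

def reserve_inventory(event: dict) -> None:
    if stock[event["item"]] < 1:
        bus.publish("InventoryFailed", event)  # the "Failure Event"
        return
    stock[event["item"]] -= 1                  # Inventory Service local transaction
    bus.publish("InventoryReserved", event)

def refund_payment(event: dict) -> None:
    balances[event["user"]] += event["price"]  # compensating transaction

def cancel_order(event: dict) -> None:
    orders[event["order_id"]] = "CANCELLED"    # compensating transaction

def confirm_order(event: dict) -> None:
    orders[event["order_id"]] = "CONFIRMED"    # success path (assumed status)

bus.subscribe("OrderCreated", charge_payment)
bus.subscribe("PaymentCharged", reserve_inventory)
bus.subscribe("InventoryFailed", refund_payment)
bus.subscribe("InventoryFailed", cancel_order)
bus.subscribe("InventoryReserved", confirm_order)

place_order(1, "alice", "laptop", 1000)
```

With no laptops in stock, order 1 ends up CANCELLED and the balance is back at 5000. Because this toy bus calls handlers synchronously, the compensation finishes before `place_order` returns; with a real broker it would happen some time later, which is where the "eventually" in eventually consistent comes from.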