Saga Pattern: Distributed Transactions in Microservices
In a monolith, a single database transaction ensures that creating an order, reserving inventory, and charging the customer either all succeed or all fail. In microservices, each of these operations lives in a different service with its own database. You can't use a single ACID transaction across them.
The Saga pattern solves this by breaking a distributed transaction into a sequence of local transactions. Each service commits its local transaction and then publishes an event (or reports back to a coordinator) that triggers the next step. If any step fails, the saga executes compensating transactions to undo the work done by the preceding steps.
Two Approaches
1. Choreography (Event-Based)
Each service listens for events and decides what to do next. There's no central coordinator — services react to events in a chain.
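To make the event chain concrete, here is a minimal in-process sketch. The `EventBus` class is a hypothetical stand-in for SNS/SQS that delivers events synchronously, unlike a real broker. All event names, the three services, and the `CARD_LIMIT` approval rule are assumptions for illustration. Failure events travel back up the chain, and each upstream service runs its own compensation when it hears about them.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]

class EventBus:
    """In-memory stand-in for SNS/SQS: delivers each event to every subscriber."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Handler]] = defaultdict(list)
        self.log: list[str] = []  # event types in publish order, for inspection

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, event: dict) -> None:
        self.log.append(event_type)
        for handler in self.handlers[event_type]:
            handler(event)

bus = EventBus()
orders: dict[str, str] = {}      # order service data
available = {"sku-1": 5}         # inventory service data
reserved: dict[str, int] = {}    # inventory service: order_id -> reserved qty
CARD_LIMIT = 100                 # payment service: hypothetical approval rule

# Order service
def create_order(e: dict) -> None:
    orders[e["order_id"]] = "PENDING"
    bus.publish("OrderCreated", e)

def confirm_order(e: dict) -> None:
    orders[e["order_id"]] = "CONFIRMED"

def cancel_order(e: dict) -> None:  # compensation for create_order
    orders[e["order_id"]] = "CANCELLED"

# Inventory service
def reserve_stock(e: dict) -> None:
    if available[e["sku"]] >= e["qty"]:
        available[e["sku"]] -= e["qty"]
        reserved[e["order_id"]] = e["qty"]
        bus.publish("StockReserved", e)
    else:
        bus.publish("StockReservationFailed", e)

def release_stock(e: dict) -> None:  # compensation for reserve_stock
    available[e["sku"]] += reserved.pop(e["order_id"], 0)
    bus.publish("StockReleased", e)

# Payment service
def charge_payment(e: dict) -> None:
    approved = e["amount"] <= CARD_LIMIT
    bus.publish("PaymentSucceeded" if approved else "PaymentFailed", e)

# Each service subscribes only to the events it cares about; none sees the whole flow.
bus.subscribe("OrderRequested", create_order)
bus.subscribe("OrderCreated", reserve_stock)
bus.subscribe("StockReserved", charge_payment)
bus.subscribe("PaymentSucceeded", confirm_order)
bus.subscribe("PaymentFailed", release_stock)
bus.subscribe("StockReleased", cancel_order)
bus.subscribe("StockReservationFailed", cancel_order)

bus.publish("OrderRequested", {"order_id": "o1", "sku": "sku-1", "qty": 2, "amount": 50})
bus.publish("OrderRequested", {"order_id": "o2", "sku": "sku-1", "qty": 1, "amount": 500})
bus.publish("OrderRequested", {"order_id": "o3", "sku": "sku-1", "qty": 10, "amount": 20})
print(orders)     # {'o1': 'CONFIRMED', 'o2': 'CANCELLED', 'o3': 'CANCELLED'}
print(available)  # {'sku-1': 3}
```

Notice that the compensation path (`PaymentFailed` → `StockReleased` → cancelled order) exists only as a set of subscriptions spread across services. That is why choreography gets harder to follow as the number of steps grows.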
2. Orchestration (Central Coordinator)
A central saga orchestrator (e.g., AWS Step Functions) tells each service what to do and handles the compensation logic. The flow is explicit and easy to visualize.
- Pros: explicit flow, easy to understand, centralized error handling, visual debugging
- Cons: the orchestrator is a central point of coordination (though not a single point of failure when it is a managed service), and services are coupled to the orchestrator
- Best for: complex sagas with many steps, conditional branching, and timeouts
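Independent of Step Functions, an orchestrator's core loop can be sketched in a few lines. It runs each step in order, and when one fails, it runs the compensations of the steps that already completed, newest first. `run_saga`, `SagaFailed`, and the `record` helper are hypothetical names. The sketch assumes a failed step's own local transaction has rolled back by itself, so only completed steps are compensated. A production orchestrator would also persist its progress so it can resume after a crash, and this sketch omits that.

```python
from typing import Callable

Action = Callable[[dict], None]

class SagaFailed(Exception):
    """Raised after the completed steps have been compensated."""

def run_saga(steps: list[tuple[str, Action, Action]], ctx: dict) -> list[str]:
    """Run (name, action, compensation) steps in order; return a trace of what ran."""
    trace: list[str] = []
    completed: list[tuple[str, Action]] = []
    for name, action, compensate in steps:
        try:
            action(ctx)
        except Exception as exc:
            trace.append(f"{name} failed")
            # Undo only the steps that finished, newest first.
            for done_name, undo in reversed(completed):
                undo(ctx)
                trace.append(f"compensated {done_name}")
            raise SagaFailed(trace) from exc
        trace.append(name)
        completed.append((name, compensate))
    return trace

def record(message: str) -> Action:
    """Hypothetical step body that just logs what it did into the context."""
    return lambda ctx: ctx.setdefault("log", []).append(message)

def decline_card(ctx: dict) -> None:
    raise RuntimeError("card declined")

steps = [
    ("CreateOrder", record("order created"), record("order cancelled")),
    ("ReserveStock", record("stock reserved"), record("stock released")),
    ("ChargePayment", decline_card, record("payment refunded")),
    ("SendConfirmation", record("email sent"), record("cancellation email sent")),
]
ctx: dict = {}
try:
    run_saga(steps, ctx)
except SagaFailed as failure:
    print(failure.args[0])
    # ['CreateOrder', 'ReserveStock', 'ChargePayment failed', 'compensated ReserveStock', 'compensated CreateOrder']
print(ctx["log"])  # ['order created', 'stock reserved', 'stock released', 'order cancelled']
```

Because the step list lives in one place, the flow is explicit. The same property couples every participating service to the orchestrator's definition.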
Compensating Transactions
Each saga step needs a compensating action — a way to undo its work if a later step fails. Compensations are not rollbacks; they're new actions that semantically reverse the original.
- CreateOrder → compensate with CancelOrder
- ReserveStock → compensate with ReleaseStock
- ChargePayment → compensate with RefundPayment
- SendConfirmation → compensate with SendCancellationEmail
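The refund case shows how a compensation differs from a rollback. In this sketch, `PaymentLedger` and `Entry` are hypothetical types and amounts are in cents. The original charge is never deleted or edited. The compensation appends an opposite entry, and it checks what has already been refunded so that a retried compensation does not refund twice.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Entry:
    order_id: str
    kind: str          # "charge" or "refund"
    amount_cents: int

@dataclass
class PaymentLedger:
    """Append-only ledger: entries are never deleted or modified."""
    entries: list[Entry] = field(default_factory=list)

    def _total(self, order_id: str, kind: str) -> int:
        return sum(e.amount_cents for e in self.entries
                   if e.order_id == order_id and e.kind == kind)

    def charge(self, order_id: str, amount_cents: int) -> None:
        self.entries.append(Entry(order_id, "charge", amount_cents))

    def refund(self, order_id: str) -> None:
        """Compensation for charge(): records a new, opposite entry."""
        outstanding = self._total(order_id, "charge") - self._total(order_id, "refund")
        if outstanding > 0:  # idempotent: a repeated refund finds nothing left to refund
            self.entries.append(Entry(order_id, "refund", outstanding))

    def balance(self, order_id: str) -> int:
        return self._total(order_id, "charge") - self._total(order_id, "refund")

ledger = PaymentLedger()
ledger.charge("o1", 4999)
ledger.refund("o1")
ledger.refund("o1")           # retried compensation: no duplicate refund
print(ledger.balance("o1"))   # 0
print(len(ledger.entries))    # 2 (the charge stays on record next to the refund)
```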
AWS Step Functions Implementation
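```json
{
  "Comment": "Order saga state machine (simplified); Lambda ARNs are elided",
  "StartAt": "CreateOrder",
  "States": {
    "CreateOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:createOrder",
      "Next": "ReserveStock",
      "Catch": [{ "ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "CancelOrder" }]
    },
    "ReserveStock": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:reserveStock",
      "Next": "ProcessPayment",
      "Catch": [{ "ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "ReleaseStockAndCancel" }]
    },
    "ProcessPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:processPayment",
      "Next": "ConfirmOrder",
      "Catch": [{ "ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "ReleaseStockAndCancel" }]
    },
    "ConfirmOrder": {
      "Type": "Task",
      "Resource": "...",
      "Retry": [{ "ErrorEquals": ["States.ALL"], "IntervalSeconds": 2, "MaxAttempts": 3, "BackoffRate": 2.0 }],
      "End": true
    },
    "CancelOrder": {
      "Type": "Task",
      "Resource": "...",
      "Retry": [{ "ErrorEquals": ["States.ALL"], "IntervalSeconds": 2, "MaxAttempts": 5, "BackoffRate": 2.0 }],
      "Next": "SagaFailed"
    },
    "ReleaseStockAndCancel": {
      "Type": "Parallel",
      "Branches": [
        { "StartAt": "ReleaseStock", "States": { "ReleaseStock": { "Type": "Task", "Resource": "...", "End": true } } },
        { "StartAt": "CancelOrderInParallel", "States": { "CancelOrderInParallel": { "Type": "Task", "Resource": "...", "End": true } } }
      ],
      "Retry": [{ "ErrorEquals": ["States.ALL"], "IntervalSeconds": 2, "MaxAttempts": 5, "BackoffRate": 2.0 }],
      "Next": "SagaFailed"
    },
    "SagaFailed": { "Type": "Fail", "Error": "SagaCompensated" }
  }
}
```

Each Catch sets "ResultPath": "$.error", so the compensation tasks receive the original input (such as the order ID) with the error details added; without it, the error object replaces the input. CancelOrder is a top-level state because a Catch can only transition to a state in the same scope, not into a Parallel branch. ConfirmOrder and the compensation states retry with exponential backoff rather than being compensated themselves.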
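The Task states invoke Lambda functions. Below is a minimal sketch of the ReserveStock handler and its ReleaseStock compensation. It assumes the execution input carries `orderId`, `sku`, and `qty` fields (hypothetical names) and that each Catch sets "ResultPath": "$.error" so those fields survive into the compensation branch. The module-level dictionaries stand in for the inventory service's database, and `OutOfStock` is a hypothetical error. Step Functions reports an unhandled Lambda exception under its class name, so a Catch could match "OutOfStock" specifically instead of "States.ALL".

```python
import uuid

class OutOfStock(Exception):
    """Hypothetical business error; surfaces in Step Functions as "OutOfStock"."""

# In-memory stand-ins for the inventory database (assumption for the sketch).
AVAILABLE: dict[str, int] = {"sku-1": 5}
RESERVATIONS: dict[str, dict] = {}  # keyed by orderId

def reserve_stock(event: dict, context=None) -> dict:
    """ReserveStock task. Returns its input plus a reservation ID, because a
    Task's default ResultPath ("$") replaces the state data with this value
    and later states still need orderId."""
    order_id, sku, qty = event["orderId"], event["sku"], event["qty"]
    if order_id in RESERVATIONS:  # retried invocation: don't reserve twice
        return {**event, "reservationId": RESERVATIONS[order_id]["id"]}
    if AVAILABLE.get(sku, 0) < qty:
        raise OutOfStock(f"{sku}: requested {qty}")
    AVAILABLE[sku] -= qty
    reservation_id = str(uuid.uuid4())
    RESERVATIONS[order_id] = {"id": reservation_id, "sku": sku, "qty": qty}
    return {**event, "reservationId": reservation_id}

def release_stock(event: dict, context=None) -> dict:
    """ReleaseStock compensation. Input is the original order data plus
    "error" from the Catch. Idempotent, so it is safe to retry."""
    reservation = RESERVATIONS.pop(event["orderId"], None)
    if reservation is not None:
        AVAILABLE[reservation["sku"]] += reservation["qty"]
    return {"orderId": event["orderId"], "released": reservation is not None}

state = reserve_stock({"orderId": "o1", "sku": "sku-1", "qty": 2})
failure = {**state, "error": {"Error": "PaymentDeclined", "Cause": "card declined"}}
print(release_stock(failure))  # {'orderId': 'o1', 'released': True}
print(release_stock(failure))  # {'orderId': 'o1', 'released': False}
print(AVAILABLE["sku-1"])      # 5
```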
Choreography vs Orchestration: When to Use Each
- 2-3 steps, simple flow → Choreography. Publish events via SNS/SQS, each service handles its part.
- 4+ steps, conditional logic, timeouts → Orchestration. Use Step Functions for visibility and error handling.
- Need visual debugging → Orchestration. Step Functions shows exactly which step failed and why.
- Teams own their services independently → Choreography. No central orchestrator to coordinate.
Common Pitfalls
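- Forgetting compensations: every step that a later failure could force you to undo needs a compensation. A step that can't be compensated can't sit in the middle of a saga; it must run only after every step that might fail has succeeded.
- Non-compensatable actions: sending an email or SMS can't be undone. Place these at the end of the saga, after all compensatable steps succeed.
- Failures during compensation: a compensation can fail too. Retry it with exponential backoff, and make it idempotent so a repeated attempt does no extra harm. Step Functions supports this natively through a state's Retry field (IntervalSeconds, MaxAttempts, BackoffRate).
- Holding locks across steps: don't hold database locks for the duration of a saga. Use a semantic lock instead, such as a reservation record or a PENDING status that other transactions can see (reserve stock, don't decrement it).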
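For compensations that run outside Step Functions (for example, in a choreographed saga), retry logic might look like the following sketch. `retry_with_backoff` is a hypothetical helper that uses exponential backoff with full jitter. The `sleep` parameter is injectable only so the example runs instantly. A retried call may repeat work that actually succeeded (say, the response timed out), so whatever it wraps must be idempotent.

```python
import random
import time
from typing import Callable

def retry_with_backoff(action: Callable[[], None], max_attempts: int = 5,
                       base_delay: float = 0.5, max_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Call action until it succeeds and return the number of attempts used.

    Before retry n, wait a random time up to base_delay * 2**(n-1), capped at
    max_delay (full jitter). When attempts run out, re-raise the last error so
    the caller can alert or park the saga for manual repair.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            action()
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise RuntimeError("unreachable")

# A compensation that fails twice (e.g., the inventory service is briefly down).
calls = {"n": 0}
def flaky_release() -> None:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("inventory service unavailable")

delays: list[float] = []
attempts = retry_with_backoff(flaky_release, sleep=delays.append)
print(attempts)     # 3
print(len(delays))  # 2
```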
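Sagas trade isolation and immediate consistency for availability: intermediate states, such as an order that exists but hasn't been paid for, are visible to other transactions until the saga completes or is compensated. You lose the simplicity of ACID transactions, but you gain the ability to span multiple services without distributed locks. The key is designing compensations that are reliable and idempotent.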
Conclusion
The Saga pattern is essential for managing distributed transactions in microservices. Choose choreography for simple, loosely-coupled flows and orchestration (Step Functions) for complex, multi-step processes. Always design compensating transactions for every step, and place non-compensatable actions at the end of the saga.
At TechTrailCamp, the Saga pattern is a core topic in our Microservices and AWS tracks. You'll implement real sagas with Step Functions and SQS through hands-on, 1:1 mentoring.
TechTrailCamp