Circuit Breaker Pattern: Preventing Cascade Failures
In a microservices architecture, services depend on each other. When the Payment Service slows down, the Order Service waits on it. Threads pile up. Now the Order Service is slow too. The API Gateway times out. The entire system is down, all because of one slow service.
The circuit breaker pattern prevents this cascade. Like an electrical circuit breaker, it "trips open" when a downstream service is failing, returning fallback responses immediately instead of waiting for timeouts.
The Three States
Closed (Normal)
Requests flow normally. The circuit breaker monitors the failure rate. If failures exceed a threshold (e.g., 50% of the last 100 calls), it trips to Open.
Open (Failing)
All requests fail immediately — no call to the downstream service. Return a fallback response (cached data, default value, or graceful error). After a timeout period (e.g., 30 seconds), transition to Half-Open.
Half-Open (Testing)
Allow a limited number of test requests through. If they succeed, transition back to Closed. If they fail, return to Open. This prevents hammering a recovering service.
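Before reaching for a library, it helps to see how little machinery the state machine actually needs. Here is a deliberately minimal Java sketch — a toy, not production code: no thread safety, a consecutive-failure counter instead of a sliding window, and a single test call in Half-Open:

import java.util.function.Supplier;

// Minimal illustrative circuit breaker: three states, no thread safety,
// a consecutive-failure count instead of a sliding window.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;
    private final int failureThreshold = 5;        // trip after 5 consecutive failures
    private final long openTimeoutMillis = 30_000; // stay Open for 30 seconds

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt < openTimeoutMillis) {
                return fallback.get();             // fail fast: no downstream call at all
            }
            state = State.HALF_OPEN;               // timeout elapsed: allow a test call
        }
        try {
            T result = action.get();
            state = State.CLOSED;                  // success: close (or stay Closed)
            failures = 0;
            return result;
        } catch (RuntimeException e) {
            failures++;
            if (state == State.HALF_OPEN || failures >= failureThreshold) {
                state = State.OPEN;                // trip (or re-trip) the breaker
                openedAt = System.currentTimeMillis();
            }
            return fallback.get();
        }
    }
}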
Implementation with Resilience4j
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.vavr.control.Try;
import java.time.Duration;
import java.util.function.Supplier;

// Resilience4j circuit breaker configuration
CircuitBreakerConfig config = CircuitBreakerConfig.custom()
    .failureRateThreshold(50)                        // trip at 50% failure rate
    .waitDurationInOpenState(Duration.ofSeconds(30)) // wait 30s before Half-Open
    .slidingWindowSize(100)                          // evaluate the last 100 calls
    .permittedNumberOfCallsInHalfOpenState(5)        // allow 5 test calls in Half-Open
    .build();

CircuitBreaker circuitBreaker = CircuitBreaker.of("paymentService", config);

// Usage: decorate the call, then recover to a fallback with Vavr's Try
Supplier<PaymentResult> decorated = CircuitBreaker
    .decorateSupplier(circuitBreaker, () -> paymentService.charge(order));

Try<PaymentResult> result = Try.ofSupplier(decorated)
    .recover(throwable -> PaymentResult.pendingRetry()); // fallback when open or failing
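Resilience4j also lets you watch the transitions between the three states. A short sketch using its event publisher on the circuitBreaker instance above (log stands in for a hypothetical SLF4J logger, not shown here):

// Log every state transition (Closed -> Open -> Half-Open, and back)
circuitBreaker.getEventPublisher()
    .onStateTransition(event ->
        log.warn("Circuit {} moved from {} to {}",
            event.getCircuitBreakerName(),
            event.getStateTransition().getFromState(),
            event.getStateTransition().getToState()));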
Fallback Strategies
- Cached response — return the last successful response. Works for product catalogs, user profiles (see the sketch after this list).
- Default value — return a sensible default. "Estimated delivery: 3-5 days" when the shipping calculator is down.
- Graceful degradation — hide the failing feature. Remove recommendations widget when the recommendation service is down.
- Queue for retry — put the request in SQS and process it when the service recovers. Works for non-time-sensitive operations.
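As a sketch of the first strategy, the fallback can hand back the last response that succeeded. Everything here beyond the Resilience4j and Vavr calls is an assumption: ProductCatalog, catalogService, and the AtomicReference cache are illustrative stand-ins.

import io.vavr.control.Try;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical cache holding the last good response
private final AtomicReference<ProductCatalog> lastKnownGood = new AtomicReference<>();

ProductCatalog getCatalog() {
    Supplier<ProductCatalog> decorated = CircuitBreaker
        .decorateSupplier(circuitBreaker, catalogService::fetchCatalog);
    return Try.ofSupplier(decorated)
        .peek(lastKnownGood::set)          // remember every success
        .recover(t -> lastKnownGood.get()) // circuit open or call failed: serve the cache
        .get();
}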
Related Resilience Patterns
Retry with Backoff
Retry transient failures with exponential backoff + jitter. But don't retry if the circuit is open — that defeats the purpose.
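A sketch with Resilience4j's retry module — the 100ms base interval, 2x multiplier, and 0.5 jitter factor are illustrative choices, not recommendations:

import io.github.resilience4j.core.IntervalFunction;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;

RetryConfig retryConfig = RetryConfig.custom()
    .maxAttempts(3)
    // 100ms base delay, doubled per attempt, randomized by a 0.5 jitter factor
    .intervalFunction(IntervalFunction.ofExponentialRandomBackoff(100L, 2.0, 0.5))
    .build();

Retry retry = Retry.of("paymentService", retryConfig);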
Timeout
Set aggressive timeouts on all HTTP calls. A 30-second timeout is almost always too long. For most service-to-service calls, 2-5 seconds is appropriate.
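As one example, the JDK's built-in HttpClient (Java 11+) supports both a connect timeout and a per-request timeout. The endpoint below is hypothetical; the 2s/3s budgets follow the guideline above:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

HttpClient client = HttpClient.newBuilder()
    .connectTimeout(Duration.ofSeconds(2)) // give up quickly if the host is unreachable
    .build();

HttpRequest request = HttpRequest.newBuilder(URI.create("https://payments.internal/charge"))
    .timeout(Duration.ofSeconds(3))        // total budget for the whole response
    .build();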
Bulkhead
Isolate thread pools per downstream service. If calls to the Payment Service exhaust their pool, calls to the User Service still run on a dedicated pool of their own.
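Resilience4j's ThreadPoolBulkhead implements exactly this isolation. A sketch — the pool sizes are illustrative, and paymentService and order are the hypothetical domain objects from the earlier example:

import io.github.resilience4j.bulkhead.ThreadPoolBulkhead;
import io.github.resilience4j.bulkhead.ThreadPoolBulkheadConfig;
import java.util.concurrent.CompletionStage;

ThreadPoolBulkheadConfig bulkheadConfig = ThreadPoolBulkheadConfig.custom()
    .coreThreadPoolSize(5)
    .maxThreadPoolSize(10) // the Payment Service can never hold more than 10 threads
    .queueCapacity(20)
    .build();

ThreadPoolBulkhead paymentBulkhead = ThreadPoolBulkhead.of("paymentService", bulkheadConfig);

// Runs on the bulkhead's dedicated pool; other services' pools stay untouched
CompletionStage<PaymentResult> pending =
    paymentBulkhead.executeSupplier(() -> paymentService.charge(order));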
Common Mistakes
- No fallback — a circuit breaker without a fallback just returns an error faster. Always define meaningful fallbacks.
- Too-high thresholds — a 90% failure rate threshold means 90 out of 100 requests fail before the breaker trips. Start at 50%.
- Too-long timeout in Open state — 5 minutes means users see degraded service for 5 minutes. Start at 30 seconds.
- Retrying through an open circuit — retries should be inside the circuit breaker, not outside. Otherwise you're hammering a dead service (see the sketch below).
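With Resilience4j's Decorators helper (from the resilience4j-all module), the wrapping order is explicit: each with-call wraps the previous one, so declaring the retry first puts it inside the circuit breaker. A sketch reusing the retry and circuitBreaker instances from above:

import io.github.resilience4j.decorators.Decorators;
import java.util.function.Supplier;

Supplier<PaymentResult> resilient = Decorators
    .ofSupplier(() -> paymentService.charge(order))
    .withRetry(retry)                   // inner: retries run only when a call is permitted
    .withCircuitBreaker(circuitBreaker) // outer: an open circuit skips the retries entirely
    .decorate();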
The circuit breaker pattern is about failing fast and failing gracefully. It's better to return a cached response in 5ms than to wait 30 seconds for a timeout from a dead service.
Conclusion
The circuit breaker pattern is essential for any microservices system. Combined with retries, timeouts, and bulkheads, it creates a resilient system that degrades gracefully instead of failing catastrophically. Implement it with Resilience4j for Java services, and always define meaningful fallback responses.
At TechTrailCamp, resilience patterns are a core part of our Microservices and AWS tracks. You'll implement circuit breakers, retries, and bulkheads through hands-on, 1:1 mentoring.
Want to build resilient microservices?
Join TechTrailCamp's 1:1 training and master fault tolerance patterns.