
Circuit Breaker Pattern: Preventing Cascade Failures

[Diagram: Circuit breaker states. Closed — requests flow normally, failure rate monitored. Open — requests fail immediately, fallback response returned. Half-Open — limited test requests probe whether the service has recovered. Transitions: failure threshold hit → Open; timeout expires → Half-Open; test succeeds → Closed; test fails → Open. Like an electrical circuit breaker: it trips open to protect the system, then tests periodically to restore. Without it, one failing service takes down every service that depends on it.]

In a microservices architecture, services depend on each other. When the Payment Service slows down, the Order Service waits. Threads pile up. The Order Service becomes slow. The API Gateway times out. Now the entire system is down — because of one slow service.

The circuit breaker pattern prevents this cascade. Like an electrical circuit breaker, it "trips open" when a downstream service is failing, returning fallback responses immediately instead of waiting for timeouts.

The Three States

Closed (Normal)

Requests flow normally. The circuit breaker monitors the failure rate. If failures exceed a threshold (e.g., 50% of the last 100 calls), it trips to Open.

Open (Failing)

All requests fail immediately — no call is made to the downstream service. The breaker returns a fallback response instead (cached data, a default value, or a graceful error). After a timeout period (e.g., 30 seconds), it transitions to Half-Open.

Half-Open (Testing)

The breaker allows a limited number of test requests through. If they succeed, it transitions back to Closed; if they fail, it returns to Open. This prevents hammering a recovering service.
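To make the state transitions concrete, here is a minimal, hand-rolled sketch of the state machine. It is an illustration only, not the Resilience4j implementation: it trips on consecutive failures rather than a sliding-window failure rate, and names like MiniBreaker are invented for this example.

```java
import java.time.Duration;
import java.time.Instant;

// Minimal circuit breaker state machine (illustration only; use a library
// such as Resilience4j in production).
public class MiniBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;  // consecutive failures before tripping
    private final Duration openTimeout;  // how long to stay OPEN before probing
    private State state = State.CLOSED;
    private int failures = 0;
    private Instant openedAt;

    MiniBreaker(int failureThreshold, Duration openTimeout) {
        this.failureThreshold = failureThreshold;
        this.openTimeout = openTimeout;
    }

    // Should this request be allowed through?
    synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            if (Instant.now().isAfter(openedAt.plus(openTimeout))) {
                state = State.HALF_OPEN;  // timeout expired: probe the service
                return true;
            }
            return false;                 // fail fast; caller returns a fallback
        }
        return true;                      // CLOSED or HALF_OPEN: request allowed
    }

    synchronized void recordSuccess() {
        failures = 0;
        state = State.CLOSED;             // a successful probe closes the circuit
    }

    synchronized void recordFailure() {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;           // trip (or re-trip after a failed probe)
            openedAt = Instant.now();
        }
    }

    synchronized State state() { return state; }

    public static void main(String[] args) {
        MiniBreaker cb = new MiniBreaker(3, Duration.ofSeconds(30));
        cb.recordFailure();
        cb.recordFailure();
        cb.recordFailure();                    // third failure trips the breaker
        System.out.println(cb.state());        // OPEN
        System.out.println(cb.allowRequest()); // false: fail fast
    }
}
```

A real implementation also needs a sliding window for the failure rate and a cap on test calls in Half-Open, which is exactly what the library configuration below provides.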

Implementation with Resilience4j

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.vavr.control.Try;

import java.time.Duration;
import java.util.function.Supplier;

// Resilience4j circuit breaker configuration
CircuitBreakerConfig config = CircuitBreakerConfig.custom()
    .failureRateThreshold(50)           // Trip at 50% failure rate
    .waitDurationInOpenState(Duration.ofSeconds(30))  // Wait 30s before half-open
    .slidingWindowSize(100)             // Evaluate last 100 calls
    .permittedNumberOfCallsInHalfOpenState(5)  // 5 test calls
    .build();

CircuitBreaker circuitBreaker = CircuitBreaker.of("paymentService", config);

// Usage
Supplier<PaymentResult> decorated = CircuitBreaker
    .decorateSupplier(circuitBreaker, () -> paymentService.charge(order));

Try<PaymentResult> result = Try.ofSupplier(decorated)
    .recover(throwable -> PaymentResult.pendingRetry()); // Fallback

Fallback Strategies

  • Cached response — return the last successful response. Works for product catalogs, user profiles.
  • Default value — return a sensible default. "Estimated delivery: 3-5 days" when the shipping calculator is down.
  • Graceful degradation — hide the failing feature. Remove recommendations widget when the recommendation service is down.
  • Queue for retry — put the request in SQS and process it when the service recovers. Works for non-time-sensitive operations.
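The first strategy, a cached response, can be sketched in a few lines: remember the last good result per key and serve it when the live call fails. This is a simplified illustration; the product data and key names are made up, and a production cache would also bound its size and track staleness.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// "Cached response" fallback: keep the last successful result per key and
// serve it when the live call throws.
public class CachedFallback {
    private final ConcurrentMap<String, String> lastGood = new ConcurrentHashMap<>();

    // Try the live call; on failure, fall back to the last cached value.
    public Optional<String> fetch(String key, Supplier<String> liveCall) {
        try {
            String fresh = liveCall.get();
            lastGood.put(key, fresh);                      // refresh cache on success
            return Optional.of(fresh);
        } catch (RuntimeException e) {
            return Optional.ofNullable(lastGood.get(key)); // stale but usable
        }
    }

    public static void main(String[] args) {
        CachedFallback catalog = new CachedFallback();
        catalog.fetch("sku-42", () -> "Hiking Boots v1");   // live call succeeds
        Optional<String> r = catalog.fetch("sku-42", () -> {
            throw new RuntimeException("service down");     // live call fails
        });
        System.out.println(r.orElse("n/a"));                // Hiking Boots v1 (from cache)
    }
}
```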

Related Resilience Patterns

Retry with Backoff

Retry transient failures with exponential backoff + jitter. But don't retry if the circuit is open — that defeats the purpose.
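The backoff-plus-jitter calculation is small enough to show directly. This sketch uses the "full jitter" variant, where each delay is a random value between zero and the capped exponential bound; the base and cap values are illustrative defaults, not prescriptions.

```java
import java.util.concurrent.ThreadLocalRandom;

// Exponential backoff with full jitter: delay is uniform in
// [0, min(cap, base * 2^attempt)].
public class Backoff {
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        long exp = Math.min(capMillis, baseMillis * (1L << attempt)); // base * 2^attempt, capped
        return ThreadLocalRandom.current().nextLong(exp + 1);         // jitter: spread retries out
    }

    public static void main(String[] args) {
        // Max 3 attempts; upper bounds grow 100 ms -> 200 ms -> 400 ms.
        for (int attempt = 0; attempt < 3; attempt++) {
            System.out.printf("attempt %d: sleep up to %d ms%n",
                    attempt, 100L * (1L << attempt));
        }
    }
}
```

The jitter is the important part: without it, every client that saw the same failure retries at the same instant and stampedes the recovering service.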

Timeout

Set aggressive timeouts on all HTTP calls. A 30-second timeout is almost always too long. For most service-to-service calls, 2-5 seconds is appropriate.
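One way to enforce such a timeout in plain Java is CompletableFuture.orTimeout (Java 9+), combined with a fallback. The durations below are scaled down so the example runs quickly; in real service-to-service calls you would use the 2-5 second range mentioned above, and the slow call stands in for an HTTP client request.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Per-call timeout with a fallback, using CompletableFuture.orTimeout (Java 9+).
public class CallTimeout {
    // Stand-in for a slow downstream HTTP call.
    static CompletableFuture<String> slowCall(long millis) {
        return CompletableFuture.supplyAsync(() -> {
            try { Thread.sleep(millis); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "ok";
        });
    }

    public static void main(String[] args) {
        String result = slowCall(2000)                      // downstream takes 2 s...
                .orTimeout(200, TimeUnit.MILLISECONDS)      // ...but we give up at 200 ms
                .exceptionally(t -> "fallback")             // TimeoutException -> fallback
                .join();
        System.out.println(result);                         // fallback
    }
}
```

Freeing the caller's thread after 200 ms instead of parking it for 2 seconds is what keeps one slow dependency from exhausting the whole pool.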

Bulkhead

Isolate thread pools per downstream service. If the Payment Service consumes all threads, the User Service calls still have their own dedicated pool.
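A thread-pool bulkhead can be sketched with one bounded executor per dependency. The pool sizes and method names here are illustrative; Resilience4j also offers Bulkhead and ThreadPoolBulkhead modules for this.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Bulkhead sketch: each downstream service gets its own bounded thread pool,
// so a slow Payment Service can exhaust only its own threads.
public class Bulkheads {
    static final ExecutorService paymentPool = Executors.newFixedThreadPool(10);
    static final ExecutorService userPool    = Executors.newFixedThreadPool(10);

    static CompletableFuture<String> callPayment() {
        return CompletableFuture.supplyAsync(() -> "charged", paymentPool);
    }

    static CompletableFuture<String> callUser() {
        return CompletableFuture.supplyAsync(() -> "profile", userPool);
    }

    public static void main(String[] args) {
        // Even if paymentPool were saturated with slow calls, callUser()
        // still runs, because it draws from its own pool.
        System.out.println(callUser().join());   // profile
        paymentPool.shutdown();
        userPool.shutdown();
    }
}
```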

[Diagram: Resilience patterns stack. Retry — handles transient errors; exponential backoff + jitter to prevent a stampede, max 3 attempts. Circuit Breaker — handles persistent failures; trips after a threshold, fast-fails requests, returns a fallback response. Timeout — limits wait time; 2-5 seconds for service calls, frees threads quickly, feeds into the circuit breaker. Bulkhead — isolates resources; separate thread pools per dependency. Use all four patterns together for comprehensive resilience.]

Common Mistakes

  • No fallback — a circuit breaker without a fallback just returns an error faster. Always define meaningful fallbacks.
  • Too-high thresholds — a 90% failure rate threshold means 90 out of 100 requests fail before the breaker trips. Start at 50%.
  • Too-long timeout in Open state — 5 minutes means users see degraded service for 5 minutes. Start at 30 seconds.
  • Retrying through an open circuit — retries should be inside the circuit breaker, not outside. Otherwise you're hammering a dead service.

The circuit breaker pattern is about failing fast and failing gracefully. It's better to return a cached response in 5ms than to wait 30 seconds for a timeout from a dead service.

Conclusion

The circuit breaker pattern is essential for any microservices system. Combined with retries, timeouts, and bulkheads, it creates a resilient system that degrades gracefully instead of failing catastrophically. Implement it with Resilience4j for Java services, and always define meaningful fallback responses.

At TechTrailCamp, resilience patterns are a core part of our Microservices and AWS tracks. You'll implement circuit breakers, retries, and bulkheads through hands-on, 1:1 mentoring.

Want to build resilient microservices?

Join TechTrailCamp's 1:1 training and master fault tolerance patterns.

Start Your Learning Journey