Circuit Breaker
Track per-provider:model health across four states (healthy, degraded, down, throttled) to prevent cascading failures and minimize wasted latency.
Four-State Model
Each provider-model combination maintains its own independent health state:
          3 consecutive 5xx        5 consecutive 5xx
┌──────────┐ ──────────▶ ┌──────────┐ ──────────▶ ┌──────────┐
│ healthy  │             │ degraded │             │   down   │
│          │ ◀────────── │          │ ◀────────── │          │
└──────────┘   success   └──────────┘   success   └──────────┘
                                                       │
                                             30s probe │ success
                                                       ▼
                                                back to healthy

                  429 received
┌──────────┐ ──────────────────────▶ ┌───────────┐
│ any state│                         │ throttled │
│          │ ◀────────────────────── │           │
└──────────┘   60s cooldown expires  └───────────┘
State Definitions
| State | Condition | Behavior |
|---|---|---|
| healthy | No recent failures, or a successful probe | Requests routed normally |
| degraded | 3 or 4 consecutive 5xx errors | Requests still routed, but flagged. The router deprioritizes this provider when alternatives exist. |
| down | 5+ consecutive 5xx errors | Requests blocked. One probe request allowed every 30 seconds to test recovery. |
| throttled | Received HTTP 429 | Requests blocked for 60 seconds. Automatic recovery after cooldown. |
Granularity: Per Provider:Model
The circuit breaker operates at the provider:model level, not the provider level. This is an important distinction:
- If Claude Sonnet 4.6 goes down on one provider, only requests for that specific model via that provider are affected.
- Other models on the same provider remain healthy and continue serving traffic.
- The failing model's chain triggers failover to alternative providers, while all other models operate normally.
This per-model granularity prevents a single model's issues from cascading to the entire provider.
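One way to picture this granularity is a health map keyed by the combined provider:model string. The sketch below is illustrative only: `healthByKey`, `healthKey`, `getHealth`, and the provider and model names are hypothetical, not taken from the gateway's code.

```typescript
// Hypothetical names throughout; the source does not show the gateway's real types.
type State = "healthy" | "degraded" | "down" | "throttled";

interface Health {
  state: State;
  consecutiveFailures: number;
}

// One entry per provider:model pair, so failures never leak across models.
const healthByKey = new Map<string, Health>();

function healthKey(provider: string, model: string): string {
  return `${provider}:${model}`;
}

function getHealth(provider: string, model: string): Health {
  const key = healthKey(provider, model);
  let health = healthByKey.get(key);
  if (health === undefined) {
    // Pairs that have never been seen start out healthy.
    health = { state: "healthy", consecutiveFailures: 0 };
    healthByKey.set(key, health);
  }
  return health;
}

// Marking one model down leaves its sibling on the same provider untouched.
getHealth("provider-a", "model-x").state = "down";
console.log(getHealth("provider-a", "model-x").state); // "down"
console.log(getHealth("provider-a", "model-y").state); // "healthy"
```

Because the key includes the model, a lookup for a different model on the same provider never sees the failing entry.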
State Transitions
Healthy to Degraded (3 failures)
When a provider returns three consecutive 5xx responses for a specific model, the state moves to degraded. The breaker still allows requests through -- this is a warning state, not a blocking state.
Degraded to Down (5 failures)
Two more 5xx responses (total 5 consecutive) move the state to down. The breaker now blocks all requests to this provider-model pair, except for periodic probe requests.
Down to Healthy (probe success)
Every 30 seconds, the breaker allows a single request through as a probe. If that request succeeds, the state resets to healthy with zero failures. If it fails, the state stays down and the next probe happens 30 seconds later.
Any State to Throttled (429)
When the upstream returns HTTP 429 (rate limited), the state immediately becomes throttled regardless of the current state. All requests are blocked for 60 seconds. After the cooldown, the state resets to healthy.
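The outcome-driven transitions above can be sketched as a small pure function. This is a minimal illustration under stated assumptions: the `Health` shape, the `Outcome` names, and `transition` are hypothetical, and the time-based parts (the 30-second probe and the cooldown expiry) are deliberately left out.

```typescript
// Hypothetical types and names; the source does not show the real implementation.
type State = "healthy" | "degraded" | "down" | "throttled";
type Outcome = "success" | "serverError" | "rateLimited";

interface Health {
  state: State;
  consecutiveFailures: number;
  throttledUntil: number; // epoch ms; 0 when not throttled
}

const DEGRADED_AFTER = 3; // consecutive 5xx responses
const DOWN_AFTER = 5; // consecutive 5xx responses
const THROTTLE_MS = 60 * 1000;

// Pure transition: returns the next health record for one observed outcome.
function transition(h: Health, outcome: Outcome, now: number): Health {
  if (outcome === "success") {
    // Any success resets the pair to healthy with zero failures.
    return { state: "healthy", consecutiveFailures: 0, throttledUntil: 0 };
  }
  if (outcome === "rateLimited") {
    // A 429 overrides whatever state the pair was in.
    return {
      state: "throttled",
      consecutiveFailures: h.consecutiveFailures,
      throttledUntil: now + THROTTLE_MS,
    };
  }
  // Server error: count it and escalate once a threshold is reached.
  const failures = h.consecutiveFailures + 1;
  let state: State = h.state;
  if (failures >= DOWN_AFTER) state = "down";
  else if (failures >= DEGRADED_AFTER) state = "degraded";
  return { state, consecutiveFailures: failures, throttledUntil: h.throttledUntil };
}

// Five consecutive 5xx responses walk a healthy pair down to "down".
let h: Health = { state: "healthy", consecutiveFailures: 0, throttledUntil: 0 };
const seen: State[] = [];
for (let i = 0; i < 5; i++) {
  h = transition(h, "serverError", 0);
  seen.push(h.state);
}
console.log(seen.join(" -> ")); // healthy -> healthy -> degraded -> degraded -> down
```

Keeping the transition pure (the current time is passed in rather than read from a clock) makes the thresholds easy to test in isolation.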
Automatic Reset
If no requests are made to a specific provider-model pair for 5 minutes, the health state automatically expires and resets to healthy. This prevents stale data from blocking traffic indefinitely.
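A hedged sketch of how such an expiry might work: each entry carries a last-activity timestamp, and a lookup treats anything idle for five minutes as absent. The names `Entry`, `lastSeen`, and `lookup` are hypothetical, and the sketch assumes `lastSeen` is refreshed elsewhere whenever a request for the pair is handled.

```typescript
// Hypothetical names; assumes lastSeen is updated wherever requests are handled.
const IDLE_TTL_MS = 5 * 60 * 1000;

interface Entry {
  state: "healthy" | "degraded" | "down" | "throttled";
  consecutiveFailures: number;
  lastSeen: number; // epoch ms of the most recent request for this pair
}

const entries = new Map<string, Entry>();

function lookup(key: string, now: number): Entry {
  const existing = entries.get(key);
  if (existing !== undefined && now - existing.lastSeen < IDLE_TTL_MS) {
    return existing;
  }
  // Missing, or idle for five minutes or more: start over as healthy.
  const fresh: Entry = { state: "healthy", consecutiveFailures: 0, lastSeen: now };
  entries.set(key, fresh);
  return fresh;
}

entries.set("provider-a:model-x", { state: "down", consecutiveFailures: 5, lastSeen: 0 });
console.log(lookup("provider-a:model-x", 4 * 60 * 1000).state); // "down"
console.log(lookup("provider-a:model-x", 5 * 60 * 1000).state); // "healthy"
```

Expiring lazily at lookup time avoids a background timer; a periodic sweep would be an equally valid choice.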
Interaction with Routing
The router calls isAvailable() for each provider in the chain before sending a request. The check is fast and returns:
| State | isAvailable() returns |
|---|---|
| healthy | true |
| degraded | true (still accepting requests) |
| down | false (except for one probe every 30s) |
| throttled | false until throttledUntil timestamp passes |
When a provider is unavailable, the router skips it and tries the next one in the chain. No upstream request is made, saving the round-trip latency.
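The availability check and the skip-to-next behavior could look roughly like the following. Only `isAvailable` and `throttledUntil` appear in the text above; `lastProbeAt`, `pickProvider`, and the signatures are assumptions made for this sketch, and the lower preference for degraded providers is not modeled.

```typescript
// Sketch only: lastProbeAt, pickProvider, and these signatures are assumptions.
type State = "healthy" | "degraded" | "down" | "throttled";

interface Health {
  state: State;
  throttledUntil: number; // epoch ms
  lastProbeAt: number; // epoch ms of the last probe let through while down
}

const PROBE_INTERVAL_MS = 30 * 1000;

function isAvailable(h: Health, now: number): boolean {
  if (h.state === "throttled") {
    if (now < h.throttledUntil) return false;
    h.state = "healthy"; // cooldown expired
    return true;
  }
  if (h.state === "down") {
    if (now - h.lastProbeAt >= PROBE_INTERVAL_MS) {
      h.lastProbeAt = now; // claim the single probe slot for this interval
      return true;
    }
    return false;
  }
  return true; // healthy or degraded
}

// Walk the chain in order and return the first provider that accepts traffic.
function pickProvider(chain: string[], health: Map<string, Health>, now: number): string | undefined {
  for (const provider of chain) {
    const h = health.get(provider);
    if (h === undefined || isAvailable(h, now)) return provider;
  }
  return undefined;
}

const health = new Map<string, Health>([
  ["provider-a", { state: "down", throttledUntil: 0, lastProbeAt: 100000 }],
  ["provider-b", { state: "throttled", throttledUntil: 160000, lastProbeAt: 0 }],
  ["provider-c", { state: "degraded", throttledUntil: 0, lastProbeAt: 0 }],
]);
const chain = ["provider-a", "provider-b", "provider-c"];
console.log(pickProvider(chain, health, 110000)); // "provider-c"
console.log(pickProvider(chain, health, 130000)); // "provider-a" (probe allowed)
console.log(pickProvider(chain, health, 131000)); // "provider-c"
```

Skipped providers cost only a map lookup and a timestamp comparison, which is why no upstream round trip is spent on them.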
Recording Outcomes
After every upstream request completes, the gateway records the outcome:
- Success -- resets the health state to healthy
- Server error (5xx) -- increments failure count, may escalate state
- Rate limited (429) -- sets throttled for 60 seconds
A single successful response resets the health state entirely. There is no "recovery period" or gradual ramp-up -- one success means the provider is healthy again for that model.
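As an illustration, the three outcome categories can be derived from the upstream HTTP status. The function name `classifyStatus` is hypothetical. The source does not say how other statuses (3xx, or 4xx other than 429) are treated, so this sketch assumes 2xx means success and returns `undefined` for anything else, leaving the health state unchanged.

```typescript
type Outcome = "success" | "serverError" | "rateLimited";

// Assumption: 2xx counts as success; statuses the source does not mention
// (3xx, non-429 4xx) return undefined so the caller leaves health unchanged.
function classifyStatus(status: number): Outcome | undefined {
  if (status === 429) return "rateLimited";
  if (status >= 500 && status <= 599) return "serverError";
  if (status >= 200 && status <= 299) return "success";
  return undefined;
}

console.log(classifyStatus(200)); // "success"
console.log(classifyStatus(503)); // "serverError"
console.log(classifyStatus(429)); // "rateLimited"
console.log(classifyStatus(404)); // undefined
```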
Design Decisions
Why not per-provider (without model)? A single provider hosts many models. A capacity issue with one model should not affect routing for other models on the same provider.
Why 5 failures, not 3? The down threshold is set at 5 consecutive failures to avoid premature circuit opening from transient errors. The degraded state at 3 failures provides early warning.
Why 30s probe interval? It balances recovery speed against unnecessary load. With a 30-second interval, the breaker detects recovery at most about 30 seconds after the underlying issue is resolved.
Why 60s throttle? Upstream 429 responses typically include a Retry-After header suggesting 30-60 seconds. The 60-second cooldown is conservative and avoids hammering a rate-limited provider.
Next Steps
- Automatic Failover — what happens when a provider goes down
- Provider Routing — how the router selects a provider
- Upstream Providers — the infrastructure behind each model