Circuit Breaker

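The circuit breaker tracks health for each provider:model pair across four states (healthy, degraded, down, throttled). Unavailable pairs are skipped before any upstream request is made, which prevents cascading failures and avoids spending latency on requests that are likely to fail.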

Four-State Model

Each provider-model combination maintains its own independent health state:

              3 consecutive 5xx          5 consecutive 5xx
  ┌──────────┐  ──────────▶  ┌──────────┐  ──────────▶  ┌──────────┐
  │ healthy  │               │ degraded │               │   down   │
  │          │  ◀──────────  │          │               │          │
  └──────────┘   success     └──────────┘               └──────────┘
                                                              │
                                                     30s probe │ success
                                                              ▼
                                                        back to healthy


                          429 received
  ┌──────────┐  ──────────────────────▶  ┌───────────┐
  │ any state│                           │ throttled │
  │          │  ◀──────────────────────  │           │
  └──────────┘     60s cooldown expires  └───────────┘

State Definitions

State	Condition	Behavior
healthy	Fewer than 3 consecutive 5xx errors, or state reset by a successful response (including a probe)	Requests routed normally
degraded	3 or 4 consecutive 5xx errors	Requests still routed, but flagged. The router deprioritizes this provider when alternatives exist.
down	5+ consecutive 5xx errors	Requests blocked. One probe request allowed every 30 seconds to test recovery.
throttled	Received HTTP 429	Requests blocked for 60 seconds. Automatic recovery after cooldown.

Granularity: Per Provider:Model

The circuit breaker operates at the provider:model level, not the provider level. This is an important distinction:

If Claude Sonnet 4.6 goes down on one provider, only requests for that specific model via that provider are affected.
Other models on the same provider remain healthy and continue serving traffic.
The failing model's chain triggers failover to alternative providers, while all other models operate normally.

This per-model granularity prevents a single model's issues from cascading to the entire provider.
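
A minimal sketch of per-pair tracking, assuming an in-memory map keyed by a `provider:model` string. The names `healthKey` and `HealthStore`, and the provider and model identifiers, are illustrative and not taken from the gateway's code.

```typescript
type HealthState = "healthy" | "degraded" | "down" | "throttled";

// Hypothetical helper: one key per provider-model pair.
function healthKey(provider: string, model: string): string {
  return `${provider}:${model}`;
}

// Hypothetical in-memory store. Pairs with no entry are treated as healthy.
class HealthStore {
  private states = new Map<string, HealthState>();

  get(provider: string, model: string): HealthState {
    return this.states.get(healthKey(provider, model)) ?? "healthy";
  }

  set(provider: string, model: string, state: HealthState): void {
    this.states.set(healthKey(provider, model), state);
  }
}

const store = new HealthStore();
store.set("provider-a", "model-x", "down");

// Only the failing pair is affected; a sibling model on the same provider is not.
const failing = store.get("provider-a", "model-x"); // "down"
const sibling = store.get("provider-a", "model-y"); // "healthy"
```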

State Transitions

Healthy to Degraded (3 failures)

When a provider returns three consecutive 5xx responses for a specific model, the state moves to degraded. The breaker still allows requests through -- this is a warning state, not a blocking state.

Degraded to Down (5 failures)

Two more consecutive 5xx responses (five in a row in total) move the state to down. The breaker now blocks all requests to this provider-model pair, except for periodic probe requests.
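
The two failure thresholds can be sketched as a pure function of the consecutive-failure count. The function and constant names are hypothetical; only the values 3 and 5 come from the text above.

```typescript
type FailureState = "healthy" | "degraded" | "down";

// Thresholds taken from the text above.
const DEGRADED_AFTER = 3;
const DOWN_AFTER = 5;

// Hypothetical helper: map a count of consecutive 5xx responses to a state.
function stateForFailures(consecutiveFailures: number): FailureState {
  if (consecutiveFailures >= DOWN_AFTER) return "down";
  if (consecutiveFailures >= DEGRADED_AFTER) return "degraded";
  return "healthy";
}

// States after the 1st through 5th consecutive 5xx response.
const progression = [1, 2, 3, 4, 5].map((n) => stateForFailures(n));
// healthy, healthy, degraded, degraded, down
```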

Down to Healthy (probe success)

Every 30 seconds, the breaker allows a single request through as a probe. If that request succeeds, the state resets to healthy with zero failures. If it fails, the state stays down and the next probe happens 30 seconds later.
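
One way to gate probes is to store the time of the last probe and let a request through only once the interval has elapsed. This is a sketch under assumptions: `DownEntry`, `lastProbeAt`, and `tryAcquireProbe` are hypothetical names, and the first probe is assumed to become available 30 seconds after the pair goes down.

```typescript
const PROBE_INTERVAL_MS = 30_000; // 30 seconds, from the text above

// Hypothetical record for a pair that is currently down.
interface DownEntry {
  // Epoch ms of the last probe let through, or of the moment the pair went down.
  lastProbeAt: number;
}

// Returns true if a probe may pass now, and claims the slot so that other
// callers inside the same 30-second window are still blocked.
function tryAcquireProbe(entry: DownEntry, now: number): boolean {
  if (now - entry.lastProbeAt < PROBE_INTERVAL_MS) return false;
  entry.lastProbeAt = now;
  return true;
}

const downEntry: DownEntry = { lastProbeAt: 0 };
const blockedEarly = tryAcquireProbe(downEntry, 10_000); // false: only 10s elapsed
const firstProbe = tryAcquireProbe(downEntry, 30_000);   // true: 30s elapsed
const secondCaller = tryAcquireProbe(downEntry, 30_500); // false: slot already claimed
const nextProbe = tryAcquireProbe(downEntry, 60_000);    // true: next window
```

Claiming the slot inside the check keeps a burst of concurrent requests from all being treated as probes.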

Any State to Throttled (429)

When the upstream returns HTTP 429 (rate limited), the state immediately becomes throttled regardless of the current state. All requests are blocked for 60 seconds. After the cooldown, the state resets to healthy.
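
The throttle cooldown can be sketched with a single timestamp. The `throttledUntil` field name appears in the routing table later in this page; the `ThrottleInfo` type and both function names are assumptions made for illustration.

```typescript
const THROTTLE_COOLDOWN_MS = 60_000; // fixed 60-second cooldown, from the text above

interface ThrottleInfo {
  throttledUntil: number | null; // epoch ms; null when not throttled
}

// Called when the upstream returns HTTP 429.
function onRateLimited(info: ThrottleInfo, now: number): void {
  info.throttledUntil = now + THROTTLE_COOLDOWN_MS;
}

// Returns true while the cooldown is active and clears it once it has expired.
function isThrottled(info: ThrottleInfo, now: number): boolean {
  if (info.throttledUntil === null) return false;
  if (now >= info.throttledUntil) {
    info.throttledUntil = null; // cooldown expired
    return false;
  }
  return true;
}

const info: ThrottleInfo = { throttledUntil: null };
onRateLimited(info, 1_000);                       // 429 received at t = 1s
const duringCooldown = isThrottled(info, 60_999); // true
const afterCooldown = isThrottled(info, 61_000);  // false
```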

Automatic Reset

If no requests are made to a specific provider-model pair for 5 minutes, the health state automatically expires and resets to healthy. This prevents stale data from blocking traffic indefinitely.
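
A sketch of the idle reset, assuming each entry records when the last request for its pair was seen. `TrackedHealth`, `lastRequestAt`, and `expireIfIdle` are hypothetical names, and whether blocked requests count as activity is not specified in this page.

```typescript
const IDLE_TTL_MS = 5 * 60 * 1000; // 5 minutes, from the text above

interface TrackedHealth {
  state: "healthy" | "degraded" | "down" | "throttled";
  consecutiveFailures: number;
  lastRequestAt: number; // epoch ms of the last request seen for this pair (assumed field)
}

// Returns a healthy entry if the pair has been idle for the full TTL,
// otherwise returns the entry unchanged.
function expireIfIdle(h: TrackedHealth, now: number): TrackedHealth {
  if (now - h.lastRequestAt >= IDLE_TTL_MS) {
    return { state: "healthy", consecutiveFailures: 0, lastRequestAt: h.lastRequestAt };
  }
  return h;
}

const stale: TrackedHealth = { state: "down", consecutiveFailures: 5, lastRequestAt: 0 };
const stillDown = expireIfIdle(stale, 299_999); // just under 5 minutes: unchanged
const expired = expireIfIdle(stale, 300_000);   // 5 minutes idle: reset to healthy
```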

Interaction with Routing

Before trying each provider in the chain, the router calls `isAvailable()` for that provider-model pair. The check reads local health state only and makes no upstream request. It returns:

State	`isAvailable()` returns
healthy	`true`
degraded	`true` (still accepting requests)
down	`false` (except for one probe every 30s)
throttled	`false` until `throttledUntil` timestamp passes

When a provider is unavailable, the router skips it and tries the next one in the chain. No upstream request is made, saving the round-trip latency.
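
The availability table and the skip behavior can be sketched together. Only `isAvailable` and `throttledUntil` are named in this page; the `Health` shape, the `nextProbeAt` field, and `firstAvailable` are assumptions. For brevity, this sketch ignores the lower preference given to degraded providers.

```typescript
type HealthState = "healthy" | "degraded" | "down" | "throttled";

interface Health {
  state: HealthState;
  throttledUntil: number; // epoch ms, meaningful when state is "throttled"
  nextProbeAt: number;    // epoch ms, meaningful when state is "down" (hypothetical field)
}

function isAvailable(h: Health, now: number): boolean {
  if (h.state === "down") return now >= h.nextProbeAt;
  if (h.state === "throttled") return now >= h.throttledUntil;
  return true; // healthy and degraded both accept requests
}

// Walk the chain in order and return the first available provider:model key.
// Pairs with no recorded health are treated as healthy.
function firstAvailable(
  chain: string[],
  health: Map<string, Health>,
  now: number,
): string | undefined {
  return chain.find((key) => {
    const h = health.get(key);
    return h === undefined || isAvailable(h, now);
  });
}

const health = new Map<string, Health>([
  ["provider-a:model-x", { state: "down", throttledUntil: 0, nextProbeAt: 100_000 }],
  ["provider-b:model-x", { state: "throttled", throttledUntil: 50_000, nextProbeAt: 0 }],
]);
const chain = ["provider-a:model-x", "provider-b:model-x", "provider-c:model-x"];

const early = firstAvailable(chain, health, 10_000);  // "provider-c:model-x"
const later = firstAvailable(chain, health, 60_000);  // "provider-b:model-x"
const probe = firstAvailable(chain, health, 100_000); // "provider-a:model-x"
```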

Recording Outcomes

After every upstream request completes, the gateway records the outcome:

Success -- resets the health state to healthy
Server error (5xx) -- increments failure count, may escalate state
Rate limited (429) -- sets throttled for 60 seconds

A single successful response resets the health state entirely. There is no "recovery period" or gradual ramp-up -- one success means the provider is healthy again for that model.
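
The three outcomes can be sketched as one update function over the HTTP status code. The names are hypothetical, and several details are assumptions this page does not state: 2xx counts as success, a 429 clears the failure count, and any other status leaves the state unchanged.

```typescript
type HealthState = "healthy" | "degraded" | "down" | "throttled";

interface Health {
  state: HealthState;
  consecutiveFailures: number;
  throttledUntil: number; // epoch ms
}

function recordOutcome(h: Health, status: number, now: number): Health {
  if (status === 429) {
    // Rate limited: block for 60 seconds (assumption: the failure count is cleared).
    return { state: "throttled", consecutiveFailures: 0, throttledUntil: now + 60_000 };
  }
  if (status >= 500 && status <= 599) {
    // Server error: count it and escalate at 3 and 5 consecutive failures.
    const n = h.consecutiveFailures + 1;
    const state: HealthState = n >= 5 ? "down" : n >= 3 ? "degraded" : h.state;
    return { ...h, state, consecutiveFailures: n };
  }
  if (status >= 200 && status < 300) {
    // Success: full reset, with no gradual ramp-up.
    return { state: "healthy", consecutiveFailures: 0, throttledUntil: 0 };
  }
  return h; // assumption: other statuses do not affect health
}

let current: Health = { state: "healthy", consecutiveFailures: 0, throttledUntil: 0 };
const trace: HealthState[] = [];
for (const status of [500, 500, 500, 500, 500, 200, 429]) {
  current = recordOutcome(current, status, 1_000);
  trace.push(current.state);
}
// trace: healthy, healthy, degraded, degraded, down, healthy, throttled
```

In the trace, the 200 that follows the fifth 500 stands in for a successful probe, since ordinary requests are blocked while the pair is down.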

Design Decisions

Why not per-provider (without model)? A single provider hosts many models. A capacity issue with one model should not affect routing for other models on the same provider.

Why 5 failures, not 3? The down threshold is set at 5 consecutive failures to avoid premature circuit opening from transient errors. The degraded state at 3 failures provides early warning.

Why 30s probe interval? It balances recovery speed against unnecessary load. A recovered provider returns to service at most about 30 seconds after the underlying issue is resolved, while a provider that is still failing receives only one request every 30 seconds.

Why 60s throttle? Upstream 429 responses typically include a Retry-After header suggesting 30-60 seconds. The 60-second cooldown is conservative and avoids hammering a rate-limited provider.

Next Steps

Automatic Failover — what happens when a provider goes down
Provider Routing — how the router selects a provider
Upstream Providers — the infrastructure behind each model