Automatic Failover

When an upstream provider fails, the gateway transparently retries with the next provider in the chain within the same HTTP request.

How It Works

The failover mechanism is built into the routing loop. When the gateway sends a request to an upstream provider and receives a failure response, three things happen:

  1. The circuit breaker records the failure for that specific provider:model pair
  2. The router moves to the next provider in the chain
  3. The request is re-sent to the new provider (after protocol translation if needed)

The user's request stays open during this process. From the client's perspective, latency may increase slightly (one extra round-trip per failed provider), but the request still succeeds as long as a later provider in the chain responds successfully.
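The three steps above can be sketched as a single loop. This is a minimal illustration, not the gateway's actual code: `UpstreamResult`, `Provider`, `route`, and the plain `failures` dictionary are hypothetical names, the failure counter is keyed by provider name only, and protocol translation is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class UpstreamResult:
    status: Optional[int]  # None models a timeout (no response)
    body: str = ""


# Hypothetical shape: a provider is a callable taking the request payload.
Provider = Callable[[dict], UpstreamResult]


def is_provider_failure(result: UpstreamResult) -> bool:
    """Timeouts, 429, and 5xx are provider-side failures; anything else is final."""
    return result.status is None or result.status == 429 or result.status >= 500


def route(
    chain: List[Tuple[str, Provider]],
    request: dict,
    failures: Dict[str, int],
) -> UpstreamResult:
    """Try each provider in priority order within a single client request."""
    for name, call in chain:
        result = call(request)  # step 3: (re-)send to the current provider
        if not is_provider_failure(result):
            return result  # success, or a client error returned as-is
        failures[name] = failures.get(name, 0) + 1  # step 1: record the failure
        # step 2: fall through to the next provider in the chain
    # Every provider failed: surface a 503 to the client.
    return UpstreamResult(status=503, body="ProviderUnavailableError")


# Usage: provider "a" fails with 503, provider "b" succeeds.
chain = [
    ("a", lambda req: UpstreamResult(503)),
    ("b", lambda req: UpstreamResult(200, "ok")),
]
failures: Dict[str, int] = {}
result = route(chain, {"model": "anthropic/claude-sonnet-4-6"}, failures)
print(result.status, failures)
```

Because the loop runs inside one call to `route`, the client sees a single request and a single response, which is what keeps the failover transparent.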

Trigger Conditions

| Upstream Response | Gateway Behavior |
|---|---|
| HTTP 5xx (server error) | Record the failure in the circuit breaker and try the next provider. After 5 consecutive 5xx errors for this provider:model pair, the circuit opens (state: down). |
| HTTP 429 (rate limited) | Set the circuit to the throttled state for 60 seconds and try the next provider immediately. |
| Timeout (no response) | Treat as a failure and try the next provider. |
| Other HTTP 4xx (client error) | No failover. Return the error to the user. These responses indicate bad input, not provider problems. |
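The trigger conditions amount to a small decision function. The sketch below is one way to express it, assuming a hypothetical `Action` enum and a `classify` helper that are not part of the gateway's documented API. A timeout is modelled as a status of `None`, and 429 is checked before the general 4xx range because it is the one 4xx status that triggers failover.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    RETURN_RESPONSE = "return response to user"
    RETURN_ERROR = "return error to user, no failover"
    FAIL_OVER = "record failure, try next provider"
    THROTTLE_AND_FAIL_OVER = "mark throttled for 60s, try next provider"


def classify(status: Optional[int]) -> Action:
    """Map an upstream outcome to a gateway action; None means a timeout."""
    if status is None:
        return Action.FAIL_OVER
    if status == 429:  # checked before the generic 4xx branch
        return Action.THROTTLE_AND_FAIL_OVER
    if 500 <= status <= 599:
        return Action.FAIL_OVER
    if 400 <= status <= 499:
        return Action.RETURN_ERROR
    return Action.RETURN_RESPONSE
```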

Worked Example: Claude Sonnet 4.6

anthropic/claude-sonnet-4-6 is served by a chain of three providers for high availability.

Scenario: Primary provider returns 503

                 ┌─────────────────────────────────────────────────────┐
                 │ User Request                                       │
                 │ POST /v1/chat/completions                          │
                 │ model: "anthropic/claude-sonnet-4-6"               │
                 └────────────────────┬────────────────────────────────┘
                                      │
                     ┌────────────────▼─────────────────────┐
                     │ Router: Try Provider A (priority 1)   │
                     └────────────────┬─────────────────────┘
                                      │
                ┌─────────────────────▼──────────────────────┐
                │ Provider A → 503 Service Unavailable       │
                │ Health monitor: record failure             │
                └─────────────────────┬──────────────────────┘
                                      │
                     ┌────────────────▼─────────────────────┐
                     │ Router: Try Provider B (priority 2)   │
                     └────────────────┬─────────────────────┘
                                      │
                ┌─────────────────────▼──────────────────────┐
                │ Provider B → 200 OK                        │
                │ Response streamed to user                  │
                └────────────────────────────────────────────┘

The user receives a successful response. The only observable effect is slightly higher latency (one extra upstream round-trip).

Scenario: Primary provider is already marked as down

When the health monitor has detected ongoing failures for a provider, the router skips it entirely without making a network request. This avoids spending latency on a provider that is already known to be failing.
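A minimal sketch of how the skip could work, assuming the threshold of 5 consecutive failures from the trigger conditions. The `CircuitBreaker` class and the `candidates` helper are hypothetical, and the sketch does not model how a provider marked down is later probed and restored, since that mechanism is not described here.

```python
from typing import Dict, List, Set


class CircuitBreaker:
    """Tracks consecutive failures per provider:model pair."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        self._consecutive: Dict[str, int] = {}
        self._down: Set[str] = set()

    @staticmethod
    def _key(provider: str, model: str) -> str:
        return f"{provider}:{model}"

    def record_failure(self, provider: str, model: str) -> None:
        key = self._key(provider, model)
        self._consecutive[key] = self._consecutive.get(key, 0) + 1
        if self._consecutive[key] >= self.threshold:
            self._down.add(key)  # circuit opens (state: down)

    def record_success(self, provider: str, model: str) -> None:
        # Assumption: a success breaks the run of consecutive failures.
        self._consecutive[self._key(provider, model)] = 0

    def is_down(self, provider: str, model: str) -> bool:
        return self._key(provider, model) in self._down


def candidates(chain: List[str], model: str, breaker: CircuitBreaker) -> List[str]:
    """Drop providers whose circuit is open, without any network call."""
    return [p for p in chain if not breaker.is_down(p, model)]


# Usage: five consecutive failures open the circuit for provider-a only.
breaker = CircuitBreaker()
model = "anthropic/claude-sonnet-4-6"
for _ in range(5):
    breaker.record_failure("provider-a", model)
print(candidates(["provider-a", "provider-b", "provider-c"], model, breaker))
```

Keying the state by provider:model pair means a provider that is down for one model can still serve other models.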

Worked Example: DeepSeek V3.2

deepseek/deepseek-chat (V3.2) is served by a chain of three providers that span different cloud ecosystems and geographic regions.

Scenario: Primary provider returns 429 (rate limited)

                     ┌─────────────────────────────────────┐
                     │ Router: Try Provider A (priority 1) │
                     └────────────────┬────────────────────┘
                                      │
                ┌─────────────────────▼──────────────────────┐
                │ Provider A → 429 Too Many Requests         │
                │ Health monitor: set "throttled" for 60s    │
                └─────────────────────┬──────────────────────┘
                                      │
                     ┌────────────────▼─────────────────────┐
                     │ Router: Try Provider B (priority 2)   │
                     └────────────────┬─────────────────────┘
                                      │
                ┌─────────────────────▼──────────────────────┐
                │ Provider B → 200 OK                        │
                │ Response returned to user                  │
                └────────────────────────────────────────────┘

For the next 60 seconds, requests automatically skip the throttled provider and go to the backup. After the cooldown window expires, the primary provider is tried again.
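The cooldown window can be sketched as a per-provider expiry timestamp. `ThrottleTracker` and its method names are hypothetical; the clock is injectable so that the 60-second window can be exercised without waiting.

```python
import time
from typing import Callable, Dict


class ThrottleTracker:
    """Remembers until when each provider:model pair should be skipped."""

    def __init__(
        self,
        cooldown_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._until: Dict[str, float] = {}

    def mark_throttled(self, key: str) -> None:
        # Called when the upstream returns 429.
        self._until[key] = self._clock() + self.cooldown_seconds

    def is_throttled(self, key: str) -> bool:
        until = self._until.get(key)
        return until is not None and self._clock() < until


# Usage with a fake clock: throttled at t=0, still skipped at t=59.9,
# eligible again once the window has expired at t=60.
now = [0.0]
tracker = ThrottleTracker(clock=lambda: now[0])
key = "provider-a:deepseek/deepseek-chat"
tracker.mark_throttled(key)
now[0] = 59.9
still_skipped = tracker.is_throttled(key)
now[0] = 60.0
eligible_again = not tracker.is_throttled(key)
```

A monotonic clock is used as the default because wall-clock adjustments should not shorten or lengthen the cooldown.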

This cross-ecosystem failover means a rate limit on one cloud provider triggers automatic routing to a different provider hosting the same model, with no user intervention and no configuration change.

What Failover Does NOT Do

  • Does not retry on 4xx errors other than 429. If the user sends an invalid request (bad JSON, unsupported parameter, missing required field), that error is returned directly. Switching providers would not help.

  • Does not retry within the same provider. If Bedrock returns 500, the router tries Vertex next. It does not retry Bedrock with the same request.

  • Does not change the model. Failover only switches the upstream provider serving the same model. It never substitutes a different model (e.g., falling back from Sonnet to Haiku). For model-level substitution, see Smart Routing.

  • Does not mask persistent outages. If all providers in a chain are down, the request fails with a 503 ProviderUnavailableError. The model disappears from GET /v1/models until at least one provider recovers.
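The last point, a model dropping out of GET /v1/models while its whole chain is down, can be illustrated with a small filter. This is only a sketch under assumptions: `visible_models`, the `chains` mapping, and the provider names are hypothetical and do not describe the gateway's real data structures.

```python
from typing import Callable, Dict, List


def visible_models(
    chains: Dict[str, List[str]],
    is_down: Callable[[str, str], bool],
) -> List[str]:
    """Return models that still have at least one provider not marked down."""
    return [
        model
        for model, providers in chains.items()
        if any(not is_down(provider, model) for provider in providers)
    ]


# Usage: every provider for model-x is down, so only model-y stays listed.
chains = {
    "model-x": ["provider-a", "provider-b"],
    "model-y": ["provider-a", "provider-c"],
}
down = {("provider-a", "model-x"), ("provider-b", "model-x"), ("provider-a", "model-y")}
listed = visible_models(chains, lambda provider, model: (provider, model) in down)
print(listed)
```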

Availability Impact

The more providers in a model's chain, the higher its effective availability:

| Chain Depth | Availability (per provider at 99%) | Example Models |
|---|---|---|
| 1 provider | 99.0% | openai/o3, doubao/doubao-1.5-pro-256k |
| 2 providers | 99.99% | openai/gpt-5, deepseek/deepseek-v3.1 |
| 3 providers | 99.9999% | anthropic/claude-sonnet-4-6, deepseek/deepseek-chat |
| 4 providers | 99.999999% | moonshot/kimi-k2.5, meta/llama-4-maverick |

These are theoretical maximums that assume independent failures. In practice, correlated failures (e.g., a model vendor's global outage affecting all providers simultaneously) reduce the benefit. Even so, multi-provider chains provide substantial resilience compared to single-provider routing.
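Under the independence assumption, a chain is unavailable only when every provider in it fails at the same time, so a chain of n providers that are each available with probability p has availability 1 - (1 - p)^n. The helper below is a small sketch of that arithmetic; the function name is illustrative.

```python
def chain_availability(per_provider: float, depth: int) -> float:
    """Availability of a chain of `depth` independent providers."""
    if not 0.0 <= per_provider <= 1.0:
        raise ValueError("per_provider must be a probability between 0 and 1")
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # The chain fails only if all providers fail simultaneously.
    return 1.0 - (1.0 - per_provider) ** depth


# Usage: print the theoretical availability for chains of 1 to 4 providers.
for depth in range(1, 5):
    print(depth, f"{chain_availability(0.99, depth) * 100:.6f}%")
```

Each additional provider multiplies the remaining downtime by the per-provider failure rate, which is why the gains shrink in absolute terms as the chain grows.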

The overall platform targets 99.4%+ availability across the full model catalog, based on the diversity of upstream infrastructure.
