Error Handling

Error Response Formats

OpenAI Protocol (/v1/*)

All errors follow the OpenAI error format:

json
{
  "error": {
    "message": "Insufficient balance. Current: $0.12, Required: $0.50",
    "type": "insufficient_quota",
    "code": "402"
  }
}

Anthropic Protocol (/anthropic/*)

Errors from the Anthropic protocol use the Anthropic error format:

json
{
  "type": "error",
  "error": {
    "type": "authentication_error",
    "message": "Invalid API key"
  }
}
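
Because the two protocols nest the same fields differently, client code that talks to both can normalize them into one shape. The following is a minimal sketch based only on the two example bodies above; `GatewayError` and `parse_error` are hypothetical names, not part of any SDK.

```python
import json
from typing import NamedTuple, Optional


class GatewayError(NamedTuple):
    """Common shape for errors from either protocol (hypothetical helper type)."""
    type: str
    message: str
    code: Optional[str]  # Present in the OpenAI format only


def parse_error(body: str) -> GatewayError:
    """Parse an error body in either the OpenAI or the Anthropic format."""
    payload = json.loads(body)
    error = payload["error"]  # Both formats keep the details under "error"
    return GatewayError(
        type=error["type"],
        message=error["message"],
        code=error.get("code"),  # None for Anthropic-format errors
    )


openai_body = '{"error": {"message": "Invalid API key", "type": "authentication_error", "code": "401"}}'
anthropic_body = '{"type": "error", "error": {"type": "authentication_error", "message": "Invalid API key"}}'

print(parse_error(openai_body))
print(parse_error(anthropic_body))
```

The sketch assumes every error body contains `error.type` and `error.message`; a production client would also want to handle bodies that are not valid JSON.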

Error Code Reference

| HTTP Status | Error Type | Description | Retryable |
|---|---|---|---|
| 400 | invalid_request_error | Malformed request, missing required fields, invalid parameter values | No |
| 401 | authentication_error | Invalid or missing API key | No |
| 402 | insufficient_quota | Account balance too low for the estimated request cost | No (top up first) |
| 403 | permission_error | API key does not have permission for this model or action | No |
| 404 | not_found | Model does not exist or is not available | No |
| 429 | rate_limit_error | Too many requests. Check the Retry-After header | Yes (after delay) |
| 500 | server_error | Internal gateway error | Yes |
| 502 | upstream_error | Upstream provider returned an invalid response | Yes |
| 503 | service_unavailable | Gateway is overloaded or upstream provider is down | Yes |
| 504 | timeout | Request timed out waiting for upstream provider | Yes |
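
The Retryable column can be encoded as a small lookup so that retry logic does not scatter status codes across the codebase. This is a sketch only; `RETRYABLE_STATUSES` and `is_retryable` are illustrative names.

```python
# Status codes marked "Yes" in the Retryable column of the table above.
RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})


def is_retryable(status: int) -> bool:
    """Return True if the reference table marks this HTTP status as retryable."""
    return status in RETRYABLE_STATUSES


for status in (401, 402, 429, 503):
    print(status, is_retryable(status))
```

Note that 429 is retryable only after a delay, so callers should still honor the Retry-After header before the next attempt.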

Common Errors and Solutions

401: Authentication Error

json
{
  "error": {
    "message": "Invalid API key",
    "type": "authentication_error",
    "code": "401"
  }
}

Causes:

  • API key is missing from the request.
  • API key does not start with ck-.
  • API key has been revoked or deactivated.

Fix: Check that your Authorization: Bearer ck-... header is present and the key is active in the dashboard.
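
The first two causes can be caught locally before a request is sent. The sketch below checks only for presence and the `ck-` prefix; it cannot tell whether a key has been revoked, which only the gateway knows. `build_auth_header` is a hypothetical helper name.

```python
def build_auth_header(api_key: str) -> dict:
    """Validate the key format locally and return the Authorization header."""
    key = (api_key or "").strip()
    if not key:
        raise ValueError("API key is missing")
    if not key.startswith("ck-"):
        raise ValueError("API key must start with 'ck-'")
    return {"Authorization": f"Bearer {key}"}


print(build_auth_header("ck-your-key-here"))
```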

402: Insufficient Balance

json
{
  "error": {
    "message": "Insufficient balance. Current: $0.12",
    "type": "insufficient_quota",
    "code": "402"
  }
}

Fix: Top up your account in the dashboard under Billing. The gateway pre-deducts estimated cost before sending the request upstream. Ensure your balance covers the estimated cost.
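
If an application wants to show users how much to top up, the amounts can be read from the error message. The sketch below assumes the message follows the wording of the examples on this page ("Current: $0.12, Required: $0.50"); that wording is not a documented contract, so treat a failed match as normal. `parse_balance_message` is a hypothetical helper.

```python
import re
from decimal import Decimal
from typing import Dict

# Matches "Current: $0.12" and "Required: $0.50" as they appear in the example messages.
_AMOUNT = re.compile(r"(Current|Required): \$(\d+(?:\.\d+)?)")


def parse_balance_message(message: str) -> Dict[str, Decimal]:
    """Extract the dollar amounts found in an insufficient-balance message."""
    return {label.lower(): Decimal(value) for label, value in _AMOUNT.findall(message)}


amounts = parse_balance_message("Insufficient balance. Current: $0.12, Required: $0.50")
if "current" in amounts and "required" in amounts:
    shortfall = amounts["required"] - amounts["current"]
    print(f"Top up at least ${shortfall}")
```

`Decimal` is used instead of `float` so that currency arithmetic stays exact.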

429: Rate Limited

json
{
  "error": {
    "message": "Rate limit exceeded. Retry after 2 seconds.",
    "type": "rate_limit_error",
    "code": "429"
  }
}

Fix: Wait for the duration specified in the Retry-After response header, then retry. The default rate limit is 60 requests per minute (RPM) per API key.
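
The HTTP specification allows Retry-After to carry either a number of seconds or an HTTP date. The example message above suggests the gateway sends seconds, but that is an assumption; the sketch below accepts both forms. `retry_after_seconds` is a hypothetical helper.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After header value to a non-negative delay in seconds.

    Returns None when the header is absent or cannot be parsed.
    """
    if not value:
        return None
    value = value.strip()
    try:
        return max(0.0, float(value))  # Delay-seconds form, e.g. "2"
    except ValueError:
        pass
    try:
        target = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if target.tzinfo is None:
        target = target.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (target - now).total_seconds())


print(retry_after_seconds("2"))
```

When the function returns None, fall back to exponential backoff.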

504: Timeout

json
{
  "error": {
    "message": "Request timed out after 120 seconds",
    "type": "timeout",
    "code": "504"
  }
}

Fix: Large models (Opus 4, GPT-5, o3) with long outputs may take over 60 seconds. Set your client timeout to at least 120 seconds. Consider using streaming to get partial results faster.

Retry Strategy: Exponential Backoff

For retryable errors (429, 500, 502, 503, 504), use exponential backoff with jitter:

example.py
python
import random
import time

from openai import OpenAI, APIStatusError, APITimeoutError, RateLimitError

client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",
    max_retries=0,  # Disable the SDK's built-in retries; we handle them ourselves
)

RETRYABLE_STATUS_CODES = {500, 502, 503, 504}


def chat_with_retry(messages, model="openai/gpt-4.1", max_retries=5):
    for attempt in range(max_retries):
        # Default delay: exponential backoff plus up to 1s of random jitter
        wait = (2 ** attempt) + random.random()
        try:
            return client.chat.completions.create(
                model=model,
                messages=messages,
                max_tokens=1024,
            )
        except RateLimitError as e:
            # Prefer the Retry-After header (assumed to be in seconds) if present
            retry_after = e.response.headers.get("retry-after")
            if retry_after:
                wait = float(retry_after)
            reason = "Rate limited"
        except APITimeoutError:
            # Client-side timeout: no HTTP response was received
            reason = "Timeout"
        except APIStatusError as e:
            if e.status_code not in RETRYABLE_STATUS_CODES:
                raise  # Non-retryable error (400, 401, 402, 403, 404)
            reason = f"Server error {e.status_code}"
        if attempt < max_retries - 1:
            print(f"{reason}. Retrying in {wait:.1f}s...")
            time.sleep(wait)
    raise RuntimeError(f"Failed after {max_retries} attempts")


response = chat_with_retry([{"role": "user", "content": "Hello"}])
print(response.choices[0].message.content)

index.mjs
javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.chuizi.ai/v1',
  apiKey: 'ck-your-key-here',
  maxRetries: 0, // We handle retries ourselves
});

async function chatWithRetry(messages, model = 'openai/gpt-4.1', maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.chat.completions.create({
        model,
        messages,
        max_tokens: 1024,
      });
    } catch (error) {
      const status = error.status;
      const retryable = [429, 500, 502, 503, 504].includes(status);

      if (!retryable || attempt === maxRetries - 1) {
        throw error;
      }

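      // Use the Retry-After header (seconds) if present, else exponential backoff with jitter.
      // Depending on the SDK version, error.headers may be a plain object or a Headers instance.
      const headers = error.headers;
      const retryAfter = parseFloat(
        typeof headers?.get === 'function' ? headers.get('retry-after') : headers?.['retry-after']
      );
      const wait = Number.isFinite(retryAfter)
        ? retryAfter * 1000
        : Math.pow(2, attempt) * 1000 + Math.random() * 1000;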

      console.log(`Error ${status}. Retrying in ${(wait / 1000).toFixed(1)}s...`);
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
  }
}

const response = await chatWithRetry([{ role: 'user', content: 'Hello' }]);
console.log(response.choices[0].message.content);
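
The delay calculation used in both examples can be isolated and tested on its own. The sketch below keeps the same formula (2^attempt seconds plus up to one second of jitter) and adds an upper bound, `cap`, which is an addition of this sketch and not something the examples above use; without a cap the delay doubles indefinitely as `max_retries` grows.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with jitter: min(cap, base * 2^attempt) + U[0, 1) seconds."""
    exponential = min(cap, base * (2 ** attempt))
    return exponential + random.random()  # Jitter spreads out simultaneous retries


# Print one sample delay for each of the first five attempts.
for attempt in range(5):
    print(f"attempt {attempt}: {backoff_delay(attempt):.2f}s")
```

Jitter matters most when many clients are rate limited at the same moment: without it, they would all retry at the same instant and collide again.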

Idempotency and Safe Retries

All Chuizi.AI POST endpoints are safe to retry because:

  1. Pre-deduction is idempotent. If a request fails after the balance freeze, the frozen amount is automatically released.
  2. Generation IDs are unique. Each request creates a new gen-xxx ID. Retrying creates a new generation, not a duplicate charge on the same one.
  3. No side effects upstream. Chat completion requests are stateless -- the same request sent twice produces two independent completions.

However, each attempt that reaches the upstream provider consumes tokens and incurs charges. The retry strategies above retry only on errors that typically occur before the upstream provider is contacted (rate limits) or after it fails (timeouts, server errors).

Tips

  • Log the x-chuizi-generation-id header. Every response includes a generation ID. When reporting issues, include this ID for faster debugging.
  • Use the OpenAI SDK's built-in retry. Both the Python and Node.js OpenAI SDKs have built-in retry logic. Set max_retries (Python) or maxRetries (Node.js) when initializing the client.
  • Set client-side timeouts. The default HTTP timeout in many libraries is 30 seconds, which is too short for large model responses. Set it to 120-300 seconds.
  • Monitor 429 rates. If you hit rate limits frequently, contact support to increase your per-key RPM limit, or distribute requests across multiple API keys.
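
As a sketch of the first tip, the helper below pulls the generation ID out of a response's headers. It takes any mapping of header names to values, so it does not depend on a particular HTTP library; with the OpenAI Python SDK, response headers are available through `client.chat.completions.with_raw_response.create(...)`. The function name and the sample header values are hypothetical.

```python
from typing import Mapping, Optional

GENERATION_ID_HEADER = "x-chuizi-generation-id"


def get_generation_id(headers: Mapping[str, str]) -> Optional[str]:
    """Look up the generation ID, ignoring the case of the header name."""
    for name, value in headers.items():
        if name.lower() == GENERATION_ID_HEADER:
            return value
    return None


# Hypothetical header values, for illustration only.
example_headers = {
    "Content-Type": "application/json",
    "X-Chuizi-Generation-Id": "gen-abc123",
}
print(get_generation_id(example_headers))
```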
