Error Handling
Error Response Formats
OpenAI Protocol (/v1/*)
All errors follow the OpenAI error format:
{ "error": { "message": "Insufficient balance. Current: $0.12, Required: $0.50", "type": "insufficient_quota", "code": "402" } }
Anthropic Protocol (/anthropic/*)
Errors from the Anthropic protocol use the Anthropic error format:
{ "type": "error", "error": { "type": "authentication_error", "message": "Invalid API key" } }
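Both formats nest the error details under an `error` object with `type` and `message` fields, so client code can normalize them in one place. The following is a minimal sketch, not part of any SDK; the names `GatewayError` and `parse_error_body` are hypothetical, and it assumes the two shapes shown above are the only ones returned.

```python
from typing import Any, Dict, NamedTuple, Optional


class GatewayError(NamedTuple):
    error_type: str
    message: str
    code: Optional[str]  # present only in the OpenAI-style shape shown above


def parse_error_body(body: Dict[str, Any]) -> GatewayError:
    """Normalize either error shape into one GatewayError tuple."""
    err = body.get("error")
    if not isinstance(err, dict):
        raise ValueError("body does not contain an 'error' object")
    return GatewayError(
        error_type=err.get("type", "unknown"),
        message=err.get("message", ""),
        code=err.get("code"),
    )
```

With this helper, the Anthropic-style body yields `code=None`, while the OpenAI-style body carries the status code as a string.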
Error Code Reference
| HTTP Status | Error Type | Description | Retryable |
|---|---|---|---|
| 400 | invalid_request_error | Malformed request, missing required fields, invalid parameter values | No |
| 401 | authentication_error | Invalid or missing API key | No |
| 402 | insufficient_quota | Account balance too low for the estimated request cost | No (top up first) |
| 403 | permission_error | API key does not have permission for this model or action | No |
| 404 | not_found | Model does not exist or is not available | No |
| 429 | rate_limit_error | Too many requests. Check Retry-After header | Yes (after delay) |
| 500 | server_error | Internal gateway error | Yes |
| 502 | upstream_error | Upstream provider returned an invalid response | Yes |
| 503 | service_unavailable | Gateway is overloaded or upstream provider is down | Yes |
| 504 | timeout | Request timed out waiting for upstream provider | Yes |
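The Retryable column can be encoded directly in client code. This is a small sketch that mirrors the table above; the names `ERROR_TYPES`, `RETRYABLE_STATUSES`, and `is_retryable` are hypothetical helpers, not part of the gateway or any SDK.

```python
# Status-to-type mapping copied from the table above.
ERROR_TYPES = {
    400: "invalid_request_error",
    401: "authentication_error",
    402: "insufficient_quota",
    403: "permission_error",
    404: "not_found",
    429: "rate_limit_error",
    500: "server_error",
    502: "upstream_error",
    503: "service_unavailable",
    504: "timeout",
}

# Statuses marked "Yes" in the Retryable column.
RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})


def is_retryable(status: int) -> bool:
    """Return True if the table marks this HTTP status as retryable."""
    return status in RETRYABLE_STATUSES
```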
Common Errors and Solutions
401: Authentication Error
{ "error": { "message": "Invalid API key", "type": "authentication_error", "code": "401" } }
Causes:
- API key is missing from the request.
- API key does not start with `ck-`.
- API key has been revoked or deactivated.
Fix: Check that your Authorization: Bearer ck-... header is present and the key is active in the dashboard.
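The first two causes can be caught locally before a request is sent. The sketch below assumes only what is stated above (keys start with `ck-` and are sent as a Bearer token); `build_auth_headers` is a hypothetical helper, and it cannot detect a revoked or deactivated key, which only the gateway knows about.

```python
def build_auth_headers(api_key: str) -> dict:
    """Validate the key format locally and build the Authorization header."""
    key = (api_key or "").strip()
    if not key:
        raise ValueError("API key is missing")
    if not key.startswith("ck-"):
        raise ValueError("API key does not start with 'ck-'")
    return {"Authorization": f"Bearer {key}"}
```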
402: Insufficient Balance
{ "error": { "message": "Insufficient balance. Current: $0.12", "type": "insufficient_quota", "code": "402" } }
Fix: Top up your account in the dashboard under Billing. The gateway pre-deducts estimated cost before sending the request upstream. Ensure your balance covers the estimated cost.
429: Rate Limited
{ "error": { "message": "Rate limit exceeded. Retry after 2 seconds.", "type": "rate_limit_error", "code": "429" } }
Fix: Wait for the duration specified in the Retry-After response header, then retry. Default rate limit is 60 RPM per API key.
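The example message above suggests the gateway sends `Retry-After` as a number of seconds, but the HTTP specification also allows an HTTP-date in that header. A defensive parser can accept both forms. This is a sketch under that assumption; `retry_after_seconds` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Parse a Retry-After header value into a delay in seconds.

    Returns None when the header is absent or cannot be parsed, so the
    caller can fall back to exponential backoff.
    """
    if not value:
        return None
    value = value.strip()
    try:
        return max(0.0, float(value))  # delay-seconds form, e.g. "2"
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A caller would typically write `wait = retry_after_seconds(header) or fallback_delay`, where `fallback_delay` is whatever backoff value the client would otherwise use.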
504: Timeout
{ "error": { "message": "Request timed out after 120 seconds", "type": "timeout", "code": "504" } }
Fix: Large models (Opus 4, GPT-5, o3) with long outputs may take over 60 seconds. Set your client timeout to at least 120 seconds. Consider using streaming to get partial results faster.
Retry Strategy: Exponential Backoff
For retryable errors (429, 500, 502, 503, 504), use exponential backoff with jitter:
```python
import random
import time

from openai import APIStatusError, APITimeoutError, OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",
    max_retries=0,  # Disable the SDK's built-in retries; we handle them below
)

RETRYABLE_STATUS = (500, 502, 503, 504)


def backoff(attempt):
    """Exponential backoff (1s, 2s, 4s, ...) plus up to 1 second of jitter."""
    return (2 ** attempt) + random.random()


def chat_with_retry(messages, model="openai/gpt-4.1", max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model=model,
                messages=messages,
                max_tokens=1024,
            )
        except RateLimitError as e:
            # Use the Retry-After header if available (assumed to be in seconds)
            retry_after = e.response.headers.get("retry-after")
            wait = float(retry_after) if retry_after else backoff(attempt)
            print(f"Rate limited. Retrying in {wait:.1f}s...")
        except APITimeoutError:
            # Client-side timeout: no HTTP response was received
            wait = backoff(attempt)
            print(f"Timeout. Retrying in {wait:.1f}s...")
        except APIStatusError as e:
            if e.status_code not in RETRYABLE_STATUS:
                raise  # Non-retryable error
            wait = backoff(attempt)
            print(f"Server error {e.status_code}. Retrying in {wait:.1f}s...")
        time.sleep(wait)
    raise RuntimeError(f"Failed after {max_retries} attempts")


response = chat_with_retry([{"role": "user", "content": "Hello"}])
print(response.choices[0].message.content)
```
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.chuizi.ai/v1',
  apiKey: 'ck-your-key-here',
  maxRetries: 0, // We handle retries ourselves
});

async function chatWithRetry(messages, model = 'openai/gpt-4.1', maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.chat.completions.create({
        model,
        messages,
        max_tokens: 1024,
      });
    } catch (error) {
      const status = error.status;
      const retryable = [429, 500, 502, 503, 504].includes(status);
      if (!retryable || attempt === maxRetries - 1) {
        throw error;
      }

      // error.headers is a plain object in some SDK versions and a
      // Fetch Headers instance in others, so handle both.
      const headers = error.headers;
      const retryAfter =
        typeof headers?.get === 'function'
          ? headers.get('retry-after')
          : headers?.['retry-after'];

      // Use Retry-After (in seconds) if it parses, else exponential backoff
      const retryAfterSec = parseFloat(retryAfter);
      const wait = Number.isFinite(retryAfterSec)
        ? retryAfterSec * 1000
        : Math.pow(2, attempt) * 1000 + Math.random() * 1000;

      console.log(`Error ${status}. Retrying in ${(wait / 1000).toFixed(1)}s...`);
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
  }
}

const response = await chatWithRetry([{ role: 'user', content: 'Hello' }]);
console.log(response.choices[0].message.content);
```
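To see what schedule this strategy produces, the delay calculation can be isolated into a pure function. The sketch below uses the same formula as the examples above (2 to the power of the attempt number, plus up to one second of random jitter). The `cap` parameter is an added assumption to keep long retry chains bounded, and `backoff_delay` is a hypothetical name.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng=random.random) -> float:
    """Delay in seconds before retrying after 0-based attempt `attempt`.

    The exponential part is capped at `cap`; jitter in [0, 1) is added on
    top so that many clients do not retry at the same instant.
    """
    return min(cap, base * (2 ** attempt)) + rng()


# Without jitter, five attempts wait 1, 2, 4, 8 and 16 seconds.
schedule = [backoff_delay(a, rng=lambda: 0.0) for a in range(5)]
```

Passing `rng` explicitly makes the function deterministic in tests while keeping real jitter in production.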
Idempotency and Safe Retries
All Chuizi.AI POST endpoints are safe to retry because:
- Pre-deduction is idempotent. If a request fails after the balance freeze, the frozen amount is automatically released.
- Generation IDs are unique. Each request creates a new `gen-xxx` ID. Retrying creates a new generation, not a duplicate charge on the same one.
- No side effects upstream. Chat completion requests are stateless: the same request sent twice produces two independent completions.
However, be aware that retried requests do consume tokens and incur charges for each attempt that reaches the upstream provider. The retry strategies above only retry on errors that typically occur before the upstream provider is contacted (rate limits) or after it fails (timeouts, server errors).
Tips
- Log the `x-chuizi-generation-id` header. Every response includes a generation ID. When reporting issues, include this ID for faster debugging.
- Use the OpenAI SDK's built-in retry. Both the Python and Node.js OpenAI SDKs have built-in retry logic. Set `max_retries` (Python) or `maxRetries` (Node.js) when initializing the client.
- Set client-side timeouts. The default HTTP timeout in many libraries is 30 seconds, which is too short for large model responses. Set it to 120-300 seconds.
- Monitor 429 rates. If you hit rate limits frequently, contact support to increase your per-key RPM limit, or distribute requests across multiple API keys.
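One lightweight way to monitor 429 rates is a sliding window over recent response statuses. The class below is an illustrative sketch, not a gateway feature; `RateLimitMonitor` and its window size are hypothetical, and a production setup would more likely export this to an existing metrics system.

```python
from collections import deque


class RateLimitMonitor:
    """Track the share of 429 responses among the most recent requests."""

    def __init__(self, window: int = 100):
        # deque with maxlen drops the oldest status once the window is full
        self._statuses = deque(maxlen=window)

    def record(self, status: int) -> None:
        self._statuses.append(status)

    def rate_429(self) -> float:
        """Fraction of recorded responses that were 429 (0.0 if none recorded)."""
        if not self._statuses:
            return 0.0
        hits = sum(1 for s in self._statuses if s == 429)
        return hits / len(self._statuses)
```

Calling `record(status)` after each response and alerting when `rate_429()` stays above a chosen threshold gives an early signal that the per-key RPM limit is too low for the workload.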
Next Steps
- Error Codes Reference — complete error code listing with descriptions
- Rate Limits — understand and configure rate limiting
- Production Best Practices — comprehensive production readiness guide