Cost Optimization
Strategy 1: Prompt Caching (50-90% Savings)
The single biggest cost lever. If you send the same system prompt across many requests, prompt caching avoids re-processing those tokens.
| Provider | Savings | Trigger |
|---|---|---|
| OpenAI / Azure | 50% | Automatic, prefix >= 1024 tokens |
| Anthropic | 90% | Gateway auto-injects cache_control (system prompt >= 3000 chars) |
| DeepSeek | 90% | Automatic disk caching |
| Gemini 2.5+ | 90% | Implicit caching |
Impact example: A Claude Code user sending 50 requests/day with 20K-token system prompts saves ~$81/month on input tokens alone.
See the Prompt Caching guide for details.
Strategy 2: Optimal Routing
The same model is often available through multiple providers. Chuizi.AI automatically routes to the optimal provider for best cost and performance. You benefit from this without any configuration -- just use the model name (e.g., deepseek/deepseek-chat) and the gateway handles provider selection.
For models available through multiple providers, this automatic routing can result in significant cost savings compared to using a single provider directly.
Strategy 3: Model Tiering
Not every task needs the most capable model. Match model capability to task difficulty:
| Task Type | Recommended Model | Input Price | Output Price |
|---|---|---|---|
| Classification, tagging, simple extraction | openai/gpt-4.1-nano | $0.10/M | $0.40/M |
| Summarization, translation, Q&A | openai/gpt-4.1-mini | $0.40/M | $1.60/M |
| Code generation, analysis | anthropic/claude-sonnet-4-6 | $3.00/M | $15.00/M |
| Complex reasoning, research | openai/gpt-5 | $2.63/M | $15.00/M |
| Hardest problems, agentic workflows | anthropic/claude-opus-4-6 | $15.00/M | $75.00/M |
Cost difference: Processing 1M input + 100K output tokens:
- With GPT-4.1-nano: $0.14
- With Claude Opus 4-6: $22.50
- Ratio: 160x
Use a routing strategy in your application: classify the task first with a cheap model, then dispatch to the appropriate tier.
Strategy 4: Reduce Token Usage
Set max_tokens explicitly
Without max_tokens, some models generate until their context limit. Always set a reasonable cap:
{ "model": "openai/gpt-4.1", "messages": [{"role": "user", "content": "Translate to French: Hello"}], "max_tokens": 100 }
Trim conversation history
Long conversation histories inflate input tokens. Keep only the messages the model needs:
# Instead of sending the entire history: messages = conversation_history[-10:] # Keep last 10 turns # Or summarize older messages: messages = [ {"role": "system", "content": "Previous context summary: ..."}, *recent_messages, ]
Use detail: "low" for vision
When analyzing images, use low detail mode unless you need OCR or fine-grained analysis:
{ "type": "image_url", "image_url": { "url": "https://example.com/photo.jpg", "detail": "low" } }
This reduces image token cost from ~1000+ tokens to ~85 tokens.
Monthly Cost Estimates
Assumptions: Chuizi.AI 1.05x multiplier included. Average request is 2000 input tokens + 500 output tokens.
Light Usage (Individual Developer)
| Metric | Value |
|---|---|
| Requests/day | 50 |
| Model | GPT-4.1-mini |
| Input cost | 50 x 2000 / 1M x $0.40 = $0.04/day |
| Output cost | 50 x 500 / 1M x $1.60 = $0.04/day |
| Monthly (x1.05) | ~$2.52 |
Medium Usage (Small Team / Claude Code)
| Metric | Value |
|---|---|
| Requests/day | 500 |
| Model | Claude Sonnet 4 (with caching) |
| Input cost (90% cached) | 500 x 2000 / 1M x ($3.00 x 0.1 + $3.00 x 0.9 x 0.1) = $0.57/day |
| Output cost | 500 x 500 / 1M x $15.00 = $3.75/day |
| Monthly (x1.05) | ~$136 |
Heavy Usage (Production Application)
| Metric | Value |
|---|---|
| Requests/day | 10,000 |
| Model mix | 70% GPT-4.1-nano, 20% GPT-4.1-mini, 10% Claude Sonnet |
| Blended input cost/req | ~$0.0003 |
| Blended output cost/req | ~$0.0008 |
| Monthly (x1.05) | ~$346 |
Monitoring Costs
Per-Request Cost
Every response includes cost data in the x_chuizi field:
{ "x_chuizi": { "generation_id": "gen-xxxxxxxxxxxxxxxx", "cost": "0.00057600" } }
Generation API
Query detailed cost breakdown for any request:
curl https://api.chuizi.ai/v1/generation?id=gen-xxxxxxxxxxxxxxxx \ -H "Authorization: Bearer ck-your-key-here"
Daily Limits
Set a daily spending cap on your API key to prevent runaway costs. Configure this in the dashboard under API Keys > Edit > Daily Limit.
Next Steps
- Prompt Caching — detailed caching guide for all providers
- Billing Model — understand how costs are calculated
- Balance Alerts — get notified before running out of credits