Cost Optimization

Strategy 1: Prompt Caching (50-90% Savings)

Prompt caching is the single biggest cost lever. If you send the same system prompt across many requests, the provider can reuse the cached prefix instead of re-processing those tokens at full price.

Provider       | Savings | Trigger
OpenAI / Azure | 50%     | Automatic, prefix >= 1024 tokens
Anthropic      | 90%     | Gateway auto-injects cache_control (system prompt >= 3000 chars)
DeepSeek       | 90%     | Automatic disk caching
Gemini 2.5+    | 90%     | Implicit caching

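Impact example: A Claude Code user sending 50 requests/day with 20K-token system prompts sends 1M system-prompt tokens per day. Assuming $3.00/M input pricing and a 30-day month, a 90% cache discount saves ~$81/month on input tokens alone.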
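The arithmetic behind this estimate can be sketched as follows. The $3.00/M input price and the 30-day month are assumptions chosen because they reproduce the ~$81 figure. The function name is hypothetical, and the sketch ignores cache-write surcharges and cache misses, so real savings are likely somewhat lower.

```python
def monthly_cache_savings(requests_per_day, prompt_tokens, price_per_m, discount, days=30):
    """Estimate monthly input-token savings from prompt caching.

    Assumes every request hits the cache and ignores cache-write surcharges.
    """
    daily_tokens = requests_per_day * prompt_tokens
    daily_full_cost = daily_tokens / 1_000_000 * price_per_m
    return daily_full_cost * discount * days


# 50 requests/day, 20K-token system prompt, assumed $3.00/M input, 90% discount
savings = monthly_cache_savings(50, 20_000, 3.00, 0.90)
print(f"${savings:.2f}/month")
```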

See the Prompt Caching guide for details.

Strategy 2: Optimal Routing

The same model is often available through multiple providers. Chuizi.AI automatically routes to the optimal provider for best cost and performance. You benefit from this without any configuration -- just use the model name (e.g., deepseek/deepseek-chat) and the gateway handles provider selection.

For models available through multiple providers, this automatic routing can result in significant cost savings compared to using a single provider directly.

Strategy 3: Model Tiering

Not every task needs the most capable model. Match model capability to task difficulty:

Task Type                                  | Recommended Model           | Input Price | Output Price
Classification, tagging, simple extraction | openai/gpt-4.1-nano         | $0.10/M     | $0.40/M
Summarization, translation, Q&A            | openai/gpt-4.1-mini         | $0.40/M     | $1.60/M
Code generation, analysis                  | anthropic/claude-sonnet-4-6 | $3.00/M     | $15.00/M
Complex reasoning, research                | openai/gpt-5                | $2.63/M     | $15.00/M
Hardest problems, agentic workflows        | anthropic/claude-opus-4-6   | $15.00/M    | $75.00/M

Cost difference: Processing 1M input + 100K output tokens:

  • With GPT-4.1-nano: $0.14
  • With Claude Opus 4-6: $22.50
  • Ratio: ~160x
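The comparison above can be reproduced with a small calculator. This is a sketch at list price only (no caching, no gateway multiplier); the prices are copied from the tiering table, and the function name is illustrative.

```python
# Prices in USD per million tokens (input, output), copied from the tiering table.
PRICES = {
    "openai/gpt-4.1-nano": (0.10, 0.40),
    "openai/gpt-4.1-mini": (0.40, 1.60),
    "anthropic/claude-sonnet-4-6": (3.00, 15.00),
    "openai/gpt-5": (2.63, 15.00),
    "anthropic/claude-opus-4-6": (15.00, 75.00),
}


def workload_cost(model, input_tokens, output_tokens):
    """Return the USD cost of a workload at list price (no caching, no multiplier)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


nano = workload_cost("openai/gpt-4.1-nano", 1_000_000, 100_000)
opus = workload_cost("anthropic/claude-opus-4-6", 1_000_000, 100_000)
print(f"nano=${nano:.2f} opus=${opus:.2f} ratio={opus / nano:.1f}x")
```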

Use a routing strategy in your application: classify the task first with a cheap model, then dispatch to the appropriate tier.
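One way to structure that dispatch is sketched below. The task labels, the fallback model, and the keyword-based `classify_task` are all hypothetical: in a real application the classifier would be a call to a cheap model that returns one of the labels, and keyword matching is used here only to keep the example offline.

```python
# Task categories mapped to models, following the tiering table.
TIER_BY_TASK = {
    "classification": "openai/gpt-4.1-nano",
    "summarization": "openai/gpt-4.1-mini",
    "code": "anthropic/claude-sonnet-4-6",
    "reasoning": "openai/gpt-5",
    "agentic": "anthropic/claude-opus-4-6",
}

DEFAULT_MODEL = "openai/gpt-4.1-mini"  # assumed fallback for unrecognized labels


def classify_task(prompt):
    """Hypothetical stand-in for a cheap-model classification call."""
    text = prompt.lower()
    if "traceback" in text or "function" in text:
        return "code"
    if "summarize" in text or "translate" in text:
        return "summarization"
    if "tag" in text or "classify" in text:
        return "classification"
    return "unknown"


def pick_model(prompt):
    """Classify the task, then dispatch to the matching tier."""
    return TIER_BY_TASK.get(classify_task(prompt), DEFAULT_MODEL)


print(pick_model("Summarize this meeting transcript"))
```

Falling back to a mid-tier model for unknown labels is a design choice: it bounds cost when the classifier is unsure, at the price of occasionally under-serving a hard task.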

Strategy 4: Reduce Token Usage

Set max_tokens explicitly

Without max_tokens, some models keep generating until they hit their output or context limit, and you pay for every token produced. Always set a reasonable cap:

config.json
json
{
  "model": "openai/gpt-4.1",
  "messages": [{"role": "user", "content": "Translate to French: Hello"}],
  "max_tokens": 100
}
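To avoid forgetting the cap, an application can apply a default centrally. The helper below is a minimal sketch; the function name and the 512-token default are assumptions for illustration, not gateway settings.

```python
import json

DEFAULT_MAX_TOKENS = 512  # assumed application-wide cap; tune per task


def with_token_cap(payload, cap=DEFAULT_MAX_TOKENS):
    """Return a copy of the request body with max_tokens set if the caller omitted it."""
    capped = dict(payload)
    capped.setdefault("max_tokens", cap)
    return capped


request_body = with_token_cap({
    "model": "openai/gpt-4.1",
    "messages": [{"role": "user", "content": "Translate to French: Hello"}],
})
print(json.dumps(request_body, indent=2))
```

An explicit max_tokens in the payload is left untouched, so individual calls can still choose a tighter or looser limit.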

Trim conversation history

Long conversation histories inflate input tokens. Keep only the messages the model needs:

example.py
python
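# Sample history; in practice this is your running list of chat messages.
conversation_history = [{"role": "user", "content": f"message {i}"} for i in range(50)]

# Option 1: instead of sending the entire history, keep only the last 10 messages.
recent_messages = conversation_history[-10:]
messages = recent_messages

# Option 2: summarize older messages and send the summary plus the recent ones.
messages = [
    {"role": "system", "content": "Previous context summary: ..."},
    *recent_messages,
]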
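A fixed message count ignores how long each message is. A variant that trims to a token budget is sketched below. The 4-characters-per-token estimate is a rough heuristic rather than a real tokenizer, and both function names are hypothetical.

```python
def estimate_tokens(message):
    """Rough token estimate: about 4 characters per token (a heuristic, not a tokenizer)."""
    return max(1, len(message["content"]) // 4)


def trim_to_budget(messages, budget):
    """Keep system messages plus the newest other messages that fit in the token budget."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    remaining = budget - sum(estimate_tokens(m) for m in system)

    kept = []
    for message in reversed(others):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost
    return system + kept[::-1]  # restore chronological order


history = [{"role": "system", "content": "You are a helpful assistant."}]
history += [{"role": "user", "content": "x" * 400} for _ in range(20)]
trimmed = trim_to_budget(history, budget=500)
print(len(history), "->", len(trimmed))
```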

Use detail: "low" for vision

When analyzing images, use low detail mode unless you need OCR or fine-grained analysis:

config.json
json
{
  "type": "image_url",
  "image_url": {
    "url": "https://example.com/photo.jpg",
    "detail": "low"
  }
}

This reduces the cost of an image from 1000+ tokens to roughly 85 tokens.

Monthly Cost Estimates

Assumptions: the Chuizi.AI 1.05x multiplier is included, the average request is 2000 input tokens + 500 output tokens, and a month is 30 days.

Light Usage (Individual Developer)

Metric          | Value
Requests/day    | 50
Model           | GPT-4.1-mini
Input cost      | 50 x 2000 / 1M x $0.40 = $0.04/day
Output cost     | 50 x 500 / 1M x $1.60 = $0.04/day
Monthly (x1.05) | ~$2.52
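The light-usage figure can be checked with a small estimator. This is a sketch under the stated assumptions (2000 input + 500 output tokens per request, 1.05x multiplier) plus a 30-day month; the function name is illustrative.

```python
def monthly_cost(requests_per_day, input_price, output_price,
                 input_tokens=2000, output_tokens=500, multiplier=1.05, days=30):
    """Monthly cost in USD; prices are USD per million tokens."""
    daily_input = requests_per_day * input_tokens / 1_000_000 * input_price
    daily_output = requests_per_day * output_tokens / 1_000_000 * output_price
    return (daily_input + daily_output) * days * multiplier


light = monthly_cost(50, input_price=0.40, output_price=1.60)
print(f"${light:.2f}/month")
```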

Medium Usage (Small Team / Claude Code)

Metric                  | Value
Requests/day            | 500
Model                   | Claude Sonnet 4 (with caching)
Input cost (90% cached) | 500 x 2000 / 1M x ($3.00 x 0.1 + $3.00 x 0.9 x 0.1) = $0.57/day
Output cost             | 500 x 500 / 1M x $15.00 = $3.75/day
Monthly (x1.05)         | ~$136
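The input-cost formula blends two rates: the 10% of tokens that miss the cache pay full price, and the 90% that hit it pay 10% of full price. A minimal sketch of that calculation, with an illustrative function name and an assumed 30-day month:

```python
def effective_input_price(base_price, cached_fraction, cached_price_ratio):
    """Blend full-price and cached input tokens into one USD-per-million rate."""
    uncached = base_price * (1 - cached_fraction)
    cached = base_price * cached_fraction * cached_price_ratio
    return uncached + cached


# 90% of input tokens served from cache, cached tokens billed at 10% of list price
input_price = effective_input_price(3.00, cached_fraction=0.9, cached_price_ratio=0.1)

requests_per_day = 500
daily_input = requests_per_day * 2000 / 1_000_000 * input_price
daily_output = requests_per_day * 500 / 1_000_000 * 15.00
monthly = (daily_input + daily_output) * 30 * 1.05
print(f"input=${daily_input:.2f}/day output=${daily_output:.2f}/day monthly=${monthly:.2f}")
```

Output tokens dominate here: even with caching, output accounts for most of the daily cost, which is why capping max_tokens matters for this profile.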

Heavy Usage (Production Application)

Metric                  | Value
Requests/day            | 10,000
Model mix               | 70% GPT-4.1-nano, 20% GPT-4.1-mini, 10% Claude Sonnet
Blended input cost/req  | ~$0.0003
Blended output cost/req | ~$0.0008
Monthly (x1.05)         | ~$346
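A blended estimate can be recomputed for any model mix with the sketch below. It uses list prices from the model tiering table and assumes "Claude Sonnet" means anthropic/claude-sonnet-4-6. At list prices with no caching it gives about $0.0009 input and $0.00105 output per request, higher than the rounded figures in the table, which may assume savings not itemized here (for example, caching on the Claude Sonnet share). The table's ~$346 follows from its own per-request figures: (0.0003 + 0.0008) x 10,000 x 30 x 1.05. Treat both as rough estimates.

```python
# (input, output) list prices in USD per million tokens, from the model tiering table.
PRICES = {
    "openai/gpt-4.1-nano": (0.10, 0.40),
    "openai/gpt-4.1-mini": (0.40, 1.60),
    "anthropic/claude-sonnet-4-6": (3.00, 15.00),  # assumed to be the "Claude Sonnet" in the mix
}
MIX = {
    "openai/gpt-4.1-nano": 0.70,
    "openai/gpt-4.1-mini": 0.20,
    "anthropic/claude-sonnet-4-6": 0.10,
}


def blended_cost_per_request(mix, input_tokens=2000, output_tokens=500):
    """Return (input, output) USD cost of an average request across the model mix."""
    rate_in = sum(share * PRICES[model][0] for model, share in mix.items())
    rate_out = sum(share * PRICES[model][1] for model, share in mix.items())
    return rate_in * input_tokens / 1_000_000, rate_out * output_tokens / 1_000_000


def monthly_total(cost_in, cost_out, requests_per_day=10_000, days=30, multiplier=1.05):
    """Scale per-request costs to a month, including the gateway multiplier."""
    return (cost_in + cost_out) * requests_per_day * days * multiplier


list_in, list_out = blended_cost_per_request(MIX)
print(f"list prices: in=${list_in:.5f} out=${list_out:.5f} "
      f"monthly=${monthly_total(list_in, list_out):.2f}")
print(f"table figures: monthly=${monthly_total(0.0003, 0.0008):.2f}")
```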

Monitoring Costs

Per-Request Cost

Every response includes cost data in the x_chuizi field:

config.json
json
{
  "x_chuizi": {
    "generation_id": "gen-xxxxxxxxxxxxxxxx",
    "cost": "0.00057600"
  }
}
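Because the cost arrives as a decimal string, summing it with floats can accumulate rounding error. The sketch below totals costs with `Decimal`. The response bodies are hypothetical samples shaped like the example above, and the dollar sign in the output assumes the cost is denominated in USD.

```python
import json
from decimal import Decimal


def extract_cost(response_body):
    """Return the request cost from the x_chuizi field, or zero if it is absent."""
    meta = response_body.get("x_chuizi", {})
    return Decimal(meta.get("cost", "0"))


# Sample response bodies (generation IDs are placeholders).
responses = [
    json.loads('{"x_chuizi": {"generation_id": "gen-a", "cost": "0.00057600"}}'),
    json.loads('{"x_chuizi": {"generation_id": "gen-b", "cost": "0.00120000"}}'),
]
total = sum((extract_cost(r) for r in responses), Decimal("0"))
print(f"Total spend: ${total}")
```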

Generation API

Query detailed cost breakdown for any request:

terminal
bash
# Quote the URL so the shell does not treat "?" as a glob character.
curl "https://api.chuizi.ai/v1/generation?id=gen-xxxxxxxxxxxxxxxx" \
  -H "Authorization: Bearer ck-your-key-here"
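The same query can be prepared from Python with only the standard library. This sketch builds the request without sending it; the helper name is hypothetical, and the response schema is not shown in this guide, so parsing is left out.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.chuizi.ai/v1"


def generation_request(generation_id, api_key):
    """Build (but do not send) the GET request for one generation's cost breakdown."""
    query = urllib.parse.urlencode({"id": generation_id})
    return urllib.request.Request(
        f"{BASE_URL}/generation?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


req = generation_request("gen-xxxxxxxxxxxxxxxx", "ck-your-key-here")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req).read()
```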

Daily Limits

Set a daily spending cap on your API key to prevent runaway costs. Configure this in the dashboard under API Keys > Edit > Daily Limit.

