# Cost Optimization
## Strategy 1: Prompt Caching (50-90% Savings)
Prompt caching is the single biggest cost lever. If you send the same system prompt across many requests, caching avoids re-processing those tokens on every call.
| Provider | Savings | Trigger |
|---|---|---|
| OpenAI / Azure | 50% | Automatic, prefix >= 1024 tokens |
| Anthropic | 90% | Gateway auto-injects `cache_control` (system prompt >= 3000 chars) |
| DeepSeek | 90% | Automatic disk caching |
| Gemini 2.5+ | 90% | Implicit caching |
Impact example: A Claude Code user sending 50 requests/day with 20K-token system prompts saves ~$81/month on input tokens alone.
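The arithmetic behind that figure can be sketched as follows, assuming Anthropic's $3.00/M input price and a 90% discount on cache reads (the exact cache-read rate varies by provider; check current pricing):

```python
# Rough monthly savings from caching a 20K-token system prompt.
# Prices and discount below are assumptions; substitute your provider's rates.
REQUESTS_PER_DAY = 50
PROMPT_TOKENS = 20_000
DAYS = 30
INPUT_PRICE_PER_M = 3.00    # $ per million input tokens
CACHE_READ_DISCOUNT = 0.90  # 90% off cached prefix tokens

monthly_tokens = REQUESTS_PER_DAY * PROMPT_TOKENS * DAYS  # 30M tokens
full_cost = monthly_tokens / 1_000_000 * INPUT_PRICE_PER_M
cached_cost = full_cost * (1 - CACHE_READ_DISCOUNT)
savings = full_cost - cached_cost

print(f"Without caching: ${full_cost:.2f}/month")    # $90.00
print(f"With caching:    ${cached_cost:.2f}/month")  # $9.00
print(f"Savings:         ${savings:.2f}/month")      # $81.00
```

This ignores the per-request user message and output tokens, which are billed normally either way.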
See the Prompt Caching guide for details.
## Strategy 2: Optimal Routing
The same model is often available through multiple providers. Chuizi.AI automatically routes each request to the provider offering the best combination of cost and performance. You benefit from this without any configuration: just use the model name (e.g., `deepseek/deepseek-chat`) and the gateway handles provider selection.
For models available through multiple providers, this automatic routing can result in significant cost savings compared to using a single provider directly.
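In practice this means the request body looks the same regardless of which provider ends up serving it. A minimal sketch (the endpoint path is the usual OpenAI-compatible one; the base URL is a placeholder you'd take from your Chuizi.AI dashboard):

```python
import json

# Only the model name is specified -- the gateway selects the provider.
payload = {
    "model": "deepseek/deepseek-chat",
    "messages": [{"role": "user", "content": "Hello"}],
}

# Send with any OpenAI-compatible client or plain HTTP, e.g.:
#   POST {base_url}/chat/completions  with this JSON body.
body = json.dumps(payload)
```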
## Strategy 3: Model Tiering
Not every task needs the most capable model. Match model capability to task difficulty:
| Task Type | Recommended Model | Input Price | Output Price |
|---|---|---|---|
| Classification, tagging, simple extraction | openai/gpt-4.1-nano | $0.10/M | $0.40/M |
| Summarization, translation, Q&A | openai/gpt-4.1-mini | $0.40/M | $1.60/M |
| Code generation, analysis | anthropic/claude-sonnet-4-6 | $3.00/M | $15.00/M |
| Complex reasoning, research | openai/gpt-5 | $2.63/M | $15.00/M |
| Hardest problems, agentic workflows | anthropic/claude-opus-4-6 | $15.00/M | $75.00/M |
Cost difference for processing 1M input + 100K output tokens:
- With GPT-4.1-nano: $0.14
- With Claude Opus 4-6: $22.50
- Ratio: 160x
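Those numbers follow directly from the table prices:

```python
def cost(input_tokens: int, output_tokens: int,
         in_price: float, out_price: float) -> float:
    """Dollar cost given token counts and per-million-token prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

nano = cost(1_000_000, 100_000, 0.10, 0.40)    # $0.10 + $0.04 = $0.14
opus = cost(1_000_000, 100_000, 15.00, 75.00)  # $15.00 + $7.50 = $22.50
print(f"nano: ${nano:.2f}, opus: ${opus:.2f}, ratio: {int(opus / nano)}x")
```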
Use a routing strategy in your application: classify the task first with a cheap model, then dispatch to the appropriate tier.
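A minimal sketch of that classify-then-dispatch pattern. The tier keys, keyword heuristic, and `classify_task` function are illustrative; in a real application `classify_task` would ask a cheap model (e.g., `openai/gpt-4.1-nano`) to label the prompt:

```python
# Map task categories to model tiers (from the table above).
TIERS = {
    "simple": "openai/gpt-4.1-nano",
    "standard": "openai/gpt-4.1-mini",
    "code": "anthropic/claude-sonnet-4-6",
    "reasoning": "openai/gpt-5",
    "hardest": "anthropic/claude-opus-4-6",
}

def classify_task(prompt: str) -> str:
    """Placeholder classifier. In practice, send the prompt to a cheap
    model and ask it to return one of the TIERS keys."""
    text = prompt.lower()
    if "prove" in text or "design" in text:
        return "reasoning"
    if "def " in prompt or "function" in text:
        return "code"
    return "simple"

def pick_model(prompt: str) -> str:
    return TIERS[classify_task(prompt)]

print(pick_model("Tag this support ticket"))        # openai/gpt-4.1-nano
print(pick_model("Write a function to parse CSV"))  # anthropic/claude-sonnet-4-6
```

The classification call itself costs a fraction of a cent, so it pays for itself whenever it downgrades even one request from a premium tier.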
## Strategy 4: Reduce Token Usage
### Set `max_tokens` explicitly

Without `max_tokens`, some models generate until they hit their context limit. Always set a reasonable cap:
```json
{
  "model": "openai/gpt-4.1",
  "messages": [{"role": "user", "content": "Translate to French: Hello"}],
  "max_tokens": 100
}
```
### Trim conversation history
Long conversation histories inflate input tokens. Keep only the messages the model needs:
```python
# Instead of sending the entire history:
messages = conversation_history[-10:]  # Keep last 10 turns

# Or summarize older messages:
messages = [
    {"role": "system", "content": "Previous context summary: ..."},
    *recent_messages,
]
```
### Use `detail: "low"` for vision
When analyzing images, use low detail mode unless you need OCR or fine-grained analysis:
```json
{
  "type": "image_url",
  "image_url": {
    "url": "https://example.com/photo.jpg",
    "detail": "low"
  }
}
```
This reduces the per-image token cost from over 1,000 tokens to ~85 tokens.
## Next Steps
- Prompt Caching — detailed caching guide for all providers
- Billing Model — understand how costs are calculated
- Balance Alerts — get notified before running out of credits