# Cache Discount Pricing
Prompt caching lets the provider cache repeated system prompts and long context prefixes on its side. Subsequent requests that hit the cache are billed only a fraction of the normal input price. For long-conversation tools like Claude Code and Cursor, this typically saves 80-90% on input costs.
## What Is Prompt Caching
When you send multiple requests with the same system prompt or conversation prefix, the provider caches the processed tokens. Cached tokens do not need to be reprocessed and are billed at a significantly reduced rate.
**Key point:** The Chuizi.AI gateway automatically injects `cache_control` markers for Anthropic models when the system prompt is 3,000 characters or longer. You do not need to change any code.
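To illustrate what this injection looks like, here is a minimal sketch of the transformation the gateway applies before forwarding a request to Anthropic. The payload shape follows Anthropic's Messages API (`system` as a list of content blocks with `cache_control: {"type": "ephemeral"}`); the function name, threshold constant, and model string are illustrative, not the gateway's actual source:

```python
# Sketch of the cache_control injection for Anthropic models
# (illustrative; not the gateway's actual implementation).
CACHE_THRESHOLD = 3000  # characters, per the rule described above

def inject_cache_control(payload: dict) -> dict:
    """Mark a long system prompt as a cacheable prefix, Anthropic-style."""
    system = payload.get("system")
    if isinstance(system, str) and len(system) >= CACHE_THRESHOLD:
        # Anthropic's Messages API accepts `system` as a list of content
        # blocks; cache_control marks this block as a cache breakpoint.
        payload["system"] = [
            {
                "type": "text",
                "text": system,
                "cache_control": {"type": "ephemeral"},
            }
        ]
    return payload

request = {
    "model": "claude-sonnet",  # hypothetical model name
    "system": "You are a helpful assistant. " * 200,  # ~5,800 chars
    "messages": [{"role": "user", "content": "Hello"}],
}
request = inject_cache_control(request)
```

Short system prompts pass through unchanged, so only requests that stand to benefit from caching carry the marker.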
## Providers with Cache Support
| Provider | Cache Type | Trigger | Cached-Token Discount |
|---|---|---|---|
| Anthropic | Explicit caching | Gateway auto-injects cache_control | 90% |
| OpenAI | Automatic caching | Prefix >= 1024 tokens | 50% |
| DeepSeek | Disk caching | Automatic, 64-token alignment | 90% |
| Google Gemini | Implicit caching | Automatic | 90% |
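As a rough worked example of the discounts above, the effective input cost under caching is a blend of cached tokens (billed at the discounted rate) and uncached tokens (billed at full price). The per-token price below is a hypothetical figure for illustration only:

```python
def effective_input_cost(tokens: int, price_per_mtok: float,
                         cache_hit_rate: float, discount: float) -> float:
    """Blended input cost: cached tokens are billed at (1 - discount)
    of the normal price, uncached tokens at full price."""
    cached = tokens * cache_hit_rate
    uncached = tokens - cached
    per_tok = price_per_mtok / 1_000_000
    return uncached * per_tok + cached * per_tok * (1 - discount)

# Example: 1M input tokens at a hypothetical $3/MTok, with 90% of
# tokens served from cache at a 90% discount (Anthropic row above).
full = effective_input_cost(1_000_000, 3.0, cache_hit_rate=0.0, discount=0.90)
hit = effective_input_cost(1_000_000, 3.0, cache_hit_rate=0.9, discount=0.90)
# full = $3.00; hit = $0.57 -> an 81% saving, in line with the
# 80-90% range cited for long-conversation workloads.
```

The saving scales with the hit rate: long conversations re-send the same prefix on every turn, which is why agentic tools sit at the top of the 80-90% range.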
## Next Steps
- Billing Model — Understand the three billing types and the 1.05x multiplier
- Cost Optimization — Strategies to reduce your overall API spend
- Prompt Caching Guide — Step-by-step guide to enabling caching in your requests