Cache Discount Pricing

Prompt caching allows repeated system prompts and long context prefixes to be cached on the provider side. Subsequent requests that hit the cache are billed at only a fraction of the normal input price. For long-conversation tools like Claude Code and Cursor, this typically saves 80-90% on input costs.

What Is Prompt Caching

When you send multiple requests with the same system prompt or conversation prefix, the provider caches the processed tokens. Cached tokens do not need to be reprocessed and are billed at a significantly reduced rate.
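To make the billing effect concrete, here is a minimal sketch of how a cache hit changes input cost. The function name, the $3-per-million base rate, and the 90% default discount are illustrative assumptions, not official pricing; the point is that when a long prefix is served from cache, only the new tokens pay the full rate.

```python
def input_cost(total_tokens, cached_tokens, base_rate_per_mtok, cached_discount=0.90):
    """Estimate input cost when part of the prompt is served from cache.

    Cached tokens are billed at (1 - cached_discount) of the base rate.
    Rates and discount here are illustrative, not official pricing.
    """
    uncached = total_tokens - cached_tokens
    cached_rate = base_rate_per_mtok * (1 - cached_discount)
    return (uncached * base_rate_per_mtok + cached_tokens * cached_rate) / 1_000_000

# 100k-token cached prefix plus 2k new tokens, at $3 per million input tokens
full = input_cost(102_000, 0, 3.0)        # no cache hit: $0.306
hit = input_cost(102_000, 100_000, 3.0)   # prefix cached:  $0.036
```

With these example numbers, the cache hit cuts input cost by roughly 88%, which is where the 80-90% figure above comes from.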

Key point: The Chuizi.AI gateway automatically injects cache_control markers for Anthropic models when the system prompt is 3000 characters or longer. You do not need to change any code.
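A sketch of what that injection looks like, for reference. Anthropic's Messages API attaches `cache_control: {"type": "ephemeral"}` to content blocks; the function name and the exact rewrite below are assumptions about the gateway's internals, with only the 3000-character threshold taken from the text above.

```python
CACHE_THRESHOLD_CHARS = 3000  # threshold stated above; gateway-specific

def inject_cache_control(payload: dict) -> dict:
    """Sketch of the gateway's automatic cache_control injection for an
    Anthropic Messages API request body (hypothetical helper)."""
    system = payload.get("system")
    if isinstance(system, str) and len(system) >= CACHE_THRESHOLD_CHARS:
        # To carry cache_control, the system prompt must be expressed as a
        # list of content blocks; "ephemeral" is Anthropic's cache type.
        payload["system"] = [
            {
                "type": "text",
                "text": system,
                "cache_control": {"type": "ephemeral"},
            }
        ]
    return payload
```

Requests with a system prompt under the threshold pass through unchanged, so short one-off prompts never pay the cache-write overhead.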

Providers with Cache Support

| Provider | Cache Type | Trigger | Savings |
| --- | --- | --- | --- |
| Anthropic | Explicit caching | Gateway auto-injects `cache_control` | 90% |
| OpenAI | Automatic caching | Prefix >= 1024 tokens | 50% |
| DeepSeek | Disk caching | Automatic, 64-token alignment | 90% |
| Google Gemini | Implicit caching | Automatic | 90% |
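The table above can be folded into a small lookup for cost estimation. The dictionary keys and function name are hypothetical; the discount fractions come directly from the Savings column, and the input rate you pass in would be your model's actual per-million-token price.

```python
# Cache-hit discounts from the table above (fraction of the input price saved)
CACHE_SAVINGS = {
    "anthropic": 0.90,
    "openai": 0.50,
    "deepseek": 0.90,
    "gemini": 0.90,
}

def cached_token_rate(provider: str, input_rate_per_mtok: float) -> float:
    """Effective price per million tokens for cache-hit tokens."""
    return input_rate_per_mtok * (1 - CACHE_SAVINGS[provider])
```

For example, at a $2.50-per-million input rate, OpenAI cache hits would bill at $1.25 per million, while a 90%-discount provider would bill the same tokens at $0.25.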

Next Steps