Provider-Specific Behavior

While Chuizi.AI normalizes API responses to a consistent format, upstream providers have behavioral differences that can affect your application. This page documents the key differences you should be aware of.

Azure OpenAI

Content Filtering

Azure OpenAI applies a mandatory content filter that is stricter than the standard OpenAI API. Requests that succeed on OpenAI may be rejected on Azure.

BehaviorDetails
Filter categoriesHate, self-harm, sexual content, violence
Default severityMedium threshold for all categories
Response on block400 bad_request with filter category in the error message
ImpactSome creative writing, medical, or educational prompts may be blocked

When a request is blocked by the content filter, the error message includes which category triggered the block. You cannot disable the filter through the gateway.

config.json
json
{
  "error": {
    "message": "Content blocked by Azure content filter: violence (severity: medium)",
    "type": "invalid_request_error",
    "code": "bad_request"
  }
}

Responses API

Azure supports the OpenAI Responses API for select models (codex and pro variants). The gateway translates Responses API requests to the appropriate Azure endpoints. Note that tool_choice and some parameters may behave differently compared to direct OpenAI access.

Amazon Bedrock (Anthropic Models)

Max Tokens

Some Claude models have lower max_tokens limits when served through Bedrock compared to the Anthropic direct API.

ModelAnthropic DirectBedrock
Claude Opus 4.632,7684,096 (default)
Claude Sonnet 4.68,1924,096 (default)

If you need higher output limits on Bedrock-routed models, explicitly set max_tokens in your request. The gateway passes this through but cannot exceed Bedrock's hard limits.

Streaming Format

Bedrock uses a different streaming wire format (AWS event stream) compared to Anthropic's SSE. The gateway handles this conversion transparently. You always receive standard SSE regardless of the upstream provider.

Google Gemini

Safety Settings

Gemini applies safety filters by default that are more conservative than most other providers. Content may be blocked across multiple categories.

CategoryDefault Threshold
HarassmentBLOCK_MEDIUM_AND_ABOVE
Hate speechBLOCK_MEDIUM_AND_ABOVE
Sexually explicitBLOCK_MEDIUM_AND_ABOVE
Dangerous contentBLOCK_MEDIUM_AND_ABOVE

When content is blocked, the response includes a finish_reason of safety instead of stop. The gateway maps this to the standard response format but preserves the SAFETY finish reason.

Grounding and Citations

Gemini models may include grounding metadata and web citations in responses. The gateway preserves this information in the response when present but does not add it for non-Gemini models.

Function Calling

Gemini's function calling uses a different schema format internally. The gateway translates between OpenAI's tools format and Gemini's native format. Minor behavioral differences may occur.

DifferenceDetails
tool_choice: "required"Supported but may produce different selection behavior.
Parallel tool callsGemini may return multiple tool calls in one response more aggressively.
Tool result formatHandled by gateway. Send results in OpenAI format.

DeepSeek

Reasoning Tokens

DeepSeek models that support chain-of-thought reasoning return reasoning_tokens in the usage object. These tokens represent the model's internal reasoning process.

config.json
json
{
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 200,
    "total_tokens": 350,
    "reasoning_tokens": 80
  }
}

Reasoning tokens are billed as output tokens. The completion_tokens count includes reasoning tokens.

FIM (Fill-in-the-Middle)

DeepSeek Coder models support FIM completion via the standard completions endpoint. Use the suffix parameter. This is not available through the chat completions endpoint.

Alibaba Qwen (via DashScope)

Function Calling

Qwen supports function calling but with subtle behavioral differences.

DifferenceDetails
tool_choiceSupports "auto" and "none". Named tool choice ({"type": "function", "function": {"name": "xxx"}}) may not always force the specified tool.
Tool descriptionsQwen is more sensitive to tool description quality. Vague descriptions may produce unexpected tool selections.
Streaming tool callsTool call arguments may arrive in differently sized chunks.

Long Context

Qwen-Long models support up to 10M tokens of context. Requests with very long context may have higher latency. The gateway does not impose additional context limits beyond what the upstream model supports.

Volcengine Doubao

Response Format

Doubao's structured output support (response_format: { type: "json_object" }) is available but less reliable than OpenAI's implementation for complex schemas. Test thoroughly with your specific use case.

Streaming Differences

Doubao may send larger chunks during streaming compared to OpenAI or Anthropic. This does not affect the content but may cause more "bursty" streaming behavior in your UI.

General Differences

Streaming Chunk Size

Different providers send SSE chunks at different granularities.

ProviderTypical Chunk SizeNotes
OpenAI1-3 tokensVery granular, smooth streaming
Anthropic1-5 tokensSlightly larger chunks
Google5-20 tokensLarger, less frequent chunks
DeepSeek1-3 tokensSimilar to OpenAI
Chinese providersVariableTends toward larger chunks

The gateway passes through chunks as-is without rebuffering. Streaming smoothness depends on the upstream provider.

Model Version Pinning

Some providers use date-based model versions. The gateway maps aliases to the latest stable version.

AliasResolves ToProvider
gpt-4oLatest stable GPT-4o snapshotOpenAI
claude-sonnet-4-6claude-sonnet-4-6Anthropic
gemini-2.5-flashLatest stable 2.5 FlashGoogle

To pin a specific version, use the full versioned model name (e.g., openai/gpt-4o).

Token Counting Variations

Different providers count tokens differently for the same input text. The gateway reports whatever the upstream provider returns in the usage field.

FactorImpact
Tokenizer differencesThe same text may be 100 tokens on OpenAI and 110 on Anthropic.
System prompt handlingSome providers count system prompts differently.
Image tokensToken costs for vision inputs vary significantly between providers.

Billing is always based on the upstream provider's reported token counts, not an independent count by the gateway. See Billing Model for how token counts translate to costs.

Next Steps

Provider-Specific Behavior — Chuizi AI Docs | Chuizi AI