Streaming
Enabling Streaming
Set "stream": true in your request body:
{ "model": "anthropic/claude-sonnet-4-6", "messages": [{"role": "user", "content": "Write a haiku about APIs."}], "stream": true, "max_tokens": 256 }
The response arrives as a series of Server-Sent Events (SSE) over a single HTTP connection.
SSE Event Format
Each event is a line that starts with data: followed by a JSON object. Events are separated by a blank line (two newline characters). The stream ends with a final data: [DONE] line.
data: {"id":"gen-xxx","object":"chat.completion.chunk",...}
data: {"id":"gen-xxx","object":"chat.completion.chunk",...}
data: [DONE]
Protocol Differences
OpenAI Protocol (/v1/chat/completions)
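This is the default protocol. Each chunk contains a delta object with partial content. The final chunk sets finish_reason, and it carries a usage field only when stream_options.include_usage is enabled: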
data: {"id":"gen-abc123","object":"chat.completion.chunk","created":1712000000,"model":"openai/gpt-4.1","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"gen-abc123","object":"chat.completion.chunk","created":1712000000,"model":"openai/gpt-4.1","choices":[{"index":0,"delta":{"content":"In"},"finish_reason":null}]}
data: {"id":"gen-abc123","object":"chat.completion.chunk","created":1712000000,"model":"openai/gpt-4.1","choices":[{"index":0,"delta":{"content":" the"},"finish_reason":null}]}
data: {"id":"gen-abc123","object":"chat.completion.chunk","created":1712000000,"model":"openai/gpt-4.1","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":14,"completion_tokens":17,"total_tokens":31}}
data: [DONE]
Anthropic Protocol (/anthropic/v1/messages)
The Anthropic protocol uses typed events with an event: field:
event: message_start
data: {"type":"message_start","message":{"id":"msg_abc123","type":"message","role":"assistant","model":"claude-sonnet-4-6","usage":{"input_tokens":25,"output_tokens":1}}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"In"}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" the"}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":17}}
event: message_stop
data: {"type":"message_stop"}
Gemini Protocol (/gemini/v1beta/models/*/streamGenerateContent)
Gemini streaming uses its native format. Each chunk contains partial candidates:
{"candidates":[{"content":{"parts":[{"text":"In"}],"role":"model"}}],"usageMetadata":{"promptTokenCount":10,"candidatesTokenCount":1}}
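As a rough illustration of reading one such chunk, the sketch below pulls the text out of the first candidate and reads the token counts. The helper name gemini_chunk_text is hypothetical, and the code assumes chunks are shaped like the sample above; real chunks may carry additional fields or non-text parts.

```python
import json

def gemini_chunk_text(chunk: dict) -> str:
    """Join the text parts of the first candidate in one decoded chunk."""
    candidates = chunk.get("candidates", [])
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    # Parts without a "text" key (an assumption: e.g. non-text parts) contribute nothing.
    return "".join(part.get("text", "") for part in parts)

# The sample chunk from this section, decoded from JSON.
sample = json.loads(
    '{"candidates":[{"content":{"parts":[{"text":"In"}],"role":"model"}}],'
    '"usageMetadata":{"promptTokenCount":10,"candidatesTokenCount":1}}'
)

text = gemini_chunk_text(sample)
prompt_tokens = sample["usageMetadata"]["promptTokenCount"]
print(text, prompt_tokens)  # In 10
```

Appending the result of each chunk in arrival order would rebuild the full response text.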
Getting Usage in Streaming Responses
By default, the OpenAI protocol does not include token usage in streaming responses. To receive usage data in the final chunk, add stream_options:
{ "model": "openai/gpt-4.1", "messages": [{"role": "user", "content": "Hello"}], "stream": true, "stream_options": { "include_usage": true } }
The last chunk before [DONE] will contain a usage field with prompt_tokens, completion_tokens, and total_tokens.
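To make the chunk layout concrete, here is a minimal sketch that accumulates the delta content and picks up the usage field from whichever chunk carries it. The function name collect_openai_stream is hypothetical, and the sample lines are shortened versions of the chunks shown earlier; it assumes the lines have already been read from the response as text.

```python
import json

def collect_openai_stream(lines):
    """Return (text, usage) from an iterable of SSE lines in the OpenAI chunk format."""
    text_parts = []
    usage = None
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The first delta only sets the role, the last one is empty.
            text_parts.append(choice.get("delta", {}).get("content") or "")
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(text_parts), usage

sample_lines = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    '',
    'data: {"choices":[{"index":0,"delta":{"content":"In"},"finish_reason":null}]}',
    '',
    'data: {"choices":[{"index":0,"delta":{"content":" the"},"finish_reason":null}]}',
    '',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":14,"completion_tokens":17,"total_tokens":31}}',
    '',
    'data: [DONE]',
]

text, usage = collect_openai_stream(sample_lines)
print(text)                   # In the
print(usage["total_tokens"])  # 31
```

If include_usage is not set, usage stays None, so callers that bill or log per request should treat None as "not reported" rather than zero.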
For the Anthropic protocol, usage is always included in the message_start and message_delta events.
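The typed events can be handled with a small dispatch on the type field. The sketch below is an illustration under assumptions: collect_anthropic_stream is a hypothetical name, the events are assumed to look like the sample stream shown earlier, and only the data: lines are parsed because each payload in that sample repeats its event name in type.

```python
import json

def collect_anthropic_stream(lines):
    """Return (text, usage) from an iterable of SSE lines in the Anthropic event format."""
    text_parts = []
    usage = {}
    for line in lines:
        if not line.startswith("data:"):
            continue  # event: lines and blank separators are skipped
        event = json.loads(line[len("data:"):].strip())
        kind = event.get("type")
        if kind == "message_start":
            # Initial usage: input tokens plus a provisional output count.
            usage.update(event.get("message", {}).get("usage", {}))
        elif kind == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                text_parts.append(delta.get("text", ""))
        elif kind == "message_delta":
            # Later usage values overwrite the provisional ones.
            usage.update(event.get("usage", {}))
        elif kind == "message_stop":
            break
    return "".join(text_parts), usage

sample_lines = [
    'event: message_start',
    'data: {"type":"message_start","message":{"id":"msg_abc123","usage":{"input_tokens":25,"output_tokens":1}}}',
    '',
    'event: content_block_delta',
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"In"}}',
    '',
    'event: content_block_delta',
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" the"}}',
    '',
    'event: message_delta',
    'data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":17}}',
    '',
    'event: message_stop',
    'data: {"type":"message_stop"}',
]

text, usage = collect_anthropic_stream(sample_lines)
print(text)   # In the
print(usage)  # {'input_tokens': 25, 'output_tokens': 17}
```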
Code Examples
curl -X POST https://api.chuizi.ai/v1/chat/completions \
  -H "Authorization: Bearer ck-your-key-here" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Write a haiku about APIs."}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'
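For readers who prefer Python, the same request could be sketched with the standard library alone. This is a minimal sketch, not an official client: iter_sse_json and stream_chat are hypothetical names, the endpoint and headers are copied from the curl command, and the 120-second timeout follows the tip later on this page. Only the parser is exercised here; stream_chat needs a valid key and network access.

```python
import json
import urllib.request

API_URL = "https://api.chuizi.ai/v1/chat/completions"  # endpoint from the curl example

def iter_sse_json(lines):
    """Yield each decoded JSON payload from an iterable of SSE lines (bytes or str)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream marker
        yield json.loads(payload)

def stream_chat(api_key, body):
    """Send a streaming request and yield decoded chunks as they arrive."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # The response object can be iterated line by line as bytes.
    with urllib.request.urlopen(request, timeout=120) as response:
        yield from iter_sse_json(response)

# Offline check of the parser with hand-written lines.
chunks = list(iter_sse_json([
    b'data: {"id":"gen-xxx","object":"chat.completion.chunk"}\n',
    b'\n',
    b'data: [DONE]\n',
]))
print(chunks)  # [{'id': 'gen-xxx', 'object': 'chat.completion.chunk'}]
```

A caller would loop over stream_chat("ck-your-key-here", {...}) with the same JSON body as the curl command and print each delta as it arrives.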
Error Handling in Streams
If an error occurs mid-stream, the server sends an error event before closing the connection:
data: {"error":{"message":"Upstream provider timeout","type":"server_error","code":"504"}}
data: [DONE]
Your client should handle connection drops and partial responses gracefully. See the Error Handling guide for retry strategies.
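One way a client might surface these cases is to raise an exception that keeps the text received so far. The sketch below assumes the OpenAI chunk format and the error payload shape shown above; StreamError, read_stream, and the "incomplete_stream" type used for a dropped connection are all hypothetical names, not part of the API.

```python
import json

class StreamError(Exception):
    """Raised when a stream fails; keeps the error payload and the partial text."""
    def __init__(self, error, partial_text):
        super().__init__(error.get("message", "stream error"))
        self.error = error
        self.partial_text = partial_text

def read_stream(lines):
    """Return the full text, or raise StreamError on an error payload or early end."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return "".join(parts)
        chunk = json.loads(payload)
        if "error" in chunk:
            raise StreamError(chunk["error"], "".join(parts))
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    # The iterable ended without [DONE]: treat it as a dropped connection.
    raise StreamError(
        {"message": "connection closed before [DONE]", "type": "incomplete_stream"},
        "".join(parts),
    )

failing = [
    'data: {"choices":[{"index":0,"delta":{"content":"In"},"finish_reason":null}]}',
    '',
    'data: {"error":{"message":"Upstream provider timeout","type":"server_error","code":"504"}}',
    '',
    'data: [DONE]',
]

try:
    read_stream(failing)
except StreamError as exc:
    print(exc.error["code"], repr(exc.partial_text))  # 504 'In'
```

Keeping partial_text on the exception lets the caller decide whether to show the incomplete answer, discard it, or retry.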
Tips
- Always set max_tokens when streaming. Without it, some models may generate excessively long responses.
- Use stream_options.include_usage if you need to track token consumption per request. Without it, the OpenAI protocol does not report usage in streaming mode.
- Buffer for rendering. Rendering every single token update to the DOM can cause layout thrashing. Buffer a few tokens before updating the UI.
- Set appropriate timeouts. Streaming connections can last several minutes for long responses. Set your HTTP client timeout to at least 120 seconds.
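The buffering tip can be sketched as a small generator that groups text fragments before they reach the UI. This is an illustration only: batch_tokens and the min_chars threshold are hypothetical, and a real front end might flush on a timer instead of a character count.

```python
def batch_tokens(tokens, min_chars=16):
    """Group small text fragments into larger pieces before handing them to the UI."""
    buffer = []
    size = 0
    for token in tokens:
        buffer.append(token)
        size += len(token)
        if size >= min_chars:
            yield "".join(buffer)
            buffer, size = [], 0
    if buffer:
        yield "".join(buffer)  # flush whatever is left when the stream ends

fragments = ["In", " the", " quiet", " hum"]
batches = list(batch_tokens(fragments, min_chars=6))
print(batches)  # ['In the', ' quiet', ' hum']
```

The concatenation of all batches equals the concatenation of the original fragments, so no text is lost, and the UI updates three times here instead of four.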
Next Steps
- Chat Completions API — full parameter reference for the streaming endpoint
- Error Handling — handle mid-stream errors and connection drops
- Choosing a Protocol — pick between OpenAI, Anthropic, or Gemini streaming formats