Streaming

Enabling Streaming

Set "stream": true in your request body:

{
  "model": "anthropic/claude-sonnet-4-6",
  "messages": [{"role": "user", "content": "Write a haiku about APIs."}],
  "stream": true,
  "max_tokens": 256
}

The response arrives as a series of Server-Sent Events (SSE) over a single HTTP connection.

SSE Event Format

Each event is a line that begins with the data: prefix, followed by a JSON object. Consecutive events are separated by a blank line (two newline characters). The stream ends with data: [DONE].

data: {"id":"gen-xxx","object":"chat.completion.chunk",...}

data: {"id":"gen-xxx","object":"chat.completion.chunk",...}

data: [DONE]
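
The framing above can be parsed with a few lines of client code. The following is a minimal Python sketch, not an official client: the function name iter_sse_payloads is hypothetical, and it assumes each event carries its JSON on a single data: line, as in the samples shown here.

```python
import json
from typing import Iterable, Iterator


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each data: line until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other fields such as event:
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel, nothing after it is read
        yield json.loads(payload)


# Sample lines shaped like the events in this section (fields shortened).
sample = [
    'data: {"id":"gen-xxx","object":"chat.completion.chunk"}',
    "",
    'data: {"id":"gen-xxx","object":"chat.completion.chunk"}',
    "",
    "data: [DONE]",
]
chunks = list(iter_sse_payloads(sample))
```

Because the function accepts any iterable of text lines, it can be fed from a file, a test fixture, or an HTTP response that is read line by line.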

Protocol Differences

OpenAI Protocol (`/v1/chat/completions`)

This is the default protocol. Each chunk contains a delta object with partial content:

data: {"id":"gen-abc123","object":"chat.completion.chunk","created":1712000000,"model":"openai/gpt-4.1","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"gen-abc123","object":"chat.completion.chunk","created":1712000000,"model":"openai/gpt-4.1","choices":[{"index":0,"delta":{"content":"In"},"finish_reason":null}]}

data: {"id":"gen-abc123","object":"chat.completion.chunk","created":1712000000,"model":"openai/gpt-4.1","choices":[{"index":0,"delta":{"content":" the"},"finish_reason":null}]}

data: {"id":"gen-abc123","object":"chat.completion.chunk","created":1712000000,"model":"openai/gpt-4.1","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":14,"completion_tokens":17,"total_tokens":31}}

data: [DONE]
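
To turn these chunks back into a complete message, a client concatenates the delta.content fragments in arrival order. A minimal Python sketch follows, assuming the chunks have already been decoded from their data: lines into dictionaries; collect_openai_text is a hypothetical helper name, and only the choice with index 0 is followed.

```python
from typing import Iterable, Optional, Tuple


def collect_openai_text(chunks: Iterable[dict]) -> Tuple[str, Optional[str]]:
    """Join delta.content fragments of choice 0 and report its finish_reason."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # this sketch follows only the first choice
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


# Decoded chunks shaped like the example above (id, created, model omitted).
sample_chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "In"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": " the"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]
text, finish_reason = collect_openai_text(sample_chunks)
```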

Anthropic Protocol (`/anthropic/v1/messages`)

The Anthropic protocol uses typed events with an event: field:

event: message_start
data: {"type":"message_start","message":{"id":"msg_abc123","type":"message","role":"assistant","model":"claude-sonnet-4-6","usage":{"input_tokens":25,"output_tokens":1}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"In"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" the"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":17}}

event: message_stop
data: {"type":"message_stop"}
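
Because every Anthropic event is named, a client can dispatch on the event: field instead of inspecting each payload. The Python sketch below illustrates one way to do this under the assumption that each event is an event: line followed by a single data: line, as in the example above. Both function names are hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_typed_events(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], dict]]:
    """Pair each data: payload with the event: name that precedes it."""
    event_name = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            yield event_name, json.loads(line[len("data:"):].strip())
        elif line == "":
            event_name = None  # a blank line closes the current event


def collect_anthropic_text(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Join text_delta fragments and capture the stop_reason."""
    parts = []
    stop_reason = None
    for name, data in iter_typed_events(lines):
        if name == "content_block_delta" and data["delta"].get("type") == "text_delta":
            parts.append(data["delta"]["text"])
        elif name == "message_delta":
            stop_reason = data["delta"].get("stop_reason", stop_reason)
        elif name == "message_stop":
            break
    return "".join(parts), stop_reason


sample_lines = [
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"In"}}',
    "",
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" the"}}',
    "",
    "event: message_delta",
    'data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":17}}',
    "",
    "event: message_stop",
    'data: {"type":"message_stop"}',
]
text, stop_reason = collect_anthropic_text(sample_lines)
```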

Gemini Protocol (`/gemini/v1beta/models/*/streamGenerateContent`)

Gemini streaming uses its native format. Each chunk contains partial candidates:

{"candidates":[{"content":{"parts":[{"text":"In"}],"role":"model"}}],"usageMetadata":{"promptTokenCount":10,"candidatesTokenCount":1}}
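
Extracting the partial text from such a chunk means walking candidates, content, and parts. The Python sketch below assumes the chunk has already been decoded into a dictionary (how chunks are framed on the wire is not covered here) and reads only the first candidate. gemini_chunk_text is a hypothetical helper name.

```python
def gemini_chunk_text(chunk: dict) -> str:
    """Return the text carried by the first candidate of one decoded chunk."""
    candidates = chunk.get("candidates") or []
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    # Parts without a "text" key are skipped rather than treated as errors.
    return "".join(part["text"] for part in parts if "text" in part)


sample_chunk = {
    "candidates": [{"content": {"parts": [{"text": "In"}], "role": "model"}}],
    "usageMetadata": {"promptTokenCount": 10, "candidatesTokenCount": 1},
}
fragment = gemini_chunk_text(sample_chunk)
```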

Getting Usage in Streaming Responses

By default, the OpenAI protocol does not include token usage in streaming responses. To receive usage data in the final chunk, add stream_options:

{
  "model": "openai/gpt-4.1",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}

The last chunk before [DONE] will contain a usage field with prompt_tokens, completion_tokens, and total_tokens.

For the Anthropic protocol, usage is always included in the message_start and message_delta events.
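
A client that tracks token consumption can pull usage out of either stream as follows. This is a Python sketch with hypothetical function names. It assumes the payloads are already decoded, and that on the Anthropic protocol a count reported in message_delta supersedes the one reported in message_start.

```python
from typing import Iterable, Optional


def openai_usage(chunks: Iterable[dict]) -> Optional[dict]:
    """Return the usage object from the last chunk that carries one, if any."""
    usage = None
    for chunk in chunks:
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


def anthropic_usage(events: Iterable[dict]) -> dict:
    """Merge usage from message_start and message_delta payloads."""
    usage = {}
    for data in events:
        if data.get("type") == "message_start":
            usage.update(data["message"].get("usage", {}))
        elif data.get("type") == "message_delta":
            usage.update(data.get("usage", {}))  # later counts overwrite earlier ones
    return usage


openai_sample = [
    {"choices": [{"index": 0, "delta": {"content": "In"}, "finish_reason": None}]},
    {
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
        "usage": {"prompt_tokens": 14, "completion_tokens": 17, "total_tokens": 31},
    },
]
anthropic_sample = [
    {"type": "message_start", "message": {"usage": {"input_tokens": 25, "output_tokens": 1}}},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 17}},
]
```

If stream_options.include_usage is not set on the OpenAI protocol, openai_usage returns None, which makes the missing option easy to detect.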

Code Examples

curl -X POST https://api.chuizi.ai/v1/chat/completions \
  -H "Authorization: Bearer ck-your-key-here" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Write a haiku about APIs."}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'
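
The same request can be issued from Python. The sketch below uses only the standard library and mirrors the curl call above. The helper names build_stream_request and stream_text are hypothetical, and the key is a placeholder. Note that urllib applies its timeout to each socket operation, not to the stream as a whole.

```python
import json
import urllib.request
from typing import Iterator

API_URL = "https://api.chuizi.ai/v1/chat/completions"  # endpoint from the curl example


def build_stream_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build the streaming POST request without sending it."""
    body = {
        "model": "anthropic/claude-sonnet-4-6",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "stream_options": {"include_usage": True},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_text(api_key: str, prompt: str, timeout: float = 120.0) -> Iterator[str]:
    """Send the request and yield content fragments as they arrive."""
    request = build_stream_request(api_key, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        for raw in response:  # the response object iterates line by line
            line = raw.decode("utf-8").strip()
            if not line.startswith("data:"):
                continue
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                break
            chunk = json.loads(payload)
            if "error" in chunk:
                raise RuntimeError(chunk["error"].get("message", "stream error"))
            for choice in chunk.get("choices", []):
                piece = choice.get("delta", {}).get("content")
                if piece:
                    yield piece
```

A caller would iterate over stream_text("ck-your-key-here", "Write a haiku about APIs.") and print each fragment with end="" and flush=True to display the text as it streams.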

Error Handling in Streams

If an error occurs mid-stream, the server sends an error event before closing the connection:

data: {"error":{"message":"Upstream provider timeout","type":"server_error","code":"504"}}

data: [DONE]

Your client should handle connection drops and partial responses gracefully. See the Error Handling guide for retry strategies.
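
One way to handle a mid-stream error without losing the text already received is to raise an exception that carries the partial output. The Python sketch below assumes OpenAI-protocol chunks that have already been decoded. StreamError and consume_stream are hypothetical names, not part of any SDK.

```python
from typing import Iterable, Optional


class StreamError(Exception):
    """Raised when the stream delivers an error payload."""

    def __init__(self, message: str, error_type: Optional[str],
                 code: Optional[str], partial_text: str) -> None:
        super().__init__(message)
        self.error_type = error_type
        self.code = code
        self.partial_text = partial_text  # text received before the error


def consume_stream(chunks: Iterable[dict]) -> str:
    """Join content fragments, raising StreamError on an error payload."""
    parts = []
    for chunk in chunks:
        error = chunk.get("error")
        if error is not None:
            raise StreamError(
                error.get("message", "unknown stream error"),
                error.get("type"),
                error.get("code"),
                "".join(parts),
            )
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts)


sample = [
    {"choices": [{"index": 0, "delta": {"content": "In"}, "finish_reason": None}]},
    {"error": {"message": "Upstream provider timeout", "type": "server_error", "code": "504"}},
]
caught = None
try:
    consume_stream(sample)
except StreamError as exc:
    caught = exc  # caught.partial_text holds the text received so far
```

The caller can then decide whether to show the partial text, retry the request, or both.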

Tips

Always set max_tokens when streaming. Without it, some models may generate excessively long responses.
Use stream_options.include_usage if you need to track token consumption per request. Without it, the OpenAI protocol does not report usage in streaming mode.
Buffer for rendering. Rendering every single token update to the DOM can cause layout thrashing. Buffer a few tokens before updating the UI.
Set appropriate timeouts. Streaming connections can last several minutes for long responses. Set your HTTP client timeout to at least 120 seconds.
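
The buffering tip can be sketched as a small accumulator that hands text to the renderer only after a few tokens have arrived. The idea is language-agnostic. It is shown here in Python with a hypothetical RenderBuffer class and an arbitrary threshold of three tokens, where the render callback stands in for whatever updates the UI.

```python
from typing import Callable, List


class RenderBuffer:
    """Collect tokens and pass them to a render callback in batches."""

    def __init__(self, render: Callable[[str], None], min_tokens: int = 5) -> None:
        self._render = render
        self._min_tokens = min_tokens
        self._pending: List[str] = []

    def push(self, token: str) -> None:
        """Queue one token and render once enough have accumulated."""
        self._pending.append(token)
        if len(self._pending) >= self._min_tokens:
            self.flush()

    def flush(self) -> None:
        """Render whatever is queued; call this once when the stream ends."""
        if self._pending:
            self._render("".join(self._pending))
            self._pending.clear()


rendered: List[str] = []
buffer = RenderBuffer(rendered.append, min_tokens=3)
for token in ["In", " the", " quiet", " wire", ",", " calls", " go"]:
    buffer.push(token)
buffer.flush()  # emit the final partial batch
```

A time-based flush (for example, once per animation frame) is a common alternative to a token count when the UI should update at a steady rate.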

Next Steps

Chat Completions API — full parameter reference for the streaming endpoint
Error Handling — handle mid-stream errors and connection drops
Choosing a Protocol — pick between OpenAI, Anthropic, or Gemini streaming formats
