Chat Completions
POST /v1/chat/completions
Send chat messages to any of 221 models and receive completions, with support for streaming, tool calling, vision, and structured output.
Request
POST https://api.chuizi.ai/v1/chat/completions
Authentication
Authorization: Bearer ck-your-api-key
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | — | Model name, e.g. anthropic/claude-sonnet-4-6, openai/gpt-4.1 |
| messages | array | Yes | — | Array of messages (max 2048) |
| max_tokens | integer | No | Model default | Maximum tokens to generate, 1-1,000,000 |
| stream | boolean | No | false | Enable SSE streaming output |
| temperature | number | No | Model default | Sampling temperature, 0-2 |
| top_p | number | No | Model default | Nucleus sampling parameter, 0-1 |
| stop | string/array | No | — | Stop sequences (max 4, each up to 256 chars) |
| tools | array | No | — | Tool/function definitions for function calling |
| tool_choice | string/object | No | auto | Tool calling strategy: auto, none, required, or a specific function |
| response_format | object | No | — | Structured output format (json_object or json_schema) |
| presence_penalty | number | No | 0 | Presence penalty, -2 to 2 |
| frequency_penalty | number | No | 0 | Frequency penalty, -2 to 2 |
| n | integer | No | 1 | Number of completions to generate, 1-4 |
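The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch of such a check; the function name `validate_params` is hypothetical, and the server's own validation may differ in detail (for example in how it treats boundary values).

```python
from typing import Any


def validate_params(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means no limit was violated."""
    problems: list[str] = []

    if not isinstance(payload.get("model"), str) or not payload["model"]:
        problems.append("model is required and must be a string")

    messages = payload.get("messages")
    if not isinstance(messages, list):
        problems.append("messages is required and must be an array")
    elif len(messages) > 2048:
        problems.append("messages may contain at most 2048 entries")

    # (name, low, high) ranges copied from the parameter table.
    ranges = [
        ("max_tokens", 1, 1_000_000),
        ("temperature", 0, 2),
        ("top_p", 0, 1),
        ("presence_penalty", -2, 2),
        ("frequency_penalty", -2, 2),
        ("n", 1, 4),
    ]
    for name, low, high in ranges:
        value = payload.get(name)
        if value is None:
            continue  # optional parameter not set
        if not isinstance(value, (int, float)) or not (low <= value <= high):
            problems.append(f"{name} must be between {low} and {high}")

    stop = payload.get("stop")
    if stop is not None:
        sequences = [stop] if isinstance(stop, str) else list(stop)
        if len(sequences) > 4:
            problems.append("stop accepts at most 4 sequences")
        if any(len(s) > 256 for s in sequences):
            problems.append("each stop sequence may be at most 256 characters")

    return problems
```

Calling `validate_params(payload)` before sending lets a client report all violations at once instead of discovering them one request at a time.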
Message Format
```json
{
  "role": "system" | "user" | "assistant" | "tool" | "developer",
  "content": "string or content parts array",
  "name": "optional sender name",
  "tool_calls": [
    {
      "id": "...",
      "type": "function",
      "function": { "name": "...", "arguments": "..." }
    }
  ],
  "tool_call_id": "corresponding tool_call ID (required when role=tool)"
}
```
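To illustrate how tool_calls and tool_call_id relate, here is a small sketch of one tool-calling round trip. The tool name `get_weather`, the ID `call_1`, and both helper functions are hypothetical. The sketch also assumes that `arguments` holds a JSON-encoded string and that an assistant message carrying tool calls may have empty content, as is common in OpenAI-compatible APIs; neither is confirmed by the format above.

```python
import json


def parse_arguments(tool_call: dict) -> dict:
    """Decode the arguments string of one tool call (assumed to be JSON)."""
    return json.loads(tool_call["function"]["arguments"])


def tool_result_message(tool_call: dict, result: object) -> dict:
    """Build the role="tool" message that answers one assistant tool call."""
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],  # must match the call being answered
        "content": json.dumps(result),
    }


# Hypothetical assistant turn that requests a tool invocation.
assistant_message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        }
    ],
}

call = assistant_message["tool_calls"][0]
reply = tool_result_message(call, {"temp_c": 18})

# Conversation to send on the next request.
messages = [
    {"role": "user", "content": "What is the weather in Paris?"},
    assistant_message,
    reply,
]
```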
Request Example
```json
{
  "model": "anthropic/claude-sonnet-4-6",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Explain the basics of quantum computing." }
  ],
  "max_tokens": 1024,
  "temperature": 0.7
}
```
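The same request could be sent from Python using only the standard library. This is a sketch rather than an official client: `build_request` and `chat` are hypothetical helper names, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_URL = "https://api.chuizi.ai/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(api_key: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = {
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain the basics of quantum computing."},
    ],
    "max_tokens": 1024,
    "temperature": 0.7,
}
# result = chat("ck-your-api-key", payload)  # performs a real, billable call
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` on non-2xx status codes, so error bodies have to be read from the exception.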
Response
Non-Streaming Response
```json
{
  "id": "gen-xxxxxxxxxxxxxxxx",
  "object": "chat.completion",
  "created": 1712000000,
  "model": "anthropic/claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing leverages the principles of quantum mechanics..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 150,
    "total_tokens": 174,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "cache_creation_tokens": 0
    }
  },
  "x_chuizi": {
    "generation_id": "gen-xxxxxxxxxxxxxxxx",
    "latency_ms": 1200,
    "cost": "0.00057600"
  }
}
```
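A response of this shape can be reduced to the fields most callers need. The sketch below uses a hypothetical helper, `summarize_completion`, and parses `cost` with `Decimal` because the example returns it as a string; treating it as an exact decimal amount is an assumption, and the currency is not stated here.

```python
from decimal import Decimal


def summarize_completion(response: dict) -> dict:
    """Extract the first choice's text plus usage and billing metadata."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    extra = response.get("x_chuizi", {})
    cost = extra.get("cost")
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
        "latency_ms": extra.get("latency_ms"),
        # Decimal avoids the rounding error of float for small amounts.
        "cost": Decimal(cost) if cost is not None else None,
    }
```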
Streaming Response (SSE)
When stream: true, the response uses Server-Sent Events format:
```
data: {"id":"gen-xxx","object":"chat.completion.chunk","created":1712000000,"model":"anthropic/claude-sonnet-4-6","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"gen-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}

data: {"id":"gen-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

data: {"id":"gen-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":24,"completion_tokens":150,"total_tokens":174}}

data: [DONE]
```
Error Response
```json
{
  "error": {
    "message": "Insufficient balance. Current: $0.12",
    "type": "insufficient_quota",
    "code": "402"
  }
}
```
Code Examples
```bash
curl -X POST https://api.chuizi.ai/v1/chat/completions \
  -H "Authorization: Bearer ck-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'
```
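For streaming requests (stream: true), the SSE lines described under Streaming Response can be reassembled into text as they arrive. The sketch below is a simplified parser, not a full SSE implementation: the function name `iter_content` is hypothetical, it assumes each event is a single `data:` line as in the sample stream, and it assumes n is 1 so that all deltas belong to one completion.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by a chat.completion.chunk stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                yield piece


# Example with lines as they would be read from the HTTP response.
sample_stream = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
text = "".join(iter_content(sample_stream))
```

With `urllib.request`, the response object can be iterated line by line, so something like `iter_content(raw.decode("utf-8") for raw in response)` would feed the parser incrementally.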
Prompt Caching
For Anthropic and Bedrock models, you can enable prompt caching with the cache_control field to reduce costs on repeated system prompts by up to 90%:
```json
{
  "model": "anthropic/claude-haiku-4-5",
  "messages": [
    {
      "role": "system",
      "content": "Your long system prompt...",
      "cache_control": { "type": "ephemeral" }
    },
    { "role": "user", "content": "Your question" }
  ]
}
```
When caching is active, usage.prompt_tokens_details will report cached_tokens and cache_creation_tokens. See the Prompt Caching Guide for details.
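As a rough illustration, the cache_control marker can be added programmatically and the cache counters read back from the response. Both helper names are hypothetical, and marking every system message is a simplification chosen for this sketch; the Prompt Caching Guide remains the authority on where markers are allowed.

```python
def with_cached_system_prompt(payload: dict) -> dict:
    """Return a copy of payload with cache_control set on each system message."""
    messages = [
        {**m, "cache_control": {"type": "ephemeral"}} if m.get("role") == "system" else dict(m)
        for m in payload["messages"]
    ]
    return {**payload, "messages": messages}


def cache_details(usage: dict) -> tuple[int, int]:
    """Return (cached_tokens, cache_creation_tokens), defaulting to 0 when absent."""
    details = usage.get("prompt_tokens_details") or {}
    return details.get("cached_tokens", 0), details.get("cache_creation_tokens", 0)
```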
Next Steps
- Streaming Guide — implement token-by-token streaming responses
- Function Calling — let models invoke your tools and APIs
- Structured Output — get JSON responses matching your schema
- Error Handling — handle errors and retries in production