Chat Completions

POST /v1/chat/completions

Send chat messages to any of 221 models and receive completions, with support for streaming, tool calling, vision, and structured output.

Request

POST https://api.chuizi.ai/v1/chat/completions

Authentication

Authorization: Bearer ck-your-api-key

Parameters

Parameter	Type	Required	Default	Description
`model`	string	Yes	—	Model name, e.g. `anthropic/claude-sonnet-4-6`, `openai/gpt-4.1`
`messages`	array	Yes	—	Array of messages (max 2048)
`max_tokens`	integer	No	Model default	Maximum tokens to generate, 1-1,000,000
`stream`	boolean	No	`false`	Enable SSE streaming output
`temperature`	number	No	Model default	Sampling temperature, 0-2
`top_p`	number	No	Model default	Nucleus sampling parameter, 0-1
`stop`	string/array	No	—	Stop sequences (max 4, each up to 256 chars)
`tools`	array	No	—	Tool/function definitions for function calling
`tool_choice`	string/object	No	`auto`	Tool calling strategy: `auto`, `none`, `required`, or a specific function
`response_format`	object	No	—	Structured output format (`json_object` or `json_schema`)
`presence_penalty`	number	No	0	Presence penalty, -2 to 2
`frequency_penalty`	number	No	0	Frequency penalty, -2 to 2
`n`	integer	No	1	Number of completions to generate, 1-4
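
The documented ranges can be checked on the client before a request is sent. The following Python sketch is illustrative only: `validate_params` and `NUMERIC_RANGES` are hypothetical names, not part of the API, and the server remains the authority on what it accepts.

```python
from typing import Any

# Inclusive (min, max) ranges copied from the parameter table.
NUMERIC_RANGES = {
    "max_tokens": (1, 1_000_000),
    "temperature": (0, 2),
    "top_p": (0, 1),
    "presence_penalty": (-2, 2),
    "frequency_penalty": (-2, 2),
    "n": (1, 4),
}


def validate_params(params: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for required in ("model", "messages"):
        if required not in params:
            problems.append(f"missing required parameter: {required}")
    if len(params.get("messages", [])) > 2048:
        problems.append("messages: at most 2048 entries")
    for name, (low, high) in NUMERIC_RANGES.items():
        if name in params and not low <= params[name] <= high:
            problems.append(f"{name}: must be between {low} and {high}")
    stop = params.get("stop")
    if stop is not None:
        # `stop` may be a single string or an array of strings.
        sequences = [stop] if isinstance(stop, str) else list(stop)
        if len(sequences) > 4:
            problems.append("stop: at most 4 sequences")
        if any(len(sequence) > 256 for sequence in sequences):
            problems.append("stop: each sequence is limited to 256 characters")
    return problems
```

Returning a list instead of raising on the first problem lets a caller report every issue at once.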

Message Format

{
  "role": "system" | "user" | "assistant" | "tool" | "developer",
  "content": "string or content parts array",
  "name": "optional sender name",
  "tool_calls": [{ "id": "...", "type": "function", "function": { "name": "...", "arguments": "..." } }],
  "tool_call_id": "corresponding tool_call ID (required when role=tool)"
}
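
To show how `tool_calls` and `tool_call_id` pair up, here is a minimal Python sketch of a tool-calling exchange. It makes several assumptions: the `get_weather` tool, the `call_1` ID, and the `tool_result_message` helper are hypothetical, `arguments` is taken to be a JSON-encoded string, and the assistant turn uses an empty `content` string.

```python
import json


def tool_result_message(tool_call: dict, result: object) -> dict:
    """Build the role="tool" message that answers one assistant tool call."""
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],  # must match the ID of the call being answered
        "content": json.dumps(result),
    }


# Hypothetical assistant turn asking for a made-up `get_weather` tool.
assistant_turn = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": json.dumps({"city": "Paris"}),
            },
        }
    ],
}

call = assistant_turn["tool_calls"][0]
arguments = json.loads(call["function"]["arguments"])

# Message history to send back so the model can use the tool result.
messages = [
    {"role": "user", "content": "What is the weather in Paris?"},
    assistant_turn,
    tool_result_message(call, {"city": arguments["city"], "temp_c": 18}),
]
```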

Request Example

{
  "model": "anthropic/claude-sonnet-4-6",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Explain the basics of quantum computing." }
  ],
  "max_tokens": 1024,
  "temperature": 0.7
}

Response

Non-Streaming Response

{
  "id": "gen-xxxxxxxxxxxxxxxx",
  "object": "chat.completion",
  "created": 1712000000,
  "model": "anthropic/claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing leverages the principles of quantum mechanics..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 150,
    "total_tokens": 174,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "cache_creation_tokens": 0
    }
  },
  "x_chuizi": {
    "generation_id": "gen-xxxxxxxxxxxxxxxx",
    "latency_ms": 1200,
    "cost": "0.00057600"
  }
}
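
A response shaped like the one above can be reduced to the fields most callers need. This is a sketch, assuming the field layout matches the example; `summarize` is a hypothetical helper name.

```python
from decimal import Decimal


def summarize(response: dict) -> dict:
    """Extract the reply text, finish reason, token total, and cost."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    extras = response.get("x_chuizi", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
        # `cost` arrives as a string in the example; Decimal avoids float rounding.
        "cost": Decimal(extras["cost"]) if "cost" in extras else None,
    }
```

Only the first choice is read here, so a request with `n` greater than 1 would need to loop over `choices` instead.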

Streaming Response (SSE)

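When `stream` is `true`, the response is delivered as Server-Sent Events (SSE). Each event is a `data:` line containing one JSON chunk, and the stream ends with `data: [DONE]`: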

data: {"id":"gen-xxx","object":"chat.completion.chunk","created":1712000000,"model":"anthropic/claude-sonnet-4-6","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"gen-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}

data: {"id":"gen-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

data: {"id":"gen-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":24,"completion_tokens":150,"total_tokens":174}}

data: [DONE]
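
The stream above can be reassembled into a full reply by concatenating the `delta.content` fragments. The sketch below assumes each chunk arrives on a single `data:` line, as in the example, and follows only the choice with index 0; `iter_chunks` and `collect` are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON chunks from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def collect(lines: Iterable[str]) -> dict:
    """Accumulate text, finish reason, and usage for the first choice."""
    text, finish_reason, usage = [], None, None
    for chunk in iter_chunks(lines):
        usage = chunk.get("usage") or usage
        for choice in chunk.get("choices", []):
            if choice.get("index") != 0:
                continue  # this sketch only follows the first completion
            text.append(choice.get("delta", {}).get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
    return {"text": "".join(text), "finish_reason": finish_reason, "usage": usage}
```

`collect` accepts any iterable of text lines, so it can be fed decoded lines from an HTTP response body or a recorded stream in a test.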

Error Response

{
  "error": {
    "message": "Insufficient balance. Current: $0.12",
    "type": "insufficient_quota",
    "code": "402"
  }
}
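
Client code can turn an error body like this into an exception. The following is a minimal sketch, assuming every error uses the `error.message`, `error.type`, and `error.code` layout shown above; `ChatAPIError`, `error_from_body`, and `raise_for_error` are hypothetical names.

```python
from typing import Optional


class ChatAPIError(Exception):
    """Carries the fields of the `error` object from a failed request."""

    def __init__(self, message: str, error_type: Optional[str], code: Optional[str]):
        super().__init__(message)
        self.error_type = error_type
        self.code = code  # a string such as "402" in the example above


def error_from_body(body: dict) -> Optional[ChatAPIError]:
    """Return a ChatAPIError if the body contains an error, otherwise None."""
    error = body.get("error")
    if error is None:
        return None
    return ChatAPIError(
        error.get("message", "unknown error"),
        error.get("type"),
        error.get("code"),
    )


def raise_for_error(body: dict) -> dict:
    """Raise on an error body; otherwise return the body unchanged."""
    error = error_from_body(body)
    if error is not None:
        raise error
    return body
```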

Code Examples

curl -X POST https://api.chuizi.ai/v1/chat/completions \
  -H "Authorization: Bearer ck-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'
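
The same request can be made from Python with only the standard library. This is a sketch, not an official client: `build_request` and `send` are hypothetical helpers, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_URL = "https://api.chuizi.ai/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Create the POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body of a successful response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_request(
    "ck-your-api-key",
    {
        "model": "anthropic/claude-sonnet-4-6",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 100,
    },
)
# reply = send(request)  # uncomment to perform the network call
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for non-2xx responses, so error bodies have to be read from the exception, not from the return value of `send`.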

Prompt Caching

For Anthropic and Bedrock models, add a `cache_control` field to a message to enable prompt caching, which can reduce the cost of repeated system prompts by up to 90%:

{
  "model": "anthropic/claude-haiku-4-5",
  "messages": [
    {
      "role": "system",
      "content": "Your long system prompt...",
      "cache_control": { "type": "ephemeral" }
    },
    { "role": "user", "content": "Your question" }
  ]
}

When caching is active, `usage.prompt_tokens_details` reports `cached_tokens` and `cache_creation_tokens`. See the Prompt Caching Guide for details.
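
The request and the usage counters can be handled with two small helpers, sketched here in Python. Both names are hypothetical. The sketch also assumes that `cached_tokens` counts tokens served from the cache and `cache_creation_tokens` counts tokens written to it, which the field names suggest but this page does not state.

```python
def with_cached_system_prompt(messages: list[dict]) -> list[dict]:
    """Return a copy of `messages` with cache_control set on system messages."""
    marked = []
    for message in messages:
        if message.get("role") == "system":
            # Copy the message so the caller's list is left untouched.
            message = {**message, "cache_control": {"type": "ephemeral"}}
        marked.append(message)
    return marked


def cache_counters(usage: dict) -> tuple[int, int]:
    """Return (cached_tokens, cache_creation_tokens), defaulting to 0."""
    details = usage.get("prompt_tokens_details") or {}
    return details.get("cached_tokens", 0), details.get("cache_creation_tokens", 0)
```

Under that assumption, a first request with a long system prompt would be expected to report a non-zero `cache_creation_tokens`, and later requests with the same prompt a non-zero `cached_tokens`.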

Next Steps

Streaming Guide — implement token-by-token streaming responses
Function Calling — let models invoke your tools and APIs
Structured Output — get JSON responses matching your schema
Error Handling — handle errors and retries in production