Vision

Supported Models

Chuizi.AI supports 80 models with vision capabilities, including:

Provider          Models
OpenAI / Azure    GPT-4.1, GPT-4.1-mini, GPT-4o, GPT-5, o3, o4-mini
Anthropic         Claude Opus 4, Claude Sonnet 4, Claude Sonnet 3.5, Claude Haiku 3.5
Google            Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.0 Flash
DeepSeek          DeepSeek V3 (via compatible endpoints)
Qwen              Qwen-VL-Max, Qwen-VL-Plus
Other             Llama 4 Scout, Llama 4 Maverick, Nova Pro/Lite

Call GET /v1/models for the full list. Models with "vision": true in their capabilities accept image input.
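
As a rough sketch, the vision filter could look like this once the GET /v1/models response has been parsed into a dictionary. The response shape used here (a top-level "data" list whose entries carry an "id" and a "capabilities" object) is an assumption made for illustration, and the sample data is invented, so check both against a real response before relying on it.

```python
def vision_model_ids(models_response: dict) -> list[str]:
    """Return the IDs of models whose capabilities include "vision": true.

    Assumes a hypothetical response shape:
    {"data": [{"id": "...", "capabilities": {"vision": true}}, ...]}
    """
    ids = []
    for model in models_response.get("data", []):
        capabilities = model.get("capabilities") or {}
        if capabilities.get("vision") is True:
            ids.append(model["id"])
    return ids


# Sample data made up for this example (not real API output).
sample = {
    "data": [
        {"id": "openai/gpt-4.1", "capabilities": {"vision": True}},
        {"id": "example/text-only-model", "capabilities": {"vision": False}},
        {"id": "example/no-capabilities-field"},
    ]
}
print(vision_model_ids(sample))
```

Entries without a capabilities object are treated as not supporting vision, which seems like the safer default.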

Request Format

Images are sent as part of the content array in a message, alongside text:

{
  "model": "openai/gpt-4.1",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/photo.jpg"
          }
        }
      ]
    }
  ],
  "max_tokens": 1024
}
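
For readers who are not using an SDK, the sketch below shows one way this body could be wrapped in a plain HTTP request with the Python standard library. The /chat/completions path and the bearer-token header are assumptions based on the OpenAI-compatible convention, and build_chat_request is a hypothetical helper name. The request is only prepared here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.chuizi.ai/v1"  # base URL used in the Code Examples section


def build_chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the JSON body."""
    return urllib.request.Request(
        # Assumed path, following the OpenAI-compatible convention.
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme: a bearer token, as OpenAI-style clients send it.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


payload = {
    "model": "openai/gpt-4.1",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
    "max_tokens": 1024,
}
request = build_chat_request("ck-your-key-here", payload)
# To send it, pass `request` to urllib.request.urlopen() and JSON-decode the body.
```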

Image Input Methods

Public URL

Pass a publicly accessible URL. The model fetches the image directly:

{
  "type": "image_url",
  "image_url": {
    "url": "https://example.com/photo.jpg"
  }
}

Supported formats: JPEG, PNG, GIF, WebP.

Base64 Data URL

Encode the image as base64 and embed it in the request using a data URL:

{
  "type": "image_url",
  "image_url": {
    "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQ..."
  }
}

Use base64 when:

  • The image is not publicly accessible.
  • You want to avoid an extra HTTP round-trip.
  • The image is generated dynamically.
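
A minimal sketch of the encoding step, using only the standard library. The helper name image_to_data_url is hypothetical, and the MIME type is guessed from the file extension rather than from the file contents, which is a simplification.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Read a local image file and return it as a base64 data URL."""
    # Guess the MIME type from the extension (e.g. ".png" -> "image/png").
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used as the "url" value of an image_url content part. Keep in mind that base64 encoding makes the payload roughly a third larger than the original file.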

Multiple Images

You can include multiple images in a single message:

{
  "role": "user",
  "content": [
    {"type": "text", "text": "Compare these two screenshots and list the differences."},
    {"type": "image_url", "image_url": {"url": "https://example.com/before.png"}},
    {"type": "image_url", "image_url": {"url": "https://example.com/after.png"}}
  ]
}
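
When the number of images varies, the content array can be assembled programmatically. The helper below is a small sketch of that idea; build_vision_message is a hypothetical name, not part of any SDK.

```python
def build_vision_message(text: str, image_urls: list[str]) -> dict:
    """Build a user message with one text part followed by one part per image."""
    content = [{"type": "text", "text": text}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}


message = build_vision_message(
    "Compare these two screenshots and list the differences.",
    ["https://example.com/before.png", "https://example.com/after.png"],
)
```

The resulting dictionary has the same shape as the JSON above and can be placed directly in the messages list.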

Detail Parameter

The detail parameter controls image resolution and token cost:

Value     Behavior                                             Token Cost
"auto"    Model chooses based on image size (default)          Varies
"low"     Resized to 512x512. Faster, cheaper.                 ~85 tokens
"high"    Full resolution analyzed in tiles. More accurate.    ~174 tokens per tile

{
  "type": "image_url",
  "image_url": {
    "url": "https://example.com/diagram.png",
    "detail": "high"
  }
}

Use "low" for simple classification or presence detection. Use "high" for OCR, small text, or detailed analysis.
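
To get a feel for how the cost grows with image size, here is a back-of-the-envelope estimator built on the approximate figures in the table. It is only a sketch: the 512-pixel tile edge is an assumption that this guide does not state, and providers may downscale an image before tiling or add a fixed overhead, so real usage can differ noticeably.

```python
import math
from typing import Optional

LOW_DETAIL_TOKENS = 85  # approximate figure from the table above
TOKENS_PER_TILE = 174   # approximate figure from the table above
TILE_SIZE = 512         # ASSUMPTION: tile edge in pixels, not stated in this guide


def estimate_image_tokens(width: int, height: int, detail: str) -> Optional[int]:
    """Roughly estimate the token cost of one image; None means "cannot tell"."""
    if detail == "low":
        return LOW_DETAIL_TOKENS
    if detail == "high":
        # Count the tiles needed to cover the image, with no downscaling step.
        tiles = math.ceil(width / TILE_SIZE) * math.ceil(height / TILE_SIZE)
        return tiles * TOKENS_PER_TILE
    if detail == "auto":
        return None  # the model picks the mode, so the cost is not predictable here
    raise ValueError(f"Unknown detail value: {detail!r}")


print(estimate_image_tokens(1024, 1024, "high"))  # 2 x 2 tiles under these assumptions
print(estimate_image_tokens(1024, 1024, "low"))
```

For exact numbers, compare against the usage field returned with a real response.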

Code Examples

from openai import OpenAI
import base64

client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",
)

# Using a public URL
response = client.chat.completions.create(
    model="openai/gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
    max_tokens=1024,
)
print(response.choices[0].message.content)

# Using base64
with open("screenshot.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract all text from this screenshot."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{image_data}",
                        "detail": "high",
                    },
                },
            ],
        }
    ],
    max_tokens=2048,
)
print(response.choices[0].message.content)

Tips

  • Image size limits. Most models accept images up to 20 MB. Larger images are rejected. Resize before sending if needed.
  • Token cost scales with resolution. A high-detail 2048x2048 image can consume 1000+ tokens. Use "detail": "low" when full resolution is not needed.
  • Combine vision with tools. Vision and function calling work together: for example, analyze a receipt image, then call a function to log the expense.
  • Not all models support all formats. Some models only accept JPEG and PNG. Check the model's documentation if you encounter format errors.
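
The size and format tips can be turned into a quick local check before uploading. This is a sketch under stated assumptions: check_image is a hypothetical helper, 20 MB is read as 20 * 1024 * 1024 bytes, and the format is inferred from the file extension only.

```python
from pathlib import Path

MAX_IMAGE_BYTES = 20 * 1024 * 1024  # 20 MB limit from the tips (assumed binary MB)
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
WIDELY_SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png"}


def check_image(path: str, max_bytes: int = MAX_IMAGE_BYTES) -> list[str]:
    """Return a list of potential problems; an empty list means none were found."""
    problems = []
    file = Path(path)
    suffix = file.suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or 'no extension'}")
    elif suffix not in WIDELY_SUPPORTED_SUFFIXES:
        problems.append(f"{suffix} may not be accepted by every model")
    size = file.stat().st_size
    if size > max_bytes:
        problems.append(f"file is {size} bytes, above the {max_bytes}-byte limit")
    return problems
```

An empty result does not guarantee that a given model accepts the image, since limits vary by model. It only rules out the common failures listed above.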
