# Vision

## Supported Models
Chuizi.AI supports 59+ models with vision capabilities, including:
| Provider | Models |
|---|---|
| OpenAI / Azure | GPT-4.1, GPT-4.1-mini, GPT-4o, GPT-5, o3, o4-mini |
| Anthropic | Claude Opus 4, Claude Sonnet 4, Claude Sonnet 3.5, Claude Haiku 3.5 |
| Google | Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.0 Flash |
| DeepSeek | DeepSeek V3 (via compatible endpoints) |
| Qwen | Qwen-VL-Max, Qwen-VL-Plus |
| Other | Llama 4 Scout, Llama 4 Maverick, Nova Pro/Lite |
Check `GET /v1/models` for the full list; models with `"vision": true` in their capabilities support image input.
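Programmatically, you can filter the models list for that flag. A minimal sketch, assuming the `/v1/models` response carries a `data` array whose entries include a `capabilities` object with a boolean `vision` field (the exact schema may differ):

```python
def vision_models(models_response: dict) -> list[str]:
    """Return the IDs of models whose capabilities advertise vision support."""
    return [
        m["id"]
        for m in models_response.get("data", [])
        if m.get("capabilities", {}).get("vision") is True
    ]

# Example payload in the assumed shape of a GET /v1/models response
sample = {
    "data": [
        {"id": "openai/gpt-4.1", "capabilities": {"vision": True}},
        {"id": "some/text-only-model", "capabilities": {"vision": False}},
    ]
}
print(vision_models(sample))  # ['openai/gpt-4.1']
```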
## Request Format
Images are sent as part of the content array in a message, alongside text:
```json
{
  "model": "openai/gpt-4.1",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        {
          "type": "image_url",
          "image_url": { "url": "https://example.com/photo.jpg" }
        }
      ]
    }
  ],
  "max_tokens": 1024
}
```
## Image Input Methods

### Public URL
Pass a publicly accessible URL. The model fetches the image directly:
```json
{
  "type": "image_url",
  "image_url": { "url": "https://example.com/photo.jpg" }
}
```
Supported formats: JPEG, PNG, GIF, WebP.
### Base64 Data URL
Encode the image as base64 and embed it in the request using a data URL:
```json
{
  "type": "image_url",
  "image_url": { "url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQ..." }
}
```
Use base64 when:
- The image is not publicly accessible.
- You want to avoid an extra HTTP round-trip.
- The image is generated dynamically.
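The encoding step can be wrapped in a small helper that builds the data URL from a local file. A sketch; the MIME-type table is illustrative and covers only the formats listed above:

```python
import base64
from pathlib import Path

# Illustrative mapping of file extensions to MIME types (JPEG, PNG, GIF, WebP)
MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}

def to_data_url(path: str) -> str:
    """Read an image file and return a base64 data URL suitable for image_url.url."""
    suffix = Path(path).suffix.lower()
    mime = MIME_TYPES.get(suffix)
    if mime is None:
        raise ValueError(f"Unsupported image format: {suffix}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

The returned string goes directly into the `url` field of an `image_url` content part.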
## Code Examples
```python
import base64

from openai import OpenAI

client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",
)

# Using a public URL
response = client.chat.completions.create(
    model="openai/gpt-4.1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
    max_tokens=1024,
)
print(response.choices[0].message.content)

# Using base64
with open("screenshot.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract all text from this screenshot."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{image_data}",
                        "detail": "high",
                    },
                },
            ],
        }
    ],
    max_tokens=2048,
)
print(response.choices[0].message.content)
```
## Next Steps
- Chat Completions API — full parameter reference for vision requests
- Image Generation — generate images from text prompts
- Choose a Model — compare vision-capable models