Phi 4 Multimodal

azure-maas
microsoft/phi-4-multimodal

16K context, vision

Context Window

16K

16,384 tokens

Max Output

4K

4,096 tokens

About this model

Phi-4 Multimodal: vision and text understanding

This model supports up to 16K tokens of context. It includes native vision understanding for analyzing images and documents.

Access it through Chuizi.AI with a single ck- API key; no separate Microsoft account is needed.

Highlights

16K context window
4K max output
Native vision support

Best For

Image analysis
Document OCR
Visual Q&A
Multimodal chat
2025-02-26

Capabilities

Chat
Vision
Audio
Tools

Aliases

phi-4-multimodal

Pricing (per 1M tokens)

Input: $0.08
Output: $0.16

Final prices shown
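To make the per-token arithmetic concrete, here is a minimal sketch that estimates the cost of one request from the listed rates. The function and constant names are hypothetical, the prices are assumed to be in US dollars, and actual billing may round or meter tokens differently.

```python
# Rates from the pricing table above, per 1M tokens (assumed to be USD).
INPUT_PRICE_PER_MILLION = 0.08
OUTPUT_PRICE_PER_MILLION = 0.16


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost of one request (hypothetical helper)."""
    total = (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # 10,000 input tokens and 1,000 output tokens
    print(f"${estimate_cost(10_000, 1_000):.5f}")
```

At these rates, a request with 10,000 input tokens and 1,000 output tokens comes to about $0.00096.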

Quick Start

main.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",
)

response = client.chat.completions.create(
    model="microsoft/phi-4-multimodal",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
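Since the model lists native vision support, an image request might look like the sketch below. It uses the OpenAI-style content parts (text plus image_url with a base64 data URL); whether the Chuizi.AI endpoint accepts this exact payload for this model is an assumption, and the helper names build_image_message and describe_image are hypothetical.

```python
import base64

MODEL = "microsoft/phi-4-multimodal"  # model ID from this page


def build_image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build one user message carrying text plus an inline base64 image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
        ],
    }


def describe_image(path: str, prompt: str = "Describe this image.") -> str:
    """Send a local image to the model (needs network access and a valid key)."""
    # Imported here so build_image_message has no third-party dependency.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.chuizi.ai/v1",
        api_key="ck-your-key-here",
    )
    with open(path, "rb") as f:
        message = build_image_message(prompt, f.read())
    response = client.chat.completions.create(
        model=MODEL,
        messages=[message],
        max_tokens=1024,  # assumption: any value within the 4,096-token output limit
    )
    return response.choices[0].message.content
```

Calling describe_image("receipt.png", "Extract the total amount.") would send the file inline; large images add to the request size, so a hosted image URL may be preferable if the gateway supports it.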
