Phi 4 Multimodal
azure-maas
microsoft/phi-4-multimodal
16K context, vision
Context Window
16K
16,384 tokens
Max Output
4K
4,096 tokens
About this model
Phi-4 Multimodal: vision + text understanding
This model supports up to 16K tokens of context. It includes native vision understanding for analyzing images and documents.
Access it through Chuizi.AI with a single ck- API key; no separate Microsoft account is needed.
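As a rough illustration of the vision support described above, the sketch below builds a chat message that carries both a text prompt and an inline image. It assumes the gateway accepts the OpenAI-style `image_url` content part with a base64 data URL, which this page does not confirm; `build_image_message` is a hypothetical helper name, not part of any SDK.

```python
import base64


def build_image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> list[dict]:
    """Return a one-message chat history with a text part and an image part.

    Assumption: the endpoint accepts OpenAI-style multimodal content parts.
    """
    # Encode the raw image bytes as a data URL so no external hosting is needed.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime};base64,{encoded}"
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ]


# Placeholder bytes stand in for a real image file read from disk.
messages = build_image_message("What does this document say?", b"abc")
print(messages[0]["content"][1]["image_url"]["url"])
```

If the assumption holds, the returned list could be passed as the `messages` argument of a chat completion request for `microsoft/phi-4-multimodal`.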
Highlights
16K context window
4K max output
Native vision support
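The two limits above interact when a prompt is long. The sketch below clamps a requested completion length so that it respects both figures. It assumes that prompt and completion tokens share the 16,384-token window, which is common but not stated on this page; `safe_max_tokens` is a hypothetical helper.

```python
CONTEXT_WINDOW = 16_384  # context window listed for this model
MAX_OUTPUT = 4_096       # maximum output tokens listed for this model


def safe_max_tokens(prompt_tokens: int, requested: int = MAX_OUTPUT) -> int:
    """Clamp a requested completion length to the listed limits.

    Assumption: prompt and completion tokens count against the same window.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    # Tokens still available in the window after the prompt is accounted for.
    remaining = max(CONTEXT_WINDOW - prompt_tokens, 0)
    return min(requested, MAX_OUTPUT, remaining)


print(safe_max_tokens(1_000))   # short prompt: the full 4,096 tokens fit
print(safe_max_tokens(15_000))  # long prompt: only 1,384 tokens remain
```

The result could be supplied as the `max_tokens` parameter of a request; the prompt token count has to come from a tokenizer or a prior usage report, which this sketch leaves to the caller.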
Best For
Image analysis, Document OCR, Visual Q&A, Multimodal chat
2025-02-26
Capabilities
Chat, Vision, Audio, Tools
Aliases
phi-4-multimodal
Pricing (per 1M tokens)
| Token type | Price (USD per 1M tokens) |
|---|---|
| Input | $0.08 |
| Output | $0.16 |
Final prices shown
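To make the per-million rates concrete, here is a minimal sketch that estimates the cost of a single request from its token counts. The rates are the ones listed above; `estimate_cost_usd` is a hypothetical helper, and actual billing may round or aggregate differently.

```python
INPUT_USD_PER_M = 0.08   # listed input price per 1M tokens
OUTPUT_USD_PER_M = 0.16  # listed output price per 1M tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# A request that fills the 16,384-token context and emits 4,096 tokens.
print(f"${estimate_cost_usd(16_384, 4_096):.6f}")  # prints $0.001966
```

At these rates even a request that uses the full context window and the maximum output costs well under one cent.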
Quick Start
main.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",
)

response = client.chat.completions.create(
    model="microsoft/phi-4-multimodal",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)