GLM-5V Turbo
Zhipu
zhipu/glm-5v-turbo
200K · Vision · Coding
Context Window: 200K (200,000 tokens)
Max Output: 131K (131,072 tokens, i.e. 128 × 1,024, also written as 128K)
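Here is a minimal sketch of choosing a `max_tokens` value that respects both limits above. It assumes, and this card does not confirm it, that prompt and output tokens share the 200K context window. `clamp_max_tokens` is a hypothetical helper, and the prompt token count has to come from your own tokenizer or from the `usage` field of an earlier response.

```python
# Limits as listed on this card.
CONTEXT_WINDOW = 200_000
MAX_OUTPUT = 131_072


def clamp_max_tokens(prompt_tokens: int, requested: int) -> int:
    """Return a max_tokens value within both the output cap and the context window.

    Assumption (not confirmed by the card): output tokens count against the
    same 200K context window as the prompt.
    """
    if prompt_tokens >= CONTEXT_WINDOW:
        raise ValueError("prompt alone fills or exceeds the context window")
    remaining = CONTEXT_WINDOW - prompt_tokens
    # Never ask for more than the output cap or the space left in the window.
    return max(0, min(requested, MAX_OUTPUT, remaining))
```

With a short prompt, the 131,072-token output cap is the binding limit. With a long prompt (for example, a large document plus screenshots), the remaining window is the binding limit.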
About this model
GLM-5V Turbo is Z.ai's multimodal foundation model for vision-based coding and agent workflows. It natively accepts images, video, text, and files, with a 200K context window and up to 128K (131,072) output tokens.
Use it for screenshot-to-code, design implementation, GUI debugging, visual exploration, and document-heavy agent tasks. Public availability depends on whether a direct Zhipu API key is configured.
Highlights
200K context window
128K max output
Video/file input
Vision coding
Best For
Screenshot-to-code
Design implementation
GUI debugging
Multimodal agents
2026-04-01 · Multimodal Transformer · Proprietary
Capabilities
Chat · Vision · Reasoning · Code · Tools · PDF · Cache
Pricing (per 1M tokens)
| Token type | Price (USD per 1M tokens) |
|---|---|
| Input | $1.26 |
| Output | $4.20 |
| Cache read | $0.252 |
Prices shown are final.
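A minimal sketch of estimating the cost of one request from the table above. It assumes, and the card does not state this, that cached tokens are a subset of the input tokens and are billed at the cache-read rate instead of the input rate. `estimate_cost` is a hypothetical helper. `Decimal` avoids floating-point rounding in the currency arithmetic.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the pricing table on this card.
PRICE_PER_M = {
    "input": Decimal("1.26"),
    "output": Decimal("4.20"),
    "cache_read": Decimal("0.252"),
}


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> Decimal:
    """Estimate the USD cost of one request.

    Assumption: cached_tokens is the portion of input_tokens served from the
    cache, billed at the cache-read rate instead of the full input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * PRICE_PER_M["input"]
        + cached_tokens * PRICE_PER_M["cache_read"]
        + output_tokens * PRICE_PER_M["output"]
    )
    return total / Decimal(1_000_000)
```

For example, 100,000 input tokens and 10,000 output tokens with no cache hits cost 100,000 × $1.26/1M + 10,000 × $4.20/1M = $0.168. If 80,000 of those input tokens are cache reads, the estimate drops to $0.08736. The cache-read rate is 20% of the input rate.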
Quick Start
main.py
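```python
from openai import OpenAI

# Point the OpenAI Python SDK at the chuizi.ai OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",  # replace with your own key
)

# Send a single-turn text prompt to GLM-5V Turbo.
response = client.chat.completions.create(
    model="zhipu/glm-5v-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)
```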
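For screenshot-to-code, the Quick Start request can carry an image. This is a minimal sketch that assumes the gateway accepts OpenAI-style `image_url` content parts with a base64 data URL. The card lists Vision as a capability but does not document the request shape. `screenshot_message` and `send_screenshot` are hypothetical helpers. `send_screenshot` takes any client object that exposes `chat.completions.create`, such as the `OpenAI` client configured in the Quick Start.

```python
import base64
from pathlib import Path


def screenshot_message(png_bytes: bytes, instruction: str) -> list[dict]:
    """Build a single user message pairing an instruction with a PNG screenshot.

    Assumption: the endpoint accepts OpenAI-style image_url parts carrying a
    base64 data URL (not confirmed on this card).
    """
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{encoded}"},
                },
            ],
        }
    ]


def send_screenshot(client, path: str, instruction: str) -> str:
    """Read a PNG from disk, send it with the instruction, and return the reply text."""
    messages = screenshot_message(Path(path).read_bytes(), instruction)
    response = client.chat.completions.create(
        model="zhipu/glm-5v-turbo",
        messages=messages,
    )
    return response.choices[0].message.content
```

Usage, with a hypothetical `screenshot.png`: `print(send_screenshot(client, "screenshot.png", "Implement this screen as a single HTML file with inline CSS."))`. Keeping payload construction separate from the network call makes it easy to inspect or unit-test the message before spending tokens.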