DeepSeek V4 Flash

DeepSeek
deepseek/deepseek-v4-flash

V4 Flash, 1M context

Context Window

1.0M

1,048,576 tokens

Max Output

384K

384,000 tokens
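The two limits above can be combined into a simple budget check before sending a request. The sketch below is only illustrative: it assumes that prompt and completion tokens share the 1,048,576-token context window (this page does not say so), and `max_output_budget` is a hypothetical helper, not part of any SDK.

```python
# Limits copied from the figures listed above.
CONTEXT_WINDOW = 1_048_576  # total tokens
MAX_OUTPUT = 384_000        # maximum completion tokens


def max_output_budget(prompt_tokens: int,
                      context_window: int = CONTEXT_WINDOW,
                      max_output: int = MAX_OUTPUT) -> int:
    """Return the largest completion size that still fits.

    Assumption: prompt and completion share one context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = context_window - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt does not fit in the context window")
    # The completion is capped by both the output limit and the space left.
    return min(max_output, remaining)


if __name__ == "__main__":
    print(max_output_budget(100_000))    # short prompt: output limit applies
    print(max_output_budget(1_000_000))  # long prompt: remaining window applies
```

Under that assumption, a prompt only starts to squeeze the completion once it exceeds 1,048,576 - 384,000 = 664,576 tokens.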

About this model

DeepSeek V4 Flash is the fast, cost-efficient member of the V4 family and keeps the family's 1M-token context window. It is a strong fit for daily coding, batch analysis, draft generation, and cost-sensitive agent workflows.

Served through Bailian, it is well suited as a default fast tier or lightweight smart-routing option.

Highlights

1M context
V4 fast tier
Low cost
Batch friendly

Best For

Daily coding
Batch analysis
Low-cost reasoning
Long-doc summaries

2026-04-24
MoE
MoE Transformer
Proprietary

Capabilities

Chat
Reasoning
Code
Cache

Pricing (per 1M tokens)

Input: $0.146
Output: $0.292
Cache Read: $0.029
Cache Write: $0.182

Final prices shown
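As a rough illustration of how the per-1M-token rates translate into a bill, the sketch below multiplies token counts by the listed prices. It assumes strictly linear billing and that cached tokens are charged at the cache rates instead of the input rate; neither is confirmed on this page. `estimate_cost` is a hypothetical helper written for this example.

```python
# Prices in USD per 1M tokens, copied from the pricing list above.
PRICES_PER_MILLION = {
    "input": 0.146,
    "output": 0.292,
    "cache_read": 0.029,
    "cache_write": 0.182,
}


def estimate_cost(input_tokens: int = 0,
                  output_tokens: int = 0,
                  cache_read_tokens: int = 0,
                  cache_write_tokens: int = 0) -> float:
    """Estimate the cost in USD, assuming linear per-token billing."""
    usage = {
        "input": input_tokens,
        "output": output_tokens,
        "cache_read": cache_read_tokens,
        "cache_write": cache_write_tokens,
    }
    return sum(count * PRICES_PER_MILLION[kind] / 1_000_000
               for kind, count in usage.items())


if __name__ == "__main__":
    # Example: a 200K-token document summarized into 2K tokens.
    print(f"${estimate_cost(input_tokens=200_000, output_tokens=2_000):.4f}")
```

Actual charges may differ if the provider rounds, applies minimums, or counts cached tokens differently.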

Quick Start

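from openai import OpenAI

# Point the OpenAI-compatible client at the gateway endpoint.
client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-your-key-here",  # replace with your own key
)

# Send a single-turn chat request to DeepSeek V4 Flash.
response = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello!"}],
)
# Print the text of the first completion choice.
print(response.choices[0].message.content)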
