Gemini 2.5 Flash
Google
google/gemini-2.5-flash
Lightning fast, 1M context
Context Window
1.0M
1,000,000 tokens
Max Output
66K
65,536 tokens
About this model
Gemini 2.5 Flash is Google's high-speed reasoning model, offering strong reasoning at low cost. It has a 1M-token context window, and input is priced at $0.32 per 1M tokens.
It is a faster, cheaper alternative to Gemini 2.5 Pro. It accepts multimodal input, including text, images, and code, and implicit context caching is applied automatically.
It is well suited to high-throughput production environments.
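To illustrate the multimodal input described above, here is a minimal sketch that builds a chat message pairing text with an inline image, using the OpenAI-style `image_url` content part. Whether this endpoint accepts that exact format for this model is an assumption, and `build_image_message` is a hypothetical helper, not part of any SDK.

```python
import base64


def build_image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build one user message with a text part and an inline image part.

    Hypothetical helper; assumes the endpoint accepts OpenAI-style content parts.
    """
    # Encode the raw image bytes as base64 so they can travel inside a data URL.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime};base64,{encoded}"},
            },
        ],
    }
```

The returned dictionary can be placed in the `messages` list of a chat completion request in place of a plain text message.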
Highlights
1M context
$0.32/M input price
Strong reasoning
Auto caching
Best For
High-throughput APIs
Code generation
Document summarization
Data processing pipelines
2025-04-17
MoE Transformer
Proprietary
Capabilities
Chat
Vision
Reasoning
Code
pdf
tools
cache
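The `tools` capability presumably corresponds to OpenAI-style function calling. The sketch below shows what a tool definition could look like under that assumption. The `get_weather` tool and both helper functions are hypothetical and exist only for illustration.

```python
import json


def make_tool(name: str, description: str, properties: dict, required: list) -> dict:
    """Wrap a JSON Schema parameter spec in the OpenAI-style tool envelope."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


# Hypothetical tool used only as an example.
weather_tool = make_tool(
    name="get_weather",
    description="Look up the current weather for a city.",
    properties={"city": {"type": "string", "description": "City name"}},
    required=["city"],
)


def parse_tool_arguments(raw_arguments: str) -> dict:
    """Decode tool call arguments, which the OpenAI format delivers as a JSON string."""
    return json.loads(raw_arguments)
```

A list such as `[weather_tool]` would be passed as the `tools` argument of a chat completion request. Whether the model then returns tool calls through this endpoint is not confirmed by the page.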
Aliases
gemini-2.5-flash
gemini-flash
Pricing (per 1M tokens)
| Token type | Price per 1M tokens |
|---|---|
| Input | $0.32 |
| Output | $2.63 |
| Cache Read | $0.03 |
Final prices shown
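The listed rates can be turned into a rough per-request cost estimate. The sketch below assumes that cached input tokens are billed at the Cache Read rate instead of the Input rate. The page does not spell out that billing rule, so treat the result as an approximation. `estimate_cost` is a hypothetical helper.

```python
# Prices in dollars per 1M tokens, copied from the pricing table.
INPUT_PRICE = 0.32
OUTPUT_PRICE = 2.63
CACHE_READ_PRICE = 0.03
TOKENS_PER_UNIT = 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the dollar cost of one request.

    Assumption: cached_tokens is the part of input_tokens served from cache,
    billed at the cache-read rate rather than the input rate.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * INPUT_PRICE
        + cached_tokens * CACHE_READ_PRICE
        + output_tokens * OUTPUT_PRICE
    )
    return total / TOKENS_PER_UNIT
```

For example, a request with 200,000 input tokens and 10,000 output tokens comes to about $0.0903. If 150,000 of those input tokens were read from cache, the estimate drops to about $0.0468.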
Quick Start
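main.py

    from openai import OpenAI

    # Point the OpenAI SDK at the OpenAI-compatible endpoint.
    client = OpenAI(
        base_url="https://api.chuizi.ai/v1",
        api_key="ck-your-key-here",  # replace with your own key
    )

    # Send a single user message to Gemini 2.5 Flash.
    response = client.chat.completions.create(
        model="google/gemini-2.5-flash",
        messages=[{"role": "user", "content": "Hello!"}],
    )

    # Print the text of the first completion choice.
    print(response.choices[0].message.content)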