Rate Limits
Chuizi.AI applies a three-tier rate-limiting system built on sliding window counters. These limits protect both the gateway infrastructure and upstream providers from excessive traffic.
Rate Limit Tiers
Every request is evaluated against three independent rate limit checks. If any tier is exceeded, the request receives a 429 Too Many Requests response.
Tier 1: Per-Key Limit
Each API key has its own requests-per-minute (RPM) limit. This is the primary rate limit most users interact with.
| Source | Priority | Description |
|---|---|---|
| Key-level override | Highest | Set via rpm_limit on the API key itself. |
| User-level override | Medium | Set via custom_rpm_limit on the user account. |
| Default | Lowest | Falls back to the system default of 60 RPM. |
The gateway checks these sources from highest to lowest priority and uses the first non-null value.
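The lookup order in the table can be sketched as a small function. This is an illustration only, not the gateway's actual code: the function name `effective_key_rpm` and its parameter names are hypothetical, chosen to mirror the `rpm_limit` and `custom_rpm_limit` fields named above.

```python
from typing import Optional

DEFAULT_RPM = 60  # system default from the table above


def effective_key_rpm(
    key_rpm_limit: Optional[int],
    user_custom_rpm_limit: Optional[int],
    default_rpm: int = DEFAULT_RPM,
) -> int:
    """Return the first non-null limit, checked from highest to lowest priority.

    Hypothetical helper: key_rpm_limit stands in for rpm_limit on the API key,
    user_custom_rpm_limit for custom_rpm_limit on the user account.
    """
    for candidate in (key_rpm_limit, user_custom_rpm_limit):
        # Test against None rather than truthiness, so an explicit value
        # is never skipped just because it is falsy.
        if candidate is not None:
            return candidate
    return default_rpm
```

Under these assumptions, a key with `rpm_limit` set to 100 resolves to 100 even if the user account carries a different `custom_rpm_limit`, and a key with neither value set resolves to 60.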
Tier 2: Per-User Limit
The per-user limit aggregates traffic across all of a user's API keys. This prevents circumventing per-key limits by creating multiple keys.
| Limit | Value |
|---|---|
| Per-user RPM | 2x the per-key limit |
If your per-key limit is 60 RPM, your per-user limit is 120 RPM across all keys combined.
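To see why extra keys do not raise your total throughput, the arithmetic can be written out as below. This is a rough sketch that assumes every key shares the same per-key limit; the function names are hypothetical and do not correspond to any gateway API.

```python
PER_USER_MULTIPLIER = 2  # per-user RPM is 2x the per-key limit


def per_user_rpm(per_key_rpm: int) -> int:
    """Per-user limit derived from the per-key limit."""
    return PER_USER_MULTIPLIER * per_key_rpm


def max_combined_rpm(num_keys: int, per_key_rpm: int) -> int:
    """Upper bound on requests per minute across all of a user's keys.

    Assumes all keys share the same per-key limit (an illustrative
    simplification). Each key is capped individually, and the sum is
    capped again by the per-user limit.
    """
    return min(num_keys * per_key_rpm, per_user_rpm(per_key_rpm))
```

With a 60 RPM per-key limit, one key allows up to 60 RPM and two keys allow up to 120 RPM. A third or fourth key adds nothing, because the per-user cap of 120 RPM already applies.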
Tier 3: Per-Model Limit
Each user has a separate rate limit per model. This protects expensive models from being overwhelmed by a single user.
| Limit | Value |
|---|---|
| Per-model RPM | 30 RPM per user per model |
This limit applies regardless of which API key you use. If you send 30 requests to anthropic/claude-sonnet-4-6 within 60 seconds, further requests to that model receive a 429 response until earlier requests age out of the window, even if your per-key limit is not exhausted.
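The three checks can be pictured together with the sketch below. It assumes the gateway already knows how many requests fall inside the current window for each tier. The `WindowCounts` type, the `exceeded_tier` function, and the order in which tiers are tested are all hypothetical, since the tiers are described as independent and the gateway's internal ordering is not documented here.

```python
from dataclasses import dataclass
from typing import Optional

PER_MODEL_RPM = 30  # per user, per model


@dataclass
class WindowCounts:
    """Requests already counted in the last 60 seconds (hypothetical type)."""
    key: int    # sent with this API key
    user: int   # sent across all of the user's keys
    model: int  # sent by this user to the requested model


def exceeded_tier(counts: WindowCounts, per_key_rpm: int = 60) -> Optional[str]:
    """Return the name of the first exhausted tier, or None if the request fits.

    A non-None result corresponds to a 429 Too Many Requests response.
    The check order is an assumption made for this sketch.
    """
    if counts.key >= per_key_rpm:
        return "per-key"
    if counts.user >= 2 * per_key_rpm:
        return "per-user"
    if counts.model >= PER_MODEL_RPM:
        return "per-model"
    return None
```

For instance, `exceeded_tier(WindowCounts(key=30, user=30, model=30))` returns `"per-model"`: the key and user tiers still have headroom, but the model tier is already full.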
Sliding Window
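All rate limits use a 60-second sliding window. The window is not fixed to clock minutes; it slides forward continuously. If you send a request at T, the gateway counts all requests from T - 60s to T. For example, a request arriving at 12:00:30 is counted together with every request received since 11:59:30, not only those received since 12:00:00.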
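A sliding window can be modeled on the client side with a queue of timestamps, as in the sketch below. This is one simple way to get the behavior described, not necessarily how the gateway implements its counters. The class name `SlidingWindowLimiter` is hypothetical, and two details are assumptions: a request exactly 60 seconds old is treated as expired, and rejected requests are not counted.

```python
from collections import deque


class SlidingWindowLimiter:
    """Illustrative sliding-window limiter backed by a timestamp queue."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._timestamps = deque()  # accepted request times, oldest first

    def allow(self, now: float) -> bool:
        """Record and allow a request at time `now` if the window has room."""
        cutoff = now - self.window_seconds
        # Drop requests that have slid out of the window (assumed boundary:
        # a request exactly window_seconds old no longer counts).
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.limit:
            return False  # would map to a 429 response
        self._timestamps.append(now)
        return True
```

With `limit=2`, requests at 50 s and 55 s are both allowed. A third request at 65 s is rejected even though it falls in a new clock minute, because both earlier requests are still inside the trailing 60 seconds. A request at 110 s is allowed again, since the one sent at 50 s has expired by then.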
Next Steps
- Error Codes — Full reference for 429 and other error responses
- Production Best Practices — Design resilient applications with backoff and retry logic
- Error Handling Guide — Implement exponential backoff in your code