Rate Limits

Chuizi.AI enforces rate limits in three tiers, each tracked with a sliding-window counter. Rate limits protect both the gateway infrastructure and the upstream providers from excessive traffic.

Rate Limit Tiers

Every request is evaluated against three independent rate limit checks. If any tier is exceeded, the request receives a 429 Too Many Requests response.

Tier 1: Per-Key Limit

Each API key has its own requests-per-minute (RPM) limit. This is the primary rate limit most users interact with.

Source | Priority | Description
Key-level override | Highest | Set via rpm_limit on the API key itself.
User-level override | Medium | Set via custom_rpm_limit on the user account.
Default | Lowest | Falls back to the system default of 60 RPM.

The gateway checks these sources in priority order, from highest to lowest, and uses the first non-null value.
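The lookup order can be sketched in a few lines of Python. This is an illustration, not the gateway's code: the function name resolve_key_rpm and its parameter names are hypothetical, and a null setting is modeled as None.

```python
from typing import Optional

DEFAULT_RPM = 60  # system default stated on this page


def resolve_key_rpm(
    key_rpm_limit: Optional[int],
    user_custom_rpm_limit: Optional[int],
    default_rpm: int = DEFAULT_RPM,
) -> int:
    """Hypothetical helper: return the first non-null limit in priority order."""
    # Key-level override first, then the user-level override.
    for candidate in (key_rpm_limit, user_custom_rpm_limit):
        if candidate is not None:
            return candidate
    # Neither override is set: fall back to the system default.
    return default_rpm


print(resolve_key_rpm(None, None))  # 60
print(resolve_key_rpm(None, 200))   # 200
print(resolve_key_rpm(100, 200))    # 100
```

The sketch compares against None instead of relying on truthiness, so any explicitly set value, including 0, counts as non-null. How the gateway interprets a limit of 0 is not specified here.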

Tier 2: Per-User Limit

The per-user limit aggregates traffic across all of a user's API keys. This prevents circumventing per-key limits by creating multiple keys.

Limit | Value
Per-user RPM | 2x the per-key limit

If your per-key limit is 60 RPM, your per-user limit is 120 RPM across all keys combined.
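The arithmetic above can be checked with a small sketch. It assumes every key has the same per-key limit, counts only accepted requests, and ignores the per-model tier; the function name accepted_per_window is hypothetical.

```python
def accepted_per_window(sent_per_key: list[int], per_key_rpm: int) -> int:
    """Requests accepted in one 60-second window across all of a user's keys."""
    per_user_rpm = 2 * per_key_rpm  # Tier 2: twice the per-key limit
    # Tier 1: each key is individually capped at its own limit.
    accepted_by_keys = sum(min(sent, per_key_rpm) for sent in sent_per_key)
    # Tier 2: the combined total is capped at the per-user limit.
    return min(accepted_by_keys, per_user_rpm)


print(accepted_per_window([60, 60, 60], per_key_rpm=60))  # 120
```

Three keys each sending 60 requests in the same window get 120 accepted in total, not 180, which is why creating extra keys does not raise your overall throughput.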

Tier 3: Per-Model Limit

Each user has a separate rate limit per model. This protects expensive models from being overwhelmed by a single user.

Limit | Value
Per-model RPM | 30 RPM per user per model

This limit applies regardless of which API key you use. If you send 30 requests to anthropic/claude-sonnet-4-6 within a 60-second window, further requests to that model are rejected with 429 until earlier requests leave the window, even if your per-key limit is not exhausted.
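Because the three tiers are independent, a request goes through only if every counter is below its limit. The sketch below takes the request counts already observed in the current window and reports the first tier that would reject the request. The WindowCounts type and first_exceeded_tier function are hypothetical, the per-user limit is derived from the per-key limit as described in Tier 2, and checking the tiers in the order they are listed on this page is an assumption, since the page does not say in which order the gateway evaluates them.

```python
from dataclasses import dataclass
from typing import Optional

PER_MODEL_RPM = 30  # Tier 3: per user, per model


@dataclass
class WindowCounts:
    """Hypothetical type: requests already counted in the current window."""
    key: int    # requests made with this API key
    user: int   # requests across all of the user's keys
    model: int  # this user's requests to the requested model


def first_exceeded_tier(counts: WindowCounts, per_key_rpm: int) -> Optional[str]:
    """Return the first tier that would reject the request, or None if allowed."""
    checks = [
        ("per-key", counts.key, per_key_rpm),
        ("per-user", counts.user, 2 * per_key_rpm),
        ("per-model", counts.model, PER_MODEL_RPM),
    ]
    for tier, used, limit in checks:
        if used >= limit:
            return tier  # the gateway would answer 429 Too Many Requests
    return None


# 30 requests already sent to one model, only 30 of 60 per-key requests used.
print(first_exceeded_tier(WindowCounts(key=30, user=30, model=30), per_key_rpm=60))
# per-model
```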

Sliding Window

All rate limits use a 60-second sliding window. The window is not aligned to clock minutes; it slides forward continuously. When a request arrives at time T, the gateway counts all requests received between T - 60s and T.
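A sliding window can be implemented in several ways. The sketch below keeps a log of accepted request timestamps, which counts requests in the window exactly. This page does not say how the gateway's sliding-window counters are built, so treat this as an illustration of the window semantics, not of the gateway's internals. The class name SlidingWindowLimiter is hypothetical, times are in seconds and assumed non-decreasing, rejected requests are assumed not to count, and a request exactly 60 seconds old is still counted, following "between T - 60s and T".

```python
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical sliding-window log over the last `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._timestamps = deque()  # arrival times of accepted requests

    def allow(self, now: float) -> bool:
        # Forget requests that fell outside [now - window, now].
        cutoff = now - self.window
        while self._timestamps and self._timestamps[0] < cutoff:
            self._timestamps.popleft()
        if len(self._timestamps) < self.limit:
            self._timestamps.append(now)
            return True
        return False  # over the limit: the gateway would return 429


limiter = SlidingWindowLimiter(limit=2)
print(limiter.allow(59.0))   # True
print(limiter.allow(59.5))   # True
print(limiter.allow(61.0))   # False: 59.0 and 59.5 are inside [1.0, 61.0]
print(limiter.allow(119.5))  # True: 59.0 has aged out of the window
```

With fixed clock-minute windows, the request at 61.0 would fall into a new minute and be accepted. The sliding window still sees the two requests sent about a second and a half earlier, so bursts that straddle a minute boundary cannot exceed the limit.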

Next Steps