Pre-deduct & Reconciliation
Chuizi.AI uses a prepaid wallet with temporary balance reservations and reconciliation. Before a request reaches a model service, the gateway reserves a conservative estimate of its cost from your available balance. After the provider returns usage, the billing worker releases the reservation, deducts the actual cost, and writes the final records.
This protects both users and the platform from overdrafts under high concurrency, while keeping the final bill based on actual upstream usage.
Billing Flow
```text
Request arrives
    │
    ▼
① Auth: verify ck-xxx API key
    │
    ▼
② Estimate request cost
    │  text:  estimated input + max output tokens
    │  media: request count, duration, or model unit price
    ▼
③ Reserve balance in Redis
    │  available balance = balance - active reservations
    │  insufficient available balance → 402 Insufficient Balance
    ▼
④ Forward request to the model service
    │
    ▼
⑤ Extract actual upstream usage or generated unit count
    │
    ▼
⑥ Reconcile asynchronously
    ├─ release the reservation
    ├─ deduct actual cost from the wallet
    └─ write generations + transactions records
```
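Steps ③ to ⑥ can be sketched as a single function. This is a minimal, synchronous illustration, not the gateway's code: the names `Wallet`, `InsufficientBalance`, and `handle_request` are hypothetical, an in-memory dict stands in for the Redis reservations, and the real reconciliation runs asynchronously in a billing worker.

```python
from decimal import Decimal
from typing import Callable


class InsufficientBalance(Exception):
    """Hypothetical error that maps to HTTP 402 Insufficient Balance."""

    status_code = 402


class Wallet:
    """Hypothetical in-memory stand-in for the wallet and its active reservations."""

    def __init__(self, balance: Decimal) -> None:
        self.balance = balance
        self.reservations: dict = {}  # request id -> reserved amount

    def available(self) -> Decimal:
        # available balance = balance - active reservations
        return self.balance - sum(self.reservations.values(), Decimal("0"))


def handle_request(
    wallet: Wallet,
    request_id: str,
    estimated_cost: Decimal,
    forward: Callable[[], Decimal],
) -> Decimal:
    """Run steps ③-⑥ for one request and return the final charge."""
    # ③ Reserve: reject when the estimate exceeds the available balance.
    if estimated_cost > wallet.available():
        raise InsufficientBalance(f"need {estimated_cost}, available {wallet.available()}")
    wallet.reservations[request_id] = estimated_cost

    try:
        # ④ + ⑤ Forward upstream; `forward` returns the actual cost derived from usage.
        actual_cost = forward()
    except Exception:
        # Failure before billable usage exists: release the hold and charge nothing.
        del wallet.reservations[request_id]
        raise

    # ⑥ Reconcile: release the hold, then deduct the actual cost.
    del wallet.reservations[request_id]
    wallet.balance -= actual_cost
    return actual_cost
```

For example, a wallet holding `Decimal("1.00")` that reserves `0.30` for a request whose actual cost turns out to be `0.12` ends with a balance of `0.88` and no active reservations.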
What Is Reserved
| Workload | Reservation basis |
|---|---|
| Chat / vision / embeddings | Estimated input tokens plus the requested max_tokens, or a safe default when max_tokens is not set |
| Image / rerank | Model per-request price multiplied by the requested quantity |
| Video / audio (per-second) | Model per-second price multiplied by the requested duration |
The reservation is not the final charge. It is a temporary hold used to decide whether the request can safely run.
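The three reservation bases in the table can be written as small estimation functions. This is a sketch under stated assumptions: the 4-characters-per-token heuristic, the per-million-token price unit, and `DEFAULT_MAX_OUTPUT_TOKENS = 4096` are all hypothetical placeholders, since the actual estimator and default are not documented here.

```python
from decimal import Decimal
from typing import Optional

# Hypothetical fallback when the request omits max_tokens (the real default is not documented).
DEFAULT_MAX_OUTPUT_TOKENS = 4096
PER_MILLION = Decimal(1_000_000)


def estimate_text_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token, rounded up.
    return -(-len(text) // 4)


def reserve_for_text(
    prompt: str,
    max_tokens: Optional[int],
    input_price_per_m: Decimal,
    output_price_per_m: Decimal,
) -> Decimal:
    """Chat / vision / embeddings: estimated input plus requested (or default) max output."""
    input_tokens = estimate_text_tokens(prompt)
    output_tokens = max_tokens if max_tokens is not None else DEFAULT_MAX_OUTPUT_TOKENS
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / PER_MILLION


def reserve_per_request(price_per_request: Decimal, quantity: int) -> Decimal:
    """Image / rerank: per-request price multiplied by quantity."""
    return price_per_request * quantity


def reserve_per_second(price_per_second: Decimal, duration_seconds: int) -> Decimal:
    """Video / audio: per-second price multiplied by the requested duration."""
    return price_per_second * duration_seconds
```

Reserving against max_tokens rather than an expected output length is what makes the hold conservative: the reservation covers the longest response the request is allowed to produce.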
Reconciliation
Final billing uses upstream usage whenever the provider returns it. The reconciler records actual input, output, cache, and reasoning tokens, provider-native token fields, latency, upstream cost, public cost, and the final wallet transaction.
| Outcome | Result |
|---|---|
| Actual cost is lower than the reservation | Unused reserve is released automatically |
| Actual cost matches the reservation | The hold becomes the final charge |
| Actual cost is higher than the reservation | The wallet is charged the full actual cost, including the amount above the reservation |
| Request fails before upstream usage exists | The reservation is released and no usage charge is written |
| Provider returns partial usage | Billing follows the usage returned by the provider |
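The outcomes in the table reduce to one rule: always release the whole hold, then charge the actual cost (zero when no usage exists). The sketch below shows that rule with `decimal.Decimal` arithmetic. The `Settlement` and `reconcile` names, the per-million-token prices, and the 8-decimal-place rounding are hypothetical assumptions, not the billing worker's documented behavior.

```python
from dataclasses import dataclass
from decimal import ROUND_HALF_UP, Decimal
from typing import Optional

# Hypothetical billing precision; the real precision is not documented here.
PRECISION = Decimal("0.00000001")
PER_MILLION = Decimal(1_000_000)


@dataclass(frozen=True)
class Settlement:
    released: Decimal  # the whole hold is always released
    charged: Decimal   # actual cost deducted from the wallet (0 if no usage)
    unused: Decimal    # part of the hold above the actual cost
    overage: Decimal   # actual cost above the hold, still charged


def actual_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: Decimal,
    output_price_per_m: Decimal,
) -> Decimal:
    """Cost from provider-reported usage, rounded to the assumed billing precision."""
    raw = (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / PER_MILLION
    return raw.quantize(PRECISION, rounding=ROUND_HALF_UP)


def reconcile(reserved: Decimal, cost: Optional[Decimal]) -> Settlement:
    """Release the hold and charge the actual cost; cost=None means no billable usage."""
    charged = Decimal("0") if cost is None else cost
    return Settlement(
        released=reserved,
        charged=charged,
        unused=max(reserved - charged, Decimal("0")),
        overage=max(charged - reserved, Decimal("0")),
    )
```

With 1,200 input and 300 output tokens at assumed prices of 2 and 8 per million tokens, the actual cost is 0.0048: a 0.01 hold leaves 0.0052 unused, while a 0.001 hold produces a 0.0038 overage that is still charged.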
Dashboard balances may lag briefly while the async worker writes final records. For the authoritative cost of a request, query the Generation API.
Generation ID
Every request receives a generation ID prefixed with gen-. Use it to inspect the final bill:
```bash
curl "https://api.chuizi.ai/v1/generation?id=gen-abc123def456" \
  -H "Authorization: Bearer ck-your-key"
```
Generation IDs work across protocols: OpenAI-compatible, Anthropic native, and Gemini native requests all share the same billing lookup.
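The same lookup can be made from Python with only the standard library. This sketch mirrors the curl request (URL, `id` query parameter, Bearer header); the helper names are hypothetical, and it assumes the response body is JSON without relying on any particular field names.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.chuizi.ai/v1/generation"


def build_generation_request(generation_id: str, api_key: str) -> urllib.request.Request:
    """Build the same GET request as the curl example."""
    query = urllib.parse.urlencode({"id": generation_id})
    return urllib.request.Request(
        f"{BASE_URL}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_generation(generation_id: str, api_key: str) -> dict:
    """Send the request and decode the body, assuming it is JSON."""
    request = build_generation_request(generation_id, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `fetch_generation("gen-abc123def456", "ck-your-key")` performs the network request; `urlopen` raises `urllib.error.HTTPError` for non-2xx responses.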
Operational Guarantees
- Reservation is atomic in Redis and blocks requests that would exceed the available balance.
- Final wallet deduction is performed by the billing worker using precise decimal arithmetic.
- Preflight failures and upstream errors release the hold, so users are not charged for requests that produced no billable usage.
- Reservation records are written for audit, and stale reservations are swept automatically.
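The first and last guarantees can be illustrated with an in-process analogue. This is a hypothetical sketch, not the gateway's Redis implementation: a `threading.Lock` makes "check available balance, then add the hold" one indivisible step, and a TTL (assumed here to be 600 seconds) decides which holds count as stale. The `ReservationStore` class and its methods are illustrative names.

```python
import threading
import time
from decimal import Decimal
from typing import Callable, Dict, Tuple


class ReservationStore:
    """Hypothetical in-process analogue of atomic reservations with stale-hold sweeping."""

    def __init__(
        self,
        balance: Decimal,
        ttl_seconds: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._balance = balance
        self._holds: Dict[str, Tuple[Decimal, float]] = {}  # id -> (amount, created_at)
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()

    def try_reserve(self, request_id: str, amount: Decimal) -> bool:
        """Check and reserve under one lock so concurrent callers cannot overdraw."""
        with self._lock:
            held = sum((a for a, _ in self._holds.values()), Decimal("0"))
            if amount > self._balance - held:
                return False  # caller answers 402 Insufficient Balance
            self._holds[request_id] = (amount, self._clock())
            return True

    def release(self, request_id: str) -> None:
        with self._lock:
            self._holds.pop(request_id, None)

    def sweep_stale(self) -> int:
        """Drop holds older than the TTL and return how many were removed."""
        with self._lock:
            cutoff = self._clock() - self._ttl
            stale = [rid for rid, (_, created) in self._holds.items() if created < cutoff]
            for rid in stale:
                del self._holds[rid]
            return len(stale)
```

With a balance of 1.00 and ten threads each trying to reserve 0.30, exactly three reservations succeed. In Redis itself, a check-and-reserve like this is commonly expressed as a Lua script run with `EVAL`, which Redis executes atomically; whether the gateway uses that mechanism is not stated here.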
Next Steps
- Generation API — Query final request cost and usage
- Billing Model — Understand token, request, and second-based billing
- Balance Alerts — Keep enough balance before production traffic