Pre-deduct & Reconciliation

Chuizi.AI uses a prepaid wallet with temporary balance reservation and reconciliation. Before a request reaches a model service, the gateway reserves a conservatively estimated amount from your available balance. After the provider returns usage, the billing worker releases the reservation, deducts the actual cost, and writes the final records.

This protects users and the platform from high-concurrency overdraft while keeping the final bill based on real upstream usage.

Billing Flow

Request arrives
  │
  ▼
① Auth — verify ck-xxx API key
  │
  ▼
② Estimate request cost
  │  text: estimated input + max output tokens
  │  media: request count, duration, or model unit price
  │
  ▼
③ Reserve balance in Redis
  │  available balance = balance - active reservations
  │  insufficient available balance → 402 Insufficient Balance
  │
  ▼
④ Forward request to the model service
  │
  ▼
⑤ Extract actual upstream usage or generated unit count
  │
  ▼
⑥ Reconcile asynchronously
  │  ├─ release the reservation
  │  ├─ deduct actual cost from the wallet
  │  └─ write generations + transactions records
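The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `wallet` dictionary, `handle_request`, and `call_upstream` are hypothetical names, auth (step ①) is omitted, and reconciliation runs inline here, whereas the real billing worker does it asynchronously against Redis.

```python
from decimal import Decimal


class InsufficientBalance(Exception):
    """Stands in for the 402 Insufficient Balance response in step 3."""
    status_code = 402


def handle_request(wallet, request_id, estimate, call_upstream):
    """Reserve, forward, then reconcile (hypothetical in-memory model)."""
    # Step 3: available balance = balance - active reservations.
    available = wallet["balance"] - sum(wallet["holds"].values(), Decimal("0"))
    if estimate > available:
        raise InsufficientBalance(request_id)
    wallet["holds"][request_id] = estimate

    try:
        # Steps 4-5: forward the request and read back the actual cost.
        actual_cost = call_upstream()
    except Exception:
        # No billable usage exists, so only the hold is released.
        wallet["holds"].pop(request_id)
        raise

    # Step 6: release the hold, deduct the actual cost, write a record.
    wallet["holds"].pop(request_id)
    wallet["balance"] -= actual_cost
    wallet["transactions"].append((request_id, actual_cost))
    return actual_cost
```

For example, with a balance of 1.00, an estimate of 0.60, and an upstream cost of 0.25, the wallet ends at 0.75 with no active holds.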

What Is Reserved

| Workload | Reservation basis |
| --- | --- |
| Chat / vision / embeddings | Estimated input tokens plus requested max_tokens or a safe default |
| Image / rerank | Model request price multiplied by quantity |
| Video / audio seconds | Model second price multiplied by requested duration |

The reservation is not the final charge. It is a temporary hold used to decide whether the request can safely run.
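A rough sketch of how the three reservation bases could be computed is shown below. The function names, the per-token price parameters, and the `DEFAULT_MAX_TOKENS` value are all assumptions made for illustration; the actual prices and safe default are not stated here.

```python
from decimal import Decimal

# Hypothetical safe default used when the request omits max_tokens.
DEFAULT_MAX_TOKENS = 4096


def estimate_text(input_tokens, max_tokens, input_price, output_price):
    """Chat / vision / embeddings: estimated input plus maximum output."""
    output_tokens = max_tokens if max_tokens is not None else DEFAULT_MAX_TOKENS
    return input_tokens * input_price + output_tokens * output_price


def estimate_per_request(request_price, quantity):
    """Image / rerank: model request price multiplied by quantity."""
    return request_price * quantity


def estimate_per_second(second_price, duration_seconds):
    """Video / audio seconds: per-second price multiplied by duration."""
    return second_price * duration_seconds


# Illustrative prices only (assumed, not real Chuizi.AI pricing).
text_hold = estimate_text(1000, 500, Decimal("0.000001"), Decimal("0.000002"))
image_hold = estimate_per_request(Decimal("0.04"), 3)
video_hold = estimate_per_second(Decimal("0.05"), 8)
```

Because the text estimate assumes the full `max_tokens` is generated, it is usually higher than the final charge, which is why the hold is described as conservative.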

Reconciliation

Final billing uses upstream usage whenever the provider returns it. The reconciler records actual input, output, cache, reasoning, native token fields, latency, upstream cost, public cost, and the final wallet transaction.

| Outcome | Result |
| --- | --- |
| Actual cost is lower than the reservation | Unused reserve is released automatically |
| Actual cost matches the reservation | The hold becomes the final charge |
| Actual cost is higher than the reservation | The wallet is charged for the actual cost |
| Request fails before upstream usage exists | The reservation is released and no usage charge is written |
| Provider returns partial usage | Billing follows the usage returned by the provider |
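The outcomes above reduce to one rule: the hold is always released, and the charge equals the actual cost when one exists. A minimal sketch, assuming a hypothetical `reconcile` function and `Settlement` record (neither is part of the documented API):

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Settlement:
    released: Decimal        # the full hold is always released
    charged: Decimal         # what is finally deducted from the wallet
    unused_reserve: Decimal  # part of the hold that was never needed


def reconcile(reserved: Decimal, actual_cost: Optional[Decimal]) -> Settlement:
    """actual_cost is None when the request failed before usage existed."""
    charged = actual_cost if actual_cost is not None else Decimal("0")
    unused = max(reserved - charged, Decimal("0"))
    return Settlement(released=reserved, charged=charged, unused_reserve=unused)


lower = reconcile(Decimal("0.50"), Decimal("0.20"))
higher = reconcile(Decimal("0.50"), Decimal("0.70"))
failed = reconcile(Decimal("0.50"), None)
```

Partial usage needs no special case in this model: whatever cost the provider's returned usage implies is passed in as `actual_cost`.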

Dashboard balances may lag briefly while the async worker writes final records. For the source of truth, query the Generation API.

Generation ID

Every request receives a gen- ID. Use it to inspect the final bill:

```bash
curl "https://api.chuizi.ai/v1/generation?id=gen-abc123def456" \
  -H "Authorization: Bearer ck-your-key"
```

Generation IDs work across protocols: OpenAI-compatible, Anthropic native, and Gemini native requests all share the same billing lookup.
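The same lookup can be made from Python with only the standard library. The endpoint and header come from the curl example; the helper names are hypothetical, and because the response schema is not described here, the sketch returns the decoded JSON as-is rather than assuming field names.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.chuizi.ai/v1/generation"


def build_generation_request(generation_id, api_key):
    """Build the GET request for a generation lookup (helper name is illustrative)."""
    query = urllib.parse.urlencode({"id": generation_id})
    return urllib.request.Request(
        f"{BASE_URL}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_generation(generation_id, api_key, timeout=10):
    """Send the request and return the decoded JSON body unchanged."""
    request = build_generation_request(generation_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (performs a real network call, so it is left commented out):
# record = fetch_generation("gen-abc123def456", "ck-your-key")
# print(json.dumps(record, indent=2))
```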

Operational Guarantees

  • Reservation is atomic in Redis and blocks requests that would exceed available balance.
  • Final wallet deduction is performed by the billing worker using precise decimal arithmetic.
  • Preflight failures and upstream errors release the hold, so users are not charged for requests without billable usage.
  • Reservation records are written for audit and stale reservations are swept automatically.
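To illustrate why an atomic check-and-reserve step matters, the sketch below models the first and last guarantees in memory. It is an assumption-laden stand-in: a `threading.Lock` plays the role of the atomic Redis operation, and `ReservationLedger`, `try_reserve`, `sweep_stale`, and the `ttl_seconds` value are hypothetical names and settings, not the platform's implementation.

```python
import threading
import time
from decimal import Decimal


class ReservationLedger:
    def __init__(self, balance, ttl_seconds=300.0):
        self.balance = balance
        self.ttl_seconds = ttl_seconds  # assumed staleness threshold
        self._holds = {}                # request_id -> (amount, created_at)
        self._lock = threading.Lock()   # stand-in for Redis atomicity

    def try_reserve(self, request_id, amount, now=None):
        """Check and reserve in one critical section; False means rejected."""
        now = time.monotonic() if now is None else now
        with self._lock:
            held = sum((a for a, _ in self._holds.values()), Decimal("0"))
            if amount > self.balance - held:
                return False
            self._holds[request_id] = (amount, now)
            return True

    def sweep_stale(self, now=None):
        """Drop holds older than ttl_seconds and return their request IDs."""
        now = time.monotonic() if now is None else now
        with self._lock:
            stale = [rid for rid, (_, created) in self._holds.items()
                     if now - created > self.ttl_seconds]
            for rid in stale:
                del self._holds[rid]
        return stale


ledger = ReservationLedger(Decimal("1.00"))
results = []


def worker(index):
    results.append(ledger.try_reserve(f"req-{index}", Decimal("0.30")))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

With a balance of 1.00 and ten concurrent requests each reserving 0.30, exactly three succeed regardless of thread scheduling, because the balance check and the write happen under the same lock.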
