example.py

python

import os
from openai import OpenAI

# Do this
client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key=os.environ["CHUIZI_API_KEY"],
)

# Never this
client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key="ck-abc123...",  # Leaked if committed to git
)
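
With the pattern above, a missing environment variable surfaces as a bare KeyError at import time. The following is a minimal sketch of a loader that fails fast with a clearer message. The helper name `load_api_key` is hypothetical and not part of any SDK; only the `CHUIZI_API_KEY` variable name comes from the example above.

```python
import os


def load_api_key(env_var: str = "CHUIZI_API_KEY") -> str:
    """Return the API key from the environment, or raise a descriptive error.

    `load_api_key` is an illustrative helper, not an SDK function.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it or inject it from your secret manager"
        )
    return key
```

The result can be passed directly as `api_key=load_api_key()` when constructing the client.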

Use separate keys per environment

Create distinct API keys for development, staging, and production so that each environment can be limited, monitored, and revoked independently of the others.

Restrict keys

Rotate keys regularly

Timeout Configuration

Large models can take 30-120+ seconds to respond, especially for long outputs or complex reasoning tasks (o3, Opus 4).

Recommended timeouts

Scenario	Timeout
Simple chat (GPT-4.1-mini, Haiku)	30 seconds
Standard chat (GPT-4.1, Sonnet)	60 seconds
Complex reasoning (o3, GPT-5, Opus 4)	120 seconds
Image/video generation	300 seconds
Streaming (any model)	300 seconds (connection timeout)

example.py

python

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key=os.environ["CHUIZI_API_KEY"],
    timeout=120.0,  # seconds; applies to every request made with this client
)

index.mjs

javascript

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.chuizi.ai/v1',
  apiKey: process.env.CHUIZI_API_KEY,
  timeout: 120 * 1000, // 120 seconds in milliseconds
});
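
A single client-wide timeout forces fast and slow workloads to share one value. As a rough sketch, the recommended values from the table can be kept in a lookup and applied per call. The scenario keys and the `timeout_for` helper are hypothetical names chosen for illustration; only the numbers come from the table.

```python
# Timeouts in seconds, copied from the recommended-timeouts table.
# The scenario keys are illustrative labels, not API values.
RECOMMENDED_TIMEOUTS = {
    "simple_chat": 30.0,
    "standard_chat": 60.0,
    "complex_reasoning": 120.0,
    "media_generation": 300.0,
    "streaming": 300.0,
}


def timeout_for(scenario: str, default: float = 120.0) -> float:
    """Look up the recommended timeout, falling back to a conservative default."""
    return RECOMMENDED_TIMEOUTS.get(scenario, default)
```

The OpenAI Python SDK accepts a per-request `timeout` argument, so a call such as `client.chat.completions.create(..., timeout=timeout_for("simple_chat"))` should override the client-wide value for that request only.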

Concurrency Control

Rate limits

The default rate limit is 60 requests per minute (RPM) per API key. If you need higher throughput:

Request queue pattern

example.py

python

import asyncio
import os

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key=os.environ["CHUIZI_API_KEY"],
)

# Semaphore limits concurrent requests
semaphore = asyncio.Semaphore(10)  # Max 10 concurrent requests


async def chat(messages):
    async with semaphore:
        return await client.chat.completions.create(
            model="openai/gpt-4.1-mini",
            messages=messages,
            max_tokens=1024,
        )


async def process_batch(items):
    tasks = [chat([{"role": "user", "content": item}]) for item in items]
    return await asyncio.gather(*tasks, return_exceptions=True)
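
Because `process_batch` passes `return_exceptions=True`, the returned list mixes responses and exception objects. One possible way to separate them while keeping track of which input failed is sketched below; `split_results` is a hypothetical helper name.

```python
def split_results(results):
    """Separate successful results from exceptions, keeping input positions.

    Returns two lists of (index, value) pairs: successes and failures.
    """
    successes, failures = [], []
    for index, result in enumerate(results):
        # gather() can also return BaseException subclasses such as CancelledError.
        if isinstance(result, BaseException):
            failures.append((index, result))
        else:
            successes.append((index, result))
    return successes, failures
```

The indexes in `failures` can then be used to retry or report only the items that did not complete.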

index.mjs

javascript

import OpenAI from 'openai';
import pLimit from 'p-limit';

const client = new OpenAI({
  baseURL: 'https://api.chuizi.ai/v1',
  apiKey: process.env.CHUIZI_API_KEY,
});

const limit = pLimit(10); // Max 10 concurrent requests

async function processBatch(items) {
  const tasks = items.map((item) =>
    limit(() =>
      client.chat.completions.create({
        model: 'openai/gpt-4.1-mini',
        messages: [{ role: 'user', content: item }],
        max_tokens: 1024,
      })
    )
  );
  return Promise.allSettled(tasks);
}
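
A concurrency cap bounds how many requests are in flight, but not how many start per minute, so short requests can still exceed the 60 RPM default. The sketch below shows one way to add a client-side sliding-window limit. The class name and defaults are assumptions for illustration, and the server-side accounting may differ from this window.

```python
import asyncio
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` acquisitions in any `period`-second window."""

    def __init__(self, max_calls: int = 60, period: float = 60.0):
        self.max_calls = max_calls
        self.period = period
        self._stamps = deque()  # monotonic timestamps of recent acquisitions
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        # Holding the lock while sleeping keeps waiters in arrival order.
        async with self._lock:
            while True:
                now = time.monotonic()
                # Drop timestamps that have left the window.
                while self._stamps and now - self._stamps[0] >= self.period:
                    self._stamps.popleft()
                if len(self._stamps) < self.max_calls:
                    self._stamps.append(now)
                    return
                # Sleep until the oldest call leaves the window, then re-check.
                await asyncio.sleep(self.period - (now - self._stamps[0]))
```

Calling `await limiter.acquire()` just before each `client.chat.completions.create(...)` call combines this limit with the concurrency cap. Create the limiter inside the running event loop on older Python versions.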

Request Monitoring

Generation ID tracking

Every response includes a generation ID in the x_chuizi field and the x-chuizi-generation-id response header. Log this for every request:

example.py

python

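import logging

logger = logging.getLogger(__name__)

# `client` is the OpenAI client configured as in the earlier examples
response = client.chat.completions.create(
    model="openai/gpt-4.1",
    messages=[{"role": "user", "content": "Hello"}],
)

# Log the generation ID for debugging
x_chuizi = (response.model_extra or {}).get("x_chuizi") or {}
gen_id = x_chuizi.get("generation_id")
logger.info("Request completed: gen_id=%s, model=%s", gen_id, response.model)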
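
Since the generation ID is available both in the body and in a header, a small helper can check one and fall back to the other. This is a sketch under the assumption that the body has been converted to a dict (for example with `response.model_dump()`) and that headers are available as a mapping; in the OpenAI Python SDK, `client.chat.completions.with_raw_response.create(...)` exposes `.headers` for this purpose. The helper name is hypothetical.

```python
from typing import Mapping, Optional


def extract_generation_id(body: Mapping, headers: Mapping) -> Optional[str]:
    """Prefer the x_chuizi body field, fall back to the response header."""
    x_chuizi = body.get("x_chuizi") or {}
    gen_id = x_chuizi.get("generation_id")
    if gen_id:
        return gen_id
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {str(name).lower(): value for name, value in headers.items()}
    return lowered.get("x-chuizi-generation-id")
```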

Querying request details

terminal

bash

# Quote the URL so the shell does not treat "?" as a glob character
curl "https://api.chuizi.ai/v1/generation?id=gen-xxxxxxxxxxxxxxxx" \
  -H "Authorization: Bearer ck-your-key-here"

This returns input tokens, output tokens, cached tokens, cost, latency, and status code for the request.
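
The same lookup can be done from Python using only the standard library. This is a minimal sketch: the function names are hypothetical, and the JSON body is returned as-is because the exact field names of the response are not specified here.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.chuizi.ai/v1"


def build_generation_request(gen_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET request for the generation-details endpoint."""
    query = urllib.parse.urlencode({"id": gen_id})
    return urllib.request.Request(
        f"{BASE_URL}/generation?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def fetch_generation(gen_id: str, api_key: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON body; no particular field names are assumed."""
    request = build_generation_request(gen_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```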

Health checks

terminal

bash

# Simple health check
curl -s -o /dev/null -w "%{http_code}" https://api.chuizi.ai/v1/models \
  -H "Authorization: Bearer ck-your-key-here"
# Returns 200 if healthy
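
For use inside an application or a scheduled probe, the same check might look like the sketch below. The `is_healthy` name and the injectable `opener` parameter are assumptions made so the function can be exercised without network access.

```python
import urllib.request

MODELS_URL = "https://api.chuizi.ai/v1/models"


def is_healthy(api_key: str, timeout: float = 10.0, opener=urllib.request.urlopen) -> bool:
    """Return True only if the models endpoint answers with HTTP 200."""
    request = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```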

Cost Guardrails

Daily spending limits

Set a daily spending cap on each API key in the dashboard. Once the cap is reached, requests made with that key return HTTP 402 until the next day.
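
Retrying a 402 will not help until the limit resets, so it is usually better to detect it and degrade gracefully. The sketch below assumes the raised error exposes a `status_code` attribute, as `openai.APIStatusError` does in the OpenAI Python SDK; both function names are hypothetical.

```python
def is_budget_exhausted(error: Exception) -> bool:
    """True if the error carries HTTP 402, the status returned once the daily cap is hit."""
    return getattr(error, "status_code", None) == 402


def call_unless_over_budget(make_request, on_budget_exhausted):
    """Run `make_request`; on a 402, hand off to a fallback instead of retrying."""
    try:
        return make_request()
    except Exception as error:
        if is_budget_exhausted(error):
            return on_budget_exhausted(error)
        raise  # every other error propagates unchanged
```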

Max tokens per request

config.json

json

{
  "model": "openai/gpt-4.1",
  "messages": [{"role": "user", "content": "Write a summary."}],
  "max_tokens": 500
}

Budget alerts

Monitor your daily spend through the dashboard. Set up alerts for when spending exceeds thresholds (e.g., 80% of daily limit).
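
If dashboard alerts are not enough, a threshold check can also run in the application. The following is a sketch only: `DailyBudget` is a hypothetical class, it assumes the per-request cost is expressed in the same unit as the daily limit, and resetting it at the day boundary is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class DailyBudget:
    """Accumulates per-request cost and flags when a threshold is crossed."""

    daily_limit: float
    alert_ratio: float = 0.8  # alert at 80% of the daily limit
    spent: float = 0.0

    def record(self, cost: float) -> bool:
        """Add one request's cost; return True only when this call crosses the threshold."""
        threshold = self.daily_limit * self.alert_ratio
        was_below = self.spent < threshold
        self.spent += cost
        return was_below and self.spent >= threshold
```

Feeding it the `cost` value from each response's `x_chuizi` field gives a single alert per day rather than one per request after the threshold.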

Model restrictions

Restrict API keys to specific models. A key that only has access to gpt-4.1-mini cannot accidentally be used with the 10x more expensive claude-opus-4-6.

Logging Best Practices

example.py

python

import logging
import os
import time

from openai import OpenAI

logger = logging.getLogger(__name__)

client = OpenAI(
    base_url="https://api.chuizi.ai/v1",
    api_key=os.environ["CHUIZI_API_KEY"],
)


def chat_with_logging(messages, model="openai/gpt-4.1"):
    # perf_counter is monotonic, so latency is unaffected by system clock changes
    start = time.perf_counter()
    try:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            max_tokens=1024,
        )
        latency = time.perf_counter() - start
        x_chuizi = (response.model_extra or {}).get("x_chuizi") or {}
        usage = response.usage  # can be None if usage is not reported

        logger.info(
            "chat_completion",
            extra={
                "generation_id": x_chuizi.get("generation_id"),
                "model": response.model,
                "prompt_tokens": usage.prompt_tokens if usage else None,
                "completion_tokens": usage.completion_tokens if usage else None,
                "cost": x_chuizi.get("cost"),
                "latency_ms": int(latency * 1000),
            },
        )
        return response
    except Exception as e:
        latency = time.perf_counter() - start
        logger.error(
            "chat_completion_error",
            extra={
                "model": model,
                "error": str(e),
                "latency_ms": int(latency * 1000),
            },
        )
        raise
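
Fields passed through `extra` are attached to the log record, but the default formatter does not print them. One way to make them visible is a formatter that serializes every non-standard record attribute as JSON. This is a minimal sketch; `JsonFormatter` is a hypothetical name, and a structured-logging library could be used instead.

```python
import json
import logging

# Attributes every LogRecord has; anything else was passed via `extra`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {
    "message",
    "asctime",
}


class JsonFormatter(logging.Formatter):
    """Serialize the message plus any `extra` fields as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "event": record.getMessage()}
        payload.update(
            {k: v for k, v in vars(record).items() if k not in _STANDARD_ATTRS}
        )
        # default=str keeps non-JSON values (e.g. Decimal costs) from raising.
        return json.dumps(payload, default=str)
```

Attach it with `handler.setFormatter(JsonFormatter())` on whichever handler ships logs to your aggregator.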
