Gateway API
OpenAI-compatible API for direct model access
Drop-in replacement for OpenAI SDK. Works with any OpenAI-compatible client.
Gateway vs Chat API
Use the Gateway API (/api/v1/chat/completions) for direct, single-model access with advanced features like fallback routing, caching, load balancing, and guardrails. It returns standard OpenAI-format responses.
Use the Chat API (/api/chat) for multi-model battles, debates with voting, and autonomous web search. It returns custom SSE events.
Create Chat Completion
POST /api/v1/chat/completions
Request
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model ID or alias (see Models for the full list) |
| `messages` | array | Yes | Array of message objects (supports multimodal content with images) |
| `stream` | boolean | No | Enable SSE streaming (default: `false`) |
| `temperature` | number | No | 0-2, controls randomness (default: 0.7) |
| `max_tokens` | number | No | Max tokens to generate |
| `top_p` | number | No | Nucleus sampling (0-1) |
| `frequency_penalty` | number | No | Penalize frequent tokens (-2 to 2) |
| `presence_penalty` | number | No | Penalize repeated topics (-2 to 2) |
| `stop` | string/array | No | Stop sequences |
| `tools` | array | No | Tool/function definitions the model may call (see Tool Use) |
| `tool_choice` | string/object | No | `"auto"` (default), `"none"`, `"required"`, or `{"type": "function", "function": {"name": ...}}` |
| `fallback` | string[] | No | Fallback models if the primary fails (e.g. `["gpt", "gemini"]`) |
| `retries` | number | No | Max retries on transient errors, with exponential backoff (default: 2, max: 5) |
| `timeout` | number | No | Request timeout in ms (default: 60000, max: 300000) |
| `load_balance` | object | No | Load balancing config (see below) |
| `guardrails` | object | No | Input/output guardrails (see below) |
| `prompt_id` | string | No | Langfuse prompt template name |
| `prompt_version` | number | No | Prompt template version (default: production) |
| `prompt_variables` | object | No | Variables to substitute in the prompt template |
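Putting the core parameters together, a minimal non-streaming request body might look like this (the `claude` alias and message contents are illustrative; see Models for actual IDs):

```json
{
  "model": "claude",
  "messages": [
    { "role": "system", "content": "You are a concise assistant." },
    { "role": "user", "content": "Summarize SSE in one sentence." }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}
```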
Request Headers
| Header | Description |
|---|---|
| `X-Cache: no-cache` | Skip the response cache |
| `X-Cache-TTL: 3600` | Cache TTL in seconds (default: 3600, max: 86400) |
| `X-Guardrails: pii,content_moderation` | Alternative to body `guardrails` |
Response
Response Headers
| Header | Description |
|---|---|
| `X-Request-ID` | Unique request identifier |
| `X-Model-Used` | Actual model that served the request |
| `X-Cache: HIT/MISS` | Cache status (non-streaming only) |
| `X-Fallback-From` | Original model if fallback was used |
| `X-Retry-Count` | Number of retries attempted |
| `X-Guardrail-Status` | Guardrail results (e.g. `pii:redact,content_moderation:pass`) |
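Since the Gateway returns standard OpenAI-format responses, a successful non-streaming response body follows the usual chat completion shape (all values below are illustrative):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1728000000,
  "model": "claude",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello!" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15 }
}
```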
Gateway Features
Fallback Routing
Automatically try backup models if the primary fails:
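As a sketch, a request that falls back from Claude to GPT and then Gemini, with retries and a tighter timeout (model aliases illustrative):

```json
{
  "model": "claude",
  "messages": [{ "role": "user", "content": "Hello" }],
  "fallback": ["gpt", "gemini"],
  "retries": 2,
  "timeout": 30000
}
```

If a fallback model serves the request, the response carries `X-Fallback-From` with the original model.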
Response Caching
Responses are cached automatically for non-streaming requests (1 hour default). Control via headers:
See Caching with tools for how tool-use requests interact with the cache and how to use Anthropic cache_control for prompt caching.
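For example, to bypass the cache for one request while setting a longer TTL for future writes (auth scheme as described in Authentication):

```http
POST /api/v1/chat/completions HTTP/1.1
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json
X-Cache: no-cache
X-Cache-TTL: 7200
```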
Load Balancing
Distribute requests across models:
Strategies: weighted, round-robin, least-latency.
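The exact `load_balance` schema is not spelled out on this page; as an illustrative sketch only (the `strategy`, `models`, and `weights` field names are assumptions, not confirmed here), a weighted config might look like:

```json
{
  "model": "claude",
  "messages": [{ "role": "user", "content": "Hello" }],
  "load_balance": {
    "strategy": "weighted",
    "models": ["claude", "gpt"],
    "weights": [0.7, 0.3]
  }
}
```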
Guardrails
Pre-process input and post-process output:
Available guardrails:
- `pii`: detects and redacts emails, phone numbers, SSNs, credit cards, and IP addresses
- `content_moderation`: blocks dangerous content
- `schema_validation`: validates output against a JSON schema
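A sketch of a request-body guardrails config (the `input`/`output` field names are assumptions inferred from "input/output guardrails"; the `X-Guardrails` header form above is the documented alternative):

```json
{
  "model": "gpt",
  "messages": [{ "role": "user", "content": "My email is jane@example.com" }],
  "guardrails": {
    "input": ["pii"],
    "output": ["content_moderation"]
  }
}
```

The response's `X-Guardrail-Status` header reports what each guardrail did, e.g. `pii:redact,content_moderation:pass`.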
Prompt Templates (Langfuse)
Use managed prompt templates:
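Using the documented `prompt_id`, `prompt_version`, and `prompt_variables` parameters (template name and variables below are illustrative; versions default to the production label when `prompt_version` is omitted):

```json
{
  "model": "gpt",
  "messages": [{ "role": "user", "content": "Help me with my order." }],
  "prompt_id": "support-triage",
  "prompt_version": 3,
  "prompt_variables": { "customer_name": "Ada", "tier": "pro" }
}
```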
BYOK (Bring Your Own Keys)
Store your own provider API keys via the dashboard. When making requests, your key is automatically used instead of the platform key. See Authentication > BYOK for setup.
Vision Support
The Gateway supports images using OpenAI's multimodal message format. Vision-enabled models (GPT, Claude, Gemini, Grok) see the image directly. Non-vision models receive an AI-generated description transparently. See Models > Vision Support for the full matrix.
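An image is passed as a content-part array in the standard OpenAI multimodal format (URL illustrative):

```json
{
  "model": "gpt",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is in this image?" },
        { "type": "image_url", "image_url": { "url": "https://example.com/cat.png" } }
      ]
    }
  ]
}
```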
Streaming
Set "stream": true to receive SSE frames that match OpenAI's shape exactly:
See Streaming for the full SSE grammar, tool-call delta shape, and per-provider streaming quirks.
System Messages
Each model handles system messages in its native format (e.g., Claude uses the system parameter, OpenAI uses instructions). This is transparent — just use the standard "role": "system" format.
Limitations
The Gateway API is a transparent passthrough — it does not include autonomous web search. For web-search-augmented responses with citations, use the Chat API with battle or fight mode.
Tool Use (Function Calling)
The Gateway supports OpenAI-compatible tool use across every supported provider — Anthropic Claude, Google Gemini, OpenAI, xAI Grok, DeepSeek, Mistral, Kimi, Llama, and MiniMax. It plugs into the Vercel AI SDK out of the box.
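As a sketch, a tool-enabled request reuses the standard OpenAI `tools` shape (the `get_weather` function here is a hypothetical example, not a built-in):

```json
{
  "model": "claude",
  "messages": [{ "role": "user", "content": "What's the weather in Paris?" }],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
```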
See the tool-use docs for the full contract:
- Tool Use: defining tools, `tool_choice`, the `role: "tool"` round-trip, error codes.
- Streaming: SSE frame grammar, tool-call delta shape, invariants.
- Caching with tools: response cache bypass and Anthropic `cache_control` passthrough.
- Provider compatibility: per-provider matrix and known quirks.
Tool results are never cached
Responses that contain tool_calls bypass the Gateway's response cache (X-Cache: BYPASS). Tool outputs are stateful — caching them would return stale results.