Acceptance Tests
Verified-on-providers results for the Gateway's tool-use contract
The Gateway ships with a full tool-use acceptance battery under scripts/gateway-tests/. This page publishes the latest run results so you can see what is verified end to end without building your own test harness before integrating.
The smoke test — §12.11
This is the shape wednesday.bot (and most real AI-SDK integrations) actually use: streamText with maxSteps, a tool with a Zod schema, and the SDK drives the tool loop automatically.
The test passes if, after the SDK drives the full tool loop end-to-end (request → tool-call delta → local execute → role:"tool" result → final assistant text), the Gateway returns a non-empty final text containing the tool's output.
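At the message level, the loop in §12.11 looks roughly like the sketch below. It is a hedged illustration, not the test's source. It assumes OpenAI Chat Completions message shapes, where an assistant message carries tool_calls and each call is answered by a role:"tool" message keyed on tool_call_id. It makes no network calls and uses a hypothetical getWeather tool hard-coded to return 72°F, matching the results below. The last helper applies the pass criterion stated above.

```typescript
// Minimal OpenAI-style chat message shapes (a subset of the real payloads).
type ToolCall = {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON-encoded string
};

type AssistantMessage = { role: "assistant"; content: string | null; tool_calls?: ToolCall[] };

type Message =
  | { role: "user"; content: string }
  | AssistantMessage
  | { role: "tool"; tool_call_id: string; content: string };

// Hypothetical stand-in for the smoke test's weather tool; always reports 72°F.
function getWeather(args: { city: string }): { city: string; temperatureF: number } {
  return { city: args.city, temperatureF: 72 };
}

// "Local execute" step: run each requested tool and build the role:"tool"
// messages that go back to the Gateway on the next request.
function executeToolCalls(assistant: AssistantMessage): Message[] {
  return (assistant.tool_calls ?? []).map((call): Message => {
    if (call.function.name !== "getWeather") {
      throw new Error(`unknown tool: ${call.function.name}`);
    }
    const result = getWeather(JSON.parse(call.function.arguments));
    return { role: "tool", tool_call_id: call.id, content: JSON.stringify(result) };
  });
}

// §12.11 pass criterion: a non-empty final text that contains the tool's output.
function smokeTestPasses(finalText: string, toolOutput: string): boolean {
  return finalText.trim().length > 0 && finalText.includes(toolOutput);
}

// Simulated run: this assistant turn stands in for what the Gateway streams back.
const conversation: Message[] = [{ role: "user", content: "What's the weather in Paris?" }];
const assistantTurn: AssistantMessage = {
  role: "assistant",
  content: null,
  tool_calls: [
    { id: "call_1", type: "function", function: { name: "getWeather", arguments: '{"city":"Paris"}' } },
  ],
};
conversation.push(assistantTurn, ...executeToolCalls(assistantTurn));
// conversation now ends with the role:"tool" result; the next request returns the final text.
```

In a real run the AI SDK performs these steps itself when maxSteps allows more than one step. The sketch only makes the intermediate messages visible.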
Latest verified results
Run against https://concurred.ai/api/v1 on 2026-04-23 with ai@^3, @ai-sdk/openai@^0.0.66, zod:
| Provider alias | Resolved model | Status | Wall time | Final text (first line) |
|---|---|---|---|---|
| claude | claude-sonnet-4-5-20250929 | ✅ Pass | 5,079 ms | "The current weather in Paris is 72°F (approximately 22°C)." |
| gpt | gpt-5.2 | ✅ Pass | 5,054 ms | "Paris weather: 72°F." |
| gemini | gemini-3-pro-preview | ✅ Pass | 3,397 ms | "The weather in Paris is 72 degrees." |
| deepseek | deepseek/deepseek-v3.2 | ✅ Pass | 6,765 ms | "I'll get the weather for Paris for you. The current weather in Paris is 72°F." |
All four providers completed the full SDK-driven tool loop on the first try. Each model:
- accepted the OpenAI-style tools[] + tool_choice: "auto" input,
- emitted a tool-call delta that the AI SDK parsed without custom bridging,
- resumed the stream on the role:"tool" turn,
- produced a complete natural-language final response incorporating the tool's return value.
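The tool-call delta the SDK parses is a series of streamed fragments. The sketch below roughly shows the parsing the AI SDK does for you. It assumes OpenAI Chat Completions streaming shapes: the first fragment of each call carries its index, id, and function name. Later fragments with the same index append pieces of the JSON arguments string, which only parses once the call is complete. ToolCallDelta and accumulateToolCalls are illustrative names, not Gateway or AI SDK APIs.

```typescript
// One streamed fragment of a tool call, as carried in choices[0].delta.tool_calls.
type ToolCallDelta = {
  index: number; // which call this fragment belongs to (parallel calls use 0, 1, ...)
  id?: string;
  type?: "function";
  function?: { name?: string; arguments?: string };
};

type AssembledCall = { id: string; name: string; args: unknown };

// Merge fragments by index, then parse each call's concatenated JSON arguments.
function accumulateToolCalls(deltas: ToolCallDelta[]): AssembledCall[] {
  const partial = new Map<number, { id: string; name: string; argText: string }>();
  for (const d of deltas) {
    const entry = partial.get(d.index) ?? { id: "", name: "", argText: "" };
    if (d.id) entry.id = d.id;
    // Assumption: the function name arrives whole, typically in the first fragment.
    if (d.function?.name) entry.name = d.function.name;
    if (d.function?.arguments) entry.argText += d.function.arguments;
    partial.set(d.index, entry);
  }
  return Array.from(partial.entries())
    .sort(([a], [b]) => a - b)
    .map(([, e]) => ({ id: e.id, name: e.name, args: JSON.parse(e.argText || "{}") }));
}

// Example: one call whose arguments are split across two fragments.
const streamed: ToolCallDelta[] = [
  { index: 0, id: "call_1", type: "function", function: { name: "getWeather", arguments: "" } },
  { index: 0, function: { arguments: '{"ci' } },
  { index: 0, function: { arguments: 'ty":"Paris"}' } },
];
const calls = accumulateToolCalls(streamed);
// calls → [{ id: "call_1", name: "getWeather", args: { city: "Paris" } }]
```

Keying on index rather than id matters for parallel calls (§12.4). In that case, fragments of different calls may interleave, and only the first fragment of each call carries its id.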
Full battery
The smoke test above is §12.11 in a larger suite. The other 10 tests cover:
| # | What it covers |
|---|---|
| §12.1 | Single tool call round-trip |
| §12.2 | Multi-turn loop (assistant → tool result → final text) |
| §12.3 | Streaming deltas for tool-call arguments |
| §12.4 | Parallel tool calls (Gemini specifically) |
| §12.5 | tool_choice: "required" — force a tool call |
| §12.6 | tool_choice: { type: "function", function: { name } } — force a specific tool |
| §12.7 | Anthropic prompt caching with cache_control: { type: "ephemeral" } |
| §12.8 | tool_call_id_mismatch error is returned for out-of-order tool results |
| §12.9 | tool_unsupported_for_model returned on deepseek-reasoner |
| §12.10 | Response cache is bypassed when tools[] is present |
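If §12.8's tool_call_id_mismatch error appears in your own integration, a client-side pass over the conversation can locate the offending message before you resend. The sketch assumes an OpenAI-style pairing rule: each role:"tool" message answers one id from the most recent assistant tool_calls, before any other turn. This page does not specify the Gateway's exact rule. findToolCallIdProblems and the strings it returns are hypothetical and do not follow the Gateway's error format.

```typescript
type Msg =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string | null; tool_calls?: { id: string }[] }
  | { role: "tool"; tool_call_id: string; content: string };

// Hypothetical client-side check. Assumed rule: each role:"tool" message answers
// an id from the most recent assistant tool_calls, once, before any other turn.
function findToolCallIdProblems(messages: Msg[]): string[] {
  const problems: string[] = [];
  let pending = new Set<string>(); // ids from the latest assistant turn still awaiting a result
  messages.forEach((m, i) => {
    if (m.role === "tool") {
      if (!pending.has(m.tool_call_id)) {
        problems.push(`message ${i}: result for unknown or already-answered id ${m.tool_call_id}`);
      }
      pending.delete(m.tool_call_id);
      return;
    }
    // Any other message closes the previous assistant turn.
    pending.forEach((id) => problems.push(`message ${i}: no tool result for ${id} before this turn`));
    pending = new Set<string>(m.role === "assistant" ? (m.tool_calls ?? []).map((c) => c.id) : []);
  });
  pending.forEach((id) => problems.push(`end of conversation: no tool result for ${id}`));
  return problems;
}
```

An empty array means the conversation satisfies the assumed rule. Each entry names the index of the message where pairing broke.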
Run the full battery against your deployment:
Re-running §12.11 yourself
Point the test at whichever base URL you want to verify:
What this does and doesn't prove
These results verify the Gateway's tool-use wire protocol end to end with a real third-party SDK and real upstream providers. They do not cover every edge case. They cover the happy path plus the failure modes that most often break integrations.
Paste any failure you see into an issue with the request body and the raw SSE response; we'll add the case to the battery.
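A raw SSE response is the text/event-stream body exactly as received. Before pasting one into an issue, a quick parse can confirm which chunks arrived. The sketch assumes OpenAI-style framing, with one JSON chunk per data: field and a final data: [DONE]. It skips SSE comment lines and other fields and is not an official tool.

```typescript
// Split a captured text/event-stream body into the data payload of each event.
function sseDataPayloads(raw: string): string[] {
  const payloads: string[] = [];
  // Events are separated by blank lines; normalize CRLF line endings first.
  for (const block of raw.replace(/\r\n/g, "\n").split(/\n\n+/)) {
    const dataLines = block
      .split("\n")
      .filter((line) => line.startsWith("data:")) // drops ": comment" lines and other fields
      .map((line) => line.slice(5).replace(/^ /, "")); // strip "data:" and one optional space
    if (dataLines.length > 0) payloads.push(dataLines.join("\n"));
  }
  return payloads;
}

// Decode OpenAI-style chunks, stopping at the [DONE] sentinel.
function parseChunks(raw: string): unknown[] {
  const chunks: unknown[] = [];
  for (const data of sseDataPayloads(raw)) {
    if (data === "[DONE]") break;
    chunks.push(JSON.parse(data));
  }
  return chunks;
}
```

If parseChunks throws on a payload, that payload is worth including verbatim in the issue.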
See also
- Tool use — request/response contract.
- Tool-use errors — machine-parseable codes for malformed requests.
- Provider compatibility — per-provider matrix.