The Gateway ships with a full tool-use acceptance battery under scripts/gateway-tests/. This page publishes the latest run results so customers can see what's verified end-to-end — no need to build your own test harness before integrating.

The smoke test — §12.11

This is the shape wednesday.bot (and most real AI-SDK integrations) actually use: streamText with maxSteps and a tool defined by a Zod schema, with the SDK driving the tool loop automatically.

import { createOpenAI } from '@ai-sdk/openai';
import { streamText, tool } from 'ai';
import { z } from 'zod';
 
const llm = createOpenAI({
  baseURL: 'https://concurred.ai/api/v1',
  apiKey: process.env.CONCURRED_API_KEY,
});
 
const result = await streamText({
  model: llm('claude'),                  // or 'gpt' / 'gemini' / 'deepseek'
  tools: {
    get_weather: tool({
      description: 'Get weather for a city.',
      parameters: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ temp: 72, city }),
    }),
  },
  messages: [{ role: 'user', content: 'Weather in Paris?' }],
  maxSteps: 3,
});
 
for await (const chunk of result.fullStream) {
  if (chunk.type === 'text-delta') process.stdout.write(chunk.textDelta);
}

The test passes if, after the SDK drives the full tool loop end-to-end (request → tool-call delta → local execute → role:"tool" result → final assistant text), the Gateway returns a non-empty final text containing the tool's output.
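The pass criterion above can be written as a small predicate. The sketch below is an illustration only, not the code in 12.11_ai_sdk_smoke.ts: the `smokeTestPasses` name and the `SmokeOutcome` shape are hypothetical, and it assumes that "containing the tool's output" means the stubbed temperature value appears somewhere in the final text.

```typescript
// Hypothetical helper; names and shapes are assumptions, not part of the battery.
interface SmokeOutcome {
  finalText: string; // all text-delta chunks concatenated
  toolResult: { temp: number; city: string }; // what execute() returned
}

function smokeTestPasses({ finalText, toolResult }: SmokeOutcome): boolean {
  const text = finalText.trim();
  // An empty final text means the loop never reached the last assistant turn.
  if (text.length === 0) return false;
  // Assumption: the tool output counts as "contained" if the temperature shows up.
  return text.includes(String(toolResult.temp));
}

const ok = smokeTestPasses({
  finalText: 'Paris weather: 72°F.',
  toolResult: { temp: 72, city: 'Paris' },
});
console.log(ok);
```

A stricter check could also require the city name, but models are free to rephrase, so matching on the numeric value alone is the more forgiving choice.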

Latest verified results

Run against https://concurred.ai/api/v1 on 2026-04-23 with ai@^3, @ai-sdk/openai@^0.0.66, zod:

| Provider alias | Resolved model | Status | Wall time | Final text (first line) |
| --- | --- | --- | --- | --- |
| `claude` | `claude-sonnet-4-5-20250929` | ✅ Pass | 5,079 ms | "The current weather in Paris is 72°F (approximately 22°C)." |
| `gpt` | `gpt-5.2` | ✅ Pass | 5,054 ms | "Paris weather: 72°F." |
| `gemini` | `gemini-3-pro-preview` | ✅ Pass | 3,397 ms | "The weather in Paris is 72 degrees." |
| `deepseek` | `deepseek/deepseek-v3.2` | ✅ Pass | 6,765 ms | "I'll get the weather for Paris for you. The current weather in Paris is 72°F." |

All four providers completed the full SDK-driven tool loop on the first try. Each model:

- accepted the OpenAI-style tools[] + tool_choice: "auto" input,
- emitted a tool-call delta that the AI SDK parsed without custom bridging,
- resumed the stream on the role:"tool" turn, and
- produced a complete natural-language final response incorporating the tool's return value.
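For readers who want to see what the SDK does with those deltas, here is a minimal sketch of the two middle steps: merging streamed tool-call fragments into one call, then building the role:"tool" turn that resumes the stream. The field names follow the OpenAI-style chat completions streaming format; the `call_1` id, the fragment boundaries, and the helper names are made up for illustration, and the AI SDK's own implementation may differ.

```typescript
// One streamed fragment of a tool call (OpenAI-style delta.tool_calls[] entry).
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface ToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}

interface ToolMessage {
  role: 'tool';
  tool_call_id: string;
  content: string;
}

// Merge fragments by index: id and name arrive once, arguments arrive in pieces.
function accumulateToolCalls(deltas: ToolCallDelta[]): ToolCall[] {
  const calls: ToolCall[] = [];
  for (const d of deltas) {
    let call = calls[d.index];
    if (call === undefined) {
      call = { id: '', type: 'function', function: { name: '', arguments: '' } };
      calls[d.index] = call;
    }
    if (d.id) call.id = d.id;
    if (d.function?.name) call.function.name = d.function.name;
    if (d.function?.arguments) call.function.arguments += d.function.arguments;
  }
  return calls;
}

// Build the follow-up turn that carries the local execute() result back upstream.
function toolResultMessage(call: ToolCall, result: unknown): ToolMessage {
  return { role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) };
}

// Illustrative fragments; a real stream may split the arguments differently.
const deltas: ToolCallDelta[] = [
  { index: 0, id: 'call_1', function: { name: 'get_weather', arguments: '' } },
  { index: 0, function: { arguments: '{"city":' } },
  { index: 0, function: { arguments: '"Paris"}' } },
];

const calls = accumulateToolCalls(deltas);
const call = calls[0];
const args = JSON.parse(call.function.arguments) as { city: string };
const toolTurn = toolResultMessage(call, { temp: 72, city: args.city });
console.log(toolTurn);
```

Keying on `index` rather than `id` matters because only the first fragment of each call carries the id; later fragments identify their call by position alone.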

Full battery

The smoke test above is §12.11 in a larger suite. The other 10 tests cover:

| Test | What it covers |
| --- | --- |
| §12.1 | Single tool call round-trip |
| §12.2 | Multi-turn loop (assistant → tool result → final text) |
| §12.3 | Streaming deltas for tool-call arguments |
| §12.4 | Parallel tool calls (Gemini specifically) |
| §12.5 | `tool_choice: "required"` (force a tool call) |
| §12.6 | `tool_choice: { type: "function", function: { name } }` (force a specific tool) |
| §12.7 | Anthropic prompt caching with `cache_control: { type: "ephemeral" }` |
| §12.8 | `tool_call_id_mismatch` error is returned for out-of-order tool results |
| §12.9 | `tool_unsupported_for_model` returned on `deepseek-reasoner` |
| §12.10 | Response cache is bypassed when `tools[]` is present |
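To make the `tool_choice` variants in §12.5 and §12.6 concrete, the sketch below builds OpenAI-style request bodies for each one. It is an assumption-laden illustration: `buildRequest` and `forceTool` are hypothetical helpers, the JSON Schema for `get_weather` is a hand-written equivalent of the Zod schema in the smoke test, and the actual test scripts may construct their requests differently.

```typescript
type ToolChoice =
  | 'auto'
  | 'required'
  | { type: 'function'; function: { name: string } };

interface ToolDef {
  type: 'function';
  function: { name: string; description: string; parameters: object };
}

// Hand-written JSON Schema mirroring z.object({ city: z.string() }).
const getWeather: ToolDef = {
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get weather for a city.',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city'],
    },
  },
};

// Hypothetical helper: the "force a specific tool" shape from §12.6.
function forceTool(name: string): ToolChoice {
  return { type: 'function', function: { name } };
}

// Hypothetical helper: assemble a chat completions request body.
function buildRequest(model: string, toolChoice: ToolChoice) {
  return {
    model,
    messages: [{ role: 'user', content: 'Weather in Paris?' }],
    tools: [getWeather],
    tool_choice: toolChoice,
  };
}

const anyTool = buildRequest('claude', 'required'); // §12.5
const thisTool = buildRequest('claude', forceTool('get_weather')); // §12.6
console.log(JSON.stringify(thisTool.tool_choice));
```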

Run the full battery against your deployment:

cd scripts/gateway-tests
CONCURRED_API_KEY=ck_... ./_run_all_continue.sh

Re-running §12.11 yourself

Point the test at whichever base URL you want to verify:

cd scripts/gateway-tests
npm i --no-save ai@^3 @ai-sdk/openai@^0.0.66 zod
 
for MODEL in claude gpt gemini deepseek; do
  MODEL=$MODEL \
  CONCURRED_API_KEY=ck_... \
  CONCURRED_BASE_URL=https://concurred.ai/api/v1 \
  npx tsx 12.11_ai_sdk_smoke.ts
done
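The loop above passes its settings through three environment variables. As a rough sketch of how a script could consume them (the contents of 12.11_ai_sdk_smoke.ts are not shown on this page, so the `readConfig` helper, the `SmokeConfig` shape, and both fallback defaults are assumptions):

```typescript
interface SmokeConfig {
  model: string;
  apiKey: string;
  baseURL: string;
}

// Hypothetical helper; takes the environment as a plain object so it is easy to test.
function readConfig(env: Record<string, string | undefined>): SmokeConfig {
  const apiKey = env.CONCURRED_API_KEY;
  if (!apiKey) throw new Error('CONCURRED_API_KEY is required');
  return {
    // Assumed defaults: the first alias and the hosted base URL used on this page.
    model: env.MODEL ?? 'claude',
    apiKey,
    baseURL: env.CONCURRED_BASE_URL ?? 'https://concurred.ai/api/v1',
  };
}

// In a Node script this would typically be called as readConfig(process.env).
const config = readConfig({ CONCURRED_API_KEY: 'ck_example', MODEL: 'gemini' });
console.log(config.model, config.baseURL);
```

The resulting `baseURL` and `apiKey` would then feed `createOpenAI`, and `model` would select the provider alias.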

What this does and doesn't prove

These results verify the Gateway's tool-use wire protocol end-to-end with a real third-party SDK and real upstream providers. They don't cover every edge case — they cover the happy path plus the most common things that break in integration.

Paste any failure you see into an issue with the request body and the raw SSE response; we'll add the case to the battery.
