Concurred API
Gateway API

Acceptance Tests

Results verified against live providers for the Gateway's tool-use contract

The Gateway ships with a full tool-use acceptance battery under scripts/gateway-tests/. This page publishes the latest run results so customers can see what's verified end-to-end — no need to build your own test harness before integrating.

The smoke test — §12.11

This is the shape wednesday.bot (and most real AI-SDK integrations) actually use: streamText with maxSteps, a tool defined with a Zod schema, and the SDK driving the tool loop automatically.

import { createOpenAI } from '@ai-sdk/openai';
import { streamText, tool } from 'ai';
import { z } from 'zod';
 
const llm = createOpenAI({
  baseURL: 'https://concurred.ai/api/v1',
  apiKey: process.env.CONCURRED_API_KEY,
});
 
const result = await streamText({
  model: llm('claude'),                  // or 'gpt' / 'gemini' / 'deepseek'
  tools: {
    get_weather: tool({
      description: 'Get weather for a city.',
      parameters: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ temp: 72, city }),
    }),
  },
  messages: [{ role: 'user', content: 'Weather in Paris?' }],
  maxSteps: 3,
});
 
for await (const chunk of result.fullStream) {
  if (chunk.type === 'text-delta') process.stdout.write(chunk.textDelta);
}

The test passes if, after the SDK drives the full tool loop end-to-end (request → tool-call delta → local execute → role:"tool" result → final assistant text), the Gateway returns a non-empty final text containing the tool's output.
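The pass criterion can be sketched as a small predicate. This is an illustration only: `passesSmokeTest` and its `expectedFragments` argument are hypothetical names, and the actual check inside 12.11_ai_sdk_smoke.ts is not shown on this page and may differ.

```typescript
// Hypothetical helper, not taken from the test suite itself.
// Passes when the final assistant text is non-empty and mentions every
// fragment expected from the tool's return value (here, the temperature 72).
function passesSmokeTest(finalText: string, expectedFragments: string[]): boolean {
  const text = finalText.trim();
  if (text.length === 0) return false;
  return expectedFragments.every((fragment) => text.includes(fragment));
}

// Example with the tool from the smoke test, which returns { temp: 72, city }.
const smokeOk = passesSmokeTest('Paris weather: 72°F.', ['72']);
```

Matching on substrings keeps the check independent of how each model phrases its answer, at the cost of accepting any text that happens to contain the fragment.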

Latest verified results

Run against https://concurred.ai/api/v1 on 2026-04-23 with ai@^3, @ai-sdk/openai@^0.0.66, zod:

| Provider alias | Resolved model | Status | Wall time | Final text (first line) |
| --- | --- | --- | --- | --- |
| claude | claude-sonnet-4-5-20250929 | ✅ Pass | 5,079 ms | "The current weather in Paris is 72°F (approximately 22°C)." |
| gpt | gpt-5.2 | ✅ Pass | 5,054 ms | "Paris weather: 72°F." |
| gemini | gemini-3-pro-preview | ✅ Pass | 3,397 ms | "The weather in Paris is 72 degrees." |
| deepseek | deepseek/deepseek-v3.2 | ✅ Pass | 6,765 ms | "I'll get the weather for Paris for you. The current weather in Paris is 72°F." |

All four providers completed the full SDK-driven tool loop on the first try. Each model:

  • accepted the OpenAI-style tools[] + tool_choice: "auto" input,
  • emitted a tool-call delta that the AI SDK parsed without custom bridging,
  • resumed the stream on the role:"tool" turn,
  • produced a complete natural-language final response incorporating the tool's return value.
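For readers who want to see the wire shape behind these four steps, the sketch below builds the two request payloads in OpenAI-style chat format. It is an approximation under stated assumptions: the exact body the AI SDK sends may differ, the JSON Schema is a hand-written equivalent of the Zod schema, and the tool-call id `call_1` and the helper `withToolResult` are made up for illustration.

```typescript
// OpenAI-style chat messages, reduced to the fields the tool loop needs.
type ToolCall = {
  id: string;
  type: 'function';
  function: { name: string; arguments: string }; // arguments is a JSON string
};

type Message =
  | { role: 'user'; content: string }
  | { role: 'assistant'; content: string | null; tool_calls?: ToolCall[] }
  | { role: 'tool'; tool_call_id: string; content: string };

// First request: the tool definition plus tool_choice "auto".
const firstRequest = {
  model: 'claude',
  stream: true,
  tool_choice: 'auto',
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather',
        description: 'Get weather for a city.',
        parameters: {
          type: 'object',
          properties: { city: { type: 'string' } },
          required: ['city'],
        },
      },
    },
  ],
  messages: [{ role: 'user', content: 'Weather in Paris?' }] as Message[],
};

// Second request: echo the assistant's tool call, then answer it on a role:"tool" turn.
function withToolResult(messages: Message[], call: ToolCall, result: unknown): Message[] {
  return [
    ...messages,
    { role: 'assistant', content: null, tool_calls: [call] },
    { role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) },
  ];
}

// Hypothetical tool call as the model might emit it; the id is illustrative.
const call: ToolCall = {
  id: 'call_1',
  type: 'function',
  function: { name: 'get_weather', arguments: '{"city":"Paris"}' },
};
const followUp = withToolResult(firstRequest.messages, call, { temp: 72, city: 'Paris' });
```

With streamText and maxSteps the SDK assembles both payloads itself; building them by hand is mainly useful when reproducing a failure without the SDK.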

Full battery

The smoke test above is §12.11 in a larger suite. The other 10 tests cover:

| # | What it covers |
| --- | --- |
| §12.1 | Single tool call round-trip |
| §12.2 | Multi-turn loop (assistant → tool result → final text) |
| §12.3 | Streaming deltas for tool-call arguments |
| §12.4 | Parallel tool calls (Gemini specifically) |
| §12.5 | tool_choice: "required" (force a tool call) |
| §12.6 | tool_choice: { type: "function", function: { name } } (force a specific tool) |
| §12.7 | Anthropic prompt caching with cache_control: { type: "ephemeral" } |
| §12.8 | tool_call_id_mismatch error is returned for out-of-order tool results |
| §12.9 | tool_unsupported_for_model returned on deepseek-reasoner |
| §12.10 | Response cache is bypassed when tools[] is present |
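§12.8 checks that the Gateway rejects tool results that do not line up with an earlier tool call. A client can catch the same mistake before sending. The sketch below is a hypothetical client-side pre-check, not the Gateway's validation logic: `findUnmatchedToolResults` and `ChatMessage` are names introduced for illustration, and the Gateway's exact ordering rules are not documented on this page.

```typescript
// Minimal message shape for the check; real messages carry more fields.
type ChatMessage = {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string | null;
  tool_calls?: { id: string }[];
  tool_call_id?: string;
};

// Returns the tool_call_id of every role:"tool" message that does not answer
// a tool call issued by an earlier assistant message, or answers one twice.
function findUnmatchedToolResults(messages: ChatMessage[]): string[] {
  const pending = new Set<string>();
  const unmatched: string[] = [];
  for (const message of messages) {
    if (message.role === 'assistant') {
      for (const toolCall of message.tool_calls ?? []) pending.add(toolCall.id);
    } else if (message.role === 'tool') {
      const id = message.tool_call_id ?? '';
      // Set.delete returns false when the id was never pending.
      if (!pending.delete(id)) unmatched.push(id);
    }
  }
  return unmatched;
}
```

An empty result means every tool turn follows the assistant turn that requested it; any returned id points at a message worth inspecting before the request goes out.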

Run the full battery against your deployment:

cd scripts/gateway-tests
CONCURRED_API_KEY=ck_... ./_run_all_continue.sh

Re-running §12.11 yourself

Point the test at whichever base URL you want to verify:

cd scripts/gateway-tests
npm i --no-save ai@^3 @ai-sdk/openai@^0.0.66 zod
 
for MODEL in claude gpt gemini deepseek; do
  MODEL=$MODEL \
  CONCURRED_API_KEY=ck_... \
  CONCURRED_BASE_URL=https://concurred.ai/api/v1 \
  npx tsx 12.11_ai_sdk_smoke.ts
done

What this does and doesn't prove

These results verify the Gateway's tool-use wire protocol end-to-end with a real third-party SDK and real upstream providers. They don't cover every edge case — they cover the happy path plus the most common things that break in integration.

Paste any failure you see into an issue with the request body and the raw SSE response; we'll add the case to the battery.
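When preparing such a report, it can help to list the individual payloads alongside the raw SSE body. A minimal sketch, assuming standard SSE framing with `data:` lines and an OpenAI-style `[DONE]` sentinel at the end of the stream; `extractSseData` is a hypothetical helper and the sample body is made up for illustration.

```typescript
// Splits a raw SSE body into its data payloads, dropping the [DONE] sentinel.
// Simplification: each data: line is treated as its own payload, so events
// that spread one payload over several data: lines are not re-joined.
function extractSseData(raw: string): string[] {
  return raw
    .split(/\r?\n/)
    .filter((line) => line.startsWith('data:'))
    .map((line) => line.slice('data:'.length).trim())
    .filter((payload) => payload.length > 0 && payload !== '[DONE]');
}

// Illustrative body; real chunks from the Gateway carry more fields.
const sampleBody = 'data: {"choices":[{"delta":{"content":"Hi"}}]}\n\ndata: [DONE]\n\n';
const payloads = extractSseData(sampleBody);
```

Always attach the unmodified body as well, since the parsed list discards comments, event names, and blank-line framing that may matter for debugging.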
