Concurred API
Gateway API

Streaming

SSE event grammar for Gateway responses, including tool-call deltas

Set "stream": true to receive responses as Server-Sent Events. The frame grammar matches OpenAI's exactly, so the Vercel AI SDK and the OpenAI SDKs parse it out of the box.

Frame grammar

  • Every frame is data: <json>\n\n.
  • End of stream is data: [DONE]\n\n.
  • Each JSON object is a partial chat.completion.chunk.

Example text stream:

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
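The frame grammar above can be consumed with a small parser — split on blank lines, strip the `data: ` prefix, and treat `[DONE]` as the terminator. This is a sketch in TypeScript; the function and type names are illustrative, not part of any Gateway SDK:

```typescript
// Minimal SSE frame parser for Gateway streams. Each frame is
// `data: <json>\n\n`; the stream ends with `data: [DONE]\n\n`.
type ChunkDelta = { role?: string; content?: string | null };
type Chunk = {
  id: string;
  object: string;
  choices: { index: number; delta: ChunkDelta; finish_reason?: string }[];
};

function parseFrames(raw: string): (Chunk | "[DONE]")[] {
  const out: (Chunk | "[DONE]")[] = [];
  for (const block of raw.split("\n\n")) {
    const line = block.trim();
    if (!line.startsWith("data: ")) continue; // skip keep-alives / blanks
    const payload = line.slice("data: ".length);
    out.push(payload === "[DONE]" ? "[DONE]" : (JSON.parse(payload) as Chunk));
  }
  return out;
}

// Accumulate streamed text from the example stream above.
const frames = parseFrames(
  'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"}}]}\n\n' +
  'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"}}]}\n\n' +
  "data: [DONE]\n\n"
);
let text = "";
for (const f of frames) {
  if (f === "[DONE]") break;
  text += f.choices[0].delta.content ?? "";
}
// text === "Hello world"
```

In practice you would feed network chunks into a buffer and only parse complete `\n\n`-terminated frames, since a TCP read can end mid-frame.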

Tool-call deltas

When the model emits tool calls, deltas arrive with a tool_calls array on each chunk. Example stream for a turn that calls two tools then stops:

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":null}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_abc123","type":"function","function":{"name":"search_messages","arguments":""}}]}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"mailbox_id\":\""}}]}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"8f4abc...\"}"}}]}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"tool_calls":[{"index":1,"id":"call_def456","type":"function","function":{"name":"fetch_message","arguments":"{\"mailbox_id\":\"8f4\",\"uid\":4211}"}}]}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}

data: [DONE]

Required invariants

  1. The first delta for a new tool call carries id, type:"function", and function.name. Subsequent deltas for the same call fill function.arguments incrementally and may omit id / name.
  2. Each tool call has a stable index across all of its deltas. Clients key off index, not id (the AI SDK does this too).
  3. The final chunk before [DONE] has delta: {} and finish_reason: "tool_calls".
  4. Text content and tool-call deltas can interleave in the same stream — Claude sometimes emits "Let me check..." text before a tool_use block. Handle both as they arrive.
  5. No provider-native event names (tool_use, content_block_delta, functionCall) leak to clients. All shapes are normalized to OpenAI deltas.
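The invariants above imply a simple client-side accumulator: key each tool call by its `index`, capture `id` and `function.name` from the first delta, and concatenate `function.arguments` across the rest. A sketch in TypeScript (types and names are illustrative, not an official SDK):

```typescript
// Accumulate tool-call deltas into complete calls, keyed by `index`
// (invariant 2). The first delta per index carries id and name (invariant 1).
type ToolCallDelta = {
  index: number;
  id?: string;
  type?: "function";
  function?: { name?: string; arguments?: string };
};

type ToolCall = { id: string; name: string; arguments: string };

function accumulate(deltas: ToolCallDelta[]): ToolCall[] {
  const calls = new Map<number, ToolCall>();
  for (const d of deltas) {
    let call = calls.get(d.index);
    if (!call) {
      call = { id: d.id ?? "", name: d.function?.name ?? "", arguments: "" };
      calls.set(d.index, call);
    }
    call.arguments += d.function?.arguments ?? "";
  }
  return Array.from(calls.values());
}

// Deltas mirroring the example stream above (mailbox id shortened).
const calls = accumulate([
  { index: 0, id: "call_abc123", type: "function", function: { name: "search_messages", arguments: "" } },
  { index: 0, function: { arguments: '{"mailbox_id":"' } },
  { index: 0, function: { arguments: '8f4"}' } },
  { index: 1, id: "call_def456", type: "function", function: { name: "fetch_message", arguments: '{"mailbox_id":"8f4","uid":4211}' } },
]);
// calls[0].arguments parses to {"mailbox_id":"8f4"}
```

Only attempt `JSON.parse` on `arguments` after the `finish_reason: "tool_calls"` chunk arrives — until then the string may be a JSON prefix.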

Streaming argument shape per provider

| Provider | How `function.arguments` streams |
| --- | --- |
| Anthropic Claude | Character-by-character (`input_json_delta`). |
| OpenAI | Character-by-character. |
| Google Gemini | One complete JSON object per tool call (Gemini doesn't chunk args). |
| xAI Grok | Character-by-character. |
| DeepSeek / Mistral / Kimi / Llama / MiniMax | Character-by-character via OpenRouter / native endpoints. |

Finish reasons

| finish_reason | Meaning |
| --- | --- |
| stop | Natural end of turn. |
| tool_calls | Assistant wants one or more tools invoked. |
| length | Hit the max_tokens budget. |
| content_filter | Guardrails or upstream safety classifier blocked the response. |

Errors mid-stream

If the upstream provider errors after the stream has started, an error frame is emitted followed by [DONE]:

data: {"error":{"type":"server_error","code":"tool_provider_error","message":"Anthropic returned 529 overloaded"}}

data: [DONE]

See Errors for the full code list.
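Clients should therefore check each parsed frame for an `error` key before treating it as a chunk. A minimal guard, assuming the error shape shown above (the helper name is hypothetical):

```typescript
// Detect a mid-stream error frame. Any partial text or tool-call deltas
// received before it are incomplete and should be discarded.
type ErrorFrame = { error: { type: string; code: string; message: string } };

function isErrorFrame(frame: unknown): frame is ErrorFrame {
  return typeof frame === "object" && frame !== null && "error" in frame;
}

const payload: unknown = JSON.parse(
  '{"error":{"type":"server_error","code":"tool_provider_error","message":"Anthropic returned 529 overloaded"}}'
);
if (isErrorFrame(payload)) {
  // Abort the in-progress turn; a [DONE] frame still follows.
  console.error(`${payload.error.code}: ${payload.error.message}`);
}
```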

Fallback behavior

Failover applies only before any delta has been emitted. See Provider compatibility → Fallback with tool use for the full policy.

See also

  • Tool Use — defining tools, tool_choice, and the round-trip contract.
  • Caching with tools — why tool responses bypass the response cache.
