Concurred API
Gateway API

Streaming

SSE event grammar for Gateway responses, including tool-call deltas

Set "stream": true to receive responses as Server-Sent Events. The frame grammar matches OpenAI's exactly, so the Vercel AI SDK and the OpenAI SDKs parse it out of the box.

Frame grammar

  • Every frame is data: <json>\n\n.
  • End of stream is data: [DONE]\n\n.
  • Each JSON object is a partial chat.completion.chunk.

Example text stream:

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
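The frame grammar above can be consumed with a small parser — split on blank lines, strip the `data: ` prefix, and treat `[DONE]` as the terminator. This is a sketch in TypeScript; the function and type names are illustrative, not part of any Gateway SDK:

```typescript
// Minimal SSE frame parser for Gateway streams. Each frame is
// `data: <json>\n\n`; the stream ends with `data: [DONE]\n\n`.
type ChunkDelta = { role?: string; content?: string | null };
type Chunk = {
  id: string;
  object: string;
  choices: { index: number; delta: ChunkDelta; finish_reason?: string }[];
};

function parseFrames(raw: string): (Chunk | "[DONE]")[] {
  const out: (Chunk | "[DONE]")[] = [];
  for (const block of raw.split("\n\n")) {
    const line = block.trim();
    if (!line.startsWith("data: ")) continue; // skip keep-alives / blanks
    const payload = line.slice("data: ".length);
    out.push(payload === "[DONE]" ? "[DONE]" : (JSON.parse(payload) as Chunk));
  }
  return out;
}

// Accumulate streamed text from the example stream above.
const frames = parseFrames(
  'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"}}]}\n\n' +
  'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"}}]}\n\n' +
  "data: [DONE]\n\n"
);
let text = "";
for (const f of frames) {
  if (f === "[DONE]") break;
  text += f.choices[0].delta.content ?? "";
}
// text === "Hello world"
```

In practice you would feed network chunks into a buffer and only parse complete `\n\n`-terminated frames, since a TCP read can end mid-frame.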

Tool-call deltas

When the model emits tool calls, deltas arrive with a tool_calls array on each chunk. Example stream for a turn that calls two tools then stops:

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":null}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_abc123","type":"function","function":{"name":"search_messages","arguments":""}}]}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"mailbox_id\":\""}}]}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"8f4abc...\"}"}}]}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"tool_calls":[{"index":1,"id":"call_def456","type":"function","function":{"name":"fetch_message","arguments":"{\"mailbox_id\":\"8f4\",\"uid\":4211}"}}]}}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}

data: [DONE]

Required invariants

  1. The first delta for a new tool call carries id, type:"function", and function.name. Subsequent deltas for the same call fill function.arguments incrementally and may omit id / name.
  2. Each tool call has a stable index across all of its deltas. Clients key off index, not id (the AI SDK does this too).
  3. The final chunk before [DONE] has delta: {} and finish_reason: "tool_calls".
  4. Text content and tool-call deltas can interleave in the same stream — Claude sometimes emits "Let me check..." text before a tool_use block. Handle both as they arrive.
  5. No provider-native event names (tool_use, content_block_delta, functionCall) leak to clients. All shapes are normalized to OpenAI deltas.
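The invariants above imply a simple client-side accumulator: key each tool call by its `index`, capture `id` and `function.name` from the first delta, and concatenate `function.arguments` across the rest. A sketch in TypeScript (types and names are illustrative, not an official SDK):

```typescript
// Accumulate tool-call deltas into complete calls, keyed by `index`
// (invariant 2). The first delta per index carries id and name (invariant 1).
type ToolCallDelta = {
  index: number;
  id?: string;
  type?: "function";
  function?: { name?: string; arguments?: string };
};

type ToolCall = { id: string; name: string; arguments: string };

function accumulate(deltas: ToolCallDelta[]): ToolCall[] {
  const calls = new Map<number, ToolCall>();
  for (const d of deltas) {
    let call = calls.get(d.index);
    if (!call) {
      call = { id: d.id ?? "", name: d.function?.name ?? "", arguments: "" };
      calls.set(d.index, call);
    }
    call.arguments += d.function?.arguments ?? "";
  }
  return Array.from(calls.values());
}

// Deltas mirroring the example stream above (mailbox id shortened).
const calls = accumulate([
  { index: 0, id: "call_abc123", type: "function", function: { name: "search_messages", arguments: "" } },
  { index: 0, function: { arguments: '{"mailbox_id":"' } },
  { index: 0, function: { arguments: '8f4"}' } },
  { index: 1, id: "call_def456", type: "function", function: { name: "fetch_message", arguments: '{"mailbox_id":"8f4","uid":4211}' } },
]);
// calls[0].arguments parses to {"mailbox_id":"8f4"}
```

Only attempt `JSON.parse` on `arguments` after the `finish_reason: "tool_calls"` chunk arrives — until then the string may be a JSON prefix.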

Streaming argument shape per provider

| Provider | How `function.arguments` streams |
| --- | --- |
| Anthropic Claude | Character-by-character (`input_json_delta`). |
| OpenAI | Character-by-character. |
| Google Gemini | One complete JSON object per tool call (Gemini doesn't chunk args). |
| xAI Grok | Character-by-character. |
| DeepSeek / Mistral / Kimi / Llama / MiniMax | Character-by-character via OpenRouter / native endpoints. |

Finish reasons

| finish_reason | Meaning |
| --- | --- |
| stop | Natural end of turn. |
| tool_calls | Assistant wants one or more tools invoked. |
| length | Hit the max_tokens budget. |
| content_filter | Guardrails or upstream safety classifier blocked the response. |

Errors mid-stream

If the upstream provider errors after the stream has started, an error frame is emitted followed by [DONE]:

data: {"error":{"type":"server_error","code":"tool_provider_error","message":"Anthropic returned 529 overloaded"}}

data: [DONE]

See Errors for the full code list.
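Clients should therefore check each parsed frame for an `error` key before treating it as a chunk. A minimal guard, assuming the error shape shown above (the helper name is hypothetical):

```typescript
// Detect a mid-stream error frame. Any partial text or tool-call deltas
// received before it are incomplete and should be discarded.
type ErrorFrame = { error: { type: string; code: string; message: string } };

function isErrorFrame(frame: unknown): frame is ErrorFrame {
  return typeof frame === "object" && frame !== null && "error" in frame;
}

const payload: unknown = JSON.parse(
  '{"error":{"type":"server_error","code":"tool_provider_error","message":"Anthropic returned 529 overloaded"}}'
);
if (isErrorFrame(payload)) {
  // Abort the in-progress turn; a [DONE] frame still follows.
  console.error(`${payload.error.code}: ${payload.error.message}`);
}
```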

Fallback behavior

Failover applies only before any delta has been emitted. See Provider compatibility → Fallback with tool use for the full policy.

See also

  • Tool Use — defining tools, tool_choice, and the round-trip contract.
  • Caching with tools — why tool responses bypass the response cache.
