diff --git a/.agent/self-learning/.gitignore b/.agent/self-learning/.gitignore
new file mode 100644
index 000000000..5b7b12729
--- /dev/null
+++ b/.agent/self-learning/.gitignore
@@ -0,0 +1,2 @@
+fallback-counts.json
+*.bak
diff --git a/.agent/self-learning/INDEX.md b/.agent/self-learning/INDEX.md
new file mode 100644
index 000000000..87ee14155
--- /dev/null
+++ b/.agent/self-learning/INDEX.md
@@ -0,0 +1,10 @@
+# Lessons Index
+
+Read this index every turn. Each entry below is a routing condition.
+If a `Use when ...` condition matches the current task, read the full lesson file.
+
+
+
+
+- [build-before-running-examples](lessons/2026-05-14-build-before-running-examples.md) — Use when starting any tanstack/ai example dev server — build workspace packages first
+
diff --git a/.agent/self-learning/config.yml b/.agent/self-learning/config.yml
new file mode 100644
index 000000000..b9e1e5e96
--- /dev/null
+++ b/.agent/self-learning/config.yml
@@ -0,0 +1,15 @@
+# Self-improve plugin behavior knobs. Edit and commit per repo.
+correction_detection:
+  enabled: true
+  regex_strictness: loose # loose | strict
+coupling_detection:
+  enabled: true
+  regex_strictness: loose
+enforcement:
+  pre_push_block: true # false = warn only, do not block push
+curation:
+  default_interval_days: 30
+promotion:
+  auto_suggest_global: true
+  skill_improve_threshold: 3
+skills_repo: ~/.claude/skills
diff --git a/.agent/self-learning/coupling.json b/.agent/self-learning/coupling.json
new file mode 100644
index 000000000..1a7826a75
--- /dev/null
+++ b/.agent/self-learning/coupling.json
@@ -0,0 +1,4 @@
+{
+  "$schema": "./coupling.schema.json",
+  "couplings": []
+}
diff --git a/.agent/self-learning/curation-state.yml b/.agent/self-learning/curation-state.yml
new file mode 100644
index 000000000..1f8bb0ab5
--- /dev/null
+++ b/.agent/self-learning/curation-state.yml
@@ -0,0 +1,3 @@
+last_curated: 2026-05-14
+next_nag: 2026-06-13
+default_interval_days: 30
diff --git a/.agent/self-learning/lessons/2026-05-14-build-before-running-examples.md b/.agent/self-learning/lessons/2026-05-14-build-before-running-examples.md
new file mode 100644
index 000000000..62bfaaa06
--- /dev/null
+++ b/.agent/self-learning/lessons/2026-05-14-build-before-running-examples.md
@@ -0,0 +1,19 @@
+---
+name: build-before-running-examples
+description: Use when starting any tanstack/ai example dev server — build workspace packages first
+tags: [monorepo, examples, dev-workflow, build]
+scope: repo
+source:
+  type: auto-captured
+  created: 2026-05-14T13:05:00Z
+related_skill: null
+related: []
+---
+
+# Build Workspace Packages Before Running Examples
+
+**Rule:** Run `pnpm -w run build:all` from the repo root before starting any example dev server (`examples/ts-react-chat`, `ts-solid-chat`, `ts-vue-chat`, `ts-svelte-chat`, `vanilla-chat`, `php-slim`, `python-fastapi`, `ts-group-chat`).
+
+**Why:** "this was a mistake by you, you should always build packages inside of this repo before you run the examples" — examples import workspace packages (`@tanstack/ai`, `@tanstack/react-ai-devtools`, `@tanstack/ai-devtools-core`, etc.) via `workspace:*` and resolve through each package's `exports` field pointing at `dist/`. If `dist/` is missing for any package — including transitive ones — vite's dep-scan fails and SSR returns a 500. Fixing the first missing package one at a time wastes round-trips: I tried `pnpm --filter @tanstack/react-ai-devtools build`, hit a missing `@tanstack/ai-devtools-core`, etc. The cure is one command up front.
+
+**How to apply:** Before any `pnpm --filter "<example>" dev` (or running an example via its own directory), run `pnpm -w run build:all` from the worktree root. Nx caches the build so re-runs are cheap. Skip only if the user has just explicitly said the workspace is freshly built.
diff --git a/.agent/self-learning/lessons/promoted/.gitkeep b/.agent/self-learning/lessons/promoted/.gitkeep
new file mode 100644
index 000000000..e69de29bb
diff --git a/.changeset/decouple-openrouter-collapse-openai-base.md b/.changeset/decouple-openrouter-collapse-openai-base.md
deleted file mode 100644
index 3dab48e95..000000000
--- a/.changeset/decouple-openrouter-collapse-openai-base.md
+++ /dev/null
@@ -1,35 +0,0 @@
----
-'@tanstack/openai-base': minor
-'@tanstack/ai-openai': patch
-'@tanstack/ai-grok': patch
-'@tanstack/ai-groq': patch
-'@tanstack/ai-openrouter': patch
----
-
-Decouple `@tanstack/ai-openrouter` from the shared OpenAI base, and collapse the base into a thinner shim over the `openai` SDK.
-
-Three changes that ship together:
-
-**1. Rename `@tanstack/ai-openai-compatible` → `@tanstack/openai-base`.** The previous name implied a multi-vendor protocol surface. After ai-openrouter is decoupled (see below), the only remaining consumers (`ai-openai`, `ai-grok`, `ai-groq`) all back onto the `openai` SDK with a different `baseURL` — "base" describes that role accurately. Imports change:
-
-```diff
-- import { OpenAICompatibleChatCompletionsTextAdapter } from '@tanstack/ai-openai-compatible'
-+ import { OpenAIBaseChatCompletionsTextAdapter } from '@tanstack/openai-base'
-- import { OpenAICompatibleResponsesTextAdapter } from '@tanstack/ai-openai-compatible'
-+ import { OpenAIBaseResponsesTextAdapter } from '@tanstack/openai-base'
-```
-
-`@tanstack/ai-openai-compatible@0.2.x` remains published for anyone with a pinned lockfile reference but will receive no further updates.
-
-**2. `@tanstack/openai-base` adopts the `openai` SDK directly.** The previous package vendored ~720 LOC of hand-written wire-format types (`ChatCompletion`, `ResponseStreamEvent`, etc.) and exposed abstract `callChatCompletion*` / `callResponse*` hooks subclasses had to implement. Both are gone:
-
-- The base now depends on `openai` again and imports types directly from `openai/resources/...`. The vendored `src/types/` directory is removed; consumers that imported wire types from the package (e.g. `import type { ResponseInput } from '@tanstack/ai-openai-compatible'`) should now import from the openai SDK.
-- The abstract SDK-call methods are removed. The base constructor takes a pre-built `OpenAI` client (`new OpenAIBaseChatCompletionsTextAdapter(model, name, openaiClient)`) and calls `client.chat.completions.create` / `client.responses.create` itself. Subclasses (`ai-openai`, `ai-grok`, `ai-groq`) now just construct the SDK with their provider-specific `baseURL` and pass it to `super` — `callChatCompletion*` / `callResponse*` overrides go away.
-
-The other extension hooks (`extractReasoning`, `extractTextFromResponse`, `processStreamChunks`, `makeStructuredOutputCompatible`, `transformStructuredOutput`, `mapOptionsToRequest`, `convertMessage`) remain. Groq's `processStreamChunks` and `makeStructuredOutputCompatible` overrides (for `x_groq.usage` promotion and Groq's structured-output schema quirks) are unchanged.
-
-**3. Decouple `@tanstack/ai-openrouter` from the OpenAI base entirely.** OpenRouter ships its own SDK (`@openrouter/sdk`) with a camelCase shape, so inheriting from the OpenAI-shaped base forced a snake_case ↔ camelCase round-trip on every request and stream event. ai-openrouter now extends `BaseTextAdapter` directly and inlines its own stream processors (`OpenRouterTextAdapter` for chat-completions, `OpenRouterResponsesTextAdapter` for the Responses beta), reading OpenRouter's camelCase types natively. The `@tanstack/openai-base` and `openai` dependencies are removed from ai-openrouter; only `@openrouter/sdk`, `@tanstack/ai`, and `@tanstack/ai-utils` remain.
-
-Public API is unchanged: `openRouterText`, `openRouterResponsesText`, `createOpenRouterText`, `createOpenRouterResponsesText`, the OpenRouter tool factories, provider routing surface (`provider`, `models`, `plugins`, `variant`, `transforms`), app attribution headers (`httpReferer`, `appTitle`), `:variant` model suffixing, `RequestAbortedError` propagation, and the OpenRouter-specific structured-output null-preservation all behave the same. The ~300 LOC of inbound/outbound shape converters (`toOpenRouterRequest`, `toChatCompletion`, `adaptOpenRouterStreamChunks`, `toSnakeResponseResult`, …) are gone.
-
-`ai-ollama` remains on `BaseTextAdapter` directly — its native API uses a different wire format from Chat Completions and was never on the shared base.
diff --git a/.changeset/openrouter-narrow-stream-chunk-types.md b/.changeset/openrouter-narrow-stream-chunk-types.md
deleted file mode 100644
index 2d2e2fc2b..000000000
--- a/.changeset/openrouter-narrow-stream-chunk-types.md
+++ /dev/null
@@ -1,5 +0,0 @@
----
-'@tanstack/ai-openrouter': patch
----
-
-Internal: drop the remaining duck-typed `as { ... }` casts on stream chunks in `OpenRouterResponsesTextAdapter`. Five sites (`response.created/in_progress/incomplete/failed` model + error capture, `response.content_part.added/done` payload, and the `response.completed` function-call detection) now narrow via the SDK's discriminated unions directly. Behaviourally identical; reduces the chance of an SDK type rename silently slipping past us.
diff --git a/.changeset/streaming-structured-output.md b/.changeset/streaming-structured-output.md
new file mode 100644
index 000000000..d5a3d65e8
--- /dev/null
+++ b/.changeset/streaming-structured-output.md
@@ -0,0 +1,95 @@
+---
+'@tanstack/ai': minor
+'@tanstack/openai-base': minor
+'@tanstack/ai-openai': minor
+'@tanstack/ai-grok': minor
+'@tanstack/ai-groq': minor
+'@tanstack/ai-openrouter': minor
+'@tanstack/ai-react': minor
+'@tanstack/ai-vue': minor
+'@tanstack/ai-solid': minor
+'@tanstack/ai-svelte': minor
+'@tanstack/ai-anthropic': patch
+'@tanstack/ai-gemini': patch
+'@tanstack/ai-ollama': patch
+---
+
+Streaming structured output across the OpenAI-compatible providers, an OpenAI Chat Completions sibling adapter, a summarize-subsystem unification, and the decoupling of `@tanstack/ai-openrouter` from the shared OpenAI base.
+
+## Core — `@tanstack/ai`
+
+- New `chat({ outputSchema, stream: true })` overload returning `StructuredOutputStream<InferSchemaType<TSchema>>`. The stream yields raw JSON deltas via `TEXT_MESSAGE_CONTENT` plus a terminal `CUSTOM` `structured-output.complete` event whose `value.object` is typed against the caller's schema with no helper or cast required.
+- `StructuredOutputStream` is a discriminated union over three tagged `CUSTOM` variants — `structured-output.complete`, `approval-requested`, and `tool-input-available` (new `ApprovalRequestedEvent` / `ToolInputAvailableEvent` interfaces exported from `@tanstack/ai`). Narrowing on `chunk.type === 'CUSTOM'` plus the `chunk.name` literal resolves `chunk.value` to the exact shape per variant. The bare `CustomEvent` (with `value: any`) is deliberately excluded to keep the narrow from collapsing to `any`; user-emitted events via the `emitCustomEvent` context API still flow at runtime and are documented as a small residual gap.
+- Activity-layer hardening: always-finalise after the stream loop (no silent hangs on a missing `finishReason`), typed `RUN_ERROR` on empty content, mid-stream provider errors terminate cleanly, and schema-validation failures carry `runId / model / timestamp`.
+- `fallbackStructuredOutputStream` in the activity layer is the single source of truth for adapters that don't implement `structuredOutputStream` natively; `BaseTextAdapter` no longer ships a default.
+- `ChatStreamSummarizeAdapter.summarizeStream` accumulates summary text and emits a terminal `CUSTOM` `generation:result` event before the final `RUN_FINISHED`. Fixes `useSummarize` never populating `result` over streaming connections (the client only sets `result` on that specific CUSTOM event).
+- `SummarizationOptions` is now generic in `TProviderOptions` and `modelOptions` is plumbed through end-to-end (previously silently dropped by `runSummarize` / `runStreamingSummarize`).
+
+## Framework hooks — `@tanstack/ai-react`, `@tanstack/ai-vue`, `@tanstack/ai-solid`, `@tanstack/ai-svelte`
+
+`useChat` (React/Vue/Solid) and `createChat` (Svelte) now accept an `outputSchema` option mirroring `chat({ outputSchema })` on the server. When supplied, the hook's return adds two managed reactive fields:
+
+- `partial` — the live progressive object, typed `DeepPartial<InferSchemaType<TSchema>>`. Updated from `TEXT_MESSAGE_CONTENT` deltas via `parsePartialJSON`. Resets on every new run.
+- `final` — the validated terminal payload from the `structured-output.complete` event, typed `InferSchemaType<TSchema> | null`. `null` until the run completes.
+
+Both fields are typed against the schema with no helper or cast — each hook is generic on `TSchema` and conditionally adds the fields to the return type. Without `outputSchema`, the return type is unchanged. Works the same for streaming and non-streaming endpoints — for non-streaming, `partial` stays `{}` and `final` snaps when the single terminal event arrives. Reasoning text and tool calls aren't surfaced as separate hook fields — they're already on `messages[…].parts` (as `ThinkingPart`, `ToolCallPart`, `ToolResultPart`), same as in a normal chat. When `outputSchema` is set, the assistant's `TextPart` contains the raw JSON the model produced; filter `text` parts out of your message renderer and let the structured view (driven by `partial` / `final`) replace it. A sketch of the React shape follows.
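+
+A minimal sketch (the schema and endpoint are illustrative; the `useChat` / `fetchServerSentEvents` usage matches the docs updated in this PR):
+
+```tsx
+import { useChat, fetchServerSentEvents } from "@tanstack/ai-react";
+import { z } from "zod";
+
+const PersonSchema = z.object({ name: z.string(), age: z.number() });
+
+function Extractor() {
+  // `partial` is DeepPartial<InferSchemaType<typeof PersonSchema>>;
+  // `final` is InferSchemaType<typeof PersonSchema> | null.
+  const { sendMessage, partial, final } = useChat({
+    connection: fetchServerSentEvents("/api/extract"),
+    outputSchema: PersonSchema,
+  });
+
+  return <pre>{final ? JSON.stringify(final) : (partial.name ?? "…")}</pre>;
+}
+```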
+
+Reactivity primitive per framework:
+
+| Framework | `partial` type | `final` type |
+| ------------------------------ | ------------------------------------------------------- | ------------------------------------------------ |
+| React (`@tanstack/ai-react`) | `DeepPartial<T>` (plain state) | `T \| null` (plain state) |
+| Vue (`@tanstack/ai-vue`) | `Readonly<ShallowRef<DeepPartial<T>>>` | `Readonly<ShallowRef<T \| null>>` |
+| Solid (`@tanstack/ai-solid`) | `Accessor<DeepPartial<T>>` | `Accessor<T \| null>` |
+| Svelte (`@tanstack/ai-svelte`) | `readonly partial: DeepPartial<T>` (rune-backed getter) | `readonly final: T \| null` (rune-backed getter) |
+
+`DeepPartial` is exported from each framework package for callers who want to annotate handlers explicitly.
+
+## Base — `@tanstack/openai-base`
+
+- Package renamed from `@tanstack/ai-openai-compatible` (which remains published for pinned lockfiles but receives no further updates). Imports change:
+
+  ```diff
+  - import { OpenAICompatibleChatCompletionsTextAdapter } from '@tanstack/ai-openai-compatible'
+  + import { OpenAIBaseChatCompletionsTextAdapter } from '@tanstack/openai-base'
+  - import { OpenAICompatibleResponsesTextAdapter } from '@tanstack/ai-openai-compatible'
+  + import { OpenAIBaseResponsesTextAdapter } from '@tanstack/openai-base'
+  ```
+
+- Centralised `structuredOutputStream` on both bases. Chat Completions uses `response_format: { type: 'json_schema', strict: true }` + `stream: true`; Responses uses `text.format: { type: 'json_schema', strict: true }` + `stream: true`. Subclasses (`ai-openai`, `ai-grok`, `ai-groq`) inherit it; OpenRouter implements its own (see below).
+- Base now adopts the `openai` SDK directly and imports types from `openai/resources/...`. The previously-vendored ~720 LOC of wire-format types (`ChatCompletion`, `ResponseStreamEvent`, etc.) are removed; consumers that imported wire types from the package should import them from the openai SDK instead. The abstract `callChatCompletion*` / `callResponse*` hooks are gone — the base constructor now takes a pre-built `OpenAI` client (`new OpenAIBaseChatCompletionsTextAdapter(model, name, openaiClient)`) and calls `client.chat.completions.create` / `client.responses.create` itself.
+- New protected `isAbortError(error)` hook duck-types abort detection so `RUN_ERROR { code: 'aborted' }` is emitted consistently across SDK error types — subclasses with proprietary error classes (e.g. `@openrouter/sdk`'s `RequestAbortedError`) override it.
+- Per-chunk `logger.provider(...)` debug logging now fires inside `structuredOutputStream` loops, matching the existing pattern in `chatStream` for end-to-end introspection in debug mode.
+
+The other extension hooks (`extractReasoning`, `extractTextFromResponse`, `processStreamChunks`, `makeStructuredOutputCompatible`, `transformStructuredOutput`, `mapOptionsToRequest`, `convertMessage`) remain. Groq's `processStreamChunks` and `makeStructuredOutputCompatible` overrides (for `x_groq.usage` promotion and Groq's structured-output schema quirks) are unchanged.
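+
+A sketch of the subclass shape the new constructor enables — the class name, adapter name, and `baseURL` here are hypothetical:
+
+```typescript
+import OpenAI from 'openai'
+import { OpenAIBaseChatCompletionsTextAdapter } from '@tanstack/openai-base'
+
+// A provider adapter now just builds the SDK client against its own baseURL
+// and hands it to the base — no callChatCompletion* override to implement.
+class ExampleProviderTextAdapter extends OpenAIBaseChatCompletionsTextAdapter {
+  constructor(model: string, apiKey: string) {
+    super(
+      model,
+      'example-provider',
+      new OpenAI({ apiKey, baseURL: 'https://api.example-provider.com/v1' }),
+    )
+  }
+}
+```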
+
+## Provider adapters
+
+| Adapter | API | Reasoning surface |
+| ----------------------------------------------------------- | ---------------- | ------------------------------------------------------------------------------------------------------------------ |
+| `@tanstack/ai-openai` `openaiText` | Responses | `response.reasoning_text.delta` + `response.reasoning_summary_text.delta` (requires `reasoning.summary: 'auto'`) |
+| `@tanstack/ai-openai` `openaiChatCompletions` (new) | Chat Completions | none — the model still reasons, but Chat Completions has no `reasoning.summary` opt-in to surface it |
+| `@tanstack/ai-grok` `grokText` | Chat Completions | `delta.reasoning_content` (DeepSeek convention; not typed by the OpenAI SDK) |
+| `@tanstack/ai-groq` `groqText` | Chat Completions | `delta.reasoning` (requires `reasoning_format: 'parsed'`; not typed by groq-sdk) |
+| `@tanstack/ai-openrouter` `openRouterText` | Chat Completions | `delta.reasoningDetails` (camelCase) |
+| `@tanstack/ai-openrouter` `openRouterResponsesText` (beta) | Responses (beta) | `response.reasoning_text.delta` + `response.reasoning_summary_text.delta` via `normalizeStreamEvent` |
+
+All six emit the contractual `REASONING_*` lifecycle (`REASONING_START` → `REASONING_MESSAGE_START` → `REASONING_MESSAGE_CONTENT` deltas → `REASONING_MESSAGE_END` → `REASONING_END`) and close it before `TEXT_MESSAGE_START`. Accumulated reasoning is also surfaced on `structured-output.complete.value.reasoning` for consumers that only subscribe to the terminal event. The OpenRouter SDK's proprietary `RequestAbortedError` is mapped (alongside DOM `AbortError`) to `code: 'aborted'` in the two openrouter adapters.
+
+`@tanstack/ai-openai` also exports a new `OpenAIChatCompletionsTextAdapter` / `openaiChatCompletions` / `createOpenaiChatCompletions` factory — a sibling to the existing Responses adapter for callers who want the older `/v1/chat/completions` wire format against the OpenAI SDK.
+
+## Decouple `@tanstack/ai-openrouter` from the OpenAI base
+
+OpenRouter ships its own SDK (`@openrouter/sdk`) with a camelCase shape, so inheriting from the OpenAI-shaped base forced a snake_case ↔ camelCase round-trip on every request and stream event. ai-openrouter now extends `BaseTextAdapter` directly and inlines its own stream processors (`OpenRouterTextAdapter` for chat-completions, `OpenRouterResponsesTextAdapter` for the Responses beta), reading OpenRouter's camelCase types natively. The `@tanstack/openai-base` and `openai` dependencies are removed from ai-openrouter; only `@openrouter/sdk`, `@tanstack/ai`, and `@tanstack/ai-utils` remain. The ~300 LOC of inbound/outbound shape converters (`toOpenRouterRequest`, `toChatCompletion`, `adaptOpenRouterStreamChunks`, `toSnakeResponseResult`, …) are gone. Internal: duck-typed `as { ... }` casts on stream chunks in `OpenRouterResponsesTextAdapter` are replaced with direct narrowing via the SDK's discriminated unions.
+
+Public OpenRouter API is unchanged: `openRouterText`, `openRouterResponsesText`, `createOpenRouterText`, `createOpenRouterResponsesText`, the OpenRouter tool factories, the provider routing surface (`provider`, `models`, `plugins`, `variant`, `transforms`), app attribution headers (`httpReferer`, `appTitle`), `:variant` model suffixing, `RequestAbortedError` propagation, and the OpenRouter-specific structured-output null-preservation all behave the same.
+
+`ai-ollama` remains on `BaseTextAdapter` directly — its native API uses a different wire format from Chat Completions and was never on the shared base.
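+
+A sketch of what "unchanged" means at a call site (the model id is illustrative):
+
+```typescript
+import { chat } from '@tanstack/ai'
+import { openRouterText } from '@tanstack/ai-openrouter'
+
+// Exactly the call that worked before the decoupling — the adapter now reads
+// @openrouter/sdk's camelCase stream natively instead of converting shapes.
+const stream = chat({
+  adapter: openRouterText('anthropic/claude-sonnet-4.6'),
+  messages: [{ role: 'user', content: 'Hello!' }],
+})
+```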
+
+## Summarize subsystem
+
+Anthropic, Gemini, Ollama, and OpenRouter previously each shipped a bespoke 200–300 LOC summarize adapter. They now construct a `ChatStreamSummarizeAdapter` (formerly `ChatStreamWrapperAdapter`, renamed and exported from `@tanstack/ai/activities`) wrapping their own text adapter, matching the existing OpenAI/Grok pattern. Removes ~600 LOC of duplicated logic across the six providers and ensures behavioural parity.
+
+Bespoke `*SummarizeProviderOptions` interfaces (e.g. `OpenAISummarizeProviderOptions`, `AnthropicSummarizeProviderOptions`, `GeminiSummarizeProviderOptions`, `OllamaSummarizeProviderOptions`, `OpenRouterSummarizeProviderOptions`) are removed from the provider packages' public exports. Consumers who imported them should switch to inferring the type from the adapter (`InferTextProviderOptions`) or remove the explicit annotation (it'll be inferred from the adapter argument).
+
+`SummarizeAdapter` interface methods are now generic in `TProviderOptions`. `summarize` and `summarizeStream` previously took a defaulted `SummarizationOptions`, so `modelOptions` was effectively `Record<string, any>` regardless of the adapter's typed shape. They now take `SummarizationOptions<TProviderOptions>`. Source-compatible for callers that didn't specify the generic; type-tighter for implementers and downstream consumers. `SummarizationOptions`, `SummarizeAdapter`, `BaseSummarizeAdapter`, and `ChatStreamSummarizeAdapter` previously had a mixed `Record<string, any>` / `Record<string, unknown>` / `object` set of defaults for `TProviderOptions`; they now uniformly default to `Record<string, unknown>`.
diff --git a/.changeset/summarize-unify-on-chat-stream-wrapper.md b/.changeset/summarize-unify-on-chat-stream-wrapper.md
deleted file mode 100644
index 1e1607888..000000000
--- a/.changeset/summarize-unify-on-chat-stream-wrapper.md
+++ /dev/null
@@ -1,23 +0,0 @@
----
-'@tanstack/ai': patch
-'@tanstack/ai-anthropic': patch
-'@tanstack/ai-gemini': patch
-'@tanstack/ai-grok': patch
-'@tanstack/ai-ollama': patch
-'@tanstack/ai-openai': patch
-'@tanstack/ai-openrouter': patch
----
-
-Unify the summarize subsystem on a shared chat-stream wrapper, plumb `modelOptions` through end-to-end, and tighten the `TProviderOptions` generic.
-
-**Provider summarize adapters now share one implementation.** Anthropic, Gemini, Ollama, and OpenRouter previously each shipped a bespoke 200–300 LOC summarize adapter that re-implemented streaming, error handling, usage accounting, and chunk assembly on top of their text adapter. They now construct a `ChatStreamSummarizeAdapter` (formerly `ChatStreamWrapperAdapter`, renamed and exported from `@tanstack/ai/activities`) wrapping their own text adapter, matching the existing OpenAI/Grok pattern. Removes ~600 LOC of duplicated logic across the six providers and ensures behavioural parity.
-
-**`SummarizationOptions.modelOptions` now reaches the wire.** Previously the activity layer (`runSummarize` / `runStreamingSummarize`) silently dropped `modelOptions` when building the internal `SummarizationOptions` it forwarded to the adapter. Provider-specific knobs (Anthropic cache control, OpenRouter plugins, Gemini safety settings, Groq tuning params, …) now flow through correctly.
-
-**Provider summarize types resolve from the wrapped text adapter.** Each provider previously shipped a bespoke `XSummarizeProviderOptions` interface (a partial copy of its text provider options).
Those interfaces are removed; summarize provider options are now inferred from the text adapter's `~types` via the new `InferTextProviderOptions` helper exported from `@tanstack/ai/activities`. IntelliSense for `modelOptions` on `summarize({ adapter: openai('gpt-4o'), … })` now matches what `chat({ adapter: openai('gpt-4o'), … })` would show.
-
-**`SummarizeAdapter` interface methods are now generic in `TProviderOptions`.** `summarize` and `summarizeStream` previously took a defaulted `SummarizationOptions`, so `modelOptions` was effectively `Record<string, any>` regardless of the adapter's typed shape. They now take `SummarizationOptions<TProviderOptions>`, threading the class's `TProviderOptions` generic through. Source-compatible for callers that didn't specify the generic; type-tighter for implementers and downstream consumers.
-
-**Default aligned across the summarize surface.** `SummarizationOptions`, `SummarizeAdapter`, `BaseSummarizeAdapter`, and `ChatStreamSummarizeAdapter` previously had a mixed `Record<string, any>` / `Record<string, unknown>` / `object` set of defaults for `TProviderOptions`. They now uniformly default to `Record<string, unknown>` so unparameterised consumers narrow before indexed access on `modelOptions`. The `extends object` constraint is unchanged — per-model typed interfaces (e.g. `OpenAIBaseOptions & OpenAIReasoningOptions & ...`) inferred via `InferTextProviderOptions` continue to satisfy it without needing a string index signature. No public-surface signature change for callers that supply a concrete provider-options shape (every shipping adapter does).
-
-Bespoke `*SummarizeProviderOptions` interfaces (e.g. `OpenAISummarizeProviderOptions`, `AnthropicSummarizeProviderOptions`, `GeminiSummarizeProviderOptions`, `OllamaSummarizeProviderOptions`, `OpenRouterSummarizeProviderOptions`) are removed from the provider packages' public exports. Consumers who imported them should switch to inferring the type from the adapter (`InferTextProviderOptions`) or remove the explicit annotation (it'll be inferred from the adapter argument).
diff --git a/docs/adapters/openai.md b/docs/adapters/openai.md
index 122aaf520..1d042c4ac 100644
--- a/docs/adapters/openai.md
+++ b/docs/adapters/openai.md
@@ -35,6 +35,50 @@ const stream = chat({
 });
 ```
 
+## Chat Completions API
+
+`@tanstack/ai-openai` ships two text adapters that hit different OpenAI endpoints. `openaiText` (default) calls the Responses API (`/v1/responses`). `openaiChatCompletions` calls the older Chat Completions API (`/v1/chat/completions`).
+
+Pick whichever fits your wire format and feature needs:
+
+| | `openaiText` (Responses) | `openaiChatCompletions` (Chat Completions) |
+|---|---|---|
+| Endpoint | `/v1/responses` | `/v1/chat/completions` |
+| Reasoning summaries | Yes — set `modelOptions.reasoning.summary: 'auto'` to surface reasoning text via `REASONING_*` events | No — reasoning tokens are still consumed but cannot be exposed |
+| Wire-format compatibility | OpenAI-only | Matches the older de-facto industry shape (Grok, Groq, OpenRouter, many local model servers) |
+| Structured output streaming | `text.format: { type: 'json_schema', strict: true }` + `stream: true` | `response_format: { type: 'json_schema', strict: true }` + `stream: true` |
+
+Use `openaiText` when you want reasoning-summary streaming or OpenAI-specific Responses features. Use `openaiChatCompletions` when you're migrating off a Chat-Completions-style provider, want to share request-building code with other Chat-Completions adapters in your stack, or want the more battle-tested wire format.
+
+```typescript
+import { chat } from "@tanstack/ai";
+import { openaiChatCompletions } from "@tanstack/ai-openai";
+
+const stream = chat({
+  adapter: openaiChatCompletions("gpt-5.2"),
+  messages: [{ role: "user", content: "Hello!" }],
+});
+```
+
+With an explicit API key:
+
+```typescript
+import { chat } from "@tanstack/ai";
+import { createOpenaiChatCompletions } from "@tanstack/ai-openai";
+
+const adapter = createOpenaiChatCompletions("gpt-5.2", {
+  apiKey: process.env.OPENAI_API_KEY!,
+  // organization, baseURL, headers — all optional
+});
+
+const stream = chat({
+  adapter,
+  messages: [{ role: "user", content: "Hello!" }],
+});
+```
+
+Both adapters work identically with [Structured Outputs](../chat/structured-outputs) — including `stream: true` — and accept the same `modelOptions` (temperature, top_p, max_tokens, stop, …). The reasoning section below applies to `openaiText`; `openaiChatCompletions` accepts `modelOptions.reasoning.effort` but cannot stream summary text.
+
 ## Basic Usage - Custom API Key
 
 ```typescript
@@ -289,6 +333,26 @@ Creates an OpenAI chat adapter with an explicit API key.
 
 **Returns:** An OpenAI chat adapter instance.
 
+### `openaiChatCompletions(model)`
+
+Creates an OpenAI chat adapter that targets `/v1/chat/completions` instead of the Responses API. See [Chat Completions API](#chat-completions-api) for when to use this over `openaiText`.
+
+**Returns:** An OpenAI chat adapter instance using the Chat Completions wire format.
+
+### `createOpenaiChatCompletions(model, config)`
+
+Creates an OpenAI chat-completions adapter with an explicit API key.
+
+**Parameters:**
+
+- `model` - OpenAI model id (e.g. `"gpt-5.2"`, `"gpt-4o-mini"`)
+- `config.apiKey` - Your OpenAI API key
+- `config.organization?` - Organization ID (optional)
+- `config.baseURL?` - Custom base URL (optional)
+- `config.headers?` - Additional headers (optional)
+
+**Returns:** An OpenAI chat adapter instance using the Chat Completions wire format.
+
 ### `openaiSummarize(config?)`
 
 Creates an OpenAI summarization adapter using environment variables.
diff --git a/docs/chat/structured-outputs.md b/docs/chat/structured-outputs.md
index 6bcd9c7a7..134a58c24 100644
--- a/docs/chat/structured-outputs.md
+++ b/docs/chat/structured-outputs.md
@@ -83,8 +83,9 @@ The return type of `chat()` changes based on the `outputSchema` prop:
 
 | Configuration | Return Type |
 |--------------|-------------|
-| No `outputSchema` | `AsyncIterable` |
+| No `outputSchema` | `AsyncIterable<StreamChunk>` |
 | With `outputSchema` | `Promise<InferSchemaType<TSchema>>` |
+| With `outputSchema` and `stream: true` | `StructuredOutputStream<InferSchemaType<TSchema>>` |
 
 When you provide an `outputSchema`, TanStack AI automatically infers the TypeScript type from your schema:
 
@@ -181,6 +182,186 @@ console.log(company.headquarters.city);
 console.log(company.employees[0].role);
 ```
 
+## Streaming Structured Output
+
+Pass `stream: true` alongside `outputSchema` to receive incremental JSON deltas while the model is generating, plus a final validated, typed object. This is the path to take when you want a progressive UI — a streaming form, a typewriter-style preview, partial cards filling in field by field — instead of a single blocking await.
+
+You build it in two halves: a server route that runs `chat({ outputSchema, stream: true })` and pipes the result as Server-Sent Events, and a client that wires `useChat` to that endpoint and updates state as chunks arrive.
It's the same flow as regular streaming chat (see [Streaming](./streaming)) — `outputSchema + stream: true` just adds one terminal event with the validated object.
+
+### Server endpoint
+
+```typescript
+// app/api/extract-person/route.ts (or your framework's equivalent)
+import { chat, toServerSentEventsResponse } from "@tanstack/ai";
+import { openaiText } from "@tanstack/ai-openai";
+import { z } from "zod";
+
+const PersonSchema = z.object({
+  name: z.string().meta({ description: "The person's full name" }),
+  age: z.number().meta({ description: "The person's age in years" }),
+  email: z.string().email(),
+});
+
+export async function POST(request: Request) {
+  const { messages } = await request.json();
+
+  const stream = chat({
+    adapter: openaiText("gpt-5.2"),
+    messages,
+    outputSchema: PersonSchema,
+    stream: true,
+  });
+
+  return toServerSentEventsResponse(stream);
+}
+```
+
+That's the entire server side. `chat({ outputSchema, stream: true })` returns a `StructuredOutputStream<InferSchemaType<typeof PersonSchema>>` — the same kind of `AsyncIterable` that `toServerSentEventsResponse` accepts for any streaming chat endpoint. The schema travels in the request as JSON Schema, validation runs server-side after the stream completes, and the validated object is emitted as the terminal `structured-output.complete` event.
+
+### Client with `useChat`
+
+Pass the same schema to `useChat` and the hook tracks the progressive object and the validated terminal object for you — `partial` updates as JSON streams in, `final` snaps when `structured-output.complete` arrives. No external state, no `onChunk` ceremony, no `parsePartialJSON` calls:
+
+```tsx
+import { useChat, fetchServerSentEvents } from "@tanstack/ai-react";
+import { z } from "zod";
+
+const PersonSchema = z.object({
+  name: z.string(),
+  age: z.number(),
+  email: z.string().email(),
+});
+
+function PersonExtractor() {
+  const { sendMessage, isLoading, partial, final } = useChat({
+    connection: fetchServerSentEvents("/api/extract-person"),
+    outputSchema: PersonSchema,
+  });
+
+  return (
+    <form
+      onSubmit={(e) => {
+        e.preventDefault();
+        sendMessage("Extract: John Doe, 30, john@example.com");
+      }}
+    >
+      <button type="submit" disabled={isLoading}>Extract</button>
+
+      {/* `partial` fills in field by field as JSON streams in. */}
+      <div>
+        Name: {partial.name ?? "…"}
+      </div>
+      <div>
+        Age: {partial.age ?? "…"}
+      </div>
+      <div>
+        Email: {partial.email ?? "…"}
+      </div>
+      {final && <pre>Validated: {JSON.stringify(final, null, 2)}</pre>}
+    </form>
+  );
+}
+```
+
+What the hook does for you:
+
+- **`partial`** is `DeepPartial<z.infer<typeof PersonSchema>>` — every property optional, every nested array element optional. Updated from `TEXT_MESSAGE_CONTENT` deltas via `parsePartialJSON`. Resets on every new `sendMessage` / `reload`.
+- **`final`** is `z.infer<typeof PersonSchema> | null` — the validated terminal payload from the `structured-output.complete` event. `null` until the run completes successfully.
+- **`outputSchema`** is used purely for client-side **type inference**. Validation still runs on the server against the schema you pass to `chat({ outputSchema })` on the server route.
+- This same hook shape works for **non-streaming structured output too**. If your server returns a single `structured-output.complete` event (the fallback path for adapters that don't natively stream), `partial` stays `{}` and `final` populates when the event arrives — same consumer code.
+
+The `outputSchema` field is optional: if you omit it, `useChat`'s return type is unchanged, and `partial` / `final` aren't present.
+
+### Rendering reasoning and tool calls
+
+Reasoning tokens and tool calls aren't on `partial` / `final` — they're already where they'd be in a normal chat: on `messages[…].parts`. The stream processor inside `useChat` routes each chunk type to its canonical part:
+
+| Chunk type | Where it lands |
+|---|---|
+| `REASONING_MESSAGE_CONTENT` | `ThinkingPart` on the assistant message |
+| `TOOL_CALL_START` / `_ARGS` / `_END` | `ToolCallPart` on the assistant message |
+| `TOOL_CALL_RESULT` | `ToolResultPart` on the tool message |
+| `TEXT_MESSAGE_CONTENT` | `TextPart` on the assistant message (this is the raw JSON when `outputSchema` is set — see below) |
+
+So render reasoning and tool calls the same way you'd render them in a normal chat UI (`ThinkingBlock`, `ToolCallBlock`, and `StructuredView` below stand in for your own components):
+
+```tsx
+const last = messages.at(-1);
+
+return (
+  <>
+    {last?.parts.map((part, i) => {
+      if (part.type === "thinking") return <ThinkingBlock key={i} part={part} />;
+      if (part.type === "tool-call") return <ToolCallBlock key={i} part={part} />;
+      // Hide raw JSON text — the structured view below replaces it.
+      if (part.type === "text") return null;
+      return null;
+    })}
+    <StructuredView partial={partial} final={final} />
+  </>
+);
+```
+
+> **Note:** When `outputSchema` is set, the assistant's `TextPart` contains the raw JSON the model produced (e.g. `{"name":"John","age":30,…}`). That's not meant to be shown to end users — the structured view powered by `partial` / `final` replaces it. Filter `text` parts out of your message renderer in this mode, as in the snippet above.
+
+> **Going lower-level?** `useChat` still exposes `onChunk` if you want to observe individual chunks alongside the managed `partial` / `final` state (e.g. to drive a custom progress UI). The two paths compose — internal partial/final tracking always runs first, then your `onChunk` callback fires with the same chunk.
+
+`useChat` (React, Vue, Solid) and `createChat` (Svelte) all accept the same `outputSchema` option and expose `partial` / `final` with the same semantics — only the reactivity primitive differs (React state, Vue `shallowRef`, Solid `Accessor`, Svelte reactive getter). See your framework's quick-start for the local idioms.
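+
+For instance, a sketch of the Solid shape — assuming `useChat` and `fetchServerSentEvents` are exported from `@tanstack/ai-solid` the same way as in React; the schema and endpoint are illustrative:
+
+```tsx
+import { useChat, fetchServerSentEvents } from "@tanstack/ai-solid";
+import { z } from "zod";
+
+const PersonSchema = z.object({ name: z.string(), age: z.number() });
+
+function Extractor() {
+  const { partial, final } = useChat({
+    connection: fetchServerSentEvents("/api/extract-person"),
+    outputSchema: PersonSchema,
+  });
+
+  // Solid: `partial` / `final` are Accessors — call them to read.
+  return <pre>{final() ? JSON.stringify(final()) : (partial().name ?? "…")}</pre>;
+}
+```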
+
+### What the stream contains
+
+`chat({ outputSchema, stream: true })` returns a `StructuredOutputStream` — an `AsyncIterable` over the standard `StreamChunk` lifecycle plus a terminal `CUSTOM` event named `structured-output.complete`:
+
+```typescript
+{
+  type: "CUSTOM",
+  name: "structured-output.complete",
+  value: {
+    object: T;          // validated, parsed, typed
+    raw: string;        // full accumulated JSON text
+    reasoning?: string; // present only for thinking/reasoning models
+  },
+  // ...standard event fields (timestamp, model, …)
+}
+```
+
+### Adapter coverage
+
+Streaming structured output works with **every adapter**, but only some support a true single-request streaming wire format:
+
+| Adapter | Behavior with `outputSchema` + `stream: true` |
+|---------|-----------------------------------------------|
+| `@tanstack/ai-openai` | Native single-request stream (Responses API, `text.format: json_schema`) |
+| `@tanstack/ai-openrouter` | Native single-request stream (`response_format: json_schema`) |
+| `@tanstack/ai-grok` | Native single-request stream (Chat Completions, `response_format: json_schema`) |
+| `@tanstack/ai-groq` | Native single-request stream (Chat Completions, `response_format: json_schema`) |
+| Other adapters (anthropic, gemini, ollama, …) | Fallback: runs non-streaming `structuredOutput` and emits the final object as one `structured-output.complete` event |
+
+The fallback path keeps the consumer code identical across providers — you always read the final object off `structured-output.complete` — but you won't see incremental deltas unless the adapter implements `structuredOutputStream` natively.
+
+### Advanced: iterating the stream directly
+
+When you don't need the SSE-over-HTTP boundary — Node scripts, CLIs, server endpoints that respond with a final JSON object instead of a stream, or tests — you can consume `chat({ outputSchema, stream: true })` as a plain async iterable:
+
+```typescript
+import { chat } from "@tanstack/ai";
+import { openaiText } from "@tanstack/ai-openai";
+import { z } from "zod";
+
+const PersonSchema = z.object({ name: z.string(), age: z.number(), email: z.string().email() });
+
+const stream = chat({
+  adapter: openaiText("gpt-5.2"),
+  messages: [{ role: "user", content: "Extract: John Doe is 30, john@example.com" }],
+  outputSchema: PersonSchema,
+  stream: true,
+});
+
+for await (const chunk of stream) {
+  if (chunk.type === "CUSTOM" && chunk.name === "structured-output.complete") {
+    // Validated and typed against PersonSchema.
+    console.log(chunk.value.object.name);
+    console.log(chunk.value.object.age);
+  }
+}
+```
+
+This is the same `StructuredOutputStream` the server endpoint above hands to `toServerSentEventsResponse`. Pick this shape when you're a single process end-to-end; use the server-endpoint-plus-`useChat` shape when there's a network in the middle.
+
 ## Combining with Tools
 
 Structured outputs work seamlessly with the agentic tool loop. When both `outputSchema` and `tools` are provided, TanStack AI will:
@@ -228,6 +409,58 @@ console.log(recommendation.currentPrice);
 console.log(recommendation.reason);
 ```
 
+### Streaming with tools that may pause
+
+When you combine `tools` + `outputSchema` + `stream: true`, the agent loop runs first — its events stream through, and only after all tools complete does the structured output stream emit `structured-output.complete`. Two situations can interrupt that flow before the terminal event arrives:
+
+1. **A server tool with `needsApproval: true` is queued.** The agent loop pauses and the queued tool-call lands on the assistant message as a `ToolCallPart` with `state === "approval-requested"`. You respond by calling `addToolApprovalResponse({ id, approved })` from the hook return — same flow as in a normal chat. See [Tool Approval Flow](../tools/tool-approval) for the full pattern.
+2. **A client tool is invoked.** If you registered the tool with an `execute` function, the client runs it automatically and posts the result back — no extra code on your side. If you want to handle it manually, listen for `onToolCall` and respond with `addToolResult({ toolCallId, tool, output, state })`. See [Client Tools](../tools/client-tools) for details.
+
+There's nothing structured-output-specific in either flow — both reuse the standard chat pause/resume APIs. The structured stream layers on top: once tools complete (or the user approves), the agent loop finishes, the structured-output stream takes over, `partial` fills in, and `final` snaps when `structured-output.complete` arrives. For example, an approval-gated tool inside a structured-output run looks like this (`ApprovalPrompt`, `ThinkingBlock`, `ToolCallBlock`, and `StructuredView` stand in for your own components):
+
+```tsx
+const { messages, sendMessage, partial, final, addToolApprovalResponse } = useChat({
+  connection: fetchServerSentEvents("/api/recommend"),
+  outputSchema: RecommendationSchema,
+  tools: [sendEmail], // server tool with needsApproval: true
+});
+
+const last = messages.at(-1);
+
+return (
+  <>
+    {last?.parts.map((part, i) => {
+      // Surface approval prompts inline, the same way Tool Approval Flow shows it.
+      if (
+        part.type === "tool-call" &&
+        part.state === "approval-requested" &&
+        part.approval
+      ) {
+        return (
+          <ApprovalPrompt
+            key={i}
+            onApprove={() =>
+              addToolApprovalResponse({ id: part.approval!.id, approved: true })
+            }
+            onDeny={() =>
+              addToolApprovalResponse({ id: part.approval!.id, approved: false })
+            }
+          />
+        );
+      }
+      if (part.type === "thinking") return <ThinkingBlock key={i} part={part} />;
+      if (part.type === "tool-call") return <ToolCallBlock key={i} part={part} />;
+      return null; // hide TextPart (raw JSON when outputSchema is set)
+    })}
+    <StructuredView partial={partial} final={final} />
+  </>
+);
+```
+
+While the approval is pending, `partial` stays at its last value and `final` stays `null`. As soon as the user approves (or denies and the loop resumes), the agent loop continues, the structured stream runs, and `partial` / `final` populate.
+
 ## Using Plain JSON Schema
 
 If you prefer not to use a schema library, you can pass a plain JSON Schema object:
diff --git a/examples/ts-react-chat/src/components/Header.tsx b/examples/ts-react-chat/src/components/Header.tsx
index 4cd9fc4d8..7dda9649a 100644
--- a/examples/ts-react-chat/src/components/Header.tsx
+++ b/examples/ts-react-chat/src/components/Header.tsx
@@ -163,7 +163,7 @@ export default function Header() {
         }}
       >
-        Structured Output (OpenRouter)
+        Structured Output
diff --git a/examples/ts-react-chat/src/lib/server-fns.ts b/examples/ts-react-chat/src/lib/server-fns.ts
index c64168d23..51100e9b5 100644
--- a/examples/ts-react-chat/src/lib/server-fns.ts
+++ b/examples/ts-react-chat/src/lib/server-fns.ts
@@ -191,11 +191,12 @@ export const summarizeFn = createServerFn({ method: 'POST' })
       text: z.string(),
       maxLength: z.number().optional(),
       style: z.enum(['bullet-points', 'paragraph', 'concise']).optional(),
+      model: z.string().optional(),
     }),
   )
   .handler(async ({ data }) => {
     return summarize({
-      adapter: openaiSummarize('gpt-4o-mini'),
+      adapter: openaiSummarize((data.model ?? 'gpt-4o-mini') as 'gpt-4o-mini'),
       text: data.text,
       maxLength: data.maxLength,
       style: data.style,
@@ -338,12 +339,15 @@ export const summarizeStreamFn = createServerFn({ method: 'POST' })
       text: z.string(),
       maxLength: z.number().optional(),
       style: z.enum(['bullet-points', 'paragraph', 'concise']).optional(),
+      model: z.string().optional(),
     }),
   )
   .handler(({ data }) => {
     return toServerSentEventsResponse(
       summarize({
-        adapter: openaiSummarize('gpt-4o-mini'),
+        adapter: openaiSummarize(
+          (data.model ?? 'gpt-4o-mini') as 'gpt-4o-mini',
+        ),
         text: data.text,
         maxLength: data.maxLength,
         style: data.style,
diff --git a/examples/ts-react-chat/src/routes/api.structured-output.ts b/examples/ts-react-chat/src/routes/api.structured-output.ts
index aa1d045f2..73d5325c0 100644
--- a/examples/ts-react-chat/src/routes/api.structured-output.ts
+++ b/examples/ts-react-chat/src/routes/api.structured-output.ts
@@ -1,7 +1,14 @@
 import { createFileRoute } from '@tanstack/react-router'
-import { chat } from '@tanstack/ai'
-import { openRouterText } from '@tanstack/ai-openrouter'
+import { chat, toServerSentEventsResponse } from '@tanstack/ai'
+import { openaiChatCompletions, openaiText } from '@tanstack/ai-openai'
+import { grokText } from '@tanstack/ai-grok'
+import { groqText } from '@tanstack/ai-groq'
+import {
+  openRouterResponsesText,
+  openRouterText,
+} from '@tanstack/ai-openrouter'
 import { z } from 'zod'
+import type { AnyTextAdapter, StreamChunk } from '@tanstack/ai'
 
 const GuitarRecommendationSchema = z.object({
   title: z.string().describe('Short headline for the recommendation'),
@@ -21,23 +28,161 @@ const GuitarRecommendationSchema = z.object({
   nextSteps: z.array(z.string()).describe('Practical follow-up actions'),
 })
 
+type Provider =
+  | 'openai'
+  | 'openai-chat'
+  | 'grok'
+  | 'groq'
+  | 'openrouter'
+  | 'openrouter-responses'
+
+const StructuredOutputRequestSchema = z.object({
+  prompt: z.string().min(1),
+  provider: z
+    .enum([
+      'openai',
+      'openai-chat',
+      'grok',
+      'groq',
+      'openrouter',
+      'openrouter-responses',
+    ])
+    .optional(),
+  model: z.string().optional(),
+  stream: z.boolean().optional(),
+})
+
+function adapterFor(provider: Provider, model?: string): AnyTextAdapter {
+  switch (provider) {
+    case 'openai':
+      return openaiText((model || 'gpt-5.2') as 'gpt-5.2')
+    case 'openai-chat':
+      // Same model surface as the Responses adapter, but talks to
+      // `/v1/chat/completions`. Useful for side-by-side comparison of
+      // streaming structured output across the two OpenAI wire formats.
+      return openaiChatCompletions((model || 'gpt-4o') as 'gpt-4o')
+    case 'grok':
+      return grokText(
+        (model || 'grok-4-1-fast-reasoning') as 'grok-4-1-fast-reasoning',
+      )
+    case 'groq':
+      return groqText(
+        (model ||
+          'meta-llama/llama-4-maverick-17b-128e-instruct') as 'meta-llama/llama-4-maverick-17b-128e-instruct',
+      )
+    case 'openrouter':
+      return openRouterText(
+        (model || 'anthropic/claude-opus-4.6') as 'anthropic/claude-opus-4.6',
+      )
+    case 'openrouter-responses':
+      // OpenRouter Responses (beta) endpoint — same model surface as the
+      // chat-completions adapter, but routes through `/v1/responses`. This
+      // is what exercises `OpenRouterResponsesTextAdapter.structuredOutputStream`.
+      return openRouterResponsesText(
+        (model || 'anthropic/claude-opus-4.6') as 'anthropic/claude-opus-4.6',
+      )
+  }
+}
+
+// Per-provider modelOptions to opt into reasoning surfacing. Without these,
+// reasoning models reason silently and the UI never sees REASONING_* events.
+function reasoningOptionsFor(
+  provider: Provider,
+  model: string | undefined,
+): Record<string, unknown> | undefined {
+  switch (provider) {
+    case 'openai':
+      // Responses API: `reasoning.summary: 'auto'` is what makes the API emit
+      // `response.reasoning_summary_text.delta` events. Only valid on
+      // reasoning models (gpt-5.x, o-series); older models (gpt-4o) reject it.
+      if (
+        model?.startsWith('gpt-5') ||
+        model?.startsWith('o3') ||
+        model?.startsWith('o4')
+      ) {
+        return { reasoning: { summary: 'auto' } }
+      }
+      return undefined
+    case 'openai-chat':
+      // Chat Completions API doesn't surface reasoning summaries the way
+      // Responses does. Reasoning models still reason silently; no opt-in
+      // option to inject here.
+      return undefined
+    case 'groq':
+      // Groq's Chat Completions only streams `delta.reasoning` when
+      // `reasoning_format: 'parsed'`. Required for gpt-oss / qwen3 / kimi-k2
+      // to emit reasoning during structured output (json_schema mode).
+      if (
+        model?.startsWith('openai/gpt-oss') ||
+        model?.startsWith('qwen') ||
+        model?.startsWith('moonshotai/kimi')
+      ) {
+        return { reasoning_format: 'parsed' }
+      }
+      return undefined
+    case 'openrouter':
+    case 'openrouter-responses':
+      // OpenRouter normalises across providers. `reasoning.effort` triggers
+      // the upstream model's reasoning + surfaces the deltas. Same option on
+      // both the chat-completions and Responses-beta endpoints.
+      return { reasoning: { effort: 'medium' } }
+    case 'grok':
+      // xAI surfaces `delta.reasoning_content` automatically on reasoning
+      // models (grok-3-mini, grok-4-fast-reasoning, grok-4-1-fast-reasoning).
+      // No request param needed.
+      return undefined
+  }
+}
+
 export const Route = createFileRoute('/api/structured-output')({
   server: {
     handlers: {
       POST: async ({ request }) => {
-        const body = await request.json()
-        const { prompt, model } = body as {
-          prompt: string
-          model?: string
-        }
-
         try {
+          const parsed = StructuredOutputRequestSchema.safeParse(
+            await request.json(),
+          )
+          if (!parsed.success) {
+            return new Response(
+              JSON.stringify({ error: 'Invalid request body' }),
+              {
+                status: 400,
+                headers: { 'Content-Type': 'application/json' },
+              },
+            )
+          }
+          const { prompt, provider, model, stream } = parsed.data
+          const resolvedProvider: Provider = provider || 'openrouter'
+          const modelOptions = reasoningOptionsFor(resolvedProvider, model)
+
+          if (stream) {
+            const abortController = new AbortController()
+            request.signal.addEventListener('abort', () =>
+              abortController.abort(),
+            )
+            const streamIterable = chat({
+              adapter: adapterFor(resolvedProvider, model),
+              modelOptions: modelOptions as never,
+              messages: [{ role: 'user', content: prompt }],
+              outputSchema: GuitarRecommendationSchema,
+              stream: true,
+              abortController,
+            }) as AsyncIterable<StreamChunk>
+            return toServerSentEventsResponse(streamIterable, {
+              abortController,
+            })
+          }
+
+          const abortController = new AbortController()
+          request.signal.addEventListener('abort', () =>
+            abortController.abort(),
+          )
           const result = await chat({
-            adapter: openRouterText(
-              (model || 'openai/gpt-5.2') as 'openai/gpt-5.2',
-            ),
+            adapter: adapterFor(resolvedProvider, model),
+            modelOptions: modelOptions as never,
             messages: [{ role: 'user', content: prompt }],
             outputSchema: GuitarRecommendationSchema,
+            abortController,
           })
 
           return new Response(JSON.stringify({ data: result }), {
diff --git a/examples/ts-react-chat/src/routes/generations.structured-output.tsx b/examples/ts-react-chat/src/routes/generations.structured-output.tsx
index 9b123308f..4049c2a7e 100644
--- a/examples/ts-react-chat/src/routes/generations.structured-output.tsx
+++ b/examples/ts-react-chat/src/routes/generations.structured-output.tsx
@@ -1,98 +1,389 @@
-import { useState } from 'react'
+import { useRef, useState } from 'react'
 import { createFileRoute } from '@tanstack/react-router'
+import { parsePartialJSON } from '@tanstack/ai'
 
 const SAMPLE_PROMPT =
   'I play indie rock and have a $1500 budget. Recommend two electric guitars and one acoustic to round out my rig.'
-const OPENROUTER_MODELS = [
-  { value: 'openai/gpt-5.2', label: 'OpenAI GPT-5.2' },
-  { value: 'openai/gpt-5.2-pro', label: 'OpenAI GPT-5.2 Pro' },
-  { value: 'openai/gpt-5.1', label: 'OpenAI GPT-5.1' },
-  { value: 'anthropic/claude-opus-4.7', label: 'Claude Opus 4.7' },
-  { value: 'anthropic/claude-sonnet-4.6', label: 'Claude Sonnet 4.6' },
-  { value: 'google/gemini-3.1-pro-preview', label: 'Gemini 3.1 Pro (Preview)' },
-  { value: 'x-ai/grok-4.1-fast', label: 'Grok 4.1 Fast' },
-] as const
-
-interface RecommendationResult {
-  title: string
-  summary: string
-  recommendations: Array<{
-    name: string
-    brand: string
-    type: 'acoustic' | 'electric' | 'bass' | 'classical'
-    priceRangeUsd: { min: number; max: number }
-    reason: string
-  }>
-  nextSteps: Array<string>
-}
+type Provider =
+  | 'openai'
+  | 'openai-chat'
+  | 'grok'
+  | 'groq'
+  | 'openrouter'
+  | 'openrouter-responses'
+
+const PROVIDER_MODELS: Record<
+  Provider,
+  Array<{ value: string; label: string }>
+> = {
+  openai: [
+    { value: 'gpt-5.2', label: 'GPT-5.2 (frontier)' },
+    { value: 'gpt-5.2-pro', label: 'GPT-5.2 Pro' },
+    { value: 'gpt-5.1', label: 'GPT-5.1' },
+    { value: 'gpt-5', label: 'GPT-5' },
+    { value: 'gpt-5-mini', label: 'GPT-5 Mini' },
+    { value: 'gpt-4o', label: 'GPT-4o' },
+  ],
+  // OpenAI Chat Completions: same model surface, older `/v1/chat/completions`
+  // wire format. The reasoning-summary opt-in isn't available here, so
+  // streaming reasoning won't be surfaced for gpt-5.x even though the model
+  // is still doing it under the hood.
+  'openai-chat': [
+    { value: 'gpt-4o', label: 'GPT-4o' },
+    { value: 'gpt-5-mini', label: 'GPT-5 Mini' },
+    { value: 'gpt-5', label: 'GPT-5' },
+    { value: 'gpt-5.1', label: 'GPT-5.1' },
+    { value: 'gpt-5.2', label: 'GPT-5.2 (frontier)' },
+  ],
+  grok: [
+    { value: 'grok-4-1-fast-reasoning', label: 'Grok 4.1 Fast (reasoning)' },
+    {
+      value: 'grok-4-1-fast-non-reasoning',
+      label: 'Grok 4.1 Fast (non-reasoning)',
+    },
+    { value: 'grok-4', label: 'Grok 4' },
+    { value: 'grok-3', label: 'Grok 3' },
+  ],
+  groq: [
+    {
+      value: 'meta-llama/llama-4-maverick-17b-128e-instruct',
+      label: 'Llama 4 Maverick 17B',
+    },
+    {
+      value: 'meta-llama/llama-4-scout-17b-16e-instruct',
+      label: 'Llama 4 Scout 17B',
+    },
+    {
+      value: 'moonshotai/kimi-k2-instruct-0905',
+      label: 'Kimi K2 Instruct',
+    },
+    { value: 'llama-3.3-70b-versatile', label: 'Llama 3.3 70B Versatile' },
+    { value: 'openai/gpt-oss-120b', label: 'GPT-OSS 120B' },
+  ],
+  openrouter: [
+    { value: 'anthropic/claude-opus-4.6', label: 'Claude Opus 4.6' },
+    { value: 'anthropic/claude-sonnet-4.6', label: 'Claude Sonnet 4.6' },
+    { value: 'openai/gpt-5.2', label: 'GPT-5.2 (via OpenRouter)' },
+    { value: 'x-ai/grok-4.1-fast', label: 'Grok 4.1 Fast (via OpenRouter)' },
+  ],
+  // OpenRouter Responses (beta) endpoint — same upstream models, but the
+  // request/response uses the Responses API wire format. Useful to compare
+  // streaming behaviour against the chat-completions adapter above.
+  'openrouter-responses': [
+    { value: 'anthropic/claude-opus-4.6', label: 'Claude Opus 4.6' },
+    { value: 'anthropic/claude-sonnet-4.6', label: 'Claude Sonnet 4.6' },
+    { value: 'openai/gpt-5.2', label: 'GPT-5.2 (via OpenRouter)' },
+    { value: 'x-ai/grok-4.1-fast', label: 'Grok 4.1 Fast (via OpenRouter)' },
+  ],
+}
+
+interface PartialRecommendation {
+  name?: string
+  brand?: string
+  type?: 'acoustic' | 'electric' | 'bass' | 'classical' | string
+  priceRangeUsd?: { min?: number; max?: number }
+  reason?: string
+}
+
+interface PartialResult {
+  title?: string
+  summary?: string
+  recommendations?: Array<PartialRecommendation>
+  nextSteps?: Array<string>
+}
+
+interface StreamChunkPayload {
+  type: string
+  delta?: string
+  content?: string
+  name?: string
+  value?: { object?: unknown; raw?: string; reasoning?: string }
+  message?: string
+}
+
+// Pick the last meaningful sentence/line out of an accumulating reasoning
+// stream so the UI can render a single rolling line of "what it's thinking
+// right now" rather than a growing wall of text.
+function latestThought(reasoning: string): string {
+  const trimmed = reasoning.trimEnd()
+  if (!trimmed) return ''
+  // Prefer the last sentence; fall back to the last newline-delimited line.
+  const sentenceMatch = trimmed.match(/[^.!?\n]+[.!?]?\s*$/)
+  const candidate = sentenceMatch ? sentenceMatch[0] : trimmed
+  const last = candidate.split('\n').filter(Boolean).pop() ?? candidate
+  return last.trim()
+}
 
 function StructuredOutputPage() {
+  const providerId = 'structured-output-provider'
+  const modelId = 'structured-output-model'
+  const promptId = 'structured-output-prompt'
   const [prompt, setPrompt] = useState(SAMPLE_PROMPT)
-  const [model, setModel] = useState(OPENROUTER_MODELS[0].value)
-  const [result, setResult] = useState<RecommendationResult | null>(null)
+  const [provider, setProvider] = useState<Provider>('openai')
+  const [model, setModel] = useState(PROVIDER_MODELS.openai[0].value)
+  const [stream, setStream] = useState(true)
+  const [result, setResult] = useState<PartialResult | null>(null)
+  const [rawJson, setRawJson] = useState('')
+  const [deltaCount, setDeltaCount] = useState(0)
+  const [isStreaming, setIsStreaming] = useState(false)
+  const [hasFinalResult, setHasFinalResult] = useState(false)
+  const [reasoningLine, setReasoningLine] = useState('')
+  const [reasoningFull, setReasoningFull] = useState('')
   const [error, setError] = useState<string | null>(null)
   const [isLoading, setIsLoading] = useState(false)
+  const abortRef = useRef<AbortController | null>(null)
+
+  const onProviderChange = (next: Provider) => {
+    setProvider(next)
+    setModel(PROVIDER_MODELS[next][0].value)
+  }
+
+  const reset = () => {
+    setResult(null)
+    setRawJson('')
+    setDeltaCount(0)
+    setHasFinalResult(false)
+    setReasoningLine('')
+    setReasoningFull('')
+    setError(null)
+  }
 
   const handleGenerate = async () => {
     if (!prompt.trim()) return
     setIsLoading(true)
-    setError(null)
-    setResult(null)
+    reset()
+    setIsStreaming(stream)
+
+    const controller = new AbortController()
+    abortRef.current = controller
 
     try {
      const response = await fetch('/api/structured-output', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
-        body: JSON.stringify({ prompt: prompt.trim(), model }),
+        body: JSON.stringify({
+          prompt: prompt.trim(),
+          provider,
+          model,
+          stream,
+        }),
+        signal: controller.signal,
      })
 
-      const payload = await response.json()
      if (!response.ok) {
-        throw new Error(payload.error || 'Request failed')
+        const errPayload = await response.json().catch(() => ({}))
+        throw new Error(
+          errPayload.error || `Request failed (${response.status})`,
+        )
+      }
+
+      if (!stream) {
+        const payload = await response.json()
+        setResult(payload.data as PartialResult)
+        setHasFinalResult(true)
+        return
+      }
+
+      // Streaming path — parse SSE, accumulate raw JSON, render the partially
+      // parsed object live, snap to the validated terminal payload.
+      const reader = response.body!.getReader()
+      const decoder = new TextDecoder()
+      let buffer = ''
+      let accumulated = ''
+      let reasoning = ''
+      let deltas = 0
+      let sawComplete = false
+
+      const processBuffer = () => {
+        let sepIdx = buffer.indexOf('\n\n')
+        while (sepIdx !== -1) {
+          const frame = buffer.slice(0, sepIdx)
+          buffer = buffer.slice(sepIdx + 2)
+          sepIdx = buffer.indexOf('\n\n')
+
+          for (const line of frame.split('\n')) {
+            if (!line.startsWith('data: ')) continue
+            const json = line.slice(6).trim()
+            if (!json) continue
+            let chunk: StreamChunkPayload
+            try {
+              chunk = JSON.parse(json) as StreamChunkPayload
+            } catch {
+              continue
+            }
+
+            if (chunk.type === 'TEXT_MESSAGE_CONTENT' && chunk.delta) {
+              accumulated += chunk.delta
+              deltas += 1
+              setRawJson(accumulated)
+              setDeltaCount(deltas)
+              // partial-json tolerates incomplete JSON — it returns whatever
+              // structure can be inferred. Render it directly so the UI fills
+              // in field by field as the model produces them.
+              const partial = parsePartialJSON(accumulated) as
+                | PartialResult
+                | undefined
+              if (partial && typeof partial === 'object') {
+                setResult(partial)
+              }
+            } else if (
+              chunk.type === 'REASONING_MESSAGE_CONTENT' &&
+              chunk.delta
+            ) {
+              reasoning += chunk.delta
+              setReasoningFull(reasoning)
+              // One-liner: take the last non-empty line/sentence so consumers
+              // see "what it's thinking right now" without a wall of text.
+              setReasoningLine(latestThought(reasoning))
+            } else if (
+              chunk.type === 'CUSTOM' &&
+              chunk.name === 'structured-output.complete' &&
+              chunk.value?.object
+            ) {
+              sawComplete = true
+              setResult(chunk.value.object as PartialResult)
+              setHasFinalResult(true)
+              if (
+                typeof (chunk.value as { reasoning?: string }).reasoning ===
+                'string'
+              ) {
+                const finalReasoning = (chunk.value as { reasoning: string })
+                  .reasoning
+                setReasoningFull(finalReasoning)
+                setReasoningLine(latestThought(finalReasoning))
+              }
+            } else if (chunk.type === 'RUN_ERROR') {
+              throw new Error(chunk.message || 'Stream failed')
+            }
+          }
+        }
+      }
+
+      while (true) {
+        const { done, value } = await reader.read()
+        if (done) break
+        buffer += decoder.decode(value, { stream: true })
+        processBuffer()
+      }
+
+      // Flush any buffered bytes from incomplete multi-byte UTF-8 sequences
+      // so the final SSE frame isn't dropped.
+      buffer += decoder.decode()
+      processBuffer()
+
+      if (!sawComplete) {
+        throw new Error('Stream ended before structured-output.complete')
+      }
     } catch (err) {
-      setError(err instanceof Error ? err.message : 'Unknown error')
+      if (err instanceof Error && err.name === 'AbortError') {
+        setError('Aborted')
+      } else {
+        setError(err instanceof Error ? err.message : 'Unknown error')
+      }
     } finally {
       setIsLoading(false)
+      setIsStreaming(false)
+      abortRef.current = null
     }
   }
 
+  const handleAbort = () => abortRef.current?.abort()
+
+  const renderingPartial = isStreaming && !hasFinalResult
+  const recommendations = result?.recommendations ?? []
+  const nextSteps = result?.nextSteps ?? []
+
   return (
-    <div className="…">
-      <h1 className="…">
-        Structured Output (OpenRouter)
-      </h1>
-      <p className="…">
+    <div className="…">
+      <h1 className="…">Structured Output</h1>
+      <p className="…">
         Calls <code>chat()</code> with an{' '}
-        <code>outputSchema</code> via the{' '}
-        <code>openRouterText</code> adapter and
-        parses the JSON result.
+        <code>outputSchema</code>. Toggle{' '}
+        <code>stream</code> to exercise{' '}
+        <code>structuredOutputStream</code> on the
+        selected provider; the UI fills in progressively via{' '}
+        <code>parsePartialJSON</code>, then snaps
+        to the validated payload from the terminal{' '}
+        <code>structured-output.complete</code>{' '}
+        event. Reasoning models surface a live thinking strip from{' '}
+        <code>REASONING_MESSAGE_CONTENT</code>{' '}
+        deltas — openai (Responses API), openrouter, xAI (
+        <code>delta.reasoning_content</code>), and
+        Groq (<code>delta.reasoning</code>) all
+        stream chain-of-thought.
+      </p>
 
-      <select value={model} onChange={(e) => setModel(e.target.value)}>
-        {OPENROUTER_MODELS.map((m) => (
-          <option key={m.value} value={m.value}>
-            {m.label}
-          </option>
-        ))}
-      </select>
+      <label htmlFor={providerId}>…</label>
+      <select
+        id={providerId}
+        value={provider}
+        onChange={(e) => onProviderChange(e.target.value as Provider)}
+        disabled={isLoading}
+        className="w-full rounded-lg border border-orange-500/20 bg-gray-800/50 px-3 py-2 text-sm text-white focus:outline-none focus:ring-2 focus:ring-orange-500/50 disabled:opacity-50"
+      >
+        {(Object.keys(PROVIDER_MODELS) as Array<Provider>).map((p) => (
+          <option key={p} value={p}>
+            {p}
+          </option>
+        ))}
+      </select>