feat(llm-sdk): MCP servers + reasoning effort + runtime.response() (supersedes #98)#99
Open
Sahil5963 wants to merge 7 commits into
Open
feat(llm-sdk): MCP servers + reasoning effort + runtime.response() (supersedes #98)#99Sahil5963 wants to merge 7 commits into
Sahil5963 wants to merge 7 commits into
Conversation
Adds two missing per-request config fields — `mcpServers` and `reasoningEffort` — that reuse the existing `responseFormat` sanitizers landed in PR #93 instead of forking a parallel `respond()` surface. PR #98 also targets this gap but reinvents builders for `text.format` and `output_config` already implemented in `adapters/base.ts`, and introduces a nested-shape regression on Anthropic structured output. This change supersedes #98 by extending the existing pipeline rather than bypassing it. Adapter behavior - OpenAI: `shouldUseResponsesApi()` now also fires when `mcpServers` or `reasoningEffort` is set, routing the call through `/v1/responses` (where these fields are valid). Tools array merges function tools + `tools[type=mcp]` entries from the new helper. `reasoning` block is emitted from `toOpenAIReasoning()`. The legacy `tool_search` meta-tool is no longer injected when the caller didn't wire deferred tools, so MCP-only / reasoning-only requests stay minimal. - Anthropic: `buildRequestOptions()` returns `betas` alongside `options`. When `mcpServers` is set we emit the `mcp-client-2025-11-20` two-array shape (`mcp_servers` entries + separate `tools[type=mcp_toolset]` filtering) and route through `client.beta.messages.{create,stream}` so the SDK attaches the `anthropic-beta` header. `reasoningEffort` maps via `toAnthropicThinking()` — adaptive+effort for Claude 4.6/4.7, explicit `budget_tokens` for older models. - OpenAI adapter throws a clear error when `mcpServers` or `reasoningEffort` is set on a non-OpenAI/Azure baseUrl (Google/xAI/ OpenRouter all share this adapter via baseUrl). Fallback chains with `retryableErrors: () => true` skip past these providers naturally. Helpers in `adapters/base.ts` (alongside existing structured-output sanitizers): - `toOpenAIResponsesMcpTools` — Responses-API `tools[type=mcp]` entries - `toAnthropicMcp` — 2025-11-20 split (`mcp_servers` + `mcp_toolset` tools) with `Authorization: Bearer …` → bare `authorization_token` mapping - `toOpenAIReasoning` — `{ effort, summary: "auto" }` - `toAnthropicThinking` — adaptive for 4.6/4.7, budget_tokens otherwise - `toGeminiThinkingBudget`, `toXAIReasoningEffort` — for future local-MCP fallback work Surfaces - `RequestLLMConfig` (adapter), `ChatRequest.config` (server), `DoGenerateParams` (modern provider), `GenerateTextParams` (top-level), `LLMConfig` — all gain `mcpServers?` + `reasoningEffort?`. - New `AudioPart` content type added to `UserContentPart` (passthrough wiring deferred to a follow-up). - New `runtime.response({ prompt, mcpServers?, reasoningEffort?, responseFormat?, … })` convenience method — thin wrapper over `runtime.generate()` for the one-shot prompt-in/text-out case. Demo - `examples/fallback-demo` adds `POST /response` exercising the bundle across an OpenAI → Anthropic priority chain, mirroring the production self-learning extractor (KB MCP + JSON-schema FAQ output + `high` reasoning). Deferred to follow-up PR - Local-MCP-execution fallback for Google/xAI/Fireworks/OpenRouter (needs a minimal embedded MCP JSON-RPC client + per-adapter tool interception loop). Today these providers throw clearly; fallback chains route past them. Verification - `pnpm --filter @yourgpt/llm-sdk typecheck` clean - `pnpm --filter @yourgpt/llm-sdk build` clean - `npx tsc --noEmit` in `examples/fallback-demo` clean Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sponse() Sibling to the code change in the previous commit. Mirrors the docs pattern established by PR #93 for `responseFormat`. - `llm-sdk/structured-output.mdx` — adds "MCP servers" and "Reasoning effort" sections with per-provider mapping tables and a brief `runtime.response()` pointer. - New `llm-sdk/response.mdx` — dedicated reference page covering the one-shot wrapper, parameters, result shape, when to use vs generate()/chat(), provider routing under the hood, and the self-learning example. Added to `meta.json` sidebar. - `llm-sdk/generate-text.mdx` + `stream-text.mdx` — add `mcpServers` and `reasoningEffort` to the params examples and cross-link. - `providers/anthropic.mdx` — new sections for MCP (2025-11-20 shape + beta header attached automatically) and Extended thinking (adaptive on 4.6/4.7, budget_tokens on older; per-request override semantics). - `providers/openai.mdx` — new section explaining MCP/reasoning routes through /v1/responses, including the `store: false` semantics. - `packages/llm-sdk/README.md` — short mention with link to the guide. Verification: typecheck + build still clean on llm-sdk and the fallback-demo example. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
The Anthropic SDK's `client.beta.messages.create(params, options?)`
takes `betas` inside the `BetaMessageCreateParams` object — the second
argument is `RequestOptions` and has no `betas` field. The previous
patch passed `{ betas }` as the second arg, which the SDK silently
ignored, meaning the `anthropic-beta: mcp-client-2025-11-20` header
was never attached and MCP server passthrough would fail at runtime.
Spread `betas` into the params object instead. Same fix for the
`.stream()` call path.
Verified against the installed SDK type definition:
resources/beta/messages/messages.d.ts:2248
betas?: Array<BetaAPI.AnthropicBeta>;
(inside MessageCreateParamsBase, not RequestOptions)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`runtime.generate()` previously consumed the stream's `done` event but
dropped its `usage` payload before constructing the `GenerateResult`,
so callers had no way to read token counts off a non-streaming call.
This blocked usage-based credit billing for any consumer of
`runtime.response()` (which wraps `generate()` and is the recommended
shape for one-shot MCP + reasoning + schema calls).
- Capture `event.usage` from the `done` event and pass it through
`GenerateResultData.usage` as `{ promptTokens, completionTokens,
totalTokens }` — matches `GenerateTextResult.usage` from the modern
generateText() path.
- New `result.usage` getter on `GenerateResult`. Optional; undefined
if the underlying provider didn't emit usage (rare).
- `runtime.response()` returns `{ text, toolCalls, usage }` instead of
`{ text, toolCalls }` — purely additive.
- Docs updated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…g on capable models
Two real bugs caught by live smoke testing against the API.
1. **OpenAI: `reasoning.effort` only valid on reasoning models.**
Sending `reasoning: { effort: "high" }` to `gpt-4o` returns
`400 Unsupported parameter: 'reasoning.effort' is not supported with
this model`. The Responses API only accepts `reasoning` on
o-series / gpt-5.x. Gate via the existing `isOpenAIReasoningModel`
helper — if the active model isn't a reasoning model, silently drop
the `reasoning` block and warn the caller. The Responses API call
still goes through (it's needed for MCP/structured-output anyway);
the model just doesn't reason.
2. **Anthropic Claude 4.6/4.7: adaptive shape with effort on output_config.**
The API tells you the exact shape itself:
> "thinking.type.enabled" is not supported for this model.
> Use "thinking.type.adaptive" and "output_config.effort" to control
> thinking behavior.
Earlier code emitted `thinking: { type: "adaptive", effort }` (the
shape some pre-release docs described). Live API rejects that with
`thinking.adaptive.effort: Extra inputs are not permitted`.
Correct shape:
`thinking: { type: "adaptive" }` (no effort knob)
`output_config: { effort, format: {...} }` (effort lives on
output_config alongside format)
Older Claude (≤4.5) still wants `{ type: "enabled", budget_tokens }`
and rejects `adaptive`. Both paths now correct.
`toAnthropicThinking` now returns a struct (`thinking` block + optional
`outputConfigEffort`) so the adapter can splice each piece into the
right place. The adapter merges `outputConfigEffort` into `output_config`
alongside the `format` field.
Verified live (May 2026):
- POST /response on gpt-5.2 → returns valid JSON, usage{}
- POST /response/claude (dead OpenAI primary → Anthropic fallback) on
claude-opus-4-7 → returns valid JSON
Demo updated to use gpt-5.2 (reasoning-capable) and adds a
/response/claude route exercising the Anthropic fallback path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two doc updates following live API verification:
- `structured-output.mdx` + `providers/anthropic.mdx`: Claude 4.6/4.7
adaptive thinking takes NO `effort` field inside `thinking`. Effort
moves to `output_config.effort` as a sibling of `output_config.format`.
Sending the previously-documented `{ type: "adaptive", effort }` to
the live API returns 400. Updated tables and prose.
- OpenAI Responses reasoning is now correctly flagged as "reasoning
models only" — gpt-4o silently drops it with a warning.
Also adds `/response/claude` to the fallback-demo route list (the
forced-Anthropic smoke route already added in the previous commit).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sponse() release Incrementing the beta counter on the 2.5.1 line. Matches the existing @yourgpt/copilot-sdk@2.5.1-beta.1 numbering. Contents are additive on top of @yourgpt/llm-sdk@2.5.1-beta.0: - MCP server passthrough (OpenAI Responses + Anthropic mcp-client-2025-11-20) - reasoningEffort per-request param (live-verified shapes for both providers) - AudioPart content type - runtime.response() ergonomic wrapper with usage in the return Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds two missing per-request config fields —
mcpServersandreasoningEffort— plus aruntime.response()ergonomic wrapper, by extending the existing pipeline from #93 rather than forking a parallel surface.Supersedes #98. That PR also targets this gap but reinvents structured-output builders (
text.format,output_config) that already live inadapters/base.ts, and introduces a nested-shape regression on Anthropic'soutput_configthat would causeJSON.parsefailures when the fallback chain hits Claude. This PR uses the existing sanitizers from #93 directly so those bugs disappear by construction.Driving use case: the
yourgpt-core-apisrefactor/fallback-models-chainingbranch's self-learning extractor (ChatbotJobController.js:1268-1285) needs MCP-knowledgebase + JSON-schema + high reasoning in a single fallback-chain call.What's in
Adapter changes
shouldUseResponsesApi()now also fires whenmcpServersorreasoningEffortis set, routing through/v1/responses(the only OpenAI endpoint that accepts these fields). Function tools +tools[type=mcp]entries merged;reasoningblock emitted;store: falsefor one-shot semantics.tool_searchmeta-tool no longer injected when no native tools are wired. User role is"user"— not"developer"— for user prompts (fixes the bug from feat(llm-sdk): Responses API — MCP tools, reasoning & structured output #98).buildRequestOptions()now returns{options, messages, betas}. When MCP is set, emits themcp-client-2025-11-20two-array shape (mcp_servers+ separatetools[type=mcp_toolset]forallowedTools), routed throughclient.beta.messages.*so the SDK attaches the beta header.reasoningEffortmaps viatoAnthropicThinking()— adaptive thinking on Claude 4.6/4.7, explicitbudget_tokenson older models.retryableErrors: () => trueskip past them naturally.New translators in
adapters/base.ts(alongside existing structured-output sanitizers)toOpenAIResponsesMcpTools— Responses-APItools[type=mcp]entriestoAnthropicMcp— splits intomcp_servers+tools[mcp_toolset], hoistsAuthorization: Bearer …→ bareauthorization_tokentoOpenAIReasoning—{ effort, summary: "auto" }withbudgetTokens → effortcoerciontoAnthropicThinking— adaptive for 4.6/4.7,budget_tokensfor oldertoGeminiThinkingBudget,toXAIReasoningEffort— staged for the follow-up local-MCP workSurface plumbing —
RequestLLMConfig(adapter),ChatRequest.config(server),DoGenerateParams(modern provider),GenerateTextParams/StreamTextParams(top-level),LLMConfig— all gainmcpServers?+reasoningEffort?. Forwarded throughgenerate-text.tsandstream-text.tstodoGenerate()/doStream().New
AudioPartcontent type inUserContentPartunion (passthrough wiring deferred — type is non-breaking).runtime.response()— thin wrapper overgenerate(). Signature:{ prompt, systemPrompt?, mcpServers?, reasoningEffort?, responseFormat?, maxTokens?, temperature? } → { text, toolCalls }.Demo —
examples/fallback-demoaddsPOST /responseexercising OpenAI → Anthropic fallback with KB MCP +reasoningEffort: "high"+ FAQ JSON schema, mirroring the production self-learning case.Docs (
docs(llm-sdk)commit)llm-sdk/structured-output.mdx— new "MCP servers" and "Reasoning effort" sections with per-provider mapping tablesllm-sdk/response.mdx— full reference for the wrapperllm-sdk/generate-text.mdx+stream-text.mdx— params updatedproviders/anthropic.mdx— MCP + Extended thinking sectionsproviders/openai.mdx— Responses-API routing sectionpackages/llm-sdk/README.md— short overview blockmeta.jsonsidebar updatedDeferred to follow-up PR
providers/*/provider.ts)doGenerate()accepts the new params in its type but doesn't read them yet — only the legacy adapter path (whichRuntimeuses) honors them today.AudioPartruntime passthrough in OpenAI/Google adapters — type added, wiring mechanical, isolated to a separate PR.Compatibility
AudioPartwidensUserContentPartunion — exhaustiveswitch(part.type)consumers need one new casestore: falsenow sent explicitly on/v1/responsescalls (was implicittruebefore); intentional for one-shot semanticsTest plan
pnpm --filter @yourgpt/llm-sdk typecheckcleanpnpm --filter @yourgpt/llm-sdk buildcleantsc --noEmitclean inexamples/fallback-demoPOST /responseagainst the fallback-demo with real OPENAI_API_KEY / ANTHROPIC_API_KEY and an MCP endpointyourgpt-core-apisrefactor/fallback-models-chainingself-learning extractor once the SDK version is bumpedRelated
responseFormatsanitizers across providers)🤖 Generated with Claude Code