feat(llm-sdk): MCP servers + reasoning effort + runtime.response() (supersedes #98) by Sahil5963 · Pull Request #99 · YourGPT/copilot-sdk

Sahil5963 · 2026-05-13T11:31:43Z

Summary

Adds two missing per-request config fields — mcpServers and reasoningEffort — plus a runtime.response() ergonomic wrapper, by extending the existing pipeline from #93 rather than forking a parallel surface.

Supersedes #98. That PR also targets this gap but reinvents structured-output builders (text.format, output_config) that already live in adapters/base.ts, and introduces a nested-shape regression on Anthropic's output_config that would cause JSON.parse failures when the fallback chain hits Claude. This PR uses the existing sanitizers from #93 directly so those bugs disappear by construction.

Driving use case: the yourgpt-core-apis refactor/fallback-models-chaining branch's self-learning extractor (ChatbotJobController.js:1268-1285) needs MCP-knowledgebase + JSON-schema + high reasoning in a single fallback-chain call.

What's in

Adapter changes

OpenAI: shouldUseResponsesApi() now also fires when mcpServers or reasoningEffort is set, routing through /v1/responses (the only OpenAI endpoint that accepts these fields). Function tools and tools[type=mcp] entries are merged, a reasoning block is emitted, and store: false is sent for one-shot semantics. The tool_search meta-tool is no longer injected when no native tools are wired. User role is "user", not "developer", for user prompts (fixes the bug from #98).
Anthropic — buildRequestOptions() now returns {options, messages, betas}. When MCP is set, emits the mcp-client-2025-11-20 two-array shape (mcp_servers + separate tools[type=mcp_toolset] for allowedTools), routed through client.beta.messages.* so the SDK attaches the beta header. reasoningEffort maps via toAnthropicThinking() — adaptive thinking on Claude 4.6/4.7, explicit budget_tokens on older models.
Non-OpenAI/Azure baseUrls (Google/xAI/OpenRouter share the OpenAI adapter via baseUrl) — throw a clear error when MCP/reasoning is set, so fallback chains with retryableErrors: () => true skip past them naturally.
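
The routing rule above can be sketched as a small predicate plus a guard. This is a minimal illustration, not the adapter's code: `SketchConfig`, `isOpenAIOrAzure`, `needsResponsesApi`, and `pickRoute` are hypothetical names, the host check is an assumed heuristic, and the real `shouldUseResponsesApi()` also fires on conditions from #93 that are omitted here.

```typescript
// Hypothetical, simplified request config; the real RequestLLMConfig has more fields.
interface SketchConfig {
  baseUrl?: string;
  mcpServers?: unknown[];
  reasoningEffort?: string;
}

type Route = "chat-completions" | "responses";

// Assumed heuristic: no baseUrl means the default OpenAI endpoint.
function isOpenAIOrAzure(baseUrl?: string): boolean {
  if (!baseUrl) return true;
  const host = baseUrl.replace(/^https?:\/\//, "").split("/")[0] ?? "";
  return host === "api.openai.com" || host.endsWith(".openai.azure.com");
}

// Only the two new triggers from this PR are modeled here.
function needsResponsesApi(config: SketchConfig): boolean {
  return (config.mcpServers?.length ?? 0) > 0 || config.reasoningEffort !== undefined;
}

function pickRoute(config: SketchConfig): Route {
  if (!needsResponsesApi(config)) return "chat-completions";
  if (!isOpenAIOrAzure(config.baseUrl)) {
    // Throwing (rather than silently dropping the fields) lets a fallback
    // chain treat this provider as failed and move on to the next one.
    throw new Error(
      "mcpServers/reasoningEffort are not supported on baseUrl " + config.baseUrl
    );
  }
  return "responses";
}
```

Throwing early keeps the failure visible to the chain's `retryableErrors` predicate instead of sending a request the provider would reject or misinterpret.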

New translators in adapters/base.ts (alongside existing structured-output sanitizers)

toOpenAIResponsesMcpTools — Responses-API tools[type=mcp] entries
toAnthropicMcp — splits into mcp_servers + tools[mcp_toolset], hoists Authorization: Bearer … → bare authorization_token
toOpenAIReasoning — { effort, summary: "auto" } with budgetTokens → effort coercion
toAnthropicThinking — adaptive for 4.6/4.7, budget_tokens for older
toGeminiThinkingBudget, toXAIReasoningEffort — staged for the follow-up local-MCP work
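
As an illustration of the split that toAnthropicMcp performs, here is a rough sketch under stated assumptions. `McpServerInput`, `bearerToken`, and `splitMcpForAnthropic` are hypothetical names, and the toolset fields beyond `type` (`mcp_server_name`, `default_config`, `configs`) reflect my reading of the mcp-client-2025-11-20 shape rather than anything stated in this PR. Emitting one toolset per server even without an allowlist is also an assumption.

```typescript
// Hypothetical caller-facing shape; the SDK's real mcpServers type may differ.
interface McpServerInput {
  name: string;
  url: string;
  headers?: Record<string, string>;
  allowedTools?: string[];
}

interface AnthropicMcpServer {
  type: "url";
  name: string;
  url: string;
  authorization_token?: string;
}

// Field names here are assumptions about the beta wire format.
interface AnthropicMcpToolset {
  type: "mcp_toolset";
  mcp_server_name: string;
  default_config?: { enabled: boolean };
  configs?: Record<string, { enabled: boolean }>;
}

// Hoist "Authorization: Bearer <token>" into a bare token string.
function bearerToken(headers: Record<string, string> = {}): string | undefined {
  for (const key of Object.keys(headers)) {
    if (key.toLowerCase() !== "authorization") continue;
    const match = /^Bearer\s+(.+)$/i.exec(headers[key] ?? "");
    return match ? match[1] : undefined;
  }
  return undefined;
}

function splitMcpForAnthropic(servers: McpServerInput[]) {
  const mcp_servers: AnthropicMcpServer[] = [];
  const tools: AnthropicMcpToolset[] = [];
  for (const server of servers) {
    const entry: AnthropicMcpServer = { type: "url", name: server.name, url: server.url };
    const token = bearerToken(server.headers);
    if (token) entry.authorization_token = token;
    mcp_servers.push(entry);

    const toolset: AnthropicMcpToolset = { type: "mcp_toolset", mcp_server_name: server.name };
    if (server.allowedTools && server.allowedTools.length > 0) {
      // Allowlist: disable everything by default, then enable the named tools.
      toolset.default_config = { enabled: false };
      toolset.configs = {};
      for (const tool of server.allowedTools) toolset.configs[tool] = { enabled: true };
    }
    tools.push(toolset);
  }
  return { mcp_servers, tools };
}
```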

Surface plumbing — RequestLLMConfig (adapter), ChatRequest.config (server), DoGenerateParams (modern provider), GenerateTextParams / StreamTextParams (top-level), LLMConfig — all gain mcpServers? + reasoningEffort?. Forwarded through generate-text.ts and stream-text.ts to doGenerate() / doStream().

New AudioPart content type in UserContentPart union (passthrough wiring deferred — type is non-breaking).

runtime.response() — thin wrapper over generate(). Signature: { prompt, systemPrompt?, mcpServers?, reasoningEffort?, responseFormat?, maxTokens?, temperature? } → { text, toolCalls }.
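
To make the "thin wrapper" idea concrete, the sketch below shows the kind of mapping such a wrapper might do before delegating to generate(). The input fields mirror the signature above; `GenerateArgs` and `toGenerateArgs` are hypothetical, and the actual argument shape of generate() is not shown in this PR.

```typescript
// Mirrors the documented runtime.response() input.
interface ResponseInput {
  prompt: string;
  systemPrompt?: string;
  mcpServers?: unknown[];
  reasoningEffort?: string;
  responseFormat?: unknown;
  maxTokens?: number;
  temperature?: number;
}

// Hypothetical stand-in for whatever generate() accepts.
interface GenerateArgs {
  messages: { role: "user"; content: string }[];
  systemPrompt?: string;
  config: Omit<ResponseInput, "prompt" | "systemPrompt">;
}

function toGenerateArgs(input: ResponseInput): GenerateArgs {
  const { prompt, systemPrompt, ...config } = input;
  const args: GenerateArgs = {
    // A one-shot call has exactly one user turn.
    messages: [{ role: "user", content: prompt }],
    config,
  };
  if (systemPrompt !== undefined) args.systemPrompt = systemPrompt;
  return args;
}

const exampleArgs = toGenerateArgs({
  prompt: "Extract FAQs from the knowledge base.",
  reasoningEffort: "high",
  maxTokens: 1024,
});
```

Keeping the wrapper a pure mapping means the per-request fields take the same path as any other generate() call, which is the point of extending the existing pipeline rather than adding a parallel one.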

Demo — examples/fallback-demo adds POST /response exercising OpenAI → Anthropic fallback with KB MCP + reasoningEffort: "high" + FAQ JSON schema, mirroring the production self-learning case.

Docs (docs(llm-sdk) commit)

llm-sdk/structured-output.mdx — new "MCP servers" and "Reasoning effort" sections with per-provider mapping tables
New llm-sdk/response.mdx — full reference for the wrapper
llm-sdk/generate-text.mdx + stream-text.mdx — params updated
providers/anthropic.mdx — MCP + Extended thinking sections
providers/openai.mdx — Responses-API routing section
packages/llm-sdk/README.md — short overview block
meta.json sidebar updated

Deferred to follow-up PR

Local-MCP-execution fallback for Google/xAI/Fireworks/OpenRouter: needs a minimal MCP JSON-RPC client (~200 LOC) plus per-adapter tool interception loops (~100 LOC each). That would roughly triple this PR's size. Today these providers throw cleanly and chains route past them.
Modern provider path (providers/*/provider.ts) doGenerate() accepts the new params in its type but doesn't read them yet — only the legacy adapter path (which Runtime uses) honors them today.
AudioPart runtime passthrough in OpenAI/Google adapters — type added, wiring mechanical, isolated to a separate PR.

Compatibility

No runtime breaking changes for existing callers (every new field is optional; behavior changes only when set)
TS-level: AudioPart widens UserContentPart union — exhaustive switch(part.type) consumers need one new case
Behavior shift: store: false now sent explicitly on /v1/responses calls (was implicit true before); intentional for one-shot semantics
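
For the TS-level note, a consumer with an exhaustive switch would need something like the following. The part shapes are hypothetical (including the `"audio"` discriminator and the `format` field); only the fact that AudioPart joins the UserContentPart union comes from this PR.

```typescript
// Hypothetical part shapes for illustration only.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image"; image: string };
type AudioPart = { type: "audio"; data: string; format: string };
type UserContentPart = TextPart | ImagePart | AudioPart;

function describePart(part: UserContentPart): string {
  switch (part.type) {
    case "text":
      return `text(${part.text.length} chars)`;
    case "image":
      return "image";
    case "audio": // the one new case consumers need after this PR
      return `audio(${part.format})`;
    default: {
      // Compile-time exhaustiveness check: this assignment stops
      // type-checking if a new union member is added without a case.
      const unreachable: never = part;
      throw new Error("Unhandled part: " + JSON.stringify(unreachable));
    }
  }
}
```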

Test plan

pnpm --filter @yourgpt/llm-sdk typecheck clean
pnpm --filter @yourgpt/llm-sdk build clean
tsc --noEmit clean in examples/fallback-demo
Live smoke: POST /response against the fallback-demo with real OPENAI_API_KEY / ANTHROPIC_API_KEY and an MCP endpoint
Live smoke: equivalent call from yourgpt-core-apis refactor/fallback-models-chaining self-learning extractor once the SDK version is bumped

Adds two missing per-request config fields, `mcpServers` and `reasoningEffort`, that reuse the existing `responseFormat` sanitizers landed in PR #93 instead of forking a parallel `respond()` surface. PR #98 also targets this gap but reinvents builders for `text.format` and `output_config` already implemented in `adapters/base.ts`, and introduces a nested-shape regression on Anthropic structured output. This change supersedes #98 by extending the existing pipeline rather than bypassing it.

Adapter behavior

- OpenAI: `shouldUseResponsesApi()` now also fires when `mcpServers` or `reasoningEffort` is set, routing the call through `/v1/responses` (where these fields are valid). Tools array merges function tools + `tools[type=mcp]` entries from the new helper. `reasoning` block is emitted from `toOpenAIReasoning()`. The legacy `tool_search` meta-tool is no longer injected when the caller didn't wire deferred tools, so MCP-only / reasoning-only requests stay minimal.
- Anthropic: `buildRequestOptions()` returns `betas` alongside `options`. When `mcpServers` is set we emit the `mcp-client-2025-11-20` two-array shape (`mcp_servers` entries + separate `tools[type=mcp_toolset]` filtering) and route through `client.beta.messages.{create,stream}` so the SDK attaches the `anthropic-beta` header. `reasoningEffort` maps via `toAnthropicThinking()`: adaptive+effort for Claude 4.6/4.7, explicit `budget_tokens` for older models.
- OpenAI adapter throws a clear error when `mcpServers` or `reasoningEffort` is set on a non-OpenAI/Azure baseUrl (Google/xAI/OpenRouter all share this adapter via baseUrl). Fallback chains with `retryableErrors: () => true` skip past these providers naturally.

Helpers in `adapters/base.ts` (alongside existing structured-output sanitizers):

- `toOpenAIResponsesMcpTools`: Responses-API `tools[type=mcp]` entries
- `toAnthropicMcp`: 2025-11-20 split (`mcp_servers` + `mcp_toolset` tools) with `Authorization: Bearer …` → bare `authorization_token` mapping
- `toOpenAIReasoning`: `{ effort, summary: "auto" }`
- `toAnthropicThinking`: adaptive for 4.6/4.7, budget_tokens otherwise
- `toGeminiThinkingBudget`, `toXAIReasoningEffort`: for future local-MCP fallback work

Surfaces

- `RequestLLMConfig` (adapter), `ChatRequest.config` (server), `DoGenerateParams` (modern provider), `GenerateTextParams` (top-level), `LLMConfig`: all gain `mcpServers?` + `reasoningEffort?`.
- New `AudioPart` content type added to `UserContentPart` (passthrough wiring deferred to a follow-up).
- New `runtime.response({ prompt, mcpServers?, reasoningEffort?, responseFormat?, … })` convenience method: thin wrapper over `runtime.generate()` for the one-shot prompt-in/text-out case.

Demo

- `examples/fallback-demo` adds `POST /response` exercising the bundle across an OpenAI → Anthropic priority chain, mirroring the production self-learning extractor (KB MCP + JSON-schema FAQ output + `high` reasoning).

Deferred to follow-up PR

- Local-MCP-execution fallback for Google/xAI/Fireworks/OpenRouter (needs a minimal embedded MCP JSON-RPC client + per-adapter tool interception loop). Today these providers throw clearly; fallback chains route past them.

Verification

- `pnpm --filter @yourgpt/llm-sdk typecheck` clean
- `pnpm --filter @yourgpt/llm-sdk build` clean
- `npx tsc --noEmit` in `examples/fallback-demo` clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…sponse()

Sibling to the code change in the previous commit. Mirrors the docs pattern established by PR #93 for `responseFormat`.

- `llm-sdk/structured-output.mdx`: adds "MCP servers" and "Reasoning effort" sections with per-provider mapping tables and a brief `runtime.response()` pointer.
- New `llm-sdk/response.mdx`: dedicated reference page covering the one-shot wrapper, parameters, result shape, when to use vs generate()/chat(), provider routing under the hood, and the self-learning example. Added to `meta.json` sidebar.
- `llm-sdk/generate-text.mdx` + `stream-text.mdx`: add `mcpServers` and `reasoningEffort` to the params examples and cross-link.
- `providers/anthropic.mdx`: new sections for MCP (2025-11-20 shape + beta header attached automatically) and Extended thinking (adaptive on 4.6/4.7, budget_tokens on older; per-request override semantics).
- `providers/openai.mdx`: new section explaining MCP/reasoning routes through /v1/responses, including the `store: false` semantics.
- `packages/llm-sdk/README.md`: short mention with link to the guide.

Verification: typecheck + build still clean on llm-sdk and the fallback-demo example.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-05-13T11:31:48Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
copilot-playground	Ready	Preview, Comment	May 13, 2026 11:58am
copilot-sdk-docs	Ready	Preview, Comment	May 13, 2026 11:58am

The Anthropic SDK's `client.beta.messages.create(params, options?)` takes `betas` inside the `BetaMessageCreateParams` object. The second argument is `RequestOptions` and has no `betas` field. The previous patch passed `{ betas }` as the second arg, which the SDK silently ignored, meaning the `anthropic-beta: mcp-client-2025-11-20` header was never attached and MCP server passthrough would fail at runtime.

Spread `betas` into the params object instead. Same fix for the `.stream()` call path.

Verified against the installed SDK type definition:

    resources/beta/messages/messages.d.ts:2248
    betas?: Array<BetaAPI.AnthropicBeta>;
    (inside MessageCreateParamsBase, not RequestOptions)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
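
The shape of that fix can be sketched without the SDK. The types below are simplified stand-ins, not the Anthropic SDK's own definitions, and `buildBetaCall` is a hypothetical helper; the only point is that `betas` ends up in the first argument.

```typescript
// Simplified stand-ins for the SDK's params and RequestOptions types.
interface SketchBetaParams {
  model: string;
  max_tokens: number;
  messages: { role: "user" | "assistant"; content: string }[];
  betas?: string[];
}
interface SketchRequestOptions {
  timeout?: number;
  headers?: Record<string, string>;
}

// Returns the (params, options) pair for a beta messages call.
function buildBetaCall(
  base: Omit<SketchBetaParams, "betas">,
  betas: string[]
): [SketchBetaParams, SketchRequestOptions] {
  // betas belongs in the params object; the options argument has no such field.
  const params: SketchBetaParams = betas.length > 0 ? { ...base, betas } : { ...base };
  return [params, {}];
}

const [betaParams, betaOptions] = buildBetaCall(
  { model: "claude-opus-4-7", max_tokens: 1024, messages: [{ role: "user", content: "hi" }] },
  ["mcp-client-2025-11-20"]
);
// With the real SDK the pair would be passed as:
//   client.beta.messages.create(betaParams, betaOptions)
```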

`runtime.generate()` previously consumed the stream's `done` event but dropped its `usage` payload before constructing the `GenerateResult`, so callers had no way to read token counts off a non-streaming call. This blocked usage-based credit billing for any consumer of `runtime.response()` (which wraps `generate()` and is the recommended shape for one-shot MCP + reasoning + schema calls).

- Capture `event.usage` from the `done` event and pass it through `GenerateResultData.usage` as `{ promptTokens, completionTokens, totalTokens }`, matching `GenerateTextResult.usage` from the modern generateText() path.
- New `result.usage` getter on `GenerateResult`. Optional; undefined if the underlying provider didn't emit usage (rare).
- `runtime.response()` returns `{ text, toolCalls, usage }` instead of `{ text, toolCalls }`, which is purely additive.
- Docs updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
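
A minimal sketch of the usage capture described above, with the stream modeled as a plain array for simplicity. `SketchEvent` and `collectResult` are hypothetical, and the field names on the `done` event's `usage` payload are assumed; only the output shape `{ promptTokens, completionTokens, totalTokens }` comes from this commit.

```typescript
interface Usage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
}

// Hypothetical event shapes; the runtime's real stream events may differ.
type SketchEvent =
  | { type: "text"; delta: string }
  | { type: "done"; usage?: { promptTokens: number; completionTokens: number; totalTokens?: number } };

function collectResult(events: SketchEvent[]): { text: string; usage: Usage | undefined } {
  let text = "";
  let usage: Usage | undefined;
  for (const event of events) {
    if (event.type === "text") {
      text += event.delta;
    } else if (event.usage) {
      // Previously this payload was dropped; keep it and fill in the total.
      const { promptTokens, completionTokens } = event.usage;
      usage = {
        promptTokens,
        completionTokens,
        totalTokens: event.usage.totalTokens ?? promptTokens + completionTokens,
      };
    }
  }
  // usage stays undefined when the provider emitted none.
  return { text, usage };
}
```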

…g on capable models

Two real bugs caught by live smoke testing against the API.

1. **OpenAI: `reasoning.effort` only valid on reasoning models.** Sending `reasoning: { effort: "high" }` to `gpt-4o` returns `400 Unsupported parameter: 'reasoning.effort' is not supported with this model`. The Responses API only accepts `reasoning` on o-series / gpt-5.x. Gate via the existing `isOpenAIReasoningModel` helper: if the active model isn't a reasoning model, silently drop the `reasoning` block and warn the caller. The Responses API call still goes through (it's needed for MCP/structured-output anyway); the model just doesn't reason.

2. **Anthropic Claude 4.6/4.7: adaptive shape with effort on output_config.** The API tells you the exact shape itself:

   > "thinking.type.enabled" is not supported for this model. Use "thinking.type.adaptive" and "output_config.effort" to control thinking behavior.

   Earlier code emitted `thinking: { type: "adaptive", effort }` (the shape some pre-release docs described). Live API rejects that with `thinking.adaptive.effort: Extra inputs are not permitted`. Correct shape:

   - `thinking: { type: "adaptive" }` (no effort knob)
   - `output_config: { effort, format: {...} }` (effort lives on output_config alongside format)

   Older Claude (≤4.5) still wants `{ type: "enabled", budget_tokens }` and rejects `adaptive`. Both paths now correct.

`toAnthropicThinking` now returns a struct (`thinking` block + optional `outputConfigEffort`) so the adapter can splice each piece into the right place. The adapter merges `outputConfigEffort` into `output_config` alongside the `format` field.

Verified live (May 2026):

- POST /response on gpt-5.2 → returns valid JSON, usage{}
- POST /response/claude (dead OpenAI primary → Anthropic fallback) on claude-opus-4-7 → returns valid JSON

Demo updated to use gpt-5.2 (reasoning-capable) and adds a /response/claude route exercising the Anthropic fallback path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
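
The struct-returning approach for the Anthropic path might look roughly like this. It is a sketch, not the code in `adapters/base.ts`: `planThinking`, `supportsAdaptive`, and `buildOutputConfig` are hypothetical names, the model-id regex and the budget numbers are illustrative assumptions, and the `"low" | "medium" | "high"` effort values are assumed (the PR only shows `"high"`).

```typescript
type Effort = "low" | "medium" | "high"; // assumed value set

interface ThinkingPlan {
  thinking: { type: "adaptive" } | { type: "enabled"; budget_tokens: number };
  outputConfigEffort?: Effort;
}

// Illustrative budgets for older models; the real mapping is not in this PR.
const BUDGETS: Record<Effort, number> = { low: 1024, medium: 4096, high: 16384 };

// Assumed model-id check, e.g. "claude-opus-4-7".
function supportsAdaptive(model: string): boolean {
  return /-4-[67](-|$)/.test(model);
}

function planThinking(model: string, effort: Effort): ThinkingPlan {
  if (supportsAdaptive(model)) {
    // Adaptive thinking carries no effort knob; effort travels separately.
    return { thinking: { type: "adaptive" }, outputConfigEffort: effort };
  }
  return { thinking: { type: "enabled", budget_tokens: BUDGETS[effort] } };
}

// Splice effort into output_config as a sibling of format.
function buildOutputConfig(plan: ThinkingPlan, format?: Record<string, unknown>) {
  const out: { effort?: Effort; format?: Record<string, unknown> } = {};
  if (plan.outputConfigEffort) out.effort = plan.outputConfigEffort;
  if (format) out.format = format;
  return Object.keys(out).length > 0 ? out : undefined;
}
```

Returning both pieces from one function keeps the model check in a single place while letting the adapter put each piece where the API expects it.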

Two doc updates following live API verification:

- `structured-output.mdx` + `providers/anthropic.mdx`: Claude 4.6/4.7 adaptive thinking takes NO `effort` field inside `thinking`. Effort moves to `output_config.effort` as a sibling of `output_config.format`. Sending the previously-documented `{ type: "adaptive", effort }` to the live API returns 400. Updated tables and prose.
- OpenAI Responses reasoning is now correctly flagged as "reasoning models only": gpt-4o silently drops it with a warning.

Also adds `/response/claude` to the fallback-demo route list (the forced-Anthropic smoke route already added in the previous commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
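
The "reasoning models only" gate on the OpenAI side can be illustrated as follows. `looksLikeReasoningModel` is a hypothetical stand-in for the existing `isOpenAIReasoningModel` helper (its real matching rules are not shown in this PR), and `reasoningBlock` is likewise an illustrative name; the `{ effort, summary: "auto" }` output shape is the one listed for `toOpenAIReasoning`.

```typescript
// Assumed heuristic for "o-series / gpt-5.x"; the real helper may differ.
function looksLikeReasoningModel(model: string): boolean {
  return /^o\d/.test(model) || /^gpt-5/.test(model);
}

function reasoningBlock(
  model: string,
  effort: string | undefined,
  warn: (message: string) => void
): { effort: string; summary: "auto" } | undefined {
  if (effort === undefined) return undefined;
  if (!looksLikeReasoningModel(model)) {
    // Drop the block instead of failing: the request still goes through,
    // the model just does not reason.
    warn(`reasoningEffort ignored: ${model} is not a reasoning model`);
    return undefined;
  }
  return { effort, summary: "auto" };
}

const warnings: string[] = [];
const onReasoningModel = reasoningBlock("gpt-5.2", "high", (m) => warnings.push(m));
const onChatModel = reasoningBlock("gpt-4o", "high", (m) => warnings.push(m));
```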

…sponse() release

Incrementing the beta counter on the 2.5.1 line. Matches the existing @yourgpt/copilot-sdk@2.5.1-beta.1 numbering. Contents are additive on top of @yourgpt/llm-sdk@2.5.1-beta.0:

- MCP server passthrough (OpenAI Responses + Anthropic mcp-client-2025-11-20)
- reasoningEffort per-request param (live-verified shapes for both providers)
- AudioPart content type
- runtime.response() ergonomic wrapper with usage in the return

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Sahil5963 and others added 2 commits May 13, 2026 16:50


Sahil5963 wants to merge 7 commits into beta from feat/responses-api-v2
