Skip to content

feat(llm-sdk): MCP servers + reasoning effort + runtime.response() (supersedes #98)#99

Open
Sahil5963 wants to merge 7 commits into
betafrom
feat/responses-api-v2
Open

feat(llm-sdk): MCP servers + reasoning effort + runtime.response() (supersedes #98)#99
Sahil5963 wants to merge 7 commits into
betafrom
feat/responses-api-v2

Conversation

@Sahil5963
Copy link
Copy Markdown
Contributor

Summary

Adds two missing per-request config fields — mcpServers and reasoningEffort — plus a runtime.response() ergonomic wrapper, by extending the existing pipeline from #93 rather than forking a parallel surface.

Supersedes #98. That PR also targets this gap but reinvents structured-output builders (text.format, output_config) that already live in adapters/base.ts, and introduces a nested-shape regression on Anthropic's output_config that would cause JSON.parse failures when the fallback chain hits Claude. This PR uses the existing sanitizers from #93 directly so those bugs disappear by construction.

Driving use case: the yourgpt-core-apis refactor/fallback-models-chaining branch's self-learning extractor (ChatbotJobController.js:1268-1285) needs MCP-knowledgebase + JSON-schema + high reasoning in a single fallback-chain call.

What's in

Adapter changes

  • OpenAIshouldUseResponsesApi() now also fires when mcpServers or reasoningEffort is set, routing through /v1/responses (the only OpenAI endpoint that accepts these fields). Function tools + tools[type=mcp] entries merged; reasoning block emitted; store: false for one-shot semantics. tool_search meta-tool no longer injected when no native tools are wired. User role is "user" — not "developer" — for user prompts (fixes the bug from feat(llm-sdk): Responses API — MCP tools, reasoning & structured output #98).
  • AnthropicbuildRequestOptions() now returns {options, messages, betas}. When MCP is set, emits the mcp-client-2025-11-20 two-array shape (mcp_servers + separate tools[type=mcp_toolset] for allowedTools), routed through client.beta.messages.* so the SDK attaches the beta header. reasoningEffort maps via toAnthropicThinking()adaptive thinking on Claude 4.6/4.7, explicit budget_tokens on older models.
  • Non-OpenAI/Azure baseUrls (Google/xAI/OpenRouter share the OpenAI adapter via baseUrl) — throw a clear error when MCP/reasoning is set, so fallback chains with retryableErrors: () => true skip past them naturally.

New translators in adapters/base.ts (alongside existing structured-output sanitizers)

  • toOpenAIResponsesMcpTools — Responses-API tools[type=mcp] entries
  • toAnthropicMcp — splits into mcp_servers + tools[mcp_toolset], hoists Authorization: Bearer … → bare authorization_token
  • toOpenAIReasoning{ effort, summary: "auto" } with budgetTokens → effort coercion
  • toAnthropicThinking — adaptive for 4.6/4.7, budget_tokens for older
  • toGeminiThinkingBudget, toXAIReasoningEffort — staged for the follow-up local-MCP work

Surface plumbingRequestLLMConfig (adapter), ChatRequest.config (server), DoGenerateParams (modern provider), GenerateTextParams / StreamTextParams (top-level), LLMConfig — all gain mcpServers? + reasoningEffort?. Forwarded through generate-text.ts and stream-text.ts to doGenerate() / doStream().

New AudioPart content type in UserContentPart union (passthrough wiring deferred — type is non-breaking).

runtime.response() — thin wrapper over generate(). Signature: { prompt, systemPrompt?, mcpServers?, reasoningEffort?, responseFormat?, maxTokens?, temperature? } → { text, toolCalls }.

Demoexamples/fallback-demo adds POST /response exercising OpenAI → Anthropic fallback with KB MCP + reasoningEffort: "high" + FAQ JSON schema, mirroring the production self-learning case.

Docs (docs(llm-sdk) commit)

  • llm-sdk/structured-output.mdx — new "MCP servers" and "Reasoning effort" sections with per-provider mapping tables
  • New llm-sdk/response.mdx — full reference for the wrapper
  • llm-sdk/generate-text.mdx + stream-text.mdx — params updated
  • providers/anthropic.mdx — MCP + Extended thinking sections
  • providers/openai.mdx — Responses-API routing section
  • packages/llm-sdk/README.md — short overview block
  • meta.json sidebar updated

Deferred to follow-up PR

  1. Local-MCP-execution fallback for Google/xAI/Fireworks/OpenRouter — needs a minimal MCP JSON-RPC client (~200 LOC) + per-adapter tool interception loops (~100 LOC each). Would 3× this PR's size. Today these providers throw cleanly and chains route past them.
  2. Modern provider path (providers/*/provider.ts) doGenerate() accepts the new params in its type but doesn't read them yet — only the legacy adapter path (which Runtime uses) honors them today.
  3. AudioPart runtime passthrough in OpenAI/Google adapters — type added, wiring mechanical, isolated to a separate PR.

Compatibility

  • No runtime breaking changes for existing callers (every new field is optional; behavior changes only when set)
  • TS-level: AudioPart widens UserContentPart union — exhaustive switch(part.type) consumers need one new case
  • Behavior shift: store: false now sent explicitly on /v1/responses calls (was implicit true before); intentional for one-shot semantics

Test plan

  • pnpm --filter @yourgpt/llm-sdk typecheck clean
  • pnpm --filter @yourgpt/llm-sdk build clean
  • tsc --noEmit clean in examples/fallback-demo
  • Live smoke: POST /response against the fallback-demo with real OPENAI_API_KEY / ANTHROPIC_API_KEY and an MCP endpoint
  • Live smoke: equivalent call from yourgpt-core-apis refactor/fallback-models-chaining self-learning extractor once the SDK version is bumped

Related

🤖 Generated with Claude Code

Sahil5963 and others added 2 commits May 13, 2026 16:50
Adds two missing per-request config fields — `mcpServers` and
`reasoningEffort` — that reuse the existing `responseFormat` sanitizers
landed in PR #93 instead of forking a parallel `respond()` surface.

PR #98 also targets this gap but reinvents builders for `text.format`
and `output_config` already implemented in `adapters/base.ts`, and
introduces a nested-shape regression on Anthropic structured output.
This change supersedes #98 by extending the existing pipeline rather
than bypassing it.

Adapter behavior
- OpenAI: `shouldUseResponsesApi()` now also fires when `mcpServers` or
  `reasoningEffort` is set, routing the call through `/v1/responses`
  (where these fields are valid). Tools array merges function tools +
  `tools[type=mcp]` entries from the new helper. `reasoning` block is
  emitted from `toOpenAIReasoning()`. The legacy `tool_search` meta-tool
  is no longer injected when the caller didn't wire deferred tools, so
  MCP-only / reasoning-only requests stay minimal.
- Anthropic: `buildRequestOptions()` returns `betas` alongside `options`.
  When `mcpServers` is set we emit the `mcp-client-2025-11-20` two-array
  shape (`mcp_servers` entries + separate `tools[type=mcp_toolset]`
  filtering) and route through `client.beta.messages.{create,stream}` so
  the SDK attaches the `anthropic-beta` header. `reasoningEffort` maps
  via `toAnthropicThinking()` — adaptive+effort for Claude 4.6/4.7,
  explicit `budget_tokens` for older models.
- OpenAI adapter throws a clear error when `mcpServers` or
  `reasoningEffort` is set on a non-OpenAI/Azure baseUrl (Google/xAI/
  OpenRouter all share this adapter via baseUrl). Fallback chains with
  `retryableErrors: () => true` skip past these providers naturally.

Helpers in `adapters/base.ts` (alongside existing structured-output
sanitizers):
- `toOpenAIResponsesMcpTools` — Responses-API `tools[type=mcp]` entries
- `toAnthropicMcp` — 2025-11-20 split (`mcp_servers` + `mcp_toolset` tools)
  with `Authorization: Bearer …` → bare `authorization_token` mapping
- `toOpenAIReasoning` — `{ effort, summary: "auto" }`
- `toAnthropicThinking` — adaptive for 4.6/4.7, budget_tokens otherwise
- `toGeminiThinkingBudget`, `toXAIReasoningEffort` — for future local-MCP
  fallback work

Surfaces
- `RequestLLMConfig` (adapter), `ChatRequest.config` (server),
  `DoGenerateParams` (modern provider), `GenerateTextParams` (top-level),
  `LLMConfig` — all gain `mcpServers?` + `reasoningEffort?`.
- New `AudioPart` content type added to `UserContentPart` (passthrough
  wiring deferred to a follow-up).
- New `runtime.response({ prompt, mcpServers?, reasoningEffort?,
  responseFormat?, … })` convenience method — thin wrapper over
  `runtime.generate()` for the one-shot prompt-in/text-out case.

Demo
- `examples/fallback-demo` adds `POST /response` exercising the bundle
  across an OpenAI → Anthropic priority chain, mirroring the production
  self-learning extractor (KB MCP + JSON-schema FAQ output + `high`
  reasoning).

Deferred to follow-up PR
- Local-MCP-execution fallback for Google/xAI/Fireworks/OpenRouter
  (needs a minimal embedded MCP JSON-RPC client + per-adapter tool
  interception loop). Today these providers throw clearly; fallback
  chains route past them.

Verification
- `pnpm --filter @yourgpt/llm-sdk typecheck` clean
- `pnpm --filter @yourgpt/llm-sdk build` clean
- `npx tsc --noEmit` in `examples/fallback-demo` clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sponse()

Sibling to the code change in the previous commit. Mirrors the docs
pattern established by PR #93 for `responseFormat`.

- `llm-sdk/structured-output.mdx` — adds "MCP servers" and "Reasoning
  effort" sections with per-provider mapping tables and a brief
  `runtime.response()` pointer.
- New `llm-sdk/response.mdx` — dedicated reference page covering the
  one-shot wrapper, parameters, result shape, when to use vs
  generate()/chat(), provider routing under the hood, and the
  self-learning example. Added to `meta.json` sidebar.
- `llm-sdk/generate-text.mdx` + `stream-text.mdx` — add `mcpServers`
  and `reasoningEffort` to the params examples and cross-link.
- `providers/anthropic.mdx` — new sections for MCP (2025-11-20 shape +
  beta header attached automatically) and Extended thinking (adaptive
  on 4.6/4.7, budget_tokens on older; per-request override semantics).
- `providers/openai.mdx` — new section explaining MCP/reasoning routes
  through /v1/responses, including the `store: false` semantics.
- `packages/llm-sdk/README.md` — short mention with link to the guide.

Verification: typecheck + build still clean on llm-sdk and the
fallback-demo example.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented May 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
copilot-playground Ready Ready Preview, Comment May 13, 2026 11:58am
copilot-sdk-docs Ready Ready Preview, Comment May 13, 2026 11:58am

Request Review

The Anthropic SDK's `client.beta.messages.create(params, options?)`
takes `betas` inside the `BetaMessageCreateParams` object — the second
argument is `RequestOptions` and has no `betas` field. The previous
patch passed `{ betas }` as the second arg, which the SDK silently
ignored, meaning the `anthropic-beta: mcp-client-2025-11-20` header
was never attached and MCP server passthrough would fail at runtime.

Spread `betas` into the params object instead. Same fix for the
`.stream()` call path.

Verified against the installed SDK type definition:
  resources/beta/messages/messages.d.ts:2248
    betas?: Array<BetaAPI.AnthropicBeta>;
  (inside MessageCreateParamsBase, not RequestOptions)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`runtime.generate()` previously consumed the stream's `done` event but
dropped its `usage` payload before constructing the `GenerateResult`,
so callers had no way to read token counts off a non-streaming call.
This blocked usage-based credit billing for any consumer of
`runtime.response()` (which wraps `generate()` and is the recommended
shape for one-shot MCP + reasoning + schema calls).

- Capture `event.usage` from the `done` event and pass it through
  `GenerateResultData.usage` as `{ promptTokens, completionTokens,
  totalTokens }` — matches `GenerateTextResult.usage` from the modern
  generateText() path.
- New `result.usage` getter on `GenerateResult`. Optional; undefined
  if the underlying provider didn't emit usage (rare).
- `runtime.response()` returns `{ text, toolCalls, usage }` instead of
  `{ text, toolCalls }` — purely additive.
- Docs updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…g on capable models

Two real bugs caught by live smoke testing against the API.

1. **OpenAI: `reasoning.effort` only valid on reasoning models.**
   Sending `reasoning: { effort: "high" }` to `gpt-4o` returns
   `400 Unsupported parameter: 'reasoning.effort' is not supported with
   this model`. The Responses API only accepts `reasoning` on
   o-series / gpt-5.x. Gate via the existing `isOpenAIReasoningModel`
   helper — if the active model isn't a reasoning model, silently drop
   the `reasoning` block and warn the caller. The Responses API call
   still goes through (it's needed for MCP/structured-output anyway);
   the model just doesn't reason.

2. **Anthropic Claude 4.6/4.7: adaptive shape with effort on output_config.**
   The API tells you the exact shape itself:
     > "thinking.type.enabled" is not supported for this model.
     > Use "thinking.type.adaptive" and "output_config.effort" to control
     > thinking behavior.
   Earlier code emitted `thinking: { type: "adaptive", effort }` (the
   shape some pre-release docs described). Live API rejects that with
   `thinking.adaptive.effort: Extra inputs are not permitted`.
   Correct shape:
     `thinking: { type: "adaptive" }` (no effort knob)
     `output_config: { effort, format: {...} }` (effort lives on
       output_config alongside format)
   Older Claude (≤4.5) still wants `{ type: "enabled", budget_tokens }`
   and rejects `adaptive`. Both paths now correct.

`toAnthropicThinking` now returns a struct (`thinking` block + optional
`outputConfigEffort`) so the adapter can splice each piece into the
right place. The adapter merges `outputConfigEffort` into `output_config`
alongside the `format` field.

Verified live (May 2026):
  - POST /response on gpt-5.2 → returns valid JSON, usage{}
  - POST /response/claude (dead OpenAI primary → Anthropic fallback) on
    claude-opus-4-7 → returns valid JSON

Demo updated to use gpt-5.2 (reasoning-capable) and adds a
/response/claude route exercising the Anthropic fallback path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two doc updates following live API verification:

- `structured-output.mdx` + `providers/anthropic.mdx`: Claude 4.6/4.7
  adaptive thinking takes NO `effort` field inside `thinking`. Effort
  moves to `output_config.effort` as a sibling of `output_config.format`.
  Sending the previously-documented `{ type: "adaptive", effort }` to
  the live API returns 400. Updated tables and prose.
- OpenAI Responses reasoning is now correctly flagged as "reasoning
  models only" — gpt-4o silently drops it with a warning.

Also adds `/response/claude` to the fallback-demo route list (the
forced-Anthropic smoke route already added in the previous commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sponse() release

Incrementing the beta counter on the 2.5.1 line. Matches the existing
@yourgpt/copilot-sdk@2.5.1-beta.1 numbering. Contents are additive on
top of @yourgpt/llm-sdk@2.5.1-beta.0:
  - MCP server passthrough (OpenAI Responses + Anthropic mcp-client-2025-11-20)
  - reasoningEffort per-request param (live-verified shapes for both providers)
  - AudioPart content type
  - runtime.response() ergonomic wrapper with usage in the return

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant