F4b-Anthropic — native structured output via forced tool_use#104
Closed
fxspeiser wants to merge 3 commits into
Closed
F4b-Anthropic — native structured output via forced tool_use#104fxspeiser wants to merge 3 commits into
fxspeiser wants to merge 3 commits into
Conversation
…_schema).
Defense-in-depth complement to F4c (tolerant JSON parser) + F4a (raised
pick token budget) + F4d (scoring_errors[] in pick envelopes). When the
provider/model combo supports native structured output, requestStructured
now constrains the response shape at the API level instead of relying on
the model obeying "return JSON only" — which is what failed for gpt-5 in
the 2026-06-07 incident (prose-prefixed JSON).
What lands
src/providers/types.ts:
SendArgs gains `jsonSchema?: Record<string, unknown>`. Documented as
optional — adapters that don't support native structured output
silently ignore the field, and the schema-in-prompt + tolerant
parser path (F4c) remains the universal fallback. Mirror of OpenAI's
json_schema spec; downstream adapters normalise to their own
provider's native shape.
src/providers/openai-compatible.ts:
- OpenAIRequestBody extended with optional `response_format` of
shape {type: "json_schema", json_schema: {name, schema, strict}}.
- buildOpenAICompatibleRequest accepts `jsonSchema` and sets
response_format when supportsNativeJsonSchema is true.
- NEW supportsNativeJsonSchema(provider, model): conservative
whitelist. Returns true for `openai` + non-o1 models. False for
o1-* (which historically rejected response_format on
chat/completions). False for xai/mistral/groq/deepseek (we haven't
validated those endpoints; follow-up PRs can add each as the live
API is verified).
- NEW exported OPENAI_COMPAT_NATIVE_STRUCTURED set (currently {"openai"}).
- sendOpenAICompatible threads args.jsonSchema into the builder.
src/core/structured.ts:
- askOne opts gains `jsonSchema?` and passes through to provider.send.
- RequestStructuredOptions gains `useNativeStructured?: boolean`
(default true). Operators can flip this off for A/B comparison.
- requestStructured passes the schema down on every attempt
(including the validation-failure retry — defense-in-depth doesn't
degrade across retries).
What does NOT change
- Anthropic + Gemini adapters: untouched. They have their own native
structured-output mechanisms (tool_use input_schema for Anthropic;
responseSchema / responseMimeType for Gemini) but each is a separate
surgical change with its own translation layer. Tagged as F4b
follow-ups; this PR is intentionally OpenAI-only so the incident's
primary symptom (gpt-5 prose-prefix JSON) is fixed without bundling
the larger Gemini/Anthropic translations.
- Schema-in-prompt path: still rides on every requestStructured call.
Providers that don't support native mode rely on it; providers that
do get belt-and-suspenders.
- Tolerant parser (F4c): still the safety net for any miss.
Native unit tests (+17)
test/providers/openai-native-json.test.ts (12):
builder (7): openai + gpt-4o-mini → response_format set with
strict=true; openai + gpt-5 → set (reasoning model, not o1
family); openai + o1-mini → NOT set; openai + o1 → NOT set;
deepseek → NOT set; xai → NOT set; openai without jsonSchema →
response_format undefined (baseline regression guard).
supportsNativeJsonSchema (5): openai chat models → true; openai
o1 family → false; non-whitelist providers → false; provider
name lookup case-insensitive; whitelist set has exactly one
entry currently.
test/core/structured-native.test.ts (5):
- default (useNativeStructured unset) → schema passes through to
Provider.send
- useNativeStructured: true → passes through
- useNativeStructured: false → NOT sent
- schema-in-prompt still appended on every call (defense-in-depth
keeps the prose path)
- validation-failure retry: jsonSchema rides on every attempt
Scoreboard
TS: 1,305 / 1,305
typecheck + build: clean
Why this is the last F4 PR
F4d (scoring_errors[]) + F4c (tolerant parser) + F4a (token budget)
shipped via #97 and address the regression's user-visible symptoms.
F4b is defense-in-depth — when the model itself constrains its
output shape via the API rather than via prompt-following, we don't
depend on the model's discipline. Pulling Anthropic + Gemini into
the same surface is a separate, larger PR each (different native
mechanisms; non-trivial schema translation).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…Type. Second slice of the F4b family (after #102 shipped OpenAI). Same shape: when SendArgs.jsonSchema is set, the Gemini adapter wires generationConfig.responseMimeType="application/json" plus a responseSchema translated from the caller's JSON Schema into Gemini's OpenAPI-subset Schema proto. What lands src/providers/gemini.ts: - GeminiRequestBody.generationConfig gains optional responseMimeType ("application/json") and responseSchema (Record<string,unknown>). - buildGeminiRequest accepts opts.jsonSchema. When supplied, body gets responseMimeType + responseSchema = jsonSchemaToGeminiSchema(...). - sendGemini threads args.jsonSchema into the builder. NEW jsonSchemaToGeminiSchema translator: - Upper-cases `type` (STRING/NUMBER/INTEGER/BOOLEAN/ARRAY/OBJECT) — Gemini's Schema proto rejects lowercase JSON Schema type names. - Recurses into properties + items. - Passes through Gemini-supported keywords: description, enum, format, nullable, required, minItems, maxItems, minimum, maximum. - Handles type-arrays (["string","null"]) → STRING + nullable:true. - Drops $schema and additionalProperties (Gemini-unsupported). - Never throws on degenerate input; degraded translation paths emit a relaxed superset and rely on F4c's tolerant parser + downstream validateSchema as the safety net. test/providers/gemini-native-json.test.ts (11 tests): builder (3): responseSchema attached when jsonSchema supplied; omitted in baseline; body remains JSON-serializable. translator (8): primitive type uppercasing; recursion into properties + items; passthrough of required/enum/format/description; type-array collapse to nullable; $schema and additionalProperties dropped; deeply nested schemas (3 levels); degenerate input doesn't throw. Scoreboard TS: 1,316 / 1,316 (+11 over PR #102's 1,305) typecheck + build: clean Defense-in-depth chain now covers OpenAI via response_format: json_schema (PR #102) Gemini via responseSchema + responseMimeType (this PR) Anthropic via tool_use input_schema (follow-up) Universal schema-in-prompt + tolerant parser (F4c) (already in main) The translation layer is intentionally permissive: when a passed schema uses constructs Gemini's Schema proto doesn't support, we strip the offending keyword and emit a looser shape. The model may then return a slightly richer-than-intended object, which validateSchema catches at the parse/validation boundary. Bias toward "native output mostly works" + parse-side defense rather than "fail the call because the schema can't be perfectly translated." Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Third and final slice of the F4b family (#102 OpenAI, #103 Gemini, this PR Anthropic). When SendArgs.jsonSchema is set, the Anthropic adapter forces the model to emit its result through a tool_use block — closing the universal defense-in-depth coverage. What lands src/providers/anthropic.ts: Request wiring (build): - AnthropicRequestBody gains optional tools[] and tool_choice fields. Tools is a single-element array; tool_choice pins the model to call exactly that tool. - NEW exported constant ANTHROPIC_STRUCTURED_TOOL_NAME = "structured_output". Same name on both sides of the wire so the parser can match deterministically. - buildAnthropicRequest accepts opts.jsonSchema. When set, the body grows a tools entry with the schema as input_schema and a tool_choice forcing that tool. Anthropic accepts JSON Schema directly, so unlike Gemini there is NO translation layer — the schema flows through verbatim. Response wiring (parse): - parseAnthropicResponse now distinguishes block types in the content array. tool_use blocks whose name matches ANTHROPIC_STRUCTURED_TOOL_NAME have their `input` field captured separately from text blocks. - When a matching tool_use block is present, its JSON-stringified input replaces the concatenated text payload. This keeps the downstream extractJson + validateSchema path entirely unaware of which transport carried the structured value — same parse → same validation → same envelope shape. - Non-matching tool_use blocks (e.g. worker_tools' fetch/verify in a different ReAct context) flow through untouched; only the structured_output tool name triggers the surface change. - Plain text responses keep working unchanged — baseline regression guard tested. src/providers/anthropic.ts (sendAnthropic): Threads args.jsonSchema into the builder. Tests (+9) test/providers/anthropic-native-json.test.ts: build (4): tools + tool_choice attached on jsonSchema; omitted otherwise; no schema translation (rich JSON Schema flows verbatim into input_schema, including $schema and additionalProperties:false); JSON-serializable round trip. parse (5): tool_use input → JSON-stringified text; tool_use trumps prose text block (forced-tool semantics); only the structured_output tool name is captured (other tool names fall through); plain text still works; array values JSON-stringify correctly. Scoreboard TS: 1,325 / 1,325 (+9 over #103's 1,316) typecheck + build: clean F4b coverage complete OpenAI response_format: {type: "json_schema", strict: true} PR #102 Gemini generationConfig.{responseSchema, responseMimeType} PR #103 Anthropic tools[] + tool_choice: forced tool_use THIS PR Universal schema-in-prompt + tolerant parser (F4c) main Every supported provider now has a native channel for structured output. SendArgs.jsonSchema is the single switch the caller flips; each adapter normalises to its own native shape and the universal parse path catches any miss. Schema-in-prompt + F4c's tolerant parser remain the safety net for providers we don't yet have native mode wired for (xai/mistral/groq/deepseek under openai-compat). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
47fa549 to
74a5c70
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Third and final slice of the F4b family (#102 OpenAI, #103 Gemini, this PR Anthropic). When `SendArgs.jsonSchema` is set, the Anthropic adapter forces the model to emit its result through a `tool_use` block — closing the universal defense-in-depth coverage.
Base branch: `feature/f4b-gemini-native-structured` (#103) which itself stacks on `feature/f4b-native-structured-output` (#102). Merge in order #102 → #103 → this.
What lands
`src/providers/anthropic.ts` — build side
`src/providers/anthropic.ts` — parse side
Tests (+9)
Scoreboard
F4b coverage now complete
Every supported provider has a native channel. `SendArgs.jsonSchema` is the single switch the caller flips; each adapter normalises to its own native shape; the universal parse path catches any miss.
Test plan
🤖 Generated with Claude Code