F4b-Anthropic — native structured output via forced tool_use by fxspeiser · Pull Request #104 · fxspeiser/crosscheck-agent

fxspeiser · 2026-06-07T21:29:53Z

Summary

Third and final slice of the F4b family (#102 OpenAI, #103 Gemini, this PR Anthropic). When `SendArgs.jsonSchema` is set, the Anthropic adapter forces the model to emit its result through a `tool_use` block — closing the universal defense-in-depth coverage.

Base branch: `feature/f4b-gemini-native-structured` (#103) which itself stacks on `feature/f4b-native-structured-output` (#102). Merge in order #102 → #103 → this.

What lands

`src/providers/anthropic.ts` — build side

`AnthropicRequestBody` gains optional `tools[]` and `tool_choice`
New exported `ANTHROPIC_STRUCTURED_TOOL_NAME = "structured_output"` (same name on both wire sides for deterministic match)
`buildAnthropicRequest` accepts `opts.jsonSchema`. When set, body grows a single-tool spec with the schema as `input_schema` and `tool_choice` pinning that tool
No schema translation — Anthropic accepts JSON Schema directly (unlike Gemini's OpenAPI-subset)

`src/providers/anthropic.ts` — parse side

`parseAnthropicResponse` now distinguishes block types in the `content` array
`tool_use` blocks whose `name === ANTHROPIC_STRUCTURED_TOOL_NAME` have their `input` field captured
When a matching tool_use block is present, its JSON-stringified input replaces the concatenated text payload — downstream `extractJson + validateSchema` works unchanged
Non-matching tool_use blocks (e.g. worker_tools' fetch/verify ReAct) flow through untouched
Plain text responses keep working — baseline regression guard tested

Tests (+9)

build (4): tools + tool_choice attached when schema present; omitted in baseline; no schema translation (rich JSON Schema verbatim including `$schema` and `additionalProperties: false`); JSON-serializable round trip
parse (5): tool_use input → JSON-stringified text; tool_use trumps prose text block (forced-tool semantics); only the structured_output tool name is captured (other tool names fall through to text path); plain text baseline; array values stringify correctly

Scoreboard

TS: 1,325 / 1,325 (+9 over F4b-Gemini — native structured output via responseSchema + JSON Schema translator #103's 1,316)
typecheck + build: clean

F4b coverage now complete

Provider	Mechanism	PR
OpenAI	`response_format: {type: "json_schema", strict: true}`	#102
Gemini	`generationConfig.{responseSchema, responseMimeType}`	#103
Anthropic	`tools[] + tool_choice: forced tool_use`	this PR
Universal	schema-in-prompt + tolerant parser (F4c)	main

Every supported provider has a native channel. `SendArgs.jsonSchema` is the single switch the caller flips; each adapter normalises to its own native shape; the universal parse path catches any miss.

Test plan

`npm test` — 75 files, 1,325 tests passing
`npm run build` — clean

🤖 Generated with Claude Code

…_schema). Defense-in-depth complement to F4c (tolerant JSON parser) + F4a (raised pick token budget) + F4d (scoring_errors[] in pick envelopes). When the provider/model combo supports native structured output, requestStructured now constrains the response shape at the API level instead of relying on the model obeying "return JSON only" — which is what failed for gpt-5 in the 2026-06-07 incident (prose-prefixed JSON). What lands src/providers/types.ts: SendArgs gains `jsonSchema?: Record<string, unknown>`. Documented as optional — adapters that don't support native structured output silently ignore the field, and the schema-in-prompt + tolerant parser path (F4c) remains the universal fallback. Mirror of OpenAI's json_schema spec; downstream adapters normalise to their own provider's native shape. src/providers/openai-compatible.ts: - OpenAIRequestBody extended with optional `response_format` of shape {type: "json_schema", json_schema: {name, schema, strict}}. - buildOpenAICompatibleRequest accepts `jsonSchema` and sets response_format when supportsNativeJsonSchema is true. - NEW supportsNativeJsonSchema(provider, model): conservative whitelist. Returns true for `openai` + non-o1 models. False for o1-* (which historically rejected response_format on chat/completions). False for xai/mistral/groq/deepseek (we haven't validated those endpoints; follow-up PRs can add each as the live API is verified). - NEW exported OPENAI_COMPAT_NATIVE_STRUCTURED set (currently {"openai"}). - sendOpenAICompatible threads args.jsonSchema into the builder. src/core/structured.ts: - askOne opts gains `jsonSchema?` and passes through to provider.send. - RequestStructuredOptions gains `useNativeStructured?: boolean` (default true). Operators can flip this off for A/B comparison. - requestStructured passes the schema down on every attempt (including the validation-failure retry — defense-in-depth doesn't degrade across retries). What does NOT change - Anthropic + Gemini adapters: untouched. They have their own native structured-output mechanisms (tool_use input_schema for Anthropic; responseSchema / responseMimeType for Gemini) but each is a separate surgical change with its own translation layer. Tagged as F4b follow-ups; this PR is intentionally OpenAI-only so the incident's primary symptom (gpt-5 prose-prefix JSON) is fixed without bundling the larger Gemini/Anthropic translations. - Schema-in-prompt path: still rides on every requestStructured call. Providers that don't support native mode rely on it; providers that do get belt-and-suspenders. - Tolerant parser (F4c): still the safety net for any miss. Native unit tests (+17) test/providers/openai-native-json.test.ts (12): builder (7): openai + gpt-4o-mini → response_format set with strict=true; openai + gpt-5 → set (reasoning model, not o1 family); openai + o1-mini → NOT set; openai + o1 → NOT set; deepseek → NOT set; xai → NOT set; openai without jsonSchema → response_format undefined (baseline regression guard). supportsNativeJsonSchema (5): openai chat models → true; openai o1 family → false; non-whitelist providers → false; provider name lookup case-insensitive; whitelist set has exactly one entry currently. test/core/structured-native.test.ts (5): - default (useNativeStructured unset) → schema passes through to Provider.send - useNativeStructured: true → passes through - useNativeStructured: false → NOT sent - schema-in-prompt still appended on every call (defense-in-depth keeps the prose path) - validation-failure retry: jsonSchema rides on every attempt Scoreboard TS: 1,305 / 1,305 typecheck + build: clean Why this is the last F4 PR F4d (scoring_errors[]) + F4c (tolerant parser) + F4a (token budget) shipped via #97 and address the regression's user-visible symptoms. F4b is defense-in-depth — when the model itself constrains its output shape via the API rather than via prompt-following, we don't depend on the model's discipline. Pulling Anthropic + Gemini into the same surface is a separate, larger PR each (different native mechanisms; non-trivial schema translation). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…Type. Second slice of the F4b family (after #102 shipped OpenAI). Same shape: when SendArgs.jsonSchema is set, the Gemini adapter wires generationConfig.responseMimeType="application/json" plus a responseSchema translated from the caller's JSON Schema into Gemini's OpenAPI-subset Schema proto. What lands src/providers/gemini.ts: - GeminiRequestBody.generationConfig gains optional responseMimeType ("application/json") and responseSchema (Record<string,unknown>). - buildGeminiRequest accepts opts.jsonSchema. When supplied, body gets responseMimeType + responseSchema = jsonSchemaToGeminiSchema(...). - sendGemini threads args.jsonSchema into the builder. NEW jsonSchemaToGeminiSchema translator: - Upper-cases `type` (STRING/NUMBER/INTEGER/BOOLEAN/ARRAY/OBJECT) — Gemini's Schema proto rejects lowercase JSON Schema type names. - Recurses into properties + items. - Passes through Gemini-supported keywords: description, enum, format, nullable, required, minItems, maxItems, minimum, maximum. - Handles type-arrays (["string","null"]) → STRING + nullable:true. - Drops $schema and additionalProperties (Gemini-unsupported). - Never throws on degenerate input; degraded translation paths emit a relaxed superset and rely on F4c's tolerant parser + downstream validateSchema as the safety net. test/providers/gemini-native-json.test.ts (11 tests): builder (3): responseSchema attached when jsonSchema supplied; omitted in baseline; body remains JSON-serializable. translator (8): primitive type uppercasing; recursion into properties + items; passthrough of required/enum/format/description; type-array collapse to nullable; $schema and additionalProperties dropped; deeply nested schemas (3 levels); degenerate input doesn't throw. Scoreboard TS: 1,316 / 1,316 (+11 over PR #102's 1,305) typecheck + build: clean Defense-in-depth chain now covers OpenAI via response_format: json_schema (PR #102) Gemini via responseSchema + responseMimeType (this PR) Anthropic via tool_use input_schema (follow-up) Universal schema-in-prompt + tolerant parser (F4c) (already in main) The translation layer is intentionally permissive: when a passed schema uses constructs Gemini's Schema proto doesn't support, we strip the offending keyword and emit a looser shape. The model may then return a slightly richer-than-intended object, which validateSchema catches at the parse/validation boundary. Bias toward "native output mostly works" + parse-side defense rather than "fail the call because the schema can't be perfectly translated." Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Third and final slice of the F4b family (#102 OpenAI, #103 Gemini, this PR Anthropic). When SendArgs.jsonSchema is set, the Anthropic adapter forces the model to emit its result through a tool_use block — closing the universal defense-in-depth coverage. What lands src/providers/anthropic.ts: Request wiring (build): - AnthropicRequestBody gains optional tools[] and tool_choice fields. Tools is a single-element array; tool_choice pins the model to call exactly that tool. - NEW exported constant ANTHROPIC_STRUCTURED_TOOL_NAME = "structured_output". Same name on both sides of the wire so the parser can match deterministically. - buildAnthropicRequest accepts opts.jsonSchema. When set, the body grows a tools entry with the schema as input_schema and a tool_choice forcing that tool. Anthropic accepts JSON Schema directly, so unlike Gemini there is NO translation layer — the schema flows through verbatim. Response wiring (parse): - parseAnthropicResponse now distinguishes block types in the content array. tool_use blocks whose name matches ANTHROPIC_STRUCTURED_TOOL_NAME have their `input` field captured separately from text blocks. - When a matching tool_use block is present, its JSON-stringified input replaces the concatenated text payload. This keeps the downstream extractJson + validateSchema path entirely unaware of which transport carried the structured value — same parse → same validation → same envelope shape. - Non-matching tool_use blocks (e.g. worker_tools' fetch/verify in a different ReAct context) flow through untouched; only the structured_output tool name triggers the surface change. - Plain text responses keep working unchanged — baseline regression guard tested. src/providers/anthropic.ts (sendAnthropic): Threads args.jsonSchema into the builder. Tests (+9) test/providers/anthropic-native-json.test.ts: build (4): tools + tool_choice attached on jsonSchema; omitted otherwise; no schema translation (rich JSON Schema flows verbatim into input_schema, including $schema and additionalProperties:false); JSON-serializable round trip. parse (5): tool_use input → JSON-stringified text; tool_use trumps prose text block (forced-tool semantics); only the structured_output tool name is captured (other tool names fall through); plain text still works; array values JSON-stringify correctly. Scoreboard TS: 1,325 / 1,325 (+9 over #103's 1,316) typecheck + build: clean F4b coverage complete OpenAI response_format: {type: "json_schema", strict: true} PR #102 Gemini generationConfig.{responseSchema, responseMimeType} PR #103 Anthropic tools[] + tool_choice: forced tool_use THIS PR Universal schema-in-prompt + tolerant parser (F4c) main Every supported provider now has a native channel for structured output. SendArgs.jsonSchema is the single switch the caller flips; each adapter normalises to its own native shape and the universal parse path catches any miss. Schema-in-prompt + F4c's tolerant parser remain the safety net for providers we don't yet have native mode wired for (xai/mistral/groq/deepseek under openai-compat). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

fxspeiser and others added 3 commits June 7, 2026 17:11

fxspeiser force-pushed the feature/f4b-gemini-native-structured branch from 47fa549 to 74a5c70 Compare June 7, 2026 22:59

fxspeiser deleted the branch feature/f4b-gemini-native-structured June 7, 2026 23:00

fxspeiser closed this Jun 7, 2026

fxspeiser mentioned this pull request Jun 7, 2026

F4b-Anthropic — native structured output via forced tool_use #107

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

F4b-Anthropic — native structured output via forced tool_use#104

F4b-Anthropic — native structured output via forced tool_use#104
fxspeiser wants to merge 3 commits into
feature/f4b-gemini-native-structuredfrom
feature/f4b-anthropic-native-structured

fxspeiser commented Jun 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

fxspeiser commented Jun 7, 2026

Summary

What lands

`src/providers/anthropic.ts` — build side

`src/providers/anthropic.ts` — parse side

Tests (+9)

Scoreboard

F4b coverage now complete

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant