Skip to content

F4b-Anthropic — native structured output via forced tool_use#104

Closed
fxspeiser wants to merge 3 commits into
feature/f4b-gemini-native-structuredfrom
feature/f4b-anthropic-native-structured
Closed

F4b-Anthropic — native structured output via forced tool_use#104
fxspeiser wants to merge 3 commits into
feature/f4b-gemini-native-structuredfrom
feature/f4b-anthropic-native-structured

Conversation

@fxspeiser

Copy link
Copy Markdown
Owner

Summary

Third and final slice of the F4b family (#102 OpenAI, #103 Gemini, this PR Anthropic). When `SendArgs.jsonSchema` is set, the Anthropic adapter forces the model to emit its result through a `tool_use` block — closing the universal defense-in-depth coverage.

Base branch: `feature/f4b-gemini-native-structured` (#103) which itself stacks on `feature/f4b-native-structured-output` (#102). Merge in order #102#103 → this.

What lands

`src/providers/anthropic.ts` — build side

  • `AnthropicRequestBody` gains optional `tools[]` and `tool_choice`
  • New exported `ANTHROPIC_STRUCTURED_TOOL_NAME = "structured_output"` (same name on both wire sides for deterministic match)
  • `buildAnthropicRequest` accepts `opts.jsonSchema`. When set, body grows a single-tool spec with the schema as `input_schema` and `tool_choice` pinning that tool
  • No schema translation — Anthropic accepts JSON Schema directly (unlike Gemini's OpenAPI-subset)

`src/providers/anthropic.ts` — parse side

  • `parseAnthropicResponse` now distinguishes block types in the `content` array
  • `tool_use` blocks whose `name === ANTHROPIC_STRUCTURED_TOOL_NAME` have their `input` field captured
  • When a matching tool_use block is present, its JSON-stringified input replaces the concatenated text payload — downstream `extractJson + validateSchema` works unchanged
  • Non-matching tool_use blocks (e.g. worker_tools' fetch/verify ReAct) flow through untouched
  • Plain text responses keep working — baseline regression guard tested

Tests (+9)

  • build (4): tools + tool_choice attached when schema present; omitted in baseline; no schema translation (rich JSON Schema verbatim including `$schema` and `additionalProperties: false`); JSON-serializable round trip
  • parse (5): tool_use input → JSON-stringified text; tool_use trumps prose text block (forced-tool semantics); only the structured_output tool name is captured (other tool names fall through to text path); plain text baseline; array values stringify correctly

Scoreboard

F4b coverage now complete

Provider Mechanism PR
OpenAI `response_format: {type: "json_schema", strict: true}` #102
Gemini `generationConfig.{responseSchema, responseMimeType}` #103
Anthropic `tools[] + tool_choice: forced tool_use` this PR
Universal schema-in-prompt + tolerant parser (F4c) main

Every supported provider has a native channel. `SendArgs.jsonSchema` is the single switch the caller flips; each adapter normalises to its own native shape; the universal parse path catches any miss.

Test plan

  • `npm test` — 75 files, 1,325 tests passing
  • `npm run build` — clean

🤖 Generated with Claude Code

fxspeiser and others added 3 commits June 7, 2026 17:11
…_schema).

Defense-in-depth complement to F4c (tolerant JSON parser) + F4a (raised
pick token budget) + F4d (scoring_errors[] in pick envelopes). When the
provider/model combo supports native structured output, requestStructured
now constrains the response shape at the API level instead of relying on
the model obeying "return JSON only" — which is what failed for gpt-5 in
the 2026-06-07 incident (prose-prefixed JSON).

What lands

  src/providers/types.ts:
    SendArgs gains `jsonSchema?: Record<string, unknown>`. Documented as
    optional — adapters that don't support native structured output
    silently ignore the field, and the schema-in-prompt + tolerant
    parser path (F4c) remains the universal fallback. Mirror of OpenAI's
    json_schema spec; downstream adapters normalise to their own
    provider's native shape.

  src/providers/openai-compatible.ts:
    - OpenAIRequestBody extended with optional `response_format` of
      shape {type: "json_schema", json_schema: {name, schema, strict}}.
    - buildOpenAICompatibleRequest accepts `jsonSchema` and sets
      response_format when supportsNativeJsonSchema is true.
    - NEW supportsNativeJsonSchema(provider, model): conservative
      whitelist. Returns true for `openai` + non-o1 models. False for
      o1-* (which historically rejected response_format on
      chat/completions). False for xai/mistral/groq/deepseek (we haven't
      validated those endpoints; follow-up PRs can add each as the live
      API is verified).
    - NEW exported OPENAI_COMPAT_NATIVE_STRUCTURED set (currently {"openai"}).
    - sendOpenAICompatible threads args.jsonSchema into the builder.

  src/core/structured.ts:
    - askOne opts gains `jsonSchema?` and passes through to provider.send.
    - RequestStructuredOptions gains `useNativeStructured?: boolean`
      (default true). Operators can flip this off for A/B comparison.
    - requestStructured passes the schema down on every attempt
      (including the validation-failure retry — defense-in-depth doesn't
      degrade across retries).

What does NOT change

  - Anthropic + Gemini adapters: untouched. They have their own native
    structured-output mechanisms (tool_use input_schema for Anthropic;
    responseSchema / responseMimeType for Gemini) but each is a separate
    surgical change with its own translation layer. Tagged as F4b
    follow-ups; this PR is intentionally OpenAI-only so the incident's
    primary symptom (gpt-5 prose-prefix JSON) is fixed without bundling
    the larger Gemini/Anthropic translations.

  - Schema-in-prompt path: still rides on every requestStructured call.
    Providers that don't support native mode rely on it; providers that
    do get belt-and-suspenders.

  - Tolerant parser (F4c): still the safety net for any miss.

Native unit tests (+17)

  test/providers/openai-native-json.test.ts (12):
    builder (7): openai + gpt-4o-mini → response_format set with
      strict=true; openai + gpt-5 → set (reasoning model, not o1
      family); openai + o1-mini → NOT set; openai + o1 → NOT set;
      deepseek → NOT set; xai → NOT set; openai without jsonSchema →
      response_format undefined (baseline regression guard).
    supportsNativeJsonSchema (5): openai chat models → true; openai
      o1 family → false; non-whitelist providers → false; provider
      name lookup case-insensitive; whitelist set has exactly one
      entry currently.

  test/core/structured-native.test.ts (5):
    - default (useNativeStructured unset) → schema passes through to
      Provider.send
    - useNativeStructured: true → passes through
    - useNativeStructured: false → NOT sent
    - schema-in-prompt still appended on every call (defense-in-depth
      keeps the prose path)
    - validation-failure retry: jsonSchema rides on every attempt

Scoreboard
  TS: 1,305 / 1,305
  typecheck + build: clean

Why this is the last F4 PR
  F4d (scoring_errors[]) + F4c (tolerant parser) + F4a (token budget)
  shipped via #97 and address the regression's user-visible symptoms.
  F4b is defense-in-depth — when the model itself constrains its
  output shape via the API rather than via prompt-following, we don't
  depend on the model's discipline. Pulling Anthropic + Gemini into
  the same surface is a separate, larger PR each (different native
  mechanisms; non-trivial schema translation).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…Type.

Second slice of the F4b family (after #102 shipped OpenAI). Same
shape: when SendArgs.jsonSchema is set, the Gemini adapter wires
generationConfig.responseMimeType="application/json" plus a
responseSchema translated from the caller's JSON Schema into Gemini's
OpenAPI-subset Schema proto.

What lands

  src/providers/gemini.ts:
    - GeminiRequestBody.generationConfig gains optional responseMimeType
      ("application/json") and responseSchema (Record<string,unknown>).
    - buildGeminiRequest accepts opts.jsonSchema. When supplied, body
      gets responseMimeType + responseSchema = jsonSchemaToGeminiSchema(...).
    - sendGemini threads args.jsonSchema into the builder.

    NEW jsonSchemaToGeminiSchema translator:
      - Upper-cases `type` (STRING/NUMBER/INTEGER/BOOLEAN/ARRAY/OBJECT)
        — Gemini's Schema proto rejects lowercase JSON Schema type names.
      - Recurses into properties + items.
      - Passes through Gemini-supported keywords: description, enum,
        format, nullable, required, minItems, maxItems, minimum, maximum.
      - Handles type-arrays (["string","null"]) → STRING + nullable:true.
      - Drops $schema and additionalProperties (Gemini-unsupported).
      - Never throws on degenerate input; degraded translation paths
        emit a relaxed superset and rely on F4c's tolerant parser +
        downstream validateSchema as the safety net.

  test/providers/gemini-native-json.test.ts (11 tests):
    builder (3): responseSchema attached when jsonSchema supplied;
      omitted in baseline; body remains JSON-serializable.
    translator (8): primitive type uppercasing; recursion into
      properties + items; passthrough of required/enum/format/description;
      type-array collapse to nullable; $schema and additionalProperties
      dropped; deeply nested schemas (3 levels); degenerate input
      doesn't throw.

Scoreboard
  TS: 1,316 / 1,316 (+11 over PR #102's 1,305)
  typecheck + build: clean

Defense-in-depth chain now covers
  OpenAI    via response_format: json_schema           (PR #102)
  Gemini    via responseSchema + responseMimeType      (this PR)
  Anthropic via tool_use input_schema                  (follow-up)
  Universal schema-in-prompt + tolerant parser (F4c)   (already in main)

The translation layer is intentionally permissive: when a passed
schema uses constructs Gemini's Schema proto doesn't support, we
strip the offending keyword and emit a looser shape. The model may
then return a slightly richer-than-intended object, which
validateSchema catches at the parse/validation boundary. Bias toward
"native output mostly works" + parse-side defense rather than "fail
the call because the schema can't be perfectly translated."

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Third and final slice of the F4b family (#102 OpenAI, #103 Gemini,
this PR Anthropic). When SendArgs.jsonSchema is set, the Anthropic
adapter forces the model to emit its result through a tool_use
block — closing the universal defense-in-depth coverage.

What lands

  src/providers/anthropic.ts:
    Request wiring (build):
      - AnthropicRequestBody gains optional tools[] and tool_choice
        fields. Tools is a single-element array; tool_choice pins
        the model to call exactly that tool.
      - NEW exported constant ANTHROPIC_STRUCTURED_TOOL_NAME =
        "structured_output". Same name on both sides of the wire so
        the parser can match deterministically.
      - buildAnthropicRequest accepts opts.jsonSchema. When set, the
        body grows a tools entry with the schema as input_schema and
        a tool_choice forcing that tool. Anthropic accepts JSON
        Schema directly, so unlike Gemini there is NO translation
        layer — the schema flows through verbatim.

    Response wiring (parse):
      - parseAnthropicResponse now distinguishes block types in the
        content array. tool_use blocks whose name matches
        ANTHROPIC_STRUCTURED_TOOL_NAME have their `input` field
        captured separately from text blocks.
      - When a matching tool_use block is present, its JSON-stringified
        input replaces the concatenated text payload. This keeps the
        downstream extractJson + validateSchema path entirely
        unaware of which transport carried the structured value —
        same parse → same validation → same envelope shape.
      - Non-matching tool_use blocks (e.g. worker_tools' fetch/verify
        in a different ReAct context) flow through untouched; only
        the structured_output tool name triggers the surface change.
      - Plain text responses keep working unchanged — baseline
        regression guard tested.

  src/providers/anthropic.ts (sendAnthropic):
    Threads args.jsonSchema into the builder.

Tests (+9)

  test/providers/anthropic-native-json.test.ts:
    build (4): tools + tool_choice attached on jsonSchema; omitted
      otherwise; no schema translation (rich JSON Schema flows
      verbatim into input_schema, including $schema and
      additionalProperties:false); JSON-serializable round trip.
    parse (5): tool_use input → JSON-stringified text; tool_use
      trumps prose text block (forced-tool semantics); only the
      structured_output tool name is captured (other tool names
      fall through); plain text still works; array values
      JSON-stringify correctly.

Scoreboard
  TS: 1,325 / 1,325 (+9 over #103's 1,316)
  typecheck + build: clean

F4b coverage complete

  OpenAI    response_format: {type: "json_schema", strict: true}     PR #102
  Gemini    generationConfig.{responseSchema, responseMimeType}      PR #103
  Anthropic tools[] + tool_choice: forced tool_use                   THIS PR
  Universal schema-in-prompt + tolerant parser (F4c)                 main

Every supported provider now has a native channel for structured
output. SendArgs.jsonSchema is the single switch the caller flips;
each adapter normalises to its own native shape and the universal
parse path catches any miss. Schema-in-prompt + F4c's tolerant
parser remain the safety net for providers we don't yet have native
mode wired for (xai/mistral/groq/deepseek under openai-compat).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@fxspeiser fxspeiser force-pushed the feature/f4b-gemini-native-structured branch from 47fa549 to 74a5c70 Compare June 7, 2026 22:59
@fxspeiser fxspeiser deleted the branch feature/f4b-gemini-native-structured June 7, 2026 23:00
@fxspeiser fxspeiser closed this Jun 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant