Constrained structured generation forces a fixed item count for unbounded array schemas → fabricated items + tokenBudgetExceeded #160

@Adi2K

Summary

When generating structured output against a schema containing an unbounded array (an array with no minItems/maxItems), the MLX constrained decoder emits a fixed, pre-computed number of items instead of letting the model decide the array length. For an unbounded array the forced count is max(1, min(16, maximumResponseTokens / 32)), and the model is made to generate exactly that many items.

Two consequences:

  1. Fabrication / padding — the array always contains the forced count of items regardless of the input, so the model invents or repeats entries to fill the slots (and it can never produce an empty array).
  2. Token-budget exhaustion — a schema with several unbounded arrays forces 16 items in each, which runs the total token budget to zero before the JSON can close, throwing ConstrainedGenerationError.tokenBudgetExceeded. Raising maximumResponseTokens does not help (see below).

Where (source)

Sources/AnyLanguageModel/Shared/StructuredGeneration.swift, generateArray(...) (≈ lines 445–480):

// arrayDefaultCountDivisor = 32, arrayDefaultCountMax = 16
let budgetBasedCount = backend.totalTokenBudget / Self.arrayDefaultCountDivisor
let defaultCount = max(1, min(Self.arrayDefaultCountMax, budgetBasedCount))
// ...
// when the schema has no minItems/maxItems:
count = defaultCount
// ...
for index in 0 ..< count {                 // exactly `count` items, always
    output += try await generateNode(node.items)
    if index < count - 1 { output += try await emit(",") }
}

There is no path for the model to terminate the array early (emit ]) or to produce 0 items. The count is fixed up front. minItems/maxItems only change which fixed number is forced — they don't enable content-driven, variable length.

Related: each free-string field is capped at totalTokenBudget / 16 (freeStringTokenBudgetDivisor = 16), so a larger budget also lets each forced-but-contentless string ramble further before being cut off.
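To make the two formulas concrete, the following sketch tabulates them for a few budgets. The constants are the ones quoted above; the function names are hypothetical and this is not the library's code, only the arithmetic as described in this issue (assuming totalTokenBudget equals maximumResponseTokens).

```swift
// Constants as quoted in this issue.
let arrayDefaultCountDivisor = 32
let arrayDefaultCountMax = 16
let freeStringTokenBudgetDivisor = 16

/// Hypothetical helper: the item count forced on an unbounded array.
func forcedArrayCount(totalTokenBudget: Int) -> Int {
    max(1, min(arrayDefaultCountMax, totalTokenBudget / arrayDefaultCountDivisor))
}

/// Hypothetical helper: the per-field cap for a free string, as described above.
func freeStringCap(totalTokenBudget: Int) -> Int {
    totalTokenBudget / freeStringTokenBudgetDivisor
}

for budget in [256, 512, 4096, 32768] {
    let count = forcedArrayCount(totalTokenBudget: budget)
    let cap = freeStringCap(totalTokenBudget: budget)
    print("budget \(budget): \(count) forced items, string cap \(cap) tokens")
}
// budget 256: 8 forced items, string cap 16 tokens
// budget 512: 16 forced items, string cap 32 tokens
// budget 4096: 16 forced items, string cap 256 tokens
// budget 32768: 16 forced items, string cap 2048 tokens
```

The count saturates at 16 from a budget of 512 upward, while the string cap keeps growing with the budget.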

Minimal reproduction

A schema that is just an object with one unbounded string array, and a prompt whose input only supports ~1 item:

// Schema: { "type": "object",
//           "properties": { "keywords": { "type": "array", "items": { "type": "string" } } },
//           "required": ["keywords"] }
// (built as a DynamicGenerationSchema with a single array property, no minItems/maxItems)

let response = try await session.respond(
    to: "Extract the keywords mentioned in: 'The quick brown fox.'",
    generating: /* Generable bound to the schema above */ .self,
    options: GenerationOptions(maximumResponseTokens: 4096)
)
print(response.content)   // jsonString

Observed (reproduced with an MLX 4B-class instruct model): the keywords array contains exactly 16 items for a one-keyword input — a relevant value or two, then repeated/padded copies, with the first free string often rambling up to its per-field cap because the model has nothing left to emit. Shape:

{ "keywords": ["fox", "fox", "fox", "fox", ... ] }   // 16 forced items for a ~1-item input
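One way to quantify the padding when reproducing this is to compare the total and distinct item counts in the returned JSON. The helper below is a hypothetical sketch using Foundation's JSONSerialization; the sample string is a shortened, hand-written illustration of the shape above, not captured model output.

```swift
import Foundation

/// Hypothetical helper: total vs. distinct entries in the "keywords" array.
/// Returns nil if the JSON does not have the expected shape.
func keywordStats(fromJSON json: String) -> (total: Int, distinct: Int)? {
    guard let data = json.data(using: .utf8),
          let parsed = try? JSONSerialization.jsonObject(with: data),
          let object = parsed as? [String: Any],
          let keywords = object["keywords"] as? [String] else {
        return nil
    }
    return (keywords.count, Set(keywords).count)
}

// Illustrative input only (three repeated items instead of the observed 16).
let sample = #"{ "keywords": ["fox", "fox", "fox"] }"#
if let stats = keywordStats(fromJSON: sample) {
    print("total: \(stats.total), distinct: \(stats.distinct)")
    // total: 3, distinct: 1
}
```

A large gap between the two numbers on a short input is a sign that the array was padded rather than driven by content.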

Impact / why bumping the budget doesn't help

  • The per-array count caps at 16 once maximumResponseTokens ≥ 512 (512/32 = 16), so raising the budget does not shorten the arrays.
  • It only raises the per-field free-string cap (budget/16), letting each contentless field ramble longer — so larger budgets consume more tokens, not fewer.
  • Empirically, a schema with ~19 unbounded arrays throws tokenBudgetExceeded at both maximumResponseTokens = 8192 (~10 min) and 32768 (~57 min) — the larger budget just churns much longer before the same failure.
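The points above can be checked with a rough upper bound. The sketch below assumes every forced item is a free string that runs to its per-field cap and ignores structural tokens (brackets, commas, keys), so it overstates real usage; the function name and the simplification are assumptions, not measurements. Under those assumptions one unbounded array can demand up to the whole budget by itself, so the worst case for N arrays is about N times the budget at any budget size.

```swift
/// Hypothetical worst-case estimate of string tokens demanded by `arrays`
/// unbounded string arrays, using the constants quoted in this issue.
func worstCaseStringTokens(arrays: Int, totalTokenBudget: Int) -> Int {
    let forcedCount = max(1, min(16, totalTokenBudget / 32)) // items per array
    let stringCap = totalTokenBudget / 16                    // tokens per item
    return arrays * forcedCount * stringCap
}

for budget in [8192, 32768] {
    let demand = worstCaseStringTokens(arrays: 19, totalTokenBudget: budget)
    print("budget \(budget): worst case \(demand) tokens (\(demand / budget)x budget)")
}
// budget 8192: worst case 155648 tokens (19x budget)
// budget 32768: worst case 622592 tokens (19x budget)
```

Because the forced count is pinned at 16 and the cap is budget/16, the ratio of worst-case demand to budget stays constant, which is consistent with the same failure appearing at both budgets.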

Expected behavior

The model should control array length: emit a content-driven number of items (including 0), terminating the array (]) when appropriate, rather than being forced to the fixed default count of max(1, min(16, budget/32)). For unbounded arrays, the decoder should allow a model-emitted stop and an empty array, instead of padding to a fixed count.
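A minimal sketch of the requested control flow, under stated assumptions: ArrayItemSource and ScriptedSource are hypothetical stand-ins invented for illustration and are not AnyLanguageModel's API. In a real constrained decoder the "another item or close" decision would come from the model's token choice between a continuation and "]"; here it is reduced to a Boolean callback, and the loop is synchronous for brevity.

```swift
/// Hypothetical stand-in for the decoder backend; not the library's API.
protocol ArrayItemSource {
    /// Generates one array element and returns its JSON text.
    mutating func generateItem() throws -> String
    /// Whether the model prefers another item over closing the array with "]".
    mutating func modelWantsAnotherItem(afterItemCount count: Int) throws -> Bool
}

func generateArray<S: ArrayItemSource>(
    from source: inout S,
    minItems: Int = 0,
    maxItems: Int? = nil
) throws -> String {
    var items: [String] = []
    while true {
        // The schema's upper bound, if any, forces the array to close.
        if let limit = maxItems, items.count >= limit { break }
        // Below minItems the decoder must continue; otherwise the model decides.
        var wantsMore = items.count < minItems
        if !wantsMore {
            wantsMore = try source.modelWantsAnotherItem(afterItemCount: items.count)
        }
        if !wantsMore { break }
        items.append(try source.generateItem())
    }
    return "[" + items.joined(separator: ",") + "]"
}

/// Scripted source for demonstration: stops once its items are used up.
struct ScriptedSource: ArrayItemSource {
    var pending: [String]
    mutating func generateItem() throws -> String {
        pending.isEmpty ? "\"\"" : pending.removeFirst()
    }
    mutating func modelWantsAnotherItem(afterItemCount count: Int) throws -> Bool {
        !pending.isEmpty
    }
}

var oneKeyword = ScriptedSource(pending: ["\"fox\""])
print(try generateArray(from: &oneKeyword))   // ["fox"]

var noKeywords = ScriptedSource(pending: [])
print(try generateArray(from: &noKeywords))   // []
```

With this shape, minItems/maxItems act as bounds on a variable-length loop instead of selecting which fixed count is forced.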

Environment

  • AnyLanguageModel main (revision 701d7e61…), MLX backend, Apple Silicon (macOS / iOS), 4B-class 4-bit instruct model.
