Constrained structured generation forces a fixed item count for unbounded `array` schemas → fabricated items + `tokenBudgetExceeded`

## Summary

When generating structured output against a schema containing an **unbounded array** (an `array` with no `minItems`/`maxItems`), the MLX constrained decoder emits a **fixed, pre-computed number of items** instead of letting the model decide the array length. For an unbounded array the forced count is `max(1, min(16, maximumResponseTokens / 32))` (integer division), and the model is made to generate *exactly* that many items.

Two consequences:
1. **Fabrication / padding** — the array always contains the forced count of items regardless of the input, so the model invents or repeats entries to fill the slots (and it can never produce an empty array).
2. **Token-budget exhaustion** — a schema with several unbounded arrays forces 16 items in *each*, which runs the total token budget to zero before the JSON can close, throwing `ConstrainedGenerationError.tokenBudgetExceeded`. Raising `maximumResponseTokens` does not help (see below).

## Where (source)

`Sources/AnyLanguageModel/Shared/StructuredGeneration.swift` — `generateArray(...)` (≈ lines 445–480):

```swift
// arrayDefaultCountDivisor = 32, arrayDefaultCountMax = 16
let budgetBasedCount = backend.totalTokenBudget / Self.arrayDefaultCountDivisor
let defaultCount = max(1, min(Self.arrayDefaultCountMax, budgetBasedCount))
// ...
// when the schema has no minItems/maxItems:
count = defaultCount
// ...
for index in 0 ..< count {                 // exactly `count` items, always
    output += try await generateNode(node.items)
    if index < count - 1 { output += try await emit(",") }
}
```

There is no path for the model to terminate the array early (emit `]`) or to produce **0** items. The count is fixed up front. `minItems`/`maxItems` only change *which* fixed number is forced — they don't enable content-driven, variable length.

Related: each free-string field is capped at `totalTokenBudget / 16` (`freeStringTokenBudgetDivisor = 16`), so a larger budget also lets each forced-but-contentless string ramble further before being cut off.
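To make the two formulas concrete, here is a small standalone sketch that recomputes the forced item count and the per-string cap for a few budgets. The constants are the ones quoted above; the function names `forcedArrayCount` and `freeStringCap` are hypothetical helpers, not library API, and the sketch assumes `totalTokenBudget` equals `maximumResponseTokens`.

```swift
// Constants as quoted from StructuredGeneration.swift in this issue.
let arrayDefaultCountDivisor = 32
let arrayDefaultCountMax = 16
let freeStringTokenBudgetDivisor = 16

/// Hypothetical helper: the item count forced for an array with no minItems/maxItems.
func forcedArrayCount(totalTokenBudget: Int) -> Int {
    let budgetBasedCount = totalTokenBudget / arrayDefaultCountDivisor
    return max(1, min(arrayDefaultCountMax, budgetBasedCount))
}

/// Hypothetical helper: the token cap applied to each free-string field.
func freeStringCap(totalTokenBudget: Int) -> Int {
    return totalTokenBudget / freeStringTokenBudgetDivisor
}

// Assumption: totalTokenBudget is the same value as maximumResponseTokens.
for budget in [16, 256, 512, 4096, 32768] {
    let count = forcedArrayCount(totalTokenBudget: budget)
    let cap = freeStringCap(totalTokenBudget: budget)
    print("budget \(budget): \(count) forced items, string cap \(cap) tokens")
}
```

Under these formulas the count is never 0 (the `max(1, …)` floor) and stops growing at 16, while the string cap keeps scaling with the budget.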

## Minimal reproduction

A schema that is just an object with one unbounded string array, and a prompt whose input only supports ~1 item:

```swift
// Schema: { "type": "object",
//           "properties": { "keywords": { "type": "array", "items": { "type": "string" } } },
//           "required": ["keywords"] }
// (built as a DynamicGenerationSchema with a single array property, no minItems/maxItems)

let response = try await session.respond(
    to: "Extract the keywords mentioned in: 'The quick brown fox.'",
    generating: /* Generable bound to the schema above */ .self,
    options: GenerationOptions(maximumResponseTokens: 4096)
)
print(response.content)   // jsonString
```

**Observed** (reproduced with an MLX 4B-class instruct model): the `keywords` array contains **exactly 16 items** for a one-keyword input — a relevant value or two, then repeated/padded copies, with the first free string often rambling up to its per-field cap because the model has nothing left to emit. Shape:

```json
{ "keywords": ["fox", "fox", "fox", "fox", ... ] }   // 16 forced items for a ~1-item input
```

## Impact / why bumping the budget doesn't help

- The per-array count caps at **16** once `maximumResponseTokens ≥ 512` (`512/32 = 16`), so raising the budget does **not** shorten the arrays.
- It only raises the per-field free-string cap (`budget/16`), letting each contentless field ramble longer — so larger budgets consume *more* tokens, not fewer.
- Empirically, a schema with ~19 unbounded arrays throws `tokenBudgetExceeded` at **both** `maximumResponseTokens = 8192` (~10 min) **and** `32768` (~57 min) — the larger budget just churns much longer before the same failure.
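A rough upper-bound calculation suggests why a larger budget cannot catch up. The sketch below assumes every array item is a free string that runs all the way to its per-field cap, which is a worst case and not a measurement; `worstCaseDemand` is a hypothetical helper. Once the count saturates at 16, one array can demand `16 × budget/16 = budget` tokens by itself, so the worst-case demand stays at roughly `arrays × budget` however large the budget is.

```swift
/// Hypothetical helper: worst-case tokens demanded by `arrays` unbounded string arrays,
/// assuming each item runs to the free-string cap (budget / 16).
/// Punctuation and key tokens are ignored, so this is only an approximation.
func worstCaseDemand(arrays: Int, totalTokenBudget: Int) -> Int {
    let forcedCount = max(1, min(16, totalTokenBudget / 32))
    let stringCap = totalTokenBudget / 16
    return arrays * forcedCount * stringCap
}

for budget in [8192, 32768] {
    let demand = worstCaseDemand(arrays: 19, totalTokenBudget: budget)
    // 8192  -> 155648 tokens (19x the budget)
    // 32768 -> 622592 tokens (19x the budget)
    print("budget \(budget): worst-case demand \(demand), ratio \(demand / budget)x")
}
```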

## Expected behavior

The model should control array length: emit a **content-driven** number of items (including **0**), terminating the array (`]`) when appropriate, rather than being forced to a fixed `max(1, min(16, budget/32))`. For unbounded arrays, the decoder should allow a model-emitted stop and an empty array instead of padding to a fixed count.
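One possible shape for this, as a minimal sketch and not a patch against the actual decoder: after each item (and before the first), the decoder asks the model whether to continue or close the array, while still honoring `minItems`/`maxItems` as bounds. The `ArrayDecodingBackend` protocol and `ScriptedBackend` type are hypothetical stand-ins invented for this example; a real implementation would presumably decide by constraining the next token to `,` or `]` and sampling.

```swift
/// Hypothetical stand-in for the constrained-decoding backend (not AnyLanguageModel API).
protocol ArrayDecodingBackend {
    /// Generates one array item as JSON text.
    mutating func generateItem() -> String
    /// Whether the model prefers another item over closing the array with "]".
    mutating func wantsAnotherItem(emittedSoFar: Int) -> Bool
}

/// Content-driven array generation: minItems/maxItems act as bounds, not as a forced count.
func generateArray<B: ArrayDecodingBackend>(
    _ backend: inout B, minItems: Int = 0, maxItems: Int? = nil
) -> String {
    var items: [String] = []
    while true {
        // Hard stop at maxItems, if the schema sets one.
        if let limit = maxItems, items.count >= limit { break }
        // Once minItems is satisfied, the model may close the array (including at 0 items).
        if items.count >= minItems && !backend.wantsAnotherItem(emittedSoFar: items.count) {
            break
        }
        items.append(backend.generateItem())
    }
    return "[" + items.joined(separator: ",") + "]"
}

/// Scripted backend for demonstration: yields a fixed list of items, then stops.
struct ScriptedBackend: ArrayDecodingBackend {
    var remaining: [String]
    mutating func generateItem() -> String {
        return remaining.isEmpty ? "\"\"" : remaining.removeFirst()
    }
    mutating func wantsAnotherItem(emittedSoFar: Int) -> Bool {
        return !remaining.isEmpty
    }
}

var oneKeyword = ScriptedBackend(remaining: ["\"fox\""])
print(generateArray(&oneKeyword))   // ["fox"]

var noKeywords = ScriptedBackend(remaining: [])
print(generateArray(&noKeywords))   // []
```

With this structure, a one-keyword input yields a one-item array and an input with nothing to extract yields `[]`, so the token cost tracks the content and not the budget.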

## Environment

- AnyLanguageModel `main` (revision `701d7e61…`), MLX backend, Apple Silicon (macOS / iOS), 4B-class 4-bit instruct model.
