Skip to content

Code review fixes#10

Merged
soukicz merged 9 commits into
masterfrom
code-review-fixes
Jun 12, 2026
Merged

Code review fixes#10
soukicz merged 9 commits into
masterfrom
code-review-fixes

Conversation

@soukicz

@soukicz soukicz commented Jun 12, 2026

Copy link
Copy Markdown
Owner

No description provided.

Bob Soukup and others added 9 commits June 12, 2026 13:50
When a model requested a tool that was not registered, no tool result was
produced and an empty user message was sent back, so the model kept asking
for the tool forever. The agent now responds with an explicit error tool
result, which lets the model recover or explain the failure.

The schema-validation failure path used `continue` on the inner tool loop,
so after recording the error it kept scanning remaining tools and could
execute a duplicate-named tool anyway. Tool lookup is now separated from
execution so each tool-use block is handled exactly once.

Adds regression tests for the unknown-tool error result, the
validation-failure path (handler must not run), and the feedback callback
return-type check.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Batch retrieval was unusable: the accumulator variable was overwritten by
each choice and then appended to itself ($content .= $content), so every
retrieved batch result came back doubled or garbled. Content parts are now
accumulated into a separate variable, handling both string and typed-part
message content.

Cost reporting ignored the cached prompt token discount, overstating input
cost for cached requests. The decoder now bills cached_tokens at the
model's cached input rate.

Also removes a dead always-true null check flagged by PHPStan.

Adds the first test coverage for the batch client (create/retrieve
lifecycle including the failure and expiry paths), decoder pricing, and
embedding batching (order preservation across chunks - the subject of two
recent production fixes).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Cache reads are input tokens, but their cost was added to the output price
bucket. Totals were unaffected; the input/output split reported by
getInputPriceUsd()/getOutputPriceUsd() was wrong whenever prompt caching
was active, which corrupts any per-direction cost accounting downstream.

Also restructures output_config assembly so reasoning effort and
structured output compose cleanly (fixes a PHPStan nullCoalesce.offset
error on a fixed-shape array).

Adds the first test coverage for the Anthropic batch client and for
decoder pricing including cache write/read rates.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The project goal is a universal API where models are interchangeable, but
several Gemini paths broke that promise - the same request that worked on
Anthropic/OpenAI failed or misbehaved on Gemini:

- Multi-turn tool use sent a synthesized function_response name
  ("function_<id>") that never matched the original function_call name,
  which is how Gemini correlates them. The encoder now maps tool-use IDs
  back to function names while encoding, keeping the universal ID-based
  API unchanged for callers.
- Multiple tools were sent as separate tools entries, which Gemini
  rejects; all function declarations now go into a single entry.
- Tool result payloads leaked the library's internal {class, data}
  serialization format onto the wire; they are now converted to plain
  JSON. Function responses use role "user" per the current API contract.
- ReasoningEffort was unusable on Gemini 2.x: the encoder always sent
  thinkingLevel, which only 3.x accepts ("Thinking level is not supported
  for this model"). Models now declare supportsThinkingLevel() and 2.x
  efforts are translated to a thinkingBudget, so the same reasoningConfig
  works across model generations.
- Streamed text was reconstructed as one part per chunk, so getLastText()
  returned only the final chunk. Consecutive parts are now merged to match
  the non-streaming response shape, and blockIndex identifies the logical
  block as it does for other providers.
- PDF and array-data content threw on Gemini while working elsewhere;
  both are now encoded (inline application/pdf, JSON text).
- Missing candidatesTokenCount no longer fatals on non-safety stops.

The tool-use, streaming, reasoning, and PDF behavior is verified against
the live API by the integration suite.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ects

Persisted conversations silently lost the isError flag on tool results:
jsonSerialize() never emitted it and fromJson() never restored it, so
replayed error results were treated as successes by the encoders. The
serialized format now wraps items with the flag; fromJson() stays
backward compatible with the legacy plain-list format. Note: data written
by this version cannot be read by older library versions.

Other hardening, found during review:
- LLMConversation::getLastMessage() on an empty conversation violated its
  return type with a PHP warning; it now throws UnderflowException.
- MarkdownFormatter rendered system messages under a "## User:" heading
  and fataled on models without configured pricing (null prices).
- LLMRequest::withCost() compared against the original instead of the
  clone - harmless today but a trap for future edits.
- LLMMessageText/LLMMessageReasoning $text is now readonly like every
  sibling property.

Adds round-trip tests for the isError flag (including legacy format) and
the first MarkdownFormatter coverage.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The retry middleware only acted on HTTP status codes: when a request
failed at the network level (connection reset, DNS, timeout) the response
was null and the decider gave up immediately. ConnectException is now
retried with the same backoff, since LLM providers drop connections
routinely.

FileCache could corrupt or crash under concurrent use: writes had no lock
(interleaved writers produce torn JSON) and a file deleted or corrupted
between check and read crashed the caller. Writes now use LOCK_EX and any
unreadable or unparseable cache file degrades to a cache miss.

Adds the first coverage for the factory middleware stack (retry policy,
Retry-After in both numeric and HTTP-date form, cache store/replay,
non-2xx exclusion) and for FileCache.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…r tool

When old_str spanned multiple lines, the per-line search could never match
it, so the duplicate-occurrence error told the LLM "in lines []" - an
empty list that gives the model nothing to disambiguate with. Occurrences
are now located in the full content and mapped to their starting lines.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The integration suite had quietly stopped working: it pinned retired
models (claude-3-5-haiku-20241022, gemini-2.0-flash, an expired OpenRouter
alpha model), so every run failed before testing anything. Models are
updated to currently available cheap ones, and the stop-sequence test no
longer invites the model to emit the stop word in its preamble (the old
prompt made the test inherently flaky).

New suites close the biggest live-API coverage gaps:
- Streaming: accumulated TEXT_DELTAs must equal the final text on all
  five providers - provider SSE formats drift and unit tests only cover
  recorded formats. This caught the Gemini chunk-merging bug.
- PDF input: a generated fixture must be readable by all three native
  providers - end-to-end proof of the new Gemini PDF support.
- Reasoning: ReasoningEffort on all three providers plus Anthropic-only
  ReasoningBudget (asserting thinking blocks come back), exercising the
  temperature=1 and maxTokens>budget constraints. This caught the Gemini
  2.x thinkingLevel rejection.

.env.example now lists OPENROUTER_API_KEY, which the suite already reads.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A full audit against the source found that most pages contained at least
one example that would not run: nonexistent methods (getTokenUsage,
getLastMessage on responses, hasToolCalls, toArray, requestToMarkdown),
wrong parameter names (reasoningEffort vs reasoningConfig, handler vs
customHttpMiddleware), wrong namespaces (TextEditor tools), a caching
guide written against a fictional get/set interface, and capability
tables that steered users away from working features (Anthropic batches
and extended thinking, Gemini reasoning) or into runtime exceptions
(ReasoningBudget with OpenAI). All examples now match the real API
surface and were verified against src/.

Beyond corrections:
- New structured-output guide; the feature was entirely undocumented
  despite being the most recent flagship addition. Readme gets a section
  and feature bullet.
- mkdocs nav gains the streaming guide, structured-output guide, and
  examples index, which existed on disk but were unreachable on the site.
- Provider matrices updated for Gemini PDF support, the Gemini 2.x/3.x
  thinking config split, and current model classes (Claude 4.6, GPT-5.4,
  Gemini 3.x); deprecated OpenRouter slugs and a wrong base URL fixed.
- Stop-reason sample outputs normalized to the library enum values.
- Caching guide documents that cache keys exclude headers/API keys
  (dev/test tool, not multi-tenant isolation).

Site builds clean with mkdocs --strict using the CI preparation steps.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@soukicz soukicz merged commit f699a31 into master Jun 12, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant