Code review fixes#10
Merged
Merged
Conversation
When a model requested a tool that was not registered, no tool result was produced and an empty user message was sent back, so the model kept asking for the tool forever. The agent now responds with an explicit error tool result, which lets the model recover or explain the failure. The schema-validation failure path used `continue` on the inner tool loop, so after recording the error it kept scanning remaining tools and could execute a duplicate-named tool anyway. Tool lookup is now separated from execution so each tool-use block is handled exactly once. Adds regression tests for the unknown-tool error result, the validation-failure path (handler must not run), and the feedback callback return-type check. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Batch retrieval was unusable: the accumulator variable was overwritten by each choice and then appended to itself ($content .= $content), so every retrieved batch result came back doubled or garbled. Content parts are now accumulated into a separate variable, handling both string and typed-part message content. Cost reporting ignored the cached prompt token discount, overstating input cost for cached requests. The decoder now bills cached_tokens at the model's cached input rate. Also removes a dead always-true null check flagged by PHPStan. Adds the first test coverage for the batch client (create/retrieve lifecycle including the failure and expiry paths), decoder pricing, and embedding batching (order preservation across chunks - the subject of two recent production fixes). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Cache reads are input tokens, but their cost was added to the output price bucket. Totals were unaffected; the input/output split reported by getInputPriceUsd()/getOutputPriceUsd() was wrong whenever prompt caching was active, which corrupts any per-direction cost accounting downstream. Also restructures output_config assembly so reasoning effort and structured output compose cleanly (fixes a PHPStan nullCoalesce.offset error on a fixed-shape array). Adds the first test coverage for the Anthropic batch client and for decoder pricing including cache write/read rates. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The project goal is a universal API where models are interchangeable, but
several Gemini paths broke that promise - the same request that worked on
Anthropic/OpenAI failed or misbehaved on Gemini:
- Multi-turn tool use sent a synthesized function_response name
("function_<id>") that never matched the original function_call name,
which is how Gemini correlates them. The encoder now maps tool-use IDs
back to function names while encoding, keeping the universal ID-based
API unchanged for callers.
- Multiple tools were sent as separate tools entries, which Gemini
rejects; all function declarations now go into a single entry.
- Tool result payloads leaked the library's internal {class, data}
serialization format onto the wire; they are now converted to plain
JSON. Function responses use role "user" per the current API contract.
- ReasoningEffort was unusable on Gemini 2.x: the encoder always sent
thinkingLevel, which only 3.x accepts ("Thinking level is not supported
for this model"). Models now declare supportsThinkingLevel() and 2.x
efforts are translated to a thinkingBudget, so the same reasoningConfig
works across model generations.
- Streamed text was reconstructed as one part per chunk, so getLastText()
returned only the final chunk. Consecutive parts are now merged to match
the non-streaming response shape, and blockIndex identifies the logical
block as it does for other providers.
- PDF and array-data content threw on Gemini while working elsewhere;
both are now encoded (inline application/pdf, JSON text).
- Missing candidatesTokenCount no longer fatals on non-safety stops.
The tool-use, streaming, reasoning, and PDF behavior is verified against
the live API by the integration suite.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ects Persisted conversations silently lost the isError flag on tool results: jsonSerialize() never emitted it and fromJson() never restored it, so replayed error results were treated as successes by the encoders. The serialized format now wraps items with the flag; fromJson() stays backward compatible with the legacy plain-list format. Note: data written by this version cannot be read by older library versions. Other hardening, found during review: - LLMConversation::getLastMessage() on an empty conversation violated its return type with a PHP warning; it now throws UnderflowException. - MarkdownFormatter rendered system messages under a "## User:" heading and fataled on models without configured pricing (null prices). - LLMRequest::withCost() compared against the original instead of the clone - harmless today but a trap for future edits. - LLMMessageText/LLMMessageReasoning $text is now readonly like every sibling property. Adds round-trip tests for the isError flag (including legacy format) and the first MarkdownFormatter coverage. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The retry middleware only acted on HTTP status codes: when a request failed at the network level (connection reset, DNS, timeout) the response was null and the decider gave up immediately. ConnectException is now retried with the same backoff, since LLM providers drop connections routinely. FileCache could corrupt or crash under concurrent use: writes had no lock (interleaved writers produce torn JSON) and a file deleted or corrupted between check and read crashed the caller. Writes now use LOCK_EX and any unreadable or unparseable cache file degrades to a cache miss. Adds the first coverage for the factory middleware stack (retry policy, Retry-After in both numeric and HTTP-date form, cache store/replay, non-2xx exclusion) and for FileCache. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…r tool When old_str spanned multiple lines, the per-line search could never match it, so the duplicate-occurrence error told the LLM "in lines []" - an empty list that gives the model nothing to disambiguate with. Occurrences are now located in the full content and mapped to their starting lines. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The integration suite had quietly stopped working: it pinned retired models (claude-3-5-haiku-20241022, gemini-2.0-flash, an expired OpenRouter alpha model), so every run failed before testing anything. Models are updated to currently available cheap ones, and the stop-sequence test no longer invites the model to emit the stop word in its preamble (the old prompt made the test inherently flaky). New suites close the biggest live-API coverage gaps: - Streaming: accumulated TEXT_DELTAs must equal the final text on all five providers - provider SSE formats drift and unit tests only cover recorded formats. This caught the Gemini chunk-merging bug. - PDF input: a generated fixture must be readable by all three native providers - end-to-end proof of the new Gemini PDF support. - Reasoning: ReasoningEffort on all three providers plus Anthropic-only ReasoningBudget (asserting thinking blocks come back), exercising the temperature=1 and maxTokens>budget constraints. This caught the Gemini 2.x thinkingLevel rejection. .env.example now lists OPENROUTER_API_KEY, which the suite already reads. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A full audit against the source found that most pages contained at least one example that would not run: nonexistent methods (getTokenUsage, getLastMessage on responses, hasToolCalls, toArray, requestToMarkdown), wrong parameter names (reasoningEffort vs reasoningConfig, handler vs customHttpMiddleware), wrong namespaces (TextEditor tools), a caching guide written against a fictional get/set interface, and capability tables that steered users away from working features (Anthropic batches and extended thinking, Gemini reasoning) or into runtime exceptions (ReasoningBudget with OpenAI). All examples now match the real API surface and were verified against src/. Beyond corrections: - New structured-output guide; the feature was entirely undocumented despite being the most recent flagship addition. Readme gets a section and feature bullet. - mkdocs nav gains the streaming guide, structured-output guide, and examples index, which existed on disk but were unreachable on the site. - Provider matrices updated for Gemini PDF support, the Gemini 2.x/3.x thinking config split, and current model classes (Claude 4.6, GPT-5.4, Gemini 3.x); deprecated OpenRouter slugs and a wrong base URL fixed. - Stop-reason sample outputs normalized to the library enum values. - Caching guide documents that cache keys exclude headers/API keys (dev/test tool, not multi-tenant isolation). Site builds clean with mkdocs --strict using the CI preparation steps. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.