… (#520)

* feat(chrome-ai): probe-gate tool-use and json-mode capabilities (C1)

Chrome's `LanguageModel.create` did not universally accept `tools` or `responseConstraint` options, yet `inferWebBrowserCapabilities` always advertised `tool-use` + `json-mode` for `chrome-prompt`/`gemini-nano`. This caused the dispatcher to route json-mode and tool-use tasks to the WebBrowser provider on Chrome builds that would reject them at runtime.

Adds a one-shot capability probe (`probeWebBrowserCapabilities`) that smoke-tests `factory.create({ responseConstraint })` and `factory.create({ tools })`, with module-level coalescing so concurrent callers share one probe round-trip. `WebBrowserProvider` kicks the probe off in its constructor; until it resolves, `inferCapabilities` returns the conservative subset (no `json-mode`, no `tool-use`).

Tests cover all four probe outcome combinations, coalescing, and pre/post-ready inference.

https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC

* feat(chrome-ai): StructuredGeneration accepts sessionId with schema fingerprint (H1)

The structured-generation run-fn dropped `sessionId` from its signature, so successive calls with the same id always rebuilt the underlying Chrome `LanguageModel` even though the surface supports session reuse. This matched the pre-session-cache behaviour rather than the post-cache shape adopted by `WebBrowser_Chat`.

Accept `sessionId` as the 6th positional parameter, mirroring chat. Cache reuse is gated on a canonical schema fingerprint stored on the cache entry: a schema change forces a rebuild because Chrome's `responseConstraint` state is bound at first-prompt and re-feeding a different schema is undefined behaviour. On stream failure the entry is dropped + destroyed via the same `cacheWritten` / `dropChromeSessionEntry` dance as chat.

`ChromeChatSessionState` grows an optional `schemaFingerprint` field.

https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC

* feat(chrome-ai): ToolCalling accepts sessionId with tools fingerprint (H2)

`WebBrowser_ToolCalling` ignored both `outputSchema` and `sessionId` (the 5th and 6th positional parameters of the run-fn contract), so multi-turn tool-calling rebuilt the `LanguageModel` each turn.

Accept both parameters. Cache reuse keys on a sorted-tool-name fingerprint (Chrome binds `tools` at `create()` time and can't hot-swap them per turn). We only cache when the orchestrator drives via `input.messages` because Chrome's tool-calling loop appends tool-result turns to the session's internal state opaquely; reusing a cached session across a turn the orchestrator hasn't fully replayed would double-feed those results. Bare-prompt callers always rebuild.

On any error we drop + destroy the cache entry: Chrome's internal state may be mid-tool-call-cycle.

`ChromeChatSessionState` grows an optional `toolsFingerprint` field.

https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC

* fix(chrome-ai): validate tool-call arguments against tool inputSchema (H3)

Chrome's `LanguageModel` invokes our stub `execute` callback with whatever arguments the model emits. `filterValidToolCalls` only checked the tool name, so a hallucinated arg shape was forwarded to the orchestrator verbatim, leaving the downstream tool runner to either fail or silently produce garbage.

Compile each tool's `inputSchema` once via `compileSchema` (cached by name) before the stream starts. After streaming we validate every captured call's `input` against its tool's validator; failures are dropped + warn-logged in the same shape as `filterValidToolCalls`'s existing name-only warning. Tools whose `inputSchema` fails to compile emit a single warning and fall through to the name-only check rather than failing the whole run.

https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC

* fix(chrome-ai): validate StructuredGeneration final JSON against schema (H4)

Chrome's `responseConstraint` is best-effort, not a hard guarantee: the model can still produce a partial or shape-mismatched payload. The existing fallback (`parsePartialJson(...) ?? {}`) handed downstream code an empty object cast to the output type, indistinguishable from a legitimate empty payload. Worse, that path emitted a `finish` event, so `StructuredGenerationTask`'s retry loop had no signal to retry on.

Compile the validator once via `compileSchema`. After streaming:
- If neither `JSON.parse` nor `parsePartialJson` produces a value: throw `PermanentJobError("Chrome AI returned unparseable JSON")`.
- If validation fails: throw with the first validator error message.
- Only on success do we emit `finish` and write the cache entry.

`StructuredGenerationTask.executeStream` catches per-attempt errors and retries, so throwing here is the correct signal: no `finish`, so the loop knows this attempt failed. Schema compile failures are also surfaced as `PermanentJobError` (so retries don't burn through quota on a malformed schema).

https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
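The two fingerprints described in H1 and H2 can be sketched as follows. This is a minimal illustration, not the PR's code: `canonicalStringify`, `schemaFingerprint`, `toolsFingerprint`, and `canReuseSession` are hypothetical names, and the cache-entry shape is assumed to carry only the two optional fingerprint fields mentioned above.

```typescript
// Hypothetical helpers for illustration; the PR's real exports may differ.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

/** Serialize JSON with object keys sorted, so key order never changes the output. */
function canonicalStringify(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((item) => canonicalStringify(item)).join(",")}]`;
  const keys = Object.keys(value).sort();
  const body = keys.map((key) => `${JSON.stringify(key)}:${canonicalStringify(value[key] as Json)}`);
  return `{${body.join(",")}}`;
}

/** Fingerprint for a JSON schema: equal schemas yield equal strings. */
function schemaFingerprint(schema: Json): string {
  return canonicalStringify(schema);
}

/** Fingerprint for a tool set: sorted tool names, independent of declaration order. */
function toolsFingerprint(tools: ReadonlyArray<{ name: string }>): string {
  return tools
    .map((tool) => tool.name)
    .sort()
    .join("|");
}

/** Assumed cache-entry shape: only the fields needed for the reuse decision. */
interface CachedEntry {
  schemaFingerprint?: string;
  toolsFingerprint?: string;
}

/** Reuse a cached session only when the schema bound to it is unchanged. */
function canReuseSession(cached: CachedEntry | undefined, schema: Json): boolean {
  return cached !== undefined && cached.schemaFingerprint === schemaFingerprint(schema);
}
```

Sorting keys (and tool names) before comparing means that two semantically identical inputs never force a needless session rebuild, while any real change does.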
HIGH-priority review findings: fix plans

Following an end-of-day automated review of PRs updated in the last 24h. 5 HIGH findings, 0 CRITICAL. Recommend addressing H1–H4 before merge; H5 is a dead-code cleanup that can be deferred. Full detail per finding below. H1:
@copilot Some of this was recently added to main branch, but not all. It needs to be rebased and checked carefully. main has some work on sessions and making sure that streaming does not use snapshot.
Rechecked this against main and updated the Chrome AI paths in 789c0e4. The branch now uses globalThis-safe Chrome API lookup, text-delta-only streaming fallback, chat history fingerprinting for session reuse, and fresh sessions for tool-calling / structured generation so the main-branch session and streaming concerns are covered.
Pull request overview
This PR expands the Chrome Built-in AI provider to support capability probing, multi-turn chat/session handling, structured JSON generation (json-mode), and tool-calling (tool-use). It also introduces persistent, machine-readable error codes for FetchUrlTask failures (propagated through the job queue/client), plus accompanying test refactors/utilities.
Changes:
- Chrome AI provider: add capability probing + new run-fns for chat, structured generation, and tool calling; refactor streaming helpers and session management.
- FetchUrl: introduce `FetchUrlErrorCode` + helpers, persist `error_code` reliably, and update SafeFetch/FetchUrlTask logic + tests.
- Tests/tooling: add cross-runner fake-timer helper and expand provider + fetch error-code coverage.
Reviewed changes
Copilot reviewed 36 out of 38 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| scripts/lib/preload-credentials.ts | Add warning log when credential unlock/hydration fails. |
| providers/chrome-ai/tsconfig.json | Include dom-chromium-ai ambient types for Chrome AI globals. |
| providers/chrome-ai/src/ai/WebBrowserProvider.ts | Add capability probe lifecycle (ready()), gated inference, and session disposal hook. |
| providers/chrome-ai/src/ai/index.ts | Expand _testOnly exports to cover new run-fns, probing, sessions, and chat helpers. |
| providers/chrome-ai/src/ai/common/WebBrowser_ToolCalling.ts | New tool-calling run-fn bridging Chrome’s internal tool loop to Workglow toolCalls protocol. |
| providers/chrome-ai/src/ai/common/WebBrowser_TextTranslation.ts | Refactor Translator access via getChromeGlobal + add download-progress monitoring. |
| providers/chrome-ai/src/ai/common/WebBrowser_TextSummary.ts | Add download-progress monitoring, signal plumbing, and normalize tl;dr → tldr. |
| providers/chrome-ai/src/ai/common/WebBrowser_TextRewriter.ts | Add download-progress monitoring and signal plumbing. |
| providers/chrome-ai/src/ai/common/WebBrowser_TextLanguageDetection.ts | Add download-progress monitoring, signal plumbing, and defensive mapping of detection results. |
| providers/chrome-ai/src/ai/common/WebBrowser_TextGeneration.ts | Add download-progress monitoring, signal plumbing, and unify delta streaming helper usage. |
| providers/chrome-ai/src/ai/common/WebBrowser_StructuredGeneration.ts | New json-mode structured generation run-fn using responseConstraint with partial JSON streaming. |
| providers/chrome-ai/src/ai/common/WebBrowser_Sessions.ts | New LanguageModel session cache utilities keyed by AiChatTask sessionId. |
| providers/chrome-ai/src/ai/common/WebBrowser_JobRunFns.ts | Register new capability sets and add unified text.generation dispatcher (chat vs one-shot). |
| providers/chrome-ai/src/ai/common/WebBrowser_ChromeHelpers.ts | Add global lookup helper, download monitor, canonical stringify, and adjust streaming snapshot→delta conversion. |
| providers/chrome-ai/src/ai/common/WebBrowser_ChromeAI.d.ts | Remove deprecated in-repo ambient Chrome AI type declarations (replaced by @types/dom-chromium-ai). |
| providers/chrome-ai/src/ai/common/WebBrowser_ChatHistory.ts | New helpers to map Workglow chat history into Chrome initialPrompts + fingerprinting. |
| providers/chrome-ai/src/ai/common/WebBrowser_Chat.ts | New multi-turn chat run-fn with session caching and failure-path cache hygiene. |
| providers/chrome-ai/src/ai/common/WebBrowser_CapabilitySets.ts | Add json-mode and tool-use capability sets. |
| providers/chrome-ai/src/ai/common/WebBrowser_CapabilityProbe.ts | New module-level cached probe for json/tool support via LanguageModel.create() smoke tests. |
| providers/chrome-ai/src/ai/common/WebBrowser_Capabilities.ts | Gate json-mode/tool-use capability inference behind probe results (+ async helper). |
| providers/chrome-ai/package.json | Add @types/dom-chromium-ai dev dependency. |
| packages/test/src/test/util/WorkerManager.idle.test.ts | Refactor timer advancement to shared helper for Vitest/Bun compatibility. |
| packages/test/src/test/task/FetchTask.test.ts | Assert FetchUrlErrorCode propagation and persisted errorCode behavior through queue. |
| packages/test/src/test/resource/DisposeStrategy.test.ts | Refactor timer advancement to shared helper. |
| packages/test/src/test/helpers/advanceFakeTimers.ts | New helper to advance timers portably across Vitest/Bun (with optional microtask flush). |
| packages/test/src/test/browser-control/SequentialTasks.test.ts | Refactor timer advancement to shared helper. |
| packages/test/src/test/ai-provider/WebBrowserProvider.test.ts | Add extensive tests for probing, unified dispatcher, structured generation, tool calling, sessions, and helpers. |
| packages/test/src/test/ai-provider/DownloadModelAbort.integration.test.ts | Refactor model setup + improve abort-error classification helper. |
| packages/tasks/src/util/SafeFetch.ts | Replace generic errors with structured FetchUrl error codes for SSRF/redirect failures. |
| packages/tasks/src/util/SafeFetch.server.ts | Replace generic errors with structured FetchUrl error codes for DNS/SSRF/redirect failures. |
| packages/tasks/src/task/FetchUrlTask.ts | Emit structured FetchUrl errors, classify parse failures, and route HTTP errors through code-bearing helpers. |
| packages/tasks/src/task/FetchUrlJobError.ts | New centralized FetchUrl error-code definitions + constructors/wrappers. |
| packages/tasks/src/common.ts | Export FetchUrlJobError and reorder/re-export image + media filter utilities. |
| packages/task-graph/src/task/TaskError.ts | Propagate underlying JobError.code onto JobTaskFailedError.code when available. |
| packages/job-queue/src/job/JobQueueWorker.ts | Persist error_code via jobErrorPersistedCode() and emit consistent event payloads. |
| packages/job-queue/src/job/JobQueueClient.ts | Rehydrate FETCH_* persisted codes into retryable/permanent errors and preserve code. |
| packages/job-queue/src/job/JobError.ts | Add JobError.code and helper to derive persisted error codes. |
| bun.lock | Lockfile updates for new dependency and trustedDependencies ordering. |
    } else {
      previousSnapshot = value;
      yield { type: "snapshot", data: buildFallbackOutput(value) };
      accumulatedText += value;

    // Cache hygiene: only reuse the cached session if its watermark exactly
    // matches the history we'd otherwise re-feed. Out-of-sync caches (task
    // reset mid-conversation, retroactive edits to `messages`) are torn down
    // and rebuilt.
    let cached = sessionId ? getChromeSession(sessionId) : undefined;
    if (sessionId !== undefined && cached && cached.historyFingerprint !== historyFingerprint) {

    if (err instanceof Error && err.name === "AbortSignalJobError") {
      throw err;
@claude Some of this was recently added to main branch, but not all. It needs to be rebased and checked carefully. main has some work on chrome sessions too and making sure that chrome ai streaming does not send snapshots when they aren't actually snapshots. Double check docs at google. Text generation and translation may be different.
…totype pollution, snapshot reset (#528)

* fix(chrome-ai): repair WebBrowser_Chat session-cache reuse (HIGH)

The previous fingerprint-based cache key recomputed the fingerprint from the *prior* history on every turn, so turn 2's cache lookup always missed and rebuilt the session from scratch. Switch to a messageCount high-water mark: cache hits when cached.messageCount === lastUserIdx (i.e., the session has already heard everything before the trailing user message). After a successful turn the session has heard messages.length + 1 messages (history + new assistant reply), which we record for the next call.

* fix(chrome-ai): sanitize tool-call arguments to prevent prototype pollution (HIGH)

Many tool input schemas don't set `additionalProperties: false`, so a hallucinated `{__proto__: {polluted: true}, ok: true}` payload would pass validation and propagate through to consumers. Add a `sanitizeToolArgs` helper that recursively rebuilds the value with a plain Object.prototype, dropping `__proto__`, `constructor`, and `prototype` keys at every depth. Sanitize BEFORE validation so the validator sees the cleaned object.

* fix(chrome-ai): reset accumulator on non-prefix snapshots (HIGH)

`snapshotStreamToTextDeltas` was concatenating instead of resetting when a snapshot was not a prefix-extension of the previously accumulated text. For self-correction snapshots (Chrome replacing, not extending, prior text) this corrupted consumer state with duplicated content like `"hello worldhello sailor"`. Reset the accumulator to the new snapshot and emit it as the delta so consumers treat the non-prefix boundary as a replace, matching the documented streaming-convention exception.

Also add `snapshotStreamToTextDeltas` to `_testOnly` so the helper is testable from the test package, and add coverage for:
- HIGH-1: chat cache reuse and rebuild-on-divergence
- HIGH-2: __proto__/constructor/prototype scrubbing (top-level + recursive)
- HIGH-3: prefix-extend, non-prefix-reset, identical-snapshot semantics

Also fix a stale comment in the existing tool-calling lifecycle test that claimed cache reuse; tool-calling intentionally rebuilds per turn.

* docs(chrome-ai): align test comment with actual shrink-rebuild behavior
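The prefix-extend, non-prefix-reset, and identical-snapshot rules from the third fix can be illustrated with a small stateful converter. This is a sketch under assumptions: `createSnapshotToDelta` is a hypothetical name, it works on plain strings rather than on Chrome's stream, and it returns the delta instead of emitting Workglow stream events.

```typescript
/**
 * Hypothetical helper: returns a function that turns successive full-text
 * snapshots into deltas. It returns `undefined` when there is nothing new.
 */
function createSnapshotToDelta(): (snapshot: string) => string | undefined {
  let accumulated = "";
  return (snapshot: string): string | undefined => {
    // Identical snapshot: nothing to emit.
    if (snapshot === accumulated) return undefined;
    // Prefix-extension: emit only the newly appended tail.
    if (snapshot.startsWith(accumulated)) {
      const delta = snapshot.slice(accumulated.length);
      accumulated = snapshot;
      return delta;
    }
    // Non-prefix snapshot (the model replaced earlier text): reset the
    // accumulator instead of concatenating, and emit the whole snapshot.
    accumulated = snapshot;
    return snapshot;
  };
}

const toDelta = createSnapshotToDelta();
const emitted = ["hello", "hello world", "hello world", "hello sailor"].map((s) => toDelta(s));
console.log(emitted);
```

With the pre-fix concatenating behaviour the accumulator would have ended up as `"hello worldhello sailor"`; resetting keeps it equal to the latest snapshot.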
…apability probe

Integrates the chrome-ai branch (7 commits, PR #514/#520/#528) with main's parallel chrome-ai work (model.download, model.dispose, ApiBinding):
- Chat-session cache keyed by AiChatTask sessionId, with messageCount high-water mark for reuse (replaces fingerprint-based invalidation)
- StructuredGeneration + ToolCalling run-fns gated by an async capability probe; pre-probe state advertises a conservative subset (no json-mode, no tool-use) so the provider never claims a capability it can't fulfil
- ChatHistory helpers + WebBrowser_TextGeneration_Unified dispatcher (text.generation shared by AiChatTask + TextGenerationTask)
- ChromeHelpers ships both assertAvailability and ensureAvailable; both session APIs (chrome-chat cache + idle-evict store) coexist
- Drops main's WebBrowser_Chat.test.ts (chrome-ai's WebBrowserProvider.test already covers chat behavior under the new cache semantics)
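The probe gating summarized above (one shared probe, conservative capabilities until it resolves) might look roughly like this. All names are hypothetical stand-ins: `probeOnce` and `inferCapabilities` are not the PR's signatures, `ProbeResult` is an assumed shape, and the base capability string is a placeholder.

```typescript
// Hypothetical names throughout; the PR's real types and exports may differ.
type Capability = "text.generation" | "json-mode" | "tool-use";

interface ProbeResult {
  jsonMode: boolean;
  toolUse: boolean;
}

// Module-level slot so concurrent callers share a single probe round-trip.
let pendingProbe: Promise<ProbeResult> | undefined;

/** Start the probe on the first call; later calls get the same promise back. */
function probeOnce(runProbe: () => Promise<ProbeResult>): Promise<ProbeResult> {
  if (pendingProbe === undefined) {
    pendingProbe = runProbe();
  }
  return pendingProbe;
}

/**
 * Capability inference gated on the probe. `undefined` means the probe has
 * not resolved yet, so only the conservative subset is advertised.
 */
function inferCapabilities(probe: ProbeResult | undefined): Capability[] {
  const capabilities: Capability[] = ["text.generation"];
  if (probe !== undefined && probe.jsonMode) capabilities.push("json-mode");
  if (probe !== undefined && probe.toolUse) capabilities.push("tool-use");
  return capabilities;
}
```

Returning the conservative subset before the probe settles trades a short window of under-advertising for never routing a task to a provider that would reject it at runtime.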
…rn streams

Tool calling utilities (packages/ai/src/task/ToolCallingUtils.ts):
- sanitizeToolArgs: recursive __proto__/constructor/prototype scrubbing for model-supplied tool args (prototype-pollution defence)
- compileToolValidators + validateToolCallArgs: per-tool inputSchema validation with graceful fallback for tools whose schema fails to compile

Stream helpers converted from generators to emit-callback so run-fns no longer need a for-await/yield pump:
- snapshotStreamToTextDeltas / snapshotStreamToSnapshots (chrome-ai)
- accumulateOpenAIStream (@workglow/ai provider-utils, used by OpenAI + HFI)

Run-fns updated to call helpers with emit directly and emit their own final 'finish' event. chrome-ai's WebBrowser_ToolCalling drops its private sanitization + validation copy and reuses the shared utils.
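As an illustration of the recursive scrubbing described for `sanitizeToolArgs`, here is a minimal sketch. It reuses the name from the commit message, but the body is an assumption about how such a helper could work and is not the code in ToolCallingUtils.ts.

```typescript
// Keys that can reach or replace an object's prototype chain.
const FORBIDDEN_KEYS = new Set<string>(["__proto__", "constructor", "prototype"]);

/**
 * Sketch of a sanitizer: rebuilds the value from plain objects and arrays,
 * dropping the forbidden keys at every depth. Primitives pass through as-is.
 */
function sanitizeToolArgs(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => sanitizeToolArgs(item));
  if (value === null || typeof value !== "object") return value;
  const clean: Record<string, unknown> = {};
  for (const [key, child] of Object.entries(value)) {
    if (FORBIDDEN_KEYS.has(key)) continue;
    clean[key] = sanitizeToolArgs(child);
  }
  return clean;
}

// JSON.parse creates "__proto__" as an ordinary own property, which is how a
// hallucinated payload can carry it past a permissive schema.
const hostile = JSON.parse('{"__proto__":{"polluted":true},"ok":true,"nested":{"constructor":1,"x":[{"prototype":2,"y":3}]}}');
console.log(JSON.stringify(sanitizeToolArgs(hostile)));
```

Running the sanitizer before schema validation, as the commit describes, means the validator only ever sees the cleaned object.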
…viders

Addresses review of #514/#520/#528 rebase:

CRITICAL fix: `model.dispose` now reaches chat-cached sessions. The post-rebase chrome-ai branch had two parallel session maps (`chromeSessions` for chat reuse, `sessions` for idle-evict + ModelDispose lookup) but only the chat map was populated by runtime code, making `model.dispose` a functional no-op in production. Unified into a single Map<sessionId, WebBrowserSessionEntry> with both chat-cache fields (messageCount, fingerprints) and lifecycle fields (modelKey, lastUsedAt, idleTimer). `ChromeChatSessionState` now requires `modelKey`. `disposeWebBrowserSessionsForModel(modelKey)` iterates the unified store, so model.dispose destroys chat-cached sessions. Chat sessions become subject to idle eviction (free bonus).

IMPORTANT: sanitizeToolArgs applied across the codebase per intent of the prior refactor:
- OpenAIShapedChat (parseOpenAIToolCallMessage + accumulateOpenAIStream) → covers OpenAI + HFI
- ToolCallParsers (adaptParserResult + parseToolCallsFromText) → covers llama.cpp Hermes/Liquid/Qwen35/Llama paths + HFT
- Anthropic_ToolCalling (input_json_delta + content_block_stop)
- Gemini_ToolCalling (functionCall.args)
- Ollama_ToolCalling (parsed function.arguments)
- LlamaCpp_ToolCalling (extractNativeFunctionCalls)
- Cactus_ToolCalling[.browser] (JSON-parse parseToolCalls paths)

Every model-supplied tool-arg payload now passes through sanitizeToolArgs before reaching downstream consumers, closing the prototype-pollution vector across the provider matrix.

Also:
- Added packages/test/src/test/ai/ToolCallingUtils.test.ts (14 unit tests for sanitizeToolArgs, compileToolValidators, validateToolCallArgs, plus a sanitize→validate→name-check integration test).
- Added WebBrowser_Sessions.test regression for the unified-store behavior (disposeWebBrowserSessionsForModel sees chat-cached entries).
- Documented WebBrowser_Chat's rebuild-on-next-turn recovery model (vs the in-fn retry that main's now-deleted test exercised).
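The single-store design behind the CRITICAL fix can be sketched like this. The entry shape and the function names (`rememberSession`, `disposeSessionsForModel`) are simplified assumptions for illustration; the real `WebBrowserSessionEntry` also carries fingerprints and an idle timer, which are omitted here.

```typescript
// Simplified, hypothetical entry: chat-cache and lifecycle fields side by side.
interface SessionEntry {
  session: { destroy(): void };
  modelKey: string;
  messageCount: number;
  lastUsedAt: number;
}

// One store for everything, so disposal can never miss chat-cached sessions.
const sessions = new Map<string, SessionEntry>();

function rememberSession(sessionId: string, entry: SessionEntry): void {
  sessions.set(sessionId, entry);
}

/** Destroy and forget every session that belongs to the given model. */
function disposeSessionsForModel(modelKey: string): number {
  let disposed = 0;
  sessions.forEach((entry, sessionId) => {
    if (entry.modelKey !== modelKey) return;
    entry.session.destroy();
    sessions.delete(sessionId);
    disposed += 1;
  });
  return disposed;
}
```

Because the chat path and the dispose path read the same map, a session written by chat reuse is automatically visible to model disposal, which is the property the two-map layout lacked.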
…n is destroyed
Chrome can destroy a `LanguageModel` session out from under us (tab
backgrounding, GPU process restart, memory pressure). When a cached
session's `promptStreaming` throws DOMException("...destroyed...",
"InvalidStateError") we now rebuild the session from full history via
`initialPrompts` and retry the prompt once.
Retry is gated on three conditions, all required:
- We were using a CACHED session (a fresh-session failure means the
model is broken; retrying won't help).
- No text-delta has reached the consumer yet (we can't unsend deltas).
- The error name is `InvalidStateError` (matches Chrome's
InvalidStateError DOMException; tolerant of message-text changes).
Tests:
- "retries once with a fresh session when a cached session is destroyed"
seeds the cache on turn 1, has the cached session's promptStreaming
throw on turn 2's reuse, asserts rebuild + retry + cache replacement.
- "does not retry when a fresh (non-cached) session fails" guards the
first gate.
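The three retry gates can be expressed as a single predicate. This is a sketch only: `shouldRetryWithFreshSession` and the `AttemptContext` shape are hypothetical, and the surrounding rebuild-and-retry logic is left out.

```typescript
// Hypothetical summary of one prompt attempt.
interface AttemptContext {
  usedCachedSession: boolean; // gate 1: only cached sessions are worth retrying
  deltasEmitted: number; // gate 2: deltas already sent cannot be unsent
  error: unknown; // gate 3: must be named InvalidStateError
}

/** True only when all three conditions for a single retry hold. */
function shouldRetryWithFreshSession(ctx: AttemptContext): boolean {
  if (!ctx.usedCachedSession) return false;
  if (ctx.deltasEmitted > 0) return false;
  const err = ctx.error;
  // Match on the error name rather than the message, so wording changes in
  // Chrome's message text do not break the check.
  const name =
    typeof err === "object" && err !== null && "name" in err
      ? (err as { name: unknown }).name
      : undefined;
  return name === "InvalidStateError";
}

// Stand-in for Chrome's DOMException, built from a plain Error.
const destroyedError = Object.assign(new Error("session destroyed"), {
  name: "InvalidStateError",
});
console.log(
  shouldRetryWithFreshSession({ usedCachedSession: true, deltasEmitted: 0, error: destroyedError }),
);
```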
- Added `@types/dom-chromium-ai` to improve type definitions for Chrome AI APIs.
- Added a `WebBrowser_Chat` function to handle multi-turn chat sessions, allowing for better session management and context retention.
- Added structured generation via `WebBrowser_StructuredGeneration`, enabling JSON output from the AI model.
- Added tool calling via `WebBrowser_ToolCalling`, allowing the model to invoke tools and handle their results.
- Added the `json-mode` and `tool-use` capability sets, expanding the range of tasks the AI can perform.
- Added `WebBrowser_Sessions` to cache and manage AI sessions, improving performance and resource utilization.
- Removed `WebBrowser_ChromeAI.d.ts` to streamline the codebase.

This commit extends the Chrome AI provider with multi-turn chat, structured generation, and tool calling, which allows more complex interactions through the provider.