
Chrome-ai#514

Merged
sroussey merged 4 commits into main from chrome-ai
May 22, 2026

Conversation

@sroussey
Collaborator

  • Added support for @types/dom-chromium-ai to improve type definitions for Chrome AI APIs.
  • Introduced a new WebBrowser_Chat function to handle multi-turn chat sessions, allowing for better session management and context retention.
  • Implemented structured generation capabilities with WebBrowser_StructuredGeneration, enabling JSON output from the AI model.
  • Enhanced tool-calling functionality with WebBrowser_ToolCalling, allowing the model to invoke tools and handle their results seamlessly.
  • Updated existing capabilities to include new features such as json-mode and tool-use, expanding the range of tasks the AI can perform.
  • Refactored session management with WebBrowser_Sessions to cache and manage AI sessions effectively, improving performance and resource utilization.
  • Removed deprecated type definitions from WebBrowser_ChromeAI.d.ts to streamline the codebase.

Together, these changes add multi-turn chat, JSON-constrained output, and tool calling to the Chrome AI provider, backed by a shared session cache.

@pkg-pr-new

pkg-pr-new Bot commented May 20, 2026


@workglow/cli

npm i https://pkg.pr.new/@workglow/cli@514

@workglow/ai

npm i https://pkg.pr.new/@workglow/ai@514

@workglow/browser-control

npm i https://pkg.pr.new/@workglow/browser-control@514

@workglow/indexeddb

npm i https://pkg.pr.new/@workglow/indexeddb@514

@workglow/javascript

npm i https://pkg.pr.new/@workglow/javascript@514

@workglow/job-queue

npm i https://pkg.pr.new/@workglow/job-queue@514

@workglow/knowledge-base

npm i https://pkg.pr.new/@workglow/knowledge-base@514

@workglow/mcp

npm i https://pkg.pr.new/@workglow/mcp@514

@workglow/storage

npm i https://pkg.pr.new/@workglow/storage@514

@workglow/task-graph

npm i https://pkg.pr.new/@workglow/task-graph@514

@workglow/tasks

npm i https://pkg.pr.new/@workglow/tasks@514

@workglow/util

npm i https://pkg.pr.new/@workglow/util@514

workglow

npm i https://pkg.pr.new/workglow@514

@workglow/anthropic

npm i https://pkg.pr.new/@workglow/anthropic@514

@workglow/bun-webview

npm i https://pkg.pr.new/@workglow/bun-webview@514

@workglow/cactus

npm i https://pkg.pr.new/@workglow/cactus@514

@workglow/chrome-ai

npm i https://pkg.pr.new/@workglow/chrome-ai@514

@workglow/electron

npm i https://pkg.pr.new/@workglow/electron@514

@workglow/google-gemini

npm i https://pkg.pr.new/@workglow/google-gemini@514

@workglow/huggingface-inference

npm i https://pkg.pr.new/@workglow/huggingface-inference@514

@workglow/huggingface-transformers

npm i https://pkg.pr.new/@workglow/huggingface-transformers@514

@workglow/node-llama-cpp

npm i https://pkg.pr.new/@workglow/node-llama-cpp@514

@workglow/ollama

npm i https://pkg.pr.new/@workglow/ollama@514

@workglow/openai

npm i https://pkg.pr.new/@workglow/openai@514

@workglow/playwright

npm i https://pkg.pr.new/@workglow/playwright@514

@workglow/postgres

npm i https://pkg.pr.new/@workglow/postgres@514

@workglow/sqlite

npm i https://pkg.pr.new/@workglow/sqlite@514

@workglow/supabase

npm i https://pkg.pr.new/@workglow/supabase@514

@workglow/tf-mediapipe

npm i https://pkg.pr.new/@workglow/tf-mediapipe@514

commit: b382050

@github-actions

github-actions Bot commented May 20, 2026

Coverage Report

| Status | Category | Percentage | Covered / Total |
| --- | --- | --- | --- |
| 🔵 | Lines | 62.52% | 23684 / 37877 |
| 🔵 | Statements | 62.4% | 24511 / 39278 |
| 🔵 | Functions | 63.56% | 4509 / 7093 |
| 🔵 | Branches | 50.99% | 11506 / 22561 |

File Coverage: No changed files found.
Generated in workflow #2390 for commit b382050 by the Vitest Coverage Report Action

sroussey added a commit that referenced this pull request May 20, 2026
… (#520)

* feat(chrome-ai): probe-gate tool-use and json-mode capabilities (C1)

Chrome's `LanguageModel.create` did not universally accept `tools` or
`responseConstraint` options, yet `inferWebBrowserCapabilities` always
advertised `tool-use` + `json-mode` for `chrome-prompt`/`gemini-nano`.
This caused the dispatcher to route json-mode and tool-use tasks to the
WebBrowser provider on Chrome builds that would reject them at runtime.

Adds a one-shot capability probe (`probeWebBrowserCapabilities`) that
smoke-tests `factory.create({ responseConstraint })` and
`factory.create({ tools })`, with module-level coalescing so concurrent
callers share one probe round-trip. `WebBrowserProvider` kicks the probe
off in its constructor; until it resolves, `inferCapabilities` returns
the conservative subset (no `json-mode`, no `tool-use`). Tests cover
all four probe outcome combinations, coalescing, and pre/post-ready
inference.

https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC
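
The probe described in this commit can be sketched roughly as follows. This is a minimal illustration, not the project's code: `ProbeFactory`, `ProbeResult`, `accepts`, and the `"text.generation"` capability string are assumptions made for the example, and the real Chrome `LanguageModel` factory has a richer interface.

```typescript
// Hypothetical shapes for illustration only.
interface ProbeFactory {
  create(options: Record<string, unknown>): Promise<{ destroy(): void }>;
}

interface ProbeResult {
  jsonMode: boolean;
  toolUse: boolean;
}

// Module-level promise so concurrent callers share one probe round-trip.
let inFlight: Promise<ProbeResult> | undefined;

// Smoke-test a single option: success means the option is accepted.
async function accepts(
  factory: ProbeFactory,
  options: Record<string, unknown>
): Promise<boolean> {
  try {
    const session = await factory.create(options);
    session.destroy(); // the smoke-test session is never used for prompting
    return true;
  } catch {
    return false; // any rejection is treated as "option not supported"
  }
}

function probeCapabilities(factory: ProbeFactory): Promise<ProbeResult> {
  if (!inFlight) {
    inFlight = (async () => {
      const [jsonMode, toolUse] = await Promise.all([
        accepts(factory, { responseConstraint: { type: "object" } }),
        accepts(factory, { tools: [] }),
      ]);
      return { jsonMode, toolUse };
    })();
  }
  return inFlight;
}

// Conservative inference: advertise nothing extra until the probe has resolved.
function inferCapabilities(result: ProbeResult | undefined): string[] {
  const caps = ["text.generation"];
  if (result?.jsonMode) caps.push("json-mode");
  if (result?.toolUse) caps.push("tool-use");
  return caps;
}
```

Calling `inferCapabilities(undefined)` models the pre-probe state, where neither `json-mode` nor `tool-use` is advertised.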

* feat(chrome-ai): StructuredGeneration accepts sessionId with schema fingerprint (H1)

The structured-generation run-fn dropped `sessionId` from its signature,
so successive calls with the same id always rebuilt the underlying Chrome
`LanguageModel` even though the surface supports session reuse. This
matched the pre-session-cache behaviour rather than the post-cache shape
adopted by `WebBrowser_Chat`.

Accept `sessionId` as the 6th positional parameter, mirroring chat. Cache
reuse is gated on a canonical schema fingerprint stored on the cache
entry — a schema change forces a rebuild because Chrome's
`responseConstraint` state is bound at first-prompt and re-feeding a
different schema is undefined behaviour. On stream failure the entry is
dropped + destroyed via the same `cacheWritten` / `dropChromeSessionEntry`
dance as chat. `ChromeChatSessionState` grows an optional
`schemaFingerprint` field.

https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC
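
A canonical schema fingerprint of the kind this commit relies on might look like the sketch below. The `Json` type, the `CachedSession` shape, and `canReuse` are hypothetical names introduced for the example; only the idea of comparing a stored `schemaFingerprint` comes from the commit message.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serialize with object keys sorted at every depth, so that two schemas
// differing only in key order produce the same string.
function canonicalStringify(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((v) => canonicalStringify(v)).join(",")}]`;
  const keys = Object.keys(value).sort();
  const body = keys.map((k) => `${JSON.stringify(k)}:${canonicalStringify(value[k])}`);
  return `{${body.join(",")}}`;
}

// Hypothetical cache-entry shape: only the field relevant here.
interface CachedSession {
  schemaFingerprint?: string;
}

// Reuse is allowed only when the cached entry was built for the same schema.
function canReuse(cached: CachedSession | undefined, schema: Json): boolean {
  return cached !== undefined && cached.schemaFingerprint === canonicalStringify(schema);
}
```

Array order is preserved on purpose, since element order can be meaningful in a JSON Schema.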

* feat(chrome-ai): ToolCalling accepts sessionId with tools fingerprint (H2)

`WebBrowser_ToolCalling` ignored both `outputSchema` and `sessionId` —
the 5th and 6th positional parameters of the run-fn contract — so
multi-turn tool-calling rebuilt the `LanguageModel` each turn.

Accept both parameters. Cache reuse keys on a sorted-tool-name
fingerprint (Chrome binds `tools` at `create()` time and can't hot-swap
them per turn). We only cache when the orchestrator drives via
`input.messages` because Chrome's tool-calling loop appends tool-result
turns to the session's internal state opaquely — reusing a cached
session across a turn the orchestrator hasn't fully replayed would
double-feed those results. Bare-prompt callers always rebuild.

On any error we drop + destroy the cache entry: Chrome's internal state
may be mid-tool-call-cycle. `ChromeChatSessionState` grows an optional
`toolsFingerprint` field.

https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC
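
As a rough illustration of the gating in this commit, a sorted-tool-name fingerprint and the "only cache when driven via `input.messages`" rule could be written as below. `ToolDef`, `ToolCallInput`, and `isCacheable` are assumed names for the sketch.

```typescript
interface ToolDef {
  name: string;
}

// Order-insensitive fingerprint: the same tool names in a different order
// yield the same key. Changes to a tool's schema are not detected by this key.
function toolsFingerprint(tools: readonly ToolDef[]): string {
  return JSON.stringify(tools.map((t) => t.name).sort());
}

// Hypothetical input shape: either a bare prompt or orchestrator-driven messages.
interface ToolCallInput {
  prompt?: string;
  messages?: unknown[];
}

// Bare-prompt callers always rebuild; caching requires a session id and messages.
function isCacheable(sessionId: string | undefined, input: ToolCallInput): boolean {
  return sessionId !== undefined && Array.isArray(input.messages);
}
```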

* fix(chrome-ai): validate tool-call arguments against tool inputSchema (H3)

Chrome's `LanguageModel` invokes our stub `execute` callback with whatever
arguments the model emits. `filterValidToolCalls` only checked the tool
name, so a hallucinated arg shape was forwarded to the orchestrator
verbatim — leaving the downstream tool runner to either fail or silently
produce garbage.

Compile each tool's `inputSchema` once via `compileSchema` (cached by
name) before the stream starts. After streaming we validate every
captured call's `input` against its tool's validator; failures are
dropped + warn-logged in the same shape as `filterValidToolCalls`'s
existing name-only warning. Tools whose `inputSchema` fails to compile
emit a single warning and fall through to the name-only check rather
than failing the whole run.

https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC
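
The compile-once, validate-after-streaming flow might be sketched like this. The signatures are assumptions: the schema compiler is passed in as a `compile` parameter rather than calling the project's `compileSchema`, whose real signature is not shown in this thread, and `warn` stands in for the project's logger.

```typescript
type ArgValidator = (input: unknown) => boolean;

interface ToolSpec {
  name: string;
  inputSchema: unknown;
}

interface CapturedCall {
  name: string;
  input: unknown;
}

// Compile each tool's schema once. A schema that fails to compile produces a
// single warning and no validator, so that tool falls back to name-only checks.
function compileToolValidators(
  tools: readonly ToolSpec[],
  compile: (schema: unknown) => ArgValidator,
  warn: (message: string) => void
): Map<string, ArgValidator> {
  const validators = new Map<string, ArgValidator>();
  for (const tool of tools) {
    try {
      validators.set(tool.name, compile(tool.inputSchema));
    } catch {
      warn(`inputSchema for "${tool.name}" failed to compile; using name-only check`);
    }
  }
  return validators;
}

// Drop calls to unknown tools, and calls whose arguments fail validation.
function filterValidToolCalls(
  calls: readonly CapturedCall[],
  tools: readonly ToolSpec[],
  validators: Map<string, ArgValidator>,
  warn: (message: string) => void
): CapturedCall[] {
  const known = new Set(tools.map((t) => t.name));
  return calls.filter((call) => {
    if (!known.has(call.name)) {
      warn(`dropping call to unknown tool "${call.name}"`);
      return false;
    }
    const validate = validators.get(call.name);
    if (validate && !validate(call.input)) {
      warn(`dropping call to "${call.name}": arguments failed validation`);
      return false;
    }
    return true;
  });
}
```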

* fix(chrome-ai): validate StructuredGeneration final JSON against schema (H4)

Chrome's `responseConstraint` is best-effort, not a hard guarantee — the
model can still produce a partial or shape-mismatched payload. The
existing fallback (`parsePartialJson(...) ?? {}`) handed downstream code
an empty object cast to the output type, indistinguishable from a
legitimate empty payload. Worse, that path emitted a `finish` event, so
`StructuredGenerationTask`'s retry loop had no signal to retry on.

Compile the validator once via `compileSchema`. After streaming:
 - If neither `JSON.parse` nor `parsePartialJson` produces a value:
   throw `PermanentJobError("Chrome AI returned unparseable JSON")`.
 - If validation fails: throw with the first validator error message.
 - Only on success do we emit `finish` and write the cache entry.

`StructuredGenerationTask.executeStream` catches per-attempt errors and
retries, so throwing here is the correct signal — no `finish` so the
loop knows this attempt failed. Schema compile failures are also
surfaced as `PermanentJobError` (so retries don't burn through quota on
a malformed schema).

https://claude.ai/code/session_013PqntVCfKgKmJ5396w7BPC
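
The three post-streaming outcomes listed in this commit can be sketched as one function. This is illustrative only: `PermanentJobError` here is a local stand-in for the project's class, the `Validator` type is assumed, and `parsePartialJson` is injected because its real signature is not shown in this thread.

```typescript
// Local stand-in for the project's error class.
class PermanentJobError extends Error {}

// Assumed validator shape: returns error messages, empty when the value is valid.
type Validator = (value: unknown) => string[];

function finalizeStructuredOutput(
  accumulatedJson: string,
  validate: Validator,
  parsePartialJson: (text: string) => unknown
): unknown {
  let parsed: unknown;
  try {
    parsed = JSON.parse(accumulatedJson);
  } catch {
    parsed = parsePartialJson(accumulatedJson);
  }
  // Outcome 1: neither parser produced a value.
  if (parsed === undefined) {
    throw new PermanentJobError("Chrome AI returned unparseable JSON");
  }
  // Outcome 2: the value parsed but does not match the schema.
  const errors = validate(parsed);
  if (errors.length > 0) {
    throw new PermanentJobError(errors[0]);
  }
  // Outcome 3: success. The caller emits `finish` and writes the cache entry.
  return parsed;
}
```

Note that the review findings later in this thread (H1) argue against throwing on validation failure, so this sketch reflects the commit as written rather than the final behaviour.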

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Collaborator Author

HIGH-priority review findings — fix plans

Following an end-of-day automated review of PRs updated in the last 24h. 5 HIGH findings, 0 CRITICAL. Recommend addressing H1–H4 before merge; H5 is a dead-code cleanup that can be deferred. Full detail per finding below.

H1 — WebBrowser_StructuredGeneration defeats StructuredGenerationTask.maxRetries

Where: providers/chrome-ai/src/ai/common/WebBrowser_StructuredGeneration.ts:175-195

Why: The run-fn throws PermanentJobError on JSON.parse or schema-validation failure. The inline comment claims StructuredGenerationTask runs inside a retry loop that catches per-attempt errors — but packages/ai/src/task/StructuredGenerationTask.ts:177 consumes the inner stream via for await (const event of super.executeStream(...)), which lets exceptions propagate straight out of the generator. The task's retry/repair pipeline only runs when the run-fn emits a finish event whose data.object parses but fails validation. Chrome's path therefore silently disables maxRetries.

Fix: Mirror Anthropic_StructuredGeneration_Stream / HFT_StructuredGeneration_Stream. On parse failure, fall back to parsePartialJson(accumulatedJson) ?? {}. On validation failure, do NOT throw — still emit { type: "finish", data: { object: finalObject } } and let the task's compiled validator decide retry. Compile-time invalid-schema (the PermanentJobError at lines 113-118) legitimately stays as throw.

Effort: S (~25 lines).
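
A minimal sketch of the H1 fix, assuming the `finish` event shape quoted above. `finishEventFor` is a hypothetical helper name and `parsePartialJson` is injected rather than imported, since its real signature is not shown here.

```typescript
type FinishEvent = { type: "finish"; data: { object: unknown } };

// Never throw on a parse failure: fall back to a partial parse, then to {},
// and always hand the task a finish event so its own validator decides on retry.
function finishEventFor(
  accumulatedJson: string,
  parsePartialJson: (text: string) => unknown
): FinishEvent {
  let finalObject: unknown;
  try {
    finalObject = JSON.parse(accumulatedJson);
  } catch {
    finalObject = parsePartialJson(accumulatedJson) ?? {};
  }
  return { type: "finish", data: { object: finalObject } };
}
```

The run-fn performs no schema validation in this variant; an invalid object is deliberately passed through so the retry/repair pipeline in the task can see it.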

H2 — WebBrowser_ToolCalling cache reuse double-feeds tool results

Where: providers/chrome-ai/src/ai/common/WebBrowser_ToolCalling.ts:200-260 (cache check), :102-118 (trade-off comment)

Why: The run-fn caches LanguageModel and reuses across orchestrator turns when input.messages is present, gated on toolsFingerprint and priorMessageCount. Chrome's promptStreaming() internally appends tool-result turns to the session's history. On the orchestrator's next turn, input.messages re-supplies the tool result as a text frame, while the cached session also has its own appended copy → model sees the same tool result twice. The trade-off comment acknowledges the failure mode; the chosen priorMessageCount watermark does not actually defend against it.

Fix: Delete the cache-reuse path for tool-calling entirely. Match WebBrowser_TextGeneration.ts's pattern (always factory.create + session.destroy in finally). Remove cached / usedCachedSession / cacheable / cacheWritten locals and the getChromeSession / setChromeSession / dropChromeSessionEntry imports. Chrome session-create is cheap on a warm model; the correctness win outweighs the amortization. If amortization matters later, gate on capturedCalls.length === 0 from the previous turn (only safe when the prior turn was text-only).

Effort: S (~40 lines deletion).

H3 — WebBrowser_Chat history-replay watermark drifts from Chrome's effective initial prompt

Where: providers/chrome-ai/src/ai/common/WebBrowser_Chat.ts:67-78, providers/chrome-ai/src/ai/common/WebBrowser_ChatHistory.ts:38-67

Why: Cache watermark is priorHistory.length (raw count of ChatMessages). But buildInitialPromptsFromHistory silently filters out empty-text messages, tool-role messages, and mid-history system messages. Two histories that produce identical initialPrompts but differ in dropped-frame count force a needless rebuild (cache miss). Conversely, a mid-conversation mutation of the leading system text — also silently swallowed by the filter — keeps the cache hot with the old system prompt baked in.

Fix: Compute the watermark over the FILTERED history that actually got fed to Chrome. Change buildInitialPromptsFromHistory to return { initialPrompts, fingerprint } where fingerprint is canonicalStringify(initialPrompts) (or a hash). Gate cache reuse on the fingerprint match, not raw length. Add a separate leading-system fingerprint check so mid-conversation system-prompt mutations force a rebuild. Update ChromeChatSessionState (WebBrowser_Sessions.ts:17-34) to carry historyFingerprint?: string alongside the existing messageCount (preserved for tool-calling/structured-gen which don't change in this fix).

Effort: M (~80 lines + interface touch).
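
The H3 proposal, fingerprinting the filtered history instead of counting raw messages, could look roughly like this. The `ChatMessage` and `InitialPrompt` shapes are assumptions for the sketch, and plain `JSON.stringify` stands in for `canonicalStringify` because the prompt objects are built here with a fixed key order.

```typescript
// Hypothetical message shapes for illustration.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  text: string;
}

interface InitialPrompt {
  role: "system" | "user" | "assistant";
  content: string;
}

function buildInitialPromptsFromHistory(history: readonly ChatMessage[]): {
  initialPrompts: InitialPrompt[];
  fingerprint: string;
} {
  const initialPrompts: InitialPrompt[] = [];
  for (const m of history) {
    if (m.text.trim() === "") continue; // empty-text messages are dropped
    if (m.role === "tool") continue; // tool-role messages are dropped
    // Only a leading system message is kept; mid-history ones are dropped.
    if (m.role === "system" && initialPrompts.length > 0) continue;
    initialPrompts.push({ role: m.role, content: m.text });
  }
  // The fingerprint covers exactly what would be fed to Chrome, including the
  // leading system text, so a system-prompt edit changes it.
  return { initialPrompts, fingerprint: JSON.stringify(initialPrompts) };
}
```

With this shape, two histories that differ only in dropped frames share a fingerprint, while an edited leading system prompt forces a rebuild.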

H4 — Bare LanguageModel (and friends) references can throw at module load on non-Chrome

Where: Eight files — WebBrowser_Chat.ts:40-44, WebBrowser_StructuredGeneration.ts:100-104, WebBrowser_ToolCalling.ts:159-163, WebBrowser_TextGeneration.ts:42-46, WebBrowser_TextLanguageDetection.ts, WebBrowser_TextRewriter.ts, WebBrowser_TextSummary.ts, WebBrowser_TextTranslation.ts

Why: Each file uses typeof LanguageModel !== "undefined" ? LanguageModel : undefined. The typeof X form is correct in classic scripts, but in strict-mode ES modules under some bundlers the second reference to LanguageModel (after the ?) is treated as a binding lookup and can throw ReferenceError at module evaluation on Firefox/Safari/Node. WebBrowser_CapabilityProbe.ts:80-90 already routes through globalThis and got this right — the probe fix just didn't propagate to the run-fns.

Fix: Add a tiny helper in WebBrowser_ChromeHelpers.ts:

export function getChromeGlobal<T = unknown>(name: string): T | undefined {
  return (globalThis as unknown as Record<string, T | undefined>)[name];
}

Replace each call site:

// before
getApi("LanguageModel", typeof LanguageModel !== "undefined" ? LanguageModel : undefined)
// after
getApi("LanguageModel", getChromeGlobal<typeof LanguageModel>("LanguageModel"))

Apply to LanguageModel, Summarizer, Rewriter, Translator, LanguageDetector across all 8 files.

Effort: S (~10 lines per site × 8 + helper).

H5 — WebBrowser_StructuredGeneration cache code is dead in production

Where: providers/chrome-ai/src/ai/common/WebBrowser_StructuredGeneration.ts:125-160 (cache branch); packages/ai/src/task/StructuredGenerationTask.ts (no sessionId plumbing)

Why: The fingerprint and cache infrastructure adds ~80 lines plus three live failure modes (cache poisoning, schema-mismatch invalidation, watermark drift). But StructuredGenerationTask doesn't allocate a sessionId and the JobInput plumbing that produces one is owned by AiChatTask. No production caller passes sessionId. Dead in production.

Fix: Delete the cache code in this file (always create + destroy like WebBrowser_TextGeneration). Leave a TODO comment pointing at the follow-up: wiring StructuredGenerationTask to allocate a sessionId for json-mode providers is a cross-provider contract change and belongs in a separate PR. If canonicalStringify is needed by the H3 fix, promote it to a shared helper; otherwise inline-delete.

Effort: S (~90 lines deletion).

Sequencing

  • H1, H2, H4 are independent — any order.
  • H1 + H5 both touch WebBrowser_StructuredGeneration.ts — land H5 first (deletes more), then H1 on the simplified file.
  • H3 touches a shared ChromeChatSessionState interface that H2 reads (messageCount); H3's plan keeps messageCount and adds historyFingerprint as an optional sibling, so the two don't conflict.
  • If H4 lands first, H1/H5 pick up getChromeGlobal for free.

Generated by Claude Code

@sroussey
Collaborator Author

@copilot Some of this was recently added to main branch, but not all. It needs to be rebased and checked carefully. main has some work on sessions and making sure that streaming does not use snapshot.

Contributor

Copilot AI commented May 22, 2026


Rechecked this against main and updated the Chrome AI paths in 789c0e4. The branch now uses globalThis-safe Chrome API lookup, text-delta-only streaming fallback, chat history fingerprinting for session reuse, and fresh sessions for tool-calling / structured generation so the main-branch session and streaming concerns are covered.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • huggingface.co
    • Triggering command (3 occurrences): /opt/hostedtoolcache/node/24.15.0/x64/bin/node --experimental-import-meta-resolve --require /home/REDACTED/work/libs/libs/node_modules/.bun/vitest@4.1.6+01bed7d671562612/node_modules/vitest/suppress-warnings.cjs --conditions node --conditions development /home/REDACTED/work/libs/libs/node_modules/.bun/vitest@4.1.6+01bed7d671562612/node_modules/vitest/dist/workers/forks.js … (dns block)


Contributor

Copilot AI left a comment


Pull request overview

This PR expands the Chrome Built-in AI provider to support capability probing, multi-turn chat/session handling, structured JSON generation (json-mode), and tool-calling (tool-use). It also introduces persistent, machine-readable error codes for FetchUrlTask failures (propagated through the job queue/client), plus accompanying test refactors/utilities.

Changes:

  • Chrome AI provider: add capability probing + new run-fns for chat, structured generation, and tool calling; refactor streaming helpers and session management.
  • FetchUrl: introduce FetchUrlErrorCode + helpers, persist error_code reliably, and update SafeFetch/FetchUrlTask logic + tests.
  • Tests/tooling: add cross-runner fake-timer helper and expand provider + fetch error-code coverage.

Reviewed changes

Copilot reviewed 36 out of 38 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
scripts/lib/preload-credentials.ts Add warning log when credential unlock/hydration fails.
providers/chrome-ai/tsconfig.json Include dom-chromium-ai ambient types for Chrome AI globals.
providers/chrome-ai/src/ai/WebBrowserProvider.ts Add capability probe lifecycle (ready()), gated inference, and session disposal hook.
providers/chrome-ai/src/ai/index.ts Expand _testOnly exports to cover new run-fns, probing, sessions, and chat helpers.
providers/chrome-ai/src/ai/common/WebBrowser_ToolCalling.ts New tool-calling run-fn bridging Chrome’s internal tool loop to Workglow toolCalls protocol.
providers/chrome-ai/src/ai/common/WebBrowser_TextTranslation.ts Refactor Translator access via getChromeGlobal + add download-progress monitoring.
providers/chrome-ai/src/ai/common/WebBrowser_TextSummary.ts Add download-progress monitoring, signal plumbing, and normalize tl;dr → tldr.
providers/chrome-ai/src/ai/common/WebBrowser_TextRewriter.ts Add download-progress monitoring and signal plumbing.
providers/chrome-ai/src/ai/common/WebBrowser_TextLanguageDetection.ts Add download-progress monitoring, signal plumbing, and defensive mapping of detection results.
providers/chrome-ai/src/ai/common/WebBrowser_TextGeneration.ts Add download-progress monitoring, signal plumbing, and unify delta streaming helper usage.
providers/chrome-ai/src/ai/common/WebBrowser_StructuredGeneration.ts New json-mode structured generation run-fn using responseConstraint with partial JSON streaming.
providers/chrome-ai/src/ai/common/WebBrowser_Sessions.ts New LanguageModel session cache utilities keyed by AiChatTask sessionId.
providers/chrome-ai/src/ai/common/WebBrowser_JobRunFns.ts Register new capability sets and add unified text.generation dispatcher (chat vs one-shot).
providers/chrome-ai/src/ai/common/WebBrowser_ChromeHelpers.ts Add global lookup helper, download monitor, canonical stringify, and adjust streaming snapshot→delta conversion.
providers/chrome-ai/src/ai/common/WebBrowser_ChromeAI.d.ts Remove deprecated in-repo ambient Chrome AI type declarations (replaced by @types/dom-chromium-ai).
providers/chrome-ai/src/ai/common/WebBrowser_ChatHistory.ts New helpers to map Workglow chat history into Chrome initialPrompts + fingerprinting.
providers/chrome-ai/src/ai/common/WebBrowser_Chat.ts New multi-turn chat run-fn with session caching and failure-path cache hygiene.
providers/chrome-ai/src/ai/common/WebBrowser_CapabilitySets.ts Add json-mode and tool-use capability sets.
providers/chrome-ai/src/ai/common/WebBrowser_CapabilityProbe.ts New module-level cached probe for json/tool support via LanguageModel.create() smoke tests.
providers/chrome-ai/src/ai/common/WebBrowser_Capabilities.ts Gate json-mode/tool-use capability inference behind probe results (+ async helper).
providers/chrome-ai/package.json Add @types/dom-chromium-ai dev dependency.
packages/test/src/test/util/WorkerManager.idle.test.ts Refactor timer advancement to shared helper for Vitest/Bun compatibility.
packages/test/src/test/task/FetchTask.test.ts Assert FetchUrlErrorCode propagation and persisted errorCode behavior through queue.
packages/test/src/test/resource/DisposeStrategy.test.ts Refactor timer advancement to shared helper.
packages/test/src/test/helpers/advanceFakeTimers.ts New helper to advance timers portably across Vitest/Bun (with optional microtask flush).
packages/test/src/test/browser-control/SequentialTasks.test.ts Refactor timer advancement to shared helper.
packages/test/src/test/ai-provider/WebBrowserProvider.test.ts Add extensive tests for probing, unified dispatcher, structured generation, tool calling, sessions, and helpers.
packages/test/src/test/ai-provider/DownloadModelAbort.integration.test.ts Refactor model setup + improve abort-error classification helper.
packages/tasks/src/util/SafeFetch.ts Replace generic errors with structured FetchUrl error codes for SSRF/redirect failures.
packages/tasks/src/util/SafeFetch.server.ts Replace generic errors with structured FetchUrl error codes for DNS/SSRF/redirect failures.
packages/tasks/src/task/FetchUrlTask.ts Emit structured FetchUrl errors, classify parse failures, and route HTTP errors through code-bearing helpers.
packages/tasks/src/task/FetchUrlJobError.ts New centralized FetchUrl error-code definitions + constructors/wrappers.
packages/tasks/src/common.ts Export FetchUrlJobError and reorder/re-export image + media filter utilities.
packages/task-graph/src/task/TaskError.ts Propagate underlying JobError.code onto JobTaskFailedError.code when available.
packages/job-queue/src/job/JobQueueWorker.ts Persist error_code via jobErrorPersistedCode() and emit consistent event payloads.
packages/job-queue/src/job/JobQueueClient.ts Rehydrate FETCH_* persisted codes into retryable/permanent errors and preserve code.
packages/job-queue/src/job/JobError.ts Add JobError.code and helper to derive persisted error codes.
bun.lock Lockfile updates for new dependency and trustedDependencies ordering.


} else {
previousSnapshot = value;
yield { type: "snapshot", data: buildFallbackOutput(value) };
accumulatedText += value;
Comment on lines +64 to +69
// Cache hygiene: only reuse the cached session if its watermark exactly
// matches the history we'd otherwise re-feed. Out-of-sync caches (task
// reset mid-conversation, retroactive edits to `messages`) are torn down
// and rebuilt.
let cached = sessionId ? getChromeSession(sessionId) : undefined;
if (sessionId !== undefined && cached && cached.historyFingerprint !== historyFingerprint) {
Comment thread packages/tasks/src/task/FetchUrlTask.ts Outdated
Comment on lines +147 to +148
if (err instanceof Error && err.name === "AbortSignalJobError") {
throw err;
@sroussey
Collaborator Author

@claude Some of this was recently added to the main branch, but not all. It needs to be rebased and checked carefully. main has some work on Chrome sessions too, and on making sure that Chrome AI streaming does not send snapshot events when they aren't actually snapshots. Double-check the docs at Google. Text generation and translation may be different.

sroussey added a commit that referenced this pull request May 22, 2026
…totype pollution, snapshot reset (#528)

* fix(chrome-ai): repair WebBrowser_Chat session-cache reuse (HIGH)

The previous fingerprint-based cache key recomputed the fingerprint
from the *prior* history on every turn, so turn 2's cache lookup always
missed and rebuilt the session from scratch. Switch to a messageCount
high-water mark: cache hits when cached.messageCount === lastUserIdx
(i.e., the session has already heard everything before the trailing
user message). After a successful turn the session has heard
messages.length + 1 messages (history + new assistant reply), which we
record for the next call.
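
The high-water-mark rule above can be illustrated with a small sketch. The `Msg` shape and the helper names (`lastUserIndex`, `shouldReuse`, `nextWatermark`) are assumptions made for the example, not the project's API.

```typescript
interface Msg {
  role: "system" | "user" | "assistant";
  text: string;
}

// Index of the trailing user message, or -1 when there is none.
function lastUserIndex(messages: readonly Msg[]): number {
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i]?.role === "user") return i;
  }
  return -1;
}

// Cache hit: the session has already heard everything before the trailing user message.
function shouldReuse(cachedMessageCount: number | undefined, messages: readonly Msg[]): boolean {
  return cachedMessageCount !== undefined && cachedMessageCount === lastUserIndex(messages);
}

// After a successful turn the session has heard the history plus the new assistant reply.
function nextWatermark(messages: readonly Msg[]): number {
  return messages.length + 1;
}
```

For a first turn with a single user message, the recorded watermark is 2; the second turn arrives with three messages whose trailing user message sits at index 2, so the cache hits.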

* fix(chrome-ai): sanitize tool-call arguments to prevent prototype pollution (HIGH)

Many tool input schemas don't set `additionalProperties: false`, so a
hallucinated `{__proto__: {polluted: true}, ok: true}` payload would
pass validation and propagate through to consumers. Add a
`sanitizeToolArgs` helper that recursively rebuilds the value with a
plain Object.prototype, dropping `__proto__`, `constructor`, and
`prototype` keys at every depth. Sanitize BEFORE validation so the
validator sees the cleaned object.
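
A sanitizer along the lines described might look like the sketch below. It follows the commit message (recursive rebuild, the three dropped keys), but the exact implementation in the repository is not shown in this thread, so treat the body as an assumption.

```typescript
const FORBIDDEN_KEYS = new Set(["__proto__", "constructor", "prototype"]);

// Recursively rebuild the value from plain object literals and arrays,
// skipping the forbidden keys at every depth.
function sanitizeToolArgs(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => sanitizeToolArgs(item));
  if (value === null || typeof value !== "object") return value; // primitives pass through
  const source = value as Record<string, unknown>;
  const clean: Record<string, unknown> = {}; // fresh object with Object.prototype
  for (const key of Object.keys(source)) {
    if (FORBIDDEN_KEYS.has(key)) continue;
    clean[key] = sanitizeToolArgs(source[key]);
  }
  return clean;
}
```

`JSON.parse` creates `__proto__` as an ordinary own property, which is why `Object.keys` sees it and the loop can drop it before any assignment could touch a prototype.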

* fix(chrome-ai): reset accumulator on non-prefix snapshots (HIGH)

`snapshotStreamToTextDeltas` was concatenating instead of resetting
when a snapshot was not a prefix-extension of the previously
accumulated text. For self-correction snapshots (Chrome replacing,
not extending, prior text) this corrupted consumer state with
duplicated content like `"hello worldhello sailor"`. Reset the
accumulator to the new snapshot and emit it as the delta so consumers
treat the non-prefix boundary as a replace, matching the documented
streaming-convention exception.
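
The prefix-extend versus non-prefix-reset behaviour can be sketched synchronously as below. The real `snapshotStreamToTextDeltas` consumes a stream; this simplified `snapshotsToDeltas` takes an array so the logic is easy to check, and skipping an identical snapshot is an assumption, since the commit only says that case is covered by tests.

```typescript
function snapshotsToDeltas(snapshots: readonly string[]): string[] {
  const deltas: string[] = [];
  let accumulated = "";
  for (const snapshot of snapshots) {
    if (snapshot === accumulated) continue; // assumption: identical snapshot emits nothing
    if (snapshot.startsWith(accumulated)) {
      // Prefix extension: emit only the newly appended tail.
      deltas.push(snapshot.slice(accumulated.length));
    } else {
      // Non-prefix snapshot: the model replaced prior text, so emit the whole
      // snapshot and let the consumer treat it as a replace.
      deltas.push(snapshot);
    }
    accumulated = snapshot; // reset, never concatenate
  }
  return deltas;
}
```

Feeding `["hello", "hello world", "hello sailor"]` yields `["hello", " world", "hello sailor"]`; the last element is a full replacement rather than an appended tail.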

Also add `snapshotStreamToTextDeltas` to `_testOnly` so the helper
is testable from the test package, and add coverage for:
  - HIGH-1: chat cache reuse and rebuild-on-divergence
  - HIGH-2: __proto__/constructor/prototype scrubbing (top-level + recursive)
  - HIGH-3: prefix-extend, non-prefix-reset, identical-snapshot semantics

Also fix a stale comment in the existing tool-calling lifecycle test
that claimed cache reuse — tool-calling intentionally rebuilds per
turn.

* docs(chrome-ai): align test comment with actual shrink-rebuild behavior
sroussey added 2 commits May 22, 2026 16:03
…apability probe

Integrates the chrome-ai branch (7 commits — PR #514/#520/#528) with main's
parallel chrome-ai work (model.download, model.dispose, ApiBinding):

- Chat-session cache keyed by AiChatTask sessionId, with messageCount
  high-water mark for reuse (replaces fingerprint-based invalidation)
- StructuredGeneration + ToolCalling run-fns gated by an async capability
  probe; pre-probe state advertises a conservative subset (no json-mode,
  no tool-use) so the provider never claims a capability it can't fulfil
- ChatHistory helpers + WebBrowser_TextGeneration_Unified dispatcher
  (text.generation shared by AiChatTask + TextGenerationTask)
- ChromeHelpers ships both assertAvailability and ensureAvailable; both
  session APIs (chrome-chat cache + idle-evict store) coexist
- Drops main's WebBrowser_Chat.test.ts (chrome-ai's WebBrowserProvider.test
  already covers chat behavior under the new cache semantics)
…rn streams

Tool calling utilities (packages/ai/src/task/ToolCallingUtils.ts):
- sanitizeToolArgs: recursive __proto__/constructor/prototype scrubbing
  for model-supplied tool args (prototype-pollution defence)
- compileToolValidators + validateToolCallArgs: per-tool inputSchema
  validation with graceful fallback for tools whose schema fails to compile

Stream helpers converted from generators to emit-callback so run-fns no
longer need a for-await/yield pump:
- snapshotStreamToTextDeltas / snapshotStreamToSnapshots (chrome-ai)
- accumulateOpenAIStream (@workglow/ai provider-utils, used by OpenAI + HFI)

Run-fns updated to call helpers with emit directly and emit their own
final 'finish' event. chrome-ai's WebBrowser_ToolCalling drops its
private sanitization + validation copy and reuses the shared utils.
sroussey added 2 commits May 22, 2026 18:33
…viders

Addresses review of #514/#520/#528 rebase:

CRITICAL fix — `model.dispose` now reaches chat-cached sessions. The
post-rebase chrome-ai branch had two parallel session maps
(`chromeSessions` for chat reuse, `sessions` for idle-evict +
ModelDispose lookup) but only the chat map was populated by runtime
code, making `model.dispose` a functional no-op in production.

Unified into a single Map<sessionId, WebBrowserSessionEntry> with both
chat-cache fields (messageCount, fingerprints) and lifecycle fields
(modelKey, lastUsedAt, idleTimer). `ChromeChatSessionState` now requires
`modelKey`. `disposeWebBrowserSessionsForModel(modelKey)` iterates the
unified store, so model.dispose destroys chat-cached sessions. Chat
sessions become subject to idle eviction (free bonus).
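
A stripped-down sketch of the unified store follows. The field names mirror the commit message, but the `SessionEntry` interface and `disposeSessionsForModel` are simplified stand-ins for `WebBrowserSessionEntry` and `disposeWebBrowserSessionsForModel`, and idle-timer handling is omitted.

```typescript
interface SessionEntry {
  session: { destroy(): void };
  modelKey: string; // lifecycle field: which model this session belongs to
  lastUsedAt: number; // lifecycle field: used for idle eviction (not shown)
  messageCount?: number; // chat-cache field: reuse watermark
}

// One map for both chat reuse and model disposal.
const sessions = new Map<string, SessionEntry>();

// Destroy and remove every session belonging to the given model.
function disposeSessionsForModel(modelKey: string): number {
  let disposed = 0;
  for (const [sessionId, entry] of Array.from(sessions.entries())) {
    if (entry.modelKey !== modelKey) continue;
    entry.session.destroy();
    sessions.delete(sessionId);
    disposed++;
  }
  return disposed;
}
```

Because chat-cached entries live in the same map, disposing a model can no longer miss them.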

IMPORTANT — sanitizeToolArgs applied across the codebase per intent of
the prior refactor:

  - OpenAIShapedChat (parseOpenAIToolCallMessage + accumulateOpenAIStream)
    → covers OpenAI + HFI
  - ToolCallParsers (adaptParserResult + parseToolCallsFromText)
    → covers llama.cpp Hermes/Liquid/Qwen35/Llama paths + HFT
  - Anthropic_ToolCalling (input_json_delta + content_block_stop)
  - Gemini_ToolCalling (functionCall.args)
  - Ollama_ToolCalling (parsed function.arguments)
  - LlamaCpp_ToolCalling (extractNativeFunctionCalls)
  - Cactus_ToolCalling[.browser] (JSON-parse parseToolCalls paths)

Every model-supplied tool-arg payload now passes through
sanitizeToolArgs before reaching downstream consumers, closing the
prototype-pollution vector across the provider matrix.

Also:
  - Added packages/test/src/test/ai/ToolCallingUtils.test.ts (14 unit
    tests for sanitizeToolArgs, compileToolValidators,
    validateToolCallArgs, plus a sanitize→validate→name-check
    integration test).
  - Added WebBrowser_Sessions.test regression for the unified-store
    behavior (disposeWebBrowserSessionsForModel sees chat-cached
    entries).
  - Documented WebBrowser_Chat's rebuild-on-next-turn recovery model
    (vs the in-fn retry that main's now-deleted test exercised).
…n is destroyed

Chrome can destroy a `LanguageModel` session out from under us (tab
backgrounding, GPU process restart, memory pressure). When a cached
session's `promptStreaming` throws DOMException("...destroyed...",
"InvalidStateError") we now rebuild the session from full history via
`initialPrompts` and retry the prompt once.

Retry is gated on three conditions, all required:
  - We were using a CACHED session (a fresh-session failure means the
    model is broken; retrying won't help).
  - No text-delta has reached the consumer yet (we can't unsend deltas).
  - The error name is `InvalidStateError` (matches Chrome's
    InvalidStateError DOMException; tolerant of message-text changes).
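
The three-way gate can be expressed as a single predicate. `shouldRetryWithFreshSession` and its parameters are hypothetical names for this sketch; the check deliberately reads only the error's `name`, matching the note about tolerating message-text changes.

```typescript
function shouldRetryWithFreshSession(
  usedCachedSession: boolean,
  deltasEmitted: number,
  error: unknown
): boolean {
  const name =
    typeof error === "object" && error !== null && "name" in error
      ? (error as { name: unknown }).name
      : undefined;
  return (
    usedCachedSession && // a fresh-session failure is not retried
    deltasEmitted === 0 && // deltas already sent cannot be unsent
    name === "InvalidStateError" // match on error name, not message text
  );
}
```

A caller would evaluate this once per prompt and attempt at most one rebuild-and-retry when it returns true.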

Tests:
  - "retries once with a fresh session when a cached session is destroyed"
    seeds the cache on turn 1, has the cached session's promptStreaming
    throw on turn 2's reuse, asserts rebuild + retry + cache replacement.
  - "does not retry when a fresh (non-cached) session fails" guards the
    first gate.
@sroussey sroussey merged commit ab2ceec into main May 22, 2026
16 checks passed
@sroussey sroussey deleted the chrome-ai branch May 22, 2026 20:07

3 participants