0.6.10 - deterministic waiting, safe self-healing, token controls, domain understanding#2
Conversation
Collapse Iris integration from a 6-step manual process into `npx @syrin/iris init` plus a dev-server restart, and add a Vite plugin that makes the whole React integration a single config line. @syrin/iris-vite-plugin (exposed as @syrin/iris/vite): - iris() with apply:'serve' so Vite drops it from `vite build` entirely — production-leak protection is structural, not a user-managed env gate - reuses the babel plugin to stamp data-iris-source, and injects a dev-only iris.connect() via transformIndexHtml (no entry-file edit needed) - options: port/session/token/sourceMapping/inject; non-default port is baked into the injected connect() url - integration test runs Vite's real resolveConfig to prove the plugin is filtered out of the build pipeline and present in serve iris init: - detects framework (Next/Vite/HTML), React major, and package manager - merges an iris entry into .mcp.json without clobbering, runs the dependency install, and for Vite patches the config with iris() - bail-to-manual safety: only auto-patches the obvious shape, otherwise prints the exact paste-in snippet rather than half-editing a build config; Next config wrap + layout mount are intentionally manual - pure modules (detect/mcp-config/vite-config/plan) + a thin injectable IO shell, fully unit-tested; --dry-run, --no-mcp, --no-install, --yes, --port Docs: getting-started.md now leads with the init fast path and recommends the Vite plugin. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…script An inline <script type="module"> injected via transformIndexHtml is not run through Vite's import resolution, so its bare `import '@syrin/iris'` failed in the browser with "Failed to resolve module specifier". Found by driving the real demo app with a headless browser. Serve the connect code as a real module (resolveId + load) referenced by a <script src>, so Vite resolves the import normally. Verified end-to-end against the demo: session connects, the react adapter loads, iris_inspect returns the component source file:line, and a real `vite build` emits zero Iris artifacts. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…entry under pnpm
Three issues found by a deep end-to-end stress test (adversarial config shapes +
a faithful npm-tarball install driven by a real browser):
1. A .mcp.json with comments/trailing commas (jsonc — what the docs show and the
MCP convention) crashed init with an uncaught JSON.parse SyntaxError. Now it
bails to a manual instruction and never rewrites the file (preserving comments).
2. `iris init --port N` wrote the port into .mcp.json but injected a bare iris(),
so the browser SDK connected to 4400 while the bridge listened on N. The port
now flows into iris({ port: N }) in the Vite patch and into the Next/HTML
connect snippets, so bridge and SDK always agree.
3. The CLI bin silently no-opped under pnpm installs: the bin shim runs the
symlinked node_modules/@syrin/iris path as argv[1] while ESM import.meta.url
is the realpath, so the entry guard was false and main() never ran — breaking
the documented `npx @syrin/iris init` / `mcp`. The guard now also compares the
realpath of argv[1].
Verified end-to-end: packed the real tarball, installed it into a fresh Vite +
React 19 app, ran `iris init` via the .bin shim, drove a headless browser — the
plugin connected the session and iris_inspect returned src/App.tsx:8 — and a
real `vite build` emitted zero Iris artifacts.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…oject The bridge + MCP server is a single process that serves every project, so it should be registered once — not re-added to a .mcp.json in each repo. `iris init` now runs `claude mcp add iris -s user` (idempotent: probes `claude mcp get iris` first, skips if present) instead of writing a project .mcp.json. Per project, only the SDK (the iris() Vite plugin / connect call) is added. - new init/mcp.ts: builds the `claude mcp add -s user` command, an existence probe, an availability probe, and a manual fallback (printed when the claude CLI is absent, with a Cursor-style global-config snippet too) - run.ts gains a quiet `probe` IO capability; gathers claudeCli + mcpExists - removed init/mcp-config.ts (the project .mcp.json JSON merge) — no longer used - --port flows into the global registration too, so bridge and SDK agree - docs (getting-started, integrate-with-claude-code): register once, globally Verified end-to-end in an isolated CLAUDE_CONFIG_DIR: init writes the iris server to the user config (no project .mcp.json), and a re-run reports ALREADY instead of erroring. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`iris init` now registers Iris once, globally, for every supported agent it detects — not just Claude: - Claude Code: `claude mcp add iris -s user` (as before) - Cursor: merges an `iris` entry into ~/.cursor/mcp.json (Cursor has no CLI, but its global MCP config is a small dedicated file — safe to merge, unlike ~/.claude.json). Idempotent; never clobbers an existing entry; bails to manual on unparseable jsonc. Detection is per-agent (claude CLI present / ~/.cursor exists), so each gets its own step and only installed agents are touched. If neither is found, a single manual instruction is printed. --port flows into both registrations. The init IO gained homeDir() and absolute-path support for writing the global Cursor file. Verified end-to-end in an isolated HOME: with Cursor present, init writes ~/.cursor/mcp.json (and registers Claude too); a re-run reports ALREADY for both. Confirmed the real ~/.claude.json and ~/.cursor configs were untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nly flows
Iris now tells an agent, the moment it saves a flow, whether that flow actually
asserts an observable consequence — or whether it will pass even when the
feature is broken. iris_flow_save returns assertions.grade:
- asserted — a step or the success end-condition asserts a signal/network
consequence (a wrong element can't fake it)
- presence-only — only element-presence checks; a locator healed to the WRONG
element can still pass (the failure self-healing vendors admit)
- assertion-free — acts but asserts nothing observable
For non-"asserted" grades it returns a warning telling the agent to add a
consequence assertion (iris_annotate assert-signal / assert-net / success-state).
Grounded in: Fowler "Assertion-Free Testing" (100% coverage, 0 assertions),
Kent C. Dodds "make your test fail", mabl/qate.ai (healed-wrong-locator ships
bugs green), and AI agents agreeing with human pass/fail only ~68% of the time —
so the flow must carry a real oracle, not rely on the agent eyeballing success.
First slice of V1-ROADMAP M1 (intent-anchored assertions). Pure classifier +
7 unit tests; wired into iris_flow_save.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ed oracle) flow.success was checked only by the @syrin/iris-test CI spec runner — the live iris_flow_replay MCP tool an agent calls never evaluated it, so a replay reported ok even when the end-condition (the actual intent) was never met. Move the success oracle (successToPredicate + assertSuccess) down into iris-server as flow-success.ts so ONE implementation is shared by both the MCP tool and the spec runner — no divergent oracle. @syrin/iris-test/success-assert now re-exports it (its 12 tests still pass against the moved code). iris_flow_replay now, after every step runs clean, asserts the success end-condition as a real consequence: a signal/net success that never fires makes the replay status:error "intent not satisfied" even though all locators resolved — catching the regression a healed-but-wrong locator ships green (mabl/qate.ai). Dynamic (LLM-output) success fields are skipped, symmetric with the step layer. A `success` result row is appended so the agent sees the verdict. V1-ROADMAP M1 slice 2 (corrected: shared oracle, not a duplicate in replayFlow). Discovered + fixed via the full suite catching a turbo-cached masking failure. 8 new server tests; iris-server 488, iris-test 97; all 16 tasks green uncached. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…before huge reads
High-volume read results now carry cost:{ bytes, tokens } (tokens estimated at
~chars/4) so an agent can re-scope BEFORE spending context on a large body —
re-scope a snapshot (mode:interactive/status or a tighter `scope`) or narrow a
query (`name`/`scope`). Pure helpers in output-budget.ts (estimateTokens,
sizeCost, withSizeCost) reuse the existing JSON-byte idiom; the wrapper is pure
and unit-tested, the handlers just .then(withSizeCost).
Grounded in: by ~step 15 a screenshot/Playwright-MCP agent accrues 60–80K tokens
of stale accessibility-tree data and starts hallucinating selectors that don't
exist; a single screenshot is up to ~4,784 tokens. Giving the agent a cheap
size signal up front is the first lever of V1-ROADMAP M2 (token efficiency).
11 output-budget tests (+8); 493 server tests; all 16 tasks green uncached.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…what changed After the first snapshot, an agent can pass diff:true to get back only the delta since its last look of the same scope/mode — mode:delta with added/removed lines, or mode:unchanged — instead of the full accessibility tree again. A route change automatically resets to a full snapshot (a cross-page delta would be meaningless), so the agent never sees a misleading diff. Why: screenshot/Playwright-MCP agents accrue 60–80K tokens of stale a11y data over a session and start hallucinating selectors that no longer exist. Returning only the change set attacks both the token cost AND that stale-context failure in one move. The agent-facing cost is the MCP result, so the delta is computed server-side, reusing the same normalize+diff the baseline layer uses. Pure decision (snapshotDelta) + a route-invalidated, bounded SnapshotCache + applySnapshotDelta shaper — all unit-tested without a browser. V1-ROADMAP M2. 10 new tests; 503 server tests; all 16 tasks green uncached. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A reproducible, regression-guarding benchmark (runs the shipped sizeCost / estimateTokens / applySnapshotDelta, not mocks) over a representative 150-row operational dashboard — the shape that makes full snapshots expensive. Measured: full re-snapshot = 4,246 tok (16,983 bytes) diff (1 row changed) = 60 tok → 99% saved unchanged re-snapshot = 17 tok Asserts the diff is <10% of a full re-read and ≥90% saved, so the M2 win can't silently regress. This is the data behind the "token efficient" claim — the absolute figure uses the ~chars/4 heuristic, the relative saving is robust. V1-ROADMAP M0 (first numbers; real-app capture is the follow-up). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…efore testing New tool + pure builder that synthesizes every saved flow and the registered capabilities into a compact domain model an agent reads BEFORE driving the app: each flow with its assertion grade and the signals/testids it exercises, coverage counts, and — the differentiator — GAPS: declared signals/testids that NO flow asserts (untested intent), plus flows that assert no consequence. A one-line summary headlines it. Why: automation "checks the DOM, not the intent" — tests pass while integrated bugs ship (Bolton; Fowler). Pointing the agent straight at "you declared signal order:placed but no flow asserts it" is domain understanding + rigor in one read, instead of crawling the whole app or reading all the source. Reuses the flow assertion classifier so the two reinforce each other. Pure buildDomainModel (no IO) + a thin iris_domain tool reading .iris/flows/ + .iris/contract.json (no browser). V1-ROADMAP M4 first slice. 5 domain-model tests; 511 server tests; all 16 tasks green uncached. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ver the intent) When two present testids tie at the minimum edit distance, the prior nearest-pick was an arbitrary lexical tiebreak — exactly how a self-healing rebind lands on the WRONG element and ships a bug green (the failure mabl/qate.ai admit). Drift now carries `ambiguous:true` in that case, and the heal proposal layer refuses to auto-rebind an ambiguous drift: it still surfaces the drift + a nearest candidate for a human/agent to choose, but never applies a coin-flip. - protocol Drift gains optional `ambiguous` - flow-replay: pure `nearestIsAmbiguous` (≥2 candidates at min distance) sets it - heal.proposeRebindWith refuses when drift.ambiguous (regardless of confidence) Pairs with the M1 replay success oracle (which catches a wrong rebind at run time): this stops the wrong rebind from ever being proposed. V1-ROADMAP M5. 6 new tests; 517 server tests; all 16 tasks green uncached. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…grades, heal) Eight iterations shipped capabilities the human/agent docs didn't mention — features are invisible without docs. Surfaced across the three read surfaces: - token-efficiency.md: diffed snapshots section with the measured numbers (full 4,246 tok vs diff 60 tok, ~99% saved) + the cost-preview note. - agent-cheatsheet.md: "read iris_domain first" in Start-here, iris_domain in the core tool set, and a token note covering diff:true, cost, assertion grades, and ambiguous-heal refusal. - usage.md: iris_snapshot diff:true + cost documented; new iris_domain reference entry (flows + coverage + untested-intent gaps). Docs only; no code change. V1-ROADMAP M8 (discoverability slice). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… quality)
iris_domain now folds .iris/project.json run history into the model and returns
`riskRanked` (flow names worst-first) plus a per-flow `risk:{ level, reason,
lastStatus }`. Risk is the WORSE of two signals:
- run history: last run errored/drifted = high; passed-but-with-console/network
errors = medium; passed clean = low; never run = unknown.
- assertion quality: an assertion-free flow is medium EVEN when it passed —
a green run of a flow that asserts nothing is false confidence.
Test-priority order is high > medium > unknown (unvalidated) > low, so the agent
tests the genuinely riskiest journeys first instead of re-running known-good ones.
Synthesizes "domain-aware" + "rigorous": a flow that looks green but can't catch a
regression surfaces as risk. Pure flow-risk.ts; iris_domain loads the history.
V1-ROADMAP M4 (deepening).
13 new tests (flow-risk + domain); 528 server tests; all 16 tasks green uncached.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes the risk-ranking work: the one-line `summary` now names the flow to
test first ("test first: <name> (<reason>)") when run history flags a high/medium
risk flow — so the single most actionable fact (where the real regression risk
is) is in the headline, not buried in the arrays. Omitted when the top flow is
only low risk (no false alarm).
Pure; 2 new tests; 529 server tests; all 16 tasks green uncached.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…utputSchema
The handler returned bare flow-name strings while its declared outputSchema
promises {name, path, createdAt?} objects — so a schema-validating MCP client
rejected the result ("Expected object, received string"). Caught for real while
driving the live demo via MCP. The unit test had codified the wrong (string[])
shape, so it passed while the contract was violated — corrected too.
Now maps each name → {name, path: flowPath(irisRoot, name)}. 529 server tests;
all gates green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…endation) Observed live driving the demo: a single login flooded iris_observe with 319 events / ~37KB because the dashboard's count-up animations emit a dom.text per frame — the agent only avoided the token hit because it knew to pass filters. Now costHint adds a `recommendation` when a timeline is large (>=80 events or >=8KB) telling the agent to pass filters:[...] or max_events next time. Surfaced in iris_observe's cost output schema. Backed by real session data; V1-ROADMAP M2. 3 new tests; 532 server tests; all 16 tasks green uncached. (No daemon restart — the live demo session is untouched.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rdown The network observer stored window.fetch.bind(window) and assigned that bound copy back on teardown, so window.fetch was never restored to its original identity — violating the "fully reversible" contract in types.ts. Keep the true original for restore; use a separate window-bound copy only for invocation (fetch throws "Illegal invocation" on the wrong `this`). Adds the first test coverage for installNetwork, pinning GET->500 emission (the case a live MCP session appeared to miss; the observer itself is sound, so that miss was a long-session/window artifact, not a code bug), POST method capture, reject-path status:0, and teardown identity. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eardown Same bind-on-restore bug audited from the network observer fix: route.ts and console.ts stored bound copies (history.pushState.bind / console[m].bind) and assigned those back on teardown, so the globals were never restored to their original identity — breaking the "reversible" contract. Keep the true originals for restore; use bound copies only for invocation. Adds first-ever observer-level tests for both (emit + teardown identity). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-based flakiness) The #1 flake cause in agent-driven UI testing is async-wait + concurrency handled with fixed sleeps. Adds a { kind: 'settled', quietMs } predicate: the page is settled when no network/DOM/animation activity has occurred for quietMs (default 500). "No activity in the last N ms" is relative to now, so the buffer clock is injected via PredicateSession.elapsed() (CLAUDE.md rule 7) and the wait loop's poll interval flips it to pass once activity stops. Usable today via iris_wait_for and iris_act_and_wait's `until` (PredicateSchema already gates both), composes inside allOf with the consequence you expect, and both tool descriptions now steer agents to it over raw sleeps. FlowReplaySession gains elapsed(); test fakes across server + the public @syrin/iris-test package updated to match. All 16 packages green, 537 server tests (+5 settled cases). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… (M3)
Makes "act then wait for the page to go quiet" a single zero-config call: omit
`until` on iris_act_and_wait and it now waits for the { kind: 'settled' }
predicate (network/DOM/animation idle) instead of requiring a predicate — the
documented alternative to a fixed sleep. Explicit predicates are unchanged;
to assert a consequence AND settle, allOf them. Tool + field descriptions
updated to steer agents here. Adds onEvent to the live-control test fake so the
wait loop is exercised. 538 server tests green (+1).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…5 — heal the locator, never the intent) iris_flow_heal apply:true rewrote a drifted testid to its nearest match and wrote it to disk WITHOUT checking that the healed flow still does what it was testing. A rebind can resolve to a real but wrong element (a look-alike control) that no longer triggers the flow's success signal/net — persisting that ships a green test that asserts nothing. The maintenance-tax research is explicit: locator-only healing covers <1/3 of failures and can ship bugs. Now, before writing, apply re-replays the healed flow in memory and re-asserts its success consequence. If the consequence no longer fires, the write is REFUSED (status:consequence_broken, file untouched) and the proposal is still surfaced for a human. Flows with no declared success heal but say so loudly (the rebind couldn't be verified — add a success-state assertion). Extracts the pure applyHealChanges (shared by FlowStore.heal's on-disk writer and the in-memory verification so both rewrite identically). 541 server tests green (+3: verified-heal, refused-broken-consequence, unverified-no-success). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A broad query (by=role value=button) on a busy page returns every descriptor,
spending agent context on data it may not need. Adds two opt-in trims to the
result the AGENT sees:
- count_only: return just { count }, drop the element array (when you only need
"how many match?").
- limit: keep the first N descriptors; if more matched, carry total +
truncated:true so the trim is never silent — the agent narrows with name/scope
rather than assuming it saw everything.
Pure, result-shape-tolerant paginateQueryResult (non-{elements} results pass
through). 548 server tests green (+7).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both tools returned every matching call/log uncapped and carried no size hint —
unlike iris_observe/iris_query/iris_snapshot. A buggy page spams the console and
a wide `since` window returns many calls, both spending agent context. Adds a
`limit` (most-recent-N via the already-tested applyEventBudget, reporting total +
droppedOldest so a trim is never silent) and a `cost:{bytes,tokens}` hint so the
agent can self-budget — consistent with the other read tools. 551 server tests
green (+3).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…w (M4) iris_domain told an agent that a flow asserts SOMETHING (boolean) but not WHAT its success depends on. The M4 bar is that the agent can answer "what are the critical flows and what must hold for each?" from Iris alone — the "what must hold" was missing. Each DomainFlowSummary now carries mustHold: the human label of the flow's success consequence (signal name / net URL), or undefined when the flow declares no success (then asserts:false — it tests nothing observable). 552 server tests green (+1). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…control, safe-heal, and domain features Records this session's user-facing changes under [Unreleased]: the settled predicate + act_and_wait auto-settle (M3), iris_query/network/console token controls + cost hints (M2), iris_domain mustHold (M4), consequence-verified flow_heal (M5), and the observer teardown-identity fix. (Also folds in a pre-existing cosmetic prettier reflow of the 0.5.0 section.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nsequence (rigor)
The success-oracle thesis ("green means the feature worked") was enforced for
saved flows (classifyFlowAssertions) but not for ad-hoc iris_assert. An agent
that asserts only { element }/{ text } presence gets a green verdict even though
a healed-but-wrong element or a stale render can satisfy it (Fowler, Assertion-
Free Testing; Dodds, Make Your Test Fail). iris_assert now attaches `advice` to a
PASSING presence-only assertion, steering toward a { signal }/{ net } consequence
(or an allOf with one). Never on a failing verdict or when a consequence is
already asserted. Pure isPresenceOnlyAssertion walks allOf/anyOf/not. 563 server
tests green (+11).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rts, and read caps
Surfaces tonight's habits in the canonical quick-reference so they're discoverable
to agents and humans, not just buried in tool descriptions:
- Never sleep — iris_act_and_wait with no `until` settles; { kind:"settled" } waits
for network/DOM/animation idle; allOf a consequence with settled.
- Assert a consequence ({signal}/{net}) over presence ({element}/{text}); a passing
presence-only assert returns `advice`.
- Cap broad reads: iris_query limit/count_only, iris_network/iris_console limit + cost.
- flow_heal refuses to write a rebind that breaks the success consequence.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Self-review caught a real bug in tonight's settled predicate: it counted dom.text and animation events as activity, so any page with an ambient effect — a count-up counter, spinner, or pulsing dot — emits an event every frame and NEVER goes quiet, making act->settle time out at 4s on a healthy page. Observed live in iter 18: one login flooded 319 dom.text events from the dashboard's count-up animations. Same trap that deprecated Playwright's networkidle. SETTLE_ACTIVITY is now network + structural DOM (added/removed/attr) only — the real "app is doing work" signals. Ambient text/anim churn is ignored; for an animation-gated outcome, assert the specific consequence instead. Tool descriptions, cheatsheet, and CHANGELOG corrected to "network + DOM idle". 564 server tests green (+1 ambient-animation regression test). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ignal false pass) Self-review of the M5 heal verification found a real honesty gap: assertSuccess evaluated the success predicate against the WHOLE session buffer (no since floor). In flow_heal this is acute — one heal call does two replays (the pre-heal drift replay's prefix, then the verify replay), so a success signal emitted by the first could satisfy the verify even when the rebound step never fired it: a false-positive heal, the exact thing the feature exists to prevent. flow_replay had the same staleness across runs in one session. assertSuccess now takes a `since` floor (WaitForSignal gains the param, already honored by waitForPredicate); flow_replay and the heal verify each capture the cursor before their replay so only that replay's events count — the same since-floor honesty principle used by iris_assert/wait_for. 565 server tests green (+1 floor regression test); flow-replay test fake gains elapsed(). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Positively exercises tonight's additions end-to-end in a real browser (the existing battery only proves no-regression): iris_query count_only/limit, the settled predicate + act_and_wait auto-settle, and the iris_assert presence-only advice. The settled check is run against the demo's count-up animations on purpose — the exact ambient dom.text churn that the iter-29 fix must tolerate; it now settles (quietForMs ~307) instead of timing out. 7/7 live. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ontrols, domain mustHold Bumps all 10 publishable @syrin/iris-* packages 0.5.0 -> 0.6.10 and promotes the CHANGELOG [Unreleased] section to [0.6.10]. This is the live-verified feature release (NOT 1.0.0 — see plan/V1-ROADMAP.md for the 1.0 gate): full e2e battery 14/14 + a new-features spec 7/7 green on a real browser across the Vite/React demo, Next.js 15, and the api. Cleanup: removed a throwaway headless-launcher script (untracked) and its leaked browser process. No dead code to remove otherwise — noUnusedLocals + no-unused-vars are enforced and green; knip's "unused files" are all intentional dynamically-spawned e2e specs and standalone asset scripts. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Warning Review limit reached
More reviews will be available in 45 minutes and 26 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR adds new Iris tool behaviors, flow and domain analysis, ChangesCore release changes
Sequence Diagram(s)sequenceDiagram
participant CLI as iris init
participant Runner as runInit
participant Planner as buildPlan
participant Claude as claude mcp add
participant Cursor as ~/.cursor/mcp.json
participant Project as vite.config.ts
CLI->>Runner: init options
Runner->>Planner: gather inputs and build plan
Planner-->>Runner: MCP, install, framework steps
Runner->>Claude: register iris MCP when planned
Runner->>Cursor: merge iris server config when planned
Runner->>Project: patch Vite config when planned
Estimated code review effort🎯 5 (Critical) | ⏱️ ~105 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…6j-xcff + 2 moderates) and add build+audit to pre-commit audit: vite-plugin's devDependency vite "^5" resolved to 5.4.21, flagged by 1 high (server.fs.deny bypass) + 2 moderate (optimized-deps path traversal, launch-editor NTLM disclosure) advisories (all vite <=6.4.2). Bumped to "^7" so it dedupes to the already-in-tree patched vite 7.3.5; peer stays >=4. `pnpm audit --audit-level high` now reports no known vulnerabilities; vite-plugin builds + 12 tests pass under vite 7. pre-commit: the hook was missing two CI steps. It now mirrors .github/workflows/ci.yml one-for-one — added `pnpm build` (step 2) and `pnpm audit --audit-level high` (step 7), so a green commit is a green CI run and a high+ vuln is caught before push, not in CI. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 15
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/getting-started.md (1)
443-443:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAlign the troubleshooting package name with Step 3.
This line points to
@syrin/iris-babel-plugin, while Step 3 instructs@syrin/iris/babel. Keeping both names in one guide is confusing and can misdirect setup.Suggested doc fix
-- Wire up `@syrin/iris-babel-plugin` (Step 3). Without it, only component identity is available. +- Wire up `@syrin/iris/babel` (Step 3). Without it, only component identity is available.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/getting-started.md` at line 443, The troubleshooting section mentions `@syrin/iris-babel-plugin` but Step 3 of the guide references `@syrin/iris/babel`. Update the package name reference at line 443 to use `@syrin/iris/babel` instead to match the package name specified in Step 3, ensuring consistency throughout the documentation and preventing setup confusion.
🟡 Minor comments (8)
packages/browser/src/observers/console.test.ts-27-36 (1)
27-36:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAssert forwarding explicitly in the first test.
The test name says forwarding is verified, but only event emission is asserted. Add an assertion that the original
console.errorpath is invoked so this regression is actually covered.Suggested patch
it('emits CONSOLE_ERROR and still forwards to the original console', () => { + const errorSpy = vi.spyOn(console, 'error').mockImplementation(() => {}); const { emit, events } = collect(); teardown = installConsole(emit); console.error('boom', 42); expect(events).toHaveLength(1); expect(events[0]?.type).toBe(EventType.CONSOLE_ERROR); expect(events[0]?.data.message).toBe('boom 42'); + expect(errorSpy).toHaveBeenCalledWith('boom', 42); + errorSpy.mockRestore(); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/browser/src/observers/console.test.ts` around lines 27 - 36, The test verifies event emission but does not assert that the original console.error method is actually being forwarded. Add a spy or mock of the original console.error method before calling installConsole, then after the console.error call, add an assertion to verify that the original console.error was invoked with the correct arguments ('boom' and 42) to ensure the forwarding behavior is actually covered by the test.CHANGELOG.md-7-7 (1)
7-7:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd the missing link definition for
0.6.10.The heading uses
[0.6.10]reference-link syntax, but there is no matching link definition at the bottom, so this version entry won’t hyperlink.📝 Proposed fix
+[0.6.10]: https://github.com/syrin-labs/iris/releases/tag/v0.6.10 [0.5.0]: https://github.com/syrin-labs/iris/releases/tag/v0.5.0 [0.4.0]: https://github.com/syrin-labs/iris/releases/tag/v0.4.0🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CHANGELOG.md` at line 7, The heading for version 0.6.10 uses markdown reference-link syntax with [0.6.10], but there is no corresponding link definition at the bottom of the CHANGELOG.md file to complete the reference. Add a link definition at the end of the file following the pattern of other version links in the document to match the [0.6.10] reference in the heading and enable proper hyperlinking for this version entry.docs/usage.md-267-267 (1)
267-267:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd
mustHold?to the documentediris_domainflow shape.Line 267 omits
mustHold?, butbuildDomainModelsets it when a flow hassuccess(seepackages/server/src/domain/domain-model.tsLine 117). The docs currently under-specify the returned payload.Doc fix
-- `iris_domain({})` → `{ flowCount, flows: [{ name, steps, grade, asserts, signals, testids, warning?, risk? }], declared: { testids, signals, stores }, coverage: { asserted, presenceOnly, assertionFree }, gaps: { unassertedFlows, declaredUntestedSignals, declaredUntestedTestids }, riskRanked, summary }` +- `iris_domain({})` → `{ flowCount, flows: [{ name, steps, grade, asserts, signals, testids, mustHold?, warning?, risk? }], declared: { testids, signals, stores }, coverage: { asserted, presenceOnly, assertionFree }, gaps: { unassertedFlows, declaredUntestedSignals, declaredUntestedTestids }, riskRanked, summary }`🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/usage.md` at line 267, The `iris_domain({})` return type documentation at line 267 of docs/usage.md is missing the `mustHold?` optional property from the flow object shape. Add `mustHold?` to the documented flow properties in the return type definition to match what `buildDomainModel` actually sets in the packages/server/src/domain/domain-model.ts file when a flow has a success condition.packages/server/src/flows/flows.heal.test.ts-85-93 (1)
85-93:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMake the heal test session cursor-aware so stale-signal protection is actually exercised.
heal(..., apply: true)verifies success using asincefloor fromsession.elapsed(). Here,eventsSince()ignores the cursor andelapsed()is always0, so buffered signals can satisfy verification even when they should be considered stale.Suggested test-harness fix
- eventsSince(): IrisEvent[] { - return this.events; + eventsSince(cursor: number): IrisEvent[] { + return this.events.filter((event) => event.t >= cursor); } @@ elapsed(): number { - return 0; + return 1; } @@ function signalEvent(name: string): IrisEvent { - return { t: 0, type: EventType.SIGNAL, sessionId: 's', data: { name, data: {} } }; + return { t: 1, type: EventType.SIGNAL, sessionId: 's', data: { name, data: {} } }; }Also applies to: 96-99
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/flows/flows.heal.test.ts` around lines 85 - 93, The test mock session needs to be made cursor-aware to properly exercise stale-signal protection. Modify the mock session to track a cursor position and elapsed time: make eventsSince() accept and use a cursor parameter to filter events rather than returning all events, and make elapsed() return an actual tracked value (either incrementing or controllable) instead of always returning 0. This will ensure that heal(..., apply: true) verification with the since floor from session.elapsed() properly validates that buffered signals outside the time window are considered stale.packages/server/src/tools/snapshot-cost.test.ts-6-11 (1)
6-11:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRemove design-doc identifiers from comments, test title, and log labels.
M0/M2markers are internal design-doc style identifiers and should not appear in comments/test strings.✏️ Suggested edit
- * M0 — a reproducible measurement of the M2 token wins, run over the REAL shipped functions + * Reproducible measurement of token savings, run over the real shipped functions @@ -describe('M0 — snapshot token cost (M2 wins, measured on real functions)', () => { +describe('Snapshot token cost regression (measured on real functions)', () => { @@ - `[M0] full re-snapshot=${String(fullTokens)} tok diff=${String(deltaTokens)} tok saved=${String(savedPct)}%`, + `[snapshot-cost] full re-snapshot=${String(fullTokens)} tok diff=${String(deltaTokens)} tok saved=${String(savedPct)}%`, @@ - console.log(`[M0] unchanged re-snapshot=${String(unchangedTokens)} tok`); + console.log(`[snapshot-cost] unchanged re-snapshot=${String(unchangedTokens)} tok`); @@ - `[M0] cost preview: ${String(preview.tokens)} tok / ${String(preview.bytes)} bytes`, + `[snapshot-cost] cost preview: ${String(preview.tokens)} tok / ${String(preview.bytes)} bytes`,As per coding guidelines, "Never include internal tracking tags in comments, file names, directory names, or test descriptions. Do not use design-doc reference codes (letter + digit patterns like N5, G4, M8, P2, F1, R1) or internal version strings."
Also applies to: 39-40, 59-60, 78-78, 87-88
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/tools/snapshot-cost.test.ts` around lines 6 - 11, Remove the internal design-doc identifiers M0 and M2 from the comment block in the snapshot-cost.test.ts file. Specifically, remove "M0 — a reproducible measurement of the" and replace it with a direct description, and remove "M2" references so the comment describes the measurement approach without internal tracking codes. Apply the same fix to all other locations in the file where these design-doc reference patterns appear (at lines 39-40, 59-60, 78, and 87-88), ensuring all letter-digit patterns like N5, G4, M8, etc. are removed from comments, test titles, and log labels throughout the file.Source: Coding guidelines
apps/e2e/specs/new-features-test.mjs-1-1 (1)
1-1:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRemove the internal changelog tag from the file comment.
[Unreleased]in an inline comment violates the internal-tag prohibition.✏️ Suggested edit
-// Live verification of the features added in the [Unreleased] CHANGELOG section, against the real +// Live verification of the features added in the CHANGELOG section, against the realAs per coding guidelines, "Never include internal tracking tags in comments, file names, directory names, or test descriptions."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/e2e/specs/new-features-test.mjs` at line 1, The file comment at the start of new-features-test.mjs contains the internal changelog tag [Unreleased] which violates the coding guideline prohibiting internal tracking tags in comments. Remove the reference to [Unreleased] and the CHANGELOG section from the comment, keeping only the core description of what the file does for live feature verification.Source: Coding guidelines
packages/server/src/tools/assert-grade.ts-40-46 (1)
40-46:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMixed predicates are incorrectly flagged as presence-only.
At Line 44-46 and Line 60,
allOf/anyOfwith{ kind: 'element' }plus a non-presence predicate like{ kind: 'settled' }still returns presence-only advice. That contradicts the contract in this file and causes false advice emission.Proposed fix
interface PredicateKinds { /** A signal/net leaf is present — proves the app did something a wrong element cannot fake. */ consequence: boolean; /** An element/text leaf is present — a weak presence check. */ presence: boolean; + /** Any non-presence, non-consequence leaf (route/console/animation/settled). */ + other: boolean; } function walk(predicate: Predicate): PredicateKinds { switch (predicate.kind) { case 'signal': case 'net': - return { consequence: true, presence: false }; + return { consequence: true, presence: false, other: false }; case 'element': case 'text': - return { consequence: false, presence: true }; + return { consequence: false, presence: true, other: false }; case 'route': case 'console': case 'animation': case 'settled': - // Observable but not the weak presence pattern we nudge — neither flags the advice. - return { consequence: false, presence: false }; + return { consequence: false, presence: false, other: true }; case 'allOf': case 'anyOf': { const subs = predicate.predicates.map(walk); return { consequence: subs.some((s) => s.consequence), presence: subs.some((s) => s.presence), + other: subs.some((s) => s.other), }; } case 'not': return walk(predicate.predicate); } } export function isPresenceOnlyAssertion(predicate: Predicate): boolean { const kinds = walk(predicate); - return kinds.presence && !kinds.consequence; + return kinds.presence && !kinds.consequence && !kinds.other; }Also applies to: 58-60
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/tools/assert-grade.ts` around lines 40 - 46, The walk function's handling of allOf and anyOf predicates incorrectly marks mixed predicates (containing both presence-only and non-presence predicates) as presence-only. In the return statement for these cases, the presence property is computed using subs.some() which returns true if ANY sub-predicate is presence-only; this should instead use subs.every() to return true only when ALL sub-predicates are presence-only. This ensures that a combined predicate is only flagged as presence-only when all of its constituent predicates are presence-only, fixing the false advice emission issue that contradicts the file's contract. Apply the same fix to both the allOf/anyOf case block starting around line 42 and the similar logic at line 58-60.packages/server/src/init/mcp.ts-42-45 (1)
42-45:⚠️ Potential issue | 🟡 MinorExtract CLI command strings as named constants.
The function uses inline strings (
'get','add','user','-s','--','--version') for CLI arguments. Per the coding guidelines, all domain/wire/UI strings must be named constants. Define constants for these command tokens (e.g.,const MCP_GET_SUBCOMMAND = 'get',const MCP_ADD_SUBCOMMAND = 'add',const SCOPE_USER = 'user', etc.) and reference them instead.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/init/mcp.ts` around lines 42 - 45, The claudeExistsProbe() function and other functions in this file use inline string literals for CLI command arguments (such as 'get', 'add', 'user', '-s', '--', '--version') instead of named constants. Define named constants at the top of the file for each of these CLI command tokens (e.g., MCP_GET_SUBCOMMAND, MCP_ADD_SUBCOMMAND, SCOPE_USER, etc.) following the same naming pattern as the existing CLAUDE_CLI, MCP_SUBCOMMAND, and MCP_SERVER_NAME constants, then replace all inline string occurrences throughout the file with references to these new constants.
🧹 Nitpick comments (10)
packages/server/src/flows/flow-success.ts (1)
30-75: ⚡ Quick winExtract domain/wire string literals into named constants.
This segment introduces multiple free domain/wire literals (
'signal','net','element','allOf','testid','role','name','net','success'). Please promote them to named constants to match repository rules.Proposed refactor
+const PredicateKind = { + signal: 'signal', + net: 'net', + element: 'element', + allOf: 'allOf', +} as const; + +const ElementQueryKey = { + testid: 'testid', + role: 'role', + name: 'name', +} as const; + +const SuccessLabelFallback = { + net: 'net', + success: 'success', +} as const; + export function successLabel(success: FlowExpect): string { if (success.signal !== undefined) return success.signal; - if (success.net !== undefined) return success.net.urlContains ?? success.net.method ?? 'net'; - return success.element?.testid ?? success.element?.name ?? success.element?.role ?? 'success'; + if (success.net !== undefined) + return success.net.urlContains ?? success.net.method ?? SuccessLabelFallback.net; + return ( + success.element?.testid ?? + success.element?.name ?? + success.element?.role ?? + SuccessLabelFallback.success + ); } ... - const query: Record<string, string> = {}; - if (testid !== undefined) query['testid'] = testid; - if (element.role !== undefined) query['role'] = element.role; - if (element.name !== undefined) query['name'] = element.name; - if (Object.keys(query).length > 0) parts.push({ kind: 'element', query }); + const query: Record<string, string> = {}; + if (testid !== undefined) query[ElementQueryKey.testid] = testid; + if (element.role !== undefined) query[ElementQueryKey.role] = element.role; + if (element.name !== undefined) query[ElementQueryKey.name] = element.name; + if (Object.keys(query).length > 0) parts.push({ kind: PredicateKind.element, query });As per coding guidelines, “All domain/wire/UI strings must be named constants, never free strings in code.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/flows/flow-success.ts` around lines 30 - 75, Extract all hardcoded string literals from the successLabel and successToPredicate functions into named constants at the module level. Replace all occurrences of 'signal', 'net', 'element', 'allOf', 'testid', 'role', 'name', and 'success' string literals with these named constants throughout both functions to comply with the domain/wire string literal guidelines.Source: Coding guidelines
packages/server/src/domain/domain-model.test.ts (1)
59-67: ⚡ Quick winAdd a regression test for “anchor signal only” coverage.
Please add a case where a flow references a signal anchor but does not assert
expect.signal/success.signal, and verifydeclaredUntestedSignalsstill flags it. That protects the stated “asserted signal” contract.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/domain/domain-model.test.ts` around lines 59 - 67, Add a new test case in the test file after the existing test that verifies when a flow defines a signal anchor (via the signal property in flow()) but does not assert it with expect.signal or success.signal, the signal is still properly flagged in declaredUntestedSignals. Use buildDomainModel, flow, and contract functions similar to the existing test pattern to construct a scenario where a flow references a signal without asserting it, then verify that signal appears in the gaps.declaredUntestedSignals array to protect the stated asserted signal contract.packages/server/src/domain/flow-risk.test.ts (1)
29-52: ⚡ Quick winAdd a
PRESENCE_ONLYrisk test case.Current coverage skips the
FlowAssertionGrade.PRESENCE_ONLYbranch, so risk-priority regressions there won’t be caught.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/domain/flow-risk.test.ts` around lines 29 - 52, The test suite for flowRisk is missing coverage for the FlowAssertionGrade.PRESENCE_ONLY case, which could allow regressions to go undetected. Add a new test case in the describe block that exercises the PRESENCE_ONLY grade by calling flowRisk with FlowAssertionGrade.PRESENCE_ONLY as the first argument and a run result as the second argument (similar to the existing test cases like the assertion-free flow test), then assert the expected RiskLevel that should be returned for this grade to ensure the branch is properly tested and documented.packages/server/src/events/predicate.ts (1)
363-373: ⚡ Quick winExtract newly introduced wire strings to named constants.
Line 363 and Line 372 introduce new inline domain/wire strings (
noteandfailureReason). Please hoist these to named constants to match the repository rule and keep message contracts centralized.As per coding guidelines, "All domain/wire/UI strings must be named constants, never free strings in code."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/events/predicate.ts` around lines 363 - 373, The inline strings `'no activity to settle'` (used as the note value) and the failureReason string template (used in the failureReason property) need to be extracted to named constants following the repository's coding guidelines. Define these strings as named constants at the module level or in a centralized constants location, then replace the inline string literals with references to those constants. Ensure the failureReason constant properly handles the dynamic values by using a template string with the placeholders for lastType, quietForMs, and quietMs.Source: Coding guidelines
packages/server/src/events/predicate.test.ts (1)
228-311: ⚡ Quick winConsolidate repeated predicate/event literals into named test constants.
The new settled suite adds many inline domain/wire strings (e.g., predicate kinds, signal names, URLs, messages). Please move these to file-level constants for compliance and easier fixture reuse.
As per coding guidelines, "All domain/wire/UI strings must be named constants, never free strings in code."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/events/predicate.test.ts` around lines 228 - 311, Extract all inline domain/wire/UI strings from the settled predicate test suite (the describe block starting at line 228) into file-level named constants. This includes predicate kinds like 'settled', 'signal', and 'allOf', signal names like 'deploy:shipped' and 'x', URLs like '/api/x' and '/api/deploy', event property names and values, and failure reason strings like 'not settled'. Replace all occurrences of these inline literals throughout the test cases with references to the corresponding named constants to ensure compliance with the coding guideline that all domain/wire/UI strings must be named constants.Source: Coding guidelines
packages/server/src/tools/tools.near-miss.test.ts (1)
79-127: ⚡ Quick winExtract newly added event/tool literals to shared constants.
The new token-budget tests add inline domain/wire strings (URLs, messages, tool-facing literals). Please promote them to named constants for compliance and consistency with other fixtures.
As per coding guidelines, "All domain/wire/UI strings must be named constants, never free strings in code."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/tools/tools.near-miss.test.ts` around lines 79 - 127, The token budget tests in the describe block contain hardcoded string literals for URLs, console messages, and log levels that should be extracted to named constants. Identify all inline strings in the test cases including the URLs '/1', '/2', '/3' used in iris_network tests, the messages 'a', 'b', 'c' used in iris_console tests, and the level string 'error' used in the iris_console test. Create named constants for each of these values and replace the inline strings throughout the test methods (in the ev() calls, handler calls, and expect assertions) with references to these constants to ensure compliance with the coding guideline that all domain/wire strings must be named constants.Source: Coding guidelines
packages/server/src/tools/assert-grade.test.ts (1)
5-60: ⚡ Quick winApply the string-constant rule in this new test suite.
This file introduces many inline domain/wire strings in predicates and assertions. Please define named constants and reuse them across cases.
As per coding guidelines, "All domain/wire/UI strings must be named constants, never free strings in code."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/tools/assert-grade.test.ts` around lines 5 - 60, The test suite in isPresenceOnlyAssertion contains many inline string literals for predicate kinds (element, text, signal, net, allOf, not, route, settled, console) and test data values (button, Saved, order:placed, /api/order, Done, Welcome, /success, error, etc.). Define named constants at the top of the test file for all these domain/wire/UI strings and replace each inline string literal throughout all test cases in the describe block with references to the corresponding constant. This ensures consistency and makes the strings reusable across the test suite.Source: Coding guidelines
packages/server/src/session/output-budget.ts (1)
46-46: ⚡ Quick winExtract the recommendation message into a named constant.
This introduces a free domain/wire string inline in a
.tsfile; move it to a named constant and format it from parameters.♻️ Suggested change
+const LARGE_TIMELINE_RECOMMENDATION_TEMPLATE = + 'large timeline ({events} events, ~{tokens} tokens) — pass filters:[...] (e.g. ["signal","net"]) or max_events to scope your next call and cut tokens'; + export function costHint(payload: unknown, events: number, droppedOldest = 0): CostHint { const json = JSON.stringify(payload) ?? ''; const bytes = json.length; const base: CostHint = droppedOldest > 0 ? { events, bytes, droppedOldest } : { events, bytes }; if (events >= LARGE_TIMELINE_EVENTS || bytes >= LARGE_TIMELINE_BYTES) { - base.recommendation = `large timeline (${String(events)} events, ~${String(estimateTokens(json))} tokens) — pass filters:[...] (e.g. ["signal","net"]) or max_events to scope your next call and cut tokens`; + base.recommendation = LARGE_TIMELINE_RECOMMENDATION_TEMPLATE + .replace('{events}', String(events)) + .replace('{tokens}', String(estimateTokens(json))); } return base; }As per coding guidelines, "All domain/wire/UI strings must be named constants, never free strings in code."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/session/output-budget.ts` at line 46, Extract the hardcoded recommendation message string currently assigned to base.recommendation into a named constant at the module level. The constant should contain the template with placeholders for the dynamic values (events count and estimated tokens), and then reference this constant in the assignment while still providing the formatted parameters. This ensures the domain/wire string is not a free string in the code but follows the guideline of being a named constant.Source: Coding guidelines
packages/server/src/init/run.test.ts (1)
61-67: ⚡ Quick winAdd a regression test for malformed
package.json.The suite checks the missing-file case but not invalid JSON content. A targeted test here will prevent reintroducing the crash path.
🧪 Suggested test case
describe('runInit', () => { it('errors cleanly without a package.json', () => { const io = memoryIo({}); const r = runInit(OPTS, io); expect(r.ok).toBe(false); expect(io.lines.join('\n')).toContain('No package.json'); }); + it('errors cleanly with an invalid package.json', () => { + const io = memoryIo({ 'package.json': '{ "name": "bad-json",' }); + const r = runInit(OPTS, io); + expect(r.ok).toBe(false); + expect(io.lines.join('\n')).toContain('Invalid package.json'); + }); + it('registers iris globally via the claude CLI (not a project .mcp.json) and patches vite', () => {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/init/run.test.ts` around lines 61 - 67, Add a new regression test case in the same test suite after the existing "errors cleanly without a package.json" test. Create a test that verifies the runInit function handles malformed JSON content gracefully. Use memoryIo to provide a package.json file with invalid JSON syntax (such as incomplete or syntactically broken JSON), call runInit with OPTS and the io object, and verify that the result has ok set to false and that io.lines contains an appropriate error message indicating the JSON parsing failure. This test will prevent crashes when package.json exists but contains invalid JSON.packages/server/src/init/vite-config.test.ts (1)
30-34: ⚡ Quick winAdd a regression case for “import exists, plugin missing”.
Current idempotency coverage only verifies fully patched input. Add a test where the config already imports
@syrin/iris/vitebutpluginslacksiris(); expected result should still beAPPLY.Suggested test
+ it('does not treat import-only configs as already patched', () => { + const src = `import { defineConfig } from 'vite'; +import { iris } from '`@syrin/iris/vite`'; +import react from '`@vitejs/plugin-react`'; + +export default defineConfig({ + plugins: [react()], +}); +`; + const r = patchViteConfig(src); + expect(r.kind).toBe(VitePatchKind.APPLY); + });Also applies to: 56-61
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/init/vite-config.test.ts` around lines 30 - 34, Add a new test case after the existing idempotency test in the test file to cover the regression scenario where a Vite config already has the import statement for `@syrin/iris/vite` but is missing the iris() plugin in the plugins array. Create a test config string that includes the import line but lacks the iris() function call in the plugins configuration, then call patchViteConfig on this config and assert that the returned kind is VitePatchKind.APPLY, confirming that the function correctly identifies and applies the patch for this incomplete state.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/e2e/specs/new-features-test.mjs`:
- Around line 54-91: The test code lacks proper error handling and resource
cleanup. If any await statement throws an error before reaching b.close() and
server.close(), resources will leak and hang the CI. Wrap all the test logic
(from chromium.launch through the final chk call) in a try block, and move the
b.close() and server.close() calls into a finally block that runs regardless of
success or failure. Keep the process.exit call after the finally block to ensure
it only executes after cleanup is complete.
In `@packages/server/src/domain/domain-model.ts`:
- Around line 174-190: Extract all hardcoded strings in the coverage status
message building code into named constants at the module or class level. This
includes the "No saved flows yet" message, the template strings used for
constructing the parts array such as "flow"/"flows", "test first: ", "declared
signal(s) no flow asserts (", and "flow(s) assert no consequence". Replace each
free string in the code with a reference to its corresponding named constant to
ensure consistency and maintainability of domain strings throughout the
codebase.
- Around line 77-85: The flowSignals function incorrectly includes anchor
signals in addition to asserted signals, which causes flows that only reference
a signal anchor to be treated as tested even when they don't actually assert or
expect that signal. Remove the line that adds anchor signals to the set (the
condition checking if step.anchor.kind === AnchorKind.SIGNAL) so that only
signals that are explicitly asserted through expect.signal or success.signal are
counted as covered by the flow.
In `@packages/server/src/domain/domain-tools.ts`:
- Around line 23-37: The outputSchema.flows object definition is missing the
optional risk field that is actually returned by the handler's buildDomainModel
function call, causing a schema contract mismatch for consumers. Add an optional
risk field to the Zod schema object within the flows definition (around the
z.object block), using the same pattern as the other optional fields like
mustHold and warning, declaring it as z.string().optional() or with an
appropriate type based on what buildDomainModel returns.
- Around line 17-19: Extract all free string literals from the tool definition
(including the long description string in the description field, and any strings
at the locations mentioned in lines 31-33 and 57-58) into named constants at the
top of the file, then replace each free string with a reference to its
corresponding constant. This ensures all domain/wire/UI strings follow the
repository policy of being named constants rather than inline literals.
In `@packages/server/src/domain/flow-risk.ts`:
- Around line 64-75: The gradeRisk function currently maps
FlowAssertionGrade.PRESENCE_ONLY to RiskLevel.LOW, which incorrectly ranks it as
equivalent to consequence-asserting flows. Instead, change the return value for
the PRESENCE_ONLY condition to return a higher risk level (such as
RiskLevel.MEDIUM) to properly reflect that presence-only assertions represent a
weaker oracle and should be prioritized higher in triage according to the
module's risk prioritization strategy.
In `@packages/server/src/flows/flow-tools.ts`:
- Around line 109-117: The assertions schema object in the outputSchema (which
contains fields like grade, hasConsequenceAssertion, totalSteps,
consequenceSteps, weakSteps, and warning) is missing the successIsConsequence
field that is actually returned by the classifyFlowAssertions function on line
143. Add the successIsConsequence field to the z.object() schema definition with
an appropriate zod type (likely z.boolean()) to match the payload being returned
and prevent strict schema consumers from rejecting the response.
- Around line 246-247: The condition for `stepsClean` includes `steps.length >
0`, which prevents success verification when a flow has zero replayed steps.
This allows flows with declared `success` values but no steps to bypass the
`assertSuccess` check. Remove the `steps.length > 0` requirement from the
`stepsClean` variable definition so that success verification occurs regardless
of whether there are any steps, ensuring that any declared success consequence
is properly validated. Apply this same fix to the similar pattern at lines
483-484.
In `@packages/server/src/init/cursor.ts`:
- Around line 39-44: The parseConfig function currently returns a success result
with an empty config object when encountering non-object JSON roots or null
values, which incorrectly allows automatic rewrites that could overwrite
existing configuration. Modify the parseConfig function to return a result
indicating MANUAL handling mode instead of ok: true when the parsed JSON root is
not an object or is null. Additionally, apply the same MANUAL handling approach
for invalid mcpServers shapes in the configuration validation logic (in the
range around lines 54-67) to prevent unintended rewrites of existing
configuration content.
In `@packages/server/src/init/mcp.ts`:
- Around line 38-50: Extract the inline CLI command tokens ('add', 'user',
'get', '--version') into named constants at the module level, following the
no-free-strings rule. Define constants for each token (e.g., similar to how
CLAUDE_CLI, MCP_SUBCOMMAND, and MCP_SERVER_NAME are defined), then replace all
occurrences of these hardcoded strings in the command builder function and the
claudeExistsProbe and claudeAvailableProbe functions with references to the
newly created constants. This ensures all domain/wire strings are captured as
named constants throughout the module.
In `@packages/server/src/init/plan.ts`:
- Around line 68-255: Extract the inline string literals for step titles and
details throughout the file into named constants at the top, following the
pattern of CLAUDE_MCP_TITLE and CURSOR_MCP_TITLE. This applies to strings in the
installStep, viteSteps, nextSteps, and buildPlan functions including titles like
"Install dependency", "Vite plugin", "IrisDev component", "Next config
(withIris)", "Mount IrisDev", "Connect snippet", and their corresponding detail
messages. Replace each inline string literal with the appropriate constant
reference to ensure all domain/UI strings are centralized and follow the
repository's coding guidelines.
In `@packages/server/src/init/run.ts`:
- Around line 70-71: The JSON.parse call in the gatherPlanInput function at line
71 can throw an exception if the pkgRaw string contains malformed JSON, causing
the CLI to crash. Wrap the JSON.parse(pkgRaw) call in a try-catch block to
handle the parse error gracefully, and either throw a custom error with a
user-friendly message or return an error state that the caller can handle.
Similarly, check the related code around lines 157-166 where similar JSON
parsing may occur and apply the same error handling pattern to ensure all JSON
parsing in the runInit path handles malformed input without crashing.
In `@packages/server/src/init/vite-config.ts`:
- Around line 49-57: The ALREADY detection in patchViteConfig function is too
broad because checking only if the source includes IRIS_MARKER can match
occurrences in comments or imports without verifying the iris plugin is actually
in the plugins array. Replace the simple source.includes(IRIS_MARKER) check with
a more robust validation that confirms the iris() plugin call is present in the
plugins array, similar to how PLUGINS_ARRAY.test(source) validates the plugins
array structure. This ensures ALREADY is only returned when the plugin injection
is truly complete.
In `@packages/server/src/tools/tools.ts`:
- Line 1299: The tools.ts file exceeds the repository's 500-line file limit at
1299 lines. Refactor by extracting related tool definition groups (such as
snapshot tools, query tools, assert tools, and network-console tools) into
separate dedicated modules. Then import these extracted tool groups back into
tools.ts and compose them together in the DOMAIN_TOOLS spread operator to
maintain the same public API while keeping each file within the 500-line limit.
- Around line 339-356: The snapshot cache key generation in the handler function
uses a fallback value of 'default' when sessionId is not provided, causing
different sessions without an explicit sessionId to share the same cache entry
in the process-wide SNAPSHOT_CACHE. This creates privacy and correctness issues
as unrelated sessions can get deltas computed from another session's prior
snapshot. Remove the fallback to 'default' in the sessionId field passed to
applySnapshotDelta and ensure each session has a uniquely identifiable sessionId
rather than silently defaulting multiple sessions to the same cache key value.
---
Outside diff comments:
In `@docs/getting-started.md`:
- Line 443: The troubleshooting section mentions `@syrin/iris-babel-plugin` but
Step 3 of the guide references `@syrin/iris/babel`. Update the package name
reference at line 443 to use `@syrin/iris/babel` instead to match the package
name specified in Step 3, ensuring consistency throughout the documentation and
preventing setup confusion.
---
Minor comments:
In `@apps/e2e/specs/new-features-test.mjs`:
- Line 1: The file comment at the start of new-features-test.mjs contains the
internal changelog tag [Unreleased] which violates the coding guideline
prohibiting internal tracking tags in comments. Remove the reference to
[Unreleased] and the CHANGELOG section from the comment, keeping only the core
description of what the file does for live feature verification.
In `@CHANGELOG.md`:
- Line 7: The heading for version 0.6.10 uses markdown reference-link syntax
with [0.6.10], but there is no corresponding link definition at the bottom of
the CHANGELOG.md file to complete the reference. Add a link definition at the
end of the file following the pattern of other version links in the document to
match the [0.6.10] reference in the heading and enable proper hyperlinking for
this version entry.
In `@docs/usage.md`:
- Line 267: The `iris_domain({})` return type documentation at line 267 of
docs/usage.md is missing the `mustHold?` optional property from the flow object
shape. Add `mustHold?` to the documented flow properties in the return type
definition to match what `buildDomainModel` actually sets in the
packages/server/src/domain/domain-model.ts file when a flow has a success
condition.
In `@packages/browser/src/observers/console.test.ts`:
- Around line 27-36: The test verifies event emission but does not assert that
the original console.error method is actually being forwarded. Add a spy or mock
of the original console.error method before calling installConsole, then after
the console.error call, add an assertion to verify that the original
console.error was invoked with the correct arguments ('boom' and 42) to ensure
the forwarding behavior is actually covered by the test.
In `@packages/server/src/flows/flows.heal.test.ts`:
- Around line 85-93: The test mock session needs to be made cursor-aware to
properly exercise stale-signal protection. Modify the mock session to track a
cursor position and elapsed time: make eventsSince() accept and use a cursor
parameter to filter events rather than returning all events, and make elapsed()
return an actual tracked value (either incrementing or controllable) instead of
always returning 0. This will ensure that heal(..., apply: true) verification
with the since floor from session.elapsed() properly validates that buffered
signals outside the time window are considered stale.
In `@packages/server/src/init/mcp.ts`:
- Around line 42-45: The claudeExistsProbe() function and other functions in
this file use inline string literals for CLI command arguments (such as 'get',
'add', 'user', '-s', '--', '--version') instead of named constants. Define named
constants at the top of the file for each of these CLI command tokens (e.g.,
MCP_GET_SUBCOMMAND, MCP_ADD_SUBCOMMAND, SCOPE_USER, etc.) following the same
naming pattern as the existing CLAUDE_CLI, MCP_SUBCOMMAND, and MCP_SERVER_NAME
constants, then replace all inline string occurrences throughout the file with
references to these new constants.
In `@packages/server/src/tools/assert-grade.ts`:
- Around line 40-46: The walk function's handling of allOf and anyOf predicates
incorrectly marks mixed predicates (containing both presence-only and
non-presence predicates) as presence-only. In the return statement for these
cases, the presence property is computed using subs.some() which returns true if
ANY sub-predicate is presence-only; this should instead use subs.every() to
return true only when ALL sub-predicates are presence-only. This ensures that a
combined predicate is only flagged as presence-only when all of its constituent
predicates are presence-only, fixing the false advice emission issue that
contradicts the file's contract. Apply the same fix to both the allOf/anyOf case
block starting around line 42 and the similar logic at line 58-60.
In `@packages/server/src/tools/snapshot-cost.test.ts`:
- Around line 6-11: Remove the internal design-doc identifiers M0 and M2 from
the comment block in the snapshot-cost.test.ts file. Specifically, remove "M0 —
a reproducible measurement of the" and replace it with a direct description, and
remove "M2" references so the comment describes the measurement approach without
internal tracking codes. Apply the same fix to all other locations in the file
where these design-doc reference patterns appear (at lines 39-40, 59-60, 78, and
87-88), ensuring all letter-digit patterns like N5, G4, M8, etc. are removed
from comments, test titles, and log labels throughout the file.
---
Nitpick comments:
In `@packages/server/src/domain/domain-model.test.ts`:
- Around line 59-67: Add a new test case in the test file after the existing
test that verifies when a flow defines a signal anchor (via the signal property
in flow()) but does not assert it with expect.signal or success.signal, the
signal is still properly flagged in declaredUntestedSignals. Use
buildDomainModel, flow, and contract functions similar to the existing test
pattern to construct a scenario where a flow references a signal without
asserting it, then verify that signal appears in the
gaps.declaredUntestedSignals array to protect the stated asserted signal
contract.
In `@packages/server/src/domain/flow-risk.test.ts`:
- Around line 29-52: The test suite for flowRisk is missing coverage for the
FlowAssertionGrade.PRESENCE_ONLY case, which could allow regressions to go
undetected. Add a new test case in the describe block that exercises the
PRESENCE_ONLY grade by calling flowRisk with FlowAssertionGrade.PRESENCE_ONLY as
the first argument and a run result as the second argument (similar to the
existing test cases like the assertion-free flow test), then assert the expected
RiskLevel that should be returned for this grade to ensure the branch is
properly tested and documented.
In `@packages/server/src/events/predicate.test.ts`:
- Around line 228-311: Extract all inline domain/wire/UI strings from the
settled predicate test suite (the describe block starting at line 228) into
file-level named constants. This includes predicate kinds like 'settled',
'signal', and 'allOf', signal names like 'deploy:shipped' and 'x', URLs like
'/api/x' and '/api/deploy', event property names and values, and failure reason
strings like 'not settled'. Replace all occurrences of these inline literals
throughout the test cases with references to the corresponding named constants
to ensure compliance with the coding guideline that all domain/wire/UI strings
must be named constants.
In `@packages/server/src/events/predicate.ts`:
- Around line 363-373: The inline strings `'no activity to settle'` (used as the
note value) and the failureReason string template (used in the failureReason
property) need to be extracted to named constants following the repository's
coding guidelines. Define these strings as named constants at the module level
or in a centralized constants location, then replace the inline string literals
with references to those constants. Ensure the failureReason constant properly
handles the dynamic values by using a template string with the placeholders for
lastType, quietForMs, and quietMs.
In `@packages/server/src/flows/flow-success.ts`:
- Around line 30-75: Extract all hardcoded string literals from the successLabel
and successToPredicate functions into named constants at the module level.
Replace all occurrences of 'signal', 'net', 'element', 'allOf', 'testid',
'role', 'name', and 'success' string literals with these named constants
throughout both functions to comply with the domain/wire string literal
guidelines.
In `@packages/server/src/init/run.test.ts`:
- Around line 61-67: Add a new regression test case in the same test suite after
the existing "errors cleanly without a package.json" test. Create a test that
verifies the runInit function handles malformed JSON content gracefully. Use
memoryIo to provide a package.json file with invalid JSON syntax (such as
incomplete or syntactically broken JSON), call runInit with OPTS and the io
object, and verify that the result has ok set to false and that io.lines
contains an appropriate error message indicating the JSON parsing failure. This
test will prevent crashes when package.json exists but contains invalid JSON.
In `@packages/server/src/init/vite-config.test.ts`:
- Around line 30-34: Add a new test case after the existing idempotency test in
the test file to cover the regression scenario where a Vite config already has
the import statement for `@syrin/iris/vite` but is missing the iris() plugin in
the plugins array. Create a test config string that includes the import line but
lacks the iris() function call in the plugins configuration, then call
patchViteConfig on this config and assert that the returned kind is
VitePatchKind.APPLY, confirming that the function correctly identifies and
applies the patch for this incomplete state.
In `@packages/server/src/session/output-budget.ts`:
- Line 46: Extract the hardcoded recommendation message string currently
assigned to base.recommendation into a named constant at the module level. The
constant should contain the template with placeholders for the dynamic values
(events count and estimated tokens), and then reference this constant in the
assignment while still providing the formatted parameters. This ensures the
domain/wire string is not a free string in the code but follows the guideline of
being a named constant.
In `@packages/server/src/tools/assert-grade.test.ts`:
- Around line 5-60: The test suite in isPresenceOnlyAssertion contains many
inline string literals for predicate kinds (element, text, signal, net, allOf,
not, route, settled, console) and test data values (button, Saved, order:placed,
/api/order, Done, Welcome, /success, error, etc.). Define named constants at the
top of the test file for all these domain/wire/UI strings and replace each
inline string literal throughout all test cases in the describe block with
references to the corresponding constant. This ensures consistency and makes the
strings reusable across the test suite.
In `@packages/server/src/tools/tools.near-miss.test.ts`:
- Around line 79-127: The token budget tests in the describe block contain
hardcoded string literals for URLs, console messages, and log levels that should
be extracted to named constants. Identify all inline strings in the test cases
including the URLs '/1', '/2', '/3' used in iris_network tests, the messages
'a', 'b', 'c' used in iris_console tests, and the level string 'error' used in
the iris_console test. Create named constants for each of these values and
replace the inline strings throughout the test methods (in the ev() calls,
handler calls, and expect assertions) with references to these constants to
ensure compliance with the coding guideline that all domain/wire strings must be
named constants.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 47fd35ff-a94f-48c2-880e-ff8bf8781c13
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (89)
CHANGELOG.mdapps/e2e/specs/new-features-test.mjsdocs/agent-cheatsheet.mddocs/getting-started.mddocs/integrate-with-claude-code.mddocs/token-efficiency.mddocs/usage.mdpackages/babel-plugin/package.jsonpackages/browser/package.jsonpackages/browser/src/observers/console.test.tspackages/browser/src/observers/console.tspackages/browser/src/observers/network.test.tspackages/browser/src/observers/network.tspackages/browser/src/observers/route.test.tspackages/browser/src/observers/route.tspackages/eslint-plugin/package.jsonpackages/iris/package.jsonpackages/iris/src/vite.tspackages/iris/tsup.config.tspackages/next/package.jsonpackages/protocol/package.jsonpackages/protocol/src/constants.tspackages/protocol/src/types.tspackages/react/package.jsonpackages/server/package.jsonpackages/server/src/cli.test.tspackages/server/src/cli.tspackages/server/src/domain/domain-model.test.tspackages/server/src/domain/domain-model.tspackages/server/src/domain/domain-tools.tspackages/server/src/domain/flow-risk.test.tspackages/server/src/domain/flow-risk.tspackages/server/src/events/predicate.test.tspackages/server/src/events/predicate.tspackages/server/src/flows/flow-classify.test.tspackages/server/src/flows/flow-classify.tspackages/server/src/flows/flow-replay.dynamic.test.tspackages/server/src/flows/flow-replay.test.tspackages/server/src/flows/flow-replay.tspackages/server/src/flows/flow-success.test.tspackages/server/src/flows/flow-success.tspackages/server/src/flows/flow-tools.tspackages/server/src/flows/flows.heal.test.tspackages/server/src/flows/flows.tspackages/server/src/flows/heal-proposal.test.tspackages/server/src/flows/heal.tspackages/server/src/flows/tools.flow-replay.test.tspackages/server/src/flows/tools.flows.test.tspackages/server/src/index.tspackages/server/src/init/cursor.test.tspackages/server/src/init/cursor.tspackages/server/src/init/detect.test.tspackages/server/src/init/detect.tspackages/server/src/init/mcp.test.tspackages/server/src/init/mcp.tspackages/server/src/init/node-io.tspackages/server/src/init/plan.test.tspackages/server/src/init/plan.tspackages/server/src/init/run.test.tspackages/server/src/init/run.tspackages/server/src/init/snippets.tspackages/server/src/init/vite-config.test.tspackages/server/src/init/vite-config.tspackages/server/src/session/output-budget.test.tspackages/server/src/session/output-budget.tspackages/server/src/session/tools.live-control.test.tspackages/server/src/tools/assert-grade.test.tspackages/server/src/tools/assert-grade.tspackages/server/src/tools/query-paginate.test.tspackages/server/src/tools/query-paginate.tspackages/server/src/tools/snapshot-cost.test.tspackages/server/src/tools/snapshot-delta.test.tspackages/server/src/tools/snapshot-delta.tspackages/server/src/tools/tool-names.tspackages/server/src/tools/tools.assert-advice.test.tspackages/server/src/tools/tools.near-miss.test.tspackages/server/src/tools/tools.tspackages/test/package.jsonpackages/test/src/flow-spec.test.tspackages/test/src/register.test.tspackages/test/src/success-assert.test.tspackages/test/src/success-assert.tspackages/vite-plugin/README.mdpackages/vite-plugin/package.jsonpackages/vite-plugin/src/build.integration.test.tspackages/vite-plugin/src/index.test.tspackages/vite-plugin/src/index.tspackages/vite-plugin/tsconfig.jsontsconfig.json
| const b = await chromium.launch({ headless: true }); | ||
| const p = await b.newPage(); | ||
| await p.goto('http://localhost:4310/?session=demo', { waitUntil: 'networkidle' }); | ||
| for (let i = 0; i < 200 && server.bridge.sessions.count() === 0; i++) await sleep(50); | ||
|
|
||
| console.log('\n=== Iris × new features (:4310) ==='); | ||
| chk('dashboard SDK connected', server.bridge.sessions.count() > 0); | ||
|
|
||
| // count_only — just the match count, no descriptors. | ||
| const co = await T('iris_query', { by: 'role', value: 'button', count_only: true }); | ||
| chk('iris_query count_only returns a count, drops elements', typeof co.count === 'number' && co.count >= 1 && co.elements === undefined, `count=${co.count}`); | ||
|
|
||
| // limit — cap descriptors; when more matched, total + truncated flag it. | ||
| const lim = await T('iris_query', { by: 'role', value: 'button', limit: 1 }); | ||
| const moreThanOne = (co.count ?? 0) > 1; | ||
| chk('iris_query limit caps descriptors (truncated when more)', (lim.elements?.length ?? 0) <= 1 && (!moreThanOne || (lim.truncated === true && lim.total === co.count)), `returned=${lim.elements?.length}, total=${lim.total ?? 'n/a'}`); | ||
|
|
||
| // Auth (pre-filled) → dashboard with its count-up animations. | ||
| await T('iris_act_and_wait', { ref: await refOf('testid', 'login-submit'), action: 'click', until: { kind: 'signal', name: 'auth:granted' }, timeout_ms: 5000 }); | ||
| chk('login → dashboard', (await refOf('testid', 'nav-deployments')) !== null); | ||
|
|
||
| // settled wait — the dashboard's count-up counters emit dom.text every frame; settle must STILL | ||
| // resolve (the ambient-animation fix). Pre-fix this would time out at 4s with pass:false. | ||
| const settled = await T('iris_wait_for', { predicate: { kind: 'settled', quietMs: 300 }, timeout_ms: 4000 }); | ||
| chk('settled resolves despite count-up animation churn', settled.pass === true, JSON.stringify(settled.evidence ?? {})); | ||
|
|
||
| // act_and_wait with NO `until` → auto-settle after a nav click; verdict carries settled evidence. | ||
| const aw = await T('iris_act_and_wait', { ref: await refOf('testid', 'nav-deployments'), action: 'click' }); | ||
| chk('act_and_wait (no until) auto-settles', aw.verdict?.pass === true && aw.verdict?.evidence?.settled === true, JSON.stringify(aw.verdict?.evidence ?? {})); | ||
|
|
||
| // presence-only advice — a PASSING element assertion is nudged toward a consequence. | ||
| const adv = await T('iris_assert', { predicate: { kind: 'element', query: { testid: 'deploy-list' } } }); | ||
| chk('iris_assert presence-only attaches advice', adv.pass === true && typeof adv.advice === 'string' && adv.advice.includes('consequence'), adv.advice ? 'advice present' : 'no advice'); | ||
|
|
||
| console.log(`\n${fail === 0 ? '✅ NEW FEATURES VERIFIED' : '❌ FAILED'} (${pass} passed, ${fail} failed)`); | ||
| await b.close(); | ||
| await server.close(); | ||
| process.exit(fail === 0 ? 0 : 1); |
There was a problem hiding this comment.
Guarantee teardown with try/finally around the e2e run.
Any thrown await in the flow skips b.close()/server.close(), which can leak resources and hang CI jobs.
🛠️ Suggested hardening
-const b = await chromium.launch({ headless: true });
-const p = await b.newPage();
-await p.goto('http://localhost:4310/?session=demo', { waitUntil: 'networkidle' });
-for (let i = 0; i < 200 && server.bridge.sessions.count() === 0; i++) await sleep(50);
+let b;
+let p;
+try {
+ b = await chromium.launch({ headless: true });
+ p = await b.newPage();
+ await p.goto('http://localhost:4310/?session=demo', { waitUntil: 'networkidle' });
+ for (let i = 0; i < 200 && server.bridge.sessions.count() === 0; i++) await sleep(50);
@@
-console.log(`\n${fail === 0 ? '✅ NEW FEATURES VERIFIED' : '❌ FAILED'} (${pass} passed, ${fail} failed)`);
-await b.close();
-await server.close();
+ console.log(`\n${fail === 0 ? '✅ NEW FEATURES VERIFIED' : '❌ FAILED'} (${pass} passed, ${fail} failed)`);
+} finally {
+ await p?.close();
+ await b?.close();
+ await server.close();
+}
process.exit(fail === 0 ? 0 : 1);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/e2e/specs/new-features-test.mjs` around lines 54 - 91, The test code
lacks proper error handling and resource cleanup. If any await statement throws
an error before reaching b.close() and server.close(), resources will leak and
hang the CI. Wrap all the test logic (from chromium.launch through the final chk
call) in a try block, and move the b.close() and server.close() calls into a
finally block that runs regardless of success or failure. Keep the process.exit
call after the finally block to ensure it only executes after cleanup is
complete.
| function flowSignals(flow: FlowFile): string[] { | ||
| const set = new Set<string>(); | ||
| for (const step of flatten(flow.steps)) { | ||
| if (step.anchor.kind === AnchorKind.SIGNAL) set.add(step.anchor.name); | ||
| if (step.expect?.signal !== undefined) set.add(step.expect.signal); | ||
| } | ||
| if (flow.success?.signal !== undefined) set.add(flow.success.signal); | ||
| return [...set]; | ||
| } |
There was a problem hiding this comment.
Signal-gap computation currently over-credits non-asserting flows.
On Line 80, flowSignals includes anchor signals, but declaredUntestedSignals is defined as signals that no flow asserts. A flow that only references a signal anchor (without expect.signal/success.signal) will be treated as covered, which can suppress real intent gaps.
Suggested fix
function flowSignals(flow: FlowFile): string[] {
const set = new Set<string>();
for (const step of flatten(flow.steps)) {
- if (step.anchor.kind === AnchorKind.SIGNAL) set.add(step.anchor.name);
if (step.expect?.signal !== undefined) set.add(step.expect.signal);
}
if (flow.success?.signal !== undefined) set.add(flow.success.signal);
return [...set];
}Also applies to: 133-136
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/server/src/domain/domain-model.ts` around lines 77 - 85, The
flowSignals function incorrectly includes anchor signals in addition to asserted
signals, which causes flows that only reference a signal anchor to be treated as
tested even when they don't actually assert or expect that signal. Remove the
line that adds anchor signals to the set (the condition checking if
step.anchor.kind === AnchorKind.SIGNAL) so that only signals that are explicitly
asserted through expect.signal or success.signal are counted as covered by the
flow.
| return 'No saved flows yet — record the critical journeys (iris_record_start) so the agent learns the app.'; | ||
| } | ||
| const parts = [ | ||
| `${String(flowCount)} flow${flowCount === 1 ? '' : 's'}: ${String(coverage.asserted)} asserted, ${String(coverage.presenceOnly)} presence-only, ${String(coverage.assertionFree)} assertion-free`, | ||
| ]; | ||
| if (topRisk !== undefined) { | ||
| parts.push(`test first: ${topRisk.name} (${topRisk.reason})`); | ||
| } | ||
| if (gaps.declaredUntestedSignals.length > 0) { | ||
| parts.push( | ||
| `${String(gaps.declaredUntestedSignals.length)} declared signal(s) no flow asserts (${gaps.declaredUntestedSignals.join(', ')})`, | ||
| ); | ||
| } | ||
| if (gaps.unassertedFlows.length > 0) { | ||
| parts.push(`${String(gaps.unassertedFlows.length)} flow(s) assert no consequence`); | ||
| } | ||
| return parts.join('. ') + '.'; |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift
Extract runtime/domain strings into named constants.
Lines 174–190 introduce multiple free strings in tool-facing output. This violates the repo rule and makes wording changes harder to maintain consistently.
As per coding guidelines, “All domain/wire/UI strings must be named constants, never free strings in code.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/server/src/domain/domain-model.ts` around lines 174 - 190, Extract
all hardcoded strings in the coverage status message building code into named
constants at the module or class level. This includes the "No saved flows yet"
message, the template strings used for constructing the parts array such as
"flow"/"flows", "test first: ", "declared signal(s) no flow asserts (", and
"flow(s) assert no consequence". Replace each free string in the code with a
reference to its corresponding named constant to ensure consistency and
maintainability of domain strings throughout the codebase.
Source: Coding guidelines
| description: | ||
| 'Read the app domain model BEFORE testing: every saved flow with its assertion grade, the consequence that MUST hold for it (mustHold = what it actually tests), the anchors/signals it exercises, plus GAPS — declared signals/testids that NO flow asserts (untested intent), and flows that assert no observable consequence. Use this to decide what to test and where the real risk is, instead of crawling the whole app. Reads .iris/flows/ + .iris/contract.json (no browser needed).', | ||
| inputSchema: {}, |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift
Move tool-facing strings to named constants.
This file introduces multiple free strings in tool description and schema metadata. Please extract them to constants to satisfy the repo string policy.
As per coding guidelines, “All domain/wire/UI strings must be named constants, never free strings in code.”
Also applies to: 31-33, 57-58
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/server/src/domain/domain-tools.ts` around lines 17 - 19, Extract all
free string literals from the tool definition (including the long description
string in the description field, and any strings at the locations mentioned in
lines 31-33 and 57-58) into named constants at the top of the file, then replace
each free string with a reference to its corresponding constant. This ensures
all domain/wire/UI strings follow the repository policy of being named constants
rather than inline literals.
Source: Coding guidelines
| z.object({ | ||
| name: z.string(), | ||
| steps: z.number(), | ||
| grade: z.string(), | ||
| asserts: z.boolean(), | ||
| mustHold: z | ||
| .string() | ||
| .optional() | ||
| .describe( | ||
| 'The success consequence that must hold for this flow (what it actually tests).', | ||
| ), | ||
| warning: z.string().optional(), | ||
| signals: z.array(z.string()), | ||
| testids: z.array(z.string()), | ||
| }), |
There was a problem hiding this comment.
outputSchema.flows is missing the optional risk field returned by the handler.
The handler returns buildDomainModel(...) (Line 71), and DomainFlowSummary includes risk when run history exists. But outputSchema.flows doesn’t declare it, creating a wire-contract mismatch for schema-aware consumers.
Suggested schema addition
flows: z.array(
z.object({
name: z.string(),
steps: z.number(),
grade: z.string(),
asserts: z.boolean(),
@@
warning: z.string().optional(),
signals: z.array(z.string()),
testids: z.array(z.string()),
+ risk: z
+ .object({
+ level: z.string(),
+ reason: z.string(),
+ lastStatus: z.string().optional(),
+ })
+ .optional(),
}),
),Also applies to: 61-72
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/server/src/domain/domain-tools.ts` around lines 23 - 37, The
outputSchema.flows object definition is missing the optional risk field that is
actually returned by the handler's buildDomainModel function call, causing a
schema contract mismatch for consumers. Add an optional risk field to the Zod
schema object within the flows definition (around the z.object block), using the
same pattern as the other optional fields like mustHold and warning, declaring
it as z.string().optional() or with an appropriate type based on what
buildDomainModel returns.
| const CLAUDE_MCP_TITLE = 'MCP server (Claude, global)'; | ||
| const CURSOR_MCP_TITLE = 'MCP server (Cursor, global)'; | ||
|
|
||
| function claudeMcpStep(input: PlanInput): Step | null { | ||
| if (!input.claudeCli) return null; | ||
| if (input.mcpExists) { | ||
| return { | ||
| title: CLAUDE_MCP_TITLE, | ||
| target: MCP_TARGET, | ||
| status: StepStatus.ALREADY, | ||
| detail: 'iris already registered (install once, used by every project)', | ||
| }; | ||
| } | ||
| const cmd = claudeAddCommand(input.options.port); | ||
| return { | ||
| title: CLAUDE_MCP_TITLE, | ||
| target: MCP_TARGET, | ||
| status: StepStatus.APPLY, | ||
| detail: 'register iris globally for all projects', | ||
| exec: { command: cmd.command, args: cmd.args, fallback: cmd.display }, | ||
| }; | ||
| } | ||
|
|
||
| function cursorMcpStep(input: PlanInput): Step | null { | ||
| if (!input.cursorPresent) return null; | ||
| const r = mergeCursorConfig(input.cursorConfig, input.options.port); | ||
| if (r.status === CursorMergeStatus.ALREADY) { | ||
| return { | ||
| title: CURSOR_MCP_TITLE, | ||
| target: input.cursorConfigPath, | ||
| status: StepStatus.ALREADY, | ||
| detail: 'iris already in Cursor global config', | ||
| }; | ||
| } | ||
| if (r.status === CursorMergeStatus.MANUAL) { | ||
| return { | ||
| title: CURSOR_MCP_TITLE, | ||
| target: input.cursorConfigPath, | ||
| status: StepStatus.MANUAL, | ||
| detail: `couldn't parse ${input.cursorConfigPath} — add this server by hand:\n "iris": ${JSON.stringify(cursorServerEntry(input.options.port))}`, | ||
| }; | ||
| } | ||
| return { | ||
| title: CURSOR_MCP_TITLE, | ||
| target: input.cursorConfigPath, | ||
| status: StepStatus.APPLY, | ||
| detail: 'register iris in Cursor global config', | ||
| write: { path: input.cursorConfigPath, content: r.content }, | ||
| }; | ||
| } | ||
|
|
||
| /** One global registration per detected agent (Claude + Cursor). Falls back to a manual note. */ | ||
| function mcpSteps(input: PlanInput): Step[] { | ||
| if (!input.options.mcp) { | ||
| return [ | ||
| { | ||
| title: 'MCP server (global)', | ||
| target: MCP_TARGET, | ||
| status: StepStatus.SKIP, | ||
| detail: '--no-mcp', | ||
| }, | ||
| ]; | ||
| } | ||
| const steps = [claudeMcpStep(input), cursorMcpStep(input)].filter((s): s is Step => s !== null); | ||
| if (steps.length > 0) return steps; | ||
| // No supported agent detected — print the one-time global instructions. | ||
| return [ | ||
| { | ||
| title: 'MCP server (global)', | ||
| target: MCP_TARGET, | ||
| status: StepStatus.MANUAL, | ||
| detail: mcpManual(input.options.port), | ||
| }, | ||
| ]; | ||
| } | ||
|
|
||
| function installStep(input: PlanInput): Step { | ||
| const pm = input.detection.packageManager; | ||
| const command = installCommand(pm, IRIS_PACKAGE); | ||
| if (!input.options.install) { | ||
| return { | ||
| title: 'Install dependency', | ||
| target: 'package.json', | ||
| status: StepStatus.MANUAL, | ||
| detail: command, | ||
| }; | ||
| } | ||
| const parts = installCommandParts(pm, IRIS_PACKAGE); | ||
| return { | ||
| title: 'Install dependency', | ||
| target: 'package.json', | ||
| status: StepStatus.APPLY, | ||
| detail: command, | ||
| exec: { command: parts.command, args: parts.args, fallback: command }, | ||
| }; | ||
| } | ||
|
|
||
| function viteSteps(input: PlanInput): Step[] { | ||
| const cfg = input.viteConfig; | ||
| const port = input.options.port; | ||
| if (cfg === null) { | ||
| return [ | ||
| { | ||
| title: 'Vite plugin', | ||
| target: 'vite.config', | ||
| status: StepStatus.MANUAL, | ||
| detail: viteManual(port), | ||
| }, | ||
| ]; | ||
| } | ||
| const patch = patchViteConfig(cfg.source, port); | ||
| if (patch.kind === VitePatchKind.ALREADY) { | ||
| return [ | ||
| { | ||
| title: 'Vite plugin', | ||
| target: cfg.path, | ||
| status: StepStatus.ALREADY, | ||
| detail: 'iris() already in plugins', | ||
| }, | ||
| ]; | ||
| } | ||
| if (patch.kind === VitePatchKind.MANUAL) { | ||
| return [ | ||
| { | ||
| title: 'Vite plugin', | ||
| target: cfg.path, | ||
| status: StepStatus.MANUAL, | ||
| detail: `${patch.reason}\n\n${viteManual(port)}`, | ||
| }, | ||
| ]; | ||
| } | ||
| return [ | ||
| { | ||
| title: 'Vite plugin', | ||
| target: cfg.path, | ||
| status: StepStatus.APPLY, | ||
| detail: 'add iris() to plugins (also injects connect())', | ||
| write: { path: cfg.path, content: patch.code }, | ||
| }, | ||
| ]; | ||
| } | ||
|
|
||
| function nextSteps(input: PlanInput): Step[] { | ||
| const configFile = input.nextConfigFile ?? 'next.config.mjs'; | ||
| const devFile: Step = input.nextIrisDevExists | ||
| ? { | ||
| title: 'IrisDev component', | ||
| target: NEXT_IRIS_DEV_PATH, | ||
| status: StepStatus.ALREADY, | ||
| detail: 'file exists', | ||
| } | ||
| : { | ||
| title: 'IrisDev component', | ||
| target: NEXT_IRIS_DEV_PATH, | ||
| status: StepStatus.APPLY, | ||
| detail: 'create dev-only connect component', | ||
| write: { path: NEXT_IRIS_DEV_PATH, content: nextIrisDevFile(input.options.port) }, | ||
| }; | ||
| return [ | ||
| devFile, | ||
| { | ||
| title: 'Next config (withIris)', | ||
| target: configFile, | ||
| status: StepStatus.MANUAL, | ||
| detail: nextConfigManual(configFile), | ||
| }, | ||
| { | ||
| title: 'Mount IrisDev', | ||
| target: 'app/layout.tsx', | ||
| status: StepStatus.MANUAL, | ||
| detail: NEXT_LAYOUT_MANUAL, | ||
| }, | ||
| ]; | ||
| } | ||
|
|
||
| export function buildPlan(input: PlanInput): Plan { | ||
| const steps: Step[] = [...mcpSteps(input), installStep(input)]; | ||
| if (input.detection.framework === Framework.VITE) { | ||
| steps.push(...viteSteps(input)); | ||
| } else if (input.detection.framework === Framework.NEXT) { | ||
| steps.push(...nextSteps(input)); | ||
| } else { | ||
| steps.push({ | ||
| title: 'Connect snippet', | ||
| target: 'index.html', | ||
| status: StepStatus.MANUAL, | ||
| detail: htmlManual(input.options.port), | ||
| }); |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift
Extract remaining step title/detail literals into named constants.
This file still has many inline domain/UI strings in plan steps; centralizing them as constants will align with the repo rule and reduce drift in CLI messaging.
As per coding guidelines, All domain/wire/UI strings must be named constants, never free strings in code.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/server/src/init/plan.ts` around lines 68 - 255, Extract the inline
string literals for step titles and details throughout the file into named
constants at the top, following the pattern of CLAUDE_MCP_TITLE and
CURSOR_MCP_TITLE. This applies to strings in the installStep, viteSteps,
nextSteps, and buildPlan functions including titles like "Install dependency",
"Vite plugin", "IrisDev component", "Next config (withIris)", "Mount IrisDev",
"Connect snippet", and their corresponding detail messages. Replace each inline
string literal with the appropriate constant reference to ensure all domain/UI
strings are centralized and follow the repository's coding guidelines.
Source: Coding guidelines
| function gatherPlanInput(options: InitOptions, io: InitIo, pkgRaw: string): PlanInput { | ||
| const pkg: unknown = JSON.parse(pkgRaw); |
There was a problem hiding this comment.
Handle malformed package.json without crashing iris init.
Line 71 can throw during parse, and that exception is currently uncaught in the runInit path. This turns a recoverable input error into a hard CLI failure.
💡 Proposed fix
export function runInit(options: InitOptions, io: InitIo): InitResult {
const pkgRaw = io.readFile(PACKAGE_JSON);
if (pkgRaw === null) {
io.print('No package.json found. Run `iris init` from your project root.');
return { ok: false, applied: 0, manual: 0 };
}
- const plan = buildPlan(gatherPlanInput(options, io, pkgRaw));
+ let plan: Plan;
+ try {
+ plan = buildPlan(gatherPlanInput(options, io, pkgRaw));
+ } catch {
+ io.print('Invalid package.json. Fix JSON syntax and re-run `iris init`.');
+ return { ok: false, applied: 0, manual: 0 };
+ }
const failed = options.dryRun ? new Set<string>() : applyEffects(plan, io);
return report(plan, options.dryRun, failed, io);
}Also applies to: 157-166
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/server/src/init/run.ts` around lines 70 - 71, The JSON.parse call in
the gatherPlanInput function at line 71 can throw an exception if the pkgRaw
string contains malformed JSON, causing the CLI to crash. Wrap the
JSON.parse(pkgRaw) call in a try-catch block to handle the parse error
gracefully, and either throw a custom error with a user-friendly message or
return an error state that the caller can handle. Similarly, check the related
code around lines 157-166 where similar JSON parsing may occur and apply the
same error handling pattern to ensure all JSON parsing in the runInit path
handles malformed input without crashing.
| export function patchViteConfig(source: string, port?: number): VitePatch { | ||
| if (source.includes(IRIS_MARKER)) { | ||
| return { kind: VitePatchKind.ALREADY }; | ||
| } | ||
| if (!PLUGINS_ARRAY.test(source)) { | ||
| return { kind: VitePatchKind.MANUAL, reason: NO_PLUGINS_REASON }; | ||
| } | ||
| return { kind: VitePatchKind.APPLY, code: insertImport(insertPlugin(source, port)) }; | ||
| } |
There was a problem hiding this comment.
ALREADY detection is too broad and can skip required plugin injection.
Checking only @syrin/iris/vite presence can return ALREADY when the config has the import (or even just a comment) but no iris(...) in plugins, leaving setup incomplete.
Suggested fix
const IRIS_MARKER = '`@syrin/iris/vite`';
+const IRIS_PLUGIN_IN_PLUGINS = /plugins\s*:\s*\[[\s\S]*?\biris\s*\(/;
function insertImport(source: string): string {
+ if (source.includes(IRIS_MARKER)) return source;
const matches = [...source.matchAll(IMPORT_LINE)];
const last = matches[matches.length - 1];
if (last?.index === undefined) {
return `${VITE_IMPORT}\n${source}`;
}
const end = last.index + last[0].length;
return `${source.slice(0, end)}\n${VITE_IMPORT}${source.slice(end)}`;
}
export function patchViteConfig(source: string, port?: number): VitePatch {
- if (source.includes(IRIS_MARKER)) {
+ if (source.includes(IRIS_MARKER) && IRIS_PLUGIN_IN_PLUGINS.test(source)) {
return { kind: VitePatchKind.ALREADY };
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/server/src/init/vite-config.ts` around lines 49 - 57, The ALREADY
detection in patchViteConfig function is too broad because checking only if the
source includes IRIS_MARKER can match occurrences in comments or imports without
verifying the iris plugin is actually in the plugins array. Replace the simple
source.includes(IRIS_MARKER) check with a more robust validation that confirms
the iris() plugin call is present in the plugins array, similar to how
PLUGINS_ARRAY.test(source) validates the plugins array structure. This ensures
ALREADY is only returned when the plugin injection is truly complete.
| handler: (deps, args) => { | ||
| const sessionId = asString(args['sessionId']); | ||
| const mode = asString(args['mode']) ?? SnapshotMode.FULL; | ||
| return commandOrThrow(deps, sessionId, IrisCommand.SNAPSHOT, { | ||
| scope: args['scope'], | ||
| mode: args['mode'] ?? SnapshotMode.FULL, | ||
| }), | ||
| mode, | ||
| }).then((raw) => | ||
| withSizeCost( | ||
| applySnapshotDelta( | ||
| raw, | ||
| { | ||
| sessionId: sessionId ?? 'default', | ||
| scope: asString(args['scope']) ?? '', | ||
| mode, | ||
| diff: args['diff'] === true, | ||
| }, | ||
| SNAPSHOT_CACHE, | ||
| ), |
There was a problem hiding this comment.
Snapshot diff cache key can collide across sessions when sessionId is omitted.
At Line 350, sessionId ?? 'default' is used with a process-wide SNAPSHOT_CACHE. Different sessions that omit sessionId can share one cache key and get deltas computed from another session’s prior snapshot, which is both incorrect and a privacy boundary risk.
Proposed fix
handler: (deps, args) => {
- const sessionId = asString(args['sessionId']);
+ const requestedSessionId = asString(args['sessionId']);
+ const resolvedSessionId = deps.sessions.resolve(requestedSessionId).id;
const mode = asString(args['mode']) ?? SnapshotMode.FULL;
- return commandOrThrow(deps, sessionId, IrisCommand.SNAPSHOT, {
+ return commandOrThrow(deps, resolvedSessionId, IrisCommand.SNAPSHOT, {
scope: args['scope'],
mode,
}).then((raw) =>
withSizeCost(
applySnapshotDelta(
raw,
{
- sessionId: sessionId ?? 'default',
+ sessionId: resolvedSessionId,
scope: asString(args['scope']) ?? '',
mode,
diff: args['diff'] === true,
},
SNAPSHOT_CACHE,
),
),
);
},🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/server/src/tools/tools.ts` around lines 339 - 356, The snapshot
cache key generation in the handler function uses a fallback value of 'default'
when sessionId is not provided, causing different sessions without an explicit
sessionId to share the same cache entry in the process-wide SNAPSHOT_CACHE. This
creates privacy and correctness issues as unrelated sessions can get deltas
computed from another session's prior snapshot. Remove the fallback to 'default'
in the sessionId field passed to applySnapshotDelta and ensure each session has
a uniquely identifiable sessionId rather than silently defaulting multiple
sessions to the same cache key value.
| }, | ||
| // iris_capabilities (live | fromDisk) + iris_contract_save. See contract-tools.ts. | ||
| ...CONTRACT_TOOLS, | ||
| ...DOMAIN_TOOLS, |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift
tools.ts exceeds the repository’s 500-line file limit.
This file now reaches at least Line 1299. Please split tool definitions into smaller modules (for example: snapshot/query/assert/network-console groups) and compose them in TOOLS to stay within the limit.
As per coding guidelines, **/*.{ts,tsx,js,jsx} requires that no file exceeds 500 lines.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/server/src/tools/tools.ts` at line 1299, The tools.ts file exceeds
the repository's 500-line file limit at 1299 lines. Refactor by extracting
related tool definition groups (such as snapshot tools, query tools, assert
tools, and network-console tools) into separate dedicated modules. Then import
these extracted tool groups back into tools.ts and compose them together in the
DOMAIN_TOOLS spread operator to maintain the same public API while keeping each
file within the 500-line limit.
Source: Coding guidelines
What this is
The 0.6.10 feature release — live-verified, not 1.0.0 (see
plan/V1-ROADMAP.mdfor the 1.0 gate). Bumps all 10publishable
@syrin/iris-*packages0.5.0 → 0.6.10.Highlights (full list in CHANGELOG
[0.6.10])Deterministic waiting (M3) — kills the #1 flake cause (sleeps)
{ kind: "settled", quietMs }predicate: passes once network + structural-DOM activity is quiet forquietMs. Ambientdom.text/animation churn (count-ups, spinners) is ignored so an animated page still settles.iris_act_and_waitwith nountilnow auto-settles — "act, then wait for quiet" in one call, the documented replacementfor a fixed sleep.
Self-healing (M5) — heal the locator, never the intent
iris_flow_heal apply:truere-replays the healed flow and re-asserts its success consequence; if a rebound locator resolvesbut the consequence no longer fires, the write is refused (
status:consequence_broken, file untouched).Token efficiency (M2) — every read tool self-budgets
iris_querygainslimit+count_only;iris_network/iris_consolegainlimit+ acost:{bytes,tokens}hint(most-recent-N, non-silent truncation).
Domain understanding (M4)
iris_domainnow reportsmustHoldper flow — the success consequence that must hold — so an agent can answer "what are thecritical flows and what must hold for each?" from the model alone.
Rigor
success-oracle thesis, now enforced at the live-assert layer too).
Reliability
Verification
new-features-test.mjsspec 7/7 positively exercises the new surfaces in a real browser — includingsettledresolvingagainst the demo's count-up animations (the exact ambient-churn condition the fix targets).
Summary by CodeRabbit
New Features
iris_act_and_waitauto-settles whenuntilis omittediris_querynow supportslimitandcount_onlypagination controlslimitparameteriris_domaintool for inspecting app's testable surfaceiris initcommand for guided project setup with Vite/Next/Claude/Cursor integrationBug Fixes
iris_flow_healre-verifies success consequences; refuses to persist if brokenDocumentation