syrin-labs · divshekhar · Jun 18, 2026 · Jun 17, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,41 @@ All notable changes to **`@syrin/iris`** are documented here. The format follows
 [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and the project follows
 [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.6.10] — 2026-06-18
+
+### Added
+
+- **Deterministic waiting — the `settled` predicate** (`packages/server`). A new predicate
+  `{ kind: "settled", quietMs }` passes once network + structural-DOM activity has been quiet for
+  `quietMs` (default 500ms); ambient `dom.text`/animation churn (count-ups, spinners) is ignored so
+  an animated page can still settle. Usable in `iris_wait_for` and `iris_assert`, and composable inside
+  `allOf` with the consequence you expect. Replaces fixed sleeps — the #1 cause of flaky agent tests.
+- **`iris_act_and_wait` auto-settle** (`packages/server`). Omit `until` and the tool waits for the page
+  to settle instead of requiring a predicate — "act, then wait for quiet" is now a single zero-config
+  call, the documented alternative to a sleep.
+- **`iris_query` token controls** (`packages/server`) — `limit` (cap returned descriptors; reports
+  `total` + `truncated` so a trim is never silent) and `count_only` (return just the match count).
+- **`iris_network` / `iris_console` token controls** (`packages/server`) — `limit` (keep the most
+  recent N matches, reporting `total` + `droppedOldest`) and a `cost:{bytes,tokens}` hint, matching the
+  other read tools so the agent can self-budget everywhere.
+- **`iris_domain` `mustHold` per flow** (`packages/server`) — each flow now reports the success
+  consequence that must hold for it (signal name / net URL), so an agent can answer "what are the
+  critical flows and what must hold for each?" from the domain model alone.
+
+### Changed
+
+- **Self-healing now verifies the consequence before persisting** (`packages/server`). `iris_flow_heal`
+  with `apply:true` re-replays the healed flow and re-asserts its success consequence; if a rebound
+  locator resolves but the flow no longer satisfies its intent, the write is **refused**
+  (`status:consequence_broken`, file untouched). It heals the locator, never the intent.
+
+### Fixed
+
+- **Browser observers fully restore patched globals on teardown** (`packages/browser`). The network,
+  route, and console observers stored a bound copy and assigned it back on teardown, so `window.fetch`
+  / `history.pushState` / `console.*` were never restored to their original identity. They now keep the
+  true original for restore and a bound copy only for invocation.
+
 ## [0.5.0] — 2026-06-15
 
 ### Added
@@ -25,24 +60,18 @@ All notable changes to **`@syrin/iris`** are documented here. The format follows
   dev-only HUD overlay that the agent can control: `iris_narrate` shows a caption, `iris_highlight`
   draws a ring around any element. The HUD is excluded from snapshots and tree-shaken in production.
 - **Unified `SKILL.md` at repo root** — a single skill file auto-detects mode: setup wizard on first
-  run (no `.iris.json`), live-app testing on every run after. Covers Claude Code, OpenCode, Codex CLI,
-  Cursor, Windsurf, VS Code, and Zed MCP config formats.
+  run (no `.iris.json`), live-app testing on every run after. Covers Claude Code, OpenCode, Codex CLI, Cursor, Windsurf, VS Code, and Zed MCP config formats.
 - **`.iris.json` project config** — written after first-run setup; persists `port`, `headed`,
   `framework`, and `harnesses` so subsequent runs need zero questions.
-- **`dev:iris` script** in `apps/demo` — second Vite dev server on port 4310, isolated from the user's
-  normal dev port.
+- **`dev:iris` script** in `apps/demo` — second Vite dev server on port 4310, isolated from the user's normal dev port.
 
 ### Fixed
 
 - **All-throttled session auto-selection** (`packages/server`). When every connected tab is hidden
-  (e.g. user is in VS Code with Chrome on another desktop), `SessionManager.resolve()` now picks the
-  session with the freshest heartbeat instead of throwing `"multiple sessions connected"`.
-- **Presenter HUD shows on bridge connect** — the overlay now mounts as soon as the SDK connects to the
-  bridge, not only after the first `iris_narrate` call.
-- **`iris_narrate` MCP schema validation** — relaxed the output schema so the tool no longer rejects
-  responses from narration calls.
-- **`iris_inspect` / `iris_clock` output schemas** — relaxed to pass through extra fields instead of
-  stripping them, fixing spurious validation errors.
+  (e.g. user is in VS Code with Chrome on another desktop), `SessionManager.resolve()` now picks the session with the freshest heartbeat instead of throwing `"multiple sessions connected"`.
+- **Presenter HUD shows on bridge connect** — the overlay now mounts as soon as the SDK connects to the bridge, not only after the first `iris_narrate` call.
+- **`iris_narrate` MCP schema validation** — relaxed the output schema so the tool no longer rejects responses from narration calls.
+- **`iris_inspect` / `iris_clock` output schemas** — relaxed to pass through extra fields instead of stripping them, fixing spurious validation errors.
 
 ---
 

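The `packages/browser` fix in the CHANGELOG hunk above hinges on a JavaScript detail: `Function.prototype.bind` returns a new function object, so assigning a bound copy back on teardown never restores the original identity. A minimal sketch of the keep-the-original pattern follows, assuming a hypothetical `patchMethod` helper and a stand-in `fakeWindow` object. Neither is part of `@syrin/iris`, and the real observers may be structured differently.

```javascript
// Hypothetical helper for illustration only; not the @syrin/iris implementation.
function patchMethod(target, name, onCall) {
  const original = target[name];        // true original, kept only for restore
  const bound = original.bind(target);  // bound copy, used only for invocation
  target[name] = function patched(...args) {
    onCall(args);                       // observe the call
    return bound(...args);              // forward to the original behavior
  };
  // Teardown assigns back the original, not the bound copy.
  return function restore() {
    target[name] = original;
  };
}

// Stand-in for a browser global (assumption: a plain object is enough to show identity).
const fakeWindow = {
  fetch(url) {
    return `fetched ${url}`;
  },
};

const originalFetch = fakeWindow.fetch;
const seen = [];
const restore = patchMethod(fakeWindow, 'fetch', (args) => seen.push(args[0]));

const result = fakeWindow.fetch('/api'); // goes through the patched wrapper
restore();

// After teardown the property is the very same function object as before patching.
const identityRestored = fakeWindow.fetch === originalFetch;
// A bound copy is always a distinct object, which is why restoring it breaks identity.
const boundIsDifferent = originalFetch.bind(fakeWindow) !== originalFetch;
```

Keeping two references costs nothing at runtime and lets other code that compares `window.fetch` against a saved reference keep working after teardown.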
diff --git a/apps/e2e/specs/new-features-test.mjs b/apps/e2e/specs/new-features-test.mjs
@@ -0,0 +1,91 @@
+// Live verification of the features added in the [Unreleased] CHANGELOG section, against the real
+// showcase dashboard (apps/demo :4310 + apps/api :8787). The existing battery proves no regression;
+// this spec positively exercises the NEW surfaces end-to-end in a real browser:
+//   - settled predicate + iris_act_and_wait auto-settle (incl. the ambient-animation fix: the demo's
+//     count-up counters emit dom.text every frame, which must NOT prevent settling)
+//   - iris_query limit / count_only token controls
+//   - iris_assert presence-only `advice` nudge
+import { chromium } from 'playwright';
+import {
+  start,
+  TOOLS,
+  BaselineStore,
+  RecordingStore,
+  FlowStore,
+  ProjectStore,
+  AnnotationStore,
+  createNodeFileSystem,
+} from '@syrin/iris-server';
+import os from 'node:os';
+import path from 'node:path';
+const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
+let pass = 0,
+  fail = 0;
+const chk = (l, o, d = '') => {
+  console.log(`   ${o ? '✅' : '❌'} ${l}${d ? '  — ' + d : ''}`);
+  o ? pass++ : fail++;
+};
+
+const irisRoot = path.join(os.tmpdir(), `iris-nf-${process.pid}`, '.iris');
+const fsp = createNodeFileSystem();
+const now = () => Date.now();
+const server = await start({ port: 4400, mcp: false });
+const deps = {
+  sessions: server.bridge.sessions,
+  baselines: new BaselineStore(),
+  recordings: new RecordingStore(),
+  flows: new FlowStore(fsp, irisRoot, { now }),
+  project: new ProjectStore(fsp, irisRoot, { now }),
+  annotations: new AnnotationStore(),
+  fs: fsp,
+  irisRoot,
+  now,
+};
+const T = (n, a = {}) => TOOLS.find((t) => t.name === n).handler(deps, { sessionId: 'demo', ...a });
+const refOf = async (by, value) => {
+  for (let i = 0; i < 40; i++) {
+    const r = (await T('iris_query', { by, value })).elements?.[0]?.ref;
+    if (r) return r;
+    await sleep(100);
+  }
+  return null;
+};
+
+const b = await chromium.launch({ headless: true });
+const p = await b.newPage();
+await p.goto('http://localhost:4310/?session=demo', { waitUntil: 'networkidle' });
+for (let i = 0; i < 200 && server.bridge.sessions.count() === 0; i++) await sleep(50);
+
+console.log('\n=== Iris × new features (:4310) ===');
+chk('dashboard SDK connected', server.bridge.sessions.count() > 0);
+
+// count_only — just the match count, no descriptors.
+const co = await T('iris_query', { by: 'role', value: 'button', count_only: true });
+chk('iris_query count_only returns a count, drops elements', typeof co.count === 'number' && co.count >= 1 && co.elements === undefined, `count=${co.count}`);
+
+// limit — cap descriptors; when more matched, total + truncated flag it.
+const lim = await T('iris_query', { by: 'role', value: 'button', limit: 1 });
+const moreThanOne = (co.count ?? 0) > 1;
+chk('iris_query limit caps descriptors (truncated when more)', (lim.elements?.length ?? 0) <= 1 && (!moreThanOne || (lim.truncated === true && lim.total === co.count)), `returned=${lim.elements?.length}, total=${lim.total ?? 'n/a'}`);
+
+// Auth (pre-filled) → dashboard with its count-up animations.
+await T('iris_act_and_wait', { ref: await refOf('testid', 'login-submit'), action: 'click', until: { kind: 'signal', name: 'auth:granted' }, timeout_ms: 5000 });
+chk('login → dashboard', (await refOf('testid', 'nav-deployments')) !== null);
+
+// settled wait — the dashboard's count-up counters emit dom.text every frame; settle must STILL
+// resolve (the ambient-animation fix). Pre-fix this would time out at 4s with pass:false.
+const settled = await T('iris_wait_for', { predicate: { kind: 'settled', quietMs: 300 }, timeout_ms: 4000 });
+chk('settled resolves despite count-up animation churn', settled.pass === true, JSON.stringify(settled.evidence ?? {}));
+
+// act_and_wait with NO `until` → auto-settle after a nav click; verdict carries settled evidence.
+const aw = await T('iris_act_and_wait', { ref: await refOf('testid', 'nav-deployments'), action: 'click' });
+chk('act_and_wait (no until) auto-settles', aw.verdict?.pass === true && aw.verdict?.evidence?.settled === true, JSON.stringify(aw.verdict?.evidence ?? {}));
+
+// presence-only advice — a PASSING element assertion is nudged toward a consequence.
+const adv = await T('iris_assert', { predicate: { kind: 'element', query: { testid: 'deploy-list' } } });
+chk('iris_assert presence-only attaches advice', adv.pass === true && typeof adv.advice === 'string' && adv.advice.includes('consequence'), adv.advice ? 'advice present' : 'no advice');
+
+console.log(`\n${fail === 0 ? '✅ NEW FEATURES VERIFIED' : '❌ FAILED'} (${pass} passed, ${fail} failed)`);
+await b.close();
+await server.close();
+process.exit(fail === 0 ? 0 : 1);
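The `settled` check in the spec above depends on ambient `dom.text` churn not resetting the quiet window. One way to picture that rule is sketched below. The `createQuietTracker` helper, the `AMBIENT_KINDS` set, and every event-kind string other than `dom.text` are assumptions made for illustration; this is not the Iris implementation. The clock is injected so the example runs deterministically without timers.

```javascript
// Hypothetical sketch; event kinds other than 'dom.text' are assumed names.
const AMBIENT_KINDS = new Set(['dom.text', 'animation']);

function createQuietTracker(quietMs = 500, now = () => Date.now()) {
  let lastActivity = now();
  return {
    // Record an observed event; ambient kinds never reset the quiet window.
    record(kind) {
      if (AMBIENT_KINDS.has(kind)) return;
      lastActivity = now();
    },
    // Settled once no non-ambient activity has occurred for at least quietMs.
    isSettled() {
      return now() - lastActivity >= quietMs;
    },
  };
}

// Deterministic walk-through with a fake clock.
let clock = 0;
const tracker = createQuietTracker(300, () => clock);

tracker.record('network');       // t=0: real activity, window starts here
clock = 250;
tracker.record('dom.text');      // t=250: count-up churn, ignored
clock = 300;
const settledDespiteChurn = tracker.isSettled(); // 300 - 0 >= 300

tracker.record('dom.structure'); // t=300: structural change resets the window
clock = 500;
const settledTooSoon = tracker.isSettled();      // 500 - 300 < 300
```

Under these assumptions `settledDespiteChurn` is `true` and `settledTooSoon` is `false`, which mirrors the behavior the spec asserts: animation text updates alone do not keep a page from settling, while structural or network activity does.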
diff --git a/docs/agent-cheatsheet.md b/docs/agent-cheatsheet.md
@@ -23,6 +23,20 @@ pointer sequence on the element (no coordinate gesture for the HUD to intercept)
 `occluded:true` when something covers the target, and stays synthetic even with CDP configured
 (use `args:{ native:true }` for a trusted native click).
 
+**Never sleep — wait deterministically.** Fixed sleeps are the #1 cause of flaky agent tests. Instead:
+
+- `iris_act_and_wait({ ref, action })` with **no `until`** waits for the page to _settle_ (network +
+  structural DOM idle; ambient count-up/spinner churn is ignored so an animated page still settles)
+  before returning — the one-call replacement for "click then sleep 500ms".
+- Need to wait without acting? `iris_wait_for({ predicate: { kind: "settled", quietMs } })`.
+- Waiting for a specific outcome? Pass that consequence as the predicate (`{ signal }` / `{ net }`),
+  or `allOf` it with `{ kind: "settled" }` to wait for both the event _and_ the page going quiet.
+
+**Assert a consequence, not just presence.** `{ signal }` / `{ net }` prove the feature actually did
+something; `{ element }` / `{ text }` only prove something is on screen — which a stale render or a
+locator healed to the wrong element can fake. A _passing_ presence-only `iris_assert` returns
+`advice` nudging you to a consequence; heed it on anything that matters.
+
 ## The 4-layer cross-check — never trust a green the state contradicts
 
 A claim is real only when the layers agree. Check more than the UI:
@@ -54,9 +68,9 @@ tree; a wrong `path` returns `{ found:false, availableKeys }` so it's self-corre
 
 Sessions/perception/verify — what you'll use 90% of the time:
 
-`iris_sessions` · `iris_snapshot` · `iris_query` · `iris_act` · `iris_act_and_wait` ·
-`iris_observe` · `iris_wait_for` · `iris_assert` · `iris_state` · `iris_diff` ·
-`iris_capabilities` · `iris_narrate` (show intent on-page) · `iris_project` (run-history, see below).
+`iris_sessions` · `iris_domain` (learn the app + gaps, read first) · `iris_snapshot` · `iris_query` ·
+`iris_act` · `iris_act_and_wait` · `iris_observe` · `iris_wait_for` · `iris_assert` · `iris_state` ·
+`iris_diff` · `iris_capabilities` · `iris_narrate` (show intent on-page) · `iris_project` (run-history).
 
 **Reach past core when…** you need to record/replay a journey (`iris_record_start/stop`,
 `iris_replay`), persist a self-healing golden flow (`iris_flow_save*` / `iris_flow_replay` /
@@ -89,15 +103,31 @@ Both need a **driven browser** (`iris drive <url>` / `IRIS_CDP_URL`); without on
 ## Start here
 
 1. `iris_sessions` — find the connected tab (omit `sessionId` if there's only one).
-2. `iris_capabilities` — learn the app's testable surface (`testids`, `signals`, `stores`, `flows`)
-   so you assert on facts without reading source. (`iris_sessions` flags `hasCapabilities`.)
+2. `iris_domain` — learn the app BEFORE testing: the saved flows, what each asserts, and the **gaps**
+   (declared signals/testids that no flow verifies — untested intent). Tells you what to test and
+   where the real risk is without crawling the whole app. Falls back to `iris_capabilities` for the
+   raw testable surface (`testids`, `signals`, `stores`, `flows`).
 3. Run the loop: **look → act → observe → assert**, cross-checking the 4 layers on anything that matters.
 
 ## Token note
 
 - **Keep the eyes cheap.** Prefer `iris_query` / scoped or `interactive` `iris_snapshot` /
   `iris_assert` over dumping the full tree. A full verify loop is ~100 tokens; see
   [token-efficiency.md](token-efficiency.md) (~73× leaner than full-tree snapshots).
+- **Re-look with `iris_snapshot({ diff:true })`** after an action — it returns only what changed
+  (`mode:delta`/`unchanged`), ~99% fewer tokens than a full re-snapshot and no stale tree to
+  mis-read. Every snapshot/query result carries `cost:{ bytes, tokens }` — re-scope before reading
+  if it's large.
+- **Cap broad reads.** `iris_query` takes `limit` (caps descriptors; reports `total`/`truncated`) and
+  `count_only` (just the match count). `iris_network` / `iris_console` take `limit` (most-recent-N,
+  reports `droppedOldest`) and carry the same `cost` hint — so a busy page or wide window never floods
+  your context unnoticed.
+- **A saved flow tells you if it's a real test.** `iris_flow_save` returns `assertions.grade`
+  (`asserted` / `presence-only` / `assertion-free`); if it's not `asserted`, add a consequence
+  (`iris_annotate` assert-signal/assert-net or a success-state) so it can't pass while broken. On
+  replay, an ambiguous heal (two testids tie) is surfaced, never auto-applied — and an `apply` heal
+  re-replays the rebound flow and **refuses to write** if the success consequence no longer fires
+  (`status:consequence_broken`): it heals the locator, never the intent.
 - **Predicate schema is not bloated.** The recursive predicate DSL used by `iris_assert` /
   `iris_wait_for` / `iris_act_and_wait` is **factored, not inlined**: when converted to the
   JSON Schema MCP sends, the predicate body is emitted **once** (~2.7k chars ≈ **~685 tokens**