Skip to content

feat(demo): Project auto-escalation wiring, live-progress fixes, vision-click hardening#210

Merged
AVADSA25 merged 1 commit into
mainfrom
feat/demo-hardening
Jul 2, 2026
Merged

feat(demo): Project auto-escalation wiring, live-progress fixes, vision-click hardening#210
AVADSA25 merged 1 commit into
mainfrom
feat/demo-hardening

Conversation

@AVADSA25

@AVADSA25 AVADSA25 commented Jul 2, 2026

Copy link
Copy Markdown
Owner

What

Three demo-hardening fixes from the 2026-07-02 dry-run:

1. Project auto-escalation (Step 10 Q11) — implemented but never called; now wired.
Post-reply (never adds latency to the answer), gated by a cheap regex prefilter so the Qwen classifier only runs on task-shaped messages. UI shows a suggestion chip; "Start as Project" flips to Project mode with the text pre-filled (user still presses send), "No thanks" hits the new POST /api/chat/escalate_silence (Q11 session silence). Audit: agent_auto_escalated_from_chat.

2. Agent live progress actually shows.
_startAgentPoller (approve-path, correct) now also auto-resumes after page reload while the agent is active. And schedulePollAgentTimeline filtered for message types the runner never posts (agent_reply/agent_status) — added the real vocabulary (agent_update/agent_blocked/agent_question/agent_done/agent_aborted).

3. Vision-click (demo step 7) screenshot hardening.
Stress test showed UI-TARS coordinates stable within 2px across runs — the flakiness is screencapture blowing the 5s cap under load (2/6 runs). Now 10s + one retry. Manifest regenerated for the skill edit.

Tests

4 new escalation tests; full suite 2,440 passed.

🤖 Generated with Claude Code

…ling, harden vision-click screenshots

Demo-hardening from the 2026-07-02 dry-run:

- Step 10 Q11 finally wired: after a task-shaped chat reply (regex
  prefilter: >=60 chars + action verb, so casual messages never pay the
  classifier cost), _should_escalate_to_project runs post-reply and the
  stream emits an escalate_project frame; the UI renders a suggestion
  chip ("Start as Project" pre-fills Project mode — user still hits
  send; "No thanks" → new POST /api/chat/escalate_silence, Q11
  session silence). Emits agent_auto_escalated_from_chat audit event.

- Agent live progress: _startAgentPoller now auto-resumes after a page
  reload while the bound agent is active (piggybacked on the existing
  lifecycle tick, idempotent); schedulePollAgentTimeline's type filter
  included only legacy types (agent_reply/agent_status) — the runner
  posts agent_update/agent_blocked/agent_question/agent_done/_aborted,
  so live updates were filtered out on the reply path.

- mouse_control: screenshot capture 5s→10s timeout + one retry. Stress
  test (6 runs): UI-TARS coords stable within 2px, but 2/6 runs died on
  screencapture timeout — the screenshot, not the model, is the flaky
  part. Manifest regenerated.

Tests: 4 new (escalation prefilter/verdict/never-raises). Full suite
2440 passed (4 pre-existing pilot-e2e failures, known-issues).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@AVADSA25 AVADSA25 merged commit 8713d38 into main Jul 2, 2026
1 check passed
@AVADSA25 AVADSA25 deleted the feat/demo-hardening branch July 2, 2026 14:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants