feat(demo): Project auto-escalation wiring, live-progress fixes, vision-click hardening by AVADSA25 · Pull Request #210 · AVADSA25/codec

AVADSA25 · 2026-07-02T14:21:15Z

What

Three demo-hardening fixes from the 2026-07-02 dry-run:

1. Project auto-escalation (Step 10 Q11) — implemented but never called; now wired.
Post-reply (never adds latency to the answer), gated by a cheap regex prefilter so the Qwen classifier only runs on task-shaped messages. UI shows a suggestion chip; "Start as Project" flips to Project mode with the text pre-filled (user still presses send), "No thanks" hits the new POST /api/chat/escalate_silence (Q11 session silence). Audit: agent_auto_escalated_from_chat.

2. Agent live progress actually shows.
_startAgentPoller (approve-path, correct) now also auto-resumes after page reload while the agent is active. And schedulePollAgentTimeline filtered for message types the runner never posts (agent_reply/agent_status) — added the real vocabulary (agent_update/agent_blocked/agent_question/agent_done/agent_aborted).

3. Vision-click (demo step 7) screenshot hardening.
Stress test showed UI-TARS coordinates stable within 2px across runs — the flakiness is screencapture blowing the 5s cap under load (2/6 runs). Now 10s + one retry. Manifest regenerated for the skill edit.

Tests

4 new escalation tests; full suite 2,440 passed.

🤖 Generated with Claude Code

…ling, harden vision-click screenshots Demo-hardening from the 2026-07-02 dry-run: - Step 10 Q11 finally wired: after a task-shaped chat reply (regex prefilter: >=60 chars + action verb, so casual messages never pay the classifier cost), _should_escalate_to_project runs post-reply and the stream emits an escalate_project frame; the UI renders a suggestion chip ("Start as Project" pre-fills Project mode — user still hits send; "No thanks" → new POST /api/chat/escalate_silence, Q11 session silence). Emits agent_auto_escalated_from_chat audit event. - Agent live progress: _startAgentPoller now auto-resumes after a page reload while the bound agent is active (piggybacked on the existing lifecycle tick, idempotent); schedulePollAgentTimeline's type filter included only legacy types (agent_reply/agent_status) — the runner posts agent_update/agent_blocked/agent_question/agent_done/_aborted, so live updates were filtered out on the reply path. - mouse_control: screenshot capture 5s→10s timeout + one retry. Stress test (6 runs): UI-TARS coords stable within 2px, but 2/6 runs died on screencapture timeout — the screenshot, not the model, is the flaky part. Manifest regenerated. Tests: 4 new (escalation prefilter/verdict/never-raises). Full suite 2440 passed (4 pre-existing pilot-e2e failures, known-issues). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

AVADSA25 merged commit 8713d38 into main Jul 2, 2026
1 check passed

AVADSA25 deleted the feat/demo-hardening branch July 2, 2026 14:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(demo): Project auto-escalation wiring, live-progress fixes, vision-click hardening#210

feat(demo): Project auto-escalation wiring, live-progress fixes, vision-click hardening#210
AVADSA25 merged 1 commit into
mainfrom
feat/demo-hardening

AVADSA25 commented Jul 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AVADSA25 commented Jul 2, 2026

What

Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants