feat: swappable guard backend (vLLM today, realtime hook) #7

Open: melmallow wants to merge 9 commits into main from feat/swappable-guard-backend

@melmallow

Summary

Introduces a session-shaped GuardBackend interface so ConversationManager's calls to the guard model can swap between backends via a GUARD_BACKEND env var without touching conversation code.

  • Today's path: GUARD_BACKEND=vllm (default) → VllmBackend — thin wrapper over minicpmo_client.chat() with a ring buffer. Behavior identical to the previous commit on the default path.
  • Future path: GUARD_BACKEND=realtime → raises NotImplementedError with a message pointing at the follow-up plan. Implementing the realtime backend needs OpenBMB's realtime WebSocket server running on the MiniCPM-o box — server-side change tracked separately.
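As a rough illustration of the shape described above, here is a minimal sketch of a session-style interface and an env-driven picker. Only the names `GuardBackend`, `GuardEncounter.turn`, `open_encounter`, `VllmBackend`, `get_backend`, and `GUARD_BACKEND` come from this PR. The method signatures, the `close()` method, the use of `async`, and the stub body of `VllmBackend` are assumptions made for the example and may not match the branch.

```python
from __future__ import annotations

import os
from typing import Protocol


class GuardEncounter(Protocol):
    """One conversation's session with the guard model (assumed shape)."""

    async def turn(self, frame: bytes, history: list[dict]) -> str:
        """Return the guard's reply; `history` is treated as read-only."""
        ...

    async def close(self) -> None: ...


class GuardBackend(Protocol):
    async def open_encounter(self) -> GuardEncounter: ...


class VllmBackend:
    """Stub standing in for the real class so the picker has something to return."""

    async def open_encounter(self) -> GuardEncounter:
        raise NotImplementedError("illustration only")


def get_backend() -> GuardBackend:
    """Pick a backend from GUARD_BACKEND, defaulting to vllm when unset."""
    name = os.environ.get("GUARD_BACKEND", "vllm").strip().lower()
    if name == "vllm":
        return VllmBackend()
    if name == "realtime":
        raise NotImplementedError(
            "realtime backend is not implemented yet; see the follow-up plan"
        )
    raise ValueError(f"unknown GUARD_BACKEND value: {name!r}")
```

Because the picker raises on `realtime` and on unknown values, the caller decides how to degrade (in this PR, the lifespan hook logs and disables the conversation manager).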

What's inside

  • New: guard_backend.py (Protocol), vllm_backend.py (VllmBackend + ring buffer), backend_factory.py (env-driven picker)
  • Refactor: ConversationManager takes a backend: GuardBackend keyword-only arg; opens an encounter at conversation start, closes on end (with graceful failure handling if open_encounter() raises).
  • Wire: main.py lifespan calls get_backend() inside the existing try: block so a bad GUARD_BACKEND value degrades gracefully (log + disable conv manager, service still starts).
  • Docs: DEPLOY_AWS.md gains a Backend Selection section.
  • Test scaffold: pytest + pytest-asyncio (repo had none). 19 tests total.
  • Plan: docs/superpowers/plans/2026-07-02-swappable-guard-backend.md.
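For readers unfamiliar with the ring-buffer part of `VllmBackend`, a bounded frame buffer can be sketched with `collections.deque`. The class name `FrameRing` and its methods are hypothetical; only the `VLLM_FRAME_BUFFER_SIZE` variable and its default of 1 come from this PR.

```python
from __future__ import annotations

import os
from collections import deque
from typing import Optional


class FrameRing:
    """Keeps only the most recent N frames (hypothetical helper)."""

    def __init__(self, size: Optional[int] = None) -> None:
        if size is None:
            # Default of 1 mirrors the single-frame behaviour described in this PR.
            size = int(os.environ.get("VLLM_FRAME_BUFFER_SIZE", "1"))
        if size < 1:
            raise ValueError("buffer size must be at least 1")
        self._frames: deque[bytes] = deque(maxlen=size)

    def push(self, frame: bytes) -> None:
        # With maxlen set, appending to a full deque drops the oldest frame.
        self._frames.append(frame)

    def newest(self) -> Optional[bytes]:
        return self._frames[-1] if self._frames else None

    def snapshot(self) -> list[bytes]:
        """Oldest-to-newest copy of the buffered frames."""
        return list(self._frames)
```

With a size of 1 the buffer degenerates to "latest frame only", which is why the default path can keep the previous single-frame call shape.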

Test plan

  • pytest tests/ -v — 19/19 passing locally on Python 3.14 + pytest 8.4.2. Warnings are all pre-existing pytest-asyncio + Python 3.14 deprecation noise (not this branch's).
  • GUARD_BACKEND unset → VllmBackend() selected; default VLLM_FRAME_BUFFER_SIZE=1 sends only the newest frame (matches previous single-frame minicpmo_chat call shape).
  • Regression test for encounter-lifecycle guard: a backend whose open_encounter() raises returns the manager to IDLE, resets _active, and starts the cooldown timer — no wedge.
  • Runtime smoke on the real service: start uvicorn main:app with GUARD_BACKEND unset and confirm conversation flow is unchanged. (Left to the reviewer since it needs the hardware/CVR loop.)
  • Runtime smoke on GUARD_BACKEND=realtime — expected: startup logs NotImplementedError, _conv_manager = None, service still serves other routes. (Left to the reviewer.)
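The encounter-lifecycle regression check above can be pictured with a toy manager. This is a sketch under assumptions, not the branch's test: `MiniManager`, its `state` and `_cooldown_started_at` fields, and `run_conversation` are hypothetical stand-ins for `ConversationManager`, and only `open_encounter`, `_active`, `_end_conversation`, and the idea of a failing backend come from this PR.

```python
import asyncio
import time


class FailingBackend:
    """Backend whose open_encounter() always raises."""

    async def open_encounter(self):
        raise RuntimeError("guard model unreachable")


class MiniManager:
    """Toy stand-in for ConversationManager (hypothetical fields)."""

    def __init__(self, *, backend) -> None:
        self._backend = backend
        self.state = "IDLE"
        self._active = False
        self._cooldown_started_at = None

    async def run_conversation(self) -> None:
        self.state = "ACTIVE"
        self._active = True
        encounter = None
        try:
            # Opening inside the try block means cleanup runs even if this raises.
            encounter = await self._backend.open_encounter()
            # Conversation turns would happen here.
        except Exception as exc:
            print(f"[ConvMgr] encounter failed: {exc}")
        finally:
            await self._end_conversation(encounter)

    async def _end_conversation(self, encounter) -> None:
        if encounter is not None:
            await encounter.close()
        self.state = "IDLE"
        self._active = False
        self._cooldown_started_at = time.monotonic()


def run_failed_open() -> MiniManager:
    """Drive one conversation against the failing backend and return the manager."""
    manager = MiniManager(backend=FailingBackend())
    asyncio.run(manager.run_conversation())
    return manager
```

After `run_failed_open()`, the manager should be back in `IDLE` with `_active` cleared and a cooldown timestamp set, which is the "no wedge" property the regression test pins.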

Follow-ups (out of scope)

  • RealtimeBackend implementation. WebSocket client for OpenBMB's realtime MiniCPM-o server. Needs the server standing up first.
  • Frame burst experiment. Bump VLLM_FRAME_BUFFER_SIZE and extend minicpmo_client.chat() to accept multiple frames per request.
  • Pre-existing bug (not caused by this branch, worth a separate issue): minicpmo_client.chat() does NOT accept a user_text kwarg, but conversation_manager._do_turn (and now VllmBackend) pass it. Every conversation turn today would raise TypeError at runtime. Introduced in c1a2234 (feat: stable two-way guard conversation). Our tests didn't catch it because _FakeChat uses **kwargs. Recommend either dropping user_text from the call, adding it to minicpmo_client.chat, or folding it into the system prompt.
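The gap described in the last bullet (a `**kwargs` fake accepting anything) can be closed by comparing forwarded keyword names against the real function's declared parameters. The sketch below uses `inspect.signature` from the standard library; `chat_stub`, `fake_chat`, the `FORWARDED_KWARGS` tuple, and `undeclared_kwargs` are hypothetical names for illustration, and the parameter list of the stub is an assumption rather than the real `minicpmo_client.chat()` signature.

```python
import inspect


def chat_stub(frame, *, system_prompt=None, history=None):
    """Stand-in for a chat() that lacks a user_text parameter (assumed shape)."""
    return "reply"


def fake_chat(**kwargs):
    """A permissive test double: any keyword is accepted, so mismatches stay hidden."""
    return "reply"


# Hypothetical list of keyword names a backend forwards to chat().
FORWARDED_KWARGS = ("system_prompt", "history", "user_text")


def undeclared_kwargs(func, names):
    """Return the names that `func` does not declare as explicit keyword parameters."""
    params = inspect.signature(func).parameters
    declared = {
        name
        for name, param in params.items()
        if param.kind
        in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return [name for name in names if name not in declared]
```

Calling `fake_chat(user_text="hi")` succeeds, while `undeclared_kwargs(chat_stub, FORWARDED_KWARGS)` reports `["user_text"]`. The check deliberately ignores `**kwargs`, so it has to be run against the real function rather than a permissive fake.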

🤖 Generated with Claude Code

Melissa Hargis and others added 9 commits July 2, 2026 12:37
…g polish

Address final-review Important findings:
- Move open_encounter() inside try/finally so _end_conversation cleanup
  always runs (previously a failing open would wedge _active/cooldown/DB).
- Document history as read-only in GuardEncounter.turn.
- Correct guard_backend.py docstring reference to realtime_backend as planned.

Adds a regression test for the lifecycle guard using a FailingBackend
that raises from open_encounter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Reference doc for this branch — records the interface design, task
decomposition, and follow-up scope (RealtimeBackend, frame burst, CVR
continuous-video wiring). PR description links here.

c1a2234 added user_text=_TURN_ROLE_REMINDER to _do_turn's minicpmo_chat
call but the matching parameter never landed on chat(). Every guard reply
turn (turn 2+, after the person responds) has been raising TypeError since,
silently caught by ConversationManager's try/except and killing the
conversation with '[ConvMgr] MiniCPM-o call failed'. Turn 1 (webhook path)
went through initial_text, which masked the failure.

- Add user_text: str | None = None param.
- Prepend the text as {"type": "text", ...} before the image on the user
  turn so the model has role context when it sees a bare frame.
- Add regression test tests/test_backend_chat_compat.py that pins the
  signature against VllmBackend's forwarded kwarg list — the class of bug
  _FakeChat (**kwargs) failed to catch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>