feat: swappable guard backend (vLLM today, realtime hook) by melmallow · Pull Request #7 · cloudastructure/rioc

melmallow · 2026-07-02T18:03:37Z

Summary

Introduces a session-shaped GuardBackend interface so ConversationManager's calls to the guard model can swap between backends via a GUARD_BACKEND env var without touching conversation code.

Today's path: GUARD_BACKEND=vllm (default) → VllmBackend — thin wrapper over minicpmo_client.chat() with a ring buffer. Behavior identical to the previous commit on the default path.
Future path: GUARD_BACKEND=realtime → raises NotImplementedError with a message pointing at the follow-up plan. Implementing the realtime backend needs OpenBMB's realtime WebSocket server running on the MiniCPM-o box — server-side change tracked separately.
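
A minimal sketch of how the session-shaped interface and the env-driven picker could fit together. Only `GuardBackend`, `GuardEncounter`, `VllmBackend`, `open_encounter()`, `turn`, `get_backend()` and `GUARD_BACKEND` come from this PR; the `close()` method, the `turn` parameters, the `env` argument, and the `ValueError` for unknown values are assumptions made for illustration, not the branch's actual code.

```python
import os
from typing import Mapping, Optional, Protocol


class GuardEncounter(Protocol):
    """One conversation's session with the guard model."""

    # Parameter names here are assumed; the PR only names the method.
    async def turn(self, frame: bytes, history: list) -> str: ...

    # close() is a hypothetical name for "closes on end".
    async def close(self) -> None: ...


class GuardBackend(Protocol):
    async def open_encounter(self) -> GuardEncounter: ...


class VllmBackend:
    """Stand-in for the real VllmBackend so the sketch is self-contained."""

    async def open_encounter(self) -> GuardEncounter:
        raise NotImplementedError("stand-in only")


def get_backend(env: Optional[Mapping[str, str]] = None) -> GuardBackend:
    """Pick a backend from GUARD_BACKEND, defaulting to vllm."""
    env = os.environ if env is None else env
    name = env.get("GUARD_BACKEND", "vllm")
    if name == "vllm":
        return VllmBackend()
    if name == "realtime":
        # Mirrors the PR: the realtime path is a hook, not an implementation.
        raise NotImplementedError(
            "GUARD_BACKEND=realtime is not implemented yet; see the follow-up plan"
        )
    # Assumed behaviour for unrecognised values.
    raise ValueError(f"unknown GUARD_BACKEND value: {name!r}")
```

Passing the environment as a mapping is only a convenience for testing the picker without mutating `os.environ`.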

What's inside

New: guard_backend.py (Protocol), vllm_backend.py (VllmBackend + ring buffer), backend_factory.py (env-driven picker)
Refactor: ConversationManager takes a backend: GuardBackend keyword-only arg; opens an encounter at conversation start, closes on end (with graceful failure handling if open_encounter() raises).
Wire: main.py lifespan calls get_backend() inside the existing try: block so a bad GUARD_BACKEND value degrades gracefully (log + disable conv manager, service still starts).
Docs: DEPLOY_AWS.md gains a Backend Selection section.
Test scaffold: pytest + pytest-asyncio (repo had none). 19 tests total.
Plan: docs/superpowers/plans/2026-07-02-swappable-guard-backend.md.
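
For readers unfamiliar with the ring buffer mentioned above, here is a rough sketch of the idea using `collections.deque`. The class name `FrameRing` and its methods are hypothetical and not taken from `vllm_backend.py`; only the `VLLM_FRAME_BUFFER_SIZE` setting and its default of 1 come from this PR.

```python
from collections import deque
from typing import List, Optional


class FrameRing:
    """Keep only the most recent `size` frames; older ones fall off the front."""

    def __init__(self, size: int = 1) -> None:
        if size < 1:
            raise ValueError("size must be at least 1")
        # deque(maxlen=...) discards the oldest item when a new one is appended.
        self._frames: deque = deque(maxlen=size)

    def push(self, frame: bytes) -> None:
        self._frames.append(frame)

    def newest(self) -> Optional[bytes]:
        return self._frames[-1] if self._frames else None

    def snapshot(self) -> List[bytes]:
        """Frames in arrival order, oldest first."""
        return list(self._frames)
```

With a size of 1 the buffer always holds just the newest frame, which matches the single-frame call shape described for the default path.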

Test plan

pytest tests/ -v — 19/19 passing locally on Python 3.14 + pytest 8.4.2. Warnings are all pre-existing pytest-asyncio + Python 3.14 deprecation noise (not this branch's).
GUARD_BACKEND unset → VllmBackend() selected; default VLLM_FRAME_BUFFER_SIZE=1 sends only the newest frame (matches previous single-frame minicpmo_chat call shape).
Regression test for encounter-lifecycle guard: a backend whose open_encounter() raises returns the manager to IDLE, resets _active, and starts the cooldown timer — no wedge.
Runtime smoke on the real service: start uvicorn main:app with GUARD_BACKEND unset and confirm conversation flow is unchanged. (Left to the reviewer since it needs the hardware/CVR loop.)
Runtime smoke on GUARD_BACKEND=realtime — expected: startup logs NotImplementedError, _conv_manager = None, service still serves other routes. (Left to the reviewer.)
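
The graceful-degradation behaviour expected from the realtime smoke test can be sketched without FastAPI. The helper name `build_conv_manager` and its two callable parameters are hypothetical; in the PR this logic lives inside the `main.py` lifespan and assigns `_conv_manager`.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("guard")


def build_conv_manager(
    get_backend: Callable[[], Any],
    make_manager: Callable[..., Any],
) -> Optional[Any]:
    """Return a conversation manager, or None if the backend cannot be built."""
    try:
        backend = get_backend()
        # backend is passed by keyword, matching the keyword-only arg in the PR.
        return make_manager(backend=backend)
    except Exception as exc:
        # Covers NotImplementedError (realtime) and bad GUARD_BACKEND values.
        logger.error("guard backend unavailable, conversation disabled: %r", exc)
        return None
```

Returning `None` instead of re-raising is what lets the service keep serving its other routes while conversations are disabled.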

Follow-ups (out of scope)

RealtimeBackend implementation. WebSocket client for OpenBMB's realtime MiniCPM-o server. Needs the server standing up first.
Frame burst experiment. Bump VLLM_FRAME_BUFFER_SIZE and extend minicpmo_client.chat() to accept multiple frames per request.
Pre-existing bug (not caused by this branch, worth a separate issue): minicpmo_client.chat() does NOT accept a user_text kwarg, but conversation_manager._do_turn (and now VllmBackend) pass it. Every conversation turn today would raise TypeError at runtime. Introduced in c1a2234 (feat: stable two-way guard conversation). Our tests didn't catch it because _FakeChat uses **kwargs. Recommend either dropping user_text from the call, adding it to minicpmo_client.chat, or folding it into the system prompt.

🤖 Generated with Claude Code

…g polish

Address final-review Important findings:

- Move open_encounter() inside try/finally so _end_conversation cleanup always runs (previously a failing open would wedge _active/cooldown/DB).
- Document history as read-only in GuardEncounter.turn.
- Correct guard_backend.py docstring reference to realtime_backend as planned.

Adds a regression test for the lifecycle guard using a FailingBackend that raises from open_encounter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
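
The lifecycle guard described in this commit can be illustrated with a stripped-down manager. `MiniManager`, its attribute names, and the `close()` call are hypothetical simplifications; only `open_encounter()`, `_end_conversation`, the IDLE state, and the `FailingBackend` idea come from the commit message.

```python
import asyncio


class FailingBackend:
    """Backend whose open_encounter() always raises, as in the regression test."""

    async def open_encounter(self):
        raise RuntimeError("backend down")


class MiniManager:
    def __init__(self, backend) -> None:
        self.backend = backend
        self.state = "IDLE"
        self.active = False
        self.cooldown_started = False

    async def run_conversation(self) -> None:
        self.active = True
        self.state = "TALKING"
        encounter = None
        try:
            # The open happens inside the try so a failure still reaches finally.
            encounter = await self.backend.open_encounter()
            # ... conversation turns would run here ...
        except Exception:
            pass  # the real manager would log the failure
        finally:
            if encounter is not None:
                await encounter.close()  # close() is an assumed method name
            self._end_conversation()

    def _end_conversation(self) -> None:
        self.state = "IDLE"
        self.active = False
        self.cooldown_started = True
```

Running `asyncio.run(MiniManager(FailingBackend()).run_conversation())` leaves the manager idle, inactive, and in cooldown rather than wedged.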

Reference doc for this branch — records the interface design, task decomposition, and follow-up scope (RealtimeBackend, frame burst, CVR continuous-video wiring). PR description links here.

c1a2234 added user_text=_TURN_ROLE_REMINDER to _do_turn's minicpmo_chat call, but the matching parameter never landed on chat(). Every guard reply turn (turn 2+, after the person responds) has been raising TypeError since, silently caught by ConversationManager's try/except and killing the conversation with '[ConvMgr] MiniCPM-o call failed'. Turn 1 (webhook path) went through initial_text, which masked the failure.

- Add user_text: str | None = None param.
- Prepend the text as {"type": "text", ...} before the image on the user turn so the model has role context when it sees a bare frame.
- Add regression test tests/test_backend_chat_compat.py that pins the signature against VllmBackend's forwarded kwarg list, the class of bug that _FakeChat (**kwargs) failed to catch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
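
One way such a signature pin could work is with `inspect.signature`. This is a sketch, not the contents of tests/test_backend_chat_compat.py: the helper `undeclared_kwargs`, the two `chat_*` stand-ins, and the parameter names other than `user_text` are hypothetical.

```python
import inspect
from typing import Callable, List, Optional


def undeclared_kwargs(func: Callable, names: List[str]) -> List[str]:
    """Return the names that func does not declare as keyword-passable parameters.

    A **kwargs catch-all is deliberately not counted, so a permissive fake
    cannot hide a missing parameter on the real function.
    """
    keywordable = {
        name
        for name, p in inspect.signature(func).parameters.items()
        if p.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        )
    }
    return [n for n in names if n not in keywordable]


# Hypothetical stand-ins for chat() before and after the fix.
def chat_before(frame, system_prompt=None):
    return "reply"


def chat_after(frame, system_prompt=None, user_text: Optional[str] = None):
    return "reply"


def fake_chat(**kwargs):
    """A fake like this accepts anything, which is how the bug slipped past."""
    return "ok"


# Assumed list of kwargs the backend forwards; only user_text is from the PR.
FORWARDED = ["frame", "system_prompt", "user_text"]
```

A test can then assert that `undeclared_kwargs(chat, FORWARDED)` is empty for the real function, which fails as soon as the caller forwards a keyword the callee does not declare.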

Melissa Hargis and others added 9 commits July 2, 2026 12:37

test: add pytest scaffold with asyncio auto mode

15d7004

feat: define GuardBackend / GuardEncounter Protocol

686910b

feat: add VllmBackend wrapping minicpmo_client.chat

ddad54a

refactor: route ConversationManager through GuardBackend interface

84b7c08

feat: env-driven guard backend selection (GUARD_BACKEND)

e36eedf

docs: describe GUARD_BACKEND selection and VLLM_FRAME_BUFFER_SIZE

c39084f

docs: add implementation plan for swappable guard backend

d95a566

melmallow wants to merge 9 commits into main from feat/swappable-guard-backend
