feat: swappable guard backend (vLLM today, realtime hook) #7
Open
melmallow wants to merge 9 commits into
…g polish

Address final-review Important findings:
- Move open_encounter() inside try/finally so _end_conversation cleanup always runs (previously a failing open would wedge _active/cooldown/DB).
- Document history as read-only in GuardEncounter.turn.
- Correct guard_backend.py docstring reference to realtime_backend as planned.

Adds a regression test for the lifecycle guard using a FailingBackend that raises from open_encounter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
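The lifecycle guard in this commit can be pictured with a minimal sketch. Everything here is a hypothetical stand-in rather than the branch's code: `MiniManager`, its attribute names, and the synchronous call style are assumptions chosen to keep the example short. The point it illustrates is that opening the encounter inside the `try` block lets the `finally` cleanup run even when the open itself fails.

```python
class FailingBackend:
    """Hypothetical backend whose open_encounter() always raises."""

    def open_encounter(self):
        raise RuntimeError("backend unavailable")


class MiniManager:
    """Toy stand-in for a conversation manager (names are assumptions)."""

    def __init__(self, backend):
        self.backend = backend
        self.active = False
        self.state = "IDLE"
        self.cooldown_started = False

    def run_conversation(self):
        self.active = True
        self.state = "TALKING"
        encounter = None
        try:
            # The open happens inside the try block, so a failure here
            # still reaches the finally clause below.
            encounter = self.backend.open_encounter()
            return encounter.turn("hello")
        except Exception as exc:
            print(f"[MiniManager] conversation failed: {exc}")
            return None
        finally:
            self._end_conversation(encounter)

    def _end_conversation(self, encounter):
        # Close the encounter only if it was actually opened.
        if encounter is not None:
            encounter.close()
        self.active = False
        self.state = "IDLE"
        self.cooldown_started = True


manager = MiniManager(FailingBackend())
result = manager.run_conversation()
```

If the `open_encounter()` call sat before the `try`, the exception would skip `_end_conversation` and leave `active` set, which is the wedge the commit message describes.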
Reference doc for this branch — records the interface design, task decomposition, and follow-up scope (RealtimeBackend, frame burst, CVR continuous-video wiring). PR description links here.
c1a2234 added user_text=_TURN_ROLE_REMINDER to _do_turn's minicpmo_chat call, but the matching parameter never landed on chat(). Every guard reply turn (turn 2+, after the person responds) has been raising TypeError since, silently caught by ConversationManager's try/except and killing the conversation with '[ConvMgr] MiniCPM-o call failed'. Turn 1 (webhook path) went through initial_text, which masked the failure.

- Add user_text: str | None = None param.
- Prepend the text as {"type": "text", ...} before the image on the user turn so the model has role context when it sees a bare frame.
- Add regression test tests/test_backend_chat_compat.py that pins the signature against VllmBackend's forwarded kwarg list. This is the class of bug that _FakeChat (**kwargs) failed to catch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
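A signature-pinning check of the kind this commit describes could look roughly like the sketch below. It is not the contents of tests/test_backend_chat_compat.py: the `chat`, `old_chat`, and `fake_chat` functions, their parameter lists, and the `FORWARDED_KWARGS` tuple are all hypothetical stand-ins. The only library call relied on is the standard `inspect.signature`.

```python
import inspect
from typing import Optional

# Hypothetical list of keyword arguments a backend forwards to chat().
FORWARDED_KWARGS = ("history", "system_prompt", "user_text")


def chat(image_b64: str, history: list, system_prompt: str = "",
         user_text: Optional[str] = None) -> str:
    """Stand-in for a chat() that already has the user_text parameter."""
    return "ok"


def old_chat(image_b64: str, history: list, system_prompt: str = "") -> str:
    """Stand-in for chat() before the parameter was added."""
    return "ok"


def fake_chat(**kwargs) -> str:
    """A test double like this accepts anything, so it hides the mismatch."""
    return "ok"


def missing_kwargs(func, names):
    """Return the names that func does not declare as explicit keyword-capable parameters."""
    params = inspect.signature(func).parameters
    explicit = {
        name
        for name, param in params.items()
        if param.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                          inspect.Parameter.KEYWORD_ONLY)
    }
    # A **kwargs catch-all is deliberately not counted as accepting a name.
    return [name for name in names if name not in explicit]
```

Checking the real function's declared parameters, instead of calling a permissive fake, is what lets a test fail as soon as a caller forwards a keyword the callee does not declare.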
Summary
Introduces a session-shaped `GuardBackend` interface so `ConversationManager`'s calls to the guard model can swap between backends via a `GUARD_BACKEND` env var without touching conversation code.

- `GUARD_BACKEND=vllm` (default) → `VllmBackend`: a thin wrapper over `minicpmo_client.chat()` with a ring buffer. Behavior is identical to the previous commit on the default path.
- `GUARD_BACKEND=realtime` → raises `NotImplementedError` with a message pointing at the follow-up plan. Implementing the realtime backend needs OpenBMB's realtime WebSocket server running on the MiniCPM-o box; that server-side change is tracked separately.

What's inside
- `guard_backend.py` (Protocol), `vllm_backend.py` (`VllmBackend` + ring buffer), `backend_factory.py` (env-driven picker).
- `ConversationManager` takes a `backend: GuardBackend` keyword-only arg; it opens an encounter at conversation start and closes it on end (with graceful failure handling if `open_encounter()` raises).
- `main.py` lifespan calls `get_backend()` inside the existing `try:` block so a bad `GUARD_BACKEND` value degrades gracefully (log + disable conv manager, service still starts).
- `DEPLOY_AWS.md` gains a Backend Selection section.
- `docs/superpowers/plans/2026-07-02-swappable-guard-backend.md`: reference doc for this branch (interface design, task decomposition, follow-up scope).

Test plan
- `pytest tests/ -v`: 19/19 passing locally on Python 3.14 + pytest 8.4.2. All warnings are pre-existing pytest-asyncio + Python 3.14 deprecation noise (not from this branch).
- `GUARD_BACKEND` unset → `VllmBackend()` selected; the default `VLLM_FRAME_BUFFER_SIZE=1` sends only the newest frame (matches the previous single-frame `minicpmo_chat` call shape).
- A backend whose `open_encounter()` raises returns the manager to IDLE, resets `_active`, and starts the cooldown timer, so there is no wedge.
- Run `uvicorn main:app` with `GUARD_BACKEND` unset and confirm conversation flow is unchanged. (Left to the reviewer since it needs the hardware/CVR loop.)
- `GUARD_BACKEND=realtime`, expected: startup logs `NotImplementedError`, `_conv_manager = None`, service still serves other routes. (Left to the reviewer.)

Follow-ups (out of scope)
- `RealtimeBackend` implementation. WebSocket client for OpenBMB's realtime MiniCPM-o server. Needs the server standing up first.
- Frame burst: raise `VLLM_FRAME_BUFFER_SIZE` and extend `minicpmo_client.chat()` to accept multiple frames per request.
- `minicpmo_client.chat()` does NOT accept a `user_text` kwarg, but `conversation_manager._do_turn` (and now `VllmBackend`) pass it. Every conversation turn today would raise `TypeError` at runtime. Introduced in c1a2234 (feat: stable two-way guard conversation). Our tests didn't catch it because `_FakeChat` uses `**kwargs`. Recommend either dropping `user_text` from the call, adding it to `minicpmo_client.chat`, or folding it into the system prompt.

🤖 Generated with Claude Code
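For reviewers who want a picture of the backend selection described in the Summary, here is a rough sketch under stated assumptions. The method signatures on `GuardBackend` and `GuardEncounter`, the `_VllmEncounter` helper, the string returned by `turn()`, and the `ValueError` for unknown names are all hypothetical; the branch's actual modules may differ. Only the env var names, the `vllm` default, the ring buffer, and the `NotImplementedError` for `realtime` come from the description above.

```python
import os
from collections import deque
from typing import Protocol


class GuardEncounter(Protocol):
    # Assumed shape: one call per turn, then an explicit close.
    def turn(self, frame: bytes, history: list) -> str: ...
    def close(self) -> None: ...


class GuardBackend(Protocol):
    def open_encounter(self) -> GuardEncounter: ...


class _VllmEncounter:
    """Hypothetical per-conversation session holding a frame ring buffer."""

    def __init__(self, buffer_size: int) -> None:
        # deque(maxlen=...) drops the oldest frame once the buffer is full.
        self._frames = deque(maxlen=buffer_size)

    def turn(self, frame: bytes, history: list) -> str:
        self._frames.append(frame)
        # A real backend would call the chat client here; this sketch only
        # reports how many buffered frames would go into the request.
        return f"{len(self._frames)} frame(s) sent"

    def close(self) -> None:
        self._frames.clear()


class VllmBackend:
    def __init__(self, buffer_size=None) -> None:
        if buffer_size is None:
            buffer_size = int(os.environ.get("VLLM_FRAME_BUFFER_SIZE", "1"))
        self._buffer_size = buffer_size

    def open_encounter(self) -> _VllmEncounter:
        return _VllmEncounter(self._buffer_size)


def get_backend(name=None) -> GuardBackend:
    """Pick a backend from an explicit name or the GUARD_BACKEND env var."""
    name = (name or os.environ.get("GUARD_BACKEND", "vllm")).lower()
    if name == "vllm":
        return VllmBackend()
    if name == "realtime":
        raise NotImplementedError("realtime backend not implemented; see the follow-up plan")
    # Assumption: unknown values raise so the caller can log and disable.
    raise ValueError(f"unknown GUARD_BACKEND: {name!r}")
```

With a buffer size of 1 the encounter only ever holds the newest frame, which is how the default path can keep the previous single-frame call shape.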