OpenViking memory engine behind the gate — adapter + query-aware hook + runbook (#147) by hanwencheng · Pull Request #177 · litentry/agentKeys

hanwencheng · 2026-06-02T14:26:09Z

Summary

OpenViking as the AgentKeys memory engine behind the gate (model B, plan §6a). Follows up the merged #150 (namespace + engine seam + storage test). OpenViking ranks; AgentKeys keeps storing (K3-encrypted S3), gating (cap/scope/namespace/audit), and delivering (the pre_llm_call hook). OpenViking can reorder what's injected but can never widen it.

What landed

- Spike (recorded in plan §6a): OpenViking's real server API, taken from the Hermes plugin's actual client: POST /api/v1/search/find {query,top_k}, X-OpenViking-Agent/Account/User + X-API-Key/Bearer headers, GET /health, ov.conf VLM+embedding. Corrected the plan's wrong "deterministic/zero-egress" claim (OpenViking requires a VLM; Holographic is the deterministic one).
- agentkeys-core::openviking: faithful async client + rank_gate_bounded() with the gate-as-bound safety property (only ever returns gate-authorized lines; errors/empty → deterministic fallback). 4 mock-HTTP tests, incl. dropping an unauthorized hit.
- Query-aware memory-inject hook: in openviking mode it reads the current turn from the host payload (is_terminal()-guarded so it can't hang), ranks via OpenViking, and falls back to a deterministic engine. extract_query() is defensive over Hermes' payload shape.
- agentkeys wire: --openviking-endpoint / --openviking-api-key bake OPENVIKING_ENDPOINT/_API_KEY into the hook only when --memory-engine openviking (else byte-identical).
- harness/phase1-wire-demo.sh --openviking: optional phase; asserts the wire baking and that the query-aware hook runs; skips gracefully if openviking-server is down.
- docs/operator-runbook-openviking.md: the complete operator guide (install → configure VLM/embedding → start → mirror gated lines → wire → test gated→ranked→injected → verify safety/privacy). arch.md §15.2 links it.

Verification

cargo test -p agentkeys-core -p agentkeys-cli green; cargo clippy -- -D warnings clean; cargo fmt --check clean; bash -n on the harness clean.
Not run here: the live OpenViking-server + VLM + sandbox end-to-end is operator-executed per the runbook (this worktree can't run it, same as --real).

🤖 Generated with Claude Code

…ing (#147)

Spike result (from the Hermes openviking plugin client + volcengine/OpenViking):

- Real server API recorded: POST /api/v1/search/find {query,top_k} -> {result:{results:[{score,content|text,uri}]}}, /content/{write,read,abstract,overview}, /fs/{ls,stat,tree}; base :1933; OPENVIKING_API_KEY + account/user/agent headers; VLM+embedding in ~/.openviking/ov.conf; pip install openviking.
- Correction: OpenViking is NOT deterministic/zero-egress (the earlier draft was wrong). It REQUIRES a VLM + embedding model and is query-driven. The deterministic, no-LLM engine is Holographic. Re-tiered: 1a Holographic (deterministic), 1b OpenViking (self-hosted, LLM-backed).
- Consequence: OpenViking fits the QUERY path, not the no-query pre_llm_call passive injection. 'Fit OpenViking' = make injection query-aware (turn -> search/find -> gate-bounded top-K, recency fallback); the gate still bounds injectability so OpenViking ranks but never widens visibility.

…bounded (#147)

Model-B 'engine behind the gate' (plan §6a): OpenViking RANKS; AgentKeys still stores (K3 S3) + gates + delivers.

agentkeys-core::openviking:

- OpenVikingClient: faithful to the spiked API (from the Hermes plugin client, NOT guessed): X-OpenViking-Agent/Account/User + X-API-Key/Bearer headers; GET /health; POST /api/v1/search/find {query,top_k} -> {result:{results:[{score,content|text}]}}; POST /api/v1/content/write. from_env() returns None when OPENVIKING_ENDPOINT is unset (clean fallback).
- rank_gate_bounded(): SAFETY: only ever returns lines from the gate-authorized input set (OpenViking reorders, never widens visibility); None on error/empty so the caller falls back to a deterministic engine.

Tests (axum stub, no live server): score-ordered parse, gate-bound drop of an unauthorized hit, empty->None fallback, budget cap. cargo test + fmt + clippy green on agentkeys-core.
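The gate-as-bound property described in this commit can be illustrated with a small std-only sketch. It is not the code in agentkeys-core: the `RankedHit` type, the function signature, and the idea of passing the ranker's outcome in as a `Result` are assumptions made for illustration. Only the behaviour (reorder by score, drop anything outside the authorized set, cap at a budget, `None` on error or empty) comes from the commit message.

```rust
use std::collections::HashSet;

/// One ranked hit as returned by the ranking engine (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct RankedHit {
    pub score: f64,
    pub text: String,
}

/// Returns at most `budget` lines, highest score first, and only lines that
/// are present in `authorized`. Returns `None` on a ranker error or when
/// nothing survives, so the caller can fall back to a deterministic engine.
pub fn rank_gate_bounded(
    authorized: &[String],
    hits: Result<Vec<RankedHit>, String>,
    budget: usize,
) -> Option<Vec<String>> {
    let allowed: HashSet<&str> = authorized.iter().map(String::as_str).collect();

    // A ranker error means "no ranking available": signal fallback.
    let mut hits = hits.ok()?;

    // Highest score first; total_cmp gives a total order even if a score is NaN.
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));

    let mut seen = HashSet::new();
    let out: Vec<String> = hits
        .into_iter()
        // The gate is the bound: a hit outside the authorized set is dropped.
        .filter(|h| allowed.contains(h.text.as_str()))
        // Keep each authorized line at most once.
        .filter(|h| seen.insert(h.text.clone()))
        .map(|h| h.text)
        .take(budget)
        .collect();

    if out.is_empty() {
        None
    } else {
        Some(out)
    }
}
```

Because the output is built only from hits that pass the membership check, a ranker that returns extra or unexpected content can change the order of injected lines but cannot add to them.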

…gate (#147)

Piece 2 of the OpenViking model-B integration (plan §6a). memory-inject now:

- When AGENTKEYS_MEMORY_ENGINE=openviking + OPENVIKING_ENDPOINT is set, reads the current turn from the host payload (stdin) as the query, calls openviking::rank_gate_bounded over the gate-authorized namespace lines, and injects the gate-bounded top-K. OpenViking ranks; it can never widen what is injectable.
- stdin read is guarded by is_terminal() so a direct interactive call cannot hang (the historical no-stdin rule for the default engines is preserved: only openviking mode reads stdin, and only when piped).
- Falls back to a deterministic engine (LexicalEngine for openviking mode, else engine_from_env) when OpenViking is unconfigured / has no query / errors, so OpenViking is never load-bearing for availability.
- extract_query(): defensive pull of the user turn (query/prompt/input/messages[-1].content), since Hermes' pre_llm_call payload field isn't pinned.

Tests: hook suite (9, incl. extract_query) + core/cli green; fmt + clippy clean.
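The stdin guard and the fallback decision above can be sketched with the standard library alone. The function names, the `Choice` enum, and the trick of passing `is_terminal` in as a parameter (so the guard can be exercised without a TTY) are assumptions for illustration, not the hook's actual code; `std::io::IsTerminal` itself is a real std trait. JSON field extraction (`extract_query`) is deliberately left out here.

```rust
use std::io::{self, IsTerminal, Read};

/// Reads the host payload only when input is piped. `is_terminal` is passed
/// in so the guard can be tested without a real terminal.
pub fn read_payload<R: Read>(is_terminal: bool, mut input: R) -> Option<String> {
    if is_terminal {
        return None; // interactive call: never block waiting on stdin
    }
    let mut buf = String::new();
    input.read_to_string(&mut buf).ok()?;
    let trimmed = buf.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}

/// Convenience wrapper over the process's real stdin.
pub fn read_payload_from_stdin() -> Option<String> {
    let stdin = io::stdin();
    let is_tty = stdin.is_terminal();
    read_payload(is_tty, stdin.lock())
}

/// Which path serves this call (hypothetical enum).
#[derive(Debug, PartialEq)]
pub enum Choice {
    OpenViking { query: String },
    Deterministic,
}

/// OpenViking is used only when the engine is selected, an endpoint is
/// configured, and a query was obtained; every other case is deterministic.
pub fn choose(engine: &str, endpoint: Option<&str>, query: Option<String>) -> Choice {
    match (engine, endpoint, query) {
        ("openviking", Some(_), Some(q)) => Choice::OpenViking { query: q },
        _ => Choice::Deterministic,
    }
}
```

In a hook, `choose(engine, endpoint, read_payload_from_stdin())` would yield `Choice::Deterministic` for a direct interactive call, which matches the "cannot hang" and "never load-bearing" properties the commit describes.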

…viking (#147)

The operator-followable guide for OpenViking-as-engine-behind-the-gate, plus the last code to make it real.

- docs/operator-runbook-openviking.md (NEW): complete step-by-step: install openviking, configure via its own init wizard (VLM+embedding; Ollama for zero egress), start + health, mirror gate-authorized lines into its index, wire AgentKeys with --memory-engine openviking, test gated->ranked->injected, and verify the safety/privacy properties (gate bounds visibility; OpenViking not load-bearing; LLM gets no viking_* tools). jq for all JSON (no heredocs). Honest about the VLM requirement + the egress tradeoff.
- agentkeys wire (wire.rs + main.rs): --openviking-endpoint / --openviking-api-key flags bake OPENVIKING_ENDPOINT / OPENVIKING_API_KEY into the pre_llm_call hook ONLY when --memory-engine openviking (else not emitted, so the output is byte-identical). New test: endpoint baked iff engine==openviking.
- harness/phase1-wire-demo.sh: --openviking phase (after the acts). Skips gracefully if openviking-server is down; else asserts the wire baked the engine+endpoint and the query-aware hook runs. Does NOT install/config OpenViking (that is operator+provider-specific; the runbook covers it). bash -n clean.
- arch.md §15.2: link the new runbook.

Verified: cargo test core+cli + clippy + fmt green (the live OpenViking+VLM run is operator-executed per the runbook; can't run from this worktree).
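The "baked iff engine==openviking" rule can be shown as a pure function. This is a sketch under assumptions: the helper name `openviking_env` and its signature are hypothetical, and the real logic lives in wire.rs, which is not shown in this PR text. Only the two variable names and the conditional-emission rule come from the commit message.

```rust
/// Returns the OpenViking-specific environment entries to bake into the
/// hook. For any other engine the list is empty, so the generated hook is
/// unchanged from what it was before these flags existed.
pub fn openviking_env(
    engine: &str,
    endpoint: Option<&str>,
    api_key: Option<&str>,
) -> Vec<(String, String)> {
    let mut env = Vec::new();
    if engine != "openviking" {
        return env; // flags are ignored for other engines
    }
    if let Some(e) = endpoint {
        env.push(("OPENVIKING_ENDPOINT".to_string(), e.to_string()));
    }
    if let Some(k) = api_key {
        env.push(("OPENVIKING_API_KEY".to_string(), k.to_string()));
    }
    env
}
```

Keeping the decision in one function that returns an empty list for other engines makes the byte-identical claim easy to test: the non-openviking output cannot depend on the new flags.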

… hermes setup, real corpus (#147)

Three fixes from following the runbook live:

1. sbx confusion: sbx is a laptop-only helper for the one-shot /v1/shell/exec API; it can't run the INTERACTIVE `openviking-server init` wizard. Rewrote the runbook to 'docker exec -it … bash' and run all commands directly inside the sandbox (no sbx). Added a troubleshooting row for 'command not found: sbx'.
2. hermes memory setup: added a prominent ⛔ callout: do NOT run it. It sets memory.provider: openviking (the ungated Model-A path that gives the LLM the viking_* tools + bypasses our gate). `agentkeys wire --memory-engine openviking` (Step 6) is its replacement.
3. tiny DB isn't meaningful for semantic search: added harness/fixtures/sample-memory.md (35 diverse facts) + a Step 4 'direct semantic eval' that loads the corpus and queries search/find directly so you SEE semantic recall (query words absent from the matches). Clarified direct-eval (Step 4) vs the gated path (Steps 5-7, gate bounds to authorized lines).

Also: Step 2 now says VLM → Skip (embedding-only) since our flow only uses search/find (embeddings); the VLM is OpenViking's extraction engine, which we don't need.

Verified: loader extracts 35 facts (skips headers/comments); jq payload matches the content/write contract; runbook links resolve.

Operator ran hermes memory setup (the ungated Model-A provider path) and needed to reverse it. The runbook warned NOT to run it but didn't say how to undo it. This commit adds that: an 'Already ran it?' block in the callout (inspect memory.provider, disable via hermes or by removing the config key + OPENVIKING_* env) + a troubleshooting row. Stresses that it does NOT touch the AgentKeys pre_llm_call hook (separate managed block) and that the agentkeys wire path should be kept.

…r (operator QA) (#147)

Operator hit HTTP 400 on every content/write + no dedup on re-run. Root cause: my example URI 'viking://user/memories/<ns>/<n>' was malformed. The real format (verbatim from the Hermes plugin's _build_memory_uri) is 'viking://user/<user>/memories/<subdir>/<name>.md'; the <user> segment AND the .md extension are required or the server 400s. Also: -f hid the error body and the loop counted iterations, not successes (so 'loaded 35/70' was fiction).

Fixes in the runbook:

- Step 4 loader + Step 5 mirror: correct URI (user segment + .md), drop -f, parse the response, count ACTUAL successes, deterministic filenames + treat 'exists' as already-loaded => idempotent (no duplicates on re-run).
- Added a one-write sanity check that SHOWS the response.
- Troubleshooting: 400-malformed-URI row + exists-on-rerun row.

(The Rust adapter is unaffected: write_content takes the URI as a param; only the runbook's example URIs were wrong.) Verified loader payload + branch logic locally.
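The corrected URI format is simple enough to capture in a helper so the `<user>` segment and `.md` extension cannot be forgotten. The sketch below is illustrative only: the function names, the rejection of empty or slash-containing segments, and the `fact-NNN` naming scheme are assumptions, not the runbook's or the Hermes plugin's code. Only the format string 'viking://user/<user>/memories/<subdir>/<name>.md' comes from the commit message.

```rust
/// Builds 'viking://user/<user>/memories/<subdir>/<name>.md'.
/// Returns None for an empty segment or one containing '/', since either
/// would change the number of path segments (an assumption of this sketch).
pub fn build_memory_uri(user: &str, subdir: &str, name: &str) -> Option<String> {
    // Accept a name with or without the extension; always emit exactly one.
    let name = name.strip_suffix(".md").unwrap_or(name);
    let ok = |s: &str| !s.is_empty() && !s.contains('/');
    if !(ok(user) && ok(subdir) && ok(name)) {
        return None;
    }
    Some(format!("viking://user/{user}/memories/{subdir}/{name}.md"))
}

/// Deterministic file name for the n-th fact, so a re-run targets the same
/// URI instead of creating a duplicate (hypothetical naming scheme).
pub fn fact_name(index: usize) -> String {
    format!("fact-{index:03}")
}
```

For example, `build_memory_uri("alice", "demo", &fact_name(1))` produces `viking://user/alice/memories/demo/fact-001.md`, and the same inputs always produce the same URI, which is what makes treating 'exists' as already-loaded safe.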

… QA) (#147)

search/find returns results under result.{memories,resources,skills}[] with {score,uri,abstract}, NOT result.results[].{content} as I'd assumed (the spike read the write call but not the response parsing). Operator's jq '.result.results[]' hit a null -> 'Cannot iterate over null'.

Runbook: corrected the query to .result.memories[]?|{score,uri,abstract} (+ raw shape first), a content/read fallback for the Skip-VLM empty-abstract case, and two troubleshooting rows.

KNOWN FOLLOWUP (not in this commit): crates/agentkeys-core/src/openviking.rs search_find parses the same wrong shape (result.results[].content|text), so the GATED OpenViking path currently gets no hits and falls back to the deterministic lexical engine. Fix pending live confirmation of the response (esp. whether abstract is populated under Skip-VLM, which decides text-match vs content/read).
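The corrected response shape, and the open question about empty abstracts, can be modelled with plain structs. This sketch assumes the JSON has already been deserialized; the type names, the `abstract_text` field name (`abstract` is a reserved word in Rust), and the `HitText` enum are hypothetical. It is not the pending fix to search_find, only an illustration of the two cases the followup has to distinguish.

```rust
/// One entry under result.{memories,resources,skills}[].
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub score: f64,
    pub uri: String,
    /// The `abstract` field; it may be missing or empty (the Skip-VLM case).
    pub abstract_text: Option<String>,
}

/// The `result` object of a search/find response, already deserialized.
#[derive(Debug, Default)]
pub struct FindResult {
    pub memories: Vec<Hit>,
    pub resources: Vec<Hit>,
    pub skills: Vec<Hit>,
}

/// What the caller has in hand for each hit.
#[derive(Debug, PartialEq)]
pub enum HitText {
    /// The abstract is usable as the hit's text.
    Abstract(String),
    /// No abstract: the text would have to be fetched via content/read.
    NeedsContentRead(String),
}

/// Memory hits, highest score first, each tagged with how its text can be
/// obtained.
pub fn memory_texts(result: &FindResult) -> Vec<HitText> {
    let mut hits: Vec<&Hit> = result.memories.iter().collect();
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    hits.into_iter()
        .map(|h| match h.abstract_text.as_deref().map(str::trim) {
            Some(a) if !a.is_empty() => HitText::Abstract(a.to_string()),
            _ => HitText::NeedsContentRead(h.uri.clone()),
        })
        .collect()
}
```

If live testing shows abstracts are always empty under Skip-VLM, every hit lands in the `NeedsContentRead` branch, which is the case where the gated path would need a content/read round trip (or a URI-based match) before comparing against gate-authorized lines.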

hanwencheng added 8 commits June 2, 2026 22:24

hanwencheng wants to merge 8 commits into main from claude/memory-openviking
