OpenViking memory engine behind the gate — adapter + query-aware hook + runbook (#147)#177
Open
hanwencheng wants to merge 8 commits into
Open
OpenViking memory engine behind the gate — adapter + query-aware hook + runbook (#147)#177hanwencheng wants to merge 8 commits into
hanwencheng wants to merge 8 commits into
Conversation
…ing (#147) Spike result (from the Hermes openviking plugin client + volcengine/OpenViking): - Real server API recorded: POST /api/v1/search/find {query,top_k} -> {result:{results:[{score,content|text,uri}]}}, /content/{write,read,abstract, overview}, /fs/{ls,stat,tree}; base :1933; OPENVIKING_API_KEY + account/user/ agent headers; VLM+embedding in ~/.openviking/ov.conf; pip install openviking. - Correction: OpenViking is NOT deterministic/zero-egress (the earlier draft was wrong). It REQUIRES a VLM + embedding model and is query-driven. The deterministic, no-LLM engine is Holographic. Re-tiered: 1a Holographic (deterministic), 1b OpenViking (self-hosted, LLM-backed). - Consequence: OpenViking fits the QUERY path, not the no-query pre_llm_call passive injection. 'Fit OpenViking' = make injection query-aware (turn -> search/find -> gate-bounded top-K, recency fallback); the gate still bounds injectability so OpenViking ranks but never widens visibility.
…bounded (#147) Model-B 'engine behind the gate' (plan §6a): OpenViking RANKS; AgentKeys still stores (K3 S3) + gates + delivers. agentkeys-core::openviking: - OpenVikingClient: faithful to the spiked API (from the Hermes plugin client, NOT guessed) — X-OpenViking-Agent/Account/User + X-API-Key/Bearer headers; GET /health; POST /api/v1/search/find {query,top_k} -> {result:{results:[{score,content|text}]}}; POST /api/v1/content/write. from_env() returns None when OPENVIKING_ENDPOINT unset (clean fallback). - rank_gate_bounded(): SAFETY — only ever returns lines from the gate-authorized input set (OpenViking reorders, never widens visibility); None on error/empty so the caller falls back to a deterministic engine. Tests (axum stub, no live server): score-ordered parse, gate-bound drop of an unauthorized hit, empty->None fallback, budget cap. cargo test + fmt + clippy green on agentkeys-core.
…gate (#147) Piece 2 of the OpenViking model-B integration (plan §6a). memory-inject now: - When AGENTKEYS_MEMORY_ENGINE=openviking + OPENVIKING_ENDPOINT set, reads the current turn from the host payload (stdin) as the query, calls openviking::rank_gate_bounded over the gate-authorized namespace lines, and injects the gate-bounded top-K. OpenViking ranks; it can never widen what is injectable. - stdin read is guarded by is_terminal() so a direct interactive call cannot hang (the historical no-stdin rule for the default engines is preserved — only openviking mode reads stdin, and only when piped). - Falls back to a deterministic engine (LexicalEngine for openviking mode, else engine_from_env) when OpenViking is unconfigured / has no query / errors, so OpenViking is never load-bearing for availability. - extract_query(): defensive pull of the user turn (query/prompt/input/ messages[-1].content) — Hermes' pre_llm_call payload field isn't pinned. Tests: hook suite (9, incl. extract_query) + core/cli green; fmt + clippy clean.
…viking (#147) The operator-followable guide for OpenViking-as-engine-behind-the-gate, plus the last code to make it real. docs/operator-runbook-openviking.md (NEW): complete step-by-step — install openviking, configure via its own init wizard (VLM+embedding; Ollama for zero egress), start + health, mirror gate-authorized lines into its index, wire AgentKeys with --memory-engine openviking, test gated->ranked->injected, and verify the safety/privacy properties (gate bounds visibility; OpenViking not load-bearing; LLM gets no viking_* tools). jq for all JSON (no heredocs). Honest about the VLM requirement + the egress tradeoff. agentkeys wire (wire.rs + main.rs): --openviking-endpoint / --openviking-api-key flags bake OPENVIKING_ENDPOINT / OPENVIKING_API_KEY into the pre_llm_call hook ONLY when --memory-engine openviking (else not emitted — byte-identical). New test: endpoint baked iff engine==openviking. harness/phase1-wire-demo.sh: --openviking phase (after the acts). Skips gracefully if openviking-server is down; else asserts the wire baked the engine+ endpoint and the query-aware hook runs. Does NOT install/config OpenViking (that is operator+provider-specific — the runbook covers it). bash -n clean. arch.md §15.2: link the new runbook. Verified: cargo test core+cli + clippy + fmt green (the live OpenViking+VLM run is operator-executed per the runbook; can't run from this worktree).
… hermes setup, real corpus (#147) Three fixes from following the runbook live: 1. sbx confusion — sbx is a laptop-only helper for the one-shot /v1/shell/exec API; it can't run the INTERACTIVE `openviking-server init` wizard. Rewrote the runbook to 'docker exec -it … bash' and run all commands directly inside the sandbox (no sbx). Added a troubleshooting row for 'command not found: sbx'. 2. hermes memory setup — added a prominent ⛔ callout: do NOT run it. It sets memory.provider: openviking (the ungated Model-A path that gives the LLM the viking_* tools + bypasses our gate). `agentkeys wire --memory-engine openviking` (Step 6) is its replacement. 3. tiny DB isn't meaningful for semantic search — added harness/fixtures/ sample-memory.md (35 diverse facts) + a Step 4 'direct semantic eval' that loads the corpus and queries search/find directly so you SEE semantic recall (query words absent from the matches). Clarified direct-eval (Step 4) vs the gated path (Steps 5-7, gate bounds to authorized lines). Also: Step 2 now says VLM → Skip (embedding-only) since our flow only uses search/find (embeddings); the VLM is OpenViking's extraction engine we don't need. Verified: loader extracts 35 facts (skips headers/comments); jq payload matches the content/write contract; runbook links resolve.
Operator ran hermes memory setup (the ungated Model-A provider path) and needed to reverse it. The runbook warned NOT to run it but didn't say how to undo — folding that back: an 'Already ran it?' block in the callout (inspect memory.provider, disable via hermes or by removing the config key + OPENVIKING_* env) + a troubleshooting row. Stresses it does NOT touch the AgentKeys pre_llm_call hook (separate managed block) and to keep the agentkeys wire path.
…r (operator QA) (#147) Operator hit HTTP 400 on every content/write + no dedup on re-run. Root cause: my example URI 'viking://user/memories/<ns>/<n>' was malformed — the real format (verbatim from the Hermes plugin's _build_memory_uri) is 'viking://user/<user>/memories/<subdir>/<name>.md' — the <user> segment AND the .md extension are required or the server 400s. Also: -f hid the error body and the loop counted iterations, not successes (so 'loaded 35/70' was fiction). Fixes in the runbook: - Step 4 loader + Step 5 mirror: correct URI (user segment + .md), drop -f, parse the response, count ACTUAL successes, deterministic filenames + treat 'exists' as already-loaded => idempotent (no duplicates on re-run). - Added a one-write sanity check that SHOWS the response. - Troubleshooting: 400-malformed-URI row + exists-on-rerun row. (The Rust adapter is unaffected — write_content takes the URI as a param; only the runbook's example URIs were wrong.) Verified loader payload + branch logic locally.
… QA) (#147) search/find returns results under result.{memories,resources,skills}[] with {score,uri,abstract} — NOT result.results[].{content} as I'd assumed (the spike read the write call but not the response parsing). Operator's jq '.result.results[]' hit a null -> 'Cannot iterate over null'. Runbook: corrected the query to .result.memories[]?|{score,uri,abstract} (+ raw shape first), a content/read fallback for the Skip-VLM empty-abstract case, and two troubleshooting rows. KNOWN FOLLOWUP (not in this commit): crates/agentkeys-core/src/openviking.rs search_find parses the same wrong shape (result.results[].content|text), so the GATED OpenViking path currently gets no hits and falls back to the deterministic lexical engine. Fix pending live confirmation of the response (esp. whether abstract is populated under Skip-VLM, which decides text-match vs content/read).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
OpenViking as the AgentKeys memory engine behind the gate (model B, plan §6a). Follows up the merged #150 (namespace + engine seam + storage test). OpenViking ranks; AgentKeys keeps storing (K3-encrypted S3), gating (cap/scope/namespace/audit), and delivering (the
pre_llm_callhook). OpenViking can reorder what's injected but can never widen it.What landed
POST /api/v1/search/find {query,top_k},X-OpenViking-Agent/Account/User+X-API-Key/Bearer headers,GET /health,ov.confVLM+embedding. Corrected the plan's wrong "deterministic/zero-egress" claim (OpenViking requires a VLM; Holographic is the deterministic one).agentkeys-core::openviking— faithful async client +rank_gate_bounded()with the gate-as-bound safety property (only ever returns gate-authorized lines; errors/empty → deterministic fallback). 4 mock-HTTP tests incl. dropping an unauthorized hit.memory-injecthook — in openviking mode reads the current turn from the host payload (is_terminal()-guarded so it can't hang), ranks via OpenViking, falls back to a deterministic engine.extract_query()defensive over Hermes' payload shape.agentkeys wire—--openviking-endpoint/--openviking-api-keybakeOPENVIKING_ENDPOINT/_API_KEYinto the hook only when--memory-engine openviking(else byte-identical).harness/phase1-wire-demo.sh --openviking— optional phase; asserts the wire baking + that the query-aware hook runs; skips gracefully ifopenviking-serveris down.docs/operator-runbook-openviking.md— the complete operator guide (install → configure VLM/embedding → start → mirror gated lines → wire → test gated→ranked→injected → verify safety/privacy). arch.md §15.2 links it.Verification
cargo test -p agentkeys-core -p agentkeys-cligreen;cargo clippy -- -D warningsclean;cargo fmt --checkclean;bash -non the harness clean.--real).🤖 Generated with Claude Code