
OpenViking memory engine behind the gate — adapter + query-aware hook + runbook (#147)#177

Open
hanwencheng wants to merge 8 commits into main from claude/memory-openviking

@hanwencheng

Summary

OpenViking as the AgentKeys memory engine behind the gate (model B, plan §6a). Follows up the merged #150 (namespace + engine seam + storage test). OpenViking ranks; AgentKeys keeps storing (K3-encrypted S3), gating (cap/scope/namespace/audit), and delivering (the pre_llm_call hook). OpenViking can reorder what's injected but can never widen it.

What landed

  • Spike (recorded in plan §6a): OpenViking's real server API, taken from the Hermes plugin's actual client — POST /api/v1/search/find {query,top_k}, X-OpenViking-Agent/Account/User + X-API-Key/Bearer headers, GET /health, ov.conf VLM+embedding. Corrected the plan's wrong "deterministic/zero-egress" claim (OpenViking requires a VLM; Holographic is the deterministic one).
  • agentkeys-core::openviking — faithful async client + rank_gate_bounded() with the gate-as-bound safety property (only ever returns gate-authorized lines; errors/empty → deterministic fallback). 4 mock-HTTP tests incl. dropping an unauthorized hit.
  • Query-aware memory-inject hook — in openviking mode reads the current turn from the host payload (is_terminal()-guarded so it can't hang), ranks via OpenViking, falls back to a deterministic engine. extract_query() defensive over Hermes' payload shape.
  • agentkeys wire: the --openviking-endpoint / --openviking-api-key flags bake OPENVIKING_ENDPOINT/_API_KEY into the hook only when --memory-engine openviking is set (otherwise the output is byte-identical).
  • harness/phase1-wire-demo.sh --openviking — optional phase; asserts the wire baking + that the query-aware hook runs; skips gracefully if openviking-server is down.
  • docs/operator-runbook-openviking.md — the complete operator guide (install → configure VLM/embedding → start → mirror gated lines → wire → test gated→ranked→injected → verify safety/privacy). arch.md §15.2 links it.

Verification

  • cargo test -p agentkeys-core -p agentkeys-cli green; cargo clippy -- -D warnings clean; cargo fmt --check clean; bash -n on the harness clean.
  • Not run here: the live OpenViking-server + VLM + sandbox end-to-end is operator-executed per the runbook (this worktree can't run it, same as --real).

🤖 Generated with Claude Code

…ing (#147)

Spike result (from the Hermes openviking plugin client + volcengine/OpenViking):
- Real server API recorded: POST /api/v1/search/find {query,top_k} ->
  {result:{results:[{score,content|text,uri}]}}, /content/{write,read,abstract,
  overview}, /fs/{ls,stat,tree}; base :1933; OPENVIKING_API_KEY + account/user/
  agent headers; VLM+embedding in ~/.openviking/ov.conf; pip install openviking.
- Correction: OpenViking is NOT deterministic/zero-egress (the earlier draft was
  wrong). It REQUIRES a VLM + embedding model and is query-driven. The
  deterministic, no-LLM engine is Holographic. Re-tiered: 1a Holographic
  (deterministic), 1b OpenViking (self-hosted, LLM-backed).
- Consequence: OpenViking fits the QUERY path, not the no-query pre_llm_call
  passive injection. 'Fit OpenViking' = make injection query-aware (turn ->
  search/find -> gate-bounded top-K, recency fallback); the gate still bounds
  injectability so OpenViking ranks but never widens visibility.
…bounded (#147)

Model-B 'engine behind the gate' (plan §6a): OpenViking RANKS; AgentKeys still
stores (K3 S3) + gates + delivers. agentkeys-core::openviking:

- OpenVikingClient: faithful to the spiked API (from the Hermes plugin client,
  NOT guessed) — X-OpenViking-Agent/Account/User + X-API-Key/Bearer headers;
  GET /health; POST /api/v1/search/find {query,top_k} ->
  {result:{results:[{score,content|text}]}}; POST /api/v1/content/write.
  from_env() returns None when OPENVIKING_ENDPOINT unset (clean fallback).
- rank_gate_bounded(): SAFETY — only ever returns lines from the gate-authorized
  input set (OpenViking reorders, never widens visibility); None on error/empty
  so the caller falls back to a deterministic engine.

Tests (axum stub, no live server): score-ordered parse, gate-bound drop of an
unauthorized hit, empty->None fallback, budget cap. cargo test + fmt + clippy
green on agentkeys-core.
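
The gate-as-bound property described above can be illustrated with a minimal sketch. This is not the code in agentkeys-core::openviking: the `Hit` type, the `rank_gate_bounded_sketch` name, and the exact-match comparison between hit text and authorized lines are all assumptions made for illustration.

```rust
use std::collections::HashSet;

/// One ranked hit from the engine (hypothetical shape, not the real client type).
#[derive(Debug, Clone)]
pub struct Hit {
    pub score: f64,
    pub text: String,
}

/// Reorders by score, but only ever returns lines the gate already authorized.
/// `None` means "nothing usable": the caller falls back to a deterministic engine.
pub fn rank_gate_bounded_sketch(
    authorized: &[String],
    hits: Result<Vec<Hit>, String>,
    budget: usize,
) -> Option<Vec<String>> {
    // The authorized set is the upper bound on what may be injected.
    let allowed: HashSet<&str> = authorized.iter().map(String::as_str).collect();

    // Any engine error collapses to None (fallback), never to a partial result.
    let mut hits = hits.ok()?;

    // Highest score first; total_cmp gives a total order even if a score is NaN.
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));

    let mut seen = HashSet::new();
    let ranked: Vec<String> = hits
        .into_iter()
        // Drop anything the gate did not authorize (assumes exact text match).
        .filter(|h| allowed.contains(h.text.as_str()))
        // Drop duplicates so the budget is spent on distinct lines.
        .filter(|h| seen.insert(h.text.clone()))
        .take(budget)
        .map(|h| h.text)
        .collect();

    if ranked.is_empty() {
        None
    } else {
        Some(ranked)
    }
}
```

Under these assumptions, a high-scoring hit that is absent from the authorized set is silently dropped, so the engine can change the order of injected lines but not their membership.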
…gate (#147)

Piece 2 of the OpenViking model-B integration (plan §6a). memory-inject now:
- When AGENTKEYS_MEMORY_ENGINE=openviking + OPENVIKING_ENDPOINT set, reads the
  current turn from the host payload (stdin) as the query, calls
  openviking::rank_gate_bounded over the gate-authorized namespace lines, and
  injects the gate-bounded top-K. OpenViking ranks; it can never widen what is
  injectable.
- stdin read is guarded by is_terminal() so a direct interactive call cannot
  hang (the historical no-stdin rule for the default engines is preserved —
  only openviking mode reads stdin, and only when piped).
- Falls back to a deterministic engine (LexicalEngine for openviking mode, else
  engine_from_env) when OpenViking is unconfigured / has no query / errors, so
  OpenViking is never load-bearing for availability.
- extract_query(): defensive pull of the user turn (query/prompt/input/
  messages[-1].content) — Hermes' pre_llm_call payload field isn't pinned.

Tests: hook suite (9, incl. extract_query) + core/cli green; fmt + clippy clean.
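
A rough sketch of the defensive query extraction, assuming the candidate fields are tried in the order listed above (query, prompt, input, then the last message's content). The `Json` enum is a hand-rolled stand-in for a real JSON value type, and `extract_query_sketch` is a hypothetical name; the actual hook may differ in both parsing and precedence.

```rust
use std::collections::BTreeMap;

/// Minimal JSON value, standing in for a real parser's value type (assumption).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Returns the trimmed string if the value is a non-blank JSON string.
fn non_empty_str(value: Option<&Json>) -> Option<String> {
    match value {
        Some(Json::Str(s)) if !s.trim().is_empty() => Some(s.trim().to_string()),
        _ => None,
    }
}

/// Pulls the current user turn out of a payload whose shape is not pinned.
/// Returns None when no candidate field holds usable text.
pub fn extract_query_sketch(payload: &Json) -> Option<String> {
    let Json::Obj(map) = payload else {
        return None;
    };

    // Flat fields first, in the order the commit message lists them.
    for key in ["query", "prompt", "input"] {
        if let Some(q) = non_empty_str(map.get(key)) {
            return Some(q);
        }
    }

    // Otherwise fall back to messages[-1].content.
    match map.get("messages") {
        Some(Json::Arr(items)) => match items.last() {
            Some(Json::Obj(msg)) => non_empty_str(msg.get("content")),
            _ => None,
        },
        _ => None,
    }
}
```

Returning `None` rather than an empty string keeps the "no query" case distinct, which is what lets the hook take the deterministic fallback path.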
…viking (#147)

The operator-followable guide for OpenViking-as-engine-behind-the-gate, plus the
last code to make it real.

docs/operator-runbook-openviking.md (NEW): complete step-by-step — install
openviking, configure via its own init wizard (VLM+embedding; Ollama for zero
egress), start + health, mirror gate-authorized lines into its index, wire
AgentKeys with --memory-engine openviking, test gated->ranked->injected, and
verify the safety/privacy properties (gate bounds visibility; OpenViking not
load-bearing; LLM gets no viking_* tools). jq for all JSON (no heredocs). Honest
about the VLM requirement + the egress tradeoff.

agentkeys wire (wire.rs + main.rs): --openviking-endpoint / --openviking-api-key
flags bake OPENVIKING_ENDPOINT / OPENVIKING_API_KEY into the pre_llm_call hook
ONLY when --memory-engine openviking (else not emitted — byte-identical). New
test: endpoint baked iff engine==openviking.
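
The "baked iff engine==openviking" rule can be sketched as a pure function. `openviking_hook_env_sketch` is a hypothetical helper, not the code in wire.rs, and it models only the two OPENVIKING_* variables named above.

```rust
/// Hypothetical helper: the OPENVIKING_* variables that would be baked into the
/// pre_llm_call hook for a given engine selection.
pub fn openviking_hook_env_sketch(
    memory_engine: &str,
    endpoint: Option<&str>,
    api_key: Option<&str>,
) -> Vec<(&'static str, String)> {
    // For any other engine nothing is emitted, so the generated hook stays
    // byte-identical to what it was before these flags existed.
    if memory_engine != "openviking" {
        return Vec::new();
    }

    let mut env = Vec::new();
    if let Some(url) = endpoint {
        env.push(("OPENVIKING_ENDPOINT", url.to_string()));
    }
    if let Some(key) = api_key {
        env.push(("OPENVIKING_API_KEY", key.to_string()));
    }
    env
}
```

Keeping the decision in one place makes the byte-identical claim easy to test: any engine other than openviking yields an empty list even when the flags are supplied.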

harness/phase1-wire-demo.sh: --openviking phase (after the acts). Skips
gracefully if openviking-server is down; else asserts the wire baked the engine+
endpoint and the query-aware hook runs. Does NOT install/config OpenViking (that
is operator+provider-specific — the runbook covers it). bash -n clean.

arch.md §15.2: link the new runbook.

Verified: cargo test core+cli + clippy + fmt green (the live OpenViking+VLM run
is operator-executed per the runbook; can't run from this worktree).
… hermes setup, real corpus (#147)

Three fixes from following the runbook live:
1. sbx confusion — sbx is a laptop-only helper for the one-shot /v1/shell/exec
   API; it can't run the INTERACTIVE `openviking-server init` wizard. Rewrote
   the runbook to 'docker exec -it … bash' and run all commands directly inside
   the sandbox (no sbx). Added a troubleshooting row for 'command not found: sbx'.
2. hermes memory setup — added a prominent ⛔ callout: do NOT run it. It sets
   memory.provider: openviking (the ungated Model-A path that gives the LLM the
   viking_* tools + bypasses our gate). `agentkeys wire --memory-engine openviking`
   (Step 6) is its replacement.
3. tiny DB isn't meaningful for semantic search — added harness/fixtures/
   sample-memory.md (35 diverse facts) + a Step 4 'direct semantic eval' that
   loads the corpus and queries search/find directly so you SEE semantic recall
   (query words absent from the matches). Clarified direct-eval (Step 4) vs the
   gated path (Steps 5-7, gate bounds to authorized lines).

Also: Step 2 now says VLM → Skip (embedding-only) since our flow only uses
search/find (embeddings); the VLM is OpenViking's extraction engine we don't need.

Verified: loader extracts 35 facts (skips headers/comments); jq payload matches
the content/write contract; runbook links resolve.
Operator ran hermes memory setup (the ungated Model-A provider path) and needed
to reverse it. The runbook warned NOT to run it but didn't say how to undo —
folding that back: an 'Already ran it?' block in the callout (inspect
memory.provider, disable via hermes or by removing the config key + OPENVIKING_*
env) + a troubleshooting row. Stresses it does NOT touch the AgentKeys
pre_llm_call hook (separate managed block) and to keep the agentkeys wire path.
…r (operator QA) (#147)

Operator hit HTTP 400 on every content/write + no dedup on re-run. Root cause:
my example URI 'viking://user/memories/<ns>/<n>' was malformed — the real format
(verbatim from the Hermes plugin's _build_memory_uri) is
'viking://user/<user>/memories/<subdir>/<name>.md' — the <user> segment AND the
.md extension are required or the server 400s. Also: -f hid the error body and
the loop counted iterations, not successes (so 'loaded 35/70' was fiction).

Fixes in the runbook:
- Step 4 loader + Step 5 mirror: correct URI (user segment + .md), drop -f, parse
  the response, count ACTUAL successes, deterministic filenames + treat 'exists'
  as already-loaded => idempotent (no duplicates on re-run).
- Added a one-write sanity check that SHOWS the response.
- Troubleshooting: 400-malformed-URI row + exists-on-rerun row.

(The Rust adapter is unaffected — write_content takes the URI as a param; only the
runbook's example URIs were wrong.) Verified loader payload + branch logic locally.
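
For illustration, the corrected URI format and the success counting could look like the sketch below. The helper names and the `WriteOutcome` enum are hypothetical, and rejecting segments that contain `/` is a conservative assumption, not a documented server rule. The runbook itself does this in shell with jq.

```rust
/// Builds `viking://user/<user>/memories/<subdir>/<name>.md`.
/// Returns None for empty segments; rejecting embedded '/' is an assumption.
pub fn build_memory_uri_sketch(user: &str, subdir: &str, name: &str) -> Option<String> {
    fn segment(raw: &str) -> Option<&str> {
        let s = raw.trim().trim_matches('/');
        if s.is_empty() || s.contains('/') {
            None
        } else {
            Some(s)
        }
    }

    let user = segment(user)?;
    let subdir = segment(subdir)?;
    let name = segment(name)?;

    // Accept names with or without the extension, but always emit exactly one ".md".
    let stem = name.strip_suffix(".md").unwrap_or(name);
    if stem.is_empty() {
        return None;
    }
    Some(format!("viking://user/{user}/memories/{subdir}/{stem}.md"))
}

/// Outcome of one content/write call as the loader would classify it (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WriteOutcome {
    Created,
    AlreadyExists,
    Failed,
}

/// Counts real successes instead of loop iterations. "Already exists" counts as
/// loaded, which makes a re-run idempotent. Returns (loaded, failed).
pub fn count_loaded(outcomes: &[WriteOutcome]) -> (usize, usize) {
    let loaded = outcomes
        .iter()
        .filter(|o| matches!(o, WriteOutcome::Created | WriteOutcome::AlreadyExists))
        .count();
    (loaded, outcomes.len() - loaded)
}
```

With deterministic filenames, a second run over the same corpus would classify every write as `AlreadyExists` and report the same loaded count as the first run.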
… QA) (#147)

search/find returns results under result.{memories,resources,skills}[] with
{score,uri,abstract} — NOT result.results[].{content} as I'd assumed (the spike
read the write call but not the response parsing). Operator's
jq '.result.results[]' hit a null -> 'Cannot iterate over null'.

Runbook: corrected the query to .result.memories[]?|{score,uri,abstract} (+ raw
shape first), a content/read fallback for the Skip-VLM empty-abstract case, and
two troubleshooting rows.

KNOWN FOLLOWUP (not in this commit): crates/agentkeys-core/src/openviking.rs
search_find parses the same wrong shape (result.results[].content|text), so the
GATED OpenViking path currently gets no hits and falls back to the deterministic
lexical engine. Fix pending live confirmation of the response (esp. whether
abstract is populated under Skip-VLM, which decides text-match vs content/read).
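
One way the pending adapter fix might treat the corrected response shape, sketched over already-parsed items. The types and the `memory_hits_sketch` name are hypothetical, and whether an empty abstract should trigger a content/read call is exactly the open question noted above, so treat this as an outline of one option.

```rust
/// One entry of result.memories[] after parsing (hypothetical struct).
/// `abstract` is a reserved word in Rust, hence the field name `abstract_text`.
#[derive(Debug, Clone, PartialEq)]
pub struct FoundItem {
    pub score: f64,
    pub uri: String,
    pub abstract_text: String,
}

/// What the caller can do with a hit: use its abstract directly, or fetch the
/// body through content/read because the abstract is empty (the Skip-VLM case).
#[derive(Debug, Clone, PartialEq)]
pub enum HitText {
    Abstract(String),
    NeedsContentRead(String),
}

/// Orders memories by descending score and classifies each one.
pub fn memory_hits_sketch(memories: &[FoundItem]) -> Vec<HitText> {
    let mut sorted: Vec<&FoundItem> = memories.iter().collect();
    // Sort defensively; this sketch does not assume the server returns ranked output.
    sorted.sort_by(|a, b| b.score.total_cmp(&a.score));

    sorted
        .into_iter()
        .map(|m| {
            let text = m.abstract_text.trim();
            if text.is_empty() {
                HitText::NeedsContentRead(m.uri.clone())
            } else {
                HitText::Abstract(text.to_string())
            }
        })
        .collect()
}
```

Either variant would still have to pass the gate-bound filter before injection, so this classification changes how text is obtained, not which lines are visible.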