Hermes-inspired memory hardening: hybrid retrieval, trust feedback, auto-consolidation, tiered context, embedding blobs #35
Merged
Inspired by Hermes Agent's Holographic memory provider, this adds three interlocking improvements to the cortex-engine cognitive loop:

1. Hybrid lexical+vector retrieval (FTS5 on SQLite, token-overlap fallback on JSON/Firestore). The `query` tool now merges BM25 full-text hits into the semantic candidate set, re-scoring them by cosine before ranking. Exact IDs, proper nouns, and rare terms that embeddings miss are now surfaced. Pass `lexical: false` to opt out.
2. Asymmetric trust feedback tool. New `feedback` tool lets agents close the retrieval loop: helpful memories gain 0.05 confidence, unhelpful ones lose 0.10. The asymmetry mirrors Hermes' holographic trust scoring: bad retrievals decay out of rankings faster than good ones earn their way in. Every event is logged to `feedback_log` for `retrieval_audit`.
3. Schema hardening on SQLite. `last_retrieval_score`, `last_hop_count`, and `memory_origin` are now first-class persisted columns (with migration shims for existing DBs). The FTS5 external-content index is kept in sync by triggers. Six missing indexes added (edges, obs, memories, ops, beliefs). `recursive_triggers = ON` so that INSERT OR REPLACE correctly fires the FTS delete trigger on upserts.

https://claude.ai/code/session_01DAZ3GzRri9hqxkTyqmSpc4
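The merge step in item 1 can be pictured with a minimal sketch. It assumes lexical hits carry their stored embedding so they can be re-scored by cosine against the query embedding; `Candidate`, `Scored`, and `mergeHybrid` are hypothetical names for illustration, not the actual `query` tool code.

```typescript
// Hypothetical shapes for illustration; the real cortex-engine types may differ.
interface Candidate {
  id: string;
  embedding: number[];
}

interface Scored {
  id: string;
  score: number;
  source: "vector" | "lexical";
}

// Plain cosine similarity; returns 0 for zero-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Merge lexical hits into the vector candidate set, then rank everything by cosine.
function mergeHybrid(
  queryEmbedding: number[],
  vectorHits: Candidate[],
  lexicalHits: Candidate[],
  limit: number,
): Scored[] {
  const merged = new Map<string, Scored>();
  for (const c of vectorHits) {
    merged.set(c.id, {
      id: c.id,
      score: cosineSimilarity(queryEmbedding, c.embedding),
      source: "vector",
    });
  }
  for (const c of lexicalHits) {
    if (merged.has(c.id)) continue; // already found by vector search
    merged.set(c.id, {
      id: c.id,
      score: cosineSimilarity(queryEmbedding, c.embedding),
      source: "lexical",
    });
  }
  return Array.from(merged.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Because every candidate ends up on the same cosine scale, a lexical-only hit competes fairly with vector hits instead of needing a separate BM25-to-cosine normalization.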
Two of the three Hermes Agent patterns that were not yet in cortex-engine:

## Thing 2: Automatic session-end memory extraction

Hermes syncs conversation turns to memory after each response and extracts on session end. SessionConsolidator (engines/auto-consolidate.ts) replicates this loop:

- observe / wonder / speculate call consolidator.notifyObservation() after every successful write.
- When the pending count reaches AUTO_THRESHOLD (10) for a namespace, dreamPhaseA fires in the background without blocking the calling tool. Phase A only (NREM: cluster → refine → create), which is lightweight enough to run per session; REM stays in the scheduled dream cron.
- SIGTERM / SIGINT / beforeExit handlers flush all namespaces with unprocessed observations before the process dies.
- Background errors are swallowed (best-effort); CORTEX_DEBUG=1 surfaces them to stderr.

## Thing 3: Tiered context loading (L0 / L1 / L2)

New `context` tool mirrors Hermes OpenViking's progressive context tiers:

- L0 (~100 tokens): top-3 by salience × FSRS retrievability. One vector search, no LLM call. Designed for system-prompt injection on every turn.
- L1 (~2k tokens): semantic top-15, full definitions, tags, immediate graph edges (one hop). Working-memory refresh mid-conversation.
- L2 (full): multi-anchor retrieval (4 query reformulations, Borda count), spreading activation (2 hops), full metadata including provenance, FSRS state, activation path. Maximum recall for deep research tasks.

L1 and L2 support HyDE expansion (default on, disable with hyde: false). L0 always skips HyDE because it is the minimal-latency path.

https://claude.ai/code/session_01DAZ3GzRri9hqxkTyqmSpc4
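The threshold-and-flush loop described under Thing 2 can be sketched roughly as follows. This is an illustration under assumptions, not the contents of engines/auto-consolidate.ts: `ConsolidatorSketch` and the injected `runPhaseA` callback are hypothetical names, and the real class may track state differently.

```typescript
// Hypothetical stand-in for dreamPhaseA, injected so the sketch is self-contained.
type PhaseA = (namespace: string) => Promise<void>;

class ConsolidatorSketch {
  private pending = new Map<string, number>();
  private inFlight = new Set<Promise<void>>();
  private runPhaseA: PhaseA;
  private threshold: number;

  constructor(runPhaseA: PhaseA, threshold = 10) {
    this.runPhaseA = runPhaseA;
    this.threshold = threshold;
  }

  // Called after every successful write; never awaits the background run.
  notifyObservation(namespace: string): void {
    const count = (this.pending.get(namespace) ?? 0) + 1;
    if (count < this.threshold) {
      this.pending.set(namespace, count);
      return;
    }
    this.pending.set(namespace, 0); // reset the counter before launching
    this.launch(namespace);
  }

  // Drains every namespace that still has unprocessed observations.
  async flush(): Promise<void> {
    for (const [namespace, count] of Array.from(this.pending.entries())) {
      if (count > 0) {
        this.pending.set(namespace, 0);
        this.launch(namespace);
      }
    }
    await Promise.all(Array.from(this.inFlight));
  }

  private launch(namespace: string): void {
    const task: Promise<void> = this.runPhaseA(namespace)
      .catch(() => {
        // Best-effort: background errors are swallowed.
      })
      .finally(() => {
        this.inFlight.delete(task);
      });
    this.inFlight.add(task);
  }
}
```

Keeping the counter per namespace means a busy namespace cannot trigger consolidation for a quiet one, which matches the per-namespace isolation the tests check.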
27 new tests:

- search-text.test.ts: FTS5 keyword search (name/definition/tags), faded exclusion, MATCH-syntax injection safety, trigger sync across updateMemory and upsertMemory (the INSERT OR REPLACE path that needs recursive_triggers), FTS rebuild on reopened DBs, JSON lexical fallback ranking, and round-trip persistence of last_retrieval_score / last_hop_count / memory_origin.
- feedback.test.ts: asymmetric deltas (+0.05/-0.10), floor/ceiling clamping, access reinforcement only on helpful, feedback_log contents, unknown-id and missing-arg errors. Runs against real in-memory SQLite so the withTransaction path is exercised.
- auto-consolidate.test.ts: threshold triggering (exactly at 10, not below), counter reset, per-namespace isolation, flush() draining, and error swallowing.

docs/hermes-audit.md records the audit findings (severity + fix), the three Hermes patterns borrowed, and the gaps deliberately left open (embedding blob storage, ANN scaling, generic-collection scans).

https://claude.ai/code/session_01DAZ3GzRri9hqxkTyqmSpc4
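The asymmetric deltas and clamping that feedback.test.ts covers can be illustrated with a small sketch. The deltas come from the commit messages; the floor of 0 and ceiling of 1 are assumptions, and `applyFeedback` is a hypothetical helper rather than the tool's actual function.

```typescript
const HELPFUL_DELTA = 0.05;
const UNHELPFUL_DELTA = -0.10;

// Applies one feedback event to a confidence value and clamps the result.
// floor/ceiling defaults are assumed; the real bounds may differ.
function applyFeedback(
  confidence: number,
  helpful: boolean,
  floor = 0,
  ceiling = 1,
): number {
  const next = confidence + (helpful ? HELPFUL_DELTA : UNHELPFUL_DELTA);
  return Math.min(ceiling, Math.max(floor, next));
}
```

With these deltas, one unhelpful rating takes two helpful ratings to offset, which is what makes polluted memories fall out of the top ranks faster than they can climb back.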
Embeddings were stored as JSON text (~4x larger, parsed on every read) even though the read path already understood Float32Array blobs. All write paths (putMemory, updateMemory, upsertMemory, putObservation, upsertObservation) now encode blobs, and legacy JSON-text rows are converted in place at store-open time — idempotent, only text-typed rows are touched. Because float32 truncation changes embedding values vs the float64 kept by the JSON backend, verifyMigration now compares embeddings at float32 precision (Math.fround) so json->sqlite migrations verify clean. https://claude.ai/code/session_01DAZ3GzRri9hqxkTyqmSpc4
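To make the float32 point concrete, here is a minimal sketch of blob encoding and a float32-precision comparison. The function names are hypothetical and the real store code may use different helpers (for example Node `Buffer`s); only `Float32Array` and `Math.fround` are taken from the description above.

```typescript
// Encode an embedding as raw float32 bytes (4 bytes per value).
function encodeEmbedding(values: number[]): Uint8Array {
  const f32 = Float32Array.from(values);
  return new Uint8Array(f32.buffer, f32.byteOffset, f32.byteLength);
}

// Decode raw bytes back into a Float32Array.
function decodeEmbedding(blob: Uint8Array): Float32Array {
  // Copy first so the view starts at offset 0 and is 4-byte aligned.
  const copy = blob.slice();
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
}

// Compare two embeddings after rounding both sides to float32, so a float64
// source (JSON) and a float32 target (blob) are treated as equal.
function embeddingsMatchAtFloat32(
  a: ArrayLike<number>,
  b: ArrayLike<number>,
): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (Math.fround(a[i] ?? NaN) !== Math.fround(b[i] ?? NaN)) return false;
  }
  return true;
}
```

A value such as 0.1 is not exactly representable in float32, so a naive `===` comparison between the JSON value and the decoded blob value fails even though nothing was lost beyond the expected truncation.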
Pull request overview
This PR hardens cortex-engine’s memory subsystem (storage + retrieval) with Hermes-inspired patterns: hybrid lexical+vector retrieval with trust feedback, automatic session consolidation, tiered context loading, and SQLite embedding/storage upgrades (FTS5, indexes, float32-blob embeddings, and schema migrations).
Changes:
- Adds hybrid retrieval via `CortexStore.searchText()` (FTS5/BM25 on SQLite; token-overlap fallback for JSON/Firestore) and merges lexical hits into `query`.
- Introduces new memory tools: `feedback` (asymmetric confidence deltas + audit log) and `context` (L0/L1/L2 tiered loading).
- Upgrades SQLite: persists retrieval-feedback fields, adds secondary indexes + trigger-synced FTS5 table, and migrates embeddings from JSON text to Float32Array BLOBs (idempotent).
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/tools/wonder.ts | Notifies the session auto-consolidator after writes. |
| src/tools/speculate.ts | Notifies the session auto-consolidator after writes. |
| src/tools/query.ts | Hybrid lexical+vector recall and additional response metadata. |
| src/tools/observe.ts | Notifies the session auto-consolidator after writes. |
| src/tools/feedback.ts | New tool: asymmetric trust scoring + feedback log writes. |
| src/tools/feedback.test.ts | Tests for feedback deltas, clamping, logging, and errors. |
| src/tools/context.ts | New tool: tiered context retrieval (L0/L1/L2). |
| src/stores/sqlite.ts | Schema fixes, indexes, FTS5+triggers, embedding blobs + migration, searchText(). |
| src/stores/sqlite.test.ts | Tests for blob embeddings and legacy JSON→BLOB migration/idempotency. |
| src/stores/search-text.test.ts | Tests for SQLite FTS5 search, JSON fallback lexical search, and field persistence. |
| src/stores/json.ts | Implements searchText() via shared lexical fallback. |
| src/stores/firestore.ts | Persists retrieval-feedback fields + lexical fallback searchText() implementation. |
| src/stores/_lexical.ts | New shared token-overlap lexical search implementation. |
| src/namespace/scoped-store.ts | Pass-through implementation for searchText(). |
| src/mcp/tools.ts | Registers context and feedback tools; adds consolidator to tool context. |
| src/mcp/server.ts | Instantiates SessionConsolidator and hooks shutdown flush handlers. |
| src/engines/memory.ts | Exports cosineSimilarity for reuse. |
| src/engines/auto-consolidate.ts | New SessionConsolidator (threshold-triggered background Phase A + flush). |
| src/engines/auto-consolidate.test.ts | Tests for consolidator thresholds, namespace isolation, flush, and error tolerance. |
| src/core/store.ts | Adds searchText() to the CortexStore interface. |
| src/bin/migrate-cmd.ts | Normalizes embedding comparisons at float32 precision for migration verification. |
| docs/hermes-audit.md | Audit write-up describing findings, fixes, and known gaps. |
Comment on lines 210 to 216

```typescript
export function createTools(): ToolDefinition[] {
  return [
    // Core cognitive tools
    contextTool,
    queryTool,
    feedbackTool,
    observeTool,
```
Comment on lines +145 to +149

```typescript
// 7c. Flush pending observations to memory on shutdown
const consolidatorFlush = () => { consolidator.flush().catch(() => {}); };
process.once('SIGTERM', consolidatorFlush);
process.once('SIGINT', consolidatorFlush);
process.once('beforeExit', consolidatorFlush);
```
Comment on lines +67 to +83

```typescript
const rawEmbedding = await ctx.embed.embed(text);
const candidates = await store.findNearest(rawEmbedding, 20);
const now = new Date();

const scored = candidates.map((r) => {
  const daysSince = r.memory.fsrs.last_review
    ? elapsedDaysSince(r.memory.fsrs.last_review)
    : 0;
  const ret = retrievability(r.memory.fsrs.stability, daysSince);
  return { r, score: r.memory.salience * ret };
});

const top = scored
  .sort((a, b) => b.score - a.score)
  .slice(0, 3);

void now;
```
Comment on lines +104 to +127

```typescript
const nearest = await store.findNearest(embedding, 15);

const now = new Date();
const results = await Promise.all(
  nearest.map(async (r) => {
    const daysSince = r.memory.fsrs.last_review
      ? elapsedDaysSince(r.memory.fsrs.last_review)
      : 0;
    const ret = retrievability(r.memory.fsrs.stability, daysSince);
    const salienceFactor = 0.5 + r.memory.salience * 0.5;
    const compositeScore = r.score * ret * salienceFactor;

    const edges = await store.getEdgesFrom(r.memory.id);
    const links = edges.slice(0, 5).map((e) => ({
      target_id: e.target_id,
      relation: e.relation,
      weight: e.weight,
    }));

    return { r, compositeScore, ret, links };
  }),
);

void now;
```
…down, remove dead vars

- context and feedback tools were gated behind namespace cognitive_tools config and not in CORE_TOOLS, so they never appeared in ListTools. Added both to CORE_TOOLS so they are always active like query/observe.
- SIGTERM/SIGINT consolidator flush handler returned immediately, leaving the flush promise racing against process exit. Handlers now call process.exit(0) in the .finally() callback so the process stays alive until flush completes. beforeExit keeps the existing pattern (flush promise holds the event loop).
- Removed two dead `now` variable declarations in context.ts L0/L1 handlers (elapsedDaysSince() computes its own reference time internally).

https://claude.ai/code/session_01DAZ3GzRri9hqxkTyqmSpc4
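The shutdown fix in the second bullet can be sketched as below. This is an assumption-labeled illustration, not the code in src/mcp/server.ts: `ProcessLike` is a hypothetical local slice of the Node.js `process` API and `installShutdownFlush` is a made-up helper, used here so the pattern can be shown without Node type definitions.

```typescript
// Hypothetical minimal slice of the Node.js process API used by this sketch.
interface ProcessLike {
  once(event: "SIGTERM" | "SIGINT" | "beforeExit", handler: () => void): unknown;
  exit(code?: number): void;
}

function installShutdownFlush(
  proc: ProcessLike,
  flush: () => Promise<void>,
): void {
  // Signal handlers: exit only after the flush settles, whether it succeeded or failed.
  const flushThenExit = () => {
    flush()
      .catch(() => {
        // Best-effort: a failed flush must not block shutdown.
      })
      .finally(() => proc.exit(0));
  };
  proc.once("SIGTERM", flushThenExit);
  proc.once("SIGINT", flushThenExit);

  // beforeExit: the pending flush promise keeps the event loop alive, so no explicit exit.
  proc.once("beforeExit", () => {
    flush().catch(() => {});
  });
}
```

In a Node.js program one would presumably pass the global `process` and `() => consolidator.flush()`; registering a signal handler replaces the default terminate-on-signal behavior, which is why the handler has to call `exit` itself once the flush is done.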
## Summary

Audit of cortex-engine's storage and retrieval systems, cross-referenced against Hermes Agent (Nous Research, MIT). Full findings in `docs/hermes-audit.md`.

## Audit fixes

- `last_retrieval_score`, `last_hop_count`, `memory_origin` were silently dropped by the SQLite and Firestore backends, so the dream pipeline's FSRS rating feedback loop never fired. Now persisted as real columns with `ALTER TABLE` migration shims.

## Patterns borrowed from Hermes

- `searchText()` on `CortexStore` (FTS5/BM25 on SQLite with trigger-synced external-content index; weighted token-overlap fallback on JSON/Firestore). Lexical hits merge into the vector candidate set in `query`, re-scored by cosine. New `feedback` tool applies asymmetric confidence deltas (+0.05 helpful / −0.10 unhelpful) so polluted memories decay out of top ranks quickly.
- `SessionConsolidator` triggers `dreamPhaseA` (NREM: cluster → refine → create) in the background after 10 pending observations per namespace; `SIGTERM`/`SIGINT`/`beforeExit` flush so sessions that end early don't strand knowledge.
- `context` tool with L0 (~100 tokens, salience × FSRS retrievability), L1 (~2k tokens, semantic top-15 + one-hop edges), L2 (multi-anchor retrieval + 2-hop spreading activation + full metadata).

## Embedding storage migration

SQLite embeddings are now raw `Float32Array` blobs (~4× smaller, parse-free reads) instead of JSON text. Legacy rows are converted in place at store-open time (idempotent). `verifyMigration` compares embeddings at float32 precision so json→sqlite migrations verify clean.

## Test plan

- … `INSERT OR REPLACE`/`recursive_triggers` edge case), lexical fallback, feedback deltas + clamping + audit log, consolidator thresholds/flush/error-tolerance, blob round-trips, legacy text→blob conversion, idempotency.
- `tsc` clean.

https://claude.ai/code/session_01DAZ3GzRri9hqxkTyqmSpc4
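The multi-anchor retrieval used by the L2 tier fuses the result lists of several query reformulations with a Borda count. A minimal sketch of that fusion step follows; `bordaFuse` is a hypothetical helper, and the point scheme (list length minus position, ties broken by id) is an assumption, since the PR does not spell out the exact variant.

```typescript
// Fuse several ranked id lists: an item at position p in a list of length n
// earns n - p points, and items are returned by total points, highest first.
function bordaFuse(rankings: string[][], limit: number): string[] {
  const points = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, position) => {
      points.set(id, (points.get(id) ?? 0) + (ranking.length - position));
    });
  }
  return Array.from(points.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0])) // ties broken by id
    .slice(0, limit)
    .map(([id]) => id);
}
```

Rank-based fusion of this kind only needs positions, so the per-reformulation similarity scores never have to be calibrated against each other.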
Generated by Claude Code