Fozikio · idapixl · Jun 10, 2026 · Jun 10, 2026 · Jun 10, 2026 · Jun 10, 2026
diff --git a/docs/hermes-audit.md b/docs/hermes-audit.md
@@ -0,0 +1,82 @@
+# Systems Audit & Hermes Agent Research — June 2026
+
+Audit of cortex-engine's storage and retrieval systems, cross-referenced
+against [Hermes Agent](https://github.com/nousresearch/hermes-agent)
+(Nous Research, MIT) — a comparable self-improving agent whose memory
+subsystem solved several problems we had open.
+
+## Audit findings (fixed)
+
+| # | Finding | Severity | Fix |
+|---|---------|----------|-----|
+| 1 | `last_retrieval_score`, `last_hop_count`, `memory_origin` were silently dropped by the SQLite and Firestore backends. The dream pipeline's FSRS rating (`engines/cognition.ts`, score phase) reads these fields to boost/penalize review ratings — that feedback loop **never fired** on either production backend. | High | Persisted as real columns/fields in both backends, with `ALTER TABLE` migration shims for existing SQLite DBs. |
+| 2 | Zero secondary indexes in the SQLite schema. Edge traversal (`getEdgesFrom`, `getEdgesForMemories`), unprocessed-observation fetches, recency queries, and belief history were all full table scans. | Medium | Six indexes added: `edges(source_id)`, `edges(target_id)`, `observations(processed, created_at)`, `memories(updated_at)`, `ops(created_at)`, `beliefs(concept_id)`. |
+| 3 | Retrieval was embedding-only. Exact identifiers, proper nouns, and rare terms that embed poorly were unfindable even when stored verbatim. | Medium | FTS5 + hybrid recall (see below). |
+| 4 | No mechanism for an agent to report that a retrieved memory was wrong. Bad memories stayed highly ranked until a dream-cycle hindsight review happened to catch them. | Medium | `feedback` tool (see below). |
+| 5 | Observations recorded via `observe`/`wonder`/`speculate` sat unprocessed until someone ran `dream` manually or via cron. Sessions that ended before a dream cycle left knowledge stranded. | Medium | Auto-consolidation (see below). |
+
+## Patterns borrowed from Hermes Agent
+
+### 1. Holographic memory — FTS5 + asymmetric trust scoring
+
+Hermes' Holographic provider pairs SQLite FTS5 full-text search with trust
+scoring (+0.05 helpful / −0.10 unhelpful).
+
+- **`searchText()`** on `CortexStore`: FTS5/BM25 on SQLite (external-content
+  table, trigger-synced, `recursive_triggers=ON` so upserts stay in sync);
+  weighted token-overlap fallback on JSON/Firestore.
+- **Hybrid recall in `query`**: lexical hits are merged into the vector
+  candidate set and re-scored by cosine, so ranking semantics stay uniform.
+  Disable with `lexical: false`.
+- **`feedback` tool**: asymmetric confidence adjustment. The asymmetry is
+  the point — one bad retrieval costs twice what one good retrieval earns,
+  so polluted memories decay out of top ranks quickly. Events log to
+  `feedback_log` for correlation with `retrieval_audit` traces.
+
+### 2. Automatic memory extraction (session sync)
+
+Hermes syncs turns to memory after each response and extracts on session
+end. `SessionConsolidator` (`engines/auto-consolidate.ts`):
+
+- `observe`/`wonder`/`speculate` notify it after every write.
+- At 10 pending observations per namespace, `dreamPhaseA` (NREM only:
+  cluster → refine → create) runs in the background — non-blocking,
+  best-effort, re-triggers if more arrive mid-run.
+- `SIGTERM`/`SIGINT`/`beforeExit` flush all pending namespaces.
+- REM phases (edges, abstraction, FSRS scoring, hindsight) intentionally
+  stay in the scheduled `dream` cycle — they are LLM-heavy.
+
+### 3. Tiered context loading (L0 → L1 → L2)
+
+Hermes' OpenViking provider loads context progressively (~100 tokens →
+~2k → full). The `context` tool mirrors this:
+
+- **L0** (~100 tokens): top-3 by salience × FSRS retrievability, names +
+  80-char snippets. One vector search, no LLM call. For per-turn
+  system-prompt injection.
+- **L1** (~2k tokens): semantic top-15 with definitions, tags, one-hop
+  edges. Mid-conversation working-memory refresh.
+- **L2** (full): multi-anchor retrieval (Borda count over 4 query
+  reformulations) + 2-hop spreading activation + full metadata
+  (provenance, FSRS state, activation paths). Deep research.
+
+## Follow-up work (done)
+
+- **Embedding storage format** (June 2026): SQLite now stores embeddings as
+  raw `Float32Array` blobs (~4× smaller, parse-free reads). Legacy JSON-text
+  rows are converted in place when the store is opened — idempotent, only
+  text-typed rows are touched. Embeddings are float32-truncated on write, so
+  cross-backend comparisons (e.g. `verifyMigration` for json→sqlite) compare
+  at float32 precision via `Math.fround`.
+
+## Known gaps (deliberately not addressed)
+
+- **Brute-force ANN**: `findNearest` on SQLite scans every row. Documented
+  as fine below 10k memories; beyond that, consider `sqlite-vec` or an HNSW
+  sidecar.
+- **Generic-collection queries**: `query()` on SQLite loads the entire
+  collection and filters in JS. Acceptable for current collection sizes
+  (threads, journal, vitals); revisit if any collection grows unbounded —
+  `feedback_log` is the most likely candidate.
+- **Firestore `searchText`** falls back to a full-collection scan. Swap in
+  an external search index if cloud deployments grow.
diff --git a/src/bin/migrate-cmd.ts b/src/bin/migrate-cmd.ts
@@ -737,8 +737,12 @@ function deepEqualJson(a: unknown, b: unknown): boolean {
 }
 
 function jsonNormalize(value: unknown): string {
-  return JSON.stringify(value, (_key, v) => {
+  return JSON.stringify(value, (key, v) => {
     if (v instanceof Date) return v.toISOString();
+    // SQLite stores embeddings as float32 blobs while JSON keeps full
+    // float64, so compare embeddings at float32 precision — otherwise a
+    // json→sqlite migration reports value diffs on every sampled memory.
+    if (key === 'embedding' && Array.isArray(v)) return v.map(Math.fround);
     return v;
   }, 0);
 }

diff --git a/src/core/store.ts b/src/core/store.ts
@@ -54,6 +54,16 @@ export interface CortexStore {
   /** Find k nearest memories by embedding vector. Returns sorted by similarity desc. */
   findNearest(embedding: number[], limit: number): Promise<SearchResult[]>;
 
+  /**
+   * Lexical full-text search over memory name/definition/tags. Complements
+   * findNearest: catches exact-keyword matches that embeddings miss (IDs,
+   * proper nouns, rare terms). SQLite uses FTS5/BM25; JSON and Firestore
+   * fall back to token-overlap scoring. Scores are normalized to 0-1 but are
+   * NOT comparable to cosine similarity — rank order is the contract.
+   * Faded memories are excluded. An empty or stopword-only query returns [].
+   */
+  searchText(text: string, limit: number): Promise<SearchResult[]>;
+
   /** Increment access_count, update last_accessed and FSRS fields. */
   touchMemory(id: string, fsrsUpdates: Partial<FSRSData>): Promise<void>;
 

diff --git a/src/engines/auto-consolidate.test.ts b/src/engines/auto-consolidate.test.ts
@@ -0,0 +1,127 @@
+/**
+ * Tests for SessionConsolidator — threshold-triggered background Phase A.
+ */
+
+import { describe, it, expect, vi } from 'vitest';
+import { SessionConsolidator, AUTO_THRESHOLD } from './auto-consolidate.js';
+import type { CortexStore } from '../core/store.js';
+import type { NamespaceManager } from '../namespace/manager.js';
+import type { EmbedProvider } from '../core/embed.js';
+import type { LLMProvider } from '../core/llm.js';
+
+function makeMockStore(): CortexStore {
+  return {
+    // Phase A entry point — empty result short-circuits cluster/refine/create
+    // so no embed/llm calls happen. The call itself is the trigger signal.
+    getUnprocessedObservations: vi.fn(() => Promise.resolve([])),
+    getEdgesForMemories: vi.fn(() => Promise.resolve([])),
+    findNearest: vi.fn(() => Promise.resolve([])),
+    getAllMemories: vi.fn(() => Promise.resolve([])),
+  } as unknown as CortexStore;
+}
+
+function makeManager(stores: Record<string, CortexStore>): NamespaceManager {
+  return {
+    getStore: vi.fn((ns?: string) => stores[ns ?? 'default']),
+    getConfig: vi.fn(() => ({
+      description: 'test',
+      cognitive_tools: [],
+      collections_prefix: '',
+      similarity_merge: 0.85,
+      similarity_link: 0.5,
+    })),
+    getNamespaceNames: vi.fn(() => Object.keys(stores)),
+    getDefaultNamespace: vi.fn(() => 'default'),
+  } as unknown as NamespaceManager;
+}
+
+const embed = { embed: vi.fn(() => Promise.resolve([1, 0, 0])) } as EmbedProvider;
+const llm = {
+  generate: vi.fn(() => Promise.resolve('')),
+  generateJSON: vi.fn(() => Promise.resolve({})),
+} as unknown as LLMProvider;
+
+async function settle(): Promise<void> {
+  await new Promise((resolve) => setTimeout(resolve, 0));
+}
+
+describe('SessionConsolidator', () => {
+  it('does not trigger below the threshold', async () => {
+    const store = makeMockStore();
+    const consolidator = new SessionConsolidator(makeManager({ default: store }), embed, llm);
+
+    for (let i = 0; i < AUTO_THRESHOLD - 1; i++) {
+      consolidator.notifyObservation('default');
+    }
+    await settle();
+
+    expect(store.getUnprocessedObservations).not.toHaveBeenCalled();
+  });
+
+  it('triggers Phase A exactly at the threshold', async () => {
+    const store = makeMockStore();
+    const consolidator = new SessionConsolidator(makeManager({ default: store }), embed, llm);
+
+    for (let i = 0; i < AUTO_THRESHOLD; i++) {
+      consolidator.notifyObservation('default');
+    }
+    await settle();
+
+    expect(store.getUnprocessedObservations).toHaveBeenCalledTimes(1);
+  });
+
+  it('resets the counter after triggering — next trigger needs a full batch', async () => {
+    const store = makeMockStore();
+    const consolidator = new SessionConsolidator(makeManager({ default: store }), embed, llm);
+
+    for (let i = 0; i < AUTO_THRESHOLD; i++) consolidator.notifyObservation('default');
+    await settle();
+    // A few more, below threshold — must not re-trigger
+    for (let i = 0; i < 3; i++) consolidator.notifyObservation('default');
+    await settle();
+
+    expect(store.getUnprocessedObservations).toHaveBeenCalledTimes(1);
+  });
+
+  it('tracks namespaces independently', async () => {
+    const storeA = makeMockStore();
+    const storeB = makeMockStore();
+    const consolidator = new SessionConsolidator(
+      makeManager({ a: storeA, b: storeB }), embed, llm,
+    );
+
+    for (let i = 0; i < AUTO_THRESHOLD; i++) consolidator.notifyObservation('a');
+    consolidator.notifyObservation('b');
+    await settle();
+
+    expect(storeA.getUnprocessedObservations).toHaveBeenCalledTimes(1);
+    expect(storeB.getUnprocessedObservations).not.toHaveBeenCalled();
+  });
+
+  it('flush() drains namespaces with pending observations', async () => {
+    const storeA = makeMockStore();
+    const storeB = makeMockStore();
+    const consolidator = new SessionConsolidator(
+      makeManager({ a: storeA, b: storeB }), embed, llm,
+    );
+
+    consolidator.notifyObservation('a'); // below threshold — pending
+    await consolidator.flush();
+
+    expect(storeA.getUnprocessedObservations).toHaveBeenCalledTimes(1);
+    expect(storeB.getUnprocessedObservations).not.toHaveBeenCalled();
+  });
+
+  it('survives store errors without throwing', async () => {
+    const store = {
+      getUnprocessedObservations: vi.fn(() => Promise.reject(new Error('boom'))),
+      getEdgesForMemories: vi.fn(() => Promise.resolve([])),
+      findNearest: vi.fn(() => Promise.resolve([])),
+      getAllMemories: vi.fn(() => Promise.resolve([])),
+    } as unknown as CortexStore;
+    const consolidator = new SessionConsolidator(makeManager({ default: store }), embed, llm);
+
+    for (let i = 0; i < AUTO_THRESHOLD; i++) consolidator.notifyObservation('default');
+    await expect(consolidator.flush()).resolves.toBeUndefined();
+  });
+});
diff --git a/src/engines/auto-consolidate.ts b/src/engines/auto-consolidate.ts
@@ -0,0 +1,96 @@
+/**
+ * SessionConsolidator — Hermes-inspired automatic memory extraction.
+ *
+ * Hermes Agent syncs conversation turns to memory after each response and
+ * extracts memories on session end. This module replicates that loop for
+ * cortex-engine:
+ *
+ *   - observe / wonder / speculate call notifyObservation() after every write.
+ *   - When pending count hits AUTO_THRESHOLD per namespace, dreamPhaseA
+ *     (NREM: cluster → refine → create) fires in the background without
+ *     blocking the tool call that triggered it.
+ *   - On process exit (SIGTERM / SIGINT), flush() runs dreamPhaseA across
+ *     all namespaces with unprocessed observations.
+ *
+ * dreamPhaseA is intentionally lightweight — no REM (edges, abstraction,
+ * FSRS scoring). Those still belong in the scheduled full `dream` cycle.
+ * The point is that raw observations do not sit unprocessed across session
+ * boundaries; they become searchable memories within the same session.
+ */
+
+import type { CortexStore } from '../core/store.js';
+import type { EmbedProvider } from '../core/embed.js';
+import type { LLMProvider } from '../core/llm.js';
+import type { NamespaceManager } from '../namespace/manager.js';
+import { dreamPhaseA } from './cognition.js';
+
+/** Number of new observations per namespace that trigger an auto-consolidation. */
+export const AUTO_THRESHOLD = 10;
+
+export class SessionConsolidator {
+  /** pending[namespace] = count of new observations since last auto-run */
+  private pending = new Map<string, number>();
+  /** running[namespace] = true while a background Phase A is in flight */
+  private running = new Set<string>();
+  private shuttingDown = false;
+
+  constructor(
+    private readonly namespaces: NamespaceManager,
+    private readonly embed: EmbedProvider,
+    private readonly llm: LLMProvider,
+  ) {}
+
+  /**
+   * Call this after every successful observation write. When the pending
+   * count crosses AUTO_THRESHOLD, schedules a background Phase A run.
+   */
+  notifyObservation(namespace: string): void {
+    const count = (this.pending.get(namespace) ?? 0) + 1;
+    this.pending.set(namespace, count);
+    if (count >= AUTO_THRESHOLD && !this.running.has(namespace)) {
+      this.runPhaseA(namespace);
+    }
+  }
+
+  /**
+   * Flush all namespaces — called on process exit. Awaitable so the
+   * exit handler can give it a chance to complete before the process dies.
+   */
+  async flush(): Promise<void> {
+    this.shuttingDown = true;
+    const namespaces = this.namespaces.getNamespaceNames();
+    await Promise.allSettled(
+      namespaces
+        .filter((ns) => (this.pending.get(ns) ?? 0) > 0)
+        .map((ns) => this.runPhaseA(ns, true)),
+    );
+  }
+
+  private runPhaseA(namespace: string, wait = false): Promise<void> {
+    this.running.add(namespace);
+    this.pending.set(namespace, 0);
+
+    const store: CortexStore = this.namespaces.getStore(namespace);
+    const nsConfig = this.namespaces.getConfig(namespace);
+
+    const work: Promise<void> = dreamPhaseA(store, this.embed, this.llm, {
+      observation_limit: 50,
+      similarity_merge: nsConfig.similarity_merge,
+      similarity_link: nsConfig.similarity_link,
+    }).then(() => {}).catch((err: unknown) => {
+      // Auto-consolidation is best-effort — never crash the serving process.
+      if (process.env['CORTEX_DEBUG']) {
+        process.stderr.write(`[auto-consolidate:${namespace}] ${String(err)}\n`);
+      }
+    }).finally(() => {
+      this.running.delete(namespace);
+      // If more observations arrived while we were running, re-trigger.
+      if (!this.shuttingDown && (this.pending.get(namespace) ?? 0) >= AUTO_THRESHOLD) {
+        void this.runPhaseA(namespace);
+      }
+    });
+
+    if (!wait) { void work; }
+    return wait ? work : Promise.resolve();
+  }
+}
diff --git a/src/engines/memory.ts b/src/engines/memory.ts
@@ -127,7 +127,7 @@ export async function hydeExpand(
  * Compute cosine similarity between two equal-length vectors.
  * Returns 0 if either vector is zero-length.
  */
-function cosineSimilarity(a: number[], b: number[]): number {
+export function cosineSimilarity(a: number[], b: number[]): number {
   let dot = 0, normA = 0, normB = 0;
   for (let i = 0; i < a.length; i++) {
     dot += a[i] * b[i];

diff --git a/src/mcp/server.ts b/src/mcp/server.ts
@@ -37,6 +37,7 @@ import type { EmbedProvider } from '../core/embed.js';
 import type { LLMProvider } from '../core/llm.js';
 import { createTools, CORE_TOOLS, composeMcpDescription } from './tools.js';
 import type { ToolContext, ToolDefinition } from './tools.js';
+import { SessionConsolidator } from '../engines/auto-consolidate.js';
 import { loadPlugins } from '../plugins/loader.js';
 
 // ─── Context Factory ──────────────────────────────────────────────────────────
@@ -103,7 +104,8 @@ export async function createContext(config: CortexConfig): Promise<EngineContext
   const allTools = [...coreTools, ...pluginTools];
 
   // 7. Build tool context (includes allTools for trigger/bridge pipelines)
-  const ctx: ToolContext = { namespaces, embed, llm, session, triggers, bridges, allTools };
+  const consolidator = new SessionConsolidator(namespaces, embed, llm);
+  const ctx: ToolContext = { namespaces, embed, llm, session, triggers, bridges, allTools, consolidator };
 
   // 7b. Wire federation if configured
   if (config.federation) {
@@ -140,6 +142,20 @@ export async function createContext(config: CortexConfig): Promise<EngineContext
     }
   }
 
+  // 7c. Flush pending observations to memory on shutdown.
+  // SIGTERM/SIGINT: flush then explicitly exit so the process doesn't
+  // terminate before the async flush completes (signal handlers return
+  // immediately; the pending promise alone is not enough to keep the
+  // process alive once stdio closes).
+  const consolidatorFlushAndExit = () => {
+    consolidator.flush().catch(() => {}).finally(() => process.exit(0));
+  };
+  process.once('SIGTERM', consolidatorFlushAndExit);
+  process.once('SIGINT', consolidatorFlushAndExit);
+  // beforeExit fires when the event loop is empty — the flush promise
+  // keeps it alive until complete, so no explicit exit call is needed.
+  process.once('beforeExit', () => { consolidator.flush().catch(() => {}); });
+
   // 8. Filter active tools by namespace config + core set
   const activeToolNames = namespaces.getActiveTools();
   for (const t of CORE_TOOLS) {