Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
82 changes: 82 additions & 0 deletions docs/hermes-audit.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,82 @@
# Systems Audit & Hermes Agent Research — June 2026

Audit of cortex-engine's storage and retrieval systems, cross-referenced
against [Hermes Agent](https://github.com/nousresearch/hermes-agent)
(Nous Research, MIT) — a comparable self-improving agent whose memory
subsystem solved several problems we had open.

## Audit findings (fixed)

| # | Finding | Severity | Fix |
|---|---------|----------|-----|
| 1 | `last_retrieval_score`, `last_hop_count`, `memory_origin` were silently dropped by the SQLite and Firestore backends. The dream pipeline's FSRS rating (`engines/cognition.ts`, score phase) reads these fields to boost/penalize review ratings — that feedback loop **never fired** on either production backend. | High | Persisted as real columns/fields in both backends, with `ALTER TABLE` migration shims for existing SQLite DBs. |
| 2 | Zero secondary indexes in the SQLite schema. Edge traversal (`getEdgesFrom`, `getEdgesForMemories`), unprocessed-observation fetches, recency queries, and belief history were all full table scans. | Medium | Six indexes added: `edges(source_id)`, `edges(target_id)`, `observations(processed, created_at)`, `memories(updated_at)`, `ops(created_at)`, `beliefs(concept_id)`. |
| 3 | Retrieval was embedding-only. Exact identifiers, proper nouns, and rare terms that embed poorly were unfindable even when stored verbatim. | Medium | FTS5 + hybrid recall (see below). |
| 4 | No mechanism for an agent to report that a retrieved memory was wrong. Bad memories stayed highly ranked until a dream-cycle hindsight review happened to catch them. | Medium | `feedback` tool (see below). |
| 5 | Observations recorded via `observe`/`wonder`/`speculate` sat unprocessed until someone ran `dream` manually or via cron. Sessions that ended before a dream cycle left knowledge stranded. | Medium | Auto-consolidation (see below). |

## Patterns borrowed from Hermes Agent

### 1. Holographic memory — FTS5 + asymmetric trust scoring

Hermes' Holographic provider pairs SQLite FTS5 full-text search with trust
scoring (+0.05 helpful / −0.10 unhelpful).

- **`searchText()`** on `CortexStore`: FTS5/BM25 on SQLite (external-content
table, trigger-synced, `recursive_triggers=ON` so upserts stay in sync);
weighted token-overlap fallback on JSON/Firestore.
- **Hybrid recall in `query`**: lexical hits are merged into the vector
candidate set and re-scored by cosine, so ranking semantics stay uniform.
Disable with `lexical: false`.
- **`feedback` tool**: asymmetric confidence adjustment. The asymmetry is
the point — one bad retrieval costs twice what one good retrieval earns,
so polluted memories decay out of top ranks quickly. Events log to
`feedback_log` for correlation with `retrieval_audit` traces.

### 2. Automatic memory extraction (session sync)

Hermes syncs turns to memory after each response and extracts on session
end. `SessionConsolidator` (`engines/auto-consolidate.ts`):

- `observe`/`wonder`/`speculate` notify it after every write.
- At 10 pending observations per namespace, `dreamPhaseA` (NREM only:
cluster → refine → create) runs in the background — non-blocking,
best-effort, re-triggers if more arrive mid-run.
- `SIGTERM`/`SIGINT`/`beforeExit` flush all pending namespaces.
- REM phases (edges, abstraction, FSRS scoring, hindsight) intentionally
stay in the scheduled `dream` cycle — they are LLM-heavy.

### 3. Tiered context loading (L0 → L1 → L2)

Hermes' OpenViking provider loads context progressively (~100 tokens →
~2k → full). The `context` tool mirrors this:

- **L0** (~100 tokens): top-3 by salience × FSRS retrievability, names +
80-char snippets. One vector search, no LLM call. For per-turn
system-prompt injection.
- **L1** (~2k tokens): semantic top-15 with definitions, tags, one-hop
edges. Mid-conversation working-memory refresh.
- **L2** (full): multi-anchor retrieval (Borda count over 4 query
reformulations) + 2-hop spreading activation + full metadata
(provenance, FSRS state, activation paths). Deep research.

## Follow-up work (done)

- **Embedding storage format** (June 2026): SQLite now stores embeddings as
raw `Float32Array` blobs (~4× smaller, parse-free reads). Legacy JSON-text
rows are converted in place when the store is opened — idempotent, only
text-typed rows are touched. Embeddings are float32-truncated on write, so
cross-backend comparisons (e.g. `verifyMigration` for json→sqlite) compare
at float32 precision via `Math.fround`.

## Known gaps (deliberately not addressed)

- **Brute-force ANN**: `findNearest` on SQLite scans every row. Documented
as fine below 10k memories; beyond that, consider `sqlite-vec` or an HNSW
sidecar.
- **Generic-collection queries**: `query()` on SQLite loads the entire
collection and filters in JS. Acceptable for current collection sizes
(threads, journal, vitals); revisit if any collection grows unbounded —
`feedback_log` is the most likely candidate.
- **Firestore `searchText`** falls back to a full-collection scan. Swap in
an external search index if cloud deployments grow.
6 changes: 5 additions & 1 deletion src/bin/migrate-cmd.ts
Original file line number Diff line number Diff line change
Expand Up @@ -737,8 +737,12 @@ function deepEqualJson(a: unknown, b: unknown): boolean {
}

function jsonNormalize(value: unknown): string {
return JSON.stringify(value, (_key, v) => {
return JSON.stringify(value, (key, v) => {
if (v instanceof Date) return v.toISOString();
// SQLite stores embeddings as float32 blobs while JSON keeps full
// float64, so compare embeddings at float32 precision — otherwise a
// json→sqlite migration reports value diffs on every sampled memory.
if (key === 'embedding' && Array.isArray(v)) return v.map(Math.fround);
return v;
}, 0);
}
Expand Down
10 changes: 10 additions & 0 deletions src/core/store.ts
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,16 @@ export interface CortexStore {
/** Find k nearest memories by embedding vector. Returns sorted by similarity desc. */
findNearest(embedding: number[], limit: number): Promise<SearchResult[]>;

/**
* Lexical full-text search over memory name/definition/tags. Complements
* findNearest: catches exact-keyword matches that embeddings miss (IDs,
* proper nouns, rare terms). SQLite uses FTS5/BM25; JSON and Firestore
* fall back to token-overlap scoring. Scores are normalized to 0-1 but are
* NOT comparable to cosine similarity — rank order is the contract.
* Faded memories are excluded. An empty or stopword-only query returns [].
*/
searchText(text: string, limit: number): Promise<SearchResult[]>;

/** Increment access_count, update last_accessed and FSRS fields. */
touchMemory(id: string, fsrsUpdates: Partial<FSRSData>): Promise<void>;

Expand Down
127 changes: 127 additions & 0 deletions src/engines/auto-consolidate.test.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,127 @@
/**
* Tests for SessionConsolidator — threshold-triggered background Phase A.
*/

import { describe, it, expect, vi } from 'vitest';
import { SessionConsolidator, AUTO_THRESHOLD } from './auto-consolidate.js';
import type { CortexStore } from '../core/store.js';
import type { NamespaceManager } from '../namespace/manager.js';
import type { EmbedProvider } from '../core/embed.js';
import type { LLMProvider } from '../core/llm.js';

function makeMockStore(): CortexStore {
return {
// Phase A entry point — empty result short-circuits cluster/refine/create
// so no embed/llm calls happen. The call itself is the trigger signal.
getUnprocessedObservations: vi.fn(() => Promise.resolve([])),
getEdgesForMemories: vi.fn(() => Promise.resolve([])),
findNearest: vi.fn(() => Promise.resolve([])),
getAllMemories: vi.fn(() => Promise.resolve([])),
} as unknown as CortexStore;
}

function makeManager(stores: Record<string, CortexStore>): NamespaceManager {
return {
getStore: vi.fn((ns?: string) => stores[ns ?? 'default']),
getConfig: vi.fn(() => ({
description: 'test',
cognitive_tools: [],
collections_prefix: '',
similarity_merge: 0.85,
similarity_link: 0.5,
})),
getNamespaceNames: vi.fn(() => Object.keys(stores)),
getDefaultNamespace: vi.fn(() => 'default'),
} as unknown as NamespaceManager;
}

const embed = { embed: vi.fn(() => Promise.resolve([1, 0, 0])) } as EmbedProvider;
const llm = {
generate: vi.fn(() => Promise.resolve('')),
generateJSON: vi.fn(() => Promise.resolve({})),
} as unknown as LLMProvider;

async function settle(): Promise<void> {
await new Promise((resolve) => setTimeout(resolve, 0));
}

describe('SessionConsolidator', () => {
it('does not trigger below the threshold', async () => {
const store = makeMockStore();
const consolidator = new SessionConsolidator(makeManager({ default: store }), embed, llm);

for (let i = 0; i < AUTO_THRESHOLD - 1; i++) {
consolidator.notifyObservation('default');
}
await settle();

expect(store.getUnprocessedObservations).not.toHaveBeenCalled();
});

it('triggers Phase A exactly at the threshold', async () => {
const store = makeMockStore();
const consolidator = new SessionConsolidator(makeManager({ default: store }), embed, llm);

for (let i = 0; i < AUTO_THRESHOLD; i++) {
consolidator.notifyObservation('default');
}
await settle();

expect(store.getUnprocessedObservations).toHaveBeenCalledTimes(1);
});

it('resets the counter after triggering — next trigger needs a full batch', async () => {
const store = makeMockStore();
const consolidator = new SessionConsolidator(makeManager({ default: store }), embed, llm);

for (let i = 0; i < AUTO_THRESHOLD; i++) consolidator.notifyObservation('default');
await settle();
// A few more, below threshold — must not re-trigger
for (let i = 0; i < 3; i++) consolidator.notifyObservation('default');
await settle();

expect(store.getUnprocessedObservations).toHaveBeenCalledTimes(1);
});

it('tracks namespaces independently', async () => {
const storeA = makeMockStore();
const storeB = makeMockStore();
const consolidator = new SessionConsolidator(
makeManager({ a: storeA, b: storeB }), embed, llm,
);

for (let i = 0; i < AUTO_THRESHOLD; i++) consolidator.notifyObservation('a');
consolidator.notifyObservation('b');
await settle();

expect(storeA.getUnprocessedObservations).toHaveBeenCalledTimes(1);
expect(storeB.getUnprocessedObservations).not.toHaveBeenCalled();
});

it('flush() drains namespaces with pending observations', async () => {
const storeA = makeMockStore();
const storeB = makeMockStore();
const consolidator = new SessionConsolidator(
makeManager({ a: storeA, b: storeB }), embed, llm,
);

consolidator.notifyObservation('a'); // below threshold — pending
await consolidator.flush();

expect(storeA.getUnprocessedObservations).toHaveBeenCalledTimes(1);
expect(storeB.getUnprocessedObservations).not.toHaveBeenCalled();
});

it('survives store errors without throwing', async () => {
const store = {
getUnprocessedObservations: vi.fn(() => Promise.reject(new Error('boom'))),
getEdgesForMemories: vi.fn(() => Promise.resolve([])),
findNearest: vi.fn(() => Promise.resolve([])),
getAllMemories: vi.fn(() => Promise.resolve([])),
} as unknown as CortexStore;
const consolidator = new SessionConsolidator(makeManager({ default: store }), embed, llm);

for (let i = 0; i < AUTO_THRESHOLD; i++) consolidator.notifyObservation('default');
await expect(consolidator.flush()).resolves.toBeUndefined();
});
});
96 changes: 96 additions & 0 deletions src/engines/auto-consolidate.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
/**
* SessionConsolidator — Hermes-inspired automatic memory extraction.
*
* Hermes Agent syncs conversation turns to memory after each response and
* extracts memories on session end. This module replicates that loop for
* cortex-engine:
*
* - observe / wonder / speculate call notifyObservation() after every write.
* - When pending count hits AUTO_THRESHOLD per namespace, dreamPhaseA
* (NREM: cluster → refine → create) fires in the background without
* blocking the tool call that triggered it.
* - On process exit (SIGTERM / SIGINT), flush() runs dreamPhaseA across
* all namespaces with unprocessed observations.
*
* dreamPhaseA is intentionally lightweight — no REM (edges, abstraction,
* FSRS scoring). Those still belong in the scheduled full `dream` cycle.
* The point is that raw observations do not sit unprocessed across session
* boundaries; they become searchable memories within the same session.
*/

import type { CortexStore } from '../core/store.js';
import type { EmbedProvider } from '../core/embed.js';
import type { LLMProvider } from '../core/llm.js';
import type { NamespaceManager } from '../namespace/manager.js';
import { dreamPhaseA } from './cognition.js';

/** Number of new observations per namespace that trigger an auto-consolidation. */
export const AUTO_THRESHOLD = 10;

export class SessionConsolidator {
/** pending[namespace] = count of new observations since last auto-run */
private pending = new Map<string, number>();
/** running[namespace] = true while a background Phase A is in flight */
private running = new Set<string>();
private shuttingDown = false;

constructor(
private readonly namespaces: NamespaceManager,
private readonly embed: EmbedProvider,
private readonly llm: LLMProvider,
) {}

/**
* Call this after every successful observation write. When the pending
* count crosses AUTO_THRESHOLD, schedules a background Phase A run.
*/
notifyObservation(namespace: string): void {
const count = (this.pending.get(namespace) ?? 0) + 1;
this.pending.set(namespace, count);
if (count >= AUTO_THRESHOLD && !this.running.has(namespace)) {
this.runPhaseA(namespace);
}
}

/**
* Flush all namespaces — called on process exit. Awaitable so the
* exit handler can give it a chance to complete before the process dies.
*/
async flush(): Promise<void> {
this.shuttingDown = true;
const namespaces = this.namespaces.getNamespaceNames();
await Promise.allSettled(
namespaces
.filter((ns) => (this.pending.get(ns) ?? 0) > 0)
.map((ns) => this.runPhaseA(ns, true)),
);
}

private runPhaseA(namespace: string, wait = false): Promise<void> {
this.running.add(namespace);
this.pending.set(namespace, 0);

const store: CortexStore = this.namespaces.getStore(namespace);
const nsConfig = this.namespaces.getConfig(namespace);

const work: Promise<void> = dreamPhaseA(store, this.embed, this.llm, {
observation_limit: 50,
similarity_merge: nsConfig.similarity_merge,
similarity_link: nsConfig.similarity_link,
}).then(() => {}).catch((err: unknown) => {
// Auto-consolidation is best-effort — never crash the serving process.
if (process.env['CORTEX_DEBUG']) {
process.stderr.write(`[auto-consolidate:${namespace}] ${String(err)}\n`);
}
}).finally(() => {
this.running.delete(namespace);
// If more observations arrived while we were running, re-trigger.
if (!this.shuttingDown && (this.pending.get(namespace) ?? 0) >= AUTO_THRESHOLD) {
void this.runPhaseA(namespace);
}
});

if (!wait) { void work; }
return wait ? work : Promise.resolve();
}
}
2 changes: 1 addition & 1 deletion src/engines/memory.ts
Original file line number Diff line number Diff line change
Expand Up @@ -127,7 +127,7 @@ export async function hydeExpand(
* Compute cosine similarity between two equal-length vectors.
* Returns 0 if either vector is zero-length.
*/
function cosineSimilarity(a: number[], b: number[]): number {
export function cosineSimilarity(a: number[], b: number[]): number {
let dot = 0, normA = 0, normB = 0;
for (let i = 0; i < a.length; i++) {
dot += a[i] * b[i];
Expand Down
18 changes: 17 additions & 1 deletion src/mcp/server.ts
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ import type { EmbedProvider } from '../core/embed.js';
import type { LLMProvider } from '../core/llm.js';
import { createTools, CORE_TOOLS, composeMcpDescription } from './tools.js';
import type { ToolContext, ToolDefinition } from './tools.js';
import { SessionConsolidator } from '../engines/auto-consolidate.js';
import { loadPlugins } from '../plugins/loader.js';

// ─── Context Factory ──────────────────────────────────────────────────────────
Expand Down Expand Up @@ -103,7 +104,8 @@ export async function createContext(config: CortexConfig): Promise<EngineContext
const allTools = [...coreTools, ...pluginTools];

// 7. Build tool context (includes allTools for trigger/bridge pipelines)
const ctx: ToolContext = { namespaces, embed, llm, session, triggers, bridges, allTools };
const consolidator = new SessionConsolidator(namespaces, embed, llm);
const ctx: ToolContext = { namespaces, embed, llm, session, triggers, bridges, allTools, consolidator };

// 7b. Wire federation if configured
if (config.federation) {
Expand Down Expand Up @@ -140,6 +142,20 @@ export async function createContext(config: CortexConfig): Promise<EngineContext
}
}

// 7c. Flush pending observations to memory on shutdown.
// SIGTERM/SIGINT: flush then explicitly exit so the process doesn't
// terminate before the async flush completes (signal handlers return
// immediately; the pending promise alone is not enough to keep the
// process alive once stdio closes).
const consolidatorFlushAndExit = () => {
consolidator.flush().catch(() => {}).finally(() => process.exit(0));
};
process.once('SIGTERM', consolidatorFlushAndExit);
process.once('SIGINT', consolidatorFlushAndExit);
// beforeExit fires when the event loop is empty — the flush promise
// keeps it alive until complete, so no explicit exit call is needed.
process.once('beforeExit', () => { consolidator.flush().catch(() => {}); });

// 8. Filter active tools by namespace config + core set
const activeToolNames = namespaces.getActiveTools();
for (const t of CORE_TOOLS) {
Expand Down
Loading
Loading