GitHub - cdeust/Cortex: Persistent memory for Claude Code — 41 neuroscience papers, 26 biological mechanisms with paper-bearing per-mechanism ablation evidence (E1 v3). LongMemEval R@10 98.4% / MRR 0.9124 (n=500). LoCoMo R@10 94.2% / MRR 0.8278 (n=1986). BEAM-10M +33.4% over flat retrieval. PostgreSQL + pgvector. Verified via 31-row two-benchmark ablation campaign.

Getting Started · Write Papers · What It Feels Like · Benchmarks · Science · Views

Companion projects:
cortex-beam-abstain — community-trained retrieval abstention model for RAG systems
zetetic-team-subagents — specialist Claude Code agents Cortex orchestrates with
automatised-pipeline — automated 11-stage pipeline (findings → PRs) that Cortex drives via run_pipeline
prd-spec-generator — stateless reducer that turns a feature description into a 9-file PRD (consumes Cortex memory + the pipeline's graph intel)

Claude Code forgets you every time you close the tab. Every architecture decision you explained. Every debugging session where you traced a bug through four layers of abstraction. Every "remember, we decided to use event sourcing, not CRUD" correction. Gone. Next session, you're a stranger to your own tools.

Cortex is a persistent memory engine for Claude Code built on computational neuroscience. It remembers what you worked on, how you think, what you decided and why. Not as a dumb text dump shoved into context, but as a living memory system that consolidates, forgets intelligently, and reconstructs the right context at the right time.

26 biological mechanisms. 49 MCP tools. 9 automatic hooks. Autonomous per-project wiki — 15 canonical scopes × 13 file sections, authored by a headless claude -p worker on a 6-hour cycle without you. Runs entirely on your machine. PostgreSQL + pgvector.

v3.17.0 — autonomous wiki curation: the wiki is no longer a memory dump but a self-curating per-project knowledge base. SessionStart auto-spawns a background consolidate every 6 hours; a headless authoring worker reads the curation-gap queue, calls codebase-intelligence MCP tools (codebase_context, codebase_impact, codebase_query) to ground the explanation in the actual call graph, then writes each missing section via claude -p (your existing credentials, no API key). Missing anchor pages — architecture / services / api / data-flow / operations / decisions / PRD — are authored from the source tree on demand. Every file-doc page exposes its remaining gaps inline with descriptions of what should go in each. Per-project wiki/_dashboards/<project>.md shows slot-fill rate every cycle. [[wiki/path]] cross-links resolve; mermaid diagrams have a 🔍 lens with zoom + pan. Release notes →

v3.15.0 — verification campaign + arXiv-ready papers: 45 per-mechanism ablation rows across LongMemEval-S (17 rows, n=500) and LoCoMo (14 rows × 2 sweeps, n=1986). Headline numbers stay verified — LongMemEval R@10 = 98.4% / MRR = 0.9124, LoCoMo R@10 = 94.3% / MRR = 0.8279 — and every figure now traces to a JSON in benchmarks/results/ablation/ with code SHAs, dirty flags, and per-row category breakdowns. The thermodynamic memory paper (docs/arxiv-thermodynamic/main.pdf, 30 pages, all 45 citations resolved) and the structured context assembly paper (docs/arxiv-context-assembly/main.pdf, 37 pages) are arXiv-ready. Two production fixes surfaced during verification: consolidation cadence is now ingest-relative instead of wall-clock (recovers MRR 0.222 → 0.8264 on backdated corpora), and the plasticity ablation no-op preserves the result-shape contract (no more silent KeyError). HOPFIELD, HDC, SPREADING_ACTIVATION, DENDRITIC_CLUSTERS, EMOTIONAL_RETRIEVAL, MOOD_CONGRUENT_RERANK, and RECONSOLIDATION are now wired end-to-end on the production read path; 23 mechanisms have CORTEX_ABLATE_ hooks reading at the hot path. BEAM-10M LLM head-to-head harness scaffolded at benchmarks/llm_head_to_head/. Release notes →

v3.14.2 — call graph lit + queryable: the workflow graph now renders the actual call and import edges between symbols — not just the AST shells. Every edge carries a confidence (0.0–1.0) and a reason tag (direct-ast, import-scope-lookup, memory-entities-link, …) so you can tell a resolved call from a same-name guess at a glance. Knowledge-graph entities ship as a first-class layer: ~10k entities extracted from memory text land between the memory ring and the file shell, heat-weighted centroid-placed near the memories that mention them. And a new query_workflow_graph MCP tool returns typed subgraphs on demand — filter by node_kind, edge_kind, neighbour_of <id> + depth, or domain, so downstream agents can reason over graph slices without rebuilding from scratch.

v3.14.0 — neural graph & AST integration: the workflow graph reveals itself one layer at a time — first your projects, then their tools, then the files those tools touched, then the code itself (functions, methods, classes) parsed from 10 languages (Rust, Python, TypeScript, Java, Kotlin, Swift, Objective-C, C, C++, Go) via the automatised-pipeline Rust AST backend. A symbol that is imported by two projects literally sits in the space between those two projects on the map, so the picture of what connects to what is the picture of your codebase. Each project is indexed once and cached on disk; reopening the graph hydrates in milliseconds, and only projects whose source actually changed are re-read. Click any node — a file, a function, a command — and the side panel lists the named things it is connected to (callers, imports, the files that used it) instead of a bare count. Release notes →

Getting Started

The plugin marketplace is the only supported install path (ADR-0050):

claude plugin marketplace add cdeust/Cortex
claude plugin install cortex

PyPI / pip install neuro-cortex-memory is deprecated. It is kept best-effort for legacy users only and may lag the marketplace or be removed. Versions 3.14.6 and 3.14.7 on PyPI are affected by GHSA-gvpp-v77h-5w8g (local ACE, CVSS 7.8) — do not use them; install via the marketplace instead.

Restart your Claude Code session, then run:

/cortex-setup-project

This handles everything: PostgreSQL + pgvector installation, database creation, embedding model download, cognitive profile building from session history, codebase seeding, conversation import, and hook registration. Zero manual steps.

After install, verify everything is wired correctly:

python3 -m mcp_server.doctor

(or, from inside the marketplace clone: python3 ~/.claude/plugins/cache/cortex-plugins/cortex/*/mcp_server/doctor.py)

Seven checks in two seconds: Python, PG driver, DATABASE_URL, PG connection, extensions, writable methodology dir, I10 pool-capacity invariant. Exit 0 means ready.

Using Claude Cowork? Install Cortex-cowork instead — uses SQLite, no PostgreSQL required.

More options (Clone, Docker, Manual setup)

Clone + setup script:

git clone https://github.com/cdeust/Cortex.git && cd Cortex
bash scripts/setup.sh        # macOS / Linux
python3 scripts/setup.py     # Windows / cross-platform

Docker:

git clone https://github.com/cdeust/Cortex.git && cd Cortex
docker build -t cortex-runtime -f docker/Dockerfile .
docker run -it \
  -v "$(pwd)":/workspace \
  -v cortex-pgdata:/var/lib/postgresql/17/data \
  -v ~/.claude:/home/cortex/.claude-host:ro \
  cortex-runtime

Manual: See detailed manual setup instructions.

Write papers in Cortex

Cortex doesn't just remember — it authors. Every memory that passes the pipeline AND every page the autonomous worker drafts (architecture, services, API, decisions, ADRs as task-records …) becomes a structured wiki page, editable in place with a full scientific writing environment:

CodeMirror 6 inline editor with live preview — split pane, syntax-highlighted markdown source on the left, fully-rendered article on the right. Save round-trips atomically to the .md file on disk (git-diffable).
Structured frontmatter — every page declares kind / domain / scope / status / authored_by / provenance / created / updated / last_reviewed. The autonomous headless worker writes authored_by: headless-authoring-worker + provenance: auto-authored; human-edited pages flip these. The frontmatter is real metadata: the coverage audit, the dashboards, and the wiki view all read it.
[[wiki/path]] cross-references — rendered as clickable internal links; bare slugs route to filtered search. Backlinks footer shows every page that points at this one.
Mermaid diagrams with a lens — mermaid fenced blocks render inline with a 🔍 button that opens a viewport-sized viewer with mouse-wheel zoom + drag-pan + keyboard shortcuts (+ / − / 0 / Esc). Edge labels get a dark pill background; light-fill nodes get dark text — readable on both light and dark theme.
LaTeX math — $E=\nabla \cdot F$ and $$…$$ blocks rendered live via KaTeX.
BibTeX citations — drop .bib files under wiki/_bibliography/, use [@friston2010] inline, and Citation.js resolves them to (Friston 2010) with an auto-generated APA bibliography.
Figure / equation / table auto-numbering — {#fig:arch} labels, {@fig:arch} cross-refs, resolved to Figure 1 / Equation 3 / Section 2.1.
Curation-gap banner: file-doc pages display a yellow banner naming which of their 13 canonical sections (Purpose, Public API, Dependencies, Callers, How it works, Invariants, What can go wrong, Tests, Sequence diagram, Parameters, Request example, Response example, See also) are still missing and what should go in each. The autonomous worker drains the queue cycle by cycle; the banner shrinks visibly.
Pandoc export — one click produces PDF (via LaTeX), TEX, DOCX, or HTML. Journal-submittable from the same markdown that feeds the memory pipeline.

The source stays markdown. Your .md files remain grep-able, diffable, and interoperable with any external tool. Cortex adds a rendering + editing + export layer on top without stealing your content into a proprietary format — and the autonomous worker means most pages already have a substantive draft waiting for you when you open them.

What this actually feels like

Monday. You spend an hour debugging a webhook handler. After tracing through four layers, you find the root cause: a race condition in the Redis session store where TTL expiry can fire between the auth check and the permission lookup. You discuss the fix with Claude, decide on an approach, and implement it. Session ends.

Thursday. Different project, but a user reports intermittent logouts. You open Claude Code. Before you even describe the bug, Cortex has already injected three memories: Monday's race condition analysis, a decision from two weeks ago to use Redis for all session state, and a lesson from an older session about TTL edge cases in distributed caches.

Claude doesn't just have your conversation history. It has context. It connects the current problem to past decisions, surfaces lessons you forgot you learned, and skips the part where you re-explain your entire architecture.

Three weeks later. Those individual debugging sessions have been consolidated into a general pattern: "authentication edge cases involving TTL-based caches." The specific Redis commands compressed to a summary. The debugging steps faded. The principle survived. Your next auth issue starts with institutional knowledge, not a blank page.

That's the difference. Not "here's what you said last time." Real recall — the kind where your tools understand the shape of what you've been building.

Retrieval that actually works

We tested Cortex against three published benchmarks. All scores are retrieval-only — no LLM reader in the evaluation loop. We measure whether the right memory shows up, not whether a model can generate a good answer from it.
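For readers less familiar with the two metrics reported below, here is a minimal sketch of how Recall@10 and MRR are typically computed from ranked results. It is not the benchmark harness itself: the function names and the "any relevant hit counts" reading of recall are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Sequence


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: set[str], k: int = 10) -> float:
    """1.0 if any relevant memory appears in the top k results, else 0.0."""
    return 1.0 if any(mid in relevant_ids for mid in ranked_ids[:k]) else 0.0


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: set[str]) -> float:
    """1 / rank of the first relevant memory, or 0.0 if none was retrieved."""
    for rank, mid in enumerate(ranked_ids, start=1):
        if mid in relevant_ids:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list[tuple[Sequence[str], set[str]]]) -> dict[str, float]:
    """runs holds one (ranked_ids, relevant_ids) pair per question."""
    n = len(runs)
    return {
        "recall@10": sum(recall_at_k(r, rel) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }
```

An MRR of 1.0 means the right memory is always ranked first; 0.5 means it sits at rank two on average.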

LongMemEval — can you find a fact from 40 sessions ago?

LongMemEval (Wu et al., ICLR 2025): 500 human-curated questions embedded in ~40 sessions of conversation history (~115k tokens). The paper's best retrieval hit 78.4% Recall@10.

Metric	Cortex	What it means
Recall@10	98.4%	The right memory shows up in the top 10 results for nearly every question
MRR	0.9124	The correct answer is usually the first or second result

Category	MRR	R@10	Why this score
Single-session (assistant)	1.000	100.0%	Verbatim assistant responses are easy to match
Multi-session reasoning	0.962	100.0%	Entity graph connects evidence across sessions
Knowledge updates	0.925	100.0%	Heat decay naturally surfaces the newest version of a fact
Temporal reasoning	0.926	98.5%	Time anchors embedded directly in memory content
Single-session (user)	0.814	94.3%	User phrasing varies more than assistant responses
Single-session (preference)	0.668	93.3%	Preferences are implicit — harder to retrieve by keyword

Knowledge updates reach 100% R@10 because heat-based decay naturally pushes newer information above older versions of the same fact. This wasn't designed for the benchmark. It's just how the thermodynamic model works.

LoCoMo — can you handle trick questions and multi-hop reasoning?

LoCoMo (Maharana et al., ACL 2024): 1,986 questions across 10 conversations, including adversarial trick questions designed to confuse retrieval, multi-hop queries requiring evidence from multiple turns, and temporal reasoning about when things happened.

Metric	Cortex	What it means
Recall@10	94.3%	Right memory in top 10 over 9 times out of 10 (n=1986, BASELINE_NO_CONSOLIDATION, post-plasticity-fix)
MRR	0.8279	Correct answer is typically the first result

Category	MRR	R@10	Why this score
Adversarial	0.881	96.0%	Trick questions can't fool five fused signals
Open-domain	0.875	96.9%	Broad questions benefit from multi-signal coverage
Multi-hop	0.779	90.3%	Entity graph connects evidence across turns
Single-hop	0.741	94.0%	Direct factual questions — strong but room to improve
Temporal	0.577	78.3%	"When did X happen?" is the hardest category — needs better time-series matching

No LLM at query time. No API calls. Just a 22MB embedding model, PostgreSQL with pgvector, and neuroscience algorithms doing the heavy lifting. Five retrieval signals fused server-side (vector similarity, full-text search, trigram matching, thermodynamic heat, recency), then reranked by a cross-encoder.
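To make "five signals fused" concrete, here is a small sketch of weighted reciprocal rank fusion. It assumes the fusion is rank-based (the benchmark tables label the baseline "WRRF", which this sketch reads as weighted RRF; that reading is an assumption). The weights, the constant `k = 60`, and the function name are hypothetical. The real fusion runs server-side in PostgreSQL, and the cross-encoder rerank is not shown.

```python
from __future__ import annotations

from collections import defaultdict


def weighted_rrf(
    rankings: dict[str, list[str]],
    weights: dict[str, float],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse per-signal rankings into one list, best first.

    rankings maps a signal name (e.g. "vector", "fts") to memory ids ordered
    best-first. Each appearance contributes weight / (k + rank) to the score.
    """
    scores: dict[str, float] = defaultdict(float)
    for signal, ranked_ids in rankings.items():
        w = weights.get(signal, 1.0)  # unlisted signals default to weight 1.0
        for rank, memory_id in enumerate(ranked_ids, start=1):
            scores[memory_id] += w / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A memory that ranks well on several signals beats one that tops a single list, which is the intuition behind the adversarial-category result above.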

BEAM — 10 million tokens of conversation, one memory system

BEAM (Tavakoli et al., ICLR 2026) is the hardest long-term memory benchmark published. 10 conversations, each spanning 10 million tokens. 200 probing questions across 10 memory abilities, including three that no prior benchmark tests: contradiction resolution, event ordering, and instruction following.

Every system in the paper collapses at this scale. The best result reported (LIGHT on Llama-4-Maverick) scores 0.266. Context-window approaches can't fit it. Standard RAG drowns in noise.

Split	WRRF baseline	With Context Assembler	What happened
BEAM-100K	0.591	0.602	Flat search still works at small scale
BEAM-10M	0.353	0.471 (+33.4%)	Structured assembly dominates when flat search drowns

BEAM-10M per-ability breakdown (Temporal Context Assembler — no oracle labels, timestamps only):

Ability	MRR	R@10	Δ vs WRRF	What happened
knowledge_update	0.950	100.0%	+0.115	Day-level grouping keeps knowledge updates tighter than topic labels
contradiction_resolution	0.892	95.0%	+0.259	Temporal proximity catches contradictions better than topic boundaries
information_extraction	0.592	75.0%	+0.144	Same-day memories cluster the right facts
preference_following	0.508	60.0%	+0.096	Preferences cluster by time, not topic
abstention	0.600	60.0%	+0.500	Temporal scoping correctly empties irrelevant stages
temporal_reasoning	0.460	50.0%	+0.090	Time anchors naturally align with temporal stages
multi_session_reasoning	0.425	60.0%	+0.010	Cross-day bridging via entity graph — marginal gain
instruction_following	0.150	15.0%	+0.082	Instructions still look like normal questions
summarization	0.083	11.1%	−0.103	Temporal scoping too narrow for broad summary queries
event_ordering	0.050	5.0%	−0.017	Chronological sequencing needs more than retrieval

Eight of ten abilities improve. The key finding: temporal day-level partitioning outperforms BEAM's ground-truth topic labels (0.471 vs 0.429 with oracle plan_id). This was not predicted — it means temporal proximity is a stronger stage signal than topic boundaries for conversational memory, and the architecture generalizes without any oracle metadata.
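The headline figures can be re-derived from the two tables above. The short check below copies the published numbers verbatim. One observation, not a claim about the benchmark script: the unweighted mean of the ten per-ability MRRs reproduces the 0.471 headline, so the sketch assumes a macro-average.

```python
# (ability, MRR with assembler, delta vs WRRF), copied from the table above
ABILITIES = [
    ("knowledge_update", 0.950, +0.115),
    ("contradiction_resolution", 0.892, +0.259),
    ("information_extraction", 0.592, +0.144),
    ("preference_following", 0.508, +0.096),
    ("abstention", 0.600, +0.500),
    ("temporal_reasoning", 0.460, +0.090),
    ("multi_session_reasoning", 0.425, +0.010),
    ("instruction_following", 0.150, +0.082),
    ("summarization", 0.083, -0.103),
    ("event_ordering", 0.050, -0.017),
]

# Macro-average over abilities (assumed aggregation)
mean_mrr = sum(mrr for _, mrr, _ in ABILITIES) / len(ABILITIES)

# Number of abilities whose MRR went up relative to the WRRF baseline
improved = sum(1 for _, _, delta in ABILITIES if delta > 0)

# Relative gain of the assembler over the WRRF baseline on BEAM-10M
relative_gain = (0.471 - 0.353) / 0.353

print(f"mean MRR {mean_mrr:.3f}, improved {improved}/10, gain {relative_gain:+.1%}")
```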

At 10 million tokens per conversation, you have ~7,500 memories that all look similar to a vector search engine. The Structured Context Assembly architecture fixes this by breaking the conversation into stages (distinct topics or, in the temporal variant above, days), retrieving within the current stage first, following entity graph connections to related stages, and falling back to summaries for everything else.
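The three-pass idea (current stage, then linked stages, then summaries) can be sketched as a budgeted fill. This is an illustration under stated assumptions, not the shipped assembler: the `Memory` fields, the token budget, and the `assemble_context` signature are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Memory:
    id: str
    stage: str    # stage label, e.g. a day or a topic
    score: float  # relevance to the query from any retriever
    tokens: int


def assemble_context(
    memories: list[Memory],
    current_stage: str,
    linked_stages: set[str],
    summaries: dict[str, tuple[str, int]],
    budget: int,
) -> list[str]:
    """Fill a token budget: current stage, entity-linked stages, then summaries."""
    picked: list[str] = []
    used = 0

    def take(candidates: list[Memory]) -> None:
        nonlocal used
        for m in sorted(candidates, key=lambda m: m.score, reverse=True):
            if used + m.tokens <= budget:
                picked.append(m.id)
                used += m.tokens

    take([m for m in memories if m.stage == current_stage])
    take([m for m in memories if m.stage in linked_stages])

    # Everything else is represented only by its stage summary
    covered = {current_stage, *linked_stages}
    for stage, (summary_id, tokens) in summaries.items():
        if stage not in covered and used + tokens <= budget:
            picked.append(summary_id)
            used += tokens
    return picked
```

Note that a high-scoring memory from an unrelated stage never enters as full text; it can only appear through its summary. That is what keeps look-alike memories from flooding the context.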

This architecture was originally designed in September 2025 for generating coherent 9-page PRDs on Apple Intelligence's 4096-token context window (ai-prd-builder, commit 462de01 — one month before the BEAM paper existed). It works because the problem is the same at both scales: you can't fit everything in context, so you need to be smart about what goes in.

Honest caveat: BEAM doesn't define a retrieval MRR metric — the paper uses LLM-as-judge nugget scoring. Our "MRR" is a retrieval proxy (rank of first substring-matching memory). The paper's "LIGHT" scores are end-to-end QA, shown for directional reference.
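The proxy described in the caveat can be written in a few lines. This sketch assumes case-insensitive substring matching against a single gold answer string; the exact matching rule used by the benchmark script is not stated here, so treat the details as an assumption.

```python
from __future__ import annotations


def substring_reciprocal_rank(ranked_texts: list[str], gold_answer: str) -> float:
    """1 / rank of the first retrieved memory containing the gold answer text."""
    needle = gold_answer.casefold()
    for rank, text in enumerate(ranked_texts, start=1):
        if needle in text.casefold():
            return 1.0 / rank
    return 0.0  # nothing retrieved contains the answer
```

Because this only checks that the answer text was retrieved, it says nothing about whether a reader model would answer correctly, which is why it is not comparable to the paper's nugget scores.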

Running benchmarks yourself

pip install -e ".[postgresql,benchmarks,dev]"

python benchmarks/beam/run_benchmark.py --split 100K          # ~10 min
python benchmarks/beam/run_benchmark.py --split 10M           # ~50 min
CORTEX_USE_ASSEMBLER=1 python benchmarks/beam/run_benchmark.py --split 10M
python benchmarks/locomo/run_benchmark.py                     # ~40 min
python benchmarks/longmemeval/run_benchmark.py --variant s    # ~45 min

All scores are measured on a fresh database (DROP + CREATE per run), with TRUNCATE between conversations and the FlashRank preflight verified. See full methodology.

The science under the hood

Cortex doesn't store memories the way a database stores rows. It treats them more like a brain treats experiences.

Memories have temperature. Every memory starts hot. Access it and it stays hot. Ignore it and it cools. Below a threshold, it compresses: full text → summary → keywords → fades entirely. This isn't a bug — it's rate-distortion optimal forgetting, the same mathematical framework your brain uses to decide what's worth keeping. Important memories resist compression. Surprising ones get a heat boost. Boring, redundant ones quietly disappear. (Anderson & Lebiere 1998; Ebbinghaus 1885)
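One way to picture the cool-then-compress lifecycle is a decay curve with thresholds. The sketch below assumes exponential cooling with a half-life. The half-life, the thresholds, and the way importance stretches the half-life are hypothetical values chosen for illustration, not Cortex's actual constants or decay law.

```python
# Hypothetical constants, for illustration only
HALF_LIFE_DAYS = 14.0
LEVELS = [(0.5, "full_text"), (0.25, "summary"), (0.1, "keywords")]


def heat_after(initial_heat: float, days_idle: float, importance: float = 0.0) -> float:
    """Exponential cooling; importance in [0, 1) stretches the half-life."""
    half_life = HALF_LIFE_DAYS / (1.0 - importance)
    return initial_heat * 0.5 ** (days_idle / half_life)


def compression_level(heat: float) -> str:
    """Map current heat to the representation that is kept."""
    for threshold, level in LEVELS:
        if heat >= threshold:
            return level
    return "forgotten"
```

With these numbers, an untouched memory keeps its full text for two weeks and drops to a summary after four, while a memory with `importance=0.5` still holds its full text at four weeks.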

Storage has a gatekeeper. Not everything deserves to be remembered. Cortex maintains a predictive model of what it already knows, and only stores information that violates its expectations. Tell it the same thing twice and the write gate blocks the second attempt. This is predictive coding — the same mechanism your neocortex uses to filter sensory input. Only prediction errors get through. (Friston 2005; Bastos et al. 2012)
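A very rough stand-in for the write gate is a novelty check against what is already stored. The sketch below uses nearest-neighbour cosine similarity as the "prediction" and a hypothetical threshold. Cortex's actual predictive model is not described in this README, so every name and constant here is an assumption.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def should_store(
    candidate: list[float],
    stored: list[list[float]],
    novelty_threshold: float = 0.15,  # hypothetical value
) -> bool:
    """Let a memory through only if it is far enough from everything known."""
    if not stored:
        return True
    best_match = max(cosine(candidate, s) for s in stored)
    prediction_error = 1.0 - best_match
    return prediction_error >= novelty_threshold
```

Saying the same thing twice yields a prediction error near zero, so the second write is blocked.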

Retrieval changes the memory. When you recall a memory in a new context, Cortex doesn't just passively hand it back. It compares the retrieval context against the storage context, and if there's enough mismatch, it reconsolidates — updates the memory to reflect what's true now. This is real neuroscience. Nader et al. showed in 2000 that retrieved memories become labile and can be rewritten. Your codebase evolves, and so do Cortex's memories of it. (Dudai 2012; Nader et al. 2000)

Emotional memories are stronger. Frustration during debugging, excitement when a test passes, urgency in a production incident — Cortex detects emotional valence and encodes those memories with more force. They decay slower, compress later, and surface faster. Like how you remember your worst production outage in vivid detail but can't recall last Tuesday's standup. (Wang & Bhatt 2024; Yerkes-Dodson 1908)

Background consolidation runs like sleep. When you're away, a consolidation cycle processes recent memories: decays old ones, compresses verbose ones, promotes recurring patterns into general knowledge (episodic → semantic transfer), discovers entity relationships, and runs "dream replay" where related memories are compared and new connections emerge. (McClelland et al. 1995; Foster & Wilson 2006; Buzsáki 2015)

Similar memories stay distinct. Pattern separation — modeled on the dentate gyrus, which keeps "Tuesday's standup" separate from "Wednesday's standup" even though they're almost identical. Without this, retrieval returns the same generic match for every similar query. (Leutgeb et al. 2007; Yassa & Stark 2011)

41 papers total. Every algorithm, constant, and threshold traces to a published source. Full citations, equations, ablation data, and per-module implementation audit: docs/papers/science.md | Thermodynamic memory paper (PDF, 30 pages) | Structured context assembly paper (PDF, 37 pages) | Research post on structured context assembly

Hippocampal Replay: context that survives compaction

Claude Code has a 200k/1M token context window. During long sessions, when that window fills up, it compacts: summarizes older messages, strips tool outputs, paraphrases your instructions. Important nuance evaporates. Decisions you anchored early in the conversation dissolve into vague summaries.

Hippocampal Replay fixes this. Named after the neuroscience phenomenon where your brain replays important experiences during sleep to consolidate them, it treats context compaction as "sleep" and replays what matters when Claude "wakes up."

Before compaction hits, a hook fires. Cortex drains your active context — what you were working on, which files were open, what decisions you'd made, what errors were unresolved — and stores it as a checkpoint.

After compaction, a second hook fires. Cortex reconstructs your context intelligently. Not by dumping everything back in, but by assembling the right pieces: your latest checkpoint, any facts you'd anchored as critical, the hottest project memories, and predictions about what you'll need next.

You can be explicit about what matters:

cortex:anchor({ content: "We're using the event-sourcing pattern. All state changes go through the event bus.", reason: "Architecture constraint" })

Anchored memories get maximum protection. They always survive compaction, no matter what.
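The post-compaction rebuild can be pictured as a budgeted assembly in which anchors are never dropped. This is a sketch under assumptions: the `Item` fields, the token budget, and the ordering (anchors, then checkpoint, then hottest memories) are illustrative, and the "predictions about what you'll need next" step is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Item:
    text: str
    tokens: int
    heat: float = 0.0
    anchored: bool = False


def rebuild_context(checkpoint: Item, items: list[Item], budget: int) -> list[str]:
    """Anchors always go in; the checkpoint and hottest memories fill what is left."""
    anchors = [i for i in items if i.anchored]
    out = list(anchors)
    used = sum(i.tokens for i in anchors)  # anchors are kept even if over budget

    hottest_first = sorted(
        (i for i in items if not i.anchored), key=lambda i: i.heat, reverse=True
    )
    for item in [checkpoint, *hottest_first]:
        if used + item.tokens <= budget:
            out.append(item)
            used += item.tokens
    return [i.text for i in out]
```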

Autonomous project wiki

Cortex's wiki is a self-curating per-project knowledge base — not a memory dump. Every project the registry knows about is driven toward 15 canonical documentation slots (product overview, architecture, services, code walkthrough, public API, data flow, commands, MCP integration, tooling, CI/CD, AI usage, operations, PRDs, decisions, onboarding) and every source file gets 13 canonical sections (Purpose, Public API, Dependencies, Callers, How it works, Invariants, What can go wrong, Tests, Sequence diagram, Parameters, Request example, Response example, See also).

The wiki view organizes everything by project first (left panel). Click into any project — its 15 canonical slots become headings with the pages that fill them; everything below is the file-doc layer with its 13 sections per file. Breadcrumbs and [[wiki/path]] cross-links make traversal one click. Pages whose canonical sections are still missing carry a yellow banner naming exactly what's not yet written and what the worker will fill next.
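The numbers behind the yellow banner follow directly from the 13-section list. A minimal sketch, assuming a page's filled sections are available as a set of names; the function name, the return shape, and the rounding are hypothetical.

```python
from __future__ import annotations

CANONICAL_SECTIONS = [
    "Purpose", "Public API", "Dependencies", "Callers", "How it works",
    "Invariants", "What can go wrong", "Tests", "Sequence diagram",
    "Parameters", "Request example", "Response example", "See also",
]


def curation_summary(present_sections: set[str]) -> dict:
    """Return the curated percentage and the sections still missing, in canonical order."""
    missing = [s for s in CANONICAL_SECTIONS if s not in present_sections]
    filled = len(CANONICAL_SECTIONS) - len(missing)
    return {
        "percent": round(100 * filled / len(CANONICAL_SECTIONS)),
        "missing": missing,
    }
```

A page with only Purpose, Public API, and Tests written would report 23% curated with ten sections still to fill.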

What makes it autonomous:

SessionStart hook auto-spawns a background consolidate cycle every 6 hours (stamp at ~/.claude/methodology/.last_consolidate). You never run consolidate by hand.
Curation-gap detector + headless authoring worker. Every file-doc page declares the sections it's missing in frontmatter (curation_gaps:); a worker drains those gaps by invoking claude -p (your existing Claude Code credentials, no API key) which uses the codebase intelligence MCP tools — codebase_context, codebase_impact, codebase_ownership, codebase_query — to ground each section in the actual call graph before writing.
Missing-anchor authoring. When a project has no architecture / services / api / data-flow / operations / ADR / PRD page, the worker authors them from the source tree (top-level structure + README + manifest + CLAUDE.md) with the same MCP-tool grounding.
Drift detection. Pages whose cited source files moved, whose mtime is stale (default >60 days), or whose body is off-template are flagged and re-authored in place.
Visible curation gaps. Open any file-doc page in the wiki view and a yellow banner shows ⚠ Page N% curated — M sections still missing listing exactly what's not written and what should go there. Deletion is never the policy; visibility is.
Per-project dashboards under wiki/_dashboards/<project>.md show slot-fill rate, file-coverage %, open gaps, and the queue of pages waiting for the next consolidate cycle.
ADRs as task-records. Every completed task (≥1 commit at session end) auto-drafts an ADR with the five mandatory sections — Entry / Mandatory elements / How / Result / Serves — filled from commit subjects + the session's memories; the worker refines it on the next cycle.
Wiki view shows project as the top-level axis. Left panel: Project → Kind → Pages. Welcome screen: a grid of every project with coverage badges. [[wiki/path]] cross-links resolve; bare slugs route to filtered search.
Mermaid diagrams have a lens. Click 🔍 on any rendered diagram to open a viewport-sized viewer with zoom (wheel + buttons + keys) and pan (drag).
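The 6-hour cadence from the first bullet can be sketched as a stamp-file check. The stamp path comes from the text above; reading the file's modification time (rather than a timestamp stored inside it) and the two function names are assumptions.

```python
from __future__ import annotations

import time
from pathlib import Path

STAMP = Path.home() / ".claude" / "methodology" / ".last_consolidate"
INTERVAL_SECONDS = 6 * 60 * 60


def consolidate_due(
    stamp: Path = STAMP,
    now: float | None = None,
    interval: int = INTERVAL_SECONDS,
) -> bool:
    """True if no cycle has been recorded yet or the last one is older than interval."""
    now = time.time() if now is None else now
    try:
        last_run = stamp.stat().st_mtime
    except FileNotFoundError:
        return True
    return now - last_run >= interval


def mark_consolidated(stamp: Path = STAMP) -> None:
    """Record that a cycle just started by touching the stamp file."""
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.touch()
```

A SessionStart hook would call `consolidate_due()` and, when it returns True, call `mark_consolidated()` before spawning the background cycle so that parallel sessions do not start a second one.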

Explore it:

/cortex-visualize — opens the wiki view (Wiki tab) with the project tree and curation banners; same launcher for the Graph / Knowledge / Board / Pipeline views over the same data.
curate_wiki — returns mixed authoring jobs (coverage gaps + drift re-authors + memory clusters) in priority order; the headless worker consumes these.
get_causal_chain — trace how one decision led to another.
get_project_story — auto-generated narrative of your project's evolution.
detect_gaps — find areas where knowledge is thin or isolated.

This isn't documentation you write. It's documentation Cortex authors and verifies for you, every 6 hours, until every project reaches 15/15 slot coverage and every source file has all 13 canonical sections filled.

Neural Graph

Launch with /cortex-visualize. The default landing view is Graph — a live, radial-hierarchical map of everything Claude has ever done in your projects. Knowledge / Wiki / Board / Pipeline tabs sit over the same data for different reading angles.

Graph View — the Claude workflow map. Each project becomes a cloud of nodes around one gold domain hub. Inside every cloud, nodes are arranged in six concentric levels by the Claude surface (or the code itself) that produced them:

Level	What's there	How to click through
L1 · Claude setup	Skills · Commands · Hooks · Agents · MCPs	Click a skill for its file path; click an MCP to see which domains share it (thin indigo threads bridge clouds)
L2 · Tools	One hub per Claude tool per domain (Edit · Write · Read · Grep · Glob · Bash · Task)	Click a hub for files touched + total uses
L3 · Files	Every file Claude ever opened, read, edited, searched, or referenced in a Bash command — colored by primary tool (green edited / cyan read / fuchsia searched / orange bash-only)	Click for `first_seen`, `last_accessed`, `last_modified`, and a See diff against HEAD button that renders new/modified/deleted/historical content inline
L4 · Discussions	One node per Claude Code session	Click for `started_at`, duration, message count, and a View full conversation button that replays every turn (including tool calls)
L5 · Memories	Persistent memories, colored by consolidation stage (labile → early LTP → late LTP → consolidated → semantic)	Click for full content, tags, and every scientific measurement
L6 · AST symbols	The code itself — functions (cyan), methods (sky), classes/structs/enums/traits/protocols (violet), modules/packages/namespaces (amber), constants/fields/properties (slate) — parsed from 10 languages (Rust, Python, TypeScript, Java, Kotlin, Swift, Objective-C, C, C++, Go) and laid out as petals around their parent file in L3	Click for qualified name, symbol type, parent file, and the named edges: `defined_in`, `calls`, `imports`, `member_of`. A symbol imported by two projects sits in the space between their clouds, making `what connects to what` literally the shape of the code

What L6 is for. L5 and below tell you what Claude did; L6 tells you what the code is. Once AST symbols are on the map, three things become visible for free: (1) shared code — any function, class or module referenced by two projects drifts into the inter-project gap, so reused primitives reveal themselves without a dependency audit; (2) impact — clicking a symbol surfaces every caller, importer, and member edge, so "if I change this, what breaks?" is a graph neighbourhood, not a grep; (3) the picture of the codebase itself — because the forces come from real defined_in / calls / imports / member_of edges, a dense petal around a file means a fat internal API and a thin one means a leaf module. Click any node and the side panel lists the named callers, imports, and members instead of a bare count. L6 nodes are the only ones without a fixed radial slot — they orbit their parent file, so the layer collapses cleanly when you filter it out.
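"Impact is a graph neighbourhood" can be made concrete with a breadth-first walk over reversed `calls` / `imports` edges. The edge tuple layout, the confidence cut-off, and the function name are assumptions for illustration; this is not the query_workflow_graph implementation.

```python
from __future__ import annotations

from collections import deque

Edge = tuple[str, str, str, float]  # (source, target, edge_kind, confidence)


def impacted_by(
    symbol: str,
    edges: list[Edge],
    max_depth: int = 2,
    min_confidence: float = 0.5,
) -> dict[str, int]:
    """Symbols that reach `symbol` through calls/imports, mapped to their hop distance."""
    dependents: dict[str, list[str]] = {}
    for source, target, kind, confidence in edges:
        if kind in ("calls", "imports") and confidence >= min_confidence:
            dependents.setdefault(target, []).append(source)

    depth_of = {symbol: 0}
    queue = deque([symbol])
    while queue:
        node = queue.popleft()
        if depth_of[node] == max_depth:
            continue  # do not expand beyond the requested depth
        for dependent in dependents.get(node, []):
            if dependent not in depth_of:
                depth_of[dependent] = depth_of[node] + 1
                queue.append(dependent)

    del depth_of[symbol]  # the symbol itself is not part of its own impact set
    return depth_of
```

Filtering on confidence keeps same-name guesses out of the answer, so the result lists resolved callers only.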

Thin dashed violet threads between clouds mark cross-domain files and shared MCPs. A single grouped filter select (All / L1–L6 / by kind / by file cluster / by AST edge kind / Cross-domain) isolates any slice; a text search narrows within that slice.

Everything Claude touches live is visible: Edit, Write, Read, Grep, Glob, NotebookRead, NotebookEdit, and Bash paths inside commands — captured via the PostToolUse hook with compact markers so the graph rebuilds every ~2 minutes with fresh data.

Board View — consolidation stages as kanban columns (labile · early_ltp · late_ltp · consolidated · reconsolidating). Each column header reads live bucket metrics: decay rate, vulnerability, plasticity, heat / importance / encoding / interference medians, hippocampal dependency, and replay count — with the advancement rule (replay ≥ 3, DA ≥ 1 or imp > 0.3, etc.) printed under the bar. "At-risk" counter flags memories near promotion or decay. Cards inside each column carry heat, importance, surprise, valence, arousal, and the exact tool that created the memory.

Detail panel — every measurement explained. Clicking a memory (or a file, skill, command, agent, hook, MCP, discussion) opens a modal with the raw value and a one-line plain-language explanation. Consolidation stage, activity (heat), baseline activity, importance, surprise, emotional tone, emotional intensity, confidence, plasticity, stability — each is a labeled bar with a sentence like "How unexpected this memory was when it arrived. Surprises stick in the mind better than routine events." No more staring at opaque numbers.

Knowledge View — curated memory cards with heat-based left border, emotion tag, consolidation stage, and evidence file references. Filter by domain or emotion; click any card for a full-screen detail panel with Markdown + JSON pretty-print.

Wiki View — every project organized as Project → Kind → Pages in the left panel, with the welcome screen showing a coverage grid (scope % + file % + missing scopes per project). Click Edit on any page to open a split-pane editor with the Markdown source on the left and a live-preview pane on the right:

Each page renders with:

EB Garamond body, IBM Plex Mono code, centered academic-paper layout
Curation-gap banner — every file-doc page declares which of its 13 canonical sections are still missing or thin, with the description of what should go in each. The autonomous worker drains the queue every cycle; the banner shrinks visibly.
[[wiki/path]] cross-links resolve to other pages; bare slugs route to filtered search
Mermaid lens — magnifier button on every rendered diagram opens a viewport viewer with zoom + pan
Heat bar, lifecycle pill (active / area / archived / evergreen), staleness flag, backlinks footer
Inspector drawer — full audit trail (memos, source claim events, draft history) for every page
Inline CodeMirror 6 editor + live preview with KaTeX math (see Write Papers in Cortex above)
BibTeX citations, figure/equation/table auto-numbering, cross-references
Pandoc export → PDF / LaTeX / DOCX / HTML
wiki/_dashboards/<project>.md — per-project coverage scoreboard, regenerated each consolidate cycle

Pipeline View — horizontal Sankey flow from domains through the write gate into consolidation stages. Width of each ribbon = memory volume. Makes retention and drop-off across stages visible at a glance.

Agent Integration

Cortex works with teams of specialized agents — and in 3.17.0 it uses one itself. The headless authoring worker is a Claude agent that wakes up every six hours and drains the wiki curation queue: it reads page metadata, queries the codebase-intelligence MCP tools to ground each section in real call-graph facts, then invokes claude -p (your existing Claude Code credentials, no API key) to author the missing content. The result is autonomous documentation maintained by the same agent infrastructure your team already uses.

Each agent, Cortex's worker included, has scoped memory (agent_topic) while sharing critical decisions across the team. The design follows Wegner's transactive memory theory (1987): teams store more knowledge than individuals because each member specializes.

Specialization — each agent writes to its own topic. Engineer's debugging notes don't clutter tester's recall. The headless wiki worker writes to agent_topic=wiki-curation so its drafts don't pollute interactive recall.

Coordination — decisions auto-protect and propagate. When engineer decides "use Redis over Memcached," every agent sees it at next session start. ADRs the worker drafts at session end (Entry / Mandatory / How / Result / Serves) are part of this propagation: the task-record IS the cross-agent shared memory.

Directory — entity-based queries span all topics. "What do we know about the reranker?" returns results from engineer, tester, researcher, AND the wiki worker's drafts naming the reranker.
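The three behaviours above can be pictured with a small in-memory model. This is a sketch only: the `Memory` and `TeamMemory` classes, their fields, and the sample texts are hypothetical illustrations, not Cortex's schema or API. It assumes that scoped recall returns an agent's own topic plus any shared decisions, and that entity lookup ignores topic boundaries.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    agent_topic: str                       # scope the memory was written under
    text: str
    entities: frozenset[str] = frozenset()
    is_decision: bool = False              # assumed flag: decisions reach every agent


@dataclass
class TeamMemory:
    items: list[Memory] = field(default_factory=list)

    def remember(self, memory: Memory) -> None:
        self.items.append(memory)

    def recall(self, agent_topic: str) -> list[Memory]:
        """Specialization + coordination: own topic plus shared decisions."""
        return [m for m in self.items
                if m.agent_topic == agent_topic or m.is_decision]

    def about(self, entity: str) -> list[Memory]:
        """Directory: entity lookup spans all topics."""
        return [m for m in self.items if entity in m.entities]


store = TeamMemory()
store.remember(Memory("engineer", "Traced reranker timeout", frozenset({"reranker"})))
store.remember(Memory("tester", "Reranker regression test is flaky", frozenset({"reranker"})))
store.remember(Memory("engineer", "Use Redis over Memcached", frozenset({"redis"}), is_decision=True))

tester_view = [m.text for m in store.recall("tester")]
reranker_hits = [m.agent_topic for m in store.about("reranker")]
```

Here `tester_view` holds the tester's own note and the propagated Redis decision but not the engineer's debugging note, while `reranker_hits` covers both agents.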

Autonomous wiki maintenance. SessionStart auto-spawns the consolidate hook every six hours. The hook reads ~/.claude/methodology/.last_consolidate; if older than the TTL, it spawns a detached subprocess that drains the curation-gap queue. No cron, no daemon, no manual invocation — the agent runs because you open Claude Code, and stops when nothing's left to author.
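A TTL-gated spawn of this kind could look roughly like the sketch below. It assumes the stamp's age is taken from the file's modification time (the README only says the hook "reads" the file), and the helper names `is_due` and `maybe_spawn` are hypothetical; the real hook may store a timestamp inside the file or use a different detach mechanism.

```python
import subprocess
import sys
import time
from pathlib import Path
from typing import Optional

# Path taken from the text above; everything else here is illustrative.
STAMP = Path.home() / ".claude" / "methodology" / ".last_consolidate"


def is_due(stamp: Path, ttl_hours: float, now: Optional[float] = None) -> bool:
    """True when the stamp is missing or at least `ttl_hours` old (by mtime)."""
    now = time.time() if now is None else now
    try:
        age_seconds = now - stamp.stat().st_mtime
    except FileNotFoundError:
        return True
    return age_seconds >= ttl_hours * 3600


def maybe_spawn(stamp: Path, ttl_hours: float, command: list[str]) -> bool:
    """Start `command` detached if the TTL has elapsed; report whether it ran."""
    if not is_due(stamp, ttl_hours):
        return False
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.touch()  # refresh first so a second session is less likely to double-spawn
    subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # POSIX: run outside the caller's session
    )
    return True
```

A call such as `maybe_spawn(STAMP, 6, [sys.executable, "-c", "pass"])` returns immediately either way; the placeholder command stands in for whatever the hook actually launches.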

Works with any custom agents. See zetetic-team-subagents for a ready-made team of 27 specialists — each with scoped memory that doesn't clutter the others.

Architecture

Clean Architecture with strict dependency rules. Inner layers never import outer layers. The 3.17.0 release added the autonomous wiki-curation subsystem on top of the existing layers without crossing any boundary.

Layer	What lives here	Count
shared/	Pure utilities (text, hash, similarity, types)	11 modules
core/	Neuroscience + retrieval + wiki curation logic	175+ modules
context_assembly/	Structured context assembler	10 modules
infrastructure/	PostgreSQL, embeddings, file I/O, MCP client	33 modules
handlers/	MCP tools + consolidation cycles	91+ tools / 49 MCP-exposed
hooks/	Lifecycle automation (incl. autonomous consolidate spawn)	9 hooks
server/	HTTP standalone + wiki API + viz	4 modules
observability/	Prometheus text-format metrics	1 module
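The "inner layers never import outer layers" rule can be checked mechanically. The sketch below is one possible check, not the project's own tooling: the inner-to-outer ranking in `LAYER_RANK` is an assumption loosely based on the table order, the module names in the usage lines are made up, and relative imports are skipped for brevity.

```python
import ast

# Assumed inner -> outer ordering; the project's actual rule set may differ.
LAYER_RANK = {"shared": 0, "core": 1, "infrastructure": 2, "handlers": 3, "server": 4}


def imported_layers(source: str) -> set[str]:
    """Collect the layer each absolute import in `source` points at."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            # Accept both "core.x" and "mcp_server.core.x" style paths.
            layer = next((p for p in name.split(".")[:2] if p in LAYER_RANK), None)
            if layer is not None:
                found.add(layer)
    return found


def violations(layer: str, source: str) -> list[str]:
    """Layers imported by `source` that sit outside `layer`."""
    rank = LAYER_RANK[layer]
    return sorted(name for name in imported_layers(source) if LAYER_RANK[name] > rank)


bad = violations("core", "from mcp_server.shared.text import normalize\n"
                         "import mcp_server.handlers.recall\n")
ok = violations("handlers", "from mcp_server.core import heat\n")
```

With these inputs `bad` names the `handlers` layer and `ok` is empty, so a check like this could run in CI over every module in a layer.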

Wiki curation subsystem (3.17.0):

core/wiki_coverage.py — 15 canonical project scopes, per-domain audit, file-level coverage, anchor freshness window.
core/wiki_curation_gaps.py — 13 canonical file-doc sections, gap detection.
core/wiki_drift.py — drift detection (missing source citations, stale mtime, off-template).
core/wiki_stub_detector.py — placeholder + shallow-content scoring.
core/wiki_file_doc_skeleton.py — generates skeletons with visible gap markers.
core/wiki_coverage_dashboard.py — per-project wiki/_dashboards/<project>.md regenerated each cycle.
core/auto_task_record.py + handlers/auto_task_record_writer.py — session-end auto-draft of ADRs with Entry/Mandatory/How/Result/Serves.
handlers/consolidation/wiki_maintenance.py — purge + audit + dashboard generation.
handlers/consolidation/headless_authoring.py — drains the gap queue via claude -p with codebase-intelligence MCP tools grounding each section.
hooks/consolidate_background.py + session_start.py — autonomous 6-hour spawn loop.

Storage: PostgreSQL 15+ with pgvector (HNSW) and pg_trgm. All retrieval in PL/pgSQL stored procedures.

Concurrency (v3.13+): psycopg_pool.ConnectionPool with two latency classes: interactive_pool (min=2, max=8) for recall/remember/anchor, and batch_pool (min=1, max=2) for consolidate/ingest. Tool handlers run on worker threads via asyncio.to_thread; per-tool admission semaphores bound fan-out. Heat is computed at read time by effective_heat(), so the homeostatic pass writes one scalar per domain per run instead of N rows.
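The admission pattern described here (a semaphore in front of `asyncio.to_thread`) can be sketched with the standard library alone. The limits below reuse the pool maxima from the text as an assumption; the README does not state the actual per-tool limits. `blocking_query` is a stand-in for a synchronous database call, and `Gauge` exists only to make the bound observable.

```python
import asyncio
import threading
import time

# Assumed per-tool limits, set equal to the pool maxima quoted above.
LIMITS = {"recall": 8, "consolidate": 2}


class Gauge:
    """Records peak concurrent entries; demonstration only."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self) -> "Gauge":
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc) -> None:
        with self._lock:
            self.active -= 1


def blocking_query(gauge: Gauge, delay: float) -> None:
    """Stand-in for a blocking call that would borrow a pooled connection."""
    with gauge:
        time.sleep(delay)


async def run_tool(sem: asyncio.Semaphore, gauge: Gauge, delay: float = 0.01) -> None:
    async with sem:  # admission: wait here instead of over-subscribing the pool
        await asyncio.to_thread(blocking_query, gauge, delay)


async def main() -> dict[str, int]:
    sems = {tool: asyncio.Semaphore(n) for tool, n in LIMITS.items()}
    gauges = {tool: Gauge() for tool in LIMITS}
    calls = [run_tool(sems["recall"], gauges["recall"]) for _ in range(20)]
    calls += [run_tool(sems["consolidate"], gauges["consolidate"]) for _ in range(6)]
    await asyncio.gather(*calls)
    return {tool: gauge.peak for tool, gauge in gauges.items()}


peaks = asyncio.run(main())
```

However many calls are queued, `peaks["consolidate"]` never exceeds 2, so slow batch work cannot exhaust the connections that interactive recall depends on.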

Configuration: Set DATABASE_URL (default: postgresql://localhost:5432/cortex). All parameters use CORTEX_MEMORY_ prefix — see mcp_server/infrastructure/memory_config.py for the full list (~40 parameters). Wiki cycle TTL is CORTEX_CONSOLIDATE_TTL_HOURS (default 6h).
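As an illustration of how these variables might be consumed, the sketch below reads the two named settings with the defaults quoted above and gathers anything under the CORTEX_MEMORY_ prefix. The `Settings` class, `load_settings`, and the `CORTEX_MEMORY_EXAMPLE` key are hypothetical; the real parameter names live in memory_config.py.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional

PREFIX = "CORTEX_MEMORY_"


@dataclass(frozen=True)
class Settings:
    database_url: str
    consolidate_ttl_hours: float
    memory_overrides: dict[str, str]  # raw values for CORTEX_MEMORY_* variables


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from `env` (defaults to the process environment)."""
    env = os.environ if env is None else env
    return Settings(
        database_url=env.get("DATABASE_URL", "postgresql://localhost:5432/cortex"),
        consolidate_ttl_hours=float(env.get("CORTEX_CONSOLIDATE_TTL_HOURS", "6")),
        memory_overrides={
            key[len(PREFIX):].lower(): value
            for key, value in env.items()
            if key.startswith(PREFIX)
        },
    )


defaults = load_settings({})
custom = load_settings({
    "CORTEX_CONSOLIDATE_TTL_HOURS": "12",
    "CORTEX_MEMORY_EXAMPLE": "1",  # hypothetical parameter name
})
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in tests.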

Security

Runs 100% locally — MCP over stdio, PostgreSQL on localhost, visualization on 127.0.0.1. No data leaves your machine. Audit score: 91/100.

Development

pytest                    # 2500+ tests
ruff check .              # Lint
ruff format --check .     # Format

Verification

Every benchmark headline number above is backed by a per-mechanism ablation campaign, with each mechanism evaluated on the benchmark that suits its mode of action. The campaign comprises three artefact sets at full n on a single-seed protocol, with code SHAs, dirty flags, manifests, and per-row JSON outputs preserved alongside the writeups:

LongMemEval-S, 17 rows, n=500 — tasks/e1-v3-results.md. Per-mechanism deltas across the integrated stack at the calibrated equilibrium; category-specialization analysis.
LoCoMo, 14 rows, n=1986 (pre-plasticity-fix bytes) — tasks/e1-v3-locomo-results.md. Two-baseline (NO_CONSOLIDATION / WITH_CONSOLIDATION) design; empirical resolution of the architectural-mismatch hypothesis (RECONSOLIDATION ΔMRR = +0.0076, ADAPTIVE_DECAY ΔMRR = -0.0163).
LoCoMo, 14 rows, n=1986 (post-plasticity-fix bytes) — tasks/e1-v3-locomo-results-post-fix.md. Re-run on commit 2f45bcb (descendant of plasticity result-shape fix 5f737fe); cadence-fix anchor agreement re-validated identically (ΔvsNO = +0.0014); two consolidation-only rows (HOMEOSTATIC_PLASTICITY, SCHEMA_ENGINE) recover positive contributions previously masked by the contract bug.

Total: 45 per-mechanism evidence rows across 26 enum mechanisms (17 read-path + 9 consolidation-only routed to LoCoMo). The full thermodynamic memory paper, including §6.3 per-mechanism evidence and §6.3.4.1 plasticity-fix re-run subsection, is at docs/arxiv-thermodynamic/main.pdf (30 pages, all 45 citations resolved). The companion structured context assembly paper is at docs/arxiv-context-assembly/main.pdf (37 pages).

License

MIT

Citation

If you reference the system, the paper PDFs on main are the canonical artefacts (arXiv IDs forthcoming, endorsement in progress):

@software{cortex2026,
  title={Cortex: Persistent Memory for Claude Code},
  author={Deust, Clement},
  year={2026},
  url={https://github.com/cdeust/Cortex}
}

@unpublished{deust2026thermodynamic,
  title={Thermodynamic Memory for Conversational Agents:
         A Per-Mechanism Ablation Study on LongMemEval and LoCoMo},
  author={Deust, Clement},
  year={2026},
  note={arXiv ID forthcoming, endorsement in progress},
  url={https://github.com/cdeust/Cortex/blob/main/docs/arxiv-thermodynamic/main.pdf}
}

@unpublished{deust2026context,
  title={Structured Context Assembly for Long-Horizon Conversational Memory},
  author={Deust, Clement},
  year={2026},
  note={arXiv ID forthcoming, endorsement in progress},
  url={https://github.com/cdeust/Cortex/blob/main/docs/arxiv-context-assembly/main.pdf}
}
