diff --git a/docs/decisions/021-memory-plane-and-pluggable-recall.md b/docs/decisions/021-memory-plane-and-pluggable-recall.md new file mode 100644 index 0000000..ff3509a --- /dev/null +++ b/docs/decisions/021-memory-plane-and-pluggable-recall.md @@ -0,0 +1,520 @@ +# ADR-021: Memory Plane as a Compiled Capability + +**Date:** 2026-03-30 +**Status:** Draft +**Depends on:** ADR-017 (Pod Defaults and Service Self-Description), ADR-018 (Session History and Persistent Memory Surfaces), ADR-020 (Compiled Tool Plane with Native and Mediated Execution Modes) +**Amends:** ADR-018 (defines the derived retrieval plane), ADR-020 (extends the canonical capability IR) +**Implementation:** Plan: docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md + +## Context + +ADR-018 established the substrate for infrastructure-owned retention: + +- `cllama` writes normalized per-agent session history +- session history and runner-owned `/claw/memory` are separate surfaces +- Phase 2 (scoped read API) and Phase 3 (derived retrieval) were deferred + +ADR-020 establishes the next major pattern: + +- services self-describe through `claw.describe` +- `claw up` compiles per-agent manifests +- `cllama` may mediate request-time behavior from those manifests +- the backend implementation remains external + +The open question is how memory fits this model. + +Memory clearly resembles ADR-020's compiled capability flow: + +- it should be declared by a self-described service +- the consumer should subscribe in pod YAML +- `claw up` should compile per-agent manifests +- `cllama` should orchestrate request-time behavior +- backend logic should remain external and swappable + +But memory is not the same kind of capability as either feeds or tools: + +- **Feeds** are query-agnostic live context with TTL semantics. +- **Tools** are model-invoked callable operations. 
+- **Memory** needs both a synchronous pre-turn recall path and an asynchronous post-turn retain path, both initiated by infrastructure rather than by the model. + +If memory is forced into the feed shape, recall becomes query-blind and loses most of its value. + +If memory is forced into the tool shape, recall becomes opt-in at the model layer and loses the reliability that makes it infrastructure-worthy. + +The right question is therefore not "is memory a feed or a tool?" + +The right question is: "how does memory extend the same compiled-capability architecture without pretending to be a different lifecycle than it is?" + +## Decision + +### 1. Memory is a first-class capability in the same compiled model as feeds and tools + +ADR-020's architectural pattern is the right one: + +- declare capability in `claw.describe` +- subscribe in pod YAML +- compile per-agent runtime artifacts +- let `cllama` mediate request-time behavior + +Memory follows that exact pattern. + +It is therefore part of the same compiled capability model as feeds and tools. + +It is not a plugin universe, not a runner-local convention, and not a special one-off side channel. + +### 2. Memory is a sibling capability, not a subtype of feeds or tools + +The canonical capability model is extended from: + +- `tools[]` +- `feeds[]` +- `skill` +- `endpoints[]` + +to: + +- `tools[]` +- `feeds[]` +- `memory` +- `skill` +- `endpoints[]` + +The distinction is lifecycle, not importance. 
+ +| Capability | Primary purpose | Trigger | Query-aware | Typical artifact | Runtime owner | +|---|---|---|---|---|---| +| `feeds[]` | Ambient live context | Service/polling cadence | No | `feeds.json` | `cllama` fetch + inject | +| `tools[]` | Explicit callable operations | Model tool call | Yes, via arguments | `tools.json` or runner config | `cllama` (`mediated`) or runner (`native`) | +| `memory` | Derived durable context and retention hooks | Infra lifecycle | Yes | `memory.json` | `cllama` | + +Feeds tell the model what is happening now. + +Tools let the model do something on purpose. + +Memory lets infrastructure retain derived continuity and re-surface it automatically. + +### 3. Memory is `mediated` by definition in v1 + +ADR-020 distinguishes `native` and `mediated` execution for tools. + +Memory does not follow that split in the same way. + +For the ambient memory plane, Clawdapus only supports the mediated model: + +- `cllama` orchestrates recall before the upstream inference request +- `cllama` dispatches retain after the request completes +- `cllama` applies governance filters on both directions + +There is no runner-native equivalent that preserves the trust boundary, compile-time determinism, and cross-runner portability that motivate this feature. + +This does **not** mean memory services can never expose tools. + +A memory service may also declare ordinary `tools[]`, such as: + +- `search_memory` +- `pin_fact` +- `forget_memory` +- `list_open_commitments` + +Those explicit operations live on the tool plane and follow ADR-020 normally. + +But ambient recall and retain are part of the memory plane, not the tool plane. + +### 4. ADR-020's descriptor version should be treated as the umbrella capability version + +ADR-020 already proposes `claw.describe` `version: 2`. 
+ +Because ADR-020 is still draft and unimplemented, this ADR amends its interpretation: + +`version: 2` is the umbrella schema version for compiled service capabilities, not a tools-only bump. + +A `version: 2` descriptor may therefore include any combination of: + +- `feeds[]` +- `tools[]` +- `memory` +- `skill` +- `endpoints[]` + +This avoids pointless schema churn where tools land as one `v2` and memory immediately forces a second incompatible revision for the same implementation wave. + +If ADR-020 were to ship first exactly as currently written, then memory would need either: + +- an explicit amendment to ADR-020 before implementation, or +- a `version: 3` descriptor bump + +The preferred path is to avoid that split and treat `v2` as the shared capability-evolution step. + +### 5. The memory capability is declared by providers and subscribed to by consumers + +Memory follows the same provider-owns, consumer-subscribes rule as feeds and tools. + +A service declares memory capability in its descriptor. + +An agent subscribes to exactly one memory relationship in pod YAML. + +That relationship points to one service boundary, even if the service internally layers multiple strategies such as: + +- semantic retrieval +- graph memory +- rolling summaries +- periodic consolidation + +Clawdapus should not expose an arbitrary stack of memory backends directly to one agent. + +### 6. 
The memory descriptor is small and lifecycle-shaped + +The provider descriptor adds an optional `memory` object: + +```json +{ + "version": 2, + "description": "Derived memory service", + "memory": { + "recall": { "path": "/recall" }, + "retain": { "path": "/retain" }, + "forget": { "path": "/forget" } + }, + "auth": { "type": "bearer", "env": "MEMORY_API_TOKEN" } +} +``` + +Notes: + +- `recall` is required when a service wants to participate in hot-path context injection +- `retain` is required when a service wants low-latency processing of new turns +- `forget` is optional and reserved for governed operations + +The descriptor does **not** negotiate a semantic vocabulary for recall inputs. + +`cllama` sends a fixed payload shape with only simple numeric bounds configured at compile time. + +The service ignores fields it does not need. + +### 7. Pod subscription is explicit and singular + +The consumer surface in pod YAML is an explicit memory relationship: + +```yaml +x-claw: + memory: + service: team-memory + timeout-ms: 300 +``` + +Pod-level `memory-defaults` follows the normal defaults model. + +Service-level declaration overrides the default unless `...`-style list composition is later proven necessary. + +For memory, the default expectation is one relationship, not list composition. + +V1 should keep this operator surface small. + +Simple numeric shaping such as recent-window size, request byte caps, and injected byte caps should begin as implementation defaults rather than as a large user-facing knob surface. + +### 8. 
`claw up` compiles a dedicated per-agent `memory.json` + +Memory follows ADR-020's compile pipeline: + +| Step | Feeds | Tools (`mediated`) | Memory | +|---|---|---|---| +| Descriptor declares | `feeds[]` | `tools[]` | `memory` | +| Consumer policy | `feeds:` subscription | `tools:` allowlist | `memory:` relationship | +| Artifact written | `feeds.json` | `tools.json` | `memory.json` | +| Runtime consumer | `cllama` feed fetcher | `cllama` mediator | `cllama` recall/retain orchestrator | + +`memory.json` is per-agent because: + +- auth is per agent +- the subscribed service is per agent +- future policy and observability may differ per agent + +The manifest shape should be simple: + +```json +{ + "version": 1, + "service": "team-memory", + "base_url": "http://team-memory:8080", + "recall": { + "path": "/recall", + "timeout_ms": 300 + }, + "retain": { + "path": "/retain" + }, + "forget": { + "path": "/forget" + }, + "auth": { + "type": "bearer", + "token": "resolved-token-value" + } +} +``` + +Auth resolution follows the same order as ADR-020 mediated tools and existing feeds: + +- projected per-agent service credential when available +- otherwise descriptor-declared auth when that fallback is valid + +Memory should not invent a second auth model. + +The implementation may still compile default bounds into the runtime config, but those should begin as internal defaults, not as a large operator-facing contract. + +### 9. The memory plane has three distinct operations + +#### Recall + +Recall is synchronous and hot-path: + +1. `cllama` authenticates the agent as usual +2. `cllama` loads `memory.json` +3. `cllama` builds a bounded recall payload from the current request +4. `cllama` calls the memory service +5. `cllama` filters and injects the returned blocks +6. `cllama` forwards the enriched request upstream + +Recall exists to surface **derived durable state**, not transcript tails. + +If recall fails, the request continues without memory by default. 
+ +At the contract level, recall has a fixed shape: + +- request carries agent identity, pod identity, basic request metadata, and the latest user message plus a small bounded recent context window +- response carries a bounded list of text blocks with optional metadata such as `kind`, `source`, `score`, and `ts` + +The exact JSON can evolve, but that shape is architectural. + +The recent context window is an implementation default in v1, not a large declarative vocabulary and not an operator tuning surface unless real usage proves it necessary. + +Injection is provider-format-aware. + +The implementation must resolve how the same logical memory block is rendered for: + +- OpenAI-style `messages[]` requests +- Anthropic-style requests with top-level `system` handling + +This ADR does not prescribe the exact injection primitive, but it does require one bounded logical memory block that is inserted consistently across both request families. + +#### Retain + +Retain is asynchronous and best-effort: + +1. the normalized session-history entry is appended to the ledger +2. `cllama` dispatches that same normalized entry to the memory service +3. failures are observed but do not fail the already-completed inference request + +Retain exists to reduce freshness lag. + +It does not replace ledger durability. + +#### Forget + +Forget is governed and optional: + +- it is not a normal runner capability +- it exists for operator policy, future Master Claw workflows, and backfill hygiene + +Forget applies to the external memory backend and to replay behavior. + +It does **not** justify mutating the append-only ledger in place. + +Instead, forgetting requires tombstone or redaction metadata that future replay and backfill paths honor. + +### 10. Memory traffic must be observable + +Memory mediation is part of the governed request path and must emit structured telemetry. 
+ +At minimum, the implementation should record: + +- whether recall was attempted, skipped, succeeded, timed out, or failed +- recall latency +- number of blocks returned +- number of blocks removed by policy +- injected byte count +- whether retain delivery succeeded or failed +- retain delivery latency + +This should align with the existing structured logging and audit direction rather than inventing a separate unstructured debug path. + +### 11. Session history remains the substrate and source of truth + +This ADR does not change ADR-018's ownership boundary. + +The roles become: + +- `history.jsonl`: immutable ledger, audit substrate, replay substrate +- memory service: derived state, indexing, summarization, salience, ranking +- `cllama`: orchestration, policy filtering, hot-path injection, best-effort delivery +- runner `/claw/memory`: local scratchpad and portable runner-owned state + +This means memory quality can improve radically over time without changing the retention substrate. + +### 12. Backfill is first-class, not a repair hack + +The retain webhook is only the low-latency path. + +A memory service must also be able to build or rebuild from the ledger. + +This requires: + +- ADR-018 Phase 2 style scoped history read access +- a future explicit replay or backfill flow +- replay semantics that honor forget tombstones + +Without backfill, the retain webhook is merely a convenience. + +With backfill, memory services become truly swappable. + +ADR-018 Phase 2 style scoped history read access is therefore a prerequisite for the first supported rollout of the memory plane. + +A local prototype may read ledger files directly, but that is not sufficient for the supported, swappable, runner-agnostic memory plane this ADR defines. + +### 13. Memory is not the same as runner session continuity + +The runner still owns immediate conversational recency. 
+ +The memory plane is deliberately not a strong read-after-write substitute for the runner's live session window. + +That is acceptable because the memory plane is for: + +- cross-session continuity +- durable facts +- older episodic summaries +- decisions and commitments +- long-range project state + +not for replaying the last few raw turns back into the model. + +### 14. Operators should prefer one ambient memory plane + +When the infrastructure memory plane is enabled, runner-native memory injection and runner-native memory-search tools may become redundant or actively conflicting. + +If `cllama` injects governed memory context while the runner also injects its own memory context, the agent may receive: + +- duplicate facts +- contradictory summaries +- repeated commitments +- mismatched privacy or forgetting policy + +The operational guidance should therefore be: + +- prefer the infrastructure memory plane as the single ambient recall mechanism +- disable runner-native memory plugins or memory injection where practical when using the infrastructure plane +- do not attempt generic forced disablement across all runners from Clawdapus itself + +Clawdapus should document this overlap explicitly rather than treating it as a purely neutral coexistence case. + +## Rationale + +### Why not model memory as a feed? + +Feeds are the wrong shape: + +- they are query-agnostic +- they are naturally TTL-cached +- they represent live service state, not derived continuity + +Memory recall needs the current request as input. + +If it does not, it is usually not doing real recall. + +### Why not model memory as a tool? + +Tool-based memory search is useful, but it is not sufficient as the infrastructure plane. 
+ +If recall depends on the model deciding to call a tool: + +- reliability becomes model-dependent +- runners without shared tool hosting lose parity +- cross-runner portability collapses back toward runner plugins + +Explicit memory tools are a complement, not the substrate. + +### Why keep memory separate from runner-owned `/claw/memory`? + +The ownership boundary from ADR-018 is still correct. + +Runner memory is agent-authored and writable. + +Infrastructure memory is operator-governed and proxy-mediated. + +Collapsing them would blur authority and make replay, redaction, and audit much harder. + +### Why no `native` memory mode? + +Ambient memory recall is valuable precisely because it is reliable, governed, and runner-agnostic. + +If runners each implement their own retain and recall path: + +- persistence becomes runner-coupled +- backend stores fragment +- cross-runner continuity regresses +- policy enforcement becomes inconsistent + +That is the failure mode this ADR exists to avoid. + +### Why a dedicated `memory.json` instead of folding everything into one manifest? + +Today the codebase already uses small dedicated per-agent artifacts: + +- `metadata.json` +- `feeds.json` +- `service-auth/*.json` + +ADR-020 adds `tools.json`. + +Adding `memory.json` is consistent with that pattern and avoids prematurely inventing a generic super-manifest before the capability shapes stabilize. + +A future manifest unification is possible, but not required to land the architecture cleanly. + +## Consequences + +**Positive:** + +- Memory fits the same declare -> compile -> mediate architecture as feeds and tools. +- Memory vendors remain swappable behind one stable contract. +- Cross-runner continuity no longer depends on runner-native plugins or per-runner databases. +- Governance applies to both retention and recall traffic. +- Backfill and replay become first-class concerns rather than an afterthought. 
+- The descriptor and context changes can be made once, alongside ADR-020, instead of in two conflicting passes. + +**Negative:** + +- `cllama` gains another hot-path responsibility and must budget latency tightly. +- The capability model becomes broader: operators must understand feeds, tools, and memory as related but distinct surfaces. +- Runner-native memory systems may overlap or conflict with the infrastructure plane and require operator discipline. +- Because memory is mediated-only in v1, there is no short path that reuses runner-local memory plumbing. + +**Neutral:** + +- A memory service may expose both `memory` and `tools[]`; these are complementary, not duplicative. +- This ADR does not standardize embeddings, ranking, graph schemas, or salience logic. +- This ADR does not require immediate implementation of shared or cross-agent memory namespaces. + +## Implementation Direction + +To avoid shape churn, ADR-020 and this ADR should be implemented as one descriptor/context evolution wave. + +The practical order is: + +1. extend `internal/describe.ServiceDescriptor` for `version: 2` capability parsing +2. add the new user-facing pod grammar for tools and memory together +3. extend `internal/cllama.AgentContextInput` and manifest generation once +4. implement ADR-018 Phase 2 history read/backfill substrate +5. implement `memory.json` compilation and `cllama` recall/retain hooks +6. implement `tools.json` mediation and any shared manifest/auth helpers that fall out of the work + +The important point is not the exact file order. + +The important point is to avoid implementing ADR-020 as if tools are the only future compiled capability and then immediately refactoring the same surfaces again for memory. 
+ +The first supported end-to-end checkpoint is after steps 4 and 5: + +- one self-described memory service can be wired to one agent +- recall injects derived blocks in the request path +- retain delivers normalized entries post-turn +- replay/backfill is supported through the scoped history surface + +Without that checkpoint, the system may be an interesting prototype, but it is not yet the supported memory plane defined by this ADR. diff --git a/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md b/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md new file mode 100644 index 0000000..0bdb1a2 --- /dev/null +++ b/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md @@ -0,0 +1,1140 @@ +# Memory Plane and Pluggable Recall Plan + +## Goal + +Introduce memory as a first-class infrastructure plane in Clawdapus: + +- runner-agnostic +- durable across rebuilds and runner swaps +- governed by `cllama` +- implemented by swappable memory services rather than by runner plugins + +This document is intentionally a plan, not yet an ADR. It is meant to sharpen the boundary between: + +- what Clawdapus should own +- what `cllama` should own +- what memory backends and vendors should own + +The central claim is: + +**Clawdapus should own the reliable lifecycle hooks and policy surface for memory, but it should not own the intelligence of memory itself.** + +**Raw recent history is not the product. Derived durable state is the product.** + +That intelligence includes retention strategy, salience, summarization, embeddings, graph extraction, affect modeling, ranking, deduplication, decay, and recall selection. + +Those should remain swappable behind a stable service contract. + +--- + +## Why This Exists + +The current repo already has the right primitives, but not yet a complete memory plane: + +- `cllama` already captures durable per-agent session history at the proxy boundary. +- `cllama` already injects live context through feeds and request decoration. 
+- Clawdapus already compiles per-agent manifests into mounted context directories. +- Services already self-describe through `claw.describe`. +- The manifesto already states that memory must survive the container and the runner. + +What is missing is the pipeline that connects: + +1. raw retained turns +2. derived memory artifacts +3. request-time recall + +through a clean, pluggable service contract. + +The problem is not only persistence. The problem is useful recall. + +If the system only re-injects the most recent raw turns, it adds very little. Runners already maintain live sessions and recency windows. The real value of infrastructure memory is the ability to surface durable, relevant, derived context that the live session window does not preserve reliably. + +Examples: + +- long-lived user preferences +- stable facts about operators, services, repos, or accounts +- open commitments and unresolved tasks +- previous decisions and their rationale +- episodic summaries from older sessions +- project state that spans many conversations +- cross-runner continuity after migration or rebuild + +In other words: + +**Raw recent history is not the product. Derived durable state is the product.** + +Raw history is still essential, but as the source of truth, not as the typical recall payload. + +--- + +## Current Repo Position + +The architecture in-tree already points in this direction. + +### 1. Infra-owned retention already exists + +ADR-018 established: + +- session history is infra-owned +- session history is written by `cllama` +- portable memory is runner-owned +- the two surfaces must remain distinct + +This is the correct foundation. Raw history should be captured once at the one place all cllama-enabled runners share: the governance proxy boundary. + +### 2. 
Request-time context injection already exists + +The current `cllama` implementation already: + +- loads per-agent feed manifests +- fetches live context +- prepends it into OpenAI and Anthropic requests +- injects current time + +So memory recall is not a new category of behavior. It is a new kind of request-time enrichment. + +### 3. Service self-description already exists + +Services already advertise capabilities through `claw.describe`, and `claw up` already compiles those capabilities into per-agent runtime artifacts. + +That means memory backends do not need to be runner plugins. They can be pod services with normal Clawdapus discovery, auth projection, and compile-time wiring. + +### 4. Tool mediation is already converging on the same architecture + +ADR-020 is already establishing the pattern: + +- service declares capability +- `claw up` compiles a manifest +- `cllama` may mediate or inject behavior at request time +- backend implementation remains external + +Memory should follow the same architectural logic instead of inventing a separate plugin universe. + +More strongly: + +**Memory should use the exact same structural pattern that tools are moving toward.** + +- provider declares capability in `claw.describe` +- consumer subscribes in pod YAML +- `claw up` compiles a per-agent manifest +- `cllama` enforces and mediates request-time behavior +- backend implementation remains swappable + +This parallel should be treated as core architectural framing, not as an incidental similarity. 
+ +--- + +## Problem Statement + +Without a shared memory plane, users are pushed toward runner-local memory systems: + +- OpenClaw plugins +- per-runner vector databases +- runner-specific hooks +- incompatible stores and formats + +This has several structural downsides: + +- memory becomes coupled to one runner family +- changing `CLAW_TYPE` threatens continuity +- memory persistence depends on runner cooperation +- every runner may duplicate infrastructure work +- every runner may spin up its own retrieval stack +- governance over retained and recalled content becomes inconsistent + +This violates the repository's direction in two ways: + +1. It undermines runner-agnostic persistence. +2. It moves a governance-relevant concern back into the trusted application layer. + +Memory quality may ultimately be where a large share of agent performance comes from. That is a reason to expose strong hooks and clean contracts, not a reason to hardcode memory intelligence into the proxy or into runners. + +--- + +## Design Principles + +### 1. The ledger is sacred + +`history.jsonl` is the immutable substrate. + +It is: + +- append-only +- normalized +- operator-visible +- rebuildable input for any future memory backend + +Memory services may fail, change, or be replaced. The ledger remains the stable truth. + +### 2. Portable memory stays runner-owned + +`/claw/memory` remains: + +- runner-writable +- agent-authored +- format-agnostic +- separate from infra-owned history + +We should not collapse session history and portable memory into a single surface. + +### 3. `cllama` owns orchestration, not cognition + +`cllama` should know: + +- when to call recall +- when to call retain +- how to inject recall results +- how to apply policy filters +- how to measure failures and latency + +`cllama` should not know: + +- how to embed text +- how to rank memories +- how to build graphs +- how to infer affect +- how to compact or summarize +- which vendor algorithm is best + +### 4. 
Memory intelligence must be swappable + +Clawdapus should make it possible to plug in: + +- mem0 +- supermemory +- graph-based memory systems +- local pgvector/Qdrant/Chroma implementations +- simple rolling-summary stores +- domain-specific memory engines + +without requiring: + +- runner changes +- new proxy code for each backend +- store migration to a Clawdapus-owned schema + +### 5. One memory relationship per agent + +An agent should subscribe to one memory service, not to an arbitrary list of memory backends. + +If an operator wants a layered strategy, such as: + +- raw retention +- semantic recall +- graph memory +- periodic summarization + +that should be composed behind one memory service boundary. + +This keeps the Clawdapus surface simple and avoids exploding the agent-facing memory model. + +### 6. Recall should return derived state, not transcript tails + +The memory plane should optimize for: + +- stable facts +- commitments +- episodic summaries +- project state +- relevant long-range context + +not for "last N messages." + +If a backend cannot produce anything more useful than recent transcript slices, it should not yet be in the hot path. + +### 6.5. Hot-path latency must be budgeted aggressively + +Recall runs in the inference hot path, so it must be treated like an expensive privilege, not a free convenience. + +That implies: + +- strict short timeouts +- no automatic retries on the hot path +- graceful degradation when recall fails +- explicit per-agent opt-in +- bounded payload size + +If a backend cannot return useful derived state within the allowed budget, it should not be enabled for synchronous recall. + +### 7. Governance must apply to memory traffic too + +Memory is a cognitive surface. 
+ +That means the same infrastructure that governs model traffic should be able to: + +- scrub sensitive data before retention +- redact or suppress recalled content before reinjection +- forget or purge retained content when policy requires it + +This is one of the strongest reasons not to leave memory solely inside runners. + +### 7.5. Forget must be compatible with an append-only ledger + +The raw ledger should remain append-only. + +That means a governed forget operation should not rewrite `history.jsonl` in place. + +Instead, forgetting should eventually work through: + +- deletion in the external memory backend +- a tombstone or redaction sidecar ledger owned by infrastructure +- replay and backfill logic that honors those tombstones and does not re-ingest forgotten material + +The goal is: + +- preserve auditability of the raw retention substrate +- prevent forgotten content from re-entering derived memory on a later rebuild or backfill + +### 8. Compile-time wiring, not runtime self-registration + +The memory relationship should be declared in pod YAML and compiled by `claw up`, just like feeds and tools. + +No runtime plugin discovery. +No runner-specific boot-time registration. +No hidden self-attachment logic. + +--- + +## The Memory Pipeline + +The proposed memory plane has four stages. + +### Stage 1: Capture + +`cllama` records every successful inference turn into the durable ledger. + +This already exists. + +Output: + +- append-only `history.jsonl` + +### Stage 2: Retain + +After a successful turn, `cllama` may send a best-effort structured retention webhook to a configured memory service. + +This is an optimization and low-latency trigger, not the source of truth. 
+ +If it fails: + +- the turn is still durable in `history.jsonl` +- the memory service may catch up later from the ledger + +### Stage 3: Process + +The memory service performs its own internal work: + +- summarization +- salience extraction +- fact extraction +- embeddings +- graph linking +- deduplication +- affect tagging +- decayed ranking updates + +This stage is entirely outside `cllama`. + +### Stage 4: Recall + +Before forwarding the next model request upstream, `cllama` may query the memory service for relevant derived context and inject the returned memory blocks into the prompt. + +This is synchronous and bounded: + +- timeout-controlled +- size-capped +- policy-filtered + +This is where memory affects model behavior. + +--- + +## What Clawdapus Should Own + +Clawdapus should own the shared contract and lifecycle hooks. + +### A. The raw ledger + +Already implemented: + +- one normalized history stream per agent +- outside `.claw-runtime` +- durable across restarts and `claw up` + +### B. The memory relationship declaration + +At pod level and/or service level, operators should be able to declare: + +- which memory service an agent uses +- whether recall is enabled +- whether retain webhook is enabled +- bounded hot-path knobs such as timeouts and recall-context size + +### C. Compile-time wiring + +`claw up` should: + +- validate that the referenced memory service exists +- inspect its descriptor +- resolve URLs and auth +- project per-agent memory config into context +- mount the needed runtime files into `cllama` + +### D. The request lifecycle hooks + +`cllama` should: + +- call recall before the upstream LLM request +- inject returned memory blocks into the prompt +- call retain after a successful turn +- log memory hook failures and latency + +### E. 
Governance hooks + +`cllama` should be able to: + +- scrub retained content before forwarding to memory service +- redact recalled content before injecting it +- support a future governed `forget` action + +### F. Observability + +We should be able to answer: + +- did recall run? +- how long did it take? +- did it time out? +- how many bytes were injected? +- how many blocks were returned? +- did policy remove any blocks? +- did retain webhook fail? + +### G. A minimal reference implementation + +Clawdapus should eventually ship a small reference memory service image that proves the contract end-to-end. + +This reference is not meant to be state of the art. It is meant to: + +- validate the contract +- provide spike coverage +- offer a baseline for operators + +--- + +## What Clawdapus Should Not Own + +### 1. A universal memory algorithm + +Clawdapus should not define: + +- the one true salience metric +- the one true embedding model +- the one true summary format +- the one true graph extraction strategy + +### 2. Vendor-specific backend semantics + +Clawdapus should not hardcode: + +- mem0 APIs +- supermemory APIs +- Graphiti semantics +- Qdrant schema assumptions +- Chroma collection naming + +### 3. Per-runner memory plugins as the primary path + +Runners may still offer native memory tools or plugins, but those should not be the architecture Clawdapus depends on. + +### 4. Memory store internals + +Clawdapus should not care whether a backend uses: + +- SQLite +- JSONL +- Postgres + pgvector +- Qdrant +- graph DBs +- hybrid layers + +as long as it obeys the stable service contract. + +--- + +## Proposed User-Facing Model + +The agent should declare one memory service relationship. + +Suggested shape: + +```yaml +x-claw: + memory-defaults: + service: claw-memory + timeout-ms: 300 + +services: + analyst: + x-claw: + agent: ./agents/ANALYST.md + memory: + service: claw-memory +``` + +Notes: + +- `memory` should be an object, not only a scalar. 
+- We may support scalar sugar later, but the compiled model should be object-shaped. +- One memory service per agent is intentional. +- Simple payload-shaping bounds should begin as implementation defaults rather than as a large operator-facing knob surface. + +This is deliberately modest. The operator is declaring: + +- who the memory provider is +- how much hot-path budget is available + +The operator is not trying to teach Clawdapus how memory works internally. + +--- + +## Proposed Descriptor Extension + +The current descriptor should gain an optional memory capability section in the next descriptor version line. + +This plan must not create a second incompatible `claw.describe` version `2`. + +ADR-020 already drafts descriptor version `2` for tools. Memory must therefore do one of the following: + +- fold into the same `version: 2` descriptor expansion as tools +- or, if it lands later and cannot be merged cleanly, become `version: 3` + +There must not be two competing meanings for descriptor version `2`. + +Example: + +```json +{ + "version": 2, + "description": "Shared memory service with semantic recall and durable turn retention.", + "memory": { + "recall": { + "path": "/recall" + }, + "retain": { + "path": "/retain" + }, + "forget": { + "path": "/forget" + } + }, + "auth": { + "type": "bearer", + "env": "CLAW_MEMORY_TOKEN" + } +} +``` + +Notes: + +- `forget` is optional. +- The descriptor does not declare ranking semantics or embedding behavior. +- The descriptor does not negotiate a request vocabulary. +- The service receives a fixed bounded payload and ignores what it does not need. 
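To make the descriptor extension concrete, here is a minimal Go sketch of how the `memory` section might be modeled and validated at compile time. The type and method names are hypothetical illustrations, not the actual `internal/describe` API:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// MemoryEndpoint describes one memory operation exposed by the service.
type MemoryEndpoint struct {
	Path string `json:"path"`
}

// MemoryCapability models the optional "memory" section of a descriptor.
type MemoryCapability struct {
	Recall *MemoryEndpoint `json:"recall"`
	Retain *MemoryEndpoint `json:"retain"`
	Forget *MemoryEndpoint `json:"forget"` // optional per the contract above
}

// validate enforces the minimal contract sketched above: recall and retain
// are required when the memory section is present; forget stays optional.
func (m *MemoryCapability) validate() error {
	if m == nil {
		return nil // no memory capability declared; nothing to check
	}
	if m.Recall == nil || m.Recall.Path == "" {
		return errors.New("memory capability missing recall.path")
	}
	if m.Retain == nil || m.Retain.Path == "" {
		return errors.New("memory capability missing retain.path")
	}
	return nil
}

func main() {
	raw := `{"recall":{"path":"/recall"},"retain":{"path":"/retain"},"forget":{"path":"/forget"}}`
	var mc MemoryCapability
	if err := json.Unmarshal([]byte(raw), &mc); err != nil {
		panic(err)
	}
	fmt.Println(mc.validate() == nil, mc.Forget.Path)
}
```

The only hard rule this sketch enforces is the one the notes state: `recall` and `retain` must exist when the section is present, and `forget` may be absent. Everything else stays opaque to the compiler.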

This matches the current Clawdapus style:

- provider declares capability
- consumer subscribes by service
- `claw up` compiles the projection

---

## Proposed Runtime Manifest

`claw up` should compile a new per-agent manifest:

```text
/claw/context/<agent>/memory.json
```

This mirrors the current:

- `feeds.json`
- `service-auth/`
- future `tools.json`

Suggested shape:

```json
{
  "service": "claw-memory",
  "recall": {
    "url": "http://claw-memory:8080/recall",
    "enabled": true,
    "timeout_ms": 300,
    "max_bytes": 4096,
    "recent_messages": 3,
    "auth": "bearer-token-if-needed"
  },
  "retain": {
    "url": "http://claw-memory:8080/retain",
    "enabled": true,
    "auth": "bearer-token-if-needed"
  },
  "forget": {
    "url": "http://claw-memory:8080/forget",
    "enabled": true,
    "auth": "bearer-token-if-needed"
  }
}
```

This manifest is consumed by `cllama`, not by the runner.

That is important:

- no runner plugin system required
- no per-runner memory client code
- no duplication across drivers

---

## Proposed Wire Contracts

The wire contract should be deliberately small.

### 1. Recall

`cllama` sends a fixed request body to the memory service.

Suggested request:

```json
{
  "agent_id": "analyst-0",
  "pod": "trading-desk",
  "ts": "2026-03-30T15:04:05Z",
  "request_path": "/v1/chat/completions",
  "requested_model": "anthropic/claude-sonnet-4",
  "messages": [
    {"role":"assistant","content":"..."},
    {"role":"user","content":"..."},
    {"role":"user","content":"..."}
  ],
  "metadata": {
    "timezone": "America/New_York"
  }
}
```

Notes:

- `messages` is bounded only by simple numeric limits such as recent message count or byte cap.
- The payload is intentionally generic.
- The memory service may ignore fields it does not need.
+ +Suggested response: + +```json +{ + "blocks": [ + { + "kind": "profile", + "text": "Operator prefers concise summaries and dislikes speculative tone.", + "source": "user-profile", + "score": 0.93, + "ts": "2026-03-28T12:00:00Z" + }, + { + "kind": "commitment", + "text": "Open action: finalize the migration ADR and reconcile model-policy docs drift.", + "source": "episodic-summary", + "score": 0.88, + "ts": "2026-03-29T19:00:00Z" + } + ], + "ttl_seconds": 30 +} +``` + +`cllama` then: + +- applies policy filtering +- formats the returned blocks into a bounded injected context block +- prepends that block into the outbound LLM request + +### 2. Retain + +Retain should be best-effort and should happen after a successful turn. + +Suggested request: + +```json +{ + "agent_id": "analyst-0", + "pod": "trading-desk", + "entry": { + "version": 1, + "ts": "2026-03-30T15:04:05Z", + "claw_id": "analyst-0", + "path": "/v1/chat/completions", + "requested_model": "anthropic/claude-sonnet-4", + "effective_provider": "anthropic", + "effective_model": "claude-sonnet-4", + "status_code": 200, + "stream": false, + "request_original": {}, + "request_effective": {}, + "response": {}, + "usage": {} + } +} +``` + +Notes: + +- The ledger remains the durable truth regardless of webhook outcome. +- The memory service may process this immediately or queue it internally. +- The retain contract deliberately reuses the normalized session-history entry rather than inventing a second event shape. + +### 3. Forget + +`forget` is optional and should be treated as a governance operation, not a normal runner capability. + +Suggested request: + +```json +{ + "agent_id": "analyst-0", + "scope": { + "from": "2026-03-01T00:00:00Z", + "to": "2026-03-30T00:00:00Z" + }, + "reason": "policy_redaction" +} +``` + +We should not overdesign this early. The important point is that the service contract leaves room for a governed deletion path. 
+ +--- + +## Request Lifecycle in `cllama` + +The hot-path and non-hot-path behavior should be explicit. + +### Pre-turn recall path + +For every proxied inference request: + +1. resolve the agent identity as usual +2. load `memory.json` if present +3. if recall is enabled: + - build the bounded recall request + - call the memory service with a short timeout + - parse returned blocks + - apply policy filters + - inject the resulting memory block into the prompt +4. continue the normal upstream request flow + +Failure behavior: + +- timeout: continue without memory +- 5xx from memory service: continue without memory +- malformed response: continue without memory + +The memory plane should degrade gracefully. It must not become a single point of total inference failure by default. + +### Read-after-write semantics + +The memory plane should not promise strong read-after-write consistency for rapid-fire turns. + +That is acceptable because: + +- runner sessions already cover immediate recency +- the memory plane is for durable derived state, not for replacing the live conversation window +- the retain webhook is best-effort and asynchronous by design + +In practice this means: + +- the next turn may run recall before the previous turn has been fully processed by the backend +- the system remains correct because immediate continuity still comes from the runner session +- the memory plane improves medium- and long-range continuity, not single-turn echoing + +### Post-turn retain path + +After a successful upstream completion: + +1. write the normalized entry to the ledger as usual +2. if retain is enabled: + - dispatch a best-effort webhook to the memory service + - do not block the response already returning to the runner + +The retain webhook may fail silently except for observability and alerting. Recovery comes from the ledger. + +--- + +## What Counts As Real Memory Recall + +This is the product line we should draw explicitly. 
+ +The recall layer is worthwhile when it returns: + +- durable facts +- user/operator preferences +- open loops and commitments +- prior decisions and rationale +- episodic summaries +- project state +- relevant older context outside the runner session window + +The recall layer is not yet worthwhile if it mostly returns: + +- the last few turns +- transcript tails that the runner already has +- an unprocessed dump of recent messages + +This distinction matters because it keeps the design honest. + +If the memory service is not adding meaningful abstraction over the live session window, it is not yet justifying hot-path latency. + +--- + +## Session Stitching + +Session stitching will come up quickly, but it should not be treated as a gating prerequisite. + +There are three levels: + +### Level 1: No stitching + +The service processes the full ledger keyed by agent identity and whatever surface metadata is already available. + +This is still useful for: + +- durable facts +- recurring preferences +- high-level commitments + +### Level 2: Soft stitching + +The service groups events by obvious metadata when it exists, such as: + +- DM peer +- thread ID +- channel ID +- task ID +- repo or project hints + +This is likely enough for early "resume where we left off" quality. + +### Level 3: Hard stitching + +The service infers continuity across fragmented contexts and restarts even when metadata is weak. + +This is valuable, but should remain a backend problem. + +Clawdapus does not need to solve stitching globally in order to provide a good memory plane. + +--- + +## Governance Model + +Memory traffic should be governable in both directions. 
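As a rough illustration of the recall-direction half, a filter like the following could drop blocks from disallowed sources and enforce a size cap before injection. The policy shape here (a denied-source set plus a byte budget) is invented for this sketch, not a defined contract:

```go
package main

import "fmt"

// MemoryBlock is a trimmed-down recalled block for this illustration.
type MemoryBlock struct {
	Source string
	Text   string
}

// filterBlocks applies two simple recall-side policies in order:
// suppress blocks from denied sources, then stop once the injection
// budget is exhausted. Blocks are assumed to arrive ranked best-first.
func filterBlocks(blocks []MemoryBlock, denied map[string]bool, maxBytes int) []MemoryBlock {
	var out []MemoryBlock
	used := 0
	for _, b := range blocks {
		if denied[b.Source] {
			continue // policy: this source may not enter prompts
		}
		if used+len(b.Text) > maxBytes {
			break // policy: size cap reached; drop lower-ranked remainder
		}
		used += len(b.Text)
		out = append(out, b)
	}
	return out
}

func main() {
	blocks := []MemoryBlock{
		{Source: "user-profile", Text: "Prefers concise summaries."},
		{Source: "secrets", Text: "api key ..."},
		{Source: "episodic-summary", Text: "Open action: finalize the ADR."},
	}
	kept := filterBlocks(blocks, map[string]bool{"secrets": true}, 4096)
	fmt.Println(len(kept))
}
```

Retention-side scrubbing would sit at a different hook but could share the same flavor: a pure function over the outgoing payload, cheap enough to run on every turn.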
+ +### Retention governance + +Before retain webhook delivery, `cllama` may: + +- remove secrets +- redact known sensitive patterns +- suppress content classes from retention entirely + +### Recall governance + +Before reinjection, `cllama` may: + +- remove restricted content +- suppress blocks from disallowed sources +- cap categories or sizes +- redact content that now violates stricter policy than when it was originally retained + +### Forget governance + +The operator or a future Master Claw should be able to trigger targeted forgetting through a governed path. + +This is one of the strongest arguments for making memory a first-class infra surface rather than only a runner convenience. + +Forget must also be compatible with an append-only ledger. + +That implies a future forget implementation should likely include: + +- deletion in the external memory backend +- an infra-owned tombstone or redaction ledger +- backfill and replay logic that honors those tombstones and does not re-ingest forgotten material + +--- + +## Persistence Model + +The memory service itself is a normal compose service. + +That means its persistence model should be the same as other stateful pod services: + +- named volumes +- bind mounts +- external databases + +`claw up` and `claw down` should not destroy those stores unless the operator explicitly destroys them through normal container lifecycle actions. + +This is much cleaner than runner-local plugin stores because: + +- store lifetime is independent from the runner container +- one memory engine can serve many agents +- state can survive `CLAW_TYPE` migrations + +--- + +## Backfill And Replay + +Backfill should be treated as a first-class operation, not as an implementation detail. + +If a new memory backend is introduced after months of retained history already exist, the operator must be able to populate it from the ledger deterministically. 

The architecture should therefore assume a future explicit backfill path, likely involving:

- a `cllama` history read API suitable for replay consumers
- a dedicated CLI flow such as `claw memory backfill`
- backend idempotency or replay markers so the same ledger can be consumed safely more than once

The retain webhook is the low-latency path. Backfill is the durability path for new or recovering services.

---

## Relationship to Runner-Native Memory

Runners may continue to provide native memory tools or session systems.

That is acceptable, but it should not be the infrastructure dependency.

The intended architecture is:

- runner-native session and short-term working memory remain local concerns
- infrastructure memory provides durable, governed, cross-session recall

In practice, once an agent is behind `cllama` and subscribed to a memory service, many runner-native memory tools may become redundant.

That redundancy is tolerable in principle but dangerous in practice.

If the infrastructure plane injects memory context while the runner also injects its own memory context, the agent may see:

- duplicate facts
- contradictory summaries
- repeated commitments
- different privacy or forgetting policies

So the operational recommendation should be:

- when using the infrastructure memory plane, operators should disable runner-native memory plugins or memory-search tools where practical
- Clawdapus should document that guidance clearly
- Clawdapus should not attempt to force-disable runner behavior generically across all runners

Rather than trying to disable runner-native memory, Clawdapus should provide a better shared path.

---

## Recommended Phase Plan

### Milestone 1: Complete ADR-018 Phase 2 and define backfill

Add the self-scoped history read surface to `cllama`. 
+ +Benefits: + +- memory services can consume normalized history through a stable proxy-owned interface +- backfill does not require filesystem coupling +- future operators and tools gain a consistent introspection surface +- replay becomes a first-class lifecycle rather than an implicit recovery hack + +This milestone should also define the expected operational backfill flow for new or recovering memory services. + +### Milestone 2: Add the memory capability and `cllama` hooks + +Implement: + +- descriptor extension for memory capability +- `x-claw.memory` +- pod defaults for memory +- `memory.json` compilation +- auth projection for memory services +- pre-turn recall call +- bounded injection +- post-turn retain webhook +- graceful degradation +- memory-specific observability events + +This is the first full end-to-end memory plane. + +### Milestone 3: Reference adapter and governance hardening + +Implement: + +- retain-side filtering +- recall-side filtering +- optional `forget` path +- tombstone-aware replay semantics +- alerting for repeated memory-service failures + +Provide a small baseline image, likely: + +- Go-based service +- durable SQLite or JSONL storage +- rolling summaries +- simple fact extraction +- simple BM25 or similarly boring local ranking +- no vendor-specific dependencies required to prove the contract + +This reference should be intentionally modest. The point is to validate the contract, not to define the state of the art. + +--- + +## Candidate File Map + +This is not a full implementation checklist, but it identifies the likely change surface. 
+ +### Main repo + +- `internal/describe/descriptor.go` +- `internal/describe/registry.go` +- `internal/pod/types.go` +- `internal/pod/parser.go` +- `cmd/claw/compose_up.go` +- `internal/cllama/context.go` +- `internal/pod/compose_emit.go` +- `docs/CLLAMA_SPEC.md` +- a new ADR once the plan is accepted + +### cllama submodule + +- `cllama/internal/proxy/handler.go` +- `cllama/internal/agentctx/...` +- new memory manifest loader package or extension of existing context loading +- logging and audit additions + +--- + +## Open Questions + +These are important, but they should not block the core architecture. + +### 1. How much request context should recall receive? + +The proxy should send a fixed request shape with only simple numeric payload bounds such as: + +- last N messages +- max request bytes + +We should avoid a richer negotiated vocabulary here unless real implementations prove it necessary. + +### 2. Should recall responses support categories? + +Probably yes, eventually. + +Possible categories: + +- `profile` +- `commitment` +- `decision` +- `episode` +- `state` + +But the first version can simply accept opaque blocks with optional metadata. + +### 3. Should `cllama` cache recall results? + +Maybe, but not initially. + +Recall is more query-shaped than feeds. A poor cache may create incorrect reuse and hide backend problems. + +### 4. Should retain delivery be in-process async or delegated to a queue? + +For the first implementation, best-effort in-process dispatch is likely enough because the ledger is the real durability mechanism. + +### 5. Should the memory service read the ledger directly or through an API? + +Long-term, the stable read API is cleaner. + +Short-term, direct ledger reading may be acceptable for local prototypes. + +### 6. How should affect fit into the model? + +Affect is exactly the kind of advanced derived state that should remain backend-defined. + +Clawdapus should make it possible, not standardize it early. + +### 7. 
How should multi-agent sharing work? + +The first version should assume private per-agent recall by default. + +Shared or world memory should require explicit backend semantics and likely future policy controls for: + +- agent-private memory +- pod-shared memory +- operator-defined namespaces + +This is important, but not required to define the initial memory plane. + +### 8. What metadata can the proxy reliably provide for stitching? + +Today the proxy may not always have a canonical thread or session identifier across all runners and providers. + +The first version should therefore treat: + +- `agent_id` +- `pod` +- bounded recent messages +- whatever stable metadata is already present + +as the minimum recall input. + +Richer stitching metadata may require later surface-specific propagation through headers, request bodies, or runner config. + +--- + +## Non-Goals + +This plan does not propose: + +- replacing runner-native sessions +- collapsing portable memory into proxy-owned memory +- mandating one storage engine +- defining a canonical embedding model +- defining a canonical graph schema +- forcing all runners to adopt a common memory plugin +- making `cllama` itself a memory database +- exposing vendor-specific memory tools directly to agents by default + +--- + +## Decision Shape For A Future ADR + +If this plan is accepted, the future ADR should probably decide the following: + +1. Memory is a first-class Clawdapus plane with compile-time wiring. +2. `cllama` owns pre-turn recall orchestration and post-turn retain orchestration. +3. Session history remains the immutable ledger and source of truth. +4. Portable memory remains runner-owned and separate. +5. Memory intelligence lives in pluggable services, not in `cllama`. +6. Agents subscribe to one memory service relationship at a time. +7. Recall should optimize for derived durable state, not transcript tails. 
+ +--- + +## Recommended Next Step + +The next document should likely be an ADR that: + +- cites ADR-018 and ADR-020 explicitly as prior art +- resolves the descriptor versioning question +- treats backfill as a first-class operation +- defines the fixed recall and retain wire contracts +- states clearly that the memory plane is for derived durable state, not transcript tails