From 60438186e1a42d7fe3fb50922aa27157eb018ebf Mon Sep 17 00:00:00 2001 From: Wojtek Date: Sun, 29 Mar 2026 22:26:28 -0400 Subject: [PATCH 01/18] docs(adr): add ADR-020 compiled tool mediation via cllama Defines how cllama mediates tool execution between LLM providers and pod services using MCP-shaped schemas delivered through provider-native tool calling. Reviewed across four rounds by three independent models (Codex, Gemini, Claude). --- .../020-cllama-compiled-tool-mediation.md | 491 ++++++++++++++++++ 1 file changed, 491 insertions(+) create mode 100644 docs/decisions/020-cllama-compiled-tool-mediation.md diff --git a/docs/decisions/020-cllama-compiled-tool-mediation.md b/docs/decisions/020-cllama-compiled-tool-mediation.md new file mode 100644 index 0000000..52a42b4 --- /dev/null +++ b/docs/decisions/020-cllama-compiled-tool-mediation.md @@ -0,0 +1,491 @@ +# ADR-020: Compiled Tool Plane with cllama-Mediated Compatibility Mode + +**Date:** 2026-03-29 +**Status:** Draft +**Depends on:** ADR-007 (Credential Starvation), ADR-017 (Service Self-Description), ADR-019 (Model Policy Authority) +**Amends:** ADR-018 (Session History) — extends the recording contract to include tool execution traces and failed requests +**Evolves:** ADR-004 (Service Surface Skills) — skills remain as behavioral guidance; tools add a callable interface + +## Context + +Clawdapus services self-describe via `claw.describe` (ADR-017). The descriptor declares feeds, endpoints, auth, and skill file paths. `claw up` compiles these into per-agent context: CLAWDAPUS.md documents available services, Anthropic-format skill files explain how to use them, and feeds deliver live data. The agent reads the documentation and constructs HTTP calls manually. + +This is fragile. Agents hallucinate endpoint paths, forget auth headers, confuse HTTP methods, and misformat request bodies. Documentation tells the agent *about* the service — it does not give the agent a *callable interface* to it. 
+ +Three industry standards converge on this problem: + +- **MCP** (Model Context Protocol) defines structured tool schemas (`tools/list`, `tools/call`) with JSON-RPC execution — the mechanical interface standard. MCP's tool schema shape (`name`, `description`, `inputSchema`, `annotations`) is a clean, provider-agnostic way to describe callable capabilities. +- **Anthropic Skills** (markdown with YAML frontmatter) describe when and why to use tools — the behavioral guidance standard. Already implemented in Clawdapus via `internal/skillmd/format.go`. +- **Docker/OCI** provides image labels, container networking, and compose services — the deployment standard. + +Clawdapus should bridge these standards, not replace any of them. The descriptor is the Rosetta Stone. This ADR adopts MCP's tool schema shape as the capability description format and delivers tools through the provider-native tool-calling API (OpenAI `tools[]`, Anthropic `tools[]`). This is not MCP runtime adoption — there is no JSON-RPC transport, no `tools/call`, no MCP client in v1. It is a transitional compatibility layer that uses MCP's schema vocabulary while delivering capabilities through the mechanisms runners already consume. + +### Why not have runners connect to MCP directly? + +**Runner agnosticism.** Seven runners exist and they do not yet share a universal path for consuming pod-shared MCP/client configuration. Adding bespoke pod-shared capability loading to each scales linearly with the driver count. + +**Credential starvation.** ADR-007 withholds LLM API keys from runners. ADR-019 withholds model authority. Direct MCP access would give runners unmediated service access, breaking the governance chain. + +**Compile-time determinism.** MCP discovery is runtime. Clawdapus requires all wiring resolved during `claw up`. + +## Positions Evaluated + +Three architectures were evaluated across four rounds by three independent reviewers. All converged on Position A. 
+ +### Position A: cllama owns tool mediation (selected compatibility mode) + +`claw.describe` v2 adds MCP-shaped tool schemas alongside the existing Anthropic skill. `claw up` compiles per-agent `tools.json` (filtered by tool policy) next to `feeds.json`. cllama injects tools into LLM requests, intercepts tool_calls, executes them against services, and loops until terminal text. Runners are unchanged. + +**For:** Zero runner changes. Extends credential starvation to tools. Follows the feed injection pattern exactly. Aligns with Manifesto Principle 7 (governance in a separate process). + +**Against:** cllama becomes stateful within a request lifecycle. Streaming requires non-streaming upstream when tools are injected. Adds complexity to the proxy. + +**Evaluation:** The statefulness is bounded (single HTTP request lifecycle, no cross-request state). The streaming trade-off is acceptable for a compatibility mode. The complexity follows the existing pattern — feeds already turn cllama from a passthrough into a context-aware proxy. Additive composition of runner-local and pod-shared tools remains the preferred steady state, but that requires the runner to own the combined tool loop. + +### Position B: MCP broker via claw-api (rejected) + +**Fatal objection:** Universal runner support for pod-shared MCP/client configuration would still be required. Conflates fleet governance with capability delivery. + +### Position C: Separate claw-mcp gateway (rejected) + +**Fatal objection:** Another infrastructure service per pod. Universal runner support for pod-shared MCP/client configuration would still be required. Runtime discovery violates compile-time determinism. + +Both reviewers who originally proposed B and C reversed their positions after examining the handler.go code paths and the runner agnosticism constraint. + +## Decision + +### 0. 
One capability IR, two delivery modes + +Clawdapus should have one canonical capability IR and two delivery modes: + +| Mode | Tool-loop owner | How pod-shared tools are delivered | Mixed local + pod tools | +|---|---|---|---| +| `mediated` | cllama | Provider-native `tools[]` injection from compiled `tools.json` | Not within one upstream tool round | +| `native` | Runner | Runner loads compiled MCP/client config additively with local tools | Yes, preferred steady state | + +The larger picture is additive composition. Pod-shared tools should sit alongside runner-local tools, not replace them. But additive composition is only protocol-safe when one client/executor owns the whole tool loop. That is naturally the runner in `native` mode. + +This ADR therefore does two things: + +- defines the canonical capability IR (`tools[]`, `feeds[]`, `skill`, `endpoints[]`) +- defines `mediated` mode as the bootstrap path for runners that cannot yet host pod-shared tools natively + +`native` mode is the intended steady state once runners can consume pod-shared MCP/client configuration **and** audit parity exists for pod-shared tool usage. `mediated` mode is the zero-runner-change compatibility layer that gets Clawdapus to governed service tools without waiting for every runner to catch up. + +### 1. Canonical capability IR via `claw.describe` v2 + +The descriptor defines four capability types. 
Each serves a distinct purpose and maps to an industry standard: + +| Descriptor field | Industry standard | Clawdapus role | Compiled artifact | +|---|---|---|---| +| `tools[]` | MCP tool schema shape | LLM-callable interface | `tools.json` | +| `skill` | Anthropic skill format | Behavioral guidance | `skills/.md` | +| `feeds[]` | (Clawdapus-native) | Live data delivery | `feeds.json` | +| `endpoints[]` | OpenAPI-adjacent | Operator documentation | CLAWDAPUS.md (when no tools) | + +**`tools[]` are the LLM-callable interface.** The tool schema uses MCP's visible shape (`name`, `description`, `inputSchema`, `annotations`). The only non-MCP addition is hidden execution metadata (`http`) for Clawdapus compilation — the LLM never sees it. + +**`endpoints[]` are operator documentation.** They describe the service's HTTP surface for human operators, `claw inspect`, and manual debugging. They are NOT used for LLM tool calling. When a service declares `tools[]`, its endpoint details are omitted from agent-facing `CLAWDAPUS.md` entirely. Agents interact through governed tools or not at all. Endpoint details remain available to operators through `claw inspect`, descriptor inspection, and other operator surfaces. Services that declare only `endpoints[]` and no `tools[]` continue to use the current manual-documentation path. + +This separation is critical: `tools[]` are the governed, credential-starved interface. `endpoints[]` are the ungoverned, human-readable reference. They may describe the same HTTP operations but serve different audiences and different trust models. + +This all-or-nothing suppression is intentional. Partial tool coverage is not an invitation for agents to fall back to manual HTTP on the remaining endpoints. If an operation should be agent-usable, it should be exposed as a governed tool. If it should remain human-only, it stays in operator-facing endpoint documentation. 
**Descriptor v2 example:**

```json
{
  "version": 2,
  "description": "Trading Desk API — broker connectivity, trade execution, and market context.",
  "tools": [
    {
      "name": "get_market_context",
      "description": "Retrieve agent-scoped market context: positions, balance, buying power",
      "inputSchema": {
        "type": "object",
        "properties": {
          "claw_id": { "type": "string", "description": "Agent identifier" }
        },
        "required": ["claw_id"]
      },
      "http": { "method": "GET", "path": "/api/v1/market_context/{claw_id}" },
      "annotations": { "readOnly": true }
    },
    {
      "name": "execute_trade",
      "description": "Execute a market order",
      "inputSchema": {
        "type": "object",
        "properties": {
          "symbol": { "type": "string" },
          "side": { "type": "string", "enum": ["buy", "sell"] },
          "quantity": { "type": "number" }
        },
        "required": ["symbol", "side", "quantity"]
      },
      "http": { "method": "POST", "path": "/api/v1/trades", "body": "json" },
      "annotations": { "readOnly": false }
    }
  ],
  "feeds": [
    { "name": "market-context", "path": "/api/v1/market_context/{claw_id}", "ttl": 30 }
  ],
  "skill": "skills/trading-policy.md",
  "auth": { "type": "bearer", "env": "TRADING_API_TOKEN" }
}
```

Note that `market-context` appears as both a feed and a tool. This is intentional: the feed delivers periodic context injection into the system prompt (the agent always has fresh market data), while the tool provides on-demand invocation (the agent explicitly requests context when it needs it for a specific decision). They complement each other — the feed ensures ambient awareness, the tool enables deliberate action.

**Tool annotations** use MCP's `annotations` field. `readOnly` distinguishes safe queries from side-effecting operations. This metadata is used by tool policy (below) and visible in `claw audit` output. Future annotations (`idempotent`, `confirmationRequired`) extend this without schema changes.
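To make the annotation's role concrete, here is a hedged sketch of how audit or policy code might partition tools by `readOnly`. The helper name and dict shapes are illustrative, not the actual internal types; a missing annotation is deliberately treated as side-effecting.

```python
def split_by_effect(tools):
    """Partition tools into read-only queries and side-effecting
    operations using the MCP-style readOnly annotation. A tool with
    no annotation falls into the cautious (side-effecting) bucket."""
    read_only, effecting = [], []
    for tool in tools:
        if tool.get("annotations", {}).get("readOnly"):
            read_only.append(tool["name"])
        else:
            effecting.append(tool["name"])
    return read_only, effecting

tools = [
    {"name": "get_market_context", "annotations": {"readOnly": True}},
    {"name": "execute_trade", "annotations": {"readOnly": False}},
    {"name": "legacy_op"},  # no annotations at all
]
print(split_by_effect(tools))  # → (['get_market_context'], ['execute_trade', 'legacy_op'])
```

The same partition would drive `claw audit` grouping or a future policy rule that grants read-only tools by default.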
+ +**MCP-native services** bake their tool schemas into the image as a `.claw-tools.json` artifact (a snapshot of `tools/list` output). `claw up` reads this from the image like any other descriptor artifact — no live MCP connection during compilation. Live MCP discovery is a future `claw discover` command that updates baked schemas against a running pod. + +**Why baked, not live?** `claw up` resolves descriptors before containers start (`compose_up.go:344`). It cannot connect to services that don't exist yet. Requiring baked schemas maintains compile-time hermeticity and avoids bootstrap circular dependencies. + +### 2. Three-layer authority model + +Service access has three independent dimensions. Each is declared separately in pod YAML: + +| Layer | Declaration | What it controls | Default | +|---|---|---|---| +| **Topology** | `surfaces: [service://X]` | Network reachability between containers | No access | +| **Verb authority** | `tools: [{ service: X, allow: ... }]` | Which operations the LLM can invoke | No tools | +| **Credential scope** | Per-agent principal (ADR-015) | Whose identity is used for the call | Service-level auth | + +Declaring `service://X` grants network reachability. Tool access requires explicit `tools:` declaration. These are distinct: topology is transport, tools are verb authority, credentials are identity. + +```yaml +# claw-pod.yml +services: + analyst: + x-claw: + agent: agents/analyst + surfaces: + - service://trading-api # reachability + tools: + - service: trading-api # verb authority + allow: + - get_market_context # read-only access + + executor: + x-claw: + agent: agents/executor + surfaces: + - service://trading-api + tools: + - service: trading-api + allow: all # full access (explicit) +``` + +**No tools by default.** If `tools:` is omitted, no tools are compiled — even if the surface's descriptor declares them. This matches ADR-015's deny-by-default scoping model and prevents accidental exposure of destructive tools. 
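The deny-by-default filter can be sketched as follows. The function name and value shapes are illustrative, not the actual `claw up` implementation; the point is only that an absent grant compiles to an empty tool set regardless of what the descriptor declares.

```python
def permitted_tools(declared, grant):
    """Compile-time tool filtering, deny-by-default.

    declared: tool names exposed by the surface's descriptor.
    grant: None when the service has no `tools:` stanza,
           the string "all", or an explicit list of tool names.
    """
    if grant is None:
        return []                      # declared but never granted: nothing compiled
    if grant == "all":
        return list(declared)
    allowed = set(grant)
    return [name for name in declared if name in allowed]

declared = ["get_market_context", "execute_trade"]
print(permitted_tools(declared, None))                    # → []
print(permitted_tools(declared, ["get_market_context"]))  # → ['get_market_context']
print(permitted_tools(declared, "all"))                   # → ['get_market_context', 'execute_trade']
```

Under this rule the `analyst` agent above compiles a single read-only tool while `executor` compiles both.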
+ +This is compiled MGL-style policy applied at infrastructure time: the pod author declares which capabilities each agent role may access, and the compilation pipeline enforces it by emitting only the permitted tools into each agent's manifest. + +`tools:` is intentionally list-shaped so it composes cleanly with pod defaults and `...` spread. Each entry has: + +- `service`: the providing compose service +- `allow`: either `all` or a list of tool names + +After pod-default expansion, grants are normalized by service name: + +- `allow: all` wins for that service +- otherwise tool names are unioned + +**Pod defaults and spread.** `tools-defaults:` at pod level uses the same list shape. Service-level `tools:` follows the standard replace-on-declare rule, and `...` splices pod defaults into the service list before normalization. This keeps the external grammar aligned with the existing defaults model while still yielding a service-keyed compiled policy. + +### 3. Compiled `tools.json` follows the feed pipeline + +The tool compilation pipeline mirrors `feeds.json` exactly: + +| Step | Feeds (existing) | Tools (new) | +|---|---|---| +| Descriptor declares | `feeds[]` with name, path, TTL | `tools[]` with name, inputSchema, http | +| Registry built | `BuildFeedRegistry()` from descriptors | `BuildToolRegistry()` from descriptors | +| Policy filters | Feed subscription in pod YAML | `tools:` declaration in pod YAML | +| Manifest written | `feeds.json` in context dir | `tools.json` in context dir | +| Auth inlined | Bearer token in feed manifest entry | Bearer token in tool manifest entry | +| cllama loads | Feed fetcher reads `feeds.json` | Tool mediator reads `tools.json` | +| cllama uses | Inject feed data into system prompt | Inject tool schemas into `tools[]` | + +Auth is inlined into `tools.json` using the same resolution order as `feeds.json`: per-agent service-auth projection (ADR-015 principal scoping) takes precedence, falling back to descriptor-level auth from 
service environment. For `claw-api` tools, `cllama` first authenticates the caller using the agent bearer token, then executes the tool using the projected `claw-api` principal credential from `service-auth/`. The ingress bearer token and the downstream service principal remain distinct. + +`claw-api` follows this ADR as a normal self-describing service. Its tools are declared through the same `tools[]` IR, gated by the same `tools:` policy, and authenticated through the same projected service-principal path. Existing `claw-api: self` wiring remains a credential-projection convenience, not a grant of verb authority. Write-plane verbs remain subject to both tool allowlisting and ADR-015 principal scope. + +**Compiled manifest at `/claw/context//tools.json`:** + +```json +{ + "version": 1, + "tools": [ + { + "name": "trading-api.get_market_context", + "description": "Retrieve agent-scoped market context", + "inputSchema": { "..." : "..." }, + "annotations": { "readOnly": true }, + "execution": { + "transport": "http", + "service": "trading-api", + "base_url": "http://trading-api:4000", + "method": "GET", + "path": "/api/v1/market_context/{claw_id}", + "auth": { "type": "bearer", "token": "resolved-token-value" } + } + } + ], + "policy": { + "max_rounds": 8, + "timeout_per_tool_ms": 30000, + "total_timeout_ms": 120000 + } +} +``` + +The manifest separates LLM-facing schema (name, description, inputSchema) from execution metadata (transport, URL, auth). The LLM sees only the schema. cllama uses the execution metadata to make HTTP calls. The agent never learns the service URL or credential. + +**Namespacing** is mandatory. The compiled manifest prefixes tool names with the service name (`trading-api.get_market_context`). The descriptor stays service-agnostic; namespacing is applied at compile time. This prevents collisions when multiple services expose tools with the same base name. 
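The schema/execution split described above can be sketched as a simple projection (names illustrative): whatever cllama forwards upstream must contain only the LLM-facing keys, never the execution block.

```python
def llm_facing(entry):
    """Project a compiled tools.json entry to the provider-native
    tool schema. The execution block (service URL, method, auth)
    stays inside cllama and never reaches the model."""
    return {k: entry[k] for k in ("name", "description", "inputSchema") if k in entry}

entry = {
    "name": "trading-api.get_market_context",
    "description": "Retrieve agent-scoped market context",
    "inputSchema": {"type": "object"},
    "execution": {
        "base_url": "http://trading-api:4000",
        "auth": {"type": "bearer", "token": "resolved-token-value"},
    },
}
print("execution" in llm_facing(entry))  # → False
```

A leaked `execution` block would hand the agent the very credential that ADR-007 starves it of, so this projection is a security boundary, not a formatting convenience.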
+ +**Path placeholders.** `{claw_id}` in HTTP paths is substituted by cllama at execution time using the authenticated agent's identity (already resolved via bearer token). Other placeholders (`{param}`) are substituted from the tool call's `arguments` object. + +### 4. Mediated mode: cllama injection, interception, execution + +This section defines `mediated` mode only. In `native` mode, the runner owns tool presentation and execution and combines pod-shared tools with its own local tools additively. + +In `mediated` mode, cllama gains the ability to inject tools into LLM requests and execute tool_calls transparently. This extends the existing pattern: + +| Capability | Declaration | Compiled artifact | Runtime enforcement | +|---|---|---|---| +| LLM access | API keys in `.env` | `providers.json` | cllama key pool (ADR-007) | +| Model selection | `MODEL` in Clawfile | `model_policy` in metadata | cllama policy enforcement (ADR-019) | +| Data context | `feeds` in descriptor | `feeds.json` | cllama fetcher + injection | +| **Service tools** | **`tools` in descriptor** | **`tools.json`** | **cllama injection + execution** | + +#### Tool injection + +When `tools.json` is loaded for an agent in `mediated` mode, cllama becomes the sole upstream tool presenter for that request. It replaces the outgoing request's `tools[]` field with managed tools only (LLM-facing schemas only). Managed tools are namespaced as `.` (e.g., `trading-api.get_market_context`), which distinguishes them from runner-native tools when logs or transcripts are inspected. + +Additive composition of runner-local and pod-shared tools belongs to `native` mode, where the runner is the sole tool client. `mediated` mode is intentionally narrower: pod-shared tools are executed by cllama, and the upstream tool round is treated as cllama-owned. + +#### Streaming behavior + +When cllama injects managed tools, it forces `stream: false` on the upstream LLM request. 
This prevents partial text from being flushed to the runner before a tool_call is detected. If the runner originally requested streaming, cllama re-streams the final text response as synthetic SSE chunks after the tool chain completes. + +If the downstream client requested streaming, cllama SHOULD keep the downstream HTTP stream alive during mediation with harmless SSE keepalive or progress comments. These are transport-level liveness signals, not synthetic assistant tokens. The goal is to prevent the runner UI from appearing hung while cllama executes hidden tool rounds. + +Requests where cllama has NO managed tools to inject are unaffected — streaming passes through normally. + +**Why not speculative streaming?** Detecting tool_calls mid-stream requires parsing provider-specific SSE chunk formats, buffering partial JSON, and handling edge cases where tool_calls arrive late. The complexity couples cllama to provider serialization details. Forcing non-streaming is simple, correct, and provider-agnostic. The latency cost (no token streaming during tool-augmented requests) is acceptable for chat agents, which are the primary tool consumers. + +#### Response handling: single executor per response + +A fundamental constraint: when the LLM returns tool_calls, the protocol requires results for ALL calls before it will continue. Two independent executors (cllama + runner) cannot both fulfill a single response's tool_calls without one fabricating results for the other's tools. Fabricated results let the LLM reason over output that never happened. + +The right way to support mixed local + pod tools is `native` mode, where the runner owns the full loop. `mediated` mode cannot safely provide transparent mixed execution. + +**v1 rule: `mediated` mode is request-scoped exclusive.** When cllama is acting as the tool executor for a request, runner-local tools are not combined into that upstream tool round. 
If the upstream response nevertheless contains unexpected non-managed tool calls, this is a defensive fallback path: either the model hallucinated a tool name or a client/request mismatch leaked an unexpected tool reference. cllama refuses execution of these calls and feeds structured errors back to the LLM within the mediated loop (see below), giving the model a chance to re-emit only managed tools or respond in text. + +**If the response contains managed tool_calls only:** +1. cllama validates each call against the manifest (reject unknown tools — fail closed) +2. Executes managed tools sequentially against target services +3. Constructs a follow-up LLM request with tool results appended +4. Repeats until the LLM returns terminal text +5. Returns the final response to the runner + +**If the response contains non-managed tool_calls in `mediated` mode:** +- Treat them as invalid for this request and feed back structured tool errors inside the mediated loop +- The error message should be prescriptive: `This request is in mediated mode. Action required: re-emit only managed service tools for this turn, or respond in text.` + +**If the response contains only text:** +- Return directly (or re-stream if the runner requested streaming). + +This single-executor model handles the common cases cleanly: +- Service-only tool chains: cllama handles transparently, runner sees text +- Native additive tool chains: runner handles both local and pod-shared tools in `native` mode +- Mixed batches in `mediated` mode: refuse execution, feed errors back + +**Future:** `native` mode is the preferred additive path. Any later two-phase mediated execution would require an explicit runner-side protocol extension and is not the architectural target. + +#### Transcript continuity across turns + +`mediated` mode creates a hidden tool loop. 
Returning only terminal text to the runner is not enough, because the runner's local conversation history will not include the intermediate assistant/tool turns that produced that text. On the next user turn, the runner may send an incomplete transcript back to cllama. + +This mode therefore requires a continuity shim. Session history alone is not sufficient because it is an audit record, not part of the live prompt path. `mediated` mode MUST preserve effective tool-round context across turns using one of these strategies: + +1. **Transcript reflection.** If the runner/protocol can accept it, cllama returns the effective assistant/tool transcript in provider-native form so the runner stores the mediated turns locally. This is the preferred v1 strategy because it keeps continuity in the runner's own transcript. +2. **Continuity summary.** Otherwise, cllama persists a compact summary of the mediated tool rounds and injects that summary into the next request before forwarding upstream. + +The exact mechanism is an implementation choice, but the requirement is architectural: hidden tool rounds must not disappear between user turns. + +#### Error handling + +Tool execution errors are fed back to the LLM as structured results, not returned to the runner: +```json +{ + "role": "tool", + "tool_call_id": "call_abc", + "content": "{\"ok\": false, \"error\": {\"code\": \"timeout\", \"message\": \"Service did not respond within 30s\"}}" +} +``` +The LLM decides how to communicate the failure. If cllama itself fails (internal error, budget exhaustion), it returns `502` to the runner. No partial text is sent because non-streaming prevents premature flushing. + +#### Budget and timeouts + +- `max_rounds` (default 8): Maximum tool loop iterations per request. Prevents infinite loops. +- `timeout_per_tool_ms` (default 30,000): Per-tool execution timeout. +- `total_timeout_ms` (default 120,000): Total chain timeout including all LLM calls and tool executions. 
+- `max_tool_result_bytes` (default 16,384): Tool results exceeding this are truncated with a notice, preventing context window exhaustion. + +Truncation MUST be explicit in the structured result so the model does not reason over partial data as if it were complete. Minimum shape: + +```json +{ + "ok": true, + "data": "...first bytes...", + "truncated": true, + "original_bytes": 52000 +} +``` + +All configurable at pod level, compiled into `tools.json`. + +### 5. Boundary: what cllama does and does not do + +**cllama manages:** Tools from `tools.json` — injection, interception, execution, auth, audit. + +**cllama does NOT manage:** +- **Runner-native tools.** Shell, file ops, send_message, browser remain runner-owned. Additive composition with pod-shared tools happens in `native` mode, not inside the `mediated` tool loop. +- **Dynamic discovery.** The manifest is static. No runtime `tools/list`. No tool registration. +- **Cognitive decisions.** The LLM chooses which tools to call. cllama is a mechanical executor. +- **General cross-request state.** Each request gets a fresh tool loop. The only exception is bounded continuity state required to preserve mediated tool-round context across turns. + +cllama is a **tool mediator**, not an agent framework. It extends the proxy's existing intercept-enforce-forward pattern to a new dimension. + +### 6. Skills and tools: complementary, not competing + +Skills and tools are sibling concepts from the same descriptor. A service emits both: + +| Concept | Standard | Format | Audience | Example | +|---|---|---|---|---| +| Tool | MCP tool schema | JSON Schema | LLM function calling | `execute_trade(symbol, side, qty)` | +| Skill | Anthropic skill | Markdown + YAML frontmatter | Agent context | "Check risk limits before trading" | +| Feed | Clawdapus-native | JSON manifest | cllama system prompt injection | Market data every 30s | + +The skill says *when and why*. The tool provides *how*. The feed delivers *what's happening now*. 
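A per-service skill file with per-tool sections might look like the following. The contents are illustrative and the frontmatter fields follow the Anthropic skill convention already used by `internal/skillmd/format.go`:

```markdown
---
name: trading-policy
description: When and why to use the trading-api tools
---

# Trading Policy

## get_market_context
Call this before any trade decision. The ambient feed refreshes every
30 seconds; prefer an explicit call when sizing a position.

## execute_trade
Check risk limits and current exposure first. Never exceed the
position limits described above.
```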
+ +For v1, one skill file per service (not per tool). If a service exposes five tools, the skill can have five sections. `claw up` continues to mount skills at `/claw/skills/` and reference them in CLAWDAPUS.md. CLAWDAPUS.md gains a `## Tools` section listing available tool names and descriptions. + +### 7. Session history and audit (amends ADR-018) + +ADR-018 defines session history as successful 2xx completions recorded to `history.jsonl`. This ADR extends that contract in two ways: (1) tool-mediated requests record a `tool_trace` capturing each execution round, and (2) failed tool executions are also recorded, since tool failures are the most important events to audit. The recorder gains a `status` field (`"ok"` or `"error"`) to distinguish successful from failed entries. + +Session history expands with a `tool_trace` field: + +```json +{ + "agent_id": "analyst-0", + "timestamp": "2026-03-29T14:30:00Z", + "model": "anthropic/claude-sonnet-4", + "request": { "messages": ["..."] }, + "response": { "content": "Your portfolio shows..." }, + "usage": { "prompt_tokens": 2000, "completion_tokens": 400, "total_rounds": 2 }, + "tool_trace": [ + { + "round": 1, + "tool_calls": [ + { + "name": "trading-api.get_market_context", + "arguments": { "claw_id": "analyst-0" }, + "result": { "ok": true, "data": { "balance": 50000 } }, + "latency_ms": 120, + "service": "trading-api" + } + ], + "round_usage": { "prompt_tokens": 800, "completion_tokens": 200 } + } + ] +} +``` + +`usage` aggregates ALL LLM calls in the chain (the runner's bill). `tool_trace` captures each round for audit. + +### 8. MCP primitives: analogous projection, not equivalence + +MCP defines three primitives. Each has an analogous Clawdapus concept, but the semantics differ: + +| MCP primitive | MCP semantics | Clawdapus analogue | Difference | +|---|---|---|---| +| Tools | Model-invoked, JSON-RPC execution | `tools[]` → `tools.json` | Same intent. Clawdapus uses provider-native tool calling, not JSON-RPC. 
| +| Resources | Application-controlled, URI-addressed data | `feeds[]` → `feeds.json` | Feeds are auto-injected context with TTL. MCP resources are on-demand and client-fetched. | +| Prompts | User-invoked templates with arguments | `skill` → mounted skill files | Skills are ambient behavioral guidance. MCP prompts are explicit user actions. | + +These are analogous projections, not semantic equivalents. Clawdapus covers similar ground through its own mechanisms, optimized for compile-time determinism and proxy-mediated delivery. This ADR adds the last capability type (callable tools) using MCP's schema vocabulary, so that future MCP interop is a transport change, not a schema rewrite. + +## Future Extensions + +**Graduation path.** `native` mode is the preferred steady state only if audit parity exists. When a runner can consume pod-shared MCP/client configuration, `claw up` generates that config and the runner merges pod-shared tools additively with its local tool set. Native mode does not graduate on runner capability alone; pod-shared tool execution must remain auditable. The preferred audit strategy is a Clawdapus-owned MCP proxy/broker for pod-shared tools, so runners gain additive composition without giving up observable service-tool traffic. + +**Dynamic tool context.** Context-sensitive filtering: time-of-day policies, alert-driven restriction, session-scoped escalation. Reads from cllama's existing context loader. + +**Parallel tool execution.** `parallel_safe: true` annotation on tools. Concurrent execution via goroutine pool. + +**Live MCP discovery.** `claw discover` command connects to a running pod's MCP services and updates baked tool schemas. Development-time convenience, not a compilation dependency. + +## Consequences + +**Positive:** +- Agents gain reliable, structured service interaction via MCP-shaped tool schemas delivered through provider-native tool calling. +- `mediated` mode extends credential starvation to service tools. 
Agent-facing endpoint details are omitted for services that declare managed tools. +- One canonical capability IR supports both zero-runner-change mediation and future additive native composition with audit parity. +- Uses MCP schema vocabulary while preserving Clawdapus's compile-time determinism and proxy-mediated delivery. +- Follows the exact same architectural pattern as feeds: declare → compile → inject. + +**Negative:** +- cllama gains complexity: tool injection, execution loop, and response coordination. +- Non-streaming upstream for tool-augmented requests adds latency. +- `mediated` mode cannot transparently mix runner-local and pod-shared tools in one upstream tool round. +- `native` mode remains contingent on an auditable pod-shared transport path; runner capability alone is insufficient. + +**Neutral:** +- `claw.describe` v1 descriptors work unchanged. The `version: 2` field gates new behavior. +- Existing skills continue to function with their role clarified as behavioral guidance. + +## Implementation + +### Phase 1: Compile-time plumbing +- Add `Tools []ToolDescriptor` and `Annotations` to `internal/describe/descriptor.go` +- Add `buildToolManifestEntries` in `compose_up.go` (beside `buildFeedManifestEntries`) +- Reuse URL synthesis (`compose_up.go:996`) and bearer auth projection (`compose_up.go:977`) +- Add list-shaped `tools:` / `tools-defaults:` policy parsing with explicit opt-in semantics +- Write per-agent `tools.json` to context directory (`internal/cllama/context.go`) +- Project tool names into CLAWDAPUS.md `## Tools` section +- Unit tests alongside existing `compose_up_test.go` and `context_test.go` + +### Phase 2: cllama tool injection + non-streaming mediation (OpenAI) + +This is the critical path — all architectural complexity concentrates here. Consider subdividing into 2a (injection + single-round execution) and 2b (multi-round loop + continuity + re-streaming). 
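The multi-round loop at the heart of this phase can be sketched as follows. `call_llm` and `execute_tool` are hypothetical stand-ins for the provider client and the HTTP executor; the real implementation also enforces per-tool and total timeouts and result truncation.

```python
def mediate(call_llm, execute_tool, messages, max_rounds=8):
    """Sketch of the mediated tool loop (signatures hypothetical).

    call_llm(payload) -> provider response dict
    execute_tool(name, args) -> structured result string; the real
        executor fails closed on names absent from tools.json
    """
    messages = list(messages)
    for _ in range(max_rounds):
        resp = call_llm({"messages": messages, "stream": False})  # forced non-streaming
        calls = resp.get("tool_calls")
        if not calls:
            return resp["content"]  # terminal text goes back to the runner
        messages.append({"role": "assistant", "tool_calls": calls})
        for call in calls:
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": execute_tool(call["name"], call["arguments"]),
            })
    raise RuntimeError("max_rounds exhausted")  # surfaced to the runner as 502
```

With a stub provider that emits one tool call and then terminal text, the loop returns the final text after a single hidden round.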
+ +- Load `tools.json` in cllama agent context loader (`agentctx`) +- Replace outgoing request's `tools[]` with managed tools in `handleOpenAI` +- Force `stream: false` when managed tools are injected +- Detect `tool_calls` in response, execute managed tools via HTTP +- Implement tool execution loop with budget, timeouts, and result truncation +- Return final text to runner (re-stream if needed) +- Implement transcript continuity (prefer transcript reflection for v1) +- Record `tool_trace` in session history + +### Phase 3: Anthropic parity and polish +- Implement Anthropic-format tool handling (`tool_use` / `tool_result` content blocks) +- Unified tool execution path for both API formats +- `claw audit` tool usage reporting + +### Phase 4: MCP execution transport +- Implement MCP client in cllama for downstream `tools/call` execution +- Support baked `.claw-tools.json` from MCP-native images +- `claw discover` command for live MCP schema updates + +### Phase 5: Native additive mode and graduation +- `parallel_safe` annotation and concurrent tool execution +- Dynamic tool filtering (time-based, alert-driven) +- Generate runner-side MCP/client config for pod-shared tools +- Define audit path for `native` mode tool execution From f44ada02507f283a96927007beed5b1220eaa43c Mon Sep 17 00:00:00 2001 From: Wojtek Date: Mon, 30 Mar 2026 16:00:52 -0400 Subject: [PATCH 02/18] Refine ADR-020 execution and identity model --- .../020-cllama-compiled-tool-mediation.md | 181 ++++++++++++------ 1 file changed, 119 insertions(+), 62 deletions(-) diff --git a/docs/decisions/020-cllama-compiled-tool-mediation.md b/docs/decisions/020-cllama-compiled-tool-mediation.md index 52a42b4..1644b91 100644 --- a/docs/decisions/020-cllama-compiled-tool-mediation.md +++ b/docs/decisions/020-cllama-compiled-tool-mediation.md @@ -1,4 +1,4 @@ -# ADR-020: Compiled Tool Plane with cllama-Mediated Compatibility Mode +# ADR-020: Compiled Tool Plane with Native and Mediated Execution Modes **Date:** 
2026-03-29 **Status:** Draft @@ -18,29 +18,44 @@ Three industry standards converge on this problem: - **Anthropic Skills** (markdown with YAML frontmatter) describe when and why to use tools — the behavioral guidance standard. Already implemented in Clawdapus via `internal/skillmd/format.go`. - **Docker/OCI** provides image labels, container networking, and compose services — the deployment standard. -Clawdapus should bridge these standards, not replace any of them. The descriptor is the Rosetta Stone. This ADR adopts MCP's tool schema shape as the capability description format and delivers tools through the provider-native tool-calling API (OpenAI `tools[]`, Anthropic `tools[]`). This is not MCP runtime adoption — there is no JSON-RPC transport, no `tools/call`, no MCP client in v1. It is a transitional compatibility layer that uses MCP's schema vocabulary while delivering capabilities through the mechanisms runners already consume. +Clawdapus should bridge these standards, not replace any of them. The descriptor is the Rosetta Stone. This ADR adopts MCP's tool schema shape as the capability description format and supports two execution models: -### Why not have runners connect to MCP directly? +- **Native execution.** Clawdapus governs which tools are presented, but the runner executes them using whatever authentication and network identity the included surface already supports. +- **Mediated execution.** Clawdapus governs both presentation and execution, but only for Clawdapus-compatible services or services that explicitly trust a delegated Clawdapus broker/assertion path. + +This is not MCP runtime adoption in v1 — there is no required JSON-RPC transport, no mandatory `tools/call`, and no universal MCP client. It is a transitional compatibility layer that uses MCP's schema vocabulary while delivering capabilities through the mechanisms runners and services can realistically support today. 
+ +The central architectural distinction is: + +- **Governance identity** answers "which policy applies to this agent?" +- **Execution identity** answers "what credential or network identity does the backend actually authorize?" + +Clawdapus always owns governance identity. It does **not** automatically own execution identity. Presentation governance is universal. Rights projection is conditional on the execution path. + +### Why not make runner-native MCP the only v1 path? **Runner agnosticism.** Seven runners exist and they do not yet share a universal path for consuming pod-shared MCP/client configuration. Adding bespoke pod-shared capability loading to each scales linearly with the driver count. -**Credential starvation.** ADR-007 withholds LLM API keys from runners. ADR-019 withholds model authority. Direct MCP access would give runners unmediated service access, breaking the governance chain. +**Execution identity is surface-specific.** On customer infrastructure, runners may legitimately need to execute against AD, mTLS, OAuth, or service-account protected surfaces without Clawdapus brokering those credentials. That is a valid native model, but it does not eliminate the need for a compatibility path when runners cannot yet host pod-shared tools natively. **Compile-time determinism.** MCP discovery is runtime. Clawdapus requires all wiring resolved during `claw up`. ## Positions Evaluated -Three architectures were evaluated across four rounds by three independent reviewers. All converged on Position A. +Three architectures were evaluated across four rounds by three independent reviewers. The consistent conclusion was: + +- `native` execution is the cleaner steady-state architecture when runners can host pod-shared tools and the backend's own auth model should remain authoritative. +- `mediated` execution is the only zero-runner-change compatibility path for governed pod-shared tools across the current runner set. 
### Position A: cllama owns tool mediation (selected compatibility mode) `claw.describe` v2 adds MCP-shaped tool schemas alongside the existing Anthropic skill. `claw up` compiles per-agent `tools.json` (filtered by tool policy) next to `feeds.json`. cllama injects tools into LLM requests, intercepts tool_calls, executes them against services, and loops until terminal text. Runners are unchanged. -**For:** Zero runner changes. Extends credential starvation to tools. Follows the feed injection pattern exactly. Aligns with Manifesto Principle 7 (governance in a separate process). +**For:** Zero runner changes. Gives Clawdapus an auditable compatibility path for governed tools. Follows the feed injection pattern exactly. Aligns with Manifesto Principle 7 (governance in a separate process). **Against:** cllama becomes stateful within a request lifecycle. Streaming requires non-streaming upstream when tools are injected. Adds complexity to the proxy. -**Evaluation:** The statefulness is bounded (single HTTP request lifecycle, no cross-request state). The streaming trade-off is acceptable for a compatibility mode. The complexity follows the existing pattern — feeds already turn cllama from a passthrough into a context-aware proxy. Additive composition of runner-local and pod-shared tools remains the preferred steady state, but that requires the runner to own the combined tool loop. +**Evaluation:** The statefulness is bounded (single HTTP request lifecycle, with only a narrow continuity exception across turns). The streaming trade-off is acceptable for a compatibility mode. The complexity follows the existing pattern — feeds already turn cllama from a passthrough into a context-aware proxy. Additive composition of runner-local and pod-shared tools remains the preferred steady state, but that requires the runner to own the combined tool loop. 
### Position B: MCP broker via claw-api (rejected) @@ -58,27 +73,33 @@ Both reviewers who originally proposed B and C reversed their positions after ex Clawdapus should have one canonical capability IR and two delivery modes: -| Mode | Tool-loop owner | How pod-shared tools are delivered | Mixed local + pod tools | -|---|---|---|---| -| `mediated` | cllama | Provider-native `tools[]` injection from compiled `tools.json` | Not within one upstream tool round | -| `native` | Runner | Runner loads compiled MCP/client config additively with local tools | Yes, preferred steady state | +| Mode | What Clawdapus governs | Tool-loop owner | How pod-shared tools are delivered | Backend auth path | +|---|---|---|---|---| +| `native` | Presentation, policy, audit hooks | Runner | Runner loads compiled pod-shared tool/MCP/client config additively with local tools | Whatever auth and network identity the included surface already supports | +| `mediated` | Presentation, policy, execution | cllama or a Clawdapus-owned broker | Provider-native `tools[]` injection from compiled `tools.json` | Delegated service credential or explicit trust in a Clawdapus-compatible broker/assertion path | -The larger picture is additive composition. Pod-shared tools should sit alongside runner-local tools, not replace them. But additive composition is only protocol-safe when one client/executor owns the whole tool loop. That is naturally the runner in `native` mode. +The larger picture is additive composition. Pod-shared tools should sit alongside runner-local tools, not replace them. That is naturally the runner in `native` mode, where one client/executor owns the whole tool loop and backend authorization remains the backend's problem. 
This ADR therefore does two things: - defines the canonical capability IR (`tools[]`, `feeds[]`, `skill`, `endpoints[]`) -- defines `mediated` mode as the bootstrap path for runners that cannot yet host pod-shared tools natively +- defines two projections of that IR: native presentation/execution and mediated presentation/execution + +Three rules follow from this split: + +- **Presentation governance is universal.** Clawdapus decides which tools an agent can see. +- **Execution mediation is optional.** Native execution is valid when the runner should execute directly against the included surface. +- **Rights projection is conditional.** Clawdapus only projects backend rights when the execution path actually carries delegated credentials or a trusted Clawdapus assertion. -`native` mode is the intended steady state once runners can consume pod-shared MCP/client configuration **and** audit parity exists for pod-shared tool usage. `mediated` mode is the zero-runner-change compatibility layer that gets Clawdapus to governed service tools without waiting for every runner to catch up. +`native` mode is the default execution model and the intended steady state once runners can consume pod-shared MCP/client configuration **and** audit parity exists for pod-shared tool usage. `mediated` mode is the compatibility layer for runners that cannot yet host pod-shared tools natively, and for Clawdapus-compatible services where full mediated execution is desirable. ### 1. Canonical capability IR via `claw.describe` v2 The descriptor defines four capability types. 
Each serves a distinct purpose and maps to an industry standard: -| Descriptor field | Industry standard | Clawdapus role | Compiled artifact | +| Descriptor field | Industry standard | Clawdapus role | Compiled projection | |---|---|---|---| -| `tools[]` | MCP tool schema shape | LLM-callable interface | `tools.json` | +| `tools[]` | MCP tool schema shape | LLM-callable interface | `tools.json` (`mediated`) or runner-side tool/client config (`native`) | | `skill` | Anthropic skill format | Behavioral guidance | `skills/.md` | | `feeds[]` | (Clawdapus-native) | Live data delivery | `feeds.json` | | `endpoints[]` | OpenAPI-adjacent | Operator documentation | CLAWDAPUS.md (when no tools) | @@ -87,7 +108,7 @@ The descriptor defines four capability types. Each serves a distinct purpose and **`endpoints[]` are operator documentation.** They describe the service's HTTP surface for human operators, `claw inspect`, and manual debugging. They are NOT used for LLM tool calling. When a service declares `tools[]`, its endpoint details are omitted from agent-facing `CLAWDAPUS.md` entirely. Agents interact through governed tools or not at all. Endpoint details remain available to operators through `claw inspect`, descriptor inspection, and other operator surfaces. Services that declare only `endpoints[]` and no `tools[]` continue to use the current manual-documentation path. -This separation is critical: `tools[]` are the governed, credential-starved interface. `endpoints[]` are the ungoverned, human-readable reference. They may describe the same HTTP operations but serve different audiences and different trust models. +This separation is critical: `tools[]` are the governed, model-callable interface. `endpoints[]` are the ungoverned, human-readable reference. They may describe the same HTTP operations but serve different audiences and different trust models. This all-or-nothing suppression is intentional. 
Partial tool coverage is not an invitation for agents to fall back to manual HTTP on the remaining endpoints. If an operation should be agent-usable, it should be exposed as a governed tool. If it should remain human-only, it stays in operator-facing endpoint documentation. @@ -144,17 +165,18 @@ Note that `market-context` appears as both a feed and a tool. This is intentiona **Why baked, not live?** `claw up` resolves descriptors before containers start (`compose_up.go:344`). It cannot connect to services that don't exist yet. Requiring baked schemas maintains compile-time hermeticity and avoids bootstrap circular dependencies. -### 2. Three-layer authority model +### 2. Authority and identity model -Service access has three independent dimensions. Each is declared separately in pod YAML: +Service access has four independent dimensions. The first three are declared in pod YAML; the fourth depends on the execution mode: -| Layer | Declaration | What it controls | Default | +| Layer | Declaration / source | What it controls | Default | |---|---|---|---| | **Topology** | `surfaces: [service://X]` | Network reachability between containers | No access | | **Verb authority** | `tools: [{ service: X, allow: ... }]` | Which operations the LLM can invoke | No tools | -| **Credential scope** | Per-agent principal (ADR-015) | Whose identity is used for the call | Service-level auth | +| **Governance identity** | Agent bearer + context metadata | Which Clawdapus policy applies | Authenticated caller only | +| **Execution identity** | Runner-native backend auth (`native`) or projected credential / trusted broker (`mediated`) | What the backend actually authorizes | Surface-specific auth model | -Declaring `service://X` grants network reachability. Tool access requires explicit `tools:` declaration. These are distinct: topology is transport, tools are verb authority, credentials are identity. +Declaring `service://X` grants network reachability. 
Tool access requires explicit `tools:` declaration. These are distinct: topology is transport, tools are verb authority, governance identity decides what Clawdapus presents, and execution identity decides what the backend accepts. ```yaml # claw-pod.yml @@ -183,6 +205,10 @@ services: This is compiled MGL-style policy applied at infrastructure time: the pod author declares which capabilities each agent role may access, and the compilation pipeline enforces it by emitting only the permitted tools into each agent's manifest. +**Native mode keeps backend auth native.** In `native` mode, Clawdapus does not rewrite or terminate backend authentication. The runner executes against the included surface using whatever execution identity that surface already supports: Active Directory, mTLS, OAuth, service accounts, customer-specific credentials, or any other existing scheme. Clawdapus governs visibility and intent; the backend still enforces rights. + +**Mediated mode requires a real trust path.** In `mediated` mode, Clawdapus may execute the tool on the agent's behalf, but only when the surface is Clawdapus-compatible or explicitly trusts delegated credentials or brokered identity assertions. An `X-Claw-ID` header or authenticated caller identity is not enough by itself. If the backend does not trust a projected Clawdapus path, then `mediated` mode can govern presentation but not honestly claim end-to-end rights projection. + `tools:` is intentionally list-shaped so it composes cleanly with pod defaults and `...` spread. Each entry has: - `service`: the providing compose service @@ -195,25 +221,30 @@ After pod-default expansion, grants are normalized by service name: **Pod defaults and spread.** `tools-defaults:` at pod level uses the same list shape. Service-level `tools:` follows the standard replace-on-declare rule, and `...` splices pod defaults into the service list before normalization. 
This keeps the external grammar aligned with the existing defaults model while still yielding a service-keyed compiled policy. -### 3. Compiled `tools.json` follows the feed pipeline +### 3. Compiled projections from one IR + +The tool compilation pipeline mirrors the feed pipeline at the registry and policy layers, then forks into mode-specific projections: -The tool compilation pipeline mirrors `feeds.json` exactly: +| Step | Feeds (existing) | Tools (`native`) | Tools (`mediated`) | +|---|---|---|---| +| Descriptor declares | `feeds[]` with name, path, TTL | `tools[]` with name, inputSchema, http | `tools[]` with name, inputSchema, http | +| Registry built | `BuildFeedRegistry()` from descriptors | `BuildToolRegistry()` from descriptors | `BuildToolRegistry()` from descriptors | +| Policy filters | Feed subscription in pod YAML | `tools:` declaration in pod YAML | `tools:` declaration in pod YAML | +| Artifact written | `feeds.json` in context dir | Runner-side tool/MCP/client config | `tools.json` in context dir | +| Auth handling | Feed auth manifest | Surface-native auth remains external | Projected auth may be inlined for trusted mediated execution | +| Runtime consumer | cllama feed fetcher | Runner tool host / MCP client / native loader | cllama mediator | + +The canonical IR is shared. What changes by mode is the execution projection. 
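+
+As an illustration of the shared IR, a `claw.describe` v2 fragment might declare one capability that both projections consume. This sketch is illustrative, not normative — field spellings beyond `tools[]`, `feeds[]`, `inputSchema`, and `http` are assumptions:
+
+```yaml
+# Hypothetical descriptor fragment showing the shared capability IR.
+version: 2
+tools:
+  - name: get_market_context      # namespaced at compile time: trading-api.get_market_context
+    description: Fetch current market context
+    inputSchema:
+      type: object
+      properties:
+        symbol: { type: string }
+      required: [symbol]
+    http:                         # execution metadata; never shown to the LLM
+      method: GET
+      path: /market-context/{symbol}
+feeds:
+  - name: market-context          # same data delivered as ambient context with a TTL
+    path: /market-context
+    ttl: 60s
+```
+
+In `mediated` mode this entry is projected into the agent's `tools.json`; in `native` mode the same entry is projected into runner-side tool/client config. Either way, the descriptor is written once.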
-| Step | Feeds (existing) | Tools (new) | -|---|---|---| -| Descriptor declares | `feeds[]` with name, path, TTL | `tools[]` with name, inputSchema, http | -| Registry built | `BuildFeedRegistry()` from descriptors | `BuildToolRegistry()` from descriptors | -| Policy filters | Feed subscription in pod YAML | `tools:` declaration in pod YAML | -| Manifest written | `feeds.json` in context dir | `tools.json` in context dir | -| Auth inlined | Bearer token in feed manifest entry | Bearer token in tool manifest entry | -| cllama loads | Feed fetcher reads `feeds.json` | Tool mediator reads `tools.json` | -| cllama uses | Inject feed data into system prompt | Inject tool schemas into `tools[]` | +In `native` mode, Clawdapus compiles the allowed tool catalog into whatever runner-side configuration is needed to present pod-shared tools alongside local tools. Clawdapus does not need to inline bearer tokens or terminate backend auth in that path. The runner executes against the included surface using its native execution identity, and the surface's own auth scheme remains authoritative. -Auth is inlined into `tools.json` using the same resolution order as `feeds.json`: per-agent service-auth projection (ADR-015 principal scoping) takes precedence, falling back to descriptor-level auth from service environment. For `claw-api` tools, `cllama` first authenticates the caller using the agent bearer token, then executes the tool using the projected `claw-api` principal credential from `service-auth/`. The ingress bearer token and the downstream service principal remain distinct. +In `mediated` mode, Clawdapus writes `/claw/context//tools.json` as the execution manifest for cllama or a Clawdapus-owned broker. This is the mode that mirrors `feeds.json` most closely because the proxy is both the runtime consumer and the execution point. 
+ +Auth is only inlined into `tools.json` for `mediated` execution, using the same resolution order as `feeds.json`: per-agent service-auth projection (ADR-015 principal scoping) takes precedence, falling back to descriptor-level auth from service environment when that fallback is actually valid for the target surface. For `claw-api` tools, cllama first authenticates the caller using the agent bearer token, then executes the tool using the projected `claw-api` principal credential from `service-auth/`. The ingress bearer token and the downstream service principal remain distinct. `claw-api` follows this ADR as a normal self-describing service. Its tools are declared through the same `tools[]` IR, gated by the same `tools:` policy, and authenticated through the same projected service-principal path. Existing `claw-api: self` wiring remains a credential-projection convenience, not a grant of verb authority. Write-plane verbs remain subject to both tool allowlisting and ADR-015 principal scope. -**Compiled manifest at `/claw/context//tools.json`:** +**`mediated` manifest at `/claw/context//tools.json`:** ```json { @@ -242,15 +273,31 @@ Auth is inlined into `tools.json` using the same resolution order as `feeds.json } ``` -The manifest separates LLM-facing schema (name, description, inputSchema) from execution metadata (transport, URL, auth). The LLM sees only the schema. cllama uses the execution metadata to make HTTP calls. The agent never learns the service URL or credential. +The mediated manifest separates LLM-facing schema (name, description, inputSchema) from execution metadata (transport, URL, auth). The LLM sees only the schema. cllama uses the execution metadata to make HTTP calls. This path hides service URL and credential details from the agent because Clawdapus is the executor. **Namespacing** is mandatory. The compiled manifest prefixes tool names with the service name (`trading-api.get_market_context`). 
The descriptor stays service-agnostic; namespacing is applied at compile time. This prevents collisions when multiple services expose tools with the same base name. -**Path placeholders.** `{claw_id}` in HTTP paths is substituted by cllama at execution time using the authenticated agent's identity (already resolved via bearer token). Other placeholders (`{param}`) are substituted from the tool call's `arguments` object. +**Path placeholders.** `{claw_id}` in HTTP paths is substituted at execution time using the authenticated agent identity for mediated calls, or by the runner-side tool host in native mode. Other placeholders (`{param}`) are substituted from the tool call's `arguments` object. + +### 4. Native mode: presentation-governed, runner-executed + +`native` mode is the default execution model. Clawdapus compiles and filters the pod-shared tool catalog, but the runner owns the tool loop and executes tools additively with its own local tools. + +This is the right model when: + +- the surface already has its own enterprise auth model +- the runner should act under a customer-managed execution identity +- Clawdapus should govern cognition and presentation without becoming an auth broker -### 4. Mediated mode: cllama injection, interception, execution +Examples include Active Directory, mTLS, OAuth, service accounts, and other customer-specific infrastructure auth schemes. In this mode, Clawdapus does not claim end-to-end rights projection. It governs what the model can see and ask for; the backend still governs what actually runs. -This section defines `mediated` mode only. In `native` mode, the runner owns tool presentation and execution and combines pod-shared tools with its own local tools additively. +Audit in `native` mode is required for graduation but may be indirect at first. 
A runner can load pod-shared tools natively before audit parity exists, but Clawdapus should not treat that path as governance-complete until tool execution is observable through a broker, proxy, or equivalent telemetry path.
+
+### 5. Mediated mode: cllama injection, interception, execution
+
+This section defines `mediated` mode only. `mediated` execution is the compatibility path for unchanged runners and the full-governance path for Clawdapus-compatible services.
+
+Here, **Clawdapus-compatible** means the backend either accepts projected service principals generated by `claw up` or explicitly trusts a Clawdapus broker/assertion path for execution authorization.

In `mediated` mode, cllama gains the ability to inject tools into LLM requests and execute tool_calls transparently. This extends the existing pattern:

@@ -349,9 +396,9 @@ Truncation MUST be explicit in the structured result so the model does not reaso

All configurable at pod level, compiled into `tools.json`.

-### 5. Boundary: what cllama does and does not do
+### 6. Boundary: what cllama does and does not do

-**cllama manages:** Tools from `tools.json` — injection, interception, execution, auth, audit.
+**cllama manages in `mediated` mode:** Tools from `tools.json` — injection, interception, execution, delegated auth where supported, and audit.

**cllama does NOT manage:**
- **Runner-native tools.** Shell, file ops, send_message, browser remain runner-owned. Additive composition with pod-shared tools happens in `native` mode, not inside the `mediated` tool loop.
@@ -361,7 +408,7 @@ All configurable at pod level, compiled into `tools.json`.

cllama is a **tool mediator**, not an agent framework. It extends the proxy's existing intercept-enforce-forward pattern to a new dimension.

-### 6. Skills and tools: complementary, not competing
+### 7. Skills and tools: complementary, not competing

Skills and tools are sibling concepts from the same descriptor. 
A service emits both: @@ -375,10 +422,12 @@ The skill says *when and why*. The tool provides *how*. The feed delivers *what' For v1, one skill file per service (not per tool). If a service exposes five tools, the skill can have five sections. `claw up` continues to mount skills at `/claw/skills/` and reference them in CLAWDAPUS.md. CLAWDAPUS.md gains a `## Tools` section listing available tool names and descriptions. -### 7. Session history and audit (amends ADR-018) +### 8. Session history and audit (amends ADR-018) ADR-018 defines session history as successful 2xx completions recorded to `history.jsonl`. This ADR extends that contract in two ways: (1) tool-mediated requests record a `tool_trace` capturing each execution round, and (2) failed tool executions are also recorded, since tool failures are the most important events to audit. The recorder gains a `status` field (`"ok"` or `"error"`) to distinguish successful from failed entries. +For `native` mode, equivalent visibility is still required before the path can be treated as governance-complete. That visibility may come from a Clawdapus-owned proxy/broker, runner-reported telemetry, or another auditable transport, but the architecture requires parity in observable tool usage even when execution identity remains native to the surface. + Session history expands with a `tool_trace` field: ```json @@ -409,13 +458,13 @@ Session history expands with a `tool_trace` field: `usage` aggregates ALL LLM calls in the chain (the runner's bill). `tool_trace` captures each round for audit. -### 8. MCP primitives: analogous projection, not equivalence +### 9. MCP primitives: analogous projection, not equivalence MCP defines three primitives. Each has an analogous Clawdapus concept, but the semantics differ: | MCP primitive | MCP semantics | Clawdapus analogue | Difference | |---|---|---|---| -| Tools | Model-invoked, JSON-RPC execution | `tools[]` → `tools.json` | Same intent. 
Clawdapus uses provider-native tool calling, not JSON-RPC. |
+| Tools | Model-invoked, JSON-RPC execution | `tools[]` → mediated manifest or native runner config | Same intent. Clawdapus uses provider-native tool calling or runner-native hosting; JSON-RPC is not required in v1. |
| Resources | Application-controlled, URI-addressed data | `feeds[]` → `feeds.json` | Feeds are auto-injected context with TTL. MCP resources are on-demand and client-fetched. |
| Prompts | User-invoked templates with arguments | `skill` → mounted skill files | Skills are ambient behavioral guidance. MCP prompts are explicit user actions. |

@@ -423,7 +472,7 @@ These are analogous projections, not semantic equivalents. Clawdapus covers simi

## Future Extensions

-**Graduation path.** `native` mode is the preferred steady state only if audit parity exists. When a runner can consume pod-shared MCP/client configuration, `claw up` generates that config and the runner merges pod-shared tools additively with its local tool set. Native mode does not graduate on runner capability alone; pod-shared tool execution must remain auditable. The preferred audit strategy is a Clawdapus-owned MCP proxy/broker for pod-shared tools, so runners gain additive composition without giving up observable service-tool traffic.
+**Graduation path.** `native` mode is the preferred steady state only if audit parity exists. When a runner can consume pod-shared MCP/client configuration, `claw up` generates that config and the runner merges pod-shared tools additively with its local tool set. Native mode does not graduate on runner capability alone; pod-shared tool execution must remain auditable. The preferred audit strategy is a Clawdapus-owned proxy or MCP broker for pod-shared tools, so runners gain additive composition without giving up observable service-tool traffic. `mediated` mode remains a supported long-term path for Clawdapus-compatible services that want brokered execution and rights projection.
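+
+A generated runner-side configuration for that graduation path might look like the sketch below. The file shape, key names, and broker endpoint here are assumptions — the concrete format would be defined per runner when this path lands:
+
+```json
+{
+  "mcpServers": {
+    "claw-pod-shared": {
+      "transport": "http",
+      "url": "http://claw-broker:9400/mcp"
+    }
+  }
+}
+```
+
+Routing all pod-shared tools through a single Clawdapus-owned broker endpoint is what preserves observable service-tool traffic while the runner keeps its local tools and owns the combined tool loop.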
**Dynamic tool context.** Context-sensitive filtering: time-of-day policies, alert-driven restriction, session-scoped escalation. Reads from cllama's existing context loader. @@ -434,17 +483,18 @@ These are analogous projections, not semantic equivalents. Clawdapus covers simi ## Consequences **Positive:** -- Agents gain reliable, structured service interaction via MCP-shaped tool schemas delivered through provider-native tool calling. -- `mediated` mode extends credential starvation to service tools. Agent-facing endpoint details are omitted for services that declare managed tools. -- One canonical capability IR supports both zero-runner-change mediation and future additive native composition with audit parity. -- Uses MCP schema vocabulary while preserving Clawdapus's compile-time determinism and proxy-mediated delivery. -- Follows the exact same architectural pattern as feeds: declare → compile → inject. +- Agents gain reliable, structured service interaction via one MCP-shaped capability IR that can be projected into either native runner execution or mediated execution. +- `native` mode cleanly fits enterprise and customer infrastructure where backend auth should remain authoritative and runner execution identity is real. +- `mediated` mode extends credential starvation to service tools for Clawdapus-compatible surfaces and other trusted brokered paths. Agent-facing endpoint details are omitted for services that declare managed tools. +- Uses MCP schema vocabulary while preserving Clawdapus's compile-time determinism. +- The mediated projection follows the same declare → compile → inject pattern as feeds. **Negative:** +- Two execution modes add conceptual complexity and require clear operator guidance about when Clawdapus is only governing presentation versus fully brokering execution. - cllama gains complexity: tool injection, execution loop, and response coordination. - Non-streaming upstream for tool-augmented requests adds latency. 
- `mediated` mode cannot transparently mix runner-local and pod-shared tools in one upstream tool round. -- `native` mode remains contingent on an auditable pod-shared transport path; runner capability alone is insufficient. +- `native` mode remains contingent on an auditable pod-shared transport path or telemetry path; runner capability alone is insufficient for governance-complete execution. **Neutral:** - `claw.describe` v1 descriptors work unchanged. The `version: 2` field gates new behavior. @@ -452,18 +502,25 @@ These are analogous projections, not semantic equivalents. Clawdapus covers simi ## Implementation -### Phase 1: Compile-time plumbing +### Phase 1: Compile-time IR and policy plumbing - Add `Tools []ToolDescriptor` and `Annotations` to `internal/describe/descriptor.go` -- Add `buildToolManifestEntries` in `compose_up.go` (beside `buildFeedManifestEntries`) +- Add tool registry/materialization alongside feed registry/materialization in `compose_up.go` - Reuse URL synthesis (`compose_up.go:996`) and bearer auth projection (`compose_up.go:977`) - Add list-shaped `tools:` / `tools-defaults:` policy parsing with explicit opt-in semantics -- Write per-agent `tools.json` to context directory (`internal/cllama/context.go`) +- Write per-agent mediated `tools.json` to context directory (`internal/cllama/context.go`) - Project tool names into CLAWDAPUS.md `## Tools` section - Unit tests alongside existing `compose_up_test.go` and `context_test.go` -### Phase 2: cllama tool injection + non-streaming mediation (OpenAI) +### Phase 2: Native projection and runner integration + +- Generate runner-side tool/MCP/client config from the compiled tool catalog +- Load pod-shared tools additively alongside runner-local tools where a runner supports it +- Preserve backend auth as runner/surface responsibility in this path +- Define the first audit-capable native transport or telemetry strategy + +### Phase 3: cllama tool injection + non-streaming mediation (OpenAI) -This is 
the critical path — all architectural complexity concentrates here. Consider subdividing into 2a (injection + single-round execution) and 2b (multi-round loop + continuity + re-streaming). +This is the critical path for the mediated compatibility layer. Consider subdividing into 3a (injection + single-round execution) and 3b (multi-round loop + continuity + re-streaming). - Load `tools.json` in cllama agent context loader (`agentctx`) - Replace outgoing request's `tools[]` with managed tools in `handleOpenAI` @@ -474,18 +531,18 @@ This is the critical path — all architectural complexity concentrates here. Co - Implement transcript continuity (prefer transcript reflection for v1) - Record `tool_trace` in session history -### Phase 3: Anthropic parity and polish +### Phase 4: Anthropic parity and audit polish - Implement Anthropic-format tool handling (`tool_use` / `tool_result` content blocks) - Unified tool execution path for both API formats - `claw audit` tool usage reporting -### Phase 4: MCP execution transport +### Phase 5: MCP transport and discovery - Implement MCP client in cllama for downstream `tools/call` execution - Support baked `.claw-tools.json` from MCP-native images - `claw discover` command for live MCP schema updates -### Phase 5: Native additive mode and graduation +### Phase 6: Graduation and advanced policy - `parallel_safe` annotation and concurrent tool execution - Dynamic tool filtering (time-based, alert-driven) -- Generate runner-side MCP/client config for pod-shared tools -- Define audit path for `native` mode tool execution +- Expand runner-side MCP/client config coverage across drivers +- Graduate `native` mode per runner only when audit parity exists From cd28ca82d42b8c4f416f6b0c036b5f056296c092 Mon Sep 17 00:00:00 2001 From: Wojtek Date: Tue, 31 Mar 2026 00:06:27 -0400 Subject: [PATCH 03/18] Soften ambient memory docs language --- MANIFESTO.md | 4 +++- README.md | 3 ++- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git 
a/MANIFESTO.md b/MANIFESTO.md index dafee92..70e9135 100644 --- a/MANIFESTO.md +++ b/MANIFESTO.md @@ -38,7 +38,7 @@ Clawdapus is infrastructure for bots the way Docker is infrastructure for applic 6. **Compute Is a Privilege** — Every cognitive cycle is an authorized expenditure. The operator assigns models and schedules; the proxy enforces budgets and rate limits. The bot does not choose its own budget. 7. **Think Twice, Act Once** — A reasoning model cannot be its own judge. Prompt-level guardrails are part of the same cognitive process they are trying to constrain. Governance must be executed by a separate, independent process. 8. **Drift is an Open Metric** — We do not trust a bot's self-report. However, defining and measuring behavioral drift is complex and organization-specific. By delegating interception to a swappable governance proxy, the infrastructure avoids defining drift itself, leaving it as an open operational metric for the proxy to explore and quantify. -9. **Memory Survives the Container (and the Runner)** — A bot acting as a persistent presence cannot afford amnesia. Session history is captured at the proxy boundary and stored outside the runtime directory — infrastructure-owned, always present, never dependent on runner cooperation. The runner's own scratch space is separately persisted. Two surfaces, two owners, both durable. Because the architecture is the agent, and the runtime is just the voice, you can swap the `CLAW_TYPE` (the runner) without losing the mind. Knowledge and context seamlessly cross runtime boundaries. +9. **Memory Survives the Container (and the Runner)** — A bot acting as a persistent presence cannot afford amnesia. Session history is captured at the proxy boundary and stored outside the runtime directory — infrastructure-owned, always present, never dependent on runner cooperation. The runner's own scratch space is separately persisted. Two surfaces, two owners, both durable. 
Because the architecture is the agent, and the runtime is just the voice, you can swap the `CLAW_TYPE` (the runner) without losing the mind. Knowledge and context seamlessly cross runtime boundaries. But retention alone is not memory. The architecture is moving toward an **ambient memory plane**: pluggable memory services deriving durable state from the retained record, and the governance proxy recalling relevant context back into the inference stream on future turns — automatically, without the agent asking. The agent would not manage its own long-term memory. Infrastructure would. --- @@ -57,6 +57,8 @@ These layers are independently versioned, independently deployable, and independ Two persistence surfaces support the running bot. **Session history** is infrastructure-owned: the governance proxy captures every successful LLM turn at the network boundary and writes it to a durable directory outside the runtime tree. This happens regardless of runner type, without any runner cooperation. **Portable memory** is runner-owned: the agent's scratch and note-taking space, mounted at `/claw/memory`. Both surfaces survive container restarts and `claw up` re-runs. A bot deployed for months does not lose its conversational past when its container is recreated. See [ADR-018](docs/decisions/018-session-history-and-memory-retention.md). +A planned **ambient memory plane** would build on these surfaces. Pluggable memory services would consume the session history ledger, derive durable state — facts, commitments, episodic summaries, project context — and the governance proxy would recall that state into future turns automatically. Memory recall would be query-aware: unlike feeds, which deliver the same cached content regardless of conversation, recall would be shaped by the current inference request. Memory intelligence — embeddings, ranking, summarization, graph extraction — would live in swappable services behind a stable contract, not in the proxy or the runner. 
See [ADR-021](docs/decisions/021-memory-plane-and-pluggable-recall.md). + ### V. The Behavioral Contract The behavioral contract is the single most important file in the architecture. It is the bot's purpose, defined by the operator, delivered as a read-only bind mount from the host. Even if the container is fully compromised (root access), the contract remains untouchable. diff --git a/README.md b/README.md index 4701e48..c211aa3 100644 --- a/README.md +++ b/README.md @@ -350,6 +350,7 @@ When a reasoning model tries to govern itself, the guardrails are part of the sa - **Identity resolution:** Single proxy serves an entire pod. Bearer tokens resolve which agent is calling. - **Cost accounting:** Extracts token usage from every response, multiplies by pricing table, tracks per agent/provider/model. - **Audit logging:** Structured JSON on stdout — timestamp, agent, model, latency, tokens, cost, intervention reason. +- **Planned ambient memory:** The architecture is moving toward querying pluggable memory services before each inference turn, injecting relevant derived context — facts, commitments, summaries — into the prompt automatically. Memory intelligence will live in swappable services, not in the proxy. - **Operator dashboard:** Real-time web UI at host port 8181 by default (container `:8081`) — agent activity, provider status, cost breakdown. The reference implementation is [`cllama`](https://github.com/mostlydev/cllama) — a zero-dependency Go binary that implements the transport layer (identity, routing, cost tracking). Future proxy types (`cllama-policy`) will add bidirectional interception: evaluating outbound prompts and amending inbound responses against the agent's behavioral contract. @@ -482,7 +483,7 @@ Bots install things. That's how real work gets done. Tracked mutation is evoluti 6. **Claws are users** — standard credentials; the proxy governs intent, the service's own auth governs execution 7. 
**Compute is a privilege** — operator assigns models and schedules; proxy enforces budgets and rate limits; bot doesn't choose 8. **Think twice, act once** — a reasoning model cannot be its own judge -9. **Memory survives the container (and the runner)** — session history is captured at the proxy boundary and persisted outside the runtime directory. Bots don't start amnesia-fresh after every restart. Infrastructure owns the record; the runner owns the scratch space. Two surfaces, two owners, never merged. Because the architecture is the agent, you can swap the runtime (`CLAW_TYPE`) without losing the mind; knowledge seamlessly crosses driver boundaries. +9. **Memory survives the container (and the runner)** — session history is captured at the proxy boundary and persisted outside the runtime directory. Bots don't start amnesia-fresh after every restart. Infrastructure owns the record; the runner owns the scratch space. Two surfaces, two owners, never merged. Because the architecture is the agent, you can swap the runtime (`CLAW_TYPE`) without losing the mind; knowledge seamlessly crosses driver boundaries. Retention is only half of memory. The architecture is moving toward an **ambient memory plane**: pluggable memory services deriving durable state from the retained record, and the proxy recalling relevant context into future inference turns automatically. The agent would not manage its own long-term memory — infrastructure would. 
--- From 402dc465269bc4758153f707edce6a6753cd4637 Mon Sep 17 00:00:00 2001 From: Wojtek Date: Tue, 31 Mar 2026 00:09:23 -0400 Subject: [PATCH 04/18] Draft ADR-021 memory plane --- .../021-memory-plane-and-pluggable-recall.md | 520 ++++++++ ...03-30-memory-plane-and-pluggable-recall.md | 1140 +++++++++++++++++ 2 files changed, 1660 insertions(+) create mode 100644 docs/decisions/021-memory-plane-and-pluggable-recall.md create mode 100644 docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md diff --git a/docs/decisions/021-memory-plane-and-pluggable-recall.md b/docs/decisions/021-memory-plane-and-pluggable-recall.md new file mode 100644 index 0000000..ff3509a --- /dev/null +++ b/docs/decisions/021-memory-plane-and-pluggable-recall.md @@ -0,0 +1,520 @@ +# ADR-021: Memory Plane as a Compiled Capability + +**Date:** 2026-03-30 +**Status:** Draft +**Depends on:** ADR-017 (Pod Defaults and Service Self-Description), ADR-018 (Session History and Persistent Memory Surfaces), ADR-020 (Compiled Tool Plane with Native and Mediated Execution Modes) +**Amends:** ADR-018 (defines the derived retrieval plane), ADR-020 (extends the canonical capability IR) +**Implementation:** Plan: docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md + +## Context + +ADR-018 established the substrate for infrastructure-owned retention: + +- `cllama` writes normalized per-agent session history +- session history and runner-owned `/claw/memory` are separate surfaces +- Phase 2 (scoped read API) and Phase 3 (derived retrieval) were deferred + +ADR-020 establishes the next major pattern: + +- services self-describe through `claw.describe` +- `claw up` compiles per-agent manifests +- `cllama` may mediate request-time behavior from those manifests +- the backend implementation remains external + +The open question is how memory fits this model. 
+ +Memory clearly resembles ADR-020's compiled capability flow: + +- it should be declared by a self-described service +- the consumer should subscribe in pod YAML +- `claw up` should compile per-agent manifests +- `cllama` should orchestrate request-time behavior +- backend logic should remain external and swappable + +But memory is not the same kind of capability as either feeds or tools: + +- **Feeds** are query-agnostic live context with TTL semantics. +- **Tools** are model-invoked callable operations. +- **Memory** needs both a synchronous pre-turn recall path and an asynchronous post-turn retain path, both initiated by infrastructure rather than by the model. + +If memory is forced into the feed shape, recall becomes query-blind and loses most of its value. + +If memory is forced into the tool shape, recall becomes opt-in at the model layer and loses the reliability that makes it infrastructure-worthy. + +The right question is therefore not "is memory a feed or a tool?" + +The right question is: "how does memory extend the same compiled-capability architecture without pretending to be a different lifecycle than it is?" + +## Decision + +### 1. Memory is a first-class capability in the same compiled model as feeds and tools + +ADR-020's architectural pattern is the right one: + +- declare capability in `claw.describe` +- subscribe in pod YAML +- compile per-agent runtime artifacts +- let `cllama` mediate request-time behavior + +Memory follows that exact pattern. + +It is therefore part of the same compiled capability model as feeds and tools. + +It is not a plugin universe, not a runner-local convention, and not a special one-off side channel. + +### 2. Memory is a sibling capability, not a subtype of feeds or tools + +The canonical capability model is extended from: + +- `tools[]` +- `feeds[]` +- `skill` +- `endpoints[]` + +to: + +- `tools[]` +- `feeds[]` +- `memory` +- `skill` +- `endpoints[]` + +The distinction is lifecycle, not importance. 
+ +| Capability | Primary purpose | Trigger | Query-aware | Typical artifact | Runtime owner | +|---|---|---|---|---|---| +| `feeds[]` | Ambient live context | Service/polling cadence | No | `feeds.json` | `cllama` fetch + inject | +| `tools[]` | Explicit callable operations | Model tool call | Yes, via arguments | `tools.json` or runner config | `cllama` (`mediated`) or runner (`native`) | +| `memory` | Derived durable context and retention hooks | Infra lifecycle | Yes | `memory.json` | `cllama` | + +Feeds tell the model what is happening now. + +Tools let the model do something on purpose. + +Memory lets infrastructure retain derived continuity and re-surface it automatically. + +### 3. Memory is `mediated` by definition in v1 + +ADR-020 distinguishes `native` and `mediated` execution for tools. + +Memory does not follow that split in the same way. + +For the ambient memory plane, Clawdapus only supports the mediated model: + +- `cllama` orchestrates recall before the upstream inference request +- `cllama` dispatches retain after the request completes +- `cllama` applies governance filters on both directions + +There is no runner-native equivalent that preserves the trust boundary, compile-time determinism, and cross-runner portability that motivate this feature. + +This does **not** mean memory services can never expose tools. + +A memory service may also declare ordinary `tools[]`, such as: + +- `search_memory` +- `pin_fact` +- `forget_memory` +- `list_open_commitments` + +Those explicit operations live on the tool plane and follow ADR-020 normally. + +But ambient recall and retain are part of the memory plane, not the tool plane. + +### 4. ADR-020's descriptor version should be treated as the umbrella capability version + +ADR-020 already proposes `claw.describe` `version: 2`. 
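As a hedged sketch of the umbrella interpretation this section argues for, a single `version: 2` descriptor could carry several capability families at once. Only the `tools[]` shape (MCP-shaped, per ADR-020) and the `memory` object (section 6 below) are defined by these ADRs; the `feeds[]` entry shape shown here is purely illustrative:

```json
{
  "version": 2,
  "description": "Team memory service",
  "feeds": [
    { "name": "memory-status", "path": "/status" }
  ],
  "tools": [
    {
      "name": "search_memory",
      "description": "Explicit search over derived memory",
      "inputSchema": {
        "type": "object",
        "properties": { "query": { "type": "string" } },
        "required": ["query"]
      }
    }
  ],
  "memory": {
    "recall": { "path": "/recall" },
    "retain": { "path": "/retain" }
  },
  "auth": { "type": "bearer", "env": "MEMORY_API_TOKEN" }
}
```

One descriptor, one version bump, three capability families.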
+ +Because ADR-020 is still draft and unimplemented, this ADR amends its interpretation: + +`version: 2` is the umbrella schema version for compiled service capabilities, not a tools-only bump. + +A `version: 2` descriptor may therefore include any combination of: + +- `feeds[]` +- `tools[]` +- `memory` +- `skill` +- `endpoints[]` + +This avoids pointless schema churn where tools land as one `v2` and memory immediately forces a second incompatible revision for the same implementation wave. + +If ADR-020 were to ship first exactly as currently written, then memory would need either: + +- an explicit amendment to ADR-020 before implementation, or +- a `version: 3` descriptor bump + +The preferred path is to avoid that split and treat `v2` as the shared capability-evolution step. + +### 5. The memory capability is declared by providers and subscribed to by consumers + +Memory follows the same provider-owns, consumer-subscribes rule as feeds and tools. + +A service declares memory capability in its descriptor. + +An agent subscribes to exactly one memory relationship in pod YAML. + +That relationship points to one service boundary, even if the service internally layers multiple strategies such as: + +- semantic retrieval +- graph memory +- rolling summaries +- periodic consolidation + +Clawdapus should not expose an arbitrary stack of memory backends directly to one agent. + +### 6. 
The memory descriptor is small and lifecycle-shaped + +The provider descriptor adds an optional `memory` object: + +```json +{ + "version": 2, + "description": "Derived memory service", + "memory": { + "recall": { "path": "/recall" }, + "retain": { "path": "/retain" }, + "forget": { "path": "/forget" } + }, + "auth": { "type": "bearer", "env": "MEMORY_API_TOKEN" } +} +``` + +Notes: + +- `recall` is required when a service wants to participate in hot-path context injection +- `retain` is required when a service wants low-latency processing of new turns +- `forget` is optional and reserved for governed operations + +The descriptor does **not** negotiate a semantic vocabulary for recall inputs. + +`cllama` sends a fixed payload shape with only simple numeric bounds configured at compile time. + +The service ignores fields it does not need. + +### 7. Pod subscription is explicit and singular + +The consumer surface in pod YAML is an explicit memory relationship: + +```yaml +x-claw: + memory: + service: team-memory + timeout-ms: 300 +``` + +Pod-level `memory-defaults` follows the normal defaults model. + +Service-level declaration overrides the default unless `...`-style list composition is later proven necessary. + +For memory, the default expectation is one relationship, not list composition. + +V1 should keep this operator surface small. + +Simple numeric shaping such as recent-window size, request byte caps, and injected byte caps should begin as implementation defaults rather than as a large user-facing knob surface. + +### 8. 
`claw up` compiles a dedicated per-agent `memory.json` + +Memory follows ADR-020's compile pipeline: + +| Step | Feeds | Tools (`mediated`) | Memory | +|---|---|---|---| +| Descriptor declares | `feeds[]` | `tools[]` | `memory` | +| Consumer policy | `feeds:` subscription | `tools:` allowlist | `memory:` relationship | +| Artifact written | `feeds.json` | `tools.json` | `memory.json` | +| Runtime consumer | `cllama` feed fetcher | `cllama` mediator | `cllama` recall/retain orchestrator | + +`memory.json` is per-agent because: + +- auth is per agent +- the subscribed service is per agent +- future policy and observability may differ per agent + +The manifest shape should be simple: + +```json +{ + "version": 1, + "service": "team-memory", + "base_url": "http://team-memory:8080", + "recall": { + "path": "/recall", + "timeout_ms": 300 + }, + "retain": { + "path": "/retain" + }, + "forget": { + "path": "/forget" + }, + "auth": { + "type": "bearer", + "token": "resolved-token-value" + } +} +``` + +Auth resolution follows the same order as ADR-020 mediated tools and existing feeds: + +- projected per-agent service credential when available +- otherwise descriptor-declared auth when that fallback is valid + +Memory should not invent a second auth model. + +The implementation may still compile default bounds into the runtime config, but those should begin as internal defaults, not as a large operator-facing contract. + +### 9. The memory plane has three distinct operations + +#### Recall + +Recall is synchronous and hot-path: + +1. `cllama` authenticates the agent as usual +2. `cllama` loads `memory.json` +3. `cllama` builds a bounded recall payload from the current request +4. `cllama` calls the memory service +5. `cllama` filters and injects the returned blocks +6. `cllama` forwards the enriched request upstream + +Recall exists to surface **derived durable state**, not transcript tails. + +If recall fails, the request continues without memory by default. 
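The hot path above can be illustrated with a hedged sketch of one recall exchange. Every field name here is illustrative; only the overall request/response shape is architectural, as described next. Request from `cllama` to the memory service:

```json
{
  "agent": "scout",
  "pod": "newsroom",
  "message": "What retention window did we settle on for raw feeds?",
  "recent": [
    { "role": "user", "text": "Let's revisit the feed storage question." }
  ]
}
```

A possible response, carrying derived blocks rather than transcript tails:

```json
{
  "blocks": [
    {
      "text": "Decision: raw feed retention is 30 days; derived summaries are kept indefinitely.",
      "kind": "decision",
      "source": "session-2026-02-14",
      "score": 0.91,
      "ts": "2026-02-14T09:12:00Z"
    }
  ]
}
```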
+ +At the contract level, recall has a fixed shape: + +- request carries agent identity, pod identity, basic request metadata, and the latest user message plus a small bounded recent context window +- response carries a bounded list of text blocks with optional metadata such as `kind`, `source`, `score`, and `ts` + +The exact JSON can evolve, but that shape is architectural. + +The recent context window is an implementation default in v1, not a large declarative vocabulary and not an operator tuning surface unless real usage proves it necessary. + +Injection is provider-format-aware. + +The implementation must resolve how the same logical memory block is rendered for: + +- OpenAI-style `messages[]` requests +- Anthropic-style requests with top-level `system` handling + +This ADR does not prescribe the exact injection primitive, but it does require one bounded logical memory block that is inserted consistently across both request families. + +#### Retain + +Retain is asynchronous and best-effort: + +1. the normalized session-history entry is appended to the ledger +2. `cllama` dispatches that same normalized entry to the memory service +3. failures are observed but do not fail the already-completed inference request + +Retain exists to reduce freshness lag. + +It does not replace ledger durability. + +#### Forget + +Forget is governed and optional: + +- it is not a normal runner capability +- it exists for operator policy, future Master Claw workflows, and backfill hygiene + +Forget applies to the external memory backend and to replay behavior. + +It does **not** justify mutating the append-only ledger in place. + +Instead, forgetting requires tombstone or redaction metadata that future replay and backfill paths honor. + +### 10. Memory traffic must be observable + +Memory mediation is part of the governed request path and must emit structured telemetry. 
+ +At minimum, the implementation should record: + +- whether recall was attempted, skipped, succeeded, timed out, or failed +- recall latency +- number of blocks returned +- number of blocks removed by policy +- injected byte count +- whether retain delivery succeeded or failed +- retain delivery latency + +This should align with the existing structured logging and audit direction rather than inventing a separate unstructured debug path. + +### 11. Session history remains the substrate and source of truth + +This ADR does not change ADR-018's ownership boundary. + +The roles become: + +- `history.jsonl`: immutable ledger, audit substrate, replay substrate +- memory service: derived state, indexing, summarization, salience, ranking +- `cllama`: orchestration, policy filtering, hot-path injection, best-effort delivery +- runner `/claw/memory`: local scratchpad and portable runner-owned state + +This means memory quality can improve radically over time without changing the retention substrate. + +### 12. Backfill is first-class, not a repair hack + +The retain webhook is only the low-latency path. + +A memory service must also be able to build or rebuild from the ledger. + +This requires: + +- ADR-018 Phase 2 style scoped history read access +- a future explicit replay or backfill flow +- replay semantics that honor forget tombstones + +Without backfill, the retain webhook is merely a convenience. + +With backfill, memory services become truly swappable. + +ADR-018 Phase 2 style scoped history read access is therefore a prerequisite for the first supported rollout of the memory plane. + +A local prototype may read ledger files directly, but that is not sufficient for the supported, swappable, runner-agnostic memory plane this ADR defines. + +### 13. Memory is not the same as runner session continuity + +The runner still owns immediate conversational recency. 
+ +The memory plane is deliberately not a strong read-after-write substitute for the runner's live session window. + +That is acceptable because the memory plane is for: + +- cross-session continuity +- durable facts +- older episodic summaries +- decisions and commitments +- long-range project state + +not for replaying the last few raw turns back into the model. + +### 14. Operators should prefer one ambient memory plane + +When the infrastructure memory plane is enabled, runner-native memory injection and runner-native memory-search tools may become redundant or actively conflicting. + +If `cllama` injects governed memory context while the runner also injects its own memory context, the agent may receive: + +- duplicate facts +- contradictory summaries +- repeated commitments +- mismatched privacy or forgetting policy + +The operational guidance should therefore be: + +- prefer the infrastructure memory plane as the single ambient recall mechanism +- disable runner-native memory plugins or memory injection where practical when using the infrastructure plane +- do not attempt generic forced disablement across all runners from Clawdapus itself + +Clawdapus should document this overlap explicitly rather than treating it as a purely neutral coexistence case. + +## Rationale + +### Why not model memory as a feed? + +Feeds are the wrong shape: + +- they are query-agnostic +- they are naturally TTL-cached +- they represent live service state, not derived continuity + +Memory recall needs the current request as input. + +If it does not, it is usually not doing real recall. + +### Why not model memory as a tool? + +Tool-based memory search is useful, but it is not sufficient as the infrastructure plane. 
+ +If recall depends on the model deciding to call a tool: + +- reliability becomes model-dependent +- runners without shared tool hosting lose parity +- cross-runner portability collapses back toward runner plugins + +Explicit memory tools are a complement, not the substrate. + +### Why keep memory separate from runner-owned `/claw/memory`? + +The ownership boundary from ADR-018 is still correct. + +Runner memory is agent-authored and writable. + +Infrastructure memory is operator-governed and proxy-mediated. + +Collapsing them would blur authority and make replay, redaction, and audit much harder. + +### Why no `native` memory mode? + +Ambient memory recall is valuable precisely because it is reliable, governed, and runner-agnostic. + +If runners each implement their own retain and recall path: + +- persistence becomes runner-coupled +- backend stores fragment +- cross-runner continuity regresses +- policy enforcement becomes inconsistent + +That is the failure mode this ADR exists to avoid. + +### Why a dedicated `memory.json` instead of folding everything into one manifest? + +Today the codebase already uses small dedicated per-agent artifacts: + +- `metadata.json` +- `feeds.json` +- `service-auth/*.json` + +ADR-020 adds `tools.json`. + +Adding `memory.json` is consistent with that pattern and avoids prematurely inventing a generic super-manifest before the capability shapes stabilize. + +A future manifest unification is possible, but not required to land the architecture cleanly. + +## Consequences + +**Positive:** + +- Memory fits the same declare -> compile -> mediate architecture as feeds and tools. +- Memory vendors remain swappable behind one stable contract. +- Cross-runner continuity no longer depends on runner-native plugins or per-runner databases. +- Governance applies to both retention and recall traffic. +- Backfill and replay become first-class concerns rather than an afterthought. 
+- The descriptor and context changes can be made once, alongside ADR-020, instead of in two conflicting passes. + +**Negative:** + +- `cllama` gains another hot-path responsibility and must budget latency tightly. +- The capability model becomes broader: operators must understand feeds, tools, and memory as related but distinct surfaces. +- Runner-native memory systems may overlap or conflict with the infrastructure plane and require operator discipline. +- Because memory is mediated-only in v1, there is no short path that reuses runner-local memory plumbing. + +**Neutral:** + +- A memory service may expose both `memory` and `tools[]`; these are complementary, not duplicative. +- This ADR does not standardize embeddings, ranking, graph schemas, or salience logic. +- This ADR does not require immediate implementation of shared or cross-agent memory namespaces. + +## Implementation Direction + +To avoid shape churn, ADR-020 and this ADR should be implemented as one descriptor/context evolution wave. + +The practical order is: + +1. extend `internal/describe.ServiceDescriptor` for `version: 2` capability parsing +2. add the new user-facing pod grammar for tools and memory together +3. extend `internal/cllama.AgentContextInput` and manifest generation once +4. implement ADR-018 Phase 2 history read/backfill substrate +5. implement `memory.json` compilation and `cllama` recall/retain hooks +6. implement `tools.json` mediation and any shared manifest/auth helpers that fall out of the work + +The important point is not the exact file order. + +The important point is to avoid implementing ADR-020 as if tools are the only future compiled capability and then immediately refactoring the same surfaces again for memory. 
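For orientation, a per-agent context directory after both ADRs land would hold small sibling artifacts. This layout is illustrative (the directory path and the `scout` agent name are assumptions): `metadata.json`, `feeds.json`, and `service-auth/` already exist per the rationale above, `tools.json` is ADR-020's artifact, and `memory.json` is this ADR's:

```
context/scout/
├── metadata.json     # existing per-agent metadata
├── feeds.json        # existing feed manifest
├── service-auth/     # existing projected per-agent credentials
├── tools.json        # ADR-020: mediated tool catalog
└── memory.json       # this ADR: recall/retain wiring and auth
```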
+ +The first supported end-to-end checkpoint is after steps 4 and 5: + +- one self-described memory service can be wired to one agent +- recall injects derived blocks in the request path +- retain delivers normalized entries post-turn +- replay/backfill is supported through the scoped history surface + +Without that checkpoint, the system may be an interesting prototype, but it is not yet the supported memory plane defined by this ADR. diff --git a/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md b/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md new file mode 100644 index 0000000..0bdb1a2 --- /dev/null +++ b/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md @@ -0,0 +1,1140 @@ +# Memory Plane and Pluggable Recall Plan + +## Goal + +Introduce memory as a first-class infrastructure plane in Clawdapus: + +- runner-agnostic +- durable across rebuilds and runner swaps +- governed by `cllama` +- implemented by swappable memory services rather than by runner plugins + +This document is intentionally a plan, not yet an ADR. It is meant to sharpen the boundary between: + +- what Clawdapus should own +- what `cllama` should own +- what memory backends and vendors should own + +The central claim is: + +**Clawdapus should own the reliable lifecycle hooks and policy surface for memory, but it should not own the intelligence of memory itself.** + +**Raw recent history is not the product. Derived durable state is the product.** + +That intelligence includes retention strategy, salience, summarization, embeddings, graph extraction, affect modeling, ranking, deduplication, decay, and recall selection. + +Those should remain swappable behind a stable service contract. + +--- + +## Why This Exists + +The current repo already has the right primitives, but not yet a complete memory plane: + +- `cllama` already captures durable per-agent session history at the proxy boundary. +- `cllama` already injects live context through feeds and request decoration. 
+- Clawdapus already compiles per-agent manifests into mounted context directories. +- Services already self-describe through `claw.describe`. +- The manifesto already states that memory must survive the container and the runner. + +What is missing is the pipeline that connects: + +1. raw retained turns +2. derived memory artifacts +3. request-time recall + +through a clean, pluggable service contract. + +The problem is not only persistence. The problem is useful recall. + +If the system only re-injects the most recent raw turns, it adds very little. Runners already maintain live sessions and recency windows. The real value of infrastructure memory is the ability to surface durable, relevant, derived context that the live session window does not preserve reliably. + +Examples: + +- long-lived user preferences +- stable facts about operators, services, repos, or accounts +- open commitments and unresolved tasks +- previous decisions and their rationale +- episodic summaries from older sessions +- project state that spans many conversations +- cross-runner continuity after migration or rebuild + +In other words: + +**Raw recent history is not the product. Derived durable state is the product.** + +Raw history is still essential, but as the source of truth, not as the typical recall payload. + +--- + +## Current Repo Position + +The architecture in-tree already points in this direction. + +### 1. Infra-owned retention already exists + +ADR-018 established: + +- session history is infra-owned +- session history is written by `cllama` +- portable memory is runner-owned +- the two surfaces must remain distinct + +This is the correct foundation. Raw history should be captured once at the one place all cllama-enabled runners share: the governance proxy boundary. + +### 2. 
Request-time context injection already exists + +The current `cllama` implementation already: + +- loads per-agent feed manifests +- fetches live context +- prepends it into OpenAI and Anthropic requests +- injects current time + +So memory recall is not a new category of behavior. It is a new kind of request-time enrichment. + +### 3. Service self-description already exists + +Services already advertise capabilities through `claw.describe`, and `claw up` already compiles those capabilities into per-agent runtime artifacts. + +That means memory backends do not need to be runner plugins. They can be pod services with normal Clawdapus discovery, auth projection, and compile-time wiring. + +### 4. Tool mediation is already converging on the same architecture + +ADR-020 is already establishing the pattern: + +- service declares capability +- `claw up` compiles a manifest +- `cllama` may mediate or inject behavior at request time +- backend implementation remains external + +Memory should follow the same architectural logic instead of inventing a separate plugin universe. + +More strongly: + +**Memory should use the exact same structural pattern that tools are moving toward.** + +- provider declares capability in `claw.describe` +- consumer subscribes in pod YAML +- `claw up` compiles a per-agent manifest +- `cllama` enforces and mediates request-time behavior +- backend implementation remains swappable + +This parallel should be treated as core architectural framing, not as an incidental similarity. 
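
To make the parallel concrete, a pod file could subscribe one agent to both capability types with the same provider/consumer shape. This is a sketch only: the `tools` grammar shown here is hypothetical (following the direction ADR-020 is drafting), and `claw-memory` and its image are placeholder names; the `memory` shape is the one proposed later in this plan.

```yaml
services:
  claw-memory:
    image: example/claw-memory:latest   # hypothetical backend image

  analyst:
    x-claw:
      agent: ./agents/ANALYST.md
      tools:                 # hypothetical grammar, per ADR-020's draft direction
        - service: claw-memory
      memory:                # shape proposed later in this plan
        service: claw-memory
```

Either way, the provider declares the capability in `claw.describe`, `claw up` compiles the per-agent manifest, and `cllama` mediates the behavior at request time.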
+ +--- + +## Problem Statement + +Without a shared memory plane, users are pushed toward runner-local memory systems: + +- OpenClaw plugins +- per-runner vector databases +- runner-specific hooks +- incompatible stores and formats + +This has several structural downsides: + +- memory becomes coupled to one runner family +- changing `CLAW_TYPE` threatens continuity +- memory persistence depends on runner cooperation +- every runner may duplicate infrastructure work +- every runner may spin up its own retrieval stack +- governance over retained and recalled content becomes inconsistent + +This violates the repository's direction in two ways: + +1. It undermines runner-agnostic persistence. +2. It moves a governance-relevant concern back into the trusted application layer. + +Memory quality may ultimately be where a large share of agent performance comes from. That is a reason to expose strong hooks and clean contracts, not a reason to hardcode memory intelligence into the proxy or into runners. + +--- + +## Design Principles + +### 1. The ledger is sacred + +`history.jsonl` is the immutable substrate. + +It is: + +- append-only +- normalized +- operator-visible +- rebuildable input for any future memory backend + +Memory services may fail, change, or be replaced. The ledger remains the stable truth. + +### 2. Portable memory stays runner-owned + +`/claw/memory` remains: + +- runner-writable +- agent-authored +- format-agnostic +- separate from infra-owned history + +We should not collapse session history and portable memory into a single surface. + +### 3. `cllama` owns orchestration, not cognition + +`cllama` should know: + +- when to call recall +- when to call retain +- how to inject recall results +- how to apply policy filters +- how to measure failures and latency + +`cllama` should not know: + +- how to embed text +- how to rank memories +- how to build graphs +- how to infer affect +- how to compact or summarize +- which vendor algorithm is best + +### 4. 
Memory intelligence must be swappable + +Clawdapus should make it possible to plug in: + +- mem0 +- supermemory +- graph-based memory systems +- local pgvector/Qdrant/Chroma implementations +- simple rolling-summary stores +- domain-specific memory engines + +without requiring: + +- runner changes +- new proxy code for each backend +- store migration to a Clawdapus-owned schema + +### 5. One memory relationship per agent + +An agent should subscribe to one memory service, not to an arbitrary list of memory backends. + +If an operator wants a layered strategy, such as: + +- raw retention +- semantic recall +- graph memory +- periodic summarization + +that should be composed behind one memory service boundary. + +This keeps the Clawdapus surface simple and avoids exploding the agent-facing memory model. + +### 6. Recall should return derived state, not transcript tails + +The memory plane should optimize for: + +- stable facts +- commitments +- episodic summaries +- project state +- relevant long-range context + +not for "last N messages." + +If a backend cannot produce anything more useful than recent transcript slices, it should not yet be in the hot path. + +### 6.5. Hot-path latency must be budgeted aggressively + +Recall runs in the inference hot path, so it must be treated like an expensive privilege, not a free convenience. + +That implies: + +- strict short timeouts +- no automatic retries on the hot path +- graceful degradation when recall fails +- explicit per-agent opt-in +- bounded payload size + +If a backend cannot return useful derived state within the allowed budget, it should not be enabled for synchronous recall. + +### 7. Governance must apply to memory traffic too + +Memory is a cognitive surface. 
+ +That means the same infrastructure that governs model traffic should be able to: + +- scrub sensitive data before retention +- redact or suppress recalled content before reinjection +- forget or purge retained content when policy requires it + +This is one of the strongest reasons not to leave memory solely inside runners. + +### 7.5. Forget must be compatible with an append-only ledger + +The raw ledger should remain append-only. + +That means a governed forget operation should not rewrite `history.jsonl` in place. + +Instead, forgetting should eventually work through: + +- deletion in the external memory backend +- a tombstone or redaction sidecar ledger owned by infrastructure +- replay and backfill logic that honors those tombstones and does not re-ingest forgotten material + +The goal is: + +- preserve auditability of the raw retention substrate +- prevent forgotten content from re-entering derived memory on a later rebuild or backfill + +### 8. Compile-time wiring, not runtime self-registration + +The memory relationship should be declared in pod YAML and compiled by `claw up`, just like feeds and tools. + +No runtime plugin discovery. +No runner-specific boot-time registration. +No hidden self-attachment logic. + +--- + +## The Memory Pipeline + +The proposed memory plane has four stages. + +### Stage 1: Capture + +`cllama` records every successful inference turn into the durable ledger. + +This already exists. + +Output: + +- append-only `history.jsonl` + +### Stage 2: Retain + +After a successful turn, `cllama` may send a best-effort structured retention webhook to a configured memory service. + +This is an optimization and low-latency trigger, not the source of truth. 
+ +If it fails: + +- the turn is still durable in `history.jsonl` +- the memory service may catch up later from the ledger + +### Stage 3: Process + +The memory service performs its own internal work: + +- summarization +- salience extraction +- fact extraction +- embeddings +- graph linking +- deduplication +- affect tagging +- decayed ranking updates + +This stage is entirely outside `cllama`. + +### Stage 4: Recall + +Before forwarding the next model request upstream, `cllama` may query the memory service for relevant derived context and inject the returned memory blocks into the prompt. + +This is synchronous and bounded: + +- timeout-controlled +- size-capped +- policy-filtered + +This is where memory affects model behavior. + +--- + +## What Clawdapus Should Own + +Clawdapus should own the shared contract and lifecycle hooks. + +### A. The raw ledger + +Already implemented: + +- one normalized history stream per agent +- outside `.claw-runtime` +- durable across restarts and `claw up` + +### B. The memory relationship declaration + +At pod level and/or service level, operators should be able to declare: + +- which memory service an agent uses +- whether recall is enabled +- whether retain webhook is enabled +- bounded hot-path knobs such as timeouts and recall-context size + +### C. Compile-time wiring + +`claw up` should: + +- validate that the referenced memory service exists +- inspect its descriptor +- resolve URLs and auth +- project per-agent memory config into context +- mount the needed runtime files into `cllama` + +### D. The request lifecycle hooks + +`cllama` should: + +- call recall before the upstream LLM request +- inject returned memory blocks into the prompt +- call retain after a successful turn +- log memory hook failures and latency + +### E. 
Governance hooks + +`cllama` should be able to: + +- scrub retained content before forwarding to memory service +- redact recalled content before injecting it +- support a future governed `forget` action + +### F. Observability + +We should be able to answer: + +- did recall run? +- how long did it take? +- did it time out? +- how many bytes were injected? +- how many blocks were returned? +- did policy remove any blocks? +- did retain webhook fail? + +### G. A minimal reference implementation + +Clawdapus should eventually ship a small reference memory service image that proves the contract end-to-end. + +This reference is not meant to be state of the art. It is meant to: + +- validate the contract +- provide spike coverage +- offer a baseline for operators + +--- + +## What Clawdapus Should Not Own + +### 1. A universal memory algorithm + +Clawdapus should not define: + +- the one true salience metric +- the one true embedding model +- the one true summary format +- the one true graph extraction strategy + +### 2. Vendor-specific backend semantics + +Clawdapus should not hardcode: + +- mem0 APIs +- supermemory APIs +- Graphiti semantics +- Qdrant schema assumptions +- Chroma collection naming + +### 3. Per-runner memory plugins as the primary path + +Runners may still offer native memory tools or plugins, but those should not be the architecture Clawdapus depends on. + +### 4. Memory store internals + +Clawdapus should not care whether a backend uses: + +- SQLite +- JSONL +- Postgres + pgvector +- Qdrant +- graph DBs +- hybrid layers + +as long as it obeys the stable service contract. + +--- + +## Proposed User-Facing Model + +The agent should declare one memory service relationship. + +Suggested shape: + +```yaml +x-claw: + memory-defaults: + service: claw-memory + timeout-ms: 300 + +services: + analyst: + x-claw: + agent: ./agents/ANALYST.md + memory: + service: claw-memory +``` + +Notes: + +- `memory` should be an object, not only a scalar. 
+- We may support scalar sugar later, but the compiled model should be object-shaped. +- One memory service per agent is intentional. +- Simple payload-shaping bounds should begin as implementation defaults rather than as a large operator-facing knob surface. + +This is deliberately modest. The operator is declaring: + +- who the memory provider is +- how much hot-path budget is available + +The operator is not trying to teach Clawdapus how memory works internally. + +--- + +## Proposed Descriptor Extension + +The current descriptor should gain an optional memory capability section in the next descriptor version line. + +This plan must not create a second incompatible `claw.describe` version `2`. + +ADR-020 already drafts descriptor version `2` for tools. Memory must therefore do one of the following: + +- fold into the same `version: 2` descriptor expansion as tools +- or, if it lands later and cannot be merged cleanly, become `version: 3` + +There must not be two competing meanings for descriptor version `2`. + +Example: + +```json +{ + "version": 2, + "description": "Shared memory service with semantic recall and durable turn retention.", + "memory": { + "recall": { + "path": "/recall" + }, + "retain": { + "path": "/retain" + }, + "forget": { + "path": "/forget" + } + }, + "auth": { + "type": "bearer", + "env": "CLAW_MEMORY_TOKEN" + } +} +``` + +Notes: + +- `forget` is optional. +- The descriptor does not declare ranking semantics or embedding behavior. +- The descriptor does not negotiate a request vocabulary. +- The service receives a fixed bounded payload and ignores what it does not need. 
+ +This matches the current Clawdapus style: + +- provider declares capability +- consumer subscribes by service +- `claw up` compiles the projection + +--- + +## Proposed Runtime Manifest + +`claw up` should compile a new per-agent manifest: + +```text +/claw/context//memory.json +``` + +This mirrors the current: + +- `feeds.json` +- `service-auth/` +- future `tools.json` + +Suggested shape: + +```json +{ + "service": "claw-memory", + "recall": { + "url": "http://claw-memory:8080/recall", + "enabled": true, + "timeout_ms": 300, + "max_bytes": 4096, + "recent_messages": 3, + "auth": "bearer-token-if-needed" + }, + "retain": { + "url": "http://claw-memory:8080/retain", + "enabled": true, + "auth": "bearer-token-if-needed" + }, + "forget": { + "url": "http://claw-memory:8080/forget", + "enabled": true, + "auth": "bearer-token-if-needed" + } +} +``` + +This manifest is consumed by `cllama`, not by the runner. + +That is important: + +- no runner plugin system required +- no per-runner memory client code +- no duplication across drivers + +--- + +## Proposed Wire Contracts + +The wire contract should be deliberately small. + +### 1. Recall + +`cllama` sends a fixed request body to the memory service. + +Suggested request: + +```json +{ + "agent_id": "analyst-0", + "pod": "trading-desk", + "ts": "2026-03-30T15:04:05Z", + "request_path": "/v1/chat/completions", + "requested_model": "anthropic/claude-sonnet-4", + "messages": [ + {"role":"assistant","content":"..."}, + {"role":"user","content":"..."}, + {"role":"user","content":"..."} + ], + "metadata": { + "timezone": "America/New_York" + } +} +``` + +Notes: + +- `messages` is bounded only by simple numeric limits such as recent message count or byte cap. +- The payload is intentionally generic. +- The memory service may ignore fields it does not need. 
+ +Suggested response: + +```json +{ + "blocks": [ + { + "kind": "profile", + "text": "Operator prefers concise summaries and dislikes speculative tone.", + "source": "user-profile", + "score": 0.93, + "ts": "2026-03-28T12:00:00Z" + }, + { + "kind": "commitment", + "text": "Open action: finalize the migration ADR and reconcile model-policy docs drift.", + "source": "episodic-summary", + "score": 0.88, + "ts": "2026-03-29T19:00:00Z" + } + ], + "ttl_seconds": 30 +} +``` + +`cllama` then: + +- applies policy filtering +- formats the returned blocks into a bounded injected context block +- prepends that block into the outbound LLM request + +### 2. Retain + +Retain should be best-effort and should happen after a successful turn. + +Suggested request: + +```json +{ + "agent_id": "analyst-0", + "pod": "trading-desk", + "entry": { + "version": 1, + "ts": "2026-03-30T15:04:05Z", + "claw_id": "analyst-0", + "path": "/v1/chat/completions", + "requested_model": "anthropic/claude-sonnet-4", + "effective_provider": "anthropic", + "effective_model": "claude-sonnet-4", + "status_code": 200, + "stream": false, + "request_original": {}, + "request_effective": {}, + "response": {}, + "usage": {} + } +} +``` + +Notes: + +- The ledger remains the durable truth regardless of webhook outcome. +- The memory service may process this immediately or queue it internally. +- The retain contract deliberately reuses the normalized session-history entry rather than inventing a second event shape. + +### 3. Forget + +`forget` is optional and should be treated as a governance operation, not a normal runner capability. + +Suggested request: + +```json +{ + "agent_id": "analyst-0", + "scope": { + "from": "2026-03-01T00:00:00Z", + "to": "2026-03-30T00:00:00Z" + }, + "reason": "policy_redaction" +} +``` + +We should not overdesign this early. The important point is that the service contract leaves room for a governed deletion path. 
+ +--- + +## Request Lifecycle in `cllama` + +The hot-path and non-hot-path behavior should be explicit. + +### Pre-turn recall path + +For every proxied inference request: + +1. resolve the agent identity as usual +2. load `memory.json` if present +3. if recall is enabled: + - build the bounded recall request + - call the memory service with a short timeout + - parse returned blocks + - apply policy filters + - inject the resulting memory block into the prompt +4. continue the normal upstream request flow + +Failure behavior: + +- timeout: continue without memory +- 5xx from memory service: continue without memory +- malformed response: continue without memory + +The memory plane should degrade gracefully. It must not become a single point of total inference failure by default. + +### Read-after-write semantics + +The memory plane should not promise strong read-after-write consistency for rapid-fire turns. + +That is acceptable because: + +- runner sessions already cover immediate recency +- the memory plane is for durable derived state, not for replacing the live conversation window +- the retain webhook is best-effort and asynchronous by design + +In practice this means: + +- the next turn may run recall before the previous turn has been fully processed by the backend +- the system remains correct because immediate continuity still comes from the runner session +- the memory plane improves medium- and long-range continuity, not single-turn echoing + +### Post-turn retain path + +After a successful upstream completion: + +1. write the normalized entry to the ledger as usual +2. if retain is enabled: + - dispatch a best-effort webhook to the memory service + - do not block the response already returning to the runner + +The retain webhook may fail silently except for observability and alerting. Recovery comes from the ledger. + +--- + +## What Counts As Real Memory Recall + +This is the product line we should draw explicitly. 
+ +The recall layer is worthwhile when it returns: + +- durable facts +- user/operator preferences +- open loops and commitments +- prior decisions and rationale +- episodic summaries +- project state +- relevant older context outside the runner session window + +The recall layer is not yet worthwhile if it mostly returns: + +- the last few turns +- transcript tails that the runner already has +- an unprocessed dump of recent messages + +This distinction matters because it keeps the design honest. + +If the memory service is not adding meaningful abstraction over the live session window, it is not yet justifying hot-path latency. + +--- + +## Session Stitching + +Session stitching will come up quickly, but it should not be treated as a gating prerequisite. + +There are three levels: + +### Level 1: No stitching + +The service processes the full ledger keyed by agent identity and whatever surface metadata is already available. + +This is still useful for: + +- durable facts +- recurring preferences +- high-level commitments + +### Level 2: Soft stitching + +The service groups events by obvious metadata when it exists, such as: + +- DM peer +- thread ID +- channel ID +- task ID +- repo or project hints + +This is likely enough for early "resume where we left off" quality. + +### Level 3: Hard stitching + +The service infers continuity across fragmented contexts and restarts even when metadata is weak. + +This is valuable, but should remain a backend problem. + +Clawdapus does not need to solve stitching globally in order to provide a good memory plane. + +--- + +## Governance Model + +Memory traffic should be governable in both directions. 
+ +### Retention governance + +Before retain webhook delivery, `cllama` may: + +- remove secrets +- redact known sensitive patterns +- suppress content classes from retention entirely + +### Recall governance + +Before reinjection, `cllama` may: + +- remove restricted content +- suppress blocks from disallowed sources +- cap categories or sizes +- redact content that now violates stricter policy than when it was originally retained + +### Forget governance + +The operator or a future Master Claw should be able to trigger targeted forgetting through a governed path. + +This is one of the strongest arguments for making memory a first-class infra surface rather than only a runner convenience. + +Forget must also be compatible with an append-only ledger. + +That implies a future forget implementation should likely include: + +- deletion in the external memory backend +- an infra-owned tombstone or redaction ledger +- backfill and replay logic that honors those tombstones and does not re-ingest forgotten material + +--- + +## Persistence Model + +The memory service itself is a normal compose service. + +That means its persistence model should be the same as other stateful pod services: + +- named volumes +- bind mounts +- external databases + +`claw up` and `claw down` should not destroy those stores unless the operator explicitly destroys them through normal container lifecycle actions. + +This is much cleaner than runner-local plugin stores because: + +- store lifetime is independent from the runner container +- one memory engine can serve many agents +- state can survive `CLAW_TYPE` migrations + +--- + +## Backfill And Replay + +Backfill should be treated as a first-class operation, not as an implementation detail. + +If a new memory backend is introduced after months of retained history already exist, the operator must be able to populate it from the ledger deterministically. 
+ +The architecture should therefore assume a future explicit backfill path, likely involving: + +- a `cllama` history read API suitable for replay consumers +- a dedicated CLI flow such as `claw memory backfill` +- backend idempotency or replay markers so the same ledger can be consumed safely more than once + +The retain webhook is the low-latency path. Backfill is the durability path for new or recovering services. + +--- + +## Relationship to Runner-Native Memory + +Runners may continue to provide native memory tools or session systems. + +That is acceptable, but it should not be the infrastructure dependency. + +The intended architecture is: + +- runner-native session and short-term working memory remain local concerns +- infrastructure memory provides durable, governed, cross-session recall + +In practice, once an agent is behind `cllama` and subscribed to a memory service, many runner-native memory tools may become redundant. + +That redundancy is tolerable in principle but dangerous in practice. + +If the infrastructure plane injects memory context while the runner also injects its own memory context, the agent may see: + +- duplicate facts +- contradictory summaries +- repeated commitments +- different privacy or forgetting policies + +So the operational recommendation should be: + +- when using the infrastructure memory plane, operators should disable runner-native memory plugins or memory-search tools where practical +- Clawdapus should document that guidance clearly +- Clawdapus should not attempt to force-disable runner behavior generically across all runners + +Clawdapus should not attempt to disable runner-native memory features globally. It should provide a better shared path. + +--- + +## Recommended Phase Plan + +### Milestone 1: Complete ADR-018 Phase 2 and define backfill + +Add the self-scoped history read surface to `cllama`. 
+ +Benefits: + +- memory services can consume normalized history through a stable proxy-owned interface +- backfill does not require filesystem coupling +- future operators and tools gain a consistent introspection surface +- replay becomes a first-class lifecycle rather than an implicit recovery hack + +This milestone should also define the expected operational backfill flow for new or recovering memory services. + +### Milestone 2: Add the memory capability and `cllama` hooks + +Implement: + +- descriptor extension for memory capability +- `x-claw.memory` +- pod defaults for memory +- `memory.json` compilation +- auth projection for memory services +- pre-turn recall call +- bounded injection +- post-turn retain webhook +- graceful degradation +- memory-specific observability events + +This is the first full end-to-end memory plane. + +### Milestone 3: Reference adapter and governance hardening + +Implement: + +- retain-side filtering +- recall-side filtering +- optional `forget` path +- tombstone-aware replay semantics +- alerting for repeated memory-service failures + +Provide a small baseline image, likely: + +- Go-based service +- durable SQLite or JSONL storage +- rolling summaries +- simple fact extraction +- simple BM25 or similarly boring local ranking +- no vendor-specific dependencies required to prove the contract + +This reference should be intentionally modest. The point is to validate the contract, not to define the state of the art. + +--- + +## Candidate File Map + +This is not a full implementation checklist, but it identifies the likely change surface. 
+ +### Main repo + +- `internal/describe/descriptor.go` +- `internal/describe/registry.go` +- `internal/pod/types.go` +- `internal/pod/parser.go` +- `cmd/claw/compose_up.go` +- `internal/cllama/context.go` +- `internal/pod/compose_emit.go` +- `docs/CLLAMA_SPEC.md` +- a new ADR once the plan is accepted + +### cllama submodule + +- `cllama/internal/proxy/handler.go` +- `cllama/internal/agentctx/...` +- new memory manifest loader package or extension of existing context loading +- logging and audit additions + +--- + +## Open Questions + +These are important, but they should not block the core architecture. + +### 1. How much request context should recall receive? + +The proxy should send a fixed request shape with only simple numeric payload bounds such as: + +- last N messages +- max request bytes + +We should avoid a richer negotiated vocabulary here unless real implementations prove it necessary. + +### 2. Should recall responses support categories? + +Probably yes, eventually. + +Possible categories: + +- `profile` +- `commitment` +- `decision` +- `episode` +- `state` + +But the first version can simply accept opaque blocks with optional metadata. + +### 3. Should `cllama` cache recall results? + +Maybe, but not initially. + +Recall is more query-shaped than feeds. A poor cache may create incorrect reuse and hide backend problems. + +### 4. Should retain delivery be in-process async or delegated to a queue? + +For the first implementation, best-effort in-process dispatch is likely enough because the ledger is the real durability mechanism. + +### 5. Should the memory service read the ledger directly or through an API? + +Long-term, the stable read API is cleaner. + +Short-term, direct ledger reading may be acceptable for local prototypes. + +### 6. How should affect fit into the model? + +Affect is exactly the kind of advanced derived state that should remain backend-defined. + +Clawdapus should make it possible, not standardize it early. + +### 7. 
How should multi-agent sharing work? + +The first version should assume private per-agent recall by default. + +Shared or world memory should require explicit backend semantics and likely future policy controls for: + +- agent-private memory +- pod-shared memory +- operator-defined namespaces + +This is important, but not required to define the initial memory plane. + +### 8. What metadata can the proxy reliably provide for stitching? + +Today the proxy may not always have a canonical thread or session identifier across all runners and providers. + +The first version should therefore treat: + +- `agent_id` +- `pod` +- bounded recent messages +- whatever stable metadata is already present + +as the minimum recall input. + +Richer stitching metadata may require later surface-specific propagation through headers, request bodies, or runner config. + +--- + +## Non-Goals + +This plan does not propose: + +- replacing runner-native sessions +- collapsing portable memory into proxy-owned memory +- mandating one storage engine +- defining a canonical embedding model +- defining a canonical graph schema +- forcing all runners to adopt a common memory plugin +- making `cllama` itself a memory database +- exposing vendor-specific memory tools directly to agents by default + +--- + +## Decision Shape For A Future ADR + +If this plan is accepted, the future ADR should probably decide the following: + +1. Memory is a first-class Clawdapus plane with compile-time wiring. +2. `cllama` owns pre-turn recall orchestration and post-turn retain orchestration. +3. Session history remains the immutable ledger and source of truth. +4. Portable memory remains runner-owned and separate. +5. Memory intelligence lives in pluggable services, not in `cllama`. +6. Agents subscribe to one memory service relationship at a time. +7. Recall should optimize for derived durable state, not transcript tails. 
+ +--- + +## Recommended Next Step + +The next document should likely be an ADR that: + +- cites ADR-018 and ADR-020 explicitly as prior art +- resolves the descriptor versioning question +- treats backfill as a first-class operation +- defines the fixed recall and retain wire contracts +- states clearly that the memory plane is for derived durable state, not transcript tails From 85141c7894f38ac875a671e09733d703f13dac05 Mon Sep 17 00:00:00 2001 From: Wojtek Date: Tue, 31 Mar 2026 17:38:59 -0400 Subject: [PATCH 05/18] Add descriptor v2 and pod grammar for tools and memory --- internal/describe/descriptor.go | 131 +++++++++++++++- internal/describe/descriptor_test.go | 104 ++++++++++++ internal/pod/parser.go | 158 +++++++++++++++++++ internal/pod/parser_capabilities_test.go | 191 +++++++++++++++++++++++ internal/pod/types.go | 12 ++ 5 files changed, 594 insertions(+), 2 deletions(-) create mode 100644 internal/pod/parser_capabilities_test.go diff --git a/internal/describe/descriptor.go b/internal/describe/descriptor.go index 4f65da6..a0a7aae 100644 --- a/internal/describe/descriptor.go +++ b/internal/describe/descriptor.go @@ -12,6 +12,8 @@ type ServiceDescriptor struct { Version int `json:"version"` Description string `json:"description,omitempty"` Feeds []FeedDescriptor `json:"feeds,omitempty"` + Tools []ToolDescriptor `json:"tools,omitempty"` + Memory *MemoryDescriptor `json:"memory,omitempty"` Endpoints []EndpointDescriptor `json:"endpoints,omitempty"` Auth *AuthDescriptor `json:"auth,omitempty"` Skill string `json:"skill,omitempty"` @@ -35,6 +37,30 @@ type AuthDescriptor struct { Env string `json:"env,omitempty"` } +type ToolDescriptor struct { + Name string `json:"name"` + Description string `json:"description"` + InputSchema map[string]interface{} `json:"inputSchema"` + HTTP *ToolHTTP `json:"http,omitempty"` + Annotations map[string]interface{} `json:"annotations,omitempty"` +} + +type ToolHTTP struct { + Method string `json:"method"` + Path string 
`json:"path"` + Body string `json:"body,omitempty"` +} + +type MemoryDescriptor struct { + Recall *MemoryEndpoint `json:"recall,omitempty"` + Retain *MemoryEndpoint `json:"retain,omitempty"` + Forget *MemoryEndpoint `json:"forget,omitempty"` +} + +type MemoryEndpoint struct { + Path string `json:"path"` +} + func Parse(data []byte) (*ServiceDescriptor, error) { var descriptor ServiceDescriptor if err := json.Unmarshal(data, &descriptor); err != nil { @@ -50,8 +76,10 @@ func (d *ServiceDescriptor) Validate() error { if d == nil { return nil } - if d.Version != 1 { - return fmt.Errorf("descriptor version must be 1, got %d", d.Version) + switch d.Version { + case 1, 2: + default: + return fmt.Errorf("descriptor version must be 1 or 2, got %d", d.Version) } seenFeeds := make(map[string]struct{}, len(d.Feeds)) @@ -94,6 +122,24 @@ func (d *ServiceDescriptor) Validate() error { } } + if d.Version == 1 { + if len(d.Tools) > 0 { + return fmt.Errorf("descriptor version 1 does not support tools") + } + if d.Memory != nil { + return fmt.Errorf("descriptor version 1 does not support memory") + } + } + + if d.Version >= 2 { + if err := validateTools(d.Tools); err != nil { + return err + } + if err := validateMemory(d.Memory); err != nil { + return err + } + } + if d.Auth != nil { d.Auth.Type = strings.ToLower(strings.TrimSpace(d.Auth.Type)) d.Auth.Env = strings.TrimSpace(d.Auth.Env) @@ -108,3 +154,84 @@ func (d *ServiceDescriptor) Validate() error { d.Skill = strings.TrimSpace(d.Skill) return nil } + +func validateTools(tools []ToolDescriptor) error { + seenTools := make(map[string]struct{}, len(tools)) + for i := range tools { + tool := &tools[i] + tool.Name = strings.TrimSpace(tool.Name) + tool.Description = strings.TrimSpace(tool.Description) + if tool.Name == "" { + return fmt.Errorf("tools[%d]: name is required", i) + } + if _, exists := seenTools[tool.Name]; exists { + return fmt.Errorf("tools[%d]: duplicate tool name %q", i, tool.Name) + } + seenTools[tool.Name] = 
struct{}{} + if tool.Description == "" { + return fmt.Errorf("tools[%d]: description is required", i) + } + if tool.InputSchema == nil { + return fmt.Errorf("tools[%d]: inputSchema is required", i) + } + schemaType, ok := tool.InputSchema["type"].(string) + if !ok || strings.TrimSpace(schemaType) == "" { + return fmt.Errorf("tools[%d]: inputSchema.type must be \"object\"", i) + } + if strings.ToLower(strings.TrimSpace(schemaType)) != "object" { + return fmt.Errorf("tools[%d]: inputSchema.type must be \"object\"", i) + } + if tool.HTTP == nil { + continue + } + tool.HTTP.Method = strings.ToUpper(strings.TrimSpace(tool.HTTP.Method)) + tool.HTTP.Path = strings.TrimSpace(tool.HTTP.Path) + tool.HTTP.Body = strings.ToLower(strings.TrimSpace(tool.HTTP.Body)) + switch tool.HTTP.Method { + case "GET", "POST", "PUT", "PATCH", "DELETE": + default: + return fmt.Errorf("tools[%d]: http.method %q is unsupported", i, tool.HTTP.Method) + } + if tool.HTTP.Path == "" { + return fmt.Errorf("tools[%d]: http.path is required", i) + } + if !strings.HasPrefix(tool.HTTP.Path, "/") { + return fmt.Errorf("tools[%d]: http.path %q must start with '/'", i, tool.HTTP.Path) + } + switch tool.HTTP.Body { + case "", "json": + default: + return fmt.Errorf("tools[%d]: http.body %q is unsupported", i, tool.HTTP.Body) + } + } + return nil +} + +func validateMemory(memory *MemoryDescriptor) error { + if memory == nil { + return nil + } + if memory.Recall == nil && memory.Retain == nil && memory.Forget == nil { + return fmt.Errorf("memory: at least one of recall or retain must be declared") + } + if memory.Recall == nil && memory.Retain == nil { + return fmt.Errorf("memory: at least one of recall or retain must be declared") + } + for name, endpoint := range map[string]*MemoryEndpoint{ + "recall": memory.Recall, + "retain": memory.Retain, + "forget": memory.Forget, + } { + if endpoint == nil { + continue + } + endpoint.Path = strings.TrimSpace(endpoint.Path) + if endpoint.Path == "" { + return 
fmt.Errorf("memory.%s.path is required", name) + } + if !strings.HasPrefix(endpoint.Path, "/") { + return fmt.Errorf("memory.%s.path %q must start with '/'", name, endpoint.Path) + } + } + return nil +} diff --git a/internal/describe/descriptor_test.go b/internal/describe/descriptor_test.go index 37ffced..e354235 100644 --- a/internal/describe/descriptor_test.go +++ b/internal/describe/descriptor_test.go @@ -39,3 +39,107 @@ func TestBuildFeedRegistryRejectsCollisions(t *testing.T) { t.Fatal("expected duplicate feed name error") } } + +func TestParseDescriptorV2SupportsToolsAndMemory(t *testing.T) { + data := []byte(`{ + "version": 2, + "description": " Capability service ", + "tools": [{ + "name": " search_memory ", + "description": " Search memory ", + "inputSchema": {"type": "object", "properties": {"query": {"type": "string"}}}, + "http": {"method": "post", "path": " /recall ", "body": "JSON"} + }], + "memory": { + "recall": {"path": " /recall "}, + "retain": {"path": "/retain"} + }, + "auth": {"type": "bearer", "env": "MEMORY_TOKEN"} + }`) + + descriptor, err := Parse(data) + if err != nil { + t.Fatalf("Parse: %v", err) + } + if descriptor.Version != 2 { + t.Fatalf("expected version 2, got %d", descriptor.Version) + } + if descriptor.Description != "Capability service" { + t.Fatalf("expected trimmed description, got %q", descriptor.Description) + } + if got := descriptor.Tools[0].Name; got != "search_memory" { + t.Fatalf("expected trimmed tool name, got %q", got) + } + if got := descriptor.Tools[0].HTTP.Method; got != "POST" { + t.Fatalf("expected normalized http method, got %q", got) + } + if got := descriptor.Tools[0].HTTP.Path; got != "/recall" { + t.Fatalf("expected trimmed http path, got %q", got) + } + if got := descriptor.Tools[0].HTTP.Body; got != "json" { + t.Fatalf("expected normalized http body, got %q", got) + } + if got := descriptor.Memory.Recall.Path; got != "/recall" { + t.Fatalf("expected trimmed recall path, got %q", got) + } +} + +func 
TestParseDescriptorRejectsVersionOneToolsAndMemory(t *testing.T) { + tests := []struct { + name string + data string + }{ + { + name: "tools", + data: `{"version":1,"tools":[{"name":"lookup","description":"Lookup","inputSchema":{"type":"object"}}]}`, + }, + { + name: "memory", + data: `{"version":1,"memory":{"recall":{"path":"/recall"}}}`, + }, + } + + for _, tc := range tests { + t.Run(tc.name, func(t *testing.T) { + if _, err := Parse([]byte(tc.data)); err == nil { + t.Fatal("expected validation error") + } + }) + } +} + +func TestParseDescriptorRejectsInvalidV2CapabilityShape(t *testing.T) { + tests := []struct { + name string + data string + }{ + { + name: "tool schema type", + data: `{"version":2,"tools":[{"name":"lookup","description":"Lookup","inputSchema":{"type":"array"}}]}`, + }, + { + name: "duplicate tool names", + data: `{"version":2,"tools":[{"name":"lookup","description":"Lookup","inputSchema":{"type":"object"}},{"name":"lookup","description":"Lookup again","inputSchema":{"type":"object"}}]}`, + }, + { + name: "invalid http method", + data: `{"version":2,"tools":[{"name":"lookup","description":"Lookup","inputSchema":{"type":"object"},"http":{"method":"trace","path":"/lookup"}}]}`, + }, + { + name: "memory forget only", + data: `{"version":2,"memory":{"forget":{"path":"/forget"}}}`, + }, + { + name: "memory empty path", + data: `{"version":2,"memory":{"recall":{"path":" "}}}`, + }, + } + + for _, tc := range tests { + t.Run(tc.name, func(t *testing.T) { + if _, err := Parse([]byte(tc.data)); err == nil { + t.Fatal("expected validation error") + } + }) + } +} diff --git a/internal/pod/parser.go b/internal/pod/parser.go index 43b864a..877b5a8 100644 --- a/internal/pod/parser.go +++ b/internal/pod/parser.go @@ -63,6 +63,8 @@ type rawClawBlock struct { Count int `yaml:"count"` Handles map[string]interface{} `yaml:"handles"` Feeds []rawFeedEntry `yaml:"feeds"` + Tools []rawToolPolicyEntry `yaml:"tools"` + Memory *rawMemoryEntry `yaml:"memory"` Include 
[]rawIncludeEntry `yaml:"include"` Surfaces []interface{} `yaml:"surfaces"` Skills []string `yaml:"skills"` @@ -79,6 +81,16 @@ type rawFeedEntry struct { Unresolved bool `yaml:"-"` } +type rawToolPolicyEntry struct { + Service string `yaml:"service"` + Allow interface{} `yaml:"allow"` +} + +type rawMemoryEntry struct { + Service string `yaml:"service"` + TimeoutMS *int `yaml:"timeout-ms"` +} + type rawIncludeEntry struct { ID string `yaml:"id"` File string `yaml:"file"` @@ -211,6 +223,14 @@ func Parse(r io.Reader) (*Pod, error) { if err != nil { return nil, fmt.Errorf("service %q: parse feeds: %w", name, err) } + tools, err := parseTools(svc.XClaw.Tools) + if err != nil { + return nil, fmt.Errorf("service %q: parse tools: %w", name, err) + } + memory, err := parseMemory(svc.XClaw.Memory) + if err != nil { + return nil, fmt.Errorf("service %q: parse memory: %w", name, err) + } invoke := make([]InvokeEntry, 0, len(svc.XClaw.Invoke)) for _, rawInv := range svc.XClaw.Invoke { if rawInv.Schedule == "" || rawInv.Message == "" { @@ -235,6 +255,8 @@ func Parse(r io.Reader) (*Pod, error) { Count: count, Handles: handles, Feeds: feeds, + Tools: tools, + Memory: memory, Include: include, Surfaces: parsedSurfaces, Skills: skills, @@ -419,6 +441,99 @@ func parseFeeds(rawFeeds []rawFeedEntry) ([]FeedEntry, error) { return out, nil } +func parseTools(rawTools []rawToolPolicyEntry) ([]ToolPolicyEntry, error) { + if len(rawTools) == 0 { + return nil, nil + } + + out := make([]ToolPolicyEntry, 0, len(rawTools)) + for i, raw := range rawTools { + service := strings.TrimSpace(raw.Service) + if service == "" { + return nil, fmt.Errorf("tool policy %d: service is required", i) + } + allow, err := parseToolAllow(raw.Allow) + if err != nil { + return nil, fmt.Errorf("tool policy %d: %w", i, err) + } + out = append(out, ToolPolicyEntry{ + Service: service, + Allow: allow, + }) + } + return out, nil +} + +func parseToolAllow(raw interface{}) ([]string, error) { + if raw == nil { + return 
[]string{"all"}, nil + } + + var allow []string + switch v := raw.(type) { + case string: + allow = []string{v} + case []string: + allow = append([]string(nil), v...) + case []interface{}: + allow = make([]string, 0, len(v)) + for i, item := range v { + s, ok := item.(string) + if !ok { + return nil, fmt.Errorf("allow[%d] must be a string, got %T", i, item) + } + allow = append(allow, s) + } + default: + return nil, fmt.Errorf("allow must be a string or list, got %T", raw) + } + + if len(allow) == 0 { + return nil, fmt.Errorf("allow must not be empty") + } + + normalized := make([]string, 0, len(allow)) + hasAll := false + for i, item := range allow { + item = strings.TrimSpace(item) + if item == "" { + return nil, fmt.Errorf("allow[%d] must not be empty", i) + } + if item == "all" { + hasAll = true + } + normalized = append(normalized, item) + } + if hasAll && len(normalized) > 1 { + return nil, fmt.Errorf(`allow "all" must not be combined with named tools`) + } + return normalized, nil +} + +func parseMemory(raw *rawMemoryEntry) (*MemoryEntry, error) { + if raw == nil { + return nil, nil + } + + service := strings.TrimSpace(raw.Service) + if service == "" { + return nil, fmt.Errorf("service is required") + } + + timeoutMS := 300 + if raw.TimeoutMS != nil { + if *raw.TimeoutMS <= 0 { + return nil, fmt.Errorf("timeout-ms must be > 0") + } + timeoutMS = *raw.TimeoutMS + } + + return &MemoryEntry{ + Service: service, + TimeoutMS: timeoutMS, + }, nil +} + func (r *rawFeedEntry) UnmarshalYAML(node *yaml.Node) error { switch node.Kind { case yaml.ScalarNode: @@ -442,6 +557,28 @@ func (r *rawFeedEntry) UnmarshalYAML(node *yaml.Node) error { } } +func (r *rawToolPolicyEntry) UnmarshalYAML(node *yaml.Node) error { + switch node.Kind { + case yaml.ScalarNode: + if strings.TrimSpace(node.Value) == "" { + return fmt.Errorf("tool service must not be empty") + } + r.Service = strings.TrimSpace(node.Value) + r.Allow = nil + return nil + case yaml.MappingNode: + type alias 
rawToolPolicyEntry + var parsed alias + if err := node.Decode(&parsed); err != nil { + return err + } + *r = rawToolPolicyEntry(parsed) + return nil + default: + return fmt.Errorf("tool entry must be a string or mapping, got %s", yamlKindString(node.Kind)) + } +} + func expandPodDefaults(root map[string]interface{}) error { if len(root) == 0 { return nil @@ -459,6 +596,8 @@ func expandPodDefaults(root map[string]interface{}) error { CllamaDefaults: deepCopyMapOrNil(rawXClaw["cllama-defaults"]), SurfacesDefaults: deepCopySliceOrNil(rawXClaw["surfaces-defaults"]), FeedsDefaults: deepCopySliceOrNil(rawXClaw["feeds-defaults"]), + ToolsDefaults: deepCopySliceOrNil(rawXClaw["tools-defaults"]), + MemoryDefaults: deepCopyMapOrNil(rawXClaw["memory-defaults"]), SkillsDefaults: deepCopySliceOrNil(rawXClaw["skills-defaults"]), } @@ -499,6 +638,8 @@ type podDefaults struct { CllamaDefaults map[string]interface{} SurfacesDefaults []interface{} FeedsDefaults []interface{} + ToolsDefaults []interface{} + MemoryDefaults map[string]interface{} SkillsDefaults []interface{} } @@ -512,6 +653,12 @@ func applyRawPodDefaults(raw map[string]interface{}, defaults podDefaults) error if err := applyRawListDefaults(raw, "feeds", defaults.FeedsDefaults); err != nil { return err } + if err := applyRawListDefaults(raw, "tools", defaults.ToolsDefaults); err != nil { + return err + } + if err := applyRawObjectDefault(raw, "memory", defaults.MemoryDefaults); err != nil { + return err + } if err := applyRawListDefaults(raw, "skills", defaults.SkillsDefaults); err != nil { return err } @@ -589,6 +736,17 @@ func applyRawListDefaults(raw map[string]interface{}, key string, defaults []int return nil } +func applyRawObjectDefault(raw map[string]interface{}, key string, defaults map[string]interface{}) error { + if len(defaults) == 0 { + return nil + } + if _, present := raw[key]; present { + return nil + } + raw[key] = deepCopyMap(defaults) + return nil +} + func deepCopyMapOrNil(raw interface{}) 
map[string]interface{} { m, err := mapStringAny(raw) if err != nil || m == nil { diff --git a/internal/pod/parser_capabilities_test.go b/internal/pod/parser_capabilities_test.go new file mode 100644 index 0000000..601d038 --- /dev/null +++ b/internal/pod/parser_capabilities_test.go @@ -0,0 +1,191 @@ +package pod + +import ( + "strings" + "testing" +) + +func TestParsePodExtractsToolsAndMemory(t *testing.T) { + const yaml = ` +x-claw: + pod: capabilities-pod + +services: + analyst: + image: analyst:latest + x-claw: + agent: ./AGENTS.md + tools: + - trading-api + - service: analytics + allow: + - get_summary + - get_report + memory: + service: team-memory + timeout-ms: 450 +` + + pod, err := Parse(strings.NewReader(yaml)) + if err != nil { + t.Fatalf("Parse: %v", err) + } + + analyst := pod.Services["analyst"] + if analyst == nil || analyst.Claw == nil { + t.Fatal("expected analyst claw service") + } + if len(analyst.Claw.Tools) != 2 { + t.Fatalf("expected 2 tool policies, got %+v", analyst.Claw.Tools) + } + if analyst.Claw.Tools[0].Service != "trading-api" { + t.Fatalf("expected scalar shorthand service, got %+v", analyst.Claw.Tools[0]) + } + if len(analyst.Claw.Tools[0].Allow) != 1 || analyst.Claw.Tools[0].Allow[0] != "all" { + t.Fatalf("expected scalar shorthand to default to allow all, got %+v", analyst.Claw.Tools[0]) + } + if analyst.Claw.Tools[1].Service != "analytics" { + t.Fatalf("expected second tool policy service analytics, got %+v", analyst.Claw.Tools[1]) + } + if len(analyst.Claw.Tools[1].Allow) != 2 || analyst.Claw.Tools[1].Allow[0] != "get_summary" || analyst.Claw.Tools[1].Allow[1] != "get_report" { + t.Fatalf("unexpected allow list: %+v", analyst.Claw.Tools[1]) + } + if analyst.Claw.Memory == nil { + t.Fatal("expected memory entry") + } + if analyst.Claw.Memory.Service != "team-memory" || analyst.Claw.Memory.TimeoutMS != 450 { + t.Fatalf("unexpected memory entry: %+v", analyst.Claw.Memory) + } +} + +func 
TestParsePodDefaultsInheritAndReplaceCapabilityConfig(t *testing.T) { + const yaml = ` +x-claw: + pod: defaults-pod + tools-defaults: + - service: trading-api + memory-defaults: + service: team-memory + timeout-ms: 275 + +services: + worker: + image: worker:latest + x-claw: + agent: ./AGENTS.md + analyst: + image: analyst:latest + x-claw: + agent: ./AGENTS.md + tools: + - ... + - service: analytics + allow: + - get_summary + memory: + service: special-memory + reviewer: + image: reviewer:latest + x-claw: + agent: ./AGENTS.md + tools: [] + memory: null +` + + pod, err := Parse(strings.NewReader(yaml)) + if err != nil { + t.Fatalf("Parse: %v", err) + } + + worker := pod.Services["worker"].Claw + if len(worker.Tools) != 1 || worker.Tools[0].Service != "trading-api" || worker.Tools[0].Allow[0] != "all" { + t.Fatalf("expected inherited tool defaults, got %+v", worker.Tools) + } + if worker.Memory == nil || worker.Memory.Service != "team-memory" || worker.Memory.TimeoutMS != 275 { + t.Fatalf("expected inherited memory defaults, got %+v", worker.Memory) + } + + analyst := pod.Services["analyst"].Claw + if len(analyst.Tools) != 2 { + t.Fatalf("expected spread-expanded tools, got %+v", analyst.Tools) + } + if analyst.Tools[0].Service != "trading-api" || analyst.Tools[1].Service != "analytics" { + t.Fatalf("unexpected tool order after spread: %+v", analyst.Tools) + } + if analyst.Memory == nil || analyst.Memory.Service != "special-memory" { + t.Fatalf("expected replaced memory service, got %+v", analyst.Memory) + } + if analyst.Memory.TimeoutMS != 300 { + t.Fatalf("expected service-level memory replacement to use entry default timeout, got %+v", analyst.Memory) + } + + reviewer := pod.Services["reviewer"].Claw + if len(reviewer.Tools) != 0 { + t.Fatalf("expected explicit empty tools to suppress defaults, got %+v", reviewer.Tools) + } + if reviewer.Memory != nil { + t.Fatalf("expected null memory to suppress defaults, got %+v", reviewer.Memory) + } +} + +func 
TestParsePodRejectsInvalidCapabilityConfig(t *testing.T) { + tests := []struct { + name string + yaml string + }{ + { + name: "tool allow all mixed with names", + yaml: ` +x-claw: + pod: invalid-pod +services: + analyst: + image: analyst:latest + x-claw: + agent: ./AGENTS.md + tools: + - service: trading-api + allow: + - all + - get_report +`, + }, + { + name: "memory timeout invalid", + yaml: ` +x-claw: + pod: invalid-pod +services: + analyst: + image: analyst:latest + x-claw: + agent: ./AGENTS.md + memory: + service: team-memory + timeout-ms: 0 +`, + }, + { + name: "tools spread without defaults", + yaml: ` +x-claw: + pod: invalid-pod +services: + analyst: + image: analyst:latest + x-claw: + agent: ./AGENTS.md + tools: + - ... +`, + }, + } + + for _, tc := range tests { + t.Run(tc.name, func(t *testing.T) { + if _, err := Parse(strings.NewReader(tc.yaml)); err == nil { + t.Fatal("expected validation error") + } + }) + } +} diff --git a/internal/pod/types.go b/internal/pod/types.go index edcf2e7..6b3ec73 100644 --- a/internal/pod/types.go +++ b/internal/pod/types.go @@ -44,6 +44,8 @@ type ClawBlock struct { Count int Handles map[string]*driver.HandleInfo // platform → contact card Feeds []FeedEntry + Tools []ToolPolicyEntry + Memory *MemoryEntry Include []IncludeEntry Surfaces []driver.ResolvedSurface Skills []string @@ -71,6 +73,16 @@ type FeedEntry struct { Unresolved bool } +type ToolPolicyEntry struct { + Service string + Allow []string +} + +type MemoryEntry struct { + Service string + TimeoutMS int +} + type IncludeEntry struct { ID string File string From 79da997d91ce8c4059c80cc7adef1d9c5aa694b6 Mon Sep 17 00:00:00 2001 From: Wojtek Date: Tue, 31 Mar 2026 17:45:21 -0400 Subject: [PATCH 06/18] Remove redundant nil check in validateMemory The three-field nil check (recall, retain, forget all nil) was a strict subset of the two-field check (recall and retain both nil) that immediately followed it. The second check already covers the forget-only case correctly. 
--- internal/describe/descriptor.go | 3 --- 1 file changed, 3 deletions(-) diff --git a/internal/describe/descriptor.go b/internal/describe/descriptor.go index a0a7aae..4b13d40 100644 --- a/internal/describe/descriptor.go +++ b/internal/describe/descriptor.go @@ -211,9 +211,6 @@ func validateMemory(memory *MemoryDescriptor) error { if memory == nil { return nil } - if memory.Recall == nil && memory.Retain == nil && memory.Forget == nil { - return fmt.Errorf("memory: at least one of recall or retain must be declared") - } if memory.Recall == nil && memory.Retain == nil { return fmt.Errorf("memory: at least one of recall or retain must be declared") } From 7989de55cb120651caeb0a48dc880b1143db79b9 Mon Sep 17 00:00:00 2001 From: Wojtek Date: Tue, 31 Mar 2026 17:46:45 -0400 Subject: [PATCH 07/18] Compile tool and memory manifests into cllama context --- cmd/claw/compose_up.go | 280 ++++++++++++++++++++++++++++- cmd/claw/compose_up_test.go | 214 ++++++++++++++++++++++ internal/cllama/context.go | 83 ++++++++- internal/cllama/context_test.go | 54 ++++++ internal/describe/registry.go | 39 ++++ internal/describe/registry_test.go | 48 +++++ 6 files changed, 709 insertions(+), 9 deletions(-) create mode 100644 internal/describe/registry_test.go diff --git a/cmd/claw/compose_up.go b/cmd/claw/compose_up.go index f858536..dd9bc12 100644 --- a/cmd/claw/compose_up.go +++ b/cmd/claw/compose_up.go @@ -348,9 +348,21 @@ func runComposeUp(podFile string) error { if err != nil { return fmt.Errorf("build feed registry: %w", err) } + toolRegistry, err := describe.BuildToolRegistry(serviceDescriptors) + if err != nil { + return fmt.Errorf("build tool registry: %w", err) + } if err := resolveFeedSubscriptions(p, feedRegistry); err != nil { return err } + resolvedTools, err := resolveToolSubscriptions(p, toolRegistry) + if err != nil { + return err + } + resolvedMemory, err := resolveMemorySubscriptions(p, serviceDescriptors) + if err != nil { + return err + } for name, rc := range 
resolvedClaws { svc := p.Services[name] if svc == nil || svc.Claw == nil { @@ -475,12 +487,22 @@ func runComposeUp(podFile string) error { if err != nil { return fmt.Errorf("service %q: build feed manifest: %w", ordinalName, err) } - md := shared.GenerateClawdapusMD(&ordinalRC, p.Name) + tools, err := buildToolManifestEntries(p, serviceDescriptors, runtimeEnv, name, resolvedTools[name], ordinalAuth) + if err != nil { + return fmt.Errorf("service %q: build tool manifest: %w", ordinalName, err) + } + memory, err := buildMemoryManifestEntry(p, serviceDescriptors, runtimeEnv, name, resolvedMemory[name], ordinalAuth) + if err != nil { + return fmt.Errorf("service %q: build memory manifest: %w", ordinalName, err) + } + md := augmentClawdapusMD(shared.GenerateClawdapusMD(&ordinalRC, p.Name), tools, memory) contextInputs = append(contextInputs, cllama.AgentContextInput{ AgentID: ordinalName, AgentsMD: string(agentContent), ClawdapusMD: md, Feeds: feeds, + Tools: tools, + Memory: memory, ServiceAuth: ordinalAuth, Metadata: cllama.InjectCompiledModelPolicy(map[string]any{ "service": name, @@ -500,12 +522,22 @@ func runComposeUp(podFile string) error { if err != nil { return fmt.Errorf("service %q: build feed manifest: %w", name, err) } - md := shared.GenerateClawdapusMD(rc, p.Name) + tools, err := buildToolManifestEntries(p, serviceDescriptors, runtimeEnv, name, resolvedTools[name], svcAuth) + if err != nil { + return fmt.Errorf("service %q: build tool manifest: %w", name, err) + } + memory, err := buildMemoryManifestEntry(p, serviceDescriptors, runtimeEnv, name, resolvedMemory[name], svcAuth) + if err != nil { + return fmt.Errorf("service %q: build memory manifest: %w", name, err) + } + md := augmentClawdapusMD(shared.GenerateClawdapusMD(rc, p.Name), tools, memory) contextInputs = append(contextInputs, cllama.AgentContextInput{ AgentID: name, AgentsMD: string(agentContent), ClawdapusMD: md, Feeds: feeds, + Tools: tools, + Memory: memory, ServiceAuth: svcAuth, Metadata: 
cllama.InjectCompiledModelPolicy(map[string]any{ "service": name, @@ -925,6 +957,92 @@ func discordHandleIDsFromPod(p *pod.Pod) []string { return uniqueSortedStrings(ids) } +type resolvedMemorySubscription struct { + Service string + Config *pod.MemoryEntry +} + +func resolveToolSubscriptions(p *pod.Pod, registry describe.ToolRegistry) (map[string][]describe.ToolSpec, error) { + if p == nil { + return nil, nil + } + + resolved := make(map[string][]describe.ToolSpec) + for serviceName, svc := range p.Services { + if svc == nil || svc.Claw == nil || len(svc.Claw.Tools) == 0 { + continue + } + + selected := make([]describe.ToolSpec, 0) + seen := make(map[string]struct{}) + for i, policy := range svc.Claw.Tools { + specs, ok := registry[policy.Service] + if !ok { + return nil, fmt.Errorf("service %q: tool policy %d references unknown tool service %q", serviceName, i, policy.Service) + } + byName := make(map[string]describe.ToolSpec, len(specs)) + for _, spec := range specs { + byName[spec.Name] = spec + } + + if len(policy.Allow) == 1 && policy.Allow[0] == "all" { + for _, spec := range specs { + key := spec.Service + "." + spec.Name + if _, exists := seen[key]; exists { + continue + } + seen[key] = struct{}{} + selected = append(selected, spec) + } + continue + } + + for _, toolName := range policy.Allow { + spec, ok := byName[toolName] + if !ok { + return nil, fmt.Errorf("service %q: tool policy for %q references unknown tool %q", serviceName, policy.Service, toolName) + } + key := spec.Service + "." 
+ spec.Name + if _, exists := seen[key]; exists { + continue + } + seen[key] = struct{}{} + selected = append(selected, spec) + } + } + if len(selected) > 0 { + resolved[serviceName] = selected + } + } + return resolved, nil +} + +func resolveMemorySubscriptions(p *pod.Pod, descriptors map[string]*describe.ServiceDescriptor) (map[string]*resolvedMemorySubscription, error) { + if p == nil { + return nil, nil + } + + resolved := make(map[string]*resolvedMemorySubscription) + for serviceName, svc := range p.Services { + if svc == nil || svc.Claw == nil || svc.Claw.Memory == nil { + continue + } + target := svc.Claw.Memory.Service + descriptor := descriptors[target] + if descriptor == nil { + return nil, fmt.Errorf("service %q: memory target %q has no descriptor", serviceName, target) + } + if descriptor.Memory == nil { + return nil, fmt.Errorf("service %q: memory target %q does not declare a memory capability", serviceName, target) + } + resolved[serviceName] = &resolvedMemorySubscription{ + Service: target, + Config: svc.Claw.Memory, + } + } + return resolved, nil +} + func buildFeedManifestEntries(p *pod.Pod, descriptors map[string]*describe.ServiceDescriptor, runtimeEnv map[string]string, serviceName string, clawID string, serviceAuth []cllama.ServiceAuthEntry) ([]cllama.FeedManifestEntry, error) { svc := p.Services[serviceName] if svc == nil || svc.Claw == nil || len(svc.Claw.Feeds) == 0 { @@ -974,6 +1092,84 @@ func buildFeedManifestEntries(p *pod.Pod, descriptors map[string]*describe.Servi return entries, nil } +func buildToolManifestEntries(p *pod.Pod, descriptors map[string]*describe.ServiceDescriptor, runtimeEnv map[string]string, serviceName string, tools []describe.ToolSpec, serviceAuth []cllama.ServiceAuthEntry) ([]cllama.ToolManifestEntry, error) { + svc := p.Services[serviceName] + if svc == nil || svc.Claw == nil || len(tools) == 0 { + return nil, nil + } + + entries := make([]cllama.ToolManifestEntry, 0, len(tools)) + for _, tool := range tools { + if 
tool.HTTP == nil { + return nil, fmt.Errorf("tool %q from %q has no HTTP execution metadata", tool.Name, tool.Service) + } + baseURL, err := resolveServiceBaseURL(p, tool.Service) + if err != nil { + return nil, fmt.Errorf("tool %q: %w", tool.Name, err) + } + auth, err := resolveManifestAuth(svc.Environment, descriptors[tool.Service], runtimeEnv, serviceAuth, tool.Service) + if err != nil { + return nil, fmt.Errorf("tool %q: %w", tool.Name, err) + } + entries = append(entries, cllama.ToolManifestEntry{ + Name: tool.Service + "." + tool.Name, + Description: tool.Description, + InputSchema: tool.InputSchema, + Annotations: tool.Annotations, + Execution: cllama.ToolExecution{ + Transport: "http", + Service: tool.Service, + BaseURL: baseURL, + Method: tool.HTTP.Method, + Path: tool.HTTP.Path, + Auth: auth, + }, + }) + } + return entries, nil +} + +func buildMemoryManifestEntry(p *pod.Pod, descriptors map[string]*describe.ServiceDescriptor, runtimeEnv map[string]string, serviceName string, memory *resolvedMemorySubscription, serviceAuth []cllama.ServiceAuthEntry) (*cllama.MemoryManifestEntry, error) { + svc := p.Services[serviceName] + if svc == nil || svc.Claw == nil || memory == nil { + return nil, nil + } + + descriptor := descriptors[memory.Service] + if descriptor == nil || descriptor.Memory == nil { + return nil, fmt.Errorf("memory target %q has no descriptor memory capability", memory.Service) + } + + baseURL, err := resolveServiceBaseURL(p, memory.Service) + if err != nil { + return nil, err + } + auth, err := resolveManifestAuth(svc.Environment, descriptor, runtimeEnv, serviceAuth, memory.Service) + if err != nil { + return nil, err + } + + entry := &cllama.MemoryManifestEntry{ + Version: 1, + Service: memory.Service, + BaseURL: baseURL, + Auth: auth, + } + if descriptor.Memory.Recall != nil { + entry.Recall = &cllama.MemoryOp{ + Path: descriptor.Memory.Recall.Path, + TimeoutMS: memory.Config.TimeoutMS, + } + } + if descriptor.Memory.Retain != nil { + 
entry.Retain = &cllama.MemoryOp{Path: descriptor.Memory.Retain.Path} + } + if descriptor.Memory.Forget != nil { + entry.Forget = &cllama.MemoryOp{Path: descriptor.Memory.Forget.Path} + } + return entry, nil +} + func resolveFeedAuthFromServiceEnv(env map[string]string, descriptor *describe.ServiceDescriptor, runtimeEnv map[string]string) (string, error) { if descriptor == nil || descriptor.Auth == nil || descriptor.Auth.Type != "bearer" || descriptor.Auth.Env == "" { return "", nil @@ -993,9 +1189,32 @@ func resolveFeedAuthFromServiceEnv(env map[string]string, descriptor *describe.S return expanded, nil } -func resolveFeedURL(p *pod.Pod, source string, feedPath string) (string, error) { +func resolveManifestAuth(env map[string]string, descriptor *describe.ServiceDescriptor, runtimeEnv map[string]string, projected []cllama.ServiceAuthEntry, targetService string) (*cllama.AuthEntry, error) { + for _, entry := range projected { + if entry.Service == targetService && entry.AuthType == "bearer" && entry.Token != "" { + return &cllama.AuthEntry{ + Type: "bearer", + Token: entry.Token, + }, nil + } + } + + token, err := resolveFeedAuthFromServiceEnv(env, descriptor, runtimeEnv) + if err != nil { + return nil, err + } + if token == "" { + return nil, nil + } + return &cllama.AuthEntry{ + Type: "bearer", + Token: token, + }, nil +} + +func resolveServiceBaseURL(p *pod.Pod, source string) (string, error) { if p.ClawAPI != nil && source == "claw-api" { - return buildFeedURL(source, clawAPIInternalPort(p.ClawAPI.Addr), feedPath), nil + return buildBaseURL(source, clawAPIInternalPort(p.ClawAPI.Addr)), nil } svc, ok := p.Services[source] if !ok { @@ -1008,11 +1227,56 @@ func resolveFeedURL(p *pod.Pod, source string, feedPath string) (string, error) port = strings.TrimSpace(ports[0]) } } - return buildFeedURL(source, port, feedPath), nil + return buildBaseURL(source, port), nil +} + +func resolveFeedURL(p *pod.Pod, source string, feedPath string) (string, error) { + baseURL, err 
:= resolveServiceBaseURL(p, source)
+	if err != nil {
+		return "", err
+	}
+	return buildFeedURL(baseURL, feedPath), nil
+}
+
+func buildBaseURL(source, port string) string {
+	return fmt.Sprintf("http://%s:%s", source, port)
+}
+
+func buildFeedURL(baseURL, feedPath string) string {
+	return baseURL + feedPath
 }
 
-func buildFeedURL(source, port, feedPath string) string {
-	return fmt.Sprintf("http://%s:%s%s", source, port, feedPath)
+func augmentClawdapusMD(base string, tools []cllama.ToolManifestEntry, memory *cllama.MemoryManifestEntry) string {
+	if len(tools) == 0 && memory == nil {
+		return base
+	}
+
+	var b strings.Builder
+	b.WriteString(base)
+	if len(tools) > 0 {
+		b.WriteString("## Tools\n\n")
+		b.WriteString("Managed service tools are compiled into your proxy context.\n\n")
+		for _, tool := range tools {
+			line := fmt.Sprintf("- `%s`", tool.Name)
+			if tool.Description != "" {
+				line += fmt.Sprintf(" — %s", tool.Description)
+			}
+			b.WriteString(line + "\n")
+		}
+		b.WriteString("\n")
+	}
+	if memory != nil {
+		b.WriteString("## Ambient Memory\n\n")
+		b.WriteString(fmt.Sprintf("- **Service:** `%s`\n", memory.Service))
+		if memory.Recall != nil {
+			b.WriteString("- **Recall:** active before each inference turn\n")
+		}
+		if memory.Retain != nil {
+			b.WriteString("- **Retention:** post-turn retention hook compiled\n")
+		}
+		b.WriteString("\n")
+	}
+	return b.String()
 }
 
 type conversationWallTokenPair struct {
@@ -2750,7 +3014,7 @@ func buildServiceSurfaceInfo(descriptor *describe.ServiceDescriptor) *driver.Ser
 		info.AuthType = descriptor.Auth.Type
 		info.AuthEnv = descriptor.Auth.Env
 	}
-	if len(descriptor.Endpoints) > 0 {
+	if len(descriptor.Endpoints) > 0 && len(descriptor.Tools) == 0 {
 		info.Endpoints = make([]driver.ServiceEndpoint, 0, len(descriptor.Endpoints))
 		for _, endpoint := range descriptor.Endpoints {
 			info.Endpoints = append(info.Endpoints, driver.ServiceEndpoint{
diff --git a/cmd/claw/compose_up_test.go b/cmd/claw/compose_up_test.go
index e0f95f0..76d44c3 100644
--- a/cmd/claw/compose_up_test.go
+++ b/cmd/claw/compose_up_test.go
@@ -10,6 +10,7 @@ import (
 	"testing"
 
 	"github.com/mostlydev/clawdapus/internal/clawapi"
+	"github.com/mostlydev/clawdapus/internal/cllama"
 	"github.com/mostlydev/clawdapus/internal/describe"
 	"github.com/mostlydev/clawdapus/internal/driver"
 	"github.com/mostlydev/clawdapus/internal/driver/openclaw"
@@ -2187,6 +2188,219 @@ func TestBuildFeedManifestProjectsBearerAuthFromServiceEnv(t *testing.T) {
 	}
 }
 
+func TestResolveToolSubscriptionsSupportsAllAndSubset(t *testing.T) {
+	p := &pod.Pod{
+		Services: map[string]*pod.Service{
+			"analyst": {
+				Claw: &pod.ClawBlock{
+					Tools: []pod.ToolPolicyEntry{
+						{Service: "trading-api", Allow: []string{"all"}},
+						{Service: "analytics", Allow: []string{"get_summary"}},
+					},
+				},
+			},
+		},
+	}
+
+	registry := describe.ToolRegistry{
+		"trading-api": {
+			{Name: "get_market_context", Service: "trading-api"},
+			{Name: "get_positions", Service: "trading-api"},
+		},
+		"analytics": {
+			{Name: "get_summary", Service: "analytics"},
+			{Name: "get_report", Service: "analytics"},
+		},
+	}
+
+	resolved, err := resolveToolSubscriptions(p, registry)
+	if err != nil {
+		t.Fatalf("resolveToolSubscriptions: %v", err)
+	}
+	got := resolved["analyst"]
+	if len(got) != 3 {
+		t.Fatalf("expected 3 resolved tools, got %+v", got)
+	}
+	if got[0].Service != "trading-api" || got[1].Service != "trading-api" || got[2].Name != "get_summary" {
+		t.Fatalf("unexpected resolved tool order: %+v", got)
+	}
+}
+
+func TestResolveToolSubscriptionsRejectsUnknownTool(t *testing.T) {
+	p := &pod.Pod{
+		Services: map[string]*pod.Service{
+			"analyst": {
+				Claw: &pod.ClawBlock{
+					Tools: []pod.ToolPolicyEntry{
+						{Service: "trading-api", Allow: []string{"missing_tool"}},
+					},
+				},
+			},
+		},
+	}
+
+	registry := describe.ToolRegistry{
+		"trading-api": {
+			{Name: "get_market_context", Service: "trading-api"},
+		},
+	}
+
+	if _, err := resolveToolSubscriptions(p, registry); err == nil {
+		t.Fatal("expected unknown tool error")
+	}
+}
+
+func TestResolveMemorySubscriptionsRejectsMissingCapability(t *testing.T) {
+	p := &pod.Pod{
+		Services: map[string]*pod.Service{
+			"analyst": {
+				Claw: &pod.ClawBlock{
+					Memory: &pod.MemoryEntry{Service: "team-memory", TimeoutMS: 300},
+				},
+			},
+		},
+	}
+
+	if _, err := resolveMemorySubscriptions(p, map[string]*describe.ServiceDescriptor{
+		"team-memory": {Version: 2},
+	}); err == nil {
+		t.Fatal("expected missing memory capability error")
+	}
+}
+
+func TestBuildToolManifestEntriesNamescopesAndProjectsAuth(t *testing.T) {
+	p := &pod.Pod{
+		Services: map[string]*pod.Service{
+			"analyst": {
+				Environment: map[string]string{
+					"TRADING_API_TOKEN": "${TRADING_API_TOKEN}",
+				},
+				Claw: &pod.ClawBlock{},
+			},
+			"trading-api": {
+				Expose: []string{"4000"},
+			},
+		},
+	}
+
+	tools, err := buildToolManifestEntries(
+		p,
+		map[string]*describe.ServiceDescriptor{
+			"trading-api": {
+				Version: 2,
+				Auth: &describe.AuthDescriptor{
+					Type: "bearer",
+					Env:  "TRADING_API_TOKEN",
+				},
+			},
+		},
+		map[string]string{
+			"TRADING_API_TOKEN": "real-token",
+		},
+		"analyst",
+		[]describe.ToolSpec{{
+			Name:        "get_market_context",
+			Service:     "trading-api",
+			Description: "Retrieve market context",
+			InputSchema: map[string]interface{}{"type": "object"},
+			HTTP: &describe.ToolHTTP{
+				Method: "GET",
+				Path:   "/api/v1/market_context/{claw_id}",
+			},
+		}},
+		nil,
+	)
+	if err != nil {
+		t.Fatalf("buildToolManifestEntries: %v", err)
+	}
+	if len(tools) != 1 {
+		t.Fatalf("expected 1 tool entry, got %+v", tools)
+	}
+	if tools[0].Name != "trading-api.get_market_context" {
+		t.Fatalf("expected namespaced tool name, got %+v", tools[0])
+	}
+	if tools[0].Execution.BaseURL != "http://trading-api:4000" {
+		t.Fatalf("unexpected tool base URL: %+v", tools[0].Execution)
+	}
+	if tools[0].Execution.Auth == nil || tools[0].Execution.Auth.Token != "real-token" {
+		t.Fatalf("expected projected auth token, got %+v", tools[0].Execution.Auth)
+	}
+}
+
+func TestBuildMemoryManifestEntryUsesProjectedServiceAuth(t *testing.T) {
+	p := &pod.Pod{
+		Services: map[string]*pod.Service{
+			"analyst": {
+				Claw: &pod.ClawBlock{},
+			},
+			"team-memory": {
+				Expose: []string{"8081"},
+			},
+		},
+	}
+
+	entry, err := buildMemoryManifestEntry(
+		p,
+		map[string]*describe.ServiceDescriptor{
+			"team-memory": {
+				Version: 2,
+				Memory: &describe.MemoryDescriptor{
+					Recall: &describe.MemoryEndpoint{Path: "/recall"},
+					Retain: &describe.MemoryEndpoint{Path: "/retain"},
+				},
+			},
+		},
+		nil,
+		"analyst",
+		&resolvedMemorySubscription{
+			Service: "team-memory",
+			Config:  &pod.MemoryEntry{Service: "team-memory", TimeoutMS: 450},
+		},
+		[]cllama.ServiceAuthEntry{{
+			Service:  "team-memory",
+			AuthType: "bearer",
+			Token:    "memory-token",
+		}},
+	)
+	if err != nil {
+		t.Fatalf("buildMemoryManifestEntry: %v", err)
+	}
+	if entry == nil || entry.Service != "team-memory" {
+		t.Fatalf("unexpected memory entry: %+v", entry)
+	}
+	if entry.BaseURL != "http://team-memory:8081" {
+		t.Fatalf("unexpected memory base URL: %+v", entry)
+	}
+	if entry.Recall == nil || entry.Recall.TimeoutMS != 450 {
+		t.Fatalf("unexpected recall config: %+v", entry)
+	}
+	if entry.Auth == nil || entry.Auth.Token != "memory-token" {
+		t.Fatalf("expected projected memory auth, got %+v", entry.Auth)
+	}
+}
+
+func TestBuildServiceSurfaceInfoOmitsEndpointsWhenToolsDeclared(t *testing.T) {
+	info := buildServiceSurfaceInfo(&describe.ServiceDescriptor{
+		Version:     2,
+		Description: "Trading API",
+		Tools: []describe.ToolDescriptor{{
+			Name:        "get_market_context",
+			Description: "Retrieve market context",
+			InputSchema: map[string]interface{}{"type": "object"},
+		}},
+		Endpoints: []describe.EndpointDescriptor{{
+			Method: "GET",
+			Path:   "/api/v1/market_context",
+		}},
+	})
+	if info == nil {
+		t.Fatal("expected surface info")
+	}
+	if len(info.Endpoints) != 0 {
+		t.Fatalf("expected endpoints to be suppressed when tools are declared, got %+v", info.Endpoints)
+	}
+}
+
 func TestInjectConversationWallAddsServiceAndFeed(t *testing.T) {
 	p := &pod.Pod{
 		Name: "desk",
diff --git a/internal/cllama/context.go b/internal/cllama/context.go
index b1aeb94..df1743d 100644
--- a/internal/cllama/context.go
+++ b/internal/cllama/context.go
@@ -13,6 +13,8 @@ type AgentContextInput struct {
 	ClawdapusMD string
 	Metadata    map[string]interface{}
 	Feeds       []FeedManifestEntry
+	Tools       []ToolManifestEntry
+	Memory      *MemoryManifestEntry
 	ServiceAuth []ServiceAuthEntry
 }
 
@@ -32,9 +34,64 @@ type ServiceAuthEntry struct {
 	Principal string `json:"principal,omitempty"`
 }
 
+type AuthEntry struct {
+	Type  string `json:"type"`
+	Token string `json:"token,omitempty"`
+}
+
+type ToolManifestEntry struct {
+	Name        string                 `json:"name"`
+	Description string                 `json:"description"`
+	InputSchema map[string]interface{} `json:"inputSchema"`
+	Annotations map[string]interface{} `json:"annotations,omitempty"`
+	Execution   ToolExecution          `json:"execution"`
+}
+
+type ToolExecution struct {
+	Transport string     `json:"transport"`
+	Service   string     `json:"service"`
+	BaseURL   string     `json:"base_url"`
+	Method    string     `json:"method"`
+	Path      string     `json:"path"`
+	Auth      *AuthEntry `json:"auth,omitempty"`
+}
+
+type ToolManifest struct {
+	Version int                 `json:"version"`
+	Tools   []ToolManifestEntry `json:"tools"`
+	Policy  ToolPolicy          `json:"policy"`
+}
+
+type ToolPolicy struct {
+	MaxRounds        int `json:"max_rounds"`
+	TimeoutPerToolMS int `json:"timeout_per_tool_ms"`
+	TotalTimeoutMS   int `json:"total_timeout_ms"`
+}
+
+type MemoryManifestEntry struct {
+	Version int        `json:"version"`
+	Service string     `json:"service"`
+	BaseURL string     `json:"base_url"`
+	Recall  *MemoryOp  `json:"recall,omitempty"`
+	Retain  *MemoryOp  `json:"retain,omitempty"`
+	Forget  *MemoryOp  `json:"forget,omitempty"`
+	Auth    *AuthEntry `json:"auth,omitempty"`
+}
+
+type MemoryOp struct {
+	Path      string `json:"path"`
+	TimeoutMS int    `json:"timeout_ms,omitempty"`
+}
+
+var DefaultToolPolicy = ToolPolicy{
+	MaxRounds:        8,
+	TimeoutPerToolMS: 30000,
+	TotalTimeoutMS:   120000,
+}
+
 // GenerateContextDir writes per-agent context files under:
 //
-//	<runtimeDir>/context/<agentID>/{AGENTS.md,CLAWDAPUS.md,metadata.json,feeds.json,service-auth/...}
+//	<runtimeDir>/context/<agentID>/{AGENTS.md,CLAWDAPUS.md,metadata.json,feeds.json,tools.json,memory.json,service-auth/...}
 func GenerateContextDir(runtimeDir string, agents []AgentContextInput) error {
 	for _, agent := range agents {
 		if agent.AgentID == "" {
@@ -73,6 +130,30 @@ func GenerateContextDir(runtimeDir string, agents []AgentContextInput) error {
 			}
 		}
 
+		if len(agent.Tools) > 0 {
+			toolsJSON, err := json.MarshalIndent(ToolManifest{
+				Version: 1,
+				Tools:   agent.Tools,
+				Policy:  DefaultToolPolicy,
+			}, "", " ")
+			if err != nil {
+				return fmt.Errorf("marshal tools for %q: %w", agent.AgentID, err)
+			}
+			if err := os.WriteFile(filepath.Join(agentDir, "tools.json"), append(toolsJSON, '\n'), 0644); err != nil {
+				return fmt.Errorf("write tools.json for %q: %w", agent.AgentID, err)
+			}
+		}
+
+		if agent.Memory != nil {
+			memoryJSON, err := json.MarshalIndent(agent.Memory, "", " ")
+			if err != nil {
+				return fmt.Errorf("marshal memory for %q: %w", agent.AgentID, err)
+			}
+			if err := os.WriteFile(filepath.Join(agentDir, "memory.json"), append(memoryJSON, '\n'), 0644); err != nil {
+				return fmt.Errorf("write memory.json for %q: %w", agent.AgentID, err)
+			}
+		}
+
 		if len(agent.ServiceAuth) > 0 {
 			authDir := filepath.Join(agentDir, "service-auth")
 			if err := os.MkdirAll(authDir, 0700); err != nil {
diff --git a/internal/cllama/context_test.go b/internal/cllama/context_test.go
index 9756d5d..30473ff 100644
--- a/internal/cllama/context_test.go
+++ b/internal/cllama/context_test.go
@@ -93,6 +93,32 @@ func TestGenerateContextDirWritesOptionalFeedsAndServiceAuth(t *testing.T) {
 			TTL: 30,
 			URL: "http://claw-api:8080/fleet/alerts",
 		}},
+		Tools: []ToolManifestEntry{{
+			Name:        "trading-api.get_market_context",
+			Description: "Retrieve market context",
+			InputSchema: map[string]interface{}{"type": "object"},
+			Execution: ToolExecution{
+				Transport: "http",
+				Service:   "trading-api",
+				BaseURL:   "http://trading-api:4000",
+				Method:    "GET",
+				Path:      "/api/v1/market_context/{claw_id}",
+				Auth:      &AuthEntry{Type: "bearer", Token: "service-token"},
+			},
+		}},
+		Memory: &MemoryManifestEntry{
+			Version: 1,
+			Service: "team-memory",
+			BaseURL: "http://team-memory:8080",
+			Recall: &MemoryOp{
+				Path:      "/recall",
+				TimeoutMS: 300,
+			},
+			Retain: &MemoryOp{
+				Path: "/retain",
+			},
+			Auth: &AuthEntry{Type: "bearer", Token: "memory-token"},
+		},
 		ServiceAuth: []ServiceAuthEntry{{
 			Service:  "claw-api",
 			AuthType: "bearer",
@@ -117,6 +143,34 @@
 		t.Fatalf("unexpected feeds payload: %v", feeds)
 	}
 
+	toolsRaw, err := os.ReadFile(filepath.Join(dir, "context", "octopus", "tools.json"))
+	if err != nil {
+		t.Fatal(err)
+	}
+	var toolsManifest map[string]interface{}
+	if err := json.Unmarshal(toolsRaw, &toolsManifest); err != nil {
+		t.Fatal(err)
+	}
+	if toolsManifest["version"].(float64) != 1 {
+		t.Fatalf("unexpected tools manifest version: %v", toolsManifest)
+	}
+	tools := toolsManifest["tools"].([]interface{})
+	if len(tools) != 1 {
+		t.Fatalf("unexpected tools manifest payload: %v", toolsManifest)
+	}
+
+	memoryRaw, err := os.ReadFile(filepath.Join(dir, "context", "octopus", "memory.json"))
+	if err != nil {
+		t.Fatal(err)
+	}
+	var memory map[string]interface{}
+	if err := json.Unmarshal(memoryRaw, &memory); err != nil {
+		t.Fatal(err)
+	}
+	if memory["service"] != "team-memory" {
+		t.Fatalf("unexpected memory manifest payload: %v", memory)
+	}
+
 	authRaw, err := os.ReadFile(filepath.Join(dir, "context", "octopus", "service-auth", "claw-api.json"))
 	if err != nil {
 		t.Fatal(err)
diff --git a/internal/describe/registry.go b/internal/describe/registry.go
index 48a32cc..735ecbb 100644
--- a/internal/describe/registry.go
+++ b/internal/describe/registry.go
@@ -11,6 +11,17 @@ type FeedSpec struct {
 	Auth *AuthDescriptor
 }
 
+type ToolSpec struct {
+	Name        string
+	Service     string
+	Description string
+	InputSchema map[string]interface{}
+	HTTP        *ToolHTTP
+	Annotations map[string]interface{}
+}
+
+type ToolRegistry map[string][]ToolSpec
+
 func BuildFeedRegistry(descriptors map[string]*ServiceDescriptor) (map[string]FeedSpec, error) {
 	registry := make(map[string]FeedSpec)
 	for serviceName, descriptor := range descriptors {
@@ -37,3 +48,31 @@ func BuildFeedRegistry(descriptors map[string]*ServiceDescriptor) (map[string]Fe
 	}
 	return registry, nil
 }
+
+func BuildToolRegistry(descriptors map[string]*ServiceDescriptor) (ToolRegistry, error) {
+	registry := make(ToolRegistry)
+	for serviceName, descriptor := range descriptors {
+		if descriptor == nil || len(descriptor.Tools) == 0 {
+			continue
+		}
+		seen := make(map[string]struct{}, len(descriptor.Tools))
+		specs := make([]ToolSpec, 0, len(descriptor.Tools))
+		for _, tool := range descriptor.Tools {
+			if _, exists := seen[tool.Name]; exists {
+				return nil, fmt.Errorf("tool name %q is declared multiple times by %q", tool.Name, serviceName)
+			}
+			seen[tool.Name] = struct{}{}
+			spec := ToolSpec{
+				Name:        tool.Name,
+				Service:     serviceName,
+				Description: tool.Description,
+				InputSchema: tool.InputSchema,
+				HTTP:        tool.HTTP,
+				Annotations: tool.Annotations,
+			}
+			specs = append(specs, spec)
+		}
+		registry[serviceName] = specs
+	}
+	return registry, nil
+}
diff --git a/internal/describe/registry_test.go b/internal/describe/registry_test.go
new file mode 100644
index 0000000..dc62b23
--- /dev/null
+++ b/internal/describe/registry_test.go
@@ -0,0 +1,48 @@
+package describe
+
+import "testing"
+
+func TestBuildToolRegistryGroupsToolsByService(t *testing.T) {
+	registry, err := BuildToolRegistry(map[string]*ServiceDescriptor{
+		"trading-api": {
+			Version: 2,
+			Tools: []ToolDescriptor{
+				{Name: "get_market_context", Description: "Get market context", InputSchema: map[string]interface{}{"type": "object"}},
+				{Name: "get_positions", Description: "Get positions", InputSchema: map[string]interface{}{"type": "object"}},
+			},
+		},
+		"analytics": {
+			Version: 2,
+			Tools: []ToolDescriptor{
+				{Name: "get_report", Description: "Get report", InputSchema: map[string]interface{}{"type": "object"}},
+			},
+		},
+	})
+	if err != nil {
+		t.Fatalf("BuildToolRegistry: %v", err)
+	}
+	if len(registry["trading-api"]) != 2 {
+		t.Fatalf("expected 2 trading-api tools, got %+v", registry["trading-api"])
+	}
+	if registry["trading-api"][0].Service != "trading-api" {
+		t.Fatalf("expected trading-api service tag, got %+v", registry["trading-api"][0])
+	}
+	if len(registry["analytics"]) != 1 || registry["analytics"][0].Name != "get_report" {
+		t.Fatalf("unexpected analytics registry entry: %+v", registry["analytics"])
+	}
+}
+
+func TestBuildToolRegistryRejectsDuplicateNamesWithinService(t *testing.T) {
+	_, err := BuildToolRegistry(map[string]*ServiceDescriptor{
+		"trading-api": {
+			Version: 2,
+			Tools: []ToolDescriptor{
+				{Name: "lookup", Description: "Lookup", InputSchema: map[string]interface{}{"type": "object"}},
+				{Name: "lookup", Description: "Lookup again", InputSchema: map[string]interface{}{"type": "object"}},
+			},
+		},
+	})
+	if err == nil {
+		t.Fatal("expected duplicate tool name error")
+	}
+}

From dcfdfe969cf9f3ec0c54ff5c5e92467ee439c12c Mon Sep 17 00:00:00 2001
From: Wojtek
Date: Tue, 31 Mar 2026 17:52:10 -0400
Subject: [PATCH 08/18] Update cllama for history read API

---
 cllama | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cllama b/cllama
index 1154911..23d7974 160000
--- a/cllama
+++ b/cllama
@@ -1 +1 @@
-Subproject commit 1154911d38748bb146aa3c4fb1b2ffd59bddc51d
+Subproject commit 23d797481da77a0f702b48f8423193f5714bf765

From 103231bf39acc89dfe391efb6b85af081e50b05a Mon Sep 17 00:00:00 2001
From: Wojtek
Date: Tue, 31 Mar 2026 17:56:35 -0400
Subject: [PATCH 09/18] Update cllama for memory runtime hooks

---
 cllama | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cllama b/cllama
index 23d7974..8c49bab 160000
--- a/cllama
+++ b/cllama
@@ -1 +1 @@
-Subproject commit 23d797481da77a0f702b48f8423193f5714bf765
+Subproject commit 8c49bab94b5d3080f5cd3c864315e54b56b90f7e

From 03c4a3def4d24684d07f0cd6f2cc3824ff8ccd19 Mon Sep 17 00:00:00 2001
From: Wojtek
Date: Tue, 31 Mar 2026 18:13:09 -0400
Subject: [PATCH 10/18] Add local history export command

---
 cmd/claw/history.go      | 144 +++++++++++++++++++++++++++++++++++++++
 cmd/claw/history_test.go |  82 ++++++++++++++++++++++
 2 files changed, 226 insertions(+)
 create mode 100644 cmd/claw/history.go
 create mode 100644 cmd/claw/history_test.go

diff --git a/cmd/claw/history.go b/cmd/claw/history.go
new file mode 100644
index 0000000..381678d
--- /dev/null
+++ b/cmd/claw/history.go
@@ -0,0 +1,144 @@
+package main
+
+import (
+	"bufio"
+	"encoding/json"
+	"fmt"
+	"io"
+	"os"
+	"path/filepath"
+	"strings"
+	"time"
+
+	"github.com/spf13/cobra"
+)
+
+const (
+	defaultHistoryExportLimit = 100
+	maxHistoryExportLimit     = 1000
+)
+
+var (
+	historyAfter string
+	historyLimit int
+)
+
+var historyCmd = &cobra.Command{
+	Use:   "history",
+	Short: "Inspect persistent agent session history",
+}
+
+var historyExportCmd = &cobra.Command{
+	Use:   "export <agent>",
+	Short: "Export an agent's session history as NDJSON",
+	Args:  cobra.ExactArgs(1),
+	RunE: func(cmd *cobra.Command, args []string) error {
+		after, err := parseHistoryAfter(historyAfter)
+		if err != nil {
+			return err
+		}
+		limit, err := normalizeHistoryLimit(historyLimit)
+		if err != nil {
+			return err
+		}
+		podDir, err := resolveHistoryPodDir()
+		if err != nil {
+			return err
+		}
+		historyPath := filepath.Join(podDir, ".claw-session-history", args[0], "history.jsonl")
+		return exportHistoryFile(cmd.OutOrStdout(), historyPath, after, limit)
+	},
+}
+
+func resolveHistoryPodDir() (string, error) {
+	if composePodFile != "" {
+		absPodFile, err := filepath.Abs(composePodFile)
+		if err != nil {
+			return "", fmt.Errorf("resolve pod file path %q: %w", composePodFile, err)
+		}
+		if _, err := os.Stat(absPodFile); err != nil {
+			return "", fmt.Errorf("open pod file %q: %w", composePodFile, err)
+		}
+		return filepath.Dir(absPodFile), nil
+	}
+	cwd, err := os.Getwd()
+	if err != nil {
+		return "", fmt.Errorf("resolve current directory: %w", err)
+	}
+	return cwd, nil
+}
+
+func parseHistoryAfter(raw string) (*time.Time, error) {
+	raw = strings.TrimSpace(raw)
+	if raw == "" {
+		return nil, nil
+	}
+	ts, err := time.Parse(time.RFC3339, raw)
+	if err != nil {
+		return nil, fmt.Errorf("after must be RFC3339")
+	}
+	return &ts, nil
+}
+
+func normalizeHistoryLimit(limit int) (int, error) {
+	if limit <= 0 {
+		return 0, fmt.Errorf("limit must be > 0")
+	}
+	if limit > maxHistoryExportLimit {
+		return maxHistoryExportLimit, nil
+	}
+	return limit, nil
+}
+
+func exportHistoryFile(w io.Writer, historyPath string, after *time.Time, limit int) error {
+	f, err := os.Open(historyPath)
+	if err != nil {
+		if os.IsNotExist(err) {
+			return fmt.Errorf("no session history found at %q", historyPath)
+		}
+		return err
+	}
+	defer f.Close()
+
+	scanner := bufio.NewScanner(f)
+	emitted := 0
+	for scanner.Scan() {
+		line := scanner.Bytes()
+		if len(strings.TrimSpace(string(line))) == 0 {
+			continue
+		}
+		if after != nil {
+			var meta struct {
+				TS string `json:"ts"`
+			}
+			if err := json.Unmarshal(line, &meta); err != nil {
+				return fmt.Errorf("parse history entry: %w", err)
+			}
+			ts, err := time.Parse(time.RFC3339, strings.TrimSpace(meta.TS))
+			if err != nil {
+				return fmt.Errorf("parse history timestamp %q: %w", meta.TS, err)
+			}
+			if !ts.After(*after) {
+				continue
+			}
+		}
+		if _, err := fmt.Fprintln(w, string(line)); err != nil {
+			return err
+		}
+		emitted++
+		if emitted >= limit {
+			break
+		}
+	}
+	if err := scanner.Err(); err != nil {
+		return err
+	}
+	return nil
+}
+
+func init() {
+	historyExportCmd.Flags().StringVar(&historyAfter, "after", "", "Only emit entries after this RFC3339 timestamp")
+	historyExportCmd.Flags().IntVar(&historyLimit, "limit", defaultHistoryExportLimit, "Maximum number of entries to emit")
+	historyCmd.AddCommand(historyExportCmd)
+	rootCmd.AddCommand(historyCmd)
+}
diff --git a/cmd/claw/history_test.go b/cmd/claw/history_test.go
new file mode 100644
index 0000000..7c64ba2
--- /dev/null
+++ b/cmd/claw/history_test.go
@@ -0,0 +1,82 @@
+package main
+
+import (
+	"bytes"
+	"os"
+	"path/filepath"
+	"strings"
+	"testing"
+	"time"
+)
+
+func TestParseHistoryAfter(t *testing.T) {
+	after, err := parseHistoryAfter("2026-03-31T12:00:00Z")
+	if err != nil {
+		t.Fatalf("parseHistoryAfter: %v", err)
+	}
+	if after == nil || after.UTC().Format(time.RFC3339) != "2026-03-31T12:00:00Z" {
+		t.Fatalf("unexpected parsed timestamp: %v", after)
+	}
+	if _, err := parseHistoryAfter("not-a-time"); err == nil {
+		t.Fatal("expected invalid timestamp error")
+	}
+}
+
+func TestNormalizeHistoryLimit(t *testing.T) {
+	if _, err := normalizeHistoryLimit(0); err == nil {
+		t.Fatal("expected error for limit <= 0")
+	}
+	got, err := normalizeHistoryLimit(5000)
+	if err != nil {
+		t.Fatalf("normalizeHistoryLimit: %v", err)
+	}
+	if got != maxHistoryExportLimit {
+		t.Fatalf("expected capped limit %d, got %d", maxHistoryExportLimit, got)
+	}
+}
+
+func TestExportHistoryFileAppliesAfterAndLimit(t *testing.T) {
+	dir := t.TempDir()
+	historyPath := filepath.Join(dir, "history.jsonl")
+	content := strings.Join([]string{
+		`{"ts":"2026-03-31T12:00:00Z","claw_id":"agent-1"}`,
+		`{"ts":"2026-03-31T12:01:00Z","claw_id":"agent-1"}`,
+		`{"ts":"2026-03-31T12:02:00Z","claw_id":"agent-1"}`,
+	}, "\n") + "\n"
+	if err := os.WriteFile(historyPath, []byte(content), 0o644); err != nil {
+		t.Fatal(err)
+	}
+
+	var out bytes.Buffer
+	after := time.Date(2026, 3, 31, 12, 0, 0, 0, time.UTC)
+	if err := exportHistoryFile(&out, historyPath, &after, 1); err != nil {
+		t.Fatalf("exportHistoryFile: %v", err)
+	}
+	lines := strings.Split(strings.TrimSpace(out.String()), "\n")
+	if len(lines) != 1 {
+		t.Fatalf("expected 1 emitted line, got %q", out.String())
+	}
+	if !strings.Contains(lines[0], `"2026-03-31T12:01:00Z"`) {
+		t.Fatalf("unexpected exported line: %q", lines[0])
+	}
+}
+
+func TestResolveHistoryPodDirUsesComposePodFile(t *testing.T) {
+	dir := t.TempDir()
+	podFile := filepath.Join(dir, "claw-pod.yml")
+	if err := os.WriteFile(podFile, []byte("services: {}"), 0o644); err != nil {
+		t.Fatal(err)
+	}
+
+	prev := composePodFile
+	composePodFile = podFile
+	defer func() { composePodFile = prev }()
+
+	got, err := resolveHistoryPodDir()
+	if err != nil {
+		t.Fatalf("resolveHistoryPodDir: %v", err)
+	}
+	if got != dir {
+		t.Fatalf("expected pod dir %q, got %q", dir, got)
+	}
+}

From b1debbedbb5ba0838a827f3e4ed980961890edcb Mon Sep 17 00:00:00 2001
From: Wojtek
Date: Tue, 31 Mar 2026 18:20:42 -0400
Subject: [PATCH 11/18] Project memory replay auth into subscribed services

---
 cllama                      |   2 +-
 cmd/claw/compose_up.go      | 192 +++++++++++++++++++++++++++++++++++-
 cmd/claw/compose_up_test.go | 101 +++++++++++++++++++
 3 files changed, 292 insertions(+), 3 deletions(-)

diff --git a/cllama b/cllama
index 8c49bab..d62b61a 160000
--- a/cllama
+++ b/cllama
@@ -1 +1 @@
-Subproject commit 8c49bab94b5d3080f5cd3c864315e54b56b90f7e
+Subproject commit d62b61a179f8609731dfef24b2a276b757be4746

diff --git a/cmd/claw/compose_up.go b/cmd/claw/compose_up.go
index dd9bc12..aaaf03b 100644
--- a/cmd/claw/compose_up.go
+++ b/cmd/claw/compose_up.go
@@ -42,6 +42,9 @@ const (
 	conversationWallFeedLimit    = 20
 	conversationWallInternalPort = "8080"
 	conversationWallDockerfile   = "dockerfiles/claw-wall/Dockerfile"
+	clawInternalNetworkName      = "claw-internal"
+	historyReplayAuthService     = "cllama-history"
+	historyReplayBaseURL         = "http://cllama:8080/history"
 )
 
 var (
@@ -363,6 +366,9 @@ func runComposeUp(podFile string) error {
 	if err != nil {
 		return err
 	}
+	if err := attachCapabilityProvidersToInternalNetwork(p, resolvedTools, resolvedMemory); err != nil {
+		return err
+	}
 	for name, rc := range resolvedClaws {
 		svc := p.Services[name]
 		if svc == nil || svc.Claw == nil {
@@ -382,6 +388,10 @@ func runComposeUp(podFile string) error {
 	proxies := make([]pod.CllamaProxyConfig, 0)
 	cllamaDashboardPort := envOrDefault("CLLAMA_UI_PORT", "8181")
 	if cllamaEnabled {
+		historyReplayAuth, err := prepareHistoryReplayRuntime(p, resolvedClaws, resolvedMemory)
+		if err != nil {
+			return err
+		}
 		proxyTypes := collectProxyTypes(resolvedClaws)
 		if len(proxyTypes) > 1 {
 			return fmt.Errorf("multi-proxy chaining not yet supported: found proxy types %v; Phase 4 supports one proxy type per pod", proxyTypes)
@@ -482,7 +492,10 @@ func runComposeUp(podFile string) error {
 			ordinalName := fmt.Sprintf("%s-%d", name, i)
 			ordinalRC := *rc
 			ordinalRC.ServiceName = ordinalName
-			ordinalAuth := lookupServiceAuth(clawAPIAuth, ordinalName)
+			ordinalAuth := mergeServiceAuthEntries(
+				lookupServiceAuth(clawAPIAuth, ordinalName),
+				lookupServiceAuth(historyReplayAuth, ordinalName),
+			)
 			feeds, err := buildFeedManifestEntries(p, serviceDescriptors, runtimeEnv, name, ordinalName, ordinalAuth)
 			if err != nil {
 				return fmt.Errorf("service %q: build feed manifest: %w", ordinalName, err)
@@ -517,7 +530,10 @@ func runComposeUp(podFile string) error {
 			continue
 		}
 
-		svcAuth := lookupServiceAuth(clawAPIAuth, name)
+		svcAuth := mergeServiceAuthEntries(
+			lookupServiceAuth(clawAPIAuth, name),
+			lookupServiceAuth(historyReplayAuth, name),
+		)
 		feeds, err := buildFeedManifestEntries(p, serviceDescriptors, runtimeEnv, name, name, svcAuth)
 		if err != nil {
 			return fmt.Errorf("service %q: build feed manifest: %w", name, err)
@@ -1170,6 +1186,50 @@ func buildMemoryManifestEntry(p *pod.Pod, descriptors map[string]*describe.Servi
 	return entry, nil
 }
 
+func attachCapabilityProvidersToInternalNetwork(p *pod.Pod, resolvedTools map[string][]describe.ToolSpec, resolvedMemory map[string]*resolvedMemorySubscription) error {
+	if p == nil {
+		return nil
+	}
+
+	providers := make(map[string]struct{})
+	for _, svc := range p.Services {
+		if svc == nil || svc.Claw == nil {
+			continue
+		}
+		for _, feed := range svc.Claw.Feeds {
+			if feed.Unresolved || strings.TrimSpace(feed.Source) == "" {
+				continue
+			}
+			providers[feed.Source] = struct{}{}
+		}
+	}
+	for _, tools := range resolvedTools {
+		for _, tool := range tools {
+			if strings.TrimSpace(tool.Service) == "" {
+				continue
+			}
+			providers[tool.Service] = struct{}{}
+		}
+	}
+	for _, memory := range resolvedMemory {
+		if memory == nil || strings.TrimSpace(memory.Service) == "" {
+			continue
+		}
+		providers[memory.Service] = struct{}{}
+	}
+
+	for provider := range providers {
+		svc := p.Services[provider]
+		if svc == nil {
+			continue
+		}
+		if err := ensureServiceOnNetwork(svc, clawInternalNetworkName); err != nil {
+			return fmt.Errorf("attach capability provider network: %w", err)
+		}
+	}
+	return nil
+}
+
 func resolveFeedAuthFromServiceEnv(env map[string]string, descriptor *describe.ServiceDescriptor, runtimeEnv map[string]string) (string, error) {
 	if descriptor == nil || descriptor.Auth == nil || descriptor.Auth.Type != "bearer" || descriptor.Auth.Env == "" {
 		return "", nil
@@ -1584,6 +1644,57 @@ func prepareClawAPIRuntime(runtimeDir string, p *pod.Pod, resolvedClaws map[stri
 	return serviceAuth, nil
 }
 
+func prepareHistoryReplayRuntime(p *pod.Pod, resolvedClaws map[string]*driver.ResolvedClaw, resolvedMemory map[string]*resolvedMemorySubscription) (map[string]cllama.ServiceAuthEntry, error) {
+	if p == nil || len(resolvedMemory) == 0 {
+		return nil, nil
+	}
+
+	serviceAgents := make(map[string][]string)
+	for consumer, memory := range resolvedMemory {
+		if memory == nil || strings.TrimSpace(memory.Service) == "" {
+			continue
+		}
+		count := 1
+		if rc := resolvedClaws[consumer]; rc != nil && rc.Count > 0 {
+			count = rc.Count
+		}
+		serviceAgents[memory.Service] = append(serviceAgents[memory.Service], expandedServiceNames(consumer, count)...)
+	}
+	if len(serviceAgents) == 0 {
+		return nil, nil
+	}
+
+	serviceAuth := make(map[string]cllama.ServiceAuthEntry)
+	for serviceName, agentIDs := range serviceAgents {
+		svc := p.Services[serviceName]
+		if svc == nil {
+			return nil, fmt.Errorf("history replay target %q not found in pod services", serviceName)
+		}
+
+		token := cllama.GenerateToken(serviceName + "-history")
+		agents := uniqueSortedStrings(agentIDs)
+		if svc.Environment == nil {
+			svc.Environment = make(map[string]string)
+		}
+		svc.Environment["CLAW_HISTORY_URL"] = historyReplayBaseURL
+		svc.Environment["CLAW_HISTORY_TOKEN"] = token
+		svc.Environment["CLAW_HISTORY_AGENT_IDS"] = strings.Join(agents, ",")
+		if err := ensureServiceOnNetwork(svc, clawInternalNetworkName); err != nil {
+			return nil, fmt.Errorf("service %q: attach history replay network: %w", serviceName, err)
+		}
+
+		for _, agentID := range agents {
+			serviceAuth[agentID] = cllama.ServiceAuthEntry{
+				Service:   historyReplayAuthService,
+				AuthType:  "bearer",
+				Token:     token,
+				Principal: serviceName,
+			}
+		}
+	}
+	return serviceAuth, nil
+}
+
 func lookupServiceAuth(entries map[string]cllama.ServiceAuthEntry, agentID string) []cllama.ServiceAuthEntry {
 	if len(entries) == 0 {
 		return nil
@@ -1595,6 +1706,83 @@ func lookupServiceAuth(entries map[string]cllama.ServiceAuthEntry, agentID strin
 	return []cllama.ServiceAuthEntry{entry}
 }
 
+func mergeServiceAuthEntries(groups ...[]cllama.ServiceAuthEntry) []cllama.ServiceAuthEntry {
+	if len(groups) == 0 {
+		return nil
+	}
+
+	seen := make(map[string]struct{})
+	merged := make([]cllama.ServiceAuthEntry, 0)
+	for _, group := range groups {
+		for _, entry := range group {
+			key := entry.Service + "\x00" + entry.AuthType + "\x00" + entry.Token
+			if _, ok := seen[key]; ok {
+				continue
+			}
+			seen[key] = struct{}{}
+			merged = append(merged, entry)
+		}
+	}
+	if len(merged) == 0 {
+		return nil
+	}
+	return merged
+}
+
+func ensureServiceOnNetwork(svc *pod.Service, network string) error {
+	if svc == nil {
+		return nil
+	}
+	if svc.Compose == nil {
+		svc.Compose = make(map[string]interface{})
+	}
+	networks, err := appendComposeNetwork(svc.Compose["networks"], network)
+	if err != nil {
+		return err
+	}
+	svc.Compose["networks"] = networks
+	return nil
+}
+
+func appendComposeNetwork(base interface{}, network string) (interface{}, error) {
+	switch tv := base.(type) {
+	case nil:
+		return []string{network}, nil
+	case []string:
+		out := append([]string(nil), tv...)
+		for _, existing := range out {
+			if existing == network {
+				return out, nil
+			}
+		}
+		return append(out, network), nil
+	case []interface{}:
+		out := make([]interface{}, 0, len(tv)+1)
+		found := false
+		for _, item := range tv {
+			out = append(out, item)
+			if s, ok := item.(string); ok && s == network {
+				found = true
+			}
+		}
+		if !found {
+			out = append(out, network)
+		}
+		return out, nil
+	case map[string]interface{}:
+		out := make(map[string]interface{}, len(tv)+1)
+		for key, value := range tv {
+			out[key] = value
+		}
+		if _, ok := out[network]; !ok {
+			out[network] = nil
+		}
+		return out, nil
+	default:
+		return nil, fmt.Errorf("unsupported networks value type %T", base)
+	}
+}
+
 func discordServiceUserIDs(p *pod.Pod, serviceNames []string, expand func(string) string) ([]string, error) {
 	ids := make([]string, 0, len(serviceNames))
 	for _, name := range serviceNames {
diff --git a/cmd/claw/compose_up_test.go b/cmd/claw/compose_up_test.go
index 76d44c3..5dd37a7 100644
--- a/cmd/claw/compose_up_test.go
+++ b/cmd/claw/compose_up_test.go
@@ -2379,6 +2379,107 @@ func TestBuildMemoryManifestEntryUsesProjectedServiceAuth(t *testing.T) {
 	}
 }
 
+func TestAttachCapabilityProvidersToInternalNetworkAddsProviderServices(t *testing.T) {
+	p := &pod.Pod{
+		Services: map[string]*pod.Service{
+			"analyst": {
+				Claw: &pod.ClawBlock{
+					Feeds: []pod.FeedEntry{{Name: "market", Source: "market-feed", Path: "/feed", TTL: 30}},
+				},
+			},
+			"market-feed": {Compose: map[string]interface{}{}},
+			"tool-api":    {Compose: map[string]interface{}{}},
+			"team-memory": {Compose: map[string]interface{}{}},
+		},
+	}
+
+	err := attachCapabilityProvidersToInternalNetwork(
+		p,
+		map[string][]describe.ToolSpec{
+			"analyst": {{Name: "get_market", Service: "tool-api"}},
+		},
+		map[string]*resolvedMemorySubscription{
+			"analyst": {Service: "team-memory", Config: &pod.MemoryEntry{Service: "team-memory", TimeoutMS: 300}},
+		},
+	)
+	if err != nil {
+		t.Fatalf("attachCapabilityProvidersToInternalNetwork: %v", err)
+	}
+
+	for _, name := range []string{"market-feed", "tool-api", "team-memory"} {
+		networks, ok := p.Services[name].Compose["networks"].([]string)
+		if !ok || len(networks) != 1 || networks[0] != clawInternalNetworkName {
+			t.Fatalf("expected %s on %s, got %#v", name, clawInternalNetworkName, p.Services[name].Compose["networks"])
+		}
+	}
+}
+
+func TestPrepareHistoryReplayRuntimeProjectsReplayAuthAndEnv(t *testing.T) {
+	p := &pod.Pod{
+		Services: map[string]*pod.Service{
+			"analyst": {
+				Claw: &pod.ClawBlock{
+					Memory: &pod.MemoryEntry{Service: "team-memory", TimeoutMS: 300},
+				},
+			},
+			"researcher": {
+				Claw: &pod.ClawBlock{
+					Count:  2,
+					Memory: &pod.MemoryEntry{Service: "team-memory", TimeoutMS: 300},
+				},
+			},
+			"team-memory": {
+				Environment: map[string]string{},
+				Compose:     map[string]interface{}{},
+			},
+		},
+	}
+
+	auth, err := prepareHistoryReplayRuntime(
+		p,
+		map[string]*driver.ResolvedClaw{
+			"analyst":    {Count: 1},
+			"researcher": {Count: 2},
+		},
+		map[string]*resolvedMemorySubscription{
+			"analyst":    {Service: "team-memory", Config: &pod.MemoryEntry{Service: "team-memory", TimeoutMS: 300}},
+			"researcher": {Service: "team-memory", Config: &pod.MemoryEntry{Service: "team-memory", TimeoutMS: 300}},
+		},
+	)
+	if err != nil {
+		t.Fatalf("prepareHistoryReplayRuntime: %v", err)
+	}
+
+	env := p.Services["team-memory"].Environment
+	if env["CLAW_HISTORY_URL"] != historyReplayBaseURL {
+		t.Fatalf("unexpected CLAW_HISTORY_URL: %q", env["CLAW_HISTORY_URL"])
+	}
+	if env["CLAW_HISTORY_TOKEN"] == "" {
+		t.Fatal("expected CLAW_HISTORY_TOKEN to be injected")
+	}
+	if env["CLAW_HISTORY_AGENT_IDS"] != "analyst,researcher-0,researcher-1" {
+		t.Fatalf("unexpected CLAW_HISTORY_AGENT_IDS: %q", env["CLAW_HISTORY_AGENT_IDS"])
+	}
+
+	networks, ok := p.Services["team-memory"].Compose["networks"].([]string)
+	if !ok || len(networks) != 1 || networks[0] != clawInternalNetworkName {
+		t.Fatalf("expected team-memory on %s, got %#v", clawInternalNetworkName, p.Services["team-memory"].Compose["networks"])
+	}
+
+	for _, agentID := range []string{"analyst", "researcher-0", "researcher-1"} {
+		entry, ok := auth[agentID]
+		if !ok {
+			t.Fatalf("expected replay auth for %s", agentID)
+		}
+		if entry.Service != historyReplayAuthService || entry.AuthType != "bearer" || entry.Principal != "team-memory" {
+			t.Fatalf("unexpected auth entry for %s: %+v", agentID, entry)
+		}
+		if entry.Token != env["CLAW_HISTORY_TOKEN"] {
+			t.Fatalf("expected %s token to match projected env token", agentID)
+		}
+	}
+}
+
 func TestBuildServiceSurfaceInfoOmitsEndpointsWhenToolsDeclared(t *testing.T) {
 	info := buildServiceSurfaceInfo(&describe.ServiceDescriptor{
 		Version:     2,

From ba734f5934a42a6f61da687bb0893acd8b406ea8 Mon Sep 17 00:00:00 2001
From: Wojtek
Date: Tue, 31 Mar 2026 18:26:04 -0400
Subject: [PATCH 12/18] Align descriptor and memory response validation

---
 cllama                               | 2 +-
 internal/describe/descriptor.go      | 2 +-
 internal/describe/descriptor_test.go | 4 ++++
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/cllama b/cllama
index d62b61a..703452f 160000
--- a/cllama
+++ b/cllama
@@ -1 +1 @@
-Subproject commit d62b61a179f8609731dfef24b2a276b757be4746
+Subproject commit 703452f15134ece07b6e54a7747d654b1f017446

diff --git a/internal/describe/descriptor.go b/internal/describe/descriptor.go
index 4b13d40..bb244c4 100644
--- a/internal/describe/descriptor.go
+++ b/internal/describe/descriptor.go
@@ -182,7 +182,7 @@ func validateTools(tools []ToolDescriptor) error {
 			return fmt.Errorf("tools[%d]: inputSchema.type must be \"object\"", i)
 		}
 		if tool.HTTP == nil {
-			continue
+			return fmt.Errorf("tools[%d]: http is required", i)
 		}
 		tool.HTTP.Method = strings.ToUpper(strings.TrimSpace(tool.HTTP.Method))
 		tool.HTTP.Path = strings.TrimSpace(tool.HTTP.Path)
diff --git a/internal/describe/descriptor_test.go b/internal/describe/descriptor_test.go
index e354235..183a889 100644
--- a/internal/describe/descriptor_test.go
+++ b/internal/describe/descriptor_test.go
@@ -125,6 +125,10 @@ func TestParseDescriptorRejectsInvalidV2CapabilityShape(t *testing.T) {
 			name: "invalid http method",
 			data: `{"version":2,"tools":[{"name":"lookup","description":"Lookup","inputSchema":{"type":"object"},"http":{"method":"trace","path":"/lookup"}}]}`,
 		},
+		{
+			name: "missing tool http",
+			data: `{"version":2,"tools":[{"name":"lookup","description":"Lookup","inputSchema":{"type":"object"}}]}`,
+		},
 		{
 			name: "memory forget only",
 			data: `{"version":2,"memory":{"forget":{"path":"/forget"}}}`,

From 0c0e96330e27b211b82134a... Mon Sep 17 00:00:00 2001
From: Wojtek
Date: Tue, 31 Mar 2026 18:37:08 -0400
Subject: [PATCH 13/18] Update memory plane plan with implementation status

---
 ...03-30-memory-plane-and-pluggable-recall.md | 68 +++++++++++++++++++
 1 file changed, 68 insertions(+)

diff --git a/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md b/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md
index 0bdb1a2..7472fe7 100644
--- a/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md
+++ b/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md
@@ -124,6 +124,41 @@ This parallel should be treated as core architectural framing, not as an inciden
 
 ---
 
+## Implementation Status (2026-03-31)
+
+This branch now implements the core memory-plane substrate that this plan was proposing.
+ +Completed: + +- descriptor `version: 2` support for `tools[]` and `memory` +- pod grammar for `x-claw.tools`, `x-claw.memory`, `tools-defaults`, and `memory-defaults` +- compiled `tools.json` and `memory.json` manifests in the per-agent `cllama` context +- a scoped `cllama` history read API at `GET /history/{agentID}` +- dedicated replay auth projection for subscribed memory services via: + - `service-auth/cllama-history.json` in agent context + - `CLAW_HISTORY_URL` + - `CLAW_HISTORY_TOKEN` + - `CLAW_HISTORY_AGENT_IDS` +- automatic attachment of declared feed/tool/memory provider services to `claw-internal` +- pre-turn recall and post-turn best-effort retain hooks in `cllama` +- provider-format-aware memory injection for OpenAI-style and Anthropic-style requests + +Still open: + +- operator replay UX such as `claw memory backfill` +- tombstone/forget flow and replay hygiene +- dedicated success telemetry for memory operations +- a more scalable replay path than forward-scanning large JSONL files +- ADR-020 mediated tool runtime + +That means the plan should now be read as: + +- architectural rationale for the memory plane +- explanation of the boundaries that remain important +- checklist of the remaining hardening and operator-facing work + +--- + ## Problem Statement Without a shared memory plane, users are pushed toward runner-local memory systems: @@ -906,6 +941,10 @@ The architecture should therefore assume a future explicit backfill path, likely - a dedicated CLI flow such as `claw memory backfill` - backend idempotency or replay markers so the same ledger can be consumed safely more than once +The first of these now exists in-tree, and subscribed memory services receive a dedicated replay token plus history URL projection. + +What still does **not** exist is the operator-facing replay UX. A memory service can rebuild from the ledger, but the repo does not yet provide the final `claw memory backfill` workflow. + The retain webhook is the low-latency path. 
Backfill is the durability path for new or recovering services. --- @@ -946,6 +985,8 @@ Clawdapus should not attempt to disable runner-native memory features globally. ### Milestone 1: Complete ADR-018 Phase 2 and define backfill +Status: mostly complete on this branch. + Add the self-scoped history read surface to `cllama`. Benefits: @@ -957,8 +998,16 @@ Benefits: This milestone should also define the expected operational backfill flow for new or recovering memory services. +What remains here is mostly operator UX and replay ergonomics: + +- `claw memory backfill` +- replay markers or idempotency guidance +- better large-history replay performance + ### Milestone 2: Add the memory capability and `cllama` hooks +Status: core complete on this branch. + Implement: - descriptor extension for memory capability @@ -974,6 +1023,13 @@ Implement: This is the first full end-to-end memory plane. +The main remaining gaps are: + +- success-path observability for recall and retain +- governed forget/tombstone semantics +- policy filtering on retain and recall +- any optional payload-bounding refinements beyond the current fixed request shape + ### Milestone 3: Reference adapter and governance hardening Implement: @@ -1035,6 +1091,12 @@ The proxy should send a fixed request shape with only simple numeric payload bou We should avoid a richer negotiated vocabulary here unless real implementations prove it necessary. +Current implementation note: + +- the branch currently sends the full inbound `messages` payload and, for Anthropic requests, the top-level `system` field +- this is acceptable as a first implementation because the service may ignore what it does not need +- if payload size becomes a practical problem, bounded recent-context shaping can be added later without changing the core contract + ### 2. Should recall responses support categories? Probably yes, eventually. @@ -1065,6 +1127,12 @@ Long-term, the stable read API is cleaner. 
Short-term, direct ledger reading may be acceptable for local prototypes. +Current implementation note: + +- the stable read API now exists +- subscribed memory services also receive a dedicated replay auth projection +- direct filesystem scraping should therefore be treated as a prototype shortcut, not as the intended supported path + ### 6. How should affect fit into the model? Affect is exactly the kind of advanced derived state that should remain backend-defined. From 8991b70b06a7e67e5fbaf9e1e47ce613a48db913 Mon Sep 17 00:00:00 2001 From: Wojtek Date: Tue, 31 Mar 2026 19:05:48 -0400 Subject: [PATCH 14/18] Add claw memory backfill command --- cmd/claw/memory.go | 597 ++++++++++++++++++ cmd/claw/memory_test.go | 197 ++++++ ...03-30-memory-plane-and-pluggable-recall.md | 24 +- 3 files changed, 813 insertions(+), 5 deletions(-) create mode 100644 cmd/claw/memory.go create mode 100644 cmd/claw/memory_test.go diff --git a/cmd/claw/memory.go b/cmd/claw/memory.go new file mode 100644 index 0000000..9e96ebc --- /dev/null +++ b/cmd/claw/memory.go @@ -0,0 +1,597 @@ +package main + +import ( + "bufio" + "bytes" + "encoding/json" + "fmt" + "io" + "net" + "net/http" + "net/url" + "os" + "os/exec" + "path/filepath" + "sort" + "strings" + "time" + + "github.com/spf13/cobra" + "gopkg.in/yaml.v3" +) + +var ( + memoryBackfillAfter string + memoryBackfillLimit int + memoryBackfillURL string + memoryBackfillToken string + memoryBackfillAgent []string + + runComposeOutputCommand = runComposeOutputCommandDefault +) + +type memoryAuthEntry struct { + Type string `json:"type"` + Token string `json:"token,omitempty"` +} + +type memoryOpEntry struct { + Path string `json:"path"` + TimeoutMS int `json:"timeout_ms,omitempty"` +} + +type memoryManifestFile struct { + Service string `json:"service"` + BaseURL string `json:"base_url"` + Retain *memoryOpEntry `json:"retain,omitempty"` + Auth *memoryAuthEntry `json:"auth,omitempty"` +} + +type memoryBackfillTarget struct { + AgentID string + 
HistoryPath string
+	Pod         string
+	Metadata    map[string]any
+	Manifest    memoryManifestFile
+}
+
+type historyEntryHeader struct {
+	TS             string `json:"ts"`
+	RequestedModel string `json:"requested_model"`
+}
+
+type backfillComposeFile struct {
+	Services map[string]backfillComposeService `yaml:"services"`
+}
+
+type backfillComposeService struct {
+	Ports []any `yaml:"ports"`
+}
+
+var memoryCmd = &cobra.Command{
+	Use:   "memory",
+	Short: "Operate on configured memory services",
+}
+
+var memoryBackfillCmd = &cobra.Command{
+	Use:   "backfill <memory-service>",
+	Short: "Replay retained session history to a memory service retain endpoint",
+	Args:  cobra.ExactArgs(1),
+	RunE: func(cmd *cobra.Command, args []string) error {
+		generatedPath, err := resolveComposeGeneratedPath()
+		if err != nil {
+			return err
+		}
+
+		after, err := parseHistoryAfter(memoryBackfillAfter)
+		if err != nil {
+			return err
+		}
+		limit, err := normalizeBackfillLimit(memoryBackfillLimit)
+		if err != nil {
+			return err
+		}
+
+		podDir := filepath.Dir(generatedPath)
+		targets, err := discoverMemoryBackfillTargets(podDir, args[0], memoryBackfillAgent)
+		if err != nil {
+			return err
+		}
+		retainURL, err := resolveMemoryBackfillURL(generatedPath, targets[0].Manifest, memoryBackfillURL)
+		if err != nil {
+			return err
+		}
+
+		token := strings.TrimSpace(memoryBackfillToken)
+		if token == "" && targets[0].Manifest.Auth != nil && strings.EqualFold(targets[0].Manifest.Auth.Type, "bearer") {
+			token = targets[0].Manifest.Auth.Token
+		}
+
+		summary, err := replayMemoryBackfill(cmd.OutOrStdout(), retainURL, token, targets, after, limit)
+		if err != nil {
+			return err
+		}
+		fmt.Fprintf(cmd.OutOrStdout(), "Backfilled %d entr%s across %d agent%s to %s\n",
+			summary.Entries,
+			entryPlural(summary.Entries),
+			summary.Agents,
+			countPlural(summary.Agents),
+			retainURL,
+		)
+		if summary.SkippedAgents > 0 {
+			fmt.Fprintf(cmd.OutOrStdout(), "Skipped %d agent%s with no matching history entries\n", summary.SkippedAgents,
countPlural(summary.SkippedAgents)) + } + return nil + }, +} + +type memoryBackfillSummary struct { + Entries int + Agents int + SkippedAgents int +} + +func discoverMemoryBackfillTargets(podDir, memoryService string, requestedAgents []string) ([]memoryBackfillTarget, error) { + contextRoot := filepath.Join(podDir, ".claw-runtime", "context") + entries, err := os.ReadDir(contextRoot) + if err != nil { + if os.IsNotExist(err) { + return nil, fmt.Errorf("no generated agent context found in %q (run 'claw up' first)", contextRoot) + } + return nil, fmt.Errorf("list generated agent context: %w", err) + } + + requested := make(map[string]struct{}, len(requestedAgents)) + for _, agentID := range requestedAgents { + agentID = strings.TrimSpace(agentID) + if agentID == "" { + continue + } + requested[agentID] = struct{}{} + } + foundRequested := make(map[string]struct{}, len(requested)) + + targets := make([]memoryBackfillTarget, 0) + for _, entry := range entries { + if !entry.IsDir() { + continue + } + agentID := entry.Name() + if len(requested) > 0 { + if _, ok := requested[agentID]; !ok { + continue + } + } + + manifestPath := filepath.Join(contextRoot, agentID, "memory.json") + raw, err := os.ReadFile(manifestPath) + if err != nil { + if os.IsNotExist(err) { + continue + } + return nil, fmt.Errorf("read memory manifest for %q: %w", agentID, err) + } + + var manifest memoryManifestFile + if err := json.Unmarshal(raw, &manifest); err != nil { + return nil, fmt.Errorf("parse memory manifest for %q: %w", agentID, err) + } + if manifest.Service != memoryService { + continue + } + if manifest.Retain == nil || strings.TrimSpace(manifest.Retain.Path) == "" { + return nil, fmt.Errorf("memory service %q has no retain endpoint for agent %q", memoryService, agentID) + } + + metaPath := filepath.Join(contextRoot, agentID, "metadata.json") + metaRaw, err := os.ReadFile(metaPath) + if err != nil { + return nil, fmt.Errorf("read metadata for %q: %w", agentID, err) + } + var metadata 
map[string]any + if err := json.Unmarshal(metaRaw, &metadata); err != nil { + return nil, fmt.Errorf("parse metadata for %q: %w", agentID, err) + } + + targets = append(targets, memoryBackfillTarget{ + AgentID: agentID, + HistoryPath: filepath.Join(podDir, ".claw-session-history", agentID, "history.jsonl"), + Pod: stringFromMap(metadata, "pod"), + Metadata: metadata, + Manifest: manifest, + }) + foundRequested[agentID] = struct{}{} + } + + sort.Slice(targets, func(i, j int) bool { + return targets[i].AgentID < targets[j].AgentID + }) + + if len(requested) > 0 { + missing := make([]string, 0) + for agentID := range requested { + if _, ok := foundRequested[agentID]; !ok { + missing = append(missing, agentID) + } + } + sort.Strings(missing) + if len(missing) > 0 { + return nil, fmt.Errorf("memory service %q is not subscribed by agent%s %s", memoryService, countPlural(len(missing)), strings.Join(missing, ", ")) + } + } + + if len(targets) == 0 { + return nil, fmt.Errorf("no agents subscribe to memory service %q", memoryService) + } + return targets, nil +} + +func normalizeBackfillLimit(limit int) (int, error) { + if limit < 0 { + return 0, fmt.Errorf("limit must be >= 0") + } + return limit, nil +} + +func resolveMemoryBackfillURL(composePath string, manifest memoryManifestFile, override string) (string, error) { + retainPath := manifestRetainPath(manifest) + if retainPath == "" { + return "", fmt.Errorf("memory manifest has no retain endpoint") + } + if strings.TrimSpace(override) != "" { + return joinRetainURL(override, retainPath) + } + + baseURL, err := url.Parse(strings.TrimSpace(manifest.BaseURL)) + if err != nil { + return "", fmt.Errorf("parse memory base URL %q: %w", manifest.BaseURL, err) + } + service := strings.TrimSpace(baseURL.Hostname()) + port := strings.TrimSpace(baseURL.Port()) + if service == "" || port == "" { + return "", fmt.Errorf("memory base URL %q must include service host and port", manifest.BaseURL) + } + + hostPort, err := 
resolvePublishedHostPort(composePath, service, port) + if err != nil { + return "", fmt.Errorf("resolve host URL for memory service %q: %w (pass --url to override)", service, err) + } + return (&url.URL{ + Scheme: "http", + Host: hostPort, + Path: retainPath, + }).String(), nil +} + +func manifestRetainPath(manifest memoryManifestFile) string { + if manifest.Retain == nil { + return "" + } + return strings.TrimSpace(manifest.Retain.Path) +} + +func joinRetainURL(raw, retainPath string) (string, error) { + u, err := url.Parse(strings.TrimSpace(raw)) + if err != nil { + return "", fmt.Errorf("parse --url: %w", err) + } + if u.Scheme == "" || u.Host == "" { + return "", fmt.Errorf("--url must include scheme and host") + } + if strings.TrimSpace(u.Path) == "" || u.Path == "/" { + u.Path = retainPath + } + return u.String(), nil +} + +func resolvePublishedHostPort(composePath, service, containerPort string) (string, error) { + if hostPort, err := resolveHostPortFromComposeFile(composePath, service, containerPort); err == nil { + return hostPort, nil + } + out, err := runComposeOutputCommand("compose", "-f", composePath, "port", service, containerPort) + if err != nil { + return "", formatComposeOutputError("docker compose port", err, out) + } + hostPort := strings.TrimSpace(string(out)) + if hostPort == "" { + return "", fmt.Errorf("docker compose port returned no host binding") + } + return normalizeHostPort(hostPort), nil +} + +func resolveHostPortFromComposeFile(composePath, service, containerPort string) (string, error) { + raw, err := os.ReadFile(composePath) + if err != nil { + return "", err + } + var compose backfillComposeFile + if err := yaml.Unmarshal(raw, &compose); err != nil { + return "", fmt.Errorf("parse compose file: %w", err) + } + svc, ok := compose.Services[service] + if !ok { + return "", fmt.Errorf("service %q not found in compose.generated.yml", service) + } + for _, portEntry := range svc.Ports { + hostPort, ok := 
matchComposePortEntry(portEntry, containerPort)
+		if ok {
+			return normalizeHostPort(hostPort), nil
+		}
+	}
+	return "", fmt.Errorf("service %q does not publish container port %s with a deterministic host port", service, containerPort)
+}
+
+func matchComposePortEntry(entry any, containerPort string) (string, bool) {
+	switch v := entry.(type) {
+	case string:
+		return parseComposePortString(v, containerPort)
+	case map[string]any:
+		return parseComposePortMap(v, containerPort)
+	case map[any]any:
+		converted := make(map[string]any, len(v))
+		for key, value := range v {
+			converted[fmt.Sprint(key)] = value
+		}
+		return parseComposePortMap(converted, containerPort)
+	default:
+		return "", false
+	}
+}
+
+func parseComposePortString(raw, containerPort string) (string, bool) {
+	raw = strings.TrimSpace(raw)
+	if raw == "" {
+		return "", false
+	}
+	if idx := strings.Index(raw, "/"); idx >= 0 {
+		raw = raw[:idx]
+	}
+	parts := strings.Split(raw, ":")
+	if len(parts) < 2 {
+		return "", false
+	}
+
+	target := strings.TrimSpace(parts[len(parts)-1])
+	if target != containerPort {
+		return "", false
+	}
+	published := strings.TrimSpace(parts[len(parts)-2])
+	if published == "" {
+		return "", false
+	}
+	host := "127.0.0.1"
+	if len(parts) > 2 {
+		host = strings.Trim(strings.Join(parts[:len(parts)-2], ":"), "[]")
+	}
+	return net.JoinHostPort(normalizeHost(host), published), true
+}
+
+func parseComposePortMap(raw map[string]any, containerPort string) (string, bool) {
+	target := strings.TrimSpace(fmt.Sprint(raw["target"]))
+	if target != containerPort {
+		return "", false
+	}
+	// fmt.Sprint renders a missing key as "<nil>", so guard both cases.
+	published := strings.TrimSpace(fmt.Sprint(raw["published"]))
+	if published == "" || published == "<nil>" {
+		return "", false
+	}
+	host := "127.0.0.1"
+	if raw["host_ip"] != nil {
+		host = normalizeHost(strings.Trim(strings.TrimSpace(fmt.Sprint(raw["host_ip"])), "[]"))
+	}
+	return net.JoinHostPort(host, published), true
+}
+
+func normalizeHostPort(hostPort string) string {
+	hostPort = strings.TrimSpace(hostPort)
+	if hostPort == "" {
+		return hostPort
+ } + if strings.HasPrefix(hostPort, ":::") { + return net.JoinHostPort("127.0.0.1", strings.TrimPrefix(hostPort, ":::")) + } + host, port, err := net.SplitHostPort(hostPort) + if err != nil { + return hostPort + } + return net.JoinHostPort(normalizeHost(strings.Trim(host, "[]")), port) +} + +func normalizeHost(host string) string { + host = strings.TrimSpace(host) + switch host { + case "", "0.0.0.0", "::": + return "127.0.0.1" + default: + return host + } +} + +func formatComposeOutputError(prefix string, err error, out []byte) error { + msg := strings.TrimSpace(string(out)) + if msg == "" { + return fmt.Errorf("%s failed: %w", prefix, err) + } + return fmt.Errorf("%s failed: %s", prefix, msg) +} + +type backfillRetainRequest struct { + AgentID string `json:"agent_id"` + Pod string `json:"pod,omitempty"` + Metadata map[string]any `json:"metadata,omitempty"` + Entry json.RawMessage `json:"entry"` +} + +func replayMemoryBackfill(stdout io.Writer, retainURL, authToken string, targets []memoryBackfillTarget, after *time.Time, limit int) (memoryBackfillSummary, error) { + var summary memoryBackfillSummary + client := &http.Client{Timeout: 10 * time.Second} + + for _, target := range targets { + replayed, err := replayHistoryFileToMemory(client, retainURL, authToken, target, after, limit) + if err != nil { + return summary, fmt.Errorf("backfill %q: %w", target.AgentID, err) + } + if replayed == 0 { + summary.SkippedAgents++ + continue + } + summary.Agents++ + summary.Entries += replayed + fmt.Fprintf(stdout, "Replayed %d entr%s for %s\n", replayed, entryPlural(replayed), target.AgentID) + } + + return summary, nil +} + +func replayHistoryFileToMemory(client *http.Client, retainURL, authToken string, target memoryBackfillTarget, after *time.Time, limit int) (int, error) { + f, err := os.Open(target.HistoryPath) + if err != nil { + if os.IsNotExist(err) { + return 0, nil + } + return 0, err + } + defer f.Close() + + scanner := bufio.NewScanner(f) + 
scanner.Buffer(make([]byte, 0, 64*1024), 8*1024*1024)
+	replayed := 0
+	for scanner.Scan() {
+		line := bytes.TrimSpace(scanner.Bytes())
+		if len(line) == 0 {
+			continue
+		}
+		meta, err := parseHistoryEntryHeader(line)
+		if err != nil {
+			return replayed, err
+		}
+		if after != nil {
+			ts, err := time.Parse(time.RFC3339, meta.TS)
+			if err != nil {
+				return replayed, fmt.Errorf("parse history timestamp %q: %w", meta.TS, err)
+			}
+			if !ts.After(*after) {
+				continue
+			}
+		}
+		if limit > 0 && replayed >= limit {
+			break
+		}
+		if err := postRetainBackfill(client, retainURL, authToken, target, meta.RequestedModel, line); err != nil {
+			return replayed, err
+		}
+		replayed++
+	}
+	if err := scanner.Err(); err != nil {
+		return replayed, err
+	}
+	return replayed, nil
+}
+
+func parseHistoryEntryHeader(line []byte) (historyEntryHeader, error) {
+	var header historyEntryHeader
+	if err := json.Unmarshal(line, &header); err != nil {
+		return historyEntryHeader{}, fmt.Errorf("parse history entry: %w", err)
+	}
+	return header, nil
+}
+
+func postRetainBackfill(client *http.Client, retainURL, authToken string, target memoryBackfillTarget, requestedModel string, rawEntry []byte) error {
+	payload, err := json.Marshal(backfillRetainRequest{
+		AgentID:  target.AgentID,
+		Pod:      target.Pod,
+		Metadata: backfillMetadata(target.Metadata, requestedModel),
+		Entry:    json.RawMessage(append([]byte(nil), rawEntry...)),
+	})
+	if err != nil {
+		return fmt.Errorf("marshal retain payload: %w", err)
+	}
+
+	req, err := http.NewRequest(http.MethodPost, retainURL, bytes.NewReader(payload))
+	if err != nil {
+		return fmt.Errorf("build retain request: %w", err)
+	}
+	req.Header.Set("Content-Type", "application/json")
+
req.Header.Set("Accept", "application/json") + if strings.TrimSpace(authToken) != "" { + req.Header.Set("Authorization", "Bearer "+strings.TrimSpace(authToken)) + } + + resp, err := client.Do(req) + if err != nil { + return fmt.Errorf("send retain request: %w", err) + } + defer resp.Body.Close() + if resp.StatusCode < 200 || resp.StatusCode >= 300 { + body, _ := io.ReadAll(io.LimitReader(resp.Body, 2048)) + msg := strings.TrimSpace(string(body)) + if msg == "" { + msg = http.StatusText(resp.StatusCode) + } + return fmt.Errorf("retain returned status %d: %s", resp.StatusCode, msg) + } + return nil +} + +func backfillMetadata(metadata map[string]any, requestedModel string) map[string]any { + if metadata == nil { + return nil + } + out := map[string]any{ + "service": stringFromMap(metadata, "service"), + "type": stringFromMap(metadata, "type"), + "path": "retain", + } + if timezone := stringFromMap(metadata, "timezone"); timezone != "" { + out["timezone"] = timezone + } + if requestedModel != "" { + out["requested_model"] = requestedModel + } + return out +} + +func stringFromMap(values map[string]any, key string) string { + if values == nil { + return "" + } + v, _ := values[key].(string) + return v +} + +func entryPlural(n int) string { + if n == 1 { + return "y" + } + return "ies" +} + +func countPlural(n int) string { + if n == 1 { + return "" + } + return "s" +} + +func runComposeOutputCommandDefault(args ...string) ([]byte, error) { + cmd := exec.Command("docker", args...) 
+ return cmd.CombinedOutput() +} + +func init() { + memoryBackfillCmd.Flags().StringVar(&memoryBackfillAfter, "after", "", "Only replay entries after this RFC3339 timestamp") + memoryBackfillCmd.Flags().IntVar(&memoryBackfillLimit, "limit", 0, "Maximum entries to replay per agent (0 means all)") + memoryBackfillCmd.Flags().StringVar(&memoryBackfillURL, "url", "", "Override the memory retain endpoint URL (defaults to the published service URL)") + memoryBackfillCmd.Flags().StringVar(&memoryBackfillToken, "auth-token", "", "Override the bearer token used for the memory retain endpoint") + memoryBackfillCmd.Flags().StringSliceVar(&memoryBackfillAgent, "agent", nil, "Restrict backfill to specific agent IDs") + memoryCmd.AddCommand(memoryBackfillCmd) + rootCmd.AddCommand(memoryCmd) +} diff --git a/cmd/claw/memory_test.go b/cmd/claw/memory_test.go new file mode 100644 index 0000000..58abcea --- /dev/null +++ b/cmd/claw/memory_test.go @@ -0,0 +1,197 @@ +package main + +import ( + "encoding/json" + "io" + "net/http" + "net/http/httptest" + "os" + "path/filepath" + "strings" + "testing" + "time" +) + +func TestDiscoverMemoryBackfillTargets(t *testing.T) { + podDir := t.TempDir() + writeMemoryContextFixture(t, podDir, "analyst", memoryManifestFile{ + Service: "team-memory", + BaseURL: "http://team-memory:8081", + Retain: &memoryOpEntry{Path: "/retain"}, + Auth: &memoryAuthEntry{Type: "bearer", Token: "memory-token"}, + }, map[string]any{ + "pod": "desk", + "service": "analyst", + "type": "openclaw", + "timezone": "America/New_York", + }) + writeMemoryContextFixture(t, podDir, "reviewer", memoryManifestFile{ + Service: "other-memory", + BaseURL: "http://other-memory:9090", + Retain: &memoryOpEntry{Path: "/retain"}, + }, map[string]any{ + "pod": "desk", + }) + + targets, err := discoverMemoryBackfillTargets(podDir, "team-memory", []string{"analyst"}) + if err != nil { + t.Fatalf("discoverMemoryBackfillTargets: %v", err) + } + if len(targets) != 1 { + t.Fatalf("expected 1 
target, got %d", len(targets)) + } + if targets[0].AgentID != "analyst" { + t.Fatalf("unexpected target agent: %+v", targets[0]) + } + if targets[0].Manifest.Auth == nil || targets[0].Manifest.Auth.Token != "memory-token" { + t.Fatalf("expected memory auth token, got %+v", targets[0].Manifest.Auth) + } + if !strings.HasSuffix(targets[0].HistoryPath, filepath.Join(".claw-session-history", "analyst", "history.jsonl")) { + t.Fatalf("unexpected history path: %q", targets[0].HistoryPath) + } +} + +func TestResolveMemoryBackfillURLUsesComposePublishedPort(t *testing.T) { + dir := t.TempDir() + composePath := filepath.Join(dir, "compose.generated.yml") + content := ` +services: + team-memory: + ports: + - "127.0.0.1:7400:8081" +` + if err := os.WriteFile(composePath, []byte(content), 0o644); err != nil { + t.Fatal(err) + } + + got, err := resolveMemoryBackfillURL(composePath, memoryManifestFile{ + Service: "team-memory", + BaseURL: "http://team-memory:8081", + Retain: &memoryOpEntry{Path: "/retain"}, + }, "") + if err != nil { + t.Fatalf("resolveMemoryBackfillURL: %v", err) + } + if got != "http://127.0.0.1:7400/retain" { + t.Fatalf("unexpected retain URL: %q", got) + } +} + +func TestResolveMemoryBackfillURLFallsBackToComposePortCommand(t *testing.T) { + dir := t.TempDir() + composePath := filepath.Join(dir, "compose.generated.yml") + content := ` +services: + team-memory: + ports: + - "8081" +` + if err := os.WriteFile(composePath, []byte(content), 0o644); err != nil { + t.Fatal(err) + } + + prev := runComposeOutputCommand + runComposeOutputCommand = func(args ...string) ([]byte, error) { + return []byte("0.0.0.0:49153\n"), nil + } + defer func() { runComposeOutputCommand = prev }() + + got, err := resolveMemoryBackfillURL(composePath, memoryManifestFile{ + Service: "team-memory", + BaseURL: "http://team-memory:8081", + Retain: &memoryOpEntry{Path: "/retain"}, + }, "") + if err != nil { + t.Fatalf("resolveMemoryBackfillURL: %v", err) + } + if got != 
"http://127.0.0.1:49153/retain" { + t.Fatalf("unexpected retain URL: %q", got) + } +} + +func TestReplayHistoryFileToMemory(t *testing.T) { + dir := t.TempDir() + historyPath := filepath.Join(dir, "history.jsonl") + content := strings.Join([]string{ + `{"ts":"2026-03-31T12:00:00Z","claw_id":"analyst","requested_model":"claude-3-7-sonnet"}`, + `{"ts":"2026-03-31T12:01:00Z","claw_id":"analyst","requested_model":"claude-3-7-sonnet"}`, + }, "\n") + "\n" + if err := os.WriteFile(historyPath, []byte(content), 0o644); err != nil { + t.Fatal(err) + } + + var authHeader string + var calls []map[string]any + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + authHeader = r.Header.Get("Authorization") + body, err := io.ReadAll(r.Body) + if err != nil { + t.Fatalf("read request body: %v", err) + } + var payload map[string]any + if err := json.Unmarshal(body, &payload); err != nil { + t.Fatalf("unmarshal request payload: %v", err) + } + calls = append(calls, payload) + w.WriteHeader(http.StatusAccepted) + })) + defer srv.Close() + + target := memoryBackfillTarget{ + AgentID: "analyst", + HistoryPath: historyPath, + Pod: "desk", + Metadata: map[string]any{ + "service": "analyst", + "type": "openclaw", + "timezone": "America/New_York", + }, + } + after := time.Date(2026, 3, 31, 12, 0, 0, 0, time.UTC) + replayed, err := replayHistoryFileToMemory(srv.Client(), srv.URL, "secret-token", target, &after, 0) + if err != nil { + t.Fatalf("replayHistoryFileToMemory: %v", err) + } + if replayed != 1 { + t.Fatalf("expected 1 replayed entry, got %d", replayed) + } + if authHeader != "Bearer secret-token" { + t.Fatalf("unexpected auth header: %q", authHeader) + } + if len(calls) != 1 { + t.Fatalf("expected 1 retain call, got %d", len(calls)) + } + if calls[0]["agent_id"] != "analyst" || calls[0]["pod"] != "desk" { + t.Fatalf("unexpected top-level retain payload: %+v", calls[0]) + } + metadata := calls[0]["metadata"].(map[string]any) + if 
metadata["path"] != "retain" || metadata["requested_model"] != "claude-3-7-sonnet" || metadata["timezone"] != "America/New_York" { + t.Fatalf("unexpected metadata payload: %+v", metadata) + } + entry := calls[0]["entry"].(map[string]any) + if entry["claw_id"] != "analyst" { + t.Fatalf("unexpected replayed entry: %+v", entry) + } +} + +func writeMemoryContextFixture(t *testing.T, podDir, agentID string, manifest memoryManifestFile, metadata map[string]any) { + t.Helper() + contextDir := filepath.Join(podDir, ".claw-runtime", "context", agentID) + if err := os.MkdirAll(contextDir, 0o755); err != nil { + t.Fatal(err) + } + manifestRaw, err := json.Marshal(manifest) + if err != nil { + t.Fatal(err) + } + if err := os.WriteFile(filepath.Join(contextDir, "memory.json"), append(manifestRaw, '\n'), 0o644); err != nil { + t.Fatal(err) + } + metadataRaw, err := json.Marshal(metadata) + if err != nil { + t.Fatal(err) + } + if err := os.WriteFile(filepath.Join(contextDir, "metadata.json"), append(metadataRaw, '\n'), 0o644); err != nil { + t.Fatal(err) + } +} diff --git a/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md b/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md index 7472fe7..2f01028 100644 --- a/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md +++ b/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md @@ -142,10 +142,14 @@ Completed: - automatic attachment of declared feed/tool/memory provider services to `claw-internal` - pre-turn recall and post-turn best-effort retain hooks in `cllama` - provider-format-aware memory injection for OpenAI-style and Anthropic-style requests +- operator replay UX via `claw memory backfill`, which: + - discovers subscribed agents from generated context + - replays the immutable local ledger back through the memory retain contract + - auto-resolves a host-published retain URL when possible + - supports explicit `--url` override when the memory service remains internal-only Still open: -- operator 
replay UX such as `claw memory backfill`
 - tombstone/forget flow and replay hygiene
 - dedicated success telemetry for memory operations
 - a more scalable replay path than forward-scanning large JSONL files
@@ -941,12 +945,23 @@ The architecture should therefore assume a future explicit backfill path, likely
 - a dedicated CLI flow such as `claw memory backfill`
 - backend idempotency or replay markers so the same ledger can be consumed safely more than once
-The first of these now exists in-tree, and subscribed memory services receive a dedicated replay token plus history URL projection.
+The first two now exist in-tree:
-What still does **not** exist is the operator-facing replay UX. A memory service can rebuild from the ledger, but the repo does not yet provide the final `claw memory backfill` workflow.
+- `cllama` exposes a scoped history read API
+- subscribed memory services receive a dedicated replay token plus history URL projection
+- operators can trigger replay with `claw memory backfill`
 
 The retain webhook is the low-latency path.
 Backfill is the durability path for new or recovering services.
 
+The current CLI shape is deliberately pragmatic:
+
+- it uses the local immutable ledger as the replay source
+- it replays through the memory service's declared `retain` endpoint
+- it auto-discovers a host URL when the service publishes a host port
+- it allows `--url` override when the service is reachable some other way
+
+That is enough to make replay operational today without adding a second memory-specific control plane. A future backend-native replay trigger may still be worth adding later.
+
 ---
 
 ## Relationship to Runner-Native Memory
@@ -985,7 +1000,7 @@ Clawdapus should not attempt to disable runner-native memory features globally.
 
 ### Milestone 1: Complete ADR-018 Phase 2 and define backfill
 
-Status: mostly complete on this branch.
+Status: functionally complete on this branch, with scaling and idempotency work still open.
Add the self-scoped history read surface to `cllama`. @@ -1000,7 +1015,6 @@ This milestone should also define the expected operational backfill flow for new What remains here is mostly operator UX and replay ergonomics: -- `claw memory backfill` - replay markers or idempotency guidance - better large-history replay performance From 1f2d06251b09bd5a393614b3a84ec7ce5f7d7abc Mon Sep 17 00:00:00 2001 From: Wojtek Date: Tue, 31 Mar 2026 19:17:17 -0400 Subject: [PATCH 15/18] Tighten memory backfill replay handling --- cmd/claw/memory.go | 104 +++++++++++++++++++---------- cmd/claw/memory_test.go | 143 +++++++++++++++++++++++++++++++++++++++- 2 files changed, 211 insertions(+), 36 deletions(-) diff --git a/cmd/claw/memory.go b/cmd/claw/memory.go index 9e96ebc..2caa4de 100644 --- a/cmd/claw/memory.go +++ b/cmd/claw/memory.go @@ -3,6 +3,7 @@ package main import ( "bufio" "bytes" + "context" "encoding/json" "fmt" "io" @@ -68,6 +69,8 @@ type backfillComposeService struct { Ports []any `yaml:"ports"` } +const defaultMemoryBackfillTimeout = 10 * time.Second + var memoryCmd = &cobra.Command{ Use: "memory", Short: "Operate on configured memory services", @@ -97,17 +100,15 @@ var memoryBackfillCmd = &cobra.Command{ if err != nil { return err } + if err := validateSharedBackfillTargetShape(targets, strings.TrimSpace(memoryBackfillURL) != ""); err != nil { + return err + } retainURL, err := resolveMemoryBackfillURL(generatedPath, targets[0].Manifest, memoryBackfillURL) if err != nil { return err } - token := strings.TrimSpace(memoryBackfillToken) - if token == "" && targets[0].Manifest.Auth != nil && strings.EqualFold(targets[0].Manifest.Auth.Type, "bearer") { - token = targets[0].Manifest.Auth.Token - } - - summary, err := replayMemoryBackfill(cmd.OutOrStdout(), retainURL, token, targets, after, limit) + summary, err := replayMemoryBackfill(cmd.OutOrStdout(), retainURL, strings.TrimSpace(memoryBackfillToken), targets, after, limit) if err != nil { return err } @@ -226,6 +227,8 @@ 
func discoverMemoryBackfillTargets(podDir, memoryService string, requestedAgents return targets, nil } +// Unlike `claw history export`, a backfill limit of 0 means "replay all" +// because this command is primarily used for full rebuilds and recovery. func normalizeBackfillLimit(limit int) (int, error) { if limit < 0 { return 0, fmt.Errorf("limit must be >= 0") @@ -233,6 +236,24 @@ func normalizeBackfillLimit(limit int) (int, error) { return limit, nil } +func validateSharedBackfillTargetShape(targets []memoryBackfillTarget, urlOverride bool) error { + if len(targets) == 0 { + return nil + } + first := targets[0].Manifest + firstPath := manifestRetainPath(first) + for _, target := range targets[1:] { + currentPath := manifestRetainPath(target.Manifest) + if !urlOverride && target.Manifest.BaseURL != first.BaseURL { + return fmt.Errorf("memory service %q has inconsistent base URLs across subscribed agents; pass --url to override", first.Service) + } + if currentPath != firstPath { + return fmt.Errorf("memory service %q has inconsistent retain paths across subscribed agents", first.Service) + } + } + return nil +} + func resolveMemoryBackfillURL(composePath string, manifest memoryManifestFile, override string) (string, error) { retainPath := manifestRetainPath(manifest) if retainPath == "" { @@ -421,10 +442,10 @@ type backfillRetainRequest struct { func replayMemoryBackfill(stdout io.Writer, retainURL, authToken string, targets []memoryBackfillTarget, after *time.Time, limit int) (memoryBackfillSummary, error) { var summary memoryBackfillSummary - client := &http.Client{Timeout: 10 * time.Second} + client := &http.Client{} for _, target := range targets { - replayed, err := replayHistoryFileToMemory(client, retainURL, authToken, target, after, limit) + replayed, err := replayHistoryFileToMemory(client, retainURL, effectiveBackfillAuthToken(authToken, target.Manifest), backfillRetainTimeout(target.Manifest), target, after, limit) if err != nil { return summary, 
fmt.Errorf("backfill %q: %w", target.AgentID, err) } @@ -440,7 +461,24 @@ func replayMemoryBackfill(stdout io.Writer, retainURL, authToken string, targets return summary, nil } -func replayHistoryFileToMemory(client *http.Client, retainURL, authToken string, target memoryBackfillTarget, after *time.Time, limit int) (int, error) { +func effectiveBackfillAuthToken(override string, manifest memoryManifestFile) string { + if strings.TrimSpace(override) != "" { + return strings.TrimSpace(override) + } + if manifest.Auth != nil && strings.EqualFold(manifest.Auth.Type, "bearer") { + return manifest.Auth.Token + } + return "" +} + +func backfillRetainTimeout(manifest memoryManifestFile) time.Duration { + if manifest.Retain != nil && manifest.Retain.TimeoutMS > 0 { + return time.Duration(manifest.Retain.TimeoutMS) * time.Millisecond + } + return defaultMemoryBackfillTimeout +} + +func replayHistoryFileToMemory(client *http.Client, retainURL, authToken string, timeout time.Duration, target memoryBackfillTarget, after *time.Time, limit int) (int, error) { f, err := os.Open(target.HistoryPath) if err != nil { if os.IsNotExist(err) { @@ -458,34 +496,23 @@ func replayHistoryFileToMemory(client *http.Client, retainURL, authToken string, if len(line) == 0 { continue } - if after != nil || limit > 0 { - meta, err := parseHistoryEntryHeader(line) - if err != nil { - return replayed, err - } - if after != nil { - ts, err := time.Parse(time.RFC3339, meta.TS) - if err != nil { - return replayed, fmt.Errorf("parse history timestamp %q: %w", meta.TS, err) - } - if !ts.After(*after) { - continue - } - } - if limit > 0 && replayed >= limit { - break - } - if err := postRetainBackfill(client, retainURL, authToken, target, meta.RequestedModel, line); err != nil { - return replayed, err - } - replayed++ - continue - } meta, err := parseHistoryEntryHeader(line) if err != nil { return replayed, err } - if err := postRetainBackfill(client, retainURL, authToken, target, meta.RequestedModel, 
line); err != nil { + if limit > 0 && replayed >= limit { + break + } + if after != nil { + ts, err := time.Parse(time.RFC3339, meta.TS) + if err != nil { + return replayed, fmt.Errorf("parse history timestamp %q: %w", meta.TS, err) + } + if !ts.After(*after) { + continue + } + } + if err := postRetainBackfill(client, retainURL, authToken, timeout, target, meta.RequestedModel, line); err != nil { return replayed, err } replayed++ @@ -504,7 +531,7 @@ func parseHistoryEntryHeader(line []byte) (historyEntryHeader, error) { return header, nil } -func postRetainBackfill(client *http.Client, retainURL, authToken string, target memoryBackfillTarget, requestedModel string, rawEntry []byte) error { +func postRetainBackfill(client *http.Client, retainURL, authToken string, timeout time.Duration, target memoryBackfillTarget, requestedModel string, rawEntry []byte) error { payload, err := json.Marshal(backfillRetainRequest{ AgentID: target.AgentID, Pod: target.Pod, @@ -515,7 +542,10 @@ func postRetainBackfill(client *http.Client, retainURL, authToken string, target return fmt.Errorf("marshal retain payload: %w", err) } - req, err := http.NewRequest(http.MethodPost, retainURL, bytes.NewReader(payload)) + ctx, cancel := context.WithTimeout(context.Background(), timeout) + defer cancel() + + req, err := http.NewRequestWithContext(ctx, http.MethodPost, retainURL, bytes.NewReader(payload)) if err != nil { return fmt.Errorf("build retain request: %w", err) } @@ -541,6 +571,10 @@ func postRetainBackfill(client *http.Client, retainURL, authToken string, target return nil } +// backfillMetadata intentionally mirrors only the stable subset of live retain +// metadata. Replay is derived from the ledger, so it does not attempt to +// reconstruct transient per-request state beyond fields that materially affect +// memory indexing or policy. 
func backfillMetadata(metadata map[string]any, requestedModel string) map[string]any { if metadata == nil { return nil diff --git a/cmd/claw/memory_test.go b/cmd/claw/memory_test.go index 58abcea..ca618a3 100644 --- a/cmd/claw/memory_test.go +++ b/cmd/claw/memory_test.go @@ -12,6 +12,19 @@ import ( "time" ) +func TestNormalizeBackfillLimit(t *testing.T) { + limit, err := normalizeBackfillLimit(0) + if err != nil { + t.Fatalf("normalizeBackfillLimit(0): %v", err) + } + if limit != 0 { + t.Fatalf("expected 0 to mean unlimited, got %d", limit) + } + if _, err := normalizeBackfillLimit(-1); err == nil { + t.Fatal("expected negative limit to fail") + } +} + func TestDiscoverMemoryBackfillTargets(t *testing.T) { podDir := t.TempDir() writeMemoryContextFixture(t, podDir, "analyst", memoryManifestFile{ @@ -141,6 +154,10 @@ func TestReplayHistoryFileToMemory(t *testing.T) { AgentID: "analyst", HistoryPath: historyPath, Pod: "desk", + Manifest: memoryManifestFile{ + Service: "team-memory", + Retain: &memoryOpEntry{Path: "/retain", TimeoutMS: 1000}, + }, Metadata: map[string]any{ "service": "analyst", "type": "openclaw", @@ -148,7 +165,7 @@ func TestReplayHistoryFileToMemory(t *testing.T) { }, } after := time.Date(2026, 3, 31, 12, 0, 0, 0, time.UTC) - replayed, err := replayHistoryFileToMemory(srv.Client(), srv.URL, "secret-token", target, &after, 0) + replayed, err := replayHistoryFileToMemory(srv.Client(), srv.URL, "secret-token", backfillRetainTimeout(target.Manifest), target, &after, 0) if err != nil { t.Fatalf("replayHistoryFileToMemory: %v", err) } @@ -174,6 +191,130 @@ func TestReplayHistoryFileToMemory(t *testing.T) { } } +func TestReplayHistoryFileToMemoryHonorsLimit(t *testing.T) { + dir := t.TempDir() + historyPath := filepath.Join(dir, "history.jsonl") + content := strings.Join([]string{ + `{"ts":"2026-03-31T12:00:00Z","claw_id":"analyst","requested_model":"m1"}`, + `{"ts":"2026-03-31T12:01:00Z","claw_id":"analyst","requested_model":"m2"}`, + }, "\n") + "\n" + if 
err := os.WriteFile(historyPath, []byte(content), 0o644); err != nil { + t.Fatal(err) + } + + calls := 0 + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + calls++ + w.WriteHeader(http.StatusNoContent) + })) + defer srv.Close() + + target := memoryBackfillTarget{ + AgentID: "analyst", + HistoryPath: historyPath, + Manifest: memoryManifestFile{ + Service: "team-memory", + Retain: &memoryOpEntry{Path: "/retain"}, + }, + } + replayed, err := replayHistoryFileToMemory(srv.Client(), srv.URL, "", backfillRetainTimeout(target.Manifest), target, nil, 1) + if err != nil { + t.Fatalf("replayHistoryFileToMemory: %v", err) + } + if replayed != 1 || calls != 1 { + t.Fatalf("expected 1 replayed entry and call, got replayed=%d calls=%d", replayed, calls) + } +} + +func TestReplayHistoryFileToMemoryReturnsZeroForMissingHistory(t *testing.T) { + target := memoryBackfillTarget{ + AgentID: "analyst", + HistoryPath: filepath.Join(t.TempDir(), "missing.jsonl"), + Manifest: memoryManifestFile{ + Service: "team-memory", + Retain: &memoryOpEntry{Path: "/retain"}, + }, + } + replayed, err := replayHistoryFileToMemory(&http.Client{}, "http://example.invalid/retain", "", backfillRetainTimeout(target.Manifest), target, nil, 0) + if err != nil { + t.Fatalf("replayHistoryFileToMemory: %v", err) + } + if replayed != 0 { + t.Fatalf("expected 0 replayed entries, got %d", replayed) + } +} + +func TestReplayHistoryFileToMemoryFailsOnNon2xx(t *testing.T) { + dir := t.TempDir() + historyPath := filepath.Join(dir, "history.jsonl") + if err := os.WriteFile(historyPath, []byte(`{"ts":"2026-03-31T12:00:00Z","claw_id":"analyst","requested_model":"m1"}`+"\n"), 0o644); err != nil { + t.Fatal(err) + } + + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + http.Error(w, "boom", http.StatusBadGateway) + })) + defer srv.Close() + + target := memoryBackfillTarget{ + AgentID: "analyst", + HistoryPath: historyPath, + Manifest: 
memoryManifestFile{ + Service: "team-memory", + Retain: &memoryOpEntry{Path: "/retain"}, + }, + } + _, err := replayHistoryFileToMemory(srv.Client(), srv.URL, "", backfillRetainTimeout(target.Manifest), target, nil, 0) + if err == nil || !strings.Contains(err.Error(), "retain returned status 502") { + t.Fatalf("expected non-2xx retain error, got %v", err) + } +} + +func TestReplayMemoryBackfillUsesPerTargetAuthAndTimeout(t *testing.T) { + dir := t.TempDir() + historyPath := filepath.Join(dir, "history.jsonl") + if err := os.WriteFile(historyPath, []byte(`{"ts":"2026-03-31T12:00:00Z","claw_id":"analyst","requested_model":"m1"}`+"\n"), 0o644); err != nil { + t.Fatal(err) + } + + auths := make(chan string, 2) + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + auths <- r.Header.Get("Authorization") + w.WriteHeader(http.StatusNoContent) + })) + defer srv.Close() + + targets := []memoryBackfillTarget{ + { + AgentID: "a1", + HistoryPath: historyPath, + Manifest: memoryManifestFile{ + Service: "team-memory", + Retain: &memoryOpEntry{Path: "/retain", TimeoutMS: 25}, + Auth: &memoryAuthEntry{Type: "bearer", Token: "token-1"}, + }, + }, + { + AgentID: "a2", + HistoryPath: historyPath, + Manifest: memoryManifestFile{ + Service: "team-memory", + Retain: &memoryOpEntry{Path: "/retain", TimeoutMS: 25}, + Auth: &memoryAuthEntry{Type: "bearer", Token: "token-2"}, + }, + }, + } + + if _, err := replayMemoryBackfill(io.Discard, srv.URL, "", targets, nil, 1); err != nil { + t.Fatalf("replayMemoryBackfill: %v", err) + } + + got := []string{<-auths, <-auths} + if !(got[0] == "Bearer token-1" && got[1] == "Bearer token-2" || got[0] == "Bearer token-2" && got[1] == "Bearer token-1") { + t.Fatalf("unexpected auth headers: %+v", got) + } +} + func writeMemoryContextFixture(t *testing.T, podDir, agentID string, manifest memoryManifestFile, metadata map[string]any) { t.Helper() contextDir := filepath.Join(podDir, ".claw-runtime", "context", agentID) 
From 90410605fcbd5909d7b4194023946cc59f5abe7b Mon Sep 17 00:00:00 2001 From: Wojtek Date: Tue, 31 Mar 2026 19:24:25 -0400 Subject: [PATCH 16/18] Add memory telemetry and update plan --- cllama | 2 +- docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md | 6 +++++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/cllama b/cllama index 703452f..5ecb944 160000 --- a/cllama +++ b/cllama @@ -1 +1 @@ -Subproject commit 703452f15134ece07b6e54a7747d654b1f017446 +Subproject commit 5ecb944837c927318dc9b41db5a3cb49ec023329 diff --git a/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md b/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md index 2f01028..f770516 100644 --- a/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md +++ b/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md @@ -142,6 +142,10 @@ Completed: - automatic attachment of declared feed/tool/memory provider services to `claw-internal` - pre-turn recall and post-turn best-effort retain hooks in `cllama` - provider-format-aware memory injection for OpenAI-style and Anthropic-style requests +- structured `memory_op` telemetry in `cllama` for: + - recall skipped, succeeded, timed out, and failed outcomes + - retain skipped, succeeded, timed out, and failed outcomes + - latency, HTTP status, block count, and injected byte count where applicable - operator replay UX via `claw memory backfill`, which: - discovers subscribed agents from generated context - replays the immutable local ledger back through the memory retain contract @@ -151,8 +155,8 @@ Completed: Still open: - tombstone/forget flow and replay hygiene -- dedicated success telemetry for memory operations - a more scalable replay path than forward-scanning large JSONL files +- retain/recall policy filtering and policy-removal accounting - ADR-020 mediated tool runtime That means the plan should now be read as: From 4819dcbee9d524a5fe974e1d62b746ab91b27cf8 Mon Sep 17 00:00:00 2001 From: Wojtek Date: Tue, 31 Mar 
2026 20:17:08 -0400 Subject: [PATCH 17/18] Tighten memory telemetry follow-up --- cllama | 2 +- docs/CLLAMA_SPEC.md | 12 +++++++++--- 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/cllama b/cllama index 5ecb944..0696443 160000 --- a/cllama +++ b/cllama @@ -1 +1 @@ -Subproject commit 5ecb944837c927318dc9b41db5a3cb49ec023329 +Subproject commit 0696443552598fbcf56f61d5982cf73746433f9b diff --git a/docs/CLLAMA_SPEC.md b/docs/CLLAMA_SPEC.md index 083363e..7f04f5e 100644 --- a/docs/CLLAMA_SPEC.md +++ b/docs/CLLAMA_SPEC.md @@ -84,10 +84,16 @@ The (potentially amended) response is returned to the agent container. The `cllama` proxy MUST emit structured JSON logs to `stdout`. Clawdapus collects these logs for the `claw audit` command. Logs must contain the following fields: -- `timestamp`: ISO-8601. +- `ts`: ISO-8601 UTC timestamp. - `claw_id`: The calling agent. -- `type`: `request`, `response`, `intervention`, or `drift_score`. -- `intervention_reason`: If the proxy modified a prompt, dropped a tool, or amended a response, it must describe *why*, referencing the specific policy module or `enforce` rule that triggered the intervention. +- `type`: one of `request`, `response`, `error`, `intervention`, `feed_fetch`, `provider_pool`, or `memory_op`. +- `intervention`: If the proxy modified a prompt or routing decision, it describes why. + +Event-specific fields may also be present: +- `status_code`, `latency_ms`, `tokens_in`, `tokens_out`, `cost_usd` for request/response/error events +- `feed_name`, `feed_url` for feed fetch events +- `provider`, `key_id`, `action`, `reason`, `cooldown_until` for provider-pool events +- `memory_service`, `memory_op`, `memory_status`, `memory_blocks`, `memory_bytes`, `memory_removed` for memory telemetry events ## 6. 
Session History From 02831d79372495bff5cb26aaa70990488cd0b075 Mon Sep 17 00:00:00 2001 From: Wojtek Date: Tue, 31 Mar 2026 22:36:43 -0400 Subject: [PATCH 18/18] Update memory plane plan status --- ...03-30-memory-plane-and-pluggable-recall.md | 52 ++++++++++++++----- 1 file changed, 40 insertions(+), 12 deletions(-) diff --git a/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md b/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md index f770516..fd70710 100644 --- a/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md +++ b/docs/plans/2026-03-30-memory-plane-and-pluggable-recall.md @@ -128,6 +128,11 @@ This parallel should be treated as core architectural framing, not as an inciden This branch now implements the core memory-plane substrate that this plan was proposing. +This plan should now be read alongside ADR-021 as its implementation-status companion: + +- ADR-021 carries the architectural decision +- this document tracks implementation status, intentional deviations, and remaining work + Completed: - descriptor `version: 2` support for `tools[]` and `memory` @@ -159,6 +164,17 @@ Still open: - retain/recall policy filtering and policy-removal accounting - ADR-020 mediated tool runtime +Implemented with minor intentional drift from the first sketch: + +- recall currently sends the full inbound `messages` payload and, for Anthropic requests, the top-level `system` field rather than pre-shaping a smaller recent-message slice +- `claw memory backfill` currently replays the local immutable ledger through the memory service's declared `retain` endpoint rather than through a backend-native replay control plane + +These are implementation-shape choices, not architectural deviations. 
They preserve the core model: + +- `cllama` owns orchestration +- the ledger remains the source of truth +- the memory backend remains swappable behind the same contract + That means the plan should now be read as: - architectural rationale for the memory plane @@ -1017,6 +1033,13 @@ Benefits: This milestone should also define the expected operational backfill flow for new or recovering memory services. +Current branch status: + +- `GET /history/{agentID}` exists as the scoped history read surface +- subscribed memory services receive dedicated replay credentials +- `claw memory backfill` provides an operator-facing replay path today +- replay currently uses the local immutable ledger as its source and the memory service's `retain` endpoint as its sink + What remains here is mostly operator UX and replay ergonomics: - replay markers or idempotency guidance @@ -1024,7 +1047,7 @@ What remains here is mostly operator UX and replay ergonomics: ### Milestone 2: Add the memory capability and `cllama` hooks -Status: core complete on this branch. +Status: functionally complete on this branch, with governance hardening still open. Implement: @@ -1043,7 +1066,6 @@ This is the first full end-to-end memory plane. 
The main remaining gaps are: -- success-path observability for recall and retain - governed forget/tombstone semantics - policy filtering on retain and recall - any optional payload-bounding refinements beyond the current fixed request shape @@ -1085,7 +1107,7 @@ This is not a full implementation checklist, but it identifies the likely change - `internal/cllama/context.go` - `internal/pod/compose_emit.go` - `docs/CLLAMA_SPEC.md` -- a new ADR once the plan is accepted +- `docs/decisions/021-memory-plane-and-pluggable-recall.md` ### cllama submodule @@ -1114,6 +1136,7 @@ Current implementation note: - the branch currently sends the full inbound `messages` payload and, for Anthropic requests, the top-level `system` field - this is acceptable as a first implementation because the service may ignore what it does not need - if payload size becomes a practical problem, bounded recent-context shaping can be added later without changing the core contract +- this means the architecture is implemented, but one of the intended payload-tightening refinements remains open ### 2. Should recall responses support categories? @@ -1177,11 +1200,16 @@ The first version should therefore treat: - `agent_id` - `pod` -- bounded recent messages +- recent conversation context - whatever stable metadata is already present as the minimum recall input. +Current implementation note: + +- the branch currently forwards full inbound request messages as that recent conversation context +- richer stitching metadata is still future work + Richer stitching metadata may require later surface-specific propagation through headers, request bodies, or runner config. --- @@ -1201,9 +1229,9 @@ This plan does not propose: --- -## Decision Shape For A Future ADR +## Decision Shape Captured In ADR-021 -If this plan is accepted, the future ADR should probably decide the following: +ADR-021 now captures the main architectural decisions this plan was arguing for: 1. 
Memory is a first-class Clawdapus plane with compile-time wiring. 2. `cllama` owns pre-turn recall orchestration and post-turn retain orchestration. @@ -1217,10 +1245,10 @@ If this plan is accepted, the future ADR should probably decide the following: ## Recommended Next Step -The next document should likely be an ADR that: +The next implementation work should focus on the remaining hardening gaps: -- cites ADR-018 and ADR-020 explicitly as prior art -- resolves the descriptor versioning question -- treats backfill as a first-class operation -- defines the fixed recall and retain wire contracts -- states clearly that the memory plane is for derived durable state, not transcript tails +- replay markers or idempotency guidance for safe repeated backfill +- retain-side and recall-side policy filtering +- governed forget/tombstone semantics and replay hygiene +- improved replay scalability for large ledgers +- a boring reference memory adapter that proves the contract end to end
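[Editorial addendum: the first remaining-hardening bullet — replay markers or idempotency guidance for safe repeated backfill — can be sketched as follows. This is a hypothetical illustration of what a backend-side dedupe marker could look like, assuming only the stable fields every ledger entry already carries (`ts`, `claw_id`); it is not the repo's contract, and names like `replayKey` and `retainOnce` are invented for this sketch.]

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// replayKey derives a stable dedupe key from fields present in every
// ledger entry, so the same backfill run can be consumed twice without
// double-indexing. (Hypothetical; a real backend might also fold in the
// pod or a ledger sequence number.)
func replayKey(ts, clawID string) string {
	sum := sha256.Sum256([]byte(ts + "\x00" + clawID))
	return hex.EncodeToString(sum[:8])
}

// seenSet stands in for whatever durable marker store the backend uses.
type seenSet map[string]bool

// retainOnce reports whether this entry is new; replayed duplicates
// are acknowledged but skipped.
func retainOnce(seen seenSet, ts, clawID string) bool {
	key := replayKey(ts, clawID)
	if seen[key] {
		return false
	}
	seen[key] = true
	return true
}

func main() {
	seen := seenSet{}
	fmt.Println(retainOnce(seen, "2026-03-31T12:00:00Z", "analyst")) // first replay: indexed
	fmt.Println(retainOnce(seen, "2026-03-31T12:00:00Z", "analyst")) // second replay: skipped
}
```

[With markers like this on the retain side, `claw memory backfill` can stay deliberately dumb — forward-scan and POST everything — while repeated runs remain safe.]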