Status: Draft (v1)
cllama is an open standard and reference architecture for a context-aware, bidirectional Large Language Model (LLM) governance proxy. It is designed to run as a shared pod-level service managed by Clawdapus, serving multiple autonomous agents (Claws) within the same pod.
This document defines the contract between Clawdapus (the orchestrator) and a cllama proxy (the policy enforcer). Any container image that adheres to this specification can be used as a CLLAMA proxy.
- Bidirectional Interception: cllama intercepts outbound prompts (agent → provider) and inbound responses (provider → agent).
- Multi-Agent Identity: A single proxy serves multiple agents. Identity is established via unique per-agent Bearer Tokens supplied in the `Authorization` header.
- Intelligent Authorization: The proxy is context-aware. It uses the bearer token to load the specific agent's identity, active rules (`enforce`), and available tools to make dynamic allow/deny/amend decisions.
- Credential Starvation: The proxy acts as a secure firewall. Agent containers are provisioned with unique dummy tokens. The proxy holds the real provider API keys, preventing agents from bypassing governance.
- Conversational Upgradability: While not strictly required for v1, the proxy architecture is designed to eventually support natural language configuration (updating rules dynamically via conversation).
A cllama sidecar MUST expose an HTTP API compatible with the OpenAI Chat Completions API (`POST /v1/chat/completions`).
- Listen Port: The proxy MUST listen on `0.0.0.0:8080`.
- Base URL Replacement: Clawdapus configures the agent's runner (e.g., OpenClaw, Claude Code) to use `http://cllama-<type>:8080/v1` as its LLM base URL (the first proxy in the chain when chaining is enabled).
- Implementation Scope (Phase 4): The wire protocol supports chained proxies, but runtime enforcement currently allows only one proxy type per pod. Declaring multiple proxy types fails fast until Phase 5 chain execution is implemented.
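To make the wiring concrete, here is a minimal sketch of how an agent runner's environment could be pointed at the proxy. The helper functions and the `OPENAI_BASE_URL`/`OPENAI_API_KEY` variable names are illustrative assumptions, not part of this specification; the hostname pattern and port follow the rules above.

```python
def proxy_base_url(proxy_type: str, port: int = 8080) -> str:
    """Build the LLM base URL for the first cllama proxy in the chain."""
    return f"http://cllama-{proxy_type}:{port}/v1"

def runner_env(agent_id: str, dummy_secret: str, proxy_type: str) -> dict:
    """Environment a runner might receive from Clawdapus (hypothetical).
    The agent never sees a real provider key -- only the dummy token,
    which the proxy later swaps for the real PROVIDER_API_KEY."""
    return {
        "OPENAI_BASE_URL": proxy_base_url(proxy_type),
        # Dummy per-agent bearer token in <agent-id>:<secure-secret> form:
        "OPENAI_API_KEY": f"{agent_id}:{dummy_secret}",
    }
```

This is how credential starvation composes with base URL replacement: the only secret the agent holds is its own identity token.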
Clawdapus injects the pod's operational context into the cllama container at startup. Because a single proxy serves multiple agents, context is provided through a combination of global environment variables and a Shared Context Mount.
| Variable | Description |
|---|---|
| `CLAW_POD` | The name of the pod (e.g., `crypto-ops`). |
| `PROVIDER_API_KEY_*` | The real provider keys (e.g., `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) supplied securely by the operator. |
| `CLAW_CONTEXT_ROOT` | The path to the shared context directory (defaults to `/claw/context`). |
Clawdapus bind-mounts a shared directory into the proxy (at CLAW_CONTEXT_ROOT) containing subdirectories for every agent in the pod. The directory name matches the agent's ID.
```
/claw/context/
├── crypto-crusher-0/
│   ├── AGENTS.md       # Compiled contract (includes, enforce, guide)
│   ├── CLAWDAPUS.md    # Infrastructure map
│   └── metadata.json   # Identity, handles, and active policy modules
├── crypto-crusher-1/
│   └── ...
```
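A proxy implementation might resolve an agent's context from this mount as follows. This is a sketch under the layout above; the function name and the returned dictionary shape are assumptions, and the `metadata.json` keys shown in the test are hypothetical.

```python
import json
from pathlib import Path

def load_agent_context(context_root: str, agent_id: str) -> dict:
    """Resolve an agent's subdirectory under CLAW_CONTEXT_ROOT and load
    its compiled contract, infrastructure map, and metadata."""
    agent_dir = Path(context_root) / agent_id
    return {
        "contract": (agent_dir / "AGENTS.md").read_text(),
        "infra_map": (agent_dir / "CLAWDAPUS.md").read_text(),
        "metadata": json.loads((agent_dir / "metadata.json").read_text()),
    }
```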
When an agent makes a request to the proxy, it MUST include a unique Bearer Token in the Authorization header:
```
Authorization: Bearer <agent-id>:<secure-secret>
```
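Parsing this header is straightforward; a minimal sketch (the function name and error-handling choice are assumptions, but the `<agent-id>:<secure-secret>` split follows the format above):

```python
def parse_bearer(header: str) -> tuple[str, str]:
    """Split 'Bearer <agent-id>:<secure-secret>' into its two parts.
    Raises ValueError on malformed input so the proxy can return 401."""
    scheme, _, token = header.partition(" ")
    if scheme != "Bearer" or ":" not in token:
        raise ValueError("malformed Authorization header")
    # Split on the FIRST colon only, so secrets may themselves contain colons.
    agent_id, _, secret = token.partition(":")
    return agent_id, secret
```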
The proxy SHOULD execute the following pipeline:
- Identity Resolution: The proxy uses the `<agent-id>` portion (e.g., `crypto-crusher-0`) to resolve the agent's context from the corresponding subdirectory in `CLAW_CONTEXT_ROOT`.
- Authentication: The proxy MUST validate the `<secure-secret>` before processing the request.
- Model Validation: Ensure the requested `model` is within the `CLAW_ALLOWED_MODELS` list (parsed from `metadata.json`).
- Context Aggregation: The proxy parses the `enforce` rules from the agent-specific `AGENTS.md`.
- Tool Scoping: If the agent's request contains `tools`, the proxy evaluates them against the agent's identity and active policy modules. The proxy MAY drop tools the agent is not authorized to use.
- Prompt Decoration (Pre-Prompting): The proxy MAY modify the outbound `messages` array, injecting specific rules, priorities, or warnings based on the aggregated context.
- Policy Blocking: If the outbound prompt violates a loaded policy module, the proxy MAY short-circuit the request and return an error or a mock response.
- Forced Model Routing & Rate Limiting (Compute Metering): Even if the agent requests a specific model (e.g., `gpt-4o`), the proxy MAY seamlessly rewrite the request to use a different, operator-approved model (e.g., `claude-3-haiku-20240307`) or provider. The proxy MAY also enforce hard rate limits (returning `429 Too Many Requests`). This allows the proxy to enforce strict compute budgets, meter usage, and prevent runaway agents from burning tokens, all without the agent knowing its model was downgraded or throttled by infrastructure.
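The model-validation and forced-routing steps above can be sketched as a single resolution function. The `rewrite_map` table is a hypothetical operator-supplied configuration; the spec only requires that rewrites be permitted, not how they are expressed.

```python
def resolve_model(requested: str, allowed: list[str],
                  rewrite_map: dict[str, str]) -> str:
    """Return the effective model for the upstream request.
    Operator-forced rewrites apply first; the result is then checked
    against the agent's allowed-models list (from metadata.json)."""
    effective = rewrite_map.get(requested, requested)
    if effective not in allowed:
        # Maps to a policy-blocking error response in the proxy.
        raise PermissionError(f"model {effective!r} not in allowed list")
    return effective
```

The agent still sees its requested model string echoed back in responses; only the upstream call is rewritten.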
The proxy strips the dummy token, attaches the real PROVIDER_API_KEY, and forwards the decorated request to the upstream LLM provider.
- Response Evaluation: The proxy evaluates the provider's response against the `enforce` rules in `/claw/context/<agent-id>/AGENTS.md` and the active `CLAW_POLICY_MODULES`.
- Amendment: If the response contains restricted information (e.g., PII leakage) or violates the tone or instructions of the contract, the proxy MAY rewrite the content.
- Drift Scoring: The proxy analyzes how far the provider's raw response drifted from the agent's ideal behavior defined in the contract. It MUST emit a structured log of this drift score.
The (potentially amended) response is returned to the agent container.
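As a toy illustration of the amendment step, the sketch below redacts email addresses from response text and reports whether an intervention occurred. Real policy engines would apply far richer DLP rules; the regex, function name, and return shape are assumptions, not part of the spec.

```python
import re

# Deliberately simple PII pattern -- email addresses only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def amend_response(text: str) -> tuple[str, bool]:
    """Return (possibly amended text, whether the proxy intervened).
    The intervention flag feeds the structured 'intervention' log field."""
    amended = EMAIL_RE.sub("[REDACTED]", text)
    return amended, amended != text
```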
The cllama proxy MUST emit structured JSON logs to stdout. Clawdapus collects these logs for the `claw audit` command.
Logs MUST contain the following fields:
- `ts`: ISO-8601 UTC timestamp.
- `claw_id`: The calling agent.
- `type`: one of `request`, `response`, `error`, `intervention`, `feed_fetch`, `provider_pool`, or `memory_op`.
- `intervention`: If the proxy modified a prompt or routing decision, describes why.
Event-specific fields may also be present:
- `status_code`, `latency_ms`, `tokens_in`, `tokens_out`, `cost_usd` for request/response/error events
- `feed_name`, `feed_url` for feed-fetch events
- `provider`, `key_id`, `action`, `reason`, `cooldown_until` for provider-pool events
- `memory_service`, `memory_op`, `memory_status`, `memory_blocks`, `memory_bytes`, `memory_removed` for memory telemetry events
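A minimal emitter for these log lines might look like the following. The helper name is an assumption; the field names follow the lists above.

```python
import json
import sys
from datetime import datetime, timezone

def emit_log(claw_id: str, event_type: str, **fields) -> str:
    """Write one structured JSON audit line to stdout.
    Event-specific fields (status_code, latency_ms, ...) are passed
    as keyword arguments and merged into the entry."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),  # ISO-8601 UTC
        "claw_id": claw_id,
        "type": event_type,
        **fields,
    }
    line = json.dumps(entry)
    print(line, file=sys.stdout)
    return line
```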
cllama writes a durable JSONL session history at the proxy boundary. This is an infrastructure-owned record of every completed inference transaction — written by the proxy, not by agents. It is distinct from /claw/memory, which is runner-owned and agent-writable.
| Variable | Default | Description |
|---|---|---|
| `CLAW_SESSION_HISTORY_DIR` | `/claw/session-history` | Host-side base directory for per-agent JSONL history files. When set, cllama writes one file per agent at `<dir>/<agent-id>/history.jsonl`. |
When orchestrated by Clawdapus, `claw up` automatically bind-mounts `.claw-session-history/` (relative to the pod file) into the cllama container at `/claw/session-history` whenever cllama is enabled for any service in the pod.
```
/claw/session-history/
├── crypto-crusher-0/
│   └── history.jsonl
├── crypto-crusher-1/
│   └── history.jsonl
```
One JSONL file per agent. Each line is a single entry. Entries are appended on every successful upstream completion (HTTP 2xx only). Non-2xx responses are not recorded in session history; they appear only in structured audit logs (see §5).
Each line is a JSON object with the following fields:
| Field | Type | Description |
|---|---|---|
| `version` | integer | Schema version. Currently `1`. |
| `id` | string | Stable source-event ID for replay and deduplication. |
| `ts` | string | RFC3339 timestamp of when the response was received. |
| `claw_id` | string | Agent ID that issued the request. |
| `path` | string | Request path (e.g., `/v1/chat/completions`). |
| `requested_model` | string | Model string as sent by the agent. |
| `effective_provider` | string | Provider name after routing (e.g., `anthropic`). |
| `effective_model` | string | Model name forwarded to the upstream provider. |
| `status_code` | integer | HTTP status code returned by the upstream. |
| `stream` | boolean | Whether the response was streamed (SSE). |
| `request_original` | object | The request body as received from the agent, before any proxy modification. |
| `request_effective` | object | The request body as forwarded to the upstream provider, after credential swap and any model rewrite. |
| `response` | object | See response payload below. |
| `usage` | object | Token counts extracted from the response: `prompt_tokens` (integer), `completion_tokens` (integer). |
| `usage.reported_cost_usd` | number (float) | Cost in USD reported by the provider; omitted when not available. |
Response payload (`response` field):
| Field | Type | Description |
|---|---|---|
| `format` | string | `"json"` for standard JSON responses; `"sse"` for Server-Sent Events streams. |
| `json` | object | Present when `format` is `"json"`. The parsed response body. |
| `text` | string | Present when `format` is `"sse"`. The raw event stream text. |
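A writer respecting the schema and the 2xx-only rule could be sketched as follows. The function name and internals are assumptions; the file layout and filtering behavior follow the sections above.

```python
import json
from pathlib import Path

def append_history(base_dir: str, entry: dict) -> None:
    """Append one schema-v1 entry to <base_dir>/<claw_id>/history.jsonl.
    Non-2xx transactions are skipped: they appear only in audit logs."""
    if not 200 <= entry["status_code"] < 300:
        return
    agent_dir = Path(base_dir) / entry["claw_id"]
    agent_dir.mkdir(parents=True, exist_ok=True)
    with open(agent_dir / "history.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```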
Phase 1 is retention only. cllama writes the history; no read API exists. Agents cannot query their own history programmatically. No prompt injection, no retrieval, no summarization. The JSONL files are accessible to operators via the host filesystem mount for offline analysis, auditing, and future tooling.
Clawdapus provides a reference image: `ghcr.io/mostlydev/cllama`.
The passthrough reference:
- Adheres to the v1 HTTP API and Listen Port.
- Validates the environment (
CLAW_POD,CLAW_CONTEXT_ROOT, provider credentials), bearer-token identity resolution, and mounts. - Acts as a pure, transparent proxy (no decoration, no amendment).
- Emits structured logs of all traffic.
This image is used for testing network integration and serves as the boilerplate for operators to build proprietary cllama policy engines (e.g., incorporating advanced DLP, RAG-based context injection, or conversational configuration).
Tools like ClawRouter act as specialized instances of a cllama proxy focused entirely on forced model routing, rate limiting, and compute metering. A routing proxy seamlessly intercepts model requests, evaluates them against organizational budgets or provider availability, and dynamically routes, downgrades, or rate-limits the request to strictly contain costs across a fleet of untrusted agents.