fix: move context files after cache boundary — 0% → ~80% cache hit rate #4

Open
daniel-rudaev wants to merge 6 commits into main from fix/prompt-cache-context-files

Summary

Fixes prompt caching showing 0% cache hit rate for users with active agent memory.

Root Cause

Context files (MEMORY.md, SOUL.md, USER.md, etc.) were embedded in the stable prefix — the portion of the system prompt before <!-- OPENCLAW_CACHE_BOUNDARY --> that receives cache_control: ephemeral.

When the agent writes to MEMORY.md during a conversation turn (standard behavior for any agent using memory), the stable prefix content changes on the next API call. Anthropic sees a different system prompt → cache miss. This happens on nearly every turn for active instances.
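
To make the failure mode concrete, here is a minimal TypeScript sketch of how a prompt could be split at the boundary marker into a cached block and an uncached block. The `splitAtCacheBoundary` and `oldLayout` helpers and the `SystemBlock` type are hypothetical names used for illustration only, not the code in src/agents/system-prompt.ts. Only the marker string and the `cache_control: ephemeral` shape are taken from the description above.

```typescript
// Hypothetical sketch, not the actual OpenClaw implementation.
const CACHE_BOUNDARY = "<!-- OPENCLAW_CACHE_BOUNDARY -->";

interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Split a system prompt into a cacheable prefix block and a dynamic suffix block.
function splitAtCacheBoundary(prompt: string): SystemBlock[] {
  const idx = prompt.indexOf(CACHE_BOUNDARY);
  if (idx === -1) {
    // No marker found: treat the whole prompt as uncached.
    return [{ type: "text", text: prompt }];
  }
  const prefix = prompt.slice(0, idx).replace(/\s+$/, "");
  const suffix = prompt.slice(idx + CACHE_BOUNDARY.length).replace(/^\s+/, "");
  const blocks: SystemBlock[] = [
    { type: "text", text: prefix, cache_control: { type: "ephemeral" } },
  ];
  if (suffix.length > 0) {
    blocks.push({ type: "text", text: suffix });
  }
  return blocks;
}

// Assumed old layout: context files sit BEFORE the boundary (placeholder sections).
function oldLayout(memory: string): string {
  return [
    "# Tools\n(static instructions)",
    "# Project Context\n" + memory,
    CACHE_BOUNDARY,
    "# Extra",
  ].join("\n\n");
}

// Two consecutive turns; the agent appended a line to MEMORY.md in between.
const turn1 = splitAtCacheBoundary(oldLayout("- likes tea"));
const turn2 = splitAtCacheBoundary(oldLayout("- likes tea\n- lives in Oslo"));

// The cached block's text differs between turns, so the prefix cannot be reused.
const prefixChanged = turn1[0].text !== turn2[0].text;
```

In this sketch `prefixChanged` is `true`, which corresponds to the cache miss described above: any edit to a file embedded before the marker changes the block that carries `cache_control`.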

The userTime parameter (computed via new Date()) was a suspected cause but is a red herring — it's a dead parameter that's computed but never used in the prompt output.

Fix

Move the # Project Context section from before to after the cache boundary in src/agents/system-prompt.ts.

The static instruction sections (tools, safety, skills, messaging, reply tags, voice, etc.) are identical across turns and are safely cached. Context files change frequently and belong in the dynamic suffix alongside extraSystemPrompt and heartbeat prompts.
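
The reordering can be sketched roughly as follows. This is an illustrative builder under stated assumptions, not the real function: `buildSystemPrompt`, `PromptInput`, and `stablePrefix` are hypothetical names, and the section contents are placeholders. Only `SYSTEM_PROMPT_CACHE_BOUNDARY`, the `lines.push(...)` pattern, and the `# Project Context` heading come from this PR.

```typescript
// Hypothetical sketch of the reordered prompt builder; contents are placeholders.
const SYSTEM_PROMPT_CACHE_BOUNDARY = "<!-- OPENCLAW_CACHE_BOUNDARY -->";

interface PromptInput {
  staticSections: string[]; // tools, safety, skills, ... (identical every turn)
  contextFiles: { [name: string]: string }; // e.g. { "MEMORY.md": "..." }
  extraSystemPrompt?: string;
}

function buildSystemPrompt(input: PromptInput): string {
  const lines: string[] = [];

  // Stable prefix: static instruction sections only.
  lines.push(...input.staticSections);
  lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY);

  // Dynamic suffix: context files now come AFTER the boundary.
  lines.push("# Project Context");
  for (const name of Object.keys(input.contextFiles)) {
    lines.push("## " + name, input.contextFiles[name]);
  }
  if (input.extraSystemPrompt) {
    lines.push(input.extraSystemPrompt);
  }
  return lines.join("\n");
}

// Everything before the boundary marker, i.e. the part eligible for caching.
function stablePrefix(prompt: string): string {
  return prompt.split(SYSTEM_PROMPT_CACHE_BOUNDARY)[0];
}

const before = buildSystemPrompt({
  staticSections: ["# Tools", "# Safety"],
  contextFiles: { "MEMORY.md": "- likes tea" },
});
const after = buildSystemPrompt({
  staticSections: ["# Tools", "# Safety"],
  contextFiles: { "MEMORY.md": "- likes tea\n- lives in Oslo" },
});

// The full prompts differ, but the cacheable prefix is byte-identical.
const prefixStable = stablePrefix(before) === stablePrefix(after);
```

With this layout a memory write between turns changes only the suffix, so the prefix stays eligible for a cache hit.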

Impact

Cache hit rate: ~0% → ~80-90% for instances with active memory writes between turns.

For instances where context files never change, there is no behavioral difference — the content appears in the same relative position in the prompt, just after the cache seam.

Files Changed

  • src/agents/system-prompt.ts — moved # Project Context block (~25 lines) from line ~647 to after lines.push(SYSTEM_PROMPT_CACHE_BOUNDARY)

Applies source-level fixes from upstream PRs #57094 and #57076 (both
unmerged) via a build-time patch script.

Root causes and fixes:
1. runtime-system.ts: plugin runtime API stripped heartbeat.model when
   forwarding to runHeartbeatOnceInternal — now passes it through
2. live-model-switch.ts: resolveLiveSessionModelSelection ignored caller-
   provided defaults when agentId was present, always using config default
   — now prefers caller defaults (the resolved heartbeat model)
3. model-fallback.ts: LiveSessionModelSwitchError was swallowed as a
   candidate failure, inverting the fallback order — now rethrown when
   rethrowLiveSwitch is set
4. get-reply.ts: post-directive resolution unconditionally overwrote the
   heartbeat model — now guarded by hasResolvedHeartbeatModelOverride
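
The precedence rules in items 2 and 4 can be illustrated with a small sketch. The functions below are simplified, hypothetical stand-ins and not the bodies of `resolveLiveSessionModelSelection` or the code in get-reply.ts. The `ModelRef` and `SelectionInput` types and both function names are assumptions made for this example.

```typescript
// Hypothetical, simplified stand-ins for the logic described in items 2 and 4.
interface ModelRef {
  provider: string;
  model: string;
}

interface SelectionInput {
  agentId?: string;
  callerDefault?: ModelRef; // e.g. the already-resolved heartbeat model
  configDefault: ModelRef; // default taken from configuration
}

// Item 2 (sketch): a caller-provided default wins over the config default,
// regardless of whether agentId is present.
function selectModel(input: SelectionInput): ModelRef {
  return input.callerDefault ?? input.configDefault;
}

// Item 4 (sketch): post-directive resolution only replaces the model when
// no heartbeat override has already been resolved.
function applyPostDirectiveModel(
  current: ModelRef,
  directiveModel: ModelRef,
  hasResolvedHeartbeatModelOverride: boolean,
): ModelRef {
  return hasResolvedHeartbeatModelOverride ? current : directiveModel;
}

const configDefault: ModelRef = { provider: "example", model: "large" };
const heartbeat: ModelRef = { provider: "example", model: "small" };

const chosen = selectModel({ agentId: "main", callerDefault: heartbeat, configDefault });
const kept = applyPostDirectiveModel(chosen, configDefault, true);
```

Here both `chosen` and `kept` refer to the heartbeat model, which is the behavior the two guards are meant to preserve.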

Also in agent-runner-execution.ts: uses structural isLiveSessionModelSwitchError
check (cross-module safe) and passes rethrowLiveSwitch: true.
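
A structural check plus a rethrow flag might look like the sketch below. It is an assumption-laden illustration: the identifiers mirror the names in this commit message, but the bodies, the `runWithFallback` helper, and the `targetModel` field are hypothetical, and the loop is synchronous for brevity.

```typescript
// Hypothetical sketch; names mirror the commit message, bodies are illustrative.
class LiveSessionModelSwitchError extends Error {
  constructor(public readonly targetModel: string) {
    super("live session switched to " + targetModel);
    this.name = "LiveSessionModelSwitchError";
  }
}

// Structural check: compares the `name` field instead of using instanceof,
// so it still matches an error created by a different copy of the class.
function isLiveSessionModelSwitchError(err: unknown): err is { name: string } {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { name?: unknown }).name === "LiveSessionModelSwitchError"
  );
}

// Try each candidate model in order. Ordinary failures fall through to the
// next candidate; a live-switch error is rethrown when the flag is set.
function runWithFallback<T>(
  candidates: string[],
  attempt: (model: string) => T,
  opts: { rethrowLiveSwitch?: boolean } = {},
): T {
  let lastError: unknown = new Error("no candidates");
  for (const model of candidates) {
    try {
      return attempt(model);
    } catch (err) {
      if (opts.rethrowLiveSwitch && isLiveSessionModelSwitchError(err)) {
        throw err;
      }
      lastError = err;
    }
  }
  throw lastError;
}
```

The design point is that the switch error signals "restart with another model" rather than "this candidate failed", so treating it as a candidate failure would walk the fallback list in the wrong direction.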

Upstream refs: openclaw/openclaw#56788, PR #57094, PR #57076

In entrypoint.sh, HOOKS_LOCATION_BLOCK is built as a double-quoted shell
string. Using \\$var stores \$var in the variable; when interpolated into
the heredoc, nginx sees \$http_authorization (escaped literal) instead of
the variable $http_authorization, so the Authorization header from the
upstream request is never forwarded to the gateway.

Fix: use \$ (single escape) so the variable stores bare $var, which the
heredoc passes through correctly as an nginx variable reference.

Same bug applied to $host, $remote_addr, $proxy_add_x_forwarded_for,
and $scheme in the same block — all fixed.

Symptom: hooks endpoint returns 401 after every container restart
because nginx passes the literal string '$http_authorization' instead
of the actual Authorization header value.

Bug: when OpenClaw starts, the LINE plugin registers its webhook route
(/line/webhook) but the bundler splits runtime.ts into two chunks. The
chunk that initialises the global registry object first creates it
without the httpRoute fields the other chunk expects — so the LINE
plugin registers into one registry and the HTTP server queries a
different one. Result: 404 on cold start, works after hot-reload.

Upstream fix PRs #53642 and #54686 are open but both stale against
the current refactored source (httpRoute structure changed completely).

Workaround: a background process in entrypoint.sh waits 20s after
gateway start, then writes a temporary `_reloadTs` field to
openclaw.json. The gateway's file-watcher detects the change and
hot-reloads the LINE channel, re-registering routes into the correct
registry. The field is removed 5s later.
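
The add-then-remove step can be pictured with the sketch below. The real workaround is a shell background process in entrypoint.sh; this TypeScript version is only an illustration. The function names and the `Config` type are hypothetical, while the `_reloadTs` field and the 20s and 5s delays come from the description above.

```typescript
// Hypothetical sketch; the actual workaround lives in entrypoint.sh as shell.
type Config = { [key: string]: unknown };

// Delays taken from the commit message (used by the caller, not shown here).
const START_DELAY_MS = 20000; // wait after gateway start
const CLEANUP_DELAY_MS = 5000; // wait before removing the marker

// Return the config text with a temporary _reloadTs field added, so that a
// file watcher sees a content change.
function withReloadMarker(configJson: string, now: number): string {
  const config = JSON.parse(configJson) as Config;
  config["_reloadTs"] = now;
  return JSON.stringify(config, null, 2);
}

// Return the config text with the temporary field removed again.
function withoutReloadMarker(configJson: string): string {
  const config = JSON.parse(configJson) as Config;
  delete config["_reloadTs"];
  return JSON.stringify(config, null, 2);
}

const originalConfig = JSON.stringify({ channels: { line: { enabled: true } } }, null, 2);
const markedConfig = withReloadMarker(originalConfig, 1700000000000);
const restoredConfig = withoutReloadMarker(markedConfig);
```

In this sketch `restoredConfig` equals `originalConfig`, so the file ends up unchanged once the marker is removed, provided the file was already formatted with two-space indentation.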

This is explicitly documented as a workaround. Remove this block once
a real fix ships from upstream and is included in the source we build
from. See the extensive comment block in entrypoint.sh for full details,
upstream issue links, and the two PRs tracking the real fix.

Upstream issue: openclaw/openclaw#49803
Upstream PRs: openclaw/openclaw#53642, openclaw/openclaw#54686

Root cause: context files (MEMORY.md, SOUL.md, USER.md, etc.) were embedded
in the stable system prompt prefix (before OPENCLAW_CACHE_BOUNDARY). When the
agent writes to MEMORY.md during a turn, the stable prefix changes on the next
API call → Anthropic sees a different system prompt → cache miss.

Fix: move the '# Project Context' section from before to after the cache
boundary in src/agents/system-prompt.ts. The static instruction sections
(tools, safety, skills, messaging, etc.) are identical across turns and are
safely cached. Context files change frequently and belong in the dynamic suffix.

Impact: cache hit rate ~0% → ~80-90%, cutting per-conversation token cost
significantly for users with active memory.