fix(context+memory+prompt): cut Jarvis multi-session context-loss across four fronts #34
Merged
Latest session (~/.local/share/jarvis/conversations/19aa5f9c-...json) showed
the agent repeating identical workspace.context / todo.list / requirement.list
/ triage.scan_candidates calls in adjacent turns just seconds apart, ignoring
the byte-identical results already in context. Diagnosis traced four
compounding causes; this commit lands fixes for all of them.
T1-A — apps/jarvis/src/serve.rs CODING_SYSTEM_PROMPT
Soften the "Before editing, call workspace.context" / "(1) call
workspace.context, plus fs.read..." directives that taught the model to
re-orient on every user turn. Lead with an explicit "Reuse what you
already gathered" rule and rephrase the orientation steps as conditional
("if you don't already have it from earlier in this conversation").
Single-conversation tool-result reuse is now the default behaviour the
prompt asks for.
T2-C — crates/harness-memory/src/summarizing.rs
Smaller summarisation models leaked "The user wants me to summarise..."
preambles into the cached summary text (visible in
__memory__.summary*.json), eating ~150 chars of the
DEFAULT_SUMMARY_MAX_TOKENS=400 budget. Strengthen DEFAULT_SUMMARY_PROMPT
with BAD/GOOD examples and add a conservative strip_summary_preamble
post-process that drops a known opener sentence iff a paragraph break
exists within an 800-byte budget (bails out otherwise so a borderline
first sentence is never amputated). Five new unit tests cover hit /
US-spelling / clean-input / no-paragraph-break / "let me summarise..."
shapes.
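The bail-out rule above can be sketched roughly as follows. This is an
illustration only: the opener list, the constant name and the exact
signature are assumptions, not the code in summarizing.rs.

```rust
/// Hypothetical opener list (lowercase); the real set is not shown here.
const OPENERS: &[&str] = &[
    "the user wants me to summarise",
    "the user wants me to summarize",
    "let me summarise",
    "let me summarize",
];

/// A paragraph break must start within this many bytes (800 per the text above).
const PREAMBLE_BUDGET_BYTES: usize = 800;

/// Drops a known opener paragraph, or returns the input untouched when unsure.
pub fn strip_summary_preamble(summary: &str) -> &str {
    let trimmed = summary.trim_start();
    // ASCII lowercasing keeps byte offsets identical to `trimmed`.
    let lower = trimmed.to_ascii_lowercase();
    if !OPENERS.iter().any(|opener| lower.starts_with(opener)) {
        return summary; // clean input: nothing to strip
    }
    match trimmed.find("\n\n") {
        // Paragraph break inside the budget: keep only what follows it.
        Some(idx) if idx <= PREAMBLE_BUDGET_BYTES => {
            let rest = trimmed[idx..].trim_start();
            if rest.is_empty() { summary } else { rest }
        }
        // No break, or break too far away: bail out rather than guess
        // where the opener sentence ends.
        _ => summary,
    }
}
```

Returning a borrowed slice keeps the sketch allocation-free; the real
post-process may well return an owned String instead.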
T2-D — crates/harness-core/src/agent.rs
Agent::ensure_system_prompt was insert-if-missing only, so once a
conversation persisted any leading System message it was locked into
whatever prompt was active when the conversation was first created —
binary updates to CODING_SYSTEM_PROMPT never reached resumed sessions.
Replace the yes/no "insert?" check with a content-hash compare:
matched → no-op, mismatched + refresh enabled → replace, mismatched +
refresh disabled → keep (historical behaviour). Add
AgentConfig::refresh_system_prompt_on_resume (default true) plus a
builder; five new unit tests cover insert / sync / refresh /
refresh-off / no-prompt-configured.
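A minimal sketch of the decision matrix described above, assuming a
simplified Message enum and a free function instead of the real
Agent / AgentConfig types. PromptSync and content_hash are hypothetical
names introduced for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified stand-in for the harness message type (assumption).
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    System(String),
    User(String),
}

/// Hypothetical outcome type, one variant per case in the matrix.
#[derive(Debug, PartialEq)]
pub enum PromptSync {
    Inserted,
    Unchanged,
    Refreshed,
    KeptStale,
    NoPromptConfigured,
}

fn content_hash(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

pub fn ensure_system_prompt(
    messages: &mut Vec<Message>,
    configured: Option<&str>,
    refresh_on_resume: bool,
) -> PromptSync {
    let Some(prompt) = configured else {
        return PromptSync::NoPromptConfigured;
    };
    if let Some(Message::System(existing)) = messages.first_mut() {
        if content_hash(existing) == content_hash(prompt) {
            return PromptSync::Unchanged; // matched: no-op
        }
        if refresh_on_resume {
            *existing = prompt.to_string(); // mismatched + refresh enabled
            return PromptSync::Refreshed;
        }
        return PromptSync::KeptStale; // mismatched + refresh disabled
    }
    // No leading System message yet: historical insert-if-missing path.
    messages.insert(0, Message::System(prompt.to_string()));
    PromptSync::Inserted
}
```

With both strings in memory, hashing is equivalent to comparing them
directly; the hash form mirrors the wording above and would also work
if only a stored hash of the persisted prompt were available.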
T3-E — apps/jarvis/src/serve.rs project_context_max_bytes
Lower the auto-loaded AGENTS.md / CLAUDE.md / .jarvis context cap from
32 KiB to 8 KiB so the system block doesn't drown out mid-conversation
tool results in the model's attention window. Truncation now logs a
startup WARN naming JARVIS_PROJECT_CONTEXT_BYTES so operators with
larger instruction files can opt back to a higher cap deliberately.
CLAUDE.md doc strings updated to match.
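One way the cap and its override could fit together, as a sketch under
assumptions: resolve_cap and truncate_project_context are hypothetical
helper names, and eprintln! stands in for whatever logging the real
serve.rs uses. Only the 8 KiB default and the
JARVIS_PROJECT_CONTEXT_BYTES variable come from the text above.

```rust
/// New default cap: 8 KiB (previously 32 KiB).
pub const DEFAULT_PROJECT_CONTEXT_MAX_BYTES: usize = 8 * 1024;

/// Parses the raw JARVIS_PROJECT_CONTEXT_BYTES value, if any.
/// Unset or unparsable values fall back to the default.
pub fn resolve_cap(raw: Option<&str>) -> usize {
    raw.and_then(|value| value.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_PROJECT_CONTEXT_MAX_BYTES)
}

/// Truncates `text` to at most `cap` bytes without splitting a UTF-8
/// character. Returns the kept slice and whether truncation happened.
pub fn truncate_project_context(text: &str, cap: usize) -> (&str, bool) {
    if text.len() <= cap {
        return (text, false);
    }
    let mut end = cap;
    // Back up to the nearest char boundary; index 0 always qualifies.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    eprintln!(
        "WARN: project context truncated from {} to {} bytes; \
         set JARVIS_PROJECT_CONTEXT_BYTES to raise the cap",
        text.len(),
        end
    );
    (&text[..end], true)
}
```

At startup the raw value would come from something like
std::env::var("JARVIS_PROJECT_CONTEXT_BYTES").ok().as_deref().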
T3-F — crates/harness-core/src/memory.rs JsonAwareEstimator
Add an opt-in TokenEstimator that uses chars * 2 / 7 (~chars/3.5) +
8 overhead for Message::Tool and falls through to the standard
CharRatio numbers for everything else. Compensates for the ~15-25 %
underestimate the chars/4 + 4 heuristic produces on dense JSON tool
output. Not wired into default_estimator() to avoid silently shifting
budgets in callers (tests / fallback paths) that depend on the exact
CharRatio numbers; production paths via OpenAI / Anthropic / Google /
Codex providers continue to use their TiktokenEstimator overrides.
Re-exported from harness_core. Three new unit tests cover Tool >
CharRatio / non-Tool == CharRatio / text == CharRatio.
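The two heuristics can be compared side by side in a small sketch. The
trait shape, the Message variants and CharRatioEstimator are simplified
assumptions for illustration; only the formulas (chars * 2 / 7 + 8 for
tool output, chars / 4 + 4 otherwise) come from the description above.

```rust
/// Simplified stand-in for the harness message type (assumption).
pub enum Message {
    User(String),
    Assistant(String),
    Tool(String),
}

impl Message {
    fn text(&self) -> &str {
        match self {
            Message::User(t) | Message::Assistant(t) | Message::Tool(t) => t,
        }
    }
}

/// Assumed trait shape; the real TokenEstimator may differ.
pub trait TokenEstimator {
    fn estimate(&self, message: &Message) -> usize;
}

/// Baseline heuristic: chars / 4 + 4 for every message.
pub struct CharRatioEstimator;

impl TokenEstimator for CharRatioEstimator {
    fn estimate(&self, message: &Message) -> usize {
        message.text().chars().count() / 4 + 4
    }
}

/// Denser ratio (about chars / 3.5) plus 8 overhead for tool output;
/// everything else falls through to the baseline numbers.
pub struct JsonAwareEstimator;

impl TokenEstimator for JsonAwareEstimator {
    fn estimate(&self, message: &Message) -> usize {
        match message {
            Message::Tool(body) => body.chars().count() * 2 / 7 + 8,
            other => CharRatioEstimator.estimate(other),
        }
    }
}
```

For a 700-character tool result this gives 208 tokens against 179 from
the baseline, while a 700-character user message gets 179 from both.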
Skipped from the original plan: T1-B (project context hash-incremental
injection). Plan agent's premise — "every turn invalidates prefix cache"
— turned out to depend on render_project_block producing different bytes
turn-to-turn, which it doesn't unless project_memory files change. T1-A
+ T3-E together address the underlying "system prompt drowns out tool
results" symptom without the inject/strip refactor risk.
Verified: cargo clippy --workspace --all-targets --exclude jarvis-desktop
-- -D warnings is clean; cargo test --workspace --exclude jarvis-desktop
passes (1100+ tests, 0 failures, including 13 new ones added here).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Why
Latest persisted session 19aa5f9c-...json showed Jarvis re-running byte-identical workspace.context / todo.list / requirement.list / triage.scan_candidates calls in adjacent turns just seconds apart, ignoring the results already in context. Diagnosis (full report at ~/.claude/plans/jarvis-session-tingly-backus.md) traced four compounding causes; this PR lands fixes for all of them in one commit.
What changed
- T1-A (apps/jarvis/src/serve.rs): Soften CODING_SYSTEM_PROMPT from "Before editing, call workspace.context" / "(1) call workspace.context" to conditional ("if you don't already have it from earlier in this conversation"); lead with an explicit "Reuse what you already gathered" rule.
- T2-C (crates/harness-memory/src/summarizing.rs): Strengthen DEFAULT_SUMMARY_PROMPT with BAD/GOOD examples and add a strip_summary_preamble post-process that drops "The user wants me to summarise..." openers (visible in the __memory__.summary*.json cache) before persisting. It bails out conservatively when no paragraph break exists in the budget, so a borderline first sentence is never amputated.
- T2-D (crates/harness-core/src/agent.rs): Replace Agent::ensure_system_prompt's insert-if-missing logic with a content-hash compare; add AgentConfig::refresh_system_prompt_on_resume (default true) so resumed conversations stop being locked into whatever prompt was active at creation time.
- T3-E (apps/jarvis/src/serve.rs, CLAUDE.md): Lower the auto-loaded project-context cap from 32 KiB to 8 KiB so AGENTS.md / CLAUDE.md don't drown out mid-conversation tool results; truncation logs a startup WARN naming JARVIS_PROJECT_CONTEXT_BYTES so operators can opt back to a higher cap deliberately.
- T3-F (crates/harness-core/src/memory.rs, crates/harness-core/src/lib.rs): Add JsonAwareEstimator (chars * 2 / 7 + 8 for Message::Tool, falls through to CharRatio elsewhere) to compensate for the ~15-25% underestimate the chars/4 + 4 heuristic gives on dense JSON tool output. Not wired into default_estimator() to avoid silently shifting budgets in tests / fallback paths; production providers still use their TiktokenEstimator overrides.
Skipped from the original plan: T1-B (project-context hash-incremental injection). The Plan agent's premise ("every turn invalidates prefix cache") turned out to depend on render_project_block producing different bytes turn-to-turn, which it doesn't unless project_memory files change. T1-A + T3-E together address the underlying "system prompt drowns out tool results" symptom without the inject/strip refactor risk.
Reviewer notes
- refresh_system_prompt_on_resume = true is the new default: old conversations whose persisted system prompt drifted from the binary's template will have it replaced on first re-open. Operators who deliberately persisted custom per-conversation prompts can opt out via the builder.
- No Conversation / Message JSON fields added. Old persisted sessions load as-is.
- JARVIS_PROJECT_CONTEXT_BYTES continues to override; only its default is lowered.
- JsonAwareEstimator is opt-in only. Set it up via with_estimator(Arc::new(JsonAwareEstimator::new())) if you don't have a tokeniser-backed provider estimator.
Test plan
- cargo clippy --workspace --all-targets --exclude jarvis-desktop -- -D warnings: clean
- cargo test --workspace --exclude jarvis-desktop: 1100+ tests, 0 failures, including 13 new ones added here covering summary preamble strip (5), ensure_system_prompt matrix (5), JsonAwareEstimator Tool/non-Tool/text (3)
- Run jarvis serve locally and replay the two-turn sequence from 19aa5f9c-...json: ask "项目有哪些要做的事情" ("What needs to be done in the project?"), then "我想优化 Web UI 体验, TODO board 是否合理" ("I want to improve the Web UI experience; is the TODO board reasonable?"), and verify turn 3 references the previous tool results instead of re-running them
🤖 Generated with Claude Code