fix(context+memory+prompt): cut Jarvis multi-session context-loss across four fronts#34

Merged
TYRMars merged 1 commit into main from claude/review-project-improvements-GBGjJ
May 14, 2026

@TYRMars TYRMars commented May 14, 2026

Why

Latest persisted session 19aa5f9c-...json showed Jarvis re-running byte-identical workspace.context / todo.list / requirement.list / triage.scan_candidates calls in adjacent turns just seconds apart, ignoring the results already in context. Diagnosis (full report at ~/.claude/plans/jarvis-session-tingly-backus.md) traced four compounding causes; this PR lands fixes for all of them in one commit.

What changed

| Tier | Change | File(s) |
| --- | --- | --- |
| T1-A | Soften `CODING_SYSTEM_PROMPT` from "Before editing, call workspace.context" / "(1) call workspace.context" to conditional ("if you don't already have it from earlier in this conversation"); lead with an explicit "Reuse what you already gathered" rule. | `apps/jarvis/src/serve.rs` |
| T2-C | Strengthen `DEFAULT_SUMMARY_PROMPT` with BAD/GOOD examples and add a `strip_summary_preamble` post-process to drop "The user wants me to summarise..." openers (visible in the `__memory__.summary*.json` cache) before persisting. It bails out conservatively when no paragraph break exists in the budget, so a borderline first sentence is never amputated. | `crates/harness-memory/src/summarizing.rs` |
| T2-D | Replace the insert-if-missing logic in `Agent::ensure_system_prompt` with a content-hash compare; add `AgentConfig::refresh_system_prompt_on_resume` (default `true`) so resumed conversations stop being locked into whatever prompt was active at creation time. | `crates/harness-core/src/agent.rs` |
| T3-E | Lower the auto-loaded project-context cap from 32 KiB to 8 KiB so AGENTS.md/CLAUDE.md don't drown out mid-conversation tool results; truncation logs a startup WARN naming `JARVIS_PROJECT_CONTEXT_BYTES` so operators can opt back to a higher cap deliberately. | `apps/jarvis/src/serve.rs`, `CLAUDE.md` |
| T3-F | Add opt-in `JsonAwareEstimator` (`chars * 2 / 7 + 8` for `Message::Tool`, falls through to `CharRatio` elsewhere) to compensate for the ~15-25% underestimate the `chars/4 + 4` heuristic gives on dense JSON tool output. Not wired into `default_estimator()` to avoid silently shifting budgets in tests / fallback paths; production providers still use their `TiktokenEstimator` overrides. | `crates/harness-core/src/memory.rs`, `crates/harness-core/src/lib.rs` |

Skipped from the original plan: T1-B (project-context hash-incremental injection). The Plan agent's premise — "every turn invalidates prefix cache" — turned out to depend on render_project_block producing different bytes turn-to-turn, which it doesn't unless project_memory files change. T1-A + T3-E together address the underlying "system prompt drowns out tool results" symptom without the inject/strip refactor risk.

Reviewer notes

  • Backward compatibility: refresh_system_prompt_on_resume = true is the new default — old conversations whose persisted system prompt drifted from the binary's template will get replaced on first re-open. Operators who deliberately persisted custom per-conversation prompts can opt out via the builder.
  • Wire schema unchanged: no Conversation / Message JSON fields added. Old persisted sessions load as-is.
  • No env var renames: JARVIS_PROJECT_CONTEXT_BYTES continues to override; only its default lowered.
  • Memory mode users: JsonAwareEstimator is opt-in only — set up via with_estimator(Arc::new(JsonAwareEstimator::new())) if you don't have a tokeniser-backed provider estimator.

Test plan

  • cargo clippy --workspace --all-targets --exclude jarvis-desktop -- -D warnings — clean
  • cargo test --workspace --exclude jarvis-desktop — 1100+ tests, 0 failures, including 13 new ones added here covering: summary preamble strip (5), ensure_system_prompt matrix (5), JsonAwareEstimator Tool/non-Tool/text (3)
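  • Manual repro (recommended): run jarvis serve locally and replay the two-turn sequence from 19aa5f9c-...json: ask "项目有哪些要做的事情" ("What are the things to do in this project?") then "我想优化 Web UI 体验, TODO board 是否合理" ("I want to improve the Web UI experience; is the TODO board reasonable?"), and verify that turn 3 references the previous tool results instead of re-running them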

🤖 Generated with Claude Code

…oss four fronts

Latest session (~/.local/share/jarvis/conversations/19aa5f9c-...json) showed
the agent repeating identical workspace.context / todo.list / requirement.list
/ triage.scan_candidates calls in adjacent turns just seconds apart, ignoring
the byte-identical results already in context. Diagnosis traced four
compounding causes; this commit lands fixes for all of them.

T1-A — apps/jarvis/src/serve.rs CODING_SYSTEM_PROMPT
  Soften the "Before editing, call workspace.context" / "(1) call
  workspace.context, plus fs.read..." directives that taught the model to
  re-orient on every user turn. Lead with an explicit "Reuse what you
  already gathered" rule and rephrase the orientation steps as conditional
  ("if you don't already have it from earlier in this conversation").
  Single-conversation tool-result reuse is now the default behaviour the
  prompt asks for.

T2-C — crates/harness-memory/src/summarizing.rs
  Smaller summarisation models leaked "The user wants me to summarise..."
  preambles into the cached summary text (visible in
  __memory__.summary*.json), eating ~150 chars of the
  DEFAULT_SUMMARY_MAX_TOKENS=400 budget. Strengthen DEFAULT_SUMMARY_PROMPT
  with BAD/GOOD examples and add a conservative strip_summary_preamble
  post-process that drops a known opener sentence iff a paragraph break
  exists within an 800-byte budget (bails out otherwise so a borderline
  first sentence is never amputated). Five new unit tests cover hit /
  US-spelling / clean-input / no-paragraph-break / "let me summarise..."
  shapes.
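
A minimal sketch of the stripping rule described above, not the code from summarizing.rs. The opener list, the constant names, and the choice of "\n\n" as the paragraph-break marker are assumptions made for illustration; only the 800-byte budget and the bail-out behaviour come from this commit message.

```rust
/// Hypothetical opener list (lowercase). The real set of openers is not
/// shown in this PR; these cover the UK/US spellings and the
/// "let me summarise..." shape named in the test list.
const OPENERS: [&str; 4] = [
    "the user wants me to summarise",
    "the user wants me to summarize",
    "let me summarise",
    "let me summarize",
];

/// Byte budget within which a paragraph break must begin (800 per the
/// commit message). The constant name is an assumption.
const PREAMBLE_BUDGET_BYTES: usize = 800;

/// Returns the summary without its leading preamble paragraph when the text
/// starts with a known opener AND a paragraph break ("\n\n") begins within
/// the budget. In every other case the input is returned unchanged, so a
/// borderline first sentence is never cut.
pub fn strip_summary_preamble(summary: &str) -> &str {
    let trimmed = summary.trim_start();
    // ASCII lowercasing keeps byte offsets identical to `trimmed`.
    let lower = trimmed.to_ascii_lowercase();
    if !OPENERS.iter().any(|opener| lower.starts_with(opener)) {
        return summary; // clean input: nothing to strip
    }
    match trimmed.find("\n\n") {
        Some(idx) if idx < PREAMBLE_BUDGET_BYTES => {
            let rest = trimmed[idx..].trim_start();
            // Never return an empty summary; keep the original instead.
            if rest.is_empty() { summary } else { rest }
        }
        // No paragraph break inside the budget: bail out conservatively.
        _ => summary,
    }
}
```

Returning a borrowed slice keeps the sketch allocation-free on the output side; a caller that persists the summary would copy the result with `to_string()`.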

T2-D — crates/harness-core/src/agent.rs
  Agent::ensure_system_prompt was insert-if-missing only, so once a
  conversation persisted any leading System message it was locked into
  whatever prompt was active when the conversation was first created —
  binary updates to CODING_SYSTEM_PROMPT never reached resumed sessions.
  Replace the binary "insert?" check with a content-hash compare:
  matched → no-op, mismatched + refresh enabled → replace, mismatched +
  refresh disabled → keep (historical behaviour). Add
  AgentConfig::refresh_system_prompt_on_resume (default true) plus a
  builder; five new unit tests cover insert / sync / refresh /
  refresh-off / no-prompt-configured.
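
The decision matrix above can be sketched as follows. This is an illustration under stated assumptions, not the code in agent.rs: the `Message` and `PromptAction` types, the free-function signature, and the use of std's `DefaultHasher` are all hypothetical, and treating "no prompt configured" as a no-op is a guess at what that test case asserts.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified stand-in for the real message type (assumption).
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    System(String),
    User(String),
}

/// What the call did; a hypothetical return type used to make the matrix visible.
#[derive(Debug, PartialEq)]
pub enum PromptAction {
    Inserted,
    Unchanged,
    Replaced,
    Kept,
}

/// Hashes prompt text so two prompts can be compared by content hash.
fn content_hash(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

/// matched -> no-op; mismatched + refresh enabled -> replace;
/// mismatched + refresh disabled -> keep; no leading System -> insert.
pub fn ensure_system_prompt(
    messages: &mut Vec<Message>,
    configured: Option<&str>,
    refresh_on_resume: bool,
) -> PromptAction {
    let prompt = match configured {
        Some(p) => p,
        // Assumption: without a configured prompt the history is left alone.
        None => return PromptAction::Unchanged,
    };
    if let Some(Message::System(existing)) = messages.first_mut() {
        return if content_hash(existing) == content_hash(prompt) {
            PromptAction::Unchanged
        } else if refresh_on_resume {
            *existing = prompt.to_string();
            PromptAction::Replaced
        } else {
            PromptAction::Kept // historical behaviour
        };
    }
    messages.insert(0, Message::System(prompt.to_string()));
    PromptAction::Inserted
}
```

In this sketch a resumed conversation whose stored prompt differs from the configured one is rewritten only when `refresh_on_resume` is true, which corresponds to the new default described above.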

T3-E — apps/jarvis/src/serve.rs project_context_max_bytes
  Lower the auto-loaded AGENTS.md / CLAUDE.md / .jarvis context cap from
  32 KiB to 8 KiB so the system block doesn't drown out mid-conversation
  tool results in the model's attention window. Truncation now logs a
  startup WARN naming JARVIS_PROJECT_CONTEXT_BYTES so operators with
  larger instruction files can opt back to a higher cap deliberately.
  CLAUDE.md doc strings updated to match.
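
A rough sketch of how a byte cap with an environment override might be applied. Only the variable name JARVIS_PROJECT_CONTEXT_BYTES and the 8 KiB default come from this commit; the function names, the plain-integer parsing of the variable, and the fallback on unparsable values are assumptions, and the real serve.rs may differ.

```rust
/// New default cap from this commit: 8 KiB.
pub const DEFAULT_PROJECT_CONTEXT_BYTES: usize = 8 * 1024;

/// Resolves the cap from the raw variable value. Assumption: the value is a
/// plain byte count, and anything unparsable falls back to the default.
pub fn resolve_cap(raw: Option<&str>) -> usize {
    raw.and_then(|value| value.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_PROJECT_CONTEXT_BYTES)
}

/// Reads the override from the process environment.
pub fn cap_from_env() -> usize {
    resolve_cap(std::env::var("JARVIS_PROJECT_CONTEXT_BYTES").ok().as_deref())
}

/// Truncates `text` to at most `cap` bytes without splitting a UTF-8
/// character. The boolean reports whether truncation happened, so the
/// caller can emit its startup warning.
pub fn truncate_to_cap(text: &str, cap: usize) -> (&str, bool) {
    if text.len() <= cap {
        return (text, false);
    }
    let mut end = cap;
    // Index 0 is always a char boundary, so this loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    (&text[..end], true)
}
```

Backing off to a character boundary matters for instruction files that contain non-ASCII text, where slicing at an arbitrary byte offset would panic.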

T3-F — crates/harness-core/src/memory.rs JsonAwareEstimator
  Add an opt-in TokenEstimator that uses chars * 2 / 7 (~chars/3.5) +
  8 overhead for Message::Tool and falls through to the standard
  CharRatio numbers for everything else. Compensates for the ~15-25 %
  underestimate the chars/4 + 4 heuristic produces on dense JSON tool
  output. Not wired into default_estimator() to avoid silently shifting
  budgets in callers (tests / fallback paths) that depend on the exact
  CharRatio numbers; production paths via OpenAI / Anthropic / Google /
  Codex providers continue to use their TiktokenEstimator overrides.
  Re-exported from harness_core. Three new unit tests cover Tool >
  CharRatio / non-Tool == CharRatio / text == CharRatio.
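
The two formulas above, worked as a small sketch. The `Message` enum and the free functions are simplified stand-ins rather than the real TokenEstimator trait, and counting Unicode scalar values (rather than bytes) is an assumption; only the arithmetic `chars * 2 / 7 + 8` and `chars / 4 + 4` comes from this commit.

```rust
/// Simplified stand-in for the real message type (assumption).
pub enum Message {
    User(String),
    Tool(String),
}

/// The standard heuristic: chars / 4 + 4.
pub fn char_ratio_tokens(text: &str) -> usize {
    text.chars().count() / 4 + 4
}

/// JSON-aware variant: chars * 2 / 7 + 8 (about chars / 3.5) for tool
/// output, falling through to the standard heuristic for everything else.
pub fn json_aware_tokens(message: &Message) -> usize {
    match message {
        Message::Tool(content) => content.chars().count() * 2 / 7 + 8,
        Message::User(content) => char_ratio_tokens(content),
    }
}
```

For a 700-character tool payload the standard heuristic gives 700 / 4 + 4 = 179 tokens and the JSON-aware one gives 700 * 2 / 7 + 8 = 208. For long payloads the ratio between the two estimates approaches 8/7, roughly 1.14.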

Skipped from the original plan: T1-B (project context hash-incremental
injection). Plan agent's premise — "every turn invalidates prefix cache"
— turned out to depend on render_project_block producing different bytes
turn-to-turn, which it doesn't unless project_memory files change. T1-A
+ T3-E together address the underlying "system prompt drowns out tool
results" symptom without the inject/strip refactor risk.

Verified: cargo clippy --workspace --all-targets --exclude jarvis-desktop
-- -D warnings is clean; cargo test --workspace --exclude jarvis-desktop
passes (1100+ tests, 0 failures, including 13 new ones added here).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@TYRMars TYRMars merged commit 3bfe8b9 into main May 14, 2026
1 of 2 checks passed