fix(context+memory+prompt): cut Jarvis multi-session context-loss across four fronts by TYRMars · Pull Request #34 · TYRMars/Jarvis

TYRMars · 2026-05-14T12:12:51Z

Why

Latest persisted session 19aa5f9c-...json showed Jarvis re-running byte-identical workspace.context / todo.list / requirement.list / triage.scan_candidates calls in adjacent turns just seconds apart, ignoring the results already in context. Diagnosis (full report at ~/.claude/plans/jarvis-session-tingly-backus.md) traced four compounding causes; this PR lands fixes for all of them in one commit.

What changed

| Tier | Change | File(s) |
|------|--------|---------|
| T1-A | Soften `CODING_SYSTEM_PROMPT` from "Before editing, call workspace.context" / "(1) call workspace.context" to conditional ("if you don't already have it from earlier in this conversation"); lead with an explicit "Reuse what you already gathered" rule. | `apps/jarvis/src/serve.rs` |
| T2-C | Strengthen `DEFAULT_SUMMARY_PROMPT` with BAD/GOOD examples and add a `strip_summary_preamble` post-process that drops "The user wants me to summarise..." openers (visible in the `__memory__.summary*.json` cache) before persisting. It bails out conservatively when no paragraph break exists in the budget, so a borderline first sentence is never amputated. | `crates/harness-memory/src/summarizing.rs` |
| T2-D | Replace `Agent::ensure_system_prompt`'s insert-if-missing logic with a content-hash compare; add `AgentConfig::refresh_system_prompt_on_resume` (default `true`) so resumed conversations stop being locked into whatever prompt was active at creation time. | `crates/harness-core/src/agent.rs` |
| T3-E | Lower the auto-loaded project-context cap from 32 KiB to 8 KiB so `AGENTS.md`/`CLAUDE.md` don't drown out mid-conversation tool results; truncation logs a startup WARN naming `JARVIS_PROJECT_CONTEXT_BYTES` so operators can opt back to a higher cap deliberately. | `apps/jarvis/src/serve.rs`, `CLAUDE.md` |
| T3-F | Add opt-in `JsonAwareEstimator` (`chars * 2 / 7 + 8` for `Message::Tool`, falls through to `CharRatio` elsewhere) to compensate for the ~15-25% underestimate the `chars/4 + 4` heuristic gives on dense JSON tool output. Not wired into `default_estimator()` to avoid silently shifting budgets in tests / fallback paths; production providers still use their `TiktokenEstimator` overrides. | `crates/harness-core/src/memory.rs`, `crates/harness-core/src/lib.rs` |
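
To make the T3-F arithmetic concrete, here is a minimal sketch of the two formulas from the table. The `Message` enum, the free-function shape, and counting Unicode scalar values (rather than bytes) are assumptions made for illustration; the real `TokenEstimator` trait and `JsonAwareEstimator` type in `harness-core` are not shown in this PR description.

```rust
/// Hypothetical, simplified message type; the real `Message` enum has more variants.
enum Message {
    User(String),
    Tool(String),
}

impl Message {
    /// Text payload of the message, whichever variant it is.
    fn text(&self) -> &str {
        match self {
            Message::User(t) | Message::Tool(t) => t.as_str(),
        }
    }
}

/// The `chars/4 + 4` heuristic named in the table (integer division).
fn char_ratio_estimate(msg: &Message) -> usize {
    msg.text().chars().count() / 4 + 4
}

/// `chars * 2 / 7 + 8` for tool output, `CharRatio` numbers for everything else.
fn json_aware_estimate(msg: &Message) -> usize {
    match msg {
        Message::Tool(body) => body.chars().count() * 2 / 7 + 8,
        other => char_ratio_estimate(other),
    }
}
```

For a 700-character tool result this gives 208 estimated tokens instead of 179, while a 700-character user message stays at 179 under both functions.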

Skipped from the original plan: T1-B (project-context hash-incremental injection). The Plan agent's premise — "every turn invalidates prefix cache" — turned out to depend on render_project_block producing different bytes turn-to-turn, which it doesn't unless project_memory files change. T1-A + T3-E together address the underlying "system prompt drowns out tool results" symptom without the inject/strip refactor risk.
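
For the T3-E cap that this reasoning leans on, the following is a rough sketch of how an 8 KiB default with an environment override could be applied. The function names, the fallback to the default on an unparsable value, and truncating at a UTF-8 character boundary are all assumptions; only the 8 KiB default and the `JARVIS_PROJECT_CONTEXT_BYTES` variable name come from this PR.

```rust
/// New default cap from this PR: 8 KiB (previously 32 KiB).
const DEFAULT_PROJECT_CONTEXT_BYTES: usize = 8 * 1024;

/// Resolve the cap from the raw value of `JARVIS_PROJECT_CONTEXT_BYTES`, if set.
/// Assumption: an unset or unparsable value falls back to the default.
fn project_context_cap(env_value: Option<&str>) -> usize {
    env_value
        .and_then(|v| v.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_PROJECT_CONTEXT_BYTES)
}

/// Truncate `text` to at most `cap` bytes without splitting a UTF-8 character.
/// Returns the (possibly shortened) slice and whether truncation happened,
/// so the caller can emit the startup warning.
fn truncate_to_cap(text: &str, cap: usize) -> (&str, bool) {
    if text.len() <= cap {
        return (text, false);
    }
    let mut end = cap;
    // Index 0 is always a boundary, so this loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    (&text[..end], true)
}
```

A caller would pass `std::env::var("JARVIS_PROJECT_CONTEXT_BYTES").ok().as_deref()` to `project_context_cap` and log the warning whenever the returned flag is `true`.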

Reviewer notes

- Backward compatibility: `refresh_system_prompt_on_resume = true` is the new default, so old conversations whose persisted system prompt drifted from the binary's template will have it replaced on first re-open. Operators who deliberately persisted custom per-conversation prompts can opt out via the builder.
- Wire schema unchanged: no `Conversation` / `Message` JSON fields added. Old persisted sessions load as-is.
- No env var renames: `JARVIS_PROJECT_CONTEXT_BYTES` continues to override; only its default was lowered.
- Memory mode users: `JsonAwareEstimator` is opt-in only. Set it up via `with_estimator(Arc::new(JsonAwareEstimator::new()))` if you don't have a tokeniser-backed provider estimator.
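
The backward-compatibility note is easier to review with the T2-D decision matrix written out. The sketch below is an illustration under assumptions, not the code in `crates/harness-core/src/agent.rs`: the `Message` and `PromptAction` types, the free-function signature, the use of `DefaultHasher`, and treating "no prompt configured" as a no-op are all hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the real message type.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    System(String),
    User(String),
}

/// What the reconciliation step did to the conversation.
#[derive(Debug, PartialEq)]
enum PromptAction {
    Inserted,  // no leading System message existed
    Unchanged, // hashes match, or no prompt is configured
    Replaced,  // mismatch and refresh enabled
    Kept,      // mismatch and refresh disabled (historical behaviour)
}

fn content_hash(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

fn ensure_system_prompt(
    messages: &mut Vec<Message>,
    configured: Option<&str>,
    refresh_on_resume: bool,
) -> PromptAction {
    let prompt = match configured {
        Some(p) => p,
        None => return PromptAction::Unchanged,
    };
    // Hash of the persisted leading System message, if there is one.
    let existing_hash = match messages.first() {
        Some(Message::System(existing)) => Some(content_hash(existing)),
        _ => None,
    };
    match existing_hash {
        None => {
            messages.insert(0, Message::System(prompt.to_string()));
            PromptAction::Inserted
        }
        Some(h) if h == content_hash(prompt) => PromptAction::Unchanged,
        Some(_) if refresh_on_resume => {
            messages[0] = Message::System(prompt.to_string());
            PromptAction::Replaced
        }
        Some(_) => PromptAction::Kept,
    }
}
```

With `refresh_on_resume` set to `false`, a drifted prompt is left untouched, which is the opt-out path described for operators with custom per-conversation prompts.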

Test plan

- `cargo clippy --workspace --all-targets --exclude jarvis-desktop -- -D warnings`: clean
- `cargo test --workspace --exclude jarvis-desktop`: 1100+ tests, 0 failures, including 13 new ones added here covering: summary preamble strip (5), `ensure_system_prompt` matrix (5), `JsonAwareEstimator` Tool/non-Tool/text (3)
- Manual repro (recommended): run `jarvis serve` locally and replay the two-turn sequence from 19aa5f9c-...json. Ask "项目有哪些要做的事情" ("What tasks does the project have to do?"), then "我想优化 Web UI 体验, TODO board 是否合理" ("I want to improve the Web UI experience; is the TODO board reasonable?"), and verify turn 3 references the previous tool results instead of re-running them

🤖 Generated with Claude Code

…oss four fronts

Latest session (~/.local/share/jarvis/conversations/19aa5f9c-...json) showed the agent repeating identical workspace.context / todo.list / requirement.list / triage.scan_candidates calls in adjacent turns just seconds apart, ignoring the byte-identical results already in context. Diagnosis traced four compounding causes; this commit lands fixes for all of them.

T1-A: apps/jarvis/src/serve.rs CODING_SYSTEM_PROMPT
Soften the "Before editing, call workspace.context" / "(1) call workspace.context, plus fs.read..." directives that taught the model to re-orient on every user turn. Lead with an explicit "Reuse what you already gathered" rule and rephrase the orientation steps as conditional ("if you don't already have it from earlier in this conversation"). Single-conversation tool-result reuse is now the default behaviour the prompt asks for.

T2-C: crates/harness-memory/src/summarizing.rs
Smaller summarisation models leaked "The user wants me to summarise..." preambles into the cached summary text (visible in __memory__.summary*.json), eating ~150 chars of the DEFAULT_SUMMARY_MAX_TOKENS=400 budget. Strengthen DEFAULT_SUMMARY_PROMPT with BAD/GOOD examples and add a conservative strip_summary_preamble post-process that drops a known opener sentence iff a paragraph break exists within an 800-byte budget (bails out otherwise so a borderline first sentence is never amputated). Five new unit tests cover hit / US-spelling / clean-input / no-paragraph-break / "let me summarise..." shapes.

T2-D: crates/harness-core/src/agent.rs
Agent::ensure_system_prompt was insert-if-missing only, so once a conversation persisted any leading System message it was locked into whatever prompt was active when the conversation was first created; binary updates to CODING_SYSTEM_PROMPT never reached resumed sessions. Replace the binary "insert?" check with a content-hash compare: matched → no-op, mismatched + refresh enabled → replace, mismatched + refresh disabled → keep (historical behaviour). Add AgentConfig::refresh_system_prompt_on_resume (default true) plus a builder; five new unit tests cover insert / sync / refresh / refresh-off / no-prompt-configured.

T3-E: apps/jarvis/src/serve.rs project_context_max_bytes
Lower the auto-loaded AGENTS.md / CLAUDE.md / .jarvis context cap from 32 KiB to 8 KiB so the system block doesn't drown out mid-conversation tool results in the model's attention window. Truncation now logs a startup WARN naming JARVIS_PROJECT_CONTEXT_BYTES so operators with larger instruction files can opt back to a higher cap deliberately. CLAUDE.md doc strings updated to match.

T3-F: crates/harness-core/src/memory.rs JsonAwareEstimator
Add an opt-in TokenEstimator that uses chars * 2 / 7 (~chars/3.5) + 8 overhead for Message::Tool and falls through to the standard CharRatio numbers for everything else. Compensates for the ~15-25% underestimate the chars/4 + 4 heuristic produces on dense JSON tool output. Not wired into default_estimator() to avoid silently shifting budgets in callers (tests / fallback paths) that depend on the exact CharRatio numbers; production paths via OpenAI / Anthropic / Google / Codex providers continue to use their TiktokenEstimator overrides. Re-exported from harness_core. Three new unit tests cover Tool > CharRatio / non-Tool == CharRatio / text == CharRatio.

Skipped from the original plan: T1-B (project context hash-incremental injection). Plan agent's premise ("every turn invalidates prefix cache") turned out to depend on render_project_block producing different bytes turn-to-turn, which it doesn't unless project_memory files change. T1-A + T3-E together address the underlying "system prompt drowns out tool results" symptom without the inject/strip refactor risk.

Verified: cargo clippy --workspace --all-targets --exclude jarvis-desktop -- -D warnings is clean; cargo test --workspace --exclude jarvis-desktop passes (1100+ tests, 0 failures, including 13 new ones added here).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
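
As an illustration of the T2-C rule in the commit message (drop a known opener only if a paragraph break falls within the 800-byte budget, otherwise leave the text alone), here is a minimal sketch. The opener list, the constant name, the case-insensitive match, and the extra guard against returning an empty summary are assumptions; the actual `strip_summary_preamble` in `summarizing.rs` may differ.

```rust
/// Assumed opener phrases (lowercase); the real list is not shown in the PR.
const OPENERS: [&str; 4] = [
    "the user wants me to summarise",
    "the user wants me to summarize",
    "let me summarise",
    "let me summarize",
];

/// Byte budget within which a paragraph break must appear (from the commit message).
const PREAMBLE_BUDGET_BYTES: usize = 800;

/// Drop a leading "The user wants me to summarise..." paragraph if, and only if,
/// it ends at a paragraph break inside the budget. Otherwise return the input as-is.
fn strip_summary_preamble(summary: &str) -> &str {
    let trimmed = summary.trim_start();
    // ASCII lowercasing keeps byte offsets aligned with `trimmed`.
    let lower = trimmed.to_ascii_lowercase();
    if !OPENERS.iter().any(|opener| lower.starts_with(opener)) {
        return summary; // clean input: nothing to strip
    }
    match trimmed.find("\n\n") {
        Some(idx) if idx <= PREAMBLE_BUDGET_BYTES => {
            let rest = trimmed[idx..].trim_start();
            // Assumed guard: never replace the summary with an empty string.
            if rest.is_empty() { summary } else { rest }
        }
        // No paragraph break in the budget: bail out rather than cut a sentence.
        _ => summary,
    }
}
```

The bail-out branch is what keeps a summary such as "The user wants me to summarise the chat. Facts." intact, since there is no safe place to cut.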

TYRMars merged commit 3bfe8b9 into main May 14, 2026
1 of 2 checks passed

TYRMars merged 1 commit into main from claude/review-project-improvements-GBGjJ
