Fix #1897 recovery replay request storm #1939

Open
TianchunGu wants to merge 2 commits into MemTensor:bugfix/autodev-1897 from TianchunGu:codex/fix-1897-recovery-storm

@TianchunGu

Summary

Follow-up for #1897, stacked on top of #1938.

#1899 / #1938 stop the terminal-provider-error retry storm at the LLM facade. This PR adds a second guardrail for the capture pipeline: startup recovery of dirty closed episodes can replay a large historical episode, misclassify recovered steps as orphan traces, and then trigger many reflect/summarize LLM calls.

This patch:

  • fetches all episode traces for capture reconciliation instead of using the paginated viewer list() path;
  • matches reflect replay steps by stable content signature, with a timing-insensitive fallback for recovered tool-call traces that lack startedAt/endedAt;
  • skips recovered replay orphan inserts by default (maxRecoveryOrphanInserts: 0) to avoid duplicating historical traces;
  • skips LLM summarization for recovered replay orphan inserts, even those an operator explicitly allows;
  • adds a hard per-episode reflect LLM budget (maxReflectLlmCalls, default 128);
  • stops remaining reflect LLM attempts after terminal/circuit-open provider errors.
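
The signature-based matching described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code in this PR: `StepLike`, `stableStringify`, `contentSignature`, `timedSignature`, and `matchReplaySteps` are hypothetical names, and the real trace shape may differ. It first tries an exact match that includes timestamps, then falls back to a content-only match for traces that lack `startedAt`/`endedAt`.

```typescript
// Hypothetical step shape; the plugin's real trace type may differ.
interface StepLike {
  kind: string;
  toolName?: string;
  payload: unknown;
  startedAt?: number; // may be absent on recovered tool-call traces
  endedAt?: number;
}

// Serialize with sorted object keys so equal payloads yield equal strings.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null";
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return "[" + value.map((v) => stableStringify(v)).join(",") + "]";
  const obj = value as Record<string, unknown>;
  const body = Object.keys(obj)
    .sort()
    .map((k) => JSON.stringify(k) + ":" + stableStringify(obj[k]));
  return "{" + body.join(",") + "}";
}

// Signature that ignores timing entirely.
function contentSignature(s: StepLike): string {
  return stableStringify([s.kind, s.toolName ?? "", s.payload]);
}

// Signature that also pins the timestamps (missing ones become "?").
function timedSignature(s: StepLike): string {
  return contentSignature(s) + "@" + String(s.startedAt ?? "?") + "-" + String(s.endedAt ?? "?");
}

// For each replay step, return the index of the matched existing trace, or -1.
// Each existing trace is claimed at most once.
function matchReplaySteps(replay: StepLike[], existing: StepLike[]): number[] {
  const used = new Set<number>();
  const result: number[] = replay.map(() => -1);
  const claim = (sig: (s: StepLike) => string): void => {
    replay.forEach((step, r) => {
      if (result[r] !== -1) return; // already matched in an earlier pass
      const target = sig(step);
      const hit = existing.findIndex((t, i) => !used.has(i) && sig(t) === target);
      if (hit !== -1) {
        used.add(hit);
        result[r] = hit;
      }
    });
  };
  claim(timedSignature); // pass 1: exact match, timestamps included
  claim(contentSignature); // pass 2: timing-insensitive fallback
  return result;
}
```

Running the exact pass over all steps before the fallback pass keeps a timestamp-less trace from being claimed by a step that has a better exact match elsewhere. Steps left at -1 are the ones that would be treated as orphans.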

The goal is to keep normal short-episode reflection behavior intact while preventing a single dirty recovered episode from producing unbounded paid LLM calls.
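
One way to picture the per-episode budget and the stop-on-terminal-error behavior is a small gate consulted before every reflect LLM call. The sketch below is illustrative only: `ReflectBudget`, `guardedCall`, and the `isTerminal` predicate are hypothetical names, and how the real LLM facade signals terminal or circuit-open errors is an assumption left to the caller. Only the option name `maxReflectLlmCalls` and its default of 128 come from this PR.

```typescript
// Hypothetical per-episode gate; not the PR's actual implementation.
class ReflectBudget {
  private used = 0;
  private halted = false;
  private readonly max: number;
  private readonly isTerminal: (err: unknown) => boolean;

  constructor(maxReflectLlmCalls: number, isTerminal: (err: unknown) => boolean) {
    this.max = maxReflectLlmCalls;
    this.isTerminal = isTerminal;
  }

  // Returns true and consumes one unit if another LLM call is allowed.
  tryAcquire(): boolean {
    if (this.halted || this.used >= this.max) return false;
    this.used += 1;
    return true;
  }

  // After a terminal/circuit-open error, refuse all remaining attempts.
  reportFailure(err: unknown): void {
    if (this.isTerminal(err)) this.halted = true;
  }

  get callsUsed(): number {
    return this.used;
  }
}

// Wraps one LLM call: skipped calls and failed calls both yield null,
// so the caller can fall back to a non-LLM path for that step.
async function guardedCall<T>(
  budget: ReflectBudget,
  call: () => Promise<T>,
): Promise<T | null> {
  if (!budget.tryAcquire()) return null;
  try {
    return await call();
  } catch (err) {
    budget.reportFailure(err);
    return null;
  }
}
```

A caller would create one `ReflectBudget` per episode, for example `new ReflectBudget(128, isTerminalError)` with a caller-supplied predicate, and route every reflect or summarize request through `guardedCall`. Short episodes never reach the cap, so their behavior is unchanged.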

Validation

From apps/memos-local-plugin:

npx vitest run tests/unit/capture/capture.test.ts tests/unit/capture/capture-batch.test.ts tests/unit/capture/normalizer.test.ts tests/unit/llm/client.test.ts
# Test Files 4 passed (4)
# Tests 57 passed (57)

npm run lint
# tsc -p tsconfig.json --noEmit
# passed

New regression coverage:

  • recovered replay matches tool traces by payload when timestamps drift;
  • startup-recovered replay orphans are not inserted by default;
  • reflect-phase LLM calls are capped per episode.
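
The default-off orphan insert behavior covered by the second test can be sketched as a small planning function. This is a hedged illustration, not the code under test: `planOrphanInserts`, `OrphanPlan`, and `RecoveryOptions` are hypothetical names, and the assumption that non-recovery orphans keep their usual LLM summarization is inferred from the stated goal of leaving normal behavior intact. Only `maxRecoveryOrphanInserts` and its default of 0 come from this PR.

```typescript
// Hypothetical option bag; only the option name and default come from the PR.
interface RecoveryOptions {
  maxRecoveryOrphanInserts?: number; // default 0: insert no recovered orphans
}

interface OrphanPlan<T> {
  insert: T[]; // orphans that will be written
  skipped: number; // orphans dropped by the cap
  summarizeWithLlm: boolean; // whether inserted orphans get an LLM summary
}

function planOrphanInserts<T>(
  orphans: T[],
  isRecoveryReplay: boolean,
  opts: RecoveryOptions = {},
): OrphanPlan<T> {
  if (!isRecoveryReplay) {
    // Assumed normal path: insert everything and summarize as before.
    return { insert: orphans.slice(), skipped: 0, summarizeWithLlm: true };
  }
  // Recovery path: cap the inserts and never spend LLM calls on them.
  const cap = Math.max(0, opts.maxRecoveryOrphanInserts ?? 0);
  const insert = orphans.slice(0, cap);
  return { insert, skipped: orphans.length - insert.length, summarizeWithLlm: false };
}
```

With default options a recovery replay inserts nothing, so historical traces are not duplicated. An operator who raises the cap still gets inserts without LLM summaries.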

Related: #1897, #1899, #1938
