fix: guard against episode storm stalling foreground sessions #1844

Open
de1tydev wants to merge 8 commits into MemTensor:dev-20260604-v2.0.19 from de1tydev:fix/episode-storm-guard

@de1tydev

Problem

Large merged episodes trigger a cascade of expensive post-processing (capture → reward → L2 induction → L3 abstraction → skill crystallization) that can stall OpenClaw and Hermes Agent foreground sessions. This is especially common in long development workflows where relation.classify consistently returns revision/follow_up, allowing a single episode to accumulate dozens or hundreds of turns.

Fixes #1755

Root Causes

  1. No episode turn limit — episodes grow unbounded; the full L1→L2→L3→skill chain hits all at once when the topic finally ends
  2. Synchronous classify in before_prompt_build: relation.classify() is an LLM call that blocks foreground prompt construction with no timeout
  3. Unlimited background LLM concurrency — capture/reward/L2/L3/skill subscribers fire unlimited parallel LLM calls, starving the event loop

Changes

Fix 1: Episode turn hard limit (maxTurnsPerEpisode)

  • New config: algorithm.session.maxTurnsPerEpisode (default 30, range 5–200)
  • When an open episode reaches this turn count, the next turn forces a topic boundary regardless of relation classification
  • Also applies when reopening recovered episodes
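
The turn-limit guard described above can be sketched as follows. This is a minimal illustration, not the code in core/pipeline/orchestrator.ts: `OpenEpisode`, `shouldForceBoundary`, and `routeTurn` are hypothetical names, and treating `new_task` as the label that closes an episode is an assumption based on how the PR describes the timeout fallback.

```typescript
// Hypothetical types: only maxTurnsPerEpisode and the relation labels
// (revision, follow_up, new_task) appear in the PR description.
type Relation = "revision" | "follow_up" | "new_task";

interface OpenEpisode {
  id: string;
  turnCount: number; // turns already recorded in this episode
}

const DEFAULT_MAX_TURNS_PER_EPISODE = 30;

// True once the open episode has reached the configured turn count.
function shouldForceBoundary(
  episode: OpenEpisode,
  maxTurnsPerEpisode: number = DEFAULT_MAX_TURNS_PER_EPISODE,
): boolean {
  return episode.turnCount >= maxTurnsPerEpisode;
}

// The limit is checked before the classifier result is consulted, so a
// revision/follow_up verdict cannot keep a full episode open.
function routeTurn(
  episode: OpenEpisode,
  classified: Relation,
  maxTurnsPerEpisode: number = DEFAULT_MAX_TURNS_PER_EPISODE,
): Relation {
  if (shouldForceBoundary(episode, maxTurnsPerEpisode)) {
    return "new_task"; // assumed boundary label
  }
  return classified;
}
```

Checking the limit first also means the classifier call could be skipped entirely for a full episode, although the PR description does not say whether the actual change does so.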

Fix 2: Relation classify timeout (classifyTimeoutMs)

  • New config: algorithm.session.classifyTimeoutMs (default 5000ms, range 1000–30000)
  • relation.classify() calls are wrapped with Promise.race against the timeout
  • On timeout, defaults to new_task (safe conservative boundary)
  • Prevents foreground prompt construction from blocking indefinitely
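
A timeout wrapper of the kind described above might look like the sketch below. `classifyWithTimeout` and its `classify` parameter are hypothetical stand-ins for the wrapper and the real relation.classify() call; only the Promise.race approach, the 5000 ms default, and the new_task fallback come from the PR description.

```typescript
type Relation = "revision" | "follow_up" | "new_task";

// Races the classifier against a timer. If the timer wins, the caller
// gets "new_task" instead of waiting for the LLM call to finish.
async function classifyWithTimeout(
  classify: () => Promise<Relation>, // hypothetical stand-in for relation.classify()
  timeoutMs: number = 5000,
): Promise<Relation> {
  let cancelTimer: () => void = () => {};
  const timeout = new Promise<Relation>((resolve) => {
    const timer = setTimeout(() => resolve("new_task"), timeoutMs);
    cancelTimer = () => clearTimeout(timer);
  });
  try {
    return await Promise.race([classify(), timeout]);
  } finally {
    // Clear the timer so a fast classifier does not leave it pending.
    cancelTimer();
  }
}
```

Promise.race does not cancel the losing promise, so a slow LLM request keeps running in the background unless the client supports cancellation. In this sketch a rejected classify() call propagates to the caller rather than falling back to new_task.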

Fix 3: Background LLM concurrency semaphore (bgLlmConcurrency)

  • New config: algorithm.session.bgLlmConcurrency (default 2, range 1–8)
  • Shared semaphore gates all LLM calls from capture, reward, L2, L3, skill, and feedback subscribers
  • Prevents event-loop starvation from concurrent background processing
  • Capture's existing llmConcurrency (per-step α scoring) is unaffected — the semaphore only applies to the shared LLM client used by post-capture processing
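
For illustration, a lightweight async semaphore along these lines could be written as below. This is a sketch under assumptions, not the contents of core/util/semaphore.ts: the class shape and the `acquire`, `release`, and `run` method names are hypothetical.

```typescript
// Minimal counting semaphore for promise-based work.
class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(permits: number) {
    if (!Number.isInteger(permits) || permits < 1) {
      throw new RangeError("permits must be a positive integer");
    }
    this.available = permits;
  }

  // Resolves immediately if a permit is free, otherwise queues the caller.
  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available -= 1;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  // Hands the permit straight to the oldest waiter, or returns it to the pool.
  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next();
    } else {
      this.available += 1;
    }
  }

  // Runs a task while holding a permit; the permit is released even if the task throws.
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}
```

With `new Semaphore(2)`, matching the bgLlmConcurrency default, at most two tasks passed to `run` execute at once and the rest wait in FIFO order.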

New Files

  • core/util/semaphore.ts — lightweight async semaphore
  • core/util/rate-limited-llm.ts — transparent LLM client wrapper that acquires a semaphore permit per call
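
As a rough sketch of how a transparent wrapper could gate each call behind a permit: the `LlmClient` interface with a single `complete` method is an assumption (the real client interface is not shown in this PR description), and `createLimiter` is a compact stand-in for the semaphore so the example is self-contained.

```typescript
// Hypothetical client shape, assumed for illustration only.
interface LlmClient {
  complete(prompt: string): Promise<string>;
}

type Limiter = <T>(task: () => Promise<T>) => Promise<T>;

// Compact stand-in for a semaphore: at most maxConcurrent tasks run at once.
function createLimiter(maxConcurrent: number): Limiter {
  let active = 0;
  const queue: Array<() => void> = [];
  const release = (): void => {
    const next = queue.shift();
    if (next) {
      next(); // the permit passes directly to the next waiter
    } else {
      active -= 1;
    }
  };
  return async <T>(task: () => Promise<T>): Promise<T> => {
    if (active < maxConcurrent) {
      active += 1;
    } else {
      await new Promise<void>((resolve) => queue.push(resolve));
    }
    try {
      return await task();
    } finally {
      release();
    }
  };
}

// Returns an object with the same interface as the wrapped client, so
// subscribers can use it without knowing about the limit.
function rateLimit(client: LlmClient, limit: Limiter): LlmClient {
  return {
    complete: (prompt: string) => limit(() => client.complete(prompt)),
  };
}
```

Passing one limiter instance to every wrapped client is what makes the limit shared: capture, reward, L2, L3, skill, and feedback subscribers would then compete for the same permits.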

Files Modified

  • core/config/schema.ts — 3 new config fields with JSDoc
  • core/config/defaults.ts — defaults: maxTurns=30, classifyTimeout=5s, bgConcurrency=2
  • core/pipeline/types.ts — SessionRoutingConfig extended
  • core/pipeline/deps.ts — config extraction + semaphore wiring
  • core/pipeline/orchestrator.ts — turn-limit guard + classify timeout wrapper
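
The three new settings and their documented ranges could be modelled as in the sketch below. The field names, defaults, and ranges are taken from the Changes section; `SessionGuardConfig`, `resolveSessionGuardConfig`, and the choice to reject out-of-range values instead of clamping them are assumptions, since the actual schema code is not shown here.

```typescript
// Hypothetical grouping of the three fields under algorithm.session.
interface SessionGuardConfig {
  maxTurnsPerEpisode: number;
  classifyTimeoutMs: number;
  bgLlmConcurrency: number;
}

const DEFAULTS: SessionGuardConfig = {
  maxTurnsPerEpisode: 30,
  classifyTimeoutMs: 5000,
  bgLlmConcurrency: 2,
};

// Inclusive [min, max] ranges as stated in the PR description.
const RANGES: Record<keyof SessionGuardConfig, [number, number]> = {
  maxTurnsPerEpisode: [5, 200],
  classifyTimeoutMs: [1000, 30000],
  bgLlmConcurrency: [1, 8],
};

// Merges overrides onto the defaults and rejects out-of-range values.
// Rejecting (not clamping) is an assumption of this sketch.
function resolveSessionGuardConfig(
  overrides: Partial<SessionGuardConfig> = {},
): SessionGuardConfig {
  const merged: SessionGuardConfig = { ...DEFAULTS, ...overrides };
  for (const key of Object.keys(RANGES) as Array<keyof SessionGuardConfig>) {
    const [min, max] = RANGES[key];
    const value = merged[key];
    if (!Number.isInteger(value) || value < min || value > max) {
      throw new RangeError(`${key} must be an integer in [${min}, ${max}]`);
    }
  }
  return merged;
}
```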

Testing

  • tsc --noEmit passes (no type errors)
  • All new config values have sensible defaults that preserve existing behavior for users who don't change them (the turn limit is the only behavior change: episodes that would have grown past 30 turns now get split)

hijzy and others added 5 commits May 25, 2026 15:02
## Summary
- add an OpenClaw runtime lock to block duplicate plugin instances before tools/hooks register
- fail startup on viewer port conflicts and clean up partial runtime state
- keep lightweight local memories searchable/listable without an LLM final filter, while preserving full-mode self-evolution boundaries
- cover runtime locking, duplicate startup, lightweight retrieval, delayed agent_end recovery, and partial migration behavior

## Tests
- npm test -- --run tests/unit
- npm run lint
- npm run build
- git diff --check --cached
MemTensor#1807)

Automated PR from mem-agent-0520-niu to mem-agent-0520.
@de1tydev

Author

Linked to #1755 — detailed root-cause analysis is in the issue comments.

hijzy and others added 3 commits June 1, 2026 19:36
## Description

Please include a summary of the change, the problem it solves, the
implementation approach, and relevant context. List any dependencies
required for this change.

Related Issue (Required):  Fixes #issue_number

## Type of change

Please delete options that are not relevant.

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Refactor (does not change functionality, e.g. code style improvements, linting)
- [ ] Documentation update

## How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide
instructions so we can reproduce. Please also list any relevant details
for your test configuration

- [ ] Unit Test
- [ ] Test Script Or Test Steps (please provide)
- [ ] Pipeline Automated API Test (please provide)

## Checklist

- [ ] I have performed a self-review of my own code
- [ ] I have commented my code in hard-to-understand areas
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have created related documentation issue/PR in [MemOS-Docs](https://github.com/MemTensor/MemOS-Docs) (if applicable)
- [ ] I have linked the issue to this PR (if applicable)
- [ ] I have mentioned the person who will review this PR

## Reviewer Checklist
- [ ] closes #xxxx (Replace xxxx with the GitHub issue number)
- [ ] Made sure Checks passed
- [ ] Tests have been provided
@Memtensor-AI changed the base branch from main to dev-20260604-v2.0.19 on June 10, 2026 15:43

Development

Successfully merging this pull request may close these issues.

memos-local-plugin: large merged episodes can trigger L2/L3/skill-evolution storm and stall OpenClaw sessions

3 participants