fix: guard against episode storm stalling foreground sessions by de1tydev · Pull Request #1844 · MemTensor/MemOS

de1tydev · 2026-05-31T14:53:55Z

Problem

Large merged episodes trigger a cascade of expensive post-processing (capture → reward → L2 induction → L3 abstraction → skill crystallization) that can stall OpenClaw and Hermes Agent foreground sessions. This is especially common in long development workflows where relation.classify consistently returns revision/follow_up, allowing a single episode to accumulate dozens or hundreds of turns.

Fixes #1755

Root Causes

No episode turn limit — episodes grow unbounded; the full L1→L2→L3→skill chain hits all at once when the topic finally ends
Synchronous classify in before_prompt_build — relation.classify() is an LLM call that blocks foreground prompt construction with no timeout
Unlimited background LLM concurrency — capture/reward/L2/L3/skill subscribers fire unlimited parallel LLM calls, starving the event loop

Changes

Fix 1: Episode turn hard limit (`maxTurnsPerEpisode`)

New config: algorithm.session.maxTurnsPerEpisode (default 30, range 5–200)
When an open episode reaches this turn count, the next turn forces a topic boundary regardless of relation classification
Also applies when reopening recovered episodes
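
A minimal sketch of how the turn-limit guard could be ordered relative to the classifier result. Only `maxTurnsPerEpisode` and the relation labels come from this PR; `OpenEpisode`, `turnCount`, and `shouldForceBoundary` are hypothetical names used for illustration, not the code in core/pipeline/orchestrator.ts.

```typescript
// Hypothetical types: the real episode shape in MemOS may differ.
type Relation = "revision" | "follow_up" | "new_task";

interface OpenEpisode {
  id: string;
  turnCount: number; // turns already recorded in this episode
}

/**
 * Decide whether the incoming turn must start a new episode.
 * The turn limit is checked before the relation, so it overrides
 * a "revision" or "follow_up" classification.
 */
function shouldForceBoundary(
  episode: OpenEpisode | null,
  relation: Relation,
  maxTurnsPerEpisode: number,
): boolean {
  if (episode === null) return true; // nothing open: start a new episode
  if (episode.turnCount >= maxTurnsPerEpisode) return true; // hard limit reached
  return relation === "new_task"; // otherwise defer to the classifier
}
```

Because the check depends only on the episode's current turn count, the same function would cover an episode that was reopened after recovery, provided its stored count is restored first.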

Fix 2: Relation classify timeout (`classifyTimeoutMs`)

New config: algorithm.session.classifyTimeoutMs (default 5000ms, range 1000–30000)
relation.classify() calls are wrapped with Promise.race against the timeout
On timeout, defaults to new_task (safe conservative boundary)
Prevents foreground prompt construction from blocking indefinitely
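
The timeout wrapper described above can be sketched with `Promise.race`. This is an illustration under assumptions: `classifyWithTimeout` and the zero-argument `classify` callback are hypothetical, and the actual wrapper in the orchestrator may be structured differently.

```typescript
type Relation = "revision" | "follow_up" | "new_task";

/**
 * Race a classify call against a timer. If the timer wins, fall back
 * to "new_task" so prompt construction can proceed.
 */
async function classifyWithTimeout(
  classify: () => Promise<Relation>, // hypothetical stand-in for relation.classify()
  classifyTimeoutMs: number,
): Promise<Relation> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Relation>((resolve) => {
    timer = setTimeout(() => resolve("new_task"), classifyTimeoutMs);
  });
  try {
    return await Promise.race([classify(), timeout]);
  } finally {
    // Clear the timer so it does not keep the process alive after a fast result.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Two caveats about this sketch: `Promise.race` does not cancel the losing promise, so a slow LLM request keeps running in the background unless the client supports aborting it, and a rejection from `classify` propagates to the caller rather than being mapped to `new_task`.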

Fix 3: Background LLM concurrency semaphore (`bgLlmConcurrency`)

New config: algorithm.session.bgLlmConcurrency (default 2, range 1–8)
Shared semaphore gates all LLM calls from capture, reward, L2, L3, skill, and feedback subscribers
Prevents event-loop starvation from concurrent background processing
Capture's existing llmConcurrency (per-step α scoring) is unaffected — the semaphore only applies to the shared LLM client used by post-capture processing
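
For readers unfamiliar with the pattern, a lightweight async semaphore might look like the following. This is a generic sketch, not the contents of core/util/semaphore.ts; the class shape and the `run` helper are assumptions.

```typescript
/** Counting semaphore for async tasks; waiters are served in FIFO order. */
class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(permits: number) {
    if (!Number.isInteger(permits) || permits < 1) {
      throw new RangeError("permits must be a positive integer");
    }
    this.available = permits;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available -= 1;
      return;
    }
    // No permit free: park until release() hands one over.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // pass the permit directly to the oldest waiter
    } else {
      this.available += 1;
    }
  }

  /** Run a task while holding a permit, releasing it even if the task throws. */
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}
```

With `new Semaphore(2)`, matching the `bgLlmConcurrency` default, at most two gated calls are in flight at once and the rest queue in arrival order.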

New Files

core/util/semaphore.ts — lightweight async semaphore
core/util/rate-limited-llm.ts — transparent LLM client wrapper that acquires a semaphore permit per call
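
As an illustration of the wrapper idea, the sketch below gates a client behind any object that exposes `acquire` and `release`. The `LlmClient` interface with a single `complete` method is hypothetical; the real MemOS client surface is not shown in this PR, and `withPermits` is an invented name.

```typescript
// Anything with acquire/release works here, e.g. an async semaphore.
interface Permits {
  acquire(): Promise<void>;
  release(): void;
}

// Hypothetical client shape, assumed for this example only.
interface LlmClient {
  complete(prompt: string): Promise<string>;
}

/**
 * Return a client with the same interface whose calls each hold one
 * permit for the duration of the underlying request.
 */
function withPermits(client: LlmClient, permits: Permits): LlmClient {
  return {
    async complete(prompt: string): Promise<string> {
      await permits.acquire();
      try {
        return await client.complete(prompt);
      } finally {
        permits.release(); // release on success and on failure
      }
    },
  };
}
```

Because the wrapper returns the same interface it accepts, subscribers can be handed the gated client without code changes, which is presumably what "transparent" means here.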

Files Modified

core/config/schema.ts — 3 new config fields with JSDoc
core/config/defaults.ts — defaults: maxTurns=30, classifyTimeout=5s, bgConcurrency=2
core/pipeline/types.ts — SessionRoutingConfig extended
core/pipeline/deps.ts — config extraction + semaphore wiring
core/pipeline/orchestrator.ts — turn-limit guard + classify timeout wrapper
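
The three fields, their defaults, and their ranges can be summarized in one place. The sketch below uses the values stated in this PR, but the `SessionLimits` type, the `resolveSessionLimits` helper, and the choice to reject out-of-range values (rather than clamp them) are assumptions, not the actual code in core/config/schema.ts.

```typescript
interface SessionLimits {
  maxTurnsPerEpisode: number;
  classifyTimeoutMs: number;
  bgLlmConcurrency: number;
}

// Defaults and ranges as listed in the PR description.
const DEFAULTS: SessionLimits = {
  maxTurnsPerEpisode: 30,
  classifyTimeoutMs: 5000,
  bgLlmConcurrency: 2,
};

const RANGES: Record<keyof SessionLimits, [number, number]> = {
  maxTurnsPerEpisode: [5, 200],
  classifyTimeoutMs: [1000, 30000],
  bgLlmConcurrency: [1, 8],
};

/** Merge user overrides onto the defaults and reject out-of-range values. */
function resolveSessionLimits(overrides: Partial<SessionLimits> = {}): SessionLimits {
  const merged: SessionLimits = { ...DEFAULTS, ...overrides };
  for (const key of Object.keys(RANGES) as Array<keyof SessionLimits>) {
    const [min, max] = RANGES[key];
    const value = merged[key];
    if (!Number.isInteger(value) || value < min || value > max) {
      throw new RangeError(`${key} must be an integer in [${min}, ${max}], got ${value}`);
    }
  }
  return merged;
}
```

For example, `resolveSessionLimits({ maxTurnsPerEpisode: 60 })` keeps the other two defaults, while `resolveSessionLimits({ bgLlmConcurrency: 9 })` throws in this sketch.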

Testing

tsc --noEmit passes (no type errors)
All new config values ship with defaults, so existing users do not need to change their configuration. The turn limit is the main user-visible behavior change: episodes that would have grown past 30 turns are now split. The classify timeout (5000 ms) and the background concurrency cap (2) are also active by default.

## Summary

- add an OpenClaw runtime lock to block duplicate plugin instances before tools/hooks register
- fail startup on viewer port conflicts and clean up partial runtime state
- keep lightweight local memories searchable/listable without an LLM final filter, while preserving full-mode self-evolution boundaries
- cover runtime locking, duplicate startup, lightweight retrieval, delayed agent_end recovery, and partial migration behavior

## Tests

- npm test -- --run tests/unit
- npm run lint
- npm run build
- git diff --check --cached


de1tydev · 2026-05-31T14:53:59Z

Linked to #1755 — detailed root-cause analysis is in the issue comments.


hijzy and others added 5 commits May 25, 2026 15:02

fix(memos-local-plugin): guard OpenClaw runtime startup

53f72ce

fix:The failed task was wrongly recorded as a "successful experience".

362f7d1

fix:The failed task was wrongly recorded as a "successful experience". (

9233370

MemTensor#1807) Automated PR from mem-agent-0520-niu to mem-agent-0520.

fix: guard against episode storm stalling foreground sessions (MemTen…

2e8c3dd

…sor#1755)

hijzy and others added 3 commits June 1, 2026 19:36

Merge branch 'main' into mem-agent-0520

f43ec1c

Merge branch 'main' into fix/episode-storm-guard

58d16c5

Memtensor-AI changed the base branch from main to dev-20260604-v2.0.19 June 10, 2026 15:43

de1tydev wants to merge 8 commits into MemTensor:dev-20260604-v2.0.19 from de1tydev:fix/episode-storm-guard
