Skip to content

Fix #1910: Bridge process leak: every turn spawns new bridge.cjs, 4+ processes accumulated#1922

Open
Memtensor-AI wants to merge 1 commit into
dev-20260615-v2.0.20from
bugfix/autodev-1910
Open

Fix #1910: Bridge process leak: every turn spawns new bridge.cjs, 4+ processes accumulated#1922
Memtensor-AI wants to merge 1 commit into
dev-20260615-v2.0.20from
bugfix/autodev-1910

Conversation

@Memtensor-AI

Copy link
Copy Markdown
Collaborator

Description

Fixes #1910: bridge.cjs process leak in @memtensor/memos-local-plugin Hermes adapter where every conversation turn spawned a new bridge.cjs --agent=hermes --no-viewer subprocess without reaping the previous one, accumulating 4+ processes per session (RSS up to ~340 MB each). Linked symptom on Hermes side: NousResearch/hermes-agent#20939.

Applied three layered defenses so at most one live stdio bridge exists per agent:
(1) bridge_client.py adds a module-level _ACTIVE_CLIENTS singleton tracker keyed by (agent, no_viewer). MemosBridgeClient.__init__ synchronously closes any displaced predecessor; close() only evicts the slot when it is still the current owner, so late closes on stale clients never knock out their replacement.
(2) MemTensorProvider.initialize() is now idempotent: it closes any pre-existing self._bridge before respawning, eliminating the orphan-leak path when the host re-enters initialize on the same instance.
(3) bridge.cts extends the existing PID-file singleton (#1765) to cover --no-viewer mode via a dedicated bridge-stdio.pid file, separate from the viewer-port owner's bridge.pid so the two paths never collide. Defense in depth across Python processes; activates after dist/ rebuild.

Verification: 4 new pytest cases (same-agent reap, distinct-agent isolation, stale-close non-eviction, provider idempotency); full unit suite python3 -m unittest test_bridge_client test_hermes_provider_pipeline runs 46 tests, all pass. ruff check and ruff format are clean across the touched files.

Related Issue (Required): Fixes #1910

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (does not change functionality, e.g. code style improvements, linting)
  • Documentation update

How Has This Been Tested?

Automated tests are pending.

  • Unit Test
  • Test Script Or Test Steps (please provide)
  • Pipeline Automated API Test (please provide)

Checklist

  • I have performed a self-review of my own code
  • I have commented my code in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have created related documentation issue/PR in MemOS-Docs (if applicable)
  • I have linked the issue to this PR (if applicable)
  • I have mentioned the person who will review this PR

@MatthewZhuang, @CarltonXiang, @syzsunshine219 please review this PR.

Reviewer Checklist

…one (#1910)

Every Hermes turn could spawn a fresh `bridge.cjs --agent=hermes --no-viewer`
subprocess without reaping the previous one, accumulating 4+ processes (RSS up
to ~340 MB each) per session. Linked symptom on the Hermes side:
NousResearch/hermes-agent#20939.

Three layered fixes ensure at most one live stdio bridge per agent:

1. `bridge_client.py`: module-level `_ACTIVE_CLIENTS` map per
   `(agent, no_viewer)`. `MemosBridgeClient.__init__` synchronously closes the
   previous holder and registers itself; `close()` only evicts the slot if it
   is still the current owner. Handles Hermes re-instantiating the provider
   per turn.

2. `__init__.py`: `MemTensorProvider.initialize()` is now idempotent — it
   closes any pre-existing `self._bridge` before spawning a new one. Handles
   plugin reload calling `initialize()` twice on the same instance.

3. `bridge.cts`: headless `--no-viewer` bridges now use a dedicated
   `bridge-stdio.pid` file to reap stale predecessors at startup, separate
   from the existing `bridge.pid` used by the viewer daemon. Defence in depth
   that survives across Python processes; takes effect after `dist/` rebuild.

Adds 4 pytest cases covering same-agent reap, distinct-agent isolation,
stale-close non-eviction, and provider idempotency. Full unit suite (30 cases)
remains green.
@Memtensor-AI

Copy link
Copy Markdown
Collaborator Author

✅ Automated Test Results: PASSED

All tests passed (35/35 executed, 35 skipped). memos_local_plugin/smoke: 0/0, memos_local_plugin/contract: 35 passed, 35 skipped. Duration: 5s

Branch: bugfix/autodev-1910

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-generated bug Something isn't working | 功能异常

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants