fix(memory-core): yield to event loop during seedEmbeddingCache (R2.A.2)#2

Merged
lesaai merged 2 commits into
cc-mini/chat-completions-upstream-20260423 from
cc-mini/memory-core-yield-during-seed
Apr 24, 2026

Conversation

@lesaai
Member

@lesaai lesaai commented Apr 24, 2026

Canary candidate. Do not npm-link to live Lēsa until canary passes.

Summary

R2.A.2. The .iterate() patch from PR #1 (a315280) prevents the V8 heap OOM, but the iterate loop still runs synchronously for ~117s on a 435K-row embedding_cache. The gateway can't service HTTP /health probes during that window, so wip-healthcheck's watchdog SIGKILLs the gateway after a single 30s probe timeout.

Live repro on 2026-04-24 ~15:31 PDT post-PR#1 deploy: HTTP probe failed: timeout (30000ms) → Restarting gateway (attempt 1/3) → SIGKILL → LaunchAgent respawn. No FATAL ERROR, no Abort trap, no StatementSync::All stack. R2.A v1 succeeded in preventing the V8 OOM; the new failure mode is event-loop blocking.

Fix

  1. seedEmbeddingCache becomes async and yields to the event loop every 1000 rows via await new Promise(resolve => setImmediate(resolve)).
  2. Caller (inside runMemoryAtomicReindex's build async arrow) gains await — one-line propagation.

The synchronous .iterate() / insert.run() pair stays the same; we just release the event loop for one tick per batch so HTTP /health (and other pending I/O work) can run in between. Memory stays bounded, streaming behavior is preserved, and /health stays responsive during the seed.

YIELD_EVERY = 1000 rows works out to tens of milliseconds of synchronous work per batch, well under the 30s probe timeout even with no additional leniency from the watchdog.
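The patched loop can be sketched as follows. This is a minimal sketch, not the actual memory-core code: the Stmt/InsertStmt shapes stand in for better-sqlite3-style synchronous statements, and the Row shape is illustrative.

```typescript
// Minimal sketch of the yield patch, assuming a better-sqlite3-style
// synchronous Statement API. Table/column shapes are illustrative.
type Row = { key: string; embedding: Uint8Array };

interface Stmt<T> {
  iterate(): IterableIterator<T>;
}

interface InsertStmt {
  run(key: string, embedding: Uint8Array): void;
}

const YIELD_EVERY = 1000;

async function seedEmbeddingCache(
  select: Stmt<Row>,
  insert: InsertStmt,
): Promise<number> {
  let n = 0;
  for (const row of select.iterate()) {
    insert.run(row.key, row.embedding); // each row is still synchronous
    if (++n % YIELD_EVERY === 0) {
      // Release the event loop for one tick so /health probes and
      // other pending I/O callbacks can run between batches.
      await new Promise<void>((resolve) => setImmediate(resolve));
    }
  }
  return n;
}
```

Because the yield fires only once per YIELD_EVERY rows, per-row overhead stays negligible while each uninterrupted synchronous stretch is bounded to one batch.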

Validation

  • pnpm tsgo:prod: green (core + extensions graphs)
  • pnpm test extensions/memory-core: 512 passed, 3 skipped, 0 failed

Out of scope

  • wip-healthcheck softening. Per Parker's direction, a secondary guardrail (require multiple consecutive failures or a stronger multi-signal stuck condition) belongs in wip-healthcheck-private, not here. Filed separately.
  • Secondary listChunks .all() path in manager-search.ts:246-252 (R2.A.3). Bigger surgery — caller needs full candidate set for cosine-similarity ranking, so converting to streaming requires a bounded top-K heap in the caller. Held until R2.A.2 canaries clean.
  • Upstream OpenClaw 2026.4.24 does NOT include either fix; carrying the fork patch.
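For reference, the bounded top-K heap direction mentioned for R2.A.3 could look something like this sketch. It is purely illustrative; the class name, field shapes, and K are assumptions, not the manager-search code.

```typescript
// Illustrative bounded top-K min-heap: stream candidates instead of
// materializing them with .all(), keeping only the K best cosine
// scores. Memory is O(K) regardless of candidate-set size.
class TopK {
  private heap: { id: string; score: number }[] = [];
  constructor(private readonly k: number) {}

  push(id: string, score: number): void {
    if (this.heap.length < this.k) {
      this.heap.push({ id, score });
      this.bubbleUp(this.heap.length - 1);
    } else if (score > this.heap[0].score) {
      this.heap[0] = { id, score }; // evict the current worst keeper
      this.sinkDown(0);
    }
  }

  // Results sorted best-first.
  drain(): { id: string; score: number }[] {
    return [...this.heap].sort((a, b) => b.score - a.score);
  }

  private bubbleUp(i: number): void {
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (this.heap[parent].score <= this.heap[i].score) break;
      [this.heap[parent], this.heap[i]] = [this.heap[i], this.heap[parent]];
      i = parent;
    }
  }

  private sinkDown(i: number): void {
    for (;;) {
      let min = i;
      const l = 2 * i + 1;
      const r = 2 * i + 2;
      if (l < this.heap.length && this.heap[l].score < this.heap[min].score) min = l;
      if (r < this.heap.length && this.heap[r].score < this.heap[min].score) min = r;
      if (min === i) break;
      [this.heap[min], this.heap[i]] = [this.heap[i], this.heap[min]];
      i = min;
    }
  }
}
```

The caller would push each streamed candidate's cosine score and drain at the end, which is why the conversion needs caller-side surgery rather than a drop-in .iterate() swap.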

Canary plan

  1. Canary-install this branch (not live Lēsa)
  2. Repro target: Day 63-style broad memory review on real 16 GB main.sqlite
  3. Pass criteria: no Abort trap, no PID change, no V8 heap OOM, /health stays responsive throughout, gateway not SIGKILL'd by watchdog
  4. If clean → promote → re-run live N4 / Day 63 → if still clean → start R2.A.3 (listChunks)

lesaai added 2 commits April 24, 2026 15:41
R2.A.2. The .iterate()-based seed (R2.A v1, a315280) prevents the V8
heap OOM but the iterate loop still runs synchronously for ~117s on a
435K-row embedding_cache. wip-healthcheck SIGKILLs the gateway after
its 30s probe timeout fails. No FATAL ERROR, no Abort trap.

Patch: convert seedEmbeddingCache to async, yield to the event loop
every 1000 rows via setImmediate. Keeps memory bounded; preserves the
streaming behavior; restores /health responsiveness during the seed.

The only caller is inside an existing async arrow wrapping
runMemoryAtomicReindex's build callback. Adding await is a one-line
change.

Validation:
- pnpm tsgo:prod: green
- pnpm test extensions/memory-core: 512 passed, 3 skipped, 0 failed

Scope: does not soften wip-healthcheck (separate guardrail per Parker
direction). Does not address the secondary listChunks path (R2.A.3).

Revert the top-of-file lint-suppression comments accidentally landed in
the previous commit (f9e9970). They were added to work around an
oxlint resolver false positive that turned out to be transient state,
not a real lint failure. Production code shouldn't carry misleading
explanations for problems that didn't actually persist.

Net diff of this branch vs base is now just the seedEmbeddingCache
yield patch: function -> async, setImmediate every 1000 rows, caller
await. No lint comments, no file-level disables.
@lesaai
Member Author

lesaai commented Apr 24, 2026

Read-only yield canary passed against the production ~/.openclaw/memory/main.sqlite embedding cache.

Results:

{
  "yieldEvery": 1000,
  "rows": 435136,
  "embeddingBytes": 8680106189,
  "durationMs": 26383,
  "timerTicks": 25,
  "maxTimerDelayMs": 147,
  "rssMb": 144,
  "maxRssMb": 150,
  "heapUsedMb": 18,
  "maxHeapUsedMb": 77
}

Interpretation: the 1000-row setImmediate cadence keeps the event loop responsive while scanning the full production embedding cache. This directly addresses the post-R2.A failure mode where .iterate() avoided V8 heap OOM but starved /health long enough for the watchdog to restart the gateway.

Canary was read-only and did not touch the live gateway.
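The timerTicks / maxTimerDelayMs numbers above could be collected with a lag probe along these lines. The field names match the canary JSON, but the probe itself is an assumption about the harness, not its actual code.

```typescript
// Sketch of an event-loop lag probe: a repeating timer records how
// late each tick fires. A busy (blocked) event loop delays timers, so
// maxTimerDelayMs approximates the longest synchronous stretch.
function startLagProbe(intervalMs = 1000) {
  let timerTicks = 0;
  let maxTimerDelayMs = 0;
  let expected = Date.now() + intervalMs;
  const timer = setInterval(() => {
    timerTicks += 1;
    const delayMs = Date.now() - expected; // >0 means the loop was busy
    if (delayMs > maxTimerDelayMs) maxTimerDelayMs = delayMs;
    expected = Date.now() + intervalMs;
  }, intervalMs);
  return {
    stop() {
      clearInterval(timer);
      return { timerTicks, maxTimerDelayMs };
    },
  };
}
```

With a 1s interval over the ~26s scan, 25 ticks and a 147ms worst-case delay are consistent with the JSON above: no batch held the loop anywhere near the 30s probe timeout.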

@lesaai lesaai merged commit e3f6864 into cc-mini/chat-completions-upstream-20260423 Apr 24, 2026
93 of 97 checks passed
@lesaai lesaai deleted the cc-mini/memory-core-yield-during-seed branch April 24, 2026 22:52