Bug: /api/v1/embeddings/maintenance causes 100% CPU and event loop starvation on large corpora #1929

## Bug: `GET /api/v1/embeddings/maintenance` causes 100% CPU and event loop starvation on large corpora

### Summary

On a deployment with ~93K rows in the `traces` table (270MB of vector data), every request to `GET /api/v1/embeddings/maintenance` blocks the Node.js main thread for 4+ minutes at 100% CPU, making the entire OpenClaw gateway unresponsive.

### Root Cause

`computeEmbeddingMaintenanceStats()` (in `core/pipeline/memory-core.ts`) calls `collectEmbeddingSlots()`, which:

1. Paginates through **every row** of `traces`, `policies`, `world_model`, and `skills` using `repos.*.list()` 
2. Each `list()` call returns **full rows including BLOB vector columns** (`vec_summary`, `vec_action`)
3. Decodes every vector from `Buffer → Float32Array` in JS
4. Builds a sorted array of all slots

For a stats-only query this is wasteful: the endpoint needs only `COUNT(*)` grouped by null / not-null / dimension-mismatch, not the vector data itself.
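A dimension check does not require decoding any vector. Assuming each vector is stored as a raw float32 array with no header (which is what the `Buffer → Float32Array` decode in step 3 implies), a vector of `d` dimensions occupies exactly `4·d` bytes, so each slot can be classified from its byte length alone. `classifySlot` and `tallySlots` are hypothetical names used for illustration, not functions in the plugin.

```typescript
// Hypothetical helpers: classify stored embeddings without decoding them.
// Assumes each vector is a raw float32 array with no header or padding.
type SlotStatus = "missing" | "ok" | "mismatch";

const BYTES_PER_FLOAT32 = Float32Array.BYTES_PER_ELEMENT; // 4

function classifySlot(blob: Uint8Array | null, expectedDims: number): SlotStatus {
  if (blob === null) return "missing";
  // Only the byte length is inspected; no Float32Array is allocated.
  return blob.byteLength === expectedDims * BYTES_PER_FLOAT32 ? "ok" : "mismatch";
}

// Tally statuses for a batch of slots, mirroring the three groups the endpoint reports.
function tallySlots(
  blobs: Iterable<Uint8Array | null>,
  expectedDims: number,
): Record<SlotStatus, number> {
  const counts: Record<SlotStatus, number> = { missing: 0, ok: 0, mismatch: 0 };
  for (const blob of blobs) counts[classifySlot(blob, expectedDims)]++;
  return counts;
}
```

SQLite's `LENGTH()` returns the byte length of a BLOB, so the same comparison can be pushed into SQL, which is what the proposed fix in this issue does.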

At 93K traces × 2 vector columns × 1536 bytes per vector, that is ≈ **270MB of vectors loaded into the JS heap** on every request, all read synchronously on the main thread because better-sqlite3 is a synchronous driver.
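The heap figure follows directly from the row count and the per-vector size. The sketch below reproduces the arithmetic; reading 1536 bytes as 384 float32 values (1536 / 4) is an inference from the numbers in this issue, not a confirmed embedding-model setting.

```typescript
// Back-of-envelope heap cost of decoding every vector on each request.
// Inputs are the figures quoted in this issue; 384 dims = 1536 bytes / 4 is an inference.
const BYTES_PER_FLOAT32 = Float32Array.BYTES_PER_ELEMENT; // 4

function vectorHeapBytes(rows: number, vectorColumns: number, dims: number): number {
  return rows * vectorColumns * dims * BYTES_PER_FLOAT32;
}

const bytes = vectorHeapBytes(93_000, 2, 1536 / BYTES_PER_FLOAT32);
const mib = bytes / (1024 * 1024);
console.log(`${bytes} bytes ≈ ${mib.toFixed(0)} MiB`); // "285696000 bytes ≈ 272 MiB"
```

This counts only the float payload; the per-row `Buffer` objects returned by the driver, and any copies made while decoding, may add to the real peak.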

### Evidence

```
# strace during hang — 99.96% pread64 on memos.db
 99.96    0.071932           3     22685           pread64

# Log — onTurnStart blocked for 292 seconds
memos.onTurnStart returned hits=7 durationMs=292731

# Event loop delay
liveness warning: eventLoopDelayMaxMs=285883.8 eventLoopUtilization=1
```

### Proposed Fix

Add `embeddingMaintenanceStats()` to `Repos` (`core/storage/repos/index.ts`) that uses pure SQL `COUNT` queries instead of loading vectors:

```sql
SELECT COUNT(*) AS n FROM traces WHERE vec_summary IS NOT NULL;
SELECT COUNT(*) AS n FROM traces WHERE vec_summary IS NOT NULL AND LENGTH(vec_summary) <> @expectedLen;
-- same for vec_action, policies.vec, world_model.vec, skills.vec
```
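The per-column queries above can also be collapsed into one aggregate per column that returns total, present, and mismatched counts in a single pass. The sketch below is a hypothetical standalone version of the proposed `embeddingMaintenanceStats()`; the table and column names come from this issue, while `GetRow`, `statsSql`, and `embeddingStatsViaSql` are illustrative names. The row executor is injected so the logic stays independent of the driver.

```typescript
// Vector columns named in this issue. Identifiers cannot be bound as SQL
// parameters, so they are taken only from this fixed list, never from input.
const VECTOR_COLUMNS = [
  { table: "traces", column: "vec_summary" },
  { table: "traces", column: "vec_action" },
  { table: "policies", column: "vec" },
  { table: "world_model", column: "vec" },
  { table: "skills", column: "vec" },
] as const;

interface CountRow { total: number; present: number; mismatched: number }
type GetRow = (sql: string, params: { expectedLen: number }) => CountRow;

interface ColumnStats {
  table: string;
  column: string;
  missing: number;
  ok: number;
  mismatched: number;
}

// COUNT(col) counts non-NULL values; TOTAL() returns 0.0 on an empty table
// where SUM() would return NULL. LENGTH() of a BLOB is its size in bytes.
function statsSql(table: string, column: string): string {
  return `SELECT COUNT(*) AS total,
       COUNT(${column}) AS present,
       TOTAL(${column} IS NOT NULL AND LENGTH(${column}) <> @expectedLen) AS mismatched
FROM ${table}`;
}

function embeddingStatsViaSql(getRow: GetRow, expectedDims: number): ColumnStats[] {
  const expectedLen = expectedDims * Float32Array.BYTES_PER_ELEMENT;
  return VECTOR_COLUMNS.map(({ table, column }) => {
    const r = getRow(statsSql(table, column), { expectedLen });
    return {
      table,
      column,
      missing: r.total - r.present,
      ok: r.present - r.mismatched,
      mismatched: r.mismatched,
    };
  });
}
```

With better-sqlite3, `getRow` could be `(sql, p) => db.prepare(sql).get(p) as CountRow`, since a `@expectedLen` placeholder binds from an object key `expectedLen`; in practice the five statements would be prepared once and reused.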

Then `computeEmbeddingMaintenanceStats()` calls this SQL path for the common case (configured dimension matches stored dimension), falling back to the full `collectEmbeddingSlots()` only when a dimension mismatch is detected.
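The fast-path/fallback decision can be sketched as a small wrapper. Both callbacks are hypothetical stand-ins: `sqlStats` for the proposed COUNT-based repo method and `fullScan` for the existing `collectEmbeddingSlots()`-based computation. The tagged result type is an illustrative choice, not part of the proposal.

```typescript
interface ColumnStats {
  table: string;
  column: string;
  missing: number;
  ok: number;
  mismatched: number;
}

type StatsResult<T> =
  | { source: "sql"; stats: ColumnStats[] }
  | { source: "full-scan"; detail: T };

function computeMaintenanceStats<T>(
  sqlStats: () => ColumnStats[],
  fullScan: () => T,
): StatsResult<T> {
  const stats = sqlStats();
  // Common case: every stored vector has the configured byte length, so the
  // cheap counts are the whole answer and no BLOB is loaded into JS.
  if (stats.every((s) => s.mismatched === 0)) return { source: "sql", stats };
  // Only a detected mismatch justifies the expensive per-row scan.
  return { source: "full-scan", detail: fullScan() };
}
```

Tagging the result with its source lets the caller log which path ran, which makes it easy to confirm that the slow scan only happens on corpora that actually need re-embedding.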

### Performance Comparison

| Method | Time | Memory |
|--------|------|--------|
| `collectEmbeddingSlots()` (current) | 4+ min, 100% CPU | +270MB heap |
| SQL COUNT (proposed) | ~900ms, <10% CPU | negligible |

### Additional Fix: Vector Scan Bounding

The tier-2 retrieval path (`scanAndTopK` in `core/storage/vector.ts`) also performs a brute-force full-table scan on every `onTurnStart`. With 93K rows of 1536-byte vectors, this alone causes 5-30 second event loop blocks. Proposed: pre-seed the candidate set from an FTS query and add a configurable time window (`vectorScanMaxAgeMs`) to bound the scan.
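The bounding idea can be sketched as candidate filtering before scoring: only rows that an FTS query already matched, or that fall inside `vectorScanMaxAgeMs`, are scored at all. Apart from `vectorScanMaxAgeMs`, which comes from the proposal, everything here is hypothetical: the `Row` shape, the `ftsHits` set standing in for an FTS pre-query, `boundedTopK` itself, and cosine similarity as the scoring function.

```typescript
interface Row { id: string; createdAt: number; vec: Float32Array }

function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function boundedTopK(
  rows: Iterable<Row>,
  query: Float32Array,
  k: number,
  opts: { now: number; vectorScanMaxAgeMs: number; ftsHits: ReadonlySet<string> },
): { id: string; score: number }[] {
  const cutoff = opts.now - opts.vectorScanMaxAgeMs;
  const scored: { id: string; score: number }[] = [];
  for (const row of rows) {
    // Skip rows that neither matched the FTS pre-seed nor fall inside the time window.
    if (!opts.ftsHits.has(row.id) && row.createdAt < cutoff) continue;
    scored.push({ id: row.id, score: cosine(query, row.vec) });
  }
  scored.sort((x, y) => y.score - x.score);
  return scored.slice(0, k);
}
```

In a real implementation the same predicate would belong in the SQL `WHERE` clause so that excluded rows are never read from disk; it is applied in JS here only to keep the sketch self-contained.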

### Environment

- memos-local-plugin v2.0.5
- OpenClaw gateway (latest)
- memos.db: ~93K traces, 853MB on disk
- Node 26.0.0, better-sqlite3
- Linux x86_64, 16 cores

I have a working patch ready and can submit a PR if the approach is approved.

