feat(737): global / local ctx_search scope — project: current | global | <abs-path> by jetnet · Pull Request #758 · mksglu/context-mode

jetnet · 2026-06-01T17:33:04Z

Summary

Implements the maintainer-approved design for #737 — a real project: scope on ctx_search delivering both project-local and true cross-project (global) recall, with zero migration.

ctx_search(queries: [...])                       # current project (default)
ctx_search(queries: [...], project: "global")    # fan-out across ALL projects
ctx_search(queries: [...], project: "/abs/path") # a specific project

What was broken (all confirmed in the issue thread)

1. project: "global" reached only 1 of 3 sources (ContentStore); session events + auto-memory ignored the scope.
2. Default sort: "relevance" skipped session memory entirely (only timeline searched it).
3. current/default filtered by the pinned dir in shared mode, not the real cwd → silent 0 results.
4. Per-project mode physically opens only the current DB, so global was meaningless without fan-out across DB files.

What this PR does

Decouples the row-level filter from DB-file selection; adds getCurrentWorkingProject() (real cwd) so current is correct in shared mode.
Relevance now searches all three sources (content + session events + auto-memory); sort controls ordering only.
Global fan-out (src/search/global-fanout.ts): unions every sessions/*.db + content/*.db, merges with RRF (k=60), read-only opens (no WAL pragmas / schema init / migration / corruption repair), bounded by CONTEXT_MODE_GLOBAL_FANOUT_MAX (default 1024).
<abs-path> scope: shared mode → row filter; per-project mode → open that project's hashed DBs read-only (incl. worktree-suffixed session DBs).
Attribution: each result header names its origin — proj:<name> + sess:<id>; orphan content in global is labelled proj:(unattributed).
Coverage notice: if the fan-out cap drops DBs, a visible "Global search INCOMPLETE" line reports searched X of Y and the exact value to raise — never silently partial.
Result limits: breadth scopes return 10–12 hits/query (so specific answers aren't buried behind common-term noise); single-DB stays 1–2. Hard 40KB output cap enforced incrementally.
Snippet detail: forward-biased window so explanation after a heading match is captured.
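
The RRF merge named above can be sketched as follows. This is a minimal illustration of Reciprocal Rank Fusion with k=60, not the code in src/search/global-fanout.ts: the `Hit` shape, the `key` field, and the function name `rrfMerge` are assumptions made for the example.

```typescript
// Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank) to an
// item's score, where rank is 1-based. Items present in several lists add up.
// `Hit`, `key`, and `rrfMerge` are hypothetical names for this sketch.
interface Hit {
  key: string; // dedupe key; identical keys across lists are fused
  title: string;
}

type ScoredHit = Hit & { score: number };

function rrfMerge(lists: Hit[][], k = 60): ScoredHit[] {
  const fused = new Map<string, ScoredHit>();
  for (const list of lists) {
    list.forEach((hit, index) => {
      const contribution = 1 / (k + index + 1);
      const existing = fused.get(hit.key);
      if (existing) {
        existing.score += contribution;
      } else {
        fused.set(hit.key, { ...hit, score: contribution });
      }
    });
  }
  // Highest fused score first.
  return Array.from(fused.values()).sort((a, b) => b.score - a.score);
}

// Two per-DB result lists; "b" appears in both, so it outranks "a" and "c".
const merged = rrfMerge([
  [{ key: "a", title: "A" }, { key: "b", title: "B" }],
  [{ key: "b", title: "B" }, { key: "c", title: "C" }],
]);
const mergedOrder = merged.map((hit) => hit.key).join(",");
```

Because RRF uses only ranks, scores from different DBs never need to be comparable, which is one plausible reason to prefer it for fan-out across independently indexed files.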

Backwards compatibility

Zero migration. Per-project users are unaffected unless they pass project: "global"; shared-DB users keep working. The one intentional default-behaviour change is the relevance-mode source-set fix (the second item under "What was broken").

Tests

New tests/core/global-fanout.test.ts and tests/core/ctx-search-plan.test.ts; extended search-project-filter, search, and auto-memory-adapter suites. Targeted suites green (250+), tsc --noEmit clean, npm run build (assert-bundle + asymmetric-drift) green. Validated end-to-end against a real ~170-project install (read-only confirmed: 0 DB files mutated).

Notes for reviewers

Bundles are intentionally not included (gitignored; regenerated at release via prepublishOnly).
New env var CONTEXT_MODE_GLOBAL_FANOUT_MAX is documented in README ("Search environment variables").
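
A strict parse of CONTEXT_MODE_GLOBAL_FANOUT_MAX might look like the sketch below. Only the variable name and the default of 1024 come from this PR; the function name `parseFanoutMax` and the choice to fall back to the default on any invalid value are assumptions.

```typescript
// Hypothetical helper: strict parse of a positive-integer env value.
// Assumption: any missing or malformed value falls back to the default.
const DEFAULT_FANOUT_MAX = 1024;

function parseFanoutMax(
  raw: string | undefined,
  fallback: number = DEFAULT_FANOUT_MAX,
): number {
  if (raw === undefined) return fallback;
  const trimmed = raw.trim();
  // Digits only: rejects "", "-5", "1e3" and "12abc" (parseInt would accept "12abc").
  if (!/^[0-9]+$/.test(trimmed)) return fallback;
  const value = Number(trimmed);
  return Number.isSafeInteger(value) && value > 0 ? value : fallback;
}

// In Node this would typically be called as:
//   parseFanoutMax(process.env.CONTEXT_MODE_GLOBAL_FANOUT_MAX)
```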

Closes #737

…emory

Decouple the row-level project filter from DB-file selection so ctx_search gains a real cross-project ("no filter") mode, and fix relevance mode to search all three sources.

- db.ts: searchEvents accepts projectDir: string | null (null = no filter, new prepared statement); add canonicalContentDbPath/canonicalSessionDbPath helpers (hash path, no legacy rename) for read-only search paths.
- auto-memory.ts: searchAutoMemory accepts null -> union across all per-project memory hash dirs; adapter base-dir detection handles hashed vs non-hashed.
- unified.ts: thread projectScope to session events + auto-memory and run them in BOTH relevance and timeline (Bug 2); round-robin interleave so session/auto-memory survive truncation; carry project/sessionId attribution fields.
- ctx-search-schema.ts: expose `project` in per-project mode too; resolver resolves current to real cwd; steer the model to stay on current unless the user explicitly asks for global.
- store.ts: export escapeLikeSource so fan-out reuses identical LIKE escaping.

Refs mksglu#737

…ion, bounded)

New src/search/global-fanout.ts implements true cross-project recall:

- listProjectDbs: enumerate every sessions/*.db + content/*.db, pair by project hash, cap the TOTAL file count and report coverage {totalAvailable, opened, cap, truncated} so callers can warn on incomplete fan-out.
- Read-only readers (readonlySearchContent FTS5 MATCH + readonlySearchEvents LIKE): { readonly: true } opens, no WAL pragmas, no schema init, no corruption repair/migration; every handle closed in finally.
- RRF merge across DBs (k=60), dedupe key excludes per-DB source so identical cross-DB content fuses; relevance = RRF, timeline = chronological.
- searchAbsPathProject: per-project abs-path scope opens that project's hashed DBs read-only (incl. worktree-suffixed session DBs), interleaved.
- Attribution: session hits carry project_dir + session_id from the row; content hits resolve project_dir via the sibling session DB (hash map).
- Default fan-out cap raised 64 -> 1024 (the old default silently dropped most projects on real 150+ DB installs); strict env parse for CONTEXT_MODE_GLOBAL_FANOUT_MAX.
- source/contentType filters applied consistently across all sources.

Refs mksglu#737
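
The capped enumeration and coverage report described in this commit can be sketched as below. The coverage field names {totalAvailable, opened, cap, truncated} come from the commit message; everything else is an assumption for illustration: the function names `planFanout` and `coverageNotice`, the `<hash>.db` file naming, taking session files before content files, and the notice wording beyond the quoted "Global search INCOMPLETE" phrase. The sketch works on file names only and ignores worktree-suffixed session DBs.

```typescript
interface Coverage {
  totalAvailable: number; // DB files found on disk
  opened: number;         // DB files actually selected for search
  cap: number;            // effective cap on the TOTAL file count
  truncated: boolean;     // true when the cap dropped at least one file
}

interface ProjectDbs {
  hash: string;
  sessionDb?: string;
  contentDb?: string;
}

// Hypothetical planner: pairs files by project hash and stops at the cap.
function planFanout(sessionFiles: string[], contentFiles: string[], cap: number) {
  const limit = Math.max(0, Math.floor(cap));
  const byHash = new Map<string, ProjectDbs>();
  let opened = 0;

  const take = (file: string, kind: "sessionDb" | "contentDb"): void => {
    if (opened >= limit) return; // cap reached: file is counted but not opened
    const hash = file.replace(/\.db$/, ""); // assumption: "<hash>.db" naming
    const entry: ProjectDbs = byHash.get(hash) ?? { hash };
    entry[kind] = file;
    byHash.set(hash, entry);
    opened += 1;
  };

  for (const file of sessionFiles) take(file, "sessionDb");
  for (const file of contentFiles) take(file, "contentDb");

  const totalAvailable = sessionFiles.length + contentFiles.length;
  const coverage: Coverage = {
    totalAvailable,
    opened,
    cap: limit,
    truncated: opened < totalAvailable,
  };
  return { projects: Array.from(byHash.values()), coverage };
}

// Illustrative wording only; the real notice text may differ.
function coverageNotice(c: Coverage): string | null {
  if (!c.truncated) return null;
  return `Global search INCOMPLETE: searched ${c.opened} of ${c.totalAvailable} DBs; ` +
    `set CONTEXT_MODE_GLOBAL_FANOUT_MAX=${c.totalAvailable} to cover all.`;
}
```

Returning the coverage object alongside the results keeps the "never silently partial" decision with the caller, which can prepend the notice on both the results and no-results paths.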

…overage, limits

Wire the new capability into the ctx_search handler via extracted, unit-tested pure helpers (src/search/ctx-search-plan.ts).

- planCtxSearchScope: discriminated scope (global | absPathPerProject | rowFilter | current) honouring shared vs per-project mode.
- getCurrentWorkingProject(): real cwd (not the pinned CONTEXT_MODE_PROJECT_DIR) so shared-mode "current" filters by the right path.
- Global / abs-path branches NEVER open a writable store (no getStore(), no migration); content dir resolved via the non-mutating storage helper.
- shouldReturnEmptyGuidance: post-search, only when content is empty AND no hits, so auto-memory-only recall is not masked; empty message now points to project:"global".
- Coverage notice: prepend a visible "Global search INCOMPLETE" warning (and on the no-results path) when the fan-out cap drops DBs; never silently partial.
- effectiveSearchLimit: breadth scopes (global/abs-path) return 10-12 results (5 throttled) so specific hits are not chopped off behind common-term noise; single-DB stays 1-2.
- Result header attribution: proj:<name> + sess:<id>; orphan content in breadth scope labelled proj:(unattributed) so the model can't borrow a wrong project.
- extractSnippet: forward-biased window (200 back / 520 fwd) + larger budget so detail after a heading match is captured; hard 40KB output cap enforced incrementally with footer headroom.

Refs mksglu#737
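
The discriminated scope from planCtxSearchScope can be pictured with the sketch below. The four variant names come from the commit message; the payload fields, the parameter list, and the function names `planScope` and `describeScope` are assumptions, and the real helper in src/search/ctx-search-plan.ts may differ.

```typescript
// Hypothetical shape: one variant per scope named in the commit message.
type SearchScope =
  | { kind: "global" }
  | { kind: "absPathPerProject"; projectDir: string } // open that project's DBs read-only
  | { kind: "rowFilter"; projectDir: string }         // shared DB, filter rows by project
  | { kind: "current"; projectDir: string };          // real cwd, not the pinned dir

function planScope(
  project: string | undefined,
  sharedMode: boolean,
  realCwd: string,
): SearchScope {
  if (project === undefined || project === "current") {
    return { kind: "current", projectDir: realCwd };
  }
  if (project === "global") {
    return { kind: "global" };
  }
  // Anything else is treated as an absolute project path.
  return sharedMode
    ? { kind: "rowFilter", projectDir: project }
    : { kind: "absPathPerProject", projectDir: project };
}

// Exhaustive switch: adding a variant without handling it fails to compile.
function describeScope(scope: SearchScope): string {
  switch (scope.kind) {
    case "global":
      return "fan-out across all project DBs";
    case "absPathPerProject":
      return `read-only open of DBs for ${scope.projectDir}`;
    case "rowFilter":
      return `row filter on ${scope.projectDir}`;
    case "current":
      return `current project at ${scope.projectDir}`;
    default: {
      const unreachable: never = scope;
      return unreachable;
    }
  }
}
```

Keeping the planner pure (inputs in, tagged value out) is what makes it unit-testable without touching any database.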

- project: current | global | <abs-path> table + examples; note the model should stay on current unless the user explicitly asks for global.
- document the per-result attribution header (proj:/sess:) and the "Global search INCOMPLETE" coverage notice.
- new "Search environment variables" section documenting CONTEXT_MODE_GLOBAL_FANOUT_MAX (default 1024, behaviour, fallback).

Refs mksglu#737

…t, diversity cap

Cross-project (project:"global") recall surfaced README/tool-call noise but NOT the human's actual question/decision, because session_events search matched the ENTIRE query as one LIKE substring and ranked all categories equally. Validated end-to-end: a real `pi -p` agent now answers "how did we customize pi-statusline" with the correct project/session/date + the model.provider work.

New src/search/event-query.ts (shared, unit-tested):

- tokenizeSearchQuery: whitespace split, trim non-[A-Za-z0-9%_/@.-] edges while PRESERVING %/_ LIKE-wildcards, drop English stopwords, keep len>=2 (incl "pi"), dedupe, cap 8, [] for empty/all-stopword.
- escapeLike (\ % _), tokenWeight (distinctiveness: package/digit tokens outrank generic words; bare 2-char tokens stay low so "pi" can't dominate).
- buildEventMatch: per-term weighted OR-LIKE; relevance score = (category boost) * term-sum + 100*exact-phrase. COALESCE(data,'')/COALESCE(category,'') guards 3-valued-logic NULL poisoning (defensive across arbitrary fan-out DBs).
- categoryBoostSql: x3 for high-signal categories (role/user-prompt/decision/plan), x1 otherwise; boost-only, never penalises file/data recall.

src/session/db.ts searchEvents:

- tokenized dynamic SQL; new orderMode ("timeline" default = chronological, preserving the documented contract + tests; "relevance" = score DESC, created_at DESC, id ASC). Relevance fallback (0 terms) orders by recency, NOT a bare-integer scoreExpr (SQLite would read it as a column index).

src/search/global-fanout.ts readonlySearchEvents + merge:

- same tokenized matcher + orderMode (read-only opens, closed in finally).
- relevance sort: RRF score DESC, origin as an equal-score TIE-break only (prior-session > content), RRF-safe.
- HARD session diversity cap (max(3, floor(limit/4)) per sessionId, drop surplus, no backfill) so one chatty session (e.g. the current one, flooding the KB with the topic being worked on) can't monopolise the window; content/auto-memory (no sessionId) uncapped; non-positive limit guarded.

src/search/unified.ts threads sort -> orderMode for current/local recall.

Tests: new event-query.test.ts; new global-fanout cases (multi-term recall, RRF source fairness, 0-term fallback recency order, category boost role>data, session diversity cap).

Reviewed across multiple rounds (oracle + reviewer + an external model). FTS5 on session_events remains the tracked follow-up.

Refs mksglu#737
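
The session diversity cap can be illustrated with the sketch below, which keeps at most max(3, floor(limit/4)) hits per sessionId from an already ranked list and leaves hits without a sessionId uncapped. The function name, the hit shape, and the handling of a non-positive limit (returning an empty list) are assumptions; the commit message only says that case is guarded. Where truncation to `limit` happens relative to the cap is not stated, so the sketch does not truncate.

```typescript
// Hypothetical helper: drop per-session surplus from a ranked hit list.
function applySessionDiversityCap<T extends { sessionId?: string }>(
  ranked: T[],
  limit: number,
): T[] {
  // Assumption: a non-positive or non-finite limit means "nothing requested".
  if (!Number.isFinite(limit) || limit <= 0) return [];

  const perSession = Math.max(3, Math.floor(limit / 4));
  const seen = new Map<string, number>();
  const kept: T[] = [];

  for (const hit of ranked) {
    if (hit.sessionId !== undefined) {
      const count = seen.get(hit.sessionId) ?? 0;
      if (count >= perSession) continue; // surplus is dropped, never replaced
      seen.set(hit.sessionId, count + 1);
    }
    // Content / auto-memory hits carry no sessionId and are never capped.
    kept.push(hit);
  }
  return kept;
}

const rankedHits = [
  { id: "e1", sessionId: "s1" },
  { id: "e2", sessionId: "s1" },
  { id: "e3", sessionId: "s1" },
  { id: "e4", sessionId: "s1" },
  { id: "e5", sessionId: "s1" },
  { id: "e6", sessionId: "s2" },
  { id: "c1" }, // content hit, no sessionId
];

// limit 8 -> cap of max(3, 2) = 3 per session: e4 and e5 are dropped.
const cappedIds = applySessionDiversityCap(rankedHits, 8).map((h) => h.id).join(",");
```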

…idance

CI (test job, ubuntu + macos) caught a regression the targeted local runs missed: the mksglu#737 empty-results guidance was reworded from the old "No results found / After indexing" phrasing to point users at project:"global". The mksglu#442 Read-deny-policy test asserted the OLD wording via `searchedEmpty`, so it failed with "expected false to be true" at server.test.ts:1509.

Teach the assertion to also accept the new "no indexed content" phrasing, and add the explicit exfil pin the test comment already described: the denied secret marker must never appear in search output. Full tests/core/server.test.ts now green (475 passed).

Refs mksglu#737

…l-content dedupe key

Two MAJORs found in fresh review of the global-search paths (both NEW in this PR), confirmed across independent review passes:

1. **contentType/source filtering leaked across sources** (src/search/unified.ts searchAllSources):
   - contentType was only applied to the content store, but SessionDB + auto-memory were still queried → `contentType:"code"` wrongly returned session events. Now both are gated behind `if (!contentType)`, matching the policy global-fanout already used (session/auto-memory have no code/prose classification).
   - auto-memory results are now source-filtered (were not), mirroring global-fanout.
   - session-event results now carry their CATEGORY as `source` (was the literal "prior-session") in BOTH readonlySearchEvents and the unified single-DB mapper, so the documented `source:"decision"` (etc.) filters session memory by category in single-DB AND global. `origin` ("prior-session"/"current-session") is unchanged; attribution display uses origin separately.

2. **Global RRF dedupe fused distinct cross-project hits** (src/search/global-fanout.ts): itemKey was `title + content.slice(0,80)`, so two distinct documents sharing a title + 80-char boilerplate prefix (license headers, decision preambles) collided: one project was silently dropped and the other's RRF score inflated. Key is now `title | content.length | fnv1a32(full content)` (new dependency-free FNV-1a helper); project/source stay excluded so genuinely identical content across DBs still fuses as intended.

Tests: new regressions for distinct-prefix non-fusion + identical fusion, contentType excluding session/auto-memory, auto-memory source filtering, and category-as-source filtering (single-DB + global). All targeted suites green (incl. server.test.ts) + typecheck + build.

Refs mksglu#737
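
The new dedupe key can be sketched as below. The key layout `title | content.length | fnv1a32(full content)` comes from the commit message; the helper bodies are an illustration, not the PR's code. Two assumptions to note: the sketch hashes UTF-16 code units via charCodeAt (identical to byte-wise FNV-1a for ASCII input only), and it renders the hash in hex with "|" separators.

```typescript
// 32-bit FNV-1a: XOR each input unit into the hash, then multiply by the FNV prime.
function fnv1a32(text: string): number {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime 16777619, 32-bit multiply
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Hypothetical key builder; project and source are deliberately left out so
// identical content from different DBs still maps to the same key.
function itemKey(title: string, content: string): string {
  return `${title}|${content.length}|${fnv1a32(content).toString(16)}`;
}

// Two documents sharing a title and an 80+ character boilerplate prefix.
const boilerplate = "Licensed under the MIT License. ".repeat(4);
const keyA = itemKey("LICENSE", boilerplate + "Project Alpha notes.");
const keyB = itemKey("LICENSE", boilerplate + "Project Bravo notes.");
const keyA2 = itemKey("LICENSE", boilerplate + "Project Alpha notes.");
```

With the old `title + content.slice(0,80)` key, keyA and keyB would have been identical; hashing the full content separates them while keyA and keyA2 still fuse.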

jetnet added 4 commits June 1, 2026 19:30

mksglu changed the base branch from main to next June 1, 2026 18:30

jetnet added 3 commits June 1, 2026 23:57

mksglu marked this pull request as draft June 3, 2026 06:22

jetnet wants to merge 7 commits into mksglu:next from jetnet:feat/737-global-search
