feat(737): global / local ctx_search scope - project: current | global | <abs-path> #758
Draft
jetnet wants to merge 7 commits into
added 4 commits
June 1, 2026 19:30
…emory
Decouple the row-level project filter from DB-file selection so ctx_search
gains a real cross-project ("no filter") mode, and fix relevance mode to
search all three sources.
- db.ts: searchEvents accepts projectDir: string | null (null = no filter,
new prepared statement); add canonicalContentDbPath/canonicalSessionDbPath
helpers (hash path, no legacy rename) for read-only search paths.
- auto-memory.ts: searchAutoMemory accepts null -> union across all per-project
memory hash dirs; adapter base-dir detection handles hashed vs non-hashed.
- unified.ts: thread projectScope to session events + auto-memory and run them
in BOTH relevance and timeline (Bug 2); round-robin interleave so session/
auto-memory survive truncation; carry project/sessionId attribution fields.
- ctx-search-schema.ts: expose `project` in per-project mode too; resolver
resolves current to real cwd; steer the model to stay on current unless
the user explicitly asks for global.
- store.ts: export escapeLikeSource so fan-out reuses identical LIKE escaping.
Refs mksglu#737
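A minimal sketch of the round-robin interleave mentioned for unified.ts, assuming each source (content store, session events, auto-memory) has already returned its own ranked list. The helper name `interleaveRoundRobin` is hypothetical and not the PR's actual function. It takes the i-th hit from every source in turn, so a long content list cannot push the other two sources past the result limit.

```typescript
// Hypothetical helper: merge per-source ranked lists round-robin, then stop at `limit`.
// Sources that run out of hits are simply skipped on later rounds.
function interleaveRoundRobin<T>(lists: T[][], limit: number): T[] {
  const out: T[] = [];
  if (limit <= 0) return out;
  const longest = Math.max(0, ...lists.map((l) => l.length));
  for (let i = 0; i < longest && out.length < limit; i++) {
    for (const list of lists) {
      if (i < list.length) {
        out.push(list[i]);
        if (out.length >= limit) break;
      }
    }
  }
  return out;
}

// Usage: content has 4 hits, session 1, auto-memory 2; keep 4 results.
const contentHits = ["c1", "c2", "c3", "c4"];
const sessionHits = ["s1"];
const memoryHits = ["m1", "m2"];
console.log(interleaveRoundRobin([contentHits, sessionHits, memoryHits], 4));
// → [ 'c1', 's1', 'm1', 'c2' ]
```

With plain concatenation and truncation, the same input would return only `c1..c4` and lose both session and auto-memory recall.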
…ion, bounded)
New src/search/global-fanout.ts implements true cross-project recall:
- listProjectDbs: enumerate every sessions/*.db + content/*.db, pair by project
hash, cap the TOTAL file count and report coverage {totalAvailable, opened,
cap, truncated} so callers can warn on incomplete fan-out.
- Read-only readers (readonlySearchContent FTS5 MATCH + readonlySearchEvents
LIKE): { readonly: true } opens, no WAL pragmas, no schema init, no
corruption repair/migration; every handle closed in finally.
- RRF merge across DBs (k=60), dedupe key excludes per-DB source so identical
cross-DB content fuses; relevance = RRF, timeline = chronological.
- searchAbsPathProject: per-project abs-path scope opens that project's hashed
DBs read-only (incl. worktree-suffixed session DBs), interleaved.
- Attribution: session hits carry project_dir + session_id from the row;
content hits resolve project_dir via the sibling session DB (hash map).
- Default fan-out cap raised 64 -> 1024 (the old default silently dropped most
projects on real 150+ DB installs); strict env parse for
CONTEXT_MODE_GLOBAL_FANOUT_MAX.
- source/contentType filters applied consistently across all sources.
Refs mksglu#737
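A minimal sketch of the total-file cap, its coverage record, and a strict parse of `CONTEXT_MODE_GLOBAL_FANOUT_MAX`, assuming "strict" means only a plain positive integer is accepted and anything else falls back to the default of 1024. The helper names (`parseFanoutMax`, `capDbFiles`, `coverageNotice`) and the exact notice wording are hypothetical, and the sketch caps a flat file list rather than pairing sessions/content DBs by project hash.

```typescript
const DEFAULT_FANOUT_MAX = 1024;

interface FanoutCoverage {
  totalAvailable: number;
  opened: number;
  cap: number;
  truncated: boolean;
}

// Strict parse: Number.parseInt("12abc") would yield 12, so require digits only.
function parseFanoutMax(raw: string | undefined): number {
  if (raw === undefined) return DEFAULT_FANOUT_MAX;
  const trimmed = raw.trim();
  if (!/^[0-9]+$/.test(trimmed)) return DEFAULT_FANOUT_MAX;
  const n = Number(trimmed);
  return Number.isSafeInteger(n) && n > 0 ? n : DEFAULT_FANOUT_MAX;
}

// Cap the TOTAL number of DB files and report what was dropped.
function capDbFiles(files: string[], cap: number): { files: string[]; coverage: FanoutCoverage } {
  const opened = Math.min(files.length, cap);
  return {
    files: files.slice(0, opened),
    coverage: { totalAvailable: files.length, opened, cap, truncated: files.length > cap },
  };
}

// Visible warning so a partial fan-out is never silent; null when coverage is complete.
function coverageNotice(c: FanoutCoverage): string | null {
  if (!c.truncated) return null;
  return (
    `Global search INCOMPLETE: searched ${c.opened} of ${c.totalAvailable} DB files. ` +
    `Raise CONTEXT_MODE_GLOBAL_FANOUT_MAX to at least ${c.totalAvailable} for full coverage.`
  );
}

// Usage: three DB files with the cap set to 2.
const { coverage } = capDbFiles(["a.db", "b.db", "c.db"], parseFanoutMax("2"));
console.log(coverageNotice(coverage));
```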
…overage, limits
Wire the new capability into the ctx_search handler via extracted, unit-tested pure helpers (src/search/ctx-search-plan.ts).
- planCtxSearchScope: discriminated scope (global | absPathPerProject | rowFilter | current) honouring shared vs per-project mode.
- getCurrentWorkingProject(): real cwd (not the pinned CONTEXT_MODE_PROJECT_DIR) so shared-mode "current" filters by the right path.
- Global / abs-path branches NEVER open a writable store (no getStore(), no migration); content dir resolved via the non-mutating storage helper.
- shouldReturnEmptyGuidance: post-search, only when content is empty AND there are no hits, so auto-memory-only recall is not masked; the empty message now points to project:"global".
- Coverage notice: prepend a visible "Global search INCOMPLETE" warning (also on the no-results path) when the fan-out cap drops DBs, so results are never silently partial.
- effectiveSearchLimit: breadth scopes (global/abs-path) return 10-12 results (5 when throttled) so specific hits are not cut off behind common-term noise; single-DB stays at 1-2.
- Result header attribution: proj:<name> + sess:<id>; orphan content in breadth scope is labelled proj:(unattributed) so the model can't borrow a wrong project.
- extractSnippet: forward-biased window (200 back / 520 fwd) plus a larger budget so detail after a heading match is captured; hard 40KB output cap enforced incrementally with footer headroom.
Refs mksglu#737
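A minimal sketch of a discriminated scope plan in the spirit of planCtxSearchScope, assuming the four kinds listed above. The function name `planScope`, its field names, and the mapping of `"global"` in shared mode to a row filter with no project (`projectDir: null`, matching the `searchEvents(null)` "no filter" mode) are assumptions, not the PR's code. The absolute-path check is POSIX-only for brevity.

```typescript
type StorageMode = "shared" | "perProject";

type ScopePlan =
  | { kind: "global" }                                 // per-project: fan out over every DB
  | { kind: "absPathPerProject"; projectDir: string } // per-project: open that project's DBs read-only
  | { kind: "rowFilter"; projectDir: string | null }  // shared: one DB, filter rows (null = no filter)
  | { kind: "current"; projectDir: string };          // real cwd, not a pinned project dir

function planScope(project: string | undefined, mode: StorageMode, cwd: string): ScopePlan {
  const p = (project ?? "current").trim();
  if (p === "" || p === "current") return { kind: "current", projectDir: cwd };
  if (p === "global") {
    // Assumption: shared mode already holds every project in one DB, so "global" = no row filter.
    return mode === "perProject" ? { kind: "global" } : { kind: "rowFilter", projectDir: null };
  }
  if (p.startsWith("/")) {
    return mode === "perProject"
      ? { kind: "absPathPerProject", projectDir: p }
      : { kind: "rowFilter", projectDir: p };
  }
  throw new Error(`project must be "current", "global", or an absolute path; got ${JSON.stringify(p)}`);
}

// Usage: callers switch on `kind`, and the compiler checks every branch is handled.
const plan = planScope("global", "perProject", "/work/app");
console.log(plan.kind);
```

The discriminated union keeps the "never open a writable store" rule easy to enforce: only the branches that are allowed to touch the writable store need access to it.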
- project: current | global | <abs-path> table + examples; note the model should stay on current unless the user explicitly asks for global.
- document the per-result attribution header (proj:/sess:) and the "Global search INCOMPLETE" coverage notice.
- new "Search environment variables" section documenting CONTEXT_MODE_GLOBAL_FANOUT_MAX (default 1024, behaviour, fallback).
Refs mksglu#737
added 3 commits
June 1, 2026 23:57
…t, diversity cap
Cross-project (project:"global") recall surfaced README/tool-call noise but
NOT the human's actual question/decision, because session_events search matched
the ENTIRE query as one LIKE substring and ranked all categories equally.
Validated end-to-end: a real `pi -p` agent now answers "how did we customize
pi-statusline" with the correct project/session/date + the model.provider work.
New src/search/event-query.ts (shared, unit-tested):
- tokenizeSearchQuery: whitespace split, trim non-[A-Za-z0-9%_/@.-] edges while
PRESERVING %/_ LIKE-wildcards, drop English stopwords, keep len>=2 (incl "pi"),
dedupe, cap 8, [] for empty/all-stopword.
- escapeLike (\ % _), tokenWeight (distinctiveness: package/digit tokens outrank
generic words; bare 2-char tokens stay low so "pi" can't dominate).
- buildEventMatch: per-term weighted OR-LIKE; relevance score = (category boost)
* term-sum + 100*exact-phrase. COALESCE(data,'')/COALESCE(category,'') guards
3-valued-logic NULL poisoning (defensive across arbitrary fan-out DBs).
- categoryBoostSql: x3 for high-signal categories (role/user-prompt/decision/
plan), x1 otherwise — boost-only, never penalises file/data recall.
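A minimal sketch of the tokenizer and LIKE escaping described above, assuming a small illustrative stopword list (the real list in event-query.ts is not shown) and case-insensitive dedupe via lowercasing, which is also an assumption. Note that the escaping only takes effect if the SQL uses `LIKE ? ESCAPE '\'`; SQLite's LIKE has no escape character by default.

```typescript
// Illustrative stopword subset; not the PR's list.
const STOPWORDS = new Set(["a", "an", "and", "the", "of", "to", "in", "on", "for", "how", "did", "we", "is", "it", "what", "with"]);

// Trim characters outside [A-Za-z0-9%_/@.-] from both edges; % and _ survive.
const EDGE = /^[^A-Za-z0-9%_\/@.\-]+|[^A-Za-z0-9%_\/@.\-]+$/g;

function tokenizeSearchQuery(query: string, maxTerms = 8): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const raw of query.split(/\s+/)) {
    const tok = raw.replace(EDGE, "").toLowerCase();
    // len >= 2 keeps short but meaningful tokens such as "pi".
    if (tok.length < 2 || STOPWORDS.has(tok) || seen.has(tok)) continue;
    seen.add(tok);
    out.push(tok);
    if (out.length >= maxTerms) break;
  }
  return out; // [] for empty or all-stopword input
}

// Escape \ first-class along with the LIKE wildcards so they match literally.
function escapeLike(term: string): string {
  return term.replace(/[\\%_]/g, (ch) => "\\" + ch);
}

console.log(tokenizeSearchQuery("How did we customize pi-statusline?"));
// → [ 'customize', 'pi-statusline' ]
```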
src/session/db.ts searchEvents:
- tokenized dynamic SQL; new orderMode ("timeline" default = chronological,
preserving the documented contract + tests; "relevance" = score DESC,
created_at DESC, id ASC). Relevance fallback (0 terms) orders by recency, NOT
a bare-integer scoreExpr (SQLite would read it as a column index).
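A minimal sketch of the tokenized SQL builder and the two order modes, assuming a `session_events` table with `id`, `category`, `data`, and `created_at` columns, ascending `created_at` for "chronological" timeline order, hypothetical numeric weights (the commit only describes tokenWeight qualitatively), and no match filter when there are zero usable terms. The name `buildEventSearch` is hypothetical. Positional `?` placeholders are pushed in the order they appear in the SQL text.

```typescript
type OrderMode = "timeline" | "relevance";

function escapeLike(term: string): string {
  return term.replace(/[\\%_]/g, (ch) => "\\" + ch);
}

// Hypothetical weights: distinctive tokens outrank generic words; 2-char tokens stay low.
function tokenWeight(term: string): number {
  if (/[0-9\/.@-]/.test(term)) return 3; // package-like or numeric tokens
  if (term.length >= 4) return 2;        // ordinary words
  return 1;                              // short tokens such as "pi"
}

// COALESCE stops a NULL column from turning the CASE/OR expressions into NULL.
const DATA_LIKE = "COALESCE(data,'') LIKE ? ESCAPE '\\'";
const CATEGORY_BOOST =
  "CASE WHEN COALESCE(category,'') IN ('role','user-prompt','decision','plan') THEN 3 ELSE 1 END";

function buildEventSearch(
  terms: string[],
  phrase: string,
  orderMode: OrderMode,
  limit: number,
): { sql: string; params: (string | number)[] } {
  const params: (string | number)[] = [];
  const pattern = (t: string) => `%${escapeLike(t)}%`;
  let select = "SELECT id, category, data, created_at";
  let where = "";
  let orderBy = "created_at ASC, id ASC"; // timeline: chronological

  if (terms.length > 0) {
    const termSum = terms
      .map((t) => { params.push(pattern(t)); return `(CASE WHEN ${DATA_LIKE} THEN ${tokenWeight(t)} ELSE 0 END)`; })
      .join(" + ");
    params.push(pattern(phrase)); // exact-phrase bonus
    select += `, (${CATEGORY_BOOST}) * (${termSum}) + 100 * (CASE WHEN ${DATA_LIKE} THEN 1 ELSE 0 END) AS score`;
    where = " WHERE " + terms.map((t) => { params.push(pattern(t)); return `(${DATA_LIKE})`; }).join(" OR ");
    if (orderMode === "relevance") orderBy = "score DESC, created_at DESC, id ASC";
  } else if (orderMode === "relevance") {
    // No terms: fall back to recency; a bare integer in ORDER BY would be read as a column index.
    orderBy = "created_at DESC, id ASC";
  }
  params.push(Math.max(1, Math.floor(limit)));
  return { sql: `${select} FROM session_events${where} ORDER BY ${orderBy} LIMIT ?`, params };
}
```

The returned `sql`/`params` pair is meant for a prepared statement; the boost multiplies the term sum only, so low-signal categories are never pushed below zero.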
src/search/global-fanout.ts readonlySearchEvents + merge:
- same tokenized matcher + orderMode (read-only opens, closed in finally).
- relevance sort: RRF score DESC, origin as an equal-score TIE-break only
(prior-session > content), RRF-safe.
- HARD session diversity cap (max(3, floor(limit/4)) per sessionId, drop surplus,
no backfill) so one chatty session (e.g. the current one, flooding the KB with
the topic being worked on) can't monopolise the window; content/auto-memory
(no sessionId) uncapped; non-positive limit guarded.
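A minimal sketch of the hard per-session diversity cap, assuming the input is already sorted by RRF score and that a non-positive limit yields an empty result (the guard's exact behaviour is not described). The name `applySessionDiversityCap` is hypothetical. Surplus hits from a capped session are skipped and never re-added, while hits without a `sessionId` (content, auto-memory) are not counted.

```typescript
function applySessionDiversityCap<T extends { sessionId?: string }>(ranked: T[], limit: number): T[] {
  if (!Number.isFinite(limit) || limit <= 0) return []; // non-positive limit guard
  const perSession = Math.max(3, Math.floor(limit / 4));
  const counts = new Map<string, number>();
  const out: T[] = [];
  for (const hit of ranked) {
    if (out.length >= limit) break;
    const sid = hit.sessionId;
    if (sid !== undefined) {
      const n = counts.get(sid) ?? 0;
      if (n >= perSession) continue; // surplus from a chatty session: dropped, no backfill
      counts.set(sid, n + 1);
    }
    out.push(hit);
  }
  return out;
}

// Usage: one session floods the top of the ranking.
const ranked = [
  ...Array.from({ length: 10 }, (_, i) => ({ id: `a${i}`, sessionId: "a" })),
  { id: "b0", sessionId: "b" },
  { id: "c0" }, // content hit, no sessionId
];
console.log(applySessionDiversityCap(ranked, 8).map((h) => h.id));
// → [ 'a0', 'a1', 'a2', 'b0', 'c0' ]
```

With limit 8 the cap is max(3, 2) = 3, so session "a" keeps only its top three hits even though the window is not full.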
src/search/unified.ts threads sort -> orderMode for current/local recall.
Tests: new event-query.test.ts; new global-fanout cases (multi-term recall,
RRF source fairness, 0-term fallback recency order, category boost role>data,
session diversity cap). Reviewed across multiple rounds (oracle + reviewer + an
external model). FTS5 on session_events remains the tracked follow-up.
Refs mksglu#737
…idance
CI (test job, ubuntu + macos) caught a regression the targeted local runs missed: the mksglu#737 empty-results guidance was reworded from the old "No results found / After indexing" phrasing to point users at project:"global". The mksglu#442 Read-deny-policy test asserted the OLD wording via `searchedEmpty`, so it failed with "expected false to be true" at server.test.ts:1509.
Teach the assertion to also accept the new "no indexed content" phrasing, and add the explicit exfil pin the test comment already described: the denied secret marker must never appear in search output.
Full tests/core/server.test.ts now green (475 passed).
Refs mksglu#737
…l-content dedupe key
Two MAJORs found in fresh review of the global-search paths (both NEW in this
PR), confirmed across independent review passes:
1. **contentType/source filtering leaked across sources** (src/search/unified.ts
searchAllSources):
- contentType was only applied to the content store, but SessionDB +
auto-memory were still queried → `contentType:"code"` wrongly returned
session events. Now both are gated behind `if (!contentType)`, matching the
policy global-fanout already used (session/auto-memory have no code/prose
classification).
- auto-memory results are now source-filtered (were not), mirroring
global-fanout.
- session-event results now carry their CATEGORY as `source` (was the literal
"prior-session") in BOTH readonlySearchEvents and the unified single-DB
mapper, so the documented `source:"decision"` (etc.) filters session memory
by category in single-DB AND global. `origin` ("prior-session"/
"current-session") is unchanged; attribution display uses origin separately.
2. **Global RRF dedupe fused distinct cross-project hits** (src/search/
global-fanout.ts): itemKey was `title + content.slice(0,80)`, so two distinct
documents sharing a title + 80-char boilerplate prefix (license headers,
decision preambles) collided — one project silently dropped, the other's RRF
score inflated. Key is now `title | content.length | fnv1a32(full content)`
(new dependency-free FNV-1a helper); project/source stay excluded so genuinely
identical content across DBs still fuses as intended.
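A minimal sketch of an FNV-1a-based dedupe key feeding an RRF merge (k=60, 1-based ranks), assuming a simplified hit shape. The `|` separator, hex encoding, and the `Doc` type are assumptions; this `fnv1a32` hashes UTF-16 code units, which matches standard byte-wise FNV-1a only for ASCII input. The origin tie-break (prior-session before content) follows the earlier commit.

```typescript
interface Doc { title: string; content: string; origin: string; project?: string }

// 32-bit FNV-1a: offset basis 0x811c9dc5, prime 0x01000193.
function fnv1a32(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// project/source deliberately excluded so identical content across DBs still fuses.
function itemKey(d: Doc): string {
  return `${d.title}|${d.content.length}|${fnv1a32(d.content).toString(16)}`;
}

const ORIGIN_RANK: Record<string, number> = { "prior-session": 0, content: 1 };

// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) to an item's score.
function rrfMerge(lists: Doc[][], k = 60): Doc[] {
  const fused = new Map<string, { doc: Doc; score: number }>();
  for (const list of lists) {
    list.forEach((doc, i) => {
      const key = itemKey(doc);
      const entry = fused.get(key) ?? { doc, score: 0 };
      entry.score += 1 / (k + i + 1);
      fused.set(key, entry);
    });
  }
  return [...fused.values()]
    .sort((a, b) => b.score - a.score ||
      (ORIGIN_RANK[a.doc.origin] ?? 2) - (ORIGIN_RANK[b.doc.origin] ?? 2))
    .map((e) => e.doc);
}
```

Two documents that share a title and an 80-character boilerplate prefix now get different keys unless their full content hashes collide, while byte-identical content from different projects still adds its RRF scores together.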
Tests: new regressions for distinct-prefix non-fusion + identical fusion,
contentType excluding session/auto-memory, auto-memory source filtering, and
category-as-source filtering (single-DB + global). All targeted suites green
(incl. server.test.ts) + typecheck + build.
Refs mksglu#737
Summary
Implements the maintainer-approved design for #737: a real `project:` scope on `ctx_search` delivering both project-local and true cross-project (global) recall, with zero migration.
What was broken (all confirmed in the issue thread)
1. `project: "global"` reached only 1 of 3 sources (ContentStore); session events + auto-memory ignored the scope.
2. `sort: "relevance"` skipped session memory entirely (only timeline searched it).
3. `current`/default filtered by the pinned dir in shared mode, not the real cwd → silent 0 results.
4. `global` was meaningless without fan-out across DB files.
What this PR does
- `getCurrentWorkingProject()` (real cwd) so `current` is correct in shared mode.
- `sort` controls ordering only.
- `src/search/global-fanout.ts`: unions every `sessions/*.db` + `content/*.db`, merges with RRF (k=60), read-only opens (no WAL pragmas / schema init / migration / corruption repair), bounded by `CONTEXT_MODE_GLOBAL_FANOUT_MAX` (default 1024).
- `<abs-path>` scope: shared mode → row filter; per-project mode → open that project's hashed DBs read-only (incl. worktree-suffixed session DBs).
- `proj:<name>` + `sess:<id>`; orphan content in global is labelled `proj:(unattributed)`.
- `searched X of Y` and the exact value to raise; never silently partial.
Backwards compatibility
Zero migration. Per-project users are unaffected unless they pass `project: "global"`; shared-DB users keep working. The one intentional default-behaviour change is the relevance-mode source-set fix (#2 above).
Tests
New `tests/core/global-fanout.test.ts` and `tests/core/ctx-search-plan.test.ts`; extended `search-project-filter`, `search`, and `auto-memory-adapter` suites. Targeted suites green (250+), `tsc --noEmit` clean, `npm run build` (assert-bundle + asymmetric-drift) green. Validated end-to-end against a real ~170-project install (read-only confirmed: 0 DB files mutated).
Notes for reviewers
prepublishOnly).
`CONTEXT_MODE_GLOBAL_FANOUT_MAX` is documented in README ("Search environment variables").
Closes #737