Add low-latency raw memory search mode by ProtonsAndElectrons · Pull Request #195 · XortexAI/XMem

ProtonsAndElectrons · 2026-05-21T18:33:21Z

Closes #163.

Summary

move /v1/memory/search onto a low-latency raw retrieval path that skips LLM tool selection and synthesis by default
include ranked profile, temporal, summary, snippet, and optional code-domain hits
add answer=true to synthesize from already-retrieved raw hits when callers need the slower answer path
cache profile catalogs and repeated retrieval plans for the agentic /retrieve path
report in-process p50/p95/p99 latency stats per retrieval mode

Validation

\.venv\Scripts\python.exe -m pytest tests/integration/test_retrieval_pipeline.py tests/api/test_dependencies_and_routes.py tests/unit/test_schemas.py
\.venv\Scripts\python.exe -m ruff check src/pipelines/retrieval.py src/api/schemas.py src/api/routes/memory.py tests/integration/test_retrieval_pipeline.py
\.venv\Scripts\python.exe -m black --target-version py311 --check src/pipelines/retrieval.py src/api/schemas.py src/api/routes/memory.py tests/integration/test_retrieval_pipeline.py
git diff --check

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f0a4ce247c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-21T18:37:37Z

+        cached = self._profile_catalog_cache.get(user_id)
+        if cached and now - cached[0] <= PROFILE_CATALOG_CACHE_TTL_SECONDS:
+            return cached[1], cached[2]


Invalidate profile catalog cache on fresh memory writes

This cache returns previously fetched profile records for up to 60 seconds without any ingest-triggered invalidation, so a user who ingests new profile facts and immediately calls /v1/memory/retrieve or /v1/memory/search can receive stale results that omit the just-written memory. Because _search_profile and raw profile search both reuse these cached records, this directly regresses read-after-write correctness for normal product flows.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-21T18:37:37Z

+                for entry in profile_catalog
+            )
+        )
+        return (user_id, query.strip().lower(), top_k, catalog_key)


Expand retrieval-plan cache key to reflect memory changes

The retrieval-plan cache key only includes user_id, normalized query, top_k, and profile catalog topics/subtopics, so repeated queries within TTL will reuse old tool calls even after non-profile memory changes (e.g., new temporal/summary/snippet data). In that case run() skips tool-selection LLM and can continue querying the wrong domains, producing stale or incomplete answers until the cache expires.

Useful? React with 👍 / 👎.

gemini-code-assist

Code Review

This pull request enhances the memory search functionality by introducing a "code" domain and optional LLM-synthesized answers. Key changes include refactoring the RetrievalPipeline to support raw_search and synthesize_answer methods, implementing caching for profile catalogs and retrieval plans, and adding a latency tracking system with Prometheus metrics. Feedback suggests addressing a potential memory leak in the profile cache, improving encapsulation by avoiding protected method calls, and ensuring that search results are truncated to respect top_k limits and LLM context window constraints.

gemini-code-assist · 2026-05-21T18:38:31Z

+        self._profile_catalog_cache: Dict[
+            str, Tuple[float, List[Dict[str, str]], List[Any]]
+        ] = {}


The _profile_catalog_cache is a dictionary keyed by user_id that grows indefinitely as new users perform searches. Since RetrievalPipeline is a long-lived singleton, this could lead to a memory leak in a production environment with many users. Consider using an LRU cache or setting a maximum size for this dictionary.

gemini-code-assist · 2026-05-21T18:38:31Z

+            code_results = await _search_code(
+                query=req.query,
+                user_id=user_id,
+                org_id=req.org_id,
+                repo=req.repo or "",
+                top_k=req.top_k,
+            )


The _search_code function directly calls the protected method _execute_tool on the CodeRetrievalPipeline. It is generally better to expose a public method for raw search in the pipeline class to maintain encapsulation, similar to how RetrievalPipeline has a raw_search method.

gemini-code-assist · 2026-05-21T18:38:31Z

+
+        if req.answer:
+            answer_start = time.perf_counter()
+            answer = await pipeline.synthesize_answer(req.query, all_results)


When req.answer is true, the all_results list is passed to synthesize_answer without any truncation. If the combined results from memory and code domains are large, this could exceed the LLM's context window. Consider truncating all_results to the top k hits before synthesis.

References

Ensure that context provided to LLMs is within reasonable token limits to avoid truncation or API errors.

gemini-code-assist · 2026-05-21T18:38:31Z

+            memory_results, raw_latency = await pipeline.raw_search(
+                query=req.query,
+                user_id=user_id,
+                domains=memory_domains,
+                top_k=req.top_k,
+            )


The raw_search method returns up to top_k results per domain. With multiple domains selected, the final all_results list can contain significantly more than top_k items. If the API consumer expects at most top_k results total, the final list should be truncated after sorting.

References

Ensure search results respect the requested limit (top_k) across all aggregated domains.

ProtonsAndElectrons · 2026-05-21T18:56:10Z

Follow-up pushed in e555bea:

invalidates retrieval/profile caches after successful memory ingest so raw profile reads and tool plans do not stay stale after writes
bounds the profile catalog cache with LRU eviction and adds per-user memory versions to retrieval-plan cache keys
truncates raw/search results to the requested top_k before optional answer synthesis
adds regression coverage for profile cache invalidation and retrieval-plan cache invalidation

Validation:

./.venv/Scripts/python.exe -m pytest tests/integration/test_retrieval_pipeline.py tests/api/test_dependencies_and_routes.py tests/unit/test_schemas.py
./.venv/Scripts/python.exe -m ruff check src/pipelines/retrieval.py src/api/routes/memory.py tests/integration/test_retrieval_pipeline.py src/api/schemas.py
./.venv/Scripts/python.exe -m black --target-version py311 --check src/pipelines/retrieval.py src/api/routes/memory.py tests/integration/test_retrieval_pipeline.py
git diff --check

GitHub currently reports 1 check on the new head commit, with 0 failed and 0 pending.

ProtonsAndElectrons · 2026-05-21T19:21:42Z

Followed up on the remaining review note about the memory route calling the protected code retrieval dispatcher directly.

Changes in 8f0591e:

added CodeRetrievalPipeline.raw_search(...) as the public no-LLM code search entry point
updated /v1/memory/search code-domain search to call that public method instead of _execute_tool
added focused coverage for the public wrapper and the route helper

Validation run locally:

.\.venv\Scripts\python.exe -m pytest tests/integration/test_code_retrieval_pipeline.py tests/integration/test_retrieval_pipeline.py tests/api/test_dependencies_and_routes.py tests/unit/test_schemas.py
.\.venv\Scripts\python.exe -m ruff check src/pipelines/code_retrieval.py src/api/routes/memory.py tests/integration/test_code_retrieval_pipeline.py tests/api/test_dependencies_and_routes.py
.\.venv\Scripts\python.exe -m black --target-version py311 --check src/api/routes/memory.py tests/integration/test_code_retrieval_pipeline.py tests/api/test_dependencies_and_routes.py
git diff --check

greptile-apps · 2026-05-23T09:22:34Z

Greptile Summary

This PR moves /v1/memory/search onto a low-latency raw retrieval path that skips LLM tool selection by default, adds optional answer synthesis via answer=true, introduces code-domain search through a parallel search_symbols/search_files fast path with RRF fusion, and adds v2 durable async endpoints for ingest, batch-ingest, and scrape. It also caches the profile catalog and retrieval plans per-user with TTL + version-based invalidation, and records per-mode p50/p95/p99 latency internally (only current_ms is surfaced in the API response).

Raw search path (RetrievalPipeline.raw_search): runs profile, temporal, summary, and snippet fetches concurrently via asyncio.gather with per-domain _safe_search wrappers, caches the profile catalog with LRU+TTL, and invalidates caches on ingest via _invalidate_retrieval_cache.
Code domain (CodeRetrievalPipeline.raw_search): search_symbols and search_files are gathered in parallel; results are fused with _fuse_source_records (RRF + stable identity dedup by qualified_name / file_path).
v2 endpoints: async durable job pattern (enqueue → schedule → poll) for ingest, batch-ingest, and scrape, consistent with any existing v2 infrastructure.

Confidence Score: 5/5

Safe to merge; all changes are additive and the new raw-search path degrades gracefully on per-domain failures without discarding already-retrieved results.

The raw search path, caching logic, and code fusion are all implemented correctly. Graceful degradation is consistent across memory domains, code search, and answer synthesis. The caches use proper LRU eviction and version-based invalidation after writes. No data leakage between users was identified in the cache key design or the new v2 endpoints.

No files require special attention, though src/pipelines/retrieval.py and src/api/schemas.py carry the small behavioural changes noted in the review comments.

Important Files Changed

Filename	Overview
src/pipelines/retrieval.py	Adds raw_search, synthesize_answer, record_latency, profile catalog cache (LRU with 60s TTL), and retrieval plan cache (LRU with version-based invalidation). Core logic is sound; _user_memory_versions dict grows without eviction bound.
src/api/routes/memory.py	Rewrites search_memory onto the low-latency raw path with graceful degradation for code and answer failures, adds v2 async ingest/batch-ingest/scrape endpoints backed by the durable job store, and wires cache invalidation into the ingest path.
src/api/schemas.py	Extends SearchRequest with answer, org_id, repo fields and adds snippet/code to allowed domains; snippet silently added to default domain list changes behavior for callers who omit the domains field.
src/pipelines/code_retrieval.py	Adds raw_search fast path that runs search_symbols and search_files in parallel, plus _fuse_source_records (RRF fusion with dedup); content-string fallback in _source_record_key can merge distinct records with identical content.

Sequence Diagram

sequenceDiagram
    participant Client
    participant SearchRoute as POST /v1/memory/search
    participant RetrievalPipeline
    participant CodePipeline as CodeRetrievalPipeline
    participant VectorStore
    participant Neo4j

    Client->>SearchRoute: "{query, domains, top_k, answer?, org_id?}"
    SearchRoute->>SearchRoute: validate org_id if code in domains

    alt memory domains requested
        SearchRoute->>RetrievalPipeline: raw_search(query, user_id, domains, top_k)
        par profile
            RetrievalPipeline->>VectorStore: search_by_metadata (cache-backed)
        and temporal
            RetrievalPipeline->>Neo4j: search_events_by_embedding
        and summary
            RetrievalPipeline->>VectorStore: search_by_text
        and snippet
            RetrievalPipeline->>VectorStore: search_by_text (sandboxed)
        end
        RetrievalPipeline-->>SearchRoute: (results, latency)
    end

    alt code in domains
        SearchRoute->>CodePipeline: raw_search(query, repo, top_k)
        par search_symbols
            CodePipeline->>VectorStore: symbol search
        and search_files
            CodePipeline->>VectorStore: file search
        end
        CodePipeline-->>SearchRoute: fused SourceRecords (RRF)
    end

    SearchRoute->>SearchRoute: sort + slice to top_k

    alt "answer=true"
        SearchRoute->>RetrievalPipeline: synthesize_answer(query, all_results)
        RetrievalPipeline-->>SearchRoute: answer text
    end

    SearchRoute-->>Client: "SearchResponse {results, answer, confidence, mode, latency}"

_{Reviews (8): Last reviewed commit: "Remove test file changes from raw search..." | Re-trigger Greptile}

ProtonsAndElectrons · 2026-05-23T20:44:31Z

Rebased this branch onto current main to clear the merge conflict from the recent upstream local-setup changes.

Local focused validation on c4249289 is green:

python -m pytest tests/api/test_dependencies_and_routes.py tests/integration/test_retrieval_pipeline.py tests/integration/test_code_retrieval_pipeline.py
git diff --check origin/main...HEAD

The GitHub label check is green; Greptile review is still running on the new head.

ProtonsAndElectrons · 2026-05-23T21:43:18Z

Follow-up pushed in 4b6e838 addressing the latest review notes:

validate missing org_id for code-domain searches before running memory-domain raw search
switch retrieval-plan cache to LRU behavior and keep its key independent of top_k
keep aggregate latency percentiles internal/metrics-only; API latency payload now exposes only the current request timing
adjust focused regression coverage for the public latency payload and top-k-independent plan cache reuse

Validation:

python -m pytest tests/api/test_dependencies_and_routes.py tests/integration/test_retrieval_pipeline.py tests/integration/test_code_retrieval_pipeline.py
python -m ruff check src/pipelines/retrieval.py src/api/routes/memory.py tests/integration/test_retrieval_pipeline.py
python -m black --target-version py311 --check src/pipelines/retrieval.py src/api/routes/memory.py tests/integration/test_retrieval_pipeline.py
git diff --check origin/main...HEAD

ProtonsAndElectrons · 2026-05-23T22:19:05Z

Follow-up pushed in 51ef5e0 based on Greptile's code-domain raw search note:

fuse native search_symbols/search_files raw results with reciprocal-rank scoring
dedupe repeated source identities while keeping the highest-scored record
enforce the requested top_k limit on the final code raw-search result set
added regression coverage for duplicate code hits plus final result limiting

Validation:

python -m pytest tests/integration/test_code_retrieval_pipeline.py tests/api/test_dependencies_and_routes.py
python -m ruff check src/pipelines/code_retrieval.py tests/integration/test_code_retrieval_pipeline.py
python -m black --target-version py311 --check tests/integration/test_code_retrieval_pipeline.py
git diff --check origin/main...HEAD

ProtonsAndElectrons · 2026-05-23T23:05:49Z

Follow-up pushed in 033b4d6 for Greptile's code-search failure note:

catch code-domain raw search failures at the mixed /v1/memory/search route boundary
keep already-fetched memory-domain results instead of returning 500 for the whole request
still record code-search latency for observability
added regression coverage proving a profile + code request returns the profile hit when code search raises

Validation:

python -m pytest tests/api/test_dependencies_and_routes.py -q
python -m pytest tests/api/test_dependencies_and_routes.py tests/integration/test_retrieval_pipeline.py tests/integration/test_code_retrieval_pipeline.py -q
python -m black --target-version py311 --check src/api/routes/memory.py tests/api/test_dependencies_and_routes.py
python -m ruff check src/api/routes/memory.py tests/api/test_dependencies_and_routes.py
git diff --check origin/main...HEAD

ProtonsAndElectrons · 2026-05-23T23:29:49Z

Follow-up pushed in e718d5b for Greptile's answer-synthesis failure note:

catch pipeline.synthesize_answer() failures inside /v1/memory/search
keep the successfully retrieved raw results instead of returning 500
fall back to answer="", confidence=0.0, and mode="raw"
still records answer latency for observability
added regression coverage for answer=True preserving raw results when synthesis fails

Validation:

python -m pytest tests/api/test_dependencies_and_routes.py -q
python -m pytest tests/api/test_dependencies_and_routes.py tests/integration/test_retrieval_pipeline.py tests/integration/test_code_retrieval_pipeline.py -q
python -m black --target-version py311 --check src/api/routes/memory.py tests/api/test_dependencies_and_routes.py
python -m ruff check src/api/routes/memory.py tests/api/test_dependencies_and_routes.py
git diff --check origin/main...HEAD

ProtonsAndElectrons · 2026-05-31T08:08:15Z

Quick note on the Greptile P1 comments: the current head addresses both graceful-degradation paths.

033b4d6 preserves already-fetched memory results when code search fails.
e718d5b preserves raw results and falls back to mode="raw"/empty answer when answer synthesis fails.

I re-ran the targeted coverage locally:

.\.venv\Scripts\python.exe -m pytest tests/api/test_dependencies_and_routes.py tests/integration/test_retrieval_pipeline.py tests/integration/test_code_retrieval_pipeline.py -q -> 16 passed

ProtonsAndElectrons · 2026-05-31T16:27:35Z

Addressed the two remaining raw-search review points in 7f62cc0:

preserve the explicit code source_domain even if upstream metadata already contains that key
let code raw search return successful tool results when the sibling tool fails, instead of dropping all code hits

Validation run locally from the repo venv:

.\.venv\Scripts\python.exe -m pytest tests/api/test_dependencies_and_routes.py tests/integration/test_code_retrieval_pipeline.py
# 12 passed

ved015 · 2026-06-01T12:03:21Z

@ProtonsAndElectrons please resolve the conflicts and remove the changes in test files
Ping me after you are done with the changes i will have a look😁

ProtonsAndElectrons · 2026-06-02T05:56:22Z

Done. I rebased the branch onto current main, resolved the conflicts, and removed the test-file changes from the PR diff. The PR now only changes the production files.

Validation from the repo venv:

.\.venv\Scripts\python.exe -m pytest tests/api/test_dependencies_and_routes.py tests/integration/test_code_retrieval_pipeline.py tests/integration/test_retrieval_pipeline.py
# 8 passed

ProtonsAndElectrons · 2026-06-02T06:32:04Z

@ved015 Thanks for the review. I removed the test file changes in 17d0833; the PR diff now only touches the four source files:

src/api/routes/memory.py
src/api/schemas.py
src/pipelines/code_retrieval.py
src/pipelines/retrieval.py

I also rechecked the branch against current origin/main; it is mergeable from my side and git diff --check origin/main...HEAD is clean. Please have another look when you get a chance.

ishaanxgupta · 2026-06-02T08:08:21Z

Codex suggestion — P1: Raw search can drop relevant non-profile domains after global score sorting

In src/pipelines/retrieval.py, raw_search() mixes profile, temporal, summary, and snippet hits, then sorts everything by record.score and truncates to top_k.

Profile hits come from metadata-only lookup, where the vector-store score is not semantic relevance and is effectively 1.0, so a user with top_k profile records can crowd out all temporal/summary/snippet results even when those are the relevant matches for the query. This will not crash, but it can silently break /v1/memory/search quality and answer=true synthesis by feeding the LLM incomplete context.

Suggestion: avoid comparing metadata-only profile scores against semantic scores directly. Either allocate per-domain slots before merging, down-rank/profile-score-normalize metadata hits, or include profile hits separately while preserving top semantic hits from temporal/summary/snippet.

ishaanxgupta · 2026-06-02T08:08:25Z

Codex suggestion — P1: Cache invalidation is incomplete for the active async v2 ingest flow

The new cache invalidation in src/api/routes/memory.py only runs when writes go through _run_ingest_payload().

The current v2 workflow's normal low-effort path writes via per-domain activities (memory_domain_activity) rather than this helper, so profile/summary/temporal writes can still leave the API retrieval cache stale until TTL expiry. Also, if the Temporal worker and API run in different processes, invalidating the worker's singleton would not clear the API process cache.

Suggestion: move invalidation to a place that runs after every successful memory write path, including v2 per-domain activities, and make the API retrieval cache observe a shared/versioned invalidation signal rather than only a process-local method call.

ishaanxgupta · 2026-06-02T08:41:42Z

Thanks a lot for the work here and for following up on the earlier review comments.

After reviewing the current product/API shape, I do not think we need this PR as-is right now. We already have separate memory flows for raw search and LLM-backed retrieval, and this PR broadens /v1/memory/search quite a bit by adding optional answer synthesis, code-domain search, caching, latency tracking, and extra route/schema changes. Some of those ideas may be useful later, but merging all of them together would change existing behavior more than we want at the moment.

I also still see a couple of correctness risks around result ranking across domains and cache invalidation for async ingest paths, so I would rather not merge this into main right now.

Going to close this PR for now. If we revisit this work, the safest path would be a much smaller scoped PR, for example only adding a missing raw-search domain or only improving retrieval latency with benchmarks and focused tests.

ProtonsAndElectrons requested review from ishaanxgupta and ved015 as code owners May 21, 2026 18:33

github-actions Bot added tests api pipelines labels May 21, 2026

chatgpt-codex-connector Bot reviewed May 21, 2026

View reviewed changes

gemini-code-assist Bot reviewed May 21, 2026

View reviewed changes

greptile-apps Bot reviewed May 23, 2026

View reviewed changes

Comment thread src/api/routes/memory.py Outdated

Comment thread src/pipelines/retrieval.py

Comment thread src/pipelines/retrieval.py Outdated

Comment thread src/api/schemas.py

ProtonsAndElectrons force-pushed the bounty-163-raw-search branch from 8f0591e to c424928 Compare May 23, 2026 20:40

greptile-apps Bot reviewed May 23, 2026

View reviewed changes

Comment thread src/api/routes/memory.py

greptile-apps Bot reviewed May 23, 2026

View reviewed changes

Comment thread src/api/routes/memory.py

ProtonsAndElectrons added 8 commits June 2, 2026 07:54

Add low-latency raw memory search mode

3287f4a

Tighten raw search cache and result bounds

d3a1ecb

Expose code retrieval raw search

e584f2e

Address raw search review follow-ups

7b91831

Deduplicate code raw search results

24a313f

Preserve memory search results on code failure

9a97352

Preserve raw results on answer synthesis failure

e193c31

Handle partial code raw-search failures

ca51988

Remove test file changes from raw search PR

17d0833

ProtonsAndElectrons force-pushed the bounty-163-raw-search branch from 7f62cc0 to 17d0833 Compare June 2, 2026 05:56

github-actions Bot removed the tests label Jun 2, 2026

ved015 closed this Jun 2, 2026

Conversation

ProtonsAndElectrons commented May 21, 2026

Summary

Validation

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

ProtonsAndElectrons commented May 21, 2026

Uh oh!

ProtonsAndElectrons commented May 21, 2026

Uh oh!

greptile-apps Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ProtonsAndElectrons commented May 23, 2026

Uh oh!

ProtonsAndElectrons commented May 23, 2026

Uh oh!

ProtonsAndElectrons commented May 23, 2026

Uh oh!

Uh oh!

ProtonsAndElectrons commented May 23, 2026

Uh oh!

Uh oh!

ProtonsAndElectrons commented May 23, 2026

Uh oh!

ProtonsAndElectrons commented May 31, 2026

Uh oh!

ProtonsAndElectrons commented May 31, 2026

Uh oh!

ved015 commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ProtonsAndElectrons commented Jun 2, 2026

Uh oh!

ProtonsAndElectrons commented Jun 2, 2026

Uh oh!

ishaanxgupta commented Jun 2, 2026

Codex suggestion — P1: Raw search can drop relevant non-profile domains after global score sorting

Uh oh!

ishaanxgupta commented Jun 2, 2026

Codex suggestion — P1: Cache invalidation is incomplete for the active async v2 ingest flow

Uh oh!

ishaanxgupta commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

greptile-apps Bot commented May 23, 2026 •

edited

Loading

ved015 commented Jun 1, 2026 •

edited

Loading