Add low-latency raw memory search mode#195
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f0a4ce247c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| cached = self._profile_catalog_cache.get(user_id) | ||
| if cached and now - cached[0] <= PROFILE_CATALOG_CACHE_TTL_SECONDS: | ||
| return cached[1], cached[2] |
There was a problem hiding this comment.
Invalidate profile catalog cache on fresh memory writes
This cache returns previously fetched profile records for up to 60 seconds without any ingest-triggered invalidation, so a user who ingests new profile facts and immediately calls /v1/memory/retrieve or /v1/memory/search can receive stale results that omit the just-written memory. Because _search_profile and raw profile search both reuse these cached records, this directly regresses read-after-write correctness for normal product flows.
Useful? React with 👍 / 👎.
| for entry in profile_catalog | ||
| ) | ||
| ) | ||
| return (user_id, query.strip().lower(), top_k, catalog_key) |
There was a problem hiding this comment.
Expand retrieval-plan cache key to reflect memory changes
The retrieval-plan cache key only includes user_id, normalized query, top_k, and profile catalog topics/subtopics, so repeated queries within TTL will reuse old tool calls even after non-profile memory changes (e.g., new temporal/summary/snippet data). In that case run() skips tool-selection LLM and can continue querying the wrong domains, producing stale or incomplete answers until the cache expires.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request enhances the memory search functionality by introducing a "code" domain and optional LLM-synthesized answers. Key changes include refactoring the RetrievalPipeline to support raw_search and synthesize_answer methods, implementing caching for profile catalogs and retrieval plans, and adding a latency tracking system with Prometheus metrics. Feedback suggests addressing a potential memory leak in the profile cache, improving encapsulation by avoiding protected method calls, and ensuring that search results are truncated to respect top_k limits and LLM context window constraints.
| self._profile_catalog_cache: Dict[ | ||
| str, Tuple[float, List[Dict[str, str]], List[Any]] | ||
| ] = {} |
There was a problem hiding this comment.
The _profile_catalog_cache is a dictionary keyed by user_id that grows indefinitely as new users perform searches. Since RetrievalPipeline is a long-lived singleton, this could lead to a memory leak in a production environment with many users. Consider using an LRU cache or setting a maximum size for this dictionary.
| code_results = await _search_code( | ||
| query=req.query, | ||
| user_id=user_id, | ||
| org_id=req.org_id, | ||
| repo=req.repo or "", | ||
| top_k=req.top_k, | ||
| ) |
There was a problem hiding this comment.
|
|
||
| if req.answer: | ||
| answer_start = time.perf_counter() | ||
| answer = await pipeline.synthesize_answer(req.query, all_results) |
There was a problem hiding this comment.
When req.answer is true, the all_results list is passed to synthesize_answer without any truncation. If the combined results from memory and code domains are large, this could exceed the LLM's context window. Consider truncating all_results to the top k hits before synthesis.
References
- Ensure that context provided to LLMs is within reasonable token limits to avoid truncation or API errors.
| memory_results, raw_latency = await pipeline.raw_search( | ||
| query=req.query, | ||
| user_id=user_id, | ||
| domains=memory_domains, | ||
| top_k=req.top_k, | ||
| ) |
There was a problem hiding this comment.
The raw_search method returns up to top_k results per domain. With multiple domains selected, the final all_results list can contain significantly more than top_k items. If the API consumer expects at most top_k results total, the final list should be truncated after sorting.
References
- Ensure search results respect the requested limit (top_k) across all aggregated domains.
|
Follow-up pushed in e555bea:
Validation:
GitHub currently reports 1 check on the new head commit, with 0 failed and 0 pending. |
|
Followed up on the remaining review note about the memory route calling the protected code retrieval dispatcher directly. Changes in
Validation run locally:
|
|
| Filename | Overview |
|---|---|
| src/pipelines/retrieval.py | Adds raw_search, synthesize_answer, record_latency, profile catalog cache (LRU with 60s TTL), and retrieval plan cache (LRU with version-based invalidation). Core logic is sound; _user_memory_versions dict grows without eviction bound. |
| src/api/routes/memory.py | Rewrites search_memory onto the low-latency raw path with graceful degradation for code and answer failures, adds v2 async ingest/batch-ingest/scrape endpoints backed by the durable job store, and wires cache invalidation into the ingest path. |
| src/api/schemas.py | Extends SearchRequest with answer, org_id, repo fields and adds snippet/code to allowed domains; snippet silently added to default domain list changes behavior for callers who omit the domains field. |
| src/pipelines/code_retrieval.py | Adds raw_search fast path that runs search_symbols and search_files in parallel, plus _fuse_source_records (RRF fusion with dedup); content-string fallback in _source_record_key can merge distinct records with identical content. |
Sequence Diagram
sequenceDiagram
participant Client
participant SearchRoute as POST /v1/memory/search
participant RetrievalPipeline
participant CodePipeline as CodeRetrievalPipeline
participant VectorStore
participant Neo4j
Client->>SearchRoute: "{query, domains, top_k, answer?, org_id?}"
SearchRoute->>SearchRoute: validate org_id if code in domains
alt memory domains requested
SearchRoute->>RetrievalPipeline: raw_search(query, user_id, domains, top_k)
par profile
RetrievalPipeline->>VectorStore: search_by_metadata (cache-backed)
and temporal
RetrievalPipeline->>Neo4j: search_events_by_embedding
and summary
RetrievalPipeline->>VectorStore: search_by_text
and snippet
RetrievalPipeline->>VectorStore: search_by_text (sandboxed)
end
RetrievalPipeline-->>SearchRoute: (results, latency)
end
alt code in domains
SearchRoute->>CodePipeline: raw_search(query, repo, top_k)
par search_symbols
CodePipeline->>VectorStore: symbol search
and search_files
CodePipeline->>VectorStore: file search
end
CodePipeline-->>SearchRoute: fused SourceRecords (RRF)
end
SearchRoute->>SearchRoute: sort + slice to top_k
alt "answer=true"
SearchRoute->>RetrievalPipeline: synthesize_answer(query, all_results)
RetrievalPipeline-->>SearchRoute: answer text
end
SearchRoute-->>Client: "SearchResponse {results, answer, confidence, mode, latency}"
Reviews (8): Last reviewed commit: "Remove test file changes from raw search..." | Re-trigger Greptile
8f0591e to
c424928
Compare
|
Rebased this branch onto current Local focused validation on
The GitHub label check is green; Greptile review is still running on the new head. |
|
Follow-up pushed in
Validation:
|
|
Follow-up pushed in
Validation:
|
|
Follow-up pushed in
Validation:
|
|
Follow-up pushed in
Validation:
|
|
Quick note on the Greptile P1 comments: the current head addresses both graceful-degradation paths.
I re-ran the targeted coverage locally:
|
|
Addressed the two remaining raw-search review points in
Validation run locally from the repo venv: |
|
@ProtonsAndElectrons please resolve the conflicts and remove the changes in test files |
7f62cc0 to
17d0833
Compare
|
Done. I rebased the branch onto current Validation from the repo venv: |
|
@ved015 Thanks for the review. I removed the test file changes in 17d0833; the PR diff now only touches the four source files:
I also rechecked the branch against current |
Codex suggestion — P1: Raw search can drop relevant non-profile domains after global score sortingIn Profile hits come from metadata-only lookup, where the vector-store score is not semantic relevance and is effectively Suggestion: avoid comparing metadata-only profile scores against semantic scores directly. Either allocate per-domain slots before merging, down-rank/profile-score-normalize metadata hits, or include profile hits separately while preserving top semantic hits from temporal/summary/snippet. |
Codex suggestion — P1: Cache invalidation is incomplete for the active async v2 ingest flowThe new cache invalidation in The current v2 workflow's normal low-effort path writes via per-domain activities ( Suggestion: move invalidation to a place that runs after every successful memory write path, including v2 per-domain activities, and make the API retrieval cache observe a shared/versioned invalidation signal rather than only a process-local method call. |
|
Thanks a lot for the work here and for following up on the earlier review comments. After reviewing the current product/API shape, I do not think we need this PR as-is right now. We already have separate memory flows for raw search and LLM-backed retrieval, and this PR broadens I also still see a couple of correctness risks around result ranking across domains and cache invalidation for async ingest paths, so I would rather not merge this into Going to close this PR for now. If we revisit this work, the safest path would be a much smaller scoped PR, for example only adding a missing raw-search domain or only improving retrieval latency with benchmarks and focused tests. |
Closes #163.
Summary
/v1/memory/searchonto a low-latency raw retrieval path that skips LLM tool selection and synthesis by defaultanswer=trueto synthesize from already-retrieved raw hits when callers need the slower answer path/retrievepathValidation
\.venv\Scripts\python.exe -m pytest tests/integration/test_retrieval_pipeline.py tests/api/test_dependencies_and_routes.py tests/unit/test_schemas.py\.venv\Scripts\python.exe -m ruff check src/pipelines/retrieval.py src/api/schemas.py src/api/routes/memory.py tests/integration/test_retrieval_pipeline.py\.venv\Scripts\python.exe -m black --target-version py311 --check src/pipelines/retrieval.py src/api/schemas.py src/api/routes/memory.py tests/integration/test_retrieval_pipeline.pygit diff --check