Conversation
* Feat: MEM-59: extract.v5 granularity-aware dedup (recover LME single_session_assistant)

  Paired follow-up to MEM-57. MEM-57's pre-extraction dedup context won LOCOMO big (+10.3) but introduced a granularity-blindness regression on LME single_session_assistant (74.2 peak → 57.6): when <related_memories> held a SUMMARY of a list and the input held the atomic items, the extractor dropped the items as paraphrases of the summary.

  v5 adds a granularity carve-out to the <related_memories> dedup rules: specific atomic facts (names, numbers, list items, quotes, dates, titles) are extracted even when the context holds only a summary/generalisation of the same topic. It also adds a worked summary-vs-atomic example. MEM-57's exact-paraphrase dedup (the mechanism behind the LOCOMO win) is preserved explicitly.

  Pure prompt change. No infra, no latency, parser unchanged. Bumps FACT_EXTRACTION_PROMPT_VERSION extract.v4 → extract.v5.

  ## Benchmark validation (baseline preset, vs v4 / MEM-57)

  LongMemEval (the recovery target):
  - single_session_assistant: 57.6 → 80.9 (+23.3), above MEM-55 v2's 74.2 peak
  - multi_session: 82.5 → 80.8 (−1.7)
  - preference: 80.2 → 80.8 (+0.6)
  - knowledge_update: 86.5 → 84.3 (−2.2)
  - single_session_user: 96.1 → 95.5 (−0.6)
  - temporal: 59.5 → 60.1 (+0.6)
  - Overall: 76.0 → 77.9 (+1.9)

  LOCOMO (no-regression check; held flat, all within ±2-3 J noise):
  - single_hop: 67.3 → 64.8 (−2.5)
  - multi_hop: 56.7 → 57.9 (+1.2)
  - open_domain: 71.5 → 70.2 (−1.3)
  - adversarial: 82.4 → 81.9 (−0.5)
  - temporal: 45.8 → 45.2 (−0.6)
  - Overall: 68.5 → 67.4 (−1.1)

  Closes the cycle-13 RAG work: MEM-57 + MEM-59 deliver both wins as a pair, with LOCOMO +10.3 (MEM-57) and LME single_session_assistant fully recovered plus overall up (MEM-59).

  ## Tests

  227/227 pass. New test parse_extracted_facts_handles_v5_granularity_extraction pins that multi-atomic-item output round-trips through the parser.

  Closes MEM-59.
* test(extractor): pin extract.v5 granularity carve-out in the prompt asset

  Deep-review follow-up. The existing parse_extracted_facts_handles_v5_granularity_extraction test only exercises the (unchanged) parser, so it would still pass if a future edit silently deleted the granularity rule or worked example from prompts/extract.txt, re-introducing the LME single_session_assistant regression with no test signal.

  Add extract_prompt_asset_contains_v5_granularity_carveout, which asserts the embedded prompt asset still contains:
  - the granularity rule,
  - the worked summary-vs-atomic example (incl. its TAB-separated output line, doubling as a tab-integrity guard),
  - v4's preserved exact-paraphrase dedup rule,
  - and that the version const tracks at extract.v5.

  Pure test addition: no behavior change, prompt unchanged (still the exact text that produced the validated MEM-59 benchmark numbers). 228/228 pass.
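The prompt-pinning idea in the commit above can be sketched roughly as follows. This is a minimal illustration, not the project's actual test: the prompt text, the marker strings, the `fact<TAB>score` output line, and the helper `missing_markers` are all invented for the example. The real asset would presumably be embedded from prompts/extract.txt; only the constant name `FACT_EXTRACTION_PROMPT_VERSION` and the value `extract.v5` come from the commit message.

```rust
/// Hypothetical stand-in for the embedded prompt asset. The wording and the
/// tab-separated example line are invented; they are NOT the real prompt.
pub const FACT_EXTRACTION_PROMPT: &str = "\
Rules for <related_memories>:\n\
- Skip facts that are exact paraphrases of a related memory.\n\
- Still extract specific atomic facts when the context only holds a summary.\n\
Example output:\n\
User's favorite book is Dune\t0.8\n";

/// Version constant named in the commit message.
pub const FACT_EXTRACTION_PROMPT_VERSION: &str = "extract.v5";

/// Hypothetical marker list: one substring per rule or example that must
/// survive future edits. The last entry contains a literal TAB, so it also
/// fails if an editor converts tabs to spaces.
pub const REQUIRED_MARKERS: &[&str] = &[
    "exact paraphrases",
    "specific atomic facts",
    "Dune\t0.8",
];

/// Returns every required marker that is absent from `prompt`.
/// An empty result means the prompt still carries all pinned content.
pub fn missing_markers<'a>(prompt: &str, required: &[&'a str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|m| !prompt.contains(*m))
        .collect()
}
```

A test built on this would assert that `missing_markers(FACT_EXTRACTION_PROMPT, REQUIRED_MARKERS)` is empty and that the version constant equals `"extract.v5"`. Returning the missing markers rather than a bare boolean makes a failure message point straight at the deleted rule.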
…patibility MEM-60 define relayer compatibility policy
…aults Keep testnet Seal defaults on legacy key servers
…non-manual) (#185)

`/api/recall/manual` returned raw pgvector cosine order while `/api/recall` and `/api/ask` applied the CompositeRanker (recency + importance, opt-in via scoring_weights), so the same query + weights gave different orderings across endpoints. Manual recall also validated scoring_weights and then ignored them.

Manual recall now applies the same ranker, keeping its lightweight contract: it ranks on the SearchHit fields directly (distance / created_at / importance, all present pre-decrypt) and still returns blob ids + distances WITHOUT a Walrus fetch or SEAL decrypt. All three recall paths now share one ordering logic and agree for the same query + weights.

- New `rank_search_hits` reuses the exact `Ranker::rank` the hydrating paths use (no re-implementation of scoring on SearchHit, which would risk drift).
- Reorder is index-based, not blob_id-keyed: blob_id is not unique (search_similar has no DISTINCT; restore can produce duplicate-blob_id rows), so a blob_id-keyed round-trip would collapse duplicates and drop hits.
- recall_manual validates scoring_weights up front (400 on malformed) like recall.
- Default weights short-circuit → cosine order unchanged → existing callers unaffected.

Wire shape unchanged (Vec<SearchHit>); only order changes.

Tests: 236/236. New recall tests cover manual≡non-manual parity (importance / recency / combined weights), default no-op, duplicate-blob_id no-drop, an 8-item permutation round-trip, and empty/single/field-preservation cases.

Closes ENG-1785.
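The index-based reorder described above can be illustrated with a small, self-contained sketch. Everything here is an assumption for the sake of the example: the `SearchHit` field types, the `ScoringWeights` struct, and especially the `score` formula, which is a simple stand-in and not the project's CompositeRanker. The actual change reuses `Ranker::rank`; this sketch only shows why sorting a permutation of indices keeps duplicate blob ids intact.

```rust
/// Hypothetical stand-in for the PR's `SearchHit`; field types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct SearchHit {
    pub blob_id: String,
    pub distance: f64,   // cosine distance, lower is closer
    pub created_at: i64, // unix seconds
    pub importance: f64, // assumed range 0.0..=1.0
}

/// Hypothetical weights; both zero is treated as "default".
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ScoringWeights {
    pub recency: f64,
    pub importance: f64,
}

/// Illustrative score only (NOT the CompositeRanker formula):
/// similarity plus weighted recency and importance boosts.
fn score(hit: &SearchHit, w: ScoringWeights, now: i64) -> f64 {
    let similarity = 1.0 - hit.distance;
    let age_days = (now - hit.created_at).max(0) as f64 / 86_400.0;
    let recency = 1.0 / (1.0 + age_days);
    similarity + w.recency * recency + w.importance * hit.importance
}

/// Reorders hits by descending score using a permutation of indices.
/// No map keyed by `blob_id` is involved, so rows that share a blob id
/// are all kept.
pub fn rank_search_hits(hits: Vec<SearchHit>, w: ScoringWeights, now: i64) -> Vec<SearchHit> {
    // Default weights short-circuit: the incoming cosine order is returned as-is.
    if w.recency == 0.0 && w.importance == 0.0 {
        return hits;
    }
    let mut order: Vec<usize> = (0..hits.len()).collect();
    // Stable sort: equal scores keep their original relative order.
    order.sort_by(|&a, &b| score(&hits[b], w, now).total_cmp(&score(&hits[a], w, now)));
    // Move each hit out of its slot exactly once, in ranked order.
    let mut slots: Vec<Option<SearchHit>> = hits.into_iter().map(Some).collect();
    order.into_iter().filter_map(|i| slots[i].take()).collect()
}
```

Because each index appears exactly once in `order`, the output always has the same length as the input, which is the no-drop property the duplicate-blob_id test in the commit guards.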
…-upstream-memwal-friction-fixes-before-monday MEM-62: workshop MemWal friction fixes
…-upstream-memwal-friction-fixes-before-monday MEM-62 remove MEMWAL_KEY fallback
…-upstream-memwal-friction-fixes-before-monday MEM-62 update SDK changelogs
No description provided.