staging <- dev by ducnmm · Pull Request #190 · MystenLabs/MemWal

ducnmm · 2026-05-24T01:33:37Z

No description provided.

* Feat: MEM-59 - extract.v5 granularity-aware dedup (recover LME single_session_assistant)

  Paired follow-up to MEM-57. MEM-57's pre-extraction dedup context won LOCOMO big (+10.3) but introduced a granularity-blindness regression on LME single_session_assistant (74.2 peak → 57.6): when <related_memories> held a SUMMARY of a list and the input held the atomic items, the extractor dropped the items as paraphrases of the summary.

  v5 adds a granularity carve-out to the <related_memories> dedup rules: specific atomic facts (names, numbers, list items, quotes, dates, titles) are extracted even when the context holds only a summary/generalisation of the same topic. It also adds a worked summary-vs-atomic example. MEM-57's exact-paraphrase dedup (the mechanism behind the LOCOMO win) is preserved explicitly.

  Pure prompt change. No infra, no latency, parser unchanged. Bumps FACT_EXTRACTION_PROMPT_VERSION extract.v4 → extract.v5.

  ## Benchmark validation (baseline preset, vs v4 / MEM-57)

  LongMemEval (the recovery target):

  - single_session_assistant: 57.6 → 80.9 (+23.3), above MEM-55 v2's 74.2 peak
  - multi_session: 82.5 → 80.8 (−1.7)
  - preference: 80.2 → 80.8 (+0.6)
  - knowledge_update: 86.5 → 84.3 (−2.2)
  - single_session_user: 96.1 → 95.5 (−0.6)
  - temporal: 59.5 → 60.1 (+0.6)
  - Overall: 76.0 → 77.9 (+1.9)

  LOCOMO (no-regression check; held flat, all within ±2-3 J noise):

  - single_hop: 67.3 → 64.8 (−2.5)
  - multi_hop: 56.7 → 57.9 (+1.2)
  - open_domain: 71.5 → 70.2 (−1.3)
  - adversarial: 82.4 → 81.9 (−0.5)
  - temporal: 45.8 → 45.2 (−0.6)
  - Overall: 68.5 → 67.4 (−1.1)

  Closes the cycle-13 RAG work: MEM-57 + MEM-59 deliver both wins as a pair: LOCOMO +10.3 (MEM-57) and LME single_session_assistant fully recovered + overall up (MEM-59).

  ## Tests

  227/227 pass. New test parse_extracted_facts_handles_v5_granularity_extraction pins the multi-atomic-item output round-trips through the parser.

  Closes MEM-59.
* test(extractor): pin extract.v5 granularity carve-out in the prompt asset

  Deep-review follow-up. The existing parse_extracted_facts_handles_v5_granularity_extraction test only exercises the (unchanged) parser, so it would still pass if a future edit silently deleted the granularity rule or worked example from prompts/extract.txt, re-introducing the LME single_session_assistant regression with no test signal.

  Add extract_prompt_asset_contains_v5_granularity_carveout, which asserts the embedded prompt asset still contains: the granularity rule, the worked summary-vs-atomic example (incl. its TAB-separated output line, doubling as a tab-integrity guard), v4's preserved exact-paraphrase dedup rule, and that the version const tracks at extract.v5.

  Pure test addition: no behavior change, prompt unchanged (still the exact text that produced the validated MEM-59 benchmark numbers). 228/228 pass.
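The prompt-asset guard described above can be sketched roughly as follows. This is an illustration under assumptions, not the test from the PR: the placeholder prompt text, the marker strings, and the `missing_markers` helper are all hypothetical, since the actual wording of prompts/extract.txt is not shown here. The real test presumably checks the embedded asset rather than an inline string.

```rust
/// Version constant as named in the commit message (value taken from the PR text).
const FACT_EXTRACTION_PROMPT_VERSION: &str = "extract.v5";

/// Placeholder prompt text. The real asset lives in prompts/extract.txt and its
/// wording is NOT reproduced here; every phrase below is made up for illustration.
const EXAMPLE_PROMPT: &str = "GRANULARITY: extract atomic facts even when \
related_memories only holds a summary.\n\
DEDUP: skip exact paraphrases of related_memories.\n\
Example output:\n\
fact\tThe user's favourite tea is oolong\n";

/// Returns every marker that does not occur in `prompt`.
/// An empty result means the asset still contains all pinned fragments.
fn missing_markers<'a>(prompt: &str, markers: &[&'a str]) -> Vec<&'a str> {
    markers
        .iter()
        .copied()
        .filter(|m| !prompt.contains(*m))
        .collect()
}

/// Hypothetical guard: checks the pinned fragments and the version constant together.
fn prompt_asset_is_intact(prompt: &str) -> bool {
    // A literal "\t" marker doubles as a tab-integrity check: it fails if an
    // editor silently converted the tab in the example output line to spaces.
    let markers = ["GRANULARITY", "exact paraphrases", "fact\t"];
    missing_markers(prompt, &markers).is_empty()
        && FACT_EXTRACTION_PROMPT_VERSION == "extract.v5"
}
```

Pinning substrings of the asset makes the guard fail loudly when the rule or the worked example is deleted, at the cost of needing an update whenever the pinned wording is intentionally changed.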

…non-manual) (#185)

`/api/recall/manual` returned raw pgvector cosine order while `/api/recall` and `/api/ask` applied the CompositeRanker (recency + importance, opt-in via scoring_weights), so the same query + weights gave different orderings across endpoints. Manual recall also validated scoring_weights and then ignored them.

Manual recall now applies the same ranker, keeping its lightweight contract: it ranks on the SearchHit fields directly (distance / created_at / importance, all present pre-decrypt) and still returns blob ids + distances WITHOUT a Walrus fetch or SEAL decrypt. All three recall paths now share one ordering logic and agree for the same query + weights.

- New `rank_search_hits` reuses the exact `Ranker::rank` the hydrating paths use (no re-implementation of scoring on SearchHit, which would risk drift).
- Reorder is index-based, not blob_id-keyed: blob_id is not unique (search_similar has no DISTINCT; restore can produce duplicate-blob_id rows), so a blob_id-keyed round-trip would collapse duplicates and drop hits.
- recall_manual validates scoring_weights up front (400 on malformed) like recall.
- Default weights short-circuit → cosine order unchanged → existing callers unaffected.

Wire shape unchanged (Vec<SearchHit>); only order changes.

Tests: 236/236. New recall tests cover manual≡non-manual parity (importance / recency / combined weights), default no-op, duplicate-blob_id no-drop, an 8-item permutation round-trip, and empty/single/field-preservation cases.

Closes ENG-1785.
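To make the index-based reorder concrete, here is a minimal sketch. It is not the PR's `rank_search_hits`, which reuses `Ranker::rank`: the `SearchHit` field types, the `ScoringWeights` struct, and the scoring formula below are assumptions chosen only to show why sorting a permutation of indices keeps duplicate blob ids intact.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the PR's `SearchHit`; field types are assumed.
#[derive(Debug, Clone, PartialEq)]
struct SearchHit {
    blob_id: String,
    distance: f64,   // cosine distance, lower means closer
    created_at: i64, // unix seconds
    importance: f64, // assumed to lie in 0.0..=1.0
}

/// Hypothetical weights; both zero stands for "default weights".
#[derive(Debug, Clone, Copy)]
struct ScoringWeights {
    recency: f64,
    importance: f64,
}

/// Illustrative score only; NOT the actual CompositeRanker formula.
fn score(hit: &SearchHit, w: ScoringWeights, now: i64) -> f64 {
    let similarity = 1.0 - hit.distance;
    let age_days = (now - hit.created_at).max(0) as f64 / 86_400.0;
    let recency = 1.0 / (1.0 + age_days);
    similarity + w.recency * recency + w.importance * hit.importance
}

/// Reorders hits by score using a permutation of indices, so rows that share
/// a blob_id are never merged or dropped.
fn rank_hits_by_index(hits: Vec<SearchHit>, w: ScoringWeights, now: i64) -> Vec<SearchHit> {
    // Default weights short-circuit: the incoming cosine order is returned as-is.
    if w.recency == 0.0 && w.importance == 0.0 {
        return hits;
    }
    // Sort indices, not blob ids. `sort_by` is stable, so ties keep input order.
    let mut order: Vec<usize> = (0..hits.len()).collect();
    order.sort_by(|&a, &b| {
        score(&hits[b], w, now)
            .partial_cmp(&score(&hits[a], w, now))
            .unwrap_or(Ordering::Equal)
    });
    // Move each hit out of its original slot exactly once.
    let mut slots: Vec<Option<SearchHit>> = hits.into_iter().map(Some).collect();
    order.into_iter().filter_map(|i| slots[i].take()).collect()
}
```

A blob_id-keyed approach (rank, then look hits up again in a map keyed by blob_id) would return one row per distinct id. The index permutation always returns exactly as many hits as it was given.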

Nguyen Mau Minh Duc and others added 16 commits May 21, 2026 17:29

MEM-60 define relayer compatibility policy

7910934

docs: align Python SDK credential headers

a84bba9

chore: bump SDK release versions

630ba20

Merge pull request #184 from MystenLabs/mem-60-relayer-versioning-com…

740e541

…patibility MEM-60 define relayer compatibility policy

Keep testnet Seal defaults on legacy key servers

e7db144

Merge dev into Seal legacy defaults

412144f

Merge pull request #186 from MystenLabs/codex/seal-legacy-testnet-def…

e267d12

…aults Keep testnet Seal defaults on legacy key servers

MEM-62 workshop MemWal friction fixes

e298353

Merge pull request #187 from MystenLabs/feature/mem-62-workshop-track…

5201ff7

…-upstream-memwal-friction-fixes-before-monday MEM-62: workshop MemWal friction fixes

MEM-62 remove MEMWAL_KEY fallback

9758fc0

Merge pull request #188 from MystenLabs/feature/mem-62-workshop-track…

2b5ecda

…-upstream-memwal-friction-fixes-before-monday MEM-62 remove MEMWAL_KEY fallback

MEM-62 update SDK changelogs

b96389c

chore: rebrand harmless text to Walrus Memory

782e99d

Merge pull request #189 from MystenLabs/feature/mem-62-workshop-track…

67fe70b

…-upstream-memwal-friction-fixes-before-monday MEM-62 update SDK changelogs

ducnmm wants to merge 16 commits into staging from dev


4 participants
