Clear removed marker when branch selection supersedes the local member's removal#724
Conversation
…r's removal A commit that removes the local member's own leaf realizes removal (Group.removed set, send gate closed, group presented as removed). When that commit later loses branch selection, the withdrawal machinery from #702 already retracts the self-removed notification, but the marker semantics were unspecified and untested: supersession tests used rename/retention commits only, and eviction tests never superseded a removal. Spec reading (marmot-protocol/marmot): member-departure.md defines removal evidence as "an accepted commit on the selected canonical branch ... recorded in retained canonical state" and makes realization state-derived; convergence.md "Applying the selected branch" requires that a change that lost branch selection "MUST NOT remain visible to the application as a completed change". A superseded removal was never canonically applied, so the copy must stop self-quarantining — the "terminal for that group copy" rule presumes the removal stays canonical (clarification filed as marmot-protocol/marmot#220). Implementation-wise the reorg apply already restored the pre-removal group record from the retained anchor (snapshots capture the whole record), so the marker could not actually survive a convergence reorg — but that was incidental to snapshot mechanics, undocumented, and untested. This change makes the semantics explicit: - distributed_convergence: when the selected canonical branch records the local member's membership, clear a surviving removed marker as the state-derived inverse of realize_self_eviction (reconciles a pathological marker no anchor restore can see; keeps the send-gate posture derivable from canonical state alone). - engine/traits docs: the marker clears via authenticated re-join OR supersession of the removal that set it; it is terminal only while the removal remains canonical. - tests: superseded_self_removal_clears_removed_marker_and_restores_send pins the full resulting view end-to-end (a client whose own removal loses branch selection: withdrawal names the superseded removal in the stamped id space, marker cleared, membership and the winner's rename presented, roster-correction row emitted, send gate reopens); convergence_apply_clears_removed_marker_without_canonical_evidence pins the explicit reconciliation guard. - cli CHANGELOG: sends work again on a group whose removal was superseded, instead of failing invalid_transition forever. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Bugbot is not enabled for your account, so this pull request was not reviewed. Enable Bugbot in the Cursor dashboard to get automatic reviews on future PRs. |
|
Ready to review this PR? Stage has broken it down into 3 individual chapters for you:
Chapters generated by Stage for commit 101873a on Jul 4, 2026 9:44am UTC. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughThis PR clears a stale ChangesRemoved marker reconciliation
Estimated code review effort: 3 (Moderate) | ~25 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/cgka-engine/src/distributed_convergence.rs (1)
634-668: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick winAvoid unconditional
get_groupfetch on the common (already-not-removed) path.This block runs on every settled convergence pass where self remains a member, and unconditionally calls
storage.get_groupto checkgroup.removed, even thoughcurrent_group.removedwas already loaded earlier in this same function (beforecurrent_group.memberswas consumed at thefor member in current_group.membersloop). Capturing that bool before the move and gating the extra fetch/write on it would save a redundant storage read on the hot common case (marker not set), matching the perf-consciousness already applied elsewhere in this file (e.g., theseen_message_ids_hexcache noted for the "up to 16" pass drain).♻️ Suggested refactor
+ let self_was_removed = current_group.removed; let previous_ids: HashSet<MemberId> = previous_members @@ if current_ids.contains(self.identity.self_id()) { - let mut group = self - .storage - .get_group(group_id) - .map_err(|e| OpenMlsProjectionError::Storage(format!("{e:?}")))?; - if group.removed { - group.removed = false; - self.storage - .put_group(&group) - .map_err(|e| OpenMlsProjectionError::Storage(format!("{e:?}")))?; - tracing::info!( - target: "cgka_engine::distributed_convergence", - method = "emit_convergence_events", - "cleared removed marker: selected canonical branch records local membership" - ); + if self_was_removed { + let mut group = self + .storage + .get_group(group_id) + .map_err(|e| OpenMlsProjectionError::Storage(format!("{e:?}")))?; + if group.removed { + group.removed = false; + self.storage + .put_group(&group) + .map_err(|e| OpenMlsProjectionError::Storage(format!("{e:?}")))?; + tracing::info!( + target: "cgka_engine::distributed_convergence", + method = "emit_convergence_events", + "cleared removed marker: selected canonical branch records local membership" + ); + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/cgka-engine/src/distributed_convergence.rs` around lines 634 - 668, The `emit_convergence_events` path is doing an unnecessary `storage.get_group` on every settled pass even when the group is already not removed. Capture the earlier `current_group.removed` state before `current_group.members` is consumed, then use that bool to skip the extra read/write unless the marker was actually set. Keep the existing clear-and-log behavior only for the removed case, and update the later `get_group`/`put_group` logic accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/cgka-engine/src/distributed_convergence.rs`:
- Around line 634-668: The `emit_convergence_events` path is doing an
unnecessary `storage.get_group` on every settled pass even when the group is
already not removed. Capture the earlier `current_group.removed` state before
`current_group.members` is consumed, then use that bool to skip the extra
read/write unless the marker was actually set. Keep the existing clear-and-log
behavior only for the removed case, and update the later `get_group`/`put_group`
logic accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 31a018d5-10db-491a-8a39-630198feaac4
📒 Files selected for processing (5)
crates/cgka-engine/src/distributed_convergence.rscrates/cgka-engine/src/engine.rscrates/cgka-engine/tests/distributed_convergence.rscrates/cli/CHANGELOG.mdcrates/traits/src/group.rs
Review follow-up (CodeRabbit): emit_convergence_events already holds the post-apply group record, so capture its removed flag before the members vec is consumed and skip the extra get_group read on the common (not-removed) path. The rare marked path still re-checks under a fresh record before writing. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Bugbot is not enabled for your account, so this pull request was not reviewed. Enable Bugbot in the Cursor dashboard to get automatic reviews on future PRs. |
erskingardner
left a comment
There was a problem hiding this comment.
Adversarial review summary
Solid fix for a real spec gap: making the removed-marker clear explicit rather than relying on anchor-restore side effects is the right defensive move, and convergence_apply_clears_removed_marker_without_canonical_evidence is the test that actually gates the new code.
I don't see a correctness bug in the convergence guard itself — the mutual exclusion with the removal loop in the same pass looks sound, and the group_record_was_removed snapshot + re-fetch pattern is a reasonable optimization.
Main gaps are parity (direct seam still snapshot-only), coverage (the big E2E test passes without this diff), and restart (hydration never reconciles a stale marker). Inline notes below.
Overall: approve direction, with follow-ups worth tracking before/soon after merge.
|
Two additional adversarial notes that don't map to changed lines in this diff: 1. Direct-seam parity gap (
|
…ped record Review follow-ups (PR #724): - The reconciliation guard no longer trusts the Marmot record roster alone: self_leaf_is_active_in_mls requires the live OpenMLS group to load, be active, and carry our identity on its own leaf (fail-closed) before the marker clears, so the heal stays derivable from canonical MLS state even if the record roster were stale. - Trimmed the inline spec rationale to a pointer; the full removed-marker lifecycle now lives in module-level rustdoc on distributed_convergence. - New test convergence_apply_heals_fully_evicted_shaped_record: the full post-realize_self_eviction record shape (removed = true, self stripped from members) heals on a forward convergence apply — roster rebuilt from replayed MLS state, marker cleared, send gate reopened, roster- correction MemberAdded emitted. Fails without the reconciliation guard (verified against the pre-PR source). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Bugbot is not enabled for your account, so this pull request was not reviewed. Enable Bugbot in the Cursor dashboard to get automatic reviews on future PRs. |
What
Handles the #702 × #703 interaction left out of scope during the #702 merge: a commit that removes the local member's own leaf is applied (realizing removal —
Group.removedset, send gate closed, group presented as removed), and that commit later loses branch selection. The withdrawal machinery from #702 already retracts the self-removed notification; this PR pins the marker semantics so the device does not stay self-quarantined for a group it canonically still belongs to.distributed_convergence::emit_convergence_events: when the selected canonical branch records the local member's membership, a survivingremovedmarker is cleared — the explicit, state-derived inverse ofrealize_self_eviction. Outbound intents purged at realization stay purged; the app re-issues sends against the reopened gate.engine.rs/traits::Group: the marker is terminal only while the removal remains canonical; it clears via authenticated re-join or supersession of the removal that set it.invalid_transitionforever.Spec basis
member-departure.md defines removal evidence as "an accepted commit on the selected canonical branch … recorded in retained canonical state" and makes realization state-derived; convergence.md ("Applying the selected branch") requires that a change that lost branch selection "MUST NOT remain visible to the application as a completed change". A superseded removal was never canonically applied, so clearing is spec-required, not a violation of the "terminal for that group copy" rule — that rule presumes the removal stays canonical. Since the terminal sentence can be misread as unconditional, a spec clarification is filed as marmot-protocol/marmot#220.
What a reviewer should know
The marker could not actually survive a convergence reorg on master. The reorg apply rolls back to the retained anchor before replaying the winner, and SQLite group snapshots capture the whole group record — so the anchor restore already reset
removed: falseincidentally (seesnapshot_rollback_restores_group_messages_queue_caps_and_openmls_group_state). The resulting view was spec-correct, but only as an undocumented side effect of snapshot mechanics; a future change narrowing snapshot scope (plausible, if someone deliberately preserves the "terminal" marker across rollbacks) would silently reintroduce the quarantine bug. This PR makes the invariant explicit in code, docs, and tests.Tests
superseded_self_removal_clears_removed_marker_and_restores_send— the previously untested scenario end-to-end: Carol applies a peer commit removing her own leaf (marker set, send rejected with theRemovedinvalid-transition, self-removed notification emitted), then the removal loses a same-epoch fork to a rename through stored convergence. Asserts the full resulting view: withdrawal names the superseded removal in the stamped id space,CommitRolledBackfires, noForkRecovered, marker cleared, membership + the winner's rename presented, roster-correctionMemberAddedemitted, and the previously rejected send succeeds. Passes with or without the code change — it pins the behavior against regressions in either mechanism (anchor restore or explicit clear).convergence_apply_clears_removed_marker_without_canonical_evidence— covers the explicit reconciliation branch directly (marker forced on the record, mirroring the existing pathological-copy tests ininvite_leave.rs); fails without the fix.Verified: both new tests, full
cargo test -p cgka-engine,cargo test -p cgka-traits,just fast-ci,cargo fmt --check.Related: the open post-restart own-commit branch-selection investigation is how this scenario becomes reachable in practice.
🤖 Generated with Claude Code
Summary by CodeRabbit