Materialize own confirmed commits as pre-validated convergence candidates (#706)#723
Conversation
…ates (#706) A device's own published-and-confirmed commit cannot be replayed through OpenMLS process_message, so after an engine restart cleared the in-memory committed_from guard, its branch silently dropped out of stored-convergence branch selection: a losing same-epoch sibling was selected even when the own commit's authenticated ordering key should win, and when both concurrent committers restarted each deterministically reorged onto the other's branch, forking the group permanently. Fix, using durable state only: - confirm_published captures an OwnCommitConvergenceStamp (committer, priority, consumed proposal refs) while the staged commit is attached and enriches the stored wire record to the new StoredMessagePayload OwnCommitWire variant in the same write that marks it Processed. - Candidate materialization realizes a stamped own commit on a canonical path prefix by rolling forward to the retained anchor snapshot at its resulting epoch (captured by the confirm path) and synthesizing its CommitStaged observation from the stamp; any failure prunes the branch exactly as before. - The apply path skips the selected branch's already-applied Processed prefix (the live canonical chain), so an own-branch win applies without re-processing own commits and an own-branch loss keeps the existing rollback-and-replay behavior, composing with the #363/#702 withdrawals. Tests: restart-winner and mutual-restart variants in update_group_data.rs (both reproduce the wrong-branch selection / permanent fork without the fix), plus a JSON shape snapshot for the new payload variant. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Bugbot is not enabled for your account, so this pull request was not reviewed. Enable Bugbot in the Cursor dashboard to get automatic reviews on future PRs. |
|
Ready to review this PR? Stage has broken it down into 6 individual chapters for you: Chapters generated by Stage for commit 795dace on Jul 4, 2026 9:55am UTC. |
|
Warning Review limit reachedYou’ve reached a temporary PR review limit under our Fair Usage Limits Policy. Next review available in: 17 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughThis PR adds a durable ChangesOwn commit convergence stamping and replay
Estimated code review effort: 5 (Critical) | ~110 minutes Sequence Diagram(s)sequenceDiagram
participant Confirm as do_confirm_published
participant Storage as StorageProvider
participant Replay as process_openmls_messages_inner
participant Anchor as RetainedAnchorSnapshot
participant Apply as apply_openmls_canonicalization_result
Confirm->>Confirm: own_commit_stamp(staged_commit, committer)
Confirm->>Storage: stamp_processed_own_commit_record(message_id, stamp)
Storage-->>Storage: persist OwnCommitWire{message, stamp}
Replay->>Storage: load stored payload, extract own_commit_stamp
Replay->>Replay: check prefix_canonical for commit digest
alt stamp present and prefix canonical
Replay->>Anchor: snapshot at commit's resulting epoch
Anchor-->>Replay: post-merge group state
Replay->>Replay: synthesize CommitStaged from stamp
else no stamp or non-canonical
Replay->>Replay: normal OpenMLS commit processing
end
Apply->>Apply: already_applied_commit_prefix(accepted_commits)
Apply->>Replay: replay_messages_for_canonicalization_result(applied_prefix)
Apply->>Apply: apply_start_epoch_for_canonicalization_result(applied_prefix)
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
| let mut replay_messages = Vec::new(); | ||
| let mut seen = BTreeSet::new(); | ||
| for hex_message_id in result | ||
| .accepted_proposals |
There was a problem hiding this comment.
This still replays proposals that were consumed by commits in the skipped applied_prefix. In the branch this PR is fixing, if the already-applied own commit consumed a fork-epoch proposal, applied_prefix filters the commit out and apply_start_epoch stays at the live tip, but this loop still puts that old proposal into replay_messages before the unskipped suffix. OpenMLS will then process a proposal whose source epoch is below the live group epoch and reject the apply, rolling back the convergence pass.
I think this needs to filter accepted_proposals to only proposals consumed by the unskipped commit suffix, or else start replay from the earliest accepted proposal/commit source epoch instead of the first unskipped commit. A regression test where the winning restarted own commit consumes a pending proposal would catch it.
There was a problem hiding this comment.
Good catch — confirmed, and it was worse than a one-off rejected pass: the proposal stayed a pending input, so every retry re-selected the own branch and hit the same WrongEpoch apply failure — a permanent livelock for the group.
Fixed in 795dace by filtering the replay set: an accepted proposal whose record epoch sits below the apply-start epoch can only have been consumed by a commit in the skipped prefix (MLS proposals are epoch-scoped, so no unskipped suffix commit at a later epoch could consume it). Its effect is already inside the live state, and its Processed disposition still persists — which is what removes it from the convergence input set and closes the retry loop. When the prefix is empty (normal rollback path) the apply-start epoch is the fork epoch and no accepted proposal sits below it, so that path is unchanged.
Regression test as suggested: rebuilt_winner_applies_own_selfremove_commit_without_replaying_consumed_proposal — both remaining members race the SelfRemove auto-commit (each consuming the leaver's proposal by reference), the ordering-key winner restarts with the consumed proposal re-opened as a pending input, and converges over the losing sibling. Without the filter it fails with exactly the apply failure you described; with it, the pass settles, the own branch survives, and the proposal record returns to Processed.
One note from building the test: a welcome-joined member whose only own commit is the auto-commit holds no retained anchor at the fork epoch, because the auto-commit send path projects the group record to the post-merge epoch before confirm's pre-commit retention reads it — so historical replay reports MissingRetainedAnchor and the group goes Unrecoverable. That's a pre-existing retention quirk orthogonal to this PR (the test sidesteps it with a settled rename round first); happy to file it separately.
Review follow-up on #723: the apply path skipped the already-applied own commit but still replayed the fork-epoch proposal that commit consumed. MLS proposals are epoch-scoped, so OpenMLS rejected the replay against the post-prefix live state (WrongEpoch), failing the whole apply and rolling back the convergence pass on every retry — a permanent livelock whenever the winning restarted own commit had consumed a pending proposal. Accepted proposals whose record epoch sits below the apply-start epoch were consumed by a skipped prefix commit: their effect is already inside the live state, so they are excluded from replay. Their Processed disposition still persists, which removes them from the convergence input set and closes the retry loop. Regression test: SelfRemove auto-commit race (both remaining members auto-commit the leaver's proposal at the same epoch), winner restarts with the consumed proposal re-opened as a pending input, converges over the losing sibling. Fails without the filter with exactly the apply failure described in review. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Bugbot is not enabled for your account, so this pull request was not reviewed. Enable Bugbot in the Cursor dashboard to get automatic reviews on future PRs. |
Fixes #706.
A device's own published-and-confirmed commit cannot be replayed through OpenMLS
process_message, so after an engine restart cleared the in-memorycommitted_fromguard, its branch silently dropped out of stored-convergence branch selection: a losing same-epoch sibling was selected even when the own commit's authenticated ordering key should win, and when both concurrent committers restarted, each deterministically reorged onto the other's branch — a permanent group fork.What this does
crates/traits/src/message.rs,crates/cgka-engine/src/publish.rs): newStoredMessagePayload::OwnCommitWire { message, stamp }variant carryingOwnCommitConvergenceStamp { committer, priority, consumed_proposal_refs }.do_confirm_publishedcaptures it while the staged commit is still attached and enriches the stored wire record in the same durable write that marks itProcessed. Convergence replay still reads the record as a plain wire row; the stamp rides alongside.crates/cgka-engine/src/openmls_projection.rs): when a replayed candidate path reaches a stamped own commit and every commit replayed before it was canonical (Processed), the group is rolled forward to the retained anchor snapshot at the commit's resulting epoch — the exact post-merge state the confirm path retains — and itsCommitStagedobservation is synthesized from the stamp. Any failure (missing anchor, epoch mismatch, non-canonical prefix) prunes the branch exactly as before, so the change is strictly additive.Processedcommits below the live tip is the live canonical chain; it is excluded from the apply replay and the apply-start epoch moves past it. An own-branch win applies without re-processing own commits (no-op on group state); an own-branch loss keeps the existing rollback-to-anchor + replay path.With both branches materialized, selection and withdrawal compose with #702: a losing own branch is dropped
InvalidAgainstCandidateStateand withdrawn viaemit_rolled_back_commitsunder the same confirm-time stamped origin id;emit_superseded_processed_commitsremains the backstop for own commits that never materialized (pre-stamp records).Tests
rebuilt_engine_convergence_keeps_own_confirmed_rename_when_it_wins_selection— restart-WINNER variant: without the fix the losing sibling is wrongly selected and the device reorgs off its own canonical branch.mutually_rebuilt_engines_converge_on_same_branch_after_concurrent_renames— mutual-restart variant: without the fix both devices swap onto each other's branch (verified: disabling the stamp collection reproduces the fork with exactly that failure).publish_lifecyclefault-injection extended to cover the enrichedProcessedwrite.just fast-cigreen;cgka-engine,cgka-traits,storage-sqlite, andcgka-conformance-simulatorsuites pass on this branch rebased onto current master (includes #701/#702/#703/#704).Residual edges (documented in #706)
Sent-but-never-confirmed own commits still prune (no stamp, no post-merge anchor); pre-fix records keep the old behavior with #702's backstop; an accepted fork-epoch proposal below a skipped prefix can fail apply-replay cleanly (snapshot rolls back).
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes