Skip to content

Materialize own confirmed commits as pre-validated convergence candidates (#706)#723

Merged
erskingardner merged 2 commits into
masterfrom
fix/706-own-commit-convergence
Jul 4, 2026
Merged

Materialize own confirmed commits as pre-validated convergence candidates (#706)#723
erskingardner merged 2 commits into
masterfrom
fix/706-own-commit-convergence

Conversation

@erskingardner

@erskingardner erskingardner commented Jul 4, 2026

Copy link
Copy Markdown
Member

Fixes #706.

A device's own published-and-confirmed commit cannot be replayed through OpenMLS process_message, so after an engine restart cleared the in-memory committed_from guard, its branch silently dropped out of stored-convergence branch selection: a losing same-epoch sibling was selected even when the own commit's authenticated ordering key should win, and when both concurrent committers restarted, each deterministically reorged onto the other's branch — a permanent group fork.

What this does

  1. Confirm-time stamp (crates/traits/src/message.rs, crates/cgka-engine/src/publish.rs): new StoredMessagePayload::OwnCommitWire { message, stamp } variant carrying OwnCommitConvergenceStamp { committer, priority, consumed_proposal_refs }. do_confirm_published captures it while the staged commit is still attached and enriches the stored wire record in the same durable write that marks it Processed. Convergence replay still reads the record as a plain wire row; the stamp rides alongside.
  2. Anchor rollforward during candidate materialization (crates/cgka-engine/src/openmls_projection.rs): when a replayed candidate path reaches a stamped own commit and every commit replayed before it was canonical (Processed), the group is rolled forward to the retained anchor snapshot at the commit's resulting epoch — the exact post-merge state the confirm path retains — and its CommitStaged observation is synthesized from the stamp. Any failure (missing anchor, epoch mismatch, non-canonical prefix) prunes the branch exactly as before, so the change is strictly additive.
  3. Already-applied prefix skip at apply: the selected branch's leading run of Processed commits below the live tip is the live canonical chain; it is excluded from the apply replay and the apply-start epoch moves past it. An own-branch win applies without re-processing own commits (no-op on group state); an own-branch loss keeps the existing rollback-to-anchor + replay path.

With both branches materialized, selection and withdrawal compose with #702: a losing own branch is dropped InvalidAgainstCandidateState and withdrawn via emit_rolled_back_commits under the same confirm-time stamped origin id; emit_superseded_processed_commits remains the backstop for own commits that never materialized (pre-stamp records).

Tests

  • rebuilt_engine_convergence_keeps_own_confirmed_rename_when_it_wins_selection — restart-WINNER variant: without the fix the losing sibling is wrongly selected and the device reorgs off its own canonical branch.
  • mutually_rebuilt_engines_converge_on_same_branch_after_concurrent_renames — mutual-restart variant: without the fix both devices swap onto each other's branch (verified: disabling the stamp collection reproduces the fork with exactly that failure).
  • JSON shape snapshot for the new payload variant; publish_lifecycle fault-injection extended to cover the enriched Processed write.

just fast-ci green; cgka-engine, cgka-traits, storage-sqlite, and cgka-conformance-simulator suites pass on this branch rebased onto current master (includes #701/#702/#703/#704).

Residual edges (documented in #706)

Sent-but-never-confirmed own commits still prune (no stamp, no post-merge anchor); pre-fix records keep the old behavior with #702's backstop; an accepted fork-epoch proposal below a skipped prefix can fail apply-replay cleanly (snapshot rolls back).

🤖 Generated with Claude Code


Open in Stage

Summary by CodeRabbit

  • New Features

    • Stored convergence now preserves metadata for confirmed own commits, improving how restarted devices reconcile their latest changes.
    • Confirmed commits can be replayed more reliably from local storage without being reprocessed incorrectly.
  • Bug Fixes

    • Fixed branch selection issues after restart so a device is less likely to switch away from its own confirmed commit.
    • Improved handling of persisted message processing to avoid duplicate replay of already-applied commits.
    • Strengthened retry behavior for transient write failures during publish confirmation.

…ates (#706)

A device's own published-and-confirmed commit cannot be replayed through
OpenMLS process_message, so after an engine restart cleared the in-memory
committed_from guard, its branch silently dropped out of stored-convergence
branch selection: a losing same-epoch sibling was selected even when the own
commit's authenticated ordering key should win, and when both concurrent
committers restarted each deterministically reorged onto the other's branch,
forking the group permanently.

Fix, using durable state only:
- confirm_published captures an OwnCommitConvergenceStamp (committer,
  priority, consumed proposal refs) while the staged commit is attached and
  enriches the stored wire record to the new StoredMessagePayload
  OwnCommitWire variant in the same write that marks it Processed.
- Candidate materialization realizes a stamped own commit on a canonical
  path prefix by rolling forward to the retained anchor snapshot at its
  resulting epoch (captured by the confirm path) and synthesizing its
  CommitStaged observation from the stamp; any failure prunes the branch
  exactly as before.
- The apply path skips the selected branch's already-applied Processed
  prefix (the live canonical chain), so an own-branch win applies without
  re-processing own commits and an own-branch loss keeps the existing
  rollback-and-replay behavior, composing with the #363/#702 withdrawals.

Tests: restart-winner and mutual-restart variants in update_group_data.rs
(both reproduce the wrong-branch selection / permanent fork without the
fix), plus a JSON shape snapshot for the new payload variant.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@cursor

cursor Bot commented Jul 4, 2026

Copy link
Copy Markdown

Bugbot is not enabled for your account, so this pull request was not reviewed.

Enable Bugbot in the Cursor dashboard to get automatic reviews on future PRs.

@stage-review

stage-review Bot commented Jul 4, 2026

Copy link
Copy Markdown

Ready to review this PR? Stage has broken it down into 6 individual chapters for you:

Title
1 Define OwnCommitConvergenceStamp and enrich StoredMessagePayload
2 Capture and persist convergence stamps during publish
3 Materialize own commits via anchor rollforward
4 Skip already-applied prefixes during apply
5 Verify convergence with integration tests
6 Other changes
Open in Stage

Chapters generated by Stage for commit 795dace on Jul 4, 2026 9:55am UTC.

@coderabbitai

coderabbitai Bot commented Jul 4, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

You’ve reached a temporary PR review limit under our Fair Usage Limits Policy.

Your recent review volume is higher than typical usage, so adaptive limits are currently applied.

Next review available in: 17 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: de09286b-e8c4-4b87-9be2-8199b0c5d99b

📥 Commits

Reviewing files that changed from the base of the PR and between 1382720 and 795dace.

📒 Files selected for processing (2)
  • crates/cgka-engine/src/openmls_projection.rs
  • crates/cgka-engine/tests/update_group_data.rs

Walkthrough

This PR adds a durable OwnCommitConvergenceStamp recorded on own commits at confirm time, stored via a new OwnCommitWire payload variant. Stored-convergence replay and materialization now use a PrevalidatedOwnCommits tracker to synthesize own-commit observations without re-processing them through OpenMLS, and the apply path skips already-applied own commits via a computed applied-prefix. Tests cover restart/branch-selection convergence scenarios.

Changes

Own commit convergence stamping and replay

Layer / File(s) Summary
OwnCommitWire payload and convergence stamp contract
crates/traits/src/message.rs, crates/traits/src/lib.rs, crates/traits/tests/snapshots.rs
Adds OwnCommitConvergenceStamp struct and StoredMessagePayload::OwnCommitWire variant with constructor/accessors, re-exports the stamp type, and adds a snapshot round-trip test.
PrevalidatedOwnCommits tracking and stamp derivation
crates/cgka-engine/src/openmls_projection.rs
Introduces PrevalidatedOwnCommits, own_commit_stamp(), and stamp_processed_own_commit_record(), and adds the tracker field to StoredOpenMlsCanonicalizationWork.
Capturing and persisting stamp at confirm time
crates/cgka-engine/src/publish.rs
do_confirm_published captures the own-commit stamp from the staged commit and persists it alongside marking the message Processed.
Threading own_commits through candidate materialization and probing
crates/cgka-engine/src/openmls_projection.rs
Propagates the prevalidated-own-commits context through replay, materialization, canonicalization batch, and BFS probing functions, and records canonical/stamped digests from stored commit records.
Prevalidated own-commit replay synthesis
crates/cgka-engine/src/openmls_projection.rs
Rolls the group forward to the retained-anchor snapshot and synthesizes a CommitStaged observation from the stamp when replaying a canonical-prefix own commit, tracking canonicality via prefix_canonical.
Applied-prefix skip during canonicalization apply
crates/cgka-engine/src/openmls_projection.rs
Computes an already-applied commit prefix and uses it to avoid redundant replay of own commits already reflected in live state.
Fault injection and restart-convergence tests
crates/cgka-engine/tests/publish_lifecycle.rs, crates/cgka-engine/tests/update_group_data.rs
Extends fault injection to put_message writes and adds tests verifying correct branch selection and mutual-restart convergence with own confirmed renames.

Estimated code review effort: 5 (Critical) | ~110 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Confirm as do_confirm_published
  participant Storage as StorageProvider
  participant Replay as process_openmls_messages_inner
  participant Anchor as RetainedAnchorSnapshot
  participant Apply as apply_openmls_canonicalization_result

  Confirm->>Confirm: own_commit_stamp(staged_commit, committer)
  Confirm->>Storage: stamp_processed_own_commit_record(message_id, stamp)
  Storage-->>Storage: persist OwnCommitWire{message, stamp}

  Replay->>Storage: load stored payload, extract own_commit_stamp
  Replay->>Replay: check prefix_canonical for commit digest
  alt stamp present and prefix canonical
    Replay->>Anchor: snapshot at commit's resulting epoch
    Anchor-->>Replay: post-merge group state
    Replay->>Replay: synthesize CommitStaged from stamp
  else no stamp or non-canonical
    Replay->>Replay: normal OpenMLS commit processing
  end

  Apply->>Apply: already_applied_commit_prefix(accepted_commits)
  Apply->>Replay: replay_messages_for_canonicalization_result(applied_prefix)
  Apply->>Apply: apply_start_epoch_for_canonicalization_result(applied_prefix)
Loading
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: making own confirmed commits materialize as pre-validated convergence candidates.
Linked Issues check ✅ Passed The changes implement the durable own-commit stamp, candidate materialization, and apply-prefix skipping required by #706, with tests covering the restart cases.
Out of Scope Changes check ✅ Passed The modified files are directly tied to #706, and the added tests and API updates support the same convergence fix rather than unrelated functionality.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/706-own-commit-convergence

Comment @coderabbitai help to get the list of available commands.

let mut replay_messages = Vec::new();
let mut seen = BTreeSet::new();
for hex_message_id in result
.accepted_proposals

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still replays proposals that were consumed by commits in the skipped applied_prefix. In the branch this PR is fixing, if the already-applied own commit consumed a fork-epoch proposal, applied_prefix filters the commit out and apply_start_epoch stays at the live tip, but this loop still puts that old proposal into replay_messages before the unskipped suffix. OpenMLS will then process a proposal whose source epoch is below the live group epoch and reject the apply, rolling back the convergence pass.

I think this needs to filter accepted_proposals to only proposals consumed by the unskipped commit suffix, or else start replay from the earliest accepted proposal/commit source epoch instead of the first unskipped commit. A regression test where the winning restarted own commit consumes a pending proposal would catch it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — confirmed, and it was worse than a one-off rejected pass: the proposal stayed a pending input, so every retry re-selected the own branch and hit the same WrongEpoch apply failure — a permanent livelock for the group.

Fixed in 795dace by filtering the replay set: an accepted proposal whose record epoch sits below the apply-start epoch can only have been consumed by a commit in the skipped prefix (MLS proposals are epoch-scoped, so no unskipped suffix commit at a later epoch could consume it). Its effect is already inside the live state, and its Processed disposition still persists — which is what removes it from the convergence input set and closes the retry loop. When the prefix is empty (normal rollback path) the apply-start epoch is the fork epoch and no accepted proposal sits below it, so that path is unchanged.

Regression test as suggested: rebuilt_winner_applies_own_selfremove_commit_without_replaying_consumed_proposal — both remaining members race the SelfRemove auto-commit (each consuming the leaver's proposal by reference), the ordering-key winner restarts with the consumed proposal re-opened as a pending input, and converges over the losing sibling. Without the filter it fails with exactly the apply failure you described; with it, the pass settles, the own branch survives, and the proposal record returns to Processed.

One note from building the test: a welcome-joined member whose only own commit is the auto-commit holds no retained anchor at the fork epoch, because the auto-commit send path projects the group record to the post-merge epoch before confirm's pre-commit retention reads it — so historical replay reports MissingRetainedAnchor and the group goes Unrecoverable. That's a pre-existing retention quirk orthogonal to this PR (the test sidesteps it with a settled rename round first); happy to file it separately.

Review follow-up on #723: the apply path skipped the already-applied own
commit but still replayed the fork-epoch proposal that commit consumed.
MLS proposals are epoch-scoped, so OpenMLS rejected the replay against
the post-prefix live state (WrongEpoch), failing the whole apply and
rolling back the convergence pass on every retry — a permanent livelock
whenever the winning restarted own commit had consumed a pending
proposal.

Accepted proposals whose record epoch sits below the apply-start epoch
were consumed by a skipped prefix commit: their effect is already inside
the live state, so they are excluded from replay. Their Processed
disposition still persists, which removes them from the convergence
input set and closes the retry loop.

Regression test: SelfRemove auto-commit race (both remaining members
auto-commit the leaver's proposal at the same epoch), winner restarts
with the consumed proposal re-opened as a pending input, converges over
the losing sibling. Fails without the filter with exactly the apply
failure described in review.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@cursor

cursor Bot commented Jul 4, 2026

Copy link
Copy Markdown

Bugbot is not enabled for your account, so this pull request was not reviewed.

Enable Bugbot in the Cursor dashboard to get automatic reviews on future PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant