Skip to content

xmtp_mls: route Sender::NewMemberCommit through from_external_commit + envelope-timestamp plumbing#3668

Open
tylerhawkes wants to merge 1 commit into
tyler/qr-invite-l7-validatorfrom
tyler/qr-invite-l8-ingestion
Open

xmtp_mls: route Sender::NewMemberCommit through from_external_commit + envelope-timestamp plumbing#3668
tylerhawkes wants to merge 1 commit into
tyler/qr-invite-l7-validatorfrom
tyler/qr-invite-l8-ingestion

Conversation

@tylerhawkes

@tylerhawkes tylerhawkes commented May 21, 2026

Copy link
Copy Markdown
Contributor

Summary

The ingestion layer for XIP-82 external commits — routing plus the envelope-timestamp plumbing the validator's time bounds require:

  • Sender dispatch in validate_and_process_external_message: Sender::Member continues through from_staged_commit; Sender::NewMemberCommit (the atomic external-commit join) routes to ValidatedCommit::from_external_commit (xmtp_mls: ValidatedCommit::from_external_commit — XIP-82 receive-side check set #3667), sourcing metadata capability-aware (migrated groups read from the AppData dict, unmigrated from legacy extensions). Both arms converge at merge_staged_commit_logged. Any other sender is rejected.
  • PublicMessage framing accepted at the protocol-message gate (external commits are public messages per RFC 9420 §12.4.3.2); the per-sender policy is enforced by the dispatch above.
  • Joiner identity from the update-path leaf in extract_message_sender: the joiner isn't in the pre-commit tree; OpenMLS has already verified the framing signature against the path leaf, so it is the source of truth for inbox id + installation id.
  • epoch_entered_at_ns column on groups (nullable BIGINT; migration included): the delivery-service envelope timestamp of the message by which this client entered the current epoch. Written on every successful commit merge — own commits echoed back (process_own_message) and others' commits (process_external_message) — and at welcome-join (covering re-welcomes too, which insert_or_replace_group would otherwise skip). NULL falls back to created_at_ns, the initial epoch.
  • Hardening from review: the column setter is monotonic (envelope time only moves forward over a group's life — a backdated write from pathological reordering or corruption would widen the staleness window), and both i64→u64 casts clamp at zero (a negative epoch-start would otherwise wrap and saturate the staleness age to zero, silently disabling check 9).
  • ExternalCommitTimestamps sourcing at the dispatch: commit_envelope_ns from the commit's own envelope, epoch_started_at_ns from the column. Checks 8 + 9 thus evaluate against envelope timestamps only — a member that syncs the commit a week later reaches the same verdict as one that processed it immediately. A missing group row degrades toward rejection (fail-closed), never toward acceptance.

Stack

Base: tyler/qr-invite-l7-validator (#3667) ← #3666#3671#3751 ← main.

Test plan

  • New xmtp_db round-trip test for the column (NULL fresh → set → overwrite); full xmtp_db suite (209) passes the migration.
  • Full xmtp_mls suite: 688 passed (welcome-path mocks extended for the new write).
  • cargo check --locked, just lint-rust clean.
  • End-to-end (a real QR-invite join driven through ingestion) rides with the integration-test PR (xmtp_mls: integration test for QR-invite end-to-end flow #3677).

🤖 Generated with Claude Code

Note

Route Sender::NewMemberCommit through from_external_commit and persist epoch timestamps

  • Adds epoch_entered_at_ns (nullable BIGINT) to the groups table via a new migration, with monotonic-write semantics: updates are only persisted if the new value is greater than the stored value.
  • Extends the commit message handler in mls_sync.rs to accept Sender::NewMemberCommit (external commits), routing them through ValidatedCommit::from_external_commit with delivery-service envelope timestamps; unsupported sender types are rejected.
  • Records epoch_entered_at_ns after merging any commit (own, external, or welcome), using the envelope timestamp as the value.
  • Extends the message dispatcher to accept ProtocolMessage::PublicMessage in addition to PrivateMessage, enabling external commit processing.
  • Updates extract_message_sender to extract identity and key from the update-path leaf node for external commits.
  • Behavioral Change: the dispatcher now processes PublicMessages; previously these were rejected as unsupported.

Macroscope summarized 04fbcca.

@tylerhawkes tylerhawkes requested a review from a team as a code owner May 21, 2026 02:17
@claude

claude Bot commented May 21, 2026

Copy link
Copy Markdown

Claude finished @tylerhawkes's task —— View job


Code Review — June 10, 2026

Macroscope has already identified 4 important issues that need addressing. Beyond those, here are additional findings:

🔴 Critical: Unsupported sender rejection doesn't advance cursor

Location: crates/xmtp_mls/src/groups/mls_sync.rs:1285-1295

The other arm in the sender-dispatch match rejects unsupported sender types with ActorNotMember, but does not call maybe_update_cursor before returning the error. This means:

  1. The rejection is correct (good)
  2. The cursor is not advanced (bad)
  3. The message will be retried indefinitely on subsequent syncs (very bad)

Every other early-return error path in validate_and_process_external_message (lines 1203, 1209, 1220, 1230, 1303, 1329, 1352, 1360, 1381, 1398, 1435) calls maybe_update_cursor first. This path should follow the same pattern.

Impact: A malicious or misconfigured node sending commits with unexpected sender types will cause the group to become stuck, unable to sync past that message.

Suggested fix:

other => {
    tracing::warn!(
        inbox_id = self.context.inbox_id(),
        group_id = %self.group_id,
        ?other,
        "rejecting commit from unsupported sender type"
    );
    self.maybe_update_cursor(&self.context.db(), envelope)?;
    return Err(GroupMessageProcessingError::CommitValidation(
        CommitValidationError::ActorNotMember,
    ));
}

🟡 Code Quality: Database write uses monotonic guard but comment suggests unconditional behavior

Location: crates/xmtp_db/src/encrypted_store/group.rs:1231-1251

The implementation of set_group_epoch_entered_at_ns includes a monotonic guard (lines 1245-1248):

.filter(
    dsl::epoch_entered_at_ns
        .is_null()
        .or(dsl::epoch_entered_at_ns.lt(epoch_entered_at_ns)),
)

This is excellent defensive design! However, the call sites' comments don't reflect this:

  • mls_sync.rs:979-980: "record when (by envelope time) we entered the new epoch"
  • mls_sync.rs:1674-1676: "record when (by envelope time) we entered the new epoch"

These comments suggest unconditional writes, which could mislead future maintainers into expecting different behavior. The Macroscope medium-severity finding about line 1672 is actually already mitigated by the monotonic guard in the database layer, but the comment doesn't make that clear.

Suggested improvement: Update call-site comments to mention the monotonic property:

// XIP-82: record when (by envelope time) we entered the new
// epoch (monotonic: never moves backward). The external-commit
// validator measures the `expire_in_ns` staleness bound from here.
storage
    .db()
    .set_group_epoch_entered_at_ns(&self.group_id, envelope_timestamp_ns)?;

✅ Good: The actual timestamp handling is correct

Despite Macroscope's medium-severity concern about line 1672, the implementation correctly handles the welcome-vs-commit timestamp ordering issue:

  1. Database layer enforces monotonicity (group.rs:1245-1248)
  2. Welcome sets initial value (xmtp_welcome.rs:522)
  3. Commits only advance if later (enforced by the filter)

The max(welcome_time, commit_time) behavior Macroscope suggested is effectively what happens, just implemented at the database layer rather than the call site. This is good design.


📊 Test Coverage Observation

The 82.35% patch coverage with 12 missing lines is concerning for security-critical path changes. The uncovered lines appear to be in error-handling branches (based on Codecov's report focusing on mls_sync.rs). Consider:

  1. Adding negative test cases for the new sender-type dispatch paths
  2. Testing the cursor-advancement behavior for rejected external commits
  3. Testing timestamp monotonicity edge cases (out-of-order delivery)

Summary

Action Items:

  1. 🔴 Must fix: Add cursor advancement before rejecting unsupported sender types (mls_sync.rs:1285)
  2. 🟠 Address Macroscope findings: 3 high-severity issues require human review (Delete mutation, DM check, cursor handling)
  3. 🟡 Nice to have: Update comments to clarify monotonic epoch-time behavior
  4. 📝 Consider: Expand test coverage for error paths

The core timestamp-tracking design is sound, but the cursor-handling gap could cause operational issues in production.


return Err(CommitValidationError::ExternalCommitAppDataUpdateOutOfScope);
}

for mutation in &delta.mutations {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 High groups/validated_commit.rs:2692

enforce_app_data_update_scope permits TlsMapMutation::Delete mutations when they target the joiner's own inbox id, but a delete-only delta leaves the joiner in the MLS group with no GROUP_MEMBERSHIP entry — violating the invariant that external joins must register the joiner. Since resync external commits are unsupported (ResyncExternalCommitNotSupported), the joiner has no pre-existing entry to delete or update; only Insert should be allowed.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/xmtp_mls/src/groups/validated_commit.rs around line 2692:

`enforce_app_data_update_scope` permits `TlsMapMutation::Delete` mutations when they target the joiner's own inbox id, but a delete-only delta leaves the joiner in the MLS group with no `GROUP_MEMBERSHIP` entry — violating the invariant that external joins must register the joiner. Since resync external commits are unsupported (`ResyncExternalCommitNotSupported`), the joiner has no pre-existing entry to delete or update; only `Insert` should be allowed.

Evidence trail:
crates/xmtp_mls/src/groups/validated_commit.rs lines 2692-2701 (REVIEWED_COMMIT): Loop permits Delete/Update mutations as long as key matches joiner_id.
crates/xmtp_mls/src/groups/validated_commit.rs lines 948-1032 (REVIEWED_COMMIT): `from_external_commit` has no downstream check that delta contains an Insert.
crates/xmtp_mls/src/groups/validated_commit.rs line 2582 (REVIEWED_COMMIT): ResyncExternalCommitNotSupported confirms joiner has no pre-existing entry.
crates/xmtp_mls/src/groups/validated_commit.rs lines 2641-2642 (REVIEWED_COMMIT): Docstring states intent is only Insert.
crates/xmtp_mls_common/src/tls_map.rs line 386 (REVIEWED_COMMIT): TlsMapMutation enum with Insert/Update/Delete variants.

/// group to be migrated to AppData. The component's
/// `permissions.update_policy` (super-admin-only by default) gates
/// who can flip the bits.
pub async fn set_external_commit_policy(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 High groups/mod.rs:2015

set_external_commit_policy allows external commit policies to be set on DM conversations, bypassing the restriction that other metadata update methods enforce. This could enable third parties to join a two-person DM via external commits. Consider adding the DM check that other methods use: if self.metadata().await?.conversation_type == ConversationType::Dm { return Err(MetadataPermissionsError::DmGroupMetadataForbidden.into()); }

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file crates/xmtp_mls/src/groups/mod.rs around line 2015:

`set_external_commit_policy` allows external commit policies to be set on DM conversations, bypassing the restriction that other metadata update methods enforce. This could enable third parties to join a two-person DM via external commits. Consider adding the DM check that other methods use: `if self.metadata().await?.conversation_type == ConversationType::Dm { return Err(MetadataPermissionsError::DmGroupMetadataForbidden.into()); }`

Evidence trail:
crates/xmtp_mls/src/groups/mod.rs lines 2015-2046 (set_external_commit_policy missing DM check), lines 1987-2004 (update_commit_log_signer with DM check at 1993-1995), lines 2051-2060 (set_allow_external_commit wrapper). git_grep for DmGroupMetadataForbidden shows the check at lines 1901, 1927, 1994, 2086, 2144, 2184, 2482 — all other metadata update paths.

@macroscopeapp

macroscopeapp Bot commented May 21, 2026

Copy link
Copy Markdown
Contributor

Approvability

Verdict: Needs human review

5 blocking correctness issues found. This PR introduces significant new MLS external commit validation logic with security implications for group membership. Multiple high-severity review comments remain unresolved, including concerns about DM policy bypass and cursor advancement on error paths.

You can customize Macroscope's approvability policy. Learn more.

@codecov

codecov Bot commented May 21, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 84.81013% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.22%. Comparing base (9bfd0b7) to head (04fbcca).

Files with missing lines Patch % Lines
crates/xmtp_mls/src/groups/mls_sync.rs 40.00% 12 Missing ⚠️
Additional details and impacted files
@@                       Coverage Diff                        @@
##           tyler/qr-invite-l7-validator    #3668      +/-   ##
================================================================
- Coverage                         84.29%   84.22%   -0.07%     
================================================================
  Files                               410      410              
  Lines                             61577    61655      +78     
================================================================
+ Hits                              51906    51930      +24     
- Misses                             9671     9725      +54     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l8-ingestion branch 3 times, most recently from e6b79bc to 684fa1b Compare May 27, 2026 18:17
@tylerhawkes tylerhawkes changed the base branch from main to tyler/qr-invite-l7-validator May 27, 2026 18:18
@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l8-ingestion branch from 684fa1b to be09e92 Compare May 27, 2026 18:33
@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l7-validator branch from f29a7c1 to 6d0ec42 Compare May 27, 2026 18:33
@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l7-validator branch from 6d0ec42 to 9bfd0b7 Compare June 10, 2026 05:43
@tylerhawkes tylerhawkes force-pushed the tyler/qr-invite-l8-ingestion branch from be09e92 to 65448fc Compare June 10, 2026 06:17
@tylerhawkes tylerhawkes changed the title xmtp_mls: route Sender::NewMemberCommit through from_external_commit validator xmtp_mls: route Sender::NewMemberCommit through from_external_commit + envelope-timestamp plumbing Jun 10, 2026
) {
Ok(m) => m,
Err(e) => {
self.maybe_update_cursor(&self.context.db(), envelope)?;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 High groups/mls_sync.rs:1203

In the Sender::NewMemberCommit validation arm, maybe_update_cursor is called to advance the stored cursor before the later allow_cursor_increment gate is checked. When process_message is called with allow_cursor_increment=false (streaming path), callers rely on the cursor not moving so the message can be retried during the next sync. Any external commit that hits metadata/permission decode failures (lines 1203, 1209, 1230) will have its cursor advanced anyway, causing the subsequent sync to permanently skip that commit and potentially leave the group stuck behind.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @crates/xmtp_mls/src/groups/mls_sync.rs around line 1203:

In the `Sender::NewMemberCommit` validation arm, `maybe_update_cursor` is called to advance the stored cursor before the later `allow_cursor_increment` gate is checked. When `process_message` is called with `allow_cursor_increment=false` (streaming path), callers rely on the cursor not moving so the message can be retried during the next sync. Any external commit that hits metadata/permission decode failures (lines 1203, 1209, 1230) will have its cursor advanced anyway, causing the subsequent sync to permanently skip that commit and potentially leave the group stuck behind.

Comment on lines +1672 to +1674
storage
.db()
.set_group_epoch_entered_at_ns(&self.group_id, envelope_timestamp_ns)?;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium groups/mls_sync.rs:1672

set_group_epoch_entered_at_ns at line 1674 unconditionally overwrites the welcome-based epoch start with the commit's envelope timestamp. For newly added members, the welcome timestamp is the true epoch start, but backfilled commits have earlier timestamps. When external-commit validation later computes age_ns = commit_envelope_ns - epoch_started_at_ns, this overwritten value makes valid commits appear older than they are, potentially rejecting them as stale when expire_in_ns is tight. Consider only setting the epoch entered time when the field is unset, or using max(welcome_time, commit_time) to preserve the earliest timestamp.

-                // XIP-82: record when (by envelope time) we entered the
-                // new epoch — the external-commit validator measures the
-                // `expire_in_ns` staleness bound from here.
-                storage
-                    .db()
-                    .set_group_epoch_entered_at_ns(&self.group_id, envelope_timestamp_ns)?;
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @crates/xmtp_mls/src/groups/mls_sync.rs around lines 1672-1674:

`set_group_epoch_entered_at_ns` at line 1674 unconditionally overwrites the welcome-based epoch start with the commit's envelope timestamp. For newly added members, the welcome timestamp is the true epoch start, but backfilled commits have earlier timestamps. When external-commit validation later computes `age_ns = commit_envelope_ns - epoch_started_at_ns`, this overwritten value makes valid commits appear older than they are, potentially rejecting them as stale when `expire_in_ns` is tight. Consider only setting the epoch entered time when the field is unset, or using `max(welcome_time, commit_time)` to preserve the earliest timestamp.

…+ envelope-timestamp plumbing

Dispatch in validate_and_process_external_message: Sender::Member goes
through from_staged_commit as before; Sender::NewMemberCommit (atomic
external-commit join) routes to the XIP-82 validator, with PublicMessage
framing accepted at ingestion and the joiner identified from the staged
commit's update-path leaf.

Adds the epoch_entered_at_ns column on groups (nullable; falls back to
created_at_ns at the initial epoch) recording the delivery-service
envelope timestamp of the message by which this client entered the
current epoch — written on every successful commit merge (own + external
paths) and at welcome-join. The validator's expire_in_ns staleness bound
(check 9) measures from it, and expires_at_ns (check 8) uses the commit
envelope timestamp, so the accept/reject verdict never depends on a
validator's wall clock.
/// NULL if the pending-remove didn't receive an update yet
#[builder(default = None)]
pub has_pending_leave_request: Option<bool>,
/// XIP-82: envelope timestamp (ns) of the message by which this

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium encrypted_store/group.rs:109

StoredGroup::epoch_entered_at_ns defaults to NULL for existing rows, but xmtp_welcome.rs creates groups with created_at_ns set to local now_ns() rather than the welcome message's envelope timestamp. This causes pre-migration groups to measure XIP-82 staleness from local insertion time instead of the actual epoch-entry message time, so external commits may be incorrectly accepted or rejected until the group advances epoch.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file @crates/xmtp_db/src/encrypted_store/group.rs around line 109:

`StoredGroup::epoch_entered_at_ns` defaults to `NULL` for existing rows, but `xmtp_welcome.rs` creates groups with `created_at_ns` set to local `now_ns()` rather than the welcome message's envelope timestamp. This causes pre-migration groups to measure XIP-82 staleness from local insertion time instead of the actual epoch-entry message time, so external commits may be incorrectly accepted or rejected until the group advances epoch.

tylerhawkes added a commit that referenced this pull request Jun 11, 2026
…ves) + adapt invite::payload

Regenerates xmtp_proto from xmtp/proto#336 (typed SymmetricKey /
GroupStateHash / ServicePointer newtypes, ExternalCommitPolicyV1
max_uses + refresh_pointers, EncryptedGroupInfoBlobV1 AAD +
effective-expiry semantics, GroupMembershipEntry.V1
admitted_via_external_group_id).

Consuming changes kept to what main already contains:

- invite::payload rewritten for the typed shape: symmetric_key is a
  SymmetricKey submessage (MissingSymmetricKey + material-length
  checks), service_pointer is an optional ServicePointer (absent =
  application-resolved; present-but-empty oneof fails closed;
  https_url parsed and scheme-checked via url). New
  https_service_pointer / opaque_service_pointer /
  validate_service_pointer helpers.

- GroupMembershipEntry.V1 construction sites gain the new
  admitted_via_external_group_id field, empty everywhere: the
  migrator's synthesized entries are all Welcome/legacy members
  (absent is permanently correct), and the membership-update rewrite
  path documents that tag preservation MUST land together with the
  validator's write-once enforcement in the external-commit stack —
  nothing on main can set the tag yet, so empty is correct today.

The rest of the QR-invite stack (#3671 -> #3666 -> #3667 -> #3668 ->
#3673 -> #3674) rebases onto this and picks up the new fields layer
by layer.
tylerhawkes added a commit that referenced this pull request Jun 11, 2026
…ves) + adapt invite::payload

Regenerates xmtp_proto from xmtp/proto#336 (typed SymmetricKey /
GroupStateHash / ServicePointer newtypes, ExternalCommitPolicyV1
max_uses + refresh_pointers, EncryptedGroupInfoBlobV1 AAD +
effective-expiry semantics, GroupMembershipEntry.V1
admitted_via_external_group_id).

Consuming changes kept to what main already contains:

- invite::payload rewritten for the typed shape: symmetric_key is a
  SymmetricKey submessage (MissingSymmetricKey + material-length
  checks), service_pointer is an optional ServicePointer (absent =
  application-resolved; present-but-empty oneof fails closed;
  https_url parsed and scheme-checked via url). New
  https_service_pointer / opaque_service_pointer /
  validate_service_pointer helpers.

- GroupMembershipEntry.V1 construction sites gain the new
  admitted_via_external_group_id field, empty everywhere: the
  migrator's synthesized entries are all Welcome/legacy members
  (absent is permanently correct), and the membership-update rewrite
  path documents that tag preservation MUST land together with the
  validator's write-once enforcement in the external-commit stack —
  nothing on main can set the tag yet, so empty is correct today.

The rest of the QR-invite stack (#3671 -> #3666 -> #3667 -> #3668 ->
#3673 -> #3674) rebases onto this and picks up the new fields layer
by layer.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant