Follow-up to PR #923 (SQS standard SendMessageBatch double-send fix). #923 eliminated the leader-churn double-send by reusing a stable per-entry identity across the in-process retry, and is claude-approved. Two P2 edge cases raised by codex on #923 are intentionally deferred here because closing them properly requires the full option-2 dedup machinery (the same shape used for DynamoDB in #920), which is a substantially larger, gated change than the double-send fix.
P2-1 — retry can clobber a concurrent consumer mutation
When attempt 1 commits but returns a retryable conflict (leader churn), the retry re-PUTs the same data/vis/by-age keys while the OCC transaction only reads [meta, gen]. If a consumer receives or deletes the message during the retry backoff (their own OCC txn mutating those message keys), the retry's second PUT can overwrite the rotated CurrentReceiptToken/VisibleAtMillis or recreate a deleted record. This is a narrower window than the original double-send (needs a consumer to race the retry on a now-visible message) and affects one message rather than every batch entry, but it is a real interleaving the simple stable-key fix does not fence.
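The interleaving above can be reproduced with a toy OCC store. This is a minimal sketch, not the adapter's code: `Store`, `Commit`, `simulate`, and the key names `meta`, `gen`, and `data/m1` are hypothetical stand-ins, and the conflict rule (reject if any read key was committed after `StartTS`) is an assumption about how the real transaction layer behaves.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict is returned when a key in the read set changed after StartTS.
var ErrConflict = errors.New("occ: write conflict")

type versioned struct {
	value    string
	commitTS uint64
}

// Store is a toy single-node OCC key-value store (hypothetical, illustration only).
type Store struct {
	data map[string]versioned
	ts   uint64
}

func NewStore() *Store { return &Store{data: map[string]versioned{}} }

// Now returns the latest commit timestamp, used as a transaction's StartTS.
func (s *Store) Now() uint64 { return s.ts }

// Commit applies writes unless a key in readKeys was committed after startTS.
func (s *Store) Commit(startTS uint64, readKeys []string, writes map[string]string) error {
	for _, k := range readKeys {
		if v, ok := s.data[k]; ok && v.commitTS > startTS {
			return ErrConflict
		}
	}
	s.ts++
	for k, val := range writes {
		s.data[k] = versioned{value: val, commitTS: s.ts}
	}
	return nil
}

func (s *Store) Get(k string) string { return s.data[k].value }

// simulate runs: attempt 1 commits, a consumer rotates the receipt token,
// then the producer replays the same write set with the given read set.
func simulate(retryReadKeys []string) (string, error) {
	s := NewStore()
	_ = s.Commit(s.Now(), nil, map[string]string{"meta": "q", "gen": "1"})

	startTS := s.Now()
	send := map[string]string{"data/m1": "token=none"}
	_ = s.Commit(startTS, []string{"meta", "gen"}, send) // attempt 1 lands

	// Consumer receives the message during the retry backoff.
	_ = s.Commit(s.Now(), []string{"data/m1"}, map[string]string{"data/m1": "token=r1"})

	// Producer saw a retryable error and replays with the same StartTS.
	err := s.Commit(startTS, retryReadKeys, send)
	return s.Get("data/m1"), err
}

func main() {
	for _, rk := range [][]string{{"meta", "gen"}, {"meta", "gen", "data/m1"}} {
		v, err := simulate(rk)
		fmt.Printf("readKeys=%v value=%q err=%v\n", rk, v, err)
	}
}
```

With only `meta` and `gen` in the read set, the replay commits and the consumer's rotated token is overwritten. Adding the data key to the read set makes the replay conflict, so the consumer's write survives. In this toy model attempt 1's own write would also trip that fence, which is presumably why the proposed fix pairs fencing with an exact-ts probe.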
P2-2 — committed entry reported Failed[] after retry-time revalidation
If attempt 1 commits and a concurrent SetQueueAttributes tightens a limit (e.g. lowers MaximumMessageSize) before the retry, validateBatchEntry (which now correctly runs every attempt) rejects the entry into Failed[] even though it is already in the queue, leaving the client with an inconsistent view (stored, but reported failed). This stays within the standard-queue at-least-once contract and is documented in docs/design/2026_06_03_partial_dynamodb_onephase_dedup.md (S3/SQS section, "Residual edge").
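A small sketch of this sequence, under stated assumptions: `entry`, `queue`, `validate`, `attempt`, and `sendWithRetry` are hypothetical names that only mimic the shape of a per-attempt size check, and the `ackLost` flag stands in for the leader-churn error that triggers the retry. It is not the real validateBatchEntry.

```go
package main

import "fmt"

// entry and queue are hypothetical stand-ins, not the real adapter types.
type entry struct {
	ID   string
	Body string
}

type queue struct {
	maxMessageSize int
	stored         map[string]bool
}

// validate mimics a size check that reads the queue's current attributes.
func validate(q *queue, e entry) error {
	if len(e.Body) > q.maxMessageSize {
		return fmt.Errorf("entry %s: body exceeds MaximumMessageSize %d", e.ID, q.maxMessageSize)
	}
	return nil
}

// attempt validates and then stores the entry. ackLost simulates a commit that
// lands while the caller still sees a retryable error.
func attempt(q *queue, e entry, ackLost bool) (bool, error) {
	if err := validate(q, e); err != nil {
		return false, err
	}
	q.stored[e.ID] = true
	return ackLost, nil
}

// sendWithRetry returns the (successful, failed) entry IDs the client would see.
// betweenAttempts runs during the retry backoff.
func sendWithRetry(q *queue, e entry, betweenAttempts func()) ([]string, []string) {
	retry, err := attempt(q, e, true)
	if err == nil && retry {
		betweenAttempts()
		_, err = attempt(q, e, false) // revalidates against the new limit
	}
	if err != nil {
		return nil, []string{e.ID}
	}
	return []string{e.ID}, nil
}

func main() {
	q := &queue{maxMessageSize: 1024, stored: map[string]bool{}}
	e := entry{ID: "m1", Body: string(make([]byte, 512))}
	ok, failed := sendWithRetry(q, e, func() { q.maxMessageSize = 256 })
	fmt.Println("successful:", ok, "failed:", failed, "stored:", q.stored["m1"])
}
```

The entry ends up in the failed list while `q.stored["m1"]` is true, which is the stored-but-reported-failed view described above.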
Proper fix (option-2 dedup for SQS batch)
Mirror the DynamoDB approach (#920):
- adapter allocates commitTS locally (Clock().NextFenced()), gated on coordinator.IsLeader() (leader-issued ts);
- on retry, reuse the write set carrying PrevCommitTS + fence the message data keys in ReadKeys (stable StartTS);
- FSM exact-ts probe (dedupProbeOnePhase) no-ops the apply if attempt 1's primary key landed at PrevCommitTS → the retry does NOT re-write, so a consumer mutation is preserved and a committed entry is reported success (cached results), not failed;
- behind a default-off gate (R5 ship-reader-before-writer), like the Redis/DynamoDB dedup.
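The probe step in the list above can be sketched roughly as follows. Everything here is illustrative: `fsm`, `request`, `apply`, and `scenario` are hypothetical, the `history` map is an assumed stand-in for whatever MVCC lookup the real dedupProbeOnePhase performs, and leader-gated timestamp allocation is left out entirely.

```go
package main

import "fmt"

type version struct {
	value    string
	commitTS uint64
}

// fsm is a toy state machine, not the real FSM.
type fsm struct {
	dedupEnabled bool                       // default-off gate
	latest       map[string]version         // newest version per key
	history      map[string]map[uint64]bool // key -> commit timestamps ever written
}

func newFSM(dedupEnabled bool) *fsm {
	return &fsm{
		dedupEnabled: dedupEnabled,
		latest:       map[string]version{},
		history:      map[string]map[uint64]bool{},
	}
}

type request struct {
	PrimaryKey   string
	Writes       map[string]string
	CommitTS     uint64
	PrevCommitTS uint64 // 0 on the first attempt
}

// apply reports deduped=true when the exact-ts probe finds that the primary
// key was already written at PrevCommitTS; in that case nothing is re-written.
func (f *fsm) apply(r request) bool {
	if f.dedupEnabled && r.PrevCommitTS != 0 && f.history[r.PrimaryKey][r.PrevCommitTS] {
		return true // attempt 1 landed: caller returns the cached success
	}
	for k, v := range r.Writes {
		f.latest[k] = version{value: v, commitTS: r.CommitTS}
		if f.history[k] == nil {
			f.history[k] = map[uint64]bool{}
		}
		f.history[k][r.CommitTS] = true
	}
	return false
}

// scenario replays: send at ts 10, consumer mutation at ts 11, retry at ts 12
// carrying PrevCommitTS=10. It returns the final value and the probe result.
func scenario(dedupEnabled bool) (string, bool) {
	f := newFSM(dedupEnabled)
	send := map[string]string{"data/m1": "token=none"}
	f.apply(request{PrimaryKey: "data/m1", Writes: send, CommitTS: 10})
	f.apply(request{PrimaryKey: "data/m1", Writes: map[string]string{"data/m1": "token=r1"}, CommitTS: 11})
	deduped := f.apply(request{PrimaryKey: "data/m1", Writes: send, CommitTS: 12, PrevCommitTS: 10})
	return f.latest["data/m1"].value, deduped
}

func main() {
	for _, gate := range []bool{false, true} {
		v, deduped := scenario(gate)
		fmt.Printf("dedupEnabled=%v value=%q deduped=%v\n", gate, v, deduped)
	}
}
```

With the gate off, the retry re-writes and the consumer's token is lost. With the gate on, the probe matches the write at ts 10, the apply is skipped, and the consumer's value stays in place. If attempt 1 never landed, the probe misses and the retry writes normally.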
This is a gated feature, not a hotfix, so it is tracked separately from the double-send correctness fix.