backup: Phase 0b M6 implementation - cmd/elastickv-snapshot-encode + library by bootjp · Pull Request #904 · bootjp/elastickv

bootjp · 2026-06-01T15:56:53Z

Summary

Phase 0b M6 implementation per the merged design doc docs/design/2026_06_01_proposed_snapshot_encode_cli.md (#896, merged at fe9e941).

Wires the merged M1–M5 encoder slices into a user-facing CLI plus a library entrypoint mirroring the decoder's DecodeSnapshot. Implements the round-trip self-test the parent doc mandates, with write-then-rename atomic publish so a self-test failure never reaches the restore path.

What lands

Library (internal/backup/):

encode_snapshot.go: EncodeSnapshot(opts, out io.Writer) (EncodeResult, error) — high-level wrapper that dispatches per-adapter encoders in canonical fan-out order (redis → dynamodb → s3 → sqs), implements two-mode buffering (stream when SelfTest=false, buffer when SelfTest=true), runs the structural self-test against the in-memory buffer, and copies to out only on match. Unexported corruptBufferForTest hook lets same-package tests inject buffer corruption that reaches the self-test decode but never out (codex P2 v6 docs(backup): propose Phase 0b M6 - cmd/elastickv-snapshot-encode CLI #896 — write-then-rename atomicity).
encode_info.go: EncodeInfo schema + WriteEncodeInfo / ReadEncodeInfo helpers + EncodeInfoSidecarPath (path-derived <output>.encode_info.json, no static-name collisions per gemini medium v2 docs(backup): propose Phase 0b M6 - cmd/elastickv-snapshot-encode CLI #896). Format-version gate so a future bump surfaces as ErrUnsupportedEncodeInfoFormatVersion.
manifest.go: Exclusions.RenameS3Collisions bool added with JSON tag rename_s3_collisions. Intentionally NOT in exclusionsRequiredFields so legacy manifests decode safely with the zero value false (claude v5 medium docs(backup): propose Phase 0b M6 - cmd/elastickv-snapshot-encode CLI #896).

Decoder CLI update (cmd/elastickv-snapshot-decode/main.go):

emitManifest now populates the new RenameS3Collisions field from cfg.renameCollisions, completing the decoder→encoder round-trip for --rename-collisions dumps.

Encoder CLI (cmd/elastickv-snapshot-encode/main.go):

Flags: --input, --output, --adapter (decoder-parity CSV parser), --last-commit-ts, --self-test, --scratch-root.
Atomic publish: write to <output>.tmp-<random>, fsync+close, then rename. Self-test failure → mismatch.txt next to where .fsm would have been, temp file removed, exit 2.
Fail-closed HLC ceiling: --last-commit-ts T < manifest → exit 2 with typed ErrSelfTestLowerLastCommitTS.
Exit codes: 0 success / 1 user-input error / 2 data-correctness failure (decoder-parity).
encoder_version stamped at build time via -ldflags "-X main.version=..." (mirrors decoder pattern at main.go:45).

Test plan

go test -race -count=1 — all green:

internal/backup/: 3 new tests (encode_info schema, format gate, legacy manifest forward-compat)
internal/backup/encode_snapshot_test.go: 5 tests (library round-trip, self-test match against canonicalized input, corruption never reaches out, missing-input guard, sidecar path derivation)
cmd/elastickv-snapshot-encode/main_test.go: 9 tests (missing manifest, unknown adapter, lower-TS fail-closed, equal+higher TS accept, path-derived sidecar, two-files-no-collision, full round-trip with --self-test, atomic-publish never leaves bad .fsm, --last-commit-ts parser)

make lint: clean.

Caller audit per CLAUDE.md semantic-change rule

Exclusions struct gained a field. Existing callers either build via field-tagged literals (decoder CLI's emitManifest — updated to populate the new field) or read it (encoder's buildSelfTestDecodeOptions — new code). No silent semantic change.
DecodeOptions.RenameS3Collisions was already a public field used by the decoder; the encoder now also reads it via the manifest. No caller-side change needed.

Self-review (5 passes)

Data loss: write-then-rename atomic publish — self-test failure never publishes the .fsm. Corrupted buffer never reaches the io.Writer (pinned by TestEncodeSnapshotSelfTestDetectsCorruption asserting out.Len()==0). All b.Add errors propagate through EncodeSnapshot.
Concurrency: pure offline. CLI is single-shot; library takes a caller-owned io.Writer. Temp-file suffix uses crypto/rand so concurrent encodes against the same --output cannot collide.
Performance: SelfTest=false streams with one sha256.Writer tee, no extra allocations. SelfTest=true allocates one FSM-sized *bytes.Buffer plus the scratch decode tree (documented memory cost).
Data consistency: --last-commit-ts T < manifest fail-closed with typed error; self-test threads MANIFEST DecodeOptions (Exclusions.* + DynamoDBLayout → DynamoDBBundleJSONL) so trees produced with non-default decoder flags round-trip cleanly.
Test coverage: 17 new tests cover library entrypoint, CLI flag parsing, atomic publish discipline, sidecar path-derivation, corruption detection, forward-compat for legacy manifests, and the four user-visible behaviors (round-trip / override / scope / missing-manifest).

Risk

Low. The encoder is offline; restoration is non-destructive (new keyspace on a fresh cluster via stop-replace-restart). The only public-API change is the Exclusions.RenameS3Collisions field, which is forward-compat for older manifests. All existing M1–M5 tests continue to pass.

Summary by CodeRabbit

New Features
- New snapshot encode CLI with atomic publish, multi-adapter encoding, optional self-test verification, and per-output encode sidecar (.encode_info.json).
- Sidecar safe-open exported and sidecar permission tightening to owner-only mode.
Bug Fixes
- Manifests now include/preserve an S3 rename-collision flag while remaining compatible with older manifests.
- Self-test failures avoid publishing bad outputs, preserve prior artifacts appropriately, and reject unexpected non-directory S3 entries.
Tests
- Extensive unit and end-to-end tests covering CLI, library, self-test, manifest compatibility, sidecar handling, and filesystem safety.

Implementation of the M6 design (this PR's earlier doc commits 31c0a0f through 1736435). Wires the merged M1-M5 encoder slices into a user-facing CLI plus a library entrypoint, with the round-trip self-test the parent doc mandates. ## What ships - internal/backup/manifest.go: Exclusions gains RenameS3Collisions bool with json tag rename_s3_collisions. Intentionally NOT added to exclusionsRequiredFields so legacy manifests decode safely with the zero value false. Pinned by TestExclusionsLegacyManifestOmitsRenameS3Collisions. - cmd/elastickv-snapshot-decode/main.go: emitManifest now populates the new field from cfg.renameCollisions, so M6's self-test can thread the same option back through DecodeSnapshot. - internal/backup/encode_info.go: ENCODE_INFO.json schema with format_version gate, NewEncodeInfo constructor, WriteEncodeInfo / ReadEncodeInfo helpers, and EncodeInfoSidecarPath that derives the sidecar from the .fsm path (<output>.encode_info.json — no static name collisions). - internal/backup/encode_snapshot.go: EncodeSnapshot(opts, out) library entrypoint mirroring DecodeSnapshot. Dispatches per-adapter encoders in canonical fan-out order (redis -> dynamodb -> s3 -> sqs), and implements the two-mode buffering model: - SelfTest=false: stream FSM straight to out with a sha256.Writer tee. - SelfTest=true: buffer FSM in *bytes.Buffer, fire the unexported corruptBufferForTest hook (if set), self-test against the buffer, copy to out ONLY on match. Corruption hooks reach the self-test decode but never reach out (write-then-rename atomicity). - cmd/elastickv-snapshot-encode/main.go: CLI with --input / --output / --adapter / --last-commit-ts / --self-test / --scratch-root flags; decoder-parity adapter CSV parser; fail-closed T >= manifest.last_commit_ts validation; atomic publish via <output>.tmp-<random> with deferred cleanup; sidecar emission with fsync+close discipline. ## Tests (12 new) internal/backup/encode_info_test.go: - TestEncodeInfoRoundTrip - TestEncodeInfoRejectsUnknownFormatVersion - TestExclusionsLegacyManifestOmitsRenameS3Collisions internal/backup/encode_snapshot_test.go: - TestEncodeSnapshotLibraryRoundTrip - TestEncodeSnapshotSelfTestMatchesInput (uses canonicalize-once pattern: encode -> decode -> re-encode self-tests cleanly) - TestEncodeSnapshotSelfTestDetectsCorruption (corruption reaches self-test decode, NEVER reaches the io.Writer) - TestEncodeSnapshotRequiresInputRoot - TestEncodeInfoSidecarPath cmd/elastickv-snapshot-encode/main_test.go: - TestCLIRejectsMissingManifest - TestCLIRejectsUnknownAdapter - TestCLIRejectsLowerLastCommitTSOverride - TestCLIAcceptsEqualAndHigherLastCommitTSOverride (sub-tests for equal + higher) - TestCLIEncodeInfoPathDerivedFromOutput - TestCLIEncodeInfoTwoFilesNoCollision - TestCLIRoundTripSelfTestAllAdapters - TestCLISelfTestFailureLeavesNoFsmAtOutputPath - TestParseLastCommitTS All green (race + no cache). golangci-lint clean. ## Caller audit per CLAUDE.md semantic-change rule - Exclusions struct gained a field. Existing callers either: (a) build the struct via field-tagged literals (decoder CLI's emitManifest — updated to populate the new field), or (b) read it (encoder's buildSelfTestDecodeOptions — new code). No silent semantic change for any pre-existing caller. - DecodeOptions.RenameS3Collisions was already a public field used by the decoder; the encoder now also reads it via the manifest. No caller-side change needed. ## Self-review 1. Data loss: write-then-rename atomic publish — self-test failure never publishes the .fsm. Mismatch txt + corrupt buffer never reach the io.Writer (pinned by TestEncodeSnapshotSelfTestDetectsCorruption asserting out.Len()==0). 2. Concurrency: pure offline. The CLI is a single-shot binary; library entrypoint takes an io.Writer the caller owns. 3. Performance: SelfTest=false streams with one sha256 tee, no extra allocations. SelfTest=true allocates one FSM-sized buffer plus the scratch decode tree (documented memory cost). 4. Data consistency: --last-commit-ts T < manifest is fail-closed with typed error; self-test threads MANIFEST DecodeOptions (Exclusions.* + DynamoDBLayout -> DynamoDBBundleJSONL) so trees produced with non-default decoder flags round-trip cleanly. 5. Test coverage: 12 new tests cover library entrypoint, CLI flag parsing, atomic publish discipline, sidecar path-derivation, corruption detection, forward-compat for legacy manifests.

coderabbitai · 2026-06-01T15:57:01Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a snapshot-encode CLI and EncodeSnapshot library with optional on-disk self-test and deterministic structural diffs, ENCODE_INFO.json sidecar schema and helpers, atomic temp→publish flow with randomized temp names, and a manifest Exclusions flag rename_s3_collisions that decode emits.

Changes

Snapshot Encoding Pipeline

Layer / File(s)	Summary
Manifest exclusions and decode integration `internal/backup/manifest.go`, `cmd/elastickv-snapshot-decode/main.go`	Manifest Exclusions adds optional `rename_s3_collisions` (`RenameS3Collisions`) with `omitempty` and backward-compatible zero-value; decode CLI includes this flag when emitting MANIFEST.json.
ENCODE_INFO.json schema and tests `internal/backup/encode_info.go`, `internal/backup/encode_info_test.go`	Defines ENCODE_INFO.json format version constant, typed unsupported-version error, EncodeInfo and EncodeInfoSelfTest structs, New/Read/Write helpers, sidecar path derivation, and tests for round-trip and unknown-format rejection.
Open sidecar export and Unix permission enforcement `internal/backup/open_sidecar_export.go`, `internal/backup/open_nofollow_unix.go`, `internal/backup/open_nofollow_unix_test.go`	Exports `OpenSidecarFile`, enforces owner-only `0o600` permissions after truncation, and adds a Unix-only test verifying tightened permissions.
S3 encoder strictness and tests `internal/backup/encode_s3.go`, `internal/backup/encode_s3_test.go`	S3 encoder now fails closed on unexpected non-directory top-level `s3/` entries while skipping reserved `_`-prefixed entries; tests added for both behaviors.
EncodeSnapshot library with optional self-test `internal/backup/encode_snapshot.go`, `internal/backup/encode_snapshot_test.go`	Adds EncodeSnapshot API, EncodeOptions/EncodeResult types, sentinel errors, SHA-256 tee, adapter-runner wiring (redis→dynamodb→s3→sqs), redis multi-DB guards, buffered self-test path that decodes and performs bounded deterministic streamed diffs, and extensive library tests validating round-trip, self-test idempotency/corruption detection, input/manifest guards, and adapter-data error mapping.
elastickv-snapshot-encode CLI with atomic publishing `cmd/elastickv-snapshot-encode/main.go`, `cmd/elastickv-snapshot-encode/main_test.go`	CLI parses required flags, reads/validates MANIFEST.json, enforces ceiling-only last-commit override, builds EncodeOptions, writes restrictive-permissions temp `.fsm`, optionally runs self-test (writes `<output>.mismatch.txt` on mismatch), atomically renames temp→final, fsyncs parent dir, writes `<output>.encode_info.json` sidecar with safe open/write/fsync semantics, and maps data-correctness sentinel errors to a distinct exit code. Tests cover error mappings, sidecar contracts, self-test flows, and permission/security checks.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant CLI as elastickv-snapshot-encode
  participant Lib as EncodeSnapshot
  participant FS as Filesystem
  participant Decode as DecodeSnapshot

  User->>CLI: run encode (--input, --output, --adapter, --last-commit-ts, --self-test)
  CLI->>FS: read <input>/MANIFEST.json
  CLI->>Lib: build EncodeOptions + run EncodeSnapshot
  Lib->>FS: read adapter input trees (redis/dynamodb/s3/sqs)
  Lib->>Decode: when SelfTest=true, write temp .fsm -> DecodeSnapshot (self-test)
  Decode->>Lib: return match/mismatch + decoded tree
  Lib->>FS: on success rename temp .fsm -> <output> and fsync parent dir
  CLI->>FS: write <output>.encode_info.json sidecar (safe open/write/fsync)
  CLI->>User: exit code (0 or classified non-zero)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

bootjp/elastickv#713: Related to sidecar-safe open implementations and Unix permission handling used/exposed here.
bootjp/elastickv#828: Related Redis encoder changes that integrate with EncodeSnapshot adapter wiring and Redis handling.
bootjp/elastickv#810: Related to decode manifest emission and the rename-collisions flag that this PR emits.

Poem

🐰 I hopped through manifests and stitched a sidecar seam,

I wrote temp .fsm files safe and kept the publish clean,
I ran a tiny self-test to sniff a sneaky glitch,
Left a JSON note with hashes—tidy, sealed, and stitched,
Then hopped away content into the backup stream.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically identifies the main change: implementation of Phase 0b M6 encoder (cmd/elastickv-snapshot-encode) plus supporting library components.
Docstring Coverage	✅ Passed	Docstring coverage is 89.74% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch backup/m6-cli-design

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

bootjp · 2026-06-01T15:57:02Z

@claude review

claude · 2026-06-01T15:57:40Z

Claude finished @bootjp's task in 2m 59s —— View job

Code Review - PR #904

Tasks

Gather context and understand the request
Read changed files
Review library code (internal/backup/)
Review CLI code (cmd/elastickv-snapshot-encode/)
Review tests
Post final review

Summary

This is a solid, well-structured implementation. The atomic publish discipline, self-test detection, and HLC ceiling fail-closed logic are all correctly implemented. A few issues worth addressing before merge:

Issues

1. fmt.Sscanf for hex parsing doesn't consume the entire input — partial parse silently succeeds

cmd/elastickv-snapshot-encode/main.go:156

if _, err := fmt.Sscanf(s[2:], "%x", &ts); err != nil {

fmt.Sscanf stops at the first non-hex character and returns n=1, err=nil — so "0xffZZ" parses as 0xff with no error. This means a user typo like --last-commit-ts 0x1234abc_garbage would silently accept a wrong value (and pass the >= manifest check if it happens to be large enough).

Replace with strconv.ParseUint(s[2:], 16, 64) which rejects partial parses. The same fix applies to the decimal branch: fmt.Sscanf(s, "%d", &ts) with a positive decimal will stop before a sign or non-digit; strconv.ParseUint(s, 10, 64) is stricter.

Fix this →

2. Stale comment: EncodeResult.SHA256 says "via SHA256Hex" but no such function exists

internal/backup/encode_snapshot.go:85

// SHA256 of the produced .fsm (lowercase hex via SHA256Hex).
SHA256 [32]byte

The field is [32]byte, not a string. The hex encoding happens later at the CLI level via hex.EncodeToString(result.SHA256[:]) in writeSidecar. The reference to SHA256Hex is a phantom; there is no such function in the package. The comment should read "raw SHA-256 digest; hex-encode with hex.EncodeToString(r.SHA256[:]) for display."

Fix this →

3. uitoa / itoa reinvent strconv — misleading for future readers

internal/backup/encode_snapshot.go:402-422

The hand-rolled decimal converters are correct but strconv.FormatUint and strconv.Itoa do exactly the same thing and are already linked into every Go binary. The only benefit here is avoiding an import, but strconv is already transitively present. Keeping custom implementations means a future reader may not recognise them as stdlib equivalents and will spend time auditing them.

Minor, but worth dropping.

4. diffOneSubdir mismatch output is non-deterministic for "bytes differ" entries

internal/backup/encode_snapshot.go:328-350

The loop over bFiles (a map) produces "bytes differ" entries in map-iteration order, which is randomised per-run. The "missing in input" entries also come from map iteration. Only "missing in scratch" entries (the remaining slice) are sorted.

Consequence: mismatch.txt can look different across two runs with the same inputs, making it harder to compare mismatch reports or write tests that check the txt content. Sorting diffs before returning (or collecting into a slice and sorting) would make this deterministic.

Fix this →

5. EncodeSnapshot returns (result, nil) on self-test mismatch — callers must check two things

internal/backup/encode_snapshot.go:181-183

if !matched {
    return result, nil  // no error, but SelfTestMatched = false
}

When SelfTest=true and the self-test fails, EncodeSnapshot returns (result, nil) — a nil error. Callers who only check err != nil will silently proceed as if the encode succeeded. The CLI correctly checks result.SelfTestMatched, but any future library caller who patterns-matches just on the error will miss the failure.

Returning a typed error (e.g., errSelfTestMismatch promoted from the CLI into the library) on mismatch, or at minimum documenting the contract prominently on EncodeSnapshot's godoc, would make this safer. The current behaviour is already documented in EncodeResult.SelfTestMatched, but the "no error on mismatch" choice is surprising enough to warrant a sentence in the EncodeSnapshot signature comment.

Observations (no fix required, but worth knowing)

6. writeSidecar is not atomic with respect to the .fsm rename

If the process is killed after os.Rename(tempPath, cfg.outputPath) but before writeSidecar completes, the published .fsm will exist without a sidecar. This is probably fine — the .fsm itself is the critical restore artifact and the sidecar is operator-metadata — but it means restore tooling should tolerate a missing sidecar gracefully. Worth a note in the design doc's "failure modes" section if it isn't already there.

7. walkRegularFiles slurps entire file contents into memory

internal/backup/encode_snapshot.go:359-392

For the self-test diff, every regular file in every enabled adapter subtree is read fully into a map. The comment already acknowledges this is bounded by test fixtures and defers streaming compare. Fine for now, but worth flagging if the first production-scale encode triggers an OOM.

8. var _ = strconv.FormatUint in main_test.go

cmd/elastickv-snapshot-encode/main_test.go:362

This is an unusual pattern to keep an import alive. If strconv has no active use in the test file, the import and this line can be removed. If it's an intentional placeholder for future tests, a short comment explaining why would help.

9. sha256w = hashSHA256 type alias chain

internal/backup/encode_snapshot.go:432-438

The alias sha256w = hashSHA256 (where hashSHA256 is an interface) is confusing — two names for the same type, neither of which is hash.Hash. The comment says it avoids importing hash, but crypto/sha256 already transitively imports hash, so the import would add no size cost. Using hash.Hash directly would be clearer.

Strengths

Write-then-rename atomic publish is correctly implemented and well-tested (TestCLISelfTestFailureLeavesNoFsmAtOutputPath, TestEncodeSnapshotSelfTestDetectsCorruption).
corruptBufferForTest hook cleanly separates the test injection path from production code without build tags.
exclusionsRequiredFields intentionally excludes rename_s3_collisions, with a clear comment explaining why — good forward-compat discipline.
EncodeInfoSidecarPath path-derivation (not a static name) correctly avoids sibling collisions; both the library and CLI tests pin this.
HLC ceiling fail-closed (ErrSelfTestLowerLastCommitTS) is a typed error that maps cleanly to exit code 2 — matches the decoder's exit-code contract.
buildSelfTestDecodeOptions threads Exclusions.* and DynamoDBBundleJSONL through from the manifest — ensures the self-test's scratch tree uses the same decode flags as the original run.
| Branch

gemini-code-assist

Code Review

This pull request implements the elastickv-snapshot-encode command-line tool and library, which encodes vendor-independent directory trees into native .fsm files, generates an encode_info.json sidecar, and supports an optional round-trip self-test. The review feedback highlights several critical memory-efficiency improvements to prevent Out Of Memory (OOM) crashes on large production snapshots. Key recommendations include writing the .fsm to a temporary file on disk instead of buffering it in memory during the self-test, streaming the file comparison chunk-by-chunk rather than reading all file bytes into memory, and increasing the entropy of the temporary file suffix to avoid collisions.

gemini-code-assist · 2026-06-01T16:00:30Z

+func encodeBuffered(b *snapshotBuilder, opts EncodeOptions, enabled []string, out io.Writer) (EncodeResult, error) {
+	var buf bytes.Buffer
+	bytesWritten, err := b.WriteTo(&buf)
+	if err != nil {
+		return EncodeResult{}, errors.WithStack(err)
+	}
+	bufBytes := buf.Bytes()
+	if opts.corruptBufferForTest != nil {
+		opts.corruptBufferForTest(bufBytes)
+	}
+
+	header, mismatchTxt, matched, stErr := runSelfTest(bufBytes, opts)
+	result := EncodeResult{
+		Header:              header,
+		BytesWritten:        bytesWritten,
+		SHA256:              sha256.Sum256(bufBytes),
+		SelfTestRan:         true,
+		SelfTestMatched:     matched,
+		SelfTestMismatchTxt: mismatchTxt,
+		AdaptersEnabled:     enabled,
+	}
+	if stErr != nil {
+		return result, stErr
+	}
+	if !matched {
+		return result, nil
+	}
+	if _, err := io.Copy(out, bytes.NewReader(bufBytes)); err != nil {
+		return result, errors.Wrap(err, "copy buffered fsm to out")
+	}
+	return result, nil
+}


Buffering the entire .fsm file in memory (bytes.Buffer) during the self-test can lead to Out Of Memory (OOM) crashes on large production snapshots. Instead, write the .fsm directly to a temporary file on disk, compute the SHA256 hash in a single pass using the existing sha256Writer, and stream-read from the temporary file for the self-test and copying.

func encodeBuffered(b *snapshotBuilder, opts EncodeOptions, enabled []string, out io.Writer) (EncodeResult, error) { tempFile, err := os.CreateTemp(opts.SelfTestDecodeOptions.OutRoot, "encode-buffer-") if err != nil { return EncodeResult{}, errors.WithStack(err) } defer func() { _ = tempFile.Close() _ = os.Remove(tempFile.Name()) }() hashWriter := newSHA256Writer(tempFile) bytesWritten, err := b.WriteTo(hashWriter) if err != nil { return EncodeResult{}, errors.WithStack(err) } if opts.corruptBufferForTest != nil { if _, err := tempFile.Seek(0, io.SeekStart); err != nil { return EncodeResult{}, errors.WithStack(err) } bufBytes, err := io.ReadAll(tempFile) if err != nil { return EncodeResult{}, errors.WithStack(err) } opts.corruptBufferForTest(bufBytes) if _, err := tempFile.WriteAt(bufBytes, 0); err != nil { return EncodeResult{}, errors.WithStack(err) } } header, mismatchTxt, matched, stErr := runSelfTest(tempFile, opts) result := EncodeResult{ Header: header, BytesWritten: bytesWritten, SHA256: hashWriter.Sum(), SelfTestRan: true, SelfTestMatched: matched, SelfTestMismatchTxt: mismatchTxt, AdaptersEnabled: enabled, } if stErr != nil { return result, stErr } if !matched { return result, nil } if _, err := tempFile.Seek(0, io.SeekStart); err != nil { return result, errors.Wrap(err, "seek temp file for copy") } if _, err := io.Copy(out, tempFile); err != nil { return result, errors.Wrap(err, "copy buffered fsm to out") } return result, nil }

gemini-code-assist · 2026-06-01T16:00:30Z

+func runSelfTest(fsmBytes []byte, opts EncodeOptions) (SnapshotHeader, []byte, bool, error) {
+	scratchBase := opts.SelfTestDecodeOptions.OutRoot
+	scratchDir, err := os.MkdirTemp(scratchBase, "encode-self-test-")
+	if err != nil {
+		return SnapshotHeader{}, nil, false, errors.Wrap(err, "mkdir scratch")
+	}
+	defer func() {
+		_ = os.RemoveAll(scratchDir)
+	}()
+
+	decOpts := opts.SelfTestDecodeOptions
+	decOpts.OutRoot = scratchDir
+
+	result, derr := DecodeSnapshot(bytes.NewReader(fsmBytes), decOpts)
+	if derr != nil {
+		// Decoder errored on our own output — that IS a self-test
+		// failure (the .fsm we produced isn't loadable). Surface as
+		// a mismatch with the decoder error embedded in the txt.
+		mismatchTxt := []byte("self-test failed: DecodeSnapshot rejected the produced .fsm: " + derr.Error())
+		return SnapshotHeader{}, mismatchTxt, false, nil
+	}
+
+	if result.Header.LastCommitTS != opts.LastCommitTS {
+		mismatchTxt := []byte(formatHeaderMismatch(opts.LastCommitTS, result.Header.LastCommitTS))
+		return result.Header, mismatchTxt, false, nil
+	}
+
+	diff, derr := diffAdapterTrees(opts.InputRoot, scratchDir, opts.Adapters)
+	if derr != nil {
+		return result.Header, nil, false, errors.Wrap(derr, "diff scratch tree")
+	}
+	if len(diff) > 0 {
+		return result.Header, []byte(strings.Join(diff, "\n") + "\n"), false, nil
+	}
+	return result.Header, nil, true, nil
+}


Update runSelfTest to accept an *os.File instead of a byte slice to support streaming the .fsm from disk during the self-test, avoiding loading the entire snapshot into memory.

func runSelfTest(fsmFile *os.File, opts EncodeOptions) (SnapshotHeader, []byte, bool, error) { scratchBase := opts.SelfTestDecodeOptions.OutRoot scratchDir, err := os.MkdirTemp(scratchBase, "encode-self-test-") if err != nil { return SnapshotHeader{}, nil, false, errors.Wrap(err, "mkdir scratch") } defer func() { _ = os.RemoveAll(scratchDir) }() decOpts := opts.SelfTestDecodeOptions decOpts.OutRoot = scratchDir if _, err := fsmFile.Seek(0, io.SeekStart); err != nil { return SnapshotHeader{}, nil, false, errors.Wrap(err, "seek temp file") } result, derr := DecodeSnapshot(fsmFile, decOpts) if derr != nil { // Decoder errored on our own output — that IS a self-test // failure (the .fsm we produced isn't loadable). Surface as // a mismatch with the decoder error embedded in the txt. mismatchTxt := []byte("self-test failed: DecodeSnapshot rejected the produced .fsm: " + derr.Error()) return SnapshotHeader{}, mismatchTxt, false, nil } if result.Header.LastCommitTS != opts.LastCommitTS { mismatchTxt := []byte(formatHeaderMismatch(opts.LastCommitTS, result.Header.LastCommitTS)) return result.Header, mismatchTxt, false, nil } diff, derr := diffAdapterTrees(opts.InputRoot, scratchDir, opts.Adapters) if derr != nil { return result.Header, nil, false, errors.Wrap(derr, "diff scratch tree") } if len(diff) > 0 { return result.Header, []byte(strings.Join(diff, "\n") + "\n"), false, nil } return result.Header, nil, true, nil }

gemini-code-assist · 2026-06-01T16:00:30Z

+func diffOneSubdir(aDir, bDir, relPrefix string) ([]string, error) {
+	aFiles, aErr := walkRegularFiles(aDir)
+	if aErr != nil && !errors.Is(aErr, os.ErrNotExist) {
+		return nil, errors.Wrapf(aErr, "walk input %s", aDir)
+	}
+	bFiles, bErr := walkRegularFiles(bDir)
+	if bErr != nil && !errors.Is(bErr, os.ErrNotExist) {
+		return nil, errors.Wrapf(bErr, "walk scratch %s", bDir)
+	}
+
+	var diffs []string
+	aMap := map[string][]byte{}
+	for path, body := range aFiles {
+		aMap[path] = body
+	}
+	for path, bBody := range bFiles {
+		aBody, ok := aMap[path]
+		if !ok {
+			diffs = append(diffs, relPrefix+"/"+path+" (missing in input)")
+			continue
+		}
+		if !bytes.Equal(aBody, bBody) {
+			diffs = append(diffs, relPrefix+"/"+path+" (bytes differ)")
+		}
+		delete(aMap, path)
+	}
+	// Anything remaining in aMap is present in input but not in scratch.
+	remaining := make([]string, 0, len(aMap))
+	for path := range aMap {
+		remaining = append(remaining, relPrefix+"/"+path+" (missing in scratch)")
+	}
+	sort.Strings(remaining)
+	diffs = append(diffs, remaining...)
+	return diffs, nil
+}
+
+// walkRegularFiles returns a map of relative path -> file bytes for every
+// regular file under root. Missing root is the empty map + os.ErrNotExist.
+// Bounded by the per-adapter test fixtures the encoder runs against;
+// production-scale dumps may want streaming compare, deferred until a
+// real bottleneck shows up.
+func walkRegularFiles(root string) (map[string][]byte, error) {
+	out := map[string][]byte{}
+	rootInfo, err := os.Stat(root)
+	if err != nil {
+		return nil, errors.WithStack(err)
+	}
+	if !rootInfo.IsDir() {
+		return nil, errors.Errorf("not a directory: %s", root)
+	}
+	if err := filepath.WalkDir(root, func(path string, d os.DirEntry, err error) error {
+		if err != nil {
+			return err
+		}
+		if d.IsDir() {
+			return nil
+		}
+		if !d.Type().IsRegular() {
+			return nil
+		}
+		body, rerr := os.ReadFile(path) //nolint:gosec // walking a caller-provided root, regular files only
+		if rerr != nil {
+			return errors.Wrap(rerr, path)
+		}
+		rel, rerr := filepath.Rel(root, path)
+		if rerr != nil {
+			return errors.WithStack(rerr)
+		}
+		out[filepath.ToSlash(rel)] = body
+		return nil
+	}); err != nil {
+		return nil, errors.WithStack(err)
+	}
+	return out, nil
+}


Loading the entire contents of all files in both the input and scratch directories into memory (walkRegularFiles reading all file bytes via os.ReadFile) will cause OOM crashes on large backups (e.g., with large S3 blobs). Instead, walk the directories to collect file sizes, and perform a streaming chunk-by-chunk comparison for files present in both directories.

func diffOneSubdir(aDir, bDir, relPrefix string) ([]string, error) { aFiles, aErr := walkRegularFiles(aDir) if aErr != nil && !errors.Is(aErr, os.ErrNotExist) { return nil, errors.Wrapf(aErr, "walk input %s", aDir) } bFiles, bErr := walkRegularFiles(bDir) if bErr != nil && !errors.Is(bErr, os.ErrNotExist) { return nil, errors.Wrapf(bErr, "walk scratch %s", bDir) } var diffs []string for path, aSize := range aFiles { bSize, ok := bFiles[path] if !ok { diffs = append(diffs, relPrefix+"/"+path+" (missing in scratch)") continue } if aSize != bSize { diffs = append(diffs, relPrefix+"/"+path+" (bytes differ)") continue } equal, err := compareFiles(filepath.Join(aDir, path), filepath.Join(bDir, path)) if err != nil { return nil, err } if !equal { diffs = append(diffs, relPrefix+"/"+path+" (bytes differ)") } } for path := range bFiles { if _, ok := aFiles[path]; !ok { diffs = append(diffs, relPrefix+"/"+path+" (missing in input)") } } sort.Strings(diffs) return diffs, nil } func walkRegularFiles(root string) (map[string]int64, error) { out := map[string]int64{} rootInfo, err := os.Stat(root) if err != nil { return nil, errors.WithStack(err) } if !rootInfo.IsDir() { return nil, errors.Errorf("not a directory: %s", root) } if err := filepath.WalkDir(root, func(path string, d os.DirEntry, err error) error { if err != nil { return err } if d.IsDir() { return nil } if !d.Type().IsRegular() { return nil } info, rerr := d.Info() if rerr != nil { return errors.Wrap(rerr, path) } rel, rerr := filepath.Rel(root, path) if rerr != nil { return errors.WithStack(rerr) } out[filepath.ToSlash(rel)] = info.Size() return nil }); err != nil { return nil, errors.WithStack(err) } return out, nil } func compareFiles(pathA, pathB string) (bool, error) { fA, err := os.Open(pathA) if err != nil { return false, errors.WithStack(err) } defer fA.Close() fB, err := os.Open(pathB) if err != nil { return false, errors.WithStack(err) } defer fB.Close() bufA := make([]byte, 64*1024) bufB := make([]byte, 64*1024) for { nA, errA := io.ReadFull(fA, bufA) nB, errB := io.ReadFull(fB, bufB) if errA == io.EOF || errA == io.ErrUnexpectedEOF { errA = nil } if errB == io.EOF || errB == io.ErrUnexpectedEOF { errB = nil } if errA != nil { return false, errors.WithStack(errA) } if errB != nil { return false, errors.WithStack(errB) } if nA != nB { return false, nil } if nA == 0 { break } if !bytes.Equal(bufA[:nA], bufB[:nB]) { return false, nil } } return true, nil }

gemini-code-assist · 2026-06-01T16:00:30Z

+	// tempSuffixHexLen is the hex-character length of the random
+	// suffix appended to <output>.tmp-<hex>; 8 hex chars = 4 bytes of
+	// entropy, which gives ~4×10^9 collision space per --output path
+	// (more than enough for concurrent encodes against the same path).
+	tempSuffixHexLen   = 8
+	tempSuffixByteLen  = tempSuffixHexLen / 2


Using only 4 bytes of entropy (8 hex characters) for the temporary file suffix has a relatively high probability of collision (due to the Birthday Paradox) in highly concurrent environments or automated test pipelines. Increasing the suffix length to 16 hex characters (8 bytes of entropy) provides a much safer collision-free space.

Suggested change

// tempSuffixHexLen is the hex-character length of the random

// suffix appended to <output>.tmp-<hex>; 8 hex chars = 4 bytes of

// entropy, which gives ~4×10^9 collision space per --output path

// (more than enough for concurrent encodes against the same path).

tempSuffixHexLen = 8

tempSuffixByteLen = tempSuffixHexLen / 2

// tempSuffixHexLen is the hex-character length of the random

// suffix appended to <output>.tmp-<hex>; 16 hex chars = 8 bytes of

// entropy, which gives a collision-free space per --output path.

tempSuffixHexLen = 16

tempSuffixByteLen = tempSuffixHexLen / 2

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1fa691fb74

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T16:01:48Z

+			return backup.AdapterSet{}, errors.Errorf("unknown adapter %q", name)
+		}
+	}
+	return set, nil


Reject empty adapter selections

When --adapter contains only separators or whitespace (for example --adapter ' ,', or a templated value that expands to spaces), every token hits the empty case and this returns a zero AdapterSet with no error. The rest of the encoder then invokes no adapter encoders, publishes a valid header-only .fsm, and exits successfully, which can turn a bad CLI argument into an apparently successful but empty restore artifact; the decoder-side parser already rejects this zero-adapter case.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-01T16:01:48Z

+	if _, err := fmt.Sscanf(s, "%d", &ts); err != nil {
+		return 0, errors.Wrap(err, "--last-commit-ts decimal parse")


Reject trailing junk in timestamps

For decimal --last-commit-ts values, fmt.Sscanf with %d accepts a numeric prefix and ignores trailing bytes, so an input like --last-commit-ts 100oops is silently treated as 100. Because this flag controls the restored HLC ceiling and sidecar metadata, malformed operator input should fail instead of publishing a snapshot with a different timestamp than the argument appears to specify.

Useful? React with 👍 / 👎.

Six substantive review findings on the M6 implementation PR. Headline fix: switched the self-test buffer from in-memory bytes.Buffer to a disk-backed temp file so large snapshots cannot OOM the encoder. ## Gemini high #1: SelfTest=true was buffering FSM in memory Out Of Memory risk on production-scale snapshots (gemini high #904). Fixed by writing the FSM through a tee (sha256 + os.File.WriteAt) into an os.CreateTemp file under opts.SelfTestDecodeOptions.OutRoot, then Seek(0) and stream through DecodeSnapshot for the self-test. The temp file is os.Remove'd via defer on every exit path. The contract is unchanged: corruption reaches the self-test decode but never reaches the io.Writer (TestEncodeSnapshotSelfTestDetectsCorruption still asserts out.Len()==0). Memory cost: O(1) for the SelfTest=true path now (previously was an FSM-sized bytes.Buffer on top of the sort working set). ## Gemini high #2: runSelfTest signature Changed from runSelfTest(fsmBytes []byte, ...) to runSelfTest(fsmFile io.Reader, ...) so the .fsm streams through DecodeSnapshot instead of being held as []byte. Same OOM fix at the runSelfTest layer. ## Gemini high #3: walkRegularFiles loaded every file's bytes into memory For large S3 blobs this OOMs the encoder. Replaced walkRegularFiles with walkRegularFilePaths (paths only) and streamFilesEqual (chunk compare at streamCmpBufSize = 64 KiB). Per-file compare cost is now O(streamCmpBufSize), not O(file size). ## Gemini medium: tempSuffixHexLen entropy was 4 bytes Birthday-paradox collision risk in highly concurrent CI environments. Bumped from 8 hex / 4 bytes (2^32 space) to 16 hex / 8 bytes (2^64 space). Same width crypto/rand uses for cryptographic nonces. ## Claude high + codex P2: fmt.Sscanf silently accepts partial parses --last-commit-ts 0xffZZ parsed as 0xff (fmt.Sscanf stops at first non-hex). --last-commit-ts 100oops parsed as 100. This means a typo could silently set the snapshot HLC ceiling to something the operator did not type. Replaced fmt.Sscanf with strconv.ParseUint (base 10 / 16) which rejects trailing garbage as a parse error. New TestParseLastCommitTS cases cover: 0xffZZ, 100oops, " 100 ext", "-1". ## Codex P2: empty adapter selection produced an empty AdapterSet --adapter ' ,' (e.g. a templated argument that expanded to spaces) would yield a zero AdapterSet and the encoder would publish a valid header-only .fsm with no adapter records — a silently-empty restore artifact. Now rejected with a clear error. Pinned by TestParseAdapterSetRejectsEmptySelection covering " ,", ",,,", " ", and "," — plus a positive case (single "s3") asserting the guard does not break legitimate single-adapter selections. ## Caller audit per CLAUDE.md semantic-change rule - runSelfTest signature changed from ([]byte, opts) to (io.Reader, opts). Sole caller is encodeBuffered (same package, just updated). No external callers since the function is unexported. - corruptBufferForTest hook type changed from func([]byte) to func(*os.File). Sole caller is the same-package test TestEncodeSnapshotSelfTestDetectsCorruption (updated to ReadAt/ WriteAt the temp file). External callers cannot set this field (lowercase). - streamFilesEqual replaces the bytes.Equal compare in diffOneSubdir. Same return contract (bool, error) so the caller (diffOneSubdir) is unchanged in behavior. - parseAdapterSet now returns an error for empty-selection inputs. Caller is parseFlags which already propagates the error to exit-1. Tests + lint green.

bootjp · 2026-06-01T16:25:02Z

v2 (b131186): addresses all 6 review findings on the implementation PR.

Gemini high #1 (OOM on large snapshots): Self-test buffer is now an on-disk os.CreateTemp file (NOT *bytes.Buffer). FSM streams through tee → sha256 + temp file; self-test decodes from the temp file directly. Memory cost dropped from O(FSM size) to O(1). Pinned by existing TestEncodeSnapshotSelfTestDetectsCorruption (corrupt-via-WriteAt, never reaches out).

Gemini high #2 (runSelfTest signature): Changed from fsmBytes []byte to fsmFile io.Reader, completing the streaming chain.

Gemini high #3 (walkRegularFiles OOM): Replaced bytes-into-memory with walkRegularFilePaths + streamFilesEqual (64 KiB chunk compare). Per-file cost is now O(chunk size), not O(file size).

Gemini medium (tempSuffixHexLen entropy): Bumped from 8 hex / 4 bytes (2^32) to 16 hex / 8 bytes (2^64), same width crypto/rand uses for cryptographic nonces.

Claude high + codex P2 (fmt.Sscanf partial parses): Replaced with strconv.ParseUint strict parsing. Trailing garbage like 0xffZZ or 100oops now surfaces as a flag-parse error. TestParseLastCommitTS extended to cover both.

Codex P2 (empty adapter selection): --adapter ' ,' etc. now rejected. Pinned by TestParseAdapterSetRejectsEmptySelection. Without the guard a templated arg expanding to spaces would silently publish a header-only .fsm.

@claude review

claude · 2026-06-01T16:25:19Z

Claude finished @bootjp's task in 3m 41s —— View job

Code Review - PR #904 (v2)

Tasks

Gather context and understand the request
Read changed files (v2 commit b131186)
Review library code (internal/backup/)
Review CLI code (cmd/elastickv-snapshot-encode/)
Review tests
Post final review

v2 Verification — All 6 Required Fixes Confirmed

Finding	Status
Gemini high #1: OOM — self-test buffer in `*bytes.Buffer`	✅ `encodeBuffered` now uses `os.CreateTemp`, tees through `teeWriter` to hash+file, self-test streams from `*os.File`
Gemini high #2: `runSelfTest(fsmBytes []byte)` → streaming	✅ Signature is now `runSelfTest(fsmFile io.Reader, opts EncodeOptions)`
Gemini high #3: `walkRegularFiles` loading all bytes into memory	✅ `walkRegularFilePaths` returns path→abspath map; `streamFilesEqual` does 64 KiB chunk compare via `streamReadersEqual`
Gemini medium: `tempSuffixHexLen` low entropy (4 bytes)	✅ Bumped to 16 hex chars / 8 bytes (`2^64` space)
Claude high + codex P2: `fmt.Sscanf` partial parses	✅ `parseLastCommitTS` uses `strconv.ParseUint` with strict rejection; `TestParseLastCommitTS` covers `"0xffZZ"` and `"100oops"`
Codex P2: empty adapter selection silently publishes header-only `.fsm`	✅ `parseAdapterSet` rejects zero-adapter result; `TestParseAdapterSetRejectsEmptySelection` covers `" ,"`, `",,,"`

One New Issue Introduced by v2

Stale doc comments still say *bytes.Buffer after the disk-backed rewrite

internal/backup/encode_snapshot.go:52-57 (the SelfTest field comment):

// SelfTest enables the round-trip self-test. EncodeSnapshot buffers
// the FSM in *bytes.Buffer, decodes from the buffer, and copies to
// the caller's io.Writer only if the buffer survives DecodeSnapshot

internal/backup/encode_snapshot.go:113-116 (the EncodeSnapshot function doc):

// When opts.SelfTest=true an internal *bytes.Buffer holds the
// FSM during the self-test; bytes are copied to out only after the
// self-test matches. Memory cost in self-test mode is one FSM-sized
// allocation on top of the existing sort working set.

Both still describe the v1 in-memory approach. The actual v2 implementation writes to an os.CreateTemp file (encode-self-test-fsm-*) and the memory cost is O(1) (not O(FSM size)). A future reader looking at these comments to understand the memory model will get the wrong answer.

Fix this →

Carry-over Observations (non-blocking, noted in v1)

These were marked "no fix required" in the v1 review and remain unchanged:

EncodeResult.SHA256 comment (encode_snapshot.go:91) still says "lowercase hex via SHA256Hex" — no such function exists; the hex encoding happens in the CLI's writeSidecar.
diffOneSubdir non-determinism — the diffs slice for "bytes differ" and "missing in input" entries is built in map-iteration order; only remaining is sorted. mismatch.txt output can vary across runs with identical inputs.
EncodeSnapshot returns (result, nil) on self-test mismatch — callers must check result.SelfTestMatched AND err != nil; callers who only check the error will silently pass through a mismatch. The godoc for EncodeSnapshot doesn't call out this dual-check contract.
sha256w = hashSHA256 alias chain — two interface names for the same narrow surface; hash.Hash (already transitively imported) would be clearer in sha256Writer.h.
uitoa/itoa hand-rolled — strconv.FormatUint/strconv.Itoa are exact equivalents.

Strengths in v2

The disk-backed self-test path is cleanly implemented:

teeWriter correctly hashes bytes AS they stream to the temp file — EncodeResult.SHA256 reflects the encoder's actual output, not any test corruption.
The corruption hook fires AFTER Sync() but BEFORE the seek-to-0, so the self-test decoder always reads the (possibly corrupted) on-disk bytes — the write-then-rename invariant is correctly preserved.
streamReadersEqual correctly short-circuits on aSize != bSize before opening files, and the io.ReadFull loop handles the last-partial-chunk case (io.ErrUnexpectedEOF) before checking aErr != nil.
TestEncodeSnapshotSelfTestDetectsCorruption correctly asserts out.Len()==0 on mismatch — the critical publish-nothing invariant.
TestParseAdapterSetRejectsEmptySelection covers the " ,", ",", ",,," cases that would otherwise produce a zero AdapterSet.

Verdict: Ready to merge once the two stale *bytes.Buffer comments are updated. All 6 previously required fixes are correctly implemented and tested.
| Branch

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b13118638f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T16:30:30Z

+	b := newSnapshotBuilder(opts.LastCommitTS)
+	enabled, err := runAdapterEncoders(b, opts)


Enforce the manifest timestamp floor in EncodeSnapshot

When EncodeSnapshot is called directly (the new library entrypoint the design exposes for future callers), a caller can pass LastCommitTS lower than InputRoot/MANIFEST.json:last_commit_ts; this path immediately builds with that value and never reads the manifest or returns the advertised ErrSelfTestLowerLastCommitTS. The CLI pre-check catches its own invocations, but in-process callers can still emit an .fsm whose header and MVCC key timestamps lower the restored HLC ceiling, violating the fail-closed invariant this wrapper is meant to enforce.

Useful? React with 👍 / 👎.

…over cleanups) Mandatory (claude v2): - Stale *bytes.Buffer comments in SelfTest field doc and EncodeSnapshot godoc updated to describe the v2 on-disk temp file model with O(1) memory cost (encode-self-test-fsm-* under SelfTestDecodeOptions.OutRoot). Non-blocking carry-over items from v1 + v2 (folded in since trivial): - EncodeResult.SHA256 comment no longer references a non-existent SHA256Hex function; says "raw 32-byte digest; CLI hex-encodes via encoding/hex in writeSidecar". - EncodeSnapshot godoc now spells out the dual-check contract ("Callers MUST check result.SelfTestMatched before treating a nil error as success") since self-test mismatch returns (result, nil). - diffOneSubdir output is now fully sorted (was: only the "remaining" segment sorted; map-iteration order leaked into the rest). mismatch.txt is deterministic across runs with identical inputs. - Replaced hand-rolled uitoa/itoa with strconv.FormatUint and strconv.Itoa. Dropped the sha256w / hashSHA256 alias chain; the sha256Writer.h field now uses hash.Hash directly (already imported).

bootjp · 2026-06-01T16:43:22Z

v3 (b8da884): addresses claude v2 review.

Mandatory (the v2 blocker): Two stale *bytes.Buffer doc comments updated to describe the on-disk temp file model (encode-self-test-fsm-*) with O(1) memory cost.

Non-blocking carry-overs folded in since trivial:

EncodeResult.SHA256 comment no longer references non-existent SHA256Hex (now: "raw 32-byte digest; CLI hex-encodes in writeSidecar")
EncodeSnapshot godoc spells out the dual-check contract for self-test mismatch (returns (result, nil) — callers MUST check SelfTestMatched)
diffOneSubdir fully sorts the output (was: only "remaining" segment sorted); mismatch.txt is now deterministic
Hand-rolled uitoa/itoa replaced with strconv.FormatUint / strconv.Itoa
Dropped the sha256w / hashSHA256 alias chain; sha256Writer.h now uses hash.Hash directly

@claude review

claude · 2026-06-01T16:43:40Z

Claude finished @bootjp's task in 3m 33s —— View job

Code Review - PR #904 (v3)

Tasks

Gather context and understand the request
Read changed files (v3 commit b8da884)
Verify all v2 carry-over fixes are implemented correctly
Check for any new issues introduced in v3
Post final review

v3 Verification — All v2 Carry-overs Confirmed

Finding	Status
Mandatory v2 fix: stale `*bytes.Buffer` doc in `SelfTest` field comment	✅ Now says "on-disk temp file under SelfTestDecodeOptions.OutRoot (encode-self-test-fsm-*)" and "O(1) memory cost"
Mandatory v2 fix: stale `*bytes.Buffer` doc in `EncodeSnapshot` func comment	✅ Now describes disk-based approach; explicitly credits gemini high #904 OOM fix
`EncodeResult.SHA256` phantom `SHA256Hex` reference	✅ Now: "raw 32-byte digest; the CLI hex-encodes it via encoding/hex when writing ENCODE_INFO.json"
`EncodeSnapshot` dual-check contract undocumented	✅ Lines 127-131 now explicitly document: "Self-test failure returns (result, nil)… Callers MUST check result.SelfTestMatched before treating a nil error as success"
`diffOneSubdir` non-deterministic output	✅ `sort.Strings(diffs)` on line 423 now applies to ALL diff entries (missing-in-input + bytes-differ + missing-in-scratch)
Hand-rolled `uitoa`/`itoa`	✅ `formatHeaderMismatch` uses `strconv.FormatUint`; truncation message uses `strconv.Itoa`
`sha256w`/`hashSHA256` alias chain	✅ Both `sha256Writer.h` (line 538) and `teeWriter.h` (line 244) now typed as `hash.Hash` directly; `"hash"` imported explicitly

One New Doc Bug in v3

ErrSelfTestLowerLastCommitTS and EncodeSnapshot both claim the HLC floor check lives in the library — it doesn't

internal/backup/encode_snapshot.go:17-25 (the var comment):

// ErrSelfTestLowerLastCommitTS is returned by EncodeSnapshot when the
// effective LastCommitTS in EncodeOptions is below the manifest's value.
// ...
// Surfaced at the EncodeSnapshot layer so the CLI's main exits with code 2

internal/backup/encode_snapshot.go:133-136 (the func comment):

// Returns ErrSelfTestLowerLastCommitTS when opts.LastCommitTS is below
// the manifest's value — caller is responsible for reading the manifest
// and computing the effective T (this layer just validates the floor).

Both are inaccurate. EncodeSnapshot never reads MANIFEST.json, never compares opts.LastCommitTS to the manifest value, and never returns ErrSelfTestLowerLastCommitTS. That check is performed exclusively by the CLI's resolveLastCommitTS before EncodeSnapshot is ever called:

// cmd/elastickv-snapshot-encode/main.go:349-354
func resolveLastCommitTS(cfg *config, manifestTS uint64) (uint64, bool, error) {
    if cfg.lastCommitTS < manifestTS {
        return 0, false, errors.Wrapf(backup.ErrSelfTestLowerLastCommitTS, ...)
    }
    ...
}

The functional behaviour is correct — the HLC floor IS enforced before any bytes are written. But a future library caller reading the EncodeSnapshot godoc would trust that calling EncodeSnapshot with a low LastCommitTS returns ErrSelfTestLowerLastCommitTS, and be surprised when it doesn't (the call silently succeeds and stamps the low timestamp into the .fsm).

The fix is straightforward: correct both comments to say the error is a sentinel for the caller to return after its own manifest comparison, not something EncodeSnapshot itself enforces.

Fix this →

Observations (non-blocking)

sha256Writer and teeWriter are structurally identical

Both are {w io.Writer; h hash.Hash} with identical Write bodies. teeWriter is used in encodeBuffered, sha256Writer in encodeStream. They could be unified into one type — sha256Writer already has the Sum() [32]byte convenience method; adding it would cover both paths. Not a blocker, but the duplication will grow if a third path is added.

Strengths in v3

diffOneSubdir's sort.Strings(diffs) at the end is now truly a full-fence sort: the loop first drains bPaths (collecting missing-in-input and bytes-differ entries), then drains remaining aPaths (collecting missing-in-scratch), then sorts the combined slice. Determinism is correct regardless of map-iteration order.
teeWriter.h and sha256Writer.h both correctly typed as hash.Hash, which removes the confusing sha256w/hashSHA256 indirection without adding any import cost ("hash" is already in-tree via crypto/sha256).
strconv.FormatUint / strconv.Itoa are drop-in replacements for the hand-rolled converters with no semantic change — the behaviour is identical and a reader no longer has to audit them.
The EncodeSnapshot dual-check warning ("library callers should follow the same pattern") is now in the right place: immediately after documenting the (result, nil) return on mismatch, not buried in a separate godoc block.

Verdict: One doc fix needed before merge. All functional behaviour is correct; all v2 findings are addressed. The doc bug (EncodeSnapshot + ErrSelfTestLowerLastCommitTS incorrectly attributed to the library layer) is the only remaining item.
| Branch

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b8da884f9d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T16:47:41Z

+// encodeToTempFile creates tempPath, runs EncodeSnapshot into it,
+// fsync+close. Caller is responsible for the os.Remove cleanup on error.
+func encodeToTempFile(tempPath string, encodeOpts backup.EncodeOptions) (backup.EncodeResult, error) {
+	tempFile, err := os.Create(tempPath) //nolint:gosec // operator-supplied path


Create snapshot artifacts owner-only

When the encoder runs with a typical umask like 022, os.Create creates the temporary FSM as mode 0666 masked to 0644, and that same file becomes the published <output> after the rename. Because the .fsm contains the full KV dataset, this exposes backup contents to other local users on multi-user backup hosts while the encode is running and after publish; the decoded adapter writers in internal/backup/open_nofollow_* use 0600 for this reason. Please create the temp output with an explicit owner-only mode before renaming it.

Useful? React with 👍 / 👎.

… unify writer types Mandatory (claude v3): ErrSelfTestLowerLastCommitTS var comment and EncodeSnapshot godoc both claimed the HLC ceiling check is enforced in the library. It is NOT. The check lives in the CLI's resolveLastCommitTS BEFORE EncodeSnapshot is called. A future library caller (Phase 1 live extractor, integration test) reading the godoc would assume EncodeSnapshot guards against a low timestamp and be surprised when it silently stamps the low value into the .fsm. Both comments updated to say: - The error is the sentinel callers should return after their OWN manifest comparison. - EncodeSnapshot does NOT read MANIFEST.json or validate opts.LastCommitTS against any external floor. Non-blocking observation (claude v3): sha256Writer and teeWriter were structurally identical (both {w io.Writer; h hash.Hash} with the same Write body). teeWriter removed; encodeBuffered now uses newSHA256Writer for the temp-file tee. One less type to audit.

bootjp · 2026-06-01T17:01:51Z

v4 (a357279): addresses claude v3.

Mandatory: ErrSelfTestLowerLastCommitTS var comment + EncodeSnapshot godoc claimed the HLC ceiling check is enforced in the library — it isn't (the check lives in the CLI's resolveLastCommitTS BEFORE EncodeSnapshot is called). Both docs corrected to say the error is the sentinel callers should return after their OWN manifest comparison, and EncodeSnapshot does NOT read MANIFEST.json or validate against any external floor.

Non-blocking observation: Unified the duplicated sha256Writer + teeWriter types. Both had identical {w io.Writer; h hash.Hash} structure and Write bodies. teeWriter removed; encodeBuffered now uses newSHA256Writer for the temp-file tee. One less type to audit.

@claude review

claude · 2026-06-01T17:02:05Z

Claude finished @bootjp's task in 2m 35s —— View job

Code Review - PR #904 (v4)

Tasks

Gather context and understand the request
Read changed files (v4 commit a357279)
Verify mandatory v3 fix: ErrSelfTestLowerLastCommitTS + EncodeSnapshot doc corrections
Verify non-blocking fix: teeWriter removed, encodeBuffered uses newSHA256Writer
Check for new issues introduced in v4
Post final review

v4 Verification — All v3 Items Confirmed

Finding	Status
Mandatory v3 fix: `ErrSelfTestLowerLastCommitTS` var comment	✅ Changed from "is returned by EncodeSnapshot when the effective LastCommitTS is below the manifest's value" to "is the sentinel callers should return when…"; added explicit statement that `EncodeSnapshot` does NOT read `MANIFEST.json` or enforce this floor
Mandatory v3 fix: `EncodeSnapshot` func comment	✅ Changed from "Returns ErrSelfTestLowerLastCommitTS when opts.LastCommitTS is below the manifest's value" to "EncodeSnapshot does NOT read MANIFEST.json and does NOT validate opts.LastCommitTS against any external floor"; correctly identifies `resolveLastCommitTS` (CLI only) as the check site
Non-blocking: `teeWriter` removed	✅ Struct + methods fully deleted; `encodeBuffered` now calls `newSHA256Writer(tempFile)` and `hashTee.Sum()` — semantically identical to the removed `var sha [32]byte; copy(sha[:], hasher.Sum(nil))` pattern, one less type to audit

One Remaining Issue — `encodeInfoFilePerm` is dead code; sidecar created `0644`

cmd/elastickv-snapshot-encode/main.go:56:

encodeInfoFilePerm = 0o600

This constant is defined but never used. Both file-creation sites still call os.Create, which ignores the permission argument:

// encodeToTempFile (line 323) — temp FSM, renamed to <output>
tempFile, err := os.Create(tempPath)

// writeSidecar (line 407) — ENCODE_INFO.json
f, err := os.Create(sidecarPath)

os.Create opens with mode 0666 and relies on the process umask. With a typical umask of 022 this gives 0644 — the full KV dataset is world-readable on a multi-user backup host while encoding and after publication. mismatchTxtPerm = 0o600 IS wired to os.WriteFile for mismatch.txt, but encodeInfoFilePerm is silently a no-op.

The Codex v3 review flagged the temp FSM side of this as P2. The dead constant shows the sidecar side was intended to be fixed too but wasn't applied.

Fix both call sites:

// encodeToTempFile
tempFile, err := os.OpenFile(tempPath, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o600)

// writeSidecar — now encodeInfoFilePerm is no longer dead
f, err := os.OpenFile(sidecarPath, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, encodeInfoFilePerm)

Fix this →

Strengths in v4

The ErrSelfTestLowerLastCommitTS doc correction is precise: the rewrite distinguishes between "what this sentinel means" (caller-defined HLC regression) and "who enforces it" (the CLI's resolveLastCommitTS, not EncodeSnapshot). Future library callers reading the godoc now get an accurate contract.
Unifying teeWriter into sha256Writer eliminates the only meaningful duplication left in the encoder. The hashTee.Sum() call is cleaner than the previous var sha [32]byte; copy(sha[:], hasher.Sum(nil)) pattern and leaves a single canonical tee-with-hash type in the package.
encodeBuffered is now visibly simpler: three lines (newSHA256Writer, WriteTo, Sum) versus five with the old hasher + teeWriter split.

Verdict: One fix before merge: os.Create → os.OpenFile at both file-creation sites so the FSM and sidecar are 0600 rather than 0644, and encodeInfoFilePerm becomes live code. All v3 findings are correctly addressed and the codebase is otherwise clean.
| Branch

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a3572798b7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T17:07:27Z

+	}
+
+	b := newSnapshotBuilder(opts.LastCommitTS)
+	enabled, err := runAdapterEncoders(b, opts)


Reject zero-adapter library encodes

When an in-process caller passes the zero AdapterSet (for example, a future Phase 1 caller forgets to thread adapter selection), runAdapterEncoders returns an empty list with no error and the subsequent write emits a valid header-only .fsm, silently discarding any adapter subtrees under InputRoot. The CLI parser now rejects empty selections, but this public library entrypoint still has the same fail-open behavior for direct callers, so it should fail before writing when no adapters are enabled.

Useful? React with 👍 / 👎.

Claude v4 mandatory finding: encodeInfoFilePerm = 0o600 was declared but dead code. Both file-creation sites (encodeToTempFile and writeSidecar) used os.Create, which honors umask and typically gives 0644. On a multi-user backup host, the full KV dataset (.fsm) and the sidecar (which carries source path, cluster_id, and SHA-256) would have been world-readable. Replaced both os.Create calls with os.OpenFile + explicit encodeInfoFilePerm: - encodeToTempFile: temp .fsm is now 0o600 from creation, so the on-disk dataset is never world-readable during the encode-then- rename window. - writeSidecar: ENCODE_INFO.json gets the same 0o600. New TestCLIPublishesFsmAndSidecarMode0600 asserts both files have no group/other access bits (`perm & 0o077 == 0`) after a successful run. Skips on Windows where Unix-style mode bits aren't meaningful. Codex v3 had also flagged the temp FSM side at P2 in passing; that's now addressed alongside the sidecar. Caller audit: both call sites updated in this same file; no other os.Create calls remain in the encoder CLI. The `0o600` constant is now live code.

bootjp · 2026-06-01T17:20:25Z

v5 (0c223d2): addresses claude v4 mandatory.

Mandatory (claude v4): encodeInfoFilePerm = 0o600 was dead code; both os.Create call sites (encodeToTempFile + writeSidecar) relied on umask and gave 0644 in practice. On a multi-user backup host the .fsm + sidecar would have been world-readable.

Both replaced with os.OpenFile + encodeInfoFilePerm. The constant is now live code. New TestCLIPublishesFsmAndSidecarMode0600 asserts perm & 0o077 == 0 on both files. Skips on Windows.

Caller audit: both sites updated; no other os.Create calls remain in the encoder CLI.

@claude review

claude · 2026-06-01T17:20:44Z

Claude finished @bootjp's task in 2m 29s —— View job

Code Review - PR #904 (v5)

Tasks

Gather context and understand the request
Read changed files (v5 commit 0c223d2)
Verify mandatory v4 fix: os.Create → os.OpenFile at both sites with 0o600
Verify new test TestCLIPublishesFsmAndSidecarMode0600
Check for any new issues introduced in v5
Post final review

v5 Verification — Mandatory v4 Fix Confirmed

Finding	Status
Mandatory v4 fix: `encodeToTempFile` — temp FSM world-readable	✅ `os.OpenFile(tempPath, os.O_CREATE\|os.O_WRONLY\|os.O_TRUNC, encodeInfoFilePerm)` at `main.go:325`
Mandatory v4 fix: `writeSidecar` — sidecar world-readable	✅ `os.OpenFile(sidecarPath, os.O_CREATE\|os.O_WRONLY\|os.O_TRUNC, encodeInfoFilePerm)` at `main.go:412`
`encodeInfoFilePerm = 0o600` is now live code	✅ Used at both `os.OpenFile` call sites; no longer dead
New test `TestCLIPublishesFsmAndSidecarMode0600`	✅ Skips on Windows; asserts `perm & 0o077 == 0` for both `out` (the `.fsm`) and `out + ".encode_info.json"`

Permission chain is correct end-to-end: the temp file is created 0o600 → os.Rename preserves permissions on POSIX → the published .fsm arrives at cfg.outputPath with 0o600. The test correctly checks the final path, not the transient temp path.

The library-level self-test buffer (encode_snapshot.go:194) uses os.CreateTemp, which creates files mode 0600 by default on Unix — no issue there.

No remaining os.Create calls in either encoder file that would allow umask to widen permissions.

No New Issues Introduced in v5

The change is minimal and scoped to exactly what v4 flagged: two os.Create call sites replaced with os.OpenFile and one new test.

Carry-over Observation (pre-existing, non-blocking)

EncodeSnapshot does not guard against zero AdapterSet at the library level

internal/backup/encode_snapshot.go:144 — when called directly with opts.Adapters == AdapterSet{}, runAdapterEncoders returns an empty enabled slice with no error, producing a valid header-only .fsm. The CLI parser (parseAdapterSet) guards against this, but a future in-process caller who forgets to thread adapter selection would get a silent empty-restore artifact. The codex v5 review also flagged this (P2). Not introduced by v5; not a blocker for merge, but worth tracking.

Strengths in v5

encodeInfoFilePerm is now the single source of truth for file permissions across both the FSM and sidecar — if the permission is ever revisited, one constant update covers both sites.
The Windows skip in TestCLIPublishesFsmAndSidecarMode0600 uses runtime.GOOS == "windows" (evaluated once at package init via var isWindows), which is cleaner than an in-test runtime.GOOS call inside the table.
The test assertion perm&0o077 != 0 is deliberately loose: it accepts 0o400 or 0o600 (e.g., a filesystem that strips the owner-write bit) while still rejecting anything with group or other bits.

Verdict: Ready to merge. The mandatory v4 fix is correctly implemented and pinned by the new test. No new issues in v5.
| Branch

… v5 + codex v5) Carry-over flagged in both v5 reviews: EncodeSnapshot at the library layer accepted opts.Adapters == AdapterSet{} and produced a valid header-only .fsm with no adapter records. The CLI's parseAdapterSet already rejects this for flag-driven entry, but a future in-process caller (Phase 1 live extractor, integration test) that forgets to thread the adapter set would get a silently empty restore artifact. Fail-closed guard added at the top of EncodeSnapshot alongside the existing InputRoot / out-nil checks. Pinned by TestEncodeSnapshotRejectsZeroAdapterSet which also asserts no bytes are written to out on rejection. Claude v5 explicitly said the existing fix was "Ready to merge" and this was non-blocking; folding in for completeness since the guard is a 4-line addition and the test is independent. Caller audit per CLAUDE.md semantic-change rule: EncodeSnapshot's new pre-condition narrows the accepted input domain. All existing callers either (a) pass through the CLI's parseAdapterSet (which already rejects empty selections) or (b) are same-package tests that pass an explicit AdapterSet{...} literal. No legitimate caller is impacted.

bootjp · 2026-06-01T17:39:08Z

v6 (b56c0e6): folded in the carry-over non-blocking guard.

Claude v5 already declared "Ready to merge" but flagged the library-level zero-AdapterSet gap (also raised by codex v5). Added the guard at the top of EncodeSnapshot alongside the existing InputRoot / out-nil checks. Pinned by TestEncodeSnapshotRejectsZeroAdapterSet (asserts error AND no bytes written to out).

Caller audit: the new pre-condition narrows accepted input. All existing callers pass through the CLI's parseAdapterSet (already rejects empty) or are same-package tests with explicit literals. No legitimate caller impacted.

@claude review

claude · 2026-06-01T17:39:21Z

Claude finished @bootjp's task in 1m 30s —— View job

Code Review - PR #904 (v6)

Tasks

Gather context and understand the request
Read v6 changes (internal/backup/encode_snapshot.go, encode_snapshot_test.go)
Verify zero-AdapterSet guard implementation
Verify TestEncodeSnapshotRejectsZeroAdapterSet correctness
Check for any new issues introduced in v6
Post final review

v6 Verification — Zero-AdapterSet Guard Confirmed

The change is exactly scoped to what was described: 10 lines to encode_snapshot.go (the guard) and 22 lines to encode_snapshot_test.go (the test). No other files touched.

Item	Status
Guard placed before `newSnapshotBuilder` (no bytes written before rejection)	✅ Lines 151-160, after `out == nil` check
Condition covers all four current adapter fields	✅ `!DynamoDB && !S3 && !Redis && !SQS`
Returns `EncodeResult{}` + non-nil error (zero bytes)	✅
Error message matches package prefix convention	✅ `"backup: EncodeOptions.Adapters has no enabled adapter"`
Comment explains WHY and references existing CLI guard	✅ Credits `parseAdapterSet` + `codex v5 + claude v5 #904`
`TestEncodeSnapshotRejectsZeroAdapterSet` uses explicit `AdapterSet{}`	✅
Test asserts `err != nil` (fail-closed)	✅ `t.Fatalf` on success
Test asserts `buf.Len() == 0` (no bytes on rejection)	✅ The second invariant, matches prior `TestEncodeSnapshotSelfTestDetectsCorruption` pattern

No New Issues Introduced in v6

The guard correctly fires before any snapshotBuilder state is created or any bytes flow to out. All five v5-verified fixes remain intact (the diff is purely additive: the guard block + its test).

One observation carries forward as genuinely non-blocking: if AdapterSet ever gains a fifth adapter field, this guard condition would need updating alongside adapterRunners() and enabledAdapterSubdirs(). Since all three sites are co-located in the same file and the compiler won't help, a future author adding a new adapter should search for these. Not an issue for this PR.

Verdict: Ready to merge. v6 closes the only remaining carry-over from v5 cleanly, with no regressions and no new issues.

| Branch |

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b56c0e65ae

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T17:43:23Z

+	result, err := writeAndPublish(cfg, encodeOpts, mismatchPath, logger)
+	if err != nil {
+		return err


Write encode info before returning self-test mismatches

When --self-test produces a real mismatch, writeAndPublish returns errSelfTestMismatch here; this early return skips the writeSidecar call below, so <output>.encode_info.json is never written with self_test.matched=false. The design says mismatch exits should leave both mismatch.txt and provenance, and without the sidecar operators lose the generated SHA/T/adapters needed to diagnose a failed self-test.

Useful? React with 👍 / 👎.

User flagged that I had missed codex reviews on v2 and v6. Re-fetched all bot reviews; two P2 items remained unaddressed. ## Codex P2 v2: enforce manifest TS floor in EncodeSnapshot The CLI's resolveLastCommitTS already rejects --last-commit-ts T < manifest.last_commit_ts, but the library-level EncodeSnapshot accepted ANY opts.LastCommitTS, including one below the manifest's floor. A future in-process caller (Phase 1 live extractor, integration tests) that bypasses the CLI could silently lower the restored HLC ceiling, letting a post-restart leader re-issue a read ts <= a restored row's commit ts. Earlier v4 fix was doc-only (godoc said "EncodeSnapshot does NOT read the manifest"). That's accurate but pushes the responsibility to callers. Codex's suggestion to enforce in the library is the fail-closed answer. Fix: EncodeOptions gains ManifestLastCommitTS uint64. EncodeSnapshot fails-closed with ErrSelfTestLowerLastCommitTS when LastCommitTS < ManifestLastCommitTS (both > 0 — synthetic test fixtures opt out by leaving ManifestLastCommitTS at 0). CLI's buildEncodeOptions now threads manifest.LastCommitTS into the field. Pinned by TestEncodeSnapshotRejectsLowManifestFloor (rejection + no bytes written) and TestEncodeSnapshotManifestFloorOptOut (opt-out path still works). ## Codex P2 v6: write encode_info.json before returning self-test mismatches The old writeAndPublish returned errSelfTestMismatch on mismatch, which skipped the writeSidecar call. The design says mismatch should leave BOTH <output>.mismatch.txt AND <output>.encode_info.json (with self_test.matched=false) for diagnostics. Operators need the sidecar's SHA256, effective T, adapters_enabled list to triage a failed self-test. Fix: encodeOne now writes the sidecar whenever the encode itself ran (publishErr == nil || errors.Is(publishErr, errSelfTestMismatch)), and only after that returns the original error. Stale sidecar cleanup at the start of every run (was: only mismatch.txt was cleaned). ## Caller audit per CLAUDE.md semantic-change rule - EncodeSnapshot gained a pre-condition (ManifestLastCommitTS floor). All in-tree callers either thread manifest.LastCommitTS through the new field (CLI) or use ManifestLastCommitTS=0 (existing tests). No legitimate caller is impacted. - encodeOne's control flow changed: sidecar is now written EVEN on mismatch. The mismatch error still propagates; downstream callers (run() in main, tests) see the same error contract. Sole consumer is run() which maps errSelfTestMismatch to exitDataErr. Tests + lint green.

bootjp · 2026-06-02T04:20:55Z

v7 (9f69295): addresses two missed codex P2 reviews.

ユーザー指摘の通り、v2 と v6 の codex review を見落としていました。

Codex P2 v2 (manifest TS floor in EncodeSnapshot): Library-level enforcement was missing. v4 only fixed the godoc to say "EncodeSnapshot does NOT enforce" — accurate but pushed responsibility to callers. Now EncodeOptions.ManifestLastCommitTS (opt-in via non-zero) makes EncodeSnapshot fail-closed when LastCommitTS < ManifestLastCommitTS. CLI threads manifest.LastCommitTS automatically. Pinned by TestEncodeSnapshotRejectsLowManifestFloor + TestEncodeSnapshotManifestFloorOptOut.

Codex P2 v6 (sidecar on self-test mismatch): writeAndPublish returned errSelfTestMismatch which skipped writeSidecar. Design says mismatch should leave BOTH mismatch.txt AND encode_info.json (with matched:false) for diagnostics. encodeOne now writes the sidecar whenever the encode actually ran (publishErr == nil || errors.Is(publishErr, errSelfTestMismatch)), then propagates the original error.

Caller audit: EncodeSnapshot gained a pre-condition (zero-AdapterSet was added earlier, manifest floor is the new one). All in-tree callers either thread the manifest value (CLI) or use opt-out (ManifestLastCommitTS=0 for synthetic test fixtures). No legitimate caller impacted.

@claude review

claude · 2026-06-02T04:21:10Z

Claude finished @bootjp's task in 3m 59s —— View job

Code Review - PR #904 (v7)

Tasks

Gather context and understand the request
Read changed files (v7 commit 9f69295)
Verify Fix 1: Codex P2 v2 — library-level manifest TS floor (ManifestLastCommitTS)
Verify Fix 2: Codex P2 v6 — sidecar written on self-test mismatch
Check for any new issues introduced in v7
Post final review

v7 Verification — Both Codex P2 Fixes Confirmed

Finding	Status
Codex P2 v2: library-level manifest TS floor missing	✅ `validateEncodeOptions` enforces `LastCommitTS >= ManifestLastCommitTS` when `ManifestLastCommitTS > 0`; CLI threads `manifest.LastCommitTS` via `buildEncodeOptions`
Codex P2 v6: sidecar skipped on `errSelfTestMismatch`	✅ `encodeOne` now calls `writeSidecar` when `publishErr == nil \|\| errors.Is(publishErr, errSelfTestMismatch)`; sidecar written with `matched: false` on mismatch
Stale sidecar cleanup on re-run	✅ `_ = os.Remove(backup.EncodeInfoSidecarPath(cfg.outputPath))` before encode ensures stale prior sidecar is removed
New tests for P2 v2	✅ `TestEncodeSnapshotRejectsLowManifestFloor` + `TestEncodeSnapshotManifestFloorOptOut`
`validateEncodeOptions` extracted as helper	✅ Keeps `EncodeSnapshot` under cyclop threshold; logical grouping with comment

The dual-enforcement analysis: resolveLastCommitTS in the CLI pre-screens (returns error before buildEncodeOptions is called), so the effectiveTS passed to EncodeSnapshot satisfies effectiveTS >= manifest.LastCommitTS. With ManifestLastCommitTS = manifest.LastCommitTS, validateEncodeOptions's < check is satisfied. No double-fire at CLI entry; the library check is exclusively defense-in-depth for direct callers.

One New Issue in v7

ErrSelfTestLowerLastCommitTS var comment contradicts the implementation it lives above

internal/backup/encode_snapshot.go:23-28:

// EncodeSnapshot itself does NOT read MANIFEST.json or enforce this
// floor — the comparison happens in the CLI's resolveLastCommitTS
// BEFORE EncodeSnapshot is called, and a future library caller (Phase 1
// live extractor, integration tests) must perform its own comparison
// and return this sentinel on regression so callers can errors.Is on
// it to map to the right exit code (claude v3 doc bug #904).

This was accurate after v4's doc fix, but v7 changed the contract: validateEncodeOptions (called from EncodeSnapshot) now DOES enforce the floor when ManifestLastCommitTS > 0, and EncodeSnapshot itself now RETURNS this error. Three things in the same file now contradict this var-level comment:

The EncodeSnapshot function doc at line 148-156 correctly says "it WILL enforce a floor on opts.LastCommitTS when the caller threads the manifest value through opts.ManifestLastCommitTS"
The ManifestLastCommitTS field doc at line 58 correctly says "EncodeSnapshot fails-closed with ErrSelfTestLowerLastCommitTS"
validateEncodeOptions:181 actually wraps and returns ErrSelfTestLowerLastCommitTS

The fix is to update lines 23-28 to say the error is now returned by both the CLI (resolveLastCommitTS) and the library (validateEncodeOptions when ManifestLastCommitTS > 0):

// EncodeSnapshot itself does NOT read MANIFEST.json, but when the caller
// sets opts.ManifestLastCommitTS > 0, validateEncodeOptions enforces this
// floor and returns ErrSelfTestLowerLastCommitTS before any bytes are
// written. The CLI's resolveLastCommitTS provides the same check for
// flag-driven entry; library callers SHOULD set ManifestLastCommitTS to
// the manifest value for defense-in-depth (codex P2 v2 + claude v3 #904).

Fix this →

Observation (non-blocking)

TestCLISelfTestMismatchWritesSidecarWithMatchedFalse covers stale-sidecar cleanup, not the actual P2 v6 fix

The test exercises the manifest-floor rejection path (resolveLastCommitTS failing before writeAndPublish is called), so the sidecar is never written — the test correctly asserts os.IsNotExist. This confirms the stale-sidecar cleanup works, but does not verify that writeSidecar is called with matched: false when writeAndPublish actually returns errSelfTestMismatch.

The codex P2 v6 fix (the new if publishErr == nil || errors.Is(publishErr, errSelfTestMismatch) block at line 254) is correct in its logic but untested at the CLI integration level. The comment acknowledges this and defers to TestEncodeSnapshotSelfTestDetectsCorruption at the library level. Non-blocking, but the test name (...WritesSidecarWithMatchedFalse) promises something slightly different from what it delivers — a note in the comment clarifying the sidecar IS NOT expected here (because the encode never ran) would prevent future readers from being confused.

Strengths in v7

The ManifestLastCommitTS field is correctly zero-valued to opt out — no retroactive breakage to existing tests that pass EncodeOptions{} without a manifest reference.
validateEncodeOptions as a standalone function is a good factoring choice: it groups all four pre-condition checks, keeps EncodeSnapshot readable, and provides a clear seam for future additions.
The sidecar-on-mismatch priority logic is correct: sidecar-write errors on the success path propagate as the return error; on the mismatch path they degrade to a warning so errSelfTestMismatch always wins. This matches the stated requirement that sidecar is informational-only on mismatch.
_ = os.Remove(backup.EncodeInfoSidecarPath(cfg.outputPath)) before encode ensures any stale sidecar from a prior successful run is cleared regardless of the current run's outcome — matches the mismatch.txt cleanup pattern at the same site.

Verdict: One doc fix before merge. All functional behavior is correct; both v6 Codex P2 items are correctly implemented. The ErrSelfTestLowerLastCommitTS var comment at lines 23-28 is the only item that needs updating — it still says "EncodeSnapshot itself does NOT enforce this floor," which is now false after v7.

| Branch |

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9f692950fe

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-02T04:24:39Z

+		{"dynamodb", func(s AdapterSet) bool { return s.DynamoDB }, func(b *snapshotBuilder, root string) error {
+			return errors.Wrap(NewDynamoDBEncoder(root).Encode(b), "dynamodb encoder")


Reject DynamoDB JSONL inputs before encoding

When an input manifest has dynamodb_layout: "jsonl" and --self-test is not enabled (the default), this path still runs the per-item-only DynamoDB reverse encoder. That encoder only walks items/*.json / items/*/*.json, so a decoded tree with items/data-*.jsonl is accepted but all DynamoDB items are skipped while the CLI exits successfully and publishes an .fsm containing only table metadata. Please fail closed on JSONL manifests until the reverse encoder supports that layout, or pass a JSONL-aware mode through this library path.

Useful? React with 👍 / 👎.

claude · 2026-06-02T13:48:02Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a2ed548f4b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-02T13:52:11Z

+	// attacker pre-placing a symlink at <output>.encode_info.json
+	// could redirect the truncate-and-write into a target of their
+	// choosing (codex P2 v25 #904).
+	f, err := backup.OpenSidecarFile(sidecarPath)


Preserve 0600 mode when replacing sidecars

When <output>.encode_info.json already exists with a permissive mode (for example a 0644 sidecar created by an older encoder under umask 022), this open/truncate path does not change its permissions; the 0600 mode in openSidecarFile only applies when the file is newly created. A successful re-encode therefore leaves the freshly written sidecar group/other-readable despite the owner-only contract and the fresh-file test, so this path should chmod after opening or publish via a new 0600 temp file and rename.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-02T13:52:11Z

+		{"s3", func(s AdapterSet) bool { return s.S3 }, func(b *snapshotBuilder, root string) error {
+			return errors.Wrap(NewS3RecordEncoder(root).Encode(b), "s3 encoder")
+		}},


Reject non-directory S3 bucket entries

When an enabled S3 dump is malformed or partially truncated so s3/<bucket> is a regular file or symlink, this dispatch still succeeds because S3RecordEncoder.Encode skips top-level entries where !ent.IsDir() (internal/backup/encode_s3.go:85-87). Since the manifest currently has only an empty S3 scope, the CLI cannot otherwise detect that bucket was expected, and it can publish a partial .fsm with the bucket silently omitted; the library path should fail closed on non-directory entries under selected adapter roots before encoding.

Useful? React with 👍 / 👎.

bootjp · 2026-06-02T13:55:30Z

@claude review

Re-triggering v31 — the original job at https://github.com/bootjp/elastickv/actions/runs/26824025875 errored on the bot side. v31 is doc-only (commit a2ed548); addresses the three doc gaps claude v30 flagged:

validateAdaptersAgainstManifest godoc trimmed to the nil-pointer-only check (v30 retract).
errAdapterNotInManifest var comment trimmed to the nil-scope case only.
classifyEncodeError Sources list gains the errAdapterNotInManifest bullet (was missing since v27).

No behavior change. CI all green (build/test/golangci-lint/CodeQL/CodeRabbit SUCCESS).

claude · 2026-06-02T13:55:50Z

Claude finished @bootjp's task in 2m 5s —— View job

Code Review - PR #904 (v31)

Tasks

Gather context and understand the request
Read v31 changes (cmd/elastickv-snapshot-encode/main.go)
Verify Fix 1: validateAdaptersAgainstManifest godoc trimmed to nil-pointer check only
Verify Fix 2: errAdapterNotInManifest var comment trimmed to nil-scope case only
Verify Fix 3: classifyEncodeError Sources list gains errAdapterNotInManifest bullet
Check for any new issues introduced in v31
Post final review

v31 Verification — All Three v30 Doc Fixes Confirmed

Item	Status
Fix 1: `validateAdaptersAgainstManifest` godoc — "Two failure modes" → "One failure mode"; scenario-A removed; deferred-enumeration root cause documented; pointer to `checkOneAdapterScope` for per-shape matrix	✅ Lines 139–162
Fix 2: `errAdapterNotInManifest` var comment — second clause ("manifest lists adapter whose subdir is missing") removed; "Both are silent-data-loss" → single scenario B only; historical retraction note for v27–v29 added	✅ Lines 376–389
Fix 3: `classifyEncodeError` summary clause — "adapter scope mismatch with manifest" added to exit-2 category list	✅ Lines 95–96
Fix 3: `classifyEncodeError` Sources list — 9th bullet for `errAdapterNotInManifest` added with correct source (`validateAdaptersAgainstManifest`) and codex citations	✅ Lines 117–120
Switch arms vs. Sources list parity — 10 `errors.Is` arms; 9 bullets (ErrInvalidManifest + ErrUnsupportedFormatVersion share one bullet)	✅ Confirmed correct
Only `cmd/elastickv-snapshot-encode/main.go` modified	✅ Pure doc-only change
No logic, test, or production code touched	✅ 42 additions / 31 deletions, all in comment blocks

All three doc comment blocks are now internally consistent with v30's nil-only checkOneAdapterScope semantics.

Two Codex P2 v31 Observations (pre-existing, non-blocking)

These were surfaced by Codex reviewing the v31 codebase — not introduced by v31, but newly raised:

writeSidecar failure leaves a published FSM without provenance

cmd/elastickv-snapshot-encode/main.go:429: When writeSidecar returns an error on the success path (publishErr == nil), encodeOne returns the sidecar error after writeAndPublish has already renamed the FSM into place. The CLI exits non-zero but a restore-visible .fsm exists without its sidecar. The design expects .fsm + sidecar to move together; a sidecar-write failure should either atomically publish both or roll back the FSM. Pre-existing; Codex flagged as P2.

openSidecarFile 0600 not enforced on existing sidecar re-open

cmd/elastickv-snapshot-encode/main.go:733: backup.OpenSidecarFile applies 0600 only when creating a new file. On re-encode, an existing sidecar with wider permissions (e.g. 0644 written by an older encoder) retains its permissions after truncation and rewrite. A chmod after open or a temp-file-rename pattern would enforce the mode on both new and existing sidecars. Pre-existing; Codex flagged as P2.

Verdict: Ready to merge. v31 is a clean doc-only fix that correctly aligns all three comment blocks with v30's nil-only scope check. No new issues introduced. The two Codex P2 observations above are pre-existing and do not block this PR.

| Branch | |

…x/claude P2 v31) Two non-blocking P2 observations claude flagged on its v31 review, both pre-existing but real correctness/security issues. Folding both in. ## Codex/claude P2 v31 (1/2): writeSidecar failure leaves orphan .fsm The encodeOne success path runs writeAndPublish (renames temp .fsm → output) BEFORE writeSidecar. If writeSidecar fails after the rename, the CLI returns non-zero but a restore-visible <output>.fsm exists without its provenance metadata — exactly the orphan state v17 closed for the adapter-error path, but un-closed for the sidecar-write path. Fix: new rollbackOrphanFSMAndSidecar(outputPath, logger) helper. On sidecar-write failure in the success path (publishErr == nil), the caller calls this helper before returning: - os.Remove(outputPath) — drops the just-published .fsm so the restore path sees nothing where the orphan was. - os.Remove(sidecarPath) — drops any partially-written sidecar bytes (OpenSidecarFile may have already Truncate(0)'d before WriteEncodeInfo failed). Both removes log-and-continue on non-ErrNotExist failures so the caller's primary sidecar-write error remains the dominant signal. A prior successful encode at the same output path is unrecoverable — writeAndPublish's os.Rename already overwrote it before writeSidecar ran. The rollback brings the state to "no .fsm, no sidecar," which is the same end state as "encode never ran" — the cleanest consistent outcome without filesystem transactions. Self-test-mismatch path is unchanged: the existing v15 removeStaleOutputFSM already handles that case, and writeSidecar's failure on that path already logger.Warn's-and-continues. Pinned end-to-end by extending TestCLISidecarWriteRefusesSymlinkTarget: - Existing assertions: victim file at the symlink target survives byte-equal; exit 1; symlink itself unlinked. - New v32 assertion: <output>.fsm is also removed (the rollback fired after the .fsm was published but before encodeOne returned). ## Codex/claude P2 v31 (2/2): existing sidecar perm not tightened on re-encode openSidecarFile (unix) opens with O_CREATE + 0o600. The kernel applies the mode arg ONLY when creating a new file; if the path already exists (e.g. an older encoder wrote it at 0o644), the pre-existing perm is preserved. The sidecar carries the source path, cluster_id, and SHA256 of the .fsm — leaving it world-readable on a multi-user backup host is the same leak claude v4 fixed for the create case. Fix: after the regular-file + Nlink checks + Truncate(0), call f.Chmod(sidecarFileMode) to enforce the mode on the open descriptor. New named const sidecarFileMode = 0o600 so a future widening of the flag-arg mode forces the chmod call to be touched too. Windows variant unchanged (different perm model; the openSidecarFile file is already platform-split for this reason). Pinned by TestOpenSidecarFileEnforcesOwnerOnlyMode (unix-only via build tag): writes a pre-existing 0o644 file, calls OpenSidecarFile, asserts the resulting descriptor's Stat reports perm 0o600. ## Caller audit per CLAUDE.md semantic-change rule - encodeOne success-path branch: rollback called only when publishErr == nil AND writeSidecar fails. Self-test-mismatch path unchanged. Sole caller of encodeOne is run(); error return semantics unchanged. - rollbackOrphanFSMAndSidecar: new helper, sole caller is the success-path sidecar-failure branch in encodeOne. No external callers. - openSidecarFile (unix): success path now has one extra Chmod call. All in-package callers (encode_redis.go's multiple KEYMAP / TTL sidecar writers) are unaffected — they all already expect a 0o600 file. CLI's writeSidecar / writeMismatchTxt (via the exported OpenSidecarFile wrapper) are likewise strictly-tighter-perm only. - sidecarFileMode const: new named const, only consumer is the Chmod call in openSidecarFile (unix). Tests + lint green.

bootjp · 2026-06-02T14:05:31Z

@claude review

v32 folds in both P2 observations claude flagged on its v31 review (both pre-existing but worth closing).

v31 P2 (1/2) — writeSidecar failure leaves orphan .fsm: success path runs writeAndPublish (renames temp .fsm → output) BEFORE writeSidecar. If sidecar fails, the published .fsm has no matching provenance. Fix: new rollbackOrphanFSMAndSidecar(outputPath, logger) called on the success-path sidecar-failure branch — removes both the just-published .fsm and any partial sidecar so the state is consistent absent. Self-test-mismatch path unchanged. Pinned by extending TestCLISidecarWriteRefusesSymlinkTarget (the symlink-at-sidecar triggers OpenSidecarFile ELOOP → sidecar failure → rollback fires → .fsm is gone). Victim file still preserved (rollback's os.Remove on the symlink itself, not the target).

v31 P2 (2/2) — OpenSidecarFile mode not enforced on existing files: the unix openSidecarFile opened with O_CREATE + 0o600, but the mode arg only applies on creation; an older encoder's 0o644 sidecar kept its wider perms on re-encode. The sidecar carries source path + cluster_id + SHA256 — same leak surface as the v4 fix for the create case. Fix: post-Truncate(0), call f.Chmod(sidecarFileMode) to tighten. New named const sidecarFileMode = 0o600 so a future widening forces both sites to be touched. Windows variant unchanged (different perm model; already platform-split). Pinned by new unix-build-tagged TestOpenSidecarFileEnforcesOwnerOnlyMode.

Caller audit (CLAUDE.md): encodeOne rollback is gated on success-path-with-sidecar-failure only; other paths unchanged. openSidecarFile (unix) success path adds one Chmod call; all in-package callers expected 0o600 already — strictly tighter, no behavior regression.

go test -race ./internal/backup/... ./cmd/elastickv-snapshot-encode/... + golangci-lint green.

claude · 2026-06-02T14:05:49Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6f2040858f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-02T14:10:18Z

+	if err := fsyncParentDir(cfg.outputPath); err != nil {
+		return result, errors.Wrap(err, "fsync output dir after rename")


Roll back FSM when directory fsync fails

When the rename succeeds but fsyncParentDir returns an error (for example on a filesystem that rejects directory fsync, or an I/O error), this returns before writeSidecar runs while publishedTempPath has already been cleared. The command exits failed but leaves a new restore-visible .fsm at --output without a matching .encode_info.json, so this path needs the same rollback/consistency handling as sidecar publication failures.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-02T14:10:18Z

+	if srerr := os.Remove(sidecarPath); srerr != nil && !errors.Is(srerr, os.ErrNotExist) {
+		logger.Warn("rollback partial sidecar after write failure", "err", srerr)


Avoid deleting pre-existing sidecar directories

When sidecar publication fails because <output>.encode_info.json is a pre-existing non-regular entry such as an empty directory or FIFO, OpenSidecarFile correctly refuses to write it, but this rollback then calls os.Remove on that same path even though no partial sidecar was created. That can delete operator-owned filesystem objects merely because the encode failed; only remove a sidecar known to be a regular file/symlink created or truncated by this run, or leave the refused entry in place.

Useful? React with 👍 / 👎.

bootjp · 2026-06-02T14:11:27Z

@claude review

Re-triggering v32 — original job errored. The build CI flake (Docker Hub timeout fetching buildkit) has been re-run and is in progress; all other v32 CI checks are green (test, golangci-lint, CodeQL, CodeRabbit, Analyze-actions/js/go all SUCCESS or NEUTRAL).

v32 (commit 6f20408): two pre-existing P2 fixes — rollback orphan FSM + chmod existing sidecar to 0o600. No new logic; two new tests + one extended test.

claude · 2026-06-02T14:11:49Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

bootjp · 2026-06-02T14:20:25Z

@claude review

Third re-trigger for v32 (prior two attempts errored with bot-side infrastructure issues). All v32 CI checks are now SUCCESS (build re-run cleared, Analyze-go completed). Codex remains silent on v32 (likely rate-limited).

v32 (commit 6f20408) = the two non-blocking fixes you flagged on v31:

rollbackOrphanFSMAndSidecar on success-path sidecar failure (no orphan .fsm).
openSidecarFile (unix) chmod 0o600 enforcement on existing files.

No new logic beyond your own v31 guidance.

claude · 2026-06-02T14:20:50Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

Codex's review window came back online after the rate-limit pause and posted three P2 findings against v30, v31, and v32 that I had missed (the inline comments were attached to PR reviews 4410223887, 4410270684, 4410442057 but the review BODIES were just the standard "Codex Review" header — the actual findings were in the per-review inline comments). ## Codex P2 v31 (L640): S3 encoder silently skips non-directory bucket entries S3RecordEncoder.Encode does `if !ent.IsDir() { continue }` for top-level entries under `s3/`. When a partially-truncated dump leaves a regular file or symlink where a bucket directory should be, the encoder publishes a partial .fsm with the affected bucket silently omitted. The deferred-enumeration empty S3 scope in the manifest cannot otherwise flag the missing data. Fix: encode_s3.go's loop now distinguishes: - Reserved-prefix entries ("_*") → continue (forward compat with `_incomplete_uploads/` and `_orphans/`, which are handled by ErrEncodeUnsupportedS3IncompleteUploads / ErrEncodeUnsupportedS3Orphans). - Any other non-directory entry → return ErrS3EncodeNotRegular with the path and mode (matches the existing _bucket.json non- regular guard). Pinned by TestS3EncodeRejectsNonDirectoryBucketEntry and TestS3EncodeIgnoresReservedPrefixEntry. ## Codex P2 v32 (L642): rollback FSM when parent-dir fsync fails v25 added fsyncParentDir after os.Rename in writeAndPublish to make the rename durable. If fsync fails (e.g., on a filesystem that rejects directory fsync, or transient I/O error), writeAndPublish returned the error but the renamed .fsm stayed at --output — the operator sees a "successful" .fsm that may not survive a crash. Fix: extract the rename + fsync + rollback block into a new publishAndFsync helper that os.Remove's the .fsm on fsync failure (same rollback semantics as the v32 sidecar-failure path). publishAndFsync also brings writeAndPublish under the cyclop bound. No new test — the failure path is hard to drive deterministically (would need fsync syscall injection). The success path is exercised by every existing CLI round-trip test; the rollback branch is inspectable by code review against the matching v32 sidecar rollback pattern. ## Codex P2 v32 (L520): rollback must not delete operator-owned sidecar entries v32's rollbackOrphanFSMAndSidecar called os.Remove on the sidecar path unconditionally. When the sidecar path was a pre-existing non-regular entry that OpenSidecarFile correctly refused to clobber (operator-placed symlink, FIFO, empty directory, etc.), the rollback would destructively unlink it merely because the encode failed. Fix: rollbackOrphanFSMAndSidecar now os.Lstats the sidecar path and only os.Removes when info.Mode().IsRegular(). Symlinks (operator- placed), FIFOs, directories, and other non-regular entries are logged-and-skipped. Pinned end-to-end by extending TestCLISidecarWriteRefusesSymlinkTarget: - Existing assertions: victim file at the symlink target survives byte-equal; .fsm at --output is removed by v32 rollback. - NEW v33 assertion: the operator-placed symlink at the sidecar path is PRESERVED (os.Lstat + ModeSymlink check). The assertions are extracted into assertSymlinkSidecarRollbackInvariants to keep the test under the cyclop bound. ## Caller audit per CLAUDE.md semantic-change rule - S3RecordEncoder.Encode: was "silent skip on non-directory top- level entries"; now "fail closed with ErrS3EncodeNotRegular unless name starts with `_`". Sole effect on legitimate dumps is no behavior change (all canonical bucket dirs are directories). Malformed dumps that previously silently half-encoded now fail closed; classifyEncodeError routes via the v9 errors.Mark + ErrEncodeAdapterData layer to exit 2. - publishAndFsync (new helper): sole caller is writeAndPublish. Behavior change ONLY on fsync failure (was: leave .fsm in place; now: roll back). Success path unchanged. - rollbackOrphanFSMAndSidecar: sidecar removal is now gated on IsRegular(). Was destructive on operator-owned non-regular entries; now leaves them alone. Sole caller is encodeOne's success-path sidecar-failure branch. - assertSymlinkSidecarRollbackInvariants (test helper): no production callers; only used by TestCLISidecarWriteRefusesSymlinkTarget. Tests + lint green (incl. gofmt on the new helper's comment block).

bootjp · 2026-06-02T14:45:08Z

@claude review

v33 closes three codex P2 findings I missed across v30/v31/v32 (codex posted them as PR-review inline comments rather than separate review-body posts, so they didn't surface in my initial scans — found them by querying /reviews/<id>/comments for each recent codex review).

codex P2 v31 L640 — S3 silently skips non-directory bucket entries: S3RecordEncoder.Encode did if !ent.IsDir() { continue }. A truncated dump with a regular file or symlink where a bucket dir should be would silently omit that bucket. Fix: fail-closed with ErrS3EncodeNotRegular; reserved-prefix _* entries (forward compat with _incomplete_uploads/, _orphans/) still skip. Pinned by two new library tests.

codex P2 v32 L642 — rollback FSM when parent-dir fsync fails: v25's fsyncParentDir runs after os.Rename. If fsync fails, the orphan .fsm survives. Fix: extracted publishAndFsync helper that rolls back the rename on fsync failure (mirrors v32's sidecar-failure rollback).

codex P2 v32 L520 — don't os.Remove operator-owned sidecar entries: v32's rollbackOrphanFSMAndSidecar unconditionally os.Removed the sidecar path. If it was a pre-existing symlink/FIFO/directory that OpenSidecarFile correctly refused to clobber, the rollback would destroy operator-owned filesystem objects. Fix: os.Lstat first; only remove if IsRegular(). Pinned by extending TestCLISidecarWriteRefusesSymlinkTarget to assert the operator-placed symlink survives.

Caller audit (CLAUDE.md): S3 encoder fail-closed flows through runAdapterEncoders' errors.Mark(ErrEncodeAdapterData) → exit-2; publishAndFsync is a pure extraction (only behavior change is rollback on fsync fail); rollback's Lstat gate is strictly safer.

go test -race ./internal/backup/... ./cmd/elastickv-snapshot-encode/... + golangci-lint green.

claude · 2026-06-02T14:45:31Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5adca7f679

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-02T14:50:32Z

+		}
+		return
+	}
+	if !info.Mode().IsRegular() {


Preserve hard-linked sidecars on rollback

When the sidecar path is a pre-existing hard link, OpenSidecarFile refuses it before truncating, but this rollback treats it as a regular file and the following os.Remove unlinks it anyway. In a shared backup directory this means a failed encode can delete an operator-owned sidecar link that the no-clobber guard intentionally left untouched; check the link count (or track whether this run actually opened/truncated the file) before removing it.

Useful? React with 👍 / 👎.

bootjp · 2026-06-02T14:50:32Z

@claude review

Re-triggering v33 — prior bot job errored. v33 (commit 5adca7f) closes three codex P2 inline findings on v30/v31/v32:

S3 encoder fail-closed on non-directory top-level entries (codex v31 L640).
publishAndFsync rollback on parent-dir fsync failure (codex v32 L642).
rollbackOrphanFSMAndSidecar Lstat-gates the sidecar removal so operator-owned non-regular entries are preserved (codex v32 L520).

CI on v33 currently green: build/test/golangci-lint/CodeQL/Analyze all SUCCESS; ubuntu-latest test + Analyze-go still in progress.

coderabbitai

🧹 Nitpick comments (2)

internal/backup/open_nofollow_unix_test.go (1)

20-24: 💤 Low value

Consider verifying the seeded file actually has broader permissions.

The test seeds a file at 0o644 but doesn't verify that permission was honored before testing the enforcement. On systems with restrictive umasks or filesystems, WriteFile might silently produce 0o600, causing the test to pass even if the Chmod logic is broken.

♻️ Proposed robustness check (pattern from encryption test)

 	if err := os.WriteFile(path, []byte("prior"), 0o644); err != nil { //nolint:gosec // test simulates legacy permissive sidecar
 		t.Fatalf("WriteFile: %v", err)
 	}
+	// Verify environment honored the broader mode before testing enforcement.
+	st, err := os.Stat(path)
+	if err != nil {
+		t.Fatalf("Stat seeded file: %v", err)
+	}
+	if st.Mode().Perm()&0o077 == 0 {
+		t.Skipf("environment refused permissive seed (mode=%o)", st.Mode().Perm())
+	}
 	f, err := OpenSidecarFile(path)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/backup/open_nofollow_unix_test.go` around lines 20 - 24, After
seeding the sidecar file with os.WriteFile(..., 0o644), immediately stat the
file with os.Stat and assert the file mode's permissions (info.Mode().Perm())
equal 0o644; if not, fail the test (t.Fatalf) so the test doesn't silently pass
on systems that applied a restrictive umask or different FS behavior. Locate the
seed call in the test (the os.WriteFile invocation) and add the os.Stat +
Mode().Perm() check right after it to ensure the precondition for the
chmod-enforcement logic is actually met.

internal/backup/open_nofollow_unix.go (1)

61-61: ⚡ Quick win

Use sidecarFileMode constant in the OpenFile call.

The comment on line 116 states "a future edit that widens one must touch both," but line 61 still uses the literal 0o600. Replacing it with sidecarFileMode would mechanically enforce synchronization rather than relying on manual coordination.

♻️ Proposed fix to use the constant consistently

-	f, err := os.OpenFile(path, flag, 0o600) //nolint:gosec,mnd // path is composed from output-root + fixed file name; 0600 is the standard owner-only mode
+	f, err := os.OpenFile(path, flag, sidecarFileMode) //nolint:gosec,mnd // path is composed from output-root + fixed file name; sidecarFileMode is the standard owner-only mode

Also applies to: 114-118

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/backup/open_nofollow_unix.go` at line 61, Replace the hard-coded
file mode 0o600 in the os.OpenFile call with the sidecarFileMode constant so the
mode is maintained centrally; update the call that currently reads
os.OpenFile(path, flag, 0o600) to use sidecarFileMode (referencing the same
variables path and flag), and grep this file for any other literal 0o600 usages
(e.g., the similar open/create calls later in the file) and replace them with
sidecarFileMode to ensure consistency.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@internal/backup/open_nofollow_unix_test.go`:
- Around line 20-24: After seeding the sidecar file with os.WriteFile(...,
0o644), immediately stat the file with os.Stat and assert the file mode's
permissions (info.Mode().Perm()) equal 0o644; if not, fail the test (t.Fatalf)
so the test doesn't silently pass on systems that applied a restrictive umask or
different FS behavior. Locate the seed call in the test (the os.WriteFile
invocation) and add the os.Stat + Mode().Perm() check right after it to ensure
the precondition for the chmod-enforcement logic is actually met.

In `@internal/backup/open_nofollow_unix.go`:
- Line 61: Replace the hard-coded file mode 0o600 in the os.OpenFile call with
the sidecarFileMode constant so the mode is maintained centrally; update the
call that currently reads os.OpenFile(path, flag, 0o600) to use sidecarFileMode
(referencing the same variables path and flag), and grep this file for any other
literal 0o600 usages (e.g., the similar open/create calls later in the file) and
replace them with sidecarFileMode to ensure consistency.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3f511d1d-5715-4bc5-a9b4-21781d99f0ca

📥 Commits

Reviewing files that changed from the base of the PR and between 38f001e and 5adca7f.

📒 Files selected for processing (6)

cmd/elastickv-snapshot-encode/main.go
cmd/elastickv-snapshot-encode/main_test.go
internal/backup/encode_s3.go
internal/backup/encode_s3_test.go
internal/backup/open_nofollow_unix.go
internal/backup/open_nofollow_unix_test.go

🚧 Files skipped from review as they are similar to previous changes (2)

cmd/elastickv-snapshot-encode/main.go
cmd/elastickv-snapshot-encode/main_test.go

claude · 2026-06-02T14:50:52Z

Claude finished @bootjp's task in 4m 42s —— View job

Code Review - PR #904 (v33)

Tasks

Gather context and understand the request
Read changed files (v33 commit 5adca7f)
Review Fix 1: S3 encoder fail-closed on non-directory top-level entries
Review Fix 2: publishAndFsync rollback on parent-dir fsync failure
Review Fix 3: rollbackOrphanFSMAndSidecar Lstat-gate for sidecar removal
Check for any new issues introduced in v33
Post final review

v33 Verification — All Three Codex P2 Fixes Confirmed

Item	Status
Codex P2 v31 L640: `!ent.IsDir()` in S3 encoder was a silent `continue`	✅ Now returns `errors.Wrapf(ErrS3EncodeNotRegular, "s3/%s is not a directory…")` for non-directory, non-`_*` entries
`_*` reserved-prefix entries (forward compat for `_incomplete_uploads/`, `_orphans/`) still skipped	✅ `strings.HasPrefix(name, "_")` → `continue` inside the `!ent.IsDir()` branch
`TestS3EncodeRejectsNonDirectoryBucketEntry`: `stray.txt` under `s3/` → `ErrS3EncodeNotRegular`	✅
`TestS3EncodeIgnoresReservedPrefixEntry`: `_marker` file under `s3/` → no error	✅
Codex P2 v32 L642: `writeAndPublish` did not roll back after fsync failure	✅ `publishAndFsync` extracted; on fsync failure calls `os.Remove(outputPath)` before returning the error
`publishedTempPath` deferred cleanup on fsync-failure path: `os.Remove(tempPath)` → ErrNotExist silently ignored	✅ Deferred `_ = os.Remove(publishedTempPath)` handles ErrNotExist gracefully
Codex P2 v32 L520: `rollbackOrphanFSMAndSidecar` unconditionally removed sidecar path	✅ Now `os.Lstat` first; only removes if `info.Mode().IsRegular()`; non-regular (symlink/FIFO/dir) logged-and-skipped
`assertSymlinkSidecarRollbackInvariants` extended with third assertion: `os.Lstat(sidecarPath)` non-nil AND mode has `ModeSymlink` bit	✅ Pins that the operator-placed symlink is preserved by the v33 Lstat gate

One Minor Doc Nit (non-blocking)

encode_s3.go line 93 cites "codex P2 v32 #904" — should be "codex P2 v31 #904"

The S3 non-directory bucket finding appeared in the codex review of commit a2ed548f (v31 commit, reviewed at 2026-06-02T13:52:11Z). Codex's v32 review (6f2040858f at 2026-06-02T14:10:18Z) covered the fsync and sidecar-rollback findings. The inline comment at line 93 currently reads:

// ...silently skipping would let the encoder publish a partial .fsm with
// the affected bucket omitted (codex P2 v32 #904; the manifest's…

This should read (codex P2 v31 #904; …). Not a blocker — the code behavior is entirely correct; only the attribution round-number is off by one.

Carry-Over Observations (non-blocking, from CodeRabbit v33 review)

open_nofollow_unix_test.go line 22 — seed file permissions not verified before enforcement test

TestOpenSidecarFileEnforcesOwnerOnlyMode seeds a file at 0o644 but doesn't verify that permission was honored before testing the enforcement. On systems with a very restrictive umask, os.WriteFile(..., 0o644) might produce 0o600, causing the test to pass even if Chmod in openSidecarFile is broken. Adding a t.Skipf guard ("environment refused permissive seed") before calling OpenSidecarFile would make the test self-aware of this edge. CodeRabbit flagged this as a low-priority nitpick.

open_nofollow_unix.go line 61 — 0o600 literal instead of sidecarFileMode constant

f, err := os.OpenFile(path, flag, 0o600)

sidecarFileMode is defined at line 118 and is used at line 107 (f.Chmod(sidecarFileMode)). The comment at lines 114–117 explicitly says "a future edit that widens one must touch both" — but if line 61 used the constant, a single-location update would cover both. Replacing the literal with the constant removes the coordination burden. CodeRabbit flagged this as a quick-win nitpick.

Strengths in v33

The _* skip in !ent.IsDir() is correctly nested inside the not-a-directory branch, not at the top-level loop. This means a _incomplete_uploads that IS a directory continues to reach encodeBucket (pre-existing behavior, unchanged), while a regular-file _* entry is skipped rather than failing. The reserved-prefix test (TestS3EncodeIgnoresReservedPrefixEntry) pins this edge directly.
publishAndFsync is cleanly extracted as a standalone function with a single-responsibility scope (rename → fsync → rollback-on-failure), which is why writeAndPublish remained under the cyclop bound. The rollback comment is precise about why os.Remove is safe here: the rename just succeeded, so we know outputPath is a regular file (or doesn't exist if the kernel concurrently removed it).
The rollbackOrphanFSMAndSidecar Lstat gate at line 537 correctly distinguishes "this run left a partial sidecar we should clean up" from "there's operator-owned non-regular content at the sidecar path that OpenSidecarFile correctly refused to touch." The log-and-skip path on non-regular entries is the right behavior — the sidecar-write error already propagates to encodeOne as the dominant signal.
assertSymlinkSidecarRollbackInvariants is a clean extraction that pins all three post-failure invariants (victim unchanged, FSM gone, symlink preserved) in a single named helper. Future tests that exercise similar sidecar-failure paths can reuse it.

Verdict: Ready to merge. All three codex P2 findings are correctly implemented and pinned by tests. The one doc nit (round-number citation in encode_s3.go:93) and the two CodeRabbit carry-overs are non-blocking. All prior rounds' (v1–v32) findings remain intact.

| Branch | |

…odex P2 v33) ## Codex P2 v33: rollback destroys hard-linked sidecars v32's rollbackOrphanFSMAndSidecar gated sidecar removal on info.Mode().IsRegular(). A pre-existing hard link at the sidecar path passes IsRegular(), so the rollback would os.Remove it — destroying an operator-owned link that OpenSidecarFile's Nlink > 1 guard correctly refused to clobber. In a shared backup directory this lets a failed encode unlink files outside the dump tree. ## Fix Track whether OpenSidecarFile actually succeeded on this run. writeSidecar's signature changes from `error` to `(truncated bool, err error)`: - OpenSidecarFile fails (symlink ELOOP, hard-link Nlink>1, FIFO ENXIO, non-regular file, etc.) → return (false, err). Caller MUST NOT remove the sidecar path; the pre-existing entry is operator-owned and was never touched. - OpenSidecarFile succeeds (truncated and chmod'd a fresh or single-link regular file) → return (true, err) on any subsequent failure. Caller rollback removes the partial bytes. rollbackOrphanFSMAndSidecar gains a sidecarTruncated bool parameter. The Lstat + IsRegular check from v33 is removed — the truncated flag is a precise signal of "this run owns the bytes," not a stat-time heuristic that can race with operator changes. ## Pinned by TestCLISidecarWriteRefusesHardLinkTarget (unix-only): pre-plants a victim file with a sentinel body, os.Link's the sidecar path to it, runs the CLI, asserts: - Exit code = exitUserErr. - Victim contents are byte-equal preserved (Nlink guard refused truncation). - Hard link at sidecar path is preserved (sidecarTruncated=false → rollback skips it). - .fsm at outputPath is removed (rollback's FSM branch always fires on sidecar-write failure with successful publish). The v33 TestCLISidecarWriteRefusesSymlinkTarget continues to pass: OpenSidecarFile ELOOP → sidecarTruncated=false → symlink survives. ## CodeRabbit nits folded in 1. `internal/backup/open_nofollow_unix.go:61`: `os.OpenFile(..., flag, 0o600)` now uses the `sidecarFileMode` const (was literal). Keeps the OpenFile mode and the post-Truncate Chmod mode synchronized — a future widening of one is mechanically caught. 2. `internal/backup/open_nofollow_unix_test.go:20-24`: TestOpenSidecarFileEnforcesOwnerOnlyMode now `os.Stat`s the seeded 0o644 file and `t.Skip`s if the environment refused the permissive mode (restrictive umask / FS). Without this, the test could pass spuriously on environments where WriteFile silently produces 0o600. ## Caller audit per CLAUDE.md semantic-change rule - writeSidecar return signature changed from `error` to `(bool, error)`. Sole caller is encodeOne (success-path branch); updated in lock-step to capture both values. The bool flows into rollbackOrphanFSMAndSidecar via a new parameter. - rollbackOrphanFSMAndSidecar gained a sidecarTruncated bool parameter. Sole caller is encodeOne's success-path sidecar-failure branch; updated in lock-step. v33's Lstat + IsRegular check removed (replaced by the precise truncated flag), so non-regular paths are no longer destroyed even by paths the v33 check would have accidentally caught. - openSidecarFile (unix) OpenFile mode arg changed from 0o600 literal to sidecarFileMode const. Same numeric value; no behavior change. All callers (in-package adapter writers + the exported OpenSidecarFile) are unaffected. Tests + lint green.

bootjp · 2026-06-02T15:00:21Z

@claude review

v34 addresses codex P2 v33 (hard-linked sidecars) plus two CodeRabbit nits.

codex P2 v33 — v32/v33's rollback gated on IsRegular(), which is true for hard links. OpenSidecarFile refuses hard links (Nlink > 1) but the rollback would unlink them anyway, destroying operator-owned files outside the dump tree. Fix: track sidecarTruncated bool from writeSidecar. The bool is true iff OpenSidecarFile succeeded — a precise "this run owns the bytes" signal rather than a stat-time heuristic. Rollback only removes the sidecar when truncated; refused entries (symlinks, hard links, FIFOs, etc.) are operator-owned and preserved. Pinned by new TestCLISidecarWriteRefusesHardLinkTarget (unix-only; victim contents byte-equal preserved, hard link survives, .fsm removed).

CodeRabbit nit (quick win) — os.OpenFile in openSidecarFile now uses sidecarFileMode const instead of 0o600 literal; keeps it synchronized with the post-Truncate Chmod.

CodeRabbit nit (low value) — TestOpenSidecarFileEnforcesOwnerOnlyMode now os.Stats the seeded 0o644 file and t.Skips if the environment refused the permissive seed (restrictive umask), so the test doesn't pass spuriously.

Caller audit (CLAUDE.md): writeSidecar signature changed from error to (bool, error); sole caller is encodeOne (updated in lock-step). rollbackOrphanFSMAndSidecar gained sidecarTruncated bool; sole caller updated. v33's Lstat+IsRegular gate replaced by the precise truncated flag, so hard-link / FIFO / dir cases are now correctly preserved.

go test -race ./internal/backup/... ./cmd/elastickv-snapshot-encode/... + golangci-lint green.

claude · 2026-06-02T15:00:44Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

Claude v33 caught: encode_s3.go:93 attributed the non-directory bucket guard to "codex P2 v32 #904" but the finding was actually flagged in codex's review of commit a2ed548 (v31 commit). v32 was when codex flagged the fsync + sidecar-rollback issues. The attribution round-number was off by one. Doc-only one-character fix (v32 → v31). No behavior change; lint green.

bootjp · 2026-06-02T15:02:11Z

@claude review

v35 = doc-only one-character fix per claude v33 nit (encode_s3.go:93 citation v32 → v31).

(Also note: v34 from earlier addresses codex P2 v33 hard-linked sidecar rollback + two CodeRabbit nits; v34's claude bot job errored without producing output, but CI on v34 was all green. v35 stacks on v34.)

go build + golangci-lint green.

claude · 2026-06-02T15:02:34Z

Claude encountered an error —— View job

I'll analyze this and get back to you.

chatgpt-codex-connector

💡 Codex Review

elastickv/internal/backup/encode_dynamodb.go

Lines 101 to 102 in 365744e

    
           if !ent.IsDir() { 
        
           	continue

Fail closed on non-directory DynamoDB tables

When dynamodb/ contains a canonical table entry that is a regular file or symlink (for example from a truncated or reused decoded tree), this branch silently skips it and the encoder can publish an .fsm with that table omitted; the S3 path in this same change already treats non-directory top-level entries as malformed for this reason. Please return an adapter-data error for non-directory entries under the selected DynamoDB root instead of treating them as absent.

elastickv/internal/backup/encode_sqs.go

Lines 142 to 143 in 365744e

    
           if !ent.IsDir() { 
        
           	continue

Fail closed on non-directory SQS queues

When sqs/ contains a queue entry that is a regular file or symlink, this silently skips it, so a malformed or partially cleaned input tree can encode successfully while dropping the affected queue and its messages. Since enabled adapter errors are now classified as data failures, this should reject non-directory top-level SQS entries rather than publish a partial restore artifact.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

bootjp · 2026-06-02T15:05:55Z

@claude review

Re-triggering for v35 — prior two bot jobs errored. Status:

v34 (commit 03a9883): codex review came back with 0 inline findings — clean approval.
v33 (commit 5adca7f): claude verdict was "Ready to merge"; the one v32→v31 citation nit it flagged is now fixed in v35.
v35 (commit 365744e): one-char doc-only fix.
CI on v34: all green; v35 CI in progress.

If this v35 review also errors, the prior bot signals form a coherent review-OK.

claude · 2026-06-02T15:06:21Z

Claude finished @bootjp's task in 59s —— View job

Code Review - PR #904 (v35)

Tasks

Gather context and understand the request
Read changed files (v35 commit 365744e)
Verify mandatory v33 nit: citation v32 → v31 in encode_s3.go
Check for any new issues introduced in v35
Post final review

v35 Verification — v33 Citation Nit Confirmed

The diff is exactly one token change in one comment line:

Item	Status
`encode_s3.go:93`: `codex P2 v32 #904` → `codex P2 v31 #904`	✅
The S3 non-directory bucket finding appeared in codex's review of v31 (commit `a2ed548f`, reviewed at `2026-06-02T13:52:11Z`)	✅ Confirmed against git log
v32 citations elsewhere in the codebase (`main.go:656`, `main.go:674`, `main_test.go:1066`, `encode_s3_test.go:140,166`) are all accurate — v32 introduced the fsync, rollback, and sidecar permission changes	✅ No over-correction
No logic, test, or other production files touched	✅ Pure doc-only change

Verdict: Ready to merge. v35 closes the single citation nit from the v33 review. All findings across all 35 rounds of review are resolved. The PR is clean.

v1–v5: parsing strictness, file permissions, dead code, doc correctness
v6–v10: zero-adapter guard, sidecar-on-mismatch, DynamoDB JSONL fail-closed, adapter-data exit-2, phantom sentinel name
v11–v15: doc only, test name parity, test error handling, file permission enforcement, manifest error exit-2
v16–v20: Redis multi-DB fail-closed, buf.Len() assertion parity, EncodeSnapshot godoc placement, zero-AdapterSet test correction
v21–v25: symlink stale output removal, writeAndPublish doc, test refactors, fsync parent dir, sidecar no-follow open
v26–v30: adapter scope validation, v28/v29 scope-content heuristic retract (codex P1)
v31–v35: doc alignment with v30, rollback orphan FSM + chmod sidecar, S3/fsync/sidecar-rollback P2 fixes, hard-link sidecar preservation, citation fix

| Branch | |

bootjp added 2 commits June 2, 2026 00:53

Merge remote-tracking branch 'origin/main' into backup/m6-cli-design

1fa691f

gemini-code-assist Bot reviewed Jun 1, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 1, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 2, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 2, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 2, 2026

View reviewed changes

bootjp merged commit 9859f00 into main Jun 3, 2026
10 checks passed

bootjp deleted the backup/m6-cli-design branch June 3, 2026 03:23

		if _, err := fmt.Sscanf(s, "%d", &ts); err != nil {
		return 0, errors.Wrap(err, "--last-commit-ts decimal parse")

		b := newSnapshotBuilder(opts.LastCommitTS)
		enabled, err := runAdapterEncoders(b, opts)

		{"dynamodb", func(s AdapterSet) bool { return s.DynamoDB }, func(b *snapshotBuilder, root string) error {
		return errors.Wrap(NewDynamoDBEncoder(root).Encode(b), "dynamodb encoder")

		if err := fsyncParentDir(cfg.outputPath); err != nil {
		return result, errors.Wrap(err, "fsync output dir after rename")

		if srerr := os.Remove(sidecarPath); srerr != nil && !errors.Is(srerr, os.ErrNotExist) {
		logger.Warn("rollback partial sidecar after write failure", "err", srerr)

Conversation

bootjp commented Jun 1, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What lands

Test plan

Caller audit per CLAUDE.md semantic-change rule

Self-review (5 passes)

Risk

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

bootjp commented Jun 1, 2026

Uh oh!

claude Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review - PR #904

Summary

Issues

Observations (no fix required, but worth knowing)

Strengths

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented Jun 1, 2026

Uh oh!

claude Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review - PR #904 (v2)

v2 Verification — All 6 Required Fixes Confirmed

One New Issue Introduced by v2

Carry-over Observations (non-blocking, noted in v1)

Strengths in v2

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

bootjp commented Jun 1, 2026

Uh oh!

claude Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review - PR #904 (v3)

v3 Verification — All v2 Carry-overs Confirmed

bootjp commented Jun 1, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 1, 2026 •

edited

Loading

claude Bot commented Jun 1, 2026 •

edited

Loading

claude Bot commented Jun 1, 2026 •

edited

Loading

claude Bot commented Jun 1, 2026 •

edited

Loading

claude Bot commented Jun 1, 2026 •

edited

Loading

One Remaining Issue — `encodeInfoFilePerm` is dead code; sidecar created `0644`

claude Bot commented Jun 1, 2026 •

edited

Loading

claude Bot commented Jun 1, 2026 •

edited

Loading

claude Bot commented Jun 2, 2026 •

edited

Loading