Skip to content

backup: Phase 0b M4-2b impl - S3 collision-rename reversal#928

Open
bootjp wants to merge 4 commits into
mainfrom
feat/m4-2b-collision-impl
Open

backup: Phase 0b M4-2b impl - S3 collision-rename reversal#928
bootjp wants to merge 4 commits into
mainfrom
feat/m4-2b-collision-impl

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Jun 4, 2026

Summary

Phase 0b M4-2b implementation — S3 collision-rename reversal. The design
doc landed via PR #913 (merged 2026-06-03). This PR carries the
implementation that the doc described, in 3 commits:

  • Commit A (c4e87440): keymap loader + per-record invariant validator +
    orphan detector. New file encode_s3_collision.go plus 9 unit tests.
  • Commit B (2de1c94b): integration — drop ErrS3EncodeUnsupportedCollision,
    thread the loaded keymap through walkObjectswalkObjectEntry
    encodeObject. Update the now-stale sentinel reference in
    encode_snapshot.go:655.
  • Commit C (92b14f56): 8 end-to-end tests covering the design doc's
    test plan table, asserting at the !s3|obj|head|<original-key>
    record-byte level via DecodeLiveEntries.

Three kinds reversed

Kind Decoder wrote M4-2b recovers
KindS3LeafData shorter key → <key>.elastickv-leaf-data original <key>
KindMetaCollision *.elastickv-meta.json user key → renamed original *.elastickv-meta.json key
KindSHAFallback filename-length overflow → <sha>__<truncated> full-rel-path lookup honors it (forward-compat)

Fail-closed invariants (design §"Error contract")

Invariant Sentinel chain
Malformed JSONL ErrInvalidKeymapRecord
Duplicate Encoded (diverges from LoadKeymap last-wins) ErrInvalidKeymapRecord (claude v913 v2)
KindS3LeafData encoded missing .elastickv-leaf-data ErrInvalidKeymapRecord
KindMetaCollision original missing .elastickv-meta.json ErrInvalidKeymapRecord
Unknown Kind (closed-set forward-compat) ErrInvalidKeymapRecord (claude v913 v2)
Original targets _orphans/ or _incomplete_uploads/ (narrow rule) ErrInvalidKeymapRecord (codex P2 v913 v1)
Orphan keymap entry ErrInvalidKeymapRecord

Every gate wraps to ErrS3EncodeInvalidBucket at encodeBucketObjects,
which runAdapterEncoders marks with ErrEncodeAdapterData for CLI
exit-2 routing.

Read-side safety (codex P2 v913 v3)

loadBucketKeymap opens KEYMAP.jsonl via root.LstatrefuseHardLink
root.Open. NOT OpenSidecarFile, which is write-side on every
platform and would truncate the file before reading.

Legitimacy gate preserved (codex P2 v913 v2)

isKeymapCollisionTracker continues to distinguish the tracker file from
a legitimate user object literally named KEYMAP.jsonl. When tracker == false (sidecar present), loadBucketKeymap is not invoked — the file
takes the normal object-body path. TestS3EncodeKeymapObjectRoundTrip
(preserved unchanged from M4-2a) pins this.

5-lens self-review

  1. Data loss — Every renamed object recovers its original S3 key.
    Orphan detection runs at load time; fail-closed prevents partial
    restore.
  2. Concurrency — Offline encoder; no goroutines / locks / shared state.
  3. Performance — Load-once per bucket; O(records-in-keymap) memory,
    O(1) lookup per walked body. Matches the Redis loadKeymap precedent.
  4. Data consistency — Object key bytes recovered byte-equal to the
    decoder's input. Errors fail-closed via existing sentinels routed to
    exit-2.
  5. Test coverage — 9 unit + 8 end-to-end + 2 preserved legitimacy
    regressions. All gate predicates pinned by ≥1 regression test.

Test plan

  • go test ./internal/backup/ passes
  • golangci-lint run --config=.golangci.yaml ./internal/backup/ 0 issues
  • Bot review cycle

bootjp added 3 commits June 4, 2026 21:48
Per design doc PR #913 §'Per-bucket keymap loading'. Adds the read-side
helper that inverts the S3 decoder's KEYMAP.jsonl writes, plus the per-
record invariant validator. No call sites wired up yet — commit B
integrates this into the encodeBucketObjects walk.

New (encode_s3_collision.go):
- loadBucketKeymap: opens <bucketDir>/KEYMAP.jsonl via the read-side
  pipeline (root.Lstat -> refuseHardLink -> root.Open). MUST NOT use
  OpenSidecarFile - that helper is write-side on every platform
  (O_WRONLY|O_CREATE|O_TRUNC on Windows/fallback, O_WRONLY|O_CREATE +
  Truncate(0) on Unix) and would erase the file before reading.
  Codex P2 v913 v3.
- readKeymapStrict: shared iterator over KeymapReader. Diverges from
  LoadKeymap (last-wins) — duplicate Encoded means corrupt dump, fail
  closed. Claude v913 v2.
- validateKeymapRecord: per-record invariants — Kind in closed set
  {S3LeafData, MetaCollision, SHAFallback}; KindS3LeafData encoded
  must end in S3LeafDataSuffix; KindMetaCollision original must end
  in S3MetaSuffixReserved.
- validateKeymapReservedRoot: rejects original keys whose first
  slash-split path component is _orphans or _incomplete_uploads.
  Whole _ namespace is NOT reserved - legit user keys like _foo /
  _foo/bar / nested/_orphans/x pass. Codex P2 v913 v1.
- verifyKeymapTargetsExist: orphan-record detector (Encoded with no
  on-disk body). Runs once after load so operator sees the keymap
  inconsistency directly rather than as a later missing-sidecar
  error. O(N) Lstats per bucket.
- resolveObjectKeyFromRel: lookup helper. Slash-form conversion +
  keymap lookup; nil keymap or miss returns slash-form rel as-is
  (no-collision M4-2a path).

Tests (encode_s3_collision_test.go):
- TestLoadBucketKeymap_HappyPath - three kinds round-trip into map.
- TestLoadBucketKeymap_DuplicateEncoded - fail-closed regression for
  the LoadKeymap-divergence contract (claude v913 v2).
- TestLoadBucketKeymap_MalformedJSON - wraps ErrInvalidKeymapRecord.
- TestLoadBucketKeymap_MissingFile - (nil, nil) defensive return.
- TestValidateKeymapRecord_KindLeafSuffix - 3-case table.
- TestValidateKeymapRecord_KindMetaSuffix - good + bad.
- TestValidateKeymapRecord_UnknownKind - forward-compat guard
  (claude v913 v2).
- TestValidateKeymapReservedRoot - 9-case table covering boundary
  cases: _orphans (reject), _foo (accept), _orphansFoo (accept),
  nested/_orphans (accept), etc. Codex P2 v913 v1.
- TestVerifyKeymapTargetsExist - present + orphan subtests.
- TestResolveObjectKeyFromRel - 4-case lookup contract.

Caller audit: pure additions; no existing call site changed in this
commit. The loader / validator are unreachable from the encoder
graph until commit B wires them into encodeBucketObjects.

go test + golangci-lint clean on ./internal/backup/.
Per design doc PR #913 §'Object-walk integration'. Wires the
loader from commit A into encodeBucketObjects and threads the
loaded keymap through walkObjects -> walkObjectEntry -> encodeObject.
For each on-disk object whose bucket-relative path appears in the
keymap, the encoder now emits records under the recovered original
S3 key rather than the renamed on-disk path.

Sentinel removal:
- ErrS3EncodeUnsupportedCollision removed - the encoder now reverses
  the collision renames it used to fail closed on. All wrap sites
  internal to this package were inside encodeBucketObjects (the now-
  replaced fail-closed branch) plus the test deleted below. grep
  confirms no external callers under /Users/bootjp/src/elastickv2.

Integration points:
- encodeBucketObjects: when isKeymapCollisionTracker reports true,
  loadBucketKeymap + verifyKeymapTargetsExist run before the walk.
  Both errors wrap to ErrS3EncodeInvalidBucket so the CLI's
  ErrEncodeAdapterData mark routes them to exit-2 unchanged. When
  the tracker is absent the keymap stays nil and the walk runs the
  M4-2a path byte-identically.
- walkObjects / walkObjectEntry / walkObjectSubdir / encodeObject
  all gain a 'keymap map[string]KeymapRecord' parameter (nil-safe).
  encodeObject swaps filepath.ToSlash(objRel) for
  resolveObjectKeyFromRel(keymap, objRel).
- walkObjectEntry: explicit skip for the top-level 'KEYMAP.jsonl'
  filename when keymap is non-nil. Keeps the single
  disambiguation-decision contract in isKeymapCollisionTracker -
  the walker never needs to re-derive whether KEYMAP.jsonl is a
  tracker (loadBucketKeymap already consumed it).

Comment update:
- encode_snapshot.go:655 docstring no longer cites
  ErrS3EncodeUnsupportedCollision (now removed). Substituted
  ErrS3EncodeReservedPrefixCollision + ErrS3EncodeInvalidBucket
  which still exist and play the same exit-2-routing example role.

Test removal:
- TestS3EncodeRejectsKeymapCollisionTracker deleted. The fixture
  (KEYMAP.jsonl with no sidecar) is exactly what M4-2b now reverses;
  commit C will add a positive round-trip test on the same shape.

Tests preserved unchanged:
- TestS3EncodeKeymapObjectRoundTrip - legitimate user 'KEYMAP.jsonl'
  with sidecar still round-trips (proves isKeymapCollisionTracker's
  sidecar disambiguation gates correctly; codex P2 v913 v2).
- TestS3EncodeLeafDataObjectRoundTrip - a real object whose key ends
  in .elastickv-leaf-data + sidecar + NO keymap entry continues to
  take the legacy no-collision path.

Caller audit (semantic changes):
- walkObjects / walkObjectEntry / walkObjectSubdir / encodeObject -
  all unexported; single production call chain from
  encodeBucketObjects. All updated in this commit.
- ErrS3EncodeUnsupportedCollision: grep on /Users/bootjp/src/elastickv2
  finds zero references after this commit.

go test + golangci-lint clean on ./internal/backup/.
Per design doc PR #913 §'Test plan'. Adds the 8 design-doc tests
covering the end-to-end encoder behavior:

- TestS3EncodeRoundTripsLeafDataCollision - both <key>.elastickv-leaf-data
  (renamed shorter key) and <key>/sub (longer key) round-trip; emitted
  !s3|obj|head|<orig> manifest records are under the recovered original
  S3 keys with no .elastickv-leaf-data suffix leak.
- TestS3EncodeRoundTripsMetaCollision - the --rename-collisions=true
  variant for a user object whose key naturally ends in
  .elastickv-meta.json is reversed via KindMetaCollision.
- TestS3EncodeRejectsKeymapTargetingReservedSubtree - 2-case sub-test
  matrix on _orphans/foo and _incomplete_uploads/x; integration-level
  pin for the design's reserved-subtree gate.
- TestS3EncodeAcceptsKeymapWithUserUnderscoreKey - codex P2 v913 v1
  narrow-reserved-rule: a single underscore prefix (_foo) is a legit
  user key and round-trips.
- TestS3EncodeRejectsOrphanKeymapEntry - integration-level fail-closed
  for a keymap entry whose Encoded has no on-disk body.
- TestS3EncodeRejectsDuplicateKeymapEntry - integration-level
  fail-closed for two records sharing an Encoded value.
- TestS3EncodeRejectsMalformedKeymapJSON - integration-level fail-closed
  for an unparseable line.
- TestS3EncodeMissingKeymapIsValidNoCollisionDump - regression that the
  M4-2a happy path is unchanged when no KEYMAP.jsonl is present.

The integration tests assert at the !s3|obj|head| record-byte level
via assertS3ManifestEmittedUnder rather than re-running the decoder.
A full decoder round-trip would re-apply the renames (or reject the
.elastickv-meta.json-suffixed key without --rename-collisions=true),
obscuring whether the encoder recovered the original key.

Helper shape adjustments to satisfy unparam:
- assertS3ManifestEmittedUnder bakes 'b' as the bucket via the new
  collisionTestBucket package constant.
- writeS3KeymapTracker drops its bucket parameter; uses the same
  shared constant.

Existing tests preserved unchanged (the legitimacy gate from claude
v913 v2 codex P2 still applies):
- TestS3EncodeKeymapObjectRoundTrip - legitimate user object named
  KEYMAP.jsonl (with sidecar) still round-trips.
- TestS3EncodeLeafDataObjectRoundTrip - a legit object key ending in
  .elastickv-leaf-data (with sidecar, no keymap entry) keeps the
  M4-2a no-collision path.

go test + golangci-lint clean on ./internal/backup/.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

Warning

Review limit reached

@bootjp, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 50 minutes and 22 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 59f2239c-848c-46ed-9a90-7ef61c8a317a

📥 Commits

Reviewing files that changed from the base of the PR and between 0ed19ec and f033e46.

📒 Files selected for processing (6)
  • internal/backup/encode_s3_collision.go
  • internal/backup/encode_s3_collision_roundtrip_test.go
  • internal/backup/encode_s3_collision_test.go
  • internal/backup/encode_s3_objects.go
  • internal/backup/encode_s3_objects_test.go
  • internal/backup/encode_snapshot.go
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/m4-2b-collision-impl

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the M4-2b helper to reverse S3 decoder collision-rename writes by parsing and strictly validating the KEYMAP.jsonl file. It replaces the previous behavior of failing closed upon encountering a collision tracker with a robust recovery mechanism that maps renamed on-disk files back to their original S3 keys. The review feedback suggests two key improvements to validation: ensuring that rec.Encoded does not contain path traversal segments like . or .. to prevent out-of-bucket file probing, and validating that the decoded original key is non-empty to fail closed earlier with a clear error.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +131 to +136
func validateKeymapRecord(rec KeymapRecord) error {
if _, ok := keymapAllowedKinds[rec.Kind]; !ok {
return errors.Wrapf(ErrInvalidKeymapRecord,
"unknown kind %q for encoded segment %q", rec.Kind, rec.Encoded)
}
originalBytes, err := rec.Original()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To prevent path traversal or out-of-bucket file probing, validate that rec.Encoded is a clean, relative path that does not contain any ., .., or empty segments.

func validateKeymapRecord(rec KeymapRecord) error {
	if _, ok := keymapAllowedKinds[rec.Kind]; !ok {
		return errors.Wrapf(ErrInvalidKeymapRecord,
			"unknown kind %q for encoded segment %q", rec.Kind, rec.Encoded)
	}
	for _, seg := range strings.Split(rec.Encoded, "/") {
		if seg == "." || seg == ".." || seg == "" {
			return errors.Wrapf(ErrInvalidKeymapRecord, "encoded path %q contains invalid segment %q", rec.Encoded, seg)
		}
	}
	originalBytes, err := rec.Original()

Comment on lines +136 to +143
originalBytes, err := rec.Original()
if err != nil {
return err
}
original := string(originalBytes)
if err := validateKeymapReservedRoot(rec, original); err != nil {
return err
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

S3 keys cannot be empty. Validating that the decoded original key is non-empty during keymap loading allows the encoder to fail closed earlier with a clear error message, rather than waiting until the object body assembly phase.

	originalBytes, err := rec.Original()
	if err != nil {
		return err
	}
	original := string(originalBytes)
	if original == "" {
		return errors.Wrap(ErrInvalidKeymapRecord, "empty original key")
	}
	if err := validateKeymapReservedRoot(rec, original); err != nil {
		return err
	}

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 92b14f56c5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 93 to +94
if tracker {
return errors.Wrapf(ErrS3EncodeUnsupportedCollision,
"%s: collision-rename KEYMAP.jsonl present", bucketDir)
keymap, err = e.loadBucketKeymap(root, bucketDir)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Require a regular sidecar before skipping keymap load

When a bucket has a collision tracker and also contains a valid user object under the prefix KEYMAP.jsonl.elastickv-meta.json/, isKeymapCollisionTracker treats that directory as the companion sidecar for a literal KEYMAP.jsonl object, so tracker is false and this new keymap-loading branch is skipped. The walk then tries to encode KEYMAP.jsonl as an object and fails on the directory sidecar, rejecting an otherwise valid collision dump; the sidecar disambiguator needs to require a regular file, not just any entry, before bypassing the tracker path.

Useful? React with 👍 / 👎.

… regular-file sidecar (gemini medium x2 + codex P2)

Three findings on PR #928:

1. Gemini medium (encode_s3_collision.go:136): path-traversal guard.
   rec.Encoded values containing '..', '.', or empty segments would
   let a hand-crafted keymap escape the bucket sub-root when
   filepath.Join joins Encoded into the Lstat path. New
   validateKeymapEncodedPath rejects any non-clean relative path.
   Pinned by TestValidateKeymapRecord_PathTraversal (6 boundary
   cases: leading slash, trailing slash, double slash, contains
   '..' / '.' / middle '..').

2. Gemini medium (encode_s3_collision.go:143): non-empty original.
   S3 object keys cannot be empty. Reject the keymap record at load
   time so the operator sees the problem directly rather than as a
   downstream object-body assembly error. Pinned by
   TestValidateKeymapRecord_EmptyOriginal.

3. Codex P2 (encode_s3_objects.go:94): regular-file sidecar.
   isKeymapCollisionTracker used rootEntryExists to look for a
   companion sidecar at <bucket>/KEYMAP.jsonl.elastickv-meta.json,
   but rootEntryExists returns true for any Lstat-able entry
   including a directory. A user key prefix
   'KEYMAP.jsonl.elastickv-meta.json/...' creates a directory at the
   sidecar path, so the tracker was mis-classified as a user object
   and keymap loading was skipped. Fix: require the sidecar to be a
   regular file. Only a real file disambiguates a legitimate user
   object; any other mode means the tracker classification stands.
   Pinned by TestS3EncodeTrackerWithDirectorySidecar.

Caller audit:
- validateKeymapRecord: split into validateKeymapEncodedPath +
  validateKeymapKindSuffix to keep cyclop under 10. Only callers
  are loadBucketKeymap (production) + test sites.
- rootEntryExists: had only one caller (isKeymapCollisionTracker)
  which now inlines the Lstat + IsRegular check; the helper was
  removed.
- isKeymapCollisionTracker: same single call site
  (encodeBucketObjects). Behavior tightened only for the dir-at-
  sidecar-path edge case; happy path unchanged.

go test + golangci-lint clean on ./internal/backup/.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 4, 2026

@claude review

Addresses all 3 review findings on this PR:

  1. Gemini medium (encode_s3_collision.go:136) — path traversal guard. New validateKeymapEncodedPath rejects any rec.Encoded containing ., .., or empty segments. Pinned by TestValidateKeymapRecord_PathTraversal (6 boundary cases).

  2. Gemini medium (encode_s3_collision.go:143) — non-empty original. Reject keymap records with empty decoded original key at load time. Pinned by TestValidateKeymapRecord_EmptyOriginal.

  3. Codex P2 (encode_s3_objects.go:94) — regular-file sidecar. isKeymapCollisionTracker now requires the sidecar to be a regular file. A directory at KEYMAP.jsonl.elastickv-meta.json/ (from a user key prefix collision) no longer misclassifies the tracker as a user object. Pinned by TestS3EncodeTrackerWithDirectorySidecar.

Caller audit: validateKeymapRecord split into 3 helpers to keep cyclop under 10. rootEntryExists had a single caller and was inlined.

@claude
Copy link
Copy Markdown

claude Bot commented Jun 4, 2026

Claude finished @bootjp's task in 3m 4s —— View job


PR #928 Review — M4-2b S3 collision-rename reversal

Tasks:

  • Load GitHub comment tool
  • Read changed files
  • Review: path traversal guard (validateKeymapEncodedPath)
  • Review: non-empty original key validation
  • Review: regular-file sidecar check in isKeymapCollisionTracker
  • Review: overall code quality, invariants, test coverage
  • Post final review

All three review findings are correctly addressed. Details below.

Finding 1: Path traversal guard — validateKeymapEncodedPath

encode_s3_collision.go:162–170 correctly splits rec.Encoded on / and rejects empty segments (catches leading /, trailing /, and //), ., and ... The check runs before rec.Original() decoding, giving a fast-fail on obviously malformed records. All 6 boundary cases in TestValidateKeymapRecord_PathTraversal are handled correctly:

  • "../escape…".. segment caught ✅
  • "./still-bad…". segment caught ✅
  • "path//to…" → empty segment from // caught ✅
  • "path/to…/" → trailing empty segment caught ✅
  • "/path/to…" → leading empty segment caught ✅
  • "path/../escape…" → middle .. caught ✅

Finding 2: Non-empty original key — encode_s3_collision.go:147–150

Empty original is rejected immediately after rec.Original() decodes, before validateKeymapReservedRoot or validateKeymapKindSuffix are called. Error message includes "empty original" as checked by TestValidateKeymapRecord_EmptyOriginal. Early rejection gives a readable diagnostic at load time rather than at object-body assembly.

Finding 3: Regular-file sidecar — isKeymapCollisionTracker

encode_s3_objects.go:143–145 now checks sinfo.Mode().IsRegular() after the sidecar Lstat. A directory at KEYMAP.jsonl.elastickv-meta.json/ (created by a user key prefix collision) no longer falsely classifies the tracker as a user object. TestS3EncodeTrackerWithDirectorySidecar creates exactly this scenario and confirms the tracker is still loaded, recovering "path/to" from the keymap.

encode_snapshot.go sentinel reference ✅

The comment at line 655 now lists ErrS3EncodeReservedPrefixCollision, ErrS3EncodeInvalidBucket in place of the dropped ErrS3EncodeUnsupportedCollision. Clean housekeeping.


Minor observations (not blocking)

1. Dual-invalid test fixtureTestS3EncodeRejectsKeymapTargetingReservedSubtree uses leafRecord("renamed-body", reserved+"/forbidden/key"). The encoded value "renamed-body" lacks the .elastickv-leaf-data suffix that validateKeymapKindSuffix requires for KindS3LeafData. The test still passes correctly because validateKeymapReservedRoot fires first and returns ErrInvalidKeymapRecord. The assertion (ErrS3EncodeInvalidBucket) is correct, but the fixture is accidentally "double-invalid." This is not a bug — the test pins the intended gate — but it's slightly misleading.

2. Redundant empty-check in validateKeymapReservedRootencode_s3_collision.go:209–211 re-checks original == "" even though validateKeymapRecord already rejects empty before calling it. The TestValidateKeymapReservedRoot "empty key" case (wantErr: false) confirms the intent — the function is designed to be callable standalone with empty input. Harmless and documented.

3. TOCTOU window in loadBucketKeymap — The function re-Lstats KEYMAP.jsonl after isKeymapCollisionTracker already did so. A race where the file disappears between the two calls returns (nil, nil) rather than failing. The comment at line 74–76 documents this explicitly as defensive behavior. Reasonable for an offline encoder.

4. Original is not path-traversal-validated — Correct: S3 object keys are arbitrary unicode strings and may legitimately contain ../. The original key is stored as a KV record key, not materialized as a filesystem path by this encoder.


Overall: All three review findings are correctly addressed. Implementation is clean, decomposed well (validateKeymapRecord split into 3 helpers under the cyclop limit), and test coverage is thorough (9 unit + 8 end-to-end + 2 preserved legitimacy regressions). No blocking concerns.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f033e469c5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// which is the slash-form S3 key + optional rename
// suffix). Convert to filepath form before Lstat.
rel := filepath.Join(bucketDir, filepath.FromSlash(encoded))
_, err := root.Lstat(rel)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Require keymap targets to be object bodies

When a corrupt KEYMAP entry names an existing directory (for example a prefix created by another object) instead of the renamed object body, this Lstat succeeds and the verifier treats the record as valid. The object walk then recurses into that directory and never calls resolveObjectKeyFromRel for the encoded path, so Encode can finish while silently ignoring the keymap entry; fail closed here by requiring the target to be a regular file/body, not just any existing filesystem entry.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant