Skip to content

audit: SLSA v1.2 Build Track L3 isolation analysis (#216)#217

Merged
TomHennen merged 9 commits into
mainfrom
claude/issue-216-agents-qM7Hn
May 17, 2026
Merged

audit: SLSA v1.2 Build Track L3 isolation analysis (#216)#217
TomHennen merged 9 commits into
mainfrom
claude/issue-216-agents-qM7Hn

Conversation

@TomHennen
Copy link
Copy Markdown
Owner

@TomHennen TomHennen commented May 14, 2026

Closes #216 (the audit deliverable). Concrete fixes for the gaps the audit
finds are explicitly out of scope per #216 — each spawns its own follow-up
issue.

Summary

Adds docs/SLSA_L3_AUDIT.md and a short framing section in docs/SPEC.md.
The audit is verbatim-spec-quoted, source-verified for the
implementation-detail claims (uv, BuildKit, containerd), and produces a
per-builder MEETS / GAP table against SLSA v1.2 Build Track L3.

Verdicts

Builder L3 isolation verdict
npm (npm sub-path) MEETS WITH PRECONDITION — relies on npm ci re-verifying tarballs against package-lock.json on every install
npm (pnpm sub-path) MEETS — cache disabled per #205 / #212
python (pip sub-path) MEETSsetup-python cache not opted into
python (uv sub-path) GAPsetup-uv defaults enable-cache: auto and uv's cache-hit code trusts pre-stored hashes from sidecar pointer files instead of re-hashing on use (source-verified against astral-sh/uv main). Same shape as the pnpm-store gap that #205 closed.
container GAPtype=gha + mode=max combined with BuildKit's ingest-only digest check and GHA's branch-scoped cache rules makes cross-scope poisoning reachable from pull_request_target, default-branch, and parent-branch contexts (source-verified against moby/buildkit and containerd/containerd; cross-referenced with Adnan Khan and the TanStack postmortem).
shell N/A — no provenance produced

The other four L3 requirements (Provenance Exists, Authentic, Unforgeable,
Hosted) are also covered. All MEET by-construction via wrangle's use of
generator_generic_slsa3.yml and the reusable-workflow consumption model
(build job: contents: read; provenance job: id-token: write in a separate
invocation).

Recommended remediation pattern

The audit's primary recommendation closes both GAP findings with one
structural change: gate caches on the same should-release signal wrangle
already uses to gate provenance creation.

  • PR / dev events → caches enabled (fast iteration; PRs produce no L3
    attestation, so cache-poisoning is not an L3 concern at that layer).
  • Release events → caches disabled (no shared mutable state; the bytes the
    generator signs over are derived without consulting any cross-build cache).

Wrangle already builds on every event including PRs, and already gates
provenance + verify + the adopter publish job on actions/release_gate's
should-release output. The cache configuration is the missing piece — and
adding it composes with the existing release-gate vocabulary instead of
inventing a parallel mechanism.

The audit also takes a position on PR-to-PR cache poisoning (which the
release-vs-PR asymmetry does not close): default unchanged, but document
the threat and expose adopter-tunable knobs (per-PR cache namespacing,
read-only cache-from, or a global cache-disabled mode), and dogfood the
strict knob on wrangle's own CI because wrangle's compromise propagates to
every adopter.

Build Track level, in binary terms

The MEETS / GAP verdicts above are the audit's analytical language. Translated
into the single-claim-per-workflow vocabulary wrangle's user-facing docs should
use (see the "Bottom line" table at the top of SLSA_L3_AUDIT.md):

Builder (via wrangle's reusable workflow) Build Track level today
npm (npm sub-path) Build L3
npm (pnpm sub-path) Build L3
python (pip sub-path) Build L3
python (uv sub-path) Build L2 → Build L3 after Finding 1
container Build L2 → Build L3 after Finding 2
shell N/A — no provenance

Scope: this PR is the audit only

This PR delivers docs/SLSA_L3_AUDIT.md and nothing else. An earlier revision
also added a framing section to docs/SPEC.md; per review, that is reverted —
docs/SPEC.md is unchanged by this PR. Acting on the audit (the SPEC.md
rewrite, the README sweep, the two GAP-closing implementations) is follow-up
work, each its own issue.

Follow-up issues this audit suggests

Each becomes its own issue per the contract of #216:

  1. Implement the release-vs-PR cache asymmetry on the container path (the L3 fix for Finding 2).
  2. Implement the same asymmetry on the python uv path (the L3 fix for Finding 1).
  3. Add adopter-tunable PR cache knobs (per-PR scope, read-only, cache: 'never') to the reusable workflows.
  4. Adopt the strictest knob on wrangle's own CI (dogfooding).
  5. Document the PR-to-PR threat and the available knobs in build-type READMEs.
  6. (Highly recommended) Add a ::stop-commands:: guard around build_and_pack.sh / install_deps.sh / run_tests.sh invocations, mirroring the SLSA Go builder.
  7. (Doc-only) Rewrite docs/SPEC.md to claim a single Build Track level (Build L2 / Build L3) per workflow — no "L3-for-signing vs L3-for-build-platform" distinction.
  8. (Doc-only) Sweep the build-type READMEs (build/, build/actions/{container,npm,python}/, actions/scan/, gh_workflow_examples/) to match the per-builder Build Track levels above; today container and python-uv would read "Build L2".

Test plan

This is a documentation-only change. The repo's CI runs actionlint,
shellcheck, and bats; none of those touch markdown files.

  • Skim the audit doc for style and accuracy
  • Confirm docs/SPEC.md shows no diff (the framing tweak was reverted)
  • Decide which follow-up issues to spawn

https://claude.ai/code/session_017TAfNBd612BTzqXMzkNYgE


Generated by Claude Code

claude added 3 commits May 14, 2026 21:57
Per-builder conformance audit against SLSA v1.2 "Isolated" with verbatim
spec quotes, Pattern A vs Pattern B revisit, and three findings:

- npm (npm path): MEETS WITH PRECONDITION — relies on npm ci's per-install
  tarball re-verification against package-lock.json.
- npm (pnpm path): MEETS — cache disabled per #205/#212.
- python (pip path): MEETS — setup-python cache not opted into.
- python (uv path): GAP — astral-sh/setup-uv defaults enable-cache: auto and
  uv's cache-hit code trusts pre-stored hashes from sidecar pointer files
  rather than re-hashing on use (source-verified against astral-sh/uv main).
  Structurally the same shape as the pnpm-store gap closed in #205.
- container path: GAP — type=gha + mode=max combined with BuildKit's
  ingest-only digest check and GHA's branch-scoped cache rules makes
  cross-scope poisoning reachable from pull_request_target, default-branch,
  and parent-branch contexts. Source-verified against moby/buildkit and
  containerd/containerd; cross-referenced with Adnan Khan's research and
  the TanStack postmortem.
- shell path: N/A — no provenance produced.

Audit also covers the other four L3 requirements (Provenance Exists,
Authentic, Unforgeable, Hosted) and verdicts each MEETS by-construction
via wrangle's use of generator_generic_slsa3.yml + reusable-workflow
permission separation (build job: contents: read; provenance job:
id-token: write in a separate invocation).

SPEC.md gains a short "SLSA L3 claims" section distinguishing
L3-for-signing from L3-for-build-platform so README claims aren't
read as making the stronger build-platform claim implicitly.

Concrete fixes for the two GAP findings are explicitly out of scope per
the contract of #216; recommendations are listed in the audit doc, and
each becomes a follow-up issue.
Wrangle already gates SLSA provenance, verify, and the adopter publish job
on actions/release_gate's should-release output. PR builds run (and test,
and emit SBOMs) but produce no L3 attestation. Caches, however, are
currently NOT gated on the same signal — type=gha + mode=max on container
and the default-on uv cache fire on every event regardless of release
status.

Cleaner remediation than the per-finding mitigations: gate caches on the
same should-release signal wrangle uses for provenance. PR builds keep
caches (fast iteration; no L3 attestation, so cache-poisoning is not an
L3 concern at that layer). Release builds drop caches (fresh, isolated,
the spec's "the output of the build MUST be identical whether or not the
cache is used" is satisfied trivially when the cache is not used).

Adds a "Release-vs-PR build asymmetry" section to the audit with
per-builder implementation sketches and an honest scope statement
(closes the two GAP findings for releases; does not close PR-to-PR
poisoning, which is not L3-relevant). Updates the existing GAP findings
to mark this pattern as the preferred remediation with the original
local mitigations retained as supplemental fallbacks.
Adds an explicit position on PR-to-PR cache poisoning. Recommendation:
default unchanged (PR caches on, release caches off — the asymmetry
already proposed) but document the threat and expose adopter-tunable
knobs (per-PR cache namespacing via BuildKit scope= or UV_CACHE_DIR,
read-only cache-from on PRs, or a global cache-disabled mode). Wrangle
should dogfood the strict knob on its own repo because wrangle's own
compromise propagates to every adopter — a different threat tier than
the typical adopter repo. Also surfaces the pull_request_target warning
that cross-references #202.

This is a judgment-call section, not a SLSA-spec lookup. It exists because
"wrangle protects release builds but leaves the same cache-poisoning
vector open on every PR" is awkward to ship without an opinion.
@TomHennen TomHennen temporarily deployed to integration-test May 14, 2026 22:24 — with GitHub Actions Inactive
Copy link
Copy Markdown
Owner Author

@TomHennen TomHennen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested Refinements
Strengthen Finding 3 (stop-commands): While you marked this as "Defense in Depth," I would argue it's highly recommended for Wrangle given your focus on "vibe-coding" and third-party script execution.

Self-Hosted Runner Warning: The audit mentions this in cross-cutting findings; I’d recommend making this a "Warning" block in the final SLSA_L3_AUDIT.md to ensure adopters don't accidentally invalidate their L3 status by moving to persistent runners.

@TomHennen
Copy link
Copy Markdown
Owner Author

Post-audit verification: uv --refresh doesn't actually close the gap

Followed up on the audit's Finding 1 mitigation menu by reading through astral-sh/uv main to verify each option. Two corrections / caveats worth threading into the audit, but flagging them as PR comments rather than editing the audit text since they're "post-audit verification" rather than part of the original deliverable.

Option 2 (uv sync --refresh) — does NOT close this vector for PyPI

The audit's framing ("same network cost as option 1 but more granular") implies --refresh causes a re-download. It doesn't, in the case that matters. The chain:

  1. --refreshRefresh::All(now) (crates/uv-cache/src/lib.rs:1373)
  2. Marks entries Freshness::StaleCacheControl::MustRevalidate (crates/uv-client/src/cached_client.rs:182)
  3. Sets Cache-Control: no-cache on the request (cached_client.rs:495) — server gets to choose 304 vs 200.

PyPI wheel URLs are content-addressed and immutable, so the server returns 304 Not Modified essentially always. On 304, the branch at cached_client.rs:306–335 rewrites the cache-policy bytes and returns Payload::from_aligned_bytes(cached.data) — deserializing the existing Archive { id, hashes, filename } from the unchanged .http sidecar. The on-disk wheel bytes at cache().archive(&id) are never read or re-hashed. The downstream wheel.satisfies(policy) check then compares attacker-controlled archive.hashes against the lockfile's policy.hashes — passes trivially if the attacker stored coherent values.

Only the 200-Modified branch (distribution_database.rs:683–745) re-hashes via HashReader, and that branch is only taken when the upstream URL's content actually changed — precisely the case where no protection was needed.

Net effect: for the common PyPI case, --refresh is a no-op against the poisoning vector this audit describes. Suggest striking option 2 from the menu, or annotating it as "HTTP-revalidation, not on-disk re-hash; does not close this vector for immutable URLs."

For completeness: --reinstall controls site-packages reinstallation, not the cache — also doesn't close the gap. UV_NO_CACHE=1 does close it (uv-cache/src/cli.rs:43–46 allocates a fresh tempdir).

Option 3 (UV_CACHE_DIR=$RUNNER_TEMP/uv-cache) — caveat worth surfacing

"Ephemeral with $RUNNER_TEMP" only holds if no actions/cache step rehydrates the path before uv sync runs. If a workflow saves and restores $UV_CACHE_DIR under a stable key, the poisoning surface is re-created inside $RUNNER_TEMP and the protection collapses.

Wrangle itself doesn't wire up such a step today (Cross-cutting Finding #1 already notes no actions/cache calls anywhere in the repo), so the recommendation holds for wrangle's own consumption. But adopters who add their own caching around the wrangle composite need to be aware. Worth a sentence in option 3.

Doesn't change the verdict

GAP still stands. The audit's verdict line ("option (3) as default, option (1) as adopter opt-out") is still coherent — option (1) is straightforwardly correct and option (3) is fine with the caveat above. Just thought the --refresh finding was worth surfacing since "force-refresh" is the natural first instinct anyone would try, and the spec-suggestion would actively mislead them.

Comment thread docs/SLSA_L3_AUDIT.md Outdated
supply is the `base64-subjects` (the digests of the artifacts), which falls
under the spec's explicit tenant-allowed carve-out for the `subject` field.

**Conditional caveat:** this verdict is contingent on the reusable-workflow
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which reusable workflow are you referring to here? the wrangle reusable build workflow or the slsa generic one?

The wrangle one is fine I think and here you're referring to the SLSA one?

Comment thread docs/SLSA_L3_AUDIT.md
Comment thread docs/SLSA_L3_AUDIT.md Outdated
Comment thread docs/SLSA_L3_AUDIT.md Outdated
(line 24: `contents: read`).

This separation is the same property the SLSA upstream's BYOB framework
(`delegator_lowperms-generic_slsa3.yml`) enforces for ecosystem-specific Pattern A
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're using Pattern A builders here before defining it.

Comment thread docs/SLSA_L3_AUDIT.md Outdated
the on-disk wheel bytes. The smoking-gun line is the `hashes:
archive.hashes` assignment in `load_wheel`'s cache-hit branch
([`crates/uv-distribution/src/distribution_database.rs`](https://github.com/astral-sh/uv/blob/main/crates/uv-distribution/src/distribution_database.rs)).
3. The cache-hit filter is `archive.has_digests(hashes)`
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you verify this? Looking at the code it's just checking to see if it has been hashed, not trying to check the contents itself. Be very careful in your analysis. You probably want to look at the 'matches' function.

Please redo the entire uv analysis with another agent to see if its conclusions match our original conclusions.

Comment thread docs/SLSA_L3_AUDIT.md Outdated
and parallels). This matches Pattern A enforcement.

**4. Self-hosted runner caveat.** GitHub-hosted runners are ephemeral; SLSA's
"ephemeral build environment" requirement is satisfied by the runner image
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so where does the cache actually come from if the image is ephemeral? presumably there's some state kept somewhere?

Copy link
Copy Markdown
Owner Author

Review feedback addressed

I worked through every unresolved review thread (and your own post-audit comment) and pushed the fixes to a separate branch: claude/review-pr-217-UV5aa — one commit, doc-only. You can cherry-pick / merge / squash into the PR branch as you prefer (or close it if you'd rather edit yourself).

Per-thread summary of what changed:

Line 67 — "Does SLSA make this distinction?"
No, SLSA doesn't use those terms verbatim. Added a paragraph immediately under the section heading explaining that "L3-for-signing" and "L3-for-build-platform" are this audit's shorthand mapping onto SLSA's "Provenance is Unforgeable" and "Isolated" requirements. The shorthand exists because the two are routinely conflated in adopter prose; the framing makes that distinction explicit.

Line 112 — "Did you check this one?"
The "MUST NOT open services that allow for remote influence" bullet. Added a paragraph just below the quoted spec block: every wrangle composite is verified to not start listening sockets / debug servers / remote-control endpoints. Outbound network calls (npm registry, PyPI, container registry, the docker/build-push-action push step) are client-side, not services the build opens. The externalParameters carve-out isn't needed.

Line 264 — "which reusable workflow are you referring to here?"
You read it right — meant wrangle's (build_and_publish_python.yml etc.), not SLSA's. Rewrote the caveat to spell that out explicitly, with the inline file reference and a parenthetical disambiguating it from the upstream SLSA generator's reusable workflow that wrangle's invokes internally.

Line 323 — "Pattern A used before defining it"
Correct, the term first appeared in the "Reusable workflows" section but the Pattern A vs B revisit is several sections later. Added an inline parenthetical defining Pattern A on first use (build logic inside an upstream-controlled reusable workflow) and Pattern B (wrangle's composite + generic generator approach), plus a forward link to the full Pattern A vs B revisit section.

Line 516 — "redo the uv analysis carefully; look at matches"
You were right that the original chain was incomplete. Spawned an independent verification agent, which traced the full path. Findings:

  • has_digests (the original audit's "smoking gun") is algorithm-only — it calls has_required_algorithms (hash.rs:82–103) and only confirms the sidecar advertises some hash under the required algorithm.
  • But that's not the only check. wheel.satisfies(policy) (preparer.rs:145–156) calls HashPolicy::matches (hash.rs:66–79), which does compare full (algorithm, digest) values against the lockfile (via HashDigest's derived PartialEq).

The cache-poisoning conclusion still stands — because both filters operate on hashes read from the sidecar, not from re-hashing the wheel bytes on disk. The on-disk wheel is never re-hashed on the cache-hit path. An attacker who can write $UV_CACHE_DIR writes a coherent (wheel, sidecar) pair and passes both filters trivially.

Rewrote the "Evidence — uv's cache-hit path does not re-hash on use" block to walk through both filters honestly with line-anchored URLs, and added a paragraph specifically saying "the second filter at first glance reads like the validation that would close the gap. It is not, because…" so a future reader reaching for the same suspicion you raised lands on the answer.

Line 693 — "where does the cache come from if the image is ephemeral?"
Good catch — the original "ephemeral" framing was sloppy. GitHub-hosted runners are ephemeral on the VM-image axis (on-disk state in ~/.cache/* etc. doesn't survive a runner reboot), but the GHA cache service that backs actions/cache and BuildKit's type=gha is a separate GitHub-managed cache backend, not part of the runner image — it's keyed by repo+branch scope and retrieved over the network at the start of each job. The runner is fresh, the cache contents are not, and that asymmetry is exactly what the uv / BuildKit findings exploit. Rewrote cross-cutting finding #4 to explain both axes explicitly.

While in that section, also promoted it to a Warning block per your general review note ("Self-Hosted Runner Warning"): the verdicts in this audit assume GitHub-hosted runners, and an adopter who moves to self-hosted runners may silently drop below SLSA L3-for-build-platform without any other change.

General review note — "Strengthen Finding 3"
Done. Renamed the finding from "(defense in depth)" to "(highly recommended)", rewrote the severity paragraph to spell out why wrangle in particular should treat ::stop-commands:: on par with Finding 1 and Finding 2 (the supply-chain attack class is wrangle's whole reason for existing, a compromise of wrangle propagates to every adopter, Pattern A's Go builder already does this), and tightened the recommendation to drop the "optional; nice-to-have" framing. Also updated the cross-references elsewhere in the doc.

Your own post-audit comment — --refresh and UV_CACHE_DIR caveat
Both incorporated into the uv mitigation menu:

  • Option 2 (uv sync --refresh): rewrote with your traced chain showing it's a no-op for PyPI's immutable URLs. Documented as a trap (left in the list rather than struck so anyone reaching for the obvious "force refresh" fix lands on the explanation).
  • Option 3 (UV_CACHE_DIR=$RUNNER_TEMP/uv-cache): added the caveat about adopter actions/cache rehydration collapsing the protection, with a note that wrangle itself uses no actions/cache calls (cross-cutting finding Get wrangle results displayed well (enough) #1) so the protection holds for wrangle's own invocation.

Verdict line updated to mark option (2) as listed-only-to-warn.

Diff is +170 / -44 on docs/SLSA_L3_AUDIT.md and nothing else — no YAML or shell changes, so the existing CI (actionlint / shellcheck / bats) is unaffected.


Generated by Claude Code

Addresses the review comments on PR #217:

- Clarify "L3-for-signing" / "L3-for-build-platform" terms are this
  audit's framing, not verbatim SLSA spec language (Tom's line 67 q).
- Add an explicit MEETS verdict for SLSA's "MUST NOT open services that
  allow for remote influence" L3 bullet — none of wrangle's composites
  open listening services (Tom's line 112 q: "did you check this one?").
- Disambiguate "reusable workflow" in the Provenance-is-Unforgeable
  conditional caveat: explicitly say wrangle's, not the upstream SLSA
  generator's (Tom's line 264 q).
- Define "Pattern A" inline on first use with a forward link to the
  Pattern A vs Pattern B revisit section (Tom's line 323 q).
- Redo the uv cache-hit hash-validation chain. Independent verification
  found the original chain incomplete: the audit named has_digests as
  the cache-hit check (algorithm-only) but missed the subsequent
  HashPolicy::matches via wheel.satisfies(policy), which DOES compare
  digest values. Both checks operate on sidecar-sourced hashes, so the
  cache-poisoning conclusion holds, but the explanation now reflects
  the full two-filter call chain with line-specific URLs (Tom's
  line 516 q + matches function reference).
- Elevate the self-hosted runner caveat to a Warning block and explain
  where GHA caches actually live (separate GitHub-managed cache backend,
  not on the runner VM image) — addresses the "where does the cache come
  from if the image is ephemeral?" question and the PR-level request to
  make this a Warning (Tom's line 693 q + general review note).
- Promote Finding 3 from "defense in depth, optional" to "highly
  recommended", with explicit justification grounded in wrangle's
  supply-chain-security positioning (general review note).
- Update Option 2 of the uv mitigation menu: uv sync --refresh does NOT
  close the vector for PyPI's immutable URLs (304 path skips re-hash);
  documented as a trap rather than struck so anyone reaching for the
  obvious "force refresh" fix lands on the explanation (post-audit
  PR comment).
- Add a caveat to Option 3 (UV_CACHE_DIR=$RUNNER_TEMP/uv-cache): the
  ephemeral-cache protection collapses if an adopter wires their own
  actions/cache around the wrangle composite (post-audit PR comment).

No behavioral or YAML changes; documentation-only.
@TomHennen TomHennen temporarily deployed to integration-test May 15, 2026 02:34 — with GitHub Actions Inactive
Comment thread docs/SLSA_L3_AUDIT.md Outdated
builders: the build job has minimal permissions, the signing job runs separately
with `id-token: write`, neither can directly tamper with the other. When an
(`delegator_lowperms-generic_slsa3.yml`) enforces for ecosystem-specific
"Pattern A" builders — i.e., the SLSA project's term for builders where the
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally find this pattern A/B distinction to be confusing.

The suggestion is that SLSA's builders are the only pattern A, while wrangle is only pattern B. However, wrangle currently provides both. Wrangles reusable workflows are Pattern A and the composite actions are Pattern B. Is that right? If so we should probably reconsider how we talk about them.

Is there a SLSA specific property we can use? Is the distinction really the difference between Provenance is Authentic and Provenance is Unforgeable? Alternatively, those are just the requirements we're talking about and a better distinction might be something about "reusable workflows provided by another org" vs "actions provided by another org that we run ourselves". Maybe just "reusable vs direct" to in shorthand?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think wrangle needs to be very clear about discouraging direct usage of builds.

Comment thread docs/SLSA_L3_AUDIT.md Outdated
it does **not** compare digest values.
hash
([`crates/uv-installer/src/preparer.rs:145–156`](https://github.com/astral-sh/uv/blob/main/crates/uv-installer/src/preparer.rs#L145-L156)).
2. Inside `get_or_build_wheel`, the cache-hit branch reads the `.rev` sidecar
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a cache poisoning attack we could worry about two major classes...

  1. Cache lookups aren't content addressed and an attacker can cause a lookup for 'foo' to resolve to a bad thing and then cause later uses of 'foo' to be bad too.
  2. Cache lookups are content addressed. An attacker can get a bad thing into the cache but it will only be used if someone uses the bad hash to do the lookup. This is typically hard. It can probably only be exploited if an attacker has the ability to modify files within the cache out of band? E.g. they can mess with the sidecar? If the hash is verified at entry into the cache, but not verified later that's a problem if the attacker has write access to the cache itself (they can change the bytes) but not a problem if they don't.

Is that right? Please be thorough.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you do mention this below... "an attacker who can write to $UV_CACHE_DIR writes
both the wheel and the sidecar as a coherent pair"

Is it fair to assume the attacker has write access to this directory? Is there anything we can do to prevent it? What does SLSA Build L3 require?

Comment thread docs/SLSA_L3_AUDIT.md
Replaces the home-grown "Pattern A / Pattern B" shorthand, which Tom
flagged as conflating two unrelated axes, and answers the cache-model
questions raised in the second review round.

- Drop "Pattern A / Pattern B" terminology entirely. The SLSA
  architecture axis now uses SLSA's real terms: "ecosystem-specific
  builder" (build runs inside the trusted upstream reusable workflow,
  e.g. builder_go_slsa3.yml) vs "generic generator"
  (generator_generic_slsa3.yml, which only signs). Wrangle uses the
  generic generator. The "Pattern A vs Pattern B revisit" section is
  retitled "Ecosystem-specific builders vs the generic generator" and
  rewritten, opening with an explicit statement that the two axes are
  orthogonal.
- Introduce "reusable vs direct" shorthand for the wrangle-consumption
  axis (how an adopter consumes wrangle) and use it consistently.
- Rename "Composite actions invoked directly" to "Direct composite
  consumption (NOT a supported L3 path)" and add a Warning block:
  direct composite consumption is not a supported L3 path, reusable
  workflows are the only supported way to get wrangle's L3 claims, and
  an adopter on the direct path must not advertise L3. Adopter-facing
  findings now flag the README doc work as a priority.
- uv section: add a "Threat model" subsection answering Tom's two
  questions. Classifies uv's cache hit as content-addressed-lookup-
  with-no-re-verification-at-use (his class 2), confirms the attacker
  does have write access to the shared cache via the GHA cache service
  round-trip (setup-uv saves/restores $UV_CACHE_DIR through it, so any
  build that runs uv can poison later builds without persistent-runner
  access), and states what SLSA L3 requires.
- Cross-cutting finding 1: answer "are wrangle's caches actually on
  GitHub's Actions cache service?" Yes for npm (cache: npm), uv
  (setup-uv enable-cache default), and container (type=gha); no for
  pip (cache: not set). "No direct actions/cache calls" does not put
  the cross-build cache service out of scope.

Documentation-only; no YAML or shell changes.
@TomHennen TomHennen temporarily deployed to integration-test May 16, 2026 13:53 — with GitHub Actions Inactive
Copy link
Copy Markdown
Owner Author

Second review round addressed

Pushed cde656b to the PR branch (claude/issue-216-agents-qM7Hn). Doc-only, +260 / -118 on docs/SLSA_L3_AUDIT.md.

Pattern A/B is confusing — agreed, and you diagnosed it exactly: the audit's "Pattern A/B" conflated two orthogonal axes. Dropped the term entirely:

  • SLSA architecture axis now uses SLSA's real terms — ecosystem-specific builder (build runs inside the trusted upstream reusable workflow, e.g. builder_go_slsa3.yml) vs generic generator (generator_generic_slsa3.yml, signs only). Wrangle uses the generic generator. So no — wrangle's reusable workflows are not "Pattern A"; they're wrangle-authored wrappers around the generic generator. That mis-mapping was the whole problem.
  • Wrangle consumption axis now uses your reusable vs direct shorthand consistently.
  • The section heading "Pattern A vs Pattern B revisit" → "Ecosystem-specific builders vs the generic generator", rewritten to open with an explicit statement that the two axes are orthogonal.

Discourage direct usage — renamed the section to "Direct composite consumption (NOT a supported L3 path)" and added a Warning block: reusable workflows are the only supported way to obtain wrangle's L3 claims; direct composite consumption is for non-L3 use only, and an adopter on that path must not advertise L3. The adopter-facing findings now flag the README doc work as a priority item (false-assurance risk), not a nicety.

Two classes of cache poisoning — "is that right?" — yes, your model is right. Added a "Threat model" subsection to the uv section. uv's cache hit is your class 2: lookup is by package URL/version, the hash lives in the sidecar, and the on-disk wheel is never re-hashed at use — so the attack requires writing a coherent (wheel, sidecar) pair into cache storage. (For contrast: npm's npm ci does re-hash at use, which is why npm is MEETS-with-precondition and uv is GAP — that's the single distinguishing factor.)

Is it fair to assume write access? What does L3 require? — yes, and the mechanism is the GHA cache service round-trip, not persistent-runner compromise. setup-uv's cache integration saves $UV_CACHE_DIR to the GHA cache service at job end and restores it at job start. Any build that runs uv with caching on therefore has write access to the shared cache later builds restore — a malicious dependency hook / poisoned pyproject.toml script / malicious test in one build poisons the cache, setup-uv publishes it, and GHA's branch-scoped rules let later builds restore it. Exactly the #205 pnpm vector. SLSA L3's "Isolated" requirement is categorical ("output MUST be identical whether or not the cache is used"), so the only roads are (a) re-hash on use — uv-upstream-only — or (b) don't consume a shared cache for L3-attested builds, which is what the release-vs-PR asymmetry does.

Are wrangle's caches actually on GitHub's Actions cache service? — yes for three of four:

  • npm cache: npmsetup-node calls actions/cache internally → GHA cache service.
  • uv → setup-uv defaults enable-cache: auto → saves/restores $UV_CACHE_DIR via the GHA cache service.
  • container type=gha → BuildKit talks to the GHA cache API directly.
  • pip is the exception — wrangle doesn't set cache: on setup-python, so ~/.cache/pip is per-run/local only.

Rewrote cross-cutting finding #1 to say this explicitly: "no direct actions/cache calls" was true but misleading, since the cross-build cache service is the backing store for the npm/uv/container surfaces and its branch-scoped sharing is what the GAP findings exploit.

I had a quick clarifying exchange with the repo owner before reworking the Pattern A/B framing (it was a structural call with a few viable directions) — the approach above reflects that.

The review branch claude/review-pr-217-UV5aa is kept in sync.


Generated by Claude Code

Folds in the deeper uv source trace and the threat-model analysis from
PR 217 review discussion.

- Correct a factual overstatement: the audit said a cache attacker must
  write "a coherent (wheel, sidecar) pair." Source tracing shows the
  install source is the unzipped archive store at archive-v0/<id>, where
  <id> is a random uv_fastid::Id::insecure() token (uv's own persist()
  carries a TODO to make it content-addressed). Nothing rebinds the
  installed bytes to the recorded hash, so an attacker overwrites the
  unzipped directory and touches no sidecar and forges no hash.
- Rewrite the uv evidence chain: pinned to uv commit 1e99086, traces the
  actual uv sync -> registry wheel -> download_wheel -> get_cacheable
  (.http sidecar) -> link_wheel_files path, and adds the archive-store
  layer. Ground the "uv treats the cache as trusted" characterization in
  uv's SECURITY.md and cache docs.
- Add a severity rating to Finding 1: P2 / should-fix. The precondition
  (code execution in a build whose cache reaches a release build)
  already lets the attacker tamper with the current release directly;
  the cache gap's marginal value is persistence and stealth, and it is a
  real violation of L3's named cache-isolation requirement. Not a
  critical release RCE.
- Explain who holds write access to UV_CACHE_DIR: no privilege boundary
  inside a job, so ordinary dependency code (run during uv run pytest,
  or sdist build backends) has it.
- Promote pull_request_target from a wrangle-internal aside to a
  first-class caveat: pull_request_target / workflow_run run in the
  base-branch cache scope, removing the "must reach the default-branch
  scope" limitation and exposing adopters who call wrangle reusable
  workflows from that context.
- Expand the uv source-verification reference list with the pinned
  commit and the newly traced files.

Documentation-only; no YAML or shell changes.
@TomHennen TomHennen temporarily deployed to integration-test May 16, 2026 14:53 — with GitHub Actions Inactive
Comment thread docs/SPEC.md Outdated
| **Publish** | Push artifact to registry, sign with Cosign | `build_and_publish_*.yml` | v0.1 (container) |
| **Verify** | Generate SLSA L3 build provenance, verify attestations against policy (Ampel) | `build_and_publish_*.yml` / future | v0.1 (provenance), v0.2 (policy) |

### SLSA L3 claims: what wrangle asserts
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think wrangle should make this distinction. it's too confusing to readers.

Instead wrangle should only ever claim "Build L2" or "Build L3" for any given workflow or method of using wrangle. Please check other docs that might need updating.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, and kept simple: this PR stays just the audit. Acting on it — the SPEC.md rewrite, the README sweep, the two GAP-closing implementations — is follow-up work, each its own issue per the contract of #216.

Pushed e7a118b (doc-only):

  • Reverted the docs/SPEC.md "SLSA L3 claims" section this PR added. docs/SPEC.md is now back to its exact pre-PR state — no SPEC change ships in this PR.

  • Added a "Bottom line" table near the top of SLSA_L3_AUDIT.md translating the per-builder verdicts into the binary vocabulary you asked for — each workflow is Build L2 or Build L3, nothing finer:

    Builder Build Track level today
    npm (npm sub-path) Build L3
    npm (pnpm sub-path) Build L3
    python (pip sub-path) Build L3
    python (uv sub-path) Build L2 → Build L3 after Finding 1
    container Build L2 → Build L3 after Finding 2
    shell N/A — no provenance
  • The audit still uses "L3-for-signing / L3-for-build-platform" internally as analytical shorthand — it's the language the per-builder analysis is written in — but it's now explicitly labelled audit-only, with a note that wrangle's user-facing docs must not use it.

The docs/SPEC.md rewrite (single Build-Track-level claim per workflow) and the README sweep are now in the PR's follow-up list as their own doc-only issues. Today the container and python-uv READMEs would drop to "Build L2" until Findings 1 and 2 land.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up to my note above: you said not to use "SLSA L3 for signing" anywhere, so I've gone further than "audit-only shorthand."

Pushed 29dae64 — the "L3-for-signing / L3-for-build-platform" shorthand is now removed from the audit entirely. Every occurrence (~20) is replaced with the SLSA v1.2 spec's own requirement names, "Provenance is Unforgeable" and "Isolated". The section that defined the shorthand ("The central framing…") is retitled "The two L3 requirements this audit turns on" and rewritten to introduce them by spec name instead of coining a term.

grep -rn 'L3-for-signing\|L3 for signing\|L3-for-build-platform' across docs/, README.md, build/, actions/ now returns nothing.

PR diff vs merge-base is still exactly one file: docs/SLSA_L3_AUDIT.md.

Per PR 217 review (docs/SPEC.md line 46): wrangle should not teach
adopters an "L3-for-signing vs L3-for-build-platform" distinction —
it is too confusing. Wrangle's user-facing docs should claim exactly
one Build Track level (Build L2 or Build L3) per workflow.

This PR is scoped to be *just the audit*. Acting on it — the SPEC.md
rewrite, the README sweep, and the two GAP-closing implementations —
is follow-up work, each its own issue per the contract of #216.

- Revert the SPEC.md "SLSA L3 claims" section this PR added. SPEC.md
  returns to its pre-PR state; no SPEC change ships in this PR.
- Add a "Bottom line" table near the top of SLSA_L3_AUDIT.md that
  translates the per-builder verdicts into the binary Build L2 /
  Build L3 vocabulary: npm + pnpm + pip are Build L3; uv and
  container are Build L2 today and reach Build L3 once Findings 1
  and 2 land; shell is N/A.
- Keep the audit's internal "L3-for-signing / L3-for-build-platform"
  shorthand, but label it explicitly as analytical-only and note
  that user-facing docs must not use it.
- Rewrite the "Adopter-facing framing" recommendation: instead of
  adding a dual-distinction section to SPEC.md, spawn a doc-only
  follow-up that gives SPEC.md and every README a single
  Build-Track-level claim per workflow.
- Fix the two now-dangling links to the removed SPEC.md anchor.

Documentation-only; no YAML or shell changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@TomHennen TomHennen temporarily deployed to integration-test May 17, 2026 14:45 — with GitHub Actions Inactive
Per PR 217 review: do not use "SLSA L3 for signing" (or the paired
"L3-for-build-platform") anywhere. The audit now refers to the two
load-bearing L3 requirements by their own SLSA v1.2 spec names —
"Provenance is Unforgeable" and "Isolated" — at every occurrence.

- Retitle "The central framing: L3-for-signing vs L3-for-build-platform"
  to "The two L3 requirements this audit turns on"; rewrite it to
  introduce the requirements by spec name rather than define a shorthand.
- Replace all ~20 in-body uses of the shorthand with the spec terms
  ("Isolated" verdict/gap/conformance; "Provenance is Unforgeable").
- Fix the Contents anchor for the retitled section.

Documentation-only; no YAML or shell changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread docs/SLSA_L3_AUDIT.md Outdated
structurally cheap (hash the bytes already on disk) and would close the
gap for every uv user.

**Verdict: GAP.** Recommendation: option (3) as a default for the release path
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

below you suggest option 1. My preference is for option 1 anyways (safer).

Can you confirm and update the audit?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed — and you caught a real internal inconsistency. The uv-path verdict line (line 792) recommended option (3) as default with option (1) as opt-out, but Finding 1 further down recommends option (1). Option (1) is the safer lever, so I've converged everything on it.

Pushed f02550f:

  • uv-path verdict line now recommends option (1) — disable the uv cache on the release path. Rationale spelled out: option (1) removes the shared-cache surface entirely, so SLSA's "output MUST be identical whether or not the cache is used" holds categorically; option (3)'s ephemeral-cache protection is only conditional (it relies on $RUNNER_TEMP staying ephemeral and on no adopter actions/cache step rehydrating the path — the caveat the audit already documents). Option (3) is demoted to an explicit fallback.
  • Release-vs-PR implementation sketch (Python uv path) — dropped the UV_CACHE_DIR framing in the prose; it now describes option (1) (enable-cache: false on release), matching the YAML example that was already shown there.
  • Finding 1 recommendation — names option (1) explicitly, cross-links the mitigation menu, and marks option (3) as the conditional fallback.

All three spots now point the same way: option (1) is the default, option (3) is the fallback for when release-gate plumbing is deferred. Doc-only; PR diff vs merge-base is still just docs/SLSA_L3_AUDIT.md.

Per PR 217 review (line 792): the uv-path verdict line recommended
option (3) as default with option (1) as opt-out, but Finding 1 below
recommends option (1). Tom prefers option (1) — it is safer. Converge
all three spots on option (1) as the default:

- uv-path verdict line: recommend option (1) (disable the uv cache on
  the release path). It removes the shared-cache surface entirely, so
  SLSA's "output MUST be identical whether or not the cache is used"
  holds categorically; option (3)'s ephemeral-cache protection is only
  conditional (depends on $RUNNER_TEMP staying ephemeral and no adopter
  actions/cache rehydration). Option (3) demoted to explicit fallback.
- Release-vs-PR implementation sketch (Python uv path): drop the
  UV_CACHE_DIR framing; describe option (1) (enable-cache: false on
  release) as the lever, matching the YAML example already shown.
- Finding 1 recommendation: name option (1) explicitly and cross-link
  the mitigation menu; mark option (3) as the conditional fallback.

Documentation-only; no YAML or shell changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@TomHennen TomHennen temporarily deployed to integration-test May 17, 2026 18:53 — with GitHub Actions Inactive
@TomHennen TomHennen merged commit 2cba750 into main May 17, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Audit wrangle builders against SLSA L3 isolation + compare with ecosystem-specific generators

2 participants