--experimental_remote_cache_chunking can produce truncated top-level outputs when combined with --disk_cache #29544

@tyler-french

Description of the bug:

With Bazel 9.1.0, a build using --experimental_remote_cache_chunking, a local --disk_cache, and top-level output downloads can report success while materializing a large output file as a single CDC chunk instead of the full blob.

The corrupted output has the size and contents of one CDC chunk, not the original file. For executable outputs this can result in Exec format error because the file no longer starts with the expected executable header.
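As a rough diagnostic for the symptom above, the following sketch checks whether a file still begins with the ELF magic bytes (0x7F 'E' 'L' 'F'), which a Linux executable is expected to have. The class and method names are hypothetical and not part of Bazel. It is only a heuristic: a file that was replaced by its first chunk would still pass, so comparing the file size against the size in the recorded digest is a stronger check.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of Bazel: reports whether a file begins with the ELF magic.
public final class ElfHeaderCheck {
  private static final byte[] ELF_MAGIC = {0x7F, 'E', 'L', 'F'};

  static boolean startsWithElfMagic(Path file) throws IOException {
    try (InputStream in = Files.newInputStream(file)) {
      // readNBytes returns fewer bytes if the file is shorter than the magic.
      byte[] head = in.readNBytes(ELF_MAGIC.length);
      return Arrays.equals(head, ELF_MAGIC);
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: ElfHeaderCheck <path-to-output>");
      return;
    }
    Path file = Path.of(args[0]);
    System.out.println(
        file + ": size=" + Files.size(file) + " elfMagic=" + startsWithElfMagic(file));
  }
}
```

Pointing it at a suspect binary under bazel-bin prints the file size and whether the header looks intact.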

Affected configuration:

  • Bazel 9.1.0
  • --experimental_remote_cache_chunking
  • --disk_cache=...
  • --remote_download_outputs=toplevel or any code path materializing an output file through Bazel's path-backed output stream
  • Remote cache/server that advertises CDC SplitBlob / SpliceBlob support

A large executable output is downloaded on a remote cache hit and Bazel reports success, but the file in bazel-bin/... contains only one CDC chunk. Disabling --experimental_remote_cache_chunking avoids the issue.

Expected behavior

The top-level output should be the full reassembled blob matching the digest recorded in the ActionResult.

If chunk reassembly produces the wrong bytes, Bazel should fail the download rather than reporting a successful build with a corrupted output.
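The kind of check this implies can be sketched as follows: after reassembly, compare the materialized file's size and hash against the expected digest, and fail if either differs. This is an illustration only, not Bazel's implementation. It assumes SHA-256 as the digest function, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical post-download check, assuming a SHA-256 digest (hash + size in bytes).
public final class OutputDigestCheck {
  static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        md.update(buf, 0, n);
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  // Throws IOException when the file does not match the expected digest.
  static void verify(Path file, String expectedHash, long expectedSize)
      throws IOException, NoSuchAlgorithmException {
    long actualSize = Files.size(file);
    if (actualSize != expectedSize) {
      // The cheap size check alone would catch a blob truncated to a single chunk.
      throw new IOException(
          "size mismatch for " + file + ": expected " + expectedSize + ", got " + actualSize);
    }
    String actualHash = sha256Hex(file);
    if (!actualHash.equalsIgnoreCase(expectedHash)) {
      throw new IOException(
          "hash mismatch for " + file + ": expected " + expectedHash + ", got " + actualHash);
    }
  }
}
```

A caller would invoke verify(path, hash, sizeBytes) with the values from the output's recorded digest and treat an exception as a failed download.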

Root cause

Bazel 9.1.0's chunked download path reassembles a blob by passing the final caller-provided output stream into each per-chunk download:

for (Digest chunkDigest : chunkDigests) {
  getFromFuture(combinedCache.downloadBlob(context, chunkDigest, out));
}

For top-level output materialization, the out stream eventually wraps a LazyFileOutputStream for the final output path. Both ReportingOutputStream and LazyFileOutputStream implement MaybePathBacked, so the local disk cache can see the final output path.

DiskCacheClient.download(...) has a path-backed fast path: when the output stream exposes a path, it copies the cached CAS object directly to that path. That is correct for whole-blob downloads, but it is wrong when the CAS object being downloaded is an individual CDC chunk and the path is the final parent blob output.

As a result, each chunk download can replace the final output file with that chunk. After the last chunk download, the output path contains only one chunk.
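The overwrite behavior described above can be reproduced with a toy model. This is a simplified sketch, not Bazel's code. All names are hypothetical, and the path-backed fast path is modeled as a plain file copy that replaces the destination.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Toy model of the bug: per-chunk "copy to path" versus per-chunk "append to stream".
public final class ChunkOverwriteDemo {
  // Models a path-backed fast path: the cached object replaces whatever is at dest.
  static void copyToPath(byte[] casObject, Path dest) throws IOException {
    Files.copy(new ByteArrayInputStream(casObject), dest, StandardCopyOption.REPLACE_EXISTING);
  }

  // Buggy reassembly: every chunk download targets the final output path.
  static void reassembleViaFastPath(List<byte[]> chunks, Path dest) throws IOException {
    for (byte[] chunk : chunks) {
      copyToPath(chunk, dest);
    }
  }

  // Intended reassembly: chunks are written sequentially to one open stream.
  static void reassembleViaStream(List<byte[]> chunks, Path dest) throws IOException {
    try (OutputStream out = Files.newOutputStream(dest)) {
      for (byte[] chunk : chunks) {
        out.write(chunk);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    List<byte[]> chunks =
        List.of(
            "AAAA".getBytes(StandardCharsets.US_ASCII),
            "BBBB".getBytes(StandardCharsets.US_ASCII),
            "CC".getBytes(StandardCharsets.US_ASCII));
    Path buggy = Files.createTempFile("buggy", ".bin");
    Path correct = Files.createTempFile("correct", ".bin");
    reassembleViaFastPath(chunks, buggy);
    reassembleViaStream(chunks, correct);
    System.out.println("fast path size: " + Files.size(buggy)); // 2 (last chunk only)
    System.out.println("stream size:    " + Files.size(correct)); // 10 (full blob)
    Files.delete(buggy);
    Files.delete(correct);
  }
}
```

In this model the fast-path file ends up holding only the last chunk written, matching the truncated output described in the report, while the stream-based variant produces the full blob.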

Which category does this issue belong to?

No response

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

  • Bazel 9.1.0
  • --experimental_remote_cache_chunking
  • --disk_cache=...
  • --remote_download_outputs=toplevel or any code path materializing an output file through Bazel's path-backed output stream
  • Remote cache/server that advertises CDC SplitBlob / SpliceBlob support
  • After the build, run or inspect a large binary under bazel-bin

Which operating system are you running Bazel on?

Linux x86

What is the output of bazel info release?

release 9.1.0

Resolved by #29614

Labels: team-Remote-Exec, type: bug, untriaged
