From 31432c4db36125e416df06fe9a417dff61ba968b Mon Sep 17 00:00:00 2001
From: Rin Oliver
Date: Fri, 5 Jun 2026 15:48:45 -0500
Subject: [PATCH] Add Bazel RBE integration guide (DI-574)

New customer-facing page covering Orka as the macOS worker layer in a Bazel Remote Build Execution setup. Covers two paths: open-source buildfarm (recommended) and BuildBuddy (for teams already using it). Includes multi-worker deployment, GitHub Actions workflow automation, LaunchDaemon auto-start pattern, and persistent cache setup.

Co-Authored-By: Claude Sonnet 4.6
---
 docs.json | 1 +
 orka/orka-devops-integrations/bazel-rbe.mdx | 553 ++++++++++++++++++++
 2 files changed, 554 insertions(+)
 create mode 100644 orka/orka-devops-integrations/bazel-rbe.mdx

diff --git a/docs.json b/docs.json
index f808486..3427951 100644
--- a/docs.json
+++ b/docs.json
@@ -172,6 +172,7 @@
  "orka/orka-devops-integrations/buildkite",
  "orka/orka-devops-integrations/teamcity",
  "orka/orka-devops-integrations/packer",
+ "orka/orka-devops-integrations/bazel-rbe",
  "orka/orka-devops-integrations/claude-code"
  ]
  },
diff --git a/orka/orka-devops-integrations/bazel-rbe.mdx b/orka/orka-devops-integrations/bazel-rbe.mdx
new file mode 100644
index 0000000..e7a67bb
--- /dev/null
+++ b/orka/orka-devops-integrations/bazel-rbe.mdx
@@ -0,0 +1,553 @@
+---
+title: "Bazel Remote Build Execution with Orka"
+description: "Run macOS Bazel RBE workers on Orka VMs. Use open-source buildfarm or BuildBuddy as the RBE control plane; Orka provides the macOS worker fleet either way."
+---
+
+Bazel Remote Build Execution (RBE) splits a build into two distinct layers: the control plane, which schedules actions, manages the cache, and routes work to available workers; and the worker pool, which executes the actions on real hardware. For iOS and macOS builds, the worker pool has to run macOS, and that's where Orka fits.
+
+This page covers two paths: using open-source [bazel-buildfarm](https://github.com/bazelbuild/bazel-buildfarm) as the control plane, and using [BuildBuddy](https://www.buildbuddy.io/) as the control plane. Both use Orka VMs as the macOS worker fleet.
+
+## Architecture
+
+```
+[Developer / CI]
+      |
+      | gRPC (remote-apis protocol)
+      v
+[RBE server (Linux VM)]
+  bazel-buildfarm or BuildBuddy
+  Redis backplane (buildfarm only)
+      |
+      | dispatch to workers
+      v
+[Orka macOS VMs (worker pool)]
+  Xcode, Homebrew, Java, Bazelisk
+  bazel-buildfarm worker or BuildBuddy executor
+      |
+      v
+[Remote cache (Linux VM or S3)]
+  bazel-remote, buildfarm CAS, or BuildBuddy cache
+```
+
+The RBE server and cache typically run on a Linux VM co-located in your Orka cluster at MacStadium. The macOS workers are Orka VMs deployed from a versioned OCI image.
+
+
+Bazel 9.0.0 is not compatible with bazel-buildfarm as of February 2026. Pin your workers and client to Bazel 8.x by running `echo "8.2.1" > .bazelversion` in your buildfarm directory before building. This does not affect BuildBuddy setups.
+
+
+---
+
+## Option 1: Open-source RBE with bazel-buildfarm
+
+### What you need
+
+- One Ubuntu VM (or bare-metal Linux) for the buildfarm server and Redis. This can be a co-located VM in your MacStadium environment.
+- One or more Orka VMs as macOS workers. Scale the worker count to match your parallelism target.
+- Your Bazel client (local dev machine or CI runner) with network access to the buildfarm server.
+
+### Set up the buildfarm server
+
+On your Ubuntu VM:
+
+**Install Java:**
+
+```bash
+sudo apt install default-jdk
+```
+
+**Install Docker:**
+
+```bash
+sudo apt-get update
+sudo apt-get install ca-certificates curl
+sudo install -m 0755 -d /etc/apt/keyrings
+sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
+sudo chmod a+r /etc/apt/keyrings/docker.asc
+
+echo \
+  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
+  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
+  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
+sudo apt-get update
+sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
+```
+
+**Install Bazel (via Bazelisk):**
+
+```bash
+sudo apt install npm
+sudo npm install -g @bazel/bazelisk
+```
+
+**Start Redis:**
+
+```bash
+sudo docker run -d --rm --name buildfarm-redis -p 6379:6379 redis:7.2.4
+```
+
+**Install redis-cli and configure Redis:**
+
+```bash
+sudo apt install redis-tools
+redis-cli config set stop-writes-on-bgsave-error no
+```
+
+**Clone and configure buildfarm:**
+
+```bash
+sudo apt install gh
+gh auth login
+gh repo clone bazelbuild/bazel-buildfarm
+cd bazel-buildfarm
+
+# Pin Bazel version (9.0.0 is not supported)
+echo "8.2.1" > .bazelversion
+```
+
+Edit `examples/config.minimal.yml` and set the Redis URI to your VM's IP:
+
+```yaml
+backplane:
+  redisUri: "redis://<buildfarm-server-ip>:6379"
+  queues:
+    - name: "cpu"
+      properties:
+        - name: "min-cores"
+          value: "*"
+        - name: "max-cores"
+          value: "*"
+```
+
+**Start the buildfarm server:**
+
+```bash
+bazelisk run //src/main/java/build/buildfarm:buildfarm-server -- \
+  --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties \
+  $PWD/examples/config.minimal.yml
+```
+
+The server listens on port `8980` by default.
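Before configuring workers, it can help to confirm that the server port is reachable from another machine on the network. The sketch below assumes Bash (it relies on Bash's `/dev/tcp` redirection); `wait_for_port` is a hypothetical helper name and is not part of buildfarm.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of buildfarm: poll a TCP port until it
# accepts a connection or the attempts run out.
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-30}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # Bash's /dev/tcp redirection opens a TCP connection; running it in a
    # subshell means the descriptor is closed again when the subshell exits.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example (replace the address with your buildfarm server's IP):
#   wait_for_port 10.0.0.10 8980 60 && echo "buildfarm server is reachable"
```

A successful connection only shows that something is listening on the port, not that buildfarm is healthy, so treat it as a first-pass network check. If the host is unreachable rather than refusing connections, each attempt can block for the operating system's connect timeout.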
+
+---
+
+### Prepare your Orka worker image
+
+Create a base macOS VM image with the following installed. You'll deploy multiple workers from this image, so getting it right once avoids repeating the setup on each VM.
+
+**Install Xcode:**
+
+Download Xcode from [developer.apple.com](https://developer.apple.com/xcode/). Launch it, accept the license agreements, and complete the initial setup. Make sure the version matches your project's minimum Xcode requirement.
+
+**Install Homebrew:**
+
+```bash
+/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
+```
+
+Follow the post-install instructions to add Homebrew to your shell path.
+
+**Install dependencies:**
+
+```bash
+brew install temurin   # Java
+brew install bazelisk  # Bazel version manager
+brew install gh        # GitHub CLI
+```
+
+**Clone buildfarm and pin Bazel:**
+
+```bash
+gh auth login
+gh repo clone bazelbuild/bazel-buildfarm
+cd bazel-buildfarm
+echo "8.2.1" > .bazelversion
+```
+
+Once everything is installed, save this VM as a base image in Orka. You'll use this image for all your workers.
+
+---
+
+### Deploy and configure workers
+
+For each worker, deploy an Orka VM from your base image and configure it to point at the buildfarm server.
+
+**Deploy a worker VM:**
+
+```bash
+orka3 vm deploy --vm-config <vm-config-name> --name <worker-name>
+```
+
+**Edit the worker config on each VM:**
+
+On the worker VM, edit `bazel-buildfarm/examples/config.minimal.yml`:
+
+```yaml
+backplane:
+  redisUri: "redis://<buildfarm-server-ip>:6379"
+  queues:
+    - name: "cpu"
+      properties:
+        - name: "min-cores"
+          value: "*"
+        - name: "max-cores"
+          value: "*"
+worker:
+  publicName: "<worker-vm-ip>:8981" # Must be unique per worker
+  executionPolicies:
+    - name: test
+      executionWrapper:
+        path: "/Users/admin/bazel-buildfarm/macos-wrapper.sh"
+```
+
+The `publicName` field must be unique for each worker. Use each VM's IP address. Workers that share a `publicName` will conflict. The `executionWrapper` path must point to a wrapper script that exists on the worker; the path shown assumes the repository was cloned into `/Users/admin/bazel-buildfarm`, the location used in the rest of this guide.
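Because `publicName` has to differ on every worker, one option is to set it from the VM's own IP address instead of editing the file by hand. This is a minimal sketch, assuming the config keeps `publicName` on a line of its own; `set_public_name` is a hypothetical helper name, and treating `en0` as the VM's primary network interface in the usage comment is also an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite the publicName value in a buildfarm worker config.
set_public_name() {
  local config_file="$1" worker_ip="$2" port="${3:-8981}"
  local tmp
  tmp="$(mktemp)" || return 1
  # Replace everything after "publicName:" while keeping the line's indentation.
  # A temp file is used instead of `sed -i`, whose syntax differs between
  # macOS (BSD) sed and GNU sed.
  sed -E "s|^([[:space:]]*publicName:).*|\1 \"${worker_ip}:${port}\"|" "$config_file" > "$tmp" \
    && cat "$tmp" > "$config_file"
  local status=$?
  rm -f "$tmp"
  return "$status"
}

# Example on a worker VM (assumes en0 is the primary network interface):
#   set_public_name ~/bazel-buildfarm/examples/config.minimal.yml "$(ipconfig getifaddr en0)"
```

Note that the substitution drops any trailing comment on the `publicName` line, and it leaves every other line of the config untouched.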
+
+**Start the worker:**
+
+```bash
+cd ~/bazel-buildfarm
+bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- \
+  --jvm_flag=-Djava.util.logging.config.file=$PWD/examples/logging.properties \
+  $PWD/examples/config.minimal.yml
+```
+
+Repeat for each worker VM. All workers point at the same Redis and buildfarm server; the server distributes work across the pool automatically.
+
+---
+
+### Scale the worker pool
+
+Because all workers use the same base image and the same buildfarm server address, scaling is straightforward: deploy more Orka VMs from the base image, set a unique `publicName` on each, and start the worker process.
+
+A reasonable starting point for iOS simulator workloads is one worker VM per Orka node, with each VM sized to leave headroom for simulator processes. For compile-bound Bazel actions, higher VM density per node is fine since those workloads are CPU-bound rather than I/O-bound.
+
+
+If you're running parallel simulator tests across multiple VMs on the same node and see I/O slowdowns, the bottleneck is disk contention from concurrent simulator writes, not virtualization overhead. Distribute across more nodes at lower VM-per-node density rather than reducing the total VM count.
+
+
+---
+
+### Connect your Bazel client
+
+Add the following to your project's `.bazelrc`:
+
+```
+build --remote_executor=grpc://<buildfarm-server-ip>:8980
+build --remote_default_exec_properties=execution-policy=test
+```
+
+Or pass the flags directly:
+
+```bash
+bazel build \
+  --remote_executor=grpc://<buildfarm-server-ip>:8980 \
+  --remote_default_exec_properties=execution-policy=test \
+  //your:target
+```
+
+To verify the cache is working, run the build once (cold), then run `bazel clean --expunge` and build again. The second build should show `remote cache hit` for most actions and complete significantly faster.
+
+Expected output on a warm cache:
+
+```
+INFO: Elapsed time: 6.595s, Critical Path: 0.14s
+INFO: 65 processes: 28 remote cache hit, 37 internal.
+INFO: Build completed successfully, 65 total actions
+```
+
+---
+
+## Option 2: BuildBuddy with Orka
+
+If your team is already using BuildBuddy as your RBE control plane, Orka VMs can serve as the self-hosted Mac executors that BuildBuddy dispatches work to. This keeps Orka in the stack as the macOS worker layer: you get clean VM boots, OCI image versioning, and node failover even when BuildBuddy is handling scheduling and caching.
+
+For teams evaluating options, the OSS buildfarm path above is the recommended starting point. It gives you full control over the stack, no additional vendor dependency, and the same Orka worker benefits.
+
+
+BuildBuddy's snapshot-based warm-runner technology (Firecracker) is Linux-only. On macOS, BuildBuddy falls back to runner recycling, which keeps the executor process alive between invocations rather than snapshotting it. Orka VM snapshots work independently of this and are available regardless of which RBE control plane you use.
+
+
+### What you need
+
+- An existing BuildBuddy deployment (cloud or self-hosted).
+- One or more Orka VMs as macOS executors.
+- Your Bazel client with network access to BuildBuddy's gRPC endpoint.
+
+### Prepare your Orka executor image
+
+Install the following on a base macOS VM:
+
+**Xcode:** Download from [developer.apple.com](https://developer.apple.com/xcode/). Accept the license and complete setup.
+
+**Homebrew:**
+
+```bash
+/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
+```
+
+**Dependencies:**
+
+```bash
+brew install bazelisk
+```
+
+**BuildBuddy executor binary:**
+
+Download the latest `buildbuddy-executor` binary for macOS/ARM from [BuildBuddy's GitHub releases](https://github.com/buildbuddy-io/buildbuddy/releases).
+Move it to a stable path:
+
+```bash
+chmod +x buildbuddy-executor
+sudo mv buildbuddy-executor /usr/local/bin/buildbuddy-executor
+```
+
+**Create the executor config file** at `/etc/buildbuddy/executor.yaml`:
+
+```yaml
+executor:
+  app_target: "grpcs://remote.buildbuddy.io:443"
+  api_key: "<your-buildbuddy-api-key>"
+  local_cache_size_bytes: 10000000000 # 10 GB local disk cache
+```
+
+For a self-hosted BuildBuddy deployment, replace `remote.buildbuddy.io` with your BuildBuddy server address.
+
+**Run the executor:**
+
+```bash
+buildbuddy-executor --config_file /etc/buildbuddy/executor.yaml
+```
+
+To keep the executor running across VM restarts, configure it as a LaunchDaemon. Create `/Library/LaunchDaemons/io.buildbuddy.executor.plist`:
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+  <key>Label</key>
+  <string>io.buildbuddy.executor</string>
+  <key>ProgramArguments</key>
+  <array>
+    <string>/usr/local/bin/buildbuddy-executor</string>
+    <string>--config_file</string>
+    <string>/etc/buildbuddy/executor.yaml</string>
+  </array>
+  <key>RunAtLoad</key>
+  <true/>
+  <key>KeepAlive</key>
+  <true/>
+  <key>StandardOutPath</key>
+  <string>/var/log/buildbuddy-executor.log</string>
+  <key>StandardErrorPath</key>
+  <string>/var/log/buildbuddy-executor.log</string>
+</dict>
+</plist>
+```
+
+Load it:
+
+```bash
+sudo launchctl load /Library/LaunchDaemons/io.buildbuddy.executor.plist
+```
+
+Save this VM as a base image and deploy additional executor VMs from it to scale the pool.
+
+---
+
+### Connect your Bazel client
+
+Add the following to your `.bazelrc`:
+
+```
+build --remote_executor=grpcs://remote.buildbuddy.io:443
+build --remote_cache=grpcs://remote.buildbuddy.io:443
+build --remote_header=x-buildbuddy-api-key=<your-buildbuddy-api-key>
+```
+
+BuildBuddy's web UI shows build results, cache hit rates, and action timing at [app.buildbuddy.io](https://app.buildbuddy.io).
+
+---
+
+## Automate worker lifecycle with GitHub Actions
+
+The manual steps above work for static worker pools that stay running. For ephemeral workers that spin up per build and tear down when it's done, you can drive the whole thing from a GitHub Actions workflow.
+This pattern works for both the buildfarm and BuildBuddy setups; the only difference is what's installed on the worker image.
+
+### Set up worker auto-start in the base image
+
+The key to making ephemeral workers viable is having the worker process start automatically when the VM boots, so the GitHub Actions workflow doesn't need to SSH into each VM to configure it. Do this once when building your base image, then save it.
+
+**Create a startup script** at `/Users/admin/start-buildfarm-worker.sh`:
+
+```bash
+#!/bin/bash
+# launchd does not read shell profiles, so put Homebrew's bin directories
+# (Apple silicon and Intel locations) on PATH for bazelisk.
+export PATH="/opt/homebrew/bin:/usr/local/bin:$PATH"
+cd /Users/admin/bazel-buildfarm || exit 1
+exec bazelisk run //src/main/java/build/buildfarm:buildfarm-shard-worker -- \
+  --jvm_flag=-Djava.util.logging.config.file="$PWD/examples/logging.properties" \
+  "$PWD/examples/config.minimal.yml"
+```
+
+```bash
+chmod +x /Users/admin/start-buildfarm-worker.sh
+```
+
+**Create a LaunchDaemon** at `/Library/LaunchDaemons/io.macstadium.buildfarm-worker.plist`:
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+  <key>Label</key>
+  <string>io.macstadium.buildfarm-worker</string>
+  <key>ProgramArguments</key>
+  <array>
+    <string>/bin/bash</string>
+    <string>/Users/admin/start-buildfarm-worker.sh</string>
+  </array>
+  <key>RunAtLoad</key>
+  <true/>
+  <key>KeepAlive</key>
+  <true/>
+  <key>WorkingDirectory</key>
+  <string>/Users/admin/bazel-buildfarm</string>
+  <key>StandardOutPath</key>
+  <string>/var/log/buildfarm-worker.log</string>
+  <key>StandardErrorPath</key>
+  <string>/var/log/buildfarm-worker.log</string>
+</dict>
+</plist>
+```
+
+```bash
+sudo launchctl load /Library/LaunchDaemons/io.macstadium.buildfarm-worker.plist
+```
+
+With this in place, every VM deployed from this image will register with your buildfarm server within about 30 seconds of boot. Because every VM boots from the same image, make sure each one ends up with a unique `publicName` (for example, derived from the VM's IP address at startup); otherwise all workers register under the same name and conflict. Save the image in Orka before continuing.
+
+For BuildBuddy setups, no separate startup script is needed: the `io.buildbuddy.executor` LaunchDaemon created in the executor image prep steps already starts the executor at boot.
+
+---
+
+### The workflow
+
+The workflow runs on a self-hosted runner that has access to your Orka cluster. The Linux VM running your buildfarm server is the natural home for it, since it's already in the same network as your Orka nodes.
+Store your Orka service account name and token as GitHub Actions secrets.
+
+```yaml
+name: Bazel RBE
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+
+env:
+  BUILDFARM_SERVER: "10.221.188.8"  # IP of your persistent buildfarm server
+  WORKER_COUNT: "4"                 # How many Orka worker VMs to deploy
+  WORKER_VM_CONFIG: "bazel-worker"  # Name of your saved Orka VM config
+
+jobs:
+  build:
+    runs-on: [self-hosted, linux]
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Authenticate with Orka
+        id: auth
+        run: |
+          TOKEN=$(orka3 serviceaccount token "${{ secrets.ORKA_SA_NAME }}" \
+            --endpoint "${{ vars.ORKA_ENDPOINT }}")
+          echo "token=$TOKEN" >> "$GITHUB_OUTPUT"
+        env:
+          ORKA_SA_TOKEN: ${{ secrets.ORKA_SA_TOKEN }}
+
+      - name: Deploy worker VMs
+        id: workers
+        run: |
+          NAMES=()
+          for i in $(seq 1 "$WORKER_COUNT"); do
+            NAME="bazel-worker-${{ github.run_id }}-${i}"
+            orka3 vm deploy \
+              --vm-config "$WORKER_VM_CONFIG" \
+              --name "$NAME" \
+              --endpoint "${{ vars.ORKA_ENDPOINT }}" \
+              --token "${{ steps.auth.outputs.token }}"
+            NAMES+=("$NAME")
+          done
+          echo "names=${NAMES[*]}" >> "$GITHUB_OUTPUT"
+
+      - name: Wait for workers to register
+        run: |
+          echo "Waiting for VMs to boot and register with buildfarm..."
+          sleep 45
+          for NAME in ${{ steps.workers.outputs.names }}; do
+            orka3 vm list "$NAME" \
+              --endpoint "${{ vars.ORKA_ENDPOINT }}" \
+              --token "${{ steps.auth.outputs.token }}"
+          done
+
+      - name: Run Bazel build
+        run: |
+          bazel build \
+            --remote_executor=grpc://${{ env.BUILDFARM_SERVER }}:8980 \
+            --remote_default_exec_properties=execution-policy=test \
+            //...
+
+      - name: Tear down worker VMs
+        if: always()
+        run: |
+          for NAME in ${{ steps.workers.outputs.names }}; do
+            orka3 vm delete "$NAME" \
+              --endpoint "${{ vars.ORKA_ENDPOINT }}" \
+              --token "${{ steps.auth.outputs.token }}" || true
+          done
+```
+
+A few things worth noting:
+
+- **Service account auth:** `orka3 serviceaccount token` is the right auth method for CI/CD.
+  User tokens from `orka3 login` expire after an hour and will fail mid-build. See [Manage service accounts](/orka/orka-cluster-access/orka-cluster-manage-service-accounts) for setup.
+- **Worker count:** `WORKER_COUNT` is the lever for parallelism. Start at 2–4 and increase if your build has enough parallelizable actions to justify it. Each VM adds to your cluster's VM slot consumption.
+- **The 45-second wait:** this covers VM boot plus the time for the LaunchDaemon to start the buildfarm worker process and register with the server. Adjust down if your cluster consistently boots faster; adjust up if you see "no workers available" errors on the first few actions.
+- **`if: always()` on teardown:** this ensures VMs are cleaned up even when the build fails. Without it, a failed build leaves orphaned VMs running until someone deletes them manually.
+- **Unique VM names:** appending `github.run_id` to the VM name avoids naming collisions when multiple builds run in parallel.
+
+---
+
+### Scale across multiple runs
+
+If your team runs many concurrent builds, the total number of worker VMs is `WORKER_COUNT` multiplied by the number of parallel runs. Watch your cluster's available VM slots; the Orka web UI shows current VM counts per node. For sustained high-concurrency usage, a persistent warm pool (always-on workers) is more efficient than deploying and tearing down per build, since it avoids both the per-build boot latency and VM slot churn.
+
+---
+
+## Persistent remote cache
+
+For both setups, a shared remote cache means build outputs survive VM restarts and are shared across all workers and clients.
+The simplest approach is running [bazel-remote](https://github.com/buchgr/bazel-remote) on your Linux VM:
+
+```bash
+docker run -d \
+  -u 1000:1000 \
+  -v /path/to/cache:/data \
+  -p 9090:8080 \
+  --restart unless-stopped \
+  buchgr/bazel-remote-cache \
+  --max_size=50
+```
+
+Point your Bazel client at it:
+
+```
+build --remote_cache=http://<cache-server-ip>:9090
+```
+
+For buildfarm setups, the buildfarm server also provides a Content-Addressable Store (CAS), so you don't need a separate cache server unless you want to share the cache with clients that bypass RBE.
+
+---
+
+## Why Orka for the worker pool
+
+Bazel RBE handles scheduling, caching, and build orchestration. What it doesn't handle is provisioning a fleet of macOS workers that stays clean, versioned, and operationally tractable at scale. That's what Orka provides:
+
+- **Clean workers on demand.** Deploying a fresh Orka VM from a versioned OCI image takes seconds. Bare-metal re-imaging takes hours.
+- **Image versioning.** Orka 3.5's Harbor OCI registry lets you treat your build environments like container images: versioned, signed, pullable by tag. Xcode drift across the fleet collapses cache hit rates; pinned OCI images prevent it.
+- **Worker pool scaling.** Deploy additional worker VMs from the same base image without configuring physical hardware.
+- **Declarative VM counts.** Orka's Kubernetes-native operator supports deployment-style VM pool management. Declare the pool size; the control plane keeps it warm.
+- **Node failover.** If an Orka node goes down, VMs reschedule to healthy nodes. Bare-metal host failure is a manual remediation.
+
+For more on the Orka value proposition in Bazel RBE architectures, contact [support@macstadium.com](mailto:support@macstadium.com) or speak with your MacStadium solutions engineer.