ci: add Hetzner self-hosted runner smoke test#1483

Open
scottbuckel wants to merge 7 commits into main from test/hetzner-runner

@scottbuckel
Contributor

Summary

Adds a manual + branch-pushed smoke-test workflow that targets the new SRE-managed Hetzner runner pool, so we can validate runner health, cache services, and Docker availability before migrating production CI workflows.

  • runs-on: [self-hosted, hetzner, tier:{small,medium,large}] — matrix runs one job per tier label so we exercise all three
  • 5-minute timeout
  • Prints runner identity (hostname, kernel, CPU/RAM/disk), Docker version, cache service reachability (sccache, GHA cache, Earthly satellite), and the cache-related env vars the runner injects
  • Verifies the runner can actions/checkout this repo

Trigger

  • Manual via Actions tab → "Test Hetzner self-hosted runner" → Run workflow
  • Push to test/hetzner-runner branch (this PR)

Does NOT run on main or other PRs.
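
A minimal sketch of what such a workflow could look like, based on the bullets above. The file path, job name, and step names are hypothetical, `actions/checkout@v4` is an assumed version pin, and `fail-fast: false` is an illustrative choice. The cache-service reachability checks are left out because their endpoints are not given here.

```yaml
# .github/workflows/test-hetzner-runner.yml  (hypothetical path)
name: Test Hetzner self-hosted runner

on:
  workflow_dispatch:          # manual run from the Actions tab
  push:
    branches:
      - test/hetzner-runner   # only this branch; never main or other PRs

jobs:
  smoke:                      # hypothetical job name
    strategy:
      fail-fast: false        # assumption: let every tier report even if one fails
      matrix:
        tier: [small, medium, large]
    runs-on: [self-hosted, hetzner, "tier:${{ matrix.tier }}"]
    timeout-minutes: 5
    steps:
      - name: Check out the repository
        uses: actions/checkout@v4   # assumed version pin

      - name: Print runner identity
        run: |
          hostname
          uname -r
          nproc
          free -h
          df -h /

      - name: Print Docker version
        run: docker version
```

Restricting `push` to a single branch is what keeps the workflow from running on main or on other PRs.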

Provisioning context

Runners are provisioned + configured by shieldedtech/shielded-iac PR #1264. Box is fsn1-runner-01.hetzner.shielded.tools (Hetzner AX162-R, 16c/32t, 128GB RAM, 2×1.92TB NVMe). Currently 8 systemd-managed actions/runner instances.

Next steps after this lands

  • Once the smoke test is green, migrate hot midnight-node CI workflows (cargo build, earthly +deploy, etc.) to use these runners — likely starts with continuous-integration.yml's ubuntu-latest-16-core-x64 jobs.
  • Drop use-cache: false on EarthBuild setup steps so the persistent Earthly satellite + sccache + GHA cache server actually deliver wins.

Adds a workflow_dispatch + branch-pushed smoke test that targets the new
SRE-managed Hetzner runner pool (fsn1-runner-01, 8 instances at
tier:small/medium/large labels). Does NOT run on main or PRs.

Provisioning context: shieldedtech/shielded-iac PR #1264.

Signed-off-by: Scott Buckel <scott.buckel@shielded.io>

@scottbuckel requested a review from a team as a code owner May 8, 2026 21:04

Targets the new SRE-managed Hetzner runner pool (fsn1-runner-01) for the
heavy 'Build node and images' job. Also flips EarthBuild use-cache from
false to true so the persistent Earthly satellite carries layer cache
across runs.

ONLY for this test PR — revert before any merge to main. The Hetzner pool
is currently a single AX162-R; production CI shouldn't depend on it until
we have at least 2 boxes for redundancy.

PR for the runner provisioning: shieldedtech/shielded-iac#1264.

Signed-off-by: Scott Buckel <scott.buckel@shielded.io>

use-cache=true tells EarthBuild's setup-action to use GitHub Actions cache
for Earthly state, including TLS-secured buildkitd certs. On a fresh runner
with no prior GHA cache, those certs don't exist and Earthly fails:
  TLS CA file /opt/actions-runner/.earthly/certs/ca_cert.pem is missing

We don't want GHA cache anyway — the Hetzner runner has a persistent local
Earthly satellite at tcp://127.0.0.1:8372, accessed via EARTHLY_BUILDKIT_HOST
env var (set by the ansible runner role). Reverting use-cache to false.

The satellite is still doing its job; just not via the EarthBuild action's
GHA-cache mechanism.

Signed-off-by: Scott Buckel <scott.buckel@shielded.io>
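
For illustration, the reverted setup step might look roughly like the sketch below. The action reference `EarthBuild/actions-setup@v1` and both step names are assumptions; the workflow's existing pin should be kept as it is. The second step is a hypothetical sanity check that only prints the variable the runner is described as injecting.

```yaml
steps:
  - name: Set up Earthly without the GHA cache
    uses: EarthBuild/actions-setup@v1   # assumed reference; keep the workflow's existing pin
    with:
      use-cache: false                  # do not use the GitHub Actions cache for Earthly state

  - name: Show the runner-injected buildkit host
    run: |
      # The ansible runner role is described as setting this to tcp://127.0.0.1:8372
      echo "EARTHLY_BUILDKIT_HOST=${EARTHLY_BUILDKIT_HOST:-<unset>}"
```
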
This is the only self-hosted runner pool, so 'self-hosted' plus a tier label
is enough to target it uniquely. Removing the 'hetzner' tag from
runs-on lets the labels evolve without coupling workflow YAMLs to the
hosting provider.

Signed-off-by: Scott Buckel <scott.buckel@shielded.io>
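
As a rough before/after of the label change, assuming the job targets the large tier (the tier actually used is not stated here):

```yaml
# Before: hosting provider baked into the workflow
# runs-on: [self-hosted, hetzner, "tier:large"]

# After: 'self-hosted' plus a tier label is enough to select the pool
runs-on: [self-hosted, "tier:large"]   # tier is an assumption for illustration
```
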
The repo's .earthly/config.yml sets buildkit_additional_args and
cache_size_pct, which tell earthly to manage its OWN local buildkitd
with TLS. That config takes precedence over EARTHLY_BUILDKIT_HOST,
so the env var is silently ignored and earthly tries to spin up a
local buildkit container — failing because no TLS certs exist at
~/.earthly/certs/.

Add a workflow step that overwrites .earthly/config.yml after checkout
with one that points at the local satellite over plain TCP. Test-only;
revert before merge.

Signed-off-by: Scott Buckel <scott.buckel@shielded.io>
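
A sketch of what the overwrite step could look like, assuming it runs right after checkout and that `tls_enabled: false` is how plain TCP is selected. The step name is hypothetical; `buildkit_host` and `tls_enabled` are keys from Earthly's `global` config section, and the address is the satellite address mentioned earlier.

```yaml
steps:
  - name: Point Earthly at the local satellite (test-only)
    run: |
      # Replace the repo config so earthly stops managing its own buildkitd
      cat > .earthly/config.yml <<'EOF'
      global:
        buildkit_host: tcp://127.0.0.1:8372
        tls_enabled: false
      EOF
```

Because the file is rewritten only in the runner's working copy, the committed `.earthly/config.yml` stays untouched.
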
Switching strategy: instead of pointing earthly at our separately-managed
'earthly-satellite' container (which earthly treats as a port conflict,
because it assumes a loopback buildkit_host is a buildkitd it should manage
itself), let earthly manage its own buildkitd at the default address. The
only thing we need to override in midnight-node's repo config is TLS:
earthly defaults to TLS-enabled, but no certs are provisioned on the runner.

Requires the 'earthly-satellite' systemd service to be stopped on the
Hetzner runner so port 8372 is free for earthly's own managed buildkitd.

Signed-off-by: Scott Buckel <scott.buckel@shielded.io>
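
Under this strategy the override reduces to a single key. A minimal sketch of the resulting config, assuming the repo's other settings are left exactly as they are (their values are not shown here):

```yaml
# .earthly/config.yml (sketch of the test-only override)
global:
  # buildkit_additional_args and cache_size_pct remain as the repo defines them
  tls_enabled: false   # no TLS certs are provisioned on the runner
```

With no `buildkit_host` set, earthly starts and manages its own buildkitd, which is why the 'earthly-satellite' service has to be stopped first to free port 8372.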