ci: add Hetzner self-hosted runner smoke test #1483
Open
scottbuckel wants to merge 7 commits into
Adds a workflow_dispatch + branch-pushed smoke test that targets the new SRE-managed Hetzner runner pool (fsn1-runner-01, 8 instances at tier:small/medium/large labels). Does NOT run on main or PRs.

Provisioning context: shieldedtech/shielded-iac PR #1264.

Signed-off-by: Scott Buckel <scott.buckel@shielded.io>
Targets the new SRE-managed Hetzner runner pool (fsn1-runner-01) for the heavy 'Build node and images' job. Also flips EarthBuild use-cache from false to true so the persistent Earthly satellite carries layer cache across runs.

ONLY for this test PR; revert before any merge to main. The Hetzner pool is currently a single AX162-R; production CI shouldn't depend on it until we have at least 2 boxes for redundancy.

PR for the runner provisioning: shieldedtech/shielded-iac#1264.

Signed-off-by: Scott Buckel <scott.buckel@shielded.io>
use-cache=true tells EarthBuild's setup-action to use GitHub Actions cache for Earthly state, including TLS-secured buildkitd certs. On a fresh runner with no prior GHA cache, those certs don't exist and Earthly fails:

TLS CA file /opt/actions-runner/.earthly/certs/ca_cert.pem is missing

We don't want GHA cache anyway: the Hetzner runner has a persistent local Earthly satellite at tcp://127.0.0.1:8372, accessed via the EARTHLY_BUILDKIT_HOST env var (set by the ansible runner role).

Reverting use-cache to false. The satellite is still doing its job, just not via the EarthBuild action's GHA-cache mechanism.

Signed-off-by: Scott Buckel <scott.buckel@shielded.io>
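For reference, the reverted setup step might look roughly like the fragment below. This is a sketch only: the step name, the exact `uses:` reference, and its version pin are assumptions, since the commit message does not quote them. Only the `use-cache: false` input comes from this PR.

```yaml
# Sketch of the EarthBuild setup step after the revert.
# Assumption: the action reference and version pin below are illustrative, not taken from this PR.
- name: Setup EarthBuild
  uses: EarthBuild/actions-setup@v1
  with:
    # Do not restore Earthly state (including buildkitd TLS certs) from the GHA cache.
    use-cache: false
```

EARTHLY_BUILDKIT_HOST is not set in the workflow here because, per the commit message, the ansible runner role already exports it on the runner.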
Self-hosted runner pool is the only one, so 'self-hosted' + a tier label is enough to uniquely target it. Removing the 'hetzner' tag from runs-on lets the labels evolve without coupling workflow YAMLs to the hosting provider.

Signed-off-by: Scott Buckel <scott.buckel@shielded.io>
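A minimal sketch of what the provider-agnostic targeting could look like. The job id and the choice of tier:large are hypothetical; only the 'self-hosted' label and the tier:small/medium/large labels come from this PR.

```yaml
jobs:
  build:                                   # hypothetical job id
    # No 'hetzner' label: 'self-hosted' plus a tier label selects the pool.
    runs-on: [self-hosted, "tier:large"]   # tier chosen for illustration only
```

The tier label is quoted so the colon inside it cannot be misread by a YAML parser.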
The repo's .earthly/config.yml sets buildkit_additional_args and cache_size_pct, which tell earthly to manage its OWN local buildkitd with TLS. That config takes precedence over EARTHLY_BUILDKIT_HOST, so the env var is silently ignored and earthly tries to spin up a local buildkit container, which fails because no TLS certs exist at ~/.earthly/certs/.

Add a workflow step that overwrites .earthly/config.yml after checkout with one that points at the local satellite over plain TCP. Test-only; revert before merge.

Signed-off-by: Scott Buckel <scott.buckel@shielded.io>
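The override step described above could be sketched as follows. The step name and the exact file contents are assumptions; the satellite address tcp://127.0.0.1:8372 is the one given earlier in this PR, and `buildkit_host` and `tls_enabled` are standard keys in Earthly's `global` config section.

```yaml
# Sketch: run after actions/checkout so the repo's own config is replaced.
- name: Point Earthly at the local satellite (test-only)   # hypothetical step name
  run: |
    mkdir -p .earthly
    cat > .earthly/config.yml <<'EOF'
    global:
      buildkit_host: tcp://127.0.0.1:8372
      tls_enabled: false
    EOF
```

Because the file is rewritten wholesale, the repo's buildkit_additional_args and cache_size_pct settings are dropped for the duration of the job.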
Switching strategy: instead of pointing earthly at our separately-managed 'earthly-satellite' container (which earthly treats as a port conflict because it thinks loopback buildkit_host = its own to manage), let earthly manage its own buildkitd at the default address. The only thing we need to override on midnight-node's repo config is TLS: earthly defaults to TLS-enabled but no certs are provisioned on the runner.

Requires the 'earthly-satellite' systemd service to be stopped on the Hetzner runner so port 8372 is free for earthly's own managed buildkitd.

Signed-off-by: Scott Buckel <scott.buckel@shielded.io>
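Under this strategy the only config change would be the TLS flag. A sketch of the relevant fragment, assuming the override is expressed through the `tls_enabled` key in Earthly's `global` section (the commit message does not quote the final file):

```yaml
# .earthly/config.yml (sketch): only the TLS setting is overridden.
global:
  tls_enabled: false
  # No buildkit_host here: earthly starts and manages its own buildkitd
  # at the default address.
  # buildkit_additional_args and cache_size_pct stay as the repo already sets them.
```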
Summary
Adds a manual + branch-pushed smoke-test workflow that targets the new SRE-managed Hetzner runner pool, so we can validate runner health, cache services, and Docker availability before migrating production CI workflows.
- runs-on: [self-hosted, hetzner, tier:{small,medium,large}] (matrix runs one job per tier label so we exercise all three)
- actions/checkout this repo

Trigger
- test/hetzner-runner branch (this PR)

Does NOT run on main or other PRs.
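A rough sketch of a workflow with this shape, for orientation. The workflow name, job id, checkout version pin, and the Docker check step are assumptions; the triggers, branch name, and runner labels are the ones described in this PR.

```yaml
# Hypothetical sketch, not the file added by this PR.
name: Hetzner runner smoke test
on:
  workflow_dispatch:            # manual trigger
  push:
    branches:
      - test/hetzner-runner     # branch-pushed trigger; never main
jobs:
  smoke:                        # hypothetical job id
    strategy:
      fail-fast: false          # let every tier report, even if one fails
      matrix:
        tier: ["tier:small", "tier:medium", "tier:large"]
    runs-on: [self-hosted, hetzner, "${{ matrix.tier }}"]
    steps:
      - uses: actions/checkout@v4
      - name: Check Docker availability   # illustrative health check
        run: docker info
```

Since there is no pull_request trigger and the push filter names a single branch, the workflow cannot start on main or on other PRs.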
Provisioning context
Runners are provisioned + configured by shieldedtech/shielded-iac PR #1264. Box is fsn1-runner-01.hetzner.shielded.tools (Hetzner AX162-R, 16c/32t, 128GB RAM, 2×1.92TB NVMe). Currently 8 systemd-managed actions/runner instances.

Next steps after this lands
- continuous-integration.yml's ubuntu-latest-16-core-x64 jobs.
- use-cache: false on EarthBuild setup steps so the persistent Earthly satellite + sccache + GHA cache server actually deliver wins.