fix: Re-use existing runner scale set instead of panicking on startup #63

Closed
baburciu wants to merge 1 commit into macstadium:main from baburciu:fm-main

Conversation

@baburciu

@baburciu baburciu commented Apr 28, 2026


When the application restarts after an unclean shutdown (crash, OOMKill, pod eviction, CrashLoopBackOff), the runner scale set remains registered but offline in GitHub. According to actions/actions-runner-controller#3035, there is no GitHub API to purge stale runner scale sets.

(Screenshot: Orka GitHub Runner Scale Sets)

On the next startup, CreateRunnerScaleSet fails with:

panic: unable to create runner <name>, err: 400 - had issue communicating with
Actions backend: The runner scale set already exists in runner group Default.

This puts the pod into a permanent CrashLoopBackOff, because every restart attempt hits the same stale scale set and panics again.

Root cause: main.go unconditionally calls CreateRunnerScaleSet without first checking whether one already exists. The cleanup goroutine that deletes the scale set on shutdown only runs on graceful SIGTERM — it does not run when the process panics or is force-killed.
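To illustrate the root cause, here is a minimal simulation of a cleanup routine that only fires on a shutdown signal. It is not the project's code: `cleanupOnSignal` and `runStartup` are hypothetical names, a plain channel stands in for SIGTERM delivery, and `recover` is used only so the effect of the panic can be observed (in a real process an unrecovered panic terminates the program).

```go
package main

import "fmt"

// cleanupOnSignal mimics a shutdown goroutine: it runs cleanup only after
// sig is closed (a stand-in for SIGTERM delivered through os/signal).
func cleanupOnSignal(sig <-chan struct{}, cleanup func(), done chan<- struct{}) {
	<-sig
	cleanup()
	close(done)
}

// runStartup runs startup and reports whether it panicked. The recover is
// here only for observation; a real unrecovered panic ends the process.
func runStartup(startup func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	startup()
	return false
}

func main() {
	sig := make(chan struct{})
	done := make(chan struct{})
	cleaned := false
	go cleanupOnSignal(sig, func() { cleaned = true }, done)

	// Startup panics before any signal arrives, so cleanup has not run.
	panicked := runStartup(func() { panic("unable to create runner scale set") })
	fmt.Println(panicked, cleaned) // true false

	// Only a graceful shutdown signal triggers the cleanup.
	close(sig)
	<-done
	fmt.Println(cleaned) // true
}
```

The point of the sketch is that the cleanup path is tied to the signal, not to process exit, so any exit that bypasses the signal leaves the scale set registered.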

Fix: Before attempting to create a new runner scale set, call GetRunnerScaleSet (which already exists in the codebase) to check for an existing one. If found, re-use it. If not, create a new one as before.

  // Before (always creates, panics if exists)
  runnerScaleSet, err := actionsClient.CreateRunnerScaleSet(ctx, &types.RunnerScaleSet{...})

  // After (get-or-create)
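  runnerScaleSet, err := actionsClient.GetRunnerScaleSet(ctx, groupId, runnerName)
  if err != nil {
      // lookup failed: handle the error here rather than falling through to create
  } else if runnerScaleSet != nil {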
      logger.Infof("found existing runner scale set %s (id: %d), re-using it", ...)
  } else {
      runnerScaleSet, err = actionsClient.CreateRunnerScaleSet(ctx, &types.RunnerScaleSet{...})
  }

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

When a pod crashes or restarts without a clean shutdown, the runner scale
set remains registered in GitHub. The previous code always called
CreateRunnerScaleSet and panicked when the call failed because a scale set
already existed. Now we first check with GetRunnerScaleSet and re-use the
existing scale set if found.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ispasov

ispasov commented May 27, 2026

Collaborator

@baburciu Thank you for your contribution.
I see that this is still in draft.
In the meantime, we have already implemented this ourselves. You can check it out in #65.

@baburciu

Author

@ispasov Thank you, I forgot to mark it as ready for review after testing. I'll close mine.

@baburciu baburciu closed this May 27, 2026