Resolve issues with orphaned VMs when process restarts with in-flight jobs #64

Closed
Rich7690 wants to merge 3 commits into macstadium:main from Rich7690:orphaned-vms

@Rich7690

Currently, if the process restarts for any reason, the trackedVMs list is reset to empty on startup. If many jobs are in flight at that point, their VMs are left orphaned on the nodes and have to be cleaned up manually. This change adds an orka3 vm list call on startup that indexes the existing VMs back into the list, so they go through similar orphan checks.
I wasn't sure whether prefix checks on the VM names are the best way to tell which VMs belong to a given runner scale set, but the Orka API can't necessarily filter any other way.
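
For readers following along, the startup re-indexing described above could look roughly like the sketch below. It is not the code from this PR: the `VM` type, the `recoverTrackedVMs` function, and the example prefix are all hypothetical, and the input slice stands in for whatever the VM listing returns.

```go
package main

import (
	"fmt"
	"strings"
)

// VM is a hypothetical, minimal view of one entry from a VM listing.
type VM struct {
	Name string
}

// recoverTrackedVMs rebuilds the set of tracked VM names from a listing,
// keeping only VMs whose name starts with the runner scale set prefix.
// An empty prefix matches nothing, so unrelated VMs are never adopted.
func recoverTrackedVMs(listed []VM, prefix string) map[string]struct{} {
	tracked := make(map[string]struct{})
	if prefix == "" {
		return tracked
	}
	for _, vm := range listed {
		if strings.HasPrefix(vm.Name, prefix) {
			tracked[vm.Name] = struct{}{}
		}
	}
	return tracked
}

func main() {
	// Stand-in for the result of listing VMs on startup.
	listed := []VM{
		{Name: "runner-set-a-1"},
		{Name: "runner-set-a-2"},
		{Name: "unrelated-vm"},
	}
	tracked := recoverTrackedVMs(listed, "runner-set-a-")
	fmt.Println("recovered VMs:", len(tracked))
}
```

A prefix match is only as reliable as the naming scheme: if two scale sets share a prefix, one process could adopt the other's VMs, which is the concern raised in the description.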

@Rich7690 Rich7690 requested a review from a team as a code owner May 26, 2026 23:27
@ispasov

ispasov commented May 27, 2026

Collaborator

Since different users have different workflows, let's put this behind an option.
It can be enabled by default, but we should allow people to opt out of it.

@Rich7690

Author

> As different users have different workflows, lets add this behind an option. It can be enabled by default, but we should allow people to opt-out from it.

Thanks for the review. Added an environment variable to configure the behavior.
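
As an illustration of an opt-out toggle that is enabled by default, a minimal sketch follows. The variable name `RECOVER_ORPHANED_VMS` and the `recoveryEnabled` function are assumptions made for this example; the thread does not state the actual variable name or how it is parsed.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// recoverEnvVar is a hypothetical name; the real variable is not named in this thread.
const recoverEnvVar = "RECOVER_ORPHANED_VMS"

// recoveryEnabled reports whether startup recovery should run.
// It defaults to true when the variable is unset, empty, or not a valid boolean,
// so users only need to set it when they want to opt out.
func recoveryEnabled(lookup func(string) (string, bool)) bool {
	raw, ok := lookup(recoverEnvVar)
	if !ok || raw == "" {
		return true
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return true
	}
	return enabled
}

func main() {
	// os.LookupEnv distinguishes an unset variable from an empty one.
	fmt.Println("recovery enabled:", recoveryEnabled(os.LookupEnv))
}
```

Passing the lookup function as a parameter keeps the default-on behavior easy to test without touching the process environment.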

@ispasov

ispasov commented May 28, 2026

Collaborator

@Rich7690, can you rebase your branch against main so that the latest change is applied?
That would allow us to test this in a scenario where the scale set is preserved.

@Rich7690

Author

Closing in favor of #66

@Rich7690 Rich7690 closed this May 29, 2026