@wlggraham wlggraham commented Jan 23, 2026

PR Details

Description

This PR enables cleanup in the sparkeks plugin. Because the signature of the Cleanup() function does not include a Runtime object, some of the functions had to be modified slightly; the main change is where the kubeconfig file is created.

The stale query modification prevents a race condition in which the janitor moves a job from 'canceling' to 'canceled' before the Go context is actually canceled. This can lead to the new 'failed' status overwriting the 'canceling' status. The query was tested and still takes only ~2-3ms.
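
To make the race above concrete, here is a small in-memory Go sketch of a guarded status transition. It is a generic technique for rejecting late status writes, not the stale-query change made in this PR; the `jobStore` type, the `transition` method, and the 'running' status are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// jobStore is a hypothetical in-memory stand-in for the job status table.
type jobStore struct {
	mu     sync.Mutex
	status map[string]string
}

// transition sets the job's status to next only if the current status is one
// of the allowed "from" states. It reports whether the update was applied.
func (s *jobStore) transition(id, next string, from ...string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur := s.status[id]
	for _, f := range from {
		if cur == f {
			s.status[id] = next
			return true
		}
	}
	return false
}

func main() {
	s := &jobStore{status: map[string]string{"job-1": "canceling"}}

	// The janitor finishes the cancellation first.
	fmt.Println(s.transition("job-1", "canceled", "canceling"))

	// A late failure report arrives after the context is canceled. It is
	// rejected because the job is no longer in the assumed 'running' state.
	fmt.Println(s.transition("job-1", "failed", "running"))

	fmt.Println(s.status["job-1"])
}
```

In a database-backed store, the same guard would typically be a conditional update that names the expected current status, so a late write matches no rows.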

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation and I have updated the documentation accordingly.
  • I have added tests to cover my changes.

Contributor

@hladush hladush left a comment

Add metrics and LGTM

@wlggraham
Contributor Author

Add metrics and LGTM

The janitor process has a bunch of metrics that will report any failures that happen within the cleanup function, so I think we're good.

@wlggraham wlggraham merged commit 7d4cdc0 into main Jan 23, 2026
6 checks passed
@wlggraham wlggraham deleted the spark_eks_job_cancel branch January 23, 2026 15:56
4 participants