CADC-15011: Kueue testing updates for lifecycle management and control-plane observability#334
Merged
… cleanup of the testing methodology and cli
Contributor
✅ All pre-commit checks passed. Thanks for keeping the repo tidy! ✨
…moving code duplication
- Move and to top-level CLI for simpler access (was )
- Consolidate as the primary end-to-end workflow command, replacing separate and commands
- Extract shared constants to and centralized Kubernetes config to to eliminate magic numbers and reduce duplication
- Remove experimental module (unused code)
- Refactor internal suite orchestration to accept resolved performance/eviction options instead of raw profile names, enabling flexible parameter overrides
- Update documentation to reflect simplified CLI surface and new command structure
- Remove standalone CLI commands (observation functionality now accessed through and )
- Update all tests to use new command structure and internal APIs
Refactor codebase for improved maintainability and CLI UX
SharonGoliath
approved these changes
Mar 11, 2026
SharonGoliath
left a comment
- It's too big to read individually, so I downloaded and installed it, and did a pylint (default configuration) and pytest --cov on it.
pylint output that I think is important enough to raise:
- Bug: access before definition (src/kueuer/utils/k8s_config.py)
  Pylint: E0203: Access to member '_initialized' before its definition line 42 at line 39.
- Wrong number of arguments (src/kueuer/benchmarks/track.py)
  Pylint: E1121: Too many positional arguments for method call at lines 293, 326, 340 (e.g. status(item, "Complete") / status(item, "Failed")).
- Cyclic import (pylint R0401)
  Chain: kueuer.benchmarks.benchmark → kueuer.lifecycle.commands → kueuer.lifecycle.suite (and back).
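For context on the E0203 finding above, here is a minimal sketch of the pattern pylint usually flags and one common way to resolve it. The class name, the `_load` helper, and the `load_count` attribute are hypothetical; this is not the actual code in k8s_config.py, only an illustration that assumes a singleton-style config holder guarding its `__init__` with an `_initialized` flag.

```python
from typing import Optional


class KubeConfigSingleton:
    """Hypothetical stand-in for a shared Kubernetes config holder."""

    _instance: Optional["KubeConfigSingleton"] = None
    # Declaring the flag at class level means the read in __init__ always
    # resolves. Without this line, pylint reports E0203 because
    # self._initialized is read before the assignment further down.
    _initialized: bool = False

    def __new__(cls) -> "KubeConfigSingleton":
        # Reuse the single shared instance if it already exists.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self) -> None:
        # __init__ runs on every KubeConfigSingleton() call, so guard it.
        if self._initialized:
            return
        self.load_count = 0
        self._load()
        self._initialized = True

    def _load(self) -> None:
        # Placeholder for the real config loading (assumption).
        self.load_count += 1


first = KubeConfigSingleton()
second = KubeConfigSingleton()
print(first is second, first.load_count)
```

If the flag is only ever assigned inside `__init__`, the first read can raise AttributeError at runtime, depending on how the instance is created, so the warning may be worth treating as a real bug, not just a style issue.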
- There's no login host, so I don't know how this will change execution instructions. Shaun has set up capsule for multi-tenant (CADC and RCS) user management. Use an OIDC claim for kubectl.
- Comments on the documentation:
- benchmark-walkthrough.md - line 68 mentions "manifest state". A "manifest" shows up in a lot of places in the code, but this is the only mention in the docs. Either remove the single mention, or explain it if it's important enough.
- metrics-semantics.md - lines 90 - 94 read more like they belong in a change log
- the docs do not mention the kueue occupation restrictions. For my own knowledge, do these restrictions affect this testing?
- What would happen if testing was done with milli-cores, for example?
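As background for the milli-core question, the sketch below shows only the unit arithmetic: Kubernetes expresses CPU either as a decimal number of cores ("1.5") or in millicores ("250m"), where 1000m equals one core. The function names and the quota and request values are hypothetical, and the code handles just these two common forms. It does not model how Kueue or this test suite actually behaves with fractional CPU requests.

```python
from decimal import Decimal


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "250m" or "1.5" to integer millicores."""
    q = quantity.strip()
    if q.endswith("m"):
        # Already in millicores, e.g. "250m" -> 250.
        return int(q[:-1])
    # Plain core count, e.g. "1.5" -> 1500. Decimal avoids float rounding.
    return int(Decimal(q) * 1000)


def jobs_that_fit(quota: str, per_job_request: str) -> int:
    """How many identical jobs fit in a CPU quota (pure arithmetic)."""
    return cpu_to_millicores(quota) // cpu_to_millicores(per_job_request)


# Hypothetical values: a 4-core quota and 250m requested per job.
print(cpu_to_millicores("250m"))    # 250
print(cpu_to_millicores("1.5"))     # 1500
print(jobs_that_fit("4", "250m"))   # 16
```

Normalising to integer millicores before comparing keeps whole-core and milli-core runs on the same scale.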
Overview
This PR introduces a comprehensive overhaul of the kueuer testing framework, turning it into a production-ready Kueue validation system with full lifecycle management, control-plane observability, and automated analysis.
Key Improvements
Lifecycle
- lifecycle module: Complete workflow automation from preflight checks to teardown
- control and backlog scenarios for testing under different queue pressures
- artifacts/<run_id>/ structure
Observability
- observe module to gather control-plane metrics during benchmark runs
Metrics
Streamlined CLI
- kr benchmark e2e workflow
- local-safe and cluster-scale for different testing scenarios
Documentation
What This Enables
The tool now answers two critical production questions:
Example Usage