Skip to content

Fix Slurm noquote marker handling#556

Open
fallintoplace wants to merge 1 commit into
NVIDIA-NeMo:mainfrom
fallintoplace:fix/slurm-noquote-wrapper
Open

Fix Slurm noquote marker handling#556
fallintoplace wants to merge 1 commit into
NVIDIA-NeMo:mainfrom
fallintoplace:fix/slurm-noquote-wrapper

Conversation

@fallintoplace

Copy link
Copy Markdown

What changed

  • replace the Slurm noquote type alias with a real string marker class
  • keep grouped --output and --error log paths wrapped in that marker after .replace(...)
  • add a regression test showing regular srun_args are shell-quoted again while the ${SLURM_RESTART_COUNT:-0} log path stays unquoted for shell expansion

Why

noquote was defined as an alias for str, so isinstance(arg, noquote) matched essentially every string and bypassed shlex.quote(...) for almost all srun arguments. That defeated the quoting branch entirely.

Using a wrapper class restores the intended split:

  • ordinary string arguments are quoted
  • only explicitly marked log-path strings bypass quoting

Impact

Slurm script materialization now treats shell-sensitive string arguments as single arguments again, while preserving the special unquoted log-file paths that rely on ${SLURM_RESTART_COUNT:-0} expansion.

Checks

  • uv run -- pytest test/core/execution/test_slurm_templates.py
  • uv run --group lint -- ruff check nemo_run/core/execution/slurm.py test/core/execution/test_slurm_templates.py

Notes

I found the same noquote: TypeAlias = str pattern in nemo_run/run/ray/slurm.py, but kept this PR scoped to the reported core Slurm path so the fix is easier to review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant