Skip to content

feat: add max_training_steps config for CI smoke tests#1690

Open
dinhxuanvu wants to merge 4 commits into
NovaSky-AI:mainfrom
dinhxuanvu:feat/max-training-steps
Open

feat: add max_training_steps config for CI smoke tests#1690
dinhxuanvu wants to merge 4 commits into
NovaSky-AI:mainfrom
dinhxuanvu:feat/max-training-steps

Conversation

@dinhxuanvu
Copy link
Copy Markdown

Summary

Adds a max_training_steps config field that caps training after N steps regardless of epochs or dataset size. This enables fast CI smoke tests that validate full training pipelines end-to-end without running to completion.

  • Supported in all trainer paths: sync RL, async RL, and SFT
  • Validated as a positive integer when set (ValueError on <= 0)
  • SFT: caps num_steps after resolution from num_epochs and propagates to the LR scheduler
  • RL: caps total_training_steps at initialization and adds an early exit after step completion
  • Async RL: cancels in-flight generation tasks on early exit to avoid resource leaks

Motivation

Running full training configs in nightly CI is prohibitively expensive. max_training_steps=2 exercises the core pipeline — data loading, model init, forward/backward, and logging — in minutes rather than hours/days, validating that the full stack initializes and trains without errors. Unlike dummy_run_max_steps (SFT-only, uses fabricated data), this uses real data and works across RL and SFT configs.

Test plan

  • Unit tests covering config construction, validation, capping logic, early exit, and post-loop cleanup
  • 38 tests (33 pass, 5 skip on macOS due to transformers unavailability)
  • Pre-commit checks pass (ruff, black)
  • GPU integration test with a real config + max_training_steps=2 (requires cluster)

Add an optional max_training_steps parameter to TrainerConfig (RL) and
SFTConfig (SFT) that caps training early regardless of epochs or dataset
size. This enables CI smoke tests to exercise the full pipeline (data
loading, model init, forward/backward pass) while limiting wall-clock
time to a few steps.

Applied consistently across all trainer paths:
- Sync RL trainer (RayPPOTrainer)
- Fully async RL trainer (FullyAsyncRayPPOTrainer)
- SFT trainer (SFTTrainer)

Unlike dummy_run_max_steps (which fabricates synthetic data for
benchmarking), max_training_steps runs the real data path end-to-end
and simply exits early — making it suitable for config validation.
Both RL (validate_cfg) and SFT (validate_sft_cfg) now raise when
max_training_steps is set to 0 or a negative value, preventing
nonsensical behavior like negative num_steps or global_step.
Change assert to ValueError in validate_cfg for consistency with SFT
side and safety against -O execution. Add test coverage for RL
validate_cfg rejecting zero/negative max_training_steps.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a max_training_steps configuration option across RL (sync/async) and SFT trainers, enabling early termination of training regardless of the number of epochs or dataset size. The changes include updates to configuration classes, validation logic, and trainer implementations, along with a new test suite. Review feedback identifies an off-by-one error in the early exit logic of both the synchronous and fully asynchronous trainers, where global_step is incremented before the limit check; this could lead to incorrect checkpointing and skipped steps upon resumption.

Comment thread skyrl/train/fully_async_trainer.py
Comment thread skyrl/train/trainer.py
global_step is incremented before the max_training_steps check, so
when the loop breaks it is max_training_steps + 1. Clamp it back to
max_training_steps before exiting to ensure checkpoint metadata
records the correct final step.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant