feat: add max_training_steps config for CI smoke tests by dinhxuanvu · Pull Request #1690 · NovaSky-AI/SkyRL

dinhxuanvu · 2026-05-19T17:11:13Z

Summary

Adds a max_training_steps config field that caps training after N steps regardless of epochs or dataset size. This enables fast CI smoke tests that validate full training pipelines end-to-end without running to completion.

Supported in all trainer paths: sync RL, async RL, and SFT
Validated as a positive integer when set (ValueError on <= 0)
SFT: caps num_steps after resolution from num_epochs and propagates to the LR scheduler
RL: caps total_training_steps at initialization and adds an early exit after step completion
Async RL: cancels in-flight generation tasks on early exit to avoid resource leaks

Motivation

Running full training configs in nightly CI is prohibitively expensive. max_training_steps=2 exercises the core pipeline — data loading, model init, forward/backward, and logging — in minutes rather than hours/days, validating that the full stack initializes and trains without errors. Unlike dummy_run_max_steps (SFT-only, uses fabricated data), this uses real data and works across RL and SFT configs.

Test plan

Unit tests covering config construction, validation, capping logic, early exit, and post-loop cleanup
38 tests (33 pass, 5 skip on macOS due to transformers unavailability)
Pre-commit checks pass (ruff, black)
GPU integration test with a real config + max_training_steps=2 (requires cluster)

Add an optional max_training_steps parameter to TrainerConfig (RL) and SFTConfig (SFT) that caps training early regardless of epochs or dataset size. This enables CI smoke tests to exercise the full pipeline (data loading, model init, forward/backward pass) while limiting wall-clock time to a few steps. Applied consistently across all trainer paths: - Sync RL trainer (RayPPOTrainer) - Fully async RL trainer (FullyAsyncRayPPOTrainer) - SFT trainer (SFTTrainer) Unlike dummy_run_max_steps (which fabricates synthetic data for benchmarking), max_training_steps runs the real data path end-to-end and simply exits early — making it suitable for config validation.

Both RL (validate_cfg) and SFT (validate_sft_cfg) now raise when max_training_steps is set to 0 or a negative value, preventing nonsensical behavior like negative num_steps or global_step.

Change assert to ValueError in validate_cfg for consistency with SFT side and safety against -O execution. Add test coverage for RL validate_cfg rejecting zero/negative max_training_steps.

gemini-code-assist

Code Review

This pull request introduces a max_training_steps configuration option across RL (sync/async) and SFT trainers, enabling early termination of training regardless of the number of epochs or dataset size. The changes include updates to configuration classes, validation logic, and trainer implementations, along with a new test suite. Review feedback identifies an off-by-one error in the early exit logic of both the synchronous and fully asynchronous trainers, where global_step is incremented before the limit check; this could lead to incorrect checkpointing and skipped steps upon resumption.

global_step is incremented before the max_training_steps check, so when the loop breaks it is max_training_steps + 1. Clamp it back to max_training_steps before exiting to ensure checkpoint metadata records the correct final step.

dinhxuanvu added 3 commits May 19, 2026 12:30

fix: reject zero/negative max_training_steps in validation

3bec795

Both RL (validate_cfg) and SFT (validate_sft_cfg) now raise when max_training_steps is set to 0 or a negative value, preventing nonsensical behavior like negative num_steps or global_step.

fix: use ValueError for RL max_training_steps validation + add tests

ebc84a5

Change assert to ValueError in validate_cfg for consistency with SFT side and safety against -O execution. Add test coverage for RL validate_cfg rejecting zero/negative max_training_steps.

gemini-code-assist Bot reviewed May 19, 2026

View reviewed changes

Comment thread skyrl/train/fully_async_trainer.py

Comment thread skyrl/train/trainer.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add max_training_steps config for CI smoke tests#1690

feat: add max_training_steps config for CI smoke tests#1690
dinhxuanvu wants to merge 4 commits into
NovaSky-AI:mainfrom
dinhxuanvu:feat/max-training-steps

dinhxuanvu commented May 19, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dinhxuanvu commented May 19, 2026

Summary

Motivation

Test plan

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant