Normalize scheduler/warmup step-like arguments by grad_accumulation_steps; warmup-only configuration, sane argument defaults and tests #582
Summary:
- Scheduler/warmup step-like arguments are now normalized by `grad_accumulation_steps`: a schedule configured for 100, with gradient accumulation of 32 steps, will now last for 100 steps instead of 100*32 steps. This should be more intuitive for users.
- The schedule defaults to `constant` when `warmup > 0`, making warmup-only configs simpler (see the config sketch after this list).
- A missing schedule name raises `KeyError("name")` to surface misconfigurations early.
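For illustration, a warmup-only configuration could look like the sketch below. The key names are assumptions made for the example and may not match the project's exact config schema.

```python
# Hypothetical warmup-only lr_schedule config (key names are illustrative,
# not necessarily the project's exact schema). With this change, omitting
# the schedule name while warmup > 0 behaves as a constant schedule after
# the warmup phase.
lr_schedule = {
    "warmup": 100,        # ramp the learning rate up over 100 steps
    "warmup_init": 1e-7,  # assumed starting LR for the ramp
    # "name" omitted: treated as "constant" because warmup > 0
}
```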
Grad accumulation normalization:

The step-like argument normalized for each schedule is:

- `warmup`: `steps`
- `linear_schedule`: `steps`
- `cosine_decay`: `decay_steps`
- `step_decay`: `step_size`

Each is converted as `effective_steps = ceil(config_steps / max(1, grad_accumulation_steps))`, with a minimum of 1; a minimal sketch of this conversion follows.
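The helper below is only an illustration of the formula above; the function name `normalize_step_arg` and its signature are assumptions, not the PR's actual code.

```python
import math

def normalize_step_arg(config_steps: int, grad_accumulation_steps: int) -> int:
    """Illustrative sketch of the normalization described above.

    Divides a step-like config value by the gradient-accumulation factor,
    rounding up and never returning less than 1:
    effective_steps = ceil(config_steps / max(1, grad_accumulation_steps)).
    """
    effective = math.ceil(config_steps / max(1, grad_accumulation_steps))
    return max(1, effective)

# Example: 100 config steps with 32-step accumulation -> ceil(100 / 32) = 4
assert normalize_step_arg(100, 32) == 4
# Accumulation disabled (0 or 1) leaves the configured value unchanged
assert normalize_step_arg(100, 0) == 100
```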
Tests:

Behavioral notes:
- Invalid configurations still raise a `ValueError`, as before.

Minor improvements:
- Added `muon` to the `--optimizer` help text.
- Changed the `--learning-rate` help text from "Adam learning rate." to "Optimizer learning rate." since it applies to more than just Adam.
- Improved `lr_schedule` configuration discoverability and documentation.

From:
To:
Added reference to config and docs where the scheduler is built: