Summary
SkyRL uses a custom MegatronBridge implementation for GLM 4.7 (SkyRL/skyrl/backends/skyrl_train/workers/megatron/model_bridges.py, lines 1 to 32 in 33f18ba):
    """Register megatron-bridge implementations for model architectures not yet
    supported upstream.

    Import this module at the top of ``megatron_worker.py`` so that bridges are
    registered before any ``AutoBridge.from_hf_pretrained`` call.

    All registrations are guarded by a top-level ``try/except ImportError`` so that
    the rest of the codebase still works in CPU-only (no megatron-bridge) environments.
    """

    try:
        from megatron.bridge.models.conversion.model_bridge import MegatronModelBridge
        from megatron.bridge.models.deepseek.deepseek_v3_bridge import DeepSeekV3Bridge
        from megatron.bridge.models.hf_pretrained.causal_lm import PreTrainedCausalLM
        from megatron.core.models.gpt.gpt_model import GPTModel

        @MegatronModelBridge.register_bridge(
            source="Glm4MoeLiteForCausalLM",
            target=GPTModel,
        )
        class GLM47FlashBridge(DeepSeekV3Bridge):
            """Bridge for GLM-4.7-Flash (Glm4MoeLiteForCausalLM).

            GLM-4.7-Flash is architecturally identical to DeepSeek-V3 (MLA + MoE)
            but its HF config differs in rope_scaling format:
            - DeepSeek: rope_scaling has factor/mscale/mscale_all_dim, top-level rope_theta
            - GLM-4.7-Flash: rope_scaling has rope_theta/rope_type, no mscale fields

            We reuse DeepSeekV3Bridge.provider_bridge() (which sets all critical
            TP/MoE/MLA provider attributes) by temporarily normalizing the HF config
            rope fields so the base CONFIG_MAPPING can handle them.
            """
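The excerpt stops before the code that does the "temporary normalization" described in the docstring. As a rough illustration only, and not the actual SkyRL code, the idea could be sketched as a context manager that rewrites the rope fields on a config object and restores them afterwards. The helper name `normalized_rope_fields`, the `default_theta` fallback, and the choice to present `rope_scaling` as `None` when no `factor` is present are all assumptions made for this sketch. A `SimpleNamespace` stands in for the real HF config object.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def normalized_rope_fields(hf_config, default_theta=10000.0):
    """Temporarily present GLM-style rope fields in a DeepSeek-style layout.

    Hypothetical helper (not part of SkyRL or megatron-bridge): promotes
    ``rope_theta`` from inside ``rope_scaling`` to a top-level attribute for
    the duration of the ``with`` block, then restores the original attributes.
    """
    missing = object()  # sentinel: the attribute did not exist at all
    saved_scaling = getattr(hf_config, "rope_scaling", missing)
    saved_theta = getattr(hf_config, "rope_theta", missing)

    # Work on a copy so the original rope_scaling dict is never mutated.
    scaling = dict(saved_scaling) if isinstance(saved_scaling, dict) else {}
    try:
        if "rope_theta" in scaling:
            hf_config.rope_theta = scaling.pop("rope_theta")
        elif saved_theta is missing:
            hf_config.rope_theta = default_theta  # assumed fallback value
        # Assumption: without a scaling factor there is nothing to scale,
        # so expose rope_scaling as None to the code inside the block.
        hf_config.rope_scaling = scaling if "factor" in scaling else None
        yield hf_config
    finally:
        # Put every attribute back exactly as it was found.
        for name, value in (("rope_scaling", saved_scaling), ("rope_theta", saved_theta)):
            if value is missing:
                if hasattr(hf_config, name):
                    delattr(hf_config, name)
            else:
                setattr(hf_config, name, value)


# Stand-in for a GLM-style HF config; the values are placeholders.
glm_like = SimpleNamespace(rope_scaling={"rope_theta": 1000000.0, "rope_type": "default"})
with normalized_rope_fields(glm_like) as cfg:
    theta_inside = cfg.rope_theta      # top-level while inside the block
    scaling_inside = cfg.rope_scaling  # None: no factor/mscale fields
```

In a bridge subclass, a wrapper like this would presumably surround the call to the parent `provider_bridge()`, so that the HF config is left unchanged for any later consumer.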
Given how quickly dependencies are being upgraded to support new models, it would be best to add GLM 4.7 to CI to guard against regressions. We currently have a Megatron Models CI:
https://github.com/NovaSky-AI/SkyRL/blob/main/ci/gpu_ci_run_skyrl_train_megatron_models.sh
which runs this test script (SkyRL/tests/backends/skyrl_train/gpu/gpu_ci/megatron/test_megatron_models.py, lines 1 to 5 in 33f18ba):
    """
    Run with:
    uv run --isolated --extra dev --extra megatron -- pytest -s tests/backends/skyrl_train/gpu/gpu_ci/megatron/test_megatron_models.py
    """
We should add a tiny GLM 4.7 model as well to the same CI workflow.
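One possible way to obtain a "tiny" model for CI is to shrink the size-related fields of the full model's config and keep everything else (architecture name, rope fields) intact, so the bridge code path is still exercised. The sketch below works on a plain dict. The helper `make_tiny_config`, the override values, and the assumption that the GLM 4.7 config uses these common Hugging Face field names are all hypothetical and should be checked against the real config.

```python
import copy

# Hypothetical size overrides. The keys follow common Hugging Face config
# naming and are assumptions, not verified against the GLM 4.7 config.
TINY_OVERRIDES = {
    "num_hidden_layers": 2,
    "hidden_size": 128,
    "intermediate_size": 256,
    "num_attention_heads": 4,
    "n_routed_experts": 4,
    "num_experts_per_tok": 2,
}


def make_tiny_config(full_config: dict, overrides: dict = TINY_OVERRIDES) -> dict:
    """Return a shrunken deep copy of ``full_config``.

    Only keys already present in ``full_config`` are overridden, so the
    sketch never adds fields that the architecture does not define.
    """
    tiny = copy.deepcopy(full_config)
    for key, value in overrides.items():
        if key in tiny:
            tiny[key] = value
    return tiny


# Placeholder config; the numbers are illustrative, not GLM 4.7's real sizes.
full = {
    "architectures": ["Glm4MoeLiteForCausalLM"],
    "num_hidden_layers": 48,
    "hidden_size": 2048,
    "rope_scaling": {"rope_theta": 1000000.0, "rope_type": "default"},
}
tiny = make_tiny_config(full)
```

Keeping `architectures` and `rope_scaling` untouched matters here, since those are the fields that select the custom bridge and trigger its rope normalization.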