fix(recipes): bump Hugging Face Hub helper-job pins to 1.16.4 #10986

Open
MatejKosec wants to merge 2 commits into ai-dynamo:main from MatejKosec:fix-hf-hub-pin-10857-clean

@MatejKosec MatejKosec commented Jun 26, 2026

Summary

Updates the Hugging Face Hub CLI pin used by Dynamo model-cache and vLLM LoRA helper jobs from 1.11.0 to 1.16.4.

Issue #10857 reports that the Qwen3-32B model-cache job fails after installing huggingface_hub==1.11.0 because the hf CLI import path is missing click. The issue reporter confirmed that huggingface_hub==1.16.4 installs the needed CLI dependencies and allows the same hf download command to run.

Scope

This change is limited to helper YAML files that install Hugging Face Hub immediately before an hf download command. It preserves the existing package spelling in each file:

  • huggingface_hub==1.11.0 → huggingface_hub==1.16.4
  • huggingface-hub==1.11.0 → huggingface-hub==1.16.4

No model IDs, revisions, cache paths, PVCs, image names, resource requests, awscli pins, HF token wiring, Dockerfiles, runtime code, operator code, Rust code, or Python code are changed.
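
For illustration, the spelling-preserving replacement described above can be sketched as one regular-expression substitution. This is a minimal sketch, not the tooling that produced this PR: the function name `bump_pin` and the sample install lines are hypothetical and are not copied from the changed files.

```python
import re

# Matches either spelling of the package name followed by the old pin.
# The negative lookahead avoids touching longer versions such as 1.11.0.1.
OLD_PIN = re.compile(r"(huggingface[_-]hub==)1\.11\.0(?![\w.])")


def bump_pin(text: str, new_version: str = "1.16.4") -> str:
    """Replace the 1.11.0 pin while keeping the original package spelling."""
    return OLD_PIN.sub(lambda m: m.group(1) + new_version, text)


if __name__ == "__main__":
    # Hypothetical install lines, used only to exercise both spellings.
    print(bump_pin("pip install huggingface_hub==1.11.0"))
    print(bump_pin("pip install huggingface-hub==1.11.0 awscli"))
```

Capturing the package name in a group is what keeps `huggingface_hub` and `huggingface-hub` as each file already spells them.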

Files changed

The diff contains 18 one-line YAML replacements under:

  • recipes/*/model-cache/model-download*.yaml
  • examples/backends/vllm/deploy/lora/**/sync-lora-job.yaml

Validation

The factory run validated the generated branch before publication:

  • pre-commit run --files <18 changed YAML files> --hook-stage manual passed.
  • Static inspection confirmed no huggingface[_-]hub==1.11.0 pins remain in the planned recipe/example scope.
  • Static inspection confirmed every changed file retains an hf download command and that the diff consists only of exact 1.11.0 → 1.16.4 pin replacements.
  • A clean temporary Python environment installed huggingface_hub==1.16.4 and imported both click and huggingface_hub.cli.hf.main successfully.
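
The two static-inspection checks above could be reproduced with something like the following. It is a sketch under assumptions, not the script the factory run used: `inspect_texts` is a hypothetical helper, and the recipe glob is written with `**` (an assumption) so that nested recipes such as recipes/deepseek-v4/deepseek-v4-pro are also scanned.

```python
import re
from pathlib import Path
from typing import Mapping

# Either package spelling, pinned to the old version.
OLD_PIN = re.compile(r"huggingface[_-]hub==1\.11\.0(?![\w.])")

# Assumed globs, relative to the repository root.
PATTERNS = (
    "recipes/**/model-cache/model-download*.yaml",
    "examples/backends/vllm/deploy/lora/**/sync-lora-job.yaml",
)


def inspect_texts(files: Mapping[str, str]) -> tuple[list[str], list[str]]:
    """Return (files still pinning 1.11.0, files without an hf download command)."""
    stale = sorted(name for name, text in files.items() if OLD_PIN.search(text))
    no_download = sorted(
        name for name, text in files.items() if "hf download" not in text
    )
    return stale, no_download


if __name__ == "__main__":
    root = Path(".")
    paths = sorted({p for pattern in PATTERNS for p in root.glob(pattern)})
    contents = {str(p): p.read_text(encoding="utf-8") for p in paths}
    stale, no_download = inspect_texts(contents)
    print("stale pins:", stale)
    print("missing hf download:", no_download)
```

Run from the repository root, both printed lists would be expected to be empty on this branch if the claims above hold.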

A full Kubernetes model-cache job was not run because it would require cluster/PVC/HF-token setup and large model downloads. This PR addresses the CLI dependency pin that caused the import failure.
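
The import check listed under Validation can be approximated locally without a cluster. The sketch below assumes a fresh virtual environment where huggingface_hub==1.16.4 has already been installed; `resolves` is a hypothetical helper, and the two dotted names are taken from the validation list above. Because the description does not say whether `main` is a module or an attribute, the helper accepts either.

```python
import importlib


def resolves(dotted_name: str) -> bool:
    """Return True if dotted_name resolves to an importable module or attribute."""
    parts = dotted_name.split(".")
    # Try the longest module prefix first, then treat the rest as attributes.
    for split in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:split]))
        except ImportError:
            continue
        try:
            for attr in parts[split:]:
                obj = getattr(obj, attr)
        except AttributeError:
            return False
        return True
    return False


if __name__ == "__main__":
    for name in ("click", "huggingface_hub.cli.hf.main"):
        print(name, "OK" if resolves(name) else "MISSING")
```

This only confirms that the names import; it does not exercise an actual hf download.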

Summary by CodeRabbit

  • Chores
    • Updated the Hugging Face client dependency used by several model download and sync jobs.
    • These jobs now use a newer, consistent version across supported recipes and backends.
    • No changes were made to download behavior, job structure, or other runtime settings.

Co-Authored-By: Claude <noreply@anthropic.com>
(cherry picked from commit 5b090f6)
@MatejKosec MatejKosec requested review from a team as code owners June 26, 2026 09:43
@MatejKosec MatejKosec temporarily deployed to external_collaborator June 26, 2026 09:43 — with GitHub Actions Inactive
@github-actions

👋 Hi MatejKosec! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions Bot added fix external-contribution Pull request is from an external contributor backend::vllm Relates to the vllm backend labels Jun 26, 2026

coderabbitai Bot commented Jun 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a69cb26f-491f-45bf-b5b7-f922e6d8db15

📥 Commits

Reviewing files that changed from the base of the PR and between 15bdb11 and e537182.

📒 Files selected for processing (18)
  • examples/backends/vllm/deploy/lora/multimodal/sync-lora-job.yaml
  • examples/backends/vllm/deploy/lora/sync-lora-job.yaml
  • recipes/deepseek-r1/model-cache/model-download-sglang.yaml
  • recipes/deepseek-r1/model-cache/model-download.yaml
  • recipes/deepseek-v32-fp4/model-cache/model-download.yaml
  • recipes/deepseek-v4/deepseek-v4-flash/model-cache/model-download.yaml
  • recipes/deepseek-v4/deepseek-v4-pro/model-cache/model-download.yaml
  • recipes/glm-5-nvfp4/model-cache/model-download.yaml
  • recipes/gpt-oss-120b/model-cache/model-download.yaml
  • recipes/kimi-k2.5/model-cache/model-download.yaml
  • recipes/llama-3-70b/model-cache/model-download.yaml
  • recipes/nemotron-3-nano-omni/model-cache/model-download.yaml
  • recipes/nemotron-3-super-fp8/model-cache/model-download.yaml
  • recipes/qwen3-235b-a22b-fp8/model-cache/model-download.yaml
  • recipes/qwen3-32b-fp8/model-cache/model-download.yaml
  • recipes/qwen3-32b/model-cache/model-download.yaml
  • recipes/qwen3-vl-30b/model-cache/model-download.yaml
  • recipes/qwen3.6-35b/model-cache/model-download.yaml

Walkthrough

The PR updates Kubernetes job startup scripts across LoRA sync and recipe model-download workflows to install huggingface-hub/huggingface_hub 1.16.4 instead of 1.11.0. The existing download commands and other job settings are unchanged.

Changes

Hugging Face client pin update

| Layer / File(s) | Summary |
| --- | --- |
| LoRA sync jobs: examples/backends/vllm/deploy/lora/multimodal/sync-lora-job.yaml, examples/backends/vllm/deploy/lora/sync-lora-job.yaml | The LoRA sync job scripts replace the pinned huggingface-hub install version with 1.16.4; awscli and the sync flow stay the same. |
| Model download jobs: recipes/deepseek-r1/model-cache/model-download*.yaml, recipes/deepseek-v32-fp4/model-cache/model-download.yaml, recipes/deepseek-v4/deepseek-v4-*/model-cache/model-download.yaml, recipes/glm-5-nvfp4/model-cache/model-download.yaml, recipes/gpt-oss-120b/model-cache/model-download.yaml, recipes/kimi-k2.5/model-cache/model-download.yaml, recipes/llama-3-70b/model-cache/model-download.yaml, recipes/nemotron-3-nano-omni/model-cache/model-download.yaml, recipes/nemotron-3-super-fp8/model-cache/model-download.yaml, recipes/qwen3-235b-a22b-fp8/model-cache/model-download.yaml, recipes/qwen3-32b-fp8/model-cache/model-download.yaml, recipes/qwen3-32b/model-cache/model-download.yaml, recipes/qwen3-vl-30b/model-cache/model-download.yaml, recipes/qwen3.6-35b/model-cache/model-download.yaml | The recipe job scripts replace the pinned huggingface_hub install version with 1.16.4 before the existing hf download commands. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The description is informative, but it does not follow the required template and is missing the required Related Issues and reviewer-start sections. | Reformat the PR description to match the template and add the missing sections, especially Where should reviewer start? and Related Issues (for example, Closes #10857). |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly states the main change: bumping the Hugging Face Hub helper-job pins to 1.16.4. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Comment @coderabbitai help to get the list of available commands.

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 0 potential issues.

@MatejKosec MatejKosec requested a review from BenHamm June 26, 2026 09:59

datadog-official Bot commented Jun 26, 2026


Pipelines

⚠️ Warnings

🚦 4 Pipeline jobs failed

  • PR | deploy-operator
  • PR | deploy-status-check
  • PR | dynamo-runtime / rust-gpu

Commit SHA: ef5e9b7

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 0 new potential issues.

Labels

backend::vllm Relates to the vllm backend external-contribution Pull request is from an external contributor fix size/M
