ci: add markdown link checker for docs and README#841
ci: add markdown link checker for docs and README#841Junior00619 wants to merge 9 commits intoNVIDIA:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a GitHub Actions job that detects changed Markdown files in PRs and runs a new Bash script to validate relative Markdown links and MyST Changes
Sequence Diagram(s)sequenceDiagram
participant PR as "Pull Request"
participant GH as "GitHub Actions"
participant Runner as "Workflow Runner"
participant Script as "check-markdown-links.sh"
participant FS as "Repository Filesystem"
PR->>GH: opened / synchronized
GH->>Runner: trigger jobs (includes check-markdown-links)
Runner->>Runner: checkout (fetch-depth: 0)
Runner->>Runner: compute changed `*.md` (base..head) → outputs `skip`, `files`
alt markdown files changed
Runner->>Script: run with `files`
Script->>FS: read files, parse lines, ignore fenced code blocks
Script->>FS: resolve `{include}` and link targets
FS-->>Script: exists / missing
Script->>Runner: emit `::error` annotations for missing targets
Script->>Runner: exit 1 if broken links found
else no markdown changes
Runner-->>GH: job skipped
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
scripts/check-markdown-links.sh (1)
66-72: Consider handling absolute paths (starting with/).If a link target starts with
/(e.g.,[link](/docs/readme.md)), the resolution$dir/$pathproduces an invalid path likesome/dir//docs/readme.md. While uncommon in markdown, absolute repo-root-relative paths are sometimes used.💡 Optional: Handle root-relative paths
# Resolve relative to the file's directory. - local resolved="$dir/$path" + local resolved + if [[ "$path" == /* ]]; then + # Absolute path (repo-root-relative): strip leading slash. + resolved="${path#/}" + else + resolved="$dir/$path" + fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/check-markdown-links.sh` around lines 66 - 72, The resolution logic for link targets currently builds resolved="$dir/$path" which breaks when $path begins with a leading slash; update the check in the section that computes resolved to detect root-relative targets (e.g., if [[ "$path" == /* ]] ) and strip the leading slash or set resolved="${path#/}" so that the final check uses "$REPO_ROOT/$resolved" correctly; keep the existing broken-link echo and broken counter but ensure variables referenced (dir, path, resolved, REPO_ROOT, file, line_num, target) are used consistently..github/workflows/pr.yaml (1)
185-189: Word splitting may break for filenames with spaces.The
${{ steps.changed.outputs.files }}is interpolated with newlines converted to spaces. Combined with unquoted expansion (SC2086disabled), filenames containing spaces would be incorrectly split into multiple arguments.This is uncommon for markdown files but could occur. Consider using a newline-delimited approach with
xargsor a while-read loop.💡 Optional: Safer file passing with xargs
- name: Check markdown links if: steps.changed.outputs.skip != 'true' run: | - # shellcheck disable=SC2086 - bash scripts/check-markdown-links.sh ${{ steps.changed.outputs.files }} + echo "${{ steps.changed.outputs.files }}" | xargs -d '\n' bash scripts/check-markdown-links.sh🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/pr.yaml around lines 185 - 189, The Check markdown links step currently expands `${{ steps.changed.outputs.files }}` unquoted which can split filenames with spaces; update the workflow to pass the file list safely to scripts by using a null- or newline-delimited pipeline instead of unquoted expansion: e.g., emit the files as NUL-delimited (or pipe the raw output) and invoke check-markdown-links.sh via xargs -0 or a while read -r loop so filenames are preserved, and remove the `SC2086` disable; alternatively modify check-markdown-links.sh to accept stdin or a `--file-list` argument and feed it the safe stream from `steps.changed.outputs.files`.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/check-markdown-links.sh`:
- Around line 35-43: The fenced-code-toggle logic using in_code_block causes
MyST include lines (```{include} ...) to be treated as plain code fences and
skipped before the include check; modify the fenced-block branch (the regex
handling the triple backtick/tilde) to first test for the MyST include pattern
(match ^\`\`\`\{include\}[[:space:]]+(.+)$), capture inc_path, trim it, resolve
against dir and REPO_ROOT, emit the same error using file/line_num when missing,
increment broken, then set in_code_block=true and continue; this prevents the
later include check (the block that inspects inc_path, broken, REPO_ROOT, dir,
file, line_num) from being skipped—remove or keep the duplicate include-check
later as appropriate to avoid double-reporting.
---
Nitpick comments:
In @.github/workflows/pr.yaml:
- Around line 185-189: The Check markdown links step currently expands `${{
steps.changed.outputs.files }}` unquoted which can split filenames with spaces;
update the workflow to pass the file list safely to scripts by using a null- or
newline-delimited pipeline instead of unquoted expansion: e.g., emit the files
as NUL-delimited (or pipe the raw output) and invoke check-markdown-links.sh via
xargs -0 or a while read -r loop so filenames are preserved, and remove the
`SC2086` disable; alternatively modify check-markdown-links.sh to accept stdin
or a `--file-list` argument and feed it the safe stream from
`steps.changed.outputs.files`.
In `@scripts/check-markdown-links.sh`:
- Around line 66-72: The resolution logic for link targets currently builds
resolved="$dir/$path" which breaks when $path begins with a leading slash;
update the check in the section that computes resolved to detect root-relative
targets (e.g., if [[ "$path" == /* ]] ) and strip the leading slash or set
resolved="${path#/}" so that the final check uses "$REPO_ROOT/$resolved"
correctly; keep the existing broken-link echo and broken counter but ensure
variables referenced (dir, path, resolved, REPO_ROOT, file, line_num, target)
are used consistently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a49cc308-2dfc-4371-83a6-b0320cc30181
📒 Files selected for processing (2)
.github/workflows/pr.yamlscripts/check-markdown-links.sh
1bd706d to
d191119
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/pr.yaml:
- Around line 176-180: The Check markdown links step currently expands multiline
`${{ steps.changed.outputs.files }}` unquoted in the run line which causes each
line to be executed as a separate shell statement; change the step to pass that
output through an environment variable (e.g., FILES) and in the run block
convert that multiline string into a shell array (e.g., using mapfile/read -a or
IFS+read) and then invoke scripts/check-markdown-links.sh with the array
expansion so the script receives each filename as a separate argument; target
the "Check markdown links" step and the scripts/check-markdown-links.sh
invocation and replace the direct `${{ steps.changed.outputs.files }}` usage
with the env+array conversion approach.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7bedc49f-04ee-4e5e-adcd-22d4ca66b5c1
📒 Files selected for processing (2)
.github/workflows/pr.yamlscripts/check-markdown-links.sh
✅ Files skipped from review due to trivial changes (1)
- scripts/check-markdown-links.sh
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/pr.yaml:
- Around line 163-166: The git diff command is being masked by "|| true" when
setting the files variable (files=$(git diff --name-only --diff-filter=ACMRT
"$base" -- '*.md' || true)), which hides real git errors and causes false "No
markdown files changed." results; remove the "|| true" so the step fails on git
errors, or alternatively capture git's exit code and explicitly handle non-zero
failures (e.g., run git diff without "|| true", check its exit status and fail
the job or emit a clear error before setting skip), updating the logic that uses
the files variable accordingly.
- Line 163: The git diff invocation that builds the files variable currently
grabs every *.md (including in node_modules/ and dist/); update the command that
sets files to exclude non-doc paths by adding pathspec exclusions for those
directories (e.g., use Git pathspec exclude syntax like
:(exclude)node_modules/** and :(exclude)dist/**) so only repository docs are
returned, and ensure callers like scripts/check-markdown-links.sh rely on this
filtered list rather than re-accepting unfiltered args.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 31cd8ccc-e106-467c-a35b-5da68e312792
📒 Files selected for processing (1)
.github/workflows/pr.yaml
|
@coderabbitai resume |
✅ Actions performedReviews resumed. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/check-markdown-links.sh`:
- Around line 36-46: The include-path handling always prefixes the file
directory into resolved, which breaks root-relative includes like
"/docs/snippet.md"; modify the include branch that parses inc_path (the block
using the regex ^\`\`\`\{include\} and variables inc_path, resolved, dir,
REPO_ROOT) to detect a leading slash and treat it as repo-root relative: if
inc_path starts with "/", strip the leading slash (e.g.
inc_path="${inc_path#/}") or set resolved to the path relative to the repo root,
otherwise keep the current dir-relative behavior (resolved="$dir/$inc_path");
keep the existing existence check using "$REPO_ROOT/$resolved".
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3229a3b4-8887-4470-a0b1-c362ceba2cbe
📒 Files selected for processing (2)
.github/workflows/pr.yamlscripts/check-markdown-links.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/pr.yaml
|
@coderabbitai resume |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
scripts/check-markdown-links.sh (1)
36-51:⚠️ Potential issue | 🟠 MajorMissing
in_code_block=trueafter processing MyST include directive.MyST fenced directives have a closing
```fence. After processing the include's opening line, the script continues without settingin_code_block=true. When the closing fence is encountered (line 54), it will incorrectly togglein_code_blockfromfalsetotrue, causing subsequent non-code-block content to be skipped.🐛 Proposed fix: Enter code block state after processing include
if [[ ! -e "$REPO_ROOT/$resolved" ]]; then echo "::error file=${file},line=${line_num}::Broken include: ${inc_path} (resolved: ${resolved})" broken=$((broken + 1)) fi + in_code_block=true continue fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/check-markdown-links.sh` around lines 36 - 51, When handling a MyST include directive in the scripts/check-markdown-links.sh block that matches ^\`\`\`\{include\}, you must set the in_code_block variable to true after processing the include (inside the same if branch before the continue) so the script correctly stays inside the fenced code block; update the include-handling branch (the code that computes inc_path, resolved and checks existence) to assign in_code_block=true immediately after the existence check (and before the continue) so the closing fence toggles back to false as intended.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@scripts/check-markdown-links.sh`:
- Around line 36-51: When handling a MyST include directive in the
scripts/check-markdown-links.sh block that matches ^\`\`\`\{include\}, you must
set the in_code_block variable to true after processing the include (inside the
same if branch before the continue) so the script correctly stays inside the
fenced code block; update the include-handling branch (the code that computes
inc_path, resolved and checks existence) to assign in_code_block=true
immediately after the existence check (and before the continue) so the closing
fence toggles back to false as intended.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1cadc0b1-94e8-4929-8fc2-4ea425f6e4af
📒 Files selected for processing (1)
scripts/check-markdown-links.sh
✅ Actions performedReviews resumed. |
ac81c9c to
fa1b8fd
Compare
|
@coderabbitai resume |
✅ Actions performedReviews resumed. |
47aee05 to
ea81df4
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/check-markdown-links.sh`:
- Around line 34-36: The include-directive check incorrectly runs before the
in_code_block guard causing fenced examples like ```{include} to be treated as
real includes; update the conditional at the include check (the line matching
^\`\`\`\{include\}...) to only execute when not in a fenced block (i.e., ensure
in_code_block is false) — either move that if to after the existing
in_code_block check or add an explicit "&& !in_code_block" style condition so
the include regex is skipped while in_code_block is true.
- Line 29: The fence matching logic currently toggles the boolean in_code_block
on any ``` or ~~~ marker; instead, record the opening fence character and length
when entering a code block (e.g., set fence_char to '`' or '~' and fence_len to
the count of consecutive markers) and only exit the block when you encounter a
marker with the same fence_char and a length >= fence_len; update the code that
scans lines to: when not in_code_block detect a fence and set in_code_block=true
plus fence_char/fence_len, and when in_code_block ignore other markers until you
see a matching fence_char with sufficient length to close the block, then clear
fence_char/fence_len and set in_code_block=false (referencing the variables
in_code_block, fence_char, and fence_len in your changes).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: dcbea24f-81dd-48bd-b827-11ed9b8486f7
📒 Files selected for processing (2)
.github/workflows/pr.yamlscripts/check-markdown-links.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/pr.yaml
scripts/check-markdown-links.sh
Outdated
| # Check MyST include directives before the code-block toggle, since | ||
| # ```{include} starts with ``` and would otherwise be swallowed. | ||
| if [[ "$line" =~ ^\`\`\`\{include\}[[:space:]]+(.+)$ ]]; then |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n scripts/check-markdown-links.sh | head -100Repository: NVIDIA/NemoClaw
Length of output: 4014
🏁 Script executed:
find . -name "*.md" -o -name "*test*" -o -name "*example*" | grep -E "(test|example|spec)" | head -20Repository: NVIDIA/NemoClaw
Length of output: 833
🏁 Script executed:
rg "```{include}" --type md -A 3 -B 3Repository: NVIDIA/NemoClaw
Length of output: 127
🏁 Script executed:
rg '{include}' --type md -A 3 -B 3Repository: NVIDIA/NemoClaw
Length of output: 143
🏁 Script executed:
rg 'include' docs --type md -A 3 -B 3 | head -50Repository: NVIDIA/NemoClaw
Length of output: 3019
🏁 Script executed:
find docs -name "*.md" -type f | head -10Repository: NVIDIA/NemoClaw
Length of output: 363
🏁 Script executed:
cat -n docs/index.md | head -30Repository: NVIDIA/NemoClaw
Length of output: 1171
🏁 Script executed:
# Search for code blocks in markdown files to see if any contain include examples
rg -A 5 -B 5 '```' docs/index.md | head -50Repository: NVIDIA/NemoClaw
Length of output: 1375
🏁 Script executed:
# Look for documentation or test cases showing include directives inside code blocks
find . -name "*.md" -type f -exec grep -l 'include' {} \; | xargs grep -A 10 -B 10 '```' | grep -A 15 -B 15 'include' | head -100Repository: NVIDIA/NemoClaw
Length of output: 8141
🏁 Script executed:
# Check the check-markdown-links script logic more carefully - does line 50 setting in_code_block=true make sense?
# Let's trace through what happens when an include directive is encountered
cat -n scripts/check-markdown-links.sh | sed -n '34,65p'Repository: NVIDIA/NemoClaw
Length of output: 1395
Skip {include} examples when already inside a fenced block.
The include check at line 36 runs before the in_code_block guard, so a docs example inside an outer fence (e.g., ```{include} somefile.md) is still evaluated as a real include directive and can fail the script even though it is just code.
🛠️ Suggested fix
- if [[ "$line" =~ ^\`\`\`\{include\}[[:space:]]+(.+)$ ]]; then
+ if [[ "$in_code_block" == false && "$line" =~ ^[[:space:]]*\`\`\`\{include\}[[:space:]]+(.+)$ ]]; then🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/check-markdown-links.sh` around lines 34 - 36, The include-directive
check incorrectly runs before the in_code_block guard causing fenced examples
like ```{include} to be treated as real includes; update the conditional at the
include check (the line matching ^\`\`\`\{include\}...) to only execute when not
in a fenced block (i.e., ensure in_code_block is false) — either move that if to
after the existing in_code_block check or add an explicit "&& !in_code_block"
style condition so the include regex is skipped while in_code_block is true.
Wire scripts/check-markdown-links.sh into the PR workflow. The new check-markdown-links job diffs changed .md files against the PR base and verifies every relative link resolves to an existing file. Fix bash regex parse error in check-markdown-links.sh — store the link-extraction pattern in a variable so [[ =~ ]] handles the character-class ')' correctly. Fixes NVIDIA#552
Move the ```{include} check above the fenced code block detection
so it runs before the generic ``` regex matches and continues past
the line. Without this, include paths were never validated.
Move the heredoc output into an env var so the shell handles word-splitting instead of GitHub Actions inline expansion, which can misinterpret newlines as separate statements.
All other scripts in scripts/ are 755. This was missed in the initial commit. Also applies shfmt formatting fixes.
- Track opening fence marker (char type + length) and only close the code block when encountering the same fence type with >= length. - Move MyST include-directive check inside the not-in-code-block branch so fenced examples are not treated as real includes.
b50399a to
12836a3
Compare
Summary
Wires the existing
scripts/check-markdown-links.shinto the PR workflow so broken relative links get caught before merge. The script was already in the tree but wasn't hooked up to CI.Also fixes a bash parse error in the script — the regex
\]\(([^)]+)\)inside[[ =~ ]]chokes because bash interprets the)in the character class as closing the conditional. Standard fix: stash the pattern in a variable.Related Issue
Fixes #552
Changes
scripts/check-markdown-links.sh— move link-extraction regex into a variable to avoid bash parse failure.github/workflows/pr.yaml— newcheck-markdown-linksjob that diffs.mdfiles against the PR base, skips if none changed, and runs the checker only on the diff (so pre-existing issues don't block unrelated PRs)Type of Change
Testing
bash scripts/check-markdown-links.sh README.md CONTRIBUTING.mdlocally — passes clean.github/PULL_REQUEST_TEMPLATE.md(out of scope for this PR since the CI job only checks changed files)Checklist
General
Code Changes
Summary by CodeRabbit