
v0.2 follow-ups: sitemap + JSON-LD + security.txt + privacy + badge + npm-audit + sync concurrency #12

Merged: amacsmith merged 3 commits into main from claude/v0.2-followups on May 10, 2026

Conversation

@amacsmith
Member

@amacsmith amacsmith commented May 8, 2026

Summary

Bundles the deferred high-leverage items from the post-#8 research pass into one focused PR. Tests: 148/148 green (96 registry + 27 MCP + 25 CLI); npm audit --audit-level=high clean.

Adoption / discoverability

  • scripts/render-sitemap.mjs + npm run sitemap — generates site/sitemap.xml from registry.json. One URL per non-revoked entry plus the three canonical pages. Idempotent (byte-identical on no-change input). Wired into pages.yml. 5 new tests.
  • JSON-LD Dataset block + Open Graph / Twitter Card meta tags + <link rel="canonical"> / <link rel="sitemap"> on site/index.html. Search engines + social previews now have something concrete to index.
  • site/robots.txt declaring full indexability + pointing at the sitemap.
  • docs/badge.md + site/badge.svg — embeddable "Indexed by understand-quickly" badge for producer READMEs (shields.io, self-hosted SVG, status-aware variants).
  • docs/alternatives.md — frank side-by-side vs awesome-lists, DevDocs, DeepWiki, OpenDeepWiki, Sourcegraph, Repomix, gitingest. Tells the reader when not to use the registry.

Trust + governance

  • site/.well-known/security.txt (RFC 9116) — security contacts, GitHub Security Advisory link, 1-year expiration.
  • docs/privacy.md — plain-language privacy notice covering Pages logs, Cloudflare Web Analytics retention, what's not collected, GDPR/CCPA posture, opt-out paths.
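
For illustration, the expiration window on security.txt could be guarded by a small check like the one below. This is a sketch and not code from this PR: `checkSecurityTxtExpiry`, its parameters, the 366-day ceiling, and the example contact address are all hypothetical. Only the `Expires` field name and its timestamp format come from RFC 9116.

```javascript
// Hypothetical helper (not part of this PR): verify that a security.txt body
// has an Expires field that is in the future and at most `maxDays` away.
function checkSecurityTxtExpiry(text, now = new Date(), maxDays = 366) {
  // RFC 9116 puts one "Field: value" pair per line; find the Expires line.
  const match = /^Expires:\s*(\S+)\s*$/im.exec(text);
  if (!match) return { ok: false, reason: 'missing Expires field' };

  const expires = new Date(match[1]);
  if (Number.isNaN(expires.getTime())) {
    return { ok: false, reason: 'unparseable Expires value' };
  }

  const daysLeft = (expires.getTime() - now.getTime()) / 86400000;
  if (daysLeft <= 0) return { ok: false, reason: 'already expired' };
  if (daysLeft > maxDays) return { ok: false, reason: 'expiry too far out' };
  return { ok: true, daysLeft: Math.floor(daysLeft) };
}

// Example input; the contact address is a placeholder.
const sample = [
  'Contact: mailto:security@example.test',
  'Expires: 2027-05-08T00:00:00Z',
].join('\n');

console.log(checkSecurityTxtExpiry(sample, new Date('2026-05-08T00:00:00Z')));
```

A check like this could run in CI so a stale or overly distant expiry fails fast instead of relying on a reminder comment in the file.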

Supply chain

  • .github/workflows/validate.yml: runs npm audit --audit-level=high after npm ci. Fails on high/critical CVEs only; lower severities continue via Dependabot.

Sync efficiency

  • scripts/sync.mjs — bounded-parallel sync via fixed-size worker pool (SYNC_CONCURRENCY=6 default, env override). Node 20+ undici keep-alive reuses TCP connections to raw.githubusercontent.com for free. Wall-clock improvement scales with registry size; serial loop replaced.

README

  • Alternatives and Badge links added to the top nav row.

Test plan

  • npm test (registry) — 96/96 (5 new for render-sitemap)
  • npm test --prefix mcp — 27/27
  • npm test --prefix cli — 25/25
  • npm audit --audit-level=high — 0 high/critical
  • Pages deploy preview renders the new sitemap.xml + robots.txt + .well-known/security.txt at the right URLs (verify post-merge)
  • Sitemap shows up in Google Search Console after first crawl

Notes

  • The sync concurrency change is behavior-preserving for the tests (which use stubbed fetchImpl and don't exercise the loop's parallelism), but it's a real perf/scaling change in production. Watch the next nightly sync.
  • No new runtime deps. The sitemap renderer reuses shard.mjs:loadRegistry.

https://claude.ai/code/session_01PkkHiFCEmuNaymtGiH7Fkk


Generated by Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added XML sitemap for improved search engine discoverability
    • Added search and social media metadata (JSON-LD, Open Graph, Twitter Cards)
    • Added security vulnerability disclosure metadata
    • Added shareable "Indexed by understand-quickly" badge for documentation
  • Documentation

    • Added Alternatives page comparing similar tools
    • Added Badge integration guide
    • Added Privacy policy
    • Updated README navigation
  • Chores

    • Enhanced security auditing in CI pipeline
    • Improved sync performance with concurrent processing

Review Change Stack

…onomics

Bundles the deferred high-leverage items from the post-merge research
into one focused PR.

Adoption / discoverability
- scripts/render-sitemap.mjs (+ npm run sitemap) generates
  site/sitemap.xml from registry.json, one URL per non-revoked entry
  plus the three canonical pages. Wired into pages.yml so each deploy
  regenerates it. 5 tests under scripts/__tests__/render-sitemap.test.mjs.
- site/index.html: JSON-LD Dataset block + Open Graph / Twitter Card
  meta + canonical/sitemap link rels. Search-engine + social-share
  signal in one place; consumers get a stable og:image / og:description.
- site/robots.txt: full indexability + sitemap pointer.
- docs/badge.md + site/badge.svg: embeddable "Indexed by
  understand-quickly" badge for producer READMEs (shields.io,
  self-hosted SVG, and status-aware variants).
- docs/alternatives.md: frank side-by-side vs awesome-lists, DevDocs,
  DeepWiki, OpenDeepWiki, Sourcegraph, Repomix, gitingest. Tells the
  reader when *not* to use the registry.

Trust + governance
- site/.well-known/security.txt (RFC 9116): contacts +
  GitHub Security Advisory link + 1y expiration. Standard discovery
  path for security researchers.
- docs/privacy.md: plain-language privacy notice — Pages logs,
  Cloudflare Web Analytics retention, what's not collected, GDPR/CCPA
  posture, opt-out paths for producers.

Supply chain
- .github/workflows/validate.yml: `npm audit --audit-level=high` runs
  after `npm ci` and fails on high/critical CVEs. Lower severities are
  still handled via Dependabot.

Sync efficiency
- scripts/sync.mjs: bounded-parallel sync via fixed-size worker pool
  (`SYNC_CONCURRENCY=6` default, env override). Replaces the serial
  per-entry loop. Node 20+ undici keep-alive reuses TCP connections to
  raw.githubusercontent.com without managing a custom Agent. Tests:
  91 -> 91 still green; expected wall-clock improvement on real runs
  scales with registry size.

README
- "Alternatives" and "Badge" links added to the top nav row.

Tests: 96 registry (91 + 5 new sitemap) + 27 MCP + 25 CLI = 148 green.
npm audit: 0 high/critical.

https://claude.ai/code/session_01PkkHiFCEmuNaymtGiH7Fkk
Copilot AI review requested due to automatic review settings May 8, 2026 09:30
@coderabbitai

coderabbitai Bot commented May 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 713dc9bd-b048-48bb-a89a-7a9d700a70f6

📥 Commits

Reviewing files that changed from the base of the PR and between 17e749c and 5d8593b.

📒 Files selected for processing (12)
  • .github/workflows/audit.yml
  • .github/workflows/pages.yml
  • .github/workflows/validate.yml
  • .gitignore
  • README.md
  • docs/privacy.md
  • package.json
  • scripts/__tests__/render-sitemap.test.mjs
  • scripts/render-sitemap.mjs
  • scripts/sync.mjs
  • site/.well-known/security.txt
  • site/index.html
✅ Files skipped from review due to trivial changes (6)
  • .gitignore
  • package.json
  • .github/workflows/validate.yml
  • site/.well-known/security.txt
  • README.md
  • site/index.html
🚧 Files skipped from review as they are similar to previous changes (4)
  • .github/workflows/pages.yml
  • scripts/sync.mjs
  • scripts/__tests__/render-sitemap.test.mjs
  • scripts/render-sitemap.mjs
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: playwright
  • GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
🪛 LanguageTool
docs/privacy.md

[uncategorized] ~41-~41: The official name of this software platform is spelled with a capital “H”.
Context: ...ur browser's "Clear site data" tool for looptech-ai.github.io. ## Third parties - GitHub ho...

(GITHUB)

🔇 Additional comments (2)
.github/workflows/audit.yml (1)

8-35: Looks good — clean scheduled audit workflow with sensible isolation.

The trigger setup, matrix fan-out, least-privilege permissions, and per-path npm ci + high-threshold audit are all aligned with the stated CI hardening goal.

docs/privacy.md (1)

64-75: GDPR/CCPA clarification is now accurate and well-scoped.

Nice fix: this now correctly distinguishes registry-entry data from server-log personal data and routes data-subject requests to GitHub/Cloudflare directly.


📝 Walkthrough

Walkthrough

This PR enhances the understand-quickly registry with discoverability, security, and performance improvements. It adds sitemap generation from registry data, SEO/social metadata (Open Graph, JSON-LD, robots.txt, security.txt), bounded-concurrency sync optimization, npm audit CI hardening, and comprehensive user documentation covering privacy practices, tool alternatives, and badge integration.

Changes

v0.2 Discoverability & Supply Chain Enhancements

| Layer | File(s) | Summary |
|---|---|---|
| Sitemap Generation | scripts/render-sitemap.mjs | renderSitemap() exports a function that builds <urlset> XML from registry entries, escapes XML special characters, filters revoked entries, and anchors lastmod dates to registry.generated_at with fallback to now(). |
| Sitemap Tests | scripts/__tests__/render-sitemap.test.mjs | Test suite validates static page inclusion, per-entry URL generation with encoded id parameters, priority/status correlation, XML escaping, deterministic output, and edge cases with empty/missing entries. |
| Build Integration | package.json, .github/workflows/pages.yml, .gitignore | npm run sitemap script added; pages.yml "Stage site" step invokes scripts/render-sitemap.mjs; _site/sitemap.xml added to .gitignore. |
| SEO & Social Metadata | site/robots.txt, site/.well-known/security.txt, site/index.html | robots.txt allows all crawlers and references sitemap; security.txt provides RFC 9116 contacts and metadata; HTML head gains canonical/sitemap links, Open Graph, Twitter Card, and JSON-LD Dataset describing the site and distributions. |
| Sync Concurrency Optimization | scripts/sync.mjs | SYNC_CONCURRENCY constant bounds per-entry worker pool to [1, 32] (default 6), replacing sequential sync with parallel workers pulling from shared cursor and writing results by index. |
| Security CI Hardening | .github/workflows/validate.yml, .github/workflows/audit.yml | validate.yml adds immediate npm audit --audit-level=high step; new audit.yml workflow runs scheduled weekly and on-demand across root, mcp, and cli directories with 5-minute timeout. |
| Documentation & Navigation | README.md, docs/privacy.md, docs/alternatives.md, docs/badge.md, CHANGELOG.md | README adds Alternatives link; privacy.md details data collection, third-party services, GDPR/CCPA handling, and data removal; alternatives.md provides goal-based tool matrix and positioning; badge.md documents badge variants and CI integration; CHANGELOG.md documents all v0.2 additions. |
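
To make the Sitemap Generation row concrete, here is a minimal sketch of a renderer with the same described properties: XML escaping, revoked entries filtered out, `lastmod` anchored to `registry.generated_at`, and output that does not depend on the wall clock when that field is present. It is not the code in scripts/render-sitemap.mjs. The function name `renderSitemapSketch`, the `status === 'revoked'` test, the single static page, and the option names are assumptions made for illustration.

```javascript
// Escape the five XML special characters so values are safe inside elements.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical renderer. `base` is the site root URL; `now` is used only when
// the registry carries no generated_at timestamp.
function renderSitemapSketch(registry, { base, now = new Date() }) {
  // Anchor dates to the registry, not the clock, so reruns are byte-identical.
  const anchor = (registry.generated_at ?? now.toISOString()).slice(0, 10);

  // One static page here; the real script is described as emitting three.
  const urls = [{ loc: base, lastmod: anchor }];

  for (const entry of registry.entries ?? []) {
    if (entry.status === 'revoked') continue; // assumed revocation marker
    urls.push({
      loc: `${base}?id=${encodeURIComponent(entry.id)}`,
      lastmod: (entry.last_synced ?? anchor).slice(0, 10),
    });
  }

  const body = urls
    .map((u) => `  <url><loc>${escapeXml(u.loc)}</loc><lastmod>${u.lastmod}</lastmod></url>`)
    .join('\n');

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    body,
    '</urlset>',
    '',
  ].join('\n');
}

// Usage with made-up data.
const demoRegistry = {
  generated_at: '2026-05-08T00:00:00Z',
  entries: [
    { id: 'foo/bar', status: 'ok', last_synced: '2026-05-01T12:00:00Z' },
    { id: 'gone/dead', status: 'revoked' },
  ],
};
console.log(renderSitemapSketch(demoRegistry, { base: 'https://example.test/' }));
```

Because every date comes from the registry, calling the function twice with different `now` values yields the same string, which is the property the determinism test is described as asserting.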

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hop along the SEO trail,
A sitemap now won't fail,
Security notes and metadata bright,
Privacy documented right,
Faster syncs with workers aligned,
Registry discovery redesigned! 🌐

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title comprehensively summarizes all major change categories in the PR: sitemap generation, JSON-LD metadata, security.txt, privacy documentation, badge support, npm audit enforcement, and sync concurrency optimization. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Copilot AI left a comment


Pull request overview

This PR bundles several v0.2 follow-up improvements focused on discoverability (sitemap/robots/meta), trust/governance docs (privacy + security.txt), supply-chain validation (npm audit in CI), and sync performance (bounded concurrency) for the registry + GitHub Pages site.

Changes:

  • Add sitemap generation (render-sitemap.mjs + npm run sitemap) and wire it into the Pages deploy workflow; add robots.txt.
  • Add SEO/social metadata to site/index.html (JSON-LD Dataset, Open Graph, Twitter cards, canonical/sitemap links) and a hosted badge SVG + docs.
  • Add npm audit --audit-level=high to CI and update sync to use a fixed-size worker pool.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 5 comments.

Show a summary per file
| File | Description |
|---|---|
| site/sitemap.xml | Adds a generated sitemap checked into the repo (Pages workflow also generates one at build time). |
| site/robots.txt | Declares full indexability and points crawlers to the sitemap URL. |
| site/index.html | Adds canonical/sitemap links, OG/Twitter meta tags, and JSON-LD Dataset metadata. |
| site/badge.svg | Adds a self-hosted “Indexed by understand-quickly” SVG badge. |
| site/.well-known/security.txt | Adds RFC 9116 security contact metadata for the Pages site. |
| scripts/sync.mjs | Replaces serial sync loop with bounded parallel worker pool; adds SYNC_CONCURRENCY env config. |
| scripts/render-sitemap.mjs | New script to generate site/sitemap.xml from registry.json. |
| scripts/__tests__/render-sitemap.test.mjs | Adds tests covering sitemap rendering behavior and XML escaping. |
| README.md | Adds “Alternatives” and “Badge” links to the top navigation row. |
| package.json | Adds npm run sitemap script. |
| docs/privacy.md | Adds a plain-language privacy notice for the site/registry. |
| docs/badge.md | Documents badge usage (shields.io, self-hosted SVG, status-aware variant). |
| docs/alternatives.md | Adds comparison doc to help readers choose when to use the registry vs alternatives. |
| CHANGELOG.md | Documents the new features in the Unreleased section. |
| .github/workflows/validate.yml | Adds npm audit --audit-level=high after npm ci. |
| .github/workflows/pages.yml | Runs sitemap generation during Pages build before publishing _site/. |


Comment thread scripts/sync.mjs Outdated
// connections alive per-host by default for fetch(), so consecutive fetches
// to the same host (raw.githubusercontent.com is the dominant case) reuse
// the TCP+TLS connection without us managing a custom Agent.
const SYNC_CONCURRENCY = Number(process.env.SYNC_CONCURRENCY) || 6;
Comment thread site/index.html Outdated
Comment on lines +31 to +33
"description": "Public, machine-readable registry of code-knowledge graphs and repo-context bundles for AI agents. Each entry is a pointer to a JSON graph in a producer's repository, validated against a versioned JSON Schema.",
"url": "https://looptech-ai.github.io/understand-quickly/",
"license": "https://www.apache.org/licenses/LICENSE-2.0",
Comment thread docs/privacy.md Outdated
Comment on lines +26 to +27
- No localStorage / sessionStorage user data — site UI state lives only
in URL fragments you control.
Comment thread .github/workflows/pages.yml Outdated
Comment on lines +48 to +50
# Generate sitemap.xml from registry.json (idempotent — same input
# produces byte-identical output, so a "no-change" run touches no
# file and downstream caches stay valid). robots.txt and the
Comment thread scripts/render-sitemap.mjs Outdated
Comment on lines +3 to +6
// One static page per registry entry plus the canonical site pages. Output
// is consumed by search engines and by the Pages workflow as a discovery
// surface for new entries. Idempotent: running with no registry changes
// produces a byte-identical sitemap.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (3)
.github/workflows/validate.yml (1)

28-35: LGTM — consider adding a scheduled audit run to catch post-merge CVEs

The PR-gate step is correct and the comment accurately describes the intent. One operational note: because the workflow only fires on the listed path changes, a new CVE published against an existing dependency after merge won't be caught until someone opens a PR that touches one of those paths.

Adding an on: schedule: - cron: '0 6 * * 1' trigger (or a separate lightweight audit.yml) would close that window.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/validate.yml around lines 28 - 35, Add a scheduled run so
audits also run post-merge: update the GitHub Actions workflow that contains the
step named "npm audit (high+critical only, dev deps allowed)" to include an on:
schedule trigger (for example cron '0 6 * * 1' for weekly Monday 06:00) or
create a separate lightweight audit workflow (e.g., audit.yml) that runs the
same "npm audit --audit-level=high" step on that cron; ensure the scheduled
workflow uses the same permissions and environment as the PR gate so it fails on
high/critical findings just like the existing step.
site/sitemap.xml (1)

1-39: ⚡ Quick win

Consider .gitignore-ing this generated artifact.

site/sitemap.xml is overwritten by render-sitemap.mjs on every Pages deploy, so the committed copy will diverge from reality whenever new entries are added or removed. Keeping it tracked risks misleading diffs and reviewer confusion.

If the build always regenerates it, the file can safely be added to .gitignore; the cp -r site/. _site/ step runs before the generator, so omitting the static copy has no effect on the deployed output.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@site/sitemap.xml` around lines 1 - 39, site/sitemap.xml is a generated
artifact overwritten by render-sitemap.mjs on every Pages deploy; stop
committing it by adding site/sitemap.xml to .gitignore so tracked copy can't
diverge. Update .gitignore to include the path "site/sitemap.xml" (or a rule
that ignores sitemap.xml in site/) and ensure the deploy step that runs "cp -r
site/. _site/" remains unchanged so the runtime generator (render-sitemap.mjs)
still produces the sitemap during build.
scripts/__tests__/render-sitemap.test.mjs (1)

16-31: ⚡ Quick win

No assertion that last_synced drives the per-entry <lastmod>.

Test 2 supplies last_synced on the entry objects but never asserts that value (e.g. 2026-05-01) appears in the emitted <lastmod> for that entry. If the implementation falls back to NOW() for every URL, this gap would go undetected.

✅ Suggested additional assertion
  assert.match(xml, /\?id=baz%2Fqux/);
  assert.doesNotMatch(xml, /gone%2Fdead/);
+ // last_synced is used as the per-entry lastmod
+ assert.match(xml, /<lastmod>2026-05-01<\/lastmod>/);
+ assert.match(xml, /<lastmod>2026-04-30<\/lastmod>/);
  // ok entries get higher priority than non-ok ones
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/__tests__/render-sitemap.test.mjs` around lines 16 - 31, The test for
renderSitemap is missing assertions that each entry's last_synced populates the
per-URL <lastmod>; update the test that calls renderSitemap (the one using NOW
and entries with id 'foo/bar' and 'baz/qux') to assert that the generated XML
contains the corresponding ISO date strings (e.g. "2026-05-01" for foo/bar and
"2026-04-30" for baz/qux) inside the <lastmod> elements so last_synced drives
per-entry lastmod rather than falling back to NOW().
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/privacy.md`:
- Around line 53-55: Update the "GDPR / CCPA" paragraph in docs/privacy.md to
clarify that while registry.json entries contain no personal data, server-level
logs (GitHub Pages access logs and Cloudflare Web Analytics) may contain IP
addresses which are personal data under GDPR; change the sentence referring to
"no PII in the registry" to explicitly state "registry entries contain no
personal data" and add a short directive that data subject requests for server
logs should be made via GitHub's and Cloudflare's respective data subject
request processes (linking to their portals already referenced in the doc).

In `@scripts/render-sitemap.mjs`:
- Around line 67-69: The code uses absReg.endsWith('registry.json') which
false-positively matches filenames like my-registry.json; change the check to
use basename(absReg) === 'registry.json' so that only a file literally named
registry.json triggers loadRegistry({ root: dirname(absReg) }), otherwise parse
the provided file (via readFileSync(absReg or regPath) as before). Update the
conditional that sets registry to use basename(absReg) === 'registry.json' and
keep loadRegistry, dirname(absReg), and the JSON fallback branch unchanged.

In `@site/.well-known/security.txt`:
- Around line 3-7: The Expires header currently sets a ~2-year horizon which
contradicts the maintenance note "Refresh this file at least annually"; update
the "Expires:" value to one year from today (i.e., set the Expires HTTP-date to
+1 year) and ensure this is updated whenever the top-line maintenance comment
("Refresh this file at least annually.") is observed; specifically edit the
"Expires" header in the security.txt file so it reflects a one-year expiry
window to enforce annual rotation.

In `@site/index.html`:
- Line 33: Update the JSON-LD `license` value in site/index.html so it
references the project's Data License 1.0 rather than Apache 2.0: replace the
current "license": "https://www.apache.org/licenses/LICENSE-2.0" with the
canonical URL for DATA-LICENSE.md (or include an additional license entry if you
need to represent separate code and data licenses); ensure the JSON-LD block
clearly points to the DATA-LICENSE.md raw/blob URL so schema.org crawlers will
pick up the correct dataset license.
- Line 21: The twitter card meta currently uses <meta name="twitter:card"
content="summary_large_image"> but no image meta is present; either add a
matching image meta like <meta name="twitter:image" content="..."> or add an
Open Graph image <meta property="og:image" content="..."> pointing to a suitable
preview image, or change the twitter:card content to "summary" to avoid
expecting a large image; locate the existing twitter:card meta in the HTML and
apply one of these fixes so the card renders as intended.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 80f5a80e-6bd2-43a2-831b-c3cea426ed39

📥 Commits

Reviewing files that changed from the base of the PR and between f9a7642 and 17e749c.

⛔ Files ignored due to path filters (1)
  • site/badge.svg is excluded by !**/*.svg
📒 Files selected for processing (15)
  • .github/workflows/pages.yml
  • .github/workflows/validate.yml
  • CHANGELOG.md
  • README.md
  • docs/alternatives.md
  • docs/badge.md
  • docs/privacy.md
  • package.json
  • scripts/__tests__/render-sitemap.test.mjs
  • scripts/render-sitemap.mjs
  • scripts/sync.mjs
  • site/.well-known/security.txt
  • site/index.html
  • site/robots.txt
  • site/sitemap.xml
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Agent
  • GitHub Check: playwright
🧰 Additional context used
🪛 LanguageTool
docs/alternatives.md

[style] ~13-~13: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ithub.com/AIDotNet/OpenDeepWiki) | | "I want to pack a repo into one text file for an L...

(REP_WANT_TO_VB)


[style] ~73-~73: The present participle “missing” normally requires “is”.
Context: ...ocess](./verified-publishers.md). --- Anything missing? PR a row to the table or open ...

(ANYONE_ELSE_WHO_IS_VBG)

🔇 Additional comments (9)
docs/badge.md (1)

1-55: LGTM

Badge documentation is clear and covers the three main embedding scenarios. The status-aware script handles the empty-match case correctly via ${STATUS:-unregistered}.

scripts/sync.mjs (1)

380-398: Bounded-parallel pool implementation looks correct

The cursor++ increment is safe in Node.js's single-threaded event loop model — no other coroutine can observe or mutate cursor between the read and write of the ++ since there is no await between them. Each pool worker captures a unique index i before yielding. new Array(targets.length) with index-targeted writes guarantees no holes after Promise.all. Good use of Math.min(SYNC_CONCURRENCY, targets.length) to handle under-full registries.
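
The pattern described above can be sketched as follows. This is an illustrative reconstruction and not the code in scripts/sync.mjs: `mapWithConcurrency` and the fake task are hypothetical names, and the sketch assumes `limit` is already a positive integer.

```javascript
// Hypothetical helper: run `task` over `items` with at most `limit` calls in
// flight, returning results in input order.
async function mapWithConcurrency(items, limit, task) {
  const results = new Array(items.length);
  let cursor = 0; // shared by all workers

  async function worker() {
    while (true) {
      // The read and the increment happen with no await between them, so on
      // the single-threaded event loop no two workers can claim the same index.
      const i = cursor++;
      if (i >= items.length) return;
      results[i] = await task(items[i], i); // index-targeted write, no holes
    }
  }

  // Never start more workers than there are items.
  const poolSize = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: poolSize }, () => worker()));
  return results;
}

// Usage with a fake async task standing in for a per-entry fetch.
mapWithConcurrency(['a', 'b', 'c', 'd'], 2, async (id) => {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return id.toUpperCase();
}).then((out) => console.log(out));
```

Each worker pulls the next index only after finishing its current task, so slow entries do not hold up a fixed batch the way chunked `Promise.all` calls would.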

docs/alternatives.md (1)

1-74: LGTM

Content is accurate and well-structured. The TL;DR table and side-by-side comparison give readers what they need to make an informed choice.

README.md (1)

18-18: LGTM

Navigation links to docs/alternatives.md and docs/badge.md match the new files added in this PR.

package.json (1)

26-26: LGTM

The sitemap script is consistent with the pattern of other node scripts/... entries and correctly wires up render-sitemap.mjs.

site/robots.txt (1)

1-7: LGTM.

CHANGELOG.md (1)

7-39: LGTM — changelog entry is accurate and well-structured.

.github/workflows/pages.yml (1)

48-52: The workflow correctly omits npm ci for the sitemap generation step.

render-sitemap.mjs and its imported module shard.mjs use only Node built-in modules (node:fs and node:path), so running npm ci before executing the script is unnecessary.

site/index.html (1)

44-49: The stats.json file is already generated and will be deployed. The workflow in sync.yml (line 70) generates it via node scripts/aggregate.mjs --registry registry.json --out site/stats.json, commits it, and the pages.yml deployment copies the entire site/ directory with cp -r site/. _site/ before uploading to GitHub Pages. No action needed.

			> Likely an incorrect or invalid review comment.

Comment thread docs/privacy.md Outdated
Comment thread scripts/render-sitemap.mjs Outdated
Comment thread site/.well-known/security.txt Outdated
Comment thread site/index.html Outdated
Comment thread site/index.html Outdated
claude added 2 commits May 8, 2026 09:39
Copilot inline comments
- scripts/sync.mjs: clamp SYNC_CONCURRENCY to [1, 32] via parseInt +
  Number.isFinite + Math.max/min, with explicit fallback. Defends
  against env values like '0', '-3', and 'abc' that previously could
  either silently coerce to 6 or throw on Array.from with negative
  length.
- site/index.html: change JSON-LD `license` to an array listing both
  DATA-LICENSE.md (data) and Apache-2.0 (code) so dataset catalogues
  surface the correct registry-data license, not just the code license.
- site/index.html: add og:image + twitter:image pointing at /badge.svg,
  and downgrade twitter:card from `summary_large_image` to `summary`
  since we don't ship a 1200x630 banner. A summary card with no image
  asset rendered as text-only previously.
- .github/workflows/pages.yml: rephrase the sitemap-generation comment
  — the renderer always writes; the determinism guarantee is
  byte-identical content for crawler ETag caching, not write avoidance.
- scripts/render-sitemap.mjs: rewrite the header comment to match
  reality (output deterministic across runs with the same registry,
  not the wall clock). Anchor static-page lastmod to
  registry.generated_at instead of `now()` so two runs produce identical
  XML — verified by a new deterministic-output test.
- docs/privacy.md: replace the inaccurate "no localStorage / sessionStorage
  / fragment-only state" claim with a table documenting actual client-side
  state: `uq:layout:<id>` (localStorage), `uq:tour-autoshown`
  (sessionStorage), and the `?id=` query param for entry deep-linking.
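
As an illustration of the SYNC_CONCURRENCY item at the top of this list, a clamp built from parseInt, Number.isFinite, and Math.max/min might look like the sketch below. `parseConcurrency` and its option names are hypothetical. Whether the real code maps an out-of-range value such as '0' to the nearest bound (as here) or to the default is an assumption, since the commit message does not say.

```javascript
// Hypothetical helper: turn an env string into a safe worker count.
function parseConcurrency(raw, { fallback = 6, min = 1, max = 32 } = {}) {
  const parsed = Number.parseInt(raw ?? '', 10);
  // Unset, empty, or non-numeric values ('abc') parse to NaN: use the default.
  if (!Number.isFinite(parsed)) return fallback;
  // Assumption: numeric but out-of-range values snap to the nearest bound.
  return Math.max(min, Math.min(max, parsed));
}

const SYNC_CONCURRENCY = parseConcurrency(process.env.SYNC_CONCURRENCY);

for (const sample of [undefined, 'abc', '0', '-3', '8', '500']) {
  console.log(JSON.stringify(sample), '->', parseConcurrency(sample));
}
```

Compared with `Number(env) || 6`, this keeps a negative or huge value from ever reaching the pool-size calculation.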

CodeRabbit inline comments
- scripts/render-sitemap.mjs: replace `endsWith('registry.json')` with
  `basename(absReg) === 'registry.json'` to avoid false-positive matches
  on filenames like `tests/my-registry.json`, which would silently load
  the wrong file via the sharded loader.
- site/.well-known/security.txt: tighten Expires from 2027-05-08 (~2y
  out, contradicts the "refresh annually" comment) to 2026-11-08 (6
  months) per RFC 9116 §2.5.5 guidance.
- docs/privacy.md: rewrite the GDPR/CCPA paragraph to clarify that
  registry entries contain no personal data, but server-level access
  logs (GitHub Pages, Cloudflare Web Analytics) do contain IP addresses
  which are personal data under GDPR Article 4(1). Direct subject
  requests at GitHub's and Cloudflare's portals.
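
The difference between the two filename checks in the first item above can be shown with a small comparison. The function names here are hypothetical; `absReg` mirrors the variable named in the commit message, and the surrounding loader logic is not reproduced.

```javascript
import { basename, resolve } from 'node:path';

// Hypothetical predicates contrasting the old and new checks.
function matchesBySuffix(path) {
  // Old check: any path whose final characters are 'registry.json' matches.
  return resolve(path).endsWith('registry.json');
}

function matchesByBasename(path) {
  // New check: the final path segment must be exactly 'registry.json'.
  const absReg = resolve(path);
  return basename(absReg) === 'registry.json';
}

console.log(matchesBySuffix('tests/my-registry.json'));   // suffix check accepts it
console.log(matchesByBasename('tests/my-registry.json')); // basename check rejects it
```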

CodeRabbit nits (also addressed)
- .github/workflows/audit.yml (new): scheduled weekly npm audit at
  Monday 06:00 UTC across root, mcp/, and cli/ via a matrix. Closes
  the post-merge CVE window — the PR-gate validate.yml only fires on
  path-filtered PRs.
- .gitignore: add site/sitemap.xml. The file is regenerated by every
  Pages deploy, so tracking the committed copy invites silent drift.
  Untracked the existing copy in this commit.
- scripts/__tests__/render-sitemap.test.mjs: assert that
  `entry.last_synced` populates per-URL `<lastmod>` (regression guard);
  add a determinism test that runs renderSitemap twice with different
  `now` values and asserts byte-identical output.
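
A rough sketch of what the determinism test above is guarding, assuming a simplified renderer. The function name, `BASE_URL`, the static page list, the `status` and `id` fields, and the fallback to `now` when `generated_at` is missing are all assumptions for illustration; only `registry.generated_at`, `entry.last_synced`, the `?id=` deep link, and the exclusion of revoked entries come from this thread.

```javascript
// Sketch only: not the project's real scripts/render-sitemap.mjs.
const BASE_URL = 'https://example.org';
const STATIC_PAGES = ['/', '/about.html', '/stats.html'];

// Reduce an ISO timestamp to the YYYY-MM-DD form used in <lastmod>.
const toDay = (iso) => new Date(iso).toISOString().slice(0, 10);

function renderSitemapSketch(registry, { now = new Date() } = {}) {
  // Anchor to registry data; the wall clock is only a fallback when generated_at is absent.
  const anchor = toDay(registry.generated_at ?? now.toISOString());

  const staticUrls = STATIC_PAGES.map((path) => ({ loc: BASE_URL + path, lastmod: anchor }));
  const entryUrls = registry.entries
    .filter((entry) => entry.status !== 'revoked')
    // Sort by id so output order does not depend on input order.
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
    .map((entry) => ({
      loc: `${BASE_URL}/?id=${encodeURIComponent(entry.id)}`,
      lastmod: entry.last_synced ? toDay(entry.last_synced) : anchor,
    }));

  const body = [...staticUrls, ...entryUrls]
    .map((u) => `  <url><loc>${u.loc}</loc><lastmod>${u.lastmod}</lastmod></url>`)
    .join('\n');
  return `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${body}\n</urlset>\n`;
}
```

Calling this twice with the same registry but different `now` values should yield identical strings, which is the property the new test asserts for the real renderer.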

Tests: 97/97 (was 96, +1 determinism test). npm audit: 0 high/critical.

https://claude.ai/code/session_01PkkHiFCEmuNaymtGiH7Fkk
Resolve scripts conflict in package.json by keeping all three new
scripts (badges, well-known, sitemap).
Member Author

Triage of the 5 unresolved review threads from copilot-pull-request-reviewer — all are already addressed in the current branch (which is why they're marked outdated). Posting here so it's easy to confirm without clicking through each:

  1. scripts/sync.mjs: SYNC_CONCURRENCY clamping. Hardened: parseInt + Number.isFinite guard + clamp to [1, 32] (scripts/sync.mjs:22-27). NaN, 0, negative, and oversized values all snap to safe values.

  2. site/index.html — JSON-LD Dataset.license. Now an array containing both DATA-LICENSE.md (the data license) and Apache 2.0 (the code license) — site/index.html:39-42.

  3. docs/privacy.md — client-side state claim. Section "Client-side state on the site" now explicitly documents localStorage[uq:layout:<id>], sessionStorage[uq:tour-autoshown], and the ?id=… query param — docs/privacy.md:30-41.

  4. .github/workflows/pages.yml — sitemap "touches no file" comment. Comment was rewritten to: "The renderer always writes the file (no content-compare short-circuit) — _site/ is a fresh staging dir per run, so write avoidance buys nothing" (.github/workflows/pages.yml:48-53).

  5. scripts/render-sitemap.mjs — idempotency claim. Header rewritten and code changed: lastmod is now anchored to registry.generated_at, not today (scripts/render-sitemap.mjs:36-39). Two runs over the same registry produce byte-identical XML.

The threads themselves can be resolved in the UI; happy to do it if someone shares the maintainer view, but I don't have direct GraphQL thread access from here.

🤖 Claude Code



amacsmith merged commit 4eef6f2 into main on May 10, 2026
8 checks passed
amacsmith added a commit that referenced this pull request May 11, 2026
… input, fix scorecard pins

- scripts/__tests__/{aggregate,integration}.test.mjs: gate handler dispatch
  with Object.prototype.hasOwnProperty.call() to prevent prototype-chain
  lookups (js/unvalidated-dynamic-method-call #5, #6).
- scripts/__tests__/render-latest.test.mjs: dispatch on URL hostname rather
  than substring (js/incomplete-url-substring-sanitization #11).
- cli/src/add.mjs: openUrl() now validates input is a well-formed http(s)
  URL before passing argv to open/cmd/xdg-open. Defense in depth against
  js/indirect-command-line-injection #7; existing execFileSync (no shell)
  pattern in spawn.mjs is preserved.
- .github/workflows/scorecard.yml: pin ossf/scorecard-action,
  actions/checkout, and github/codeql-action/upload-sarif to underlying
  commit SHAs (not annotated-tag-object SHAs) — fixes the
  "imposter commit" rejection from the OSSF webapp.
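
The two URL-related fixes above (hostname dispatch and http(s) validation) can be sketched with the standard `URL` class. Both helper names are hypothetical and `github.com` is only an example host; the actual bodies of `openUrl()` and the test dispatcher are not shown in this thread.

```javascript
// Hypothetical helpers; not the code in cli/src/add.mjs or the test files.
function isHttpUrl(input) {
  let url;
  try {
    url = new URL(input); // throws on anything that is not an absolute URL
  } catch {
    return false;
  }
  // Reject file:, javascript:, and other schemes before anything reaches argv.
  return url.protocol === 'http:' || url.protocol === 'https:';
}

function isHostedOn(input, expectedHost) {
  // Compare the parsed hostname. A substring check would also accept
  // 'https://evil.example/github.com' or 'https://github.com.evil.example/'.
  return isHttpUrl(input) && new URL(input).hostname === expectedHost;
}

console.log(isHttpUrl('https://example.com/docs'));                       // true
console.log(isHostedOn('https://evil.example/github.com', 'github.com')); // false
```

Validating before spawning complements the no-shell `execFileSync` pattern: the shell cannot reinterpret the argument, and the opener never receives a non-web scheme.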

Six other CodeQL alerts (#2, #3, #8, #9, #10, #12) were dismissed as
won't-fix with rationale recorded on the alert.
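
For the handler-dispatch fix in the first item of the list above, a minimal sketch of an own-property gate looks like this. The handler table and the `dispatch` name are hypothetical stand-ins for whatever the test files actually use.

```javascript
// Hypothetical handler table for illustration.
const handlers = {
  ping: () => 'pong',
  echo: (msg) => msg,
};

function dispatch(name, ...args) {
  // Only names defined directly on `handlers` are callable; inherited members
  // such as 'toString' or 'constructor' are rejected instead of invoked.
  if (!Object.prototype.hasOwnProperty.call(handlers, name)) {
    throw new Error(`unknown handler: ${name}`);
  }
  return handlers[name](...args);
}

console.log(dispatch('ping')); // pong
```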

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>