feat: stream MkDocs sites by URL #4
- Accept deployed MkDocs root URLs in the listen command
- Discover canonical pages from sitemap.xml
- Enforce bounded HTTPS fetching and versioned caching
- Add offline tests with complete line and branch coverage
PR Review: feat: stream MkDocs sites by URL
Summary
This is a substantial, well-engineered refactor that replaces hardcoded provider-specific commands (
What changed
Strengths
Issues
1. Loss of diagnostic detail in fetch errors
except httpx.HTTPError as error:
    raise OSError(f"Could not download {current_url}") from error
This makes debugging harder (e.g., a 403 and a 500 look identical). Consider including the status code.
2. Silent discard of unrecognized sitemap URLs
try:
    page_url = policy.validate_url(location.text.strip())
except ValueError:
    continue
This is intentional (external links) but could hide broken internal links in a sitemap. Not a blocker, but worth noting.
3. Section derivation is only one level deep
4. No
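For issue 1, one possible way to keep the status code in the error message is sketched below. It assumes the caught exception may carry a `response` attribute with a `status_code`, as `httpx.HTTPStatusError` does; `describe_fetch_failure` is a hypothetical helper and not a function from this PR.

```python
def describe_fetch_failure(url: str, error: Exception) -> str:
    """Build an error message that keeps the HTTP status when one is available."""
    # httpx.HTTPStatusError exposes .response; transport-level errors
    # (timeouts, connection failures) do not, so fall back to the class name.
    response = getattr(error, "response", None)
    status = getattr(response, "status_code", None)
    if status is None:
        return f"Could not download {url}: {type(error).__name__}"
    return f"Could not download {url}: HTTP {status}"
```

Inside the existing handler this could be used as `raise OSError(describe_fetch_failure(current_url, error)) from error`, so a 403 and a 500 no longer produce the same message.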
Code Review
This pull request refactors the application to support streaming documentation from any deployed MkDocs site dynamically via sitemaps, replacing the previous hardcoded support for FastAPI and Typer. It introduces secure web fetching and caching mechanisms (SecureWebFetcher, URLPolicy, WebCache) to validate URLs and prevent unsafe redirects, alongside comprehensive test suites. The review feedback focuses on improving robustness and security, suggesting rejecting backslashes in paths to prevent path traversal, falling back to stale cached pages if sitemap parsing fails, passing raw bytes directly to ElementTree to handle encoding automatically, defensively handling unexpected URLs in sitemaps, and stripping leading slashes from relative page references to ensure correct URL resolution.
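A few of these suggestions can be illustrated with a minimal, standard-library sketch. It is not the PR's `URLPolicy` or sitemap code: `validate_page_url` and `parse_sitemap` are hypothetical names, and the sketch assumes the root URL ends with a slash. It passes raw bytes to ElementTree so the declared encoding is honoured, rejects backslashes, and records skipped URLs instead of dropping them silently.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace used by the sitemaps.org 0.9 protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def validate_page_url(root_url: str, candidate: str) -> str:
    """Return candidate if it is an HTTPS URL under root_url, else raise ValueError."""
    if "\\" in candidate:
        raise ValueError("backslashes are not allowed")
    root, page = urlsplit(root_url), urlsplit(candidate)
    if page.scheme != "https" or page.netloc != root.netloc:
        raise ValueError("URL is outside the site")
    # Assumes root_url ends with "/", so a prefix check stays inside the root.
    if not page.path.startswith(root.path):
        raise ValueError("path is outside the site root")
    return candidate


def parse_sitemap(root_url: str, payload: bytes) -> tuple[list[str], list[str]]:
    """Split sitemap <loc> entries into accepted pages and skipped URLs."""
    # Passing bytes lets the parser apply the encoding from the XML declaration.
    tree = ET.fromstring(payload)
    pages: list[str] = []
    skipped: list[str] = []
    for location in tree.iter(f"{SITEMAP_NS}loc"):
        text = (location.text or "").strip()
        try:
            pages.append(validate_page_url(root_url, text))
        except ValueError:
            skipped.append(text)  # kept so callers can log or inspect them
    return pages, skipped
```

Returning the skipped URLs alongside the accepted ones is one way to keep external links out of the catalog while still making unexpected entries visible.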
- Reject backslashes and unsafe protocol-relative references
- Preserve stale catalogs when fresh sitemaps cannot be parsed
- Parse XML using declared encodings and skip mismatched paths
- Add regression tests for reviewed edge cases
Here's my review of the full PR:
PR Review: feat: stream MkDocs sites by URL
I've read all 14 changed files. This is a well-executed refactor. Below is my assessment.
Architecture & Design
Test Quality
100% branch coverage, fully deterministic (no network), all external dependencies stubbed. Edge cases covered: invalid UTF-8, encoding declarations, redirect loops, missing Location headers, duplicate sitemap URLs, empty sitemaps, stale cache, catalog snapshot corruption, interactive page selection, playback controls.
Items from Existing Reviews
The gemini-code-assist review flagged several items that are already addressed in the submitted code:
Minor Observations
Verdict
Approved. The PR is production-quality: strong security posture, thorough tests, clean modularity. No blocking issues.


What
Make Binge Docs stream documentation from any deployed MkDocs root URL.
Why
Requiring each documentation site to be registered in source code limits the tool to a curated list. URL-driven discovery lets users listen to compatible MkDocs sites immediately.
How
- binge-docs listen MKDOCS_URL.
- sitemap.xml.
Tests
- uv lock --check
- uv run ruff check .
- uv run pytest --cov=binge_docs --cov-branch --cov-fail-under=100
Breaking Changes
- binge-docs fastapi, binge-docs typer, and the source registry interface.
- listen.
Related Issues
No related GitHub issue exists.