Add GitHub repository snapshot capture by konard · Pull Request #6 · link-assistant/web-capture

konard · 2025-10-25T08:04:11Z

Resolves #5.

Summary

Adds GitHub repository URL detection plus REST API snapshot fetchers for README, repository metadata, and root file tree in both JS and Rust.
Makes /txt and /markdown return compact repository snapshots for plain GitHub repo URLs, while /html and screenshot capture continue returning the original rendered GitHub page.
Covers CLI capture paths, attachment filenames, docs, JS changeset, Rust version bump, and live CI jobs for GitHub repository text/markdown/HTML/screenshot capture.
Resolves the merge conflict with the current main monorepo layout.
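
The repository URL detection and REST lookups described above could be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function names `parseGitHubRepoUrl` and `snapshotApiUrls` are hypothetical, and the rule "exactly two path segments on github.com" is an assumption about what counts as a plain repository URL. The three endpoints are the public GitHub REST API routes for repository metadata, README, and root directory contents.

```javascript
// Hypothetical helper (not from the PR): returns { owner, repo } for a plain
// GitHub repository URL, or null for anything else.
function parseGitHubRepoUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // not a valid absolute URL
  }
  const host = url.hostname.toLowerCase();
  if (host !== 'github.com' && host !== 'www.github.com') return null;

  // Assumption: a "plain" repo URL has exactly two path segments: owner/repo.
  // Reserved two-segment paths (for example settings pages) are not filtered here.
  const segments = url.pathname.split('/').filter(Boolean);
  if (segments.length !== 2) return null;

  const [owner, rawRepo] = segments;
  const repo = rawRepo.replace(/\.git$/, ''); // tolerate clone-style URLs
  if (!repo) return null;
  return { owner, repo };
}

// Hypothetical helper: builds the GitHub REST API URLs a snapshot would need.
function snapshotApiUrls({ owner, repo }) {
  const base = `https://api.github.com/repos/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
  return {
    metadata: base, // GET /repos/{owner}/{repo}
    readme: `${base}/readme`, // GET /repos/{owner}/{repo}/readme
    rootTree: `${base}/contents`, // GET /repos/{owner}/{repo}/contents (root listing)
  };
}
```

Under these assumptions, deeper URLs such as an issue or file page return `null`, so they would fall through to the regular page capture path.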

Reproduction

Before this change, capturing https://github.com/link-assistant/web-capture as text/markdown did not provide a universal repository README/tree snapshot: text capture treated GitHub as an HTML page instead of a text source, and markdown capture depended on the GitHub web shell.

After this change:

/txt?url=https://github.com/link-assistant/web-capture returns a plain-text repository summary, root files, and README content.
/markdown?url=https://github.com/link-assistant/web-capture returns the same repository data as Markdown.
/html?url=... and PNG screenshot capture still use the original GitHub page.
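
As an illustration of how one set of repository data could back both the /txt and /markdown responses, here is a small sketch. The snapshot shape (`fullName`, `description`, `rootEntries`, `readme`) and the function name `renderSnapshot` are assumptions made for this example; the PR's real output layout may differ.

```javascript
// Hypothetical renderer (not from the PR): turns one snapshot object into
// either Markdown or plain text. Entries follow the { name, type } shape that
// the GitHub contents API uses, where type is e.g. 'file' or 'dir'.
function renderSnapshot(snapshot, format) {
  const { fullName, description, rootEntries, readme } = snapshot;
  // Mark directories with a trailing slash so the tree stays readable.
  const names = rootEntries.map((e) => (e.type === 'dir' ? `${e.name}/` : e.name));

  if (format === 'markdown') {
    return [
      `# ${fullName}`,
      '',
      description,
      '',
      '## Root files',
      '',
      ...names.map((n) => `- ${n}`),
      '',
      '## README',
      '',
      readme,
    ].join('\n');
  }

  if (format === 'txt') {
    return [
      `Repository: ${fullName}`,
      `Description: ${description}`,
      '',
      'Root files:',
      ...names.map((n) => `  ${n}`),
      '',
      'README:',
      readme,
    ].join('\n');
  }

  // Other formats (HTML, screenshots) are expected to use the original page.
  throw new Error(`Unsupported snapshot format: ${format}`);
}
```

Keeping a single snapshot object and two thin renderers is one way to guarantee that both endpoints report the same repository data.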

Tests

npm test -- --runTestsByPath tests/unit/github.test.js tests/integration/api-endpoints.test.js
npm run lint
npm run format:check
npm run check:duplication
npm test -- --testPathIgnorePatterns="docker.test.js"
GITHUB_REPOSITORY_INTEGRATION=true npm test -- --runTestsByPath tests/integration/github-readme.test.js --testTimeout=120000
cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
cargo test --all-features --verbose
cargo test --doc --verbose
GITHUB_REPOSITORY_INTEGRATION=1 cargo test --test integration github_repository::live -- --nocapture

… downloads

- Add comprehensive test suite for GitHub README page in all supported browser engines
- Test markdown download and conversion with both Puppeteer and Playwright engines
- Test screenshot capture with both Puppeteer and Playwright engines
- Add engine parity tests to ensure both engines produce valid results
- Fix Playwright browser adapter to properly handle browser context
- Update jest.config.mjs to include integration tests in test matching patterns

All 11 new tests pass successfully, verifying that both Puppeteer and Playwright can download and convert the GitHub README page to markdown and capture screenshots.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

konard · 2025-10-25T08:28:13Z

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

📎 Log file uploaded as GitHub Gist (343KB)
🔗 View complete solution draft log

The working session has now ended. Feel free to review and add any feedback on the solution draft.

konard · 2026-06-04T22:25:54Z

We need to resolve the conflicts and make sure we can correctly get the README.md of any GitHub project universally as txt, markdown, and html.

We also need the other repository details, such as the file tree and project details, so that all the information displayed on a GitHub repository page can be presented compactly as txt, markdown, or the original html.

konard · 2026-06-04T23:48:19Z

🤖 AI Work Session Started

Starting automated work session at 2026-06-04T23:48:17.086Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait for the session to finish, and provide your feedback.

konard · 2026-06-05T00:37:01Z

Working session summary

Implemented and pushed the fix to PR #6.

What changed:

Resolved merge conflicts with current main.
Added GitHub repository snapshot capture for JS and Rust.
/txt and /markdown now return repository metadata, root file tree, and README content for plain GitHub repo URLs.
/html and screenshot capture still return the original GitHub page.
Added offline and live GitHub repository tests, plus CI steps for JS/Rust live coverage.
Updated PR title/body and marked it ready for review.

Verification:

Local JS/Rust lint, format, unit/integration, doctest, live GitHub checks all passed.
GitHub Actions passed on commit 1a1bf935b92be4c0841368b8bca465f0d54e40c8.
PR is MERGEABLE, merge state CLEAN.
Local working tree is clean.

This summary was automatically extracted from the AI working session output.

konard · 2026-06-05T00:37:19Z

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

Model: GPT-5.5
Provider: OpenAI
Public pricing estimate: $24.316772

📊 Context and tokens usage:

664.8K / 1.1M (63%) input tokens, 71.2K / 128K (56%) output tokens

Total: (664.8K + 14.5M cached) input tokens, 71.2K output tokens, $24.316772 cost

🤖 Models used:

Tool: OpenAI Codex
Requested: gpt-5.5
Model: GPT-5.5 (gpt-5.5)

📎 Log file uploaded as Repository (101475KB)

View complete solution draft log

The working session has now ended. Feel free to review and add any feedback on the solution draft.

konard · 2026-06-05T00:39:41Z

🎉 Auto-merged

This pull request has been automatically merged by hive-mind.

All CI checks have passed

Auto-merged by hive-mind with --auto-merge flag

Initial commit with task details for issue #5

89811e6

Adding CLAUDE.md with task information for AI processing. This file will be removed when the task is complete. Issue: undefined

konard self-assigned this Oct 25, 2025

konard marked this pull request as ready for review October 25, 2025 08:27

konard changed the title ~~[WIP] Add test that we can actually download markdown version and screenshot of GitHub README page in all our supported engines~~ Add test that we can actually download markdown version and screenshot of GitHub README page in all our supported engines Oct 25, 2025

Revert "Initial commit with task details for issue #5"

802db93

This reverts commit 89811e6.

konard marked this pull request as draft June 4, 2026 23:48

feat: add GitHub repository capture snapshots

1a1bf93

konard changed the title ~~Add test that we can actually download markdown version and screenshot of GitHub README page in all our supported engines~~ Add GitHub repository snapshot capture Jun 5, 2026

konard marked this pull request as ready for review June 5, 2026 00:25

konard merged commit 2610631 into main Jun 5, 2026
18 checks passed

konard merged 4 commits into main from issue-5-37ef445c
