
Add GitHub repository snapshot capture #6

Merged
konard merged 4 commits into main from issue-5-37ef445c on Jun 5, 2026

Conversation

@konard
Collaborator

@konard konard commented Oct 25, 2025

Resolves #5.

Summary

  • Adds GitHub repository URL detection plus REST API snapshot fetchers for README, repository metadata, and root file tree in both JS and Rust.
  • Makes /txt and /markdown return compact repository snapshots for plain GitHub repo URLs, while /html and screenshot capture continue returning the original rendered GitHub page.
  • Covers CLI capture paths, attachment filenames, docs, JS changeset, Rust version bump, and live CI jobs for GitHub repository text/markdown/HTML/screenshot capture.
  • Resolves the merge conflict with the current main monorepo layout.
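
As an illustration of the URL detection mentioned in the summary, here is a minimal sketch of how a plain repository URL might be told apart from deeper GitHub links. The function name `parseGitHubRepoUrl` and the exact matching rules are assumptions made for this example, not the code merged in this PR.

```javascript
// Hypothetical helper (not the PR's implementation): returns { owner, repo }
// for URLs of the form https://github.com/<owner>/<repo>, otherwise null.
function parseGitHubRepoUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // not a valid absolute URL
  }
  if (url.protocol !== 'https:' && url.protocol !== 'http:') return null;
  if (url.hostname !== 'github.com' && url.hostname !== 'www.github.com') {
    return null;
  }
  // Only exactly two path segments count as a "plain" repository URL;
  // deeper links such as /issues/5 or /blob/main/... are left alone.
  const segments = url.pathname.split('/').filter(Boolean);
  if (segments.length !== 2) return null;
  const [owner, rawRepo] = segments;
  const repo = rawRepo.replace(/\.git$/, ''); // tolerate clone-style URLs
  if (!repo) return null;
  return { owner, repo };
}
```

A real implementation would probably also need to exclude two-segment GitHub pages that are not repositories; this sketch does not attempt that.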

Reproduction

Before this change, capturing https://github.com/link-assistant/web-capture as text/markdown did not provide a universal repository README/tree snapshot: text capture treated GitHub as an HTML page instead of a text source, and markdown capture depended on the GitHub web shell.

After this change:

  • /txt?url=https://github.com/link-assistant/web-capture returns a plain-text repository summary, root files, and README content.
  • /markdown?url=https://github.com/link-assistant/web-capture returns the same repository data as Markdown.
  • /html?url=... and PNG screenshot capture still use the original GitHub page.
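
The behaviour above can be sketched roughly as follows: fetch repository metadata, the root listing, and the README from the GitHub REST API, then render the result as plain text or Markdown. The function names, the snapshot shape, and the output layout are assumptions for illustration only; the three REST endpoints are the public `GET /repos/{owner}/{repo}`, `/contents`, and `/readme` routes.

```javascript
// Sketch only: names and output layout are assumptions, not the PR's code.
const API_ROOT = 'https://api.github.com';

// fetchImpl is injectable so the function can be exercised without network access.
async function fetchRepoSnapshot(owner, repo, fetchImpl = fetch) {
  const base = `${API_ROOT}/repos/${owner}/${repo}`;
  const getJson = async (url) => {
    const res = await fetchImpl(url, {
      headers: { Accept: 'application/vnd.github+json' },
    });
    if (!res.ok) throw new Error(`GitHub API returned ${res.status} for ${url}`);
    return res.json();
  };
  const [meta, rootEntries, readmeRes] = await Promise.all([
    getJson(base), // repository metadata
    getJson(`${base}/contents`), // root directory listing
    // The raw media type asks GitHub for the README body instead of JSON.
    fetchImpl(`${base}/readme`, {
      headers: { Accept: 'application/vnd.github.raw+json' },
    }),
  ]);
  return {
    fullName: meta.full_name,
    description: meta.description || '',
    defaultBranch: meta.default_branch,
    files: rootEntries.map((e) => (e.type === 'dir' ? `${e.name}/` : e.name)),
    readme: readmeRes.ok ? await readmeRes.text() : '', // a repo may have no README
  };
}

// Renders the same snapshot data as plain text or as Markdown.
function renderSnapshot(snapshot, format) {
  const md = format === 'markdown';
  const lines = [
    md ? `# ${snapshot.fullName}` : snapshot.fullName,
    snapshot.description,
    '',
    md ? '## Root files' : 'Root files:',
    ...snapshot.files.map((name) => (md ? `- \`${name}\`` : `  ${name}`)),
    '',
    md ? '## README' : 'README:',
    snapshot.readme,
  ];
  return lines.join('\n');
}
```

Keeping the fetch step separate from the rendering step means both text formats can share one snapshot, while the HTML and screenshot paths bypass it entirely.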

Tests

  • npm test -- --runTestsByPath tests/unit/github.test.js tests/integration/api-endpoints.test.js
  • npm run lint
  • npm run format:check
  • npm run check:duplication
  • npm test -- --testPathIgnorePatterns="docker.test.js"
  • GITHUB_REPOSITORY_INTEGRATION=true npm test -- --runTestsByPath tests/integration/github-readme.test.js --testTimeout=120000
  • cargo fmt --all -- --check
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo test --all-features --verbose
  • cargo test --doc --verbose
  • GITHUB_REPOSITORY_INTEGRATION=1 cargo test --test integration github_repository::live -- --nocapture
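
The live tests in the list above are opted into through the `GITHUB_REPOSITORY_INTEGRATION` environment variable (`true` in the npm command, `1` in the cargo command). A minimal sketch of how such gating might look on the JS side follows; the helper name and the decision to accept both values are assumptions, not taken from the PR.

```javascript
// Hypothetical helper: treats "1" and "true" (case-insensitive) as opting in
// to tests that call the real GitHub API.
function liveGitHubTestsEnabled(env = process.env) {
  const value = String(env.GITHUB_REPOSITORY_INTEGRATION || '').toLowerCase();
  return value === '1' || value === 'true';
}

// In a Jest file this could gate a whole suite, for example:
//   const describeLive = liveGitHubTestsEnabled() ? describe : describe.skip;
//   describeLive('GitHub repository snapshot (live)', () => { /* ... */ });
```

Gating this way keeps the default test run offline and deterministic, with the network-dependent checks running only when explicitly requested.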

Adding CLAUDE.md with task information for AI processing.
This file will be removed when the task is complete.

Issue: undefined
@konard konard self-assigned this Oct 25, 2025
… downloads

- Add comprehensive test suite for GitHub README page in all supported browser engines
- Test markdown download and conversion with both Puppeteer and Playwright engines
- Test screenshot capture with both Puppeteer and Playwright engines
- Add engine parity tests to ensure both engines produce valid results
- Fix Playwright browser adapter to properly handle browser context
- Update jest.config.mjs to include integration tests in test matching patterns

All 11 new tests pass successfully, verifying that both Puppeteer and Playwright
can download and convert the GitHub README page to markdown and capture screenshots.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@konard konard marked this pull request as ready for review October 25, 2025 08:27
@konard konard changed the title [WIP] Add test that we can actually download markdown version and screenshot of GitHub README page in all our supported engines Add test that we can actually download markdown version and screenshot of GitHub README page in all our supported engines Oct 25, 2025
@konard
Collaborator Author

konard commented Oct 25, 2025

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

📎 Log file uploaded as GitHub Gist (343KB)
🔗 View complete solution draft log


The working session has now ended; feel free to review and add any feedback on the solution draft.

@konard
Collaborator Author

konard commented Jun 4, 2026

We need to resolve the conflicts and make sure we can correctly get the README.md of any GitHub project as txt, markdown, and html.

We should also capture the other repository details, such as the file tree and project details, so that all the information displayed on a GitHub repository page can be presented compactly as txt or markdown, or as the original html.

@konard konard marked this pull request as draft June 4, 2026 23:48
@konard
Collaborator Author

konard commented Jun 4, 2026

🤖 AI Work Session Started

Starting automated work session at 2026-06-04T23:48:17.086Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait for the session to finish, and provide your feedback.

@konard konard changed the title Add test that we can actually download markdown version and screenshot of GitHub README page in all our supported engines Add GitHub repository snapshot capture Jun 5, 2026
@konard konard marked this pull request as ready for review June 5, 2026 00:25
@konard
Collaborator Author

konard commented Jun 5, 2026

Working session summary

Implemented and pushed the fix to PR #6.

What changed:

  • Resolved merge conflicts with current main.
  • Added GitHub repository snapshot capture for JS and Rust.
  • /txt and /markdown now return repository metadata, root file tree, and README content for plain GitHub repo URLs.
  • /html and screenshot capture still return the original GitHub page.
  • Added offline and live GitHub repository tests, plus CI steps for JS/Rust live coverage.
  • Updated PR title/body and marked it ready for review.

Verification:

  • Local JS/Rust lint, format, unit/integration, doctest, and live GitHub checks all passed.
  • GitHub Actions passed on commit 1a1bf935b92be4c0841368b8bca465f0d54e40c8.
  • PR is MERGEABLE, merge state CLEAN.
  • Local working tree is clean.

This summary was automatically extracted from the AI working session output.

@konard
Collaborator Author

konard commented Jun 5, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Model: GPT-5.5
  • Provider: OpenAI
  • Public pricing estimate: $24.316772

📊 Context and tokens usage:

  • 664.8K / 1.1M (63%) input tokens, 71.2K / 128K (56%) output tokens

Total: (664.8K + 14.5M cached) input tokens, 71.2K output tokens, $24.316772 cost

🤖 Models used:

  • Tool: OpenAI Codex
  • Requested: gpt-5.5
  • Model: GPT-5.5 (gpt-5.5)

📎 Log file uploaded as Repository (101475KB)


The working session has now ended; feel free to review and add any feedback on the solution draft.

@konard konard merged commit 2610631 into main Jun 5, 2026
18 checks passed
@konard
Collaborator Author

konard commented Jun 5, 2026

🎉 Auto-merged

This pull request has been automatically merged by hive-mind.

  • All CI checks have passed

Auto-merged by hive-mind with --auto-merge flag


Development

Successfully merging this pull request may close these issues.

Add test that we can actually download markdown version and screenshot of GitHub README page in all our supported engines
