Add Wikipedia page download test for all supported engines by konard · Pull Request #10 · link-assistant/web-capture

konard · 2025-10-25T08:07:10Z

Summary

Closes #8.

Adds a live integration test proving web-capture can actually download the Wikipedia page (https://en.wikipedia.org/wiki/Wikipedia) as both Markdown and an image (PNG screenshot) in every supported engine, across both implementations in this monorepo:

Implementation	Supported engine(s)	New test
JS (`js/`)	Puppeteer, Playwright	`js/tests/integration/wikipedia-download.test.js`
Rust (`rust/`)	chromiumoxide (`chrome`)	`rust/tests/integration/wikipedia_download.rs`

The work was rethought and applied consistently everywhere in this single PR — implementation, CI wiring, docs, and changeset — after merging the latest main into the branch.

What's included

JS test — describe.each(['puppeteer', 'playwright']) covers both engines for Markdown + PNG, plus an explicit engine-parity test that downloads with both. Uses the existing createBrowser, convertHtmlToMarkdown, and retry helpers; navigation retries with exponential backoff. Validates the PNG magic number and that the Markdown carries real content without raw HTML scaffolding.
Rust test — 2 offline assertions (the supported engine is chromiumoxide; the URL routes through the generic browser path, not Google Docs) plus 2 live #[tokio::test]s that render Markdown via render_html + convert_html_to_markdown and capture a PNG via capture_screenshot. Live tests short-circuit when no Chrome/Chromium binary is present.
CI — dedicated live steps added to both .github/workflows/js.yml and .github/workflows/rust.yml, mirroring the existing Habr / Google Docs integration steps.
Docs — js/README.md and rust/README.md document the new WIKIPEDIA_INTEGRATION opt-in alongside the existing integration suites.
Changeset — js/.changeset/wikipedia-download-test.md (patch) for the JS package.
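
The two validations named in the JS test bullet (PNG magic number, Markdown without raw HTML scaffolding) can be sketched roughly as follows. This is an illustrative sketch, not the code from `wikipedia-download.test.js`: the helper names `isPng` and `looksLikeMarkdown`, the `'Wikipedia'` content check, and the list of tags treated as scaffolding are all assumptions. Only the 8-byte PNG signature comes from the PNG format itself.

```javascript
// PNG files begin with a fixed 8-byte signature: 89 50 4E 47 0D 0A 1A 0A.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Hypothetical helper: true when the screenshot bytes start with the PNG signature.
// Buffer.from() accepts both Buffer and Uint8Array inputs.
function isPng(data) {
  const bytes = Buffer.from(data);
  return (
    bytes.length >= PNG_SIGNATURE.length &&
    bytes.subarray(0, PNG_SIGNATURE.length).equals(PNG_SIGNATURE)
  );
}

// Hypothetical helper: true when the Markdown mentions the article subject
// and carries no leftover document-level HTML tags (assumed tag list).
function looksLikeMarkdown(markdown) {
  const hasContent = typeof markdown === 'string' && markdown.includes('Wikipedia');
  const hasScaffolding = /<\/?(html|head|body|script)\b/i.test(markdown);
  return hasContent && !hasScaffolding;
}
```

In a Jest test these could back assertions such as `expect(isPng(screenshot)).toBe(true)`. Checking only the signature keeps the test cheap, at the cost of not verifying that the rest of the file decodes.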

Gating

Live tests are skipped by default so offline/default CI stays deterministic. Enable them with:

# JS
WIKIPEDIA_INTEGRATION=true npm test -- --testPathPattern="wikipedia-download"
# Rust
WIKIPEDIA_INTEGRATION=1 cargo test --test integration wikipedia_download::live -- --nocapture
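
As a rough illustration of this kind of opt-in gate, a JS suite could read the variable once and choose between running and skipping. The PR does not show how the repository parses `WIKIPEDIA_INTEGRATION`, so the helper name `isLiveEnabled` and the accepted values (`1` and `true`, matching the two commands above) are assumptions.

```javascript
// Hypothetical helper: treats "1" and "true" (any letter case) as enabled,
// and everything else, including an unset variable, as disabled.
function isLiveEnabled(value) {
  return ['1', 'true'].includes(String(value ?? '').trim().toLowerCase());
}

const liveEnabled = isLiveEnabled(process.env.WIKIPEDIA_INTEGRATION);

// Inside a Jest test file, the flag could select between running and skipping:
//   const describeLive = liveEnabled ? describe : describe.skip;
//   describeLive('wikipedia download (live)', () => { /* live tests */ });
console.log(`Wikipedia live tests enabled: ${liveEnabled}`);
```

Defaulting to disabled means a plain `npm test` never touches the network, which is what keeps the default run deterministic.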

Verification (local)

JS live suite: 5 passed (Puppeteer + Playwright Markdown/image + parity).
Rust: offline 4 passed, live 2 passed.
cargo fmt --check, cargo clippy, prettier --check, eslint (0 errors), and jscpd duplication check all clean.

Add comprehensive integration tests that verify the ability to download Wikipedia pages (https://en.wikipedia.org/wiki/Wikipedia) in all three formats (HTML, Markdown, and screenshot) using both Puppeteer and Playwright browser engines.

Test coverage includes:

- Downloading Wikipedia page as HTML (both engines)
- Converting Wikipedia page to Markdown (both engines)
- Capturing Wikipedia page as PNG screenshot (both engines)
- Comparing content download across both engines

Also updated jest.config.mjs to include integration tests in the test suite.

Fixes #8

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

konard · 2025-10-25T08:21:56Z

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

📎 Log file uploaded as GitHub Gist (279KB)
🔗 View complete solution draft log

The working session has now ended; feel free to review the solution draft and add any feedback.

konard · 2026-06-01T21:03:27Z

Get latest changes from default branch.

Make sure this feature is rethought and properly implemented for all APIs we have in the codebase, and that the docs are updated.

We need to ensure all changes are correct, consistent, validated, tested, documented, logged, and fully meet each and every discussed requirement in the deepest and widest possible sense (check the issue description and all comments in the issue and in the pull request, and make sure each and every requirement is listed before actually checking whether it was addressed). Nothing should be deferred or delayed. The scope is the entire repository: every change made in one place should also be applied in all other places in the codebase and docs. Ensure all CI/CD checks pass.

Please plan and execute everything in this single pull request. You have unlimited time and context, as context auto-compacts and you can continue indefinitely, until each and every requirement is fully addressed and everything is completely done.

konard · 2026-06-01T21:04:11Z

🤖 AI Work Session Started

Starting automated work session at 2026-06-01T21:04:08.881Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait for the session to finish, and provide your feedback.

…ines (#8)

Add live integration tests that actually download the Wikipedia article (https://en.wikipedia.org/wiki/Wikipedia) as both Markdown and a PNG image for every supported browser engine, across both implementations:

- JS: Puppeteer and Playwright (js/tests/integration/wikipedia-download.test.js)
- Rust: chromiumoxide (rust/tests/integration/wikipedia_download.rs)

The suites are gated behind WIKIPEDIA_INTEGRATION and wired into the JS and Rust CI workflows as dedicated live steps, mirroring the existing Habr and Google Docs integration suites. Docs (both READMEs) and a JS changeset are updated accordingly.

konard · 2026-06-01T21:37:51Z

Working session summary

Done. Issue #8 is fully resolved and PR #10 is ready with all CI checks green.

Summary

PR #10 (issue-8-2bf69acf → main) is now ready for review and mergeable, and all CI checks pass.

What was delivered

Tests that actually download the Wikipedia page (https://en.wikipedia.org/wiki/Wikipedia) as both Markdown and an image (PNG) in every supported engine, across both implementations in the monorepo:

Implementation	Engines	Test file
JS	Puppeteer, Playwright	`js/tests/integration/wikipedia-download.test.js`
Rust	chromiumoxide	`rust/tests/integration/wikipedia_download.rs`

Plus: CI live steps in both js.yml and rust.yml, docs updates (js/README.md, rust/README.md), and a JS changeset.

Verified on CI (not just locally)

JS live step: Tests: 5 passed — Puppeteer + Playwright × (Markdown + PNG) + engine parity.
Rust live step: live_download_wikipedia_page_as_markdown ... ok and ...as_image ... ok.
All other jobs green: JS/Rust lint+format, changeset check, publishable-changes check, Rust Test on ubuntu/macOS/windows, Rust Build.

Process notes

Merged latest main into the branch first, resolving the DIRTY conflict (the branch's old flat layout vs. main's js/ + rust/ split).
Live suites are gated behind WIKIPEDIA_INTEGRATION so default/offline runs stay deterministic, mirroring the existing Habr and Google Docs suites.
Working tree is clean; everything committed and pushed to issue-8-2bf69acf (the only branch touched).

This summary was automatically extracted from the AI working session output.

konard · 2026-06-01T21:38:02Z

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $5.823867

📊 Context and tokens usage:

Claude Opus 4.8: (2 sub-sessions)

116.5K / 1M (12%) input tokens, 28.3K / 128K (22%) output tokens
53.5K / 1M (5%) input tokens, 6.7K / 128K (5%) output tokens

Total: (13.4K new + 194.7K cache writes + 7.1M cache reads) input tokens, 40.5K output tokens, $5.823867 cost

🤖 Models used:

Tool: Anthropic Claude Code
Requested: opus
Model: Claude Opus 4.8 (claude-opus-4-8)

📎 Log file uploaded as Gist (2610KB)

View complete solution draft log

The working session has now ended; feel free to review the solution draft and add any feedback.

konard · 2026-06-01T21:40:24Z

✅ Ready to merge

This pull request is now ready to be merged:

All CI checks have passed
No merge conflicts
No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag

Initial commit with task details for issue #8

3657e47

Adding CLAUDE.md with task information for AI processing. This file will be removed when the task is complete. Issue: undefined

konard self-assigned this Oct 25, 2025

konard changed the title ~~[WIP] Add test that we can actually download Wikipedia page (markdown + image) in all our supported engines~~ Add Wikipedia page download test for all supported engines Oct 25, 2025

konard marked this pull request as ready for review October 25, 2025 08:21

Revert "Initial commit with task details for issue #8"

c31c440

This reverts commit 3657e47.

konard mentioned this pull request Apr 10, 2026

npm package version (1.1.2) is behind GitHub source (1.2.0) - missing critical modules #38

Closed

konard marked this pull request as draft June 1, 2026 21:04

konard added 2 commits June 1, 2026 21:07

Merge remote-tracking branch 'origin/main' into issue-8-2bf69acf

d3f828f

# Conflicts: # js/jest.config.mjs # js/tests/integration/wikipedia-download.test.js

konard marked this pull request as ready for review June 1, 2026 21:25

konard merged commit 2f99daf into main Jun 4, 2026
18 checks passed

konard merged 5 commits into main from issue-8-2bf69acf
