
Add Wikipedia page download test for all supported engines #10

Merged: konard merged 5 commits into main from issue-8-2bf69acf on Jun 4, 2026
konard commented Oct 25, 2025

Summary

Closes #8.

Adds a live integration test proving web-capture can actually download the Wikipedia page (https://en.wikipedia.org/wiki/Wikipedia) as both Markdown and an image (PNG screenshot) in every supported engine, across both implementations in this monorepo:

| Implementation | Supported engine(s) | New test |
| --- | --- | --- |
| JS (js/) | Puppeteer, Playwright | js/tests/integration/wikipedia-download.test.js |
| Rust (rust/) | chromiumoxide (chrome) | rust/tests/integration/wikipedia_download.rs |

After merging the latest main into the branch, the work was rethought and applied consistently across implementation, CI wiring, docs, and changeset in this single PR.

What's included

  • JS test: describe.each(['puppeteer', 'playwright']) covers both engines for Markdown + PNG, plus an explicit engine-parity test that downloads with both. It uses the existing createBrowser, convertHtmlToMarkdown, and retry helpers; navigation retries with exponential backoff. It validates the PNG magic number and checks that the Markdown carries real content without raw HTML scaffolding.
  • Rust test: 2 offline assertions (the supported engine is chromiumoxide; the URL routes through the generic browser path, not Google Docs) plus 2 live #[tokio::test]s that render Markdown via render_html + convert_html_to_markdown and capture a PNG via capture_screenshot. Live tests short-circuit when no Chrome/Chromium binary is present.
  • CI: dedicated live steps added to both .github/workflows/js.yml and .github/workflows/rust.yml, mirroring the existing Habr / Google Docs integration steps.
  • Docs: js/README.md and rust/README.md document the new WIKIPEDIA_INTEGRATION opt-in alongside the existing integration suites.
  • Changeset: js/.changeset/wikipedia-download-test.md (patch) for the JS package.
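
The two output checks named in the JS bullet (the PNG magic number, and Markdown free of raw HTML scaffolding) could look roughly like the sketch below. This is an illustration only, not the code in wikipedia-download.test.js: the helper names isPng and looksLikeCleanMarkdown, the minLength threshold, and the tag list are all assumptions. The one fixed fact is the 8-byte PNG signature.

```javascript
// Hypothetical helpers; names and thresholds are assumptions, not the PR's code.

// Every PNG file starts with these 8 bytes: 89 50 4E 47 0D 0A 1A 0A.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Returns true when the bytes begin with the PNG signature.
// Accepts a Buffer or a plain Uint8Array (Buffer is a Uint8Array subclass).
function isPng(data) {
  if (!(data instanceof Uint8Array)) return false;
  const head = Buffer.from(data.subarray(0, PNG_SIGNATURE.length));
  return head.equals(PNG_SIGNATURE);
}

// Returns true when the Markdown mentions the article subject, is long
// enough to be real content, and contains no leftover structural HTML tags.
function looksLikeCleanMarkdown(markdown, minLength = 200) {
  if (typeof markdown !== 'string' || markdown.length < minLength) return false;
  const hasSubject = /wikipedia/i.test(markdown);
  const hasScaffolding = /<\s*(?:html|head|body|script|style)\b/i.test(markdown);
  return hasSubject && !hasScaffolding;
}
```

Checking the signature rather than the file extension or a byte count guards against an engine returning an error page or an empty buffer that still gets written to a .png file.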

Gating

Live tests are skipped by default so offline/default CI stays deterministic. Enable them with:

# JS
WIKIPEDIA_INTEGRATION=true npm test -- --testPathPattern="wikipedia-download"
# Rust
WIKIPEDIA_INTEGRATION=1 cargo test --test integration wikipedia_download::live -- --nocapture
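
On the JS side, an opt-in gate of this kind is often implemented as a small environment check that selects between describe and describe.skip. The sketch below assumes that shape. The helper name isLiveEnabled is hypothetical, and accepting both `true` and `1` is an assumption made so that one helper covers both commands above; the actual suites may parse the variable differently.

```javascript
// Hypothetical gate; the real test file may read the variable differently.
// Treats WIKIPEDIA_INTEGRATION=true or WIKIPEDIA_INTEGRATION=1 as "enabled".
function isLiveEnabled(env = process.env) {
  const value = String(env.WIKIPEDIA_INTEGRATION ?? '').trim().toLowerCase();
  return value === 'true' || value === '1';
}

// In a Jest test file the flag would then pick the suite runner, for example:
//   const describeLive = isLiveEnabled() ? describe : describe.skip;
//   describeLive('wikipedia download (live)', () => { /* network tests */ });
```

With the variable unset, the live suite is reported as skipped rather than silently absent, which keeps default runs deterministic while still showing that the tests exist.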

Verification (local)

  • JS live suite: 5 passed (Puppeteer + Playwright Markdown/image + parity).
  • Rust: offline 4 passed, live 2 passed.
  • cargo fmt --check, cargo clippy, prettier --check, eslint (0 errors), and jscpd duplication check all clean.

Adding CLAUDE.md with task information for AI processing.
This file will be removed when the task is complete.

Issue: undefined
konard self-assigned this Oct 25, 2025
Add comprehensive integration tests that verify the ability to download
Wikipedia pages (https://en.wikipedia.org/wiki/Wikipedia) in all three
formats (HTML, Markdown, and screenshot) using both Puppeteer and
Playwright browser engines.

Test coverage includes:
- Downloading Wikipedia page as HTML (both engines)
- Converting Wikipedia page to Markdown (both engines)
- Capturing Wikipedia page as PNG screenshot (both engines)
- Comparing content download across both engines

Also updated jest.config.mjs to include integration tests in the test
suite.

Fixes #8

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
konard changed the title from "[WIP] Add test that we can actually download Wikipedia page (markdown + image) in all our supported engines" to "Add Wikipedia page download test for all supported engines" on Oct 25, 2025
konard marked this pull request as ready for review October 25, 2025 08:21

konard commented Oct 25, 2025

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

📎 Log file uploaded as GitHub Gist (279KB)
🔗 View complete solution draft log


The working session has now ended; feel free to review and add any feedback on the solution draft.


konard commented Jun 1, 2026

Get the latest changes from the default branch.

Make sure this feature is rethought and properly implemented for all APIs we have in the codebase, and that the docs are updated.

We need to ensure all changes are correct, consistent, validated, tested, documented, logged, and fully meet each and all discussed requirements in the deepest and widest possible sense (check the issue description and all comments in the issue and in the pull request, and list each and every requirement before actually checking whether it was addressed). Nothing should be deferred or delayed. The scope is the entire repository: every change in one place should also be applied in all places in the codebase and docs. Ensure all CI/CD checks pass.

Please plan and execute everything in this single pull request. You have unlimited time and context, as context auto-compacts and you can continue indefinitely, until each and every requirement is fully addressed and everything is totally done.

konard marked this pull request as draft June 1, 2026 21:04

konard commented Jun 1, 2026

🤖 AI Work Session Started

Starting automated work session at 2026-06-01T21:04:08.881Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait for the session to finish, and provide your feedback.

konard added 2 commits June 1, 2026 21:07
# Conflicts:
#	js/jest.config.mjs
#	js/tests/integration/wikipedia-download.test.js
…ines (#8)

Add live integration tests that actually download the Wikipedia article
(https://en.wikipedia.org/wiki/Wikipedia) as both Markdown and a PNG image
for every supported browser engine, across both implementations:

- JS: Puppeteer and Playwright (js/tests/integration/wikipedia-download.test.js)
- Rust: chromiumoxide (rust/tests/integration/wikipedia_download.rs)

The suites are gated behind WIKIPEDIA_INTEGRATION and wired into the JS and
Rust CI workflows as dedicated live steps, mirroring the existing Habr and
Google Docs integration suites. Docs (both READMEs) and a JS changeset are
updated accordingly.
konard marked this pull request as ready for review June 1, 2026 21:25

konard commented Jun 1, 2026

Working session summary

Done. Issue #8 is fully resolved and PR #10 is ready with all CI checks green.

Summary

PR #10 (issue-8-2bf69acf → main) is now ready for review and mergeable, and all CI checks pass.

What was delivered

Tests that actually download the Wikipedia page (https://en.wikipedia.org/wiki/Wikipedia) as both Markdown and an image (PNG) in every supported engine, across both implementations in the monorepo:

| Implementation | Engines | Test file |
| --- | --- | --- |
| JS | Puppeteer, Playwright | js/tests/integration/wikipedia-download.test.js |
| Rust | chromiumoxide | rust/tests/integration/wikipedia_download.rs |

Plus: CI live steps in both js.yml and rust.yml, docs updates (js/README.md, rust/README.md), and a JS changeset.

Verified on CI (not just locally)

  • JS live step: Tests: 5 passed — Puppeteer + Playwright × (Markdown + PNG) + engine parity.
  • Rust live step: live_download_wikipedia_page_as_markdown ... ok and ...as_image ... ok.
  • All other jobs green: JS/Rust lint+format, changeset check, publishable-changes check, Rust Test on ubuntu/macOS/windows, Rust Build.

Process notes

  • Merged latest main into the branch first, resolving the DIRTY conflict (the branch's old flat layout vs. main's js/ + rust/ split).
  • Live suites are gated behind WIKIPEDIA_INTEGRATION so default/offline runs stay deterministic, mirroring the existing Habr and Google Docs suites.
  • Working tree is clean; everything committed and pushed to issue-8-2bf69acf (the only branch touched).

This summary was automatically extracted from the AI working session output.


konard commented Jun 1, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $5.823867

📊 Context and tokens usage:

Claude Opus 4.8: (2 sub-sessions)

  1. 116.5K / 1M (12%) input tokens, 28.3K / 128K (22%) output tokens
  2. 53.5K / 1M (5%) input tokens, 6.7K / 128K (5%) output tokens

Total: (13.4K new + 194.7K cache writes + 7.1M cache reads) input tokens, 40.5K output tokens, $5.823867 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Model: Claude Opus 4.8 (claude-opus-4-8)

📎 Log file uploaded as Gist (2610KB)


The working session has now ended; feel free to review and add any feedback on the solution draft.


konard commented Jun 1, 2026

✅ Ready to merge

This pull request is now ready to be merged:

  • All CI checks have passed
  • No merge conflicts
  • No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag

konard merged commit 2f99daf into main on Jun 4, 2026
18 checks passed
