
Add StackOverflow download coverage #12

Open
konard wants to merge 5 commits into main from issue-11-2861fe92

Conversation

@konard
Collaborator

@konard konard commented Oct 25, 2025

Resolves #11.

Summary

  • Route StackOverflow question URLs through StackPrinter for direct HTML-derived captures in both JavaScript and Rust, with retries for transient StackPrinter error pages.
  • Cover StackOverflow downloads across supported outputs: text, markdown, HTML, image, archive/zip, DOCX, and PDF.
  • Add JavaScript and Rust live integration tests gated by STACKOVERFLOW_INTEGRATION, plus CI steps that run those StackOverflow checks.
  • Preserve Rust CLI binary stdout by moving tracing output to stderr, and use a less brittle Playwright wait condition for StackOverflow screenshots.
  • Pass the workflow GITHUB_TOKEN to existing GitHub repository live tests so CI uses authenticated API limits.
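
The routing step in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `toStackPrinterUrl` is hypothetical, and the StackPrinter export path and its `service` and `question` query parameters are assumptions about that service's URL scheme.

```javascript
// Hypothetical helper: map a StackOverflow question URL to a StackPrinter export URL.
// The StackPrinter endpoint and query parameter names are assumptions, not taken from this PR.
function toStackPrinterUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // not a valid absolute URL
  }

  const host = url.hostname.replace(/^www\./, "");
  if (host !== "stackoverflow.com") return null;

  // Question URLs look like /questions/<numeric id>/<optional slug>.
  const match = url.pathname.match(/^\/questions\/(\d+)(?:\/|$)/);
  if (!match) return null;

  const printer = new URL("https://www.stackprinter.com/export");
  printer.searchParams.set("service", "stackoverflow");
  printer.searchParams.set("question", match[1]);
  return printer.toString();
}
```

Returning `null` for anything that is not a question URL lets a caller fall back to its normal download path without special-casing.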

Reproduce

Use the issue URL:

https://stackoverflow.com/questions/927358/how-do-i-undo-the-most-recent-local-commits-in-git

Before this change, direct fetch/document outputs could capture StackOverflow's anti-bot challenge page, and Rust archive stdout could be contaminated by tracing logs. The new tests exercise the same URL through JS and Rust download paths.
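
The stdout contamination described above comes from mixing diagnostics into the same stream as the payload. The following is a JavaScript analogue of the idea only; the actual fix in this PR is on the Rust side, and the function name `emitArchive` and its options are hypothetical.

```javascript
// Hypothetical sketch: payload bytes go to stdout, diagnostics go to stderr,
// so that redirecting stdout to a file yields only the archive bytes.
function emitArchive(payload, { out = process.stdout, log = process.stderr } = {}) {
  log.write(`writing ${payload.length} bytes\n`); // diagnostics never touch stdout
  out.write(payload); // only the payload reaches stdout
}
```

With this split, a command such as `tool ... > page.zip` captures a clean archive while log lines still appear in the terminal.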

Verification

  • npm test -- --testPathPattern="(stackoverflow|stackoverflow-download)" --runInBand
  • STACKOVERFLOW_INTEGRATION=true npm test -- --testPathPattern="stackoverflow-download" --runInBand --testTimeout=180000
  • npm run format:check
  • npm run lint
  • npm run check:duplication
  • node ../scripts/validate-changeset.mjs
  • npm test -- --testPathIgnorePatterns="docker.test.js" --runInBand
  • cargo test --test integration stackoverflow_download -- --nocapture
  • STACKOVERFLOW_INTEGRATION=1 cargo test --test integration stackoverflow_download::live -- --nocapture --test-threads=1
  • cargo fmt --all -- --check
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo test --all-features --verbose
  • cargo test --doc --verbose
  • git diff --check -- .github/workflows/js.yml .github/workflows/rust.yml
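
The live commands above set STACKOVERFLOW_INTEGRATION to `true` for npm and to `1` for cargo. A gate that accepts both spellings could look like the sketch below; the helper name `isLiveEnabled` is hypothetical, and which values the real tests accept is an assumption.

```javascript
// Hypothetical gate for live StackOverflow tests.
// Accepting both "1" and "true" is an assumption based on the commands listed above.
function isLiveEnabled(env = process.env) {
  const value = String(env.STACKOVERFLOW_INTEGRATION ?? "").trim().toLowerCase();
  return value === "1" || value === "true";
}
```

In a Jest file this would typically be used as `const liveDescribe = isLiveEnabled() ? describe : describe.skip;`, so the live suite is reported as skipped rather than silently absent when the variable is unset.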

Note: local npm install reported the expected engine warning because this workspace has Node 20.20.2 while the package expects Node >=22 <23; the checks above passed locally, and CI uses its configured Node version.

Adding CLAUDE.md with task information for AI processing.
This file will be removed when the task is complete.

Issue: undefined
@konard konard self-assigned this Oct 25, 2025
…down and image support

This commit implements comprehensive integration tests to verify that both Puppeteer and Playwright engines can:
- Download the StackOverflow page at https://stackoverflow.com/questions/927358/how-do-i-undo-the-most-recent-local-commits-in-git
- Convert the page HTML to markdown format
- Capture screenshots of the page as PNG images

Changes:
- Add StackOverflow download tests for both Puppeteer and Playwright engines in tests/integration/browser-engines.test.js
- Update jest.config.mjs to include integration tests in testMatch pattern
- Fix Playwright adapter's setUserAgent implementation to use route interception (Playwright doesn't have page.setUserAgent())
- Increase timeout for StackOverflow tests to 60000ms for navigation and 90000ms for test completion to handle slower page loads
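
The route-interception approach to `setUserAgent` mentioned in the changes above might look roughly like this. It is a sketch under assumptions, not the adapter code from this commit: the function shape is hypothetical, and it relies only on Playwright's `page.route`, `route.request().headers()`, and `route.continue({ headers })`.

```javascript
// Hypothetical adapter helper: emulate setUserAgent on a Playwright page
// by rewriting the User-Agent header of every outgoing request.
async function setUserAgent(page, userAgent) {
  await page.route("**/*", async (route) => {
    // Keep the original headers and override only the user agent.
    const headers = { ...route.request().headers(), "user-agent": userAgent };
    await route.continue({ headers });
  });
}
```

This changes only the request header, not `navigator.userAgent` inside the page. Where the adapter controls context creation, passing the `userAgent` option to `browser.newContext()` is the option Playwright documents for this purpose.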

All tests pass successfully, confirming that both browser engines work correctly for downloading and processing complex real-world pages.

Fixes #11

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@konard konard changed the title from "[WIP] Add test that we can actually download StackOverflow page (markdown + image) in all our supported engines" to "Add integration tests for StackOverflow page download with markdown and image support" Oct 25, 2025
@konard konard marked this pull request as ready for review October 25, 2025 08:22
@konard
Collaborator Author

konard commented Oct 25, 2025

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

📎 Log file uploaded as GitHub Gist (277KB)
🔗 View complete solution draft log


The working session has ended. Feel free to review and add any feedback on the solution draft.

@konard
Collaborator Author

konard commented Jun 5, 2026

Resolve conflicts, and make sure all download types (txt, markdown, image, HTML, and so on) are fully and correctly supported for StackOverflow in all supported languages.

@konard konard marked this pull request as draft June 5, 2026 11:31
@konard
Collaborator Author

konard commented Jun 5, 2026

🤖 AI Work Session Started

Starting automated work session at 2026-06-05T11:31:44.986Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait for the session to finish, and provide your feedback.

@konard konard changed the title from "Add integration tests for StackOverflow page download with markdown and image support" to "Add StackOverflow download coverage" Jun 5, 2026
@konard konard marked this pull request as ready for review June 5, 2026 12:13
@konard
Collaborator Author

konard commented Jun 5, 2026

Working session summary

Implemented and pushed the fix to PR 12: #12

What changed:

  • StackOverflow question URLs now route through StackPrinter in JS and Rust for stable text/markdown/html-derived captures.
  • Added live StackOverflow coverage for txt, markdown, HTML, image, archive/zip, DOCX, and PDF.
  • Fixed Rust binary stdout contamination by sending tracing to stderr.
  • Added retries for transient StackPrinter errors.
  • Wired GITHUB_TOKEN into existing GitHub live tests to avoid CI rate-limit failures.
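
The retry behavior for transient StackPrinter errors can be sketched as a small wrapper. This is an illustrative version only: the name `withRetries`, the attempt count, the linear backoff, and the `isTransient` predicate are all assumptions rather than the values used in this PR.

```javascript
// Hypothetical retry wrapper with linear backoff.
// The operation is expected to throw when it receives a transient error page.
async function withRetries(operation, { attempts = 3, delayMs = 500, isTransient = () => true } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      // Stop on the final attempt or when the error is not worth retrying.
      if (attempt === attempts || !isTransient(error)) break;
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
  throw lastError;
}
```

A caller would wrap the fetch in `operation` and throw when the response body looks like an error page, so that only recognized transient failures trigger another attempt.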

Status:

  • PR is ready for review.
  • Merge status is CLEAN.
  • Latest CI on 63eeb76 is passing for both JavaScript and Rust.
  • Local worktree is clean.

This summary was automatically extracted from the AI working session output.

@konard
Collaborator Author

konard commented Jun 5, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Model: GPT-5.5
  • Provider: OpenAI
  • Public pricing estimate: $24.863605

📊 Context and tokens usage:

  • 588.9K / 1.1M (56%) input tokens, 68.5K / 128K (54%) output tokens

Total: (588.9K + 15.9M cached) input tokens, 68.5K output tokens, $24.863605 cost

🤖 Models used:

  • Tool: OpenAI Codex
  • Requested: gpt-5.5
  • Model: GPT-5.5 (gpt-5.5)

📎 Log file uploaded as Repository (95643KB)


The working session has ended. Feel free to review and add any feedback on the solution draft.

@konard
Collaborator Author

konard commented Jun 5, 2026

✅ Ready to merge

This pull request is now ready to be merged:

  • All CI checks have passed
  • No merge conflicts
  • No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add test that we can actually download StackOverflow page (markdown + image) in all our supported engines

1 participant