Releases: ixnode/crawl4md

v0.1.7

17 May 00:15

[0.1.7] - 2026-05-17

Added

  • Align README import examples with current crawl4md __init__ exports

Full Changelog: v0.1.6...v0.1.7

v0.1.6

16 May 23:50

[0.1.6] - 2026-05-17

Added

  • Prefix extracted image text with Figure: "..." in remove_images fixtures
  • Localize remove_images figure labels from HTML language with en fallback
  • Propagate detected page language through fetch/convert pipeline into all rule apply calls
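
The figure-label entries above can be illustrated with a minimal sketch. The label table, the function names, and the handling of the HTML lang value are all assumptions made for illustration; they are not taken from crawl4md's code.

```python
from __future__ import annotations

# Hypothetical label table; the real mapping in crawl4md may differ.
FIGURE_LABELS = {
    "en": "Figure",
    "de": "Abbildung",
    "es": "Figura",
}


def figure_label(lang: str | None) -> str:
    """Return the figure label for an HTML lang value, falling back to English."""
    if not lang:
        return FIGURE_LABELS["en"]
    # "de-DE" and "de_AT" both reduce to the primary subtag "de".
    primary = lang.strip().lower().replace("_", "-").split("-")[0]
    return FIGURE_LABELS.get(primary, FIGURE_LABELS["en"])


def image_to_text(alt_text: str, lang: str | None = None) -> str:
    """Render extracted image text in the form Label: "text"."""
    return f'{figure_label(lang)}: "{alt_text}"'
```

A page whose language is missing or not in the table gets the English label, which is one plausible reading of "en fallback".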

Refactored

  • Move markdown-converter session group resolution into test suite
  • Move internal helper functions to crawl4md.utils.check_helpers
  • Move Markdown converter session pydantic models out of tests
  • Extract progress printing helper from tests support module
  • Extract Markdown converter session loading and HTML normalization helpers
  • Extract Markdown converter fetcher/session discovery helpers from tests
  • Extract pipeline check cases into helper class without behavior change
  • Auto-derive progress checks from helper method list
  • Introduce shared ProgressChecksTestCase base for pipeline/profile progress-driven checks
  • Inline HTML fetching/normalization into BaseMarkdownFetcher and remove fetch/html.py
  • Rename Markdown fetcher modules/classes to HTML fetcher naming and update imports
  • Move test progress runner from tests/support into crawl4md.utils and update imports
  • Move preprocessing RuleCase/data-provider helpers into crawl4md.utils and update test imports
  • Move language label mapping into crawl4md/i18n/labels module
  • Move frames module into crawl4md.utils and update check helper imports
  • Move check CLI module into crawl4md.commands and update script entry points
  • Move top-level runtime modules into crawl4md.core and update imports/entrypoint

Test

  • Migrate crawl4ai content-selector unit checks to session fixtures
  • Remove direct unittest main guard from Markdown converter test module
  • Replace private _asyncioRunner access with get_running_loop in asyncSetUp
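
The last entry above swaps a private attribute for a public API. A minimal sketch of that pattern, assuming a unittest.IsolatedAsyncioTestCase subclass (the class and test names here are hypothetical):

```python
import asyncio
import unittest


class LoopAwareTestCase(unittest.IsolatedAsyncioTestCase):
    """Hypothetical test case that needs a reference to its event loop."""

    async def asyncSetUp(self) -> None:
        # asyncSetUp is awaited inside the test's event loop, so the public
        # asyncio.get_running_loop() is enough; no private runner attribute
        # of the test case has to be touched.
        self.loop = asyncio.get_running_loop()

    async def test_loop_is_running(self) -> None:
        # The test body runs on the same loop that asyncSetUp captured.
        self.assertIs(self.loop, asyncio.get_running_loop())
        self.assertTrue(self.loop.is_running())
```

Relying on the public call keeps the test independent of unittest internals, which can change between Python versions.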

Style

  • Reduce panel title padding in test headers

Full Changelog: v0.1.5...v0.1.6

v0.1.5

16 May 18:35

[0.1.5] - 2026-05-03

Added

  • Add data-driven Markdown converter tests
  • Suppress Crawl4AI output during Markdown conversion
  • Add Boeing 707 Wikipedia converter fixture
  • Add Boeing 707 markdown-fit converter fixture
  • Document Markdown converter fixture conventions
  • Add crawl step duration output
  • Add group filtering to Markdown converter checks
  • Handle missing Markdown converter test groups
  • Add list fixture for Markdown preprocessing
  • Normalize spacing between Markdown list items
  • Add kreuzberg-dev Markdown parser
  • Add preprocessing option to normalize Markdown tables
  • Split whitespace and linebreak normalization rules
  • Add preprocessing option to remove cite links
  • Move crawled output to the crawled directory and store docs separately
  • Add base classes for markdown converters and fetchers

Refactored

  • Rename Markdown fetcher and converter for Crawl4AI

Full Changelog: v0.1.4...v0.1.5

v0.1.4

02 May 22:47

[0.1.4] - 2026-05-03

Fixed

  • Suppress crawl4ai SyntaxWarning under Python 3.14
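
One common way to silence a SyntaxWarning emitted while a third-party module is compiled is to import it inside a temporary warnings filter. The sketch below shows that general technique; the helper name is hypothetical and the release notes do not say how crawl4md actually does it.

```python
import importlib
import warnings
from types import ModuleType


def import_quietly(module_name: str) -> ModuleType:
    """Import a module while ignoring SyntaxWarning raised during compilation."""
    with warnings.catch_warnings():
        # The filter applies only inside this block; the previous warning
        # configuration is restored on exit.
        warnings.simplefilter("ignore", category=SyntaxWarning)
        return importlib.import_module(module_name)
```

Usage would look like import_quietly("crawl4ai"), assuming that package is installed. The filter has no effect if the module was already imported earlier, because a cached module is not compiled again.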

Full Changelog: v0.1.3...v0.1.4

v0.1.3

02 May 21:46

[0.1.3] - 2026-05-02

Added

  • Add realistic HTTP headers to HtmlFetcher to prevent 403 responses
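
Some servers answer 403 to requests that carry a default library User-Agent. A minimal sketch of sending browser-like headers with the standard library follows; the header values and the build_request helper are illustrative assumptions, not the ones HtmlFetcher uses.

```python
from urllib.request import Request

# Illustrative header set; the values are assumptions for this sketch.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url: str) -> Request:
    """Build a GET request that carries browser-like headers."""
    return Request(url, headers=BROWSER_HEADERS)
```

The request can then be passed to urllib.request.urlopen(build_request(url)). Whether a given site accepts it still depends on that site's own bot protection.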

Full Changelog: v0.1.2...v0.1.3

v0.1.2

02 May 20:55

[0.1.2] - 2026-05-02

Added

  • Update README for PyPI package usage and clarify batch crawler setup

Full Changelog: v0.1.1...v0.1.2

v0.1.1

02 May 20:46

[0.1.1] - 2026-05-02

Added

  • Add uv check command for tests and Ruff linting
  • Export public Python API and expand README with usage and crawl4ai context

Refactored

  • Split fetch_markdown into fetch and convert layers
  • Move markdown preprocessing from CLI into convert pipeline
  • Refactor markdown fetch/convert into classes and add sync APIs
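
The fetch/convert split with a sync entry point can be pictured roughly as follows. Every class and function name in this sketch is hypothetical, the fetcher returns a canned page instead of doing network I/O, and the converter handles only a single tag; it shows the layering, not crawl4md's API.

```python
import asyncio
import re


class StubFetcher:
    """Fetch layer: returns raw HTML. A canned page stands in for network I/O."""

    async def fetch(self, url: str) -> str:
        await asyncio.sleep(0)  # placeholder for an awaited HTTP call
        return f"<h1>Page for {url}</h1>"

    def fetch_sync(self, url: str) -> str:
        # Sync wrapper for callers that have no running event loop.
        return asyncio.run(self.fetch(url))


class StubConverter:
    """Convert layer: turns HTML into Markdown, independent of the fetch step."""

    def convert(self, html: str) -> str:
        # Toy conversion that only handles <h1>; a real converter does far more.
        return re.sub(r"<h1>(.*?)</h1>", r"# \1", html).strip()


def fetch_markdown_sync(url: str) -> str:
    """Compose the two layers: fetch HTML first, then convert it."""
    return StubConverter().convert(StubFetcher().fetch_sync(url))
```

Keeping the layers separate means a converter can be tested against stored HTML without any fetching. Note that asyncio.run raises RuntimeError when called from inside a running event loop, so the sync wrapper is only meant for plain synchronous callers.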

Full Changelog: v0.1.0...v0.1.1

v0.1.0 - Initial Release

02 May 19:14

[0.1.0] - 2026-05-02

Added

  • Initial release
  • CLI for crawling single pages and sitemaps
  • YAML-based project configuration
  • Deterministic Markdown file output
  • Support for multiple Markdown extraction modes
  • Configurable Markdown preprocessing pipeline
  • Automatic cleanup of common wiki and web artifacts
  • Automatic removal of reference and appendix sections
  • Whitespace and document structure normalization
  • Automatic insertion of missing top-level headings
  • Clear separation of crawling, preprocessing, and file writing
  • Basic test coverage for core Markdown processing

Full Changelog: https://github.com/ixnode/crawl4md/commits/v0.1.0