Releases: ixnode/crawl4md
Releases · ixnode/crawl4md
v0.1.7
[0.1.7] - 2026-05-17
Added
- Align README import examples with current crawl4md init exports
Full Changelog: v0.1.6...v0.1.7
v0.1.6
[0.1.6] - 2026-05-17
Added
- Prefix extracted image text with Figure: "..." in remove_images fixtures
- Localize remove_images figure labels from HTML language with en fallback
- Propagate detected page language through fetch/convert pipeline into all rule apply calls
Refactored
- Move markdown-converter session group resolution into test suite
- Move internal helper functions to crawl4md.utils.check_helpers
- Move Markdown converter session pydantic models out of tests
- Extract progress printing helper from tests support module
- Extract Markdown converter session loading and HTML normalization helpers
- Extract Markdown converter fetcher/session discovery helpers from tests
- Extract pipeline check cases into helper class without behavior change
- Auto-derive progress checks from helper method list
- Introduce shared ProgressChecksTestCase base for pipeline/profile progress-driven checks
- Inline HTML fetching/normalization into BaseMarkdownFetcher and remove fetch/html.py
- Rename Markdown fetcher modules/classes to HTML fetcher naming and update imports
- Move test progress runner from tests/support into crawl4md.utils and update imports
- Move preprocessing RuleCase/data-provider helpers into crawl4md.utils and update test imports
- Move language label mapping into crawl4md/i18n/labels module
- Move frames module into crawl4md.utils and update check helper imports
- Move check CLI module into crawl4md.commands and update script entry points
- Move top-level runtime modules into crawl4md.core and update imports/entrypoint
Test
- Migrate crawl4ai content-selector unit checks to session fixtures
- Remove direct unittest main guard from Markdown converter test module
- Replace private _asyncioRunner access with get_running_loop in asyncSetUp
Style
- Reduce panel title padding in test headers
Full Changelog: v0.1.5...v0.1.6
v0.1.5
[0.1.5] - 2026-05-03
Added
- Add data-driven Markdown converter tests
- Suppress Crawl4AI output during Markdown conversion
- Add Boeing 707 Wikipedia converter fixture
- Add Boeing 707 markdown-fit converter fixture
- Document Markdown converter fixture conventions
- Add crawl step duration output
- Add group filtering to Markdown converter checks
- Handle missing Markdown converter test groups
- Add list fixture for Markdown preprocessing
- Normalize spacing between Markdown list items
- Add kreuzberg-dev Markdown parser
- Add preprocessing option to normalize Markdown tables
- Split whitespace and linebreak normalization rules
- Add preprocessing option to remove cite links
- Move crawled output to crawled and store docs separately
- Add base classes for markdown converters and fetchers
Refactored
- Rename Markdown fetcher and converter for Crawl4AI
Full Changelog: v0.1.4...v0.1.5
v0.1.4
[0.1.4] - 2026-05-03
Fixed
- Suppress crawl4ai SyntaxWarning under Python 3.14
Full Changelog: v0.1.3...v0.1.4
v0.1.3
[0.1.3] - 2026-05-02
Added
- Add realistic HTTP headers to HtmlFetcher to prevent 403 responses
Full Changelog: v0.1.2...v0.1.3
v0.1.2
[0.1.2] - 2026-05-02
Added
- Update README for PyPI package usage and clarify batch crawler setup
Full Changelog: v0.1.1...v0.1.2
v0.1.1
[0.1.1] - 2026-05-02
Added
- Add uv check command for tests and Ruff linting
- Export public Python API and expand README with usage and crawl4ai context
Refactored
- Split
fetch_markdowninto fetch and convert layers - Move markdown preprocessing from CLI into convert pipeline
- Refactor markdown fetch/convert into classes and add sync APIs
Full Changelog: v0.1.0...v0.1.1
v0.1.0 - Initial Release
[0.1.0] - 2026-05-02
Added
- Initial release
- CLI for crawling single pages and sitemaps
- YAML-based project configuration
- Deterministic Markdown file output
- Support for multiple Markdown extraction modes
- Configurable Markdown preprocessing pipeline
- Automatic cleanup of common wiki and web artifacts
- Automatic removal of reference and appendix sections
- Whitespace and document structure normalization
- Automatic insertion of missing top-level headings
- Clear separation of crawling, preprocessing, and file writing
- Basic test coverage for core Markdown processing
Full Changelog: https://github.com/ixnode/crawl4md/commits/v0.1.0