Reliable, deterministic output is a core goal of crawl4md.
Before merging changes, run the relevant checks to ensure behavior stays stable.

- Run a focused check while implementing (`check-preprocessing remove_lines`, etc.).
- Run a group check (`check-preprocessing`, `check-profile`, `check-pipeline`, ...).
- Run the full suite (`check`) before finalizing changes.

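
The narrow-to-broad order above can be scripted. The following is a minimal sketch, not part of crawl4md: `build_check_sequence` and `run_checks` are hypothetical helper names, and the only assumption is that the `uv run check*` commands listed in this section are available on the path.

```python
from __future__ import annotations

import subprocess


def build_check_sequence(rule: str) -> list[list[str]]:
    """Return the three stages in narrow-to-broad order (hypothetical helper)."""
    return [
        ["uv", "run", "check-preprocessing", rule],  # focused check for one rule
        ["uv", "run", "check-preprocessing"],        # group check for all rules
        ["uv", "run", "check"],                      # full project suite
    ]


def run_checks(rule: str) -> int:
    """Run each stage in turn and stop at the first non-zero exit code."""
    for command in build_check_sequence(rule):
        result = subprocess.run(command)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `run_checks("remove_lines")` would run the focused check first, so a failing rule is reported before the slower stages start.
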
| Command | Scope | Parameter(s) | What it validates |
|---|---|---|---|
| `uv run check` | Full project | none | Runs all project checks (converter, profile, pipeline, preprocessing, language, ruff). |
| `uv run check-markdown-converter` | Converter fixture suite | optional group, optional `--update` | End-to-end HTML → Markdown fixture validation under `tests/data/markdown_converter`. |
| `uv run check-markdown-converter <group>` | Converter subgroup | group path (e.g. `wikipedia`, `preprocessing`) | Runs only fixture sessions inside one fixture subtree. |
| `uv run check-markdown-converter <group> --update` | Converter subgroup + fixture update | group, `--update` | Rewrites expected `data.md` outputs for that group to current converter output. |
| `uv run check-profile` | Profile merging tests | none | Validates profile defaults, overrides, and unknown-profile error handling. |
| `uv run check-pipeline` | Preprocessing pipeline orchestration | none | Verifies rule ordering and enabled/disabled pipeline behavior. |
| `uv run check-preprocessing` | All preprocessing rule groups | none | Runs grouped rule tests (`ensure_h1`, `remove_links`, `normalize_tables`, etc.). |
| `uv run check-preprocessing <rule>` | Single preprocessing rule group | rule name (e.g. `remove_lines`) | Runs only one rule test module (`tests/preprocessing/test_<rule>.py`). |
| `uv run check-language` | HTML language detection | none | Validates metadata-based language extraction against `tests/data/html/*`. |
| `uv run check-ruff` | Linting | none | Runs static checks (`ruff check`). |

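
To illustrate the `<rule>` parameter of `check-preprocessing`, here is a minimal sketch of how a rule name could map to its test module. Only the `tests/preprocessing/test_<rule>.py` pattern comes from this document. The function name `rule_test_target` and the validation step are assumptions, and the real entry point may work differently.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Directory named in the table above; everything else here is illustrative.
TESTS_ROOT = PurePosixPath("tests/preprocessing")


def rule_test_target(rule: str | None = None) -> str:
    """Return the test path for one rule, or the whole directory if no rule is given."""
    if rule is None:
        return str(TESTS_ROOT)
    # Reject anything that is not a plain identifier (e.g. path fragments).
    if not rule.isidentifier():
        raise ValueError(f"invalid rule name: {rule!r}")
    return str(TESTS_ROOT / f"test_{rule}.py")
```
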

Run one preprocessing rule while iterating:

    uv run check-preprocessing remove_lines
    uv run check-preprocessing normalize_tables

Run all preprocessing checks before touching shared logic:

    uv run check-preprocessing

Run only Wikipedia converter fixtures:

    uv run check-markdown-converter wikipedia

Update expected markdown snapshots after intentional converter changes:

    uv run check-markdown-converter wikipedia --update

Final verification before commit:

    uv run check

The check suite:

- Prevents accidental regressions in Markdown output.
- Keeps profile defaults and overrides trustworthy.
- Ensures parser and preprocessing changes remain deterministic.
- Makes refactoring safer by validating behavior instead of assumptions.
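
The fixture checks and the `--update` flag follow the usual snapshot-testing pattern, which can be sketched as below. This is an illustrative sketch under stated assumptions, not the project's implementation: `check_fixture` is a hypothetical name, `convert` stands in for the real converter, and `input.html` is an assumed input file name. Only `data.md` is taken from this document.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def check_fixture(
    fixture_dir: Path,
    convert: Callable[[str], str],
    update: bool = False,
) -> bool:
    """Compare converter output with the stored snapshot, or rewrite it."""
    # "input.html" is an assumed file name; "data.md" is the expected output.
    html = (fixture_dir / "input.html").read_text(encoding="utf-8")
    expected_path = fixture_dir / "data.md"
    actual = convert(html)

    if update:
        # Accept the current output as the new expected snapshot.
        expected_path.write_text(actual, encoding="utf-8")
        return True

    # A missing snapshot counts as a failure rather than a pass.
    if not expected_path.exists():
        return False
    return expected_path.read_text(encoding="utf-8") == actual
```

Because the comparison is an exact string match, any nondeterminism in the converter shows up as a failing fixture, which is why updates should only follow intentional changes.
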