Fix markdown conversion for headers and footers #5

Open
Bruce-anle wants to merge 1 commit into sudipnext:main from Bruce-anle:fix/header-footer-markdown
@Bruce-anle

Summary

  • allow markdown body parsing to traverse w:hdr and w:ftr roots
  • keep existing w:document / w:body parsing behavior unchanged
  • add focused tests for header, footer, and document body roots

Why

convert_to_markdown already reads word/header*.xml and word/footer*.xml, but before this change it passed their w:hdr/w:ftr roots straight to parse_body_to_markdown. That parser only looked for a w:body element, which those roots do not contain, so header and footer content was skipped.
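To illustrate the failure mode, the sketch below builds a stripped-down archive in memory (not a valid .docx, just the three parts of interest) and checks which parts have a `w:body` child. The helper names `build_sample_archive` and `body_lookup_by_part` are assumptions made for this example and do not come from the repository.

```python
import fnmatch
import io
import xml.etree.ElementTree as ET
import zipfile

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
PARAGRAPH = "<w:p><w:r><w:t>text</w:t></w:r></w:p>"


def build_sample_archive():
    """Return bytes of a minimal zip holding document, header, and footer parts."""
    parts = {
        "word/document.xml": f'<w:document xmlns:w="{W_NS}"><w:body>{PARAGRAPH}</w:body></w:document>',
        "word/header1.xml": f'<w:hdr xmlns:w="{W_NS}">{PARAGRAPH}</w:hdr>',
        "word/footer1.xml": f'<w:ftr xmlns:w="{W_NS}">{PARAGRAPH}</w:ftr>',
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, xml in parts.items():
            archive.writestr(name, xml)
    return buffer.getvalue()


def body_lookup_by_part(archive_bytes):
    """Map each relevant part name to whether its root has a w:body child."""
    patterns = ("word/document.xml", "word/header*.xml", "word/footer*.xml")
    found = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
                root = ET.fromstring(archive.read(name))
                found[name] = root.find(f"{{{W_NS}}}body") is not None
    return found


if __name__ == "__main__":
    for part, has_body in body_lookup_by_part(build_sample_archive()).items():
        print(part, "has w:body:", has_body)
```

Only word/document.xml reports a `w:body`. A parser that returns early when that lookup fails therefore produces nothing for the header and footer parts, even though they were read from the archive.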

Tests

  • /home/brucean/doc4agent/.venv/bin/python -m pytest tests -q -p no:cacheprovider

Background: convert_to_markdown reads word/header*.xml and word/footer*.xml, but passed w:hdr/w:ftr roots to parse_body_to_markdown. That parser only looked for w:body, so header/footer content was skipped.

Changes: allow parse_body_to_markdown to traverse w:hdr and w:ftr roots directly while preserving normal w:document/w:body behavior.

Verification: /home/brucean/doc4agent/.venv/bin/python -m pytest tests -q -p no:cacheprovider passed.