
Fix disabled pageBreakBefore values in markdown output #1

Open
Bruce-anle wants to merge 1 commit into sudipnext:main from Bruce-anle:fix/pagebreak-before-onoff

Conversation

@Bruce-anle

Summary

  • respect OpenXML CT_OnOff semantics for w:pageBreakBefore
  • treat missing w:val as enabled, but explicit 0/false/off as disabled
  • add focused markdown converter tests for enabled and disabled pageBreakBefore values

Why

DOCX files can contain <w:pageBreakBefore w:val="0"/> to explicitly disable page-break-before. The previous converter treated the mere presence of the element as enabled, so it emitted spurious page break markers for paragraphs where the break was switched off.
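
For illustration, the CT_OnOff rule described above could be sketched roughly as follows. This is not the code from this PR: the names `on_off_enabled` and `has_page_break_before` are hypothetical, and the sketch uses the standard-library `xml.etree.ElementTree` rather than whatever XML layer the converter actually uses.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, used for the w: prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_VAL = f"{{{W_NS}}}val"

# Explicit values that switch a CT_OnOff property off.
_FALSE_VALUES = {"0", "false", "off"}


def on_off_enabled(element):
    """Hypothetical helper: interpret an element with CT_OnOff semantics.

    Absent element -> False, element without w:val -> True,
    w:val of 0/false/off -> False, any other value -> True.
    """
    if element is None:
        return False
    val = element.get(W_VAL)
    if val is None:
        return True
    return val not in _FALSE_VALUES


def has_page_break_before(paragraph):
    """Hypothetical helper: True if a w:p element requests a page break before it."""
    ppr = paragraph.find(f"{{{W_NS}}}pPr")
    if ppr is None:
        return False
    return on_off_enabled(ppr.find(f"{{{W_NS}}}pageBreakBefore"))


def make_paragraph(attr=""):
    """Build a minimal w:p with a w:pageBreakBefore element for experiments."""
    xml = (
        f'<w:p xmlns:w="{W_NS}"><w:pPr>'
        f"<w:pageBreakBefore{attr}/>"
        "</w:pPr></w:p>"
    )
    return ET.fromstring(xml)
```

With this sketch, `has_page_break_before(make_paragraph())` is true because `w:val` is missing, while `has_page_break_before(make_paragraph(' w:val="0"'))` is false, which is the case the old converter got wrong.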

Tests

  • python -m pytest tests -q -p no:cacheprovider

Background: OpenXML CT_OnOff treats missing w:val as enabled, but explicit 0/false/off as disabled. The markdown converter previously treated any w:pageBreakBefore element as enabled, producing false Page Break markers.

Changes: add CT_OnOff helper and tests for missing, true, and false pageBreakBefore values.

Verification: /home/brucean/doc4agent/.venv/bin/python -m pytest tests -q -p no:cacheprovider passed. A smoke run with the fix converted the fumin/wenling samples that contain false pageBreakBefore values with 0 page break markers, while the official samples that omit w:val retained their page break markers.
