fix(pdf-quality): preserve typography in Phase 3 + ratchet R1 log #242
Merged
The first two cleanup pairs (AI Engineering ch5 + ch1 Cover) both showed Claude normalizing smart/curly quotes, typographic apostrophes and dashes to ASCII. The preservation gate compares word tokens, so it doesn't catch this; the prompt is the only enforcement. Added an explicit "preserve typography verbatim" rule.

Documents ratchet round 1 in feat-0007: what got encoded (running-header regex), what was a prompt-only tweak (typography), and what stays with the LLM (2-column de-interleave, inline section heading extraction; both too hard for deterministic rules right now). Includes a "how to run the next round" checklist for whoever picks this up next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Follow-up from the first two real Phase 3 cleanup pairs.
Prompt tightening — both pairs showed Claude normalizing smart/curly
quotes, typographic apostrophes, and em/en-dashes to ASCII. The
preservation gate doesn't catch this (it compares word-token multisets;
punctuation doesn't affect tokens). The prompt was the only enforcement
and didn't mention typography. Added an explicit "preserve typography
verbatim" rule.
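To make the gap concrete, here is a minimal sketch of why a word-token multiset comparison passes even when typography has been flattened to ASCII. The tokenizer (`WORD_RE`), the helper names, and the sample sentence are all assumptions for illustration; the real gate's tokenization is not shown in this PR.

```python
import re
from collections import Counter

# Hypothetical tokenizer: word characters only, so punctuation never
# becomes part of a token. The actual gate may tokenize differently.
WORD_RE = re.compile(r"\w+")

# Curly single/double quotes, en dash, em dash (the characters the
# new prompt rule is meant to protect).
TYPOGRAPHIC = "\u2018\u2019\u201c\u201d\u2013\u2014"


def word_tokens(text: str) -> Counter:
    """Multiset of lowercased word tokens."""
    return Counter(WORD_RE.findall(text.lower()))


def typographic_counts(text: str) -> Counter:
    """Multiset of typographic punctuation characters."""
    return Counter(ch for ch in text if ch in TYPOGRAPHIC)


# Same sentence, before and after ASCII normalization.
original = "\u201cIt\u2019s a trade\u2013off,\u201d she said \u2014 twice."
normalized = "\"It's a trade-off,\" she said - twice."

# The token multisets are identical, so a token-based gate passes.
gate_passes = word_tokens(original) == word_tokens(normalized)

# A character-level comparison does see the difference.
typography_preserved = (
    typographic_counts(original) == typographic_counts(normalized)
)
```

Here `gate_passes` is `True` while `typography_preserved` is `False`, which is the situation described above: the gate sees no change, so only the prompt stands between the source typography and the output.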
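Ratchet round 1 documented in
feat-0007-pdf-content-quality.md:
- What got encoded: the running-header regex (PR feat(pdf-quality) [slice 5 r1]: drop O'Reilly running headers #241; note its immediate effect: AI Engineering content chapters jumped from ~65 to ~90 on re-upload).
- What was a prompt-only tweak: typography.
- What stays with the LLM: 2-column de-interleave and inline section-heading extraction (both observed in ch1 Cover, both too hard to do reliably with current geometry-less paragraphs).
- A "how to run the next round" checklist.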
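For readers unfamiliar with what encoding a running-header rule looks like, the following is a rough sketch only. The pattern, the function name `drop_running_headers`, and the assumed header shapes (page number and title separated by a pipe) are illustrative guesses, not the regex merged in #241.

```python
import re

# Assumed header shapes (NOT taken from PR #241):
#   "12 | Chapter 1: Introduction"   page number first
#   "Section Title | 13"             page number last
RUNNING_HEADER_RE = re.compile(
    r"^\s*(?:\d{1,4}\s*\|\s*\S.*|\S.*\s\|\s*\d{1,4})\s*$"
)


def drop_running_headers(lines):
    """Return the lines that do not look like a running header."""
    return [line for line in lines if not RUNNING_HEADER_RE.match(line)]


page = [
    "12 | Chapter 1: Introduction",
    "Body text with a | pipe character stays.",
    "Section Title | 13",
    "More body text.",
]
cleaned = drop_running_headers(page)
```

A rule like this is cheap to apply before the LLM pass, but it is also blunt: a body line that happens to end in a pipe followed by a number would be dropped too, so a real rule would need checking against actual extracted pages.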
Tests
bash -n clean. Docs + script only.

🤖 Generated with Claude Code