
fix(pdf-quality): preserve typography in Phase 3 + ratchet R1 log #242

Merged
mrviduus merged 1 commit into main from fix/pdf-cleanup-prompt-typography on May 23, 2026

@mrviduus
Owner

Summary

Follow-up from the first two real Phase 3 cleanup pairs.

Prompt tightening — both pairs showed Claude normalizing smart/curly
quotes, typographic apostrophes, and em/en-dashes to ASCII. The
preservation gate doesn't catch this (it compares word-token multisets;
punctuation doesn't affect tokens). The prompt was the only enforcement
and didn't mention typography. Added an explicit "preserve typography
verbatim" rule.
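A minimal sketch of why the gate can't see this, assuming it lowercases the text and compares multisets of \w+ tokens. word_token_multiset, typography_counts, and the sample sentence are hypothetical illustrations, not the repo's code. Curly apostrophes and dashes split tokens exactly like their ASCII replacements, so both versions yield the same multiset:

```python
import re
from collections import Counter

def word_token_multiset(text: str) -> Counter:
    # Hypothetical stand-in for the preservation gate's tokenizer (assumption):
    # lowercase, keep only \w+ runs, so all punctuation is ignored.
    return Counter(re.findall(r"\w+", text.lower()))

# Left/right single quotes, left/right double quotes, en dash, em dash.
TYPOGRAPHIC = "\u2018\u2019\u201c\u201d\u2013\u2014"

def typography_counts(text: str) -> Counter:
    # Hypothetical helper: count only the typographic characters.
    return Counter(ch for ch in text if ch in TYPOGRAPHIC)

original = "\u201cIt\u2019s a model\u2014not a product,\u201d she said."
cleaned = "\"It's a model--not a product,\" she said."  # ASCII-normalized

gate_passes = word_token_multiset(original) == word_token_multiset(cleaned)
lost = typography_counts(original) - typography_counts(cleaned)

print(gate_passes)  # True
print(sum(lost.values()), "typographic characters lost")
```

The token comparison reports a match while typography_counts shows every curly quote, apostrophe, and dash gone, which is the gap the new prompt rule is meant to cover.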

Ratchet round 1 documented in feat-0007-pdf-content-quality.md:

  • What got encoded as deterministic code (the running-header regex from
    PR #241, "feat(pdf-quality) [slice 5 r1]: drop O'Reilly running headers"; note its
    immediate effect: AI Engineering content chapters jumped from ~65 to ~90 on re-upload).
  • What was a prompt-only adjustment (this typography fix).
  • What stays with the LLM and why (2-column de-interleaving, inline
    section-heading extraction — both observed in ch1 Cover, both too hard
    to do reliably with current geometry-less paragraphs).
  • A short "how to run the next round" checklist.
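The running-header rule is the kind of fix that moves cleanly into code. A minimal sketch, assuming headers arrive as standalone paragraphs shaped like O'Reilly's "page | Chapter N: Title" and "Section Title | page" lines; LEFT_HEADER, RIGHT_HEADER, and drop_running_headers are hypothetical and are not the regex from #241:

```python
import re

# Hypothetical patterns (assumption): O'Reilly-style running headers such as
# "12 | Chapter 1: Introduction" and "The Rise of AI Engineering | 13".
LEFT_HEADER = re.compile(r"^\d+\s*\|\s*Chapter\s+\d+:\s*\S.*$")
RIGHT_HEADER = re.compile(r"^[^|\n]{1,80}\s*\|\s*\d+$")

def drop_running_headers(paragraphs: list[str]) -> list[str]:
    """Return the paragraphs that do not look like a running header."""
    kept = []
    for para in paragraphs:
        line = para.strip()
        if LEFT_HEADER.match(line) or RIGHT_HEADER.match(line):
            continue  # looks like a running header: drop it
        kept.append(para)
    return kept

paras = [
    "12 | Chapter 1: Introduction to Building AI Applications",
    "Foundation models changed how applications are built.",
    "The Rise of AI Engineering | 13",
]
print(drop_running_headers(paras))
# ['Foundation models changed how applications are built.']
```

Line-level patterns like these would also drop a genuine short paragraph ending in "| <number>", so a real rule likely needs more context than this sketch uses.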

Tests

bash -n clean. Docs + script only.

🤖 Generated with Claude Code

The first two cleanup pairs (AI Engineering ch5 + ch1 Cover) both showed
Claude normalizing smart/curly quotes, typographic apostrophes and dashes
to ASCII. The preservation gate compares word tokens, so it doesn't catch
this — the prompt is the only enforcement. Added an explicit "preserve
typography verbatim" rule.

Documents ratchet round 1 in feat-0007: what got encoded (running-header
regex), what was a prompt-only tweak (typography), and what stays with
the LLM (2-column de-interleave, inline section heading extraction —
both too hard for deterministic rules right now). Includes a "how to run
the next round" checklist for whoever picks this up next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mrviduus merged commit ee79d97 into main on May 23, 2026
8 of 9 checks passed
mrviduus deleted the fix/pdf-cleanup-prompt-typography branch on May 23, 2026, 15:40
