fix: break content truncation at semantic boundaries by ntegrals · Pull Request #30 · ntegrals/openbrowser

ntegrals · 2026-04-02T12:46:29Z

Summary

When extractMarkdown() exceeds maxLength, the previous logic only broke at a paragraph boundary if it was in the last 20% of content. Otherwise it sliced mid-word/sentence, producing broken markdown.

Before

# Some Article

This is a long paragraph about...  ← sliced here mid-sentence

[... content truncated, ~500 chars remaining]

After

The truncation now tries boundaries in priority order:

1. Paragraph break ("\n\n"): cleanest cut
2. Sentence ending (". ", ".\n", "? ", "! "): preserves complete thoughts
3. Word boundary (space): avoids mid-word cuts
4. Hard limit: used only if no boundary is found past the 50% mark

All boundaries must be at least 50% into the content to avoid over-truncation.
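
The priority order above can be sketched roughly as follows. This is a minimal illustration, not the code from content-extractor.ts: the names `findCutPoint`, `truncateMarkdown`, and `MIN_KEEP_RATIO` are hypothetical, and it assumes the 50% threshold is measured against `maxLength`.

```typescript
// Hypothetical sketch; names and signatures are illustrative assumptions,
// not taken from packages/core/src/page/content-extractor.ts.
const MIN_KEEP_RATIO = 0.5; // assumed: boundaries must sit at or past 50% of maxLength

function findCutPoint(text: string, maxLength: number): number {
  const head = text.slice(0, maxLength);
  const minCut = Math.floor(maxLength * MIN_KEEP_RATIO);

  // 1. Paragraph break: cut just before the blank line.
  const paragraph = head.lastIndexOf("\n\n");
  if (paragraph >= minCut) return paragraph;

  // 2. Sentence ending: keep the punctuation, drop the whitespace after it.
  let sentence = -1;
  for (const ending of [". ", ".\n", "? ", "! "]) {
    sentence = Math.max(sentence, head.lastIndexOf(ending));
  }
  if (sentence >= minCut) return sentence + 1;

  // 3. Word boundary: cut at the last space.
  const space = head.lastIndexOf(" ");
  if (space >= minCut) return space;

  // 4. Hard limit: no usable boundary past the minimum keep ratio.
  return maxLength;
}

function truncateMarkdown(text: string, maxLength: number): string {
  if (text.length <= maxLength) return text;
  const cut = findCutPoint(text, maxLength);
  const remaining = text.length - cut;
  return `${text.slice(0, cut).trimEnd()}\n\n[... content truncated, ~${remaining} chars remaining]`;
}
```

Each tier only applies when its boundary falls at or after the minimum keep point, so a paragraph break very early in the content does not discard most of the text; the search simply falls through to the next tier.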

Code change

packages/core/src/page/content-extractor.ts lines 247-268

Test plan

bun run build — compiles clean
bun run test — all 364 tests pass

Improve markdown truncation to prefer paragraph breaks, then sentence endings, then word boundaries instead of slicing mid-text. Uses a 50% minimum keep ratio so short content isn't over-truncated.

fix: break content truncation at semantic boundaries

3f2e177

ntegrals wants to merge 1 commit into master from fix/content-truncation


2 participants