chore: improve CJK emphasis handling in markdown processing #331

Open
3w36zj6 wants to merge 2 commits into main from
feature/improve-cjk-emphasis-handling-in-markdown-processing

@3w36zj6
Member

@3w36zj6 3w36zj6 commented Oct 16, 2025

In typst-docs, we use pulldown-cmark to parse Markdown. This library faithfully implements the CommonMark specification along with several dialects.

However, the CommonMark spec is designed for languages that separate words with spaces, so its emphasis rules behave unexpectedly for CJK text, where words are not separated.

Please add the following markup to an appropriate page:

**この強調は認識されません。**この文のせいで。

これは*強調*です。

- 非分かち書きでは**「鍵括弧」**を強調できない
- 非分かち書きでは**(丸括弧)**を強調できない
- 非分かち書きでは**“ダブルクオーテーション”**を強調できない
- 非分かち書きでは**句読点まで、**強調できない
- 非分かち書きでは**とても難しい?**強調
diff --git a/docs/overview.md b/docs/overview.md
index 9afe778c9..87e7d6c97 100644
--- a/docs/overview.md
+++ b/docs/overview.md
@@ -5,6 +5,16 @@ description: |
 
 # Typstについて
 
+**この強調は認識されません。**この文のせいで。
+
+これは*強調*です。
+
+- 非分かち書きでは**「鍵括弧」**を強調できない
+- 非分かち書きでは**(丸括弧)**を強調できない
+- 非分かち書きでは**“ダブルクオーテーション”**を強調できない
+- 非分かち書きでは**句読点まで、**強調できない
+- 非分かち書きでは**とても難しい?**強調
+
 <div class="info-box">
 
 **はじめに: Typst Japanese Communityより**
<div>
<h1>Typstについて</h1>
<p>**この強調は認識されません。**この文のせいで。</p>
<p>これは<em>強調</em>です。</p>
<ul>
<li>非分かち書きでは**「鍵括弧」**を強調できない</li>
<li>非分かち書きでは**(丸括弧)**を強調できない</li>
<li>非分かち書きでは**“ダブルクオーテーション”**を強調できない</li>
<li>非分かち書きでは**句読点まで、**強調できない</li>
<li>非分かち書きでは**とても難しい?**強調</li>
</ul>
</div>

In such cases, to apply emphasis, you need workarounds like inserting spaces or replacing with HTML tags.
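
The failures above follow from CommonMark's "flanking" rules for delimiter runs. The sketch below is an illustration only, not pulldown-cmark's code: the helper names `is_whitespace`, `is_punctuation`, and `flanking` are hypothetical, and it assumes the CommonMark 0.31 definition of punctuation (Unicode categories P and S).

```python
import unicodedata


def is_whitespace(ch):
    # None marks the start or end of the line, which CommonMark treats as whitespace.
    return ch is None or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"


def is_punctuation(ch):
    # Assumption: CommonMark 0.31 counts general categories P* and S* as punctuation.
    return ch is not None and unicodedata.category(ch)[0] in "PS"


def flanking(text, start, length):
    """Return (left_flanking, right_flanking) for the run text[start:start + length]."""
    before = text[start - 1] if start > 0 else None
    end = start + length
    after = text[end] if end < len(text) else None
    left = not is_whitespace(after) and (
        not is_punctuation(after) or is_whitespace(before) or is_punctuation(before)
    )
    right = not is_whitespace(before) and (
        not is_punctuation(before) or is_whitespace(after) or is_punctuation(after)
    )
    return left, right


# A `*` run can open emphasis only if left-flanking and close only if right-flanking.
line = "非分かち書きでは**「鍵括弧」**を強調できない"
opener = line.index("**")
closer = line.index("**", opener + 2)
print(flanking(line, opener, 2))  # (False, True): cannot open
print(flanking(line, closer, 2))  # (True, False): cannot close
```

The opening `**` sits between a letter and a punctuation mark, so it is not left-flanking, and the closing `**` sits between a punctuation mark and a letter, so it is not right-flanking. Inserting a space on the outer side changes both results, which is why the space workaround helps.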

To address this, we replaced pulldown-cmark with a custom version which supports markdown-cjk-friendly, so emphasis can be applied naturally without such workarounds.

@3w36zj6 3w36zj6 requested a review from Copilot October 16, 2025 15:12
@3w36zj6 3w36zj6 marked this pull request as ready for review October 16, 2025 15:12
Contributor

Copilot AI left a comment

Pull Request Overview

This PR improves CJK (Chinese, Japanese, Korean) emphasis handling in markdown processing by inserting zero-width space HTML entities around emphasis markers when they are adjacent to CJK characters without spaces.

  • Adds preprocessing to detect CJK characters and insert HTML entities around emphasis markers
  • Implements post-processing to remove these entities from final HTML output
  • Includes comprehensive handling of code blocks and inline code spans to avoid affecting literal content
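
As a rough illustration of the pre/post-processing idea summarized above (not the PR's actual code), the sketch below inserts a zero-width space entity between a CJK letter and an adjacent run of `*`, then strips it from the rendered HTML. The entity `&#8203;`, the character ranges, and the function names `preprocess` and `postprocess` are all assumptions, and the sketch does not skip code blocks or inline code spans, which the overview says the PR handles.

```python
import re

# Assumption: the exact entity and character ranges are illustrative, not taken from the PR.
ENTITY = "&#8203;"  # zero-width space as a numeric character reference
CJK = r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]"  # kana, Han, Hangul


def preprocess(markdown):
    # Put the entity between a CJK letter and an adjacent run of `*`.
    markdown = re.sub(rf"({CJK})(\*+)", rf"\g<1>{ENTITY}\g<2>", markdown)
    return re.sub(rf"(\*+)({CJK})", rf"\g<1>{ENTITY}\g<2>", markdown)


def postprocess(html):
    # Drop both the literal entity and the decoded U+200B character.
    return html.replace(ENTITY, "").replace("\u200b", "")


print(preprocess("では**「鍵括弧」**を"))
```

The entity begins with `&` and ends with `;`, both ASCII punctuation, so the delimiter run ends up with punctuation on both sides. That is presumably what lets the parser treat it as a valid opener or closer.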

@3w36zj6
Member Author

3w36zj6 commented Oct 16, 2025

Hi @YDX-2147483647

What do you think about the idea in this PR? Could you please review it?

@YDX-2147483647
Contributor

Hmm, I don't think I'm capable of reviewing this PR. I haven't written a parser before, so I can't offer better advice than an LLM.

I have indeed been tortured by this markdown parsing problem, but my usual solution is changing the parser (or installing an extension to the parser) rather than patching it on my own…

@3w36zj6
Member Author

3w36zj6 commented Oct 16, 2025

Thank you for your comment.

I also believe that improving the parser logic is the smart way to address this. Since this is mainly a workaround, I don't feel confident enough to propose it upstream.

At the very least, if we can confirm that it doesn't break the documentation, it would be best to first operate this experimentally as a workaround within the Japanese community.

Member

@kimushun1101 kimushun1101 left a comment

I understood the situation and was able to verify the behavior.
However, I’m not as good as Copilot at reviewing my own source code.

I measured the time using the following command:
time mise run generate
The build time hasn’t increased noticeably compared to before.

Contributor

@gomazarashi gomazarashi left a comment

I’m not very familiar with the code itself, but I’ve confirmed that the documentation was generated correctly.
Thank you for addressing this issue.

Contributor

@ultimatile ultimatile left a comment

Looks nice, thank you!

@3w36zj6 3w36zj6 marked this pull request as draft January 25, 2026 07:55
@3w36zj6 3w36zj6 force-pushed the feature/improve-cjk-emphasis-handling-in-markdown-processing branch from 731a679 to 9e22d59 Compare February 15, 2026 10:55
@3w36zj6
Member Author

3w36zj6 commented Feb 15, 2026

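I switched pulldown-cmark to a custom version that adds support for markdown-cjk-friendly, and removed the CJK emphasis workarounds from the existing documentation. Sorry for the trouble, but could you please review it again?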

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 30 changed files in this pull request and generated 1 comment.


@3w36zj6 3w36zj6 changed the title from "feat: improve CJK emphasis handling in markdown processing" to "chore: improve CJK emphasis handling in markdown processing" on Feb 16, 2026

5 participants