When a page uses MathJax with raw $/$$ delimiters and is extracted without JS execution, the LaTeX is left as plain text in the HTML. Turndown then escapes markdown-special characters inside the math, breaking it.
Example
https://graphdeeplearning.github.io/post/transformers-are-gnns/
The page loads MathJax 3 via <script src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js"> and has LaTeX directly in the HTML text:
<p>$$
h_{i}^{\ell+1} = \text{Attention} \left( Q^{\ell} h_{i}^{\ell} \, K^{\ell} h_{j}^{\ell} \, V^{\ell} h_{j}^{\ell} \right),
$$</p>
Expected:
$$ h_{i}^{\ell+1} = \text{Attention} \left( Q^{\ell} h_{i}^{\ell} \, K^{\ell} h_{j}^{\ell} \, V^{\ell} h_{j}^{\ell} \right), $$
Actual:
$$ h\_{i}^{\\ell+1} = \\text{Attention} \\left( Q^{\\ell} h\_{i}^{\\ell} \\, K^{\\ell} h\_{j}^{\\ell} \\, V^{\\ell} h\_{j}^{\\ell} \\right), $$
Root cause
Defuddle's math pipeline handles MathJax's rendered output (mjx-container, .MathJax, script[type="math/tex"]) but not its unrendered input. When MathJax doesn't execute (static/node extraction), the LaTeX stays as raw text in <p> elements, so no structured math elements exist for the math rules to match. Turndown then treats the math as ordinary prose and escapes \ → \\ and _ → \_, breaking the LaTeX.
This affects any site using MathJax's tex input with $/$$ delimiters when extracted without JS execution (common on Hugo/Jekyll academic blogs, lecture notes, etc.).
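The escaping described above can be reproduced in isolation. The sketch below is a simplified stand-in for Turndown's escaping step, not its actual code: escapeMarkdown is a hypothetical name, and it applies only the two rules relevant here (backslash and underscore).

```typescript
// Simplified stand-in for Turndown's escaping step (assumption: only the
// backslash and underscore rules matter for this bug).
function escapeMarkdown(text: string): string {
  return text
    .replace(/\\/g, "\\\\") // \ -> \\ (must run first so later escapes are not doubled)
    .replace(/_/g, "\\_"); // _ -> \_
}

// Raw LaTeX as it appears in the text node of the example page.
const rawLatex = "h_{i}^{\\ell+1} = \\text{Attention}";

// Prints the broken form seen under "Actual".
console.log(escapeMarkdown(rawLatex));
```

Because the math never reaches a math rule, nothing tells the converter to skip this step for the text between the delimiters.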
Fix direction
Detect raw LaTeX delimiters in text nodes and wrap matched $...$ / $$...$$ spans in <math data-latex="..."> elements before Turndown conversion. This feeds into the existing math → data-latex → markdown pipeline that KaTeX and MathML already use.
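A minimal sketch of the detection step, assuming it runs on the string content of each text node. The names Segment, isEscaped, findClose and splitRawMath are hypothetical and not part of Defuddle; the heuristics for rejecting prices such as "$5 and $10" are one possible choice, not a confirmed design.

```typescript
// One piece of a text node: either plain text or a detected math span.
type Segment =
  | { kind: "text"; value: string }
  | { kind: "math"; latex: string; display: boolean };

// A "$" preceded by an odd number of backslashes is escaped (e.g. "\$5").
function isEscaped(text: string, pos: number): boolean {
  let count = 0;
  for (let i = pos - 1; i >= 0 && text.charAt(i) === "\\"; i--) count++;
  return count % 2 === 1;
}

// Index of the next unescaped closing delimiter at or after `from`, or -1.
function findClose(text: string, from: number, delim: string): number {
  let i = text.indexOf(delim, from);
  while (i !== -1 && isEscaped(text, i)) i = text.indexOf(delim, i + 1);
  return i;
}

// Split text into plain and math segments, preferring $$...$$ over $...$.
function splitRawMath(text: string): Segment[] {
  const segments: Segment[] = [];
  let plain = "";
  let i = 0;
  while (i < text.length) {
    if (text.charAt(i) === "$" && !isEscaped(text, i)) {
      const display = text.charAt(i + 1) === "$";
      const delim = display ? "$$" : "$";
      const start = i + delim.length;
      const end = findClose(text, start, delim);
      const body = end === -1 ? "" : text.slice(start, end);
      // Inline math must be non-empty, single-line and not space-padded,
      // which rejects currency such as "$5 and $10".
      const ok =
        body.trim() !== "" &&
        (display || (!body.includes("\n") && body === body.trim()));
      if (ok) {
        if (plain) segments.push({ kind: "text", value: plain });
        plain = "";
        segments.push({ kind: "math", latex: body.trim(), display });
        i = end + delim.length;
        continue;
      }
    }
    plain += text.charAt(i);
    i++;
  }
  if (plain) segments.push({ kind: "text", value: plain });
  return segments;
}
```

In a DOM pass, each math segment would replace its part of the text node with a <math data-latex="..."> element, and the text segments would stay as text nodes, so the existing math rules see the LaTeX before Turndown's escaping applies.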