feat(desktop): expand \yng/\young to KaTeX-compatible Young diagrams #4216
lightfront wants to merge 63 commits into
Four targeted fixes to the math-pipeline pre-pass that resolve cases
where the rendered chat output showed LaTeX source as raw text:
1. mathNormalize.ts (Step 2.5): when the model writes block math with
the opening $$ glued to prose on the same line ('…decomposes
as$$\n\mathbf{6}…'), CommonMark requires a blank line before
the $$. remark-math otherwise creates an empty math node and the
formula leaks out as literal text. Insert \n\n before any $$
preceded by a letter or end-of-sentence punctuation. The
freshly-rewritten \] → $$ from step 2 is not affected.
2. mathClassify.ts: classify single digits ($1$, $2$) as math —
commonly used as set / sequence indices. Multi-digit numbers,
decimals, and percentages stay literal (still currency / percentage).
This is a deliberate behavior change documented in the comment.
3. mathClassify.ts: allow comma-separated tokens ('A, B', '1, 2, 3',
'\\alpha, \\beta', '(A, B)') as math. These are typical of
ordered-pair / tuple / enumeration notation. Currency and env-var
usage never looks like this.
4. mathClassify.ts: allow single uppercase letters as math. In
non-English math prose (Chinese / Japanese / Korean textbooks)
single capital letters are extremely common as set / algebra /
group / vector-space names, and the closing-dollar form $X$ is
essentially never written for English words like I/A/V by hand.
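The Step 2.5 repair in item 1 can be sketched roughly as follows. This is a minimal illustration, not the code in mathNormalize.ts: the function name `separateGluedDisplayMath` is hypothetical, only ASCII letters and a few punctuation marks are treated as "prose", and `$$` occurrences are assumed to alternate strictly between opening and closing.

```typescript
// Hypothetical sketch of Step 2.5: put a blank line before an opening `$$`
// that is glued to the preceding prose character.
function separateGluedDisplayMath(src: string): string {
  let inDisplay = false; // toggled at every `$$` we pass
  return src.replace(
    /([A-Za-z.:;!?]?)\$\$/g,
    (match: string, prev: string) => {
      const opening = !inDisplay;
      inDisplay = !inDisplay;
      // Only an opening fence with a prose character right before it is moved.
      return opening && prev !== "" ? prev + "\n\n$$" : match;
    },
  );
}
```

A closing `$$` is left alone here, because closing fences are handled by a separate step in the pipeline described above.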
Test changes: 4 existing currency/acronym assertions updated to
reflect the new behavior, 13 new regression tests covering all four
fixes including the user's specific cases ('$1$ 和 $2$' and
'$S$ 非空 / $S$ 有上界'). 98 math-golden tests pass, 112/112 across
all suites, typecheck clean.
Orphan $$ (model wrote display math but forgot the closing $$) is
documented as not-fixed-from-the-renderer: every attempt to rescue
the orphan from the renderer side made the output worse, so the fix
for that case is on the LLM side (post-generation lint or stricter
system prompt).
The classifier rules are language-agnostic, not specific to CJK text. Updated test section name and descriptions to reflect that patterns like single digits, comma-separated tokens, and one-sided operators apply universally across languages. Chinese text in test cases remains as real user examples, but the rules themselves are not CJK-specific.
Add defensive escaping for code blocks containing $ characters. When protecting code (inline `...` or fenced ```...```), replace $ with &#36; (HTML entity). On restoration, unescape back to $. This prevents KaTeX from attempting to parse math delimiters that appear in code examples, regex patterns, or template literals.
Fixes: Pasted documentation about the math pipeline itself no longer shows red KaTeX error text.
Tests: 3 new cases added, 106/106 passing.
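For orientation, code protection of this kind is often done with placeholders: each code segment is stored in a side array and swapped back at the end. The sketch below assumes that design; `protectCode`, `restoreCode`, and the NUL-delimited placeholder format are hypothetical names, not taken from the PR.

```typescript
// Hypothetical sketch: hide code segments from the math passes and restore
// them afterwards. Segments are stored out-of-band in an array.
interface ProtectedText {
  text: string;       // source with code replaced by placeholders
  segments: string[]; // original code segments, in order
}

function protectCode(src: string): ProtectedText {
  const segments: string[] = [];
  // Fenced blocks are tried first, then single-line inline spans.
  const text = src.replace(/```[\s\S]*?```|`[^`\n]+`/g, (code: string) => {
    segments.push(code);
    return "\u0000CODE" + (segments.length - 1) + "\u0000";
  });
  return { text, segments };
}

function restoreCode(p: ProtectedText): string {
  return p.text.replace(
    /\u0000CODE(\d+)\u0000/g,
    (_m: string, index: string) => p.segments[Number(index)],
  );
}
```

With this shape, a `$` inside code never reaches the math steps, since only the placeholder is visible to them.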
Remove the requirement that ``` must appear after a newline. This handles cases where documentation is pasted on a single line with embedded code blocks containing $ symbols.
Previously: ``` markers were only recognized after \n
Now: ``` markers are recognized anywhere
This prevents KaTeX errors (red text) when processing malformed code blocks that contain $ in regex patterns, template literals, or other code examples.
All 120 tests pass.
Enhancements to inline math detection:
- Reject pure numbers (1, 2.5, 10) as currency/percentages
- Accept numbers with variables (2.5x, 3y^2) as math
- Accept numbers with LaTeX escapes (10\%) as math
- Fix single-line code block detection to protect $ in malformed markdown
This better matches real-world usage where 'costs $5' is currency but '$2.5x + 3$' is clearly a mathematical expression.
All 122 tests pass (108 math-golden + 8 text-size + 6 provider-model-refresh).
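The three classification rules listed above could look roughly like this. It is a sketch only: `looksLikeInlineMath` is a hypothetical name, the regexes are assumptions, and the real classifier in mathClassify.ts has more patterns than these.

```typescript
// Hypothetical sketch covering only the three rules in this commit message.
function looksLikeInlineMath(body: string): boolean {
  const s = body.trim();
  // Pure numbers (optionally with a bare %) read as currency or percentages.
  if (/^\d+(?:\.\d+)?%?$/.test(s)) return false;
  // A LaTeX command or escape such as \alpha or \% marks the body as math.
  if (/\\[A-Za-z%]/.test(s)) return true;
  // A digit immediately followed by a letter (2.5x, 3y^2) marks it as math.
  if (/\d[A-Za-z]/.test(s)) return true;
  return false;
}
```

Anything not matched by these rules falls through to "not math" in this sketch, which is stricter than the behaviour described elsewhere in the PR.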
Previously, the Step 5 regex would greedily match '$5 and $' as a single math expression with content '5 and ', then convert it to '&#36;5 and &#36;' because the classifier correctly identified it as non-math. This was visually correct but had two problems:
1. The greedy match would consume the closing dollar that belonged to the next currency token, causing cascade replacements.
2. Prose currency like 'These two apples cost $5 and $6' would have its dollar signs converted to HTML entities, which works but is unnecessary noise in the rendered output.
Changes:
- Step 5 regex now uses non-greedy matching (+?) so that '$5 and $' is no longer matched as a single pair
- When the classifier rejects a match, the original text is preserved unchanged (return _m) instead of being wrapped in HTML entities
- This keeps dollar signs visible in prose while still preventing them from being parsed as math
All 122 tests pass.
KaTeX doesn't support the \slashed command from the LaTeX 'slashed'
package, which is commonly used in physics for Feynman slash notation
(\slashed{p}, \slashed{\partial}).
Convert \slashed{X} to \not{X} before passing to KaTeX. The \not
command provides a similar visual effect (a slash through the character)
and is supported by KaTeX.
This prevents KaTeX parse errors when the model emits physics notation
using \slashed.
Tests: 3 new cases added, 111/111 passing
…psilon(0)
The previous fix only handled \slashed{X} (braced form). User-reported
issue: \slashed\epsilon(0) (Greek letter + function call, no braces)
was not being converted and still showed as red KaTeX error.
Extended the regex to handle the unbraced forms:
- \slashed X → \not X (single letter, no braces)
- \slashed\epsilon → \not{\epsilon} (backslash command)
- \slashed\epsilon(0) → \not{\epsilon(0)} (backslash command + function)
- \slashed a → \not a (single ASCII letter)
This covers the common physics notation \slashed{p} where the model
sometimes forgets the braces around the argument.
Tests: 2 new cases added, 113/113 math-golden passing.
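A rough sketch of the \slashed rewrite covering the braced and unbraced forms listed above. The function name `rewriteSlashed` and both regexes are assumptions for illustration; the actual regex in latexNormalize.ts is not shown in this conversation.

```typescript
// Hypothetical sketch: rewrite \slashed to \not before handing TeX to KaTeX.
function rewriteSlashed(tex: string): string {
  return tex
    // \slashed\cmd and \slashed\cmd(args): wrap the command in braces.
    .replace(/\\slashed\s*(\\[A-Za-z]+(?:\([^()]*\))?)/g, "\\not{$1}")
    // Remaining forms (\slashed{X}, \slashed X): swap the command name only.
    // The lookahead keeps longer command names such as \slashedfoo untouched.
    .replace(/\\slashed(?![A-Za-z])/g, "\\not");
}
```

Nested parentheses in the argument, such as \slashed\epsilon(f(0)), are not handled by this sketch.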
# Conflicts: # desktop/frontend/src/__tests__/math-golden.test.ts # desktop/frontend/src/components/latexNormalize.ts
Rebase onto current main-v2 plus five targeted cleanups called out in the review:
1. Drop the &#36; escape/unescape dance. Protected segments are stored out-of-band and swapped back wholesale, so the round-trip is a no-op, except for code that legitimately contains the literal text &#36; (which got silently rewritten to $ on restore, corrupting the source). The header comment is also stale: the description claims restore does not unescape, but the code did.
2. Revert Step 5's greedy→non-greedy change. The char class [^$\n]+ already excludes $, so changing + to +? has no effect on match extent; the comment claiming it prevents cross-pair matching is wrong. Drop the change and the misleading comment. The "leave non-math pairs unchanged" behaviour is kept.
3. Restrict fenced-code detection to line-start. Allowing ``` anywhere in the line would swallow prose like "wrap code in ```blocks``` here" into a code region and break the math for the rest of the message; the CommonMark spec requires fences at line start. Single-line docs are still handled (the next matching fence is the closer).
4. Escape top-level % in math. KaTeX treats unescaped % as a LaTeX comment char and silently truncates the formula at end-of-line: "$x = 50%$" rendered as "x = 50" with no error. Add a top-level case in latexNormalizeForKatex that emits \% (already-escaped \% is handled above as a 2-char command, so no double-escape).
5. Trim oversized comments. Drop the // was: ... history notes in tests and the 8-12 line essays in mathNormalize / mathClassify that describe code that no longer exists or that the reader can see from the regex. The header still lists the pipeline as a map.
128 tests pass; typecheck clean.
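Item 4 (escaping % without double-escaping \%) can be illustrated with a small scanner. This is a sketch under assumptions: `escapePercent` is a hypothetical name, and it escapes every unescaped % rather than reproducing whatever "top-level" means inside latexNormalizeForKatex.

```typescript
// Hypothetical sketch: escape bare % so KaTeX does not treat it as a comment.
function escapePercent(tex: string): string {
  let out = "";
  for (let i = 0; i < tex.length; i++) {
    const ch = tex[i];
    if (ch === "\\" && i + 1 < tex.length) {
      // A backslash plus the next character is copied as one unit, so an
      // already-escaped \% (or a \\ row break) is never touched.
      out += ch + tex[i + 1];
      i++;
    } else if (ch === "%") {
      out += "\\%";
    } else {
      out += ch;
    }
  }
  return out;
}
```

Treating "backslash + next character" as a unit is what prevents the double escape mentioned in the commit message.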
Expressions starting with a unary + or - (e.g. +2, -x, +\alpha) were rejected by isLikelyInlineMath because none of the existing patterns matched them: the operator-pattern on line 10 requires a character before the operator, and the pure-number pattern requires the first character to be a digit. This caused \( +2 \) to be treated as non-math text, which rendered as the literal text '$+2$' instead of a KaTeX unary plus. Add a dedicated pattern, /^[+\-]\s*(?:\d+(?:\.\d+)?|[A-Za-z\\])/, that matches a unary operator followed by a digit, a variable, or a backslash command.
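A quick way to see what the unary pattern accepts is to run it in isolation. The wrapper name `startsWithUnaryOperand` is hypothetical; the regex is the one quoted above, with the backslash in the last character class written as an escaped backslash.

```typescript
// Leading + or -, optional whitespace, then a number, a letter, or the
// backslash that starts a LaTeX command.
const UNARY_LEADING = /^[+\-]\s*(?:\d+(?:\.\d+)?|[A-Za-z\\])/;

// Hypothetical wrapper used only for this illustration.
function startsWithUnaryOperand(body: string): boolean {
  return UNARY_LEADING.test(body.trim());
}
```

Note that the pattern on its own also accepts prose such as "- item", so it presumably relies on the surrounding delimiters and the other classifier rules for context.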
Companion fix to PR esengine#3666 (fix/inline-math-rendering). The character class in mathNormalize.ts step 3 missed the comma case: a model that emits a display block whose closing $$ is on the same line as the trailing comma of the equation content (…D(q^2),$$) leaves the closing fence glued to the content line, which micromark-extension-math does not recognise as a closing fence (it only checks for $$ at the start of a new line). The rest of the document is then consumed as math and katex fails on the stray $ in the next paragraph. Add ',' to the character class so the closing $$ is forced onto its own line, matching the existing 'inline $$ after closing bracket' behaviour. A regression test pins the comma case. This commit was previously 45482b2 on fix/inline-math-rendering (the PR's tip), but cherry-picking the older PR commits onto dev-new-features (which had this fix from an earlier cherry-pick overwritten by the older base state) reverted the regex change. Re-applying it here.
Cherry-picks the maintainer's review patch from PR esengine#3666 onto dev-new-features so the local Reasonix app renders currency and env-var tokens as literal dollars instead of triggering KaTeX errors.
Per maintainer feedback: a literal $…$ pair in normalizeMath output is not enough to keep a non-math token out of KaTeX, because remark-math parses any $…$ it sees in the source. The classifier's reject verdict only holds when the $ is hidden as a &#36; entity.
Step 6: non-math pairs (currency $5, env vars $PATH) now wrap in &#36; entities. The decoded entity still renders as a literal dollar.
Test updates:
- Two eq assertions flipped to expect the entity form.
- Drop the stale '$5$ is filtered to entities' comment.
- New render-boundary section runs the real react-markdown + remark-math + rehype-katex path; the previous golden cases never crossed the prose→parser boundary, so the regression was undetectable at the normalizeMath layer.
132 passed, 0 failed (was 129).
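The Step 6 behaviour described here (rejected pairs have their dollars hidden as entities) can be sketched as below. `hideNonMathDollars` is a hypothetical name and the classifier is passed in as a parameter; only the [^$\n]+ character class is taken from the discussion above.

```typescript
// Hypothetical sketch of Step 6: keep math pairs, hide the dollars of
// everything the classifier rejects so remark-math never sees them.
function hideNonMathDollars(
  src: string,
  isMath: (body: string) => boolean,
): string {
  return src.replace(/\$([^$\n]+)\$/g, (pair: string, body: string) =>
    isMath(body) ? pair : "&#36;" + body + "&#36;",
  );
}
```

For example, with a classifier that rejects "5 and ", the input "cost $5 and $6 now" becomes "cost &#36;5 and &#36;6 now", which still displays two dollar signs once the entity is decoded.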
Physics / chemistry / representation-theory answers routinely use
the ytableau (\yng) and youngtab (\young) packages to draw Young
diagrams and Young tableaux:
\yng(2,1) → empty (2,1) diagram
\yng(2,1){a&b\\c\\d&e} → filled (2,1) tableau
\young(2 1) → youngtab syntax (whitespace separator)
Neither macro is bundled with KaTeX, so without this pass the macros
fail with "Undefined control sequence: \yng" and the chat surfaces
the raw LaTeX source as a red error block — same shape as the
comma-fix and step-6 bugs.
This change adds a small pre-pass (src/components/youngDiagrams.ts)
that translates \yng/\young into KaTeX-compatible
\begin{array}…\end{array} forms. Empty cells become \hphantom{x} so the row width is uniform
and the diagram doesn't look ragged. Content cells are split on \\ and
& at brace-depth 0 (so \frac{a}{b}-style entries survive). The
pass is invoked inside the math body, alongside latexNormalizeForKatex,
in mathNormalize.ts steps 4/5/6 so display, text-mode, and inline math
are all covered.
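Splitting the content block on \\ and & only at brace depth 0 might look like the following. The helper names `splitTopLevel` and `parseCells` are hypothetical, and escaped braces (\{, \}) are not treated specially in this sketch.

```typescript
// Hypothetical sketch: split on a separator, but only outside {...} groups.
function splitTopLevel(src: string, sep: string): string[] {
  const parts: string[] = [];
  let depth = 0;
  let current = "";
  let i = 0;
  while (i < src.length) {
    if (depth === 0 && src.startsWith(sep, i)) {
      parts.push(current);
      current = "";
      i += sep.length;
      continue;
    }
    const ch = src[i];
    if (ch === "{") depth++;
    else if (ch === "}") depth = Math.max(0, depth - 1);
    current += ch;
    i++;
  }
  parts.push(current);
  return parts;
}

// Rows are separated by \\ and cells within a row by &.
function parseCells(content: string): string[][] {
  return splitTopLevel(content, "\\\\").map((row) => splitTopLevel(row, "&"));
}
```

With this, the content a&b\\c\\d&e yields three rows (a, b / c / d, e), and a cell such as \frac{a&b}{c} stays in one piece because its & sits at depth 1.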
The repair-regex character class in mathNormalize.ts step 3 also
gains '{' and '}'. A model that writes \end{array}$$ or \frac{a}{b}$$
on one line has the same micromark-fence problem as the comma case;
including the closing brace lets the closing $$ find a clean
start-of-line and the rest of the document doesn't get consumed as
math.
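The step-3 repair (moving a closing `$$` that is glued to the end of a content line onto its own line) can be sketched with a single multiline regex. The function name `detachClosingFence` and the exact character class are assumptions; the source only states that the class includes the comma, closing brackets, and now the braces.

```typescript
// Hypothetical sketch of step 3: a `$$` at end of line that directly follows
// one of the listed characters is moved to a fresh line after a blank line.
function detachClosingFence(src: string): string {
  // In the replacement string, `$1` is the captured character and each `$$`
  // produces one literal dollar sign.
  return src.replace(/([)\],{}])\$\$[ \t]*$/gm, "$1\n\n$$$$");
}
```

An opening `$$` at the start of a line is not affected, because nothing from the character class precedes it.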
Tests (141 passed, 0 failed in math-golden.test.ts; +9 new):
• \yng(2,1), \yng(3,2,1), \yng(4,3,2,1) render as the
corresponding Young diagrams via react-markdown + KaTeX
• \yng(2,1){a&b\\c\\d&e} renders as filled tableau
• \young(2 1) (youngtab syntax) renders
• Unit tests on the translator (with / without content;
pass-through for non-Young macros)
• New step-3 regression test: \end{...}$$ repaired to
\end{...}\n\n$$
User-reported after PR esengine#4216 landed: \yng(2,1) inside prose without surrounding $$…$$ still showed up as raw text in the chat. The previous design only fired expandYoungDiagrams inside math blocks, so a bare \yng in prose was never translated, and remark-math left it as literal text because there's no $ delimiter for it to match against.
This change moves expandYoungDiagrams to a stateful pre-pass that runs once at the top of normalizeMathText, before any delimiter handling. The translator tracks the math-depth counter (0 = prose, 1 = inline math $…$, 2 = display math $$…$$) so it can decide per-macro whether to wrap the result in $…$ (when outside math) or just substitute the inner form (when already inside math, where the surrounding delimiters are preserved).
The implementation now uses src.startsWith + a small parser for the parenthesised shape and optional {…} content, instead of a single regex. The previous non-greedy regex couldn't handle a shape with spaces inside (\yng( 2, 1 )) or a content block containing nested braces, both of which are valid ytableau syntax.
Tests (142 passed, 0 failed in math-golden.test.ts; +10 new):
• \yng(2,1) in prose (no $ delimiters) gets wrapped in $…$ and rendered as an inline Young diagram (the user-reported case)
• \yng(3,2,1), \yng(4,3,2,1), \young(2 1) (various shapes)
• \yng(2,1){a&b\\c\\d&e} renders as a filled Young tableau
• Unit tests on the translator (with / without content; pass-through for non-Young macros; bare-in-prose wraps in $…$)
• Step-3 brace repair regression test from the previous commit is preserved
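A compact sketch of the stateful pre-pass described above: a depth counter for `$` / `$$`, plus a small parser for the shape and the optional balanced {…} block. All names (`YoungCall`, `parseYoungCall`, `expandYoung`, the `render` callback) are hypothetical, and escaped dollars (\$) are ignored for brevity.

```typescript
interface YoungCall {
  shape: number[];        // row lengths, e.g. [2, 1]
  content: string | null; // text inside the optional {…}, if present
  end: number;            // index just past the parsed call
}

// Parses "(2,1)", "( 2, 1 )" or "(2 1)" plus an optional balanced {…}.
function parseYoungCall(src: string, pos: number): YoungCall | null {
  let i = pos;
  while (src[i] === " ") i++;
  if (src[i] !== "(") return null;
  const close = src.indexOf(")", i);
  if (close < 0) return null;
  const shape = src.slice(i + 1, close).split(/[\s,]+/)
    .filter((t) => t !== "").map(Number);
  if (shape.length === 0 || shape.some((n) => !Number.isInteger(n) || n <= 0)) return null;
  i = close + 1;
  if (src[i] !== "{") return { shape, content: null, end: i };
  let depth = 0;
  for (let j = i; j < src.length; j++) {
    if (src[j] === "{") depth++;
    else if (src[j] === "}" && --depth === 0) {
      return { shape, content: src.slice(i + 1, j), end: j + 1 };
    }
  }
  return null; // unbalanced braces: leave the macro untouched
}

function expandYoung(src: string, render: (call: YoungCall) => string): string {
  let out = "";
  let depth = 0; // 0 = prose, 1 = inside $…$, 2 = inside $$…$$
  let i = 0;
  while (i < src.length) {
    if (src.startsWith("$$", i)) {
      depth = depth === 2 ? 0 : 2;
      out += "$$";
      i += 2;
      continue;
    }
    if (src[i] === "$") {
      depth = depth === 1 ? 0 : 1;
      out += "$";
      i += 1;
      continue;
    }
    const macro = src.startsWith("\\young", i) ? "\\young"
      : src.startsWith("\\yng", i) ? "\\yng" : null;
    const call = macro ? parseYoungCall(src, i + macro.length) : null;
    if (call) {
      const body = render(call);
      // Bare macros in prose get their own inline delimiters.
      out += depth === 0 ? "$" + body + "$" : body;
      i = call.end;
    } else {
      out += src[i];
      i++;
    }
  }
  return out;
}
```

Passing the renderer as a callback keeps the delimiter tracking separate from the choice of array layout, which changed several times in the later commits.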
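Heads up: the original PR had a gap that just bit me in local testing: bare \yng(2,1) in prose, without surrounding $ delimiters, was never translated and showed up as raw text. The fix moves expandYoungDiagrams to a stateful pre-pass that runs once at the top of normalizeMathText.
The implementation now uses a small stateful parser (rather than a single regex), which also fixes two adjacent cases the previous regex couldn't handle: shapes with internal spaces (\yng( 2, 1 )) and content blocks containing nested braces. Tests: … Local Reasonix build was also rebuilt and installed (the binary at …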
…ells
The earlier translator used \hphantom{x} as the empty-cell
placeholder so that empty cells would have the same width as
filled ones. \hphantom renders as 'invisible x': the cell takes
up space but produces no visible glyph. The result is correct
typesetting but useless for the user: the chat shows what looks
like an empty paragraph where the diagram should be.
Switching to \square (Unicode U+25A1, 'WHITE SQUARE', KaTeX class
mord amsrm) gives an actually-visible glyph per cell, which is
what a Young diagram is supposed to look like. \square is in
amssymb, which KaTeX bundles by default. Width behaviour is the
same as \hphantom{x} (the glyph is the same width as a lowercase
x in KaTeX's AMS font).
Verified with the user's exact reproduction: the
"12 invisible boxes" string now produces 24 visible \square
glyphs (12 cells, each rendered as both MathML and HTML), no
katex-error, no transparent text. The diagram is now visible.
Tests: 142 passed, 0 failed in math-golden.test.ts. The unit
test that checks the substituted array form now expects
\square instead of \hphantom{x}.
Heads up: yet another follow-up. The … Verified end-to-end with the user's reproduction string: the 12 invisible boxes now produce 24 visible \square glyphs. This is the third commit on the Young-diagram branch: …
Tests: 142 passed, 0 failed. Built and installed locally; …
The previous translator used \begin{array}{c} (centered), which
causes the shorter rows of a Young diagram to be centred relative
to the longest row, so a (3,2,1) diagram looks like:
    [ ][ ][ ]
      [ ][ ]
       [ ]
instead of:
    [ ][ ][ ]
    [ ][ ]
    [ ]
Switch to {l} so every row's first cell is at the same horizontal
position. This is the standard Young-diagram layout: short rows just
have fewer cells, but they all start at the left edge.
Tests: 143 passed, 0 failed in math-golden.test.ts (+1 new alignment
regression test). The translator unit tests now expect {l} instead
of {c}.
…lush boxes
The previous translator used \,\, between cells, which leaves a visible 0.1667em gap between every pair of \square boxes. A Young diagram should look like a row of *flush* boxes (the youngtableau package renders them as a connected shape, not a row of spaced-out squares).
Switching to \! (negative thin space, -0.1667em) exactly cancels the thin-space margin, so adjacent cells touch with zero visible gap. Same width as before (the glyph is the same), just no inter-cell gap.
The translator unit test now expects \! and the new regression test pins the spacing behaviour so it can't regress.
Tests: 144 passed, 0 failed in math-golden.test.ts (+1 new).
Without this, a multi-row Young diagram has a visible gap between
every pair of adjacent rows: \square sits centred on the math axis
(roughly 0.4em above the baseline), so the bottom of one row's
square is ~0.15em below the baseline, and the top of the next row's
square is ~0.65em below the next baseline, leaving 0.35em of
white space between rows. Young diagrams are conventionally drawn as
a single connected shape, not as disconnected boxes.
Wrapping each cell in \raisebox{-0.35em}{...} shifts the square
down by half the math-axis offset so consecutive rows touch. The
raise argument uses \square (or the cell content) directly
without a $…$ wrapper, because the whole macro is already
inside math mode (whether the model wrote $…$ or the prose
wrapper added it); nesting $ inside $…$ would break the
KaTeX parser.
Tests: 145 passed, 0 failed in math-golden.test.ts (+1 new
vertical-flush regression test). The translator unit tests now
expect \raisebox{-0.35em}{...} instead of plain \square.
The previous translator wrapped each cell in \raisebox{-0.35em}
to shift glyphs down by the math-axis offset. That doesn't close
the gap, because shifting every row by the same amount leaves the
distance between rows unchanged: the row baselines stay 1.2em apart
regardless of how much the content is shifted within each row.
The right fix is per-row spacing, not per-cell shift. KaTeX display
math uses 1.2em baseline-to-baseline spacing, but a \square glyph
is only ~0.85em tall and sits centred on the math axis (so its
bottom is ~0.15em below the baseline). Default row spacing leaves
0.35em of white space between rows.
\\[-0.4em] between rows pulls each subsequent row up by roughly the
math-axis offset, so consecutive rows touch. The diagram becomes
a single connected shape, the standard Young-diagram layout.
Earlier commits added the raisebox and removed it as we narrowed
in on the actual root cause; this commit replaces both with the
correct \\[-0.4em] approach and reverts cells to plain \square.
Tests: 145 passed, 0 failed in math-golden.test.ts.
The previous -0.4em left a small but visible gap between rows. The exact value for zero gap is derived from KaTeX's glyph metrics: the \square strut height is 0.675em (the visible glyph height), and KaTeX's default display-math baseline spacing is 1.2em. The gap is 1.2 - 0.675 = 0.525em, so \\[-0.525em] subtracts exactly that amount, making adjacent rows touch with zero visible space.
Measured empirically: at -0.525em the baseline gap is exactly 0.675em (= glyph height), so the bottom of row 1 aligns with the top of row 2.
Tests: 145 passed, 0 failed.
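Putting the layout decisions together ({l} column, \! between cells, and the row gap derived above), an emitter for an empty diagram might look like this. It is a sketch: `emptyDiagram` and the constant names are hypothetical, and the two metric values are simply the ones quoted in this commit message, not values read from KaTeX.

```typescript
// Values quoted in the commit message above (assumed, not measured here).
const BASELINE_SKIP_EM = 1.2;   // display-math baseline-to-baseline spacing
const SQUARE_HEIGHT_EM = 0.675; // \square glyph height
// 1.2 - 0.675 = 0.525; toFixed below hides floating-point noise.
const ROW_GAP_EM = BASELINE_SKIP_EM - SQUARE_HEIGHT_EM;

// Hypothetical emitter for an empty Young diagram of the given shape.
function emptyDiagram(shape: number[]): string {
  const rowBreak = "\\\\[-" + ROW_GAP_EM.toFixed(3) + "em]";
  const rows = shape.map((cells) =>
    new Array<string>(cells).fill("\\square").join("\\!"),
  );
  return "\\begin{array}{l}" + rows.join(rowBreak) + "\\end{array}";
}
```

For the shape (2,1) this produces \begin{array}{l}\square\!\square\\[-0.525em]\square\end{array}. Keeping the gap as a computed constant makes it easier to re-tune if the glyph metrics ever change.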
PR reopened with 7 follow-up commits on top of the original … Summary of what changed since the initial merge of …
Also added … 145 tests pass, full suite green.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c6f2d0350d
approve
feat(desktop): expand \yng/\young to KaTeX-compatible Young diagrams

Problem

Physics, chemistry, and representation-theory answers routinely use the ytableau (\yng) and youngtab (\young) packages to draw Young diagrams and Young tableaux:
\yng(2,1) → empty (2,1) diagram
\yng(2,1){a&b\\c\\d&e} → filled (2,1) tableau
\young(2 1) → youngtab syntax (whitespace separator)
Neither macro is bundled with KaTeX. Without this pass, the macros fail with "Undefined control sequence: \yng" and the chat surfaces the raw LaTeX source as a red error block.

Solution

A stateful pre-pass in src/components/youngDiagrams.ts translates \yng/\young into KaTeX-compatible \begin{array}{l}…\end{array} forms with \square cells. The translator is called once at the top of normalizeMathText (Step 0), before any delimiter handling.

Key design decisions

Stateful delimiter tracking. The translator tracks math depth (0 = prose, 1 = inline $…$, 2 = display $$…$$) so it can decide per-macro whether to:
- wrap the result in $…$ (when the model wrote bare \yng(2,1) in prose without any $, the common case), or
- substitute the inner form only (when \yng is already inside $…$ or $$…$$).

\square for empty cells (Unicode U+25A1, KaTeX class mord amsrm). Earlier versions used \hphantom{x}, which renders as invisible: correct typesetting, but the user sees nothing. \square gives an actually visible glyph per cell.

Left-aligned {l} column instead of centered {c}. A Young diagram has every row's first cell at the same x-position; shorter rows just have fewer cells. {c} would centre each row independently relative to the widest row, which is not a Young diagram.

Flush horizontal spacing: cells joined with \! (negative thin space, −0.1667em), which exactly cancels the default inter-glyph gap. Adjacent cells touch with zero visible gap.

Flush vertical spacing: rows separated by \\[-0.525em]. The \square glyph is 0.675em tall (from KaTeX's strut metric) and the default display-math baseline spacing is 1.2em, leaving 0.525em of visible white space between rows. \\[-0.525em] subtracts exactly that amount, so adjacent rows touch.

Stateful parser (not regex) for the macro call. Handles shapes with internal spaces (\yng( 2, 1 )) and content with nested braces (\yng(2,1){\frac{1}{2}&b\\c}).

Files

- desktop/frontend/src/components/youngDiagrams.ts: … $ depth, wraps bare macros in $…$.
- desktop/frontend/src/components/mathNormalize.ts: … expandYoungDiagrams. Step 3 character class gains {} for \end{…}$$ repair.
- desktop/frontend/src/__tests__/math-golden.test.ts

Test coverage

145 passed, 0 failed in math-golden.test.ts. Full frontend suite green. Typecheck and pnpm build clean.
Tests cover:
- \yng(2,1), \yng(3,2,1), \yng(4,3,2,1) render as Young diagrams via the real react-markdown + remark-math + rehype-katex path.
- \yng(2,1){a&b\\c\\d&e} renders as a filled Young tableau.
- \young(2 1) (youngtab syntax) renders.
- \yng(2,1) in prose (no $ delimiters) gets wrapped in $…$ and rendered.
- Left-aligned {l} column (not centered {c}).
- Flush horizontal spacing (\! not \,).
- Flush vertical spacing (\\[-0.525em] not default \\).

Commits (cumulative)

- 0fbb9086: \yng/\young → \begin{array}{c}\hphantom{x}…
- 64db8bd4: \yng without $ gets wrapped in $…$
- ab9e673a: \square instead of \hphantom{x} for visible cells
- 3b73d54a: {l} instead of centered {c}
- d42b1a5a: \! instead of \,
- d6217f1c: \raisebox approach (superseded by \\[-0.525em])
- cb7a0cfa: \\[-0.4em] (first version, still had small gap)
- 92fb0fd1: \\[-0.525em] for zero visual gap (exact derivation from glyph metrics)

Trade-offs

- An unclosed \yng (model forgot closing) is not handled, same as the orphan $$ policy of PR #3666 (fix(desktop): repair inline math rendering for LLM output).
- \rule boxes would give even tighter visual control (no font-metric padding), but \square is more readable in the LaTeX source and is the standard amssymb glyph.
- The -0.525em value is calibrated for the \square glyph height (0.675em) in KaTeX's AMS font. If the font changes, the value would need re-tuning.