feat(desktop): expand \yng/\young to KaTeX-compatible Young diagrams #4216

Open
lightfront wants to merge 63 commits into esengine:main-v2 from lightfront:fix/young-diagrams
Conversation

@lightfront commented Jun 12, 2026 (Contributor)

feat(desktop): expand \yng/\young to KaTeX-compatible Young diagrams

Problem

Physics, chemistry, and representation-theory answers routinely use the ytableau (\yng) and youngtab (\young) packages to draw Young diagrams and Young tableaux:

\yng(2,1)              % empty (2,1) diagram
\yng(2,1){a&b\\c\\d&e} % filled (2,1) tableau
\young(2 1)            % youngtab syntax (whitespace separator)

Neither macro is bundled with KaTeX. Without this pass, the macros fail with Undefined control sequence: \yng and the chat surfaces the raw LaTeX source as a red error block.

Solution

A stateful pre-pass in src/components/youngDiagrams.ts translates \yng/\young into KaTeX-compatible \begin{array}{l}…\end{array} forms with \square cells. The translator is called once at the top of normalizeMathText (Step 0), before any delimiter handling.

Key design decisions

  1. Stateful delimiter tracking. The translator tracks math depth (0 = prose, 1 = inline $…$, 2 = display $$…$$) so it can decide per-macro whether to:

    • Wrap the expanded form in $…$ (when the model wrote bare \yng(2,1) in prose without any $ — the common case).
    • Just substitute the inner form (when \yng is already inside $…$ or $$…$$).
  2. \square for empty cells (Unicode U+25A1, katex class mord amsrm). Earlier versions used \hphantom{x}, which renders invisibly: the typesetting is correct, but the user sees nothing. \square gives a visible glyph per cell.

  3. Left-aligned {l} column instead of centered {c}. A Young diagram has every row's first cell at the same x-position; shorter rows just have fewer cells. {c} would centre each row independently relative to the widest row — not a Young diagram.

  4. Flush horizontal spacing: cells joined with \! (negative thin space, −0.1667em) which exactly cancels the default inter-glyph gap. Adjacent cells touch with zero visible gap.

  5. Flush vertical spacing: rows separated by \\[-0.525em]. The \square glyph is 0.675em tall (from katex's strut metric) and the default display-math baseline spacing is 1.2em, leaving 0.525em of visible white space between rows. \\[-0.525em] subtracts exactly that amount, so adjacent rows touch.

  6. Stateful parser (not regex) for the macro call. Handles shapes with internal spaces (\yng( 2, 1 )) and content with nested braces (\yng(2,1){\frac{1}{2}&b\\c}).
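
The layout decisions in items 2 to 5 can be sketched as a small string builder. This is only an illustration under stated assumptions, not the code in youngDiagrams.ts: the name `buildYoungArray` and its signature are hypothetical, and the sketch covers empty diagrams only.

```typescript
// Hypothetical sketch: names are illustrative, not taken from youngDiagrams.ts.

// Row gap to cancel: 1.2em baseline spacing minus the 0.675em glyph height = 0.525em.
const ROW_SEPARATOR = "\\\\[-0.525em]"; // LaTeX source: \\[-0.525em]
const CELL_JOINER = "\\!"; // LaTeX source: \! (negative thin space)

/** Build a left-aligned KaTeX array of \square cells for a partition such as [2, 1]. */
function buildYoungArray(shape: number[]): string {
  const rows = shape.map((len) =>
    // One \square per cell, joined so that neighbouring cells touch.
    Array.from({ length: len }, () => "\\square").join(CELL_JOINER)
  );
  // {l} keeps every row's first cell at the same x-position.
  return "\\begin{array}{l}" + rows.join(ROW_SEPARATOR) + "\\end{array}";
}
```

Under these assumptions, `buildYoungArray([2, 1])` yields `\begin{array}{l}\square\!\square\\[-0.525em]\square\end{array}`.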

Files

| File | Role |
| --- | --- |
| desktop/frontend/src/components/youngDiagrams.ts | The translator. Stateful, tracks $ depth, wraps bare macros in $…$. |
| desktop/frontend/src/components/mathNormalize.ts | Step 0 calls expandYoungDiagrams. Step 3 character class gains { } for \end{…}$$ repair. |
| desktop/frontend/src/__tests__/math-golden.test.ts | 11 regression tests for the translator + render-boundary tests. |

Test coverage

145 passed, 0 failed in math-golden.test.ts. Full frontend suite green. Typecheck and pnpm build clean.

Tests cover:

  • \yng(2,1), \yng(3,2,1), \yng(4,3,2,1) render as Young diagrams via the real react-markdown + remark-math + rehype-katex path.
  • \yng(2,1){a&b\\c\\d&e} renders as a filled Young tableau.
  • \young(2 1) (youngtab syntax) renders.
  • Bare \yng(2,1) in prose (no $ delimiters) gets wrapped in $…$ and rendered.
  • Left-aligned {l} column (not centered {c}).
  • Flush horizontal spacing (\! not \,).
  • Flush vertical spacing (\\[-0.525em] not default \\).
  • Unit tests on the translator (with/without content; pass-through for non-Young macros).

Commits (cumulative)

| Commit | Description |
| --- | --- |
| 0fbb9086 | Initial translator: \yng/\young → \begin{array}{c}\hphantom{x}… |
| 64db8bd4 | Bare-in-prose stateful wrapping; \yng without $ gets wrapped in $…$ |
| ab9e673a | \square instead of \hphantom{x} for visible cells |
| 3b73d54a | Left-aligned {l} instead of centered {c} |
| d42b1a5a | Flush horizontal spacing: \! instead of \, |
| d6217f1c | Attempted \raisebox approach (superseded by \\[-0.525em]) |
| cb7a0cfa | Per-row spacing \\[-0.4em] (first version, still had small gap) |
| 92fb0fd1 | Final: \\[-0.525em] for zero visual gap (exact derivation from glyph metrics) |

Trade-offs

  • Orphan \yng (the model forgot the closing delimiter) is not handled, same as the orphan $$ policy in PR #3666 (fix(desktop): repair inline math rendering for LLM output).
  • \rule boxes would give even tighter visual control (no font-metric padding), but \square is more readable in the LaTeX source and is the standard amssymb glyph.
  • The -0.525em value is calibrated for katex's AMS font \square glyph height (0.675em). If the font changes, the value would need re-tuning.

Reasonix and others added 30 commits June 9, 2026 19:07
Four targeted fixes to the math-pipeline pre-pass that resolve cases
where the rendered chat output showed LaTeX source as raw text:

1. mathNormalize.ts (Step 2.5): when the model writes block math with
   the opening $$ glued to prose on the same line ('…decomposes
   as$$\n\mathbf{6}…'), CommonMark requires a blank line before
   the $$. remark-math otherwise creates an empty math node and the
   formula leaks out as literal text. Insert \n\n before any $$
   preceded by a letter or end-of-sentence punctuation. The
   freshly-rewritten \] → $$ from step 2 is not affected.

2. mathClassify.ts: classify single digits ($1$, $2$) as math —
   commonly used as set / sequence indices. Multi-digit numbers,
   decimals, and percentages stay literal (still currency / percentage).
   This is a deliberate behavior change documented in the comment.

3. mathClassify.ts: allow comma-separated tokens ('A, B', '1, 2, 3',
   '\\alpha, \\beta', '(A, B)') as math. These are typical of
   ordered-pair / tuple / enumeration notation. Currency and env-var
   usage never looks like this.

4. mathClassify.ts: allow single uppercase letters as math. In
   non-English math prose (Chinese / Japanese / Korean textbooks)
   single capital letters are extremely common as set / algebra /
   group / vector-space names, and the closing-dollar form $X$ is
   essentially never written for English words like I/A/V by hand.
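
The three classifier rules above (fixes 2 to 4) can be sketched as follows. This is a hedged illustration, not the code in mathClassify.ts: `looksLikeInlineMath` is a hypothetical name, the regexes are assumptions, and the sketch returns false wherever the real classifier would go on to apply its other patterns.

```typescript
// Hypothetical sketch of fixes 2-4 only; not the real classifier.
function looksLikeInlineMath(body: string): boolean {
  const s = body.trim();
  if (/^\d$/.test(s)) return true; // fix 2: a single digit such as $1$
  if (/^\d+(?:\.\d+)?%?$/.test(s)) return false; // multi-digit, decimal, percentage stay literal
  if (/^[A-Z]$/.test(s)) return true; // fix 4: a single uppercase letter such as $S$

  // fix 3: comma-separated short tokens, optionally wrapped in parentheses.
  const token = String.raw`(?:\\[A-Za-z]+|[A-Za-z]|\d+)`;
  const list = new RegExp(`^\\(?${token}(?:\\s*,\\s*${token})+\\)?$`);
  return list.test(s);
}
```

With this sketch, `1`, `S`, `A, B`, `1, 2, 3`, `\alpha, \beta` and `(A, B)` are accepted, while `10`, `2.5`, `50%` and `PATH` are rejected.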

Test changes: 4 existing currency/acronym assertions updated to
reflect the new behavior, 13 new regression tests covering all four
fixes including the user's specific cases ('$1$ 和 $2$' and
'$S$ 非空 / $S$ 有上界'). 98 math-golden tests pass, 112/112 across
all suites, typecheck clean.

Orphan $$ (model wrote display math but forgot the closing $$) is
documented as not-fixed-from-the-renderer: every attempt to rescue
the orphan from the renderer side made the output worse, so the fix
for that case is on the LLM side (post-generation lint or stricter
system prompt).
The classifier rules are language-agnostic, not specific to CJK text.
Updated test section name and descriptions to reflect that patterns
like single digits, comma-separated tokens, and one-sided operators
apply universally across languages. Chinese text in test cases remains
as real user examples, but the rules themselves are not CJK-specific.
Add defensive escaping for code blocks containing $ characters.
When protecting code (inline `...` or fenced ```...```), replace
$ with &#36; (HTML entity). On restoration, unescape back to $.

This prevents KaTeX from attempting to parse math delimiters that
appear in code examples, regex patterns, or template literals.

Fixes: Pasted documentation about the math pipeline itself no longer
shows red KaTeX error text.

Tests: 3 new cases added, 106/106 passing
Remove the requirement that ``` must appear after a newline. This
handles cases where documentation is pasted on a single line with
embedded code blocks containing $ symbols.

Previously: ``` markers were only recognized after \n
Now: ``` markers are recognized anywhere

This prevents KaTeX errors (red text) when processing malformed code
blocks that contain $ in regex patterns, template literals, or other
code examples.

All 120 tests pass.
Enhancements to inline math detection:
- Reject pure numbers (1, 2.5, 10) as currency/percentages
- Accept numbers with variables (2.5x, 3y^2) as math
- Accept numbers with LaTeX escapes (10\%) as math
- Fix single-line code block detection to protect $ in malformed markdown

This better matches real-world usage where 'costs $5' is currency
but '$2.5x + 3$' is clearly a mathematical expression.

All 122 tests pass (108 math-golden + 8 text-size + 6 provider-model-refresh).
Previously, the Step 5 regex would greedily match '$5 and $' as a single
math expression with content '5 and ', then convert it to '&#36;5 and &#36;'
because the classifier correctly identified it as non-math. This was visually
correct but had two problems:

1. The greedy match would consume the closing dollar that belonged to the
   next currency token, causing cascade replacements.
2. Prose currency like 'These two apples cost $5 and $6' would have its
   dollar signs converted to HTML entities, which works but is unnecessary
   noise in the rendered output.

Changes:
- Step 5 regex now uses non-greedy matching (+?) so '$5 and $' is no
  longer matched as a single pair
- When the classifier rejects a match, the original text is preserved
  unchanged (return _m) instead of being wrapped in HTML entities
- This keeps dollar signs visible in prose while still preventing them from
  being parsed as math

All 122 tests pass.
KaTeX doesn't support the \slashed command from the LaTeX 'slashed'
package, which is commonly used in physics for Feynman slash notation
(\slashed{p}, \slashed{\partial}).

Convert \slashed{X} to \not{X} before passing to KaTeX. The \not
command provides a similar visual effect (a slash through the character)
and is supported by KaTeX.

This prevents KaTeX parse errors when the model emits physics notation
using \slashed.

Tests: 3 new cases added, 111/111 passing
…psilon(0)

The previous fix only handled \slashed{X} (braced form). User-reported
issue: \slashed\epsilon(0) (Greek letter + function call, no braces)
was not being converted and still showed as a red KaTeX error.

Extended the regex to handle the unbraced forms:
- \slashed X           → \not X              (single letter, no braces)
- \slashed\epsilon     → \not{\epsilon}      (backslash command)
- \slashed\epsilon(0)  → \not{\epsilon(0)}   (backslash command + function)
- \slashed a           → \not a              (single ASCII letter)

This covers the cases where the model forgets the braces that the
common physics notation \slashed{p} puts around the argument.

Tests: 2 new cases added, 113/113 math-golden passing.
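
The \slashed rewrites listed in these two commits can be sketched with three replacements. This is a minimal sketch under assumptions: `rewriteSlashed` is a hypothetical name and the regexes are illustrative, not the ones in latexNormalize.ts.

```typescript
// Hypothetical sketch; the real conversion may use different patterns.
function rewriteSlashed(tex: string): string {
  return (
    tex
      // \slashed{X} -> \not{X}
      .replace(/\\slashed\s*\{/g, "\\not{")
      // \slashed\epsilon -> \not{\epsilon}; \slashed\epsilon(0) -> \not{\epsilon(0)}
      .replace(/\\slashed\s*(\\[A-Za-z]+(?:\([^()]*\))?)/g, "\\not{$1}")
      // \slashed p -> \not p (single letter, no braces)
      .replace(/\\slashed\s+([A-Za-z])\b/g, "\\not $1")
  );
}
```

The braced form is rewritten first so that the later, unbraced patterns never see it.
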
# Conflicts:
#	desktop/frontend/src/__tests__/math-golden.test.ts
#	desktop/frontend/src/components/latexNormalize.ts
Rebase onto current main-v2 plus five targeted cleanups called out in
the review:

1. Drop the &#36; escape/unescape dance. Protected segments are stored
   out-of-band and swapped back wholesale, so the round-trip is a no-op,
   except for code that legitimately contains the literal text &#36;
   (which got silently rewritten to $ on restore, corrupting the source).
   The header comment is also stale: the description claims restore does
   not unescape, but the code did.

2. Revert Step 5's greedy→non-greedy change. The char class
   [^$\n]+ already excludes $, so changing + to +? has no effect
   on match extent; the comment claiming it prevents cross-pair
   matching is wrong. Drop the change and the misleading comment.
   The "leave non-math pairs unchanged" behaviour is kept.

3. Restrict fenced-code detection to line-start. Allowing ``` anywhere
   in the line would swallow prose like "wrap code in ```blocks``` here"
   into a code region and break the math for the rest of the message —
   the CommonMark spec requires fences at line start. Single-line docs
   are still handled (the next matching fence is the closer).

4. Escape top-level % in math. KaTeX treats unescaped % as a LaTeX
   comment char and silently truncates the formula at end-of-line —
   "$x = 50%$" rendered as "x = 50" with no error. Add a top-level
   case in latexNormalizeForKatex that emits \% (already-escaped \%
   is handled above as a 2-char command, so no double-escape).

5. Trim oversized comments. Drop the // was: ... history notes in tests
   and the 8-12 line essays in mathNormalize / mathClassify that
   describe code that no longer exists or that the reader can see
   from the regex. The header still lists the pipeline as a map.

128 tests pass; typecheck clean.
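
Item 4 (escaping a bare % so KaTeX does not treat it as a comment) can be sketched as a single left-to-right scan. The helper name `escapeBarePercent` is hypothetical; the source only says the case lives inside latexNormalizeForKatex.

```typescript
// Hypothetical sketch of the % escape; not the code in latexNormalizeForKatex.
function escapeBarePercent(tex: string): string {
  let out = "";
  for (let i = 0; i < tex.length; i++) {
    const ch = tex[i];
    if (ch === "\\" && i + 1 < tex.length) {
      // Copy a backslash and the character after it as one unit, so \% stays \%.
      out += ch + tex[i + 1];
      i++;
    } else if (ch === "%") {
      out += "\\%"; // bare percent: escape it
    } else {
      out += ch;
    }
  }
  return out;
}
```

Consuming each backslash together with its following character is what prevents an already-escaped \% from being escaped twice.
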
Reasonix and others added 6 commits June 13, 2026 00:02
Expressions starting with a unary + or - (e.g. +2, -x, +\alpha)
were rejected by isLikelyInlineMath because none of the existing
patterns matched them — the operator-pattern on line 10 requires
a character before the operator, and the pure-number pattern
requires the first character to be a digit.

This caused \( +2 \) to be treated as non-math text, rendering
as literal '$+2$' instead of rendering the KaTeX unary plus.

Add a dedicated pattern: /^[+\-]\s*(?:\d+(?:\.\d+)?|[A-Za-z\\])/
that matches unary operator + digit/variable/backslash-command.
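
A quick check of the unary-operator pattern, written with the backslash inside the character class escaped so that it is a valid regex literal. The wrapper name `startsWithUnaryOperator` is hypothetical and is not part of isLikelyInlineMath.

```typescript
// Unary + or -, optional whitespace, then a number, a letter, or a backslash command.
const UNARY_PREFIX = /^[+\-]\s*(?:\d+(?:\.\d+)?|[A-Za-z\\])/;

// Hypothetical wrapper, for illustration only.
function startsWithUnaryOperator(body: string): boolean {
  return UNARY_PREFIX.test(body.trim());
}
```

The inputs `+2`, `-x` and `+\alpha` match; `2+` and a lone `+` do not.
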
Companion fix to PR esengine#3666 (fix/inline-math-rendering). The character
class in mathNormalize.ts step 3 missed the comma case: a model that
emits a display block whose closing $$ is on the same line as the
trailing comma of the equation content (…D(q^2),$$) leaves the
closing fence glued to the content line, which micromark-extension-math
does not recognise as a closing fence (it only checks for $$ at
the start of a new line). The rest of the document is then consumed
as math and katex fails on the stray $ in the next paragraph.

Add ',' to the character class so the closing $$ is forced onto its
own line, matching the existing 'inline $$ after closing bracket'
behaviour. A regression test pins the comma case.

This commit was previously 45482b2 on fix/inline-math-rendering
(the PR's tip), but cherry-picking the older PR commits onto
dev-new-features (which had this fix from an earlier cherry-pick
overwritten by the older base state) reverted the regex change.
Re-applying it here.
Cherry-picks the maintainer's review patch from PR esengine#3666 onto
dev-new-features so the local Reasonix app renders currency and
env-var tokens as literal dollars instead of triggering katex
errors.

Per maintainer feedback: a literal $…$ pair in normalizeMath
output is not enough to keep a non-math token out of KaTeX, because
remark-math parses any $…$ it sees in the source. The classifier's
reject verdict only holds when the $ is hidden as a &#36; entity.

Step 6: non-math pairs (currency $5, env vars $PATH) now wrap in
&#36; entities. The decoded entity still renders as a literal dollar.

Test updates:
- Two eq assertions flipped to expect the entity form.
- Drop the stale '$5$ is filtered to entities' comment.
- New render-boundary section runs the real react-markdown +
  remark-math + rehype-katex path; the previous golden cases never
  crossed the prose→parser boundary, so the regression was
  undetectable at the normalizeMath layer.

132 passed, 0 failed (was 129).
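
The step-6 behaviour described here, hiding the dollars of a rejected pair behind the &#36; entity so remark-math never sees them, can be sketched as below. `hideNonMathDollars` and the `isMath` callback are hypothetical names; the pair regex reuses the [^$\n]+ class quoted earlier in this thread.

```typescript
// Hypothetical sketch of step 6; the classifier is passed in as a callback.
function hideNonMathDollars(src: string, isMath: (body: string) => boolean): string {
  return src.replace(/\$([^$\n]+)\$/g, (pair: string, body: string) =>
    // Math pairs are left alone; rejected pairs get entity-encoded dollars.
    isMath(body) ? pair : "&#36;" + body + "&#36;"
  );
}
```

A function replacer is used so that the dollar signs in the output are never interpreted as replacement patterns.
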
Physics / chemistry / representation-theory answers routinely use
the ytableau (\yng) and youngtab (\young) packages to draw Young
diagrams and Young tableaux:

  \yng(2,1)         → empty (2,1) diagram
  \yng(2,1){a&b\\c\\d&e} → filled (2,1) tableau
  \young(2 1)       → youngtab syntax (whitespace separator)

Neither macro is bundled with KaTeX, so without this pass the macros
fail with "Undefined control sequence: \yng" and the chat surfaces
the raw LaTeX source as a red error block — same shape as the
comma-fix and step-6 bugs.

This change adds a small pre-pass (src/components/youngDiagrams.ts)
that translates \yng/\young into KaTeX-compatible \begin{array}…\end{array}
forms. Empty cells become \hphantom{x} so the row width is uniform
and the diagram doesn't look ragged. Content cells are split on \\ and
& at brace-depth 0 (so \frac{a}{b}-style entries survive). The
pass is invoked inside the math body, alongside latexNormalizeForKatex,
in mathNormalize.ts steps 4/5/6 so display, text-mode, and inline math
are all covered.

The repair-regex character class in mathNormalize.ts step 3 also
gains '{' and '}'. A model that writes \end{array}$$ or \frac{a}{b}$$
on one line has the same micromark-fence problem as the comma case;
including the closing brace lets the closing $$ find a clean
start-of-line and the rest of the document doesn't get consumed as
math.

Tests (141 passed, 0 failed in math-golden.test.ts; +9 new):
  • \yng(2,1), \yng(3,2,1), \yng(4,3,2,1) render as the
    corresponding Young diagrams via react-markdown + KaTeX
  • \yng(2,1){a&b\\c\\d&e} renders as filled tableau
  • \young(2 1) (youngtab syntax) renders
  • Unit tests on the translator (with / without content;
    pass-through for non-Young macros)
  • New step-3 regression test: \end{...}$$ repaired to
    \end{...}\n\n$$
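
The step-3 repair named in the last test can be sketched as one multiline replace. This is an assumption-laden sketch: `splitGluedClosingFence` is a hypothetical name and the character class shown is a guess at the relevant subset (closing brackets, closing brace, comma), not the class in mathNormalize.ts. The \n\n separator follows the regression test described above.

```typescript
// Hypothetical sketch of the step-3 repair; the character class is an assumption.
function splitGluedClosingFence(src: string): string {
  // A $$ at end of line, glued to content ending in ), ], } or a comma,
  // is moved onto its own line so the math parser can see it as a fence.
  return src.replace(/([)\]},])\$\$[ \t]*$/gm, (_m: string, last: string) => last + "\n\n$$");
}
```

An opening $$ at the start of a line is not affected, because the pattern requires one of the listed characters immediately before the fence.
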
@github-actions bot added labels Jun 12, 2026: v2 (Go rewrite (1.x), main-v2 branch, active development) and desktop (Wails desktop app, desktop/**)
@lightfront lightfront closed this Jun 12, 2026
User-reported after PR esengine#4216 landed: \yng(2,1) inside prose
without surrounding $$…$$ still showed up as raw text in the
chat. The previous design only fired expandYoungDiagrams inside math
blocks, so a bare \yng in prose was never translated — and remark-math
left it as literal text because there's no $ delimiter for it to
match against.

This change moves expandYoungDiagrams to a stateful pre-pass that
runs once at the top of normalizeMathText, before any delimiter
handling. The translator tracks the math-depth counter (0 = prose,
1 = inline math $…$, 2 = display math $$…$$) so it can decide
per-macro whether to wrap the result in $…$ (when outside math)
or just substitute the inner form (when already inside math, where
the surrounding delimiters are preserved).

The implementation now uses src.startsWith + a small parser for the
parenthesised shape and optional {…} content, instead of a single
regex. The previous non-greedy regex couldn't handle shape with
spaces inside (\yng( 2, 1 )) or a content block containing nested
braces, both of which are valid ytableau syntax.

Tests (142 passed, 0 failed in math-golden.test.ts; +10 new):
  • \yng(2,1) in prose (no $ delimiters) gets wrapped in $…$
    and rendered as an inline Young diagram — the user-reported case
  • \yng(3,2,1), \yng(4,3,2,1), \young(2 1) (various shapes)
  • \yng(2,1){a&b\\c\\d&e} renders as a filled Young tableau
  • Unit tests on the translator (with / without content;
    pass-through for non-Young macros; bare-in-prose wraps in $…$)
  • Step-3 brace repair regression test from the previous commit is
    preserved
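
The startsWith-plus-small-parser approach described in this commit can be sketched as follows. It is a hedged illustration, not the code in youngDiagrams.ts: `parseYoungCall` and the `YoungCall` shape are hypothetical, and returning null for malformed input is an assumption.

```typescript
// Hypothetical sketch of the macro-call parser; not the code in youngDiagrams.ts.
interface YoungCall {
  shape: number[]; // row lengths, e.g. [2, 1]
  content?: string; // raw text between the outer braces, if present
  end: number; // index just past the parsed call
}

/** Parse a \yng or \young call that starts at index `start` in `src`. */
function parseYoungCall(src: string, start: number): YoungCall | null {
  const name = ["\\young", "\\yng"].find((m) => src.startsWith(m, start));
  if (!name) return null;
  let i = start + name.length;
  if (src[i] !== "(") return null;
  const close = src.indexOf(")", i);
  if (close === -1) return null; // orphan macro: leave untouched
  // Accept commas or whitespace as separators, with optional padding.
  const shape = src
    .slice(i + 1, close)
    .split(/[\s,]+/)
    .filter((t) => t.length > 0)
    .map(Number);
  if (shape.length === 0 || shape.some((n) => !Number.isInteger(n) || n <= 0)) return null;
  i = close + 1;
  if (src[i] !== "{") return { shape, end: i };
  // Scan to the brace that balances the opening one.
  let depth = 0;
  for (let j = i; j < src.length; j++) {
    if (src[j] === "\\") {
      j++; // skip the escaped character, e.g. the second backslash of a row break
      continue;
    }
    if (src[j] === "{") depth++;
    else if (src[j] === "}" && --depth === 0) {
      return { shape, content: src.slice(i + 1, j), end: j + 1 };
    }
  }
  return null; // unbalanced braces
}
```

Counting brace depth is what lets a cell such as \frac{1}{2} survive, which a single non-greedy regex cannot do.
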
lightfront pushed a commit to lightfront/DeepSeek-Reasonix that referenced this pull request Jun 13, 2026
@lightfront

Contributor Author

Heads up — the original PR had a gap that just bit me in local testing: bare \yng in prose (no $ delimiters around it) didn't get expanded, because expandYoungDiagrams was only called inside the math-body steps of mathNormalize. Just pushed commit 64db8bd4 (also on dev-new-features as 2a87b598) that fixes this.

The fix moves expandYoungDiagrams to a stateful pre-pass at the top of normalizeMathText. The translator now tracks math depth (0 = prose, 1 = inline math, 2 = display math), so:

  • \yng(2,1) in prose (e.g. "the partition \yng(2,1) corresponds to...") gets wrapped in $…$ and rendered as an inline diagram.
  • $\yng(2,1)$ and $$\yng(2,1)$$ still work as before — the translator just substitutes the inner form, leaving the surrounding delimiters alone.

The implementation now uses a small stateful parser (rather than a single regex), which also fixes two adjacent cases the previous regex couldn't handle: shapes with internal spaces (\yng( 2, 1 )) and content with nested braces (\yng(2,1){\frac{1}{2}&b\\c}).
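
The depth tracking described above can be sketched as a single scan over the text. This is a simplified illustration under assumptions: `wrapBareMacros` is a hypothetical name, `expandOne` stands in for the real expansion, and the macro matcher here ignores the optional {…} content.

```typescript
// Hypothetical sketch of the depth-tracking pre-pass; `expandOne` is a stand-in.
function wrapBareMacros(src: string, expandOne: (macro: string) => string): string {
  let depth: number = 0; // 0 = prose, 1 = inline $…$, 2 = display $$…$$
  let out = "";
  let i = 0;
  while (i < src.length) {
    if (src[i] === "\\" && src[i + 1] === "$") {
      out += "\\$"; // escaped dollar: not a delimiter
      i += 2;
      continue;
    }
    if (src.startsWith("$$", i) && depth !== 1) {
      depth = depth === 2 ? 0 : 2; // toggle display math
      out += "$$";
      i += 2;
      continue;
    }
    if (src[i] === "$" && depth !== 2) {
      depth = depth === 1 ? 0 : 1; // toggle inline math
      out += "$";
      i += 1;
      continue;
    }
    if (src[i] === "\\") {
      const m = src.slice(i).match(/^\\(?:yng|young)\([^)]*\)/);
      if (m) {
        const expanded = expandOne(m[0]);
        // Wrap only when the macro sits in prose; inside math, substitute in place.
        out += depth === 0 ? "$" + expanded + "$" : expanded;
        i += m[0].length;
        continue;
      }
    }
    out += src[i++];
  }
  return out;
}
```

With a stub expansion that returns X, bare `\yng(2,1)` in prose becomes `$X$`, while `$\yng(2,1)$` and `$$\yng(2,1)$$` keep their existing delimiters.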

Tests: 142 passed, 0 failed in math-golden.test.ts (+10 new), full frontend suite green, typecheck and pnpm build clean.

Local Reasonix build was also rebuilt and installed (the binary at /Applications/reasonix-desktop.app now embeds the new commit) — so a restart should pick up the fix. If you'd prefer to test on dev-new-features first, the same commit (2a87b598) is there.

…ells

The earlier translator used \\hphantom{x} as the empty-cell
placeholder so that empty cells would have the same width as
filled ones. \\hphantom renders as 'invisible x' — the cell takes
up space but produces no visible glyph. The result is correct
typesetting but useless for the user: the chat shows what looks
like an empty paragraph where the diagram should be.

Switching to \\square (Unicode U+25A1, 'WHITE SQUARE', katex class
mord amsrm) gives an actually-visible glyph per cell, which is
what a Young diagram is supposed to look like. \\square is in
amssymb, which KaTeX bundles by default. Width behaviour is the
same as \\hphantom{x} (the glyph is the same width as a lowercase
x in KaTeX's AMS font).

Verified with the user's exact reproduction: the
"12 invisible boxes" string now produces 24 visible \\square
glyphs (12 cells, each rendered as both MathML and HTML), no
katex-error, no transparent text — the diagram is now visible.

Tests: 142 passed, 0 failed in math-golden.test.ts. The unit
test that checks the substituted array form now expects
\\square instead of \\hphantom{x}.
@lightfront

Contributor Author

Heads up — yet another follow-up. The \hphantom{x} placeholder I used for empty cells produces invisible output (correct typesetting, but the user sees nothing on screen). Replaced with \square (Unicode U+25A1, katex class mord amsrm) which produces an actual visible glyph per cell. KaTeX bundles amssymb by default, so no new deps.

Verified end-to-end with the user's reproduction string: the 12 invisible boxes now produce 24 visible glyphs (12 cells × 2 — MathML + HTML). No katex-error, no transparent text. Diagram is now actually visible.

This is the third commit on the Young-diagram branch:

  • 0fbb9086: initial translator
  • 64db8bd4: bare-in-prose stateful wrapping
  • ab9e673a: \square for visibility

Tests: 142 passed, 0 failed. Built and installed locally; /Applications/reasonix-desktop.app now embeds build c78fb58a (squashed: ab9e673a).

The previous translator used \\begin{array}{c} (centered), which
causes the shorter rows of a Young diagram to be centred relative
to the longest row — so a (3,2,1) diagram looks like:

  [ ][ ][ ]
    [ ][ ]
      [ ]

instead of:

  [ ][ ][ ]
  [ ][ ]
  [ ]

Switch to {l} so every row's first cell is at the same horizontal
position. This is the standard Young-diagram layout: short rows just
have fewer cells, but they all start at the left edge.

Tests: 143 passed, 0 failed in math-golden.test.ts (+1 new alignment
regression test). The translator unit tests now expect {l} instead
of {c}.
…lush boxes

The previous translator used \\,\\, between cells, which leaves
a visible 0.1667em gap between every pair of \\square boxes. A
Young diagram should look like a row of *flush* boxes (the
youngtableau package renders them as a connected shape, not a row
of spaced-out squares). Switching to \\! (negative thin space,
-0.1667em) exactly cancels the thin-space margin, so adjacent
cells touch with zero visible gap.

Same width as before (the glyph is the same), just no inter-cell
gap. The translator unit test now expects \\! and the new
regression test pins the spacing behaviour so it can't regress.

Tests: 144 passed, 0 failed in math-golden.test.ts (+1 new).
Without this, a multi-row Young diagram has a visible gap between
every pair of adjacent rows: \square sits centred on the math axis
(roughly 0.4em above the baseline), so the bottom of one row's
square is ~0.15em below its baseline, and the top of the next row's
square is ~0.70em above the next baseline. With 1.2em between
baselines, that leaves 0.35em of white space between rows. Young
diagrams are conventionally drawn as a single connected shape, not
as disconnected boxes.

Wrapping each cell in \raisebox{-0.35em}{...} shifts the square
down by half the math-axis offset so consecutive rows touch. The
\raisebox body uses \square (or the cell content) directly
without a $…$ wrapper, because the whole macro is already
inside math mode (whether the model wrote $…$ or the prose
wrapper added it); nesting $ inside $…$ would break the
KaTeX parser.

Tests: 145 passed, 0 failed in math-golden.test.ts (+1 new
vertical-flush regression test). The translator unit tests now
expect \raisebox{-0.35em}{...} instead of plain \square.

The previous translator wrapped each cell in \raisebox{-0.35em}
to shift glyphs down by the math-axis offset. That doesn't close
the gap: shifting every cell by the same amount moves all rows
together, so the distance between row baselines stays at 1.2em
regardless of how far the content is shifted within each row.
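
A small numeric check makes the invariance concrete. This is only a sketch: `CellMetrics` and `rowGap` are hypothetical names, and the metrics are this commit series' rough estimates (a glyph about 0.85em tall with 0.15em below the baseline, hence 0.70em above), not measured KaTeX values.

```typescript
// Vertical extent of one cell in em, relative to its row's baseline.
interface CellMetrics {
  height: number; // distance the glyph rises above the baseline
  depth: number;  // distance the glyph descends below the baseline
}

// White space between the bottom of one row and the top of the next
// when every cell is shifted vertically by `shift` em (negative = down).
function rowGap(baselineSkip: number, cell: CellMetrics, shift: number): number {
  // Upper row's baseline sits at y = 0, lower row's at y = -baselineSkip.
  const bottomOfUpperRow = -cell.depth + shift;
  const topOfLowerRow = -baselineSkip + cell.height + shift;
  // `shift` appears in both terms, so it cancels in the difference.
  return bottomOfUpperRow - topOfLowerRow;
}

// Estimated metrics from the commit messages (assumption, see above).
const square: CellMetrics = { height: 0.7, depth: 0.15 };

const unshifted = rowGap(1.2, square, 0);        // about 0.35
const raised = rowGap(1.2, square, -0.35);       // still about 0.35
const tightened = rowGap(1.2 - 0.35, square, 0); // about 0
```

Only the `baselineSkip` argument changes the result, which is why the fix has to act on row spacing rather than on individual cells.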

The right fix is per-row spacing, not per-cell shift. KaTeX display
math uses 1.2em baseline-to-baseline spacing, but a \square glyph
is only ~0.85em tall and sits centred on the math axis (so its
bottom is ~0.15em below the baseline). Default row spacing leaves
0.35em of white space between rows.

\\[-0.4em] between rows pulls each subsequent row up by ~the
math-axis offset, so consecutive rows touch. The diagram becomes
a single connected shape, the standard Young-diagram layout.

Earlier commits added the raisebox and then removed it as we narrowed
in on the actual root cause; this commit settles on the \\[-0.4em]
row spacing and reverts cells to plain \square.

Tests: 145 passed, 0 failed in math-golden.test.ts.

The previous -0.4em left a small but visible gap between rows. The
exact value for zero gap is derived from KaTeX's glyph metrics:
the \square strut height is 0.675em (the visible glyph height),
and KaTeX's default display-math baseline spacing is 1.2em. The
gap is 1.2 - 0.675 = 0.525em, so \\[-0.525em] subtracts exactly
that amount, making adjacent rows touch with zero visible space.

Measured empirically: at -0.525em the baseline-to-baseline distance
is exactly 0.675em (= glyph height), so the bottom of row 1 aligns
with the top of row 2. Tests: 145 passed, 0 failed.
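
Written out as a worked equation, using the two figures quoted above. The symbols are introduced here for illustration only: b is the default baseline spacing, h the \square glyph height, and b' the spacing after the \\[-0.525em] correction.

```latex
\[
\begin{aligned}
  \text{gap} &= b - h = 1.2\,\mathrm{em} - 0.675\,\mathrm{em} = 0.525\,\mathrm{em}, \\
  b' &= b - 0.525\,\mathrm{em} = 0.675\,\mathrm{em} = h .
\end{aligned}
\]
```

With b' equal to h, each row's baseline advances by exactly one glyph height, so stacked squares share an edge.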
@lightfront lightfront reopened this Jun 13, 2026
@lightfront

Contributor Author

PR reopened with 7 follow-up commits on top of the original 0fbb9086. Description updated.

Summary of what changed since the initial merge of 0fbb9086:

  1. Bare \yng in prose gets wrapped (64db8bd4) — the most important fix. Models write \yng(2,1) without $ delimiters; the translator now tracks math depth and wraps bare macros in $…$ so remark-math sees them.

  2. \square for visibility (ab9e673a) — \hphantom{x} is invisible. \square (Unicode U+25A1) gives a visible white-square glyph per cell.

  3. Left-aligned (3b73d54a) — {l} instead of {c} so shorter rows start at the same x as the longest row.

  4. Flush horizontal (d42b1a5a) — \! (negative thin space) between cells so adjacent boxes touch.

  5. Flush vertical (92fb0fd1) — \\[-0.525em] between rows for zero visual gap. The value is derived exactly: \square glyph height is 0.675em (from the KaTeX strut), default baseline spacing is 1.2em, so the gap is 1.2 − 0.675 = 0.525em.
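
The five items above can be sketched together as a single pre-pass. This is a simplified illustration under stated assumptions, not the code in youngDiagrams.ts: `expandYng`, `diagramBody`, and `repeatJoined` are hypothetical names, only the empty-diagram form \yng(a,b,...) is handled, and escaped \$, code spans, and \young are ignored.

```typescript
// Join `count` copies of `piece` with `sep` between them.
function repeatJoined(piece: string, count: number, sep: string): string {
  const parts: string[] = [];
  for (let i = 0; i < count; i++) parts.push(piece);
  return parts.join(sep);
}

// Items 2-5: \square cells, {l} column, \! between cells,
// \\[-0.525em] between rows.
function diagramBody(partition: number[]): string {
  const rows = partition.map((n) => repeatJoined("\\square", n, "\\!"));
  return "\\begin{array}{l}" + rows.join("\\\\[-0.525em]") + "\\end{array}";
}

// Item 1: track math depth so bare macros in prose get wrapped in $...$.
function expandYng(text: string): string {
  const yng = /^\\yng\(([0-9]+(?:,[0-9]+)*)\)/;
  let depth = 0; // 0 = prose, 1 = inside $...$, 2 = inside $$...$$
  let out = "";
  let i = 0;
  while (i < text.length) {
    const rest = text.slice(i);
    if (rest.slice(0, 2) === "$$" && depth !== 1) {
      depth = depth === 2 ? 0 : 2; // enter or leave display math
      out += "$$";
      i += 2;
      continue;
    }
    if (rest[0] === "$" && depth !== 2) {
      depth = depth === 1 ? 0 : 1; // enter or leave inline math
      out += "$";
      i += 1;
      continue;
    }
    const match = yng.exec(rest);
    if (match) {
      const partition = match[1].split(",").map(Number);
      const body = diagramBody(partition);
      // In prose, add the delimiters the model left out; inside math,
      // substitute the array form only.
      out += depth === 0 ? "$" + body + "$" : body;
      i += match[0].length;
      continue;
    }
    out += rest[0];
    i += 1;
  }
  return out;
}
```

Under these assumptions, `expandYng("Take \\yng(2,1) here")` wraps the expansion in `$…$`, while `expandYng("$\\yng(2,1)$")` leaves the existing delimiters alone and swaps in the array form.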

Also added { and } to the Step 3 repair-regex character class in mathNormalize.ts so \end{array}$$ on one line gets the same blank-line repair as \D(q^2),$$.

145 tests pass, full suite green.

@SivanCola

Collaborator

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c6f2d0350d

Comment thread desktop/frontend/src/components/mathNormalize.ts Outdated
Comment thread desktop/frontend/src/components/youngDiagrams.ts
Comment thread desktop/frontend/src/components/youngDiagrams.ts
Comment thread desktop/frontend/src/components/mathClassify.ts Outdated
Comment thread desktop/frontend/src/components/youngDiagrams.ts Outdated
@SivanCola

Collaborator

approve

Labels

desktop Wails desktop app (desktop/**) v2 Go rewrite (1.x) — main-v2 branch, active development

3 participants