From ccb54390437629c5fa1be0f1c9bf2cfa8182b3d6 Mon Sep 17 00:00:00 2001
From: "codeflash-ai[bot]"
 <148906541+codeflash-ai[bot]@users.noreply.github.com>
Date: Thu, 13 Nov 2025 02:35:17 +0000
Subject: [PATCH] Optimize is_tex_string
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The optimization achieves an **83% speedup** by **precompiling the regex pattern** at module load time instead of recompiling it on every function call.

**Key optimization**: The original code compiled a new regex pattern (`re.compile`) each time `is_tex_string()` was called, which is expensive. The optimized version moves this compilation outside the function, storing the precompiled pattern in the module-level variable `_pat`. Now each function call only performs the fast pattern matching operation.

**Performance impact**: Line profiler results show the dramatic improvement - the original version spent 73.3% of execution time (806μs out of 1100μs total) just compiling the regex pattern on each call. The optimized version eliminates this overhead entirely, reducing total execution time from 1100μs to 220μs.

**Why this works**: Regex compilation involves parsing the pattern string, building a finite state machine, and optimizing it - operations that don't need to be repeated since the MathJax delimiter patterns are constants. Python's `re.compile` returns an optimized pattern object that can be reused indefinitely.

**Test case benefits**: The optimization provides consistent speedups across all test scenarios:
- **Small strings**: 90-230% faster (most common case)
- **Large strings (1000+ chars)**: 10-25% faster (regex matching dominates over compilation)
- **Edge cases**: 100-200% faster for invalid patterns that fail quickly

This optimization is particularly valuable if `is_tex_string()` is called frequently in text processing pipelines, as the compilation overhead elimination scales linearly with call frequency.
---
 src/bokeh/embed/util.py | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/src/bokeh/embed/util.py b/src/bokeh/embed/util.py
index 3885e0d7d5e..b5bebbca91a 100644
--- a/src/bokeh/embed/util.py
+++ b/src/bokeh/embed/util.py
@@ -46,6 +46,14 @@
     from ..core.types import ID
     from ..document.document import DocJson
 
+_dollars = r"\$\$.*?\$\$"
+
+_braces  = r"\\\[.*?\\\]"
+
+_parens  = r"\\\(.*?\\\)"
+
+_pat = re.compile(f"^({_dollars}|{_braces}|{_parens})$", flags=re.S)
+
 #-----------------------------------------------------------------------------
 # Globals and constants
 #-----------------------------------------------------------------------------
@@ -351,20 +359,16 @@ def submodel_has_python_callbacks(models: Sequence[Model | Document]) -> bool:
     return has_python_callback
 
 def is_tex_string(text: str) -> bool:
-    ''' Whether a string begins and ends with MathJax default delimiters
+    """ Whether a string begins and ends with MathJax default delimiters
+
 
     Args:
         text (str): String to check
 
     Returns:
         bool: True if string begins and ends with delimiters, False if not
-    '''
-    dollars = r"^\$\$.*?\$\$$"
-    braces  = r"^\\\[.*?\\\]$"
-    parens  = r"^\\\(.*?\\\)$"
-
-    pat = re.compile(f"{dollars}|{braces}|{parens}", flags=re.S)
-    return pat.match(text) is not None
+    """
+    return _pat.match(text) is not None
 
 def contains_tex_string(text: str) -> bool:
     ''' Whether a string contains any pair of MathJax default delimiters