fix: replace O(n^2) regex with linear string search in code block extraction (ReDoS)#6118

Open
Ashutosh0x wants to merge 2 commits into
google:mainfrom
Ashutosh0x:fix/redos-code-extraction

Conversation

@Ashutosh0x

Contributor

Summary

Fix for #5992 — Replace catastrophic O(n²) regex backtracking in extract_code_and_truncate_content with a linear-time string search.

Problem

The regex pattern at line 153 of code_execution_utils.py uses multiple .*? groups with re.DOTALL:

```python
rf'(?P<prefix>.*?)({leading_delimiter_pattern})(?P<code>.*?)({trailing_delimiter_pattern})(?P<suffix>.*?)$'
```

When the input is large and contains no matching delimiters (or only partial delimiters), the regex engine retries every way of dividing the text among the lazy quantifiers before it can report failure. The result is O(n²) backtracking that hangs the process.
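A minimal sketch of the failure mode, for illustration only. The delimiters, the group names, the helper `time_failed_match`, and the input sizes are assumptions rather than values taken from code_execution_utils.py. The pattern is anchored with `match()` so that only the quadratic term shows up: each opening delimiter triggers a full scan for a closing delimiter that never appears.

```python
import re
import time

FENCE = "`" * 3                # built this way to avoid a literal fence in this example
LEADING = FENCE + "python\n"   # hypothetical opening delimiter
TRAILING = "\n" + FENCE        # hypothetical closing delimiter

# Same shape as the pattern quoted above: three lazy groups under re.DOTALL.
PATTERN = re.compile(
    rf"(?P<prefix>.*?)({re.escape(LEADING)})(?P<code>.*?)"
    rf"({re.escape(TRAILING)})(?P<suffix>.*?)$",
    re.DOTALL,
)


def time_failed_match(repeats: int) -> tuple[bool, float]:
    """Match text holding many opening delimiters and no closing one."""
    text = (LEADING + "x = 1 ") * repeats
    start = time.perf_counter()
    matched = PATTERN.match(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # Sizes are kept small on purpose; the match always fails.
    for repeats in (200, 400, 800):
        matched, seconds = time_failed_match(repeats)
        print(f"repeats={repeats} matched={matched} seconds={seconds:.4f}")
```

If the behavior is quadratic, doubling `repeats` should roughly quadruple the elapsed time, although exact timings depend on the machine and the Python version.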

CWE-1333: Inefficient Regular Expression Complexity (ReDoS)

Fix

Replaced the regex with a simple str.find()-based approach:

  1. For each delimiter pair, find the first occurrence of the leading delimiter
  2. Find the corresponding trailing delimiter after it
  3. Pick the earliest match

This runs in O(n × d) time where d = number of delimiter pairs (typically 2-3), which is effectively O(n).
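The three steps above can be sketched roughly as follows. This is not the code from the PR: the function name `find_first_code_block`, its return shape, and the delimiter pairs are hypothetical, and it assumes an empty block counts as no match, as listed under Testing.

```python
from typing import Optional, Sequence, Tuple

FENCE = "`" * 3  # built this way to avoid a literal fence in this example
# Hypothetical delimiter pairs, for illustration only.
DELIMITERS = [
    (FENCE + "tool_code\n", "\n" + FENCE),
    (FENCE + "python\n", "\n" + FENCE),
]


def find_first_code_block(
    text: str, delimiters: Sequence[Tuple[str, str]]
) -> Optional[Tuple[int, int, str]]:
    """Return (start, end, code) for the earliest delimited block, or None."""
    best: Optional[Tuple[int, int, str]] = None
    for leading, trailing in delimiters:
        # Step 1: first occurrence of the leading delimiter.
        start = text.find(leading)
        if start == -1:
            continue
        # Step 2: the trailing delimiter that follows it.
        code_start = start + len(leading)
        code_end = text.find(trailing, code_start)
        if code_end == -1:
            continue
        # Step 3: keep the pair whose block starts earliest.
        if best is None or start < best[0]:
            best = (start, code_end + len(trailing), text[code_start:code_end])
    # Treat an empty block the same as no block.
    if best is None or not best[2]:
        return None
    return best


sample = "intro " + FENCE + "python\nprint(1)\n" + FENCE + " tail"
print(find_first_code_block(sample, DELIMITERS))
```

Each delimiter pair costs at most two forward scans of the text, so the work is bounded by the O(n × d) figure given above. Truncating the content after the block then amounts to slicing `text[:end]`.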

Testing

The fix preserves the same behavior — extracting the first code block and truncating content after it. The string search approach handles the same edge cases:

  • No delimiters found → returns None
  • Empty code block → returns None
  • Multiple code blocks → picks the earliest one

Fixes #5992

@Ashutosh0x

Contributor Author

Hi @surajksharma07 — this fixes the ReDoS (CWE-1333) reported in #5992.

The regex in extract_code_and_truncate_content() uses multiple .*? groups with re.DOTALL, causing O(n²) backtracking on large inputs without matching delimiters. I replaced it with a simple str.find() loop that runs in O(n) time.

Behavioral parity is maintained: the same inputs produce the same outputs, just without the hang.

@rohityan rohityan self-assigned this Jun 15, 2026
@wukath wukath self-assigned this Jun 16, 2026