
filters: fix escaped brackets inside *[...] character classes #440

Merged
xroche merged 2 commits into master from fix/strjoker-escaped-bracket-148
Jun 28, 2026

Conversation

@xroche xroche commented Jun 28, 2026

Owner

Escaping a bracket inside a *[...] filter class was broken: the matcher's escape branch read two chars ahead of the current position, so a backslash only took effect on the first class member. The documented *[\[\]] ("the [ or ] character") matched only ], *[a,\[] silently dropped the a, and because the loop stopped at the first ] even when escaped, an escaped ] could never be a member. The fix decodes the escape first in the loop body, so a backslash takes the next char as a literal member, an escaped ] is consumed before the terminator check, and escaping runs ahead of the range and size checks (\-, \,, \< are literal). *[\[\]] now matches both brackets as the guide claims.
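
To make the new ordering concrete, here is a minimal sketch of an escape-first class walk. It is not the actual strjoker code: `class_has_member` is a hypothetical helper that takes the class body (the text just after `[`) and supports only literal members, comma separators, and simple ranges; the size checks are left out. The extra `]` guard on the range arm is a choice made for this sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not HTTrack's strjoker: reports whether c is a member
   of the class whose body starts at cls (the char just after '['). */
static bool class_has_member(const char *cls, char c) {
  bool found = false;
  size_t i = 0;

  assert(cls != NULL);
  while (cls[i] != '\0') {
    if (cls[i] == '\\' && cls[i + 1] != '\0') {
      /* Escape is decoded first: the next char is a literal member,
         even if it is ']', '-' or ','. The backslash itself is not added. */
      if (cls[i + 1] == c)
        found = true;
      i += 2;
    } else if (cls[i] == ']') {
      break; /* unescaped ']' terminates the class */
    } else if (cls[i] == ',') {
      i++; /* member separator */
    } else if (cls[i + 1] == '-' && cls[i + 2] != '\0' && cls[i + 2] != ']') {
      /* Range such as a-z, taken only when a third char really exists
         (the ']' check is this sketch's own addition). */
      if (c >= cls[i] && c <= cls[i + 2])
        found = true;
      i += 3;
    } else {
      /* Plain literal member; a dangling "a-" also lands here. */
      if (cls[i] == c)
        found = true;
      i++;
    }
  }
  return found;
}
```

With this ordering, the body of `*[\[\]]` yields both `[` and `]` as members, and the body of `*[a,\[]` keeps the `a`.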

A self-test already exercised this corner, but its assertions pinned the buggy output as expected (it even flagged #148 as a known quirk). They now assert the documented behavior and fail against the old matcher; the guide example moves to the comma form it documents.

Reviewing this with review-recipe surfaced a separate, pre-existing 1-byte heap over-read in the same loop: on a truncated range like *[a-, the range arm's unconditional i += 3 stepped past the end of the pattern, and the loop then read the byte after the NUL. The second commit guards the range arm on a non-NUL third char, so a truncated range falls through to the literal-member path. It also reworks the filter self-test to copy patterns and strings into exact-size heap buffers so that a sanitizer catches this class of over-read; it was invisible before because the pattern came straight from argv, which has no redzone. A *[a- case exercises it.
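
For readers unfamiliar with the exact-size buffer trick, a minimal sketch follows. `exact_copy` is a hypothetical name, not the helper used in the self-test; the idea is only that a heap block of exactly `strlen(s) + 1` bytes puts the allocator's redzone directly after the NUL, so a read one byte past it is reported.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical test helper: duplicate s into a heap block of exactly
   strlen(s) + 1 bytes, so there is no slack after the terminating NUL.
   Returns NULL on allocation failure; the caller frees the result. */
static char *exact_copy(const char *s) {
  size_t n;
  char *p;

  assert(s != NULL);
  n = strlen(s) + 1; /* include the NUL, nothing more */
  p = (char *) malloc(n);
  if (p != NULL)
    memcpy(p, s, n);
  return p;
}
```

A test would pass `exact_copy(argv[i])` to the matcher instead of `argv[i]` itself and build with `-fsanitize=address` (GCC or Clang); an over-read on a pattern such as `*[a-` should then abort the run instead of passing silently.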

Closes #148

xroche and others added 2 commits June 28, 2026 12:08
The escape branch in strjoker probed joker[i+2] instead of the current
char, so a backslash escape only worked as the first class member:
'*[\[\]]' (documented as "the [ or ] character") matched only ']', and
'*[a,\[]' dropped the 'a'. The loop also treated any ']' as the class
terminator, so an escaped ']' could never be a member.

Decode the escape first in the loop body: a backslash takes the next char
as the literal member (only that char, not also the backslash the old code
added), and an escaped ']' is consumed before the terminator check. So
'*[\[\]]' now matches both brackets, and escape precedes the range/size
checks ('\-' '\,' '\<' become literal members). The self-test previously
pinned the buggy output as expected; it now asserts the documented
behavior and fails against the old matcher.

Closes #148

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
The *[...] class parser's range arm does i += 3 unconditionally, so a
pattern ending in a dangling '-' (e.g. *[a-) read one byte past the NUL:
joker[i+2] is the NUL, i jumps to len+1, and the separator skip and loop
guard then read joker[len+1]. Guard the range arm on joker[i+2] != '\0'
so a truncated range falls through to the literal-member path instead of
overshooting.

The filter self-test now copies the pattern and string into exact-size
heap buffers so a sanitizer traps such over-reads; the pattern previously
came straight from argv (no redzone), which is why this stayed invisible.
A *[a- test case exercises it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
@xroche xroche force-pushed the fix/strjoker-escaped-bracket-148 branch from 8a5e5e9 to c292454 June 28, 2026 10:46
@xroche xroche merged commit 799ec88 into master Jun 28, 2026
14 checks passed
Development

Successfully merging this pull request may close these issues.

Help typo
