The pure versions of the regex match extraction functions in Text.ICU (prefix, suffix, and possibly group) do not correctly handle a group that appears in the regex but does not participate in the match, for example "a(b)?c" against "ac" or "(a)|b" against "b". They assume that start_ and end_ return -1 only when the group index is out of range, but in fact they also return -1 when the group does not fire.
> prefix 1 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
*** Exception: Data.Text.Array.new: size overflow
CallStack (from HasCallStack):
error, called at ./Data/Text/Array.hs:129:20 in text-1.2.2.1-FeA6fTH3E2n883cNXIS2Li:Data.Text.Array
> suffix 1 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Just "\NULxabcghiy"
An out-of-range group index gives the expected results:
> prefix 2 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Nothing
> suffix 2 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Nothing
group possibly does the right thing, but not for the right reason (it extracts from -1 to -1), and perhaps should return Nothing instead:
> group 1 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Just ""
One solution would be to use the safe underlying start and end functions instead, returning Nothing for any underlying Nothing. Happy to submit a PR for this approach.
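To illustrate the intended behaviour, here is a minimal sketch using only base. It is a model, not text-icu's actual code: the names spanOf, groupText, prefixText and suffixText are hypothetical, the input is a plain String rather than Text, and the two Maybe Int arguments stand in for whatever the safe start and end functions report for a group.

```haskell
module GroupSketch where

-- Hypothetical model of the proposed fix; not text-icu's implementation.
-- A group has a span only if BOTH offsets are present. A Nothing on either
-- side (group out of range, or group did not fire) yields Nothing.
spanOf :: Maybe Int -> Maybe Int -> Maybe (Int, Int)
spanOf mStart mEnd = (,) <$> mStart <*> mEnd

-- Text captured by the group, or Nothing if the group has no span.
groupText :: String -> Maybe (Int, Int) -> Maybe String
groupText input sp = do
  (s, e) <- sp
  pure (take (e - s) (drop s input))

-- Text from the start of the input up to the start of the group.
prefixText :: String -> Maybe (Int, Int) -> Maybe String
prefixText input sp = do
  (s, _) <- sp
  pure (take s input)

-- Text from the end of the group to the end of the input.
suffixText :: String -> Maybe (Int, Int) -> Maybe String
suffixText input sp = do
  (_, e) <- sp
  pure (drop e input)
```

Under this model, a group that did not fire (as with "abc(def)?ghi" against "xabcghiy") has no span, so all three functions return Nothing instead of crashing or returning garbage. A group that did fire, say offsets 4 to 7 in "xabcdefghiy", gives Just "xabc", Just "def" and Just "ghiy" respectively.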