Consider:
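```rust
use regex_syntax::ast::parse::ParserBuilder;

fn main() {
    // Parse in `x` mode so that whitespace is ignored and `#` starts a comment.
    let parse = |pattern: &str| {
        ParserBuilder::new()
            .ignore_whitespace(true)
            .build()
            .parse_with_comments(pattern)
            .unwrap()
    };
    // Here the comment `#c` comes before the `|`, next to `a`...
    let wc_1 = parse("a #c\n|b");
    // ...while here it comes after the `|`, next to `b`.
    let wc_2 = parse("a|#c\n b");
    // Panics: both calls return identical `WithComments` values.
    assert_ne!(wc_1, wc_2);
}
```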
The comment `#c` is attached to different alternatives in the two regexes, but the parse output of both is identical:
```
WithComments {
    ast: Alternation(Alternation {
        span: Span(Position(o: 0, l: 1, c: 1), Position(o: 7, l: 2, c: 3)),
        asts: [
            Literal(Literal {
                span: Span(Position(o: 0, l: 1, c: 1), Position(o: 1, l: 1, c: 2)),
                kind: Verbatim,
                c: 'a'
            }),
            Literal(Literal {
                span: Span(Position(o: 6, l: 2, c: 2), Position(o: 7, l: 2, c: 3)),
                kind: Verbatim,
                c: 'b'
            })
        ]
    }),
    comments: [
        Comment {
            span: Span(Position(o: 2, l: 1, c: 3), Position(o: 5, l: 2, c: 1)),
            comment: "c"
        }
    ]
}
```
$$\overbrace{\overbrace{\Huge\color{red} \texttt{a}\mathstrut}^{\textrm{Literal(0..1)}}{\Huge\color{blue}\texttt{␣ }}\underbrace{\Huge\color{green}\texttt{\# c ↵}\mathstrut}_{\textrm{Comment(2..5)}}{\Huge\color{blue}\texttt{ |}}\overbrace{\Huge\color{red}\texttt{b}\mathstrut}^{\textrm{Literal(6..7)}}}^{\textrm{Alternation(0..7)}}$$
Without knowing the span of the `|` punctuation, we cannot tell from `parse_with_comments()` alone whether the comment belongs to `a` or to `b`. We have to refer back to the original pattern, at which point it is perhaps easier to just write the parser ourselves 🤷
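For reference, a minimal sketch of that lookup against the original pattern, under stated assumptions: spans are reduced to plain byte offsets, and `ByteSpan`, `Owner`, `find_pipe` and `owner` are hypothetical names for illustration, not part of regex-syntax. The idea is to scan the gap between two adjacent alternatives for the `|`, skipping bytes that fall inside a comment (a comment may itself contain `|`), and then compare the comment's position with the separator.

```rust
/// Hypothetical byte-offset span (start inclusive, end exclusive), standing in
/// for the offsets carried by `regex_syntax::ast::Span`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ByteSpan {
    start: usize,
    end: usize,
}

/// Which side of the `|` a comment lies on (hypothetical helper type).
#[derive(Debug, PartialEq)]
enum Owner {
    Left,
    Right,
}

/// Returns the byte offset of the `|` separating two adjacent alternatives by
/// scanning `left.end..right.start` of the original pattern, ignoring any `|`
/// that lies inside one of the comment spans.
fn find_pipe(pattern: &str, left: ByteSpan, right: ByteSpan, comments: &[ByteSpan]) -> Option<usize> {
    let bytes = pattern.as_bytes();
    (left.end..right.start).find(|&i| {
        bytes[i] == b'|' && !comments.iter().any(|c| c.start <= i && i < c.end)
    })
}

/// Attributes `comment` to the alternative on its side of the separator.
fn owner(pattern: &str, left: ByteSpan, right: ByteSpan, comments: &[ByteSpan], comment: ByteSpan) -> Option<Owner> {
    let pipe = find_pipe(pattern, left, right, comments)?;
    Some(if comment.end <= pipe { Owner::Left } else { Owner::Right })
}

fn main() {
    // Offsets taken from the parse output: `a` is 0..1, `b` is 6..7 and the
    // comment `#c\n` is 2..5 in both patterns.
    let a = ByteSpan { start: 0, end: 1 };
    let b = ByteSpan { start: 6, end: 7 };
    let c = ByteSpan { start: 2, end: 5 };
    assert_eq!(owner("a #c\n|b", a, b, &[c], c), Some(Owner::Left));
    assert_eq!(owner("a|#c\n b", a, b, &[c], c), Some(Owner::Right));
}
```

Note that this only works because the caller still holds the original pattern string; the AST alone does not carry enough information.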
I think the `Ast` type itself should include the `Span` of these marks when their position cannot be inferred, like the `|` in `a|b|c` or the `,` in `a{3,100}`.
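To illustrate the proposal, here is a hypothetical shape for such a node. It is not regex-syntax's actual `Alternation`; the `pipes` field, the `alternative_of` method and the byte-offset-only `Span` are invented for this sketch. If the alternation recorded one span per `|`, attributing a comment would reduce to counting the separators that end before the comment starts, with no need to re-read the pattern.

```rust
/// Hypothetical simplified span holding byte offsets only (the real
/// `regex_syntax::ast::Span` also records line and column).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

/// Hypothetical alternation node that also stores the span of every `|`
/// separator, so `pipes.len() == asts.len() - 1`.
#[allow(dead_code)]
struct Alternation<T> {
    span: Span,
    asts: Vec<T>,
    pipes: Vec<Span>,
}

impl<T> Alternation<T> {
    /// Index of the alternative that owns `comment`: the number of `|`
    /// separators ending at or before the comment's start (pipes are sorted).
    fn alternative_of(&self, comment: Span) -> usize {
        self.pipes.iter().take_while(|p| p.end <= comment.start).count()
    }
}

fn main() {
    let comment = Span { start: 2, end: 5 }; // `#c\n` in both patterns
    // "a #c\n|b": the `|` sits at byte 5, after the comment.
    let wc_1 = Alternation {
        span: Span { start: 0, end: 7 },
        asts: vec!['a', 'b'],
        pipes: vec![Span { start: 5, end: 6 }],
    };
    // "a|#c\n b": the `|` sits at byte 1, before the comment.
    let wc_2 = Alternation {
        span: Span { start: 0, end: 7 },
        asts: vec!['a', 'b'],
        pipes: vec![Span { start: 1, end: 2 }],
    };
    // Each comment now resolves to its own alternative from the AST alone.
    assert_eq!(wc_1.asts[wc_1.alternative_of(comment)], 'a');
    assert_eq!(wc_2.asts[wc_2.alternative_of(comment)], 'b');
}
```

The same idea would apply to the `,` in a counted repetition such as `a{3,100}`.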