I want to support matching on sequences of strings and POS tags using patterns similar to NLTK's chunk grammars, for example:
```python
grammar = r"""
  NP: {<DT|PP\$>?<JJ>*<NN>}   # chunk determiner/possessive, adjectives and nouns
      {<NNP>+}                # chunk sequences of proper nouns
"""
```
But with a few changes:
- I don't want to use the string representation (this seems like a /great/ place for an eDSL)
- I want support for literal strings, literal tokens, and tags.
- I'd like to build this on an API that essentially does regex parsing over a sequence of tokens, but where each element of the pattern is a predicate on a token rather than a strict equality test. So, instead of expressing the pattern in terms of specific tokens, you can express it in terms of predicates.
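As a rough illustration of the predicate-based API described above, here is a minimal Python sketch. It assumes tokens are `(word, tag)` pairs like those returned by `nltk.pos_tag`. All of the names (`P`, `pred`, `word`, `tag`, `token`, `opt`, `star`, `plus`, `chunks`) are hypothetical and do not come from an existing library. Each pattern maps a start index to the set of indices where a match can end, so sequencing, alternation and Kleene star compose the way they do in a regex engine. The left-to-right, longest-match chunking and the merging of both NP rules into one alternation are assumptions, and they may not reproduce NLTK's `RegexpParser` semantics exactly.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# Hypothetical token representation: (word, tag) pairs, as returned by nltk.pos_tag.
Token = Tuple[str, str]


class P:
    """A pattern over a token sequence.

    `run(tokens, i)` returns every index at which a match starting at `i` can end,
    so the set of end positions plays the role of a regex engine's NFA state set.
    """

    def __init__(self, run: Callable[[Sequence[Token], int], FrozenSet[int]]):
        self.run = run

    def __add__(self, other: "P") -> "P":
        # Sequencing: continue `other` from every place `self` can end.
        return P(lambda ts, i: frozenset(k for j in self.run(ts, i) for k in other.run(ts, j)))

    def __or__(self, other: "P") -> "P":
        # Alternation: union of both branches' end positions.
        return P(lambda ts, i: self.run(ts, i) | other.run(ts, i))


def pred(f: Callable[[Token], bool]) -> P:
    """Match exactly one token satisfying the predicate `f`."""
    return P(lambda ts, i: frozenset([i + 1]) if i < len(ts) and f(ts[i]) else frozenset())


def word(w: str) -> P:  # literal string (word), any tag
    return pred(lambda t: t[0] == w)


def tag(g: str) -> P:  # POS tag, any word
    return pred(lambda t: t[1] == g)


def token(w: str, g: str) -> P:  # literal token: word and tag must both match
    return pred(lambda t: t == (w, g))


def opt(p: P) -> P:  # p?
    return P(lambda ts, i: p.run(ts, i) | {i})


def star(p: P) -> P:  # p*, computed as a fixpoint so empty matches cannot loop forever
    def run(ts: Sequence[Token], i: int) -> FrozenSet[int]:
        seen, frontier = {i}, {i}
        while frontier:
            reached = set()
            for j in frontier:
                reached |= p.run(ts, j)
            frontier = reached - seen
            seen |= frontier
        return frozenset(seen)

    return P(run)


def plus(p: P) -> P:  # p+
    return p + star(p)


def chunks(p: P, ts: Sequence[Token]) -> List[Tuple[int, int]]:
    """Non-overlapping (start, end) spans, scanning left to right and taking the longest match."""
    spans, i = [], 0
    while i < len(ts):
        ends = [j for j in p.run(ts, i) if j > i]  # skip empty matches
        if ends:
            spans.append((i, max(ends)))
            i = max(ends)
        else:
            i += 1
    return spans


# The two NP rules from the NLTK grammar, expressed as one alternation.
np = (opt(tag("DT") | tag("PP$")) + star(tag("JJ")) + tag("NN")) | plus(tag("NNP"))

sent = [("Rapunzel", "NNP"), ("let", "VBD"), ("down", "RP"), ("her", "PP$"),
        ("long", "JJ"), ("golden", "JJ"), ("hair", "NN")]

print([[w for w, _ in sent[a:b]] for a, b in chunks(np, sent)])
# → [['Rapunzel'], ['her', 'long', 'golden', 'hair']]
```

Because Python's `+` binds tighter than `|`, the parentheses around the first NP rule are only there for readability. Any Python callable can serve as a predicate. For example, `pred(lambda t: t[1].startswith("NN"))` matches any noun tag, which is awkward to express with strict token equality.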