feat(tree): parsel-style string extraction #256
Draft
gaborbernat wants to merge 2 commits into
Complete `Element.attr()` and `Node.re()`/`re_first()`: cover the valueless-attribute branch in `regex_source`, and split the absent-attribute path from the unforceable allocation-failure guard so that only the latter is excluded, reaching 100% line and branch C coverage. Fix the `_html.pyi` stub so `re.Pattern` resolves inside `class Node` (the `re` method shadowed the module, so `Pattern` is now imported directly), switch the tests to `ty: ignore` directives, and add the how-to, reference, explanation, and parsel migration docs plus the changelog fragment. closes tox-dev#246
Adds parsel-style string-extraction primitives so scraping code can pull strings out of a parsed tree without bolting non-standard CSS pseudo-elements onto the selector engine.
API
- `Element.attr(name, /, default=None) -> str | None` -- the raw attribute value as one string (`class` reads back as `"a b c"`, a valueless attribute as `""`, an absent one as `default`).
- `Node.re(pattern, /, *, attr=None) -> list[str]` -- run a `str` or compiled `re.Pattern` over the node's text (or an attribute value with `attr=`); yields the lone capturing group when the pattern has one, else the whole match.
- `Node.re_first(pattern, /, default=None, *, attr=None) -> str | None` -- the first match with the same group rule, or `default`.

The regex runs in Python's `re`; only the source string is produced in C under the per-tree critical section.
Coverage
100% line and branch coverage on `tree_type.c` under clang llvm-cov. The only excluded branches are the unforceable allocation-failure guards in `node_re`/`node_re_first`; the testable absent-attribute path is split out and covered.
Note: the gcc-16 cross-check could not be run in this environment (the permission layer denied `env`, `rm`, `meson`, and direct `gcovr`, so the build directory's compiler could not be switched). Every new conditional is two-sided and exercised by tests.
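For reviewers who want to see the group rule and the absent-attribute path from the API section in isolation, here is a minimal pure-Python sketch. It is not the code in this PR: `pick`, `re_all`, `re_first`, and the `attrs` dict are hypothetical stand-ins. The handling of an absent source (empty list or `default`) and of a non-participating group (empty string) are assumptions, since the description does not spell them out.

```python
import re
from typing import Optional, Union

# Hypothetical alias: the API accepts either a str or a compiled pattern.
PatternLike = Union[str, "re.Pattern[str]"]


def pick(match: "re.Match[str]") -> str:
    """Apply the group rule: the lone capturing group, else the whole match."""
    if match.re.groups == 1:
        # Assumption: a group that did not participate is reported as "".
        return match.group(1) or ""
    return match.group(0)


def re_all(source: Optional[str], pattern: PatternLike) -> list[str]:
    """Sketch of the list-returning form; `source` stands in for node text."""
    if source is None:  # assumption: an absent attribute yields no matches
        return []
    # re.compile returns an already compiled pattern unchanged.
    return [pick(m) for m in re.compile(pattern).finditer(source)]


def re_first(
    source: Optional[str],
    pattern: PatternLike,
    default: Optional[str] = None,
) -> Optional[str]:
    """Sketch of the first-match form, falling back to `default`."""
    if source is None:  # assumption: an absent attribute returns the default
        return default
    match = re.compile(pattern).search(source)
    return pick(match) if match else default


# Hypothetical attribute table: a valueless attribute is stored as "".
attrs = {"class": "a b c", "hidden": "", "data-price": "USD 12.50"}

ids = re_all("id 12, id 34", r"id (\d+)")        # one group: group values
pairs = re_all("a=1 b=2", r"(\w)=(\d)")          # two groups: whole matches
price = re_first(attrs.get("data-price"), re.compile(r"([\d.]+)"))
missing = re_first(attrs.get("missing"), r"\d+", default="0")
```

In this sketch the regex work stays entirely in Python's `re`, mirroring the split described in the API section, where only the source string comes from the C side.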
Docs
how-to (Pull strings out of a page), reference (auto), explanation (Extracting strings), parsel migration table + pitfalls, and the changelog fragment.
closes #246