✨ feat(tree): add Node.prune to keep a CSS selector #258
Merged
Add a Node.prune(selector) C method that, after the normal WHATWG parse, removes every descendant that neither matches the CSS selector nor is an ancestor or descendant of a match, trimming a large document to a small tree. This is the post-parse equivalent of BeautifulSoup's SoupStrainer. Matching runs first and records a snapshot of each match plus its ancestor chain; a pure-C pass then removes the rest, so no structural pointer is dereferenced across a Python call (a regex/string filter) or after a removal has rewired it. All work runs under one per-tree critical section, reusing the existing selector engine, atoms, and arena.

closes tox-dev#252
What
Adds Node.prune(selector), the post-parse equivalent of BeautifulSoup's SoupStrainer: parse the whole document the WHATWG way, then keep only the descendants matching a CSS selector, together with their ancestors up to the node it is called on and the whole subtree under each match, removing everything else in place. It returns the node, so it chains off parse. A selector that matches nothing empties the subtree.
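The keep rule above can be sketched on a toy tree. This is only an illustration under stated assumptions, not the C implementation from this PR: ToyNode, toy_prune, build_demo, and outline are hypothetical names, and a plain Python predicate stands in for the CSS selector engine. The sketch follows the same two-step shape as the commit message: first snapshot the matches and their ancestor chains, then run a structural pass that never calls the predicate again.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass(eq=False)  # identity-based equality/hash, so nodes can live in sets
class ToyNode:
    """Hypothetical stand-in for a parsed element; not the library's Node."""

    tag: str
    children: list[ToyNode] = field(default_factory=list)
    parent: Optional[ToyNode] = field(default=None, repr=False)

    def append(self, child: ToyNode) -> ToyNode:
        child.parent = self
        self.children.append(child)
        return child


def descendants(node: ToyNode) -> Iterator[ToyNode]:
    for child in node.children:
        yield child
        yield from descendants(child)


def toy_prune(root: ToyNode, matches: Callable[[ToyNode], bool]) -> ToyNode:
    # Phase 1: snapshot every match and its ancestor chain up to `root`,
    # before any node is detached.
    matched: set[ToyNode] = set()
    on_path: set[ToyNode] = set()
    for node in descendants(root):
        if matches(node):
            matched.add(node)
            cur = node.parent
            while cur is not None and cur is not root:
                on_path.add(cur)
                cur = cur.parent

    # Phase 2: structural pass only; the predicate is not called here.
    def sweep(node: ToyNode) -> None:
        kept = []
        for child in node.children:
            if child in matched:
                kept.append(child)  # a match keeps its whole subtree
            elif child in on_path:
                sweep(child)  # an ancestor of a match: keep it, trim below
                kept.append(child)
            else:
                child.parent = None  # detach everything else
        node.children = kept

    sweep(root)
    return root  # returning the node allows chaining


def outline(node: ToyNode) -> str:
    if not node.children:
        return node.tag
    return f"{node.tag}({','.join(outline(c) for c in node.children)})"


def build_demo() -> ToyNode:
    root = ToyNode("html")
    head = root.append(ToyNode("head"))
    head.append(ToyNode("title"))
    body = root.append(ToyNode("body"))
    body.append(ToyNode("nav"))
    div = body.append(ToyNode("div"))
    div.append(ToyNode("p"))
    div.append(ToyNode("p"))
    body.append(ToyNode("footer"))
    return root


doc = build_demo()
print(outline(doc))  # html(head(title),body(nav,div(p,p),footer))
print(outline(toy_prune(doc, lambda n: n.tag == "div")))  # html(body(div(p,p)))
print(outline(toy_prune(build_demo(), lambda n: n.tag == "table")))  # html
```

The last call shows the empty-match case: with no match there is no ancestor chain to keep, so every child of the root is detached.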

Design
- node_prune next to node_css_closest in tree_type.c, reusing the existing selector engine (selector.h) and th_node_remove.

Surface
- Node.prune C method, registered on the shared node_methods table.
- _html.pyi: def prune(self, selector: str, /) -> Node: ...

Docs / tests
- SoupStrainer -> prune migration row plus the updated omissions note.
- tests/test_tree_prune.py (12 cases) and a concurrency case in tests/test_tree_freethread.py.
- docs/changelog/252.feature.rst.

closes #252
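
The `/` in the stub signature listed under Surface marks selector as positional-only. A minimal sketch of what that means for callers, using a hypothetical StubNode class that only mirrors the signature and the return-self behavior (it does no pruning and is not the library's Node):

```python
class StubNode:
    """Hypothetical stand-in that mirrors only the stub's signature."""

    def __init__(self) -> None:
        self.last_selector = None

    def prune(self, selector: str, /) -> "StubNode":
        self.last_selector = selector
        return self  # same object back, so calls can be chained


node = StubNode()
same = node.prune("div.main")  # positional call is accepted

try:
    node.prune(selector="div.main")  # keyword call is rejected
except TypeError:
    keyword_rejected = True
else:
    keyword_rejected = False
```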