Add bail-out handler API for flushing buffered state on graceful bail-out. #323
Open
orium wants to merge 1 commit into
…-out.

Graceful bail-out (`MemoryLimitExceeded` or `ContentHandlerError` with the matching flag on) flushes the unparsed input remainder raw to the sink and propagates the error. That is enough for handlers that only transform tokens they see, but handlers that buffer state across the document (e.g. ROFL's email-obfuscation module, which holds up to ~128 chars in a text buffer while deciding whether they belong to an email) lose that state on bail-out and produce a response with a gap.

This commit adds a hook that fires once on a graceful bail-out, immediately before the raw flush, and lets handlers append final bytes to the sink:

1. New rewritable unit `BailOut` with a single method `append(content, content_type)`, modelled after `DocumentEnd::append`. The wrapper carries the rewriter's current encoding (after any `<meta charset>`-driven change), so encoding-correctness is automatic.
2. New builder method `Settings::append_bail_out_handler` (and the `RewriteStrSettings` mirror) plus `bail_out!` macro for type-hint ergonomics, parallel to the existing `element!` / `end!` macros.
3. `HandlerTypes` grows a `BailOutHandler<'h>` associated type, with `LocalHandlerTypes` aliasing `BailOutHandler<'h>` and `SendHandlerTypes` aliasing `BailOutHandlerSend<'h>`. Matching `IntoHandler` impls cover both bare-closure cases.
4. `TransformController` grows a `handle_bail_out` method with an empty default impl so existing implementors (test fixtures, parser-trace tool) keep compiling. `HtmlRewriteController` overrides it to iterate the user-registered handlers in registration order.
5. `Dispatcher::run_bail_out_handlers` constructs the `BailOut` wrapper and delegates to the controller. It is invoked from every existing graceful bail-out site in `TransformStream::write()` (3 sites: `Arena::append`, `Parser::parse`, `Arena::init_with`) and `TransformStream::end()` (1 site), gated on `should_bail_out_for(&err)`. Hook output therefore lands in the sink as `[transformed prefix] + [hook output] + [raw remainder]`.
6. `RewritingError` is marked `#[non_exhaustive]` so we can add variants in future minor releases. `match`es still work; only exhaustive external matches need a catch-all arm.

The `end()` bail-out site is defensive: it is symmetric with the `write()` sites but is not reachable through normal input. EOF-in-tag / -attribute / -comment emits as text per HTML5, so content-handler errors don't fire from `parser.parse(_, true)`, and memory errors fire earlier in `write()`. Tested implicitly via the shared call path.
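To make the sink ordering concrete, here is a minimal, self-contained sketch of `[transformed prefix] + [hook output] + [raw remainder]`. It does not use the crate's real types: `BailOut`, `BailOutHandler`, `graceful_bail_out`, and `run_demo` below are hypothetical stand-ins written for illustration, `append` drops the `content_type` argument, and handlers return `()` rather than a result. The buffered `"user@exam"` text imitates a handler that is holding back characters while deciding whether they form an email address.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for the `BailOut` unit: handlers may only append.
struct BailOut<'a> {
    sink: &'a mut Vec<u8>,
}

impl BailOut<'_> {
    /// Simplified `append`: the sketch ignores content type and encoding.
    fn append(&mut self, content: &str) {
        self.sink.extend_from_slice(content.as_bytes());
    }
}

/// Hypothetical handler alias for this sketch (not the crate's definition).
type BailOutHandler<'h> = Box<dyn FnMut(&mut BailOut<'_>) + 'h>;

/// Simulates one graceful bail-out: run the hooks in registration order,
/// then flush the unparsed input remainder raw.
fn graceful_bail_out(
    sink: &mut Vec<u8>,
    handlers: &mut [BailOutHandler<'_>],
    raw_remainder: &[u8],
) {
    {
        let mut unit = BailOut { sink: &mut *sink };
        for handler in handlers.iter_mut() {
            handler(&mut unit);
        }
    }
    sink.extend_from_slice(raw_remainder);
}

/// Runs the scenario with or without a bail-out hook registered.
fn run_demo(with_hook: bool) -> String {
    // Text the (imaginary) content handler has consumed but not yet emitted.
    let held_back = Rc::new(RefCell::new(String::from("user@exam")));

    // Output that was already transformed and written before the bail-out.
    let mut sink = b"<p>Contact: ".to_vec();

    let mut handlers: Vec<BailOutHandler<'_>> = Vec::new();
    if with_hook {
        let buffer = Rc::clone(&held_back);
        let hook: BailOutHandler<'_> = Box::new(move |unit: &mut BailOut<'_>| {
            // Drain the buffered state into the sink before the raw flush.
            let pending = std::mem::take(&mut *buffer.borrow_mut());
            unit.append(&pending);
        });
        handlers.push(hook);
    }

    graceful_bail_out(&mut sink, &mut handlers, b"ple.com</p>");
    String::from_utf8(sink).expect("demo data is valid UTF-8")
}

fn main() {
    println!("{}", run_demo(true)); // prints "<p>Contact: user@example.com</p>"
    println!("{}", run_demo(false)); // prints "<p>Contact: ple.com</p>"
}
```

Without the hook, the held-back characters never reach the sink and the response has a gap, which is the failure mode the description attributes to buffering handlers.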
scotchmist
approved these changes
May 26, 2026