
Add bail-out handler API for flushing buffered state on graceful bail-out. #323

Open: orium wants to merge 1 commit into main from bail-out-handlers



@orium commented May 25, 2026

Graceful bail-out (`MemoryLimitExceeded` or `ContentHandlerError` with the matching flag on) flushes the unparsed remainder of the input raw to the sink and propagates the error. That is enough for handlers that only transform the tokens they see, but handlers that buffer state across the document (e.g. ROFL's email-obfuscation module, which holds up to ~128 chars in a text buffer while deciding whether they belong to an email) lose that state on bail-out and produce a response with a gap.

This commit adds a hook that fires once on a graceful bail-out, immediately before the raw flush, and lets handlers append final bytes to the sink:

  1. New rewritable unit `BailOut` with a single method `append(content, content_type)`, modelled after `DocumentEnd::append`. The wrapper carries the rewriter's current encoding (after any `<meta charset>`-driven change), so encoding correctness is automatic.
  2. New builder method `Settings::append_bail_out_handler` (and the `RewriteStrSettings` mirror) plus a `bail_out!` macro for type-hint ergonomics, parallel to the existing `element!` / `end!` macros.
  3. `HandlerTypes` grows a `BailOutHandler<'h>` associated type, with `LocalHandlerTypes` aliasing `BailOutHandler<'h>` and `SendHandlerTypes` aliasing `BailOutHandlerSend<'h>`. Matching `IntoHandler` impls cover both bare-closure cases.
  4. `TransformController` grows a `handle_bail_out` method with an empty default impl so existing implementors (test fixtures, parser-trace tool) keep compiling. `HtmlRewriteController` overrides it to iterate over the user-registered handlers in registration order.
  5. `Dispatcher::run_bail_out_handlers` constructs the `BailOut` wrapper and delegates to the controller. It is invoked from every existing graceful bail-out site in `TransformStream::write()` (3 sites: `Arena::append`, `Parser::parse`, `Arena::init_with`) and `TransformStream::end()` (1 site), gated on `should_bail_out_for(&err)`. Hook output therefore lands in the sink as `[transformed prefix] + [hook output] + [raw remainder]`.
  6. `RewritingError` is marked `#[non_exhaustive]` so we can add variants in future minor releases. `match` expressions still work; only exhaustive matches in external crates need a catch-all arm.
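
To illustrate the output ordering from item 5, here is a minimal, self-contained sketch. It is a toy model, not the code in this PR: `BailOut` here is a stand-in that ignores content type and encoding, and `BufferingHandler` and `simulate_bail_out` are hypothetical names invented for the example.

```rust
// Toy model only: none of these types are the crate's real ones.

/// Stand-in for the `BailOut` unit: it only lets a handler append bytes.
struct BailOut<'s> {
    sink: &'s mut Vec<u8>,
}

impl BailOut<'_> {
    /// Simplified `append`: the real method also takes a content type.
    fn append(&mut self, content: &str) {
        self.sink.extend_from_slice(content.as_bytes());
    }
}

/// Hypothetical handler that withholds text while deciding what to do with it.
#[derive(Default)]
struct BufferingHandler {
    pending: String,
}

impl BufferingHandler {
    /// Normal path: consume the chunk but do not emit it yet.
    fn on_text(&mut self, chunk: &str) {
        self.pending.push_str(chunk);
    }

    /// Bail-out path: hand the withheld text back so the output has no gap.
    fn on_bail_out(&mut self, bail_out: &mut BailOut<'_>) {
        bail_out.append(&self.pending);
        self.pending.clear();
    }
}

/// Simulates a rewrite that bails out after `prefix` was emitted and
/// `buffered` was consumed by the handler but not yet written.
/// With `run_hook == false` this models the behaviour without the hook.
fn simulate_bail_out(prefix: &str, buffered: &str, remainder: &str, run_hook: bool) -> String {
    let mut sink = Vec::new();
    let mut handler = BufferingHandler::default();

    sink.extend_from_slice(prefix.as_bytes()); // [transformed prefix]
    handler.on_text(buffered); // consumed, but withheld from the sink

    // A graceful bail-out is assumed to happen at this point.
    if run_hook {
        let mut bail_out = BailOut { sink: &mut sink };
        handler.on_bail_out(&mut bail_out); // [hook output]
    }
    sink.extend_from_slice(remainder.as_bytes()); // [raw remainder]

    String::from_utf8(sink).expect("all inputs were valid UTF-8")
}
```

Calling `simulate_bail_out("<p>", "user@exam", "ple.com</p>", false)` drops the buffered `user@exam`, which is the gap described above; passing `true` restores it between the prefix and the raw remainder.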

The `end()` bail-out site is defensive: it is symmetric with the `write()` sites but is not reachable through normal input. EOF-in-tag / -attribute / -comment emits as text per HTML5, so content-handler errors don't fire from `parser.parse(_, true)`, and memory errors fire earlier in `write()`. It is tested only implicitly, via the call path it shares with the `write()` sites.
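
For downstream code affected by item 6, a minimal sketch of the catch-all arm follows. The enum below is a stand-in with simplified payloads, not the crate's definition, and `describe` is a hypothetical caller-side helper. Note that `#[non_exhaustive]` only restricts matches in other crates, so this single-file example would also compile without the wildcard arm if every variant were listed.

```rust
// Stand-in for the real error enum; variant payloads are simplified.
#[non_exhaustive]
#[derive(Debug)]
enum RewritingError {
    MemoryLimitExceeded,
    ParsingAmbiguity,
    ContentHandlerError(String),
}

/// Hypothetical caller-side helper that handles the variants it cares about.
fn describe(err: &RewritingError) -> &'static str {
    match err {
        RewritingError::MemoryLimitExceeded => "memory limit exceeded",
        RewritingError::ContentHandlerError(_) => "content handler failed",
        // Catch-all arm: required in external crates once the enum is
        // `#[non_exhaustive]`, and it absorbs variants added later.
        _ => "other rewriting error",
    }
}
```

Callers that already end their `match` with a wildcard arm, or that use `if let` / `matches!` on a single variant, need no change.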

@orium requested review from a team, Noah-Kennedy and jasnell as code owners on May 25, 2026 at 14:48