Add opt-in graceful bail-out for memory and content handler errors by orium · Pull Request #320 · cloudflare/lol-html

orium · 2026-05-20T17:38:19Z

Today a MemoryLimitExceeded or ContentHandlerError aborts the rewriter mid-stream and leaves
the downstream response truncated.

This PR adds two opt-in flags:

MemorySettings::graceful_bail_out_on_memory_limit_exceeded
Settings::graceful_bail_out_on_content_handler_error

When set, before returning the error the rewriter flushes every input byte it has received but
not yet emitted to the sink, as-is. The caller can continue the response by writing subsequent
bytes directly to its downstream sink. Both flags default to false: existing behavior is
unchanged.

kornelski

This will need non_exhaustive and semver-major bump.

By default the rewriter aborts processing when `max_allowed_memory_usage` is hit, leaving the output sink in a potentially inconsistent state (i.e. the sink has received whatever transformed bytes the rewriter had already produced, but the remaining input bytes are lost). Downstream this typically manifests as a truncated, broken HTTP response. This change adds an opt-in `MemorySettings::graceful_bail_out_on_memory_limit_exceeded` flag. When set, before returning `MemoryLimitExceededError` the rewriter flushes every input byte it has received but not yet emitted to the sink, as-is. The caller can then continue the response by writing any subsequent input bytes directly to its own downstream sink, bypassing the (now poisoned) rewriter. The resulting response has the rewriter's transformations applied up to some boundary and the original bytes after that, but it is not broken. The flush is wired in at all three memory-failure sites in `TransformStream::write()`/`end()`: 1. `Arena::append()`: the buffered (still-unparsed) bytes from previous calls plus the new chunk can't fit. 2. `Parser::parse()`: the selectors VM stack push exceeds the limit mid-parse. The dispatcher's `remaining_content_start` correctly points at the first byte of the failing lexeme (the limit check runs before `lexeme_consumed()`), so flushing from there preserves everything. 3. `Arena::init_with()`: the leftover unconsumed bytes can't be buffered for the next call. A new `Dispatcher::flush_for_bail_out()` does the actual flushing. Unlike `flush_remaining_input()` it ignores `emission_enabled`, which means a caller that was actively removing element content at the time of the bail-out can end up with mismatched tags (the removed content's start tag was suppressed but the trailing bytes flushed raw may include the matching end tag). The alternative (skipping the flush when emission is disabled) would lose bytes from the input chunk entirely, which the caller can't recover from since it doesn't buffer its input. The caveat is documented on the new field. The C API gains the corresponding `bool graceful_bail_out_on_memory_limit_exceeded` field in `lol_html_memory_settings_t`. Existing C clients using designated initializers continue to work: the field defaults to `false` / `0`. Tested with 10 new tests in `rewriter::tests::fatal_errors`: 1. Three failure-site tests: one per memory-error origin (`test_graceful_bail_out_in_buffer_append`, `_in_buffer_init_with`, `_in_parser`). 2. `test_graceful_bail_out_preserves_prefix_transformations`: verifies handler transformations on tokens processed before the failure survive the bail-out. 3. `test_no_graceful_bail_out_by_default`: sanity check that the flag is opt-in. 4. Five response-reconstruction tests, each verifying `sink_output + unfed_remaining == original_html` for a different real-world high-memory shape: a tag with a huge base64 attribute (the INCIDENT-6638 shape), hundreds of small attributes on one tag, a huge HTML comment, deeply nested elements (selectors VM stack overflow), and many never-closed tags.

This is the symmetric companion to the previous `MemoryLimitExceeded` bail-out. Until now, a content handler returning `Err` would abort the rewriter and leave the output sink in a potentially inconsistent state (the sink has whatever transformed bytes the rewriter had already produced, but the remaining input bytes are lost). Downstream this typically manifests as a truncated, broken HTTP response. This change adds an opt-in `Settings::graceful_bail_out_on_content_handler_error` flag. When set, before propagating `RewritingError::ContentHandlerError` the rewriter flushes every input byte it has received but not yet emitted to the sink, as-is. The caller can then continue the response by writing subsequent bytes directly to its own downstream sink, bypassing the (now poisoned) rewriter. The flag lives on `Settings` rather than `MemorySettings` because the underlying error has nothing to do with memory: a handler returning `Err` is an explicit signal from the application, not an environmental constraint. The two flags are intentionally independent so callers can pick the recovery posture per error class. The implementation requires a small but important refactor of the dispatcher. Previously, `lexeme_consumed()` did two things: emit the chunk of input before the lexeme, and advance `remaining_content_start` past the lexeme. The second step happened before the lexeme's token was actually serialized to the sink, so a handler error from `token_produced()` (or text emission) left `remaining_content_start` past a lexeme whose bytes had never been emitted. Flushing from there as-is would lose the lexeme. We split that into: 1. `emit_chunk_before_lexeme()`: emit the raw input chunk between the previously emitted byte and the start of the lexeme, advance `remaining_content_start` to the lexeme's start. 2. `consume_lexeme()`: advance `remaining_content_start` past the lexeme's end. Called only after the lexeme's token has been successfully emitted. With this split, on a `ContentHandlerError` mid-lexeme `remaining_content_start` still points at the failing lexeme's start, and a graceful bail-out can flush the lexeme + the rest of the chunk raw. The bail-out decision in `TransformStream::write()` and `end()` is centralized in `should_bail_out_for()`, which considers each `RewritingError` variant against its own flag. `ParsingAmbiguity` is never recovered from (the whole point of strict mode is to refuse uncertain markup). The C API is left on the original behavior for now. The previous memory flag fit cleanly into `MemorySettings` (a `#[repr(C)]` struct), so it was an additive change to the C ABI. The new flag lives on `Settings`, which is not `repr(C)`, so exposing it through the C API needs either a new function signature or a new settings struct — a maintainer call. A TODO captures this in `c-api/src/rewriter.rs`, and the CHANGELOG notes it. Tested with 6 new tests in `rewriter::tests::fatal_errors`: 1. `test_graceful_bail_out_on_element_handler_error`: element handler returns `Err`, sink ends up with the transformed prefix + the failing element and everything after it raw. 2. `test_no_graceful_bail_out_on_content_handler_error_by_default`: sanity check that the flag is opt-in. 3. `test_graceful_bail_out_on_comment_handler_error`: comment handler error path (exercises the same `lexeme_consumed` restructure). 4. `test_graceful_bail_out_on_end_handler_error`: `handle_end()` returns `Err` after `flush_remaining_input` has already emitted every input byte; sink already has the complete document. 5. `test_bail_out_reconstruct_handler_error_midstream`: end-to-end reconstruction (`sink_output == original_html`). 6. `test_bail_out_flags_independent`: enabling content-handler bail-out doesn't accidentally enable memory bail-out, and vice versa. Caveats documented on the new field's doc comment: 1. Active content removal at the time of the bail-out can produce mismatched tags. Same caveat as the memory flag; uncommon in practice and not byte-truncating. 2. Multi-chunk text emission where a later chunk's handler errors will re-emit the raw input bytes, duplicating the already-emitted chunks. Rare (only on encoding-converted text); the response is byte-bigger but not truncated.

The two preceding commits added public fields to `MemorySettings` and `Settings`, externally-constructible structs without `#[non_exhaustive]`. `cargo-semver-checks` flags this as requiring a major version bump. Also bumps `js-api` to match. `lol_html_c_api` keeps its own version cadence; the maintainer can bump it separately if desired.

orium requested review from a team, Noah-Kennedy and jasnell as code owners May 20, 2026 17:38

kornelski approved these changes May 20, 2026

View reviewed changes

orium added 3 commits May 20, 2026 19:35

orium force-pushed the memory-limit-better-bail branch from 24faa2d to 402a19a Compare May 20, 2026 20:23

orium merged commit e97d16c into main May 21, 2026
6 checks passed

orium deleted the memory-limit-better-bail branch May 21, 2026 11:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add opt-in graceful bail-out for memory and content handler errors#320

Add opt-in graceful bail-out for memory and content handler errors#320
orium merged 3 commits into
mainfrom
memory-limit-better-bail

orium commented May 20, 2026

Uh oh!

kornelski left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

orium commented May 20, 2026

Uh oh!

kornelski left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants