fix(producer): inline base64 frames in injector to unblock video-heavy renders#1630
fix(producer): inline base64 frames in injector to unblock video-heavy renders#1630jrusso1020 wants to merge 3 commits into
Conversation
…y renders The URL-served frame path (PR #596) hands each injected `<img>` a fileServer URL instead of a base64 data URI, on the theory that shipping a short URL through `page.evaluate` beats shipping a multi-MB base64 string per frame. That holds when the fileServer is otherwise idle. But on video-heavy compositions, the same fileServer also serves every `<video>.src`. The runtime's drift-recovery branch (`runtime/media.ts:294-302`) issues `el.load()` on the underlying `<video>` during seeks, kicking off full-file downloads that occupy the fileServer's single Node event loop (it uses `readFileSync` and offers no `Accept-Ranges`). The injector's `<img>.decode()` then queues behind those video fetches and is never serviced before puppeteer's protocol timeout fires, surfacing as `Runtime.callFunctionOn timed out` in `capture_streaming`. Reproducer (30 × 32 MB videos / 90 s comp / 8-core / 30 GB host): baseline (broken corpus) 537 s render fails baseline (corpus-fixed) 428 s render fails this fix (drop frameSrcResolver) 121 s render succeeds, 69 MB MP4 Control corpus (30 × 1.6 MB / 60 s) shows no regression: 137 s with this change vs ~135 s on \`main\`. The \`createCompiledFrameSrcResolver\` builder and the \`frameSrcResolver\` option stay in the codebase, just unused for now — re-enabling them behind a proper gate ("only use URL-served frames when the page has zero fileServer-bound \`<video>.src\` traffic") is a follow-up. The cache memory ceiling (\`frameDataUriCacheBytesLimitMb\`, default 1500 MB above 8 GB hosts) already bounds the cost of base64 inlining. — Jerrai
jrusso1020
left a comment
There was a problem hiding this comment.
/code-review verdict — leaning approve with two small cleanups + three follow-up notes
Recall pass at xhigh effort: 10 finder angles + cross-file tracer + sweep. No correctness defects survived verification. One initially-promising security finding (path-traversal guard lost) was refuted on code-read — the resolver's null return falls through to fs.readFile() at videoFrameInjector.ts:122, so it was never a guard; just a URL/base64 selector for valid paths.
Findings (8, ranked)
1. Dead code: drop the void call + the orphaned import — THIS PR
renderOrchestrator.ts:1239:
void createCompiledFrameSrcResolver(compiledDir);createCompiledFrameSrcResolver is a pure factory (shared.ts:268-286) with no side effects, and the imported symbol at renderOrchestrator.ts:83 is now only referenced by this no-op call. Either delete both, or — if the intent is to keep the builder reachable for the future-gate follow-up — leave a // eslint-disable / TODO comment so a future reader doesn't strip the void and accidentally drop the call.
Recommendation: delete the void line and remove the import. The builder lives on in shared.ts and the future-gate PR can re-import it then.
2. Untested edge case: 4K frames + the per-worker 1500 MB cache — FOLLOW-UP
Base64-inline puts every active frame's PNG bytes through the LRU cache. For 4K @ ~5 MB/frame and the default frameDataUriCacheBytesLimitMb = 1500 (config.ts:236), the cache fits ~300 frames. A 90 s @ 30 fps 4K comp = 2700 frames → roughly 9× cache turnover with disk re-reads on every backward seek past frame 300. Wall-clock impact not measured.
Recommendation: doesn't block this PR — the workaround restores correctness on the broken shape — but file a follow-up to measure 4K + long-duration before claiming the gate-less default is universally safe.
3. Untested edge case: ≤8 GB hosts get the 256 MB cache cliff — FOLLOW-UP
systemMemory.ts isLowMemorySystem (≤8192 MB) → memoryAdaptiveCacheBytesMb returns 256 MB (config.ts:248-260). At 4K (~5 MB/frame) that fits 38–64 frames — heavy thrashing on anything beyond a short comp. 1080p is fine (~150 KB/frame, ~1700 frames fit, validated by my 30 × 1.6 MB control corpus at 137 s vs 135 s baseline).
Recommendation: also follow-up. Same shape as #2 — the workaround is correct, but the cache budget on small hosts now matters more.
4. Per-worker cache duplication (raised in sweep) — WORTH A LOOK
videoFrameInjector.ts:187 allocates a fresh frameCache per createVideoFrameInjector call. With workerCount=4 and the 1500 MB ceiling each, that's a 6 GB potential frame cache across workers. The existing 8 GB cliff in memoryAdaptiveCacheBytesMb was tuned for a less cache-hungry path; with base64-inline now the only path on the in-process renderer, the per-worker × cache-bytes product is the real budget.
Recommendation: leave for the follow-up that re-introduces the gate; until then, the contention cap at parallelCoordinator.ts:176-180 (≈3 workers on 8-core) keeps the worst-case product in check.
5. Observability gap: frameCache.stats() exists, isn't emitted
videoFrameInjector.ts:138 exposes entries / bytes / evictions / oversizedRejections for telemetry, but no producer codepath logs it. If cache thrashing surfaces in prod on 4K/low-mem hosts, operators have no signal.
Recommendation: drop a one-liner emit at end-of-render. Not blocking.
6. Regression-lock: no test asserts the resolver isn't passed
renderOrchestrator.test.ts has no assertion on the absence of frameSrcResolver in the createVideoFrameInjector opts. Distributed renderChunk.ts:467 independently happens to use inline mode (never passed the resolver), so both call sites are now consistent — but a future contributor reading just the comment and 're-enabling the optimization' would silently re-introduce the bug. fallow audit won't catch it (syntactically valid).
Recommendation: tiny test in renderOrchestrator.test.ts that mocks createVideoFrameInjector and asserts the opts object doesn't carry frameSrcResolver. Optional but cheap.
7. Altitude: shallower fixes weren't explored
The actual trigger is runtime/media.ts:294-302 calling el.load() on <video> elements during seek-drift recovery, which kicks 30 simultaneous full-file fetches at the local fileServer. Two shallower fixes the comment doesn't acknowledge:
- Skip the drift-recovery
el.load()retry for videos that have a__render_frame_<id>__injection sibling (visual comes from the<img>, no audio sync needed on muted videos). - Stream-based fileServer instead of
readFileSync+ no Accept-Ranges. (Range support alone was tried by Home and didn't help on the broken-corpus baseline — that may have been a confound, worth re-running on the corpus-fixed baseline.)
The PR's comment names the gating follow-up but doesn't name these. Worth either trying one of them in a follow-up, OR adding a sentence to the in-source comment noting they were considered.
8. Comment is 27 lines — repro numbers can move to PR body
The in-source comment captures the why well (load-bearing for future-you), but lines 1227-1232 (repro numbers + benchmark setup) duplicate what's in the PR body. Trim to ~12 lines, keep the architectural explanation, link to PR #1630 for the numbers.
Validation summary
- Local CLI: synth 30 × 32 MB / 90 s comp: 537 s → 121 s (4.4×), 30 × 1.6 MB control: 135 s → 137 s parity. ✓
- Dev cluster on the 0.6.122-alpha.0 image (
@hyperframes/engine@alpha):- Distributed chunked path (render
b0e52263, 8 chunks): 19 s wall, clean MP4. ✓ - In-process streaming path via min-frames gate (render
06b3770e, 750 frames < 900): 41 s wall, clean MP4. ✓ — this is the path the fix actually changes.
- Distributed chunked path (render
- Lint/format/typecheck/fallow/commitlint: all green via lefthook pre-commit.
Reviewers — Miguel + Magi: I think #1 is worth folding into this PR (5-line change). #2-#7 are good follow-up issues but shouldn't block merge. WDYT?
— Jerrai
miga-heygen
left a comment
There was a problem hiding this comment.
Independent review — confirming Rames' analysis + one additional observation
Reviewed the diff, the PR description, the createCompiledFrameSrcResolver implementation in shared.ts:268-286, the createFrameSourceCache + createVideoFrameInjector in packages/engine/src/services/videoFrameInjector.ts, and the distributed renderChunk.ts:467 call site.
1. Is the fix correct?
Yes. The contention path is clear and well-documented in both the PR body and the new in-source comment:
createFrameSourceCache.get()atvideoFrameInjector.ts:119callsframeSrcResolver?.(framePath)first. When the resolver is present, it returns a fileServer URL and skips thefs.readFile+ base64 path entirely.- The browser-side
<img>then fetches that URL from the same HonofileServerthat's simultaneously serving<video>.srcdownloads triggered byel.load()from drift-recovery. fileServerusesreadFileSyncwith noAccept-Ranges— a single blocking Node event loop.- Dropping
frameSrcResolvermeansframeSrcResolver?.()returnsundefined, the cache falls through tofs.readFile(Node-side, not competing with the browser's HTTP queue), base64-encodes, and passes the data URI directly throughpage.evaluate. No HTTP round-trip through the congested fileServer.
The fix eliminates the contention by moving frame data delivery from HTTP (browser → fileServer → disk) to IPC (Node fs.readFile → base64 → CDP page.evaluate). This is the correct approach for the in-process path.
2. Is it truly minimal?
Mostly. The actual behavior change is a single-line deletion (frameSrcResolver removed from the options object passed to createVideoFrameInjector). The other changes are:
- 12
package.json+ 3 plugin manifest version bumps (0.6.121→0.6.122-alpha.0) — standard release hygiene, not behavioral. - A 27-line explanatory comment — thorough, perhaps a bit long (Rames' finding #8 is fair: the repro numbers could be trimmed and linked to the PR instead), but architecturally valuable for future readers.
- The
void createCompiledFrameSrcResolver(compiledDir)line — see #4 below.
No collateral behavioral changes. The diff is clean.
3. Risks from base64-inlining?
Bounded but real for edge cases. The frameDataUriCacheBytesLimitMb config (default 1500 MB on >8 GB hosts, 256 MB on ≤8 GB) caps the LRU cache. However:
- 1080p frames (~150 KB base64): negligible. The 30×1.6 MB control corpus shows no regression (137 s vs 135 s). This is the common case and it's fine.
- 4K frames (~5 MB base64): the cache fits ~300 frames at the 1500 MB ceiling. A 90 s @ 30 fps comp = 2700 frames → 9× cache turnover. Rames flagged this as finding #2. I agree it's follow-up, not blocking.
- CDP message size: Puppeteer's CDP transport has no hard limit on
page.evaluateargument size (it's JSON over WebSocket, no protocol cap), but a 5 MB base64 string per frame perpage.evaluatecall adds serialization overhead. For 1080p this is irrelevant (~150 KB). For 4K it's measurable but still sub-second — and the alternative (hanging forever) is worse. - Per-worker cache duplication: Rames' finding #4 is valid. With
workerCount=4, the theoretical ceiling is 4 × 1500 MB = 6 GB of frame cache. TheparallelCoordinatorcaps workers at ~3 on 8-core, so ~4.5 GB worst case. Not great on 8 GB hosts, but the ≤8 GB path already drops to 256 MB/worker. Acceptable for now.
No blocking risk for merge. The base64 path was already the production path for the distributed renderer (renderChunk.ts:467 never passed frameSrcResolver), so this change makes the in-process path consistent with what's already running in production at scale.
4. Dead code cleanup
Should be cleaned up in this PR. I agree with Rames' finding #1:
void createCompiledFrameSrcResolver(compiledDir); // line 1239createCompiledFrameSrcResolver is a pure factory function — resolve(compiledDir) is its only operation, and the returned closure is discarded by void. No side effects. The import at line 83 is now dead. Both should be removed.
The void call reads as "I'm keeping this around intentionally" but without an // eslint-disable or // TODO(gate) annotation, it looks like an oversight to anyone who doesn't read the comment. The builder still lives in shared.ts — the future gating PR can re-import it. Removing 2 lines is cleaner than leaving dead code that needs a comment to justify its existence.
5. Additional observations beyond Rames' review
5a. The comment is accurate but could be tighter. Lines 1228-1232 (repro numbers) duplicate the PR body's test plan table. I'd trim to ~15 lines: keep the architectural explanation (fileServer contention, readFileSync, no Accept-Ranges, the el.load() trigger), drop the benchmark numbers (link to this PR instead), and keep the "until properly gated" note. This is style, not blocking.
5b. The distributed path was already safe — worth calling out. renderChunk.ts:467 already omits frameSrcResolver, meaning every distributed/chunked render was already using base64-inline. This PR brings the in-process executeRenderJob path into alignment. The PR body mentions this but it's worth emphasizing: this isn't introducing a new pattern, it's making the in-process path consistent with what's already proven in production.
5c. No test regression risk from removing the option. frameSrcResolver is typed as optional (frameSrcResolver?: ...) in VideoFrameInjectorOptions. The createFrameSourceCache function handles undefined gracefully via optional chaining (frameSrcResolver?.(framePath)). No type errors, no runtime risk.
Verdict
Leaning approve with one small cleanup (#4: remove the void call + dead import). The fix is correct, minimal, well-documented, and aligns the in-process renderer with the already-production distributed path. The follow-up items (cache budget on 4K/low-mem hosts, proper gating, regression test) are valid but don't block this.
— Review by Miga
…nder orchestrator Followup to the previous commit. The void-call and the `createCompiledFrameSrcResolver` import in `renderOrchestrator.ts` were left behind as a no-op breadcrumb for the future gating PR. Code review (PR #1630) correctly flagged this as dead code — the builder is a pure factory with no side effects, so calling it and discarding the result is just wasted CPU. Remove both and explain in the in-source comment where the builder still lives, so the gating PR knows where to re-import from. — Jerrai
What
Drop the
frameSrcResolverarg from thecreateVideoFrameInjectorcall site inrenderOrchestrator.ts. That stops the injector from pointing<img>srcs at fileServer URLs and forces it back to base64-inline data URIs.One-line code change. The
createCompiledFrameSrcResolverbuilder + theframeSrcResolveroption stay in the codebase, just unused — re-enabling them behind a proper gate is a follow-up.Why
PR #596 added URL-served frames as a speedup: hand the injector a fileServer URL instead of base64-inlining each ~1 MB frame through
page.evaluate. That holds when the fileServer is otherwise idle.But on video-heavy compositions the same fileServer also serves every
<video>.src. The runtime's drift-recovery branch (runtime/media.ts:294-302) issuesel.load()on the underlying<video>during seeks, kicking off full-file downloads that occupy the fileServer's single Node event loop (it usesreadFileSyncand offers noAccept-Ranges). The injector's<img>.decode()then queues behind those video fetches and is never serviced before puppeteer's protocol timeout fires, surfacing asRuntime.callFunctionOn timed outincapture_streaming.This is the failure shape OSS user @daybreakdeath hit on a 24 GB WSL host, and the failure Rahino sees on heavy real-world comps.
How
Per-phase instrumentation localised the hang. Frame 0 succeeds (~290 ms total, beforeCapture 199 ms). Frame 1 hangs in
session.onBeforeCapturefor the full 300 s protocol timeout. Splitting the injector hook's fourpage.evaluatecalls (syncVideoFrameVisibility,injectVideoFramesBatch,__hfReseekGpu,redrawRuntimeColorGrading) showedinjectVideoFramesBatchwas the one that never returned. That function ends withawait Promise.all(pendingDecodes)where eachpendingDecodeisimg.decode()on an<img>whose newsrcis a fileServer URL. With 30 concurrent<video>.srcdownloads already in flight, the PNG fetch never returns.Disabling the resolver (force base64 inline mode) —
dataUribecomesdata:image/png;base64,...of 137-284 KB — collapses the hang to a 5-50 msinjectVideoFramesBatchcall.Test plan
Repro corpus: synth 30 × 32 MB videos / 90 s comp on an 8-core / 30 GB host.
window.__timelines+ no<video id>)Runtime.callFunctionOn timed out)frameSrcResolver)Control corpus: synth 30 × 1.6 MB / 60 s — 137 s with this change vs ~135 s on
main, no regression on the no-bottleneck case.ffprobe on the heavy-comp output confirms 90.000 s duration, 1920×1080, 30 fps, h264 High profile, mid-clip frames extract to legitimate 880 KB PNGs (not black/empty).
frameSrcResolverre-enable behind "no fileServer-bound<video>.srctraffic" check, so the PR fix: speed up video frame injection renders #596 optimization still pays on the comps it was designed forNotes
createCompiledFrameSrcResolverand the option are untouched in the codebase — this change is the single arg deletion at the producer call site.main, per James's standing request for OSS-heavy changes.Authored by Jerrai (Rames team).