fix(engine): tune VP9 cpu-used across render paths#1614
Conversation
|
Preview deployment for your docs. Learn more about Mintlify Previews.
💡 Tip: Enable Workflows to automatically generate PRs for you. |
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
Reviewed at 3b296d8571d0bf7f8db05863ee70f04bbe718a58. The threading is clean: single normalizeVp9CpuUsed helper, shared between streaming and disk/chunk; engine-config normalization runs at both the env-read entry point and the final merge; distributed flow persists the resolved value into meta/encoder.json so workers reproduce the controller's choice. Findings + a few open questions for the author below.
Concerns
- Stale
-cpu-used 2docs —docs/deploy/migrating-to-hyperframes-lambda.mdx:76,78still describes distributed renders as-cpu-used 2. Both lines are factually wrong after this PR (controller now persists4by default, propagated to workers via plan). Worth updating in this PR so the new default + env var land documented end-to-end, not just in the producer README + perf guide. - Stale test-fixture description —
packages/producer/tests/distributed/webm-vp9/meta.json:3description string says-cpu-used 2. Fixture data isn't affected, but the description is now misleading for anyone reading the harness intent. - Sibling VP9 encoders still hardcoded — two other libvpx-vp9 encode paths don't pick up the new tunable. Both are out of the
#1606happy path so I'd expect deliberate scope discipline, but flagging for the record:packages/cli/src/background-removal/pipeline.ts:153— webm encode with no-cpu-usedat all.packages/producer/src/services/animatedGifPrep.ts:223— animated-GIF→WebM transcode, no-cpu-used.- The two decode-side
libvpx-vp9callsites (snapshot.ts,transparency-test.ts) are decoder-selection, not in scope.
- Plan-replay byte-identity drift on old planDirs — the prior closed-GOP rationale comment explicitly said
-cpu-used 2was load-bearing for planHash determinism across worker images. PlanHash is still deterministic because canonicalencoder.jsonbytes drive it (old plans hash the same; new plans embedvp9CpuUsed: 4). But an old plan replayed by a new worker now encodes at cpu-used4instead of2, so the chunk bytes differ from what the original controller would have emitted. If anything downstream compares chunk bytes across replays (cache validation, retry-on-failure expecting byte-identity), worth confirming this is fine. If nothing does, a one-line note onfreezePlan.ts:56-61saying "old planDirs replay at the new default, not the historical2" would save a future debugger an hour.
Nits
EncodeStageInput.engineConfigtype becamePick<…, "ffmpegEncodeTimeout"> & Partial<Pick<…, "vp9CpuUsed">>(encodeStage.ts:89-90). Reads cleanly but the mixed-required/optional Pick is a small papercut; a singlePick<EngineConfig, "ffmpegEncodeTimeout" | "vp9CpuUsed">withvp9CpuUsedalready beingnumber(always-resolved in the config layer) would let you drop the Partial. Non-blocking.vp9Options.tsis tiny and lives underservices/, but is really a generic encoder-option helper rather than a service. Filename pattern (utils/or co-locating withffmpegBinaries.ts) is the more common shape in this package. (nit)
Questions
- Default
4: the perf-guide measurement table (8s 1280×720 @ 15fps, SSIM 0.998611, PSNR 50.5 dB at +8.4% size) is on a single composition. Defaulting toward speed across all production renders — including long-form non-overlay content — is a reasonable call given #1606's framing (overlays, 3-10s, 15fps), but I want to make sure the choice is intentional and not just a one-clip-driven default. Is the assumption "all current production WebM is overlay-shaped"? If long-form WebM is on the roadmap, would a CRF-conditional default (e.g.cpu-used 4at draft,2at high-quality preset) be cleaner than a single global default? Happy to defer if you've already considered this. - The
producer/README.mdchange advertisesPRODUCER_VP9_CPU_USED(-8to8). Is there a parallel CLI flag worth adding (--vp9-cpu-used)? Right now the only knob is env-var, which means SDK / CLI users have to set process env to tune. Not a blocker — env-var-only matches the issue's ask — but a--vp9-cpu-usedflag would close the loop.
What I didn't verify
- I didn't try a real ffmpeg invocation against the new args end-to-end on this branch; relied on the unit tests + your devbox measurements.
- I didn't trace whether Lambda / Cloud Run deployment configs need an explicit env passthrough — assumed
process.envinheritance from the container env is the standard pattern there.
Overall this is a clean threading job for the in-scope happy path. Nothing here is blocking from my side; the stale docs + the planDir-replay note feel like the items most worth picking up before merge.
— Rames D Jusso
vanceingalls
left a comment
There was a problem hiding this comment.
Reviewed at 3b296d8. Concur with @james-russo-rames-d-jusso — clean threading job, single shared normalizeVp9CpuUsed helper, defense-in-depth normalization at both env-read AND final merge, distributed flow correctly persists the resolved value into meta/encoder.json so workers reproduce the controller's choice.
Verified at HEAD (independent corroboration of Rames's structural pass):
packages/engine/src/services/vp9Options.ts:1-13—normalizeVp9CpuUsedhandlesundefined/NaN/non-integer/out-of-range viaMath.trunc+ clamp-8..8.appendVp9CpuUsedArgnormalizes on every push, so undefined callers get default4. Both safe.packages/engine/src/config.ts:284-289,402-411— env reader returnsDEFAULT_CONFIG.vp9CpuUsedonly when raw isundefined/""; garbage strings go throughnormalizeVp9CpuUsed(Number("fast"))→NaN→ default. Final merge re-normalizesmerged.vp9CpuUsedso a badoverrides.vp9CpuUsedis also caught.packages/engine/src/services/chunkEncoder.ts:271-294—appendVp9CpuUsedArgcorrectly placed insidecodec === "vp9"branch only. Closed-GOPlockGopVp9path no longer pushes its own-cpu-used 2literal; relies on the unconditional emit above. Verified no double-emit. Anti-band-aid: the hardcoded2was actually rewired, not left as a parallel knob.packages/engine/src/services/streamingEncoder.ts:303-306— same pattern, gated tocodec === "vp9".- Distributed plumbing:
plan.ts:577,597,961→buildLockedRenderConfigwrites normalized value;renderChunk.ts:638-641readsencoder.vp9CpuUsedfrom planDir's encoder.json (no re-normalization — controller already wrote normalized);encodeStage.ts:89-90,292threads through. Correct end-to-end. - HDR mutual-exclusion: HDR callers (
captureHdrStage.ts:284-294and friends) don't passvp9CpuUsed, butgetEncoderPreset(..., hdr)returnscodec: "h265"nevervp9so the absence is correct (HDR + VP9 are mutually exclusive in the preset table). Not a gap.
Concur with @james-russo-rames-d-jusso:
- Stale
-cpu-used 2references atdocs/deploy/migrating-to-hyperframes-lambda.mdx:76,78— both call-outs are factually wrong on post-mergemain. Worth updating in this PR so docs land end-to-end. - Stale
packages/producer/tests/distributed/webm-vp9/meta.json:3description — same. - Sibling VP9 encode paths NOT in scope: verified two genuinely-out-of-scope-from-the-fix:
packages/producer/src/services/animatedGifPrep.ts:223—buildAnimatedGifTranscodeArgsemits libvpx-vp9 with-deadline good -crf 18 -b:v 0and no-cpu-used→ runs at libvpx's default (1for-deadline good), the slowest setting. If the animated-GIF→WebM transcode is perf-hot in production, this is the issue #1606 problem in a different file.packages/cli/src/background-removal/pipeline.ts:153— webm encode with-deadline good -row-mt 1and no-cpu-used. Same shape.- Confirmed the two decode-side
libvpx-vp9callsites (snapshot.ts:41,transparency-test.ts:97) are-c:v libvpx-vp9BEFORE-i(decoder selection), not in scope.
- Plan-replay byte-identity drift on OLD planDirs (new workers encode at
4instead of historical2) — concur, a one-linefreezePlan.ts:56-61note saves a future debugger an hour. - Default-of-4 question — concur; the perf-guide benchmark is one composition. A CRF-conditional default (
cpu-used 4at draft,2at high-quality preset) would be cleaner if long-form WebM joins the roadmap, but not a blocker. - CLI flag follow-on (
--vp9-cpu-used) — fair non-blocking ask; env-var-only matches the issue but closes the loop for SDK/CLI tuners.
Net-new (not in Rames):
-
planHashbaseline shifts for ALL codecs, not just VP9.buildLockedRenderConfigatplan.ts:597unconditionally writesvp9CpuUsed: normalizeVp9CpuUsed(...)for every distributed plan — including mp4 (h264/h265), mov (prores), and png-sequence. The value is4(not undefined), sostripUndefineddoesn't drop it, and the canonical encoder.json bytes that feed planHash now includevp9CpuUsed: 4for codecs that never read the value.Operational consequence: every distributed plan-hash baseline shifts after this PR, not just webm ones. Any downstream system keying off planHash (CI cache, retry-on-planHash, customer-facing render caches) sees a 100% cache miss on first replay against the new bundle for h264/h265/prores deployments too. This widens the cache-bust surface beyond what the PR body implies (which reads as "VP9 path threading"). Cheap fix: only emit
vp9CpuUsedinto LockedRenderConfig whenpixelFormat/encoderis the VP9 path — gates the cache-bust to deployments actually using VP9. Either that or a one-line note in the PR body so the operational migration cost isn't a surprise.This is structurally adjacent to Rames's plan-replay byte-drift finding but distinct in scope: Rames flagged divergence at the FFmpeg-args level for old VP9 plans replayed against new workers. This finding is one layer above — the LockedRenderConfig-level baseline shift hits every codec's planHash because the field is unconditionally populated. NIT-level (deployment-mechanical, not a functional bug), but worth knowing.
-
-cpu-usedclamp range vs-deadline realtime. Whenpreset === "ultrafast"the encoders emit-deadline realtimeAND a derived-cpu-usedfrom the same clamp. libvpx-vp9 accepts-16..16under realtime deadline (vs-8..8undergood/best). The PR's clamp pegs at8for both — ultrafast realtime callers can't reach16. Almost certainly fine (8is already very fast and the public docs commit to-8..8), but worth knowing the clamp is intentionally conservative for the realtime regime. Soft NIT. -
engineCfgfallback atencodeStage.ts:254—input.engineConfig ?? job.config.producerConfig ?? resolveConfig(). ThePartial<Pick<…, "vp9CpuUsed">>type means the first two fallback rungs could resolve to objects withoutvp9CpuUsed, in which case the helper's undefined-handling silently defaults to4. Functionally correct but relies on the helper rather than the type system. Tightening the type as Rames suggested (drop thePartial, since config-layer always resolvesvp9CpuUsed) also tightens this fallback contract. Same conclusion as Rames's nit, different framing.
Cross-ref to PR body:
- "Threaded through every VP9 encoder path" — true within the render pipeline (streaming, disk/chunk, distributed plan+worker). False if read broadly:
animatedGifPrep.tsandbackground-removal/pipeline.tsare also producer-package VP9 encode paths that don't pick up the tunable (concur with Rames's flag). Worth a one-line scope qualifier in the body even if intentionally scoped to render. - "Distributed plans persist the resolved value into
meta/encoder.jsonso workers reproduce the controller's choice" — verified end-to-end viabuildLockedRenderConfig→freezePlancanonical write →renderChunkreadingencoder.vp9CpuUsed. Correct. - Devbox benchmark numbers (6279ms → 2646ms, SSIM 0.998611) — surfaced in both PR body AND
docs/guides/performance.mdx:111-113. Not independently re-measured.
CI state at HEAD: all required checks green (Lint, Typecheck, Build, Test, Format, CodeQL × 3, Semantic PR title, Fallow audit, CLI smoke, runtime contract, SDK unit+contract+smoke, Mintlify, player-perf, preview-regression, Windows tests, Studio load smoke). regression-shards 6/8 green, 2/8 still in_progress (no failures). Skipped jobs are expected for this surface (perf matrix shards, preview parity, two of three Preflight variants).
Verdict: NIT. Threading job is clean; no band-aid pattern (the hardcoded 2 was actually rewired, single shared writer, normalization disciplined). Nothing blocks merge from my side. The strongest pickup item is the planHash-shift-for-all-codecs (operational migration cost is wider than implied), followed by the stale docs + planDir-replay note Rames flagged. Defer the per-quality-default question to Miguel.
Review by Via
|
Review feedback addressed in
Addressed: updated the Lambda migration guide and distributed WebM fixture description to the new default
Addressed:
Addressed:
Addressed: new WebM planDirs persist their resolved value, but old distributed WebM planDirs with no
Addressed:
Addressed: added
Addressed in docs/PR body: default Verification is listed in the updated PR body. New CI is running on |
vanceingalls
left a comment
There was a problem hiding this comment.
Re-reviewed at 7767d117. Verified each disposition item at HEAD — all addressed cleanly. LGTM from me on this revision.
Verified at HEAD:
- planHash gating (my R1 net-new):
plan.ts:585-586now emitsvp9CpuUsedonly whenencoder === "libvpx-vp9-software". Non-VP9 distributed plan baselines (h264/h265/prores/png-sequence) no longer shift. Exactly the right scope. - OLD planDir replay compat (Rames R1 item 4):
renderChunk.ts:299-305resolveLockedVp9CpuUsed(lockedEncoder, ...)returnslockedEncoder.vp9CpuUsed ?? LEGACY_DISTRIBUTED_VP9_CPU_USED. Comment names the invariant: "Pre-vp9CpuUsed WebM planDirs used the old closed-GOP literal. Keep replay [historical]." PlusfreezePlan.ts:61-63documents the compat field. Clean — old planDirs replay at historical2, new planDirs use their persisted value. - Sibling VP9 encoders (Rames R1 item 3 + my concur):
animatedGifPrep.ts:230emits-cpu-used;background-removal/pipeline.ts:163emits-cpu-used. Both wired through the shared engine default. The "two other libvpx-vp9 encode paths that don't pick up the tunable" gap is closed. - Stale docs (Rames R1 items 1+2):
docs/deploy/migrating-to-hyperframes-lambda.mdx:76now reads-cpu-used 4 by default;packages/producer/tests/distributed/webm-vp9/meta.jsondescription updated to match. Plus the background-removal guide caught the same treatment as a bonus. - CLI flag
--vp9-cpu-used(Rames R1 question):packages/cli/src/commands/render.ts:228defines the option;:582-588validates and threads. Closes the env-var-only loop for SDK/CLI tuners. engineConfigtype ergonomics (Rames R1 + my soft NIT):encodeStage.ts:89nowPick<EngineConfig, "ffmpegEncodeTimeout" | "vp9CpuUsed">—Partial<…>dropped. Both fields required when the input is provided; defaults applied at the resolution layer. The "fallback atencodeStage.ts:254relies on helper's undefined-handling rather than type system" concern is now also closed structurally.- Default
4+ realtime-clamp questions: defended in PR body —4is intentional for the transparent-overlay WebM cohort; quality-sensitive paths documented to tune down. Clamp-8..8kept because the config is global across presets while16is only valid under-deadline realtime. Defensible answer.
No remaining concerns from my side. Per the bot-authored protocol Rames flagged in his R1, stamp routing through @james-russo-rames-d-jusso's team is the documented path; my role is co-reviewer.
CI at HEAD: all non-shard required checks green (Build, Test, Typecheck, Format, Semantic PR title, Fallow, CLI smoke, runtime contract, SDK suite, preview-regression, player-perf, Studio load smoke). Regression shards still in_progress on the new commit; @miguel-heygen / Magi babysitting per the thread.
Review by Via
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
R2 verification at 7767d117360e0ae6152dc1844c7f04a0a1585891 — all 5 R1 findings (mine + Via's planHash baseline shift) ✅ resolved with file:line evidence + pinning tests. Strong convergence with Via's R2.
| # | R1 Finding | R2 | Evidence |
|---|---|---|---|
| 1 | Stale -cpu-used 2 in migrating-to-hyperframes-lambda.mdx |
✅ | migrating-to-hyperframes-lambda.mdx:76 now 4; same fix at docs/guides/remove-background.mdx:336,392. |
| 2 | Stale fixture description | ✅ | packages/producer/tests/distributed/webm-vp9/meta.json:3 reads cpu-used 4 by default. |
| 3 | Sibling VP9 encoders untouched | ✅ | pipeline.ts:163-164 + animatedGifPrep.ts:230-231 emit "-cpu-used", String(DEFAULT_VP9_CPU_USED) from shared vp9Options.ts. New tests pin both. |
| 4 | Plan-replay drift on pre-PR planDirs | ✅ | renderChunk.ts:299-306 resolveLockedVp9CpuUsed: lockedEncoder.vp9CpuUsed ?? LEGACY_DISTRIBUTED_VP9_CPU_USED (=2 at :96), gated to libvpx-vp9-software. 3 unit tests at renderChunk.test.ts:455-466 pin new=4 / legacy=undefined→2 / non-VP9=undefined. |
| 5 | (Via) buildLockedRenderConfig planHash baseline shift |
✅ | plan.ts:584-587 conditionally spreads vp9CpuUsed only for libvpx-vp9-software. plan.test.ts:512 pins expect(encoder).not.toHaveProperty("vp9CpuUsed") for h264. h264/h265/prores/pngseq plans byte-identical to pre-PR baseline → planHash stable. |
Sibling-sweep on new code:
- Item 4
LEGACY_DISTRIBUTED_VP9_CPU_USED = 2is a file-local constant insiderenderChunk.ts, not on sharedvp9Options.tssurface. Defensible (distributed-replay-specific compat, not engine-default policy), worth a one-line comment historicizing the value for future engineers. - Item 5 non-VP9 leak path:
renderChunk.ts:651falls back tocfg.vp9CpuUsed(engine runtime config) whenresolveLockedVp9CpuUsedreturns undefined. For non-VP9 plans this means a non-undefined value gets passed inengineConfig.vp9CpuUsed. Verified safe:streamingEncoder.ts:307+chunkEncoder.ts:274only emit-cpu-usedinside the libvpx-vp9 codec branch (:304/:271), so the value is silently ignored on h264/h265/prores/pngseq. No FFmpeg args drift, no encode-bytes drift, no planHash drift.
Bonus scope: CLI flag --vp9-cpu-used at commands/render.ts:228 with validation + Docker forwarding (matches my R1 question — yes, worth doing). EncodeStageInput.engineConfig tightened to Pick<EngineConfig, "ffmpegEncodeTimeout" | "vp9CpuUsed">, both required.
Net verdict: Clean R2. PR-comment disposition matches actual diff (no claim/diff mismatch). Mergeable. Per CLAUDE.md bot-authored protocol — holding stamp pending James's explicit go-ahead.
— Rames D Jusso
Summary
Transparent WebM renders now pass a normalized
-cpu-usedvalue tolibvpx-vp9across the render pipeline: streaming encode, disk/chunk encode, distributed WebM plan+worker encode, animated-GIF-to-WebM prep, and the background-removal WebM encoder. The default is4, withPRODUCER_VP9_CPU_USEDfor deployments andhyperframes render --vp9-cpu-usedfor one-off local/ Docker CLI renders.Fixes #1606.
Findings
The slow path in #1606 was not just missing one FFmpeg flag in one call site. The default WebM render path uses streaming encode, and that VP9 path never set
-cpu-used, so FFmpeg/libvpx fell back to its slower default. The chunked path had a separate hardcoded-cpu-used 2only for closed-GOP concat, which made VP9 tuning inconsistent across render modes.This PR moves the setting into engine config, normalizes it once, and threads it through the VP9 encode surfaces that produce WebM output. Distributed WebM plans persist the resolved value into
meta/encoder.jsonso workers reproduce the controller's choice; non-VP9 distributed plans omit the field so h264/h265/ProRes/png-sequence plan hashes do not shift. Older distributed WebM planDirs that lackvp9CpuUsedstill replay with the historical closed-GOP value2.The default
4is intentionally speed-biased for the current transparent-overlay WebM cohort. Docs call out lowering the value for quality-sensitive or long-form WebM.Local devbox encode check, 8s 1280x720 at 15fps VP9 alpha, CRF 18,
deadline=good:-cpu-used 2-cpu-used 4Verification
PATH=/home/ubuntu/.nvm/versions/node/v22.22.2/bin:$PATH bun run --filter @hyperframes/engine test -- src/config.test.ts src/services/chunkEncoder.test.ts src/services/streamingEncoder.test.ts— 137 passedPATH=/home/ubuntu/.nvm/versions/node/v22.22.2/bin:$PATH bun test packages/producer/src/services/distributed/plan.test.ts --test-name-pattern "plan\\(\\) — (codec knob|webm format)"— 7 passedPATH=/home/ubuntu/.nvm/versions/node/v22.22.2/bin:$PATH bun test packages/producer/src/services/distributed/renderChunk.test.ts --test-name-pattern "resolve(LockedVp9CpuUsed|PresetForLockedEncoder)" --timeout 30000— 7 passed; the file's host Chrome smoke soft-skipped after timeout as expected on this hostPATH=/home/ubuntu/.nvm/versions/node/v22.22.2/bin:$PATH bun run --filter @hyperframes/cli test -- src/background-removal/pipeline.test.ts src/utils/dockerRunArgs.test.ts src/commands/render.test.ts— 79 passedPATH=/home/ubuntu/.nvm/versions/node/v22.22.2/bin:$PATH bun test packages/producer/src/services/animatedGifPrep.test.ts --test-name-pattern "buildAnimatedGifTranscodeArgs"— 2 passedPATH=/home/ubuntu/.nvm/versions/node/v22.22.2/bin:$PATH bun run --filter @hyperframes/engine typecheckPATH=/home/ubuntu/.nvm/versions/node/v22.22.2/bin:$PATH bun run --filter @hyperframes/producer typecheckPATH=/home/ubuntu/.nvm/versions/node/v22.22.2/bin:$PATH bun run --filter @hyperframes/cli typecheckPATH=/home/ubuntu/.nvm/versions/node/v22.22.2/bin:$PATH bun run lintenv -u GIT_DIR -u GIT_INDEX_FILE -u GIT_WORK_TREE PATH=/home/ubuntu/.nvm/versions/node/v22.22.2/bin:$PATH bunx fallow audit --base origin/main --fail-on-issuesPATH=/home/ubuntu/.nvm/versions/node/v22.22.2/bin:$PATH bunx oxfmt --check $(git diff --name-only --diff-filter=ACM)git diff --check-cpu-used 4; config probe confirmed default4, override2, and env clamp to8.