Skip to content

feat(cli): opt-in parallel batch rendering (#62)#64

Merged
jbrecht merged 3 commits into
mainfrom
feat-62-render-concurrency
Jun 16, 2026
Merged

feat(cli): opt-in parallel batch rendering (#62)#64
jbrecht merged 3 commits into
mainfrom
feat-62-render-concurrency

Conversation

@jbrecht

@jbrecht jbrecht commented Jun 16, 2026

Copy link
Copy Markdown
Owner

Closes #62. Implements the one performance lever the baseline audit recommended.

What

Render multiple tutorial × language jobs concurrently via --render-concurrency <n> (or renderConcurrency in forge.config.ts). Default is 1 (serial) — today's behavior, byte-for-byte.

Because the record phase is mostly real-time waiting (each step holds for its narration; the CPU is near-idle), a batch — e.g. regenerating a whole tutorial set like umami's — runs near-linearly faster in parallel. No change to any rendered output.

How

  • The nested for tutorial { for lang { await render } } loop is flattened to a job list run through the existing mapLimit primitive (now exported from core). mapLimit(jobs, 1, …) reproduces the old serial, ordered, fail-fast behavior exactly.
  • config.renderConcurrency (zod-validated positive int) + the CLI flag; resolveRenderConcurrency clamps bad/missing values to 1, so the worst case is serial.

Opt-in by design

Concurrency > 1 requires a parallel-safe adapter — concurrent renders each run their own setup/teardown against your app, so a shared seed DB must isolate per render. There's no fork here to decide: tutorial-forge can't isolate your DB, so it offers the knob + documents the contract (new adapters.md → Parallel rendering section). Whether umami's adapter can use it is an umami-side concern. TTS cache is already concurrency-safe (atomic writes).

Tests

mapLimit unit tests (run in CI): respects the concurrency cap, preserves result order, and — the key guarantee — limit 1 is strictly sequential and fail-fast, so the default path is provably unchanged. 179 core tests pass; build/typecheck clean across all packages.

Coverage note: the CLI batch loop is thin glue over mapLimit + render() (build/type-checked); the concurrency engine itself is unit-tested. I didn't add a live multi-render test because the example app's adapter isn't guaranteed parallel-safe — exactly the contract this documents.

Semver

Adds a CLI flag + config option + the mapLimit export (all additive) → next release is a minor (0.13.0), bundling this with #50 and #49.

🤖 Generated with Claude Code

jbrecht and others added 3 commits June 15, 2026 21:56
Render multiple tutorial×language jobs concurrently via
--render-concurrency <n> (or renderConcurrency in config); default 1
keeps today's serial, fail-fast behavior exactly. The record phase is
mostly real-time waiting (the CPU is near-idle), so a batch regen runs
near-linearly faster in parallel — the single biggest perf lever from
the baseline audit, and the only one that changes no output.

Opt-in because concurrency > 1 requires a parallel-safe adapter:
concurrent renders each run their own setup/teardown, so a shared seed
DB must isolate per render. Documented the contract in adapters.md.

- Flatten the nested tutorial/lang loops into a job list run through
  the existing mapLimit primitive (now exported from core).
- mapLimit unit tests: cap respected, order preserved, limit 1 = strict
  serial fail-fast (proves the default path is unchanged).
- config.renderConcurrency (zod-validated) + CLI flag.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e semantics (#62)

Address code review on PR #64:
- mapLimit: add a concurrency>1 throw-rejects test (now load-bearing
  public API) and an empty-input case.
- adapters.md: note that a failure under concurrency stops scheduling
  new renders but lets in-flight ones finish.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
From the performance-engineer and qa-engineer reviews of PR #64 — three
real correctness fixes plus coverage:

- **mapLimit fail semantics (qa).** On error, stop scheduling new items
  AND await the in-flight ones before rejecting (was: Promise.all
  rejected immediately, leaving started renders running orphaned past the
  call). Now matches the documented "in-flight finish, no new scheduling".
- **Dedupe languages (qa).** `--lang "es,es"` produced two jobs writing
  the same .forge/<id>.es dir — a serial overwrite before, simultaneous
  corruption under concurrency. Extracted a pure `buildRenderJobs` that
  de-dupes langs before flattening.
- **TTS cache temp-file race (performance-engineer).** Temp paths were
  derived only from the content hash, so two concurrent renders
  synthesizing the same key (same-language batch regen) could clobber
  each other mid-write. Make the temp suffix unique per call (pid+uuid);
  the final rename stays atomic.

Coverage (qa): add a CLI vitest harness (resolveRenderConcurrency table +
buildRenderJobs construction/dedupe/order), extend mapLimit tests with
the stop-on-error partial-output semantics, and flip CI to `pnpm -r test`
so the CLI package's logic is actually gated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jbrecht

jbrecht commented Jun 16, 2026

Copy link
Copy Markdown
Owner Author

Review round complete — all three reviewers + measured verdict

Following code-reviewer (correctness), performance-engineer (delivery), and qa-engineer (coverage):

Three real fixes landed (commit c8793d5)

  • mapLimit fail semantics — on error, now awaits in-flight items before rejecting (was orphaning started renders past the call); matches the documented "in-flight finish, no new scheduling".
  • Lang dedupe--lang "es,es" no longer produces two jobs writing the same work dir (corruption under concurrency). Extracted pure buildRenderJobs.
  • TTS cache temp-file race — unique per-call temp paths so concurrent renders synthesizing the same key (same-language batch regen) can't clobber each other.

Coverage

CLI vitest harness added (resolveRenderConcurrency + buildRenderJobs), mapLimit stop-on-error semantics pinned, and CI flipped to pnpm -r test so the CLI package's logic is now gated (it wasn't before). 190 unit tests across core+cli.

Measured speedup (the perf-engineer's sandbox couldn't run renders; I did)

2-job batch (getting-started ×2), warm cache, this machine:

wall-clock
serial 122.8s
--render-concurrency 2 63.1s
speedup 1.95×

Output is byte-identical (durations Δ=0ms) — concurrency changes only when jobs run. Near-linear because the record phase (~92%, idle-bound) overlaps cleanly and the ~2s encode barely contends.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cross-tutorial render parallelism for batch regeneration

1 participant