DemchaAV · Jun 15, 2026 · Jun 14, 2026
@@ -208,6 +208,12 @@ jobs:
       - name: Compile benchmarks module
         run: ./mvnw -B -ntp -f benchmarks/pom.xml clean compile
 
+      - name: Run deterministic benchmark gates
+        # Fast, machine-independent unit/gate tests (image-cache reuse,
+        # render-operator coalescing, scenario/threshold coverage, diff tooling).
+        # Catches structural regressions the timing smoke run cannot.
+        run: ./mvnw -B -ntp -f benchmarks/pom.xml test
+
       - name: Run coarse performance smoke benchmark
         run: |
           ./mvnw -B -ntp -f benchmarks/pom.xml -DskipTests \
@@ -223,6 +229,14 @@ jobs:
           path: benchmarks/target/benchmarks/current-speed/**
           if-no-files-found: ignore
 
+      - name: Upload benchmark gate reports
+        if: always()
+        uses: actions/upload-artifact@v7
+        with:
+          name: benchmark-gate-reports-${{ github.run_id }}
+          path: benchmarks/target/surefire-reports/**
+          if-no-files-found: ignore
+
   benchmark-diff:
     name: Weekly Benchmark Diff
     if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'

@@ -37,6 +37,7 @@ build/
 ### Mac OS ###
 .DS_Store
 /logs/
+benchmarks/logs/
 /CV_Generated.pdf
 *.pdf
 # Allow PDF previews that are committed README assets.

@@ -337,6 +337,55 @@ Entries land here as they merge.
 
 ### Internal
 
+- **Benchmark suite cleanup (not shipped).** Removed three redundant
+  benchmark mains: `FullCvBenchmark` (superseded by the JMH
+  `TemplateCvJmhBenchmark`), `GraphComposeBenchmark` (early-engine relic
+  duplicating `CurrentSpeedBenchmark`'s `engine-simple` scenario), and
+  `ScalabilityBenchmark` (its thread-scaling sweep folded into
+  `CurrentSpeedBenchmark`'s full-profile throughput run, now `1,2,4,8,16`).
+  Dropped the matching `run-benchmarks.ps1` steps and doc entries.
+- **Feature-object benchmarks for the v1.8 vector surface (not shipped).**
+  The suite previously exercised only text/table primitives. Added JMH render
+  benches and deterministic probes over the new vector features:
+  `SvgJmhBenchmark` (path parse / whole-file icon read / icon→node) plus a
+  `SvgParseAllocProbe`; `ChartJmhBenchmark` (bar + line + pie render) plus a
+  `ChartAllocProbe` (layout-compile allocation); `VectorRenderOperatorProbe`
+  (the same paths drawn flat vs. gradient vs. translucent, counted as PDF
+  content-stream operators); `IconRampJmhBenchmark` (icon-placement scaling,
+  `@Param` 8/32/128); and `MixedShowcaseJmhBenchmark` (one document combining
+  prose, inline sparklines, bar + pie charts, SVG icons and a gradient path).
+  Shared `SvgBenchmarkFixtures` / `ChartBenchmarkFixtures` hold the inputs so
+  each bench and its probe measure identical data.
+- **Current-speed report carries a stage breakdown and a run summary (not
+  shipped).** `CurrentSpeedBenchmark` persists a per-scenario compose / layout /
+  render split (`stages[]`, median ms) to the JSON and a `stages` CSV, and
+  writes a readable `summary.md`. `BenchmarkDiffTool` consumes `stages[]`,
+  prints a per-stage delta table, and reports the scenarios added/removed
+  between two runs.
+- **Every current-speed scenario is now covered by the smoke perf gate (not
+  shipped).** The `long-token` scenario previously had no SMOKE threshold and
+  silently escaped the gate; it now has one, and `CurrentSpeedScenarioGateTest`
+  fails the build if any scenario lacks a threshold.
+- **Benchmark coverage for the render hot paths (not shipped).** Added an image
+  embed/scale gate (`ImageCacheOperatorProbe` + `ImageBenchmarkFixtures` +
+  `ImageJmhBenchmark`, with `ImageCacheGateTest` pinning `PdfImageCache` reuse), a
+  single-shot cold-start render bench (`ColdStartJmhBenchmark`), a report-scaling
+  sweep in `ComparativeBenchmark` (equivalent content across GraphCompose /
+  iText 9 / JasperReports at 40 / 200 / 1000 table rows, printing a per-size
+  GraphCompose-advantage ratio plus a post-run sample-PDF dump per library/size),
+  a production-scale `LargeTableJmhBenchmark`, an allocation-rate / GC-pressure
+  probe (`AllocationRateProbe`), and an accented-Latin measurement scenario. As
+  part of this, iText was upgraded from the EOL 5.5.x line to the current 9.x.
+- **Deterministic benchmark gates run on every PR (not shipped).** The benchmarks
+  module's tests previously never ran in CI; the `perf-smoke` job now runs them, so the
+  image-cache, render-operator (F5 coalescing), vector-paint (flat / gradient /
+  alpha / stroked / dashed operator structure), and scenario-coverage gates fail a
+  PR on a structural regression. A `vector-rich` scenario (charts + SVG icons +
+  gradient) joins the gated current-speed harness; `BenchmarkMedianTool` carries the
+  stage breakdown into its aggregate; and the smoke gate's GC-noisy `peakHeapMb`
+  check is now advisory (the gate fails only on average latency). Chart-layout variants
+  (horizontal / stacked / donut / value-axis-min), a sparkline ramp, and a
+  per-paint-mode vector render bench round out the JMH suite.
 - **Removed the `java.awt.*` / `java.util.*` co-wildcard in four files.**
   `InvoiceTemplateComposer`, `ProposalTemplateComposer`,
   `WeeklyScheduleTemplateComposer`, and the engine `PdfRenderingSystemECS`

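Reviewer note on the scenario-coverage gate in the changelog entry above: the idea that "every scenario must have a threshold" can be sketched as a plain set difference. This is a minimal illustration under stated assumptions, not the code of `CurrentSpeedScenarioGateTest`; the class name, method name, and threshold values below are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: none of these names come from the benchmarks module.
final class ScenarioThresholdCoverage {

    /** Returns the scenario names that have no threshold entry, sorted for stable output. */
    static Set<String> missingThresholds(List<String> scenarios, Map<String, Double> maxAvgMillis) {
        Set<String> missing = new TreeSet<>(scenarios);
        missing.removeAll(maxAvgMillis.keySet());
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative data only; the threshold values are made up.
        List<String> scenarios = List.of("engine-simple", "long-token", "vector-rich");
        Map<String, Double> thresholds = Map.of("engine-simple", 50.0, "vector-rich", 120.0);

        Set<String> missing = missingThresholds(scenarios, thresholds);
        if (!missing.isEmpty()) {
            // A real gate test would fail the build here, e.g. via a test-framework assertion.
            System.out.println("Scenarios without a smoke threshold: " + missing);
        }
    }
}
```

A check of this shape is deterministic and machine-independent, which is why it suits the per-PR path better than a timing comparison.
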
@@ -23,7 +23,7 @@
 ## When to use the harness
 
 - **Smoke check before a release** — `CurrentSpeedBenchmark -Dgraphcompose.benchmark.profile=smoke`
-  takes ~15 s, exercises the canonical render path through 5 fixture
+  takes ~15 s, exercises the canonical render path through 7 fixture
   scenarios, and prints a single-page latency / throughput table.
   CI runs this on every PR (the `perf-smoke` job); the goal is "did
   this PR make a representative render visibly slower?" — *not* "is
@@ -51,25 +51,54 @@
   layout-pass count) and reason about it; the harness is a sanity
   check after you've already chosen, not a decision tool before.
 - For **comparing GraphCompose to another PDF library** —
-  `ComparativeBenchmark` does render the same fixture through iText /
-  openHTMLToPDF / JasperReports for rough sizing, but the comparison
-  is a manual smoke test: each library has different defaults
-  (compression, font embedding, image resampling) and reading too much
-  into a single number is the wrong call.
+  `ComparativeBenchmark` renders equivalent content through iText and
+  JasperReports for rough sizing: a tiny single-page invoice for fixed
+  overhead, plus a report-scaling sweep (title + prose + an N-row table
+  at N = 40 / 200 / 1000) that shows how each engine scales and prints a
+  GraphCompose-advantage ratio per size. The comparison is still a manual
+  smoke test: each library has different defaults (compression, font embedding,
+  image resampling), so reading too much into a single number is the wrong call.
+  Note one boundary asymmetry: the JasperReports figure measures fill +
+  PDF export with the design compiled once outside the loop, while the
+  GraphCompose and iText figures include per-iteration document
+  construction — so the Jasper number excludes work the other two pay.
+  `openHTMLtoPDF` is intentionally absent: its current release (1.0.10)
+  targets PDFBox 2.x and fails at runtime against the PDFBox 3.x this
+  project uses (no PDFBox-3-compatible openhtmltopdf release exists yet),
+  so it cannot share GraphCompose's classpath.
+
+## What runs on a PR — and what is on-demand (by design)
+
+The per-PR CI gate is deliberately light and deterministic:
+
+- **`perf-smoke` job** — `CurrentSpeedBenchmark` in the `smoke` profile with
+  absolute latency / heap thresholds (a gross-regression tripwire), plus the
+  module's deterministic gate tests (`mvnw -f benchmarks/pom.xml test`:
+  image-cache reuse, render-operator coalescing, scenario/threshold coverage).
+
+These are intentionally **not** on the per-PR path:
+
+- **The JMH benches** (`*JmhBenchmark`) are full / on-demand only. A forked,
+  warmed JMH run of the whole suite takes minutes; running it per PR is too
+  expensive for the signal. Run them by hand (or on a schedule) before a release
+  and quote those numbers for rigorous claims.
+- **The relative `BenchmarkVerdictTool` gate** (±% vs a committed baseline) runs
+  locally only, and no static `smoke` baseline is committed: absolute timings are
+  machine-specific, so a baseline captured on one machine would false-positive on
+  another. Use a local same-machine A/B (a `-Repeat` median before/after) for
+  relative comparison; the absolute smoke thresholds are the CI safety net.
 
 ## Files in this module
 
 | File | Role |
 |---|---|
 | `CurrentSpeedBenchmark` | Default scenario runner — what CI's `perf-smoke` job exercises. Takes a `-Dgraphcompose.benchmark.profile=smoke\|full\|stress` switch. |
-| `ComparativeBenchmark` | Renders the same fixtures through GraphCompose, iText, openHTMLToPDF, JasperReports. **Rough local comparison only** — see "When not to use" above. |
-| `FullCvBenchmark`, `ScalabilityBenchmark` | Fixture-specific runners for CV and table-heavy scenarios. |
+| `ComparativeBenchmark` | Renders equivalent content through GraphCompose, iText, and JasperReports: a small-invoice tier plus a report-scaling sweep (40 / 200 / 1000 rows) with a per-size advantage ratio. Also dumps a sample PDF per library/size. **Rough local comparison only** — see "When not to use" above. |
 | `CanonicalBenchmarkSupport`, `BenchmarkSupport` | Shared fixture builders + measurement helpers. |
 | `BenchmarkReportWriter` | Writes JSON / CSV / text reports under `benchmarks/target/benchmarks/`. |
 | `BenchmarkDiffTool` | Compares two JSON reports and prints a delta table. Useful for pre/post comparisons. |
 | `BenchmarkMedianTool` | Median + dispersion across N runs of the same scenario. |
 | `GraphComposeStressTest`, `EnduranceTest` | Long-running stress / endurance harnesses. |
-| `GraphComposeBenchmark` | Legacy entry point preserved for one downstream caller. New work should target `CurrentSpeedBenchmark`. |
 
 ## Running
 
@@ -97,28 +126,46 @@ without reproducing locally.
 ## How to read a report
 
 The JSON shape is intentionally simple — a top-level run record with
-per-scenario sub-records. Each sub-record carries:
-
-- `avgMs`, `p50Ms`, `p95Ms`, `maxMs` — latency distribution across
-  iterations within the run.
-- `docsPerSec` — rough throughput; **not statistically rigorous**,
-  intended only as a relative number against a sibling scenario or a
-  previous run on the same machine.
-- `avgKB` — average output byte size. Stable across runs on the same
-  fixture; useful for catching content corruption (size shifts by
-  > a few hundred bytes are usually a bug, not a benchmark fluctuation).
-- `peakMB` — peak heap as observed by `MemoryMXBean`; coarse, do not
-  use for memory-budget enforcement.
+per-scenario sub-records. The latency rows carry these fields (the JSON
+keys are camelCase; the CSV columns are the snake_case equivalents):
+
+- `avgMillis`, `p50Millis`, `p95Millis`, `maxMillis` — latency distribution
+  across iterations within the run.
+- `docsPerSecond` — a **derived** figure, `1000 / avgMillis`: the reciprocal of
+  average latency, **not** a measured throughput rate. Real parallel throughput
+  lives in the separate `throughput[]` section (full profile only). Treat
+  `docsPerSecond` as a relative number against a sibling scenario or a previous
+  run on the same machine, not a publishable rate.
+- `avgKilobytes` — average output byte size. Stable across runs on the same
+  fixture; useful for catching content corruption (size shifts by more than a
+  few hundred bytes are usually a bug, not a benchmark fluctuation).
+- `peakHeapMb` — used-heap **delta** over the post-warmup baseline (closer to
+  per-iteration allocation pressure than to absolute live heap). GC-timing
+  noisy, so **advisory only** — for a deterministic memory signal use the
+  allocation bytes from `MeasurementCountBenchmark` or the alloc probes.
+
+A `stages[]` array carries the per-template-scenario compose / layout / render
+median split (`composeMillis` / `layoutMillis` / `renderMillis` / `totalMillis`),
+present when the run has enough measurement iterations.
 
 ## Strict JMH layer
 
 The Track C JMH layer (forked JVM, warmup + measurement, JIT-stable numbers)
 lives alongside this manual harness. JMH benchmarks are annotated classes under
 `com.demcha.compose.jmh`; the shade plugin builds a self-contained runner jar so
-forked benchmark JVMs inherit the full classpath. Present benchmarks:
-`CanonicalRender` (bare-DSL multi-section render), `TemplateCv` (the
-`ModernProfessional` layered template), and `PaginatedDocument` (a multi-page
-document parameterised by section count).
+forked benchmark JVMs inherit the full classpath. The suite spans steady-state
+render benches (`CanonicalRender`, `TemplateCv`, `Chart`, `ChartVariant`, `Image`,
+`MixedShowcase`), parameterised scaling ramps (`IconRamp`, `LargeTable`,
+`SparklineRamp`, `PaginatedDocument`, `VectorPaint`), the SVG-import micro-benches
+(`Svg`), and a single-shot cold-start bench (`ColdStart`).
+
+Every steady-state JMH bench uses `@Fork(1)` with a 3×2s warmup / 5×2s measurement
+window — a deliberately fast default for on-demand local iteration (a single fork,
+so the reported `Error` covers only iteration variance inside that one JVM). For a number you intend to quote, pass
+more forks on the CLI (e.g. `-f 5`) for a cross-fork error estimate. The exception
+is `ColdStart`, which is single-shot (`Mode.SingleShotTime`, `@Warmup(iterations = 0)`,
+`@Fork(10)`) — it deliberately measures the JIT-cold first render across ten fresh
+JVMs.
 
 The measured region differs per benchmark: `TemplateCv` hoists fixture
 construction into `@Setup` and times the render only, while `CanonicalRender` and

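Reviewer note on the `docsPerSecond` field described in the README hunk above: since the README defines it as `1000 / avgMillis`, a small sketch may help show why it is a reciprocal of latency rather than a measured rate. The class and method names here are hypothetical and are not taken from `BenchmarkReportWriter`.

```java
// Hypothetical sketch of the derivation the README describes; not the module's code.
final class DerivedThroughput {

    /** docsPerSecond as documented: the reciprocal of average latency, scaled to seconds. */
    static double docsPerSecond(double avgMillis) {
        if (avgMillis <= 0.0) {
            throw new IllegalArgumentException("avgMillis must be positive");
        }
        return 1000.0 / avgMillis;
    }

    public static void main(String[] args) {
        // An average latency of 25 ms yields 40 docs/s on paper, whatever the thread count.
        System.out.println(docsPerSecond(25.0)); // prints 40.0
    }
}
```

Because the figure depends only on `avgMillis`, it cannot reflect parallel scaling; that is what the separate `throughput[]` section is for.
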
@@ -30,7 +30,7 @@
         <logback.version>1.5.34</logback.version>
 
         <openhtmltopdf.version>1.0.10</openhtmltopdf.version>
-        <itextpdf.version>5.5.13.3</itextpdf.version>
+        <itext.version>9.6.0</itext.version>
         <jasperreports.version>7.0.7</jasperreports.version>
     </properties>
 
@@ -100,8 +100,9 @@
         </dependency>
         <dependency>
             <groupId>com.itextpdf</groupId>
-            <artifactId>itextpdf</artifactId>
-            <version>${itextpdf.version}</version>
+            <artifactId>itext-core</artifactId>
+            <version>${itext.version}</version>
+            <type>pom</type>
         </dependency>
         <dependency>
             <groupId>net.sf.jasperreports</groupId>