From 917f3ce440b30b7a0d51f911f5e4b7fc0381ecb9 Mon Sep 17 00:00:00 2001
From: DemchaAV <demchaav@gmail.com>
Date: Sun, 14 Jun 2026 19:02:44 +0100
Subject: [PATCH 01/10] chore(benchmarks): remove three redundant benchmark
 mains

FullCvBenchmark duplicated the JMH TemplateCvJmhBenchmark (CV through
ModernProfessional) with a hand-rolled, JIT-noisier loop and no report.
GraphComposeBenchmark was an early-engine relic measuring the same
title+body+divider doc as CurrentSpeedBenchmark's engine-simple scenario.
ScalabilityBenchmark's thread-scaling sweep is folded into
CurrentSpeedBenchmark's full-profile throughput run (thread counts now
1,2,4,8,16).

Drop the matching run-benchmarks.ps1 steps and the benchmarks.md /
benchmarks/README.md entries. ComparativeBenchmark, the JMH benches, the
deterministic probes, and the soak/stress runners stay. Benchmark module
compiles; its 28 tests pass.
---
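
Reviewer note (not part of the commit): the sweep that moves into
CurrentSpeedBenchmark's full profile can be pictured with the sketch below.
It is a minimal illustration under stated assumptions, not the code in
CurrentSpeedBenchmark: the class name ThreadSweepSketch, the Row record and
the stand-in workload are hypothetical. Only the thread-count list
"1,2,4,8,16" and the 12 documents per thread come from this patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a thread-scaling throughput sweep.
final class ThreadSweepSketch {

    /** One sweep row: thread count, documents rendered, documents per second. */
    record Row(int threads, int totalDocs, double docsPerSecond) {}

    /** Parses "1,2,4,8,16" into positive ints, skipping blanks and non-positive values. */
    static int[] parseThreadCounts(String raw) {
        return Arrays.stream(raw.split(","))
                .map(String::trim)
                .filter(value -> !value.isEmpty())
                .mapToInt(Integer::parseInt)
                .filter(value -> value > 0)
                .toArray();
    }

    /** Runs threads * docsPerThread tasks on a fixed pool and reports throughput. */
    static Row measure(int threads, int docsPerThread, Runnable renderOne) {
        int totalDocs = threads * docsPerThread;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < totalDocs; i++) {
                futures.add(pool.submit(renderOne));
            }
            for (Future<?> future : futures) {
                future.get(); // surfaces a task failure instead of hiding it
            }
            double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
            return new Row(threads, totalDocs, totalDocs / seconds);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("sweep interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("render task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in for rendering one document; the real benchmark renders a PDF here.
        Runnable standIn = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i);
            }
            if (sb.length() == 0) {
                throw new IllegalStateException("unreachable");
            }
        };
        for (int threads : parseThreadCounts("1,2,4,8,16")) {
            Row row = measure(threads, 12, standIn);
            System.out.printf("%-8d | %-10d | %10.2f%n",
                    row.threads(), row.totalDocs(), row.docsPerSecond());
        }
    }
}
```

The sketch rethrows task failures through Future.get(). The removed
ScalabilityBenchmark printed the stack trace inside the task and carried on,
so a failed render still counted toward the reported throughput.
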
 CHANGELOG.md                                  |  7 ++
 benchmarks/README.md                          |  6 +-
 .../demcha/compose/CurrentSpeedBenchmark.java |  4 +-
 .../com/demcha/compose/FullCvBenchmark.java   | 84 ------------------
 .../demcha/compose/GraphComposeBenchmark.java | 79 -----------------
 .../demcha/compose/ScalabilityBenchmark.java  | 88 -------------------
 docs/operations/benchmarks.md                 |  9 +-
 scripts/run-benchmarks.ps1                    |  7 +-
 8 files changed, 15 insertions(+), 269 deletions(-)
 delete mode 100644 benchmarks/src/main/java/com/demcha/compose/FullCvBenchmark.java
 delete mode 100644 benchmarks/src/main/java/com/demcha/compose/GraphComposeBenchmark.java
 delete mode 100644 benchmarks/src/main/java/com/demcha/compose/ScalabilityBenchmark.java

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 19c44ff5f..e9f7124c2 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -337,6 +337,13 @@ Entries land here as they merge.
 
 ### Internal
 
+- **Benchmark suite cleanup (not shipped).** Removed three redundant
+  benchmark mains: `FullCvBenchmark` (superseded by the JMH
+  `TemplateCvJmhBenchmark`), `GraphComposeBenchmark` (early-engine relic
+  duplicating `CurrentSpeedBenchmark`'s `engine-simple` scenario), and
+  `ScalabilityBenchmark` (its thread-scaling sweep folded into
+  `CurrentSpeedBenchmark`'s full-profile throughput run, now `1,2,4,8,16`).
+  Dropped the matching `run-benchmarks.ps1` steps and doc entries.
 - **Removed the `java.awt.*` / `java.util.*` co-wildcard in four files.**
   `InvoiceTemplateComposer`, `ProposalTemplateComposer`,
   `WeeklyScheduleTemplateComposer`, and the engine `PdfRenderingSystemECS`
diff --git a/benchmarks/README.md b/benchmarks/README.md
index f6041365c..e232c6e21 100644
--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -62,15 +62,12 @@
 | File | Role |
 |---|---|
 | `CurrentSpeedBenchmark` | Default scenario runner — what CI's `perf-smoke` job exercises. Takes a `-Dgraphcompose.benchmark.profile=smoke\|full\|stress` switch. |
 | `ComparativeBenchmark` | Renders the same fixtures through GraphCompose, iText, openHTMLToPDF, JasperReports. **Rough local comparison only** — see "When not to use" above. |
-| `FullCvBenchmark`, `ScalabilityBenchmark` | Fixture-specific runners for CV and table-heavy scenarios. |
 | `CanonicalBenchmarkSupport`, `BenchmarkSupport` | Shared fixture builders + measurement helpers. |
 | `BenchmarkReportWriter` | Writes JSON / CSV / text reports under `benchmarks/target/benchmarks/`. |
 | `BenchmarkDiffTool` | Compares two JSON reports and prints a delta table. Useful for pre/post comparisons. |
 | `BenchmarkMedianTool` | Median + dispersion across N runs of the same scenario. |
 | `GraphComposeStressTest`, `EnduranceTest` | Long-running stress / endurance harnesses. |
-| `GraphComposeBenchmark` | Legacy entry point preserved for one downstream caller. New work should target `CurrentSpeedBenchmark`. |
-
 ## Running
 
 From the repo root:
diff --git a/benchmarks/src/main/java/com/demcha/compose/CurrentSpeedBenchmark.java b/benchmarks/src/main/java/com/demcha/compose/CurrentSpeedBenchmark.java
index 2858d64a6..bbda30b8f 100644
--- a/benchmarks/src/main/java/com/demcha/compose/CurrentSpeedBenchmark.java
+++ b/benchmarks/src/main/java/com/demcha/compose/CurrentSpeedBenchmark.java
@@ -55,7 +55,9 @@ public final class CurrentSpeedBenchmark {
     private static final int DEFAULT_FULL_WARMUP_ITERATIONS = 12;
     private static final int DEFAULT_FULL_MEASUREMENT_ITERATIONS = 40;
     private static final int DEFAULT_FULL_DOCS_PER_THREAD = 12;
-    private static final String DEFAULT_FULL_THREAD_COUNTS = "1,2,4,8";
+    // The 16-thread tier is absorbed from the removed ScalabilityBenchmark so the
+    // full profile keeps a thread-scaling data point (smoke runs no throughput).
+    private static final String DEFAULT_FULL_THREAD_COUNTS = "1,2,4,8,16";
     // Bumped from 2/5 to 30/100 so smoke runs reach a steady JIT state and the
     // p95 calculation actually has enough samples to interpolate rather than
     // collapsing to the maximum observed time. The smoke profile remains the
diff --git a/benchmarks/src/main/java/com/demcha/compose/FullCvBenchmark.java b/benchmarks/src/main/java/com/demcha/compose/FullCvBenchmark.java
deleted file mode 100644
index c035f96e3..000000000
--- a/benchmarks/src/main/java/com/demcha/compose/FullCvBenchmark.java
+++ /dev/null
@@ -1,84 +0,0 @@
-package com.demcha.compose;
-
-import com.demcha.compose.document.api.DocumentSession;
-import com.demcha.compose.document.templates.api.DocumentTemplate;
-import com.demcha.compose.document.templates.cv.presets.ModernProfessional;
-import com.demcha.compose.document.templates.cv.spec.CvSpec;
-import com.demcha.compose.document.theme.BusinessTheme;
-import org.apache.pdfbox.pdmodel.common.PDRectangle;
-
-import java.util.Arrays;
-
-public class FullCvBenchmark {
-
-    private static final int WARMUP_ITERATIONS = Integer.getInteger("graphcompose.benchmark.fullCv.warmup", 100);
-    private static final int MEASUREMENT_ITERATIONS = Integer.getInteger("graphcompose.benchmark.fullCv.iterations", 500);
-
-    public static void main(String[] args) {
-        BenchmarkSupport.configureQuietLogging();
-        System.out.println("Starting FullCvBenchmark...");
-
-        CvSpec cv = CanonicalBenchmarkSupport.canonicalCv();
-        DocumentTemplate<CvSpec> template = ModernProfessional.create(BusinessTheme.modern());
-
-        System.out.println("Warming up JVM (JIT compilation, font cache warmup)...");
-        for (int i = 0; i < WARMUP_ITERATIONS; i++) {
-            generateCvInMemory(template, cv);
-        }
-
-        System.out.println("Measuring performance (" + MEASUREMENT_ITERATIONS + " iterations)...");
-        long[] durationsNs = new long[MEASUREMENT_ITERATIONS];
-
-        for (int i = 0; i < MEASUREMENT_ITERATIONS; i++) {
-            long start = System.nanoTime();
-            generateCvInMemory(template, cv);
-            long end = System.nanoTime();
-            durationsNs[i] = end - start;
-        }
-
-        printStatistics(durationsNs);
-    }
-
-    private static void generateCvInMemory(DocumentTemplate<CvSpec> template, CvSpec cv) {
-        try (DocumentSession document = GraphCompose.document()
-                .pageSize(com.demcha.compose.document.api.DocumentPageSize.A4)
-                .margin(15, 10, 15, 15)
-                .create()) {
-            template.compose(document, cv);
-            document.toPdfBytes();
-        } catch (Exception e) {
-            throw new RuntimeException("Failed to generate PDF", e);
-        }
-    }
-
-    private static void printStatistics(long[] durationsNs) {
-        Arrays.sort(durationsNs);
-
-        double[] durationsMs = Arrays.stream(durationsNs).mapToDouble(ns -> ns / 1_000_000.0).toArray();
-
-        double min = durationsMs[0];
-        double max = durationsMs[durationsMs.length - 1];
-        double avg = Arrays.stream(durationsMs).average().orElse(0.0);
-        double median = durationsMs[(int) (durationsMs.length * 0.5)];
-        double p95 = durationsMs[(int) (durationsMs.length * 0.95)];
-        double p99 = durationsMs[(int) (durationsMs.length * 0.99)];
-
-        System.out.println("\nBenchmark results (milliseconds):");
-        System.out.println("------------------------------------------------");
-        System.out.printf("Min time:           %.2f ms%n", min);
-        System.out.printf("Average time:       %.2f ms%n", avg);
-        System.out.printf("Median (50%%):       %.2f ms (typical response time)%n", median);
-        System.out.printf("95th percentile:    %.2f ms (95%% of runs finish within this)%n", p95);
-        System.out.printf("99th percentile:    %.2f ms (rare spikes or GC pressure)%n", p99);
-        System.out.printf("Max time:           %.2f ms%n", max);
-        System.out.println("------------------------------------------------");
-
-        if (median < 200) {
-            System.out.println("Verdict: Excellent. The engine is very fast for this scenario.");
-        } else if (median < 1000) {
-            System.out.println("Verdict: Good. This is a healthy speed for complex generation.");
-        } else {
-            System.out.println("Verdict: Slow enough to investigate with a profiler.");
-        }
-    }
-}
diff --git a/benchmarks/src/main/java/com/demcha/compose/GraphComposeBenchmark.java b/benchmarks/src/main/java/com/demcha/compose/GraphComposeBenchmark.java
deleted file mode 100644
index f4717e66c..000000000
--- a/benchmarks/src/main/java/com/demcha/compose/GraphComposeBenchmark.java
+++ /dev/null
@@ -1,79 +0,0 @@
-package com.demcha.compose;
-
-import com.demcha.compose.engine.components.style.Margin;
-import org.apache.pdfbox.pdmodel.common.PDRectangle;
-
-import java.util.Arrays;
-
-public class GraphComposeBenchmark {
-
-    private static final int WARMUP_ITERATIONS = Integer.getInteger("graphcompose.benchmark.coreEngine.warmup", 100);
-    private static final int MEASUREMENT_ITERATIONS = Integer.getInteger("graphcompose.benchmark.coreEngine.iterations", 500);
-
-    public static void main(String[] args) {
-        BenchmarkSupport.configureQuietLogging();
-        System.out.println("Starting GraphComposeBenchmark...");
-
-        System.out.println("Warming up JVM (JIT compilation, font cache warmup)...");
-        for (int i = 0; i < WARMUP_ITERATIONS; i++) {
-            generateCvInMemory();
-        }
-
-        System.out.println("Measuring performance (" + MEASUREMENT_ITERATIONS + " iterations)...");
-        long[] durationsNs = new long[MEASUREMENT_ITERATIONS];
-
-        for (int i = 0; i < MEASUREMENT_ITERATIONS; i++) {
-            long start = System.nanoTime();
-            generateCvInMemory();
-            long end = System.nanoTime();
-            durationsNs[i] = end - start;
-        }
-
-        printStatistics(durationsNs);
-    }
-
-    private static void generateCvInMemory() {
-        try {
-            CanonicalBenchmarkSupport.renderSimpleBenchmarkDocument(
-                    PDRectangle.A4,
-                    Margin.of(24),
-                    "CoreEngineRoot",
-                    "GraphCompose Core Benchmark",
-                    "Analytical engineer focused on reliable platform design. "
-                            + "Testing paragraph breaking and layout calculation engine.");
-        } catch (Exception e) {
-            throw new RuntimeException("Failed to generate PDF", e);
-        }
-    }
-
-    private static void printStatistics(long[] durationsNs) {
-        Arrays.sort(durationsNs);
-
-        double[] durationsMs = Arrays.stream(durationsNs).mapToDouble(ns -> ns / 1_000_000.0).toArray();
-
-        double min = durationsMs[0];
-        double max = durationsMs[durationsMs.length - 1];
-        double avg = Arrays.stream(durationsMs).average().orElse(0.0);
-        double median = durationsMs[(int) (durationsMs.length * 0.5)];
-        double p95 = durationsMs[(int) (durationsMs.length * 0.95)];
-        double p99 = durationsMs[(int) (durationsMs.length * 0.99)];
-
-        System.out.println("\nBenchmark results (milliseconds):");
-        System.out.println("------------------------------------------------");
-        System.out.printf("Min time:           %.2f ms%n", min);
-        System.out.printf("Average time:       %.2f ms%n", avg);
-        System.out.printf("Median (50%%):       %.2f ms (typical response time)%n", median);
-        System.out.printf("95th percentile:    %.2f ms (95%% of runs finish within this)%n", p95);
-        System.out.printf("99th percentile:    %.2f ms (rare spikes or GC pressure)%n", p99);
-        System.out.printf("Max time:           %.2f ms%n", max);
-        System.out.println("------------------------------------------------");
-
-        if (median < 100) {
-            System.out.println("Verdict: Excellent. The engine is very fast for this scenario.");
-        } else if (median < 500) {
-            System.out.println("Verdict: Good. This is a healthy speed for a synchronous REST API.");
-        } else {
-            System.out.println("Verdict: Slow enough to investigate with a profiler.");
-        }
-    }
-}
diff --git a/benchmarks/src/main/java/com/demcha/compose/ScalabilityBenchmark.java b/benchmarks/src/main/java/com/demcha/compose/ScalabilityBenchmark.java
deleted file mode 100644
index b8e945ef6..000000000
--- a/benchmarks/src/main/java/com/demcha/compose/ScalabilityBenchmark.java
+++ /dev/null
@@ -1,88 +0,0 @@
-package com.demcha.compose;
-
-import com.demcha.compose.engine.components.style.Margin;
-import org.apache.pdfbox.pdmodel.common.PDRectangle;
-
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.List;
-import java.util.concurrent.*;
-
-/**
- * Linear Scalability Test
- * Measures throughput (documents per second) as thread count increases.
- */
-public class ScalabilityBenchmark {
-
-    private static final int DOCUMENTS_PER_THREAD = Integer.getInteger("graphcompose.scalability.documentsPerThread", 100);
-    private static final int WARMUP_DOCS = Integer.getInteger("graphcompose.scalability.warmupDocs", 100);
-    private static final String THREAD_COUNTS = System.getProperty("graphcompose.scalability.threads", "1,2,4,8,16");
-
-    public static void main(String[] args) throws Exception {
-        BenchmarkSupport.configureQuietLogging();
-        System.out.println("Starting Scalability Benchmark: Linear Scalability");
-        System.out.println("------------------------------------------------------------");
-
-        // Warmup
-        for (int i = 0; i < WARMUP_DOCS; i++) {
-            generateOne();
-        }
-
-        int[] threadCounts = parseThreadCounts(THREAD_COUNTS);
-        System.out.println(String.format("%-10s | %-15s | %-12s", "Threads", "Total Docs", "Throughput (docs/sec)"));
-        System.out.println("------------------------------------------------------------");
-
-        for (int threads : threadCounts) {
-            runScalabilityTest(threads);
-        }
-    }
-
-    private static void runScalabilityTest(int threads) throws Exception {
-        int totalDocs = threads * DOCUMENTS_PER_THREAD;
-        ExecutorService executor = Executors.newFixedThreadPool(threads);
-        
-        long startTime = System.nanoTime();
-        
-        List<Future<?>> futures = new ArrayList<>();
-        for (int i = 0; i < totalDocs; i++) {
-            futures.add(executor.submit(() -> {
-                try {
-                    generateOne();
-                } catch (Exception e) {
-                    e.printStackTrace();
-                }
-            }));
-        }
-
-        for (Future<?> future : futures) {
-            future.get();
-        }
-
-        long endTime = System.nanoTime();
-        executor.shutdown();
-        executor.awaitTermination(1, TimeUnit.MINUTES);
-
-        double durationSec = (endTime - startTime) / 1_000_000_000.0;
-        double throughput = totalDocs / durationSec;
-
-        System.out.println(String.format("%-10d | %-15d | %12.2f", threads, totalDocs, throughput));
-    }
-
-    private static void generateOne() throws Exception {
-        CanonicalBenchmarkSupport.renderSimpleBenchmarkDocument(
-                PDRectangle.A4,
-                Margin.of(24),
-                "ScalabilityRoot",
-                "Scalability",
-                "Scalability test message.");
-    }
-
-    private static int[] parseThreadCounts(String raw) {
-        return Arrays.stream(raw.split(","))
-                .map(String::trim)
-                .filter(value -> !value.isEmpty())
-                .mapToInt(Integer::parseInt)
-                .filter(value -> value > 0)
-                .toArray();
-    }
-}
diff --git a/docs/operations/benchmarks.md b/docs/operations/benchmarks.md
index 315f4d523..775483384 100644
--- a/docs/operations/benchmarks.md
+++ b/docs/operations/benchmarks.md
@@ -36,15 +36,10 @@ The script prints numbered sections so you can map console output to the pipelin
 1. `01-build-classpath`
    Builds the test classpath once and writes `target/benchmark.classpath`.
 2. `02-current-speed`
-   Runs `CurrentSpeedBenchmark` in the selected profile.
+   Runs `CurrentSpeedBenchmark` in the selected profile. The full profile also
+   runs the thread-scaling throughput sweep (1 → 16 threads).
 3. `03-comparative`
    Runs the GraphCompose canonical vs iText 5 vs JasperReports comparison.
-4. `04-core-engine`
-   Runs `GraphComposeBenchmark`.
-5. `05-full-cv`
-   Runs `FullCvBenchmark`.
-6. `06-scalability`
-   Runs the thread-scaling throughput benchmark.
 7. `07-stress`
    Runs the concurrent stability stress test.
 8. `08-endurance`
diff --git a/scripts/run-benchmarks.ps1 b/scripts/run-benchmarks.ps1
index dbe162c08..e3d3947b6 100644
--- a/scripts/run-benchmarks.ps1
+++ b/scripts/run-benchmarks.ps1
@@ -5,8 +5,8 @@ Runs the local GraphCompose benchmark pipeline and stores timestamped logs and r
 
 .DESCRIPTION
 The wrapper performs a staged local run:
-01 build classpath, 02 current-speed, 03 comparative, 04 core engine, 05 full CV, 06 scalability,
-07 stress, optional 08 endurance, then 09/10 diff steps.
+01 build classpath, 02 current-speed, 03 comparative, 07 stress,
+optional 08 endurance, then 09/10 diff steps.
 
 Current-speed diffs are profile-aware. The wrapper only compares reports
 from the same current-speed profile (`smoke` or `full`) and skips the
@@ -368,9 +368,6 @@ try {
                 -InputPaths $comparativeRuns | Out-Null
         }
 
-        Invoke-JavaMain -Name "04-core-engine" -Classpath $javaClasspath -MainClass "com.demcha.compose.GraphComposeBenchmark"
-        Invoke-JavaMain -Name "05-full-cv" -Classpath $javaClasspath -MainClass "com.demcha.compose.FullCvBenchmark"
-        Invoke-JavaMain -Name "06-scalability" -Classpath $javaClasspath -MainClass "com.demcha.compose.ScalabilityBenchmark"
         Invoke-JavaMain -Name "07-stress" -Classpath $javaClasspath -MainClass "com.demcha.compose.GraphComposeStressTest"
 
         if ($IncludeEndurance) {

From 019f64b32cd23aa44a0694cd43604e11d2c88818 Mon Sep 17 00:00:00 2001
From: DemchaAV <demchaav@gmail.com>
Date: Sun, 14 Jun 2026 19:26:31 +0100
Subject: [PATCH 02/10] perf(benchmarks): persist compose/layout/render stages
 + a run summary.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The stage breakdown (per-template compose / layout / render medians) was
printed to the console and discarded. Promote it into the report:
runStageBreakdown returns a StageRow, CurrentSpeedReport carries a stages[]
array, and a stages CSV is written — so a diff can attribute a regression to
an engine stage, not just the blended total. Also write a per-run summary.md
(latency + stages + throughput tables) so a reviewer reads one file instead
of the JSON plus several CSVs.

Additive output only: diff/verdict/median read the report by field and ignore
the new array. Benchmark module compiles; 28 tests pass; verified on a smoke
run (stages[] present, summary.md readable, perf gate passes).
---
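
Reviewer note (not part of the commit): once the stages CSV is on disk, a
regression can be attributed to a stage by comparing two runs. The sketch
below is one possible way to do that, not a tool in this series. The class
StageDiffSketch and its methods are hypothetical, the numbers are made up
for illustration, and it assumes unquoted cells with '.' as the decimal
separator. Only the column order (scenario, compose_ms, layout_ms,
render_ms, total_ms) is taken from this patch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: attribute a slowdown to compose, layout or render.
final class StageDiffSketch {

    // Column order after the scenario column, as written by the stages CSV.
    static final List<String> STAGES = List.of("compose", "layout", "render", "total");

    /** Parses stages CSV text (header first) into scenario -> [compose, layout, render, total]. */
    static Map<String, double[]> parse(String csv) {
        Map<String, double[]> rows = new LinkedHashMap<>();
        String[] lines = csv.strip().split("\\R");
        for (int i = 1; i < lines.length; i++) { // index 0 is the header row
            String[] cells = lines[i].split(",");
            double[] millis = new double[STAGES.size()];
            for (int s = 0; s < millis.length; s++) {
                millis[s] = Double.parseDouble(cells[s + 1].trim());
            }
            rows.put(cells[0].trim(), millis);
        }
        return rows;
    }

    /** Returns the stage (compose, layout or render) whose median grew the most. */
    static String slowestGrowingStage(double[] baseline, double[] candidate) {
        int worst = 0;
        for (int s = 1; s < 3; s++) { // index 3 is the blended total, so it is skipped
            if (candidate[s] - baseline[s] > candidate[worst] - baseline[worst]) {
                worst = s;
            }
        }
        return STAGES.get(worst);
    }

    public static void main(String[] args) {
        // Illustrative values only; they are not measurements from any real run.
        String baseline = """
                scenario,compose_ms,layout_ms,render_ms,total_ms
                invoice-template,1.20,3.40,5.00,9.70
                cv-template,0.90,2.10,4.20,7.30
                """;
        String candidate = """
                scenario,compose_ms,layout_ms,render_ms,total_ms
                invoice-template,1.25,5.10,5.05,11.50
                cv-template,0.95,2.15,6.00,9.20
                """;
        Map<String, double[]> before = parse(baseline);
        Map<String, double[]> after = parse(candidate);
        for (String scenario : before.keySet()) {
            if (after.containsKey(scenario)) {
                System.out.println(scenario + ": largest growth in "
                        + slowestGrowingStage(before.get(scenario), after.get(scenario)));
            }
        }
    }
}
```

The total column is excluded from the comparison on purpose: it is the
blended figure the latency table already reports, and the point of
persisting the stages is to look underneath it.
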
 .../demcha/compose/BenchmarkReportWriter.java |   8 +
 .../demcha/compose/CurrentSpeedBenchmark.java | 144 +++++++++++++++---
 2 files changed, 131 insertions(+), 21 deletions(-)

diff --git a/benchmarks/src/main/java/com/demcha/compose/BenchmarkReportWriter.java b/benchmarks/src/main/java/com/demcha/compose/BenchmarkReportWriter.java
index 73e061d3d..51d2b2e42 100644
--- a/benchmarks/src/main/java/com/demcha/compose/BenchmarkReportWriter.java
+++ b/benchmarks/src/main/java/com/demcha/compose/BenchmarkReportWriter.java
@@ -60,6 +60,14 @@ Path writeCsv(String tableName, List<String> headers, List<List<String>> rows) t
             return archived;
         }
 
+        Path writeMarkdown(String name, String content) throws IOException {
+            Path latest = directory.resolve("latest-" + name + ".md");
+            Path archived = directory.resolve(name + "-" + timestamp + ".md");
+            Files.writeString(latest, content, StandardCharsets.UTF_8);
+            Files.writeString(archived, content, StandardCharsets.UTF_8);
+            return archived;
+        }
+
         Path directory() {
             return directory;
         }
diff --git a/benchmarks/src/main/java/com/demcha/compose/CurrentSpeedBenchmark.java b/benchmarks/src/main/java/com/demcha/compose/CurrentSpeedBenchmark.java
index bbda30b8f..e3d877943 100644
--- a/benchmarks/src/main/java/com/demcha/compose/CurrentSpeedBenchmark.java
+++ b/benchmarks/src/main/java/com/demcha/compose/CurrentSpeedBenchmark.java
@@ -143,20 +143,21 @@ private void run() throws Exception {
 
         // Stage breakdown: for each template scenario we time compose / layout
         // / render separately so consumers can attribute regressions to the
-        // engine vs. PDFBox. Engine-simple and feature-rich scenarios also
-        // use the canonical pipeline and benefit from the same probe.
+        // engine vs. PDFBox. Only the template scenarios are probed here; the
+        // latency table above still covers every scenario.
+        List<StageRow> stageRows = new ArrayList<>();
         if (profile != BenchmarkProfile.SMOKE || config.measurementIterations() >= 20) {
             System.out.println();
             System.out.println("Stage breakdown (median ms per stage)");
             System.out.printf("%-18s | %12s | %12s | %12s | %12s%n",
                     "Scenario", "Compose", "Layout", "Render", "Total");
             System.out.println("-".repeat(78));
-            runStageBreakdown("invoice-template", () -> openInvoiceSession(),
-                    s -> invoiceTemplate.compose(s, invoice), config.measurementIterations());
-            runStageBreakdown("cv-template", () -> openCvSession(),
-                    s -> cvTemplate.compose(s, cv), config.measurementIterations());
-            runStageBreakdown("proposal-template", () -> openProposalSession(),
-                    s -> proposalTemplate.compose(s, proposal), config.measurementIterations());
+            stageRows.add(runStageBreakdown("invoice-template", () -> openInvoiceSession(),
+                    s -> invoiceTemplate.compose(s, invoice), config.measurementIterations()));
+            stageRows.add(runStageBreakdown("cv-template", () -> openCvSession(),
+                    s -> cvTemplate.compose(s, cv), config.measurementIterations()));
+            stageRows.add(runStageBreakdown("proposal-template", () -> openProposalSession(),
+                    s -> proposalTemplate.compose(s, proposal), config.measurementIterations()));
         }
 
         List<ThroughputRow> throughputRows = new ArrayList<>();
@@ -201,10 +202,13 @@ private void run() throws Exception {
                 config.docsPerThread(),
                 config.threadCounts(),
                 latencyRows,
+                stageRows,
                 throughputRows,
                 totalBenchmarkBytes);
         System.out.println("Saved JSON benchmark report to " + summary.jsonPath());
-        System.out.println("Saved CSV benchmark reports to " + summary.latencyCsvPath() + " and " + summary.throughputCsvPath());
+        System.out.println("Saved CSV benchmark reports to " + summary.latencyCsvPath() + ", "
+                + summary.stagesCsvPath() + ", and " + summary.throughputCsvPath());
+        System.out.println("Saved markdown summary to " + summary.summaryMarkdownPath());
 
         if (enforceGate) {
             PerformanceGateResult gateResult = evaluatePerformanceGate(profile, latencyRows);
@@ -363,10 +367,10 @@ private interface SessionComposer {
      * median-ms-per-stage row so callers can attribute regressions to
      * compose / layout / render independently.
      */
-    private void runStageBreakdown(String scenario,
-                                   SessionFactory factory,
-                                   SessionComposer composer,
-                                   int iterations) throws Exception {
+    private StageRow runStageBreakdown(String scenario,
+                                       SessionFactory factory,
+                                       SessionComposer composer,
+                                       int iterations) throws Exception {
         int warmup = Math.max(2, Math.min(20, iterations / 5));
         for (int i = 0; i < warmup; i++) {
             try (DocumentSession session = factory.open()) {
@@ -398,12 +402,13 @@ private void runStageBreakdown(String scenario,
                 throw new AssertionError();
             }
         }
+        double composeMs = medianMs(composeNs);
+        double layoutMs = medianMs(layoutNs);
+        double renderMs = medianMs(renderNs);
+        double totalMs = medianMs(totalNs);
         System.out.printf("%-18s | %12.3f | %12.3f | %12.3f | %12.3f%n",
-                scenario,
-                medianMs(composeNs),
-                medianMs(layoutNs),
-                medianMs(renderNs),
-                medianMs(totalNs));
+                scenario, composeMs, layoutMs, renderMs, totalMs);
+        return new StageRow(scenario, round(composeMs), round(layoutMs), round(renderMs), round(totalMs));
     }
 
     private static double medianMs(long[] arr) {
@@ -677,16 +682,19 @@ private PathSummary writeReports(BenchmarkReportWriter.BenchmarkArtifacts artifa
                                      int docsPerThread,
                                      int[] threadCounts,
                                      List<LatencyRow> latencyRows,
+                                     List<StageRow> stageRows,
                                      List<ThroughputRow> throughputRows,
                                      long totalBenchmarkBytes) throws Exception {
+        String timestamp = LocalDateTime.now().format(TIMESTAMP_FORMAT);
         CurrentSpeedReport report = new CurrentSpeedReport(
-                LocalDateTime.now().format(TIMESTAMP_FORMAT),
+                timestamp,
                 profileId,
                 warmupIterations,
                 measurementIterations,
                 docsPerThread,
                 Arrays.stream(threadCounts).boxed().toList(),
                 latencyRows,
+                stageRows,
                 throughputRows,
                 totalBenchmarkBytes);
 
@@ -717,8 +725,88 @@ private PathSummary writeReports(BenchmarkReportWriter.BenchmarkArtifacts artifa
                                 format(row.docsPerSecond()),
                                 format(row.avgMillisPerDoc())))
                         .toList());
+        var stagesCsvPath = artifacts.writeCsv(
+                "stages",
+                List.of("scenario", "compose_ms", "layout_ms", "render_ms", "total_ms"),
+                stageRows.stream()
+                        .map(row -> List.of(
+                                row.scenario(),
+                                format(row.composeMillis()),
+                                format(row.layoutMillis()),
+                                format(row.renderMillis()),
+                                format(row.totalMillis())))
+                        .toList());
+        var summaryMarkdownPath = artifacts.writeMarkdown(
+                "summary",
+                buildSummaryMarkdown(timestamp, profileId, latencyRows, stageRows,
+                        throughputRows, totalBenchmarkBytes));
+
+        return new PathSummary(jsonPath.toString(), latencyCsvPath.toString(),
+                stagesCsvPath.toString(), throughputCsvPath.toString(),
+                summaryMarkdownPath.toString());
+    }
+
+    /**
+     * Renders a single human-readable summary of the run — the latency table,
+     * the per-stage compose/layout/render split (the only place the suite
+     * attributes time to engine stages vs. PDFBox), and the throughput table
+     * when present — so a reviewer reads one file instead of stitching the JSON
+     * and several CSVs together.
+     */
+    private static String buildSummaryMarkdown(String timestamp,
+                                               String profileId,
+                                               List<LatencyRow> latencyRows,
+                                               List<StageRow> stageRows,
+                                               List<ThroughputRow> throughputRows,
+                                               long totalBenchmarkBytes) {
+        StringBuilder md = new StringBuilder();
+        md.append("# Current-speed benchmark — ").append(profileId).append(" profile\n\n");
+        md.append('`').append(timestamp).append("`\n\n");
+
+        md.append("## Latency (ms)\n\n");
+        md.append("| Scenario | Avg | p50 | p95 | Max | Docs/s | Avg KB | Peak MB |\n");
+        md.append("|---|---:|---:|---:|---:|---:|---:|---:|\n");
+        for (LatencyRow row : latencyRows) {
+            md.append("| ").append(row.scenario())
+                    .append(" | ").append(format(row.avgMillis()))
+                    .append(" | ").append(format(row.p50Millis()))
+                    .append(" | ").append(format(row.p95Millis()))
+                    .append(" | ").append(format(row.maxMillis()))
+                    .append(" | ").append(format(row.docsPerSecond()))
+                    .append(" | ").append(format(row.avgKilobytes()))
+                    .append(" | ").append(format(row.peakHeapMb()))
+                    .append(" |\n");
+        }
 
-        return new PathSummary(jsonPath.toString(), latencyCsvPath.toString(), throughputCsvPath.toString());
+        if (!stageRows.isEmpty()) {
+            md.append("\n## Stages — template scenarios (median ms — compose / layout / render)\n\n");
+            md.append("| Scenario | Compose | Layout | Render | Total |\n");
+            md.append("|---|---:|---:|---:|---:|\n");
+            for (StageRow row : stageRows) {
+                md.append("| ").append(row.scenario())
+                        .append(" | ").append(format(row.composeMillis()))
+                        .append(" | ").append(format(row.layoutMillis()))
+                        .append(" | ").append(format(row.renderMillis()))
+                        .append(" | ").append(format(row.totalMillis()))
+                        .append(" |\n");
+            }
+        }
+
+        if (!throughputRows.isEmpty()) {
+            md.append("\n## Throughput\n\n");
+            md.append("| Threads | Total docs | Docs/s | Avg doc ms |\n");
+            md.append("|---:|---:|---:|---:|\n");
+            for (ThroughputRow row : throughputRows) {
+                md.append("| ").append(row.threads())
+                        .append(" | ").append(row.totalDocs())
+                        .append(" | ").append(format(row.docsPerSecond()))
+                        .append(" | ").append(format(row.avgMillisPerDoc()))
+                        .append(" |\n");
+            }
+        }
+
+        md.append("\nByte guard: ").append(totalBenchmarkBytes).append('\n');
+        return md.toString();
     }
 
     private static double round(double value) {
@@ -772,6 +860,18 @@ private record ThroughputRow(String scenario,
                                  double avgMillisPerDoc) {
     }
 
+    /**
+     * Per-scenario compose / layout / render split (median ms). Persisted so a
+     * diff can attribute a regression to an engine stage rather than only the
+     * blended total — previously this was printed to the console and discarded.
+     */
+    private record StageRow(String scenario,
+                            double composeMillis,
+                            double layoutMillis,
+                            double renderMillis,
+                            double totalMillis) {
+    }
+
     private record CurrentSpeedReport(String timestamp,
                                       String profile,
                                       int warmupIterations,
@@ -779,11 +879,13 @@ private record CurrentSpeedReport(String timestamp,
                                       int docsPerThread,
                                       List<Integer> threadCounts,
                                       List<LatencyRow> latency,
+                                      List<StageRow> stages,
                                       List<ThroughputRow> throughput,
                                       long totalBytes) {
     }
 
-    private record PathSummary(String jsonPath, String latencyCsvPath, String throughputCsvPath) {
+    private record PathSummary(String jsonPath, String latencyCsvPath, String stagesCsvPath,
+                               String throughputCsvPath, String summaryMarkdownPath) {
     }
 
     private record BenchmarkConfig(int warmupIterations,

From 2d2785208a73d5fd4a3337cf63d72b4a869be487 Mon Sep 17 00:00:00 2001
From: DemchaAV <demchaav@gmail.com>
Date: Sun, 14 Jun 2026 19:37:13 +0100
Subject: [PATCH 03/10] perf(benchmarks): diff consumes stages[] and reports
 added/removed scenarios
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

BenchmarkDiffTool now (1) surfaces scenario set changes — addedScenarios /
removedScenarios — instead of silently intersecting, so a newly-added (or
dropped) scenario can no longer vanish from a diff unnoticed; and (2) diffs
the stages[] array, emitting per-scenario compose/layout/render/total percent
deltas (console block + stages-diff CSV) so a regression can be attributed to
an engine stage.

Backward-compatible: a report without stages[] yields an empty stage diff
(Jackson's MissingNode iterates as empty); latency/throughput delta rows stay
intersection-only; and the diff report stays terminal (median/verdict read
producer reports, not diffs). Adds a BenchmarkDiffToolTest case; 29 benchmark
tests pass.
---
 .../com/demcha/compose/BenchmarkDiffTool.java | 100 +++++++++++++++++-
 .../demcha/compose/BenchmarkDiffToolTest.java |  61 +++++++++++
 2 files changed, 160 insertions(+), 1 deletion(-)
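
Reviewer note: the set-change and percent-delta semantics described in the
commit message can be sketched in isolation as below. This is a minimal,
hypothetical reduction and not the BenchmarkDiffTool source: the class name
ScenarioDiffSketch and the sharedKeys helper are invented for illustration,
and the zero-baseline rule in percentDelta is an assumption inferred only from
the existing test name
currentSpeedDiffTreatsZeroBaselineAsHundredPercentAndZeroToZeroAsZero.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the diff semantics; not the real BenchmarkDiffTool. */
final class ScenarioDiffSketch {

    /** Scenarios present in the candidate only (new scenarios), sorted. */
    static List<String> addedKeys(Map<String, ?> baseline, Map<String, ?> candidate) {
        return candidate.keySet().stream()
                .filter(key -> !baseline.containsKey(key))
                .sorted()
                .toList();
    }

    /** Scenarios present in the baseline only (dropped scenarios), sorted. */
    static List<String> removedKeys(Map<String, ?> baseline, Map<String, ?> candidate) {
        return baseline.keySet().stream()
                .filter(key -> !candidate.containsKey(key))
                .sorted()
                .toList();
    }

    /** Scenarios present in both runs; only these produce delta rows. */
    static List<String> sharedKeys(Map<String, ?> baseline, Map<String, ?> candidate) {
        return baseline.keySet().stream()
                .filter(candidate::containsKey)
                .sorted()
                .toList();
    }

    /**
     * Percent change from before to after.
     * ASSUMPTION: a zero baseline maps to 100% (or 0% when both are zero);
     * the real tool's rule is not shown in this patch.
     */
    static double percentDelta(double before, double after) {
        if (before == 0.0) {
            return after == 0.0 ? 0.0 : 100.0;
        }
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Render-stage medians in ms, mirroring the values used in the new test.
        Map<String, Double> baseline = Map.of("shared", 4.0, "only-in-baseline", 4.0);
        Map<String, Double> candidate = Map.of("shared", 8.0, "only-in-candidate", 4.0);

        System.out.println("added:   " + addedKeys(baseline, candidate));
        System.out.println("removed: " + removedKeys(baseline, candidate));
        for (String key : sharedKeys(baseline, candidate)) {
            System.out.println(key + " render delta pct: "
                    + percentDelta(baseline.get(key), candidate.get(key)));
        }
    }
}
```

Keeping the delta rows intersection-only while listing one-sided scenarios
separately means a percent delta is never computed against a missing value.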

diff --git a/benchmarks/src/main/java/com/demcha/compose/BenchmarkDiffTool.java b/benchmarks/src/main/java/com/demcha/compose/BenchmarkDiffTool.java
index 9b99d272f..0fb058bf8 100644
--- a/benchmarks/src/main/java/com/demcha/compose/BenchmarkDiffTool.java
+++ b/benchmarks/src/main/java/com/demcha/compose/BenchmarkDiffTool.java
@@ -93,6 +93,31 @@ private void diffCurrentSpeed(DiffInput input,
                     signedPercent(row.peakHeapMbDeltaPct()));
         }
 
+        if (!report.addedScenarios().isEmpty() || !report.removedScenarios().isEmpty()) {
+            System.out.println();
+            System.out.println("Scenario set changes");
+            System.out.println("  Added in candidate:    "
+                    + (report.addedScenarios().isEmpty() ? "(none)" : String.join(", ", report.addedScenarios())));
+            System.out.println("  Removed from baseline: "
+                    + (report.removedScenarios().isEmpty() ? "(none)" : String.join(", ", report.removedScenarios())));
+        }
+
+        if (!report.stages().isEmpty()) {
+            System.out.println();
+            System.out.println("Stage diff (pct delta per stage)");
+            System.out.printf("%-18s | %12s | %12s | %12s | %12s%n",
+                    "Scenario", "Compose pct", "Layout pct", "Render pct", "Total pct");
+            System.out.println("-".repeat(78));
+            for (StageDiff row : report.stages()) {
+                System.out.printf("%-18s | %12s | %12s | %12s | %12s%n",
+                        row.scenario(),
+                        signedPercent(row.composeDeltaPct()),
+                        signedPercent(row.layoutDeltaPct()),
+                        signedPercent(row.renderDeltaPct()),
+                        signedPercent(row.totalDeltaPct()));
+            }
+        }
+
         System.out.println();
         System.out.println("Throughput diff");
         System.out.printf("%-18s | %8s | %12s | %14s%n",
@@ -143,10 +168,29 @@ private void diffCurrentSpeed(DiffInput input,
                                 format(row.candidateAvgMillisPerDoc()),
                                 format(row.avgMillisPerDocDeltaPct())))
                         .toList());
+        Path stagesCsv = artifacts.writeCsv(
+                "stages-diff",
+                List.of("scenario", "baseline_compose_ms", "candidate_compose_ms", "compose_delta_pct", "baseline_layout_ms", "candidate_layout_ms", "layout_delta_pct", "baseline_render_ms", "candidate_render_ms", "render_delta_pct", "baseline_total_ms", "candidate_total_ms", "total_delta_pct"),
+                report.stages().stream()
+                        .map(row -> List.of(
+                                row.scenario(),
+                                format(row.baselineComposeMillis()),
+                                format(row.candidateComposeMillis()),
+                                format(row.composeDeltaPct()),
+                                format(row.baselineLayoutMillis()),
+                                format(row.candidateLayoutMillis()),
+                                format(row.layoutDeltaPct()),
+                                format(row.baselineRenderMillis()),
+                                format(row.candidateRenderMillis()),
+                                format(row.renderDeltaPct()),
+                                format(row.baselineTotalMillis()),
+                                format(row.candidateTotalMillis()),
+                                format(row.totalDeltaPct())))
+                        .toList());
 
         System.out.println();
         System.out.println("Saved JSON diff report to " + jsonPath);
-        System.out.println("Saved CSV diff reports to " + latencyCsv + " and " + throughputCsv);
+        System.out.println("Saved CSV diff reports to " + latencyCsv + ", " + throughputCsv + ", and " + stagesCsv);
     }
 
     private void diffComparative(DiffInput input,
@@ -214,6 +258,29 @@ private CurrentSpeedDiffReport buildCurrentSpeedDiff(DiffInput input, JsonNode b
                 })
                 .toList();
 
+        Map<String, JsonNode> baselineStages = indexBy(baseline.path("stages"), "scenario");
+        Map<String, JsonNode> candidateStages = indexBy(candidate.path("stages"), "scenario");
+        List<StageDiff> stageDiffs = intersectKeys(baselineStages, candidateStages).stream()
+                .map(key -> {
+                    JsonNode before = baselineStages.get(key);
+                    JsonNode after = candidateStages.get(key);
+                    return new StageDiff(
+                            key,
+                            before.path("composeMillis").asDouble(),
+                            after.path("composeMillis").asDouble(),
+                            percentDelta(before.path("composeMillis").asDouble(), after.path("composeMillis").asDouble()),
+                            before.path("layoutMillis").asDouble(),
+                            after.path("layoutMillis").asDouble(),
+                            percentDelta(before.path("layoutMillis").asDouble(), after.path("layoutMillis").asDouble()),
+                            before.path("renderMillis").asDouble(),
+                            after.path("renderMillis").asDouble(),
+                            percentDelta(before.path("renderMillis").asDouble(), after.path("renderMillis").asDouble()),
+                            before.path("totalMillis").asDouble(),
+                            after.path("totalMillis").asDouble(),
+                            percentDelta(before.path("totalMillis").asDouble(), after.path("totalMillis").asDouble()));
+                })
+                .toList();
+
         Map<String, JsonNode> baselineThroughput = indexThroughput(baseline.path("throughput"));
         Map<String, JsonNode> candidateThroughput = indexThroughput(candidate.path("throughput"));
         List<CurrentSpeedThroughputDiff> throughputDiffs = intersectKeys(baselineThroughput, candidateThroughput).stream()
@@ -237,7 +304,10 @@ private CurrentSpeedDiffReport buildCurrentSpeedDiff(DiffInput input, JsonNode b
                 input.candidatePath().toString(),
                 baseline.path("timestamp").asText(),
                 candidate.path("timestamp").asText(),
+                addedKeys(baselineLatency, candidateLatency),
+                removedKeys(baselineLatency, candidateLatency),
                 latencyDiffs,
+                stageDiffs,
                 throughputDiffs
         );
     }
@@ -294,6 +364,16 @@ private static List<String> intersectKeys(Map<String, JsonNode> left, Map<String
                 .toList();
     }
 
+    /** Keys present in {@code candidate} but not {@code baseline} (new scenarios). */
+    private static List<String> addedKeys(Map<String, JsonNode> baseline, Map<String, JsonNode> candidate) {
+        return candidate.keySet().stream().filter(key -> !baseline.containsKey(key)).sorted().toList();
+    }
+
+    /** Keys present in {@code baseline} but not {@code candidate} (dropped scenarios). */
+    private static List<String> removedKeys(Map<String, JsonNode> baseline, Map<String, JsonNode> candidate) {
+        return baseline.keySet().stream().filter(key -> !candidate.containsKey(key)).sorted().toList();
+    }
+
     private static Iterable<JsonNode> iterable(JsonNode array) {
         return () -> new Iterator<>() {
             private final Iterator<JsonNode> delegate = array.iterator();
@@ -477,11 +557,29 @@ private record CurrentSpeedThroughputDiff(String scenario,
                                               double avgMillisPerDocDeltaPct) {
     }
 
+    private record StageDiff(String scenario,
+                             double baselineComposeMillis,
+                             double candidateComposeMillis,
+                             double composeDeltaPct,
+                             double baselineLayoutMillis,
+                             double candidateLayoutMillis,
+                             double layoutDeltaPct,
+                             double baselineRenderMillis,
+                             double candidateRenderMillis,
+                             double renderDeltaPct,
+                             double baselineTotalMillis,
+                             double candidateTotalMillis,
+                             double totalDeltaPct) {
+    }
+
     private record CurrentSpeedDiffReport(String baselinePath,
                                           String candidatePath,
                                           String baselineTimestamp,
                                           String candidateTimestamp,
+                                          List<String> addedScenarios,
+                                          List<String> removedScenarios,
                                           List<CurrentSpeedLatencyDiff> latency,
+                                          List<StageDiff> stages,
                                           List<CurrentSpeedThroughputDiff> throughput) {
     }
 
diff --git a/benchmarks/src/test/java/com/demcha/compose/BenchmarkDiffToolTest.java b/benchmarks/src/test/java/com/demcha/compose/BenchmarkDiffToolTest.java
index 783ad2479..d3319131c 100644
--- a/benchmarks/src/test/java/com/demcha/compose/BenchmarkDiffToolTest.java
+++ b/benchmarks/src/test/java/com/demcha/compose/BenchmarkDiffToolTest.java
@@ -93,6 +93,35 @@ void currentSpeedDiffKeepsOnlyScenariosPresentInBothRuns() throws Exception {
         assertThat(diff.path("throughput").get(0).path("scenario").asText()).isEqualTo("shared");
     }
 
+    @Test
+    void currentSpeedDiffSurfacesAddedRemovedScenariosAndStageDeltas() throws Exception {
+        System.setProperty("graphcompose.benchmark.root", tempDir.toString());
+        Path baseline = write("baseline.json", currentSpeedWithStages("full",
+                latency("shared", 10.0, 10.0, 100.0, 1.0, 100.0) + ","
+                        + latency("only-in-baseline", 10.0, 10.0, 100.0, 1.0, 100.0),
+                stage("shared", 1.0, 2.0, 4.0, 7.0),
+                throughput("shared", 1, 50.0, 20.0)));
+        Path candidate = write("candidate.json", currentSpeedWithStages("full",
+                latency("shared", 10.0, 10.0, 100.0, 1.0, 100.0) + ","
+                        + latency("only-in-candidate", 5.0, 5.0, 200.0, 0.5, 90.0),
+                stage("shared", 1.0, 2.0, 8.0, 11.0),
+                throughput("shared", 1, 50.0, 20.0)));
+
+        BenchmarkDiffTool.main(new String[]{baseline.toString(), candidate.toString()});
+
+        JsonNode diff = readDiff("current-speed");
+        // Loud set-changes: one-sided scenarios are surfaced, not silently dropped.
+        assertThat(toStrings(diff.path("addedScenarios"))).containsExactly("only-in-candidate");
+        assertThat(toStrings(diff.path("removedScenarios"))).containsExactly("only-in-baseline");
+        // The shared scenario is still the only intersected latency delta row.
+        assertThat(diff.path("latency").size()).isEqualTo(1);
+        // Stage diff: render 4 -> 8 = +100%, compose unchanged.
+        JsonNode stageDiff = diff.path("stages").get(0);
+        assertThat(stageDiff.path("scenario").asText()).isEqualTo("shared");
+        assertThat(stageDiff.path("renderDeltaPct").asDouble()).isCloseTo(100.0, within(EPS));
+        assertThat(stageDiff.path("composeDeltaPct").asDouble()).isCloseTo(0.0, within(EPS));
+    }
+
     @Test
     void currentSpeedDiffTreatsZeroBaselineAsHundredPercentAndZeroToZeroAsZero() throws Exception {
         System.setProperty("graphcompose.benchmark.root", tempDir.toString());
@@ -228,6 +257,38 @@ private static String latency(String scenario,
                 """.formatted(scenario, scenario, avgMillis, p95Millis, docsPerSecond, avgKilobytes, peakHeapMb);
     }
 
+    private static String currentSpeedWithStages(String profile, String latencyItems,
+                                                 String stageItems, String throughputItems) {
+        return """
+                {
+                  "timestamp": "2026-04-14 21:00:00",
+                  "profile": "%s",
+                  "latency": [%s],
+                  "stages": [%s],
+                  "throughput": [%s]
+                }
+                """.formatted(profile, latencyItems, stageItems, throughputItems);
+    }
+
+    private static String stage(String scenario, double composeMs, double layoutMs,
+                                double renderMs, double totalMs) {
+        return """
+                {
+                  "scenario": "%s",
+                  "composeMillis": %s,
+                  "layoutMillis": %s,
+                  "renderMillis": %s,
+                  "totalMillis": %s
+                }
+                """.formatted(scenario, composeMs, layoutMs, renderMs, totalMs);
+    }
+
+    private static java.util.List<String> toStrings(JsonNode array) {
+        java.util.List<String> values = new java.util.ArrayList<>();
+        array.forEach(node -> values.add(node.asText()));
+        return values;
+    }
+
     private static String throughput(String scenario, int threads, double docsPerSecond, double avgMillisPerDoc) {
         return """
                 {

From faec9e3f23c02eb54e2fa5fa5d6ab9fc94d1ae9c Mon Sep 17 00:00:00 2001
From: DemchaAV <demchaav@gmail.com>
Date: Sun, 14 Jun 2026 19:55:24 +0100
Subject: [PATCH 04/10] perf(benchmarks): add SVG-import feature benches (parse
 / read / node)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

First feature-object benchmarks for the v1.8 vector surface (the rest of the
suite is text/table only):
- SvgJmhBenchmark (forked JMH): SvgPath.parse of a real Material heart d,
  SvgIcon.parse of a multi-layer icon, SvgIcon.node on a pre-parsed icon.
- SvgParseAllocProbe (deterministic ThreadMXBean alloc, median of 11): KB/op
  for the same three operations.
- SvgBenchmarkFixtures: the heart d (vendored — the benchmark module can't
  reach the test/example copies) and a synthetic multi-layer icon (gradient
  bg + transformed groups + stroked curves) within the reader's supported
  subset, so it always parses.

Run on demand, not per-PR:

    java -jar benchmarks/target/benchmarks.jar Svg

Verified: compiles; both benches run. Path parse ~3.6 us/op, icon read
~308 us/op (DOM-parse dominated, 114 KB/op), node build ~0.4 us/op / 2 KB/op.
---
 .../demcha/compose/SvgBenchmarkFixtures.java  | 55 +++++++++++
 .../demcha/compose/SvgParseAllocProbe.java    | 93 ++++++++++++++++++
 .../demcha/compose/jmh/SvgJmhBenchmark.java   | 97 +++++++++++++++++++
 3 files changed, 245 insertions(+)
 create mode 100644 benchmarks/src/main/java/com/demcha/compose/SvgBenchmarkFixtures.java
 create mode 100644 benchmarks/src/main/java/com/demcha/compose/SvgParseAllocProbe.java
 create mode 100644 benchmarks/src/main/java/com/demcha/compose/jmh/SvgJmhBenchmark.java
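
Reviewer note: to check that per-thread allocation measurement works on a given
JVM without building the benchmark module, the "median of N" technique the
probe uses can be reduced to the standalone sketch below. The class name
AllocMedianSketch and its method signature are hypothetical. One deliberate
difference from the probe: it tests the platform bean with instanceof before
casting, so a JVM that does not expose com.sun.management.ThreadMXBean reports
-1 rather than failing on the cast.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.function.Supplier;

/** Hypothetical standalone sketch of a median-of-N allocation probe. */
final class AllocMedianSketch {

    /** Escape sink so the measured result stays reachable. */
    private static long sink;

    /**
     * Returns the median KB allocated by one call of op on the current thread,
     * or -1 when the JVM cannot report per-thread allocation.
     * Pass an odd sample count so the median is a single observed value.
     */
    static double medianAllocKb(Supplier<Object> op, int warmup, int samples) {
        java.lang.management.ThreadMXBean raw = ManagementFactory.getThreadMXBean();
        if (!(raw instanceof com.sun.management.ThreadMXBean)) {
            return -1.0;
        }
        com.sun.management.ThreadMXBean mx = (com.sun.management.ThreadMXBean) raw;
        try {
            if (!mx.isThreadAllocatedMemorySupported()) {
                return -1.0;
            }
            if (!mx.isThreadAllocatedMemoryEnabled()) {
                mx.setThreadAllocatedMemoryEnabled(true);
            }
        } catch (UnsupportedOperationException ex) {
            return -1.0;
        }

        // Warm up first so class loading and first-call costs are not sampled.
        for (int i = 0; i < warmup; i++) {
            sink += System.identityHashCode(op.get());
        }

        long[] alloc = new long[samples];
        for (int i = 0; i < samples; i++) {
            long before = mx.getCurrentThreadAllocatedBytes();
            Object result = op.get();
            long after = mx.getCurrentThreadAllocatedBytes();
            sink += System.identityHashCode(result);
            alloc[i] = after - before;
        }
        Arrays.sort(alloc);
        return alloc[samples / 2] / 1024.0;
    }

    public static void main(String[] args) {
        // A 64 KiB array: the reported figure should be at least 64 KB.
        double kb = medianAllocKb(() -> new byte[64 * 1024], 60, 11);
        System.out.println(kb < 0 ? "n/a" : String.format("%.1f KB/op", kb));
        System.out.println("sink: " + sink);
    }
}
```

Taking the median rather than the mean keeps a single outlier sample (for
example, one that coincides with a late JIT compilation) from shifting the
reported figure.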

diff --git a/benchmarks/src/main/java/com/demcha/compose/SvgBenchmarkFixtures.java b/benchmarks/src/main/java/com/demcha/compose/SvgBenchmarkFixtures.java
new file mode 100644
index 000000000..120741433
--- /dev/null
+++ b/benchmarks/src/main/java/com/demcha/compose/SvgBenchmarkFixtures.java
@@ -0,0 +1,55 @@
+package com.demcha.compose;
+
+/**
+ * Shared SVG fixtures for the v1.8 vector-import benchmarks (path parse, whole
+ * icon read, icon → node build).
+ *
+ * <p>Self-contained on purpose: the benchmarks module cannot reach the
+ * main-module test constants or the examples module, so the heart path is
+ * vendored here (it also lives in {@code SvgPathTest} / {@code VectorPathExample}
+ * in their own modules). The icon is a synthetic but realistic multi-layer
+ * document — a gradient-filled background, a {@code translate}+{@code scale}
+ * group of filled paths and a stroked circle, and a {@code rotate} group with a
+ * polygon and a quadratic-curve stroke — so it exercises XML parse, {@code <g>}
+ * transform accumulation, gradient resolution and per-layer path lowering the
+ * way a real exporter file would, while staying entirely within the reader's
+ * supported subset (so it never throws).</p>
+ *
+ * @author Artem Demchyshyn
+ */
+public final class SvgBenchmarkFixtures {
+
+    /** Material "favorite" heart — the same {@code d} used in the SVG tests/examples. */
+    public static final String MATERIAL_HEART_D =
+            "M12 21.35l-1.45-1.32C5.4 15.36 2 12.28 2 8.5 2 5.42 4.42 3 7.5 3"
+            + "c1.74 0 3.41.81 4.5 2.09C13.09 3.81 14.76 3 16.5 3 19.58 3 22 5.42 22 8.5"
+            + "c0 3.78-3.4 6.86-8.55 11.54L12 21.35z";
+
+    /** Heart viewBox edge (square 24×24), passed to {@code SvgPath.parse}. */
+    public static final double HEART_VIEWBOX = 24.0;
+
+    /** A realistic multi-layer icon: gradient bg + transformed groups + stroked curves. */
+    public static final String MULTI_LAYER_ICON_SVG = """
+            <svg viewBox="0 0 48 48" xmlns="http://www.w3.org/2000/svg">
+              <defs>
+                <linearGradient id="sky" x1="0" y1="0" x2="0" y2="48" gradientUnits="userSpaceOnUse">
+                  <stop offset="0" stop-color="#3b82f6"/>
+                  <stop offset="1" stop-color="#1e3a8a"/>
+                </linearGradient>
+              </defs>
+              <rect x="0" y="0" width="48" height="48" rx="6" fill="url(#sky)"/>
+              <g transform="translate(6 6) scale(1.1)">
+                <path d="M0 24 L12 4 L24 24 Z" fill="#fbbf24"/>
+                <path d="M6 24 L16 10 L26 24 Z" fill="#f59e0b"/>
+                <circle cx="20" cy="8" r="4" fill="#fde68a" stroke="#92400e" stroke-width="1.5"/>
+              </g>
+              <g transform="rotate(8 24 40)">
+                <polygon points="4,40 44,40 40,46 8,46" fill="#10b981"/>
+                <path d="M10 42 Q24 38 38 42" fill="none" stroke="#065f46" stroke-width="2"/>
+              </g>
+            </svg>
+            """;
+
+    private SvgBenchmarkFixtures() {
+    }
+}
diff --git a/benchmarks/src/main/java/com/demcha/compose/SvgParseAllocProbe.java b/benchmarks/src/main/java/com/demcha/compose/SvgParseAllocProbe.java
new file mode 100644
index 000000000..b8df62a2b
--- /dev/null
+++ b/benchmarks/src/main/java/com/demcha/compose/SvgParseAllocProbe.java
@@ -0,0 +1,93 @@
+package com.demcha.compose;
+
+import com.demcha.compose.document.svg.SvgIcon;
+import com.demcha.compose.document.svg.SvgPath;
+
+import java.lang.management.ManagementFactory;
+import java.util.Arrays;
+import java.util.function.Supplier;
+
+/**
+ * Deterministic allocation probe for the v1.8 SVG-import path: warm
+ * (JIT-steady) bytes allocated per {@link SvgPath#parse}, per
+ * {@link SvgIcon#parse}, and per {@link SvgIcon#node} — the three operations
+ * with no analogue in the rest of the suite (which is text / table only).
+ *
+ * <p>Allocation counts are noise-free (unlike wall-clock or {@code peakHeapMb}),
+ * so this is the signal the "optimize the engine, not benchmarks" rule wants:
+ * a develop-vs-branch A/B shows a parse/read/node allocation change directly.
+ * No {@code src/main} changes.</p>
+ *
+ * @author Artem Demchyshyn
+ */
+public final class SvgParseAllocProbe {
+
+    private static final com.sun.management.ThreadMXBean THREAD_MX =
+            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
+
+    private static final int WARMUP = 60;
+    private static final int MEASURE = 11;
+
+    /** Escape sink so the JIT cannot elide the measured allocations. */
+    private static long sink;
+
+    public static void main(String[] args) {
+        BenchmarkSupport.configureQuietLogging();
+        enableAllocationMeasurement();
+
+        SvgIcon icon = SvgIcon.parse(SvgBenchmarkFixtures.MULTI_LAYER_ICON_SVG);
+
+        double parseKb = measureAllocKb(() -> SvgPath.parse(
+                SvgBenchmarkFixtures.MATERIAL_HEART_D,
+                0, 0, SvgBenchmarkFixtures.HEART_VIEWBOX, SvgBenchmarkFixtures.HEART_VIEWBOX));
+        double readKb = measureAllocKb(() -> SvgIcon.parse(SvgBenchmarkFixtures.MULTI_LAYER_ICON_SVG));
+        double nodeKb = measureAllocKb(() -> icon.node(48.0));
+
+        System.out.println("GraphCompose SVG-import allocation probe (median of " + MEASURE + ")");
+        System.out.printf("  SvgPath.parse (heart d)     : %s%n", kb(parseKb));
+        System.out.printf("  SvgIcon.parse (multi-layer) : %s%n", kb(readKb));
+        System.out.printf("  SvgIcon.node(48)            : %s%n", kb(nodeKb));
+        System.out.println("alloc sink: " + sink);
+    }
+
+    private static double measureAllocKb(Supplier<Object> op) {
+        for (int i = 0; i < WARMUP; i++) {
+            sink += System.identityHashCode(op.get());
+        }
+        long[] alloc = new long[MEASURE];
+        for (int m = 0; m < MEASURE; m++) {
+            long before = currentThreadAllocatedBytes();
+            Object result = op.get();
+            long after = currentThreadAllocatedBytes();
+            sink += System.identityHashCode(result);
+            alloc[m] = before < 0 ? -1 : after - before;
+        }
+        Arrays.sort(alloc);
+        return alloc[MEASURE / 2] / 1024.0;
+    }
+
+    private static String kb(double value) {
+        return value < 0 ? "n/a (allocation measurement unsupported)" : "%.1f KB/op".formatted(value);
+    }
+
+    private static void enableAllocationMeasurement() {
+        try {
+            if (THREAD_MX.isThreadAllocatedMemorySupported() && !THREAD_MX.isThreadAllocatedMemoryEnabled()) {
+                THREAD_MX.setThreadAllocatedMemoryEnabled(true);
+            }
+        } catch (UnsupportedOperationException ignored) {
+            // Allocation measurement unsupported on this JVM; the probe reports n/a.
+        }
+    }
+
+    private static long currentThreadAllocatedBytes() {
+        try {
+            if (!THREAD_MX.isThreadAllocatedMemorySupported() || !THREAD_MX.isThreadAllocatedMemoryEnabled()) {
+                return -1;
+            }
+        } catch (UnsupportedOperationException ex) {
+            return -1;
+        }
+        return THREAD_MX.getCurrentThreadAllocatedBytes();
+    }
+}
diff --git a/benchmarks/src/main/java/com/demcha/compose/jmh/SvgJmhBenchmark.java b/benchmarks/src/main/java/com/demcha/compose/jmh/SvgJmhBenchmark.java
new file mode 100644
index 000000000..f7a63b30c
--- /dev/null
+++ b/benchmarks/src/main/java/com/demcha/compose/jmh/SvgJmhBenchmark.java
@@ -0,0 +1,97 @@
+package com.demcha.compose.jmh;
+
+import com.demcha.compose.SvgBenchmarkFixtures;
+import com.demcha.compose.document.svg.SvgIcon;
+import com.demcha.compose.document.svg.SvgPath;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Strict JMH micro-benchmark for the v1.8 SVG-import surface — the first
+ * feature-object benchmark (the rest of the suite renders text / tables only).
+ *
+ * <p>Three measured operations, all pure CPU + allocation (no
+ * {@code DocumentSession}, no PDF render):</p>
+ * <ul>
+ *   <li>{@code parseSvgPath} — {@link SvgPath#parse} of a real Material icon
+ *       {@code d} string (arc→cubic conversion, normalization).</li>
+ *   <li>{@code readSvgIcon} — {@link SvgIcon#parse} of a multi-layer icon (XML
+ *       parse, {@code <g>} transform accumulation, gradient resolution, one
+ *       {@link SvgPath} per layer).</li>
+ *   <li>{@code svgIconToNode} — {@link SvgIcon#node} on a pre-parsed icon (the
+ *       {@code PathNode} / layer-stack allocation done once per placement).</li>
+ * </ul>
+ *
+ * <p>Microsecond-scale work, so it needs the forked, JIT-stable JMH harness
+ * (an {@code exec:java} run cannot fork). Build the runner jar and run:</p>
+ *
+ * <pre>
+ *   ./mvnw -f benchmarks/pom.xml clean package -DskipTests
+ *   java -jar benchmarks/target/benchmarks.jar Svg
+ * </pre>
+ *
+ * @author Artem Demchyshyn
+ */
+@BenchmarkMode({Mode.AverageTime, Mode.Throughput})
+@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@State(Scope.Benchmark)
+@Warmup(iterations = 3, time = 2)
+@Measurement(iterations = 5, time = 2)
+@Fork(1)
+public class SvgJmhBenchmark {
+
+    /** Parsed once so {@code svgIconToNode} measures only the node-build cost. */
+    private final SvgIcon icon = SvgIcon.parse(SvgBenchmarkFixtures.MULTI_LAYER_ICON_SVG);
+
+    /**
+     * Parses a real icon path-data string into normalized segments.
+     *
+     * @param blackhole JMH sink
+     */
+    @Benchmark
+    public void parseSvgPath(Blackhole blackhole) {
+        blackhole.consume(SvgPath.parse(
+                SvgBenchmarkFixtures.MATERIAL_HEART_D,
+                0, 0, SvgBenchmarkFixtures.HEART_VIEWBOX, SvgBenchmarkFixtures.HEART_VIEWBOX));
+    }
+
+    /**
+     * Reads a whole multi-layer SVG icon (XML parse → layers).
+     *
+     * @param blackhole JMH sink
+     */
+    @Benchmark
+    public void readSvgIcon(Blackhole blackhole) {
+        blackhole.consume(SvgIcon.parse(SvgBenchmarkFixtures.MULTI_LAYER_ICON_SVG));
+    }
+
+    /**
+     * Builds a placeable node (path nodes + layer stack) from a parsed icon.
+     *
+     * @param blackhole JMH sink
+     */
+    @Benchmark
+    public void svgIconToNode(Blackhole blackhole) {
+        blackhole.consume(icon.node(48.0));
+    }
+
+    /**
+     * Runs the JMH harness over this benchmark.
+     *
+     * @param args JMH CLI arguments
+     * @throws Exception if the JMH runner fails
+     */
+    public static void main(String[] args) throws Exception {
+        org.openjdk.jmh.Main.main(args);
+    }
+}

From ae025075d8e660b04e65e9438237873fd9aee026 Mon Sep 17 00:00:00 2001
From: DemchaAV <demchaav@gmail.com>
Date: Sun, 14 Jun 2026 20:00:55 +0100
Subject: [PATCH 05/10] perf(benchmarks): add chart feature benches (render +
 compile alloc)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

S4 of the modernization — the first chart benchmarks (the suite otherwise
renders text/tables only):
- ChartJmhBenchmark (forked JMH): end-to-end render of a chart-heavy doc —
  grouped bar + multi-series line (12 categories x 3 series) + 6-slice pie.
- ChartAllocProbe (deterministic ThreadMXBean, median of 11): warm
  layout-compile allocation, isolating chart-resolve + geometry emission.
- ChartBenchmarkFixtures: the shared bar/line/pie specs + data.

Run on demand, not per-PR:

    java -jar benchmarks/target/benchmarks.jar Chart

Verified: compiles; render ~2.8 ms/op; compile alloc 446.8 KB (deterministic,
min=max=median, 1 page).
---
 .../com/demcha/compose/ChartAllocProbe.java   | 114 ++++++++++++++++++
 .../compose/ChartBenchmarkFixtures.java       |  91 ++++++++++++++
 .../demcha/compose/jmh/ChartJmhBenchmark.java |  79 ++++++++++++
 3 files changed, 284 insertions(+)
 create mode 100644 benchmarks/src/main/java/com/demcha/compose/ChartAllocProbe.java
 create mode 100644 benchmarks/src/main/java/com/demcha/compose/ChartBenchmarkFixtures.java
 create mode 100644 benchmarks/src/main/java/com/demcha/compose/jmh/ChartJmhBenchmark.java

diff --git a/benchmarks/src/main/java/com/demcha/compose/ChartAllocProbe.java b/benchmarks/src/main/java/com/demcha/compose/ChartAllocProbe.java
new file mode 100644
index 000000000..2921bde80
--- /dev/null
+++ b/benchmarks/src/main/java/com/demcha/compose/ChartAllocProbe.java
@@ -0,0 +1,114 @@
+package com.demcha.compose;
+
+import com.demcha.compose.document.api.DocumentPageSize;
+import com.demcha.compose.document.api.DocumentSession;
+import com.demcha.compose.document.backend.fixed.pdf.PdfMeasurementResources;
+import com.demcha.compose.document.layout.DocumentGraph;
+import com.demcha.compose.document.layout.DocumentLayoutPassContext;
+import com.demcha.compose.document.layout.LayoutCanvas;
+import com.demcha.compose.document.layout.LayoutCompiler;
+import com.demcha.compose.document.layout.LayoutGraph;
+import com.demcha.compose.document.layout.NodeRegistry;
+import com.demcha.compose.document.node.DocumentNode;
+
+import java.lang.management.ManagementFactory;
+import java.util.Arrays;
+import java.util.List;
+
+/**
+ * Deterministic allocation probe for the v1.8 chart subsystem: warm
+ * (JIT-steady) bytes allocated by the layout-compile pass of a chart-heavy
+ * document (a grouped bar, a multi-series line, and a pie). Charts are resolved
+ * into engine primitives during compile, so this isolates the chart-resolve +
+ * geometry-emission allocation — the noise-free signal a develop-vs-branch A/B
+ * needs. Benchmarks-module only; no engine {@code src/main} changes.
+ *
+ * @author Artem Demchyshyn
+ */
+public final class ChartAllocProbe {
+
+    private static final com.sun.management.ThreadMXBean THREAD_MX =
+            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
+
+    private static final int WARMUP = 60;
+    private static final int MEASURE = 11;
+
+    public static void main(String[] args) throws Exception {
+        BenchmarkSupport.configureQuietLogging();
+        enableAllocationMeasurement();
+
+        try (DocumentSession session = GraphCompose.document()
+                .pageSize(DocumentPageSize.A4)
+                .margin(24, 24, 24, 24)
+                .create()) {
+            session.pageFlow(flow -> flow
+                    .chart(ChartBenchmarkFixtures.barSpec(), ChartBenchmarkFixtures.barStyle())
+                    .chart(ChartBenchmarkFixtures.lineSpec(), ChartBenchmarkFixtures.lineStyle())
+                    .chart(ChartBenchmarkFixtures.pieSpec()));
+
+            List<DocumentNode> roots = session.roots();
+            LayoutCanvas canvas = session.canvas();
+            NodeRegistry registry = session.registry();
+
+            try (PdfMeasurementResources resources = PdfMeasurementResources.open(List.of())) {
+                LayoutCompiler compiler = new LayoutCompiler(registry);
+                DocumentGraph graph = new DocumentGraph(roots);
+
+                int pages = 0;
+                // Warm up so the measured allocation is JIT steady state, not
+                // class-load / first-call cold start.
+                for (int i = 0; i < WARMUP; i++) {
+                    pages = compile(compiler, graph, registry, canvas, resources).totalPages();
+                }
+
+                long[] alloc = new long[MEASURE];
+                for (int m = 0; m < MEASURE; m++) {
+                    long before = currentThreadAllocatedBytes();
+                    LayoutGraph layout = compile(compiler, graph, registry, canvas, resources);
+                    alloc[m] = before < 0 ? -1 : currentThreadAllocatedBytes() - before;
+                    pages = layout.totalPages();
+                }
+                Arrays.sort(alloc);
+
+                System.out.println("GraphCompose chart layout-compile allocation probe");
+                System.out.printf("document: grouped bar + line (12 cats x 3 series) + 6-slice pie, pages: %d%n", pages);
+                System.out.printf("warm compile allocation (median of %d): %s%n",
+                        MEASURE, kb(alloc[MEASURE / 2]));
+                System.out.printf("  min %s / max %s%n", kb(alloc[0]), kb(alloc[MEASURE - 1]));
+            }
+        }
+    }
+
+    private static LayoutGraph compile(LayoutCompiler compiler, DocumentGraph graph,
+                                       NodeRegistry registry, LayoutCanvas canvas,
+                                       PdfMeasurementResources resources) {
+        DocumentLayoutPassContext context = new DocumentLayoutPassContext(
+                registry, canvas, resources.fontLibrary(), resources.textMeasurementSystem(), false);
+        return compiler.compile(graph, context, context);
+    }
+
+    private static String kb(long bytes) {
+        return bytes < 0 ? "n/a (allocation measurement unsupported)" : "%.1f KB".formatted(bytes / 1024.0);
+    }
+
+    private static void enableAllocationMeasurement() {
+        try {
+            if (THREAD_MX.isThreadAllocatedMemorySupported() && !THREAD_MX.isThreadAllocatedMemoryEnabled()) {
+                THREAD_MX.setThreadAllocatedMemoryEnabled(true);
+            }
+        } catch (UnsupportedOperationException ignored) {
+            // Allocation measurement unsupported on this JVM; the probe reports n/a.
+        }
+    }
+
+    private static long currentThreadAllocatedBytes() {
+        try {
+            if (!THREAD_MX.isThreadAllocatedMemorySupported() || !THREAD_MX.isThreadAllocatedMemoryEnabled()) {
+                return -1;
+            }
+        } catch (UnsupportedOperationException ex) {
+            return -1;
+        }
+        return THREAD_MX.getCurrentThreadAllocatedBytes();
+    }
+}
diff --git a/benchmarks/src/main/java/com/demcha/compose/ChartBenchmarkFixtures.java b/benchmarks/src/main/java/com/demcha/compose/ChartBenchmarkFixtures.java
new file mode 100644
index 000000000..59aa1578b
--- /dev/null
+++ b/benchmarks/src/main/java/com/demcha/compose/ChartBenchmarkFixtures.java
@@ -0,0 +1,91 @@
+package com.demcha.compose;
+
+import com.demcha.compose.document.chart.AxisSpec;
+import com.demcha.compose.document.chart.ChartData;
+import com.demcha.compose.document.chart.ChartSize;
+import com.demcha.compose.document.chart.ChartSpec;
+import com.demcha.compose.document.chart.ChartStyle;
+import com.demcha.compose.document.chart.LegendPosition;
+import com.demcha.compose.document.chart.PointMarker;
+import com.demcha.compose.document.chart.SliceLabelMode;
+import com.demcha.compose.document.chart.ValueLabelMode;
+import com.demcha.compose.document.style.DocumentColor;
+import com.demcha.compose.document.style.DocumentPaint;
+import com.demcha.compose.document.style.DocumentStroke;
+
+/**
+ * Shared fixtures for the v1.8 chart benchmarks: a non-trivial grouped bar and
+ * multi-series line (12 categories × 3 series) plus a 6-slice pie. Charts
+ * compile at layout time into ordinary shapes / lines / polygons / labels, so
+ * these stress {@code ChartLayoutResolver} + per-primitive geometry + label
+ * text-metrics — the cost no text/table bench exercises.
+ *
+ * @author Artem Demchyshyn
+ */
+public final class ChartBenchmarkFixtures {
+
+    private ChartBenchmarkFixtures() {
+    }
+
+    /** 12 categories × 3 series — a representative grouped-bar / line workload. */
+    public static ChartData monthlySeries() {
+        return ChartData.builder()
+                .categories("Jan", "Feb", "Mar", "Apr", "May", "Jun",
+                        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
+                .series("2023", 12.4, 15.1, 9.8, 14.2, 16.0, 13.3, 17.1, 18.4, 15.9, 14.0, 19.2, 21.1)
+                .series("2024", 14.0, 18.2, 11.3, 16.9, 17.5, 15.0, 19.0, 20.2, 17.1, 16.4, 21.0, 23.5)
+                .series("2025", 15.5, 19.0, 12.0, 18.0, 19.1, 16.2, 20.5, 22.0, 18.9, 17.7, 22.8, 25.0)
+                .build();
+    }
+
+    /** 6-slice single-series data for the pie. */
+    public static ChartData regionShare() {
+        return ChartData.builder()
+                .categories("EMEA", "Americas", "APAC", "LATAM", "MEA", "Other")
+                .series("Share", 31.0, 27.0, 19.0, 10.0, 8.0, 5.0)
+                .build();
+    }
+
+    public static ChartSpec barSpec() {
+        return ChartSpec.bar()
+                .data(monthlySeries())
+                .valueAxis(AxisSpec.builder().baselineAtZero(true).build())
+                .legend(LegendPosition.BOTTOM)
+                .valueLabels(ValueLabelMode.OUTSIDE)
+                .size(ChartSize.aspectRatio(16, 7))
+                .build();
+    }
+
+    public static ChartStyle barStyle() {
+        return ChartStyle.builder()
+                .seriesPaint(0, DocumentPaint.solid(DocumentColor.rgb(20, 80, 95)))
+                .seriesPaint(1, DocumentPaint.solid(DocumentColor.rgb(196, 153, 76)))
+                .seriesPaint(2, DocumentPaint.solid(DocumentColor.rgb(120, 60, 140)))
+                .build();
+    }
+
+    public static ChartSpec lineSpec() {
+        return ChartSpec.line()
+                .data(monthlySeries())
+                .valueAxis(AxisSpec.builder().baselineAtZero(true).build())
+                .legend(LegendPosition.BOTTOM)
+                .size(ChartSize.aspectRatio(16, 7))
+                .build();
+    }
+
+    public static ChartStyle lineStyle() {
+        return ChartStyle.builder()
+                .lineWidth(1.8)
+                .pointMarker(PointMarker.circle(5.0)
+                        .withStroke(DocumentStroke.of(DocumentColor.WHITE, 1.0)))
+                .build();
+    }
+
+    public static ChartSpec pieSpec() {
+        return ChartSpec.pie()
+                .data(regionShare())
+                .sliceLabels(SliceLabelMode.CATEGORY_PERCENT)
+                .size(ChartSize.fixedHeight(190))
+                .build();
+    }
+}
diff --git a/benchmarks/src/main/java/com/demcha/compose/jmh/ChartJmhBenchmark.java b/benchmarks/src/main/java/com/demcha/compose/jmh/ChartJmhBenchmark.java
new file mode 100644
index 000000000..760592853
--- /dev/null
+++ b/benchmarks/src/main/java/com/demcha/compose/jmh/ChartJmhBenchmark.java
@@ -0,0 +1,79 @@
+package com.demcha.compose.jmh;
+
+import com.demcha.compose.ChartBenchmarkFixtures;
+import com.demcha.compose.GraphCompose;
+import com.demcha.compose.document.api.DocumentPageSize;
+import com.demcha.compose.document.api.DocumentSession;
+import com.demcha.compose.document.style.DocumentInsets;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Strict JMH micro-benchmark: end-to-end render of a chart-heavy document — a
+ * grouped bar, a multi-series line (both 12 categories × 3 series) and a 6-slice
+ * pie — to PDF bytes. Charts compile into engine primitives at layout time, so
+ * this exercises {@code ChartLayoutResolver} + per-primitive geometry + label
+ * text-metrics on top of the normal compose / layout / render pipeline.
+ *
+ * <pre>
+ *   ./mvnw -f benchmarks/pom.xml clean package -DskipTests
+ *   java -jar benchmarks/target/benchmarks.jar Chart
+ * </pre>
+ *
+ * @author Artem Demchyshyn
+ */
+@BenchmarkMode({Mode.AverageTime, Mode.Throughput})
+@OutputTimeUnit(TimeUnit.MILLISECONDS)
+@State(Scope.Benchmark)
+@Warmup(iterations = 3, time = 2)
+@Measurement(iterations = 5, time = 2)
+@Fork(1)
+public class ChartJmhBenchmark {
+
+    /**
+     * Builds the three-chart document and renders it to PDF bytes.
+     *
+     * @param blackhole JMH sink that consumes the rendered bytes
+     * @throws Exception if rendering fails
+     */
+    @Benchmark
+    public void renderChartDocument(Blackhole blackhole) throws Exception {
+        blackhole.consume(renderDocument());
+    }
+
+    private static byte[] renderDocument() throws Exception {
+        try (DocumentSession document = GraphCompose.document()
+                .pageSize(DocumentPageSize.A4)
+                .margin(DocumentInsets.of(36))
+                .create()) {
+            document.pageFlow()
+                    .name("ChartBenchmark")
+                    .spacing(12)
+                    .chart(ChartBenchmarkFixtures.barSpec(), ChartBenchmarkFixtures.barStyle())
+                    .chart(ChartBenchmarkFixtures.lineSpec(), ChartBenchmarkFixtures.lineStyle())
+                    .chart(ChartBenchmarkFixtures.pieSpec())
+                    .build();
+            return document.toPdfBytes();
+        }
+    }
+
+    /**
+     * Runs the JMH harness over this benchmark.
+     *
+     * @param args JMH CLI arguments
+     * @throws Exception if the JMH runner fails
+     */
+    public static void main(String[] args) throws Exception {
+        org.openjdk.jmh.Main.main(args);
+    }
+}

From 1747446cce56d31a307de251ba5e55719342a4fd Mon Sep 17 00:00:00 2001
From: DemchaAV <demchaav@gmail.com>
Date: Sun, 14 Jun 2026 20:16:51 +0100
Subject: [PATCH 06/10] perf(benchmarks): add vector-paint render-operator
 probe (S5/S6)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

VectorRenderOperatorProbe renders the same 40 curved blob paths three ways —
flat solid fill, linear gradient, and translucent (alpha) — and counts the PDF
content-stream operators, so the deltas isolate what each paint mode costs at
render time. Flat takes the fast fill path (sh=0, gs=0, W=0); a gradient fill
adds one shading + one clip per shape (sh, W); a translucent fill adds one
ExtGState (gs). Byte-deterministic, no A/B build needed; catches a regression
where a flat path wrongly takes the gradient branch (sh would jump from 0).

Verified: flat 0/0/0, gradient sh=40/W=40, alpha gs=40 over 40 paths.
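
The invariant above can be sketched as a plain check, assuming the sh / gs /
W counts have already been read off the probe output. OperatorCounts and
OperatorDeltaCheck are hypothetical names used only for illustration; they
are not part of the probe or the engine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the flat row: counted sh / gs / W operators.
record OperatorCounts(int sh, int gs, int w) {
}

// Illustrative only: turns the probe's counts into pass/fail findings.
final class OperatorDeltaCheck {

    private OperatorDeltaCheck() {
    }

    static List<String> violations(int paths, OperatorCounts flat,
                                   int gradientSh, int gradientW, int alphaGs) {
        List<String> problems = new ArrayList<>();
        // Flat fill must stay on the fast path: no shading, ExtGState or clip.
        if (flat.sh() != 0 || flat.gs() != 0 || flat.w() != 0) {
            problems.add("flat fill left the fast path: " + flat);
        }
        // Gradient fill: one shading and one clip per shape.
        if (gradientSh != paths || gradientW != paths) {
            problems.add("gradient expected sh=W=" + paths
                    + " but saw sh=" + gradientSh + ", W=" + gradientW);
        }
        // Translucent fill: one ExtGState per shape.
        if (alphaGs != paths) {
            problems.add("alpha expected gs=" + paths + " but saw gs=" + alphaGs);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Counts reported in this commit message for 40 paths.
        List<String> problems = violations(40, new OperatorCounts(0, 0, 0), 40, 40, 40);
        System.out.println(problems.isEmpty() ? "ok" : String.join("\n", problems));
    }
}
```

With the verified counts the sketch prints "ok"; a flat row with a non-zero
sh would surface as a single finding, which is the regression the probe is
meant to catch.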
---
 .../compose/VectorRenderOperatorProbe.java    | 102 ++++++++++++++++++
 1 file changed, 102 insertions(+)
 create mode 100644 benchmarks/src/main/java/com/demcha/compose/VectorRenderOperatorProbe.java

diff --git a/benchmarks/src/main/java/com/demcha/compose/VectorRenderOperatorProbe.java b/benchmarks/src/main/java/com/demcha/compose/VectorRenderOperatorProbe.java
new file mode 100644
index 000000000..8ea5652c2
--- /dev/null
+++ b/benchmarks/src/main/java/com/demcha/compose/VectorRenderOperatorProbe.java
@@ -0,0 +1,102 @@
+package com.demcha.compose;
+
+import com.demcha.compose.document.api.DocumentPageSize;
+import com.demcha.compose.document.api.DocumentSession;
+import com.demcha.compose.document.dsl.PageFlowBuilder;
+import com.demcha.compose.document.style.DocumentColor;
+import com.demcha.compose.document.style.DocumentPaint;
+import org.apache.pdfbox.Loader;
+import org.apache.pdfbox.contentstream.operator.Operator;
+import org.apache.pdfbox.pdfparser.PDFStreamParser;
+import org.apache.pdfbox.pdmodel.PDDocument;
+
+import java.io.IOException;
+import java.util.List;
+
+/**
+ * Deterministic content-stream operator probe for the v1.8 vector-paint render
+ * paths (S5/S6): the same {@code N} curved blob paths rendered three ways —
+ * flat solid fill, linear gradient, and translucent (alpha) fill — so the
+ * operator deltas isolate exactly what each paint mode costs at the PDF level.
+ *
+ * <p>A flat path takes the fast {@code fillAndStrokePath} route (just curve +
+ * fill operators). A gradient fill clips to the path and paints a shading
+ * ({@code q} / {@code W n} clip / {@code sh} / {@code Q} per shape); a
+ * translucent fill sets an ExtGState alpha ({@code gs}). Counting {@code sh} /
+ * {@code gs} / {@code W} against the flat baseline proves the per-shape cost
+ * structure and catches a regression where a flat path accidentally takes the
+ * heavier gradient branch. Byte-deterministic — no A/B build needed.</p>
+ *
+ * @author Artem Demchyshyn
+ */
+public final class VectorRenderOperatorProbe {
+
+    private static final int PATHS = 40;
+
+    private enum PaintMode { FLAT, GRADIENT, ALPHA }
+
+    public static void main(String[] args) throws Exception {
+        BenchmarkSupport.configureQuietLogging();
+
+        System.out.println("GraphCompose vector-paint render-operator probe (" + PATHS + " blob paths each)");
+        System.out.printf("%-10s | %6s | %6s | %6s | %6s%n", "Mode", "c", "sh", "gs", "W");
+        System.out.println("-".repeat(46));
+        for (PaintMode mode : PaintMode.values()) {
+            report(mode);
+        }
+        System.out.println();
+        System.out.println("c=cubic curve, sh=shading fill, gs=ExtGState (alpha), W=clip. "
+                + "Flat takes the fast path (no sh/gs/W); gradient adds sh+W per shape; alpha adds gs.");
+    }
+
+    private static void report(PaintMode mode) throws Exception {
+        byte[] pdf;
+        try (DocumentSession session = GraphCompose.document()
+                .pageSize(DocumentPageSize.A4).margin(28, 28, 28, 28).create()) {
+            session.pageFlow(flow -> authorBlobs(flow, mode));
+            pdf = session.toPdfBytes();
+        }
+        try (PDDocument document = Loader.loadPDF(pdf)) {
+            System.out.printf("%-10s | %6d | %6d | %6d | %6d%n",
+                    mode.name().toLowerCase(java.util.Locale.ROOT),
+                    count(document, "c"),
+                    count(document, "sh"),
+                    count(document, "gs"),
+                    count(document, "W"));
+        }
+    }
+
+    private static void authorBlobs(PageFlowBuilder flow, PaintMode mode) {
+        DocumentPaint gradient = DocumentPaint.linear(
+                DocumentColor.rgb(167, 139, 250), DocumentColor.rgb(97, 40, 217));
+        DocumentColor flat = DocumentColor.rgb(40, 90, 160);
+        DocumentColor translucent = DocumentColor.rgb(40, 90, 160).withOpacity(0.5);
+        for (int i = 0; i < PATHS; i++) {
+            flow.addPath(p -> {
+                p.size(60, 36)
+                        .moveTo(0.0, 0.5)
+                        .curveTo(0.25, 1.0, 0.75, 1.0, 1.0, 0.5)
+                        .curveTo(0.75, 0.0, 0.25, 0.0, 0.0, 0.5)
+                        .closePath();
+                switch (mode) {
+                    case FLAT -> p.fillColor(flat);
+                    case GRADIENT -> p.fill(gradient);
+                    case ALPHA -> p.fillColor(translucent);
+                }
+            });
+        }
+    }
+
+    private static int count(PDDocument document, String op) throws IOException {
+        int n = 0;
+        for (var page : document.getPages()) {
+            List<Object> tokens = new PDFStreamParser(page).parse();
+            for (Object token : tokens) {
+                if (token instanceof Operator operator && op.equals(operator.getName())) {
+                    n++;
+                }
+            }
+        }
+        return n;
+    }
+}

From c249e537f7bf676366c4efd11dfe5057198cfdd5 Mon Sep 17 00:00:00 2001
From: DemchaAV <demchaav@gmail.com>
Date: Sun, 14 Jun 2026 20:23:31 +0100
Subject: [PATCH 07/10] bench(jmh): add icon-ramp and mixed v1.8 showcase
 render benches

IconRampJmhBenchmark places N copies of a multi-layer SVG icon
(@Param 8/32/128) and renders to PDF, so the per-icon node-build +
layout + render scaling is visible; the icon is parsed once, when the
benchmark state is constructed, so the ramp measures placement, not
re-parsing.

MixedShowcaseJmhBenchmark renders one realistic document mixing every
v1.8 vector feature -- prose with two inline sparklines, a grouped bar
chart and a pie chart, a row of SVG icons, and a gradient accent path
-- as a single integration canary for "did a v1.8 feature regress a
realistic doc?".

Both reuse the existing SvgBenchmarkFixtures / ChartBenchmarkFixtures;
no src/main change.
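
One way to read the ramp, sketched under the assumption that render time is
roughly linear in the icon count (this commit does not verify that): take
two ramp points and derive a marginal per-icon cost plus the implied fixed
per-document cost. IconRampSlope and its methods are hypothetical helpers,
not part of the benchmark.

```java
// Illustrative only: derives per-icon and fixed cost from two ramp points.
final class IconRampSlope {

    private IconRampSlope() {
    }

    // Marginal milliseconds per additional icon between two ramp points.
    static double perIconMs(int lowCount, double lowMs, int highCount, double highMs) {
        if (highCount <= lowCount) {
            throw new IllegalArgumentException("highCount must exceed lowCount");
        }
        return (highMs - lowMs) / (highCount - lowCount);
    }

    // Fixed cost implied by extrapolating the same line back to zero icons.
    static double fixedMs(int lowCount, double lowMs, double perIconMs) {
        return lowMs - perIconMs * lowCount;
    }

    public static void main(String[] args) {
        if (args.length != 4) {
            System.err.println("usage: IconRampSlope <lowCount> <lowMs> <highCount> <highMs>");
            return;
        }
        int lowCount = Integer.parseInt(args[0]);
        double lowMs = Double.parseDouble(args[1]);
        int highCount = Integer.parseInt(args[2]);
        double highMs = Double.parseDouble(args[3]);
        double slope = perIconMs(lowCount, lowMs, highCount, highMs);
        System.out.printf("per icon: %.4f ms, fixed: %.4f ms%n",
                slope, fixedMs(lowCount, lowMs, slope));
    }
}
```

Feeding it the AverageTime scores for iconCount=8 and iconCount=128 gives
the two numbers; comparing the slope from 8 to 32 with the slope from 32 to
128 is a quick check on whether the linearity assumption holds.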
---
 .../compose/jmh/IconRampJmhBenchmark.java     | 82 ++++++++++++++++
 .../jmh/MixedShowcaseJmhBenchmark.java        | 95 +++++++++++++++++++
 2 files changed, 177 insertions(+)
 create mode 100644 benchmarks/src/main/java/com/demcha/compose/jmh/IconRampJmhBenchmark.java
 create mode 100644 benchmarks/src/main/java/com/demcha/compose/jmh/MixedShowcaseJmhBenchmark.java

diff --git a/benchmarks/src/main/java/com/demcha/compose/jmh/IconRampJmhBenchmark.java b/benchmarks/src/main/java/com/demcha/compose/jmh/IconRampJmhBenchmark.java
new file mode 100644
index 000000000..ec655616d
--- /dev/null
+++ b/benchmarks/src/main/java/com/demcha/compose/jmh/IconRampJmhBenchmark.java
@@ -0,0 +1,82 @@
+package com.demcha.compose.jmh;
+
+import com.demcha.compose.GraphCompose;
+import com.demcha.compose.SvgBenchmarkFixtures;
+import com.demcha.compose.document.api.DocumentPageSize;
+import com.demcha.compose.document.api.DocumentSession;
+import com.demcha.compose.document.dsl.PageFlowBuilder;
+import com.demcha.compose.document.style.DocumentInsets;
+import com.demcha.compose.document.svg.SvgIcon;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Strict JMH micro-benchmark: an "icon ramp" — place {@code N} copies of a
+ * multi-layer SVG icon (the realistic icon-grid / skills-ribbon workload) and
+ * render to PDF. Parameterized over N so the trend (node-build + layout +
+ * render per icon) is visible; the icon is parsed once when the benchmark state
+ * is constructed, so the ramp measures placement scaling, not re-parsing.
+ *
+ * <pre>
+ *   ./mvnw -f benchmarks/pom.xml clean package -DskipTests
+ *   java -jar benchmarks/target/benchmarks.jar IconRamp
+ * </pre>
+ *
+ * @author Artem Demchyshyn
+ */
+@BenchmarkMode({Mode.AverageTime, Mode.Throughput})
+@OutputTimeUnit(TimeUnit.MILLISECONDS)
+@State(Scope.Benchmark)
+@Warmup(iterations = 3, time = 2)
+@Measurement(iterations = 5, time = 2)
+@Fork(1)
+public class IconRampJmhBenchmark {
+
+    @Param({"8", "32", "128"})
+    public int iconCount;
+
+    /** Parsed once: the ramp measures node-build + layout + render scaling, not re-parsing. */
+    private final SvgIcon icon = SvgIcon.parse(SvgBenchmarkFixtures.MULTI_LAYER_ICON_SVG);
+
+    /**
+     * Places {@code iconCount} icons in a flow and renders the document.
+     *
+     * @param blackhole JMH sink
+     * @throws Exception if rendering fails
+     */
+    @Benchmark
+    public void renderIconRamp(Blackhole blackhole) throws Exception {
+        try (DocumentSession document = GraphCompose.document()
+                .pageSize(DocumentPageSize.A4)
+                .margin(DocumentInsets.of(24))
+                .create()) {
+            PageFlowBuilder flow = document.pageFlow().name("IconRamp").spacing(4);
+            for (int i = 0; i < iconCount; i++) {
+                flow.addSvgIcon(icon, 32);
+            }
+            flow.build();
+            blackhole.consume(document.toPdfBytes());
+        }
+    }
+
+    /**
+     * Runs the JMH harness over this benchmark.
+     *
+     * @param args JMH CLI arguments
+     * @throws Exception if the JMH runner fails
+     */
+    public static void main(String[] args) throws Exception {
+        org.openjdk.jmh.Main.main(args);
+    }
+}
diff --git a/benchmarks/src/main/java/com/demcha/compose/jmh/MixedShowcaseJmhBenchmark.java b/benchmarks/src/main/java/com/demcha/compose/jmh/MixedShowcaseJmhBenchmark.java
new file mode 100644
index 000000000..ae139a705
--- /dev/null
+++ b/benchmarks/src/main/java/com/demcha/compose/jmh/MixedShowcaseJmhBenchmark.java
@@ -0,0 +1,95 @@
+package com.demcha.compose.jmh;
+
+import com.demcha.compose.ChartBenchmarkFixtures;
+import com.demcha.compose.GraphCompose;
+import com.demcha.compose.SvgBenchmarkFixtures;
+import com.demcha.compose.document.api.DocumentPageSize;
+import com.demcha.compose.document.api.DocumentSession;
+import com.demcha.compose.document.dsl.PageFlowBuilder;
+import com.demcha.compose.document.style.DocumentColor;
+import com.demcha.compose.document.style.DocumentInsets;
+import com.demcha.compose.document.style.DocumentPaint;
+import com.demcha.compose.document.svg.SvgIcon;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Strict JMH micro-benchmark: a representative "v1.8 showcase" document that
+ * mixes every new vector feature in one render — running prose with two inline
+ * sparklines, a grouped bar chart and a pie chart, a row of SVG icons, and
+ * a gradient accent path. This is the integration canary: it answers "did adding
+ * any v1.8 feature blow up a realistic document?" in one number.
+ *
+ * <pre>
+ *   ./mvnw -f benchmarks/pom.xml clean package -DskipTests
+ *   java -jar benchmarks/target/benchmarks.jar MixedShowcase
+ * </pre>
+ *
+ * @author Artem Demchyshyn
+ */
+@BenchmarkMode({Mode.AverageTime, Mode.Throughput})
+@OutputTimeUnit(TimeUnit.MILLISECONDS)
+@State(Scope.Benchmark)
+@Warmup(iterations = 3, time = 2)
+@Measurement(iterations = 5, time = 2)
+@Fork(1)
+public class MixedShowcaseJmhBenchmark {
+
+    private static final int ICONS = 8;
+
+    /** Parsed once; the bench measures the mixed render, not icon parsing. */
+    private final SvgIcon icon = SvgIcon.parse(SvgBenchmarkFixtures.MULTI_LAYER_ICON_SVG);
+
+    /**
+     * Renders the mixed v1.8 showcase document to PDF bytes.
+     *
+     * @param blackhole JMH sink
+     * @throws Exception if rendering fails
+     */
+    @Benchmark
+    public void renderMixedShowcase(Blackhole blackhole) throws Exception {
+        DocumentPaint accent = DocumentPaint.linear(
+                DocumentColor.rgb(167, 139, 250), DocumentColor.rgb(97, 40, 217));
+        try (DocumentSession document = GraphCompose.document()
+                .pageSize(DocumentPageSize.A4)
+                .margin(DocumentInsets.of(32))
+                .create()) {
+            PageFlowBuilder flow = document.pageFlow().name("MixedShowcase").spacing(12);
+            flow.addParagraph("v1.8 feature showcase");
+            flow.addRich(r -> r
+                    .plain("Revenue ")
+                    .sparkline(42, 9, DocumentColor.rgb(20, 80, 95), 65.2, 69.8, 74.1, 81.3, 88.2)
+                    .plain("   profit ")
+                    .sparklineLine(42, 9, 1.6, DocumentColor.rgb(196, 153, 76), 28.1, 30.7, 32.9, 36.4, 39.5));
+            flow.chart(ChartBenchmarkFixtures.barSpec(), ChartBenchmarkFixtures.barStyle());
+            flow.chart(ChartBenchmarkFixtures.pieSpec());
+            for (int i = 0; i < ICONS; i++) {
+                flow.addSvgIcon(icon, 32);
+            }
+            flow.addPath(p -> p.size(220, 28)
+                    .moveTo(0.0, 0.5).curveTo(0.25, 1.0, 0.75, 0.0, 1.0, 0.5).fill(accent));
+            flow.build();
+            blackhole.consume(document.toPdfBytes());
+        }
+    }
+
+    /**
+     * Runs the JMH harness over this benchmark.
+     *
+     * @param args JMH CLI arguments
+     * @throws Exception if the JMH runner fails
+     */
+    public static void main(String[] args) throws Exception {
+        org.openjdk.jmh.Main.main(args);
+    }
+}

From 2bdb59b36decd2eefec3e9f1b96810e107ffe701 Mon Sep 17 00:00:00 2001
From: DemchaAV <demchaav@gmail.com>
Date: Sun, 14 Jun 2026 23:02:12 +0100
Subject: [PATCH 08/10] bench(gate): gate the long-token scenario and guard
 scenario/threshold coverage

The smoke perf gate ignores any scenario without a configured threshold,
so long-token (the 6th latency scenario) was silently ungated -- a real
regression there would never fail the gate. Add its SMOKE threshold
(10.0 ms / 256.0 MB, ~3x the observed ~3.2 ms / ~94 MB, matching the
existing per-scenario calibration headroom).

Hoist the scenario list to a static SCENARIO_DEFS so the names are
readable without re-measuring, and add CurrentSpeedScenarioGateTest,
which fails the build if any scenario lacks a SMOKE threshold. No
behaviour change to the run itself -- same six scenarios, same order.
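
The coverage invariant can be sketched independently of the benchmark
classes. GateCoverageSketch and ungated are hypothetical names; the real
guard is CurrentSpeedScenarioGateTest, whose assertions may be shaped
differently.

```java
import java.util.List;
import java.util.Set;

// Illustrative only: lists scenarios that would silently escape the gate.
final class GateCoverageSketch {

    private GateCoverageSketch() {
    }

    // Scenario names, in table order, that have no configured threshold.
    static List<String> ungated(List<String> scenarioNames, Set<String> gatedNames) {
        return scenarioNames.stream()
                .filter(name -> !gatedNames.contains(name))
                .toList();
    }

    public static void main(String[] args) {
        // The six latency scenarios, in table order.
        List<String> scenarios = List.of("engine-simple", "invoice-template",
                "cv-template", "proposal-template", "feature-rich", "long-token");
        // Threshold keys as they stood before this patch.
        Set<String> gatedBefore = Set.of("engine-simple", "invoice-template",
                "cv-template", "proposal-template", "feature-rich");
        System.out.println(ungated(scenarios, gatedBefore)); // prints [long-token]
    }
}
```

An empty result is the passing state; any name in the list is a scenario
whose regression the smoke gate would ignore.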
---
 .../demcha/compose/CurrentSpeedBenchmark.java | 57 +++++++++++++++----
 .../compose/CurrentSpeedScenarioGateTest.java | 35 ++++++++++++
 2 files changed, 82 insertions(+), 10 deletions(-)
 create mode 100644 benchmarks/src/test/java/com/demcha/compose/CurrentSpeedScenarioGateTest.java

diff --git a/benchmarks/src/main/java/com/demcha/compose/CurrentSpeedBenchmark.java b/benchmarks/src/main/java/com/demcha/compose/CurrentSpeedBenchmark.java
index e3d877943..64e113d20 100644
--- a/benchmarks/src/main/java/com/demcha/compose/CurrentSpeedBenchmark.java
+++ b/benchmarks/src/main/java/com/demcha/compose/CurrentSpeedBenchmark.java
@@ -32,6 +32,7 @@
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
 import java.util.concurrent.Future;
+import java.util.function.Function;
 
 /**
  * Focused local benchmark harness for current GraphCompose performance.
@@ -87,6 +88,36 @@ public final class CurrentSpeedBenchmark {
     private final ProposalDocumentSpec proposal = CanonicalBenchmarkSupport.canonicalProposal();
     private final CvSpec cv = CanonicalBenchmarkSupport.canonicalCv();
 
+    // Canonical scenario list, in table order. Declared statically (the
+    // renderer is bound to an instance at run time) so the gate-coverage guard
+    // test can read the scenario names without re-measuring: a scenario added
+    // here without a matching SMOKE threshold below would silently escape the
+    // perf gate, and CurrentSpeedScenarioGateTest fails loudly if that happens.
+    private static final List<ScenarioDef> SCENARIO_DEFS = List.of(
+            new ScenarioDef("engine-simple", "One-page engine composition",
+                    b -> b::renderEngineSimpleDocument),
+            new ScenarioDef("invoice-template", "Compose-first invoice template",
+                    b -> b::renderInvoiceTemplateDocument),
+            new ScenarioDef("cv-template", "Compose-first CV template",
+                    b -> b::renderCvTemplateDocument),
+            new ScenarioDef("proposal-template", "Long multi-page proposal template",
+                    b -> b::renderProposalTemplateDocument),
+            new ScenarioDef("feature-rich", "QR, barcode, watermark, header/footer, page break",
+                    b -> b::renderFeatureRichDocument),
+            new ScenarioDef("long-token", "Long unbreakable tokens (URLs/IDs) forcing character-level wrap",
+                    b -> b::renderLongTokenDocument)
+    );
+
+    /**
+     * Ordered scenario names. Read by {@code CurrentSpeedScenarioGateTest} to
+     * assert every scenario is covered by a SMOKE gate threshold.
+     *
+     * @return the canonical scenario names in table order
+     */
+    static List<String> scenarioNames() {
+        return SCENARIO_DEFS.stream().map(ScenarioDef::name).toList();
+    }
+
     public static void main(String[] args) throws Exception {
         BenchmarkSupport.configureQuietLogging();
         new CurrentSpeedBenchmark().run();
@@ -109,14 +140,9 @@ private void run() throws Exception {
         System.out.println("Perf gate: " + (enforceGate ? "enabled" : "disabled"));
         System.out.println();
 
-        List<Scenario> scenarios = List.of(
-                new Scenario("engine-simple", "One-page engine composition", this::renderEngineSimpleDocument),
-                new Scenario("invoice-template", "Compose-first invoice template", this::renderInvoiceTemplateDocument),
-                new Scenario("cv-template", "Compose-first CV template", this::renderCvTemplateDocument),
-                new Scenario("proposal-template", "Long multi-page proposal template", this::renderProposalTemplateDocument),
-                new Scenario("feature-rich", "QR, barcode, watermark, header/footer, page break", this::renderFeatureRichDocument),
-                new Scenario("long-token", "Long unbreakable tokens (URLs/IDs) forcing character-level wrap", this::renderLongTokenDocument)
-        );
+        List<Scenario> scenarios = SCENARIO_DEFS.stream()
+                .map(def -> new Scenario(def.name(), def.description(), def.renderer().apply(this)))
+                .toList();
 
         System.out.println("Latency benchmark");
         System.out.printf("%-18s | %10s | %10s | %10s | %10s | %11s | %10s | %10s%n",
@@ -820,6 +846,13 @@ private static String format(double value) {
     private record Scenario(String name, String description, Renderer renderer) {
     }
 
+    // Static scenario template: name + description + a factory that binds the
+    // renderer to a benchmark instance. Keeps the scenario list declarable as a
+    // static constant (so the gate-coverage test can read it) while the actual
+    // render still runs against per-run instance state.
+    private record ScenarioDef(String name, String description, Function<CurrentSpeedBenchmark, Renderer> renderer) {
+    }
+
     @FunctionalInterface
     private interface Renderer {
         byte[] render() throws Exception;
@@ -909,12 +942,16 @@ enum BenchmarkProfile {
                 // (typically 1.5-2x slower) does not produce false positives
                 // while real regressions of 50% or more still trigger. The
                 // previous values (800-2600 ms) were 50-100x looser and would
-                // not have flagged even a 10x slowdown.
+                // not have flagged even a 10x slowdown. long-token (observed
+                // ~3.2 ms / ~94 MB) is gated too so every scenario in the
+                // latency table is covered — CurrentSpeedScenarioGateTest pins
+                // that invariant.
                 "engine-simple", new SmokeThreshold(8.0, 96.0),
                 "invoice-template", new SmokeThreshold(35.0, 384.0),
                 "cv-template", new SmokeThreshold(25.0, 192.0),
                 "proposal-template", new SmokeThreshold(45.0, 384.0),
-                "feature-rich", new SmokeThreshold(100.0, 256.0)
+                "feature-rich", new SmokeThreshold(100.0, 256.0),
+                "long-token", new SmokeThreshold(10.0, 256.0)
         ));
 
         private final String id;
diff --git a/benchmarks/src/test/java/com/demcha/compose/CurrentSpeedScenarioGateTest.java b/benchmarks/src/test/java/com/demcha/compose/CurrentSpeedScenarioGateTest.java
new file mode 100644
index 000000000..da7296d45
--- /dev/null
+++ b/benchmarks/src/test/java/com/demcha/compose/CurrentSpeedScenarioGateTest.java
@@ -0,0 +1,35 @@
+package com.demcha.compose;
+
+import org.junit.jupiter.api.Test;
+
+import java.util.List;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+/**
+ * Guards that every CurrentSpeed latency scenario is covered by a SMOKE gate
+ * threshold.
+ *
+ * <p>The smoke perf gate silently ignores a scenario that has no configured
+ * threshold (by design — see
+ * {@link CurrentSpeedBenchmarkPerfGateTest#ignoresScenariosWithoutAConfiguredThreshold()}).
+ * That defensive behaviour means a newly added scenario would escape the gate
+ * unnoticed. This test makes the omission fail loudly instead: adding a scenario
+ * to {@code SCENARIO_DEFS} without a matching {@code SMOKE} threshold breaks the
+ * build.</p>
+ */
+class CurrentSpeedScenarioGateTest {
+
+    @Test
+    void everyScenarioHasASmokeGateThreshold() {
+        var gated = CurrentSpeedBenchmark.BenchmarkProfile.SMOKE.smokeThresholds().keySet();
+
+        List<String> ungated = CurrentSpeedBenchmark.scenarioNames().stream()
+                .filter(name -> !gated.contains(name))
+                .toList();
+
+        assertThat(ungated)
+                .as("CurrentSpeed scenarios missing a SMOKE gate threshold")
+                .isEmpty();
+    }
+}

From c2317f5cc14e316e5fa8f06bdab2db4f88b69b8e Mon Sep 17 00:00:00 2001
From: DemchaAV <demchaav@gmail.com>
Date: Sun, 14 Jun 2026 23:15:13 +0100
Subject: [PATCH 09/10] docs(benchmarks): finish the removed-bench cleanup and
 fix two stale Javadocs

Sweep up the references that the three removed benchmark mains
(FullCvBenchmark, GraphComposeBenchmark, ScalabilityBenchmark) left behind,
and correct two Javadocs that overstated what the code does:

- ab-bench.ps1 stops parsing the retired 04/05/06 logs, which are no longer
  produced. It reads only the surviving stress log; the thread-scaling
  series still comes from the current-speed JSON report.
- benchmarks/README.md "Files in this module": split a row that had been
  merged onto one line and restore the blank line before "## Running".
- docs/operations/performance.md: mark it a frozen v1.4 snapshot and note
  the retired suites/mains so it no longer contradicts benchmarks.md.
- docs/operations/benchmarks.md and the run-benchmarks.ps1 synopsis: note
  that steps 04-06 were retired, so the 03 -> 07 numbering gap is intentional.
- SvgJmhBenchmark Javadoc: describe the heart-path parse accurately
  (tokenize, relative/absolute resolution, cubic/line lowering, viewBox
  normalization); the fixture has no arc command, so the old "arc->cubic"
  wording was wrong.
- BenchmarkMedianTool Javadoc: note that stages[] is not carried into the
  median aggregate, so a median-vs-median diff shows no stage deltas.
---
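Note on the BenchmarkMedianTool item: the following is a minimal Java sketch
of why a median-vs-median diff has no stage deltas. Every name in it
(MedianSketch, RunResult, MedianAggregate, aggregate) is hypothetical and
does not mirror the tool's real types. It only assumes what the Javadoc added
in this patch states: latency and throughput are aggregated, and the
per-stage split is dropped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public final class MedianSketch {
    // Hypothetical single-run result: headline metrics plus a per-stage split.
    record RunResult(double latencyMs, double docsPerSecond, Map<String, Double> stagesMs) {}

    // Hypothetical aggregate: it has no stages field, so a diff of two
    // aggregates has nothing to attribute to compose/layout/render.
    record MedianAggregate(double latencyMs, double docsPerSecond) {}

    static double median(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("median of an empty series is undefined");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        // Odd count: middle element. Even count: mean of the two middle elements.
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    static MedianAggregate aggregate(List<RunResult> runs) {
        double latency = median(runs.stream().mapToDouble(RunResult::latencyMs).toArray());
        double throughput = median(runs.stream().mapToDouble(RunResult::docsPerSecond).toArray());
        // stagesMs is deliberately not carried into the aggregate.
        return new MedianAggregate(latency, throughput);
    }

    public static void main(String[] args) {
        List<RunResult> runs = List.of(
                new RunResult(3.0, 300.0, Map.of("compose", 1.0, "layout", 1.0, "render", 1.0)),
                new RunResult(5.0, 200.0, Map.of("compose", 2.0, "layout", 1.5, "render", 1.5)),
                new RunResult(4.0, 250.0, Map.of("compose", 1.5, "layout", 1.0, "render", 1.5)));
        System.out.println(aggregate(runs));
    }
}
```

In this sketch the stage data exists on each run but not on the aggregate,
which is why stage attribution needs a single-run pair.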
 benchmarks/README.md                          |  4 +++-
 .../demcha/compose/BenchmarkMedianTool.java   |  5 +++++
 .../demcha/compose/jmh/SvgJmhBenchmark.java   |  3 ++-
 docs/operations/benchmarks.md                 |  4 ++++
 docs/operations/performance.md                | 14 ++++++++++++--
 scripts/ab-bench.ps1                          | 19 ++++---------------
 scripts/run-benchmarks.ps1                    |  4 +++-
 7 files changed, 33 insertions(+), 20 deletions(-)

diff --git a/benchmarks/README.md b/benchmarks/README.md
index e232c6e21..48c953b20 100644
--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -62,11 +62,13 @@
 | File | Role |
 |---|---|
 | `CurrentSpeedBenchmark` | Default scenario runner — what CI's `perf-smoke` job exercises. Takes a `-Dgraphcompose.benchmark.profile=smoke\|full\|stress` switch. |
-| `ComparativeBenchmark` | Renders the same fixtures through GraphCompose, iText, openHTMLToPDF, JasperReports. **Rough local comparison only** — see "When not to use" above. || `CanonicalBenchmarkSupport`, `BenchmarkSupport` | Shared fixture builders + measurement helpers. |
+| `ComparativeBenchmark` | Renders the same fixtures through GraphCompose, iText, openHTMLToPDF, JasperReports. **Rough local comparison only** — see "When not to use" above. |
+| `CanonicalBenchmarkSupport`, `BenchmarkSupport` | Shared fixture builders + measurement helpers. |
 | `BenchmarkReportWriter` | Writes JSON / CSV / text reports under `benchmarks/target/benchmarks/`. |
 | `BenchmarkDiffTool` | Compares two JSON reports and prints a delta table. Useful for pre/post comparisons. |
 | `BenchmarkMedianTool` | Median + dispersion across N runs of the same scenario. |
 | `GraphComposeStressTest`, `EnduranceTest` | Long-running stress / endurance harnesses. |
+
 ## Running
 
 From the repo root:
diff --git a/benchmarks/src/main/java/com/demcha/compose/BenchmarkMedianTool.java b/benchmarks/src/main/java/com/demcha/compose/BenchmarkMedianTool.java
index 5eb786649..f82d0b6f8 100644
--- a/benchmarks/src/main/java/com/demcha/compose/BenchmarkMedianTool.java
+++ b/benchmarks/src/main/java/com/demcha/compose/BenchmarkMedianTool.java
@@ -24,6 +24,11 @@
  * possible, so it can be diffed by {@link BenchmarkDiffTool}. The tool is meant
  * for local benchmark sessions where a few repeated runs are needed to reduce
  * machine noise before comparing results.</p>
+ *
+ * <p>The current-speed per-stage breakdown ({@code stages[]}) is <em>not</em>
+ * carried into the median aggregate; only latency and throughput get a median.
+ * A median-vs-median diff therefore shows no compose/layout/render stage deltas;
+ * diff a single-run pair when you need stage attribution.</p>
  */
 public final class BenchmarkMedianTool {
 
diff --git a/benchmarks/src/main/java/com/demcha/compose/jmh/SvgJmhBenchmark.java b/benchmarks/src/main/java/com/demcha/compose/jmh/SvgJmhBenchmark.java
index f7a63b30c..58ed3f99f 100644
--- a/benchmarks/src/main/java/com/demcha/compose/jmh/SvgJmhBenchmark.java
+++ b/benchmarks/src/main/java/com/demcha/compose/jmh/SvgJmhBenchmark.java
@@ -24,7 +24,8 @@
  * {@code DocumentSession}, no PDF render):</p>
  * <ul>
  *   <li>{@code parseSvgPath} — {@link SvgPath#parse} of a real Material icon
- *       {@code d} string (arc→cubic conversion, normalization).</li>
+ *       {@code d} string (tokenize, relative/absolute resolution, cubic/line
+ *       lowering, viewBox normalization).</li>
  *   <li>{@code readSvgIcon} — {@link SvgIcon#parse} of a multi-layer icon (XML
  *       parse, {@code <g>} transform accumulation, gradient resolution, one
  *       {@link SvgPath} per layer).</li>
diff --git a/docs/operations/benchmarks.md b/docs/operations/benchmarks.md
index 775483384..3611d877e 100644
--- a/docs/operations/benchmarks.md
+++ b/docs/operations/benchmarks.md
@@ -40,6 +40,10 @@ The script prints numbered sections so you can map console output to the pipelin
    runs the thread-scaling throughput sweep (1 → 16 threads).
 3. `03-comparative`
    Runs the GraphCompose canonical vs iText 5 vs JasperReports comparison.
+
+   _Steps 04–06 (`core-engine`, `full-cv`, `scalability`) were retired. The
+   surviving steps keep their original `NN-` console prefixes, so the labels
+   jump from `03-` to `07-`._
 7. `07-stress`
    Runs the concurrent stability stress test.
 8. `08-endurance`
diff --git a/docs/operations/performance.md b/docs/operations/performance.md
index ecf02c5b7..7fc02d480 100644
--- a/docs/operations/performance.md
+++ b/docs/operations/performance.md
@@ -1,7 +1,13 @@
 # Performance — v1.4 numbers
 
-All numbers below come from `scripts/run-benchmarks.ps1` — the full local
-benchmark workflow that builds the test classpath once and runs
+> **Historical snapshot (v1.4).** The numbers and suite list below are frozen
+> as captured for v1.4 and are kept for reference. The pipeline has since
+> changed: the `core-engine`, `full-cv`, and `scalability` suites were retired,
+> and current numbers come from the `current-speed` / `comparative` / `stress`
+> pipeline plus the JMH suite. See [docs/operations/benchmarks.md](./benchmarks.md).
+
+All numbers below were captured from `scripts/run-benchmarks.ps1` — the full
+local benchmark workflow that built the test classpath once and ran the
 `current-speed`, `comparative`, `core-engine`, `full-cv`, `scalability`,
 and `stress` suites in sequence. They were captured on a developer
 laptop; CI machines are typically 1.5–2× slower. The benchmark
@@ -93,5 +99,9 @@ snapshots.
 
 ## Engine-only timings
 
+_The `GraphComposeBenchmark` and `FullCvBenchmark` mains below were retired
+after v1.4. Equivalent timings now come from the `CurrentSpeedBenchmark`
+`engine-simple` scenario and the JMH `TemplateCvJmhBenchmark`._
+
 - `GraphComposeBenchmark` (engine-only, no PDF render): avg **1.04 ms**, p50 **0.97 ms**, p95 **1.64 ms**.
 - `FullCvBenchmark` (full CV template, including render): avg **4.14 ms**, p50 **3.80 ms**, p95 **6.37 ms**.
diff --git a/scripts/ab-bench.ps1 b/scripts/ab-bench.ps1
index 5a3e4eb42..a237ec203 100644
--- a/scripts/ab-bench.ps1
+++ b/scripts/ab-bench.ps1
@@ -110,21 +110,10 @@ function Parse-Comparative($jsonPath) {
 }
 function Parse-Logs($logsDir) {
     $o = @{}
-    $scal = Join-Path $logsDir "06-scalability.log"
-    if (Test-Path $scal) {
-        foreach ($line in (Get-Content $scal)) {
-            if ($line -match '^\s*(\d+)\s*\|\s*\d+\s*\|\s*([\d.]+)\s*$') {
-                $o["scalability | $($matches[1])t | docs/s"] = [double]$matches[2]
-            }
-        }
-    }
-    foreach ($pair in @(@("04-core-engine.log", "core-engine"), @("05-full-cv.log", "full-cv"))) {
-        $p = Join-Path $logsDir $pair[0]
-        if (Test-Path $p) {
-            $txt = Get-Content $p -Raw
-            if ($txt -match 'Median[^\r\n]*?:\s*([\d.]+)\s*ms') { $o["$($pair[1]) | median ms"] = [double]$matches[1] }
-        }
-    }
+    # Steps 04-06 (core-engine, full-cv, scalability) were retired, so their logs
+    # are no longer produced. Current-speed throughput — including the
+    # thread-scaling series — is read from the JSON report by Parse-CurrentSpeed;
+    # only the surviving stress log is parsed here.
     $stress = Join-Path $logsDir "07-stress.log"
     if (Test-Path $stress) {
         $txt = Get-Content $stress -Raw
diff --git a/scripts/run-benchmarks.ps1 b/scripts/run-benchmarks.ps1
index e3d3947b6..a0dd2c777 100644
--- a/scripts/run-benchmarks.ps1
+++ b/scripts/run-benchmarks.ps1
@@ -6,7 +6,9 @@ Runs the local GraphCompose benchmark pipeline and stores timestamped logs and r
 .DESCRIPTION
 The wrapper performs a staged local run:
 01 build classpath, 02 current-speed, 03 comparative, 07 stress,
-optional 08 endurance, then 09/10 diff steps.
+optional 08 endurance, then 09/10 diff and 11 verdict steps. Steps 04-06
+(core-engine, full-cv, scalability) were retired; the surviving steps keep
+their original numeric prefixes, so the numbering jumps from 03 to 07.
 
 Current-speed diffs are profile-aware. The wrapper only compares reports
 from the same current-speed profile (`smoke` or `full`) and skips the

From b93c44ec62ce1a386889302cec4383f3b3f31405 Mon Sep 17 00:00:00 2001
From: DemchaAV <demchaav@gmail.com>
Date: Sun, 14 Jun 2026 23:15:51 +0100
Subject: [PATCH 10/10] docs(changelog): note the v1.8 feature-object benches,
 stage output, and gate coverage

---
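Note on the gate-coverage entry: the following is a minimal, self-contained
Java sketch of the pattern, not the benchmark's actual code. The class
ScenarioGateSketch, the two placeholder render methods, and the simplified
latency-only threshold map are assumptions made for illustration; only the
idea (a static list of scenario definitions whose names are checked against
the threshold keys) is taken from this series.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public final class ScenarioGateSketch {
    @FunctionalInterface
    interface Renderer {
        byte[] render() throws Exception;
    }

    // Static template: the factory binds a renderer to a concrete instance later,
    // so the list can be a static constant that a test can read.
    record ScenarioDef(String name, Function<ScenarioGateSketch, Renderer> renderer) {}

    static final List<ScenarioDef> SCENARIO_DEFS = List.of(
            new ScenarioDef("engine-simple", bench -> bench::renderSimple),
            new ScenarioDef("long-token", bench -> bench::renderLongToken));

    // Simplified stand-in for the SMOKE thresholds (latency in ms only).
    static final Map<String, Double> SMOKE_THRESHOLDS_MS = Map.of(
            "engine-simple", 8.0,
            "long-token", 10.0);

    // Placeholder renderers; the real ones produce PDF bytes.
    private byte[] renderSimple() { return new byte[] {1}; }
    private byte[] renderLongToken() { return new byte[] {2}; }

    static List<String> scenarioNames() {
        return SCENARIO_DEFS.stream().map(ScenarioDef::name).toList();
    }

    // Names of scenarios that have no threshold and would escape the gate.
    static List<String> ungated(Map<String, Double> thresholds) {
        return scenarioNames().stream()
                .filter(name -> !thresholds.containsKey(name))
                .toList();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(ungated(SMOKE_THRESHOLDS_MS));          // prints []
        System.out.println(ungated(Map.of("engine-simple", 8.0))); // prints [long-token]

        // Binding happens per run, against instance state.
        Renderer bound = SCENARIO_DEFS.get(0).renderer().apply(new ScenarioGateSketch());
        System.out.println(bound.render().length);                 // prints 1
    }
}
```

A test in this style asserts that the ungated list is empty, so adding a
scenario without a threshold fails the build instead of passing silently.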
 CHANGELOG.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index e9f7124c2..6cb0e7074 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -344,6 +344,28 @@ Entries land here as they merge.
   `ScalabilityBenchmark` (its thread-scaling sweep folded into
   `CurrentSpeedBenchmark`'s full-profile throughput run, now `1,2,4,8,16`).
   Dropped the matching `run-benchmarks.ps1` steps and doc entries.
+- **Feature-object benchmarks for the v1.8 vector surface (not shipped).**
+  The suite previously exercised only text/table primitives. Added JMH render
+  benches and deterministic probes over the new vector features:
+  `SvgJmhBenchmark` (path parse / whole-file icon read / icon→node) plus a
+  `SvgParseAllocProbe`; `ChartJmhBenchmark` (bar + line + pie render) plus a
+  `ChartAllocProbe` (layout-compile allocation); `VectorRenderOperatorProbe`
+  (the same paths drawn flat vs. gradient vs. translucent, counted as PDF
+  content-stream operators); `IconRampJmhBenchmark` (icon-placement scaling,
+  `@Param` 8/32/128); and `MixedShowcaseJmhBenchmark` (one document combining
+  prose, inline sparklines, bar + pie charts, SVG icons and a gradient path).
+  Shared `SvgBenchmarkFixtures` / `ChartBenchmarkFixtures` hold the inputs so
+  each bench and its probe measure identical data.
+- **Current-speed report carries a stage breakdown and a run summary (not
+  shipped).** `CurrentSpeedBenchmark` persists a per-scenario compose / layout /
+  render split (`stages[]`, median ms) to the JSON and a `stages` CSV, and
+  writes a readable `summary.md`. `BenchmarkDiffTool` consumes `stages[]`,
+  prints a per-stage delta table, and reports the scenarios added/removed
+  between two runs.
+- **Every current-speed scenario is now covered by the smoke perf gate (not
+  shipped).** The `long-token` scenario previously had no SMOKE threshold and
+  silently escaped the gate; it now has one, and `CurrentSpeedScenarioGateTest`
+  fails the build if any scenario lacks a threshold.
 - **Removed the `java.awt.*` / `java.util.*` co-wildcard in four files.**
   `InvoiceTemplateComposer`, `ProposalTemplateComposer`,
   `WeeklyScheduleTemplateComposer`, and the engine `PdfRenderingSystemECS`