chore(benchmarks): modernize for v1.8 — feature benches, stage output, gate coverage by DemchaAV · Pull Request #195 · DemchaAV/GraphCompose

DemchaAV · 2026-06-14T22:19:52Z

Summary

The benchmark suite measured only text/table primitives — none of the v1.8
vector features (SVG import, native charts, vector paths, gradients) had a
bench, and the current-speed report blended compose/layout/render into one
number. This branch adds feature-object benches and deterministic probes over
the new surface, splits the current-speed report into per-stage timings with a
readable summary.md, closes a silent gap in the smoke perf gate, and removes
three redundant benches. Everything lives in the separate benchmarks/ module
plus its docs — no library src/main is touched.

Commits (reviewed sequentially)

- chore(benchmarks): remove three redundant benchmark mains: drop
  FullCvBenchmark (superseded by the JMH TemplateCvJmhBenchmark),
  GraphComposeBenchmark (duplicated CurrentSpeedBenchmark's engine-simple
  scenario), and ScalabilityBenchmark (thread sweep folded into the
  full-profile throughput run, now 1,2,4,8,16); prune the matching
  run-benchmarks.ps1 steps.
- perf(benchmarks): persist compose/layout/render stages + a run summary:
  CurrentSpeedBenchmark writes a per-scenario stage split (stages[],
  median ms) to the JSON and a stages CSV, plus a human-readable summary.md.
- perf(benchmarks): diff consumes stages[] and reports added/removed scenarios:
  BenchmarkDiffTool prints a per-stage delta table and the
  scenario set-change (added/removed) between two runs.
- perf(benchmarks): add SVG-import feature benches: SvgJmhBenchmark
  (path parse / whole-file icon read / icon→node) + SvgParseAllocProbe.
- perf(benchmarks): add chart feature benches: ChartJmhBenchmark
  (bar + line + pie render) + ChartAllocProbe (layout-compile allocation).
- perf(benchmarks): add vector-paint render-operator probe:
  VectorRenderOperatorProbe draws the same paths flat vs. gradient vs.
  translucent and counts the PDF content-stream operators each paint mode emits.
- bench(jmh): add icon-ramp and mixed v1.8 showcase benches:
  IconRampJmhBenchmark (icon-placement scaling, @Param 8/32/128) and
  MixedShowcaseJmhBenchmark (one document mixing prose, inline sparklines,
  bar + pie charts, SVG icons and a gradient path) as the integration canary.
- bench(gate): gate the long-token scenario and guard threshold coverage:
  long-token had no SMOKE threshold and silently escaped the gate; add one
  (10.0 ms / 256.0 MB, ~3× its observed ~3.2 ms / ~94 MB) and add
  CurrentSpeedScenarioGateTest, which fails the build if any scenario lacks a
  threshold. Scenario list hoisted to a static SCENARIO_DEFS so the names are
  testable; the six scenarios, order, descriptions and renderers are unchanged.
- docs(benchmarks): finish the removed-bench cleanup and fix two stale Javadocs:
  sweep references the removed mains left behind (ab-bench.ps1
  log parsing, performance.md, a merged row in the module README) and correct
  two docs that overstated the code (the SVG-parse fixture has no arc command;
  stages[] is not carried into the median aggregate).
- docs(changelog): note the v1.8 feature-object benches, stage output, and gate coverage.

Testing

- ./mvnw -B -ntp verify -pl . → BUILD SUCCESS, 1380 tests (canonical suite +
  japicmp + javadoc; the CHANGELOG/docs guards VersionConsistencyGuardTest +
  CanonicalSurfaceGuardTest stay green after the doc edits).
- ./mvnw -B -ntp -f benchmarks/pom.xml verify → BUILD SUCCESS, 30 tests,
  including the new CurrentSpeedScenarioGateTest and a new BenchmarkDiffToolTest
  case covering added/removed scenarios + stage deltas.
- New JMH benches run end-to-end: IconRamp shows a clean near-linear
  placement ramp (8 → 0.86, 32 → 2.74, 128 → 8.79 ms/op); MixedShowcase runs at
  ~8.7 ms/op. A smoke run with the gate enabled passes with long-token now
  gated. The alloc/operator probes report deterministic counts (chart
  compile-pass ~446.8 KB; gradient paint adds sh+W per shape, alpha adds gs).
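
One way to read "near-linear" in the IconRamp figures is the marginal cost per added icon between adjacent @Param sizes: if that slope stays roughly flat, placement scales linearly on top of a fixed per-document cost. The sketch below is only arithmetic over the three quoted numbers; the class and method names are hypothetical and not part of the benchmark module.

```java
/** Arithmetic over the quoted IconRamp figures; names here are illustrative only. */
final class IconRampSlopeSketch {

    /** Extra milliseconds per extra icon between two measured points. */
    static double marginalMsPerIcon(int n0, double ms0, int n1, double ms1) {
        if (n1 <= n0) {
            throw new IllegalArgumentException("n1 must be greater than n0");
        }
        return (ms1 - ms0) / (n1 - n0);
    }

    public static void main(String[] args) {
        // Figures from the Testing notes: 8 -> 0.86, 32 -> 2.74, 128 -> 8.79 ms/op.
        double low = marginalMsPerIcon(8, 0.86, 32, 2.74);
        double high = marginalMsPerIcon(32, 2.74, 128, 8.79);
        System.out.printf(java.util.Locale.ROOT,
                "8->32: %.4f ms/icon, 32->128: %.4f ms/icon%n", low, high);
    }
}
```

The two segments come out at roughly 0.078 and 0.063 ms per added icon, so the per-icon cost does not grow as the count rises.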

Notes for review

- Benchmark-only branch: no library src/main change. The new fixtures
  (SvgBenchmarkFixtures, ChartBenchmarkFixtures) are public only because the
  JMH benches live in the .jmh subpackage; each fixture is reused by both the
  bench and its probe so the two measure identical data.
- stages[] is intentionally left out of the median aggregate: BenchmarkMedianTool
  aggregates latency/throughput only, so a median-vs-median diff shows no stage
  deltas; diff a single-run pair for stage attribution (noted in the median-tool
  Javadoc).
- No committed smoke baseline: the verdict gate stays a local same-machine
  A/B tool (a static committed baseline would false-positive on machine variance);
  run-benchmarks.ps1 already skips the verdict step gracefully when none exists.
  The absolute SMOKE gate (now covering all six scenarios) is the CI net.

Lane: test — benchmark tooling + operations docs; no canonical / shared-engine / legacy surface touched.

FullCvBenchmark duplicated the JMH TemplateCvJmhBenchmark (CV through ModernProfessional) with a hand-rolled, JIT-noisier loop and no report. GraphComposeBenchmark was an early-engine relic measuring the same title+body+divider doc as CurrentSpeedBenchmark's engine-simple scenario. ScalabilityBenchmark's thread-scaling sweep is folded into CurrentSpeedBenchmark's full-profile throughput run (thread counts now 1,2,4,8,16). Drop the matching run-benchmarks.ps1 steps and the benchmarks.md / benchmarks/README.md entries. ComparativeBenchmark, the JMH benches, the deterministic probes, and the soak/stress runners stay. Benchmark module compiles; its 28 tests pass.

…y.md

The stage breakdown (per-template compose / layout / render medians) was printed to the console and discarded. Promote it into the report: runStageBreakdown returns a StageRow, CurrentSpeedReport carries a stages[] array, and a stages CSV is written, so a diff can attribute a regression to an engine stage, not just the blended total. Also write a per-run summary.md (latency + stages + throughput tables) so a reviewer reads one file instead of the JSON plus several CSVs. Additive output only: diff/verdict/median read the report by field and ignore the new array. Benchmark module compiles; 28 tests pass; verified on a smoke run (stages[] present, summary.md readable, perf gate passes).
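
As a rough illustration of the stage split described above, the sketch below collapses per-iteration timings into one row of medians and serializes it as a CSV line. It is not the module's code: `StageRowSketch`, its field names, and the CSV column layout are assumptions made for the example; only the compose/layout/render split and the use of medians come from the commit message.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical shape of one per-scenario stage row; field names are assumed. */
record StageRowSketch(String scenario, double composeMs, double layoutMs, double renderMs) {

    /** The total is derived, so it cannot drift from the three stages. */
    double totalMs() {
        return composeMs + layoutMs + renderMs;
    }

    /** One CSV line; Locale.ROOT keeps '.' as the decimal separator. */
    String toCsvLine() {
        return String.format(Locale.ROOT, "%s,%.3f,%.3f,%.3f,%.3f",
                scenario, composeMs, layoutMs, renderMs, totalMs());
    }
}

final class StageSplitSketch {

    /** Median of a non-empty sample set; averages the two middle values for even sizes. */
    static double median(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("samples must not be empty");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Collapses per-iteration stage timings into one row of medians. */
    static StageRowSketch summarize(String scenario, double[] compose, double[] layout, double[] render) {
        return new StageRowSketch(scenario, median(compose), median(layout), median(render));
    }

    /** Header plus one line per row (assumed column names). */
    static String toCsv(List<StageRowSketch> rows) {
        StringBuilder sb = new StringBuilder("scenario,compose_ms,layout_ms,render_ms,total_ms\n");
        for (StageRowSketch row : rows) {
            sb.append(row.toCsvLine()).append('\n');
        }
        return sb.toString();
    }
}
```

Deriving the total from the three stage medians keeps the row internally consistent, though it means the total is a sum of medians rather than the median of per-iteration totals.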

…enarios

BenchmarkDiffTool now (1) surfaces scenario set changes (addedScenarios / removedScenarios) instead of silently intersecting, so a newly-added (or dropped) scenario can no longer vanish from a diff unnoticed; and (2) diffs the stages[] array, emitting per-scenario compose/layout/render/total percent deltas (console block + stages-diff CSV) so a regression can be attributed to an engine stage. Backward-compatible: a report without stages[] yields an empty stage diff (MissingNode iterates empty); latency/throughput delta rows stay intersection-only; the diff report is terminal (median/verdict read producer reports, not diffs). Adds a DiffToolTest case; 29 bench tests pass.
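
The two behaviours described above, set-change reporting and per-stage percent deltas, can be sketched with plain collections. This is a minimal illustration under assumptions, not BenchmarkDiffTool itself: the class and method names are hypothetical, stage timings are modelled as a name-to-milliseconds map, and the scenario name `mixed-showcase` in the usage is invented for the example.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative diff helpers; not the real BenchmarkDiffTool API. */
final class ScenarioDiffSketch {

    /** Names present in current but absent from baseline. */
    static Set<String> added(Set<String> baseline, Set<String> current) {
        Set<String> out = new TreeSet<>(current);
        out.removeAll(baseline);
        return out;
    }

    /** Names present in baseline but absent from current. */
    static Set<String> removed(Set<String> baseline, Set<String> current) {
        return added(current, baseline);
    }

    /** Percent change from baseline to current; positive means slower. */
    static double percentDelta(double baselineMs, double currentMs) {
        if (baselineMs <= 0.0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return (currentMs - baselineMs) / baselineMs * 100.0;
    }

    /**
     * Per-stage deltas for stages present on both sides. A side with no
     * stages (an older report) simply yields an empty result.
     */
    static Map<String, Double> stageDeltas(Map<String, Double> baseline, Map<String, Double> current) {
        Map<String, Double> out = new TreeMap<>();
        for (Map.Entry<String, Double> entry : baseline.entrySet()) {
            Double now = current.get(entry.getKey());
            if (now != null) {
                out.put(entry.getKey(), percentDelta(entry.getValue(), now));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> before = Set.of("engine-simple", "long-token");
        Set<String> after = Set.of("engine-simple", "mixed-showcase"); // hypothetical name
        System.out.println("added=" + added(before, after) + " removed=" + removed(before, after));
        System.out.println(stageDeltas(Map.of("render", 2.0), Map.of("render", 3.0)));
    }
}
```

Reporting the two set differences separately from the intersection is what keeps a dropped scenario visible instead of shrinking the diff silently.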

First feature-object benchmarks for the v1.8 vector surface (the rest of the suite is text/table only):

- SvgJmhBenchmark (forked JMH): SvgPath.parse of a real Material heart d, SvgIcon.parse of a multi-layer icon, SvgIcon.node on a pre-parsed icon.
- SvgParseAllocProbe (deterministic ThreadMXBean alloc, median of 11): KB/op for the same three operations.
- SvgBenchmarkFixtures: the heart d (vendored, because the benchmark module can't reach the test/example copies) and a synthetic multi-layer icon (gradient bg + transformed groups + stroked curves) within the reader's supported subset, so it always parses.

Run on demand, not per-PR: java -jar benchmarks/target/benchmarks.jar Svg.

Verified: compiles; both benches run. Path parse ~3.6 us/op, icon read ~308 us/op (DOM-parse dominated, 114 KB/op), node build ~0.4 us/op / 2 KB/op.

S4 of the modernization, the first chart benchmarks (the suite otherwise renders text/tables only):

- ChartJmhBenchmark (forked JMH): end-to-end render of a chart-heavy doc: grouped bar + multi-series line (12 categories x 3 series) + 6-slice pie.
- ChartAllocProbe (deterministic ThreadMXBean, median of 11): warm layout-compile allocation, isolating chart-resolve + geometry emission.
- ChartBenchmarkFixtures: the shared bar/line/pie specs + data.

Run on demand, not per-PR: java -jar benchmarks/target/benchmarks.jar Chart.

Verified: compiles; render ~2.8 ms/op; compile alloc 446.8 KB (deterministic, min=max=median, 1 page).
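
The ThreadMXBean-based probes above (SvgParseAllocProbe, ChartAllocProbe) rely on per-thread allocation counters. The sketch below shows the general technique with the HotSpot-specific `com.sun.management.ThreadMXBean`, taking the median of an odd number of runs after a warm-up. It is an assumption about how such a probe can be built, not a copy of the probes; the class name, the `sink` field, and the warm-up count are all hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;

/** Illustrative allocation probe; requires a JVM that exposes com.sun.management.ThreadMXBean. */
final class AllocProbeSketch {

    /** Keeps results reachable so the JIT cannot elide the allocation. */
    static volatile Object sink;

    /** Bytes allocated by the current thread while running op once. */
    static long allocatedBytes(Runnable op) {
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long threadId = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(threadId);
        op.run();
        return bean.getThreadAllocatedBytes(threadId) - before;
    }

    /** Warms up, samples the requested number of runs, and returns the middle sample. */
    static long medianAllocatedBytes(Runnable op, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            op.run(); // let class loading and lambda linkage happen outside the samples
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            samples[i] = allocatedBytes(op);
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        long bytes = medianAllocatedBytes(() -> sink = new byte[64 * 1024], 3, 11);
        System.out.println("median allocated bytes: " + bytes);
    }
}
```

Because the counter is per-thread, the measured operation has to run on the calling thread for the number to mean anything.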

VectorRenderOperatorProbe renders the same 40 curved blob paths three ways — flat solid fill, linear gradient, and translucent (alpha) — and counts the PDF content-stream operators, so the deltas isolate what each paint mode costs at render time. Flat takes the fast fill path (sh=0, gs=0, W=0); a gradient fill adds one shading + one clip per shape (sh, W); a translucent fill adds one ExtGState (gs). Byte-deterministic, no A/B build needed; catches a regression where a flat path wrongly takes the gradient branch (sh would jump from 0). Verified: flat 0/0/0, gradient sh=40/W=40, alpha gs=40 over 40 paths.
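
Counting operators in a decoded content stream can be approximated by splitting on whitespace and tallying the tokens of interest. The sketch below does that for `sh`, `gs` and `W`; it is a simplified stand-in, not the probe's implementation. The class name is hypothetical, the sample streams are hand-written, and a whitespace split would miscount operator-like text inside string literals or inline images, which path-only streams like these do not contain.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative operator tally over an already-decoded PDF content stream. */
final class OperatorCountSketch {

    /** Counts exact-token occurrences of each requested operator. */
    static Map<String, Integer> count(String contentStream, String... operators) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String op : operators) {
            counts.put(op, 0);
        }
        for (String token : contentStream.trim().split("\\s+")) {
            // Only tokens that were requested are tallied; operands are skipped.
            counts.computeIfPresent(token, (key, value) -> value + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hand-written one-shape streams: flat fill, gradient (clip + shading), translucent fill.
        String flat = "0.2 0.4 0.8 rg 10 10 m 20 30 40 30 50 10 c h f";
        String gradient = "q 10 10 m 20 30 40 30 50 10 c h W n /Sh1 sh Q";
        String alpha = "/GS1 gs 0.2 0.4 0.8 rg 10 10 m 20 30 40 30 50 10 c h f";
        for (String stream : new String[] {flat, gradient, alpha}) {
            System.out.println(count(stream, "sh", "gs", "W"));
        }
    }
}
```

Matching whole tokens matters here: `h` (close path) must not be counted as `sh`, and `W*` is a different clip operator from `W`.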

IconRampJmhBenchmark places N copies of a multi-layer SVG icon (@Param 8/32/128) and renders to PDF, so the per-icon node-build + layout + render scaling is visible; the icon is parsed once in setup so the ramp measures placement, not re-parsing. MixedShowcaseJmhBenchmark renders one realistic document mixing every v1.8 vector feature -- prose with two inline sparklines, a grouped bar chart and a pie chart, a row of SVG icons, and a gradient accent path -- as a single integration canary for "did a v1.8 feature regress a realistic doc?". Both reuse the existing SvgBenchmarkFixtures / ChartBenchmarkFixtures; no src/main change.

…d coverage

The smoke perf gate ignores any scenario without a configured threshold, so long-token (the 6th latency scenario) was silently ungated -- a real regression there would never fail the gate. Add its SMOKE threshold (10.0 ms / 256.0 MB, ~3x the observed ~3.2 ms / ~94 MB, matching the existing per-scenario calibration headroom). Hoist the scenario list to a static SCENARIO_DEFS so the names are readable without re-measuring, and add CurrentSpeedScenarioGateTest, which fails the build if any scenario lacks a SMOKE threshold. No behaviour change to the run itself -- same six scenarios, same order.
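
The coverage guard amounts to a set difference between the scenario names and the threshold keys. The sketch below shows that check plus the threshold comparison itself. It is a hedged illustration: the class, the `Threshold` record and the method names are hypothetical, and only the long-token figures (10.0 ms / 256.0 MB against ~3.2 ms / ~94 MB observed) come from the commit message; the engine-simple threshold in the usage is a placeholder.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative gate-coverage check; not the real CurrentSpeedScenarioGateTest. */
final class GateCoverageSketch {

    /** Assumed threshold shape: a latency ceiling and a memory ceiling. */
    record Threshold(double maxLatencyMs, double maxMemoryMb) {}

    /** Scenarios that would silently escape the gate because no threshold is configured. */
    static Set<String> ungated(List<String> scenarioNames, Map<String, Threshold> thresholds) {
        Set<String> missing = new TreeSet<>(scenarioNames);
        missing.removeAll(thresholds.keySet());
        return missing;
    }

    /** Fails loudly instead of skipping scenarios that have no threshold. */
    static void requireFullCoverage(List<String> scenarioNames, Map<String, Threshold> thresholds) {
        Set<String> missing = ungated(scenarioNames, thresholds);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("scenarios without a threshold: " + missing);
        }
    }

    /** A measurement passes only if it is within both ceilings. */
    static boolean passes(Threshold threshold, double latencyMs, double memoryMb) {
        return latencyMs <= threshold.maxLatencyMs() && memoryMb <= threshold.maxMemoryMb();
    }

    public static void main(String[] args) {
        List<String> scenarios = List.of("engine-simple", "long-token");
        Map<String, Threshold> thresholds = Map.of(
                "engine-simple", new Threshold(5.0, 128.0), // placeholder values
                "long-token", new Threshold(10.0, 256.0));  // values from the commit message
        requireFullCoverage(scenarios, thresholds);
        System.out.println(passes(thresholds.get("long-token"), 3.2, 94.0));
    }
}
```

Keeping the scenario names in a static list is what makes this check cheap: the test can compare names against thresholds without running a single measurement.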

…Javadocs

Sweep the references the three removed benchmark mains (FullCvBenchmark, GraphComposeBenchmark, ScalabilityBenchmark) left behind, and correct two docs that overstated what the code does:

- ab-bench.ps1 no longer parses the retired 04/05/06 logs (they are no longer produced); it reads the surviving stress log, and the thread-scaling series still comes from the current-speed JSON report.
- benchmarks/README.md "Files in this module": split a row that had been merged onto one line and restore the blank line before "## Running".
- docs/operations/performance.md: mark it a frozen v1.4 snapshot and note the retired suites/mains so it no longer contradicts benchmarks.md.
- docs/operations/benchmarks.md and the run-benchmarks.ps1 synopsis: note that steps 04-06 were retired, so the 03 -> 07 numbering gap is intentional.
- SvgJmhBenchmark Javadoc: describe the heart-path parse accurately (tokenize / cubic-line lowering / viewBox normalization); the fixture has no arc command, so the old "arc->cubic" wording was wrong.
- BenchmarkMedianTool Javadoc: note that stages[] is not carried into the median aggregate, so a median-vs-median diff shows no stage deltas.

DemchaAV · 2026-06-14T22:22:53Z

Closing — opened prematurely. Work continues on the chore/benchmark-modernization branch; the PR to develop is the maintainer's call once the branch is finalized.

DemchaAV added 10 commits June 14, 2026 19:04

docs(changelog): note the v1.8 feature-object benches, stage output, …

b93c44e

…and gate coverage

DemchaAV closed this Jun 14, 2026

DemchaAV wants to merge 10 commits into develop from chore/benchmark-modernization
