
chore(benchmarks): modernize for v1.8 — feature benches, stage output, gate coverage#195

Closed
DemchaAV wants to merge 10 commits into develop from chore/benchmark-modernization

@DemchaAV

Owner

Summary

The benchmark suite measured only text/table primitives — none of the v1.8
vector features (SVG import, native charts, vector paths, gradients) had a
bench, and the current-speed report blended compose/layout/render into one
number. This branch adds feature-object benches and deterministic probes over
the new surface, splits the current-speed report into per-stage timings with a
readable summary.md, closes a silent gap in the smoke perf gate, and removes
three redundant benches. Everything lives in the separate benchmarks/ module
plus its docs — no library src/main is touched.

Commits (reviewed sequentially)

  1. chore(benchmarks): remove three redundant benchmark mains — drop
    FullCvBenchmark (superseded by the JMH TemplateCvJmhBenchmark),
    GraphComposeBenchmark (duplicated CurrentSpeedBenchmark's engine-simple
    scenario), and ScalabilityBenchmark (thread sweep folded into the
    full-profile throughput run, now 1,2,4,8,16); prune the matching
    run-benchmarks.ps1 steps.
  2. perf(benchmarks): persist compose/layout/render stages + a run summary
    CurrentSpeedBenchmark writes a per-scenario stage split (stages[],
    median ms) to the JSON and a stages CSV, plus a human-readable summary.md.
  3. perf(benchmarks): diff consumes stages[] and reports added/removed scenarios
    BenchmarkDiffTool prints a per-stage delta table and the
    scenario set-change (added/removed) between two runs.
  4. perf(benchmarks): add SVG-import feature benches
    SvgJmhBenchmark (path parse / whole-file icon read / icon→node) + SvgParseAllocProbe.
  5. perf(benchmarks): add chart feature benches
    ChartJmhBenchmark (bar + line + pie render) + ChartAllocProbe (layout-compile allocation).
  6. perf(benchmarks): add vector-paint render-operator probe
    VectorRenderOperatorProbe draws the same paths flat vs. gradient vs.
    translucent and counts the PDF content-stream operators each paint mode emits.
  7. bench(jmh): add icon-ramp and mixed v1.8 showcase benches
    IconRampJmhBenchmark (icon-placement scaling, @Param 8/32/128) and
    MixedShowcaseJmhBenchmark (one document mixing prose, inline sparklines,
    bar + pie charts, SVG icons and a gradient path) as the integration canary.
  8. bench(gate): gate the long-token scenario and guard threshold coverage
    long-token had no SMOKE threshold and silently escaped the gate; add one
    (10.0 ms / 256.0 MB, ~3× its observed ~3.2 ms / ~94 MB) and add
    CurrentSpeedScenarioGateTest, which fails the build if any scenario lacks a
    threshold. Scenario list hoisted to a static SCENARIO_DEFS so the names are
    testable; the six scenarios, order, descriptions and renderers are unchanged.
  9. docs(benchmarks): finish the removed-bench cleanup and fix two stale Javadocs — sweep references the removed mains left behind (ab-bench.ps1
    log parsing, performance.md, a merged row in the module README) and correct
    two docs that overstated the code (the SVG-parse fixture has no arc command;
    stages[] is not carried into the median aggregate).
  10. docs(changelog): note the v1.8 feature-object benches, stage output, and gate coverage.

Testing

  • ./mvnw -B -ntp verify -pl . → BUILD SUCCESS, 1380 tests (canonical suite +
    japicmp + javadoc; the CHANGELOG/docs guards VersionConsistencyGuardTest +
    CanonicalSurfaceGuardTest stay green after the doc edits).
  • ./mvnw -B -ntp -f benchmarks/pom.xml verify → BUILD SUCCESS, 30 tests
    incl. new CurrentSpeedScenarioGateTest and a new BenchmarkDiffToolTest case
    covering added/removed scenarios + stage deltas.
  • New JMH benches run end-to-end: IconRamp shows a clean near-linear
    placement ramp (8 → 0.86, 32 → 2.74, 128 → 8.79 ms/op); MixedShowcase
    ~8.7 ms/op. A smoke run with the gate enabled passes with long-token now
    gated. The alloc/operator probes report deterministic counts (chart
    compile-pass ~446.8 KB; gradient paint adds sh+W per shape, alpha adds gs).
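
As a quick check on the "near-linear" reading of the IconRamp figures, the marginal cost per extra icon can be worked out from the three quoted points. This is plain arithmetic on the numbers above, not code from the benchmark module; the class and method names are made up for illustration. It comes out at roughly 0.078 ms per icon between 8 and 32 icons and roughly 0.063 ms per icon between 32 and 128.

```java
import java.util.Locale;

// Worked arithmetic on the ms/op figures quoted in the PR text.
// Hypothetical helper, not part of the benchmarks module.
final class IconRampArithmetic {

    /** Marginal cost in ms per extra icon between two points of the ramp. */
    static double marginalMsPerIcon(int icons1, double ms1, int icons2, double ms2) {
        return (ms2 - ms1) / (icons2 - icons1);
    }

    public static void main(String[] args) {
        int[] icons = {8, 32, 128};
        double[] msPerOp = {0.86, 2.74, 8.79};
        for (int i = 1; i < icons.length; i++) {
            double marginal = marginalMsPerIcon(icons[i - 1], msPerOp[i - 1], icons[i], msPerOp[i]);
            System.out.printf(Locale.ROOT, "%d -> %d icons: %.4f ms per extra icon%n",
                    icons[i - 1], icons[i], marginal);
        }
    }
}
```

The two slopes are of the same order, which is what a roughly linear placement ramp would look like; three points from one machine are not enough to say more than that.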

Notes for review

  • Benchmark-only branch — no library src/main change. The new fixtures
    (SvgBenchmarkFixtures, ChartBenchmarkFixtures) are public only because the
    JMH benches live in the .jmh subpackage; each fixture is reused by both the
    bench and its probe so the two measure identical data.
  • stages[] is intentionally not medianed: BenchmarkMedianTool aggregates
    latency/throughput only, so a median-vs-median diff shows no stage deltas; diff a
    single-run pair for stage attribution (noted in the median-tool Javadoc).
  • No committed smoke baseline — the verdict gate stays a local same-machine
    A/B tool (a static committed baseline would false-positive on machine variance);
    run-benchmarks.ps1 already skips the verdict step gracefully when none exists.
    The absolute SMOKE gate (now covering all six scenarios) is the CI net.

Lane: test — benchmark tooling + operations docs; no canonical / shared-engine / legacy surface touched.

DemchaAV added 10 commits June 14, 2026 19:04
FullCvBenchmark duplicated the JMH TemplateCvJmhBenchmark (CV through
ModernProfessional) with a hand-rolled, JIT-noisier loop and no report.
GraphComposeBenchmark was an early-engine relic measuring the same
title+body+divider doc as CurrentSpeedBenchmark's engine-simple scenario.
ScalabilityBenchmark's thread-scaling sweep is folded into
CurrentSpeedBenchmark's full-profile throughput run (thread counts now
1,2,4,8,16).

Drop the matching run-benchmarks.ps1 steps and the benchmarks.md /
benchmarks/README.md entries. ComparativeBenchmark, the JMH benches, the
deterministic probes, and the soak/stress runners stay. Benchmark module
compiles; its 28 tests pass.
…y.md

The stage breakdown (per-template compose / layout / render medians) was
printed to the console and discarded. Promote it into the report:
runStageBreakdown returns a StageRow, CurrentSpeedReport carries a stages[]
array, and a stages CSV is written — so a diff can attribute a regression to
an engine stage, not just the blended total. Also write a per-run summary.md
(latency + stages + throughput tables) so a reviewer reads one file instead
of the JSON plus several CSVs.

Additive output only: diff/verdict/median read the report by field and ignore
the new array. Benchmark module compiles; 28 tests pass; verified on a smoke
run (stages[] present, summary.md readable, perf gate passes).
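
The stage split described above could look roughly like the sketch below. StageRow is named in the commit, but its fields, the CSV column order, and the median helper are assumptions made for illustration, and the timings are invented.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch: the real StageRow fields and CSV layout may differ.
final class StageSplitSketch {

    /** One scenario's per-stage medians in milliseconds (assumed shape). */
    static final class StageRow {
        final String scenario;
        final double composeMs;
        final double layoutMs;
        final double renderMs;

        StageRow(String scenario, double composeMs, double layoutMs, double renderMs) {
            this.scenario = scenario;
            this.composeMs = composeMs;
            this.layoutMs = layoutMs;
            this.renderMs = renderMs;
        }

        double totalMs() {
            return composeMs + layoutMs + renderMs;
        }

        /** Assumed column order: scenario, compose, layout, render, total. */
        String toCsv() {
            return String.format(Locale.ROOT, "%s,%.3f,%.3f,%.3f,%.3f",
                    scenario, composeMs, layoutMs, renderMs, totalMs());
        }
    }

    /** Median of a non-empty sample array; averages the two middle values for even counts. */
    static double median(double[] samples) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // Made-up timings, only to exercise the sketch.
        StageRow row = new StageRow("engine-simple",
                median(new double[] {0.5, 0.3, 0.4}),
                median(new double[] {1.0, 1.4, 1.2}),
                median(new double[] {2.0, 1.5, 2.5}));
        System.out.println("scenario,compose_ms,layout_ms,render_ms,total_ms");
        System.out.println(row.toCsv());
    }
}
```

Keeping the three stage medians next to the total is what lets a later diff point at one stage instead of only the blended number.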
…enarios

BenchmarkDiffTool now (1) surfaces scenario set changes — addedScenarios /
removedScenarios — instead of silently intersecting, so a newly-added (or
dropped) scenario can no longer vanish from a diff unnoticed; and (2) diffs
the stages[] array, emitting per-scenario compose/layout/render/total percent
deltas (console block + stages-diff CSV) so a regression can be attributed to
an engine stage.

Backward-compatible: a report without stages[] yields an empty stage diff
(MissingNode iterates empty); latency/throughput delta rows stay
intersection-only; the diff report is terminal (median/verdict read producer
reports, not diffs). Adds a DiffToolTest case; 29 bench tests pass.
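
A minimal sketch of the two ideas in this commit, the scenario set-change and the percent delta, is shown below. It is not BenchmarkDiffTool's code: the class, the method names, and the "old-scenario" entry are hypothetical, and the timings are invented.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the set-change and percent-delta logic.
final class ScenarioDiffSketch {

    /** Scenarios present in the newer run but not in the older one. */
    static Set<String> added(Set<String> before, Set<String> after) {
        Set<String> result = new TreeSet<>(after);
        result.removeAll(before);
        return result;
    }

    /** Scenarios present in the older run but not in the newer one. */
    static Set<String> removed(Set<String> before, Set<String> after) {
        Set<String> result = new TreeSet<>(before);
        result.removeAll(after);
        return result;
    }

    /** Percent change from baseline to candidate; positive means slower. Baseline must be non-zero. */
    static double percentDelta(double baselineMs, double candidateMs) {
        return (candidateMs - baselineMs) / baselineMs * 100.0;
    }

    public static void main(String[] args) {
        // Made-up render-stage medians keyed by scenario name.
        Map<String, Double> before = Map.of("engine-simple", 2.0, "old-scenario", 1.0);
        Map<String, Double> after = Map.of("engine-simple", 2.5, "long-token", 3.2);

        System.out.println("added:   " + added(before.keySet(), after.keySet()));
        System.out.println("removed: " + removed(before.keySet(), after.keySet()));

        // A delta only makes sense for scenarios present in both runs.
        for (String name : new TreeSet<>(before.keySet())) {
            if (after.containsKey(name)) {
                System.out.printf("%s render: %+.1f%%%n", name,
                        percentDelta(before.get(name), after.get(name)));
            }
        }
    }
}
```

Reporting the added and removed sets separately from the delta rows is the point of the change: an intersection alone would hide a scenario that exists in only one run.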

First feature-object benchmarks for the v1.8 vector surface (the rest of the
suite is text/table only):
- SvgJmhBenchmark (forked JMH): SvgPath.parse of a real Material heart d,
  SvgIcon.parse of a multi-layer icon, SvgIcon.node on a pre-parsed icon.
- SvgParseAllocProbe (deterministic ThreadMXBean alloc, median of 11): KB/op
  for the same three operations.
- SvgBenchmarkFixtures: the heart d (vendored — the benchmark module can't
  reach the test/example copies) and a synthetic multi-layer icon (gradient
  bg + transformed groups + stroked curves) within the reader's supported
  subset, so it always parses.

Run on demand, not per-PR: java -jar benchmarks/target/benchmarks.jar Svg.
Verified: compiles; both benches run — path parse ~3.6 us/op, icon read
~308 us/op (DOM-parse dominated, 114 KB/op), node build ~0.4 us/op / 2 KB/op.
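
For readers unfamiliar with the technique, a ThreadMXBean allocation probe with a median over 11 rounds can be sketched as follows. This is an assumption about the general shape, not SvgParseAllocProbe's code: the class and method names are hypothetical, the measured operation is a stand-in byte array, and the cast relies on the HotSpot-specific com.sun.management.ThreadMXBean.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.function.Supplier;

// Hypothetical sketch of a per-thread allocation probe (HotSpot JVMs).
final class AllocProbeSketch {

    // HotSpot extension of the standard ThreadMXBean that exposes allocated bytes.
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    /** Bytes allocated by the current thread while running the operation once. */
    static long allocatedBytes(Supplier<?> operation) {
        long id = Thread.currentThread().getId();
        long before = THREADS.getThreadAllocatedBytes(id);
        Object result = operation.get();
        long after = THREADS.getThreadAllocatedBytes(id);
        if (result == null) { // use the result so the call is not trivially dead
            throw new IllegalStateException("operation returned null");
        }
        return after - before;
    }

    /** Middle element of the sorted samples (upper middle for even counts). */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    /** Warms the operation up, then returns the median allocation over the given rounds. */
    static long medianAllocatedBytes(Supplier<?> operation, int warmups, int rounds) {
        if (!THREADS.isThreadAllocatedMemorySupported()) {
            throw new UnsupportedOperationException("thread allocation accounting unavailable");
        }
        for (int i = 0; i < warmups; i++) {
            operation.get();
        }
        long[] samples = new long[rounds];
        for (int i = 0; i < rounds; i++) {
            samples[i] = allocatedBytes(operation);
        }
        return median(samples);
    }

    public static void main(String[] args) {
        // Stand-in operation; the real probe would parse the SVG fixtures instead.
        long bytes = medianAllocatedBytes(() -> new byte[64 * 1024], 5, 11);
        System.out.printf("median alloc: %.1f KB/op%n", bytes / 1024.0);
    }
}
```

Taking the median of an odd number of rounds discards the occasional outlier round without averaging it into the result.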

S4 of the modernization — the first chart benchmarks (the suite otherwise
renders text/tables only):
- ChartJmhBenchmark (forked JMH): end-to-end render of a chart-heavy doc —
  grouped bar + multi-series line (12 categories x 3 series) + 6-slice pie.
- ChartAllocProbe (deterministic ThreadMXBean, median of 11): warm
  layout-compile allocation, isolating chart-resolve + geometry emission.
- ChartBenchmarkFixtures: the shared bar/line/pie specs + data.

Run on demand, not per-PR: java -jar benchmarks/target/benchmarks.jar Chart.
Verified: compiles; render ~2.8 ms/op; compile alloc 446.8 KB (deterministic,
min=max=median, 1 page).

VectorRenderOperatorProbe renders the same 40 curved blob paths three ways —
flat solid fill, linear gradient, and translucent (alpha) — and counts the PDF
content-stream operators, so the deltas isolate what each paint mode costs at
render time. Flat takes the fast fill path (sh=0, gs=0, W=0); a gradient fill
adds one shading + one clip per shape (sh, W); a translucent fill adds one
ExtGState (gs). Byte-deterministic, no A/B build needed; catches a regression
where a flat path wrongly takes the gradient branch (sh would jump from 0).

Verified: flat 0/0/0, gradient sh=40/W=40, alpha gs=40 over 40 paths.
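
The counting idea can be illustrated with a deliberately naive tokenizer over an already-decoded content stream. This is a sketch under stated assumptions, not the probe's implementation: the class name is hypothetical, the three stream fragments are hand-written rather than captured from the library, and splitting on whitespace would miscount operator-like text inside string literals or inline images.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: counts operators in a decoded PDF content-stream string.
final class OperatorCountSketch {

    /** Counts whitespace-separated tokens that exactly match each requested operator. */
    static Map<String, Integer> count(String contentStream, String... operators) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String op : operators) {
            counts.put(op, 0);
        }
        for (String token : contentStream.trim().split("\\s+")) {
            // Only tracked operators are incremented; operands and names are ignored.
            counts.computeIfPresent(token, (op, n) -> n + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hand-written fragments for one shape per paint mode.
        String flat = "0.2 0.4 0.8 rg 10 10 m 50 80 90 10 120 40 c h f";
        String gradient = "q 10 10 m 50 80 90 10 120 40 c h W n /Sh1 sh Q";
        String alpha = "/GS1 gs 0.2 0.4 0.8 rg 10 10 m 50 80 90 10 120 40 c h f";

        System.out.println("flat:     " + count(flat, "sh", "W", "gs"));
        System.out.println("gradient: " + count(gradient, "sh", "W", "gs"));
        System.out.println("alpha:    " + count(alpha, "sh", "W", "gs"));
    }
}
```

Because the match is exact, the resource name /Sh1 is not counted as sh, and the even-odd clip operator W* would need its own entry if a renderer emitted it.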

IconRampJmhBenchmark places N copies of a multi-layer SVG icon
(@Param 8/32/128) and renders to PDF, so the per-icon node-build +
layout + render scaling is visible; the icon is parsed once in setup
so the ramp measures placement, not re-parsing.

MixedShowcaseJmhBenchmark renders one realistic document mixing every
v1.8 vector feature -- prose with two inline sparklines, a grouped bar
chart and a pie chart, a row of SVG icons, and a gradient accent path
-- as a single integration canary for "did a v1.8 feature regress a
realistic doc?".

Both reuse the existing SvgBenchmarkFixtures / ChartBenchmarkFixtures;
no src/main change.
…d coverage

The smoke perf gate ignores any scenario without a configured threshold,
so long-token (the 6th latency scenario) was silently ungated -- a real
regression there would never fail the gate. Add its SMOKE threshold
(10.0 ms / 256.0 MB, ~3x the observed ~3.2 ms / ~94 MB, matching the
existing per-scenario calibration headroom).

Hoist the scenario list to a static SCENARIO_DEFS so the names are
readable without re-measuring, and add CurrentSpeedScenarioGateTest,
which fails the build if any scenario lacks a SMOKE threshold. No
behaviour change to the run itself -- same six scenarios, same order.
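
The coverage guard amounts to checking that every scenario name has a threshold entry. The sketch below shows that check in isolation; it is not CurrentSpeedScenarioGateTest. The class, the Threshold type, and the engine-simple limits are hypothetical, only two of the six scenario names appear in this PR text, and the long-token limits are the ones quoted in the commit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a "no scenario escapes the gate" check.
final class GateCoverageSketch {

    /** Assumed threshold shape: a latency ceiling and a memory ceiling. */
    static final class Threshold {
        final double maxMs;
        final double maxMb;

        Threshold(double maxMs, double maxMb) {
            this.maxMs = maxMs;
            this.maxMb = maxMb;
        }
    }

    // Stand-in for SCENARIO_DEFS; the real list has six entries.
    static final List<String> SCENARIO_NAMES = List.of("engine-simple", "long-token");

    static Map<String, Threshold> smokeThresholds() {
        Map<String, Threshold> thresholds = new LinkedHashMap<>();
        thresholds.put("engine-simple", new Threshold(5.0, 128.0)); // made-up limits
        thresholds.put("long-token", new Threshold(10.0, 256.0));   // limits from the commit
        return thresholds;
    }

    /** Names of scenarios that have no configured threshold, in list order. */
    static List<String> ungated(List<String> scenarios, Map<String, Threshold> thresholds) {
        List<String> missing = new ArrayList<>();
        for (String name : scenarios) {
            if (!thresholds.containsKey(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = ungated(SCENARIO_NAMES, smokeThresholds());
        if (!missing.isEmpty()) {
            throw new AssertionError("Scenarios without a SMOKE threshold: " + missing);
        }
        System.out.println("all scenarios gated");
    }
}
```

Hoisting the names into a static list is what makes this check cheap: the test can read them without running any measurement.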
…Javadocs

Sweep the references the three removed benchmark mains (FullCvBenchmark,
GraphComposeBenchmark, ScalabilityBenchmark) left behind, and correct two
docs that overstated what the code does:

- ab-bench.ps1 no longer parses the retired 04/05/06 logs (they are no
  longer produced); it reads the surviving stress log, and the
  thread-scaling series still comes from the current-speed JSON report.
- benchmarks/README.md "Files in this module": split a row that had been
  merged onto one line and restore the blank line before "## Running".
- docs/operations/performance.md: mark it a frozen v1.4 snapshot and note
  the retired suites/mains so it no longer contradicts benchmarks.md.
- docs/operations/benchmarks.md and the run-benchmarks.ps1 synopsis: note
  that steps 04-06 were retired, so the 03 -> 07 numbering gap is intentional.
- SvgJmhBenchmark Javadoc: describe the heart-path parse accurately
  (tokenize / cubic-line lowering / viewBox normalization); the fixture
  has no arc command, so the old "arc->cubic" wording was wrong.
- BenchmarkMedianTool Javadoc: note that stages[] is not carried into the
  median aggregate, so a median-vs-median diff shows no stage deltas.
@DemchaAV

Owner Author

Closing — opened prematurely. Work continues on the chore/benchmark-modernization branch; the PR to develop is the maintainer's call once the branch is finalized.

@DemchaAV DemchaAV closed this Jun 14, 2026