chore(benchmarks): modernize for v1.8 — feature benches, stage output, gate coverage#195
Closed
DemchaAV wants to merge 10 commits into
FullCvBenchmark duplicated the JMH TemplateCvJmhBenchmark (CV through ModernProfessional) with a hand-rolled, JIT-noisier loop and no report. GraphComposeBenchmark was an early-engine relic measuring the same title+body+divider doc as CurrentSpeedBenchmark's engine-simple scenario. ScalabilityBenchmark's thread-scaling sweep is folded into CurrentSpeedBenchmark's full-profile throughput run (thread counts now 1,2,4,8,16). Drop the matching run-benchmarks.ps1 steps and the benchmarks.md / benchmarks/README.md entries. ComparativeBenchmark, the JMH benches, the deterministic probes, and the soak/stress runners stay. Benchmark module compiles; its 28 tests pass.
…y.md The stage breakdown (per-template compose / layout / render medians) was printed to the console and discarded. Promote it into the report: runStageBreakdown returns a StageRow, CurrentSpeedReport carries a stages[] array, and a stages CSV is written — so a diff can attribute a regression to an engine stage, not just the blended total. Also write a per-run summary.md (latency + stages + throughput tables) so a reviewer reads one file instead of the JSON plus several CSVs. Additive output only: diff/verdict/median read the report by field and ignore the new array. Benchmark module compiles; 28 tests pass; verified on a smoke run (stages[] present, summary.md readable, perf gate passes).
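As an illustration of the stage split described above, here is a minimal sketch of what one stages[] entry could look like. StageRow is named in the commit message, but the fields, the derived total, the CSV column order, and the median helper below are assumptions made for this example, not the module's actual code.

```java
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical shape of one stages[] entry: median milliseconds per engine stage. */
record StageRow(String scenario, double composeMs, double layoutMs, double renderMs) {

    /** Assumption: the total is derived from the three stage medians rather than measured separately. */
    double totalMs() {
        return composeMs + layoutMs + renderMs;
    }

    /** One CSV line; the column order (scenario, compose, layout, render, total) is an assumption. */
    String toCsv() {
        return String.format(Locale.ROOT, "%s,%.3f,%.3f,%.3f,%.3f",
                scenario, composeMs, layoutMs, renderMs, totalMs());
    }

    /** Median of raw samples: sorts a copy and averages the middle pair for even counts. */
    static double median(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

Keeping the three stages as separate fields is what lets a later diff point at compose, layout, or render instead of only the blended total.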
…enarios BenchmarkDiffTool now (1) surfaces scenario set changes — addedScenarios / removedScenarios — instead of silently intersecting, so a newly-added (or dropped) scenario can no longer vanish from a diff unnoticed; and (2) diffs the stages[] array, emitting per-scenario compose/layout/render/total percent deltas (console block + stages-diff CSV) so a regression can be attributed to an engine stage. Backward-compatible: a report without stages[] yields an empty stage diff (MissingNode iterates empty); latency/throughput delta rows stay intersection-only; the diff report is terminal (median/verdict read producer reports, not diffs). Adds a DiffToolTest case; 29 bench tests pass.
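The two diff behaviours described above (reporting scenario set changes instead of silently intersecting, and per-stage percent deltas) can be sketched as follows. The class and method names are hypothetical, and the zero-baseline handling is an assumption; BenchmarkDiffTool itself may differ.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper mirroring the scenario set-change and percent-delta logic. */
final class ScenarioDiffSketch {

    /** Scenarios present in the candidate run but absent from the baseline. */
    static Set<String> added(Set<String> baseline, Set<String> candidate) {
        Set<String> result = new TreeSet<>(candidate);
        result.removeAll(baseline);
        return result;
    }

    /** Scenarios present in the baseline but absent from the candidate run. */
    static Set<String> removed(Set<String> baseline, Set<String> candidate) {
        return added(candidate, baseline);
    }

    /**
     * Percent change from baseline to candidate; positive means the candidate is slower.
     * Returns NaN for a zero baseline (an assumption: the real tool may report this differently).
     */
    static double percentDelta(double baselineMs, double candidateMs) {
        if (baselineMs == 0.0) {
            return Double.NaN;
        }
        return (candidateMs - baselineMs) / baselineMs * 100.0;
    }
}
```

Applied per stage (compose, layout, render, total), percentDelta yields the kind of table a reviewer can use to attribute a regression to one engine stage.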
First feature-object benchmarks for the v1.8 vector surface (the rest of the suite is text/table only):
- SvgJmhBenchmark (forked JMH): SvgPath.parse of a real Material heart d, SvgIcon.parse of a multi-layer icon, SvgIcon.node on a pre-parsed icon.
- SvgParseAllocProbe (deterministic ThreadMXBean alloc, median of 11): KB/op for the same three operations.
- SvgBenchmarkFixtures: the heart d (vendored, because the benchmark module can't reach the test/example copies) and a synthetic multi-layer icon (gradient bg + transformed groups + stroked curves) within the reader's supported subset, so it always parses.

Run on demand, not per-PR: java -jar benchmarks/target/benchmarks.jar Svg.
Verified: compiles; both benches run. Path parse ~3.6 us/op, icon read ~308 us/op (DOM-parse dominated, 114 KB/op), node build ~0.4 us/op / 2 KB/op.
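For readers unfamiliar with the allocation-probe technique mentioned above, a minimal sketch follows. It assumes a HotSpot-style JVM where the platform ThreadMXBean can be cast to com.sun.management.ThreadMXBean and thread allocation accounting is enabled; the class name, the warm-up count, and the sink field are illustrative and not taken from SvgParseAllocProbe.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.function.Supplier;

/** Hypothetical per-thread allocation probe: median bytes allocated by one call of an operation. */
final class AllocProbeSketch {

    /** Keeps each result reachable so the JIT cannot eliminate the allocation being measured. */
    static volatile Object sink;

    static long medianAllocatedBytes(Supplier<?> op, int warmups, int runs) {
        // Assumption: the JVM exposes the com.sun.management extension (true on HotSpot).
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long threadId = Thread.currentThread().getId();

        // Warm up first so one-time class loading and JIT work is not counted.
        for (int i = 0; i < warmups; i++) {
            sink = op.get();
        }

        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long before = bean.getThreadAllocatedBytes(threadId);
            sink = op.get();
            samples[i] = bean.getThreadAllocatedBytes(threadId) - before;
        }
        Arrays.sort(samples);
        return samples[runs / 2]; // middle element; exact median for an odd run count such as 11
    }
}
```

A call such as AllocProbeSketch.medianAllocatedBytes(() -> new byte[64 * 1024], 3, 11) measures allocation only on the calling thread, which is why this style of probe tends to be far less noisy than wall-clock timing.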
S4 of the modernization: the first chart benchmarks (the suite otherwise renders text/tables only):
- ChartJmhBenchmark (forked JMH): end-to-end render of a chart-heavy doc (grouped bar + multi-series line with 12 categories x 3 series + 6-slice pie).
- ChartAllocProbe (deterministic ThreadMXBean, median of 11): warm layout-compile allocation, isolating chart-resolve + geometry emission.
- ChartBenchmarkFixtures: the shared bar/line/pie specs + data.

Run on demand, not per-PR: java -jar benchmarks/target/benchmarks.jar Chart.
Verified: compiles; render ~2.8 ms/op; compile alloc 446.8 KB (deterministic, min=max=median, 1 page).
VectorRenderOperatorProbe renders the same 40 curved blob paths three ways — flat solid fill, linear gradient, and translucent (alpha) — and counts the PDF content-stream operators, so the deltas isolate what each paint mode costs at render time. Flat takes the fast fill path (sh=0, gs=0, W=0); a gradient fill adds one shading + one clip per shape (sh, W); a translucent fill adds one ExtGState (gs). Byte-deterministic, no A/B build needed; catches a regression where a flat path wrongly takes the gradient branch (sh would jump from 0). Verified: flat 0/0/0, gradient sh=40/W=40, alpha gs=40 over 40 paths.
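The operator-counting idea can be sketched as below. This is a simplified illustration, not the probe's code: it assumes an already-decompressed content stream with no string literals or inline images, splits on whitespace, and counts only the sh, gs and W operators (the even-odd clip W* is a separate operator and is not counted here). The class name and the sample streams are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical counter for the paint-related PDF content-stream operators discussed above. */
final class OperatorCountSketch {

    // sh = paint shading, gs = set ExtGState, W = nonzero-winding clip.
    private static final Set<String> TRACKED = Set.of("sh", "gs", "W");

    static Map<String, Integer> count(String contentStream) {
        Map<String, Integer> counts = new HashMap<>();
        for (String op : TRACKED) {
            counts.put(op, 0);
        }
        // Operand names such as /Sh1 or /GS1 never equal a tracked operator token exactly.
        for (String token : contentStream.trim().split("\\s+")) {
            if (TRACKED.contains(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Illustrative single-shape streams for the three paint modes.
    static final String FLAT = "0.2 0.4 0.8 rg 10 10 m 20 30 40 30 50 10 c h f";
    static final String GRADIENT = "q 10 10 m 20 30 40 30 50 10 c h W n /Sh1 sh Q";
    static final String ALPHA = "/GS1 gs 0.2 0.4 0.8 rg 10 10 m 20 30 40 30 50 10 c h f";
}
```

Because the counts depend only on the emitted bytes, a check like this needs no timing and no A/B build: a flat fill that wrongly took the gradient branch would show up as a nonzero sh count.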
IconRampJmhBenchmark places N copies of a multi-layer SVG icon (@Param 8/32/128) and renders to PDF, so the per-icon node-build + layout + render scaling is visible; the icon is parsed once in setup so the ramp measures placement, not re-parsing. MixedShowcaseJmhBenchmark renders one realistic document mixing every v1.8 vector feature -- prose with two inline sparklines, a grouped bar chart and a pie chart, a row of SVG icons, and a gradient accent path -- as a single integration canary for "did a v1.8 feature regress a realistic doc?". Both reuse the existing SvgBenchmarkFixtures / ChartBenchmarkFixtures; no src/main change.
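One way to read an icon ramp is to normalize each point by N. The sketch below does that for the smoke figures this PR reports under Testing (8 → 0.86, 32 → 2.74, 128 → 8.79 ms/op); the class and method names are hypothetical and the code is plain arithmetic, not part of the benchmark.

```java
/** Hypothetical helpers for interpreting an N-icon placement ramp. */
final class IconRampSketch {

    /** Milliseconds per placed icon at one ramp point. */
    static double perIconMs(int icons, double msPerOp) {
        return msPerOp / icons;
    }

    /**
     * Time growth divided by count growth between two ramp points.
     * 1.0 means exactly linear; below 1.0 means the per-icon cost fell as N grew.
     */
    static double scalingRatio(int n1, double ms1, int n2, double ms2) {
        return (ms2 / ms1) / ((double) n2 / n1);
    }

    public static void main(String[] args) {
        int[] counts = {8, 32, 128};
        double[] msPerOp = {0.86, 2.74, 8.79}; // figures reported in this PR's Testing section
        for (int i = 0; i < counts.length; i++) {
            System.out.printf("N=%d per-icon=%.4f ms%n", counts[i], perIconMs(counts[i], msPerOp[i]));
        }
        for (int i = 1; i < counts.length; i++) {
            System.out.printf("N=%d->%d scaling ratio=%.3f%n", counts[i - 1], counts[i],
                    scalingRatio(counts[i - 1], msPerOp[i - 1], counts[i], msPerOp[i]));
        }
    }
}
```

On those figures the per-icon cost falls from roughly 0.11 ms to roughly 0.07 ms as N grows, and both scaling ratios come out near 0.8. That would be consistent with a fixed per-document cost being amortized over more icons, though the numbers alone do not prove that explanation.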
…d coverage The smoke perf gate ignores any scenario without a configured threshold, so long-token (the 6th latency scenario) was silently ungated -- a real regression there would never fail the gate. Add its SMOKE threshold (10.0 ms / 256.0 MB, ~3x the observed ~3.2 ms / ~94 MB, matching the existing per-scenario calibration headroom). Hoist the scenario list to a static SCENARIO_DEFS so the names are readable without re-measuring, and add CurrentSpeedScenarioGateTest, which fails the build if any scenario lacks a SMOKE threshold. No behaviour change to the run itself -- same six scenarios, same order.
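The coverage guard described above amounts to a set difference between scenario names and configured thresholds. A minimal sketch follows; the class, the Threshold record, and the method names are hypothetical stand-ins, not the actual SCENARIO_DEFS or CurrentSpeedScenarioGateTest code. Only the long-token limits (10.0 ms / 256.0 MB) and observed values (~3.2 ms / ~94 MB) come from the commit message.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of a perf gate that must cover every scenario. */
final class GateCoverageSketch {

    /** Upper bounds for one scenario: latency in ms and peak memory in MB. */
    record Threshold(double maxMs, double maxMb) {}

    /** Scenario names with no configured threshold; a guard test would fail if this is non-empty. */
    static Set<String> ungated(List<String> scenarios, Map<String, Threshold> thresholds) {
        Set<String> missing = new TreeSet<>(scenarios);
        missing.removeAll(thresholds.keySet());
        return missing;
    }

    /** True when both observed values are within the scenario's limits. */
    static boolean passes(Threshold threshold, double observedMs, double observedMb) {
        return observedMs <= threshold.maxMs() && observedMb <= threshold.maxMb();
    }

    /** Headroom factor of a limit over an observed value, e.g. 10.0 / 3.2. */
    static double headroom(double limit, double observed) {
        return limit / observed;
    }
}
```

With the reported figures, the latency headroom is 10.0 / 3.2 ≈ 3.1× and the memory headroom is 256 / 94 ≈ 2.7×, in line with the "~3x" calibration stated in the commit. Checking the names from a static list is what makes the guard cheap: it never has to run a measurement.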
…Javadocs
Sweep the references the three removed benchmark mains (FullCvBenchmark, GraphComposeBenchmark, ScalabilityBenchmark) left behind, and correct two docs that overstated what the code does:
- ab-bench.ps1 no longer parses the retired 04/05/06 logs (they are no longer produced); it reads the surviving stress log, and the thread-scaling series still comes from the current-speed JSON report.
- benchmarks/README.md "Files in this module": split a row that had been merged onto one line and restore the blank line before "## Running".
- docs/operations/performance.md: mark it a frozen v1.4 snapshot and note the retired suites/mains so it no longer contradicts benchmarks.md.
- docs/operations/benchmarks.md and the run-benchmarks.ps1 synopsis: note that steps 04-06 were retired, so the 03 -> 07 numbering gap is intentional.
- SvgJmhBenchmark Javadoc: describe the heart-path parse accurately (tokenize / cubic-line lowering / viewBox normalization); the fixture has no arc command, so the old "arc->cubic" wording was wrong.
- BenchmarkMedianTool Javadoc: note that stages[] is not carried into the median aggregate, so a median-vs-median diff shows no stage deltas.
…and gate coverage
Closing: opened prematurely. Work continues on the chore/benchmark-modernization branch; the PR to develop is the maintainer's call once the branch is finalized.
Summary
The benchmark suite measured only text/table primitives: none of the v1.8
vector features (SVG import, native charts, vector paths, gradients) had a
bench, and the current-speed report blended compose/layout/render into one
number. This branch adds feature-object benches and deterministic probes over
the new surface, splits the current-speed report into per-stage timings with a
readable summary.md, closes a silent gap in the smoke perf gate, and removes
three redundant benches. Everything lives in the separate benchmarks/ module
plus its docs; no library src/main is touched.

Commits (reviewed sequentially)
1. chore(benchmarks): remove three redundant benchmark mains
   Drop FullCvBenchmark (superseded by the JMH TemplateCvJmhBenchmark), GraphComposeBenchmark (duplicated CurrentSpeedBenchmark's engine-simple scenario), and ScalabilityBenchmark (thread sweep folded into the full-profile throughput run, now 1,2,4,8,16); prune the matching run-benchmarks.ps1 steps.
2. perf(benchmarks): persist compose/layout/render stages + a run summary
   CurrentSpeedBenchmark writes a per-scenario stage split (stages[], median ms) to the JSON and a stages CSV, plus a human-readable summary.md.
3. perf(benchmarks): diff consumes stages[] and reports added/removed scenarios
   BenchmarkDiffTool prints a per-stage delta table and the scenario set-change (added/removed) between two runs.
4. perf(benchmarks): add SVG-import feature benches
   SvgJmhBenchmark (path parse / whole-file icon read / icon→node) + SvgParseAllocProbe.
5. perf(benchmarks): add chart feature benches
   ChartJmhBenchmark (bar + line + pie render) + ChartAllocProbe (layout-compile allocation).
6. perf(benchmarks): add vector-paint render-operator probe
   VectorRenderOperatorProbe draws the same paths flat vs. gradient vs. translucent and counts the PDF content-stream operators each paint mode emits.
7. bench(jmh): add icon-ramp and mixed v1.8 showcase benches
   IconRampJmhBenchmark (icon-placement scaling, @Param 8/32/128) and MixedShowcaseJmhBenchmark (one document mixing prose, inline sparklines, bar + pie charts, SVG icons and a gradient path) as the integration canary.
8. bench(gate): gate the long-token scenario and guard threshold coverage
   long-token had no SMOKE threshold and silently escaped the gate; add one (10.0 ms / 256.0 MB, ~3× its observed ~3.2 ms / ~94 MB) and add CurrentSpeedScenarioGateTest, which fails the build if any scenario lacks a threshold. Scenario list hoisted to a static SCENARIO_DEFS so the names are testable; the six scenarios, order, descriptions and renderers are unchanged.
9. docs(benchmarks): finish the removed-bench cleanup and fix two stale Javadocs
   Sweep references the removed mains left behind (ab-bench.ps1 log parsing, performance.md, a merged row in the module README) and correct two docs that overstated the code (the SVG-parse fixture has no arc command; stages[] is not carried into the median aggregate).
10. docs(changelog): note the v1.8 feature-object benches, stage output, and gate coverage.

Testing
- ./mvnw -B -ntp verify -pl . → BUILD SUCCESS, 1380 tests (canonical suite + japicmp + javadoc; the CHANGELOG/docs guards VersionConsistencyGuardTest + CanonicalSurfaceGuardTest stay green after the doc edits).
- ./mvnw -B -ntp -f benchmarks/pom.xml verify → BUILD SUCCESS, 30 tests, incl. new CurrentSpeedScenarioGateTest and a new BenchmarkDiffToolTest case covering added/removed scenarios + stage deltas.
- IconRamp shows a clean near-linear placement ramp (8 → 0.86, 32 → 2.74, 128 → 8.79 ms/op); MixedShowcase ~8.7 ms/op. A smoke run with the gate enabled passes with long-token now gated. The alloc/operator probes report deterministic counts (chart compile-pass ~446.8 KB; gradient paint adds sh + W per shape, alpha adds gs).

Notes for review
- No src/main change. The new fixtures (SvgBenchmarkFixtures, ChartBenchmarkFixtures) are public only because the JMH benches live in the .jmh subpackage; each fixture is reused by both the bench and its probe so the two measure identical data.
- stages[] is intentionally not medianed: BenchmarkMedianTool aggregates latency/throughput only, so a median-vs-median diff shows no stage deltas; diff a single-run pair for stage attribution (noted in the median-tool Javadoc).
- … A/B tool (a static committed baseline would false-positive on machine variance); run-benchmarks.ps1 already skips the verdict step gracefully when none exists. The absolute SMOKE gate (now covering all six scenarios) is the CI net.

Lane: test. Benchmark tooling + operations docs; no canonical / shared-engine / legacy surface touched.