chore(benchmarks): modernize and expand the suite (v1.8 coverage, CI gates)#196

Merged: DemchaAV merged 37 commits into develop from chore/benchmark-suite on Jun 15, 2026

@DemchaAV

Owner

Summary

A benchmark-module-only branch (no library src/main is touched) that modernizes and
expands the performance suite. It removes three redundant benches, adds feature-object
coverage for the v1.8 vector surface (SVG, charts, vector paths, gradients, images), wires
the suite's deterministic checks into the per-PR CI gate, splits the current-speed report
into per-stage timings, and adds a multi-page cross-library comparative tier with a visual
sample dump. It consolidates the earlier benchmark-modernization and benchmark-coverage
lines into one branch.

Highlights

Cleanup. Removed FullCvBenchmark / GraphComposeBenchmark / ScalabilityBenchmark
(redundant with TemplateCvJmhBenchmark / the engine-simple scenario); the thread-scaling
tier folded into the full-profile throughput run.

v1.8 feature coverage (the suite previously exercised only text/table primitives) — JMH
render benches and deterministic operator/allocation probes for SVG (parse / whole-file icon /
node), charts (bar / line / pie + horizontal / stacked / donut / value-axis-min variants),
vector paint (flat / gradient / alpha / stroked / dashed), image embed+scale (PdfImageCache
reuse), inline sparklines, an icon-placement ramp, and a mixed-showcase canary.

Deterministic gates in CI. The benchmarks module's tests never ran in CI; the perf-smoke
job now runs them, so image-cache reuse, render-operator (F5) coalescing, the vector-paint
operator structure, and scenario/threshold coverage are build-failing gates. A vector-rich
scenario (charts + SVG icons + gradient) joins the gated current-speed harness.

Richer, honest output. The current-speed report carries a compose/layout/render stages[]
split + a summary.md; the diff shows per-stage deltas and added/removed scenarios; the median
tool carries stages[]. The smoke gate's GC-noisy peakHeapMb is now advisory (fails only on
latency). README field names corrected to match the emitted JSON.

Comparative + realism. ComparativeBenchmark gained a multi-page report tier (equivalent
content across GraphCompose / iText / JasperReports, full-width tables) plus a post-run
sample-PDF dump per library/scenario. Added a single-shot cold-start bench, a production-scale
large-table bench, and an allocation-rate / GC-pressure probe.

Verification

  • ./mvnw -B -ntp verify -pl . → BUILD SUCCESS, 1380 tests (canonical suite +
    japicmp + javadoc; the CHANGELOG/docs guards stay green).
  • ./mvnw -B -ntp -f benchmarks/pom.xml verify → BUILD SUCCESS, 39 tests, incl. the new
    image-cache, render-operator, vector-paint, and scenario-coverage gates and the median
    stage-carry case.
  • Every JMH bench runs end-to-end; a perf-smoke smoke run passes with the vector-rich
    scenario gated and peakHeapMb advisory.

Notes for review

  • openHTMLtoPDF is intentionally excluded from the comparative: openhtmltopdf 1.0.10 targets
    PDFBox 2.x and fails against the PDFBox 3.x the project uses (no PDFBox-3 release exists).
    Documented in the README.
  • The JMH benches and the relative ±%-vs-baseline verdict are on-demand / local only by design
    (a per-PR JMH run is too slow; a static smoke baseline is machine-specific). The per-PR gate
    is the smoke absolute thresholds + the deterministic gate tests. docs_per_sec is a derived
    1000/avg reciprocal (documented), not measured throughput.
  • The new "Run deterministic benchmark gates" CI step makes the benchmark-module test classes
    PR-blocking (all deterministic, sub-second).

Lane: test — benchmark module + operations docs + CI; no canonical / shared-engine / legacy surface touched.

DemchaAV added 30 commits June 14, 2026 19:04
FullCvBenchmark duplicated the JMH TemplateCvJmhBenchmark (CV through
ModernProfessional) with a hand-rolled, JIT-noisier loop and no report.
GraphComposeBenchmark was an early-engine relic measuring the same
title+body+divider doc as CurrentSpeedBenchmark's engine-simple scenario.
ScalabilityBenchmark's thread-scaling sweep is folded into
CurrentSpeedBenchmark's full-profile throughput run (thread counts now
1,2,4,8,16).

Drop the matching run-benchmarks.ps1 steps and the benchmarks.md /
benchmarks/README.md entries. ComparativeBenchmark, the JMH benches, the
deterministic probes, and the soak/stress runners stay. Benchmark module
compiles; its 28 tests pass.
…y.md

The stage breakdown (per-template compose / layout / render medians) was
printed to the console and discarded. Promote it into the report:
runStageBreakdown returns a StageRow, CurrentSpeedReport carries a stages[]
array, and a stages CSV is written — so a diff can attribute a regression to
an engine stage, not just the blended total. Also write a per-run summary.md
(latency + stages + throughput tables) so a reviewer reads one file instead
of the JSON plus several CSVs.

Additive output only: diff/verdict/median read the report by field and ignore
the new array. Benchmark module compiles; 28 tests pass; verified on a smoke
run (stages[] present, summary.md readable, perf gate passes).
…enarios

BenchmarkDiffTool now (1) surfaces scenario set changes — addedScenarios /
removedScenarios — instead of silently intersecting, so a newly-added (or
dropped) scenario can no longer vanish from a diff unnoticed; and (2) diffs
the stages[] array, emitting per-scenario compose/layout/render/total percent
deltas (console block + stages-diff CSV) so a regression can be attributed to
an engine stage.

Backward-compatible: a report without stages[] yields an empty stage diff
(MissingNode iterates empty); latency/throughput delta rows stay
intersection-only; the diff report is terminal (median/verdict read producer
reports, not diffs). Adds a DiffToolTest case; 29 bench tests pass.
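
A minimal sketch of the scenario-set bookkeeping described above, assuming each report is reduced to its list of scenario names. The class and field names are hypothetical and are not BenchmarkDiffTool's real API; the point is only that added and removed scenarios are reported separately while delta rows stay intersection-only.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not the real BenchmarkDiffTool code.
final class ScenarioSetDiffSketch {
    final Set<String> added;    // present only in the candidate report
    final Set<String> removed;  // present only in the baseline report
    final Set<String> common;   // present in both; only these get delta rows

    ScenarioSetDiffSketch(List<String> baseline, List<String> candidate) {
        added = new LinkedHashSet<>(candidate);
        added.removeAll(baseline);
        removed = new LinkedHashSet<>(baseline);
        removed.removeAll(candidate);
        common = new LinkedHashSet<>(baseline);
        common.retainAll(candidate);
    }

    public static void main(String[] args) {
        ScenarioSetDiffSketch diff = new ScenarioSetDiffSketch(
                List.of("engine-simple", "long-token"),
                List.of("engine-simple", "vector-rich"));
        System.out.println("addedScenarios=" + diff.added);
        System.out.println("removedScenarios=" + diff.removed);
        System.out.println("delta rows for=" + diff.common);
    }
}
```

Keeping the three sets separate means a silently intersected diff can no longer hide a dropped scenario: it shows up in the removed set even though it has no delta row.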
First feature-object benchmarks for the v1.8 vector surface (the rest of the
suite is text/table only):
- SvgJmhBenchmark (forked JMH): SvgPath.parse of a real Material heart d,
  SvgIcon.parse of a multi-layer icon, SvgIcon.node on a pre-parsed icon.
- SvgParseAllocProbe (deterministic ThreadMXBean alloc, median of 11): KB/op
  for the same three operations.
- SvgBenchmarkFixtures: the heart d (vendored — the benchmark module can't
  reach the test/example copies) and a synthetic multi-layer icon (gradient
  bg + transformed groups + stroked curves) within the reader's supported
  subset, so it always parses.

Run on demand, not per-PR: java -jar benchmarks/target/benchmarks.jar Svg.
Verified: compiles; both benches run — path parse ~3.6 us/op, icon read
~308 us/op (DOM-parse dominated, 114 KB/op), node build ~0.4 us/op / 2 KB/op.

S4 of the modernization — the first chart benchmarks (the suite otherwise
renders text/tables only):
- ChartJmhBenchmark (forked JMH): end-to-end render of a chart-heavy doc —
  grouped bar + multi-series line (12 categories x 3 series) + 6-slice pie.
- ChartAllocProbe (deterministic ThreadMXBean, median of 11): warm
  layout-compile allocation, isolating chart-resolve + geometry emission.
- ChartBenchmarkFixtures: the shared bar/line/pie specs + data.

Run on demand, not per-PR: java -jar benchmarks/target/benchmarks.jar Chart.
Verified: compiles; render ~2.8 ms/op; compile alloc 446.8 KB (deterministic,
min=max=median, 1 page).

VectorRenderOperatorProbe renders the same 40 curved blob paths three ways —
flat solid fill, linear gradient, and translucent (alpha) — and counts the PDF
content-stream operators, so the deltas isolate what each paint mode costs at
render time. Flat takes the fast fill path (sh=0, gs=0, W=0); a gradient fill
adds one shading + one clip per shape (sh, W); a translucent fill adds one
ExtGState (gs). Byte-deterministic, no A/B build needed; catches a regression
where a flat path wrongly takes the gradient branch (sh would jump from 0).

Verified: flat 0/0/0, gradient sh=40/W=40, alpha gs=40 over 40 paths.
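
To make the operator-count idea concrete, here is a simplified sketch that counts whitespace-delimited tokens in a hand-written content stream. This is an assumption-heavy stand-in: the real probe presumably reads operators from the rendered PDF with a proper parser, and the fixture strings below are invented for illustration. Only the operator names (sh for shading, W for clip, gs for ExtGState, f for fill) come from the PDF content-stream syntax.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified illustration; adequate only for a controlled fixture without string operands.
final class OperatorCountSketch {
    /** Counts every whitespace-delimited token, operands included. */
    static Map<String, Integer> countOperators(String contentStream) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : contentStream.trim().split("\\s+")) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    /** Repeats one shape's operators n times, one shape per line. */
    static String shapes(String oneShape, int n) {
        return (oneShape + "\n").repeat(n);
    }

    public static void main(String[] args) {
        String path = "0 0 m 10 10 20 20 30 0 c h";              // invented blob outline
        String flat = shapes(path + " f", 40);                    // plain fill
        String gradient = shapes("q " + path + " W n /Sh1 sh Q", 40); // clip, then shade
        String alpha = shapes("/GS1 gs " + path + " f", 40);      // ExtGState, then fill

        for (String stream : new String[] {flat, gradient, alpha}) {
            Map<String, Integer> c = countOperators(stream);
            System.out.println("sh=" + c.getOrDefault("sh", 0)
                    + " W=" + c.getOrDefault("W", 0)
                    + " gs=" + c.getOrDefault("gs", 0));
        }
    }
}
```

Because the counts depend only on the emitted operators, the same fixture gives the same numbers on any machine, which is what makes this kind of probe usable as a gate.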
IconRampJmhBenchmark places N copies of a multi-layer SVG icon
(@Param 8/32/128) and renders to PDF, so the per-icon node-build +
layout + render scaling is visible; the icon is parsed once in setup
so the ramp measures placement, not re-parsing.

MixedShowcaseJmhBenchmark renders one realistic document mixing every
v1.8 vector feature -- prose with two inline sparklines, a grouped bar
chart and a pie chart, a row of SVG icons, and a gradient accent path
-- as a single integration canary for "did a v1.8 feature regress a
realistic doc?".

Both reuse the existing SvgBenchmarkFixtures / ChartBenchmarkFixtures;
no src/main change.
…d coverage

The smoke perf gate ignores any scenario without a configured threshold,
so long-token (the 6th latency scenario) was silently ungated -- a real
regression there would never fail the gate. Add its SMOKE threshold
(10.0 ms / 256.0 MB, ~3x the observed ~3.2 ms / ~94 MB, matching the
existing per-scenario calibration headroom).

Hoist the scenario list to a static SCENARIO_DEFS so the names are
readable without re-measuring, and add CurrentSpeedScenarioGateTest,
which fails the build if any scenario lacks a SMOKE threshold. No
behaviour change to the run itself -- same six scenarios, same order.
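
The coverage check can be sketched as follows, assuming the scenario names and the thresholds are available as a list and a map. The helper name and the placeholder threshold value are hypothetical; only the scenario names and the long-token 10.0 ms figure come from this branch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a "every scenario has a threshold" gate.
final class ThresholdCoverageSketch {
    /** Returns the scenarios that have no configured threshold, in declaration order. */
    static List<String> ungated(List<String> scenarios, Map<String, Double> maxAvgMillis) {
        List<String> missing = new ArrayList<>();
        for (String name : scenarios) {
            if (!maxAvgMillis.containsKey(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> scenarios = List.of("engine-simple", "long-token");
        Map<String, Double> thresholds = new LinkedHashMap<>();
        thresholds.put("engine-simple", 1.0); // placeholder, not the real calibrated value

        // Before the fix: long-token is measured but never gated.
        System.out.println("ungated before: " + ungated(scenarios, thresholds));

        thresholds.put("long-token", 10.0); // the SMOKE threshold added in this commit
        System.out.println("ungated after: " + ungated(scenarios, thresholds));
    }
}
```

A test that asserts the returned list is empty fails the build as soon as someone adds a scenario without a threshold.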
…Javadocs

Sweep the references the three removed benchmark mains (FullCvBenchmark,
GraphComposeBenchmark, ScalabilityBenchmark) left behind, and correct two
docs that overstated what the code does:

- ab-bench.ps1 no longer parses the retired 04/05/06 logs (they are no
  longer produced); it reads the surviving stress log, and the
  thread-scaling series still comes from the current-speed JSON report.
- benchmarks/README.md "Files in this module": split a row that had been
  merged onto one line and restore the blank line before "## Running".
- docs/operations/performance.md: mark it a frozen v1.4 snapshot and note
  the retired suites/mains so it no longer contradicts benchmarks.md.
- docs/operations/benchmarks.md and the run-benchmarks.ps1 synopsis: note
  that steps 04-06 were retired, so the 03 -> 07 numbering gap is intentional.
- SvgJmhBenchmark Javadoc: describe the heart-path parse accurately
  (tokenize / cubic-line lowering / viewBox normalization); the fixture
  has no arc command, so the old "arc->cubic" wording was wrong.
- BenchmarkMedianTool Javadoc: note that stages[] is not carried into the
  median aggregate, so a median-vs-median diff shows no stage deltas.
…e Canonical fits

The comparative-diff table printed the Library column as %-20s, but "GraphCompose Canonical" is 22 chars, so it overflowed the field and pushed the | separator right, misaligning that row. Widen to %-24s (matching the comparative run table in ComparativeBenchmark) and extend the rule to 56 so the column fits the longest library label.
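
The overflow is easy to reproduce in isolation. The sketch below uses a hypothetical two-column row format (the real table has more columns) to show where the separator lands for a short and a 22-character label under each width.

```java
// Illustration of the column-width overflow; the row layout is an assumption.
final class ComparativeRowFormatSketch {
    static String row(String format, String library, String value) {
        return String.format(format, library, value);
    }

    public static void main(String[] args) {
        String[] libraries = {"GraphCompose Canonical", "iText"};
        for (String format : new String[] {"%-20s | %s", "%-24s | %s"}) {
            System.out.println("format: " + format);
            for (String library : libraries) {
                String line = row(format, library, "5.0");
                // A left-justified width is a minimum, so a longer label pushes '|' right.
                System.out.println(line + "   (separator at index " + line.indexOf('|') + ")");
            }
        }
    }
}
```

With %-20s the separator sits at a different index for the two labels; with %-24s both rows align.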
…use gate

The suite had no image coverage at all: no bench or probe placed a raster image, so the embed/scale hot path and PdfImageCache dedup could regress unmeasured.

ImageBenchmarkFixtures builds deterministic in-code synthetic PNGs (a shared demoImage plus distinctImage(i)), so no binary asset is committed. ImageCacheOperatorProbe places one image N times vs N distinct images and counts embedded image XObjects + Do draws (same image x30 -> 1 embed/30 draws; 30 distinct -> 30/30). ImageCacheGateTest turns that reuse invariant into a build-failing assertion (1 embed for the same image regardless of placements; N for N distinct), so a dedup regression cannot pass silently. ImageJmhBenchmark renders a 12-image thumbnail document, driving the ImageIO decode + bicubic rescale + embed path that nothing else exercised.
… render-operator gate

The deterministic probes produce machine-independent counts, but nothing asserted on them and the benchmarks module's tests never ran in CI (perf-smoke used -DskipTests; the root verify skips the standalone module), so an operator-count or cache regression passed CI silently.

Add a 'Run deterministic benchmark gates' step to the PR-triggered perf-smoke job (./mvnw -f benchmarks/pom.xml test) so the image-cache reuse gate, the scenario/threshold coverage gate, and the diff-tooling tests now fail the build on a structural regression. Refactor RenderOperatorProbe to expose countOperators(...) and add RenderOperatorGateTest, which pins the F5 coalescing invariant: a long single-style paragraph keeps Tf/colour ops below the per-line text-draw count, so a regression back to per-span font ops breaks the test. Probe console output is unchanged.
Every JMH bench reported steady-state (warm) timings, which is what a long-lived server pays; nothing measured the JIT-cold first render a short-lived CLI invocation or a serverless cold-start actually pays.

ColdStartJmhBenchmark uses Mode.SingleShotTime with @Warmup(0)/@Measurement(1)/@Fork(10) to sample the cold first render across ten fresh JVMs, over the same workloads as the warm benches (an inline engine doc, InvoiceTemplateV1, the ModernProfessional CV preset). Specs and templates are built in @Setup so the measured shot is the cold render path, not fixture assembly. Observed cold first render ~370-510 ms/op locally, vs the warm ms-scale numbers -- the headline metric for CLI/Lambda consumers.
…chmark

The comparative benchmark only rendered a trivial 3-line invoice -- too small to show GraphCompose's standing on real multi-page work (all three libraries finished in fixed overhead).

Add a 'business report' tier (title + 40-row line-item table + prose) rendered with equivalent content across all three: GraphCompose via the public pageFlow DSL with a repeating table header; iText via PdfPTable with setHeaderRows(1); JasperReports via a datasource-driven detail band with a repeating column header, the prose bound to a parameter and rendered through a stretch-height text field both before and after the table so all three lay out the same text. The small invoice stays as the fixed-overhead baseline; output prints two labelled scenario tables and the report carries a row per (library, scenario). README notes the Jasper fill-vs-build measurement boundary.

Local report numbers: GraphCompose 5.1ms/0.87MB, iText 2.8ms/4.97MB, JasperReports 9.3ms/2.51MB -- GraphCompose is mid on time but allocates ~5.7x less than iText.
The suite rendered only small documents; nothing measured end-to-end render of a genuinely large multi-page table (TablePaginationAllocProbe covers layout-compile allocation only, not render). LargeTableJmhBenchmark renders a priced 5-column table parameterized over 100/500/1000 rows, with the header repeating on every page, so the large-table pagination + render scaling trend is visible. Observed ~9 / ~32 / ~77 ms/op locally. JMH full/on-demand, no CI gate.
The endurance and stress harnesses only check that sustained rendering stays stable and under a heap ceiling; nothing reported how much garbage a single render churns -- the driver of GC pressure for a high-throughput server.

AllocationRateProbe renders many warm documents of two realistic templates (invoice, proposal) and reports warm per-document allocation (ThreadMXBean current-thread bytes/doc, a deterministic A/B signal) plus the JVM garbage collections those renders triggered (count + time via GarbageCollectorMXBean, advisory). Observed ~3.9 MB/doc (invoice) and ~3.8 MB/doc (proposal), ~1 GC per ~18 renders. No src/main changes.
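
For readers unfamiliar with the per-thread allocation counter, here is a minimal sketch of the measurement technique, assuming a HotSpot JVM that exposes com.sun.management.ThreadMXBean. The workload is a stand-in byte-array allocation, not a real render, and the class is not the actual AllocationRateProbe.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;

// Sketch of current-thread allocation measurement; the workload is a stand-in.
final class AllocationProbeSketch {
    static volatile byte[] sink; // keeps the stand-in allocation from being optimized away

    /** Bytes allocated by the current thread while running the given work. */
    static long allocatedBytes(Runnable work) {
        java.lang.management.ThreadMXBean base = ManagementFactory.getThreadMXBean();
        if (!(base instanceof com.sun.management.ThreadMXBean)) {
            throw new UnsupportedOperationException("per-thread allocation not available");
        }
        com.sun.management.ThreadMXBean bean = (com.sun.management.ThreadMXBean) base;
        if (!bean.isThreadAllocatedMemorySupported()) {
            throw new UnsupportedOperationException("per-thread allocation not supported");
        }
        bean.setThreadAllocatedMemoryEnabled(true); // enable explicitly rather than assume
        long id = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(id);
        work.run();
        return bean.getThreadAllocatedBytes(id) - before;
    }

    /** Median of an odd number of samples. */
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        long[] samples = new long[11];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = allocatedBytes(() -> sink = new byte[64 * 1024]); // stand-in "render"
        }
        System.out.println("median bytes/op: " + median(samples));
    }
}
```

The counter tracks bytes allocated by one thread rather than heap occupancy, so it is not affected by when the collector happens to run; that is why it works as an A/B signal while GC counts stay advisory.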
…mparative run

After all measurement, ComparativeBenchmark writes one rendered PDF per library and scenario (graphcompose/itext/jasper x small/report) under target/benchmarks/comparative/samples/, so the exact documents the benchmark measured can be opened and inspected visually -- you can see what each library actually rendered.

The dump runs outside the measured region (after the report is written), so it cannot affect the timing or allocation numbers.
…orts and rival libraries

Both the comparative report table and the large-table bench used autoColumns (content-width), so the GraphCompose table hugged its text while iText (setWidthPercentage 100) and JasperReports (full-column-width cells) filled the page. The comparative documents were therefore not layout-equivalent, and a content-width production-scale table is unrealistic.

Use equal fixed columns summing to the usable page width (page width minus the L/R margins), matching the rival libraries and real report layout. Comparative timing is unchanged (~4.3 ms / 0.87 MB for GraphCompose); the sample-dump PDFs now show equivalent full-width tables across all three libraries.
…ks/logs, fix a stale README row

The Jasper report cells spanned 552 of the 555pt column; the last cell now absorbs the remainder so the table fills the full column width like GraphCompose and iText.
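
The remainder handling can be sketched as below. The 555pt usable width comes from the commit message; the six-column count is an assumption chosen for illustration (an integer split of 555 into six gives 6 x 92 = 552), and the helper name is hypothetical.

```java
import java.util.Arrays;

// Sketch: equal integer column widths, with the last column absorbing the remainder.
final class EqualColumnsSketch {
    static int[] equalColumns(int usableWidth, int columns) {
        int[] widths = new int[columns];
        int base = usableWidth / columns;                  // integer division drops the remainder
        Arrays.fill(widths, base);
        widths[columns - 1] += usableWidth - base * columns; // give the leftover to the last cell
        return widths;
    }

    public static void main(String[] args) {
        int[] widths = equalColumns(555, 6);
        System.out.println(Arrays.toString(widths)); // [92, 92, 92, 92, 92, 95]
        System.out.println("total = " + Arrays.stream(widths).sum());
    }
}
```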

Add benchmarks/logs/ to .gitignore: the benchmarks logback config writes a relative logs/ directory (now produced by the new benchmark-gate test step), and the root-anchored /logs/ rule did not cover it. README 'Files in this module': ComparativeBenchmark no longer renders through openHTMLToPDF -- describe the two tiers plus the sample dump.
…tive

openHTMLtoPDF 1.0.10 (the declared version) targets PDFBox 2.x and fails at runtime against the PDFBox 3.x GraphCompose uses (PDType1Font.COURIER_BOLD_OBLIQUE and the other Standard-14 static fields were removed in PDFBox 3.x), so it cannot share GraphCompose's classpath and no PDFBox-3-compatible openhtmltopdf release exists yet. Document this in the comparative note so the exclusion is a known, reasoned decision rather than an oversight.
The measurement-count probe's text fixtures were all ASCII-Latin, so the distinct-width-request / repeat-rate counters never reflected a high-glyph-diversity non-ASCII workload.

Add an accented-Latin (Latin-1) scenario -- varied diacritic words (cafe/Genève/Größe/coração/fjörð...) covered by Standard-14 Helvetica -- alongside long-text/long-token/large-table. Observed 37 distinct width requests vs 32 for the repeated-ASCII long-text, and a lower repeat-rate. True CJK/Cyrillic would need an embedded font (noted in the fixture comment).
Record what gates every PR (the perf-smoke smoke run with absolute thresholds + the deterministic benchmark gate tests) and what is intentionally on-demand/local only: the JMH benches (a per-PR forked run of the whole suite is too slow for the signal) and the relative BenchmarkVerdictTool gate (no static smoke baseline is committed because absolute timings are machine-specific and would false-positive across machines; use a local same-machine A/B median instead). Makes the gate scope a stated design decision.
…e suite branch

# Conflicts:
#	benchmarks/README.md
The deterministic v1.8 vector-paint probe (VectorRenderOperatorProbe) printed its operator counts but nothing asserted on them, so a regression in the gradient/alpha render branches would pass CI silently.

Refactor it to expose countOperators(PaintMode) and add VectorRenderOperatorGateTest pinning the per-mode cost structure: a flat fill emits no shading/alpha/clip (the fast path), a linear gradient emits one shading + one clip per shape, and a translucent fill sets one ExtGState per shape. The perf-smoke CI gate step (mvnw -f benchmarks/pom.xml test) now picks it up, extending the deterministic gate to the v1.8 vector render path. Probe console output is unchanged.
…ed smoke harness

All six current-speed scenarios were text/table, so no v1.8 vector feature was under the per-PR perf gate; a regression in the chart/SVG-icon/gradient render path would not trip it.

Add a 'vector-rich' scenario (bar + pie charts, 8 SVG icons, a gradient accent path, reusing ChartBenchmarkFixtures / SvgBenchmarkFixtures) with a SMOKE threshold (20.0 ms / 256 MB, ~3.5x the observed ~5.7 ms / ~86 MB). CurrentSpeedScenarioGateTest enforces the threshold exists, and the perf-smoke gate now catches a regression in the vector render path the way it already gates text and tables.
BenchmarkMedianTool medianed only latency and throughput, so a median-vs-median BenchmarkDiffTool run lost the compose/layout/render stage attribution -- the deterministic signal the stage breakdown adds, dropped on exactly the noise-reduced path used for real decisions.

Add aggregateCurrentSpeedStages (medians composeMillis/layoutMillis/renderMillis/totalMillis per scenario, paralleling the latency aggregation), carry stages[] in CurrentSpeedMedianReport, and emit a stages CSV when present. Lenient: stages[] is optional (CurrentSpeedBenchmark emits it only for runs with enough iterations), so it aggregates only when every source run carries a matching stages[] and is omitted otherwise -- no throw on the optional field. New BenchmarkMedianToolTest case asserts the medianed stages; the existing no-stages cases still pass.
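
A reduced sketch of the lenient aggregation, assuming each source run is a map from scenario name to a single stage value (the real tool medians four fields per scenario) and that a run without stages[] is represented as null. The names are hypothetical and do not mirror BenchmarkMedianTool's actual signatures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical reduction of the stage-median logic to one value per scenario.
final class StageMedianSketch {
    static double median(List<Double> values) {
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1
                ? sorted.get(n / 2)
                : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    /** Empty when any run lacks stages (null) or the scenario sets differ; never throws for that. */
    static Optional<Map<String, Double>> aggregate(List<Map<String, Double>> runs) {
        if (runs.isEmpty()) {
            return Optional.empty();
        }
        Set<String> scenarios = null;
        for (Map<String, Double> run : runs) {
            if (run == null) {
                return Optional.empty();
            }
            if (scenarios == null) {
                scenarios = run.keySet();
            } else if (!scenarios.equals(run.keySet())) {
                return Optional.empty();
            }
        }
        Map<String, Double> medians = new LinkedHashMap<>();
        for (String scenario : scenarios) {
            List<Double> samples = new ArrayList<>();
            for (Map<String, Double> run : runs) {
                samples.add(run.get(scenario));
            }
            medians.put(scenario, median(samples));
        }
        return Optional.of(medians);
    }

    public static void main(String[] args) {
        List<Map<String, Double>> runs = List.of(
                Map.of("vector-rich", 5.9), Map.of("vector-rich", 5.5), Map.of("vector-rich", 5.7));
        System.out.println(aggregate(runs));
    }
}
```

Returning an empty Optional rather than throwing keeps the stage data optional, so older reports without stages[] still aggregate their latency and throughput.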
… / peak heap

The "How to read a report" section documented field names (avgMs / p50Ms / peakMB) that the emitted JSON never uses — the real keys are avgMillis / p50Millis / p95Millis / maxMillis / docsPerSecond / avgKilobytes / peakHeapMb. Correct them and fix two misleading descriptions: docsPerSecond is a derived 1000/avgMillis reciprocal of latency (real throughput is the separate throughput[] section), not a measured rate; peakHeapMb is a GC-noisy post-warmup heap delta (advisory), not an absolute MemoryMXBean reading. Also document the stages[] array. Doc-only; the JSON schema is unchanged.
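
The distinction between the derived docsPerSecond figure and measured throughput can be shown with a small worked example. The method names and input numbers are illustrative, not values from a real report.

```java
// Worked example: derived reciprocal vs. measured throughput. Inputs are illustrative.
final class DocsPerSecondSketch {
    /** The derived figure: a reciprocal of average single-document latency. */
    static double derivedDocsPerSecond(double avgMillis) {
        return 1000.0 / avgMillis;
    }

    /** What a throughput run measures: documents completed per wall-clock second. */
    static double measuredDocsPerSecond(long documents, double elapsedSeconds) {
        return documents / elapsedSeconds;
    }

    public static void main(String[] args) {
        // A 5 ms average latency implies 200 docs/s on one thread.
        System.out.println("derived:  " + derivedDocsPerSecond(5.0));
        // A multi-threaded run can complete more than that in the same wall-clock time.
        System.out.println("measured: " + measuredDocsPerSecond(1000, 2.5));
    }
}
```

The two numbers answer different questions, which is why the README now points readers to the throughput[] section for real throughput.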
…d/donut/axis-min)

Chart coverage was only vertical grouped-bar + line + full pie; the horizontal-transpose, stacked, donut, and non-zero value-axis-min resolver branches had no number, so a regression in any of them would go unmeasured.

Add horizontalBarSpec / stackedBarSpec / axisMinBarSpec / donutSpec to ChartBenchmarkFixtures and a ChartVariantJmhBenchmark that renders each one (@Param over the seven variants) so every ChartLayoutResolver branch has its own render-time row instead of being blended into the three-chart total.
… probe + gate

VectorRenderOperatorProbe covered only the three fill modes (flat/gradient/alpha); the stroke and dash render branches had no operator coverage, so a regression there would pass silently.

Add STROKED and DASHED paint modes (counting S/s stroke and d dash-array operators) and pin them in VectorRenderOperatorGateTest: a stroked path strokes once per shape and sets no dash, a dashed stroke sets a dash array once per shape and still strokes, and a flat fill strokes/dashes never. Observed flat S=0/d=0, stroked S=40/d=0, dashed S=40/d=40.
DemchaAV added 5 commits June 15, 2026 16:01
… hard fail

The smoke perf gate hard-failed on peakHeapMb -- a GC-timing-noisy used-heap delta -- so a GC blip could redden a PR on a non-regression. BenchmarkVerdictTool already treats heap as advisory; align them. evaluatePerformanceGate now fails only on avgMillis and reports any peak-heap breach as an advisory note (passed stays true). The deterministic memory signal remains the allocation-bytes probes. The perf-gate test is updated accordingly (treatsPeakHeapAsAdvisoryNotAGateFailure).
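
The gate behaviour described here can be sketched as follows. The Verdict type and the method signature are hypothetical, not the real evaluatePerformanceGate; the 25 ms / 256 MB limits used in the demo are the vector-rich SMOKE thresholds quoted elsewhere in this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: latency is a hard gate, peak heap is advisory only.
final class SmokeGateSketch {
    static final class Verdict {
        final boolean passed;
        final List<String> failures = new ArrayList<>();
        final List<String> advisories = new ArrayList<>();

        Verdict(boolean passed) {
            this.passed = passed;
        }
    }

    static Verdict evaluate(String scenario, double avgMillis, double peakHeapMb,
                            double maxAvgMillis, double maxPeakHeapMb) {
        boolean latencyOk = avgMillis <= maxAvgMillis;
        Verdict verdict = new Verdict(latencyOk); // only latency decides pass/fail
        if (!latencyOk) {
            verdict.failures.add(scenario + ": avgMillis " + avgMillis + " > " + maxAvgMillis);
        }
        if (peakHeapMb > maxPeakHeapMb) {
            // GC-timing-noisy signal: reported, but never flips 'passed'.
            verdict.advisories.add(scenario + ": peakHeapMb " + peakHeapMb + " > " + maxPeakHeapMb);
        }
        return verdict;
    }

    public static void main(String[] args) {
        Verdict heapBlip = evaluate("vector-rich", 5.7, 300.0, 25.0, 256.0);
        System.out.println("passed=" + heapBlip.passed + " advisories=" + heapBlip.advisories);
        Verdict slow = evaluate("vector-rich", 30.0, 86.0, 25.0, 256.0);
        System.out.println("passed=" + slow.passed + " failures=" + slow.failures);
    }
}
```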
… benches

Sparklines were measured only inside MixedShowcaseJmhBenchmark, and the vector paint modes only at the operator-count level (VectorRenderOperatorProbe), never as render time.

SparklineRampJmhBenchmark renders a rich paragraph of N inline sparklines (@Param 8/32/128) so the per-sparkline inline-fragment cost scales visibly. VectorPaintJmhBenchmark renders 40 blob paths flat/gradient/alpha (@Param) for the render-time complement to the operator probe. Observed sparkline ramp ~2.6/5.4/17.9 ms; flat ~1.9 / gradient ~3.5 / alpha ~1.7 ms.
…ench list

The "Strict JMH layer" section listed only 3 of the now-12 JMH benches and never stated the fork choice. Refresh the list (steady-state render / parameterised scaling ramps / SVG micro-benches / single-shot cold-start) and document that @Fork(1) is the deliberately fast on-demand default -- pass -f N for a cross-fork error estimate when quoting a number.
…fixtures to @Setup

README: scope the @Fork(1) note to steady-state benches (ColdStart is single-shot @Fork(10)) and correct the smoke scenario count (5 -> 7). CurrentSpeedBenchmark class Javadoc now lists all seven scenarios (adds long-token and vector-rich). CHANGELOG Internal gains notes for the render-hot-path coverage (image/cold-start/comparative tier + sample dump/large-table/GC-churn/accented-Latin) and the CI-run deterministic gates (+ vector-rich scenario, median stages[], advisory peakHeapMb). VectorPaintJmhBenchmark builds its paint objects in @Setup like the sibling benches instead of inside the measured method. BenchmarkMedianToolTest asserts the no-stages lenient path omits stages without throwing.
… log dropped median stages

Review follow-ups on the suite:

- The vector-rich current-speed scenario parsed its SVG icon and built its gradient inside the per-iteration render method, so it measured a re-parse the other (pre-built-fixture) scenarios don't; hoist both to instance fields. Widen its SMOKE threshold 20 -> 25 ms (charts + SVG icons vary more than the text scenarios) and document the observed ~5-6 ms basis.

- ImageBenchmarkFixtures.distinctImage relied on modular gradient/line colours that can repeat at large indices, risking duplicate fingerprints; add a seed-positioned 1px marker so each index < native width yields byte-distinct content, keeping the distinct-embed gate robust.

- BenchmarkMedianTool now logs a note when it omits stages[] because the source runs' stage scenario sets differ, instead of dropping them silently.
@DemchaAV

Owner Author

Review summary

High-effort review of the branch (7 finder angles → recall-biased verification). No correctness bugs — 10 PLAUSIBLE notes, 0 CONFIRMED. Applied four small follow-ups:

  • vector-rich scenario — parse the SVG icon and build the gradient once (instance fields) instead of inside the per-iteration render method, so it times the render rather than a re-parse; widened its SMOKE threshold 20 → 25 ms (charts + SVG icons vary more than the text scenarios) and documented the observed ~5–6 ms basis.
  • ImageBenchmarkFixtures.distinctImage — added a seed-positioned 1px marker so each index yields byte-distinct content (the modular gradient/line colours alone could repeat at large indices), keeping the distinct-embed gate robust.
  • BenchmarkMedianTool — logs a note when it omits stages[] because the source runs' stage scenario sets differ, instead of dropping them silently.

Confirmed intentional, left as-is: peakHeapMb advisory in the smoke gate (GC-timing noisy; the deterministic memory signal is the allocation probes); ColdStart's @Setup (measures the cold render path, by design); the operator probe's whole-page counting (controlled fixture + exact ==N assertions are self-protecting).

Verification: ./mvnw verify -pl . → 1380 tests, ./mvnw -f benchmarks/pom.xml verify → 39 tests, both BUILD SUCCESS.

DemchaAV added 2 commits June 15, 2026 18:12
The comparative pinned iText 5.5.13.3 (EOL ~2021, the monolithic com.itextpdf.text API). Upgrade to iText Core 9.6.0 (current) and rewrite benchmarkIText / benchmarkITextReport to the kernel + layout API (PdfDocument + layout Document + Table/Cell with useAllAvailableWidth and repeating header cells); relabel the rows "iText 9".

Against the current iText engine the picture changes: on the multi-page report GraphCompose now leads on both time (~5.0 vs ~12.5 ms) and allocation (~0.88 vs ~2.95 MB) -- the old iText-5 time advantage was against a 2020 engine. No PDFBox conflict (iText is its own classpath island and is still excluded from the shade jar). Sample dump confirms a valid 2-page iText report.
Render the same title + prose + N-row table through GraphCompose, iText 9, and JasperReports at N = 40 / 200 / 1000 instead of a single 40-row size, and print a per-size GraphCompose-advantage ratio (time and heap vs each library) so the scaling trend is measured rather than assumed. Dump the smallest and largest report per library as sample PDFs.

The heap column and its ratios now enable per-thread allocation tracking explicitly (failing loudly if unsupported), and the advantage ratios are computed from full-precision averages rather than the rounded report rows.
@DemchaAV DemchaAV merged commit ce2ecdb into develop Jun 15, 2026
11 checks passed
@DemchaAV DemchaAV deleted the chore/benchmark-suite branch June 15, 2026 17:55