Bench 31132 base #182

Open
andrewtoth wants to merge 56 commits into master from bench-31132-base

Conversation

@andrewtoth
Collaborator

Let's get a baseline!

willcl-ark and others added 30 commits April 30, 2026 04:28
Adds build configuration, benchmarking CI workflows, Python
dependencies, plotting tools, and documentation for benchcoin.

Co-authored-by: David Gumberg <davidzgumberg@gmail.com>
Co-authored-by: Lőrinc <pap.lorinc@gmail.com>
- Fix empty chart: use get_chart_data() instead of to_dict() so JS
  filters can match config strings ("450", "32000") instead of objects
- Capture machine specs on self-hosted runner during build job and pass
  via --machine-specs flag to nightly append, instead of detecting on
  the ubuntu-latest publish runner
Run LogParser + PlotGenerator from bench/analyze.py during artifact
copying to produce static PNG charts from debug.log files. This
pre-generates the same 11 chart types that were previously rendered
client-side via JavaScript.

Changes to report.py:
- Import HAS_MATPLOTLIB, LogParser, PlotGenerator from bench.analyze
- _copy_network_artifacts: generate plots after each debug.log with
  "{network}-{name}" prefix (e.g. "450-uninstrumented-pr")
- _copy_artifacts: generate plots for single-directory mode, including
  when input_dir == output_dir
- _prepare_graphs_data: add "plots" key with relative paths to PNGs
- generate(): reorder to copy artifacts before HTML rendering so
  _prepare_graphs_data can find the generated plot files

Plot generation is guarded by HAS_MATPLOTLIB for graceful fallback
when matplotlib is unavailable.
The pr-report.html template previously included debug-log-charts.html
which fetched multi-hundred-MB debug.log.gz files in the browser,
decompressed them with pako.js, parsed every line, and rendered 11
Plotly charts client-side. This made report pages unresponsive.

Now that report.py pre-generates the charts as static PNGs:
- pr-report.html: replace the debug-log-charts.html include with an
  img loop over graph.plots, using loading="lazy"
- debug-log-charts.html: delete (344 lines of client-side JS)
- base.html: remove pako.js and Plotly CDN scripts (both are
  independently included by pr-chart.html and nightly-chart.html
  via their own script tags)

The debug.log download link is preserved.
Rewrite to document the TOML config + matrix entry workflow,
removing stale references to the old two-commit comparison CLI,
--datadir requirement, profiles, and BENCH_DATADIR env var.
Debug logs were consuming 388MB on gh-pages. They are already uploaded
as CI artifacts with 90-day retention during benchmark runs.

- Remove gzip compression and copying of debug logs in report generation
- Remove debug log extraction in publish-results workflow
- Replace per-graph "Download debug.log" links with a single link to
  the CI run page where artifacts can be downloaded
- Keep matplotlib plot generation from debug logs (plots are still
  generated during report phase, just the raw logs aren't published)
The PR comment with result links was posted before GitHub Pages
finished deploying, leading to broken links. Add a wait-for-pages
job that polls for the pages-build-deployment run matching our
exact gh-pages commit, then blocks until it completes.
Manual (workflow_dispatch) runs are now stored separately from scheduled
nightly runs. Scheduled runs still dedup by (date, commit, dbcache) to
handle retries. Manual runs always append, appearing as diamond markers
on the chart alongside the nightly trend line.

Also ruff format.
Manual (workflow_dispatch) runs no longer get a separate "(manual)"
legend entry with diamond markers. They appear as regular points in
the same series trace as scheduled runs.
Adds a separate benchmark job (benchmark-noav) that runs IBD with
-assumevalid=0 to measure full script verification performance.
Uses a dedicated TOML config with uninstrumented-only matrix, and
prefixes artifacts with noav- so the publish workflow can handle
them alongside existing runs.
@github-actions

github-actions Bot commented May 1, 2026

Benchmark Results

Comparison to nightly master (median of last 7 runs):

  • 450 MB: 31 min (nightly median of 7: 43 min, 2026-04-22 to 2026-04-29) → +28.3% faster
  • 32000 MB: 32 min (nightly median of 7: 38 min, 2026-04-22 to 2026-04-29) → +14.3% faster
  • noav-450 MB: 63 min (no nightly baseline)
  • noav-32000 MB: 65 min (no nightly baseline)

View detailed results
View nightly trend chart

andrewtoth and others added 12 commits May 1, 2026 23:26
Introduce CoinsViewOverlay::StartFetching, which maps all input prevouts of a
block to a new m_inputs vector of InputToFetch elements. It returns a ResetGuard
whose lifetime is bound to the block, as are the InputToFetch elements.

Introduce StopFetching to clear the m_inputs vector.
CCoinsViewCache::Reset is made virtual and is overridden in CoinsViewOverlay.
StopFetching is called on Reset, so the InputToFetch objects will not
exceed the lifetime of the block.

Introduce ProcessInput to fetch the utxo of an individual input in m_inputs.
Each caller fetches the input at m_input_head and increments it, so each call
will fetch the next input in the queue.

Fetch coins from the m_inputs vector in FetchCoinFromBase by scanning all inputs
until we discover the input with the correct outpoint.

This is designed deliberately so multiple threads can call ProcessInput independently.

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
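The claim-then-fetch design above can be sketched as follows. This is a minimal, hypothetical model, not the PR's implementation: real Bitcoin Core types (COutPoint, Coin, CoinsViewOverlay, the database read) are replaced with plain ints, while the names StartFetching, ProcessInput, StopFetching, m_inputs, and m_input_head mirror the commit message. The key idea shown is that each ProcessInput call atomically claims the next queue index, so any number of threads can call it independently.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Simplified stand-in for the overlay's per-input fetch state.
struct InputToFetch {
    int outpoint = 0;                 // which prevout to fetch
    std::atomic<bool> done{false};    // set by a worker when coin is ready
    int coin = -1;                    // fetched value (published via `done`)
};

class FetchQueue {
public:
    // StartFetching: map all prevouts of a "block" into m_inputs.
    void StartFetching(const std::vector<int>& prevouts) {
        m_inputs = std::vector<InputToFetch>(prevouts.size());
        for (size_t i = 0; i < prevouts.size(); ++i) m_inputs[i].outpoint = prevouts[i];
        m_input_head.store(0);
    }

    // ProcessInput: any thread claims the next queued input by atomically
    // incrementing m_input_head, then fetches its coin. Returns false once
    // the queue is exhausted.
    bool ProcessInput() {
        const size_t i = m_input_head.fetch_add(1);
        if (i >= m_inputs.size()) return false;
        m_inputs[i].coin = m_inputs[i].outpoint * 10;  // stand-in for a db read
        m_inputs[i].done.store(true, std::memory_order_release);
        return true;
    }

    // StopFetching: clear m_inputs so fetch state never outlives the block.
    void StopFetching() { m_inputs.clear(); }

    std::vector<InputToFetch> m_inputs;
    std::atomic<size_t> m_input_head{0};
};
```

Because fetch_add hands out each index exactly once, no two threads ever fetch the same input, and no coordination beyond the single counter is needed.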
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Inputs spending outputs of an earlier transaction in the same block won't
be in the cache or the db. They also won't be requested by FetchCoinFromBase,
so we can filter them out to avoid wasting time trying to fetch them.

Build an unordered set of seen txids while flattening m_inputs and skip
any prevout whose hash is already in the set. The set is held as a
member so its bucket array is reused across blocks, and cleared once the
input vector is built.

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
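The filtering step can be sketched like this. Everything here is illustrative: txids are plain strings, and the Tx and InputFlattener types are hypothetical stand-ins, but the structure matches the commit: skip any prevout whose creating txid was already seen in this block, and keep the set as a member so its bucket allocation is reused across blocks.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical model: a transaction has a txid and a list of prevout txids.
struct Tx {
    std::string txid;
    std::vector<std::string> prevout_txids;
};

class InputFlattener {
public:
    // Flatten all block inputs, skipping any prevout created by an earlier
    // transaction in the same block (it cannot be in the cache or the db).
    std::vector<std::string> Flatten(const std::vector<Tx>& block) {
        std::vector<std::string> to_fetch;
        for (const Tx& tx : block) {
            for (const std::string& prev : tx.prevout_txids) {
                if (!m_seen_txids.count(prev)) to_fetch.push_back(prev);
            }
            m_seen_txids.insert(tx.txid);  // tx's outputs are now "in-block"
        }
        // clear() destroys elements but typically keeps the bucket array,
        // so holding the set as a member avoids reallocating it per block.
        m_seen_txids.clear();
        return to_fetch;
    }

private:
    std::unordered_set<std::string> m_seen_txids;
};
```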
Provides a worst-case upper bound on the number of inputs that can fit in
a block, so callers (e.g. parallel input prefetching) can pre-allocate
stable storage and rule out reallocation of per-input state.

Cherry-picked from PR bitcoin#9938 (Lock-Free CheckQueue), with MAX_TXINS_PER_BLOCK
renamed to MAX_INPUTS_PER_BLOCK to match the call site.

Co-authored-by: Jeremy Rubin <jeremy.l.rubin@gmail.com>
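The shape of such a worst-case bound is simple arithmetic: divide the block weight limit by the weight of the smallest possible serialized input. The constants below (4,000,000 weight units, a 41-byte minimal input counted at 4 weight per non-witness byte) are general Bitcoin knowledge, not taken from this PR, so the resulting number is illustrative and need not equal the actual MAX_INPUTS_PER_BLOCK constant.

```cpp
#include <cstdint>

// Illustrative arithmetic only -- assumed constants, not this PR's values.
constexpr uint32_t MAX_BLOCK_WEIGHT_ASSUMED = 4'000'000;
// Minimal serialized input: 32-byte txid + 4-byte index + 1-byte empty
// scriptSig length + 4-byte sequence = 41 non-witness bytes, weight 41 * 4.
constexpr uint32_t MIN_INPUT_WEIGHT_ASSUMED = 41 * 4;
constexpr uint32_t MAX_INPUTS_PER_BLOCK_SKETCH =
    MAX_BLOCK_WEIGHT_ASSUMED / MIN_INPUT_WEIGHT_ASSUMED;
```

A compile-time constant like this lets callers size a vector once and never reallocate, which is what makes per-input state addresses stable for the worker threads.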
Prepares for ProcessInput to be called from multiple threads.

This flag acts as a memory fence around InputToFetch::coin. There is no lock
guarding reads and writes of the coin field.
Instead we use the flag's release/acquire semantics to ensure that by the
time the main thread reads the coin, the worker thread has finished
writing it.

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
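The handshake described above can be shown in a few lines. The FetchSlot, WorkerFetch, and MainThreadRead names are hypothetical; what matters is the pairing of a release store on the flag with an acquire load, which makes the plain (non-atomic) write to the coin field visible to the reader without any lock.

```cpp
#include <atomic>
#include <thread>

// Minimal sketch of the release/acquire handshake: the coin field is a
// plain value; the flag's memory ordering is what publishes the write.
struct FetchSlot {
    int coin = 0;                      // plain field, no lock
    std::atomic<bool> fetched{false};  // acts as the memory fence
};

void WorkerFetch(FetchSlot& slot, int value) {
    slot.coin = value;                                    // 1. write the coin
    slot.fetched.store(true, std::memory_order_release);  // 2. then publish
}

int MainThreadRead(FetchSlot& slot) {
    // Spin until published; acquire pairs with the release above, so the
    // subsequent plain read of coin is guaranteed to see the worker's write.
    while (!slot.fetched.load(std::memory_order_acquire)) {}
    return slot.coin;
}
```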
Prepares for ProcessInput to be called from multiple threads.

ProcessInput reads from the base view. For ProcessInput to be safe to call
in parallel on separate threads, the base must not be mutated.
Flush, Sync, and SetBackend can modify the base, so we override them to
call StopFetching before deferring to the base class.

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
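The override-then-defer pattern might look like the following. ViewBase and Overlay are hypothetical simplifications (the real classes are CCoinsViewCache and CoinsViewOverlay with several such methods); the point shown is that every mutating entry point first tears down in-flight fetch state before touching the base.

```cpp
#include <vector>

// Simplified sketch: any operation that may mutate the base view first
// drops the in-flight fetch state, so no worker thread can still be
// reading from a base that is about to change.
class ViewBase {
public:
    virtual ~ViewBase() = default;
    virtual bool Flush() { ++m_flushes; return true; }
    int m_flushes = 0;
};

class Overlay : public ViewBase {
public:
    bool Flush() override {
        StopFetching();            // drop fetch state before mutating base
        return ViewBase::Flush();  // then defer to the base class
    }
    void StopFetching() { m_inputs.clear(); ++m_stops; }

    std::vector<int> m_inputs;
    int m_stops = 0;
};
```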
Add a configuration option for the number of worker threads used for
parallel UTXO input fetching during block connection.

Default is 4 threads, max is 16, 0 disables parallel fetching.
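A resolver for such an option might look like this. ResolveFetchThreads and its sentinel for "unset" are hypothetical names, not the PR's actual option handling; the bounds (default 4, cap 16, 0 disables) are taken from the commit message.

```cpp
#include <algorithm>

// Hypothetical clamp for the worker-thread option described above.
constexpr int DEFAULT_FETCH_THREADS = 4;
constexpr int MAX_FETCH_THREADS = 16;

// `configured` < 0 means the option was not set, so the default applies.
// 0 disables parallel fetching; anything above the cap is clamped to it.
int ResolveFetchThreads(int configured) {
    if (configured < 0) return DEFAULT_FETCH_THREADS;
    return std::min(configured, MAX_FETCH_THREADS);
}
```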
Prepares for ProcessInput to be called from multiple threads.

Introduce a ThreadPool shared pointer member in CoinsViewOverlay; an
externally managed pool can be passed in the constructor.

A global thread pool is used in fuzz harnesses since iterations can happen
faster than the OS can create and tear down thread pools.
This can cause a memory leak when fuzzing.

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
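The ownership shape reads roughly as follows. ThreadPool is an empty stand-in and Overlay2/GlobalFuzzPool are hypothetical names; the sketch shows shared ownership via shared_ptr plus a function-local static that gives every fuzz iteration the same pool instead of creating and destroying one per iteration.

```cpp
#include <memory>
#include <utility>

struct ThreadPool {};  // stand-in for the real pool type

// The overlay shares ownership of an externally managed pool.
class Overlay2 {
public:
    explicit Overlay2(std::shared_ptr<ThreadPool> pool) : m_pool(std::move(pool)) {}
    std::shared_ptr<ThreadPool> m_pool;
};

// One pool for the whole process; each fuzz iteration reuses it, so no
// per-iteration thread creation/teardown (and no leaked pool) occurs.
std::shared_ptr<ThreadPool> GlobalFuzzPool() {
    static auto pool = std::make_shared<ThreadPool>();
    return pool;
}
```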
Leverages the thread pool to fetch inputs on multiple threads, while the overlay
serves inputs on the main thread.

This is a performance improvement over blocking the main thread to fetch inputs.

Co-authored-by: l0rinc <pap.lorinc@gmail.com>
Co-authored-by: sedited <seb.kung@gmail.com>
@github-actions

github-actions Bot commented May 2, 2026

Benchmark Results

Comparison to nightly master (median of last 7 runs):

  • 450 MB: 30 min (nightly median of 7: 44 min, 2026-04-23 to 2026-04-30) → +30.3% faster
  • 32000 MB: 32 min (nightly median of 7: 38 min, 2026-04-23 to 2026-04-30) → +14.3% faster
  • noav-450 MB: 63 min (no nightly baseline)
  • noav-32000 MB: 66 min (no nightly baseline)

View detailed results
View nightly trend chart
