
perf: cut verifier memory — gate storage_logs trace (Boojum-safe) + consolidate WorldDiff maps#109

Draft
0xVolosnikov wants to merge 8 commits into master from vv/memopt-clean


@0xVolosnikov

Contributor

Summary

Three independent optimizations to reduce WorldDiff verifier memory:

  1. Gate storage_logs recording (dual-mode): the per-access trace is recorded
    by default, so Boojum-mode memory is identical to master; opting out saves
    ~270 MiB on large batches.
  2. Consolidate WorldDiff (address, key) maps: merge the three slot-membership
    sets into one bit-flag map, and the value and paid write maps into a single
    StorageWriteEntry map.
  3. Capacity reservation helpers: pre-allocate Vecs from witness counts so they
    do not hit transient reallocation peaks by doubling mid-execution.
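To illustrate item 3: a `Vec` that grows on demand reallocates repeatedly, and on each growth the allocator may have to copy into a new buffer while the old one is still live; a `Vec` reserved once from a known count never reallocates. The sketch below is hypothetical (the `LogEntry` type and the `pushes_with_reallocs` helper are not part of vm2); the PR's real helpers are the `reserve_*` methods it adds, such as `WorldDiff::reserve_storage_log_capacity` and `RollbackableLog::reserve`.

```rust
/// Hypothetical log entry standing in for the real storage-log record type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct LogEntry {
    pub slot: u64,
    pub value: u64,
}

/// Pushes `n` entries and counts how often the Vec's capacity changed.
/// With `expected = None` the Vec grows on demand (amortized doubling);
/// with `Some(count)` capacity is reserved once up front.
pub fn pushes_with_reallocs(n: usize, expected: Option<usize>) -> (Vec<LogEntry>, usize) {
    let mut log: Vec<LogEntry> = Vec::new();
    if let Some(count) = expected {
        // One allocation sized from a (hypothetical) witness count.
        log.reserve(count);
    }
    let mut reallocs = 0;
    let mut last_cap = log.capacity();
    for i in 0..n {
        log.push(LogEntry { slot: i as u64, value: 0 });
        if log.capacity() != last_cap {
            // An (re)allocation happened; on every growth after the first,
            // the old buffer may be copied while both allocations are live.
            reallocs += 1;
            last_cap = log.capacity();
        }
    }
    (log, reallocs)
}
```

If a reservation turns out too small, pushes beyond it simply fall back to normal on-demand growth.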

Impact

  • Worst-case production batch (67912): ~920–952 MiB → ~720 MiB (~200 MiB saved).
  • Cycle-neutral (−0.007% measured).
  • Public API change: WorldDiff::get_storage_state() now returns StorageWriteEntry values, so downstream consumers must project .value.

Changes

Net diff vs master: 7 source files (world_diff.rs, rollback.rs, heap.rs,
vm.rs, tracing.rs, lib.rs, single_instruction_test/heap.rs).
Cargo.lock unchanged; no dependency changes.

@0xVolosnikov 0xVolosnikov requested a review from a team as a code owner July 2, 2026 21:13
@0xVolosnikov 0xVolosnikov requested a review from 0xValera July 2, 2026 21:13
vv-dev-ai and others added 8 commits July 2, 2026 16:19
In zkVM verifier guests where the in-guest heap is tight (768 MiB on
the eravm-airbender-verifier corpus), `WorldDiff::storage_logs` grew
to ~220 MiB on real-world batches and `rollback_storage_logs` added
another ~50 MiB — the largest single-Vec contribution to guest peak
memory. The accumulated trace is consumed only by
`circuit_sequencer_api::sort_storage_access_queries` to derive a
per-slot summary, and that summary can be derived directly from the
existing rollback-aware maps without the per-access trace.

This PR:

* Stops pushing per-access entries to `storage_logs` and
  `rollback_storage_logs` in `read_storage_inner` and `write_storage`.
* Caches initial values on first read in `storage_initial_values`
  (previously only writes populated it; reads went through
  `just_read_storage` which doesn't cache). This is needed because
  downstream summarizers can no longer recover the initial value from
  the storage_logs trace.
* Adds `WorldDiff::committed_reads_at_depth_zero` — a
  `RollbackableSet<(H160, U256)>` that materializes the dedup
  function's `did_read_at_depth_zero` predicate incrementally:
  a slot is added by `read_storage_inner` iff `storage_changes`
  doesn't contain it at the time of read (i.e. no pending write for
  that slot). Rolled back together with the other "committed"
  trackers in `external_rollback`; not rolled back by internal
  `rollback` (matches the storage_logs behavior the dedup observed).
* Public accessors:
  - `WorldDiff::reserve_storage_log_capacity` / `reserve_auxiliary_log_capacity`:
    reserve the inner Vecs from witness counts, avoiding the transient peak of
    a capacity-doubling realloc (old and new buffers are briefly live together).
  - `WorldDiff::committed_reads_at_depth_zero_iter`
  - `WorldDiff::initial_storage_value(contract, key) -> Option<StorageSlot>`
  - `WorldDiff::read_storage_slots_iter`
  - `Heaps::reserve_dynamic_groups` + `VirtualMachine::reserve_dynamic_heap_capacity`
  - `RollbackableLog::reserve`
* Together with the consumer changes in
  matter-labs/eravm-airbender-verifier#18 and the
  zksync-protocol PR (linked from there), per-batch guest peak drops
  from 1.16 GiB to ~700 MiB.
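The `committed_reads_at_depth_zero` rule can be sketched with plain `BTreeMap`/`BTreeSet` collections. This is a simplified model, not the vm2 code: `u64` stands in for `H160`/`U256`, the `MiniDiff` type and snapshot types are hypothetical, and rollback is modeled by cloning state instead of the real journal-based `RollbackableSet`. It shows the two properties described above: a read records the slot only when no write to it is pending, and only external rollback (not internal `rollback`) undoes those records.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Slot = (u64, u64); // (address, key); u64 stands in for H160 / U256

#[derive(Default)]
pub struct MiniDiff {
    /// Pending writes; restored by internal (frame) rollback.
    pub storage_changes: BTreeMap<Slot, u64>,
    /// Slots read while no write to them was pending; external rollback only.
    pub committed_reads_at_depth_zero: BTreeSet<Slot>,
}

/// Hypothetical snapshots: the real code journals changes instead of cloning.
pub struct InternalSnapshot(BTreeMap<Slot, u64>);
pub struct ExternalSnapshot(BTreeMap<Slot, u64>, BTreeSet<Slot>);

impl MiniDiff {
    pub fn read(&mut self, slot: Slot, initial: u64) -> u64 {
        match self.storage_changes.get(&slot) {
            // Pending write: this is not a depth-zero read.
            Some(v) => *v,
            None => {
                self.committed_reads_at_depth_zero.insert(slot);
                initial
            }
        }
    }

    pub fn write(&mut self, slot: Slot, value: u64) {
        self.storage_changes.insert(slot, value);
    }

    pub fn internal_snapshot(&self) -> InternalSnapshot {
        InternalSnapshot(self.storage_changes.clone())
    }

    /// Internal rollback restores writes but keeps recorded reads.
    pub fn rollback(&mut self, s: InternalSnapshot) {
        self.storage_changes = s.0;
    }

    pub fn external_snapshot(&self) -> ExternalSnapshot {
        ExternalSnapshot(
            self.storage_changes.clone(),
            self.committed_reads_at_depth_zero.clone(),
        )
    }

    /// External rollback restores both writes and recorded reads.
    pub fn external_rollback(&mut self, s: ExternalSnapshot) {
        self.storage_changes = s.0;
        self.committed_reads_at_depth_zero = s.1;
    }
}
```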

## Breaking changes (intentional, want feedback)

`storage_log_queries()` and `storage_logs_after()` now return empty
slices in steady state. The only in-repo consumer
(`circuit_sequencer_api::sort_storage_access_queries` via `vm_fast`
and `vm_latest`) is rewired in the linked PRs. Externally, anyone
relying on the per-access trace for witness generation will be
affected.

If we want to land this without breaking existing users, the
`storage_logs` accumulation should be gated by a `Settings` flag
(opt-out) or by a constructor variant. Happy to add that based on
review feedback.

## Status

Draft, posted for discussion. Functional on the eravm-airbender-verifier
corpus end-to-end through `verify()` — output matches the original
`sort_storage_access_queries` count exactly (10729 entries on batch
67901, 8817 on batch 67911) thanks to the
`committed_reads_at_depth_zero` predicate matching the dedup's
`did_read_at_depth_zero` semantics.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The prior commit removed the per-access `storage_logs` / `rollback_storage_logs`
trace unconditionally to save ~270 MiB in the Airbender re-execution verifier.
That breaks any consumer that builds an in-circuit storage argument from
`storage_log_queries()` (Boojum witness generation via
`sort_storage_access_queries`), and it left 5 storage-log tests failing.

Make recording configurable instead:

- Add `WorldDiff::set_record_storage_logs(record)` and a `skip_storage_logs`
  flag (default `false` = recording ON, preserving the pre-existing Boojum
  behavior). `read_storage_inner` / `write_storage` gate the trace pushes on it.
- Re-execution verifiers with no in-circuit storage argument (Airbender) call
  `set_record_storage_logs(false)` to derive the deduplicated set from
  `committed_reads_at_depth_zero` + `storage_changes` and drop the trace cost.

All 54 lib tests pass, including the storage-log trace tests (restored under
the default record mode) plus a new test locking the opt-out path.
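A minimal sketch of this gating pattern, assuming a simplified diff with a single trace `Vec`. The setter `set_record_storage_logs`, the `skip_storage_logs` flag, and `storage_log_queries()` are named in the PR; the `TracedDiff` and `LogQuery` types and the method bodies are hypothetical stand-ins. Because the flag defaults to `false`, existing (Boojum) consumers keep receiving the full trace.

```rust
/// Hypothetical per-access record; the real trace type differs.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct LogQuery {
    pub slot: (u64, u64),
    pub value: u64,
    pub is_write: bool,
}

#[derive(Default)]
pub struct TracedDiff {
    /// `false` by default: recording ON, matching the pre-existing behavior.
    skip_storage_logs: bool,
    storage_logs: Vec<LogQuery>,
}

impl TracedDiff {
    pub fn set_record_storage_logs(&mut self, record: bool) {
        self.skip_storage_logs = !record;
    }

    pub fn read_storage(&mut self, slot: (u64, u64), value: u64) -> u64 {
        if !self.skip_storage_logs {
            self.storage_logs.push(LogQuery { slot, value, is_write: false });
        }
        value
    }

    pub fn write_storage(&mut self, slot: (u64, u64), value: u64) {
        if !self.skip_storage_logs {
            self.storage_logs.push(LogQuery { slot, value, is_write: true });
        }
        // The write itself would be applied to the storage maps here.
    }

    pub fn storage_log_queries(&self) -> &[LogQuery] {
        &self.storage_logs
    }
}
```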

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Doc-only + formatting, no behavior change (54 lib tests pass):
- add missing backticks around code identifiers; reflow a doc line so a leading
  '+' isn't parsed as a markdown list bullet (clippy doc lints)
- rustfmt: collapse a method-chain and wrap a long test assertion

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…#105)

Reduces `WorldDiff` memory on large mainnet batches (consumer:
matter-labs/eravm-airbender-verifier#18) by removing duplicated
`(address, key)` keys across its maps. **Stacked on #104**
(`vv/memopt-on-popzxc`).

> **Scope note:** this PR now contains **both** consolidation steps —
the experimental "Group A" (#106) was merged into this branch, so it's
no longer separate. Both are described below.

## Changes
- **Group B — merge the three membership sets** (`read_storage_slots`,
`written_storage_slots`, `committed_reads_at_depth_zero`) into one
`slot_flags: RollbackableMap<(H160,U256), u8>` of bit flags.
External-rollback semantics preserved; public
`committed_reads_at_depth_zero_iter` kept (now filters the flag).
- **Group A — merge the two internal-rollback write maps**
(`storage_changes: U256` + `paid_changes: u32`) into one
`storage_writes: RollbackableMap<(H160,U256), StorageWriteEntry { value,
paid }>`. `transient_storage_changes` left separate (distinct keyspace).
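The Group B layout can be sketched as one `BTreeMap` from slot to a `u8` of bit flags. `SLOT_COMMITTED_READ_Z0` is the name used in a later commit of this PR; `SLOT_READ`, `SLOT_WRITTEN`, the `SlotFlags` wrapper, and the plain (non-rollbackable) map are assumptions for illustration. One key now carries all three memberships, and the old per-set iterators become filters over the flags.

```rust
use std::collections::BTreeMap;

type Slot = (u64, u64); // (address, key); u64 stands in for H160 / U256

pub const SLOT_READ: u8 = 1 << 0; // hypothetical name
pub const SLOT_WRITTEN: u8 = 1 << 1; // hypothetical name
pub const SLOT_COMMITTED_READ_Z0: u8 = 1 << 2;

#[derive(Default)]
pub struct SlotFlags {
    flags: BTreeMap<Slot, u8>,
}

impl SlotFlags {
    /// OR `bits` into the slot's flags, creating the entry if needed.
    pub fn add(&mut self, slot: Slot, bits: u8) {
        *self.flags.entry(slot).or_insert(0) |= bits;
    }

    pub fn has(&self, slot: Slot, bit: u8) -> bool {
        self.flags.get(&slot).is_some_and(|f| f & bit != 0)
    }

    /// Replacement for an old per-set iterator: filter entries on one flag.
    pub fn iter_with(&self, bit: u8) -> impl Iterator<Item = Slot> + '_ {
        self.flags
            .iter()
            .filter(move |(_, f)| **f & bit != 0)
            .map(|(slot, _)| *slot)
    }
}
```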

## ⚠️ Public API change
`WorldDiff::get_storage_state()` now returns `&BTreeMap<_,
StorageWriteEntry>` (was `…, U256>`), and `StorageWriteEntry` is
re-exported from the crate root. **Direct callers must project
`.value`.** The downstream consumer is eravm-airbender-verifier's
`vm_fast` (2 sites) — its vm2 pin bump must land together with that
`.value` projection.
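For a direct caller, the migration amounts to projecting `.value` out of each entry. A minimal sketch, assuming `StorageWriteEntry` has the two fields `value` and `paid` described above; the `u64` value type (the real one is `U256`), the `u64` keys, and the helper `values_only` are hypothetical.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct StorageWriteEntry {
    pub value: u64, // U256 in vm2
    pub paid: u32,
}

/// Rebuilds the old `(address, key) -> value` view from the new map.
pub fn values_only(
    state: &BTreeMap<(u64, u64), StorageWriteEntry>,
) -> BTreeMap<(u64, u64), u64> {
    state.iter().map(|(k, e)| (*k, e.value)).collect()
}
```

Call sites that only iterate can skip the intermediate map and read `entry.value` in place.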

## Correctness
- Rollback groups unchanged (`slot_flags` external; `storage_writes`
internal).
- `write_storage` does a single insert per path; `prepaid` reuses the
prior entry value (no redundant lookup/history).
- **55 lib tests pass**, including the `storage_changes_*` proptests,
the Boojum storage-log trace tests, and a new
`merged_storage_write_tracks_paid_and_rolls_back` covering non-zero
`prepaid` + rollback.

## Measured impact
On the eravm guest, the worst-case production batch (67912) drops from
needing ~920–952 MiB to **fitting at ~720 MiB** (~200 MiB off peak) —
turning a <32 MiB margin into comfortable headroom.

## Review items addressed
P2 (re-export `StorageWriteEntry`), P3 (single insert in
`write_storage`; add non-zero-paid test), and stale doc-comment
field-name refs — all in `1044b47`. P1 (scope) addressed by this
description; the API break is called out above for the coordinated eravm
change.

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
P2: gate the read-only storage_initial_values cache and the
SLOT_COMMITTED_READ_Z0 predicate behind opt-out mode. In recording
(Boojum) mode read_storage_inner now reads via just_read_storage exactly
like the pre-optimization base, so no per-read map entries are added and
memory behavior is unchanged.

P3: assert no storage access has happened when set_record_storage_logs is
called, so a mid-run toggle panics instead of silently producing a partial
trace / dedup state.

Add regression tests for both.
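A simplified model of the two fixes above. `set_record_storage_logs`, `storage_initial_values`, and `just_read_storage` are named in the PR; the `ModeDiff` type, the `accessed` flag used to detect prior storage access, and the `backend` map standing in for the storage backend are assumptions. In recording mode a read goes straight to the backend and adds no map entry; in opt-out mode the first read caches the initial value; toggling the mode after any access panics.

```rust
use std::collections::BTreeMap;

type Slot = (u64, u64);

pub struct ModeDiff {
    skip_storage_logs: bool,
    /// Hypothetical: set on the first storage access of any kind.
    accessed: bool,
    storage_initial_values: BTreeMap<Slot, u64>,
    /// Stand-in for the underlying storage backend.
    backend: BTreeMap<Slot, u64>,
}

impl ModeDiff {
    pub fn new(backend: BTreeMap<Slot, u64>) -> Self {
        Self {
            skip_storage_logs: false,
            accessed: false,
            storage_initial_values: BTreeMap::new(),
            backend,
        }
    }

    pub fn set_record_storage_logs(&mut self, record: bool) {
        // P3: a mid-run toggle would leave a partial trace / dedup state.
        assert!(!self.accessed, "set_record_storage_logs called after storage access");
        self.skip_storage_logs = !record;
    }

    fn just_read_storage(&self, slot: Slot) -> u64 {
        self.backend.get(&slot).copied().unwrap_or(0)
    }

    pub fn read_storage(&mut self, slot: Slot) -> u64 {
        self.accessed = true;
        if self.skip_storage_logs {
            // Opt-out mode: cache the initial value, since no trace records it.
            let v = self.just_read_storage(slot);
            *self.storage_initial_values.entry(slot).or_insert(v)
        } else {
            // P2: recording mode reads through and adds no per-read map entry.
            self.just_read_storage(slot)
        }
    }

    pub fn cached_initial_values(&self) -> usize {
        self.storage_initial_values.len()
    }
}
```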

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Dedup the skip_storage_logs field doc against the set_record_storage_logs
method doc, and tighten the read_storage_inner branch comments. No code
change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
slot_add_flag did a get + conditional insert (two BTreeMap traversals, plus
a key clone and journal push on change). Add RollbackableMap::add_flags: an
entry-based OR-merge that traverses once and journals only when a bit
actually changes. Rollback semantics are identical (journals (key, Some(old))
on an existing entry, (key, None) on a fresh one; nothing when unchanged).

Recovers most of the ~0.24% cycle overhead the map consolidation added on
the verifier's storage-heavy path.
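The single-traversal OR-merge can be sketched with `BTreeMap::entry`. The journaling rule follows the commit message: `(key, Some(old))` for an existing entry, `(key, None)` for a fresh one, nothing when no bit changes. The `FlagMap` type, its `journal` field, `snapshot`, and `rollback_to` are hypothetical simplifications of `RollbackableMap`.

```rust
use std::collections::btree_map::{BTreeMap, Entry};

pub struct FlagMap<K: Ord + Clone> {
    map: BTreeMap<K, u8>,
    /// Undo log: prior value per change (None = key did not exist).
    journal: Vec<(K, Option<u8>)>,
}

impl<K: Ord + Clone> FlagMap<K> {
    pub fn new() -> Self {
        Self { map: BTreeMap::new(), journal: Vec::new() }
    }

    /// OR `bits` into `key` with one tree traversal; journal only on change.
    pub fn add_flags(&mut self, key: K, bits: u8) {
        match self.map.entry(key) {
            Entry::Occupied(mut e) => {
                let old = *e.get();
                let new = old | bits;
                if new != old {
                    self.journal.push((e.key().clone(), Some(old)));
                    e.insert(new);
                }
            }
            Entry::Vacant(e) => {
                self.journal.push((e.key().clone(), None));
                e.insert(bits);
            }
        }
    }

    pub fn get(&self, key: &K) -> Option<u8> {
        self.map.get(key).copied()
    }

    pub fn snapshot(&self) -> usize {
        self.journal.len()
    }

    /// Undo changes newer than `snapshot`, newest first.
    pub fn rollback_to(&mut self, snapshot: usize) {
        while self.journal.len() > snapshot {
            let (key, old) = self.journal.pop().expect("journal is non-empty");
            match old {
                Some(v) => {
                    self.map.insert(key, v);
                }
                None => {
                    self.map.remove(&key);
                }
            }
        }
    }
}
```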

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Merging value+paid into one storage_writes entry made write_storage read the
prior entry before rewriting it (a separate get on every write, doubling the
storage-map ops on the free-storage path). Remove that extra lookup:

- non-free writes take `prepaid` from RollbackableMap::insert's returned old
  value instead of a standalone lookup;
- free writes use a new single-traversal RollbackableMap::update that journals
  the prior value and recomputes the entry in place.

Behavior and rollback journaling are identical; one fewer BTreeMap traversal
per write. Targets the ~230M-cycle Group-A overhead measured on batch 67912.
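Both changes can be sketched on a small journaled map. `BTreeMap::insert` already returns the previous value, which is what lets a non-free write obtain `prepaid` without a separate `get`; the free-write path uses an entry-based `update`. The `WriteMap` type, the `update` signature (a closure from the old entry to the new one), and the `paid_write` / `free_write` helpers with their pricing arithmetic are hypothetical; only the single-traversal shape mirrors the commit.

```rust
use std::collections::btree_map::{BTreeMap, Entry};

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct StorageWriteEntry {
    pub value: u64, // U256 in vm2
    pub paid: u32,
}

type Slot = (u64, u64);

#[derive(Default)]
pub struct WriteMap {
    map: BTreeMap<Slot, StorageWriteEntry>,
    journal: Vec<(Slot, Option<StorageWriteEntry>)>,
}

impl WriteMap {
    /// Insert and return the previous entry (one traversal, journaled).
    pub fn insert(&mut self, key: Slot, entry: StorageWriteEntry) -> Option<StorageWriteEntry> {
        let old = self.map.insert(key, entry);
        self.journal.push((key, old));
        old
    }

    /// Recompute an entry in place from its previous value (one traversal, journaled).
    pub fn update(
        &mut self,
        key: Slot,
        f: impl FnOnce(Option<StorageWriteEntry>) -> StorageWriteEntry,
    ) {
        match self.map.entry(key) {
            Entry::Occupied(mut e) => {
                let old = *e.get();
                self.journal.push((key, Some(old)));
                e.insert(f(Some(old)));
            }
            Entry::Vacant(e) => {
                self.journal.push((key, None));
                e.insert(f(None));
            }
        }
    }

    pub fn get(&self, key: &Slot) -> Option<StorageWriteEntry> {
        self.map.get(key).copied()
    }

    pub fn journal_len(&self) -> usize {
        self.journal.len()
    }
}

/// Hypothetical non-free write: charge `price`, crediting what was already paid.
pub fn paid_write(m: &mut WriteMap, key: Slot, value: u64, price: u32) -> u32 {
    let old = m.insert(key, StorageWriteEntry { value, paid: price });
    let prepaid = old.map_or(0, |e| e.paid);
    price.saturating_sub(prepaid)
}

/// Hypothetical free write: new value, keep whatever was already paid.
pub fn free_write(m: &mut WriteMap, key: Slot, value: u64) {
    m.update(key, |old| StorageWriteEntry { value, paid: old.map_or(0, |e| e.paid) });
}
```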

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@0xVolosnikov 0xVolosnikov requested review from shamatar and removed request for shamatar July 2, 2026 21:20
@0xVolosnikov 0xVolosnikov marked this pull request as draft July 2, 2026 22:48
2 participants