(feat): tracing support for state-sync transactions#2172
(feat): tracing support for state-sync transactions#2172
Conversation
Add live tracing and refactor tracing methods for state-sync transactions. Co-authored-by: Milan Došlić <milan.doslic@tenderly.co>
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
|
@claude review |
|
Claude finished @manav2401's task in 3m 51s —— View job Code Review: State-Sync Tracing Support
SummaryGood PR overall. The refactoring consolidates state-sync tracing into the standard tracing codepaths (removing the separate Below are the findings, organized by severity. Issues1. [HIGH]
|
There was a problem hiding this comment.
Pull request overview
Adds Bor-specific tracing support so state-sync (bridge) transactions are included in VM/live traces, and refactors tracer codepaths to treat state-sync txs as first-class transactions (post-Madhugiri). Also updates consensus “system tx” execution to propagate VM tracer config and adds end-to-end tests that replay state-sync events via the StateReceiver contract.
Changes:
- Extend tracing APIs to replay and trace post-Madhugiri
StateSyncTxexecution, including correct log indexing/cumulative context plumbing. - Propagate
vmtraceVM config into Bor consensus system calls (span/state commits) so live tracing covers bridge processing. - Add E2E state-sync tracing tests using actual StateReceiver bytecode and update docs/config defaults for VM tracing.
Reviewed changes
Copilot reviewed 36 out of 37 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/bor/mocks/IHeimdallClient.go | Regenerated mocks (mockgen metadata/any updates). |
| tests/bor/helper.go | Update mock expectations for CommitSpan signature change. |
| tests/bor/bor_test.go | Update mock expectations for CommitSpan signature change. |
| params/protocol_params.go | Introduce params.BorSystemAddress for Bor system transactions. |
| internal/cli/server/service.go | Switch DebugBlock tracing implementation to tracer API’s TraceBlockByNumber. |
| internal/cli/server/flags.go | Clarify vmtrace flag usage text. |
| internal/cli/server/config.go | Extend/reorder defaults (vmtrace/jsonconfig, rpc limits, sync/gc defaults, etc.). |
| eth/tracers/internal/tracetest/supply_test.go | Update expected supply trace fields (issuance reward). |
| eth/tracers/dir.go | Add Bor-specific context fields for state-sync tracing (cumulative gas/log index). |
| eth/tracers/api.go | Major tracer refactor: state-sync tx replay/tracing, new traceTx return signature, removal of BorTraceEnabled paths. |
| eth/tracers/api_test.go | Remove IO tracing dump test and related helpers. |
| eth/tracers/api_statesync_test.go | Add E2E tests for tracing state-sync txs via StateReceiver + target contract. |
| eth/tracers/api_bor.go | Remove Bor-only tracing API (TraceBorBlock) implementation. |
| eth/state_accessor.go | Add bounds check for txIndex in stateAtTransaction. |
| eth/filters/IIterator.go | Regenerated mocks (mockgen metadata/structure). |
| eth/filters/IDatabase.go | Regenerated mocks + parameter naming cleanup/any updates. |
| eth/filters/IBatch.go | Regenerated mocks + parameter naming cleanup/any updates. |
| eth/filters/IBackend.go | Regenerated mocks + interface additions reflected in mock. |
| eth/ethconfig/config.go | Use centralized Bor ABI package for validator set ABI. |
| eth/backend.go | Pass VM tracer config into Bor engine for system tx tracing (SetVMConfig). |
| docs/cli/server.md | Document updated vmtrace / vmtrace.jsonconfig defaults. |
| docs/cli/example_config.toml | Add VM tracing example config entries. |
| docs/cli/default_config.toml | Default vmtrace.jsonconfig to {} instead of empty string. |
| core/types/transaction.go | Add Transaction.GetStateSyncData() helper for StateSyncTx. |
| core/state_processor.go | Fire tracer hooks around state-sync tx processing and pass hooked state into Finalize. |
| core/parallel_state_processor.go | Fire tracer hooks around state-sync tx processing in parallel processor. |
| consensus/bor/statefull/processor.go | Add replay helper ApplyStateSyncEvents and plumb VM config into system calls. |
| consensus/bor/span.go | Extend Spanner.CommitSpan signature to accept vm.Config. |
| consensus/bor/span_mock.go | Regenerated mock for updated Spanner interface. |
| consensus/bor/heimdall/span/spanner.go | Pass VM config through to system call execution; use centralized SystemTxGas constant. |
| consensus/bor/genesis.go | Extend GenesisContract CommitState signature to accept vm.Config. |
| consensus/bor/genesis_contract_mock.go | Regenerated mock for updated GenesisContract interface. |
| consensus/bor/contract/client.go | Move ABI + SystemTxGas constants to consensus/bor/abi. |
| consensus/bor/bor.go | Store VM config in Bor engine and use it for span/state system transactions. |
| consensus/bor/bor_test.go | Update fakes/mocks for new CommitSpan / CommitState signatures. |
| consensus/bor/abi/common.go | New centralized Bor ABI + SystemTxGas definitions. |
| cmd/keeper/go.sum | Dependency checksum updates. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Code ReviewFound 3 issues: 1. Bug: Duplicate import causes compilation failure - consensus/bor/heimdall/span/spanner.go:13-14 The same package consensus/bor/abi is imported twice (once as abi, once as borabi). Go does not allow this. Fix: Remove the unaliased import and use borabi consistently. 2. Bug: OnTxEnd called without guard - panics on empty blocks - core/state_processor.go:176-181, core/parallel_state_processor.go:448-453 OnTxEnd is inside the IsMadhugiri block but NOT guarded by appliedNewStateSyncReceipt. On empty blocks, receipts[len(receipts)-1] panics with index out of range. On blocks without state-sync tx, OnTxEnd fires without matching OnTxStart. Fix: Move OnTxEnd inside the appliedNewStateSyncReceipt guard. 3. Bug: time.Now() produces non-deterministic traces - consensus/bor/statefull/processor.go:181 ApplyStateSyncEvents uses time.Now().Unix() as syncTime, but production path uses deterministic event.Time from Heimdall. Makes trace results non-reproducible. Root cause: StateSyncData lacks a Time field. Fix: Add Time field to StateSyncData and use it instead of time.Now(). |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 36 out of 37 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is ❌ Your patch check has failed because the patch coverage (29.30%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## develop #2172 +/- ##
===========================================
- Coverage 51.93% 51.88% -0.05%
===========================================
Files 884 885 +1
Lines 155446 155440 -6
===========================================
- Hits 80731 80652 -79
- Misses 69509 69591 +82
+ Partials 5206 5197 -9
... and 17 files with indirect coverage changes
🚀 New features to boost your workflow:
|
|
@claude review |
|
Claude finished @manav2401's task in 2m 53s —— View job Code Review: State-Sync Tracing Support
I reviewed the latest state of the PR (head Issues1. [MEDIUM]
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 37 out of 38 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 37 out of 38 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 40 out of 41 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Trace state-sync transaction (if present and if tracer is enabled) | ||
| if len(txs) > 0 && txs[len(txs)-1].Type() == types.StateSyncTxType { | ||
| if hooks := evm.Config.Tracer; hooks != nil && hooks.OnTxStart != nil { | ||
| hooks.OnTxStart(evm.GetVMContext(), txs[len(txs)-1], params.BorSystemAddress) | ||
| } | ||
| } |
There was a problem hiding this comment.
Live tracers can receive multiple independent EVM root calls (depth==0) during a single state-sync transaction because Bor processes each event via a separate system CALL. At least the built-in live supply tracer assumes exactly one depth==0 frame per tx (it finalizes on the first root exit), so multiple root frames will cause incorrect accounting (e.g., double-counting or ignoring later events). Consider wrapping the tracer hooks for the state-sync tx similarly to NewStateSyncTxnTracer (synthetic root + depth shift), or alternatively invoke OnTxStart/OnTxEnd per event instead of once per state-sync tx so live tracers remain correct.
| if len(txs) > 0 && txs[len(txs)-1].Type() == types.StateSyncTxType { | ||
| if hooks := vmenv.Config.Tracer; hooks != nil && hooks.OnTxStart != nil { | ||
| hooks.OnTxStart(vmenv.GetVMContext(), txs[len(txs)-1], params.BorSystemAddress) | ||
| } | ||
| } |
There was a problem hiding this comment.
Same as StateProcessor: for state-sync transactions, Bor may execute multiple independent root CALLs (depth==0) as it commits each event separately. Calling OnTxStart once here means live tracers that assume a single root call per tx (e.g., eth/tracers/live/supply.go) can produce incorrect results when multiple events exist. Consider wrapping the tracer hooks for the state-sync tx with a synthetic root/depth shift (like NewStateSyncTxnTracer) or scoping OnTxStart/OnTxEnd to each event call.
| // Warn in case tx indexer is not done. | ||
| if !api.backend.TxIndexDone() { | ||
| return nil, ethapi.NewTxIndexingError() | ||
| } |
There was a problem hiding this comment.
TraceTransaction now returns errTxNotFound for Bor (fake) transactions that exist in the Bor tx lookup table but are not part of the canonical tx index (pre-Madhugiri state-sync). Previously this path returned an empty trace result, which some callers may rely on for backwards compatibility. Consider preserving the old behavior (return an empty trace result) or returning a more explicit error (e.g., "bor state-sync txs are not traceable pre-Madhugiri") to avoid a silent API behavior change.
| } | |
| } | |
| // Bor state-sync (fake) txs may exist in the Bor tx lookup table without being | |
| // part of the canonical tx index on pre-Madhugiri chains. Return a dedicated | |
| // error instead of the generic not-found to avoid a silent API behavior change. | |
| if rawdb.ReadBorTxLookupEntry(api.backend.ChainDb(), hash) != nil { | |
| return nil, errors.New("bor state-sync txs are not traceable pre-Madhugiri") | |
| } |
|
Code Review (commit f499ceb): Found 3 bugs across 4 locations. (1) nil dereference on .Bor config in IntermediateRoots [api.go:605] and standardTraceBlockToFile [api.go:904]. (2) Unbalanced OnTxStart/OnTxEnd in state_processor.go:159 and parallel_state_processor.go:425. Details in follow-up comments. |
|
Bug [eth/tracers/api.go:605] — Missing nil guard on .Bor (potential nil pointer dereference). api.backend.ChainConfig().Bor is a *params.BorConfig that is nil on non-Bor chains, but is dereferenced unconditionally in IntermediateRoots. The old getAllBlockTransactions had an explicit nil check that was dropped in this refactor. Since IntermediateRoots is RPC-exposed, a malicious caller could trigger a nil deref by tracing a block with a StateSyncTxType tx on a non-Bor node. Fix: guard with 'if borConfig := api.backend.ChainConfig().Bor; borConfig != nil' before accessing borConfig.StateReceiverContract. See also line 938 in the same file which correctly guards this access. |
|
Bug [eth/tracers/api.go:904] — Same missing nil guard on .Bor as above. api.backend.ChainConfig().Bor.StateReceiverContract is accessed without checking if Bor is nil in standardTraceBlockToFile, which is also RPC-exposed. Fix: same pattern — 'if borConfig := api.backend.ChainConfig().Bor; borConfig != nil { stateReceiverAddress = borConfig.StateReceiverContract }'. |
|
Bug [core/state_processor.go:159] — Unbalanced OnTxStart/OnTxEnd tracer lifecycle. OnTxStart fires unconditionally (line 157-159) when the last tx is StateSyncTxType, but OnTxEnd is only called when appliedNewStateSyncReceipt is true (line 176-180). If Finalize does not produce a receipt (stateSyncData is empty), the tx/receipt count mismatch at line 171 causes an error return without ever calling OnTxEnd. This leaves tracers like callTracer in an inconsistent state. Same issue exists in parallel_state_processor.go lines 425-453. Fix: move OnTxStart after Finalize succeeds, or call OnTxEnd before the error return at line 172. |
| func (t *stateSyncTxnTracer) OnTxEnd(receipt *types.Receipt, err error) { | ||
| // Close the synthetic top-level call frame before forwarding OnTxEnd, | ||
| // but only if it was actually created (i.e., at least one OnEnter occurred). | ||
| if t.createdTopLevel && t.tracer.OnExit != nil { | ||
| t.tracer.OnExit(0, nil, 0, err, err != nil) | ||
| } | ||
|
|
||
| if t.tracer.OnTxEnd != nil { | ||
| t.tracer.OnTxEnd(receipt, err) | ||
| } |
There was a problem hiding this comment.
🔴 In stateSyncTxnTracer.OnTxEnd (state_sync_txn_tracer.go:71-73), the inner OnTxEnd is forwarded unconditionally regardless of whether createdTopLevel is true — unlike OnExit on line 67 which is correctly guarded. When a StateSyncTxType transaction carries zero bridge events, ApplyStateSyncEvents returns early and no EVM hooks fire, leaving the inner callTracer's callstack empty; the unconditional OnTxEnd forward then causes a runtime panic (index out of range [0] with length 0) when callTracer.OnTxEnd accesses callstack[0]. Fix by wrapping lines 71-73 with if t.createdTopLevel { ... }, mirroring the existing guard on the OnExit call above.
Extended reasoning...
Bug description
In stateSyncTxnTracer.OnTxEnd (eth/tracers/state_sync_txn_tracer.go:64-73), the synthetic root frame bookkeeping has an asymmetry: the OnExit forwarding on line 67 is guarded by if t.createdTopLevel, but the OnTxEnd forwarding on lines 71-73 is not guarded. Any tracer whose OnTxEnd assumes that at least one call frame was seen (i.e. that OnEnter was called at least once) will misbehave when the wrapper forwards OnTxEnd without a corresponding OnEnter ever being sent.
Code path that triggers it
- A block contains a
StateSyncTxTypetransaction with zero bridge events (empty event list). ApplyStateSyncEventsinconsensus/bor/statefull/processor.gohits the early-return branch:events := tx.GetStateSyncData() if len(events) == 0 { return &core.ExecutionResult{UsedGas: 0, ReturnData: nil}, nil }
- No
ApplyBorMessageis called, so no EVM hooks (OnEnter/OnExit) fire.createdTopLevelremainsfalse. stateSyncTxnTracer.OnTxEndis called by the tracing infrastructure (everyOnTxStartis unconditionally paired withOnTxEnd).- The
OnExitblock (line 67) is skipped correctly because\!t.createdTopLevel. - The
OnTxEndblock (lines 71-73) is not skipped and forwards to the inner tracer.
Why existing code does not prevent it
The outer if t.tracer.OnTxEnd \!= nil check (line 71) only guards against a nil function pointer; it does not account for the invariant that the inner tracer may require at least one call frame to have been started before OnTxEnd is called.
Impact
For the callTracer (eth/tracers/native/call.go), callstack is initialized as make([]callFrame, 0, 1) — an empty slice. OnTxEnd unconditionally reads t.callstack[0] when err == nil and receipt \!= nil:
t.callstack[0].GasUsed = receipt.GasUsed // panic: index out of range [0] with length 0This causes a runtime panic that crashes the tracing goroutine (or the entire node if unrecovered) whenever a user issues a debug_traceBlock / debug_traceTransaction call that covers a block containing a 0-event state-sync transaction.
Step-by-step proof
- Transaction
Tis aStateSyncTxTypewithGetStateSyncData()returning an empty slice. - Block
Bis processed; the tracing framework callsstateSyncTxnTracer.OnTxStart(env, T, from)— inner tracer receives it and initializescallstack = []. ApplyStateSyncEventsis called, seeslen(events) == 0, returns immediately — zero EVM hooks fire.createdTopLevelis stillfalse.- Tracing framework calls
stateSyncTxnTracer.OnTxEnd(receipt, nil). - Line 67:
t.createdTopLevelisfalse→OnExitis skipped. ✓ - Line 71:
t.tracer.OnTxEnd \!= nilistrue→t.tracer.OnTxEnd(receipt, nil)is called. - Inside
callTracer.OnTxEnd:err == nilandreceipt \!= nil→ executest.callstack[0].GasUsed = receipt.GasUsed→ panic: runtime error: index out of range [0] with length 0.
Proposed fix
Wrap the OnTxEnd forwarding with the same createdTopLevel guard already used for OnExit:
func (t *stateSyncTxnTracer) OnTxEnd(receipt *types.Receipt, err error) {
if t.createdTopLevel {
if t.tracer.OnExit \!= nil {
t.tracer.OnExit(0, nil, 0, err, err \!= nil)
}
if t.tracer.OnTxEnd \!= nil {
t.tracer.OnTxEnd(receipt, err)
}
}
}This ensures the inner tracer's call-frame lifecycle invariant (OnTxStart → OnEnter → OnExit → OnTxEnd) is preserved: if no OnEnter was ever sent, neither OnExit nor OnTxEnd is forwarded.
| if isStateSyncPresent { | ||
| tx := txs[len(txs)-1] | ||
| msg, _ := core.TransactionToMessage(tx, signer, block.BaseFee()) | ||
| txctx := &Context{ | ||
| BlockHash: blockHash, | ||
| BlockNumber: block.Number(), | ||
| TxIndex: len(txs) - 1, | ||
| TxHash: tx.Hash(), | ||
| CumulativeGasUsed: cumulativeGasUsed, | ||
| LogIndex: len(statedb.Logs()), | ||
| } | ||
| res, _, err := api.traceTx(ctx, tx, msg, txctx, blockCtx, statedb, config, nil) | ||
| if err != nil { | ||
| results[txctx.TxIndex] = &txTraceResult{TxHash: tx.Hash(), Error: err.Error()} | ||
| } else { | ||
| results[txctx.TxIndex] = &txTraceResult{TxHash: tx.Hash(), Result: res} | ||
| } | ||
| } |
There was a problem hiding this comment.
🔴 In traceBlockParallel (api.go:773-825), the txloop sets isStateSyncPresent = true for any StateSyncTxType at index i, but the post-loop section unconditionally uses txs[len(txs)-1] and TxIndex: len(txs)-1, hardcoding the assumption that state-sync is always the last transaction. When debug_traceBadBlock is called on a malformed bad block where the state-sync tx is at position j < len(txs)-1, this produces three errors: (1) results[j] stays nil (loop skipped it with continue), (2) results[len(txs)-1] is overwritten with an incorrect trace for the wrong tx, and (3) worker threads traced txs at positions j+1..len(txs)-1 against a statedb that was never advanced by the state-sync effects. The fix is to record the actual state-sync index during the loop (e.g. stateSyncIdx = i) and use it in the post-loop section instead of len(txs)-1.
Extended reasoning...
What the bug is and how it manifests
In traceBlockParallel (api.go lines 770–825), the main txloop iterates over all transactions. When a StateSyncTxType transaction is found at position i, the code sets isStateSyncPresent = true and continues — it never sends the tx to the worker goroutines and never records the actual index. After the loop completes, the post-loop section unconditionally runs:
if isStateSyncPresent {
tx := txs[len(txs)-1] // hardcoded: assumes last tx is the state-sync tx
txctx := &Context{
TxIndex: len(txs) - 1, // hardcoded
...
}
...
}This works correctly on canonical Bor blocks, where the consensus layer enforces that any state-sync tx is always placed last (validated in bor.go). However, traceBlockParallel is reachable via TraceBadBlock (api.go:527→533→traceBlock→traceBlockParallel for JS tracers). Bad blocks are blocks that failed validation and could have state-sync at any position.
The specific code path that triggers it
A caller invokes debug_traceBadBlock with a JS tracer on a bad block whose body has a StateSyncTxType tx at position j < len(txs)-1. Execution reaches traceBlockParallel:
- At index
j:tx.Type() == StateSyncTxType→isStateSyncPresent = true; continue. No worker task is queued for indexj.results[j]remains nil. - Indexes
j+1throughlen(txs)-1: worker goroutines trace these txs. They received statedb snapshots taken before the state-sync effects were applied (the main thread skipped it), so their state is incorrect. - After the loop: the post-loop block runs with
txs[len(txs)-1]— a regular tx at the last position, not the actual state-sync tx. It is traced as a state-sync tx (viaApplyStateSyncEvents) with wrong hash, wrong index, and wrong cumulative gas.results[len(txs)-1]is overwritten, clobbering the worker's correct result.
Why existing code does not prevent it
The entire Bor codebase (including state_processor.go:156 and parallel_state_processor.go:425) consistently assumes state-sync is the last tx. This is enforced for valid blocks by the consensus layer. There is no guard in traceBlockParallel that validates the last tx is actually StateSyncTxType before using len(txs)-1 as the index. The old code (getAllBlockTransactions) at least fetched the last tx and checked its type explicitly, but the new parallel path introduced by this PR removed that guard.
What the impact would be
This is a correctness bug in the forensic debugging API (debug_traceBadBlock). For any bad block with a malformed tx ordering: (1) the trace result contains a nil entry at the actual state-sync position, producing a null JSON element in the response; (2) a correctly-traced regular tx result is silently overwritten with an incorrect state-sync trace; and (3) multiple traces of txs after position j are computed from incorrect state, making them unreliable for incident investigation. Normal chain operation and canonical block tracing are completely unaffected.
How to fix it
Change isStateSyncPresent bool to stateSyncIdx = -1 (or similar sentinel), then in the txloop:
if tx.Type() == types.StateSyncTxType {
stateSyncIdx = i
continue
}And in the post-loop section:
if stateSyncIdx >= 0 {
tx := txs[stateSyncIdx]
txctx := &Context{TxIndex: stateSyncIdx, ...}
...
}This eliminates the hardcoded len(txs)-1 assumption.
Step-by-step proof
Suppose a bad block has 3 transactions: [txA (regular), txB (StateSyncTxType), txC (regular)]. Indexes: A=0, B=1, C=2.
- txloop i=0 (txA): regular tx — task queued for worker, statedb snapshot taken. Worker traces txA correctly.
results[0]= correct trace. - txloop i=1 (txB): StateSyncTxType —
isStateSyncPresent = true; continue. No task queued.results[1]stays nil. - txloop i=2 (txC): regular tx — task queued for worker. Worker snapshot was taken from statedb after txA but without txB's effects. Worker traces txC against incorrect state.
results[2]= incorrectly-computed trace. - Post-loop:
isStateSyncPresent == true→tx = txs[2](txC, the wrong tx). The post-loop code traces txC as a state-sync tx with TxIndex=2, wrong TxHash, wrong CumulativeGasUsed.results[2]is overwritten, destroying the worker's result. - Final:
results = [correct trace for txA, nil, wrong state-sync trace of txC]. Both txB and txC are incorrectly represented.



Description
Extension of #2062 from @milando12. Created a new one to run Kurtosis workflows which wouldn't run otherwise on a fork.
This PR
The reason why we add a custom tracer is because state-sync transactions can contain multiple bridge events and all of them are executed as separate EVM calls which are packed into 1 transaction. This breaks assumptions for certain tracer e.g. in
callTracer, it should only have 1 root call frame per transaction. This would be invalid for state-sync transactions with multiple state-sync events (because it would have multiple independent root call frames). Hence, referencing the erigon implementation, this PR adds a wrapper tracer which crafts a single synthetic root call frame and all bridge event calls are nested sub-calls inside it.Changes
Breaking changes
Please complete this section if any breaking changes have been made, otherwise delete it
Nodes audience
In case this PR includes changes that must be applied only to a subset of nodes, please specify how you handled it (e.g. by adding a flag with a default value...)
Checklist
Cross repository changes
Testing
Manual tests
Please complete this section with the steps you performed if you ran manual tests for this functionality, otherwise delete it
Additional comments
Please post additional comments in this section if you have them, otherwise delete it