Skip to content

(feat): tracing support for state-sync transactions#2172

Open
manav2401 wants to merge 18 commits intodevelopfrom
state-sync-tracing
Open

(feat): tracing support for state-sync transactions#2172
manav2401 wants to merge 18 commits intodevelopfrom
state-sync-tracing

Conversation

@manav2401
Copy link
Copy Markdown
Member

@manav2401 manav2401 commented Mar 31, 2026

Description

Extension of #2062 from @milando12. Created a new one to run Kurtosis workflows which wouldn't run otherwise on a fork.

This PR

  • Adds live tracing support for state-sync transactions. This means if someone has live tracing enabled while running a node, the tracing will also cover the state-sync transactions which contains the bridge events.
  • Refactors almost all tracing methods as state-sync transactions weren't handled correctly.
  • Adds a new custom tracer for state-sync transaction which wraps the main tracer. More explanation on this below.
  • Adds e2e tests to actually test a state-sync transaction trace via state receiver contract instead of using mock data.

The reason why we add a custom tracer is because state-sync transactions can contain multiple bridge events and all of them are executed as separate EVM calls which are packed into 1 transaction. This breaks assumptions for certain tracer e.g. in callTracer, it should only have 1 root call frame per transaction. This would be invalid for state-sync transactions with multiple state-sync events (because it would have multiple independent root call frames). Hence, referencing the erigon implementation, this PR adds a wrapper tracer which crafts a single synthetic root call frame and all bridge event calls are nested sub-calls inside it.

Changes

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)
  • Changes only for a subset of nodes

Breaking changes

Please complete this section if any breaking changes have been made, otherwise delete it

Nodes audience

In case this PR includes changes that must be applied only to a subset of nodes, please specify how you handled it (e.g. by adding a flag with a default value...)

Checklist

  • I have added at least 2 reviewer or the whole pos-v1 team
  • I have added sufficient documentation in code
  • I will be resolving comments - if any - by pushing each fix in a separate commit and linking the commit hash in the comment reply
  • Created a task in Jira and informed the team for implementation in Erigon client (if applicable)
  • Includes RPC methods changes, and the Notion documentation has been updated

Cross repository changes

  • This PR requires changes to heimdall
    • In case link the PR here:
  • This PR requires changes to matic-cli
    • In case link the PR here:

Testing

  • I have added unit tests
  • I have added tests to CI
  • I have tested this code manually on local environment
  • I have tested this code manually on remote devnet using express-cli
  • I have tested this code manually on amoy
  • I have created new e2e tests into express-cli

Manual tests

Please complete this section with the steps you performed if you ran manual tests for this functionality, otherwise delete it

Additional comments

Please post additional comments in this section if you have them, otherwise delete it

Add live tracing and refactor tracing methods for state-sync transactions.

Co-authored-by: Milan Došlić <milan.doslic@tenderly.co>
Copilot AI review requested due to automatic review settings March 31, 2026 10:26
Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@manav2401
Copy link
Copy Markdown
Member Author

@claude review

@claude
Copy link
Copy Markdown

claude bot commented Mar 31, 2026

Claude finished @manav2401's task in 3m 51s —— View job


Code Review: State-Sync Tracing Support

  • Gather context and read changed files
  • Review core tracing changes (eth/tracers/api.go)
  • Review state-sync processor changes (consensus/bor/statefull/processor.go)
  • Review transaction type changes (core/types/transaction.go)
  • Review state processor changes (core/state_processor.go, core/parallel_state_processor.go)
  • Review test changes
  • Review remaining files
  • Post final review

Summary

Good PR overall. The refactoring consolidates state-sync tracing into the standard tracing codepaths (removing the separate api_bor.go and TraceBorBlock endpoint), adds live tracing support via vmConfig propagation through the consensus engine, and uses real StateReceiver contract bytecode in e2e tests. The net deletion of ~700 lines is a positive sign.

Below are the findings, organized by severity.


Issues

1. [HIGH] OnTxEnd called even when no state-sync receipt was added

File: core/state_processor.go:179-181 and core/parallel_state_processor.go:451-453

if hooks := evm.Config.Tracer; hooks != nil && hooks.OnTxEnd != nil {
    hooks.OnTxEnd(receipts[len(receipts)-1], nil)
}

OnTxEnd is called unconditionally after the Madhugiri check, regardless of whether appliedNewStateSyncReceipt is true. If no new state-sync receipt was added (i.e., appliedNewStateSyncReceipt == false), this will still call OnTxEnd with the last regular transaction's receipt, creating a mismatched OnTxStart/OnTxEnd pair. The OnTxStart at line 157 fires only when a state-sync tx is present, but OnTxEnd fires whenever IsMadhugiri is true.

Suggested fix: Guard OnTxEnd with the same condition:

if appliedNewStateSyncReceipt {
    allLogs = append(allLogs, receipts[len(receipts)-1].Logs...)
    if hooks := evm.Config.Tracer; hooks != nil && hooks.OnTxEnd != nil {
        hooks.OnTxEnd(receipts[len(receipts)-1], nil)
    }
}

Similarly, OnTxStart at line 156-159 should also be guarded — it fires if the last tx in the block is a state-sync tx, but that doesn't guarantee Finalize will actually produce a receipt for it. Consider moving both hooks to be conditional on receipt production.

Fix this →


2. [MEDIUM] Duplicate import in consensus/bor/heimdall/span/spanner.go

File: consensus/bor/heimdall/span/spanner.go:13-14

"github.com/ethereum/go-ethereum/consensus/bor/abi"
borabi "github.com/ethereum/go-ethereum/consensus/bor/abi"

The same package is imported twice — once unnamed and once as borabi. Only borabi is used (for borabi.SystemTxGas and borabi.ValidatorSet()). The unused abi import will cause a compile error unless the linter/compiler is lenient about it (it shouldn't be).

Note: Looking more carefully, the abi import was already there before the PR (for the abi.ValidatorSet() call), and the diff adds borabi as an alias. The fix should be to remove the unnamed import and keep only the aliased one, or just use abi.SystemTxGas without the alias.

Fix this →


3. [MEDIUM] traceChain doesn't handle state-sync transactions

File: eth/tracers/api.go:290-313

In traceChain, the inner loop iterates over task.block.Transactions() and calls traceTx for each tx. For state-sync transactions, traceTx calls ApplyStateSyncEvents which sets up a tx context via vmenv.SetTxContext(core.NewEVMTxContext(message)). However, the traceChain worker calls core.TransactionToMessage(tx, signer, ...) for every transaction including state-sync ones. TransactionToMessage uses the signer to recover the sender — for state-sync transactions (which have no signature), this may produce unexpected results or errors that are silently discarded by the _ assignment.

While traceTx internally handles state-sync via the StateSyncTxType branch, the message created by traceChain's outer loop at line 297 isn't meaningful for state-sync txs. This isn't a correctness bug per se (the message is passed through but traceTx overrides the behavior), but it's worth noting.

More importantly: traceBlockParallel (line 715) correctly skips state-sync txs in the parallel phase and traces them sequentially at the end (lines 807-824). But traceChain (line 290) processes all txs including state-sync in the parallel workers without this special handling. This means state-sync txs in traceChain are traced in the worker thread using a copied statedb, which may not have the correct accumulated state from prior transactions (since state-sync must be the last tx and depends on all prior state changes).


4. [MEDIUM] ApplyStateSyncEvents uses time.Now() for sync time

File: consensus/bor/statefull/processor.go:181

now := time.Now().Unix()

Using time.Now() as the syncTime parameter for commitState during tracing means:

  • Different tracing runs of the same block will produce different calldata (different timestamp).
  • This is non-deterministic and means trace results may not be reproducible.

For tracing (replay) purposes, consider using the block header's timestamp instead, which would be available from the EVM's block context. This would make traces deterministic and reproducible.

Fix this →


5. [LOW] GetSystemMessage uses params.MaxTxGas instead of block gas limit

File: consensus/bor/statefull/processor.go:77

Gas: params.MaxTxGas, // should be more than enough for state-sync related syscalls

MaxTxGas is used as the gas limit for the system message in ApplyStateSyncEvents. This is fine for production state-sync (where gas usage is bounded), but the comment acknowledges it's "should be more than enough." The original CommitState path via GenesisContractsClient also uses this value, so at least it's consistent.


6. [LOW] api_bor.go deletion removes TraceBorBlock and DebugBlock type

File: eth/tracers/api_bor.go (deleted), internal/cli/server/service.go:232

The TraceBorBlock RPC method, BlockTraceResult, TxTraceResult, and TraceBlockRequest types are removed. The DebugBlock gRPC handler is updated to use TraceBlockByNumber instead. This is a breaking change for any external callers of the debug_traceBorBlock RPC method. Make sure this is documented and communicated.


7. [LOW] Config reordering in DefaultConfig() changes TOML output order

File: internal/cli/server/config.go:810-1029

The fields in DefaultConfig() have been reordered and new defaults explicitly added (Ethstats, VMTrace, VMTraceJsonConfig, SyncMode, etc.). While functionally equivalent, this changes the output of make docs (the default_config.toml and example_config.toml diffs confirm this). No functional concern, but worth noting in case any tooling depends on field ordering.


Positive Observations

  1. Good refactoring: Removing the separate api_bor.go and consolidating state-sync handling into the standard tracing methods is the right approach. It reduces code duplication and ensures all tracing codepaths handle state-sync consistently.

  2. ABI extraction: Moving ABI definitions from consensus/bor/contract/ to consensus/bor/abi/ is a clean separation that allows the statefull package to use them without circular dependencies.

  3. Bounds check added: The new bounds check in eth/state_accessor.go:268-269 is a good defensive improvement.

  4. Strong test coverage: The e2e tests use real StateReceiver contract bytecode and verify actual EVM execution traces (LOG0 opcode counts, intermediate roots). This is much better than the previous mock-based approach.

  5. Live tracing support: Propagating vmConfig through Bor.SetVMConfigCommitState/CommitSpanstatefull.ApplyMessage is a clean way to enable live tracing for system transactions without major architectural changes.


Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Bor-specific tracing support so state-sync (bridge) transactions are included in VM/live traces, and refactors tracer codepaths to treat state-sync txs as first-class transactions (post-Madhugiri). Also updates consensus “system tx” execution to propagate VM tracer config and adds end-to-end tests that replay state-sync events via the StateReceiver contract.

Changes:

  • Extend tracing APIs to replay and trace post-Madhugiri StateSyncTx execution, including correct log indexing/cumulative context plumbing.
  • Propagate vmtrace VM config into Bor consensus system calls (span/state commits) so live tracing covers bridge processing.
  • Add E2E state-sync tracing tests using actual StateReceiver bytecode and update docs/config defaults for VM tracing.

Reviewed changes

Copilot reviewed 36 out of 37 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/bor/mocks/IHeimdallClient.go Regenerated mocks (mockgen metadata/any updates).
tests/bor/helper.go Update mock expectations for CommitSpan signature change.
tests/bor/bor_test.go Update mock expectations for CommitSpan signature change.
params/protocol_params.go Introduce params.BorSystemAddress for Bor system transactions.
internal/cli/server/service.go Switch DebugBlock tracing implementation to tracer API’s TraceBlockByNumber.
internal/cli/server/flags.go Clarify vmtrace flag usage text.
internal/cli/server/config.go Extend/reorder defaults (vmtrace/jsonconfig, rpc limits, sync/gc defaults, etc.).
eth/tracers/internal/tracetest/supply_test.go Update expected supply trace fields (issuance reward).
eth/tracers/dir.go Add Bor-specific context fields for state-sync tracing (cumulative gas/log index).
eth/tracers/api.go Major tracer refactor: state-sync tx replay/tracing, new traceTx return signature, removal of BorTraceEnabled paths.
eth/tracers/api_test.go Remove IO tracing dump test and related helpers.
eth/tracers/api_statesync_test.go Add E2E tests for tracing state-sync txs via StateReceiver + target contract.
eth/tracers/api_bor.go Remove Bor-only tracing API (TraceBorBlock) implementation.
eth/state_accessor.go Add bounds check for txIndex in stateAtTransaction.
eth/filters/IIterator.go Regenerated mocks (mockgen metadata/structure).
eth/filters/IDatabase.go Regenerated mocks + parameter naming cleanup/any updates.
eth/filters/IBatch.go Regenerated mocks + parameter naming cleanup/any updates.
eth/filters/IBackend.go Regenerated mocks + interface additions reflected in mock.
eth/ethconfig/config.go Use centralized Bor ABI package for validator set ABI.
eth/backend.go Pass VM tracer config into Bor engine for system tx tracing (SetVMConfig).
docs/cli/server.md Document updated vmtrace / vmtrace.jsonconfig defaults.
docs/cli/example_config.toml Add VM tracing example config entries.
docs/cli/default_config.toml Default vmtrace.jsonconfig to {} instead of empty string.
core/types/transaction.go Add Transaction.GetStateSyncData() helper for StateSyncTx.
core/state_processor.go Fire tracer hooks around state-sync tx processing and pass hooked state into Finalize.
core/parallel_state_processor.go Fire tracer hooks around state-sync tx processing in parallel processor.
consensus/bor/statefull/processor.go Add replay helper ApplyStateSyncEvents and plumb VM config into system calls.
consensus/bor/span.go Extend Spanner.CommitSpan signature to accept vm.Config.
consensus/bor/span_mock.go Regenerated mock for updated Spanner interface.
consensus/bor/heimdall/span/spanner.go Pass VM config through to system call execution; use centralized SystemTxGas constant.
consensus/bor/genesis.go Extend GenesisContract CommitState signature to accept vm.Config.
consensus/bor/genesis_contract_mock.go Regenerated mock for updated GenesisContract interface.
consensus/bor/contract/client.go Move ABI + SystemTxGas constants to consensus/bor/abi.
consensus/bor/bor.go Store VM config in Bor engine and use it for span/state system transactions.
consensus/bor/bor_test.go Update fakes/mocks for new CommitSpan / CommitState signatures.
consensus/bor/abi/common.go New centralized Bor ABI + SystemTxGas definitions.
cmd/keeper/go.sum Dependency checksum updates.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@claude
Copy link
Copy Markdown

claude bot commented Mar 31, 2026

Code Review

Found 3 issues:

1. Bug: Duplicate import causes compilation failure - consensus/bor/heimdall/span/spanner.go:13-14

The same package consensus/bor/abi is imported twice (once as abi, once as borabi). Go does not allow this. Fix: Remove the unaliased import and use borabi consistently.

2. Bug: OnTxEnd called without guard - panics on empty blocks - core/state_processor.go:176-181, core/parallel_state_processor.go:448-453

OnTxEnd is inside the IsMadhugiri block but NOT guarded by appliedNewStateSyncReceipt. On empty blocks, receipts[len(receipts)-1] panics with index out of range. On blocks without state-sync tx, OnTxEnd fires without matching OnTxStart. Fix: Move OnTxEnd inside the appliedNewStateSyncReceipt guard.

3. Bug: time.Now() produces non-deterministic traces - consensus/bor/statefull/processor.go:181

ApplyStateSyncEvents uses time.Now().Unix() as syncTime, but production path uses deterministic event.Time from Heimdall. Makes trace results non-reproducible. Root cause: StateSyncData lacks a Time field. Fix: Add Time field to StateSyncData and use it instead of time.Now().

Copilot AI review requested due to automatic review settings April 2, 2026 10:14
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 36 out of 37 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 2, 2026

Codecov Report

❌ Patch coverage is 29.30175% with 567 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.88%. Comparing base (438596d) to head (f499ceb).

Files with missing lines Patch % Lines
eth/filters/IBackend.go 3.03% 256 Missing ⚠️
eth/tracers/api.go 58.75% 88 Missing and 11 partials ⚠️
eth/filters/IDatabase.go 0.00% 61 Missing ⚠️
consensus/bor/statefull/processor.go 0.00% 52 Missing ⚠️
eth/tracers/state_sync_txn_tracer.go 66.29% 27 Missing and 3 partials ⚠️
eth/filters/IBatch.go 0.00% 16 Missing ⚠️
tests/bor/mocks/IHeimdallClient.go 0.00% 10 Missing ⚠️
core/parallel_state_processor.go 22.22% 7 Missing ⚠️
core/state_processor.go 30.00% 6 Missing and 1 partial ⚠️
core/types/transaction.go 0.00% 5 Missing ⚠️
... and 8 more

❌ Your patch check has failed because the patch coverage (29.30%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2172      +/-   ##
===========================================
- Coverage    51.93%   51.88%   -0.05%     
===========================================
  Files          884      885       +1     
  Lines       155446   155440       -6     
===========================================
- Hits         80731    80652      -79     
- Misses       69509    69591      +82     
+ Partials      5206     5197       -9     
Files with missing lines Coverage Δ
core/types/bor_receipt.go 0.00% <ø> (ø)
eth/ethconfig/config.go 78.94% <100.00%> (ø)
eth/filters/IIterator.go 0.00% <ø> (ø)
eth/tracers/dir.go 76.47% <ø> (+23.52%) ⬆️
internal/cli/server/config.go 63.87% <100.00%> (+0.02%) ⬆️
internal/cli/server/flags.go 100.00% <100.00%> (ø)
params/protocol_params.go 84.21% <ø> (ø)
tests/bor/helper.go 74.51% <100.00%> (ø)
core/types/transaction_signing.go 51.11% <0.00%> (ø)
consensus/bor/bor.go 83.52% <60.00%> (-0.12%) ⬇️
... and 16 more

... and 17 files with indirect coverage changes

Files with missing lines Coverage Δ
core/types/bor_receipt.go 0.00% <ø> (ø)
eth/ethconfig/config.go 78.94% <100.00%> (ø)
eth/filters/IIterator.go 0.00% <ø> (ø)
eth/tracers/dir.go 76.47% <ø> (+23.52%) ⬆️
internal/cli/server/config.go 63.87% <100.00%> (+0.02%) ⬆️
internal/cli/server/flags.go 100.00% <100.00%> (ø)
params/protocol_params.go 84.21% <ø> (ø)
tests/bor/helper.go 74.51% <100.00%> (ø)
core/types/transaction_signing.go 51.11% <0.00%> (ø)
consensus/bor/bor.go 83.52% <60.00%> (-0.12%) ⬇️
... and 16 more

... and 17 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copilot AI review requested due to automatic review settings April 2, 2026 19:54
@manav2401
Copy link
Copy Markdown
Member Author

@claude review

@claude
Copy link
Copy Markdown

claude bot commented Apr 2, 2026

Claude finished @manav2401's task in 2m 53s —— View job


Code Review: State-Sync Tracing Support

  • Gather context and read changed files
  • Review core tracing changes (eth/tracers/api.go)
  • Review state-sync processor (consensus/bor/statefull/processor.go)
  • Review state processor hooks (core/state_processor.go, core/parallel_state_processor.go)
  • Review remaining changes
  • Post final review

I reviewed the latest state of the PR (head 6758995), including the fix commits that address feedback from the earlier two review rounds. The major issues from prior reviews (duplicate import, OnTxEnd guard, deterministic time, stream context, TransactionIndex in receipt) have all been addressed. Below are the remaining findings.


Issues

1. [MEDIUM] OnTxStart/OnTxEnd pairing broken when appliedNewStateSyncReceipt is false

Files: core/state_processor.go:155-182, core/parallel_state_processor.go:425-453

OnTxStart fires unconditionally whenever the last tx is StateSyncTxType (line 156-160), but OnTxEnd only fires inside if appliedNewStateSyncReceipt (line 176). When Finalize runs but doesn't produce a new receipt (e.g. state-sync data already applied), OnTxStart has no matching OnTxEnd. This violates the tracer hook pairing contract — tracers like callTracer maintain internal stacks that get corrupted by an orphaned OnTxStart.

Additionally, the early return at line 171-173 (receipt count mismatch error) also skips OnTxEnd.

Suggested fix: Either move OnTxStart inside the if appliedNewStateSyncReceipt block, or add OnTxEnd(nil, nil) / OnTxEnd(nil, err) on the other exit paths.

Fix this →


2. [MEDIUM] statedb.Logs() non-deterministic ordering partially mitigated but still fragile

File: eth/tracers/api.go:1221-1225

The sort added at line 1222 (sort.SliceStable by allLogs[i].Index < allLogs[j].Index) compensates for Go's non-deterministic map iteration in statedb.Logs(). However, log Index values are only meaningful within a single transaction — multiple transactions can have logs with the same Index values (both starting from 0 within their tx). The sort produces a stable ordering only because SliceStable preserves relative order of equal elements, but the input order from the map is itself non-deterministic.

A more robust approach would be to use statedb.GetLogs(tx.Hash(), ...) which does a direct map lookup by tx hash and returns only that transaction's logs, eliminating the positional indexing assumption entirely.


3. [MEDIUM] stateAtTransaction doesn't replay prior state-sync txs

File: eth/state_accessor.go:273-289

When TraceTransaction is called for a state-sync tx, stateAtTransaction replays all prior transactions using core.ApplyMessage. But if a prior tx in the same block is also a state-sync tx (which shouldn't happen in practice — state-sync is always last), or more practically, it doesn't call ApplyStateSyncEvents for state-sync tx types. This means the state replayed up to the target index may be incorrect if an earlier tx is StateSyncTxType.

This is low-risk since state-sync is always the last tx, but adding a type check for completeness would be defensive.


4. [MEDIUM] traceChain processes state-sync txs in parallel workers

File: eth/tracers/api.go:290-315

The traceChain worker loop at line 297 iterates over all task.block.Transactions() including state-sync txs, processing them in the parallel worker thread with a copied statedb. Unlike traceBlockParallel (which correctly separates state-sync txs for sequential processing at lines 771-825), traceChain doesn't have this special handling. The state-sync tx gets traced with a potentially stale copied state that may not include the effects of all prior transactions, since state is copied before the worker processes them sequentially.

This could produce incorrect trace results for state-sync txs when using debug_traceChain.


5. [LOW] Missing Bor != nil guard in IntermediateRoots and standardTraceBlockToFile

Files: eth/tracers/api.go:603-606, eth/tracers/api.go:903-905, eth/tracers/api.go:946

Three callsites access api.backend.ChainConfig().Bor.StateReceiverContract without checking Bor != nil. While traceTx correctly guards this at line 1205, these paths don't. In practice, a StateSyncTxType tx implies Bor is configured, but for defensive consistency (especially for IntermediateRoots which explicitly handles bad blocks), adding the guard would be safer.

Note: @manav2401 commented that this is OK since finding a StateSyncTxType implies Bor is running. This is a fair argument for production, but IntermediateRoots explicitly says it handles bad blocks, where invariants may not hold.


6. [LOW] TraceConfig.Path and TraceConfig.IOFlag are dead fields

File: eth/tracers/api.go:168-169

Path and IOFlag remain in the TraceConfig struct but are never read after removing the IO tracing code. Consider removing them or documenting they are no-ops, to avoid misleading API consumers.


7. [LOW] int64 cast for block timestamp in ApplyStateSyncEvents

File: consensus/bor/statefull/processor.go:184

now = int64(vmenv.Context.Time) silently truncates timestamps above math.MaxInt64 (year 2262+). While practically safe, using new(big.Int).SetUint64(vmenv.Context.Time) at line 205 instead of big.NewInt(0).SetInt64(now) would be more correct and avoid the cast entirely.


Positive Observations

  1. Previous review fixes addressed: Duplicate import, OnTxEnd guard, deterministic time, TransactionIndex in receipt, and stream.Context() usage — all addressed in follow-up commits.

  2. Good refactoring: Consolidating api_bor.go into the standard tracing paths and the consensus/bor/abi package extraction are clean improvements.

  3. Strong test coverage: E2e tests with real StateReceiver bytecode and actual EVM execution traces are significantly better than the previous mock-based approach.

  4. CumulativeGasUsed semantics: The comment at line 1232-1234 correctly documents why state-sync gas is not added to cumulative gas, matching production receipt behavior.

  5. Log sort for determinism: The sort.SliceStable addition at line 1222 shows awareness of the map ordering issue, even if a direct lookup would be more robust.


Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 37 out of 38 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings April 3, 2026 11:27
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 37 out of 38 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings April 7, 2026 13:42
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud bot commented Apr 7, 2026

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 40 out of 41 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +155 to +160
// Trace state-sync transaction (if present and if tracer is enabled)
if len(txs) > 0 && txs[len(txs)-1].Type() == types.StateSyncTxType {
if hooks := evm.Config.Tracer; hooks != nil && hooks.OnTxStart != nil {
hooks.OnTxStart(evm.GetVMContext(), txs[len(txs)-1], params.BorSystemAddress)
}
}
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Live tracers can receive multiple independent EVM root calls (depth==0) during a single state-sync transaction because Bor processes each event via a separate system CALL. At least the built-in live supply tracer assumes exactly one depth==0 frame per tx (it finalizes on the first root exit), so multiple root frames will cause incorrect accounting (e.g., double-counting or ignoring later events). Consider wrapping the tracer hooks for the state-sync tx similarly to NewStateSyncTxnTracer (synthetic root + depth shift), or alternatively invoke OnTxStart/OnTxEnd per event instead of once per state-sync tx so live tracers remain correct.

Copilot uses AI. Check for mistakes.
Comment on lines +425 to +429
if len(txs) > 0 && txs[len(txs)-1].Type() == types.StateSyncTxType {
if hooks := vmenv.Config.Tracer; hooks != nil && hooks.OnTxStart != nil {
hooks.OnTxStart(vmenv.GetVMContext(), txs[len(txs)-1], params.BorSystemAddress)
}
}
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as StateProcessor: for state-sync transactions, Bor may execute multiple independent root CALLs (depth==0) as it commits each event separately. Calling OnTxStart once here means live tracers that assume a single root call per tx (e.g., eth/tracers/live/supply.go) can produce incorrect results when multiple events exist. Consider wrapping the tracer hooks for the state-sync tx with a synthetic root/depth shift (like NewStateSyncTxnTracer) or scoping OnTxStart/OnTxEnd to each event call.

Copilot uses AI. Check for mistakes.
// Warn in case tx indexer is not done.
if !api.backend.TxIndexDone() {
return nil, ethapi.NewTxIndexingError()
}
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TraceTransaction now returns errTxNotFound for Bor (fake) transactions that exist in the Bor tx lookup table but are not part of the canonical tx index (pre-Madhugiri state-sync). Previously this path returned an empty trace result, which some callers may rely on for backwards compatibility. Consider preserving the old behavior (return an empty trace result) or returning a more explicit error (e.g., "bor state-sync txs are not traceable pre-Madhugiri") to avoid a silent API behavior change.

Suggested change
}
}
// Bor state-sync (fake) txs may exist in the Bor tx lookup table without being
// part of the canonical tx index on pre-Madhugiri chains. Return a dedicated
// error instead of the generic not-found to avoid a silent API behavior change.
if rawdb.ReadBorTxLookupEntry(api.backend.ChainDb(), hash) != nil {
return nil, errors.New("bor state-sync txs are not traceable pre-Madhugiri")
}

Copilot uses AI. Check for mistakes.
@claude
Copy link
Copy Markdown

claude bot commented Apr 7, 2026

Code Review (commit f499ceb): Found 3 bugs across 4 locations. (1) nil dereference on .Bor config in IntermediateRoots [api.go:605] and standardTraceBlockToFile [api.go:904]. (2) Unbalanced OnTxStart/OnTxEnd in state_processor.go:159 and parallel_state_processor.go:425. Details in follow-up comments.

@claude
Copy link
Copy Markdown

claude bot commented Apr 7, 2026

Bug [eth/tracers/api.go:605] — Missing nil guard on .Bor (potential nil pointer dereference). api.backend.ChainConfig().Bor is a *params.BorConfig that is nil on non-Bor chains, but is dereferenced unconditionally in IntermediateRoots. The old getAllBlockTransactions had an explicit nil check that was dropped in this refactor. Since IntermediateRoots is RPC-exposed, a malicious caller could trigger a nil deref by tracing a block with a StateSyncTxType tx on a non-Bor node. Fix: guard with 'if borConfig := api.backend.ChainConfig().Bor; borConfig != nil' before accessing borConfig.StateReceiverContract. See also line 938 in the same file which correctly guards this access.

@claude
Copy link
Copy Markdown

claude bot commented Apr 7, 2026

Bug [eth/tracers/api.go:904] — Same missing nil guard on .Bor as above. api.backend.ChainConfig().Bor.StateReceiverContract is accessed without checking if Bor is nil in standardTraceBlockToFile, which is also RPC-exposed. Fix: same pattern — 'if borConfig := api.backend.ChainConfig().Bor; borConfig != nil { stateReceiverAddress = borConfig.StateReceiverContract }'.

@claude
Copy link
Copy Markdown

claude bot commented Apr 7, 2026

Bug [core/state_processor.go:159] — Unbalanced OnTxStart/OnTxEnd tracer lifecycle. OnTxStart fires unconditionally (line 157-159) when the last tx is StateSyncTxType, but OnTxEnd is only called when appliedNewStateSyncReceipt is true (line 176-180). If Finalize does not produce a receipt (stateSyncData is empty), the tx/receipt count mismatch at line 171 causes an error return without ever calling OnTxEnd. This leaves tracers like callTracer in an inconsistent state. Same issue exists in parallel_state_processor.go lines 425-453. Fix: move OnTxStart after Finalize succeeds, or call OnTxEnd before the error return at line 172.

Comment on lines +64 to +73
func (t *stateSyncTxnTracer) OnTxEnd(receipt *types.Receipt, err error) {
// Close the synthetic top-level call frame before forwarding OnTxEnd,
// but only if it was actually created (i.e., at least one OnEnter occurred).
if t.createdTopLevel && t.tracer.OnExit != nil {
t.tracer.OnExit(0, nil, 0, err, err != nil)
}

if t.tracer.OnTxEnd != nil {
t.tracer.OnTxEnd(receipt, err)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 In stateSyncTxnTracer.OnTxEnd (state_sync_txn_tracer.go:71-73), the inner OnTxEnd is forwarded unconditionally regardless of whether createdTopLevel is true — unlike OnExit on line 67 which is correctly guarded. When a StateSyncTxType transaction carries zero bridge events, ApplyStateSyncEvents returns early and no EVM hooks fire, leaving the inner callTracer's callstack empty; the unconditional OnTxEnd forward then causes a runtime panic (index out of range [0] with length 0) when callTracer.OnTxEnd accesses callstack[0]. Fix by wrapping lines 71-73 with if t.createdTopLevel { ... }, mirroring the existing guard on the OnExit call above.

Extended reasoning...

Bug description

In stateSyncTxnTracer.OnTxEnd (eth/tracers/state_sync_txn_tracer.go:64-73), the synthetic root frame bookkeeping has an asymmetry: the OnExit forwarding on line 67 is guarded by if t.createdTopLevel, but the OnTxEnd forwarding on lines 71-73 is not guarded. Any tracer whose OnTxEnd assumes that at least one call frame was seen (i.e. that OnEnter was called at least once) will misbehave when the wrapper forwards OnTxEnd without a corresponding OnEnter ever being sent.

Code path that triggers it

  1. A block contains a StateSyncTxType transaction with zero bridge events (empty event list).
  2. ApplyStateSyncEvents in consensus/bor/statefull/processor.go hits the early-return branch:
    events := tx.GetStateSyncData()
    if len(events) == 0 {
        return &core.ExecutionResult{UsedGas: 0, ReturnData: nil}, nil
    }
  3. No ApplyBorMessage is called, so no EVM hooks (OnEnter/OnExit) fire. createdTopLevel remains false.
  4. stateSyncTxnTracer.OnTxEnd is called by the tracing infrastructure (every OnTxStart is unconditionally paired with OnTxEnd).
  5. The OnExit block (line 67) is skipped correctly because \!t.createdTopLevel.
  6. The OnTxEnd block (lines 71-73) is not skipped and forwards to the inner tracer.

Why existing code does not prevent it

The outer if t.tracer.OnTxEnd \!= nil check (line 71) only guards against a nil function pointer; it does not account for the invariant that the inner tracer may require at least one call frame to have been started before OnTxEnd is called.

Impact

For the callTracer (eth/tracers/native/call.go), callstack is initialized as make([]callFrame, 0, 1) — an empty slice. OnTxEnd unconditionally reads t.callstack[0] when err == nil and receipt \!= nil:

t.callstack[0].GasUsed = receipt.GasUsed  // panic: index out of range [0] with length 0

This causes a runtime panic that crashes the tracing goroutine (or the entire node if unrecovered) whenever a user issues a debug_traceBlock / debug_traceTransaction call that covers a block containing a 0-event state-sync transaction.

Step-by-step proof

  1. Transaction T is a StateSyncTxType with GetStateSyncData() returning an empty slice.
  2. Block B is processed; the tracing framework calls stateSyncTxnTracer.OnTxStart(env, T, from) — inner tracer receives it and initializes callstack = [].
  3. ApplyStateSyncEvents is called, sees len(events) == 0, returns immediately — zero EVM hooks fire.
  4. createdTopLevel is still false.
  5. Tracing framework calls stateSyncTxnTracer.OnTxEnd(receipt, nil).
  6. Line 67: t.createdTopLevel is falseOnExit is skipped. ✓
  7. Line 71: t.tracer.OnTxEnd \!= nil is truet.tracer.OnTxEnd(receipt, nil) is called.
  8. Inside callTracer.OnTxEnd: err == nil and receipt \!= nil → executes t.callstack[0].GasUsed = receipt.GasUsedpanic: runtime error: index out of range [0] with length 0.

Proposed fix

Wrap the OnTxEnd forwarding with the same createdTopLevel guard already used for OnExit:

func (t *stateSyncTxnTracer) OnTxEnd(receipt *types.Receipt, err error) {
    if t.createdTopLevel {
        if t.tracer.OnExit \!= nil {
            t.tracer.OnExit(0, nil, 0, err, err \!= nil)
        }
        if t.tracer.OnTxEnd \!= nil {
            t.tracer.OnTxEnd(receipt, err)
        }
    }
}

This ensures the inner tracer's call-frame lifecycle invariant (OnTxStartOnEnterOnExitOnTxEnd) is preserved: if no OnEnter was ever sent, neither OnExit nor OnTxEnd is forwarded.

Comment on lines +808 to 825
if isStateSyncPresent {
tx := txs[len(txs)-1]
msg, _ := core.TransactionToMessage(tx, signer, block.BaseFee())
txctx := &Context{
BlockHash: blockHash,
BlockNumber: block.Number(),
TxIndex: len(txs) - 1,
TxHash: tx.Hash(),
CumulativeGasUsed: cumulativeGasUsed,
LogIndex: len(statedb.Logs()),
}
res, _, err := api.traceTx(ctx, tx, msg, txctx, blockCtx, statedb, config, nil)
if err != nil {
results[txctx.TxIndex] = &txTraceResult{TxHash: tx.Hash(), Error: err.Error()}
} else {
results[txctx.TxIndex] = &txTraceResult{TxHash: tx.Hash(), Result: res}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 In traceBlockParallel (api.go:773-825), the txloop sets isStateSyncPresent = true for any StateSyncTxType at index i, but the post-loop section unconditionally uses txs[len(txs)-1] and TxIndex: len(txs)-1, hardcoding the assumption that state-sync is always the last transaction. When debug_traceBadBlock is called on a malformed bad block where the state-sync tx is at position j < len(txs)-1, this produces three errors: (1) results[j] stays nil (loop skipped it with continue), (2) results[len(txs)-1] is overwritten with an incorrect trace for the wrong tx, and (3) worker threads traced txs at positions j+1..len(txs)-1 against a statedb that was never advanced by the state-sync effects. The fix is to record the actual state-sync index during the loop (e.g. stateSyncIdx = i) and use it in the post-loop section instead of len(txs)-1.

Extended reasoning...

What the bug is and how it manifests

In traceBlockParallel (api.go lines 770–825), the main txloop iterates over all transactions. When a StateSyncTxType transaction is found at position i, the code sets isStateSyncPresent = true and continues — it never sends the tx to the worker goroutines and never records the actual index. After the loop completes, the post-loop section unconditionally runs:

if isStateSyncPresent {
    tx := txs[len(txs)-1]           // hardcoded: assumes last tx is the state-sync tx
    txctx := &Context{
        TxIndex: len(txs) - 1,       // hardcoded
        ...
    }
    ...
}

This works correctly on canonical Bor blocks, where the consensus layer enforces that any state-sync tx is always placed last (validated in bor.go). However, traceBlockParallel is reachable via TraceBadBlock (api.go:527→533→traceBlock→traceBlockParallel for JS tracers). Bad blocks are blocks that failed validation and could have state-sync at any position.

The specific code path that triggers it

A caller invokes debug_traceBadBlock with a JS tracer on a bad block whose body has a StateSyncTxType tx at position j < len(txs)-1. Execution reaches traceBlockParallel:

  1. At index j: tx.Type() == StateSyncTxTypeisStateSyncPresent = true; continue. No worker task is queued for index j. results[j] remains nil.
  2. Indexes j+1 through len(txs)-1: worker goroutines trace these txs. They received statedb snapshots taken before the state-sync effects were applied (the main thread skipped it), so their state is incorrect.
  3. After the loop: the post-loop block runs with txs[len(txs)-1] — a regular tx at the last position, not the actual state-sync tx. It is traced as a state-sync tx (via ApplyStateSyncEvents) with wrong hash, wrong index, and wrong cumulative gas. results[len(txs)-1] is overwritten, clobbering the worker's correct result.

Why existing code does not prevent it

The entire Bor codebase (including state_processor.go:156 and parallel_state_processor.go:425) consistently assumes state-sync is the last tx. This is enforced for valid blocks by the consensus layer. There is no guard in traceBlockParallel that validates the last tx is actually StateSyncTxType before using len(txs)-1 as the index. The old code (getAllBlockTransactions) at least fetched the last tx and checked its type explicitly, but the new parallel path introduced by this PR removed that guard.

What the impact would be

This is a correctness bug in the forensic debugging API (debug_traceBadBlock). For any bad block with a malformed tx ordering: (1) the trace result contains a nil entry at the actual state-sync position, producing a null JSON element in the response; (2) a correctly-traced regular tx result is silently overwritten with an incorrect state-sync trace; and (3) multiple traces of txs after position j are computed from incorrect state, making them unreliable for incident investigation. Normal chain operation and canonical block tracing are completely unaffected.

How to fix it

Change isStateSyncPresent bool to stateSyncIdx = -1 (or similar sentinel), then in the txloop:

if tx.Type() == types.StateSyncTxType {
    stateSyncIdx = i
    continue
}

And in the post-loop section:

if stateSyncIdx >= 0 {
    tx := txs[stateSyncIdx]
    txctx := &Context{TxIndex: stateSyncIdx, ...}
    ...
}

This eliminates the hardcoded len(txs)-1 assumption.

Step-by-step proof

Suppose a bad block has 3 transactions: [txA (regular), txB (StateSyncTxType), txC (regular)]. Indexes: A=0, B=1, C=2.

  1. txloop i=0 (txA): regular tx — task queued for worker, statedb snapshot taken. Worker traces txA correctly. results[0] = correct trace.
  2. txloop i=1 (txB): StateSyncTxType — isStateSyncPresent = true; continue. No task queued. results[1] stays nil.
  3. txloop i=2 (txC): regular tx — task queued for worker. Worker snapshot was taken from statedb after txA but without txB's effects. Worker traces txC against incorrect state. results[2] = incorrectly-computed trace.
  4. Post-loop: isStateSyncPresent == truetx = txs[2] (txC, the wrong tx). The post-loop code traces txC as a state-sync tx with TxIndex=2, wrong TxHash, wrong CumulativeGasUsed. results[2] is overwritten, destroying the worker's result.
  5. Final: results = [correct trace for txA, nil, wrong state-sync trace of txC]. Both txB and txC are incorrectly represented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants