feat: L2→L1→L2 synchronous composability#945

Closed
AnshuJalan wants to merge 13 commits into surge-real-time-poc from feat/l2-to-l1-to-l2-sync

Conversation

@AnshuJalan
Collaborator

Summary

  • End-to-end L2→L1→L2 pipeline for L2Direct UserOps: L2 outbound pre-simulation, L1 callback simulation with state_override, return signal injection into anchor, calldata patching for flash-loan-style apps
  • Deferred-finalization multicall shape: [user_ops, tentativePropose, l1_calls, finalizePropose] when required return signals are present
  • All contract addresses from env (no auto-derivation); consolidated L2_BRIDGE_ADDRESS as single env var

New env vars

  • L1_SIGNAL_SERVICE_ADDRESS — L1 SignalService (for callback simulation state override)
  • L2_SIGNAL_SERVICE_ADDRESS — L2 SignalService
  • L2_BRIDGE_ADDRESS — replaces TAIKO_BRIDGE_L2_ADDRESS (used by both common config and realtime)

Depends on

  • NethermindEth/surge-taiko-mono feat/l2-to-l1-to-l2-sync (tentativePropose/finalizePropose, ProposeInputV2, flash loan contracts)

Test plan

  • Deploy protocol contracts (RealTimeInbox with tentativePropose/finalizePropose) on devnet
  • Set all bridge/signal-service env vars
  • Submit L2Direct flash loan UserOp via surge_sendUserOp
  • Verify multicall trace: tentativePropose → processMessage → finalizePropose
  • Verify L2 block includes return signal in anchor fast signals
  • Verify existing L1→L2→L1 UserOp flow is unaffected

🤖 Generated with Claude Code

AnshuJalan and others added 2 commits April 13, 2026 14:13
When a proposal's L1Calls include required L1→L2 return signals (produced by
the L1 callback of a Bridge.processMessage triggered during the same L1
multicall), restructure the multicall as:

  [tentativePropose, user_ops..., l1_calls..., finalizePropose]

instead of the classic:

  [user_ops..., propose, l1_calls...]

The inbox's new tentativePropose saves the checkpoint and emits
ProposedAndProved up front so processMessage later in the multicall can verify
L2→L1 signals against the tentative L2 state root. finalizePropose verifies
that the required L1→L2 return signals were actually produced by the L1
callbacks before committing.

Changes:
- L1Call gains required_return_signal: Option<FixedBytes<32>>. Application-level
  orchestration (e.g. a FlashLoanExecutorL2) populates this from its outbound
  bridge message's expected L1→L2 return slot.
- Bindings add ProposeInputV2 (existingSignals + requiredReturnSignals) and the
  RealTimeInbox ABI is refreshed to expose tentativePropose/finalizePropose.
- proposal_tx_builder splits the signal-slot union across the two input fields
  based on which slots are required-return, then builds the two inbox calls.
  Classic propose() path is unchanged when no required signals are present.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
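The signal-slot split described in this commit can be sketched as follows. This is a minimal illustration, not the code in proposal_tx_builder: `Slot` stands in for `FixedBytes<32>`, and `split_signal_slots` and `use_deferred` are hypothetical names. It assumes every required-return slot is also present in the batch's signal slots.

```rust
/// Stand-in for a 32-byte signal slot (the PR uses FixedBytes<32>).
type Slot = [u8; 32];

/// Hypothetical helper: partition the batch's signal slots into
/// (existing_signals, required_return_signals), preserving order.
fn split_signal_slots(signal_slots: &[Slot], required: &[Slot]) -> (Vec<Slot>, Vec<Slot>) {
    let mut existing = Vec::new();
    let mut required_return = Vec::new();
    for slot in signal_slots {
        if required.contains(slot) {
            required_return.push(*slot);
        } else {
            existing.push(*slot);
        }
    }
    (existing, required_return)
}

/// Assumption: the deferred tentativePropose/finalizePropose path is taken
/// only when at least one required-return slot exists.
fn use_deferred(required_return: &[Slot]) -> bool {
    !required_return.is_empty()
}
```

With no required-return slots the second vector is empty and the classic propose() path would be kept.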
Wire end-to-end L2→L1→L2 flow for L2Direct UserOps:
- L1 callback simulator (simulate_l1_callback_return_signal) with
  state_override on L1 SignalService to bypass signal verification
- L2 outbound pre-simulator (trace_user_op_for_outbound_message) to
  detect bridge-out before real block execution
- Three-pass block build in proposal_manager: detect outbound, simulate
  L1 callback, inject return signal into anchor, patch calldata
- Fix multicall ordering: user_ops before tentativePropose
- All contract addresses from env (no auto-derivation), consolidate
  L2_BRIDGE_ADDRESS as single env var

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
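For reference, the corrected ordering (user ops before tentativePropose) can be sketched like this. The `Call` enum and `assemble_multicall` are hypothetical stand-ins for illustration; the real builder works with encoded contract calls rather than integer ids.

```rust
/// Hypothetical, simplified model of one entry in the L1 multicall.
#[derive(Debug, Clone, PartialEq)]
enum Call {
    UserOp(u32),
    Propose,
    TentativePropose,
    L1Call(u32),
    FinalizePropose,
}

/// Deferred shape: [user_ops, tentativePropose, l1_calls, finalizePropose].
/// Classic shape:  [user_ops, propose, l1_calls].
fn assemble_multicall(user_ops: &[u32], l1_calls: &[u32], deferred: bool) -> Vec<Call> {
    let mut calls: Vec<Call> = user_ops.iter().map(|id| Call::UserOp(*id)).collect();
    calls.push(if deferred { Call::TentativePropose } else { Call::Propose });
    calls.extend(l1_calls.iter().map(|id| Call::L1Call(*id)));
    if deferred {
        // finalizePropose runs last so it can check the return signals
        // produced by the L1 calls that precede it.
        calls.push(Call::FinalizePropose);
    }
    calls
}
```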
MOCK_ECDSA=1, RISC0=2, SP1=4, ZISK=8 — mirrors the updated
SurgeVerifier constants. PROOF_TYPE=mock_ecdsa selects the dummy
verifier path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
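A small sketch of how the PROOF_TYPE string could map onto these bit flags. The constant values and the `mock_ecdsa` spelling come from the commit message; the function name, the other string spellings (`risc0`, `sp1`, `zisk`) and the error type are assumptions.

```rust
// Bit flags as listed in the commit message.
const MOCK_ECDSA: u8 = 1;
const RISC0: u8 = 2;
const SP1: u8 = 4;
const ZISK: u8 = 8;

/// Hypothetical parser from the PROOF_TYPE env value to its bit flag.
fn proof_type_flag(proof_type: &str) -> Result<u8, String> {
    match proof_type {
        "mock_ecdsa" => Ok(MOCK_ECDSA),
        "risc0" => Ok(RISC0),
        "sp1" => Ok(SP1),
        "zisk" => Ok(ZISK),
        other => Err(format!("unknown PROOF_TYPE: {other}")),
    }
}
```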
@mskrzypkows
Collaborator

@claude

@claude

claude Bot commented Apr 14, 2026

Claude finished @mskrzypkows's task in 2m 23s


Code Review

  • Gather context and read changed files
  • Review common/src/config/mod.rs
  • Review realtime/src/l1/ changes (bindings, config, execution_layer, proposal_tx_builder)
  • Review realtime/src/l2/ changes (execution_layer, taiko)
  • Review realtime/src/node/proposal_manager/ changes
  • Review realtime/src/utils/config.rs
  • Summarize findings

Summary

The PR implements a complete L2→L1→L2 synchronous composability pipeline. The overall architecture — pre-simulation on L2, state-override-based L1 callback simulation, calldata patching, and the deferred tentativePropose/finalizePropose multicall shape — is sound and well-structured. A few issues need attention before merge.


🔴 Critical Issues

1. .unwrap() in build_propose_blob will fail CI lint

realtime/src/l1/proposal_tx_builder.rs:122:

.input(inbox_calls.into_iter().next().unwrap().data.into())

The workspace enforces unwrap_used = "deny". This will fail cargo clippy. Needs to be replaced with ? or an explicit .ok_or_else(...).
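One possible shape for the fix, shown with stand-in types. `InboxCall` and `first_call_data` are hypothetical, and the `String` error is a placeholder for whatever error type the crate actually uses.

```rust
/// Stand-in for the builder's inbox call type; only `data` matters here.
struct InboxCall {
    data: Vec<u8>,
}

/// Return the first call's calldata, or an error instead of panicking.
fn first_call_data(inbox_calls: Vec<InboxCall>) -> Result<Vec<u8>, String> {
    inbox_calls
        .into_iter()
        .next()
        .map(|call| call.data)
        .ok_or_else(|| "expected at least one inbox call".to_string())
}
```

The caller can then propagate the failure with `?`, which satisfies the `unwrap_used` lint.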


2. Hardcoded storage slot 253 for _receivedSignals

realtime/src/l1/execution_layer.rs:423:

let received_signals_base_slot = U256::from(253u64);

This is a magic number tied to the Surge SignalService storage layout. If the contract is upgraded or the layout changes, the state override will silently compute the wrong slot, so the simulation will either always succeed (if the slot happens to be non-zero) or always fail (if it is zero). This should be:

  • Derived on-chain via a getSignalSlot view function if available, or
  • At minimum documented clearly with a link to the exact contract/layout file and a version pin
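A third, short-term option is to make the base slot overridable from the environment, as suggested under Architectural Notes. The sketch below assumes a hypothetical helper name, and 253 is kept only as the default taken from the current code.

```rust
/// Default taken from the hardcoded value in execution_layer.rs.
const DEFAULT_RECEIVED_SIGNALS_SLOT: u64 = 253;

/// Hypothetical helper: parse an optional override for the _receivedSignals
/// base slot, falling back to the default when the variable is unset.
fn received_signals_base_slot(raw: Option<&str>) -> Result<u64, String> {
    match raw {
        None => Ok(DEFAULT_RECEIVED_SIGNALS_SLOT),
        Some(s) => s
            .trim()
            .parse::<u64>()
            .map_err(|e| format!("invalid storage slot {s:?}: {e}")),
    }
}
```

It could be wired up with something like `received_signals_base_slot(std::env::var("SIGNAL_SERVICE_RECEIVED_SIGNALS_SLOT").ok().as_deref())`, where the variable name is an assumption and not an existing setting.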

🟡 Medium Issues

3. Code duplication: two identical log-collection helpers

collect_logs_from_frame in realtime/src/l2/execution_layer.rs:589 is a near-duplicate of collect_logs_recursive in realtime/src/l1/execution_layer.rs:261. Both walk a CallFrame tree collecting CallLogFrames. This should live in a shared utility module.
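A shared helper could look roughly like the sketch below. `CallFrame` and `CallLogFrame` here are simplified stand-ins for the tracer types, and `collect_logs` is a hypothetical name. It walks the tree with an explicit stack rather than recursion, so a deeply nested trace cannot overflow the call stack.

```rust
/// Simplified stand-in for the tracer's log type.
#[derive(Debug, Clone, PartialEq)]
struct CallLogFrame {
    id: u64,
}

/// Simplified stand-in for the tracer's call frame.
#[derive(Debug, Default)]
struct CallFrame {
    logs: Vec<CallLogFrame>,
    calls: Vec<CallFrame>,
}

/// Collect logs from `root` and all nested calls: each frame's own logs
/// first, then its children in call order (pre-order traversal).
fn collect_logs(root: &CallFrame) -> Vec<CallLogFrame> {
    let mut out = Vec::new();
    let mut stack = vec![root];
    while let Some(frame) = stack.pop() {
        out.extend(frame.logs.iter().cloned());
        // Push children reversed so the first child is popped first.
        stack.extend(frame.calls.iter().rev());
    }
    out
}
```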


4. _signal_slot_proof parameter is dead code

realtime/src/l1/execution_layer.rs:378:

async fn simulate_l1_callback_return_signal(
    &self,
    message_from_l2: Message,
    _signal_slot_proof: Bytes,   // ← never used
    ...

The parameter exists in the trait signature but is never used in the implementation — state override is used instead. Either remove it from the trait/impl or document why it's kept (e.g., "reserved for non-override fallback path").


5. New addresses missing from startup log

realtime/src/utils/config.rs:97-110: The Display impl for RealtimeConfig does not print signal_service or l2_signal_service. These are now required env vars; logging them on startup is important for debugging misconfigured deployments.


6. Signal slot invariant not validated

realtime/src/l1/proposal_tx_builder.rs:296-303: In the deferred flow, existing_signals is computed by filtering required_return_signals out of batch.signal_slots. A required_return_signal that is absent from batch.signal_slots therefore never appears in existing_signals, yet it is still included in ProposeInputV2.requiredReturnSignals. More importantly, the L2 anchor only contains slots that were pushed into anchor_signal_slots. If a return slot is in requiredReturnSignals but not in the anchor's fast signals, the L2 block will succeed but finalizePropose on L1 may revert (or vice versa). A debug assertion here would help catch this during development:

debug_assert!(
    required_return_signals.iter().all(|s| batch.signal_slots.contains(s)),
    "required_return_signal not found in batch.signal_slots"
);

🟢 Minor / Nitpicks

7. POC comment signals incomplete implementation

realtime/src/l2/execution_layer.rs:569:

break; // first MessageSent wins for the POC

If a UserOp emits multiple MessageSent events (e.g., batched bridge calls), only the first is captured. This should be tracked as a known limitation or issue.


8. hex_encode reimplements hex::encode

realtime/src/node/proposal_manager/mod.rs:747-753:

fn hex_encode(bytes: &[u8]) -> String { ... }

The hex crate is already available (used in taiko.rs). Use hex::encode(bytes) directly and remove the local reimplementation.


9. Unreachable code path comment

realtime/src/l1/proposal_tx_builder.rs:114-127:

if batch.user_ops.is_empty() && batch.l1_calls.is_empty() {
    if inbox_calls.len() == 1 {

When use_deferred is true, inbox_calls.len() == 2. The comment says "Otherwise fall through to multicall assembly" but with an empty user_ops+l1_calls in deferred mode, the multicall would contain only [tentativePropose, finalizePropose] — no user ops or L1 calls. This is a degenerate case unlikely to occur in practice (you wouldn't use deferred if there are no L1 calls), but it could silently do something unexpected. An explicit guard or comment would help.


Architectural Notes

  • The state-override simulation approach for _receivedSignals is clever but couples the implementation tightly to the SignalService storage layout. Consider making the storage slot configurable via env var as a short-term safety valve.
  • The pending_return_signal field on BatchManager is a form of intra-call state transfer between add_pending_user_ops_to_draft_block and add_draft_block_to_proposal. The .take() on line 498 is correct (clears after use). Consider a local variable + explicit pass instead to make the data flow more visible.
  • The fallback-to-classic behavior on simulation failure (line 400-406 in bridge_handler.rs) is a good defensive choice for a PoC.

AnshuJalan and others added 2 commits April 14, 2026 13:49
When MOCK_MODE=true, the on-chain SubProof bit flag is set to 1
(MOCK_ECDSA) regardless of PROOF_TYPE. This allows using a real Raiko
proof type string (zisk/sp1/risc0) while routing on-chain to the
DummyProofVerifier.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
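The override amounts to a single branch. In this sketch the function name is hypothetical and the flag is modelled as a plain `u8`:

```rust
/// MOCK_ECDSA bit flag, as listed in the earlier proof-type commit.
const MOCK_ECDSA_FLAG: u8 = 1;

/// Hypothetical helper: choose the on-chain SubProof bit flag. With
/// MOCK_MODE=true the configured proof type is ignored on-chain.
fn on_chain_proof_flag(configured_flag: u8, mock_mode: bool) -> u8 {
    if mock_mode {
        MOCK_ECDSA_FLAG
    } else {
        configured_flag
    }
}
```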
The previous design routed L2 txs through surge_sendUserOp with a chainId
field, which broke existing UserOp RPC consumers and conflated two
conceptually different flows. Replaced with:

- surge_sendUserOp: back to L1→L2→L1 only (no chainId field)
- Mempool scanning: during block build, trace each pending L2 tx for
  Bridge.sendMessage and inject the return signal into the anchor
- surge_simulateReturnMessage RPC: apps call this to get the exact
  return IBridge.Message before submitting to L2 mempool

Key implementation details:
- Call-based detection (not event logs): Nethermind's callTracer doesn't
  surface event logs through UUPS proxy DELEGATECALLs, so we scan for
  CALL frames to the bridge with the sendMessage selector (0x1bdb0037)
  and decode the Message from the call input
- Bridge-assigned field patching: from/srcChainId/id are zero in the
  call input, filled by the bridge during execution. Patched with
  the caller address, chain_id, and nextMessageId respectively
- L1 callback direct invocation: instead of Bridge.processMessage
  (which requires L1 signal verification we can't bypass), call
  callback.onMessageInvocation directly with from=bridge and
  state-override the bridge's __ctx (slot 253-254) so context()
  returns the correct msgHash/from/srcChainId
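The call-based detection can be illustrated with a simplified trace model. Everything except the selector value is an assumption: `CallFrame` is a stand-in for the callTracer output, and `find_send_message` and `OutboundCall` are hypothetical names. ABI decoding of the Message and the patching of from/srcChainId/id are left out.

```rust
/// sendMessage selector quoted in the commit message (0x1bdb0037).
const SEND_MESSAGE_SELECTOR: [u8; 4] = [0x1b, 0xdb, 0x00, 0x37];

type Address = [u8; 20];

/// Simplified stand-in for one callTracer frame.
struct CallFrame {
    call_type: String, // "CALL", "DELEGATECALL", ...
    from: Address,
    to: Option<Address>,
    input: Vec<u8>,
    calls: Vec<CallFrame>,
}

/// The caller (later used to patch `from`) and the ABI-encoded arguments.
struct OutboundCall<'a> {
    caller: Address,
    encoded_message: &'a [u8],
}

/// Find the first plain CALL to the bridge that carries the sendMessage selector.
fn find_send_message<'a>(root: &'a CallFrame, bridge: &Address) -> Option<OutboundCall<'a>> {
    let mut stack = vec![root];
    while let Some(frame) = stack.pop() {
        let is_bridge_call = frame.call_type == "CALL"
            && frame.to.as_ref() == Some(bridge)
            && frame.input.starts_with(&SEND_MESSAGE_SELECTOR);
        if is_bridge_call {
            return Some(OutboundCall {
                caller: frame.from,
                // Safe: starts_with guarantees at least 4 bytes.
                encoded_message: &frame.input[4..],
            });
        }
        // Visit children in call order.
        stack.extend(frame.calls.iter().rev());
    }
    None
}
```

Matching on the CALL frame rather than on event logs sidesteps the proxy DELEGATECALL frames, which this sketch ignores.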

Removed: L2Direct UserOp routing, FlashLoanExecutor calldata patching,
executeCall/FlashLoanReturnMessage bindings.

Tested end-to-end on devnet: full L2→L1→L2 flash loan completes
atomically with 1% fee to beneficiary, pool fully repaid.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread common/src/config/mod.rs
.map_err(|e| address_parse_error(TAIKO_ANCHOR_ADDRESS, e, &taiko_anchor_address_str))?;

- const BRIDGE_ADDRESS: &str = "TAIKO_BRIDGE_L2_ADDRESS";
+ const BRIDGE_ADDRESS: &str = "L2_BRIDGE_ADDRESS";
Member

We can’t rename that constant because it would interfere with our running nodes and require changes to our infrastructure, so it should be done in a separate PR

AnshuJalan and others added 4 commits April 17, 2026 12:53
- Thread EXTRA_GAS_PERCENTAGE from common_config through realtime
  ExecutionLayer into ProposalTxBuilder (was hardcoded to 10, ignoring the
  env).
- Raise BLOB_TX_GAS_LIMIT from 500k to 3M and drop the estimate_gas attempt
  (eth_estimateGas can't simulate blob txs — BLOBHASH returns 0 — so any
  multicall that included Bridge.processMessage was OOM'ing and getting
  rewrapped as B_SIGNAL_NOT_RECEIVED).
- Forward message.value on the proposer-multicall l1_call entry and on the
  bridge-impersonated processMessage trace, and override the bridge balance
  so payable L1 callbacks receive ETH on fresh devnets.
- Pass tx.value into trace_tx_for_outbound_message from both the mempool
  scan and the surge_simulateReturnMessage RPC so payable L2 entry points
  (e.g. swapETHForTokenViaL1) don't revert with ZERO_AMOUNT during tracing.
- In AsyncSubmitter, mark every in-flight user op as Rejected if
  submission_task bails before reaching its own status-update path (e.g.
  manifest encoding / sidecar build failures), so ops don't sit at Pending
  forever.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bridge pays the callback value from its own reserves via raw assembly
call; prefunding the Multicall contract only made `call{value: X}` revert
with INSUFFICIENT_BALANCE since Multicall holds 0 ETH. Set value=0 on
the sub-call — if the Bridge is underfunded the tx now reverts naturally
at Bridge rather than masquerading as a prefund failure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Extends UserOpStatusStore with a B256-keyed API (separate sled tree) so
mempool-scanned L2→L1→L2 txs get the same sequencing → proving →
proposing → complete lifecycle as L1→L2→L1 UserOps. async_submitter
mirrors each transition site onto the hash-keyed entries, and
surge_txStatus checks the store before falling back to on-chain lookup.
Lets the UI poll by L2 tx hash and drive the unified overlay.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
nmjustinchan and others added 4 commits April 22, 2026 12:45
Osaka/PeerDAS-enabled L1 nodes reject legacy v0 blob wrappers with
"InvalidTxProofVersion: Version of network wrapper is not supported".
Switch the shasta and realtime sidecar builders (and the realtime
Raiko submitter) to `build_7594()` so blob txs carry the v1 wrapper
with cell proofs, matching what pacaya already does.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ignal

The L2→L1→L2 return signal was being simulated twice — once by the
mempool scan (which injects the slot into the L2 anchor's fast signals)
and again by `find_l1_call` after preconf. When the two simulations
disagreed (e.g. L1 state drifted between the calls, or the UI's simulate
RPC produced a different slot than the actual mempool tx), the L1 call
ended up with no `required_return_signal`, Catalyst fell back to classic
propose, and the inbox reverted with `SignalSlotNotSent` because the
slot in the anchor was never produced on L1.

Plumb the pre-simulated slot from the mempool scan into `find_l1_call`
as the authoritative value and remove the redundant second simulation.
The anchor-injected slot and the inbox's `requiredReturnSignal` now
always match by construction.

Also add a short retry around `find_message_and_signal_slot` so a brief
log-indexing lag on the L2 RPC right after preconf doesn't cause the L1
call to be dropped entirely.
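The retry idea, reduced to a generic synchronous helper. The real call site is presumably async and its attempt count and delay are not stated here, so `retry` and its parameters are illustrative assumptions.

```rust
use std::{thread, time::Duration};

/// Run `op` up to `attempts` times (at least once), sleeping `delay`
/// between failed attempts. Returns the first success or the last error.
fn retry<T, E>(
    attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= attempts => return Err(err),
            Err(_) => {
                attempt += 1;
                thread::sleep(delay);
            }
        }
    }
}
```

A small bounded retry like this only covers transient indexing lag; a slot that is genuinely missing still surfaces as an error after the last attempt.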

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mikhailUshakoff added a commit that referenced this pull request May 4, 2026
* feat: add realtime fork as new sibling crate

Introduces the realtime fork (`realtime/`) as a clean sibling to
shasta/permissionless/pacaya, ported from the surge-real-time-poc and
feat/l2-to-l1-to-l2-sync work without dragging in the shasta-side and
pacaya-side modifications that those branches accumulated.

What's in the new fork:
- L1 + L2 execution layers with bridge-callback simulation
- Async proposal submitter, batch builder, bridge handler (UserOp status
  tracking, mempool scan for return signals)
- Realtime chain monitor for `RealTimeInbox::ProposedAndProved`
- Raiko v3 client; deferred-finalization multicall builder
- Self-contained `NodeConfig` (does not rely on pacaya::node::config,
  which was removed in #941 when pacaya became a utility crate)

Wiring:
- `Fork::Realtime` enum variant + `FromStr` impl + `FORK` env var override
  in common/src/fork_info, default-disabled timestamp (only activates
  when `FORK=realtime` is set)
- `Realtime` match arm in `Node/src/main.rs`
- Workspace `Cargo.toml`: adds realtime member + `sled` dep

Common deltas (consumed by realtime):
- `taiko_driver::reorg_stale_block` RPC + `ReorgStaleBlock{Request,Response}`
- `BuildPreconfBlockResponse.state_root` (parsed leniently — defaults to
  B256::ZERO if missing, so existing shasta/permissionless JSON paths
  remain compatible)
- `transaction_monitor::monitor_new_transaction` accepts optional
  tx-hash and tx-result oneshot notifiers; existing 2-arg callers are
  unaffected (only the new realtime async submitter passes Some)

Untouched from origin/master:
- `shasta/`, `pacaya/` — zero diff
- `permissionless/` — single line: `state_root: B256::ZERO` placeholder
  in the BuildPreconfBlockResponse construction, mirroring
  mikhailUshakoff's PR #939 approach
- `common/src/config/mod.rs` — the `TAIKO_BRIDGE_L2_ADDRESS` →
  `L2_BRIDGE_ADDRESS` rename flagged on PR #945 is intentionally
  deferred to a separate coordinated PR
- `common/src/shared/internal_server.rs` and the warp→axum migration —
  preserved as-is
- Dockerfile + x86-64 CI fixes — separate PR

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refac(realtime): activate fork via REALTIME_TIMESTAMP_SEC instead of FORK env var

Drops the bespoke `FORK` env-var override + `Fork::FromStr` impl in
favour of the existing per-fork timestamp pattern. Adds
`config.realtime_timestamp_sec` (default 99999999999) which is then
threaded through `ForkInfoConfig`, matching how Shasta and
Permissionless are activated.

To run in realtime mode, set REALTIME_TIMESTAMP_SEC=0 (or any past
timestamp) at startup.
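A sketch of the activation check this implies. The default value comes from the commit message; the helper name and the `>=` comparison are assumptions about how the timestamp is evaluated.

```rust
/// Default from the commit message: far in the future, so the fork is off.
const DEFAULT_REALTIME_TIMESTAMP_SEC: u64 = 99_999_999_999;

/// Hypothetical check: the fork is active once the current time has
/// reached the configured activation timestamp.
fn is_realtime_active(now_sec: u64, activation_sec: u64) -> bool {
    now_sec >= activation_sec
}
```

Under this reading, REALTIME_TIMESTAMP_SEC=0 activates the fork immediately and the default leaves it disabled.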

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: drop stale RUSTSEC-2026-0002 advisory ignore

Cargo.lock resolution after the realtime crate's deps no longer pulls
in the affected `lru` version, so cargo-deny's
`advisory-not-detected` warning fails the audit. Removing the now
unused ignore entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* to_string instead of raiko_proof_type

* Safer iterative version of logs collecting from the CallFrame

* message zip with slot for find_message_and_signal_slot

* simplified simulate_l1_callback_return_signal

* refac(realtime): remove dead code instead of suppressing warnings

Removes all `#[allow(dead_code)]` markers in realtime/ and the items
they were guarding, rather than silencing warnings. Touches:

- ProtocolConfig: drop unused proof_verifier, signal_service fields and
  get_max_anchor_height_offset getter.
- RaikoClient: drop l2_network/l1_network fields and the corresponding
  RAIKO_L2_NETWORK / RAIKO_L1_NETWORK env reads — they were stored but
  never sent in proof requests (request body uses None).
- ContractAddresses + EthereumL1Config: drop signal_service field,
  L1_SIGNAL_SERVICE_ADDRESS env read, and the dead raiko_client chain
  (RaikoClient is constructed at lib.rs and routed through
  AsyncSubmitter, not ExecutionLayer).
- BridgeHandler: drop the l1_chain_id field and the unused
  parameter chain back to Node::new (lib.rs).
- BatchBuilder: drop unused `metrics` field/param + add_recovered_l2_block,
  add_l2_user_op_id methods.
- BatchManager: drop unused metrics, cancel_token fields and
  reanchor_block method.
- Node: drop unused metrics field/param.
- L2ExecutionLayer: drop unused TaikoConfig field, get_head_l1_origin,
  get_last_synced_block_params_from_geth, decode_block_params_from_tx_data,
  get_anchor_tx_data methods.
- Taiko: drop unused coinbase field, get_protocol_config,
  get_l2_block_by_number, fetch_l2_blocks_until_latest,
  decode_anchor_id_from_tx_data, get_anchor_tx_data wrappers.
- proposal::Proposals type alias.

Net: -249 / +5 lines, no new warnings, all 122 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Replaced sled crate with fjall.

Sled is unmaintained, cargo deny didn't accept it.

* Returning transaction result handlers instead of propagating them to
send batch functions and tx monitor

* v1.38.0

* fix(realtime): address Claude review findings on PR #953

Picks up the actionable items from the automated review on
#953 and leaves the rest documented as
follow-ups. Net change: +117 / -46 in realtime/.

Bug / correctness:
* (#1) Defer staging additions to the in-flight proposal until
  `advance_head_to_new_l2_block` succeeds. Previously a failed advance
  left orphan signal slots / user ops in the proposal that no L2 block
  corresponds to, so the next attempt would re-add the same slot from
  the mempool scan and `_verifySignalSlots` would revert on duplicate.
  `pending_return_signal` and `pending_mempool_tx_hash` are now only
  consumed in the Ok arm so retries see the same slot.
* (#3) `AsyncSubmitter::submit` no longer panics on the
  `is_busy()` invariant — returns `Err` and the caller propagates.
* (#2) `transfer_eth_from_l2_to_l1` returns an explicit
  "not implemented" `Err` instead of `Ok(())` to make accidental
  wiring loud. Realtime does not run the funds_controller flow.
* (#8) `processMessage` L2 call now uses the L2 slot's actual base
  fee for `max_fee_per_gas` instead of a hardcoded 1 gwei. Matches
  the anchor tx pattern.
* (#10) Raiko reqwest client now has a configurable timeout
  (`RAIKO_TIMEOUT_SEC`, default 30s) — a hung Raiko no longer
  deadlocks the async submitter forever.
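Item (#1) above is essentially a "consume only on success" pattern. A minimal sketch with hypothetical names (`Draft`, `commit_block`), a stand-in `Slot` type and a `String` error:

```rust
/// Stand-in for a 32-byte signal slot.
type Slot = [u8; 32];

/// Hypothetical, reduced model of the staging state.
#[derive(Default)]
struct Draft {
    pending_return_signal: Option<Slot>,
    proposal_signal_slots: Vec<Slot>,
}

impl Draft {
    /// Stage the pending slot into the proposal only after the head advance
    /// succeeds. On failure the pending slot is left untouched, so a retry
    /// sees the same value and nothing is orphaned in the proposal.
    fn commit_block(
        &mut self,
        advance_head: impl FnOnce() -> Result<(), String>,
    ) -> Result<(), String> {
        advance_head()?;
        if let Some(slot) = self.pending_return_signal.take() {
            self.proposal_signal_slots.push(slot);
        }
        Ok(())
    }
}
```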

Security:
* (#4) `BRIDGE_RPC_ADDR` defaults to `127.0.0.1:4545` instead of
  `0.0.0.0:4545` so the unauthenticated `surge_*` JSON-RPC endpoints
  are not exposed externally unless an operator explicitly opts in.
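A sketch of the loopback default using only the standard library. The default address is from the commit message; the helper name is hypothetical.

```rust
use std::net::{AddrParseError, SocketAddr};

/// Loopback default, so the surge_* endpoints stay local unless overridden.
const DEFAULT_BRIDGE_RPC_ADDR: &str = "127.0.0.1:4545";

/// Hypothetical helper: resolve the bind address from an optional
/// BRIDGE_RPC_ADDR value.
fn bridge_rpc_addr(raw: Option<&str>) -> Result<SocketAddr, AddrParseError> {
    raw.unwrap_or(DEFAULT_BRIDGE_RPC_ADDR).parse()
}
```

An operator who wants external exposure would pass `0.0.0.0:4545` explicitly.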

Usability:
* (#13) The fjall DB path for `UserOpStatusStore` is now configurable
  via `USER_OP_STATUS_DB_PATH` (default `data/user_op_status`),
  threaded through `BatchManager` → `BridgeHandler`.

Diagnostics / cleanup:
* (#12) `RealtimeConfig` Display now includes bridge, l2_signal_service,
  raiko_max_retries, raiko_timeout_sec, mock_mode, bridge_rpc_addr,
  user_op_status_db_path — startup log shows the full picture.
* (#11) `SEND_MESSAGE_SELECTOR` lives in `realtime/src/shared_abi/mod.rs`
  instead of being duplicated in both execution layers.

Deferred (separate PRs / issues):
* (#5) Cross-fork pacaya dependency — design decision; pacaya is a
  shared utility crate on master, deliberately reused.
* (#6) Hardcoded NodeConfig values (handover_window_slots etc.) — fine
  for the only current deployment; expose when a 2nd one shows up.
* (#7) Bridge `__ctx` storage slots hardcoded — needs runtime layout
  check; out of scope here.
* (#9) Duplicate blob encoding per submission — perf optimization.
* (#14) `surge_txStatus` cleanup timing.
* (#15) `getConfig()` called twice on startup.

Quality gate: `cargo build --workspace`, `cargo clippy --all-features`
on touched crates, `cargo fmt --check`, `cargo test --workspace`
(122 passed) all clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Changed error to debug in TransactionMonitorThread::notify_result,
because it's expected that there may be no one listening for the
result.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Maciej Skrzypkowski <mskr@gmx.com>
Co-authored-by: mikhailUshakoff <75278099+mikhailUshakoff@users.noreply.github.com>
@AnshuJalan AnshuJalan closed this May 5, 2026
4 participants