Storage node fix#1559

Open
cryptoAtwill wants to merge 17 commits into storage-cli-extracted from storage-node-fix
Conversation


@cryptoAtwill (Contributor) commented Mar 31, 2026

Note

High Risk
Touches consensus-critical legacy topdown finality/sync logic and adds new on-chain actor state/methods for execution jobs, so regressions could stall subnet progress or break actor interoperability.

Overview
Adds an MVP execution jobs feature to the blobs actor: new shared types/selectors, new actor methods (CreateJob, ClaimJob, CompleteJob, FailJob, GetJob, ListJobs), persisted job state, and an expanded EVM InvokeContract facade to support operator registration/query and job operations via calldata.

Updates storage tooling to rely on delegated f410 senders (failing fast if the sender is unfunded) and adds a storage gateway/node path that avoids native-tls/OpenSSL by switching to rustls. Introduces new ipc-cli commands: storage (config/init plus basic object upload/download helpers) and exec (submit/list/status execution jobs), along with a node RunExecutor loop that claims and runs jobs locally.

Improves legacy topdown behavior and performance: range-based fetching of validator changes/topdown messages (with chunked processing in the interpreter), single-validator fast paths to avoid self-attestation/catch-up stalls, and syncer resilience to stale height races; also tweaks node init to better bootstrap validator config and ensure required resolver keys exist. Documentation and example configs are updated accordingly.

Written by Cursor Bugbot for commit c14c584.

karlem and others added 17 commits on March 19, 2026 at 15:15
- Added `--compile` option to `update-binaries` command for selecting between local and remote builds.
- Introduced local build functionality in `health.sh` for cross-compilation from macOS to Linux.
- Updated configuration file to include a path for local IPC repository.
- Enhanced documentation in the script for clarity on new options and usage examples.
- Enhanced the cross-compilation logic in `health.sh` to prefer `cargo-zigbuild` over `cross`, providing clearer installation instructions.
- Updated documentation in `ipc-subnet-manager.sh` to reflect the new requirements for macOS to Linux builds.
- Improved error handling and user guidance for cross-compilation failures.
- Replaced direct SSH commands with `exec_on_host` for creating remote directories, improving code clarity and maintainability.
- Updated binary transfer functions to use `copy_to_host`, enhancing consistency in deployment logic.
- Removed redundant variable assignments for SSH user and IP, simplifying the function's parameters.
- Updated `Cargo.toml` to disable default features for `ethers` and `reqwest`, adding `rustls` support to avoid OpenSSL issues during cross-compilation.
- Introduced a new `Cross.toml` file for cross-compilation configuration, including OpenSSL header installation.
- Enhanced `ipc-subnet-config-local.yml` and `ipc-subnet-config.yml` with new storage service configurations and updated subnet parameters.
- Updated `ipc-subnet-manager.sh` to include new commands for managing storage services and improved options for binary updates.
- Changed subnet ID in `ipc-subnet-config.yml` to a new value.
- Disabled deployment and activation of the subnet in initialization settings for a more controlled setup.
- Updated the `start_storage_services` function to use `exec_on_host_simple` for executing commands on remote hosts, improving code consistency and readability.
- Adjusted multiple command executions related to storage configuration and service startup to utilize the simplified execution method.
…tions

- Added new commands for storage operations including `cp`, `ls`, `cat`, `mv`, and `stat`, allowing users to manage files and objects in storage.
- Introduced a `client` module for HTTP interactions with the storage gateway API, facilitating blob uploads and downloads.
- Enhanced `StorageConfig` to support dynamic gateway URL retrieval from various sources.
- Implemented path parsing for storage URIs, improving the handling of storage paths.
- Updated `Cargo.toml` to include new dependencies for enhanced functionality.
- Added comprehensive error handling and user feedback for storage operations.
- Updated `ipc-subnet-config.yml` to set `deploy_subnet` and `activate_subnet` to true, allowing for automatic deployment and activation of subnet and gateway contracts during initialization.
- Added preflight checks in `health.sh` to ensure necessary dependencies and binaries are available before starting storage services, improving robustness and user guidance for setup.
- Changed subnet ID and updated parent registry and gateway addresses in `ipc-subnet-config.yml` for improved network configuration.
- Enhanced `exec.sh` to check process status more reliably by capturing output and trimming whitespace.
- Improved `health.sh` to ensure proper cleanup of stale RocksDB locks and added fallback mechanisms for stopping processes, enhancing robustness during node shutdown.
- Added validation for extracted subnet ID against on-chain subnet list to ensure consistency and correctness in subnet management.
- Changed default branch from 'main' to 'storage-node-fix' in `ipc-subnet-manager.sh`, `bootstrap.sh`, and `health.sh` to align with current development practices.
- Updated documentation and examples throughout the scripts to reflect the new default branch, ensuring consistency in usage and clarity for users.
- Added `async_trait` support to multiple storage command implementations (`cat`, `cp`, `ls`, `mv`, `rm`, `stat`, `sync`) to enable asynchronous handling.
- Updated `reqwest` dependency in `Cargo.toml` to include `multipart` feature for improved HTTP request capabilities.
- Introduced `mime_guess` dependency in `Cargo.lock` for better MIME type handling.
- Refactored client binding logic in storage commands to utilize `SignedMessageFactory`, improving message signing consistency.
@cryptoAtwill cryptoAtwill requested a review from a team as a code owner March 31, 2026 11:10

@cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.


Bugbot Autofix prepared fixes for all 3 issues found in the latest run.

  • ✅ Fixed: Hardcoded sequence and chain ID break on-chain transactions
    • Updated storage cp upload signing to use the delegated f410 sender plus on-chain sequence and chain ID instead of hardcoded zeros.
  • ✅ Fixed: Negation overflow panic for i32::MIN in debug builds
    • Replaced negation-based magnitude conversion with value.unsigned_abs() to avoid overflow on i32::MIN.
  • ✅ Fixed: Executor runs arbitrary binaries from on-chain data
    • Restricted executor job binaries to normalized local:// relative paths under a configured local binary directory with canonical path enforcement.

Preview (3343b91a9d)
diff --git a/ipc-storage/ipc-decentralized-storage/src/bin/node.rs b/ipc-storage/ipc-decentralized-storage/src/bin/node.rs
--- a/ipc-storage/ipc-decentralized-storage/src/bin/node.rs
+++ b/ipc-storage/ipc-decentralized-storage/src/bin/node.rs
@@ -31,7 +31,7 @@
 use fvm_shared::message::Message;
 use ipc_decentralized_storage::node::{launch, NodeConfig};
 use std::net::{SocketAddr, SocketAddrV4, SocketAddrV6};
-use std::path::PathBuf;
+use std::path::{Component, Path, PathBuf};
 use std::str::FromStr;
 use std::time::Duration;
 use tendermint_rpc::Url;
@@ -193,6 +193,10 @@
     /// Polling interval in seconds
     #[arg(long, default_value = "5")]
     poll_interval_secs: u64,
+
+    /// Directory containing executable binaries referenced by jobs via local://
+    #[arg(long, default_value = "./executor-bin", env = "EXECUTOR_BIN_DIR")]
+    binary_dir: PathBuf,
 }
 
 #[tokio::main]
@@ -508,7 +512,7 @@
     if value >= 0 {
         EthU256::from(value as u32)
     } else {
-        EthU256::MAX - EthU256::from((-value) as u32) + EthU256::from(1u8)
+        EthU256::MAX - EthU256::from(value.unsigned_abs()) + EthU256::from(1u8)
     }
 }
 
@@ -735,6 +739,18 @@
 
     let mf = SignedMessageFactory::new(sk, from_f410, sequence, ChainID::from(chain_id));
     let mut tx_client = client.bind(mf);
+    let binary_dir = std::fs::canonicalize(&args.binary_dir).with_context(|| {
+        format!(
+            "failed to canonicalize executor binary directory: {}",
+            args.binary_dir.display()
+        )
+    })?;
+    if !binary_dir.is_dir() {
+        anyhow::bail!(
+            "executor binary directory is not a directory: {}",
+            binary_dir.display()
+        );
+    }
 
     let poll_interval = Duration::from_secs(args.poll_interval_secs);
     loop {
@@ -745,7 +761,10 @@
         }
 
         let job = pending_jobs[0].clone();
-        info!("Found candidate job {} with binary_ref={}", job.id, job.binary_ref);
+        info!(
+            "Found candidate job {} with binary_ref={}",
+            job.id, job.binary_ref
+        );
 
         let latest = get_job(&tx_client, job.id).await?;
         let Some(latest) = latest else {
@@ -776,7 +795,8 @@
         )
         .await
         .context("failed to send ClaimJob transaction via InvokeContract facade")?;
-        if claim_res.response.check_tx.code.is_err() || claim_res.response.deliver_tx.code.is_err() {
+        if claim_res.response.check_tx.code.is_err() || claim_res.response.deliver_tx.code.is_err()
+        {
             info!(
                 "ClaimJob rejected for {}: check={:?} deliver={:?} log={} info={}",
                 job.id,
@@ -790,18 +810,26 @@
         }
         info!("ClaimJob accepted for {}", job.id);
 
-        let run_result = run_job_binary(&job.binary_ref, &job.args).await;
+        let run_result = run_job_binary(&binary_dir, &job.binary_ref, &job.args).await;
         match run_result {
             Ok((exit_code, stdout, stderr)) if exit_code == 0 => {
                 let output_commitment = fendermint_actor_blobs_shared::bytes::B256(
                     *blake3::hash([stdout.as_bytes(), stderr.as_bytes()].concat().as_slice())
                         .as_bytes(),
                 );
-                let output_refs = vec![format!("inline://stdout/{}", hex::encode(output_commitment.0))];
+                let output_refs = vec![format!(
+                    "inline://stdout/{}",
+                    hex::encode(output_commitment.0)
+                )];
                 let complete_res = TxClient::<TxCommit>::fevm_invoke(
                     &mut tx_client,
                     BLOBS_ACTOR_ADDR,
-                    encode_complete_job_calldata(job.id, output_refs, output_commitment.0, exit_code),
+                    encode_complete_job_calldata(
+                        job.id,
+                        output_refs,
+                        output_commitment.0,
+                        exit_code,
+                    ),
                     TokenAmount::zero(),
                     gas_params.clone(),
                 )
@@ -922,7 +950,8 @@
 }
 
 async fn get_job(client: &impl QueryClient, id: u64) -> Result<Option<ExecutionJob>> {
-    let params = RawBytes::serialize(GetJobParams { id }).context("failed to serialize GetJob params")?;
+    let params =
+        RawBytes::serialize(GetJobParams { id }).context("failed to serialize GetJob params")?;
 
     let msg = Message {
         version: Default::default(),
@@ -951,14 +980,42 @@
     Ok(job)
 }
 
-async fn run_job_binary(binary_ref: &str, args: &[String]) -> Result<(i32, String, String)> {
-    // MVP runner supports local paths directly or `local://` URIs.
-    let binary = binary_ref
+async fn run_job_binary(
+    binary_dir: &Path,
+    binary_ref: &str,
+    args: &[String],
+) -> Result<(i32, String, String)> {
+    let binary_rel = binary_ref
         .strip_prefix("local://")
-        .unwrap_or(binary_ref)
-        .to_string();
+        .ok_or_else(|| anyhow::anyhow!("unsupported binary_ref scheme, expected local://"))?;
+    let binary_rel = Path::new(binary_rel);
+    if binary_rel.as_os_str().is_empty() || binary_rel.is_absolute() {
+        anyhow::bail!("invalid local binary_ref path: {}", binary_ref);
+    }
+    if binary_rel
+        .components()
+        .any(|c| !matches!(c, Component::Normal(_)))
+    {
+        anyhow::bail!(
+            "binary_ref must be a normalized relative path: {}",
+            binary_ref
+        );
+    }
 
-    let output = TokioCommand::new(binary)
+    let binary = binary_dir.join(binary_rel);
+    let binary = std::fs::canonicalize(&binary)
+        .with_context(|| format!("failed to resolve executable path: {}", binary.display()))?;
+    if !binary.starts_with(binary_dir) {
+        anyhow::bail!(
+            "binary_ref resolved outside allowed directory: {}",
+            binary.display()
+        );
+    }
+    if !binary.is_file() {
+        anyhow::bail!("executable is not a file: {}", binary.display());
+    }
+
+    let output = TokioCommand::new(&binary)
         .args(args)
         .output()
         .await

diff --git a/ipc/cli/src/commands/storage/cp.rs b/ipc/cli/src/commands/storage/cp.rs
--- a/ipc/cli/src/commands/storage/cp.rs
+++ b/ipc/cli/src/commands/storage/cp.rs
@@ -17,6 +17,10 @@
 
 use fendermint_actor_blobs_shared::bytes::B256;
 use fendermint_rpc::client::FendermintClient;
+use fendermint_rpc::QueryClient;
+use fendermint_vm_actor_interface::eam::EthAddress as FvmEthAddress;
+use fendermint_vm_message::query::FvmQueryHeight;
+use fvm_shared::address::Address;
 use fvm_shared::chainid::ChainID;
 
 use crate::commands::storage::{bucket, client::GatewayClient, config::StorageConfig, path};
@@ -72,11 +76,9 @@
                 // Storage -> Storage (copy between buckets)
                 copy_between_buckets(global, args).await
             }
-            (false, false) => {
-                Err(anyhow!(
-                    "At least one path must be a storage path (ipc://...)"
-                ))
-            }
+            (false, false) => Err(anyhow!(
+                "At least one path must be a storage path (ipc://...)"
+            )),
         }
     }
 }
@@ -95,9 +97,7 @@
     // Handle recursive directory upload
     if local_path.is_dir() {
         if !args.recursive {
-            return Err(anyhow!(
-                "Cannot copy directory without -r/--recursive flag"
-            ));
+            return Err(anyhow!("Cannot copy directory without -r/--recursive flag"));
         }
         return upload_directory(local_path, &storage_path, args).await;
     }
@@ -112,7 +112,11 @@
     storage_path: &path::StoragePath,
     args: &CopyArgs,
 ) -> Result<()> {
-    println!("Uploading {} -> {}", local_path.display(), storage_path.to_uri());
+    println!(
+        "Uploading {} -> {}",
+        local_path.display(),
+        storage_path.to_uri()
+    );
 
     // Read file data
     let data = fs::read(local_path)
@@ -155,7 +159,10 @@
     );
 
     // Convert hash to B256
-    let blob_hash_hex = upload_response.hash.strip_prefix("0x").unwrap_or(&upload_response.hash);
+    let blob_hash_hex = upload_response
+        .hash
+        .strip_prefix("0x")
+        .unwrap_or(&upload_response.hash);
     let blob_hash_bytes = hex::decode(blob_hash_hex)?;
     let mut hash_array = [0u8; 32];
     hash_array.copy_from_slice(&blob_hash_bytes[..32]);
@@ -172,18 +179,39 @@
     println!("Registering object on-chain...");
 
     // Create FendermintClient for on-chain operations
-    let fm_client = FendermintClient::new_http(
-        config.tendermint_rpc_url.parse()?,
-        None,
-    )?;
+    let fm_client = FendermintClient::new_http(config.tendermint_rpc_url.parse()?, None)?;
 
     // Create bound client with secret key
-    let secret_key = fendermint_rpc::message::SignedMessageFactory::read_secret_key(
-        &config.secret_key_file
-    )?;
+    let secret_key =
+        fendermint_rpc::message::SignedMessageFactory::read_secret_key(&config.secret_key_file)?;
 
-    let message_factory =
-        fendermint_rpc::message::SignedMessageFactory::new_secp256k1(secret_key, 0, ChainID::from(0));
+    let pk = secret_key.public_key();
+    let from_f1 = Address::new_secp256k1(&pk.serialize()).context("failed to create f1 address")?;
+    let from_eth = FvmEthAddress::new_secp256k1(&pk.serialize())
+        .context("failed to derive delegated address from secret key")?;
+    let from_f410 =
+        Address::new_delegated(10, &from_eth.0).context("failed to create f410 address")?;
+    let sequence = get_sequence_opt(&fm_client, &from_f410)
+        .await
+        .context("failed to get delegated account sequence")?
+        .ok_or_else(|| {
+            anyhow!(
+                "delegated sender {} not found on-chain; cross-fund this delegated address and retry (native f1 {} is intentionally not used)",
+                from_f410, from_f1
+            )
+        })?;
+    let chain_id = fm_client
+        .state_params(FvmQueryHeight::default())
+        .await
+        .context("failed to get state params")?
+        .value
+        .chain_id;
+    let message_factory = fendermint_rpc::message::SignedMessageFactory::new(
+        secret_key,
+        from_f410,
+        sequence,
+        ChainID::from(chain_id),
+    );
     let mut bound_client = fm_client.bind(message_factory);
 
     // Add object to bucket
@@ -202,11 +230,26 @@
     .await
     .context("Failed to register object on-chain")?;
 
-    println!("✓ Successfully uploaded and registered: {}", storage_path.key);
+    println!(
+        "✓ Successfully uploaded and registered: {}",
+        storage_path.key
+    );
 
     Ok(())
 }
 
+/// Get the next sequence number (nonce) of an account if it exists.
+async fn get_sequence_opt(client: &impl QueryClient, addr: &Address) -> Result<Option<u64>> {
+    let state = client
+        .actor_state(addr, FvmQueryHeight::default())
+        .await
+        .context("failed to get actor state")?;
+    match state.value {
+        Some((_id, state)) => Ok(Some(state.sequence)),
+        None => Ok(None),
+    }
+}
+
 /// Upload a directory recursively
 async fn upload_directory(
     local_dir: &Path,
@@ -229,7 +272,11 @@
             let storage_key = if storage_base.key.is_empty() {
                 rel_path_str
             } else {
-                format!("{}/{}", storage_base.key.trim_end_matches('/'), rel_path_str)
+                format!(
+                    "{}/{}",
+                    storage_base.key.trim_end_matches('/'),
+                    rel_path_str
+                )
             };
 
             let file_storage_path = path::StoragePath {
@@ -271,7 +318,11 @@
     local_path: &Path,
     args: &CopyArgs,
 ) -> Result<()> {
-    println!("Downloading {} -> {}", storage_path.to_uri(), local_path.display());
+    println!(
+        "Downloading {} -> {}",
+        storage_path.to_uri(),
+        local_path.display()
+    );
 
     // Get/load config
     let config_path = args.config.clone().unwrap_or_else(|| {
@@ -323,7 +374,10 @@
     _local_dir: &Path,
     _args: &CopyArgs,
 ) -> Result<()> {
-    println!("Downloading directory {} recursively...", storage_base.to_uri());
+    println!(
+        "Downloading directory {} recursively...",
+        storage_base.to_uri()
+    );
 
     // TODO: Implement by listing objects with prefix and downloading each
     // This requires implementing the ls command first to reuse list_objects functionality


)?;

let message_factory =
fendermint_rpc::message::SignedMessageFactory::new_secp256k1(secret_key, 0, ChainID::from(0));


Hardcoded sequence and chain ID break on-chain transactions

High Severity

SignedMessageFactory::new_secp256k1(secret_key, 0, ChainID::from(0)) hardcodes both the account sequence (nonce) and chain ID to 0. Every other place in this PR (e.g., exec/mod.rs, node.rs, gateway.rs) properly queries get_sequence_opt and state_params for the real values. Additionally, new_secp256k1 creates an f1-type sender address, while the rest of the PR has moved to f410 (delegated) addresses. This means the bucket::add_object call in upload_file will always fail with a nonce mismatch or chain ID mismatch, making the entire storage cp upload flow non-functional.


if value >= 0 {
EthU256::from(value as u32)
} else {
EthU256::MAX - EthU256::from((-value) as u32) + EthU256::from(1u8)


Negation overflow panic for i32::MIN in debug builds

Low Severity

In abi_int256_from_i32, the expression (-value) as u32 when value is i32::MIN (-2147483648) causes an arithmetic overflow because negating i32::MIN exceeds i32::MAX. This panics in debug builds. While i32::MIN is an unusual exit code, using value.unsigned_abs() would avoid the issue entirely.
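The difference can be sketched in a minimal standalone form (plain standard-library Rust; `magnitude` is a hypothetical helper for illustration, not the actor code):

```rust
// Computing the magnitude of a signed exit code without overflow.
// `(-value) as u32` first negates in i32, which overflows for i32::MIN
// (2^31 does not fit in i32) and panics in debug builds.
// `unsigned_abs` computes the absolute value directly as a u32.
fn magnitude(value: i32) -> u32 {
    value.unsigned_abs()
}

fn main() {
    assert_eq!(magnitude(-1), 1);
    // i32::MIN's magnitude is 2^31, representable in u32 but not in i32.
    assert_eq!(magnitude(i32::MIN), 2_147_483_648u32);
    println!("ok");
}
```

In release builds the buggy form would wrap silently instead of panicking, so the two's-complement encoding of `i32::MIN` would still come out wrong in either profile; `unsigned_abs` is correct in both.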


let code = output.status.code().unwrap_or(-1);
let stdout = String::from_utf8_lossy(&output.stdout).to_string();
let stderr = String::from_utf8_lossy(&output.stderr).to_string();
Ok((code, stdout, stderr))


Executor runs arbitrary binaries from on-chain data

High Severity

run_job_binary executes whatever path is in binary_ref from on-chain job data using TokioCommand::new(binary). Any on-chain user who creates a job can cause the executor to run arbitrary local commands (e.g., binary_ref = "/bin/rm" with args = ["-rf", "/"]). There's no allowlist, sandboxing, or path restriction — the executor trusts on-chain input to specify the binary to run.
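The confinement approach taken in the autofix can be sketched as a standalone check (a simplified illustration using only the standard library; `validate_binary_ref` is a hypothetical helper, and the real fix additionally canonicalizes the joined path and verifies it stays under the configured directory):

```rust
use std::path::{Component, Path, PathBuf};

// Accept only `local://` refs that are non-empty, relative, and composed
// solely of normal path components, so a job cannot name a binary
// outside the operator-configured directory.
fn validate_binary_ref(binary_ref: &str) -> Result<PathBuf, String> {
    let rel = binary_ref
        .strip_prefix("local://")
        .ok_or_else(|| format!("unsupported scheme, expected local://: {binary_ref}"))?;
    let rel = Path::new(rel);
    if rel.as_os_str().is_empty() || rel.is_absolute() {
        return Err(format!("invalid local binary_ref path: {binary_ref}"));
    }
    // Reject `..`, `.`, and root/prefix components so the ref cannot
    // traverse out of the binary directory even before canonicalization.
    if rel.components().any(|c| !matches!(c, Component::Normal(_))) {
        return Err(format!("binary_ref must be a normalized relative path: {binary_ref}"));
    }
    Ok(rel.to_path_buf())
}

fn main() {
    assert!(validate_binary_ref("local://tools/wordcount").is_ok());
    assert!(validate_binary_ref("/bin/rm").is_err()); // no local:// scheme
    assert!(validate_binary_ref("local:///bin/rm").is_err()); // absolute path
    assert!(validate_binary_ref("local://../escape").is_err()); // traversal
    println!("ok");
}
```

Component filtering alone is not sufficient when symlinks are possible, which is why the applied fix also canonicalizes the resolved path and checks `starts_with` against the canonicalized binary directory.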

